Heat Diffusion Workflow¶

This module describes a heat diffusion workflow for analyzing BEL networks with differential gene expression [0].

It has four parts:

Assembling a network, pre-processing, and overlaying data
Generating unbiased candidate mechanisms from the network
Generating random subgraphs from each unbiased candidate mechanism
Applying standard heat diffusion to each subgraph and calculating scores for each unbiased candidate mechanism based on the distribution of scores for its subgraph

In this algorithm, heat is applied to the nodes based on the data set. For the differential gene expression experiment, the log-fold-change values are used instead of the corrected p-values to allow for the effects of up- and down-regulation to be admitted in the analysis. Finally, heat diffusion inspired by previous algorithms publsihed in systems and networks biology [1] [2] is run with the constraint that decreases edges cause the sign of the heat to be flipped. Because of the construction of unbiased candidate mechanisms, all heat will flow towards their seed biological process nodes. The amount of heat on the biological process node after heat diffusion stops becomes the score for the whole unbiased candidate mechanism.

The issue of inconsistent causal networks addressed by SST [3] does not affect heat diffusion algorithms since it can quantify multiple conflicting pathways. However, it does not address the possibility of contradictory edges, for example, when A increases B and A decreases B are both true. A random sampling approach is used on networks with contradictory edges and aggregate statistics over multiple trials are used to assess the robustness of the scores as a function of the topology of the underlying unbiases candidate mechanisms.

Invariants¶

Because heat always flows towards the biological process node, it is possible to remove leaf nodes (nodes with no incoming edges) after each step, since their heat will never change.

Examples¶

This workflow has been applied in several Jupyter notebooks:

Future Work¶

This algorithm can be tuned to allow the use of correlative relationships. Because many multi-scale and multi-modal data are often measured with correlations to molecular features, this enables experiments to be run using SNP or brain imaging features, whose experiments often measure their correlation with the activity of gene products.

[0]	Hoyt, C. T., Konotopez, A., Ebeling, C., & Wren, J. (2018). PyBEL: a computational framework for Biological Expression Language. Bioinformatics (Oxford, England), 34(4), 703–704.

[1]	Bernabò N., et al. (2014). The biological networks in studying cell signal transduction complexity: The examples of sperm capacitation and of endocannabinoid system. Computational and Structural Biotechnology Journal, 11 (18), 11–21.

[2]	Leiserson, M. D. M., et al. (2015). Pan-cancer network analysis identifies combinations of rare somatic mutations across pathways and protein complexes. Nature Genetics, 47 (2), 106–14.

[3]	Vasilyev, D. M., et al. (2014). An algorithm for score aggregation over causal biological networks based on random walk sampling. BMC Research Notes, 7, 516.

pybel_tools.analysis.heat.RESULT_LABELS = ['avg', 'stddev', 'normality', 'median', 'neighbors', 'subgraph_size']¶: The columns in the score tuples

pybel_tools.analysis.heat.calculate_average_scores_on_graph(graph, key=None, tag=None, default_score=None, runs=None, use_tqdm=False)[source]¶

Calculates the scores over all biological processes in the sub-graph.

As an implementation, it simply computes the sub-graphs then calls calculate_average_scores_on_subgraphs() as described in that function’s documentation.

Parameters:	graph (pybel.BELGraph) – A BEL graph with heats already on the nodes key (Optional[str]) – The key in the node data dictionary representing the experimental data. Defaults to `pybel_tools.constants.WEIGHT`. tag (Optional[str]) – The key for the nodes’ data dictionaries where the scores will be put. Defaults to ‘score’ default_score (Optional[float]) – The initial score for all nodes. This number can go up or down. runs (Optional[int]) – The number of times to run the heat diffusion workflow. Defaults to 100. use_tqdm (bool) – Should there be a progress bar for runners?
Returns:	A dictionary of {pybel node tuple: results tuple}
Return type:	dict[tuple, tuple]

Suggested usage with pandas:

>>> import pandas as pd
>>> from pybel_tools.analysis.heat import calculate_average_scores_on_graph
>>> graph = ...  # load graph and data
>>> scores = calculate_average_scores_on_graph(graph)
>>> pd.DataFrame.from_items(scores.items(), orient='index', columns=RESULT_LABELS)

pybel_tools.analysis.heat.calculate_average_scores_on_subgraphs(subgraphs, key=None, tag=None, default_score=None, runs=None, use_tqdm=False)[source]¶

Calculate the scores over precomputed candidate mechanisms.

Parameters:	subgraphs (dict[tuple,pybel.BELGraph]) – A dictionary of {tuple node: pybel.BELGraph candidate mechanism} key (Optional[str]) – The key in the node data dictionary representing the experimental data. Defaults to `pybel_tools.constants.WEIGHT`. tag (Optional[str]) – The key for the nodes’ data dictionaries where the scores will be put. Defaults to ‘score’ default_score (Optional[float]) – The initial score for all nodes. This number can go up or down. runs (Optional[int]) – The number of times to run the heat diffusion workflow. Defaults to 100. use_tqdm (bool) – Should there be a progress bar for runners?
Returns:	A dictionary of {pybel node tuple: results tuple}
Return type:	dict[tuple, tuple]

Example Usage:

>>> import pandas as pd
>>> from pybel_tools.generation import generate_bioprocess_mechanisms
>>> from pybel_tools.analysis.heat import calculate_average_scores_on_subgraphs
>>> # load graph and data
>>> graph = ...
>>> candidate_mechanisms = generate_bioprocess_mechanisms(graph)
>>> scores = calculate_average_scores_on_subgraphs(candidate_mechanisms)
>>> pd.DataFrame.from_items(scores.items(), orient='index', columns=RESULT_LABELS)

pybel_tools.analysis.heat.workflow(graph, node, key=None, tag=None, default_score=None, runs=None, minimum_nodes=1)[source]¶

Generate candidate mechanisms and run the heat diffusion workflow.

Parameters:	graph (pybel.BELGraph) – A BEL graph node (tuple) – The BEL node that is the focus of this analysis key (Optional[str]) – The key in the node data dictionary representing the experimental data. Defaults to `pybel_tools.constants.WEIGHT`. tag (Optional[str]) – The key for the nodes’ data dictionaries where the scores will be put. Defaults to ‘score’ default_score (Optional[float]) – The initial score for all nodes. This number can go up or down. runs (Optional[int]) – The number of times to run the heat diffusion workflow. Defaults to 100. minimum_nodes (int) – The minimum number of nodes a sub-graph needs to try running heat diffusion
Returns:	A list of runners
Return type:	list[Runner]

pybel_tools.analysis.heat.multirun(graph, node, key=None, tag=None, default_score=None, runs=None, use_tqdm=False)[source]¶

Run the heat diffusion workflow multiple times, each time yielding a Runner object upon completion.

Parameters:	graph (pybel.BELGraph) – A BEL graph node (tuple) – The BEL node that is the focus of this analysis key (Optional[str]) – The key in the node data dictionary representing the experimental data. Defaults to `pybel_tools.constants.WEIGHT`. tag (str) – The key for the nodes’ data dictionaries where the scores will be put. Defaults to ‘score’ default_score (float) – The initial score for all nodes. This number can go up or down. runs (int) – The number of times to run the heat diffusion workflow. Defaults to 100. use_tqdm (bool) – Should there be a progress bar for runners?
Returns:	An iterable over the runners after each iteration
Return type:	iter[Runner]

class pybel_tools.analysis.heat.Runner(graph, target_node, key=None, tag=None, default_score=None)[source]¶

This class houses the data related to a single run of the heat diffusion workflow.

Initializes the heat diffusion runner class

Parameters:

graph (pybel.BELGraph) – A BEL graph
target_node (tuple) – The BEL node that is the focus of this analysis
key (Optional[str]) – The key in the node data dictionary representing the experimental data. Defaults to pybel_tools.constants.WEIGHT.
tag (str) – The key for the nodes’ data dictionaries where the scores will be put. Defaults to ‘score’
default_score (float) – The initial score for all nodes. This number can go up or down.

iter_leaves()[source]¶

Returns an iterable over all nodes that are leaves. A node is a leaf if either:

it doesn’t have any predecessors, OR

all of its predecessors have a score in their data dictionaries

Returns:	An iterable over all leaf nodes
Return type:	iter

has_leaves()[source]¶

Returns if the current graph has any leaves.

Implementation is not that smart currently, and does a full sweep.

Returns:	Does the current graph have any leaves?
Return type:	bool

in_out_ratio(node)[source]¶

Calculates the ratio of in-degree / out-degree of a node

Parameters:	node (tuple) – A BEL node
Returns:	The in-degree / out-degree ratio for the given node
Return type:	float

unscored_nodes_iter()[source]¶: Iterates over all nodes without a score

get_random_edge()[source]¶

This function should be run when there are no leaves, but there are still unscored nodes. It will introduce a probabilistic element to the algorithm, where some edges are disregarded randomly to eventually get a score for the network. This means that the score can be averaged over many runs for a given graph, and a better data structure will have to be later developed that doesn’t destroy the graph (instead, annotates which edges have been disregarded, later)

get all unscored

rank by in-degree

weighted probability over all in-edges where lower in-degree means higher probability

pick randomly which edge

Returns:	A random in-edge to the lowest in/out degree ratio node. This is a 3-tuple of (node, node, key)
Return type:	tuple

remove_random_edge()[source]¶: Remove a random in-edge from the node with the lowest in/out degree ratio.

remove_random_edge_until_has_leaves()[source]¶: Remove random edges until there is at least one leaf node.

score_leaves()[source]¶

Calculate the score for all leaves.

Returns:	The set of leaf nodes that were scored
Return type:	set

run()[source]¶: Calculate scores for all leaves until there are none, removes edges until there are, and repeats until all nodes have been scored.

run_with_graph_transformation()[source]¶

Calculate scores for all leaves until there are none, removes edges until there are, and repeats until all nodes have been scored. Also, yields the current graph at every step so you can make a cool animation of how the graph changes throughout the course of the algorithm

Returns:	An iterable of BEL graphs
Return type:	iter[pybel.BELGraph]

done_chomping()[source]¶

Determines if the algorithm is complete by checking if the target node of this analysis has been scored yet. Because the algorithm removes edges when it gets stuck until it is un-stuck, it is always guaranteed to finish.

Returns:	Is the algorithm done running?
Return type:	bool

get_final_score()[source]¶

Return the final score for the target node.

Returns:	The final score for the target node
Return type:	float

calculate_score(node)[source]¶

Calculate the score of the given node.

Parameters:	node (tuple) – A node in the BEL graph
Returns:	The new score of the node
Return type:	float

get_remaining_graph()[source]¶

Allows for introspection on the algorithm at a given point by returning the sub-graph induced by all unscored nodes

Returns:	The remaining unscored BEL graph
Return type:	pybel.BELGraph

pybel_tools.analysis.heat.workflow_aggregate(graph, node, key=None, tag=None, default_score=None, runs=None, aggregator=None)[source]¶

Get the average score over multiple runs.

This function is very simple, and can be copied to do more interesting statistics over the Runner instances. To iterate over the runners themselves, see workflow()

Parameters:	graph (pybel.BELGraph) – A BEL graph node (tuple) – The BEL node that is the focus of this analysis key (Optional[str]) – The key in the node data dictionary representing the experimental data. Defaults to `pybel_tools.constants.WEIGHT`. tag (Optional[str]) – The key for the nodes’ data dictionaries where the scores will be put. Defaults to ‘score’ default_score (Optional[float]) – The initial score for all nodes. This number can go up or down. runs (Optional[int]) – The number of times to run the heat diffusion workflow. Defaults to 100. aggregator (Optional[list[float] -> float]) – A function that aggregates a list of scores. Defaults to `numpy.average()`. Could also use: `numpy.mean()`, `numpy.median()`, `numpy.min()`, `numpy.max()`
Returns:	The average score for the target node
Return type:	float

pybel_tools.analysis.heat.workflow_all(graph, key=None, tag=None, default_score=None, runs=None)[source]¶

Run the heat diffusion workflow and get runners for every possible candidate mechanism

Get all biological processes
Get candidate mechanism induced two level back from each biological process
Heat diffusion workflow for each candidate mechanism for multiple runs
Return all runner results

Parameters:	graph (pybel.BELGraph) – A BEL graph key (Optional[str]) – The key in the node data dictionary representing the experimental data. Defaults to `pybel_tools.constants.WEIGHT`. tag (str) – The key for the nodes’ data dictionaries where the scores will be put. Defaults to ‘score’ default_score (float) – The initial score for all nodes. This number can go up or down. runs (int) – The number of times to run the heat diffusion workflow. Defaults to 100.
Returns:	A dictionary of {node: list of runners}
Return type:	dict[tuple,list[Runner]]

pybel_tools.analysis.heat.workflow_all_aggregate(graph, key=None, tag=None, default_score=None, runs=None, aggregator=None)[source]¶

Run the heat diffusion workflow to get average score for every possible candidate mechanism.

Get all biological processes
Get candidate mechanism induced two level back from each biological process
Heat diffusion workflow on each candidate mechanism for multiple runs
Report average scores for each candidate mechanism

Parameters:	graph (pybel.BELGraph) – A BEL graph key (Optional[str]) – The key in the node data dictionary representing the experimental data. Defaults to `pybel_tools.constants.WEIGHT`. tag (Optional[str]) – The key for the nodes’ data dictionaries where the scores will be put. Defaults to ‘score’ default_score (Optional[float]) – The initial score for all nodes. This number can go up or down. runs (Optional[int]) – The number of times to run the heat diffusion workflow. Defaults to 100. aggregator (Optional[list[float] -> float]) – A function that aggregates a list of scores. Defaults to `numpy.average()`. Could also use: `numpy.mean()`, `numpy.median()`, `numpy.min()`, `numpy.max()`
Returns:	A dictionary of {node: upstream causal subgraph}
Return type:	dict

pybel_tools.analysis.heat.calculate_average_score_by_annotation(graph, annotation, key=None, runs=None, use_tdqm=False)[source]¶

For each sub-graph induced over the edges matching the annotation, calculate the average score for all of the contained biological processes

Assumes you haven’t done anything yet

Generates biological process upstream candidate mechanistic sub-graphs with generate_bioprocess_mechanisms()
Calculates scores for each sub-graph with calculate_average_scores_on_sub-graphs()
Overlays data with pbt.integration.overlay_data
Calculates averages with pbt.selection.group_nodes.average_node_annotation

Parameters:	graph (pybel.BELGraph) – A BEL graph annotation (str) – A BEL annotation key (Optional[str]) – The key in the node data dictionary representing the experimental data. Defaults to `pybel_tools.constants.WEIGHT`. runs (Optional[int]) – The number of times to run the heat diffusion workflow. Defaults to 100. use_tqdm (bool) – Should there be a progress bar for runners?
Returns:	A dictionary from {str annotation value: tuple scores}
Return type:	dict[str,tuple]

Example Usage:

>>> import pybel
>>> from pybel_tools.integration import overlay_data
>>> from pybel_tools.analysis.heat import calculate_average_score_by_annotation
>>> graph = pybel.from_path(...)
>>> scores = calculate_average_score_by_annotation(graph, 'subgraph')