Heat Diffusion Workflow

This module describes a heat diffusion workflow for analyzing BEL networks with differential gene expression 0.

It has four parts:

  1. Assembling a network, pre-processing, and overlaying data

  2. Generating unbiased candidate mechanisms from the network

  3. Generating random sub-graphs from each unbiased candidate mechanism

  4. Applying standard heat diffusion to each sub-graph and calculating scores for each unbiased candidate mechanism based on the distribution of scores for its sub-graph

In this algorithm, heat is applied to the nodes based on the data set. For the differential gene expression experiment, the log-fold-change values are used instead of the corrected p-values to allow for the effects of up- and down-regulation to be admitted in the analysis. Finally, heat diffusion inspired by previous algorithms published in systems and networks biology 1 2 is run with the constraint that decreases edges cause the sign of the heat to be flipped. Because of the construction of unbiased candidate mechanisms, all heat will flow towards their seed biological process nodes. The amount of heat on the biological process node after heat diffusion stops becomes the score for the whole unbiased candidate mechanism.

The issue of inconsistent causal networks addressed by SST 3 does not affect heat diffusion algorithms since it can quantify multiple conflicting pathways. However, it does not address the possibility of contradictory edges, for example, when A increases B and A decreases B are both true. A random sampling approach is used on networks with contradictory edges and aggregate statistics over multiple trials are used to assess the robustness of the scores as a function of the topology of the underlying unbiases candidate mechanisms.

Invariants

  • Because heat always flows towards the biological process node, it is possible to remove leaf nodes (nodes with no incoming edges) after each step, since their heat will never change.

Examples

This workflow has been applied in several Jupyter notebooks:

Future Work

This algorithm can be tuned to allow the use of correlative relationships. Because many multi-scale and multi-modal data are often measured with correlations to molecular features, this enables experiments to be run using SNP or brain imaging features, whose experiments often measure their correlation with the activity of gene products.

0

Hoyt, C. T., et al. (2017). PyBEL: a computational framework for Biological Expression Language. Bioinformatics (Oxford, England), 34(4), 703–704.

1

Bernabò N., et al. (2014). The biological networks in studying cell signal transduction complexity: The examples of sperm capacitation and of endocannabinoid system. Computational and Structural Biotechnology Journal, 11 (18), 11–21.

2

Leiserson, M. D. M., et al. (2015). Pan-cancer network analysis identifies combinations of rare somatic mutations across pathways and protein complexes. Nature Genetics, 47 (2), 106–14.

3

Vasilyev, D. M., et al. (2014). An algorithm for score aggregation over causal biological networks based on random walk sampling. BMC Research Notes, 7, 516.

pybel_tools.analysis.heat.RESULT_LABELS = ['avg', 'stddev', 'normality', 'median', 'neighbors', 'subgraph_size']

The columns in the score tuples

pybel_tools.analysis.heat.calculate_average_scores_on_graph(graph, key=None, tag=None, default_score=None, runs=None, use_tqdm=False)[source]

Calculate the scores over all biological processes in the sub-graph.

As an implementation, it simply computes the sub-graphs then calls calculate_average_scores_on_subgraphs() as described in that function’s documentation.

Parameters
  • graph (BELGraph) – A BEL graph with heats already on the nodes

  • key (Optional[str]) – The key in the node data dictionary representing the experimental data. Defaults to pybel_tools.constants.WEIGHT.

  • tag (Optional[str]) – The key for the nodes’ data dictionaries where the scores will be put. Defaults to ‘score’

  • default_score (Optional[float]) – The initial score for all nodes. This number can go up or down.

  • runs (Optional[int]) – The number of times to run the heat diffusion workflow. Defaults to 100.

  • use_tqdm (bool) – Should there be a progress bar for runners?

Returns

A dictionary of {pybel node tuple: results tuple}

Return type

dict[tuple, tuple]

Suggested usage with pandas:

>>> import pandas as pd
>>> from pybel_tools.analysis.heat import calculate_average_scores_on_graph
>>> graph = ...  # load graph and data
>>> scores = calculate_average_scores_on_graph(graph)
>>> pd.DataFrame.from_items(scores.items(), orient='index', columns=RESULT_LABELS)
pybel_tools.analysis.heat.calculate_average_scores_on_subgraphs(subgraphs, key=None, tag=None, default_score=None, runs=None, use_tqdm=False, tqdm_kwargs=None)[source]

Calculate the scores over precomputed candidate mechanisms.

Parameters
  • subgraphs (Mapping[~H, BELGraph]) – A dictionary of keys to their corresponding subgraphs

  • key (Optional[str]) – The key in the node data dictionary representing the experimental data. Defaults to pybel_tools.constants.WEIGHT.

  • tag (Optional[str]) – The key for the nodes’ data dictionaries where the scores will be put. Defaults to ‘score’

  • default_score (Optional[float]) – The initial score for all nodes. This number can go up or down.

  • runs (Optional[int]) – The number of times to run the heat diffusion workflow. Defaults to 100.

  • use_tqdm (bool) – Should there be a progress bar for runners?

Return type

Mapping[~H, Tuple[float, float, float, float, int, int]]

Returns

A dictionary of keys to results tuples

Example Usage:

>>> import pandas as pd
>>> from pybel_tools.generation import generate_bioprocess_mechanisms
>>> from pybel_tools.analysis.heat import calculate_average_scores_on_subgraphs
>>> # load graph and data
>>> graph = ...
>>> candidate_mechanisms = generate_bioprocess_mechanisms(graph)
>>> scores = calculate_average_scores_on_subgraphs(candidate_mechanisms)
>>> pd.DataFrame.from_items(scores.items(), orient='index', columns=RESULT_LABELS)
pybel_tools.analysis.heat.workflow(graph, node, key=None, tag=None, default_score=None, runs=None, minimum_nodes=1)[source]

Generate candidate mechanisms and run the heat diffusion workflow.

Parameters
  • graph (BELGraph) – A BEL graph

  • node (BaseEntity) – The BEL node that is the focus of this analysis

  • key (Optional[str]) – The key in the node data dictionary representing the experimental data. Defaults to pybel_tools.constants.WEIGHT.

  • tag (Optional[str]) – The key for the nodes’ data dictionaries where the scores will be put. Defaults to ‘score’

  • default_score (Optional[float]) – The initial score for all nodes. This number can go up or down.

  • runs (Optional[int]) – The number of times to run the heat diffusion workflow. Defaults to 100.

  • minimum_nodes (int) – The minimum number of nodes a sub-graph needs to try running heat diffusion

Return type

List[Runner]

Returns

A list of runners

pybel_tools.analysis.heat.multirun(graph, node, key=None, tag=None, default_score=None, runs=None, use_tqdm=False)[source]

Run the heat diffusion workflow multiple times, each time yielding a Runner object upon completion.

Parameters
  • graph (BELGraph) – A BEL graph

  • node (BaseEntity) – The BEL node that is the focus of this analysis

  • key (Optional[str]) – The key in the node data dictionary representing the experimental data. Defaults to pybel_tools.constants.WEIGHT.

  • tag (Optional[str]) – The key for the nodes’ data dictionaries where the scores will be put. Defaults to ‘score’

  • default_score (Optional[float]) – The initial score for all nodes. This number can go up or down.

  • runs (Optional[int]) – The number of times to run the heat diffusion workflow. Defaults to 100.

  • use_tqdm (bool) – Should there be a progress bar for runners?

Return type

Iterable[Runner]

Returns

An iterable over the runners after each iteration

class pybel_tools.analysis.heat.Runner(graph, target_node, key=None, tag=None, default_score=None)[source]

This class houses the data related to a single run of the heat diffusion workflow.

Initialize the heat diffusion runner class.

Parameters
  • graph (BELGraph) – A BEL graph

  • target_node (BaseEntity) – The BEL node that is the focus of this analysis

  • key (Optional[str]) – The key in the node data dictionary representing the experimental data. Defaults to pybel_tools.constants.WEIGHT.

  • tag (Optional[str]) – The key for the nodes’ data dictionaries where the scores will be put. Defaults to ‘score’

  • default_score (Optional[float]) – The initial score for all nodes. This number can go up or down.

iter_leaves()[source]

Return an iterable over all nodes that are leaves.

A node is a leaf if either:

  • it doesn’t have any predecessors, OR

  • all of its predecessors have a score in their data dictionaries

Return type

Iterable[BaseEntity]

has_leaves()[source]

Return if the current graph has any leaves.

Implementation is not that smart currently, and does a full sweep.

Return type

List[BaseEntity]

in_out_ratio(node)[source]

Calculate the ratio of in-degree / out-degree of a node.

Return type

float

unscored_nodes_iter()[source]

Iterate over all nodes without a score.

Return type

BaseEntity

get_random_edge()[source]

This function should be run when there are no leaves, but there are still unscored nodes. It will introduce a probabilistic element to the algorithm, where some edges are disregarded randomly to eventually get a score for the network. This means that the score can be averaged over many runs for a given graph, and a better data structure will have to be later developed that doesn’t destroy the graph (instead, annotates which edges have been disregarded, later)

  1. get all un-scored

  2. rank by in-degree

  3. weighted probability over all in-edges where lower in-degree means higher probability

  4. pick randomly which edge

Returns

A random in-edge to the lowest in/out degree ratio node. This is a 3-tuple of (node, node, key)

Return type

tuple

remove_random_edge()[source]

Remove a random in-edge from the node with the lowest in/out degree ratio.

remove_random_edge_until_has_leaves()[source]

Remove random edges until there is at least one leaf node.

Return type

None

score_leaves()[source]

Calculate the score for all leaves.

Return type

Set[BaseEntity]

Returns

The set of leaf nodes that were scored

run()[source]

Calculate scores for all leaves until there are none, removes edges until there are, and repeats until all nodes have been scored.

Return type

None

run_with_graph_transformation()[source]

Calculate scores for all leaves until there are none, removes edges until there are, and repeats until all nodes have been scored. Also, yields the current graph at every step so you can make a cool animation of how the graph changes throughout the course of the algorithm

Return type

Iterable[BELGraph]

Returns

An iterable of BEL graphs

done_chomping()[source]

Determines if the algorithm is complete by checking if the target node of this analysis has been scored yet. Because the algorithm removes edges when it gets stuck until it is un-stuck, it is always guaranteed to finish.

Return type

bool

Returns

Is the algorithm done running?

get_final_score()[source]

Return the final score for the target node.

Return type

float

Returns

The final score for the target node

calculate_score(node)[source]

Calculate the new score of the given node.

Return type

float

get_remaining_graph()[source]

Allows for introspection on the algorithm at a given point by returning the sub-graph induced by all unscored nodes

Return type

BELGraph

Returns

The remaining un-scored BEL graph

pybel_tools.analysis.heat.workflow_aggregate(graph, node, key=None, tag=None, default_score=None, runs=None, aggregator=None)[source]

Get the average score over multiple runs.

This function is very simple, and can be copied to do more interesting statistics over the Runner instances. To iterate over the runners themselves, see workflow()

Parameters
  • graph (BELGraph) – A BEL graph

  • node (BaseEntity) – The BEL node that is the focus of this analysis

  • key (Optional[str]) – The key in the node data dictionary representing the experimental data. Defaults to pybel_tools.constants.WEIGHT.

  • tag (Optional[str]) – The key for the nodes’ data dictionaries where the scores will be put. Defaults to ‘score’

  • default_score (Optional[float]) – The initial score for all nodes. This number can go up or down.

  • runs (Optional[int]) – The number of times to run the heat diffusion workflow. Defaults to 100.

  • aggregator (Optional[Callable[[Iterable[float]], float]]) – A function that aggregates a list of scores. Defaults to numpy.average(). Could also use: numpy.mean(), numpy.median(), numpy.min(), numpy.max()

Return type

Optional[float]

Returns

The average score for the target node

pybel_tools.analysis.heat.workflow_all(graph, key=None, tag=None, default_score=None, runs=None)[source]

Run the heat diffusion workflow and get runners for every possible candidate mechanism

  1. Get all biological processes

  2. Get candidate mechanism induced two level back from each biological process

  3. Heat diffusion workflow for each candidate mechanism for multiple runs

  4. Return all runner results

Parameters
  • graph (BELGraph) – A BEL graph

  • key (Optional[str]) – The key in the node data dictionary representing the experimental data. Defaults to pybel_tools.constants.WEIGHT.

  • tag (Optional[str]) – The key for the nodes’ data dictionaries where the scores will be put. Defaults to ‘score’

  • default_score (Optional[float]) – The initial score for all nodes. This number can go up or down.

  • runs (Optional[int]) – The number of times to run the heat diffusion workflow. Defaults to 100.

Return type

Mapping[BaseEntity, List[Runner]]

Returns

A dictionary of {node: list of runners}

pybel_tools.analysis.heat.workflow_all_aggregate(graph, key=None, tag=None, default_score=None, runs=None, aggregator=None)[source]

Run the heat diffusion workflow to get average score for every possible candidate mechanism.

  1. Get all biological processes

  2. Get candidate mechanism induced two level back from each biological process

  3. Heat diffusion workflow on each candidate mechanism for multiple runs

  4. Report average scores for each candidate mechanism

Parameters
  • graph (BELGraph) – A BEL graph

  • key (Optional[str]) – The key in the node data dictionary representing the experimental data. Defaults to pybel_tools.constants.WEIGHT.

  • tag (Optional[str]) – The key for the nodes’ data dictionaries where the scores will be put. Defaults to ‘score’

  • default_score (Optional[float]) – The initial score for all nodes. This number can go up or down.

  • runs (Optional[int]) – The number of times to run the heat diffusion workflow. Defaults to 100.

  • aggregator (Optional[Callable[[Iterable[float]], float]]) – A function that aggregates a list of scores. Defaults to numpy.average(). Could also use: numpy.mean(), numpy.median(), numpy.min(), numpy.max()

Returns

A dictionary of {node: upstream causal subgraph}

pybel_tools.analysis.heat.calculate_average_score_by_annotation(graph, annotation, key=None, runs=None, use_tqdm=False)[source]

For each sub-graph induced over the edges matching the annotation, calculate the average score for all of the contained biological processes

Assumes you haven’t done anything yet

  1. Generates biological process upstream candidate mechanistic sub-graphs with generate_bioprocess_mechanisms()

  2. Calculates scores for each sub-graph with calculate_average_scores_on_sub-graphs()

  3. Overlays data with pbt.integration.overlay_data

  4. Calculates averages with pbt.selection.group_nodes.average_node_annotation

Parameters
  • graph (BELGraph) – A BEL graph

  • annotation (str) – A BEL annotation

  • key (Optional[str]) – The key in the node data dictionary representing the experimental data. Defaults to pybel_tools.constants.WEIGHT.

  • runs (Optional[int]) – The number of times to run the heat diffusion workflow. Defaults to 100.

  • use_tqdm (bool) – Should there be a progress bar for runners?

Return type

Mapping[str, float]

Returns

A dictionary from {str annotation value: tuple scores}

Example Usage:

>>> import pybel
>>> from pybel_tools.integration import overlay_data
>>> from pybel_tools.analysis.heat import calculate_average_score_by_annotation
>>> graph = pybel.from_path(...)
>>> scores = calculate_average_score_by_annotation(graph, 'subgraph')