# Heat Diffusion Workflow¶

An variant of the Network Perturbation Amplitude algorithm

In this algorithm, heat is applied to the nodes based on the data set. For the differential gene expression experiment, the log-fold-change values are used instead of the corrected p-values to allow for the effects of up- and down-regulation to be admitted in the analysis. Finally, heat diffusion is run with the constraint that decreases edges cause the sign of the heat to be flipped. Because of the construction of unbiased candidate mechanisms, all heat will flow towards their seed biological process nodes. The amount of heat on the biological process node after heat diffusion stops becomes the score for the whole candidate mechanism.

The issue of inconsistent causal networks addressed by the SST algorithm does not affect heat diffusion algorithms since it can quantify multiple conflicting pathways. However, it does not address the possibility of contradictory edges, for example, when A increases B and A decreases B are both true. A random sampling approach is used on networks with contradictory edges and aggregate statistics over multiple trials are used to assess the robustness of the scores as a function of the topology of the underlying candidate mechanisms.

## Invariants¶

• Because heat always flows towards the biological process node, it is possible to remove leaf nodes (nodes with no incoming edges) after each step, since their heat will never change.

## Future Work¶

This algorithm can be tuned to allow the use of correlative relationships. Because many multi-scale and multi-modal data are often measured with correlations to molecular features, this enables experiments to be run using SNP or brain imaging features, whose experiments often measure their correlation with the activity of gene products.

pybel_tools.analysis.cmpa.RESULT_LABELS = ['avg', 'stddev', 'normality', 'median', 'neighbors', 'subgraph_size']

The columns in the score tuples

class pybel_tools.analysis.cmpa.Runner(graph, target_node, key, tag=None, default_score=None)[source]

The NpaRunner class houses the data related to a single run of the CMPA analysis

Initializes the CMPA runner class

Parameters: graph (pybel.BELGraph) – A BEL graph target_node (tuple) – The BEL node that is the focus of this analysis key (str) – The key for the nodes’ data dictionaries that points to their original experimental measurements tag (str) – The key for the nodes’ data dictionaries where the CMPA scores will be put. Defaults to ‘score’ default_score (float) – The initial CMPA score for all nodes. This number can go up or down.
iter_leaves()[source]

Returns an iterable over all nodes that are leaves. A node is a leaf if either:

• it doesn’t have any predecessors, OR
• all of its predecessors have CMPA score in their data dictionaries
Returns: An iterable over all leaf nodes iter
has_leaves()[source]

Returns if the current graph has any leaves.

Implementation is not that smart currently, and does a full sweep.

Returns: Does the current graph have any leaves? bool
in_out_ratio(node)[source]

Calculates the ratio of in-degree / out-degree of a node

Parameters: node (tuple) – A BEL node The in-degree / out-degree ratio for the given node float
unscored_nodes_iter()[source]

Iterates over all nodes without a CMPA score

get_random_edge()[source]

This function should be run when there are no leaves, but there are still unscored nodes. It will introduce a probabilistic element to the algorithm, where some edges are disregarded randomly to eventually get a score for the network. This means that the CMPA score can be averaged over many runs for a given graph, and a better data structure will have to be later developed that doesn’t destroy the graph (instead, annotates which edges have been disregarded, later)

1. get all unscored
2. rank by in-degree
3. weighted probability over all in-edges where lower in-degree means higher probability
4. pick randomly which edge
Returns: A random in-edge to the lowest in/out degree ratio node. This is a 3-tuple of (node, node, key) tuple
remove_random_edge()[source]

Removes a random in-edge from the node with the lowest in/out degree ratio

remove_random_edge_until_has_leaves()[source]

Removes random edges until there is at least one leaf node

score_leaves()[source]

Calculates the CMPA score for all leaves

Returns: The set of leaf nodes that were scored set
run()[source]

Calculates CMPA scores for all leaves until there are none, removes edges until there are, and repeats until all nodes have been scored

run_with_graph_transformation()[source]

Calculates CMPA scores for all leaves until there are none, removes edges until there are, and repeats until all nodes have been scored. Also, yields the current graph at every step so you can make a cool animation of how the graph changes throughout the course of the algorithm

Returns: An iterable of BEL graphs iter
done_chomping()[source]

Determines if the algorithm is complete by checking if the target node of this analysis has been scored yet. Because the algorithm removes edges when it gets stuck until it is un-stuck, it is always guaranteed to finish.

Returns: Is the algorithm done running? bool
get_final_score()[source]

Returns the final score for the target node

Returns: The final score for the target node float
calculate_score(node)[source]

Calculates the score of the given node

Parameters: node (tuple) – A node in the BEL graph The new score of the node float
get_remaining_graph()[source]

Allows for introspection on the algorithm at a given point by returning the subgraph induced by all unscored nodes

Returns: The remaining unscored BEL graph pybel.BELGraph
pybel_tools.analysis.cmpa.multirun(graph, node, key, tag=None, default_score=None, runs=None)[source]

Runs CMPA multiple times and yields the NpaRunner object after each run has been completed

Parameters: graph (pybel.BELGraph) – A BEL graph node (tuple) – The BEL node that is the focus of this analysis key (str) – The key for the nodes’ data dictionaries that points to their original experimental measurements tag (str) – The key for the nodes’ data dictionaries where the CMPA scores will be put. Defaults to ‘score’ default_score (float) – The initial CMPA score for all nodes. This number can go up or down. runs (int) – The number of times to run the CMPA algorithm. Defaults to 1000. An iterable over the runners after each iteration iter[NpaRunner]
pybel_tools.analysis.cmpa.workflow(graph, node, key, tag=None, default_score=None, runs=None)[source]

Generates candidate mechanism and runs CMPA.

Parameters: graph (pybel.BELGraph) – A BEL graph node (tuple) – The BEL node that is the focus of this analysis key (str) – The key in the node data dictionary representing the experimental data tag (str) – The key for the nodes’ data dictionaries where the CMPA scores will be put. Defaults to ‘score’ default_score (float) – The initial CMPA score for all nodes. This number can go up or down. runs (int) – The number of times to run the CMPA algorithm. Defaults to 1000. A list of runners
pybel_tools.analysis.cmpa.workflow_average(graph, node, key, tag=None, default_score=None, runs=None)[source]

Gets the average CMPA score over multiple runs.

This function is very simple, and can be copied to do more interesting statistics over the NpaRunner instances. To iterate over the runners themselves, see workflow()

Parameters: graph (pybel.BELGraph) – A BEL graph node (tuple) – The BEL node that is the focus of this analysis key (str) – The key for the nodes’ data dictionaries that points to their original experimental measurements tag (str) – The key for the nodes’ data dictionaries where the CMPA scores will be put. Defaults to ‘score’ default_score (float) – The initial CMPA score for all nodes. This number can go up or down. runs (int) – The number of times to run the CMPA algorithm. Defaults to 1000. The average score for the target node float
pybel_tools.analysis.cmpa.workflow_all(graph, key, tag=None, default_score=None, runs=None)[source]

Runs CMPA and get runners for every possible candidate mechanism

1. Get all biological processes
2. Get candidate mechanism induced two level back from each biological process
3. CMPA on each candidate mechanism for multiple runs
4. Return all runner results
Parameters: graph (pybel.BELGraph) – A BEL graph key (str) – The key in the node data dictionary representing the experimental data tag (str) – The key for the nodes’ data dictionaries where the CMPA scores will be put. Defaults to ‘score’ default_score (float) – The initial CMPA score for all nodes. This number can go up or down. runs (int) – The number of times to run the CMPA algorithm. Defaults to 1000. A dictionary of {node: list of runners} dict
pybel_tools.analysis.cmpa.workflow_all_average(graph, key, tag=None, default_score=None, runs=None)[source]

Runs CMPA to get average score for every possible candidate mechanism

1. Get all biological processes
2. Get candidate mechanism induced two level back from each biological process
3. CMPA on each candidate mechanism for multiple runs
4. Report average CMPA scores for each candidate mechanism
Parameters: graph (pybel.BELGraph) – A BEL graph key (str) – The key in the node data dictionary representing the experimental data tag (str) – The key for the nodes’ data dictionaries where the CMPA scores will be put. Defaults to ‘score’ default_score (float) – The initial CMPA score for all nodes. This number can go up or down. runs (int) – The number of times to run the CMPA algorithm. Defaults to 1000. A dictionary of {node: upstream causal subgraph} dict
pybel_tools.analysis.cmpa.calculate_average_scores_on_subgraphs(candidate_mechanisms, key, tag=None, default_score=None, runs=None)[source]

Calculates the scores over precomputed candidate mechanisms

Parameters: candidate_mechanisms (dict[tuple, pybel.BELGraph]) – A dictionary of {tuple node: pybel.BELGraph candidate mechanism} key (str) – The key in the node data dictionary representing the experimental data tag (str) – The key for the nodes’ data dictionaries where the CMPA scores will be put. Defaults to ‘score’ default_score (float) – The initial CMPA score for all nodes. This number can go up or down. runs (int) – The number of times to run the CMPA algorithm. Defaults to 1000. A dictionary of {pybel node tuple: results tuple}

Example Usage:

>>> import pandas as pd
>>> import pybel_tools as pbt
>>> from pybel_tools.analysis.cmpa import *
>>> # load graph and data
>>> key = ...
>>> graph = ...
>>> candidate_mechanisms = pbt.generation.generate_bioprocess_mechanisms(graph, key)
>>> scores = calculate_average_scores_on_subgraphs(candidate_mechanisms, key)
>>> pd.DataFrame.from_items(scores.items(), orient='index', columns=RESULT_LABELS)

pybel_tools.analysis.cmpa.calculate_average_score_by_annotation(graph, key, annotation, runs=None)[source]

For each subgraph induced over the edges matching the annotation, calculate the average CMPA score for all of the contained biological processes

Assumes you haven’t done anything yet

1. Generates biological process upstream candidate mechanistic subgraphs with generate_bioprocess_mechanisms()
2. Calculates scores for each subgraph with calculate_average_scores_on_subgraphs()
3. Overlays data with pbt.integration.overlay_data
4. Calculates averages with pbt.selection.group_nodes.average_node_annotation
Parameters: graph (pybel.BELGraph) – A BEL graph key (str) – The key in the node data dictionary representing the experimental data annotation (str) – A BEL annotation runs (int) – The number of times to run the CMPA algorithm. Defaults to 1000. A dictionary from {str annotation value: tuple scores} dict[str,tuple]

Example Usage:

>>> import pybel
>>> from pybel_tools.integration import overlay_data
>>> from pybel_tools.analysis.cmpa import calculate_average_score_by_annotation
>>> graph = pybel.from_path(...)
>>> key = ...
>>>
>>> scores = calculate_average_score_by_annotation(graph, key, 'Subgraph')