Heat Diffusion Workflow¶
An variant of the Network Perturbation Amplitude algorithm
In this algorithm, heat is applied to the nodes based on the data set. For the differential gene expression experiment, the logfoldchange values are used instead of the corrected pvalues to allow for the effects of up and downregulation to be admitted in the analysis. Finally, heat diffusion is run with the constraint that decreases edges cause the sign of the heat to be flipped. Because of the construction of unbiased candidate mechanisms, all heat will flow towards their seed biological process nodes. The amount of heat on the biological process node after heat diffusion stops becomes the score for the whole candidate mechanism.
The issue of inconsistent causal networks addressed by the SST algorithm does not affect heat diffusion algorithms since it can quantify multiple conflicting pathways. However, it does not address the possibility of contradictory edges, for example, when A increases B and A decreases B are both true. A random sampling approach is used on networks with contradictory edges and aggregate statistics over multiple trials are used to assess the robustness of the scores as a function of the topology of the underlying candidate mechanisms.
Invariants¶
 Because heat always flows towards the biological process node, it is possible to remove leaf nodes (nodes with no incoming edges) after each step, since their heat will never change.
Future Work¶
This algorithm can be tuned to allow the use of correlative relationships. Because many multiscale and multimodal data are often measured with correlations to molecular features, this enables experiments to be run using SNP or brain imaging features, whose experiments often measure their correlation with the activity of gene products.

pybel_tools.analysis.cmpa.
RESULT_LABELS
= ['avg', 'stddev', 'normality', 'median', 'neighbors', 'subgraph_size']¶ The columns in the score tuples

class
pybel_tools.analysis.cmpa.
Runner
(graph, target_node, key, tag=None, default_score=None)[source]¶ The NpaRunner class houses the data related to a single run of the CMPA analysis
Initializes the CMPA runner class
Parameters:  graph (pybel.BELGraph) – A BEL graph
 target_node (tuple) – The BEL node that is the focus of this analysis
 key (str) – The key for the nodes’ data dictionaries that points to their original experimental measurements
 tag (str) – The key for the nodes’ data dictionaries where the CMPA scores will be put. Defaults to ‘score’
 default_score (float) – The initial CMPA score for all nodes. This number can go up or down.

iter_leaves
()[source]¶ Returns an iterable over all nodes that are leaves. A node is a leaf if either:
 it doesn’t have any predecessors, OR
 all of its predecessors have CMPA score in their data dictionaries
Returns: An iterable over all leaf nodes Return type: iter

has_leaves
()[source]¶ Returns if the current graph has any leaves.
Implementation is not that smart currently, and does a full sweep.
Returns: Does the current graph have any leaves? Return type: bool

in_out_ratio
(node)[source]¶ Calculates the ratio of indegree / outdegree of a node
Parameters: node (tuple) – A BEL node Returns: The indegree / outdegree ratio for the given node Return type: float

get_random_edge
()[source]¶ This function should be run when there are no leaves, but there are still unscored nodes. It will introduce a probabilistic element to the algorithm, where some edges are disregarded randomly to eventually get a score for the network. This means that the CMPA score can be averaged over many runs for a given graph, and a better data structure will have to be later developed that doesn’t destroy the graph (instead, annotates which edges have been disregarded, later)
 get all unscored
 rank by indegree
 weighted probability over all inedges where lower indegree means higher probability
 pick randomly which edge
Returns: A random inedge to the lowest in/out degree ratio node. This is a 3tuple of (node, node, key) Return type: tuple

remove_random_edge
()[source]¶ Removes a random inedge from the node with the lowest in/out degree ratio

remove_random_edge_until_has_leaves
()[source]¶ Removes random edges until there is at least one leaf node

score_leaves
()[source]¶ Calculates the CMPA score for all leaves
Returns: The set of leaf nodes that were scored Return type: set

run
()[source]¶ Calculates CMPA scores for all leaves until there are none, removes edges until there are, and repeats until all nodes have been scored

run_with_graph_transformation
()[source]¶ Calculates CMPA scores for all leaves until there are none, removes edges until there are, and repeats until all nodes have been scored. Also, yields the current graph at every step so you can make a cool animation of how the graph changes throughout the course of the algorithm
Returns: An iterable of BEL graphs Return type: iter

done_chomping
()[source]¶ Determines if the algorithm is complete by checking if the target node of this analysis has been scored yet. Because the algorithm removes edges when it gets stuck until it is unstuck, it is always guaranteed to finish.
Returns: Is the algorithm done running? Return type: bool

get_final_score
()[source]¶ Returns the final score for the target node
Returns: The final score for the target node Return type: float

calculate_score
(node)[source]¶ Calculates the score of the given node
Parameters: node (tuple) – A node in the BEL graph Returns: The new score of the node Return type: float

get_remaining_graph
()[source]¶ Allows for introspection on the algorithm at a given point by returning the subgraph induced by all unscored nodes
Returns: The remaining unscored BEL graph Return type: pybel.BELGraph

pybel_tools.analysis.cmpa.
multirun
(graph, node, key, tag=None, default_score=None, runs=None)[source]¶ Runs CMPA multiple times and yields the NpaRunner object after each run has been completed
Parameters:  graph (pybel.BELGraph) – A BEL graph
 node (tuple) – The BEL node that is the focus of this analysis
 key (str) – The key for the nodes’ data dictionaries that points to their original experimental measurements
 tag (str) – The key for the nodes’ data dictionaries where the CMPA scores will be put. Defaults to ‘score’
 default_score (float) – The initial CMPA score for all nodes. This number can go up or down.
 runs (int) – The number of times to run the CMPA algorithm. Defaults to 1000.
Returns: An iterable over the runners after each iteration
Return type: iter[NpaRunner]

pybel_tools.analysis.cmpa.
workflow
(graph, node, key, tag=None, default_score=None, runs=None)[source]¶ Generates candidate mechanism and runs CMPA.
Parameters:  graph (pybel.BELGraph) – A BEL graph
 node (tuple) – The BEL node that is the focus of this analysis
 key (str) – The key in the node data dictionary representing the experimental data
 tag (str) – The key for the nodes’ data dictionaries where the CMPA scores will be put. Defaults to ‘score’
 default_score (float) – The initial CMPA score for all nodes. This number can go up or down.
 runs (int) – The number of times to run the CMPA algorithm. Defaults to 1000.
Returns: A list of runners
Return type:

pybel_tools.analysis.cmpa.
workflow_average
(graph, node, key, tag=None, default_score=None, runs=None)[source]¶ Gets the average CMPA score over multiple runs.
This function is very simple, and can be copied to do more interesting statistics over the
NpaRunner
instances. To iterate over the runners themselves, seeworkflow()
Parameters:  graph (pybel.BELGraph) – A BEL graph
 node (tuple) – The BEL node that is the focus of this analysis
 key (str) – The key for the nodes’ data dictionaries that points to their original experimental measurements
 tag (str) – The key for the nodes’ data dictionaries where the CMPA scores will be put. Defaults to ‘score’
 default_score (float) – The initial CMPA score for all nodes. This number can go up or down.
 runs (int) – The number of times to run the CMPA algorithm. Defaults to 1000.
Returns: The average score for the target node
Return type:

pybel_tools.analysis.cmpa.
workflow_all
(graph, key, tag=None, default_score=None, runs=None)[source]¶ Runs CMPA and get runners for every possible candidate mechanism
 Get all biological processes
 Get candidate mechanism induced two level back from each biological process
 CMPA on each candidate mechanism for multiple runs
 Return all runner results
Parameters:  graph (pybel.BELGraph) – A BEL graph
 key (str) – The key in the node data dictionary representing the experimental data
 tag (str) – The key for the nodes’ data dictionaries where the CMPA scores will be put. Defaults to ‘score’
 default_score (float) – The initial CMPA score for all nodes. This number can go up or down.
 runs (int) – The number of times to run the CMPA algorithm. Defaults to 1000.
Returns: A dictionary of {node: list of runners}
Return type:

pybel_tools.analysis.cmpa.
workflow_all_average
(graph, key, tag=None, default_score=None, runs=None)[source]¶ Runs CMPA to get average score for every possible candidate mechanism
 Get all biological processes
 Get candidate mechanism induced two level back from each biological process
 CMPA on each candidate mechanism for multiple runs
 Report average CMPA scores for each candidate mechanism
Parameters:  graph (pybel.BELGraph) – A BEL graph
 key (str) – The key in the node data dictionary representing the experimental data
 tag (str) – The key for the nodes’ data dictionaries where the CMPA scores will be put. Defaults to ‘score’
 default_score (float) – The initial CMPA score for all nodes. This number can go up or down.
 runs (int) – The number of times to run the CMPA algorithm. Defaults to 1000.
Returns: A dictionary of {node: upstream causal subgraph}
Return type:

pybel_tools.analysis.cmpa.
calculate_average_scores_on_subgraphs
(candidate_mechanisms, key, tag=None, default_score=None, runs=None)[source]¶ Calculates the scores over precomputed candidate mechanisms
Parameters:  candidate_mechanisms (dict[tuple, pybel.BELGraph]) – A dictionary of {tuple node: pybel.BELGraph candidate mechanism}
 key (str) – The key in the node data dictionary representing the experimental data
 tag (str) – The key for the nodes’ data dictionaries where the CMPA scores will be put. Defaults to ‘score’
 default_score (float) – The initial CMPA score for all nodes. This number can go up or down.
 runs (int) – The number of times to run the CMPA algorithm. Defaults to 1000.
Returns: A dictionary of {pybel node tuple: results tuple}
Return type: Example Usage:
>>> import pandas as pd >>> import pybel_tools as pbt >>> from pybel_tools.analysis.cmpa import * >>> # load graph and data >>> key = ... >>> graph = ... >>> candidate_mechanisms = pbt.generation.generate_bioprocess_mechanisms(graph, key) >>> scores = calculate_average_scores_on_subgraphs(candidate_mechanisms, key) >>> pd.DataFrame.from_items(scores.items(), orient='index', columns=RESULT_LABELS)

pybel_tools.analysis.cmpa.
calculate_average_score_by_annotation
(graph, key, annotation, runs=None)[source]¶ For each subgraph induced over the edges matching the annotation, calculate the average CMPA score for all of the contained biological processes
Assumes you haven’t done anything yet
 Generates biological process upstream candidate mechanistic subgraphs with
generate_bioprocess_mechanisms()
 Calculates scores for each subgraph with
calculate_average_scores_on_subgraphs()
 Overlays data with pbt.integration.overlay_data
 Calculates averages with pbt.selection.group_nodes.average_node_annotation
Parameters:  graph (pybel.BELGraph) – A BEL graph
 key (str) – The key in the node data dictionary representing the experimental data
 annotation (str) – A BEL annotation
 runs (int) – The number of times to run the CMPA algorithm. Defaults to 1000.
Returns: A dictionary from {str annotation value: tuple scores}
Return type: Example Usage:
>>> import pybel >>> from pybel_tools.integration import overlay_data >>> from pybel_tools.analysis.cmpa import calculate_average_score_by_annotation >>> graph = pybel.from_path(...) >>> key = ... >>> >>> scores = calculate_average_score_by_annotation(graph, key, 'Subgraph')
 Generates biological process upstream candidate mechanistic subgraphs with