Selection
This module contains functions to help select data from networks.

pybel_tools.selection.group_nodes_by_annotation(graph, annotation='Subgraph')
Groups the nodes occurring in edges by the given annotation.
Parameters:  graph (pybel.BELGraph) – A BEL graph
 annotation (str) – An annotation to use to group edges
Returns: dict of sets of BELGraph nodes
Return type:
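As a rough illustration of this grouping, the sketch below uses plain tuples in place of a BELGraph. The edge layout `(source, target, annotations)` and the helper name `group_nodes_sketch` are assumptions made for illustration only; this is not the library's implementation.

```python
from collections import defaultdict

# Hypothetical stand-in for BEL edges: (source, target, annotations)
edges = [
    ("A", "B", {"Subgraph": "Apoptosis"}),
    ("B", "C", {"Subgraph": "Inflammation"}),
    ("C", "D", {}),  # edges lacking the annotation are skipped
]

def group_nodes_sketch(edges, annotation="Subgraph"):
    """Map each annotation value to the set of nodes on edges carrying it."""
    groups = defaultdict(set)
    for source, target, annotations in edges:
        if annotation not in annotations:
            continue
        groups[annotations[annotation]].update((source, target))
    return dict(groups)

groups = group_nodes_sketch(edges)
```

Note that a node such as "B" can land in more than one group when its edges carry different annotation values.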

pybel_tools.selection.average_node_annotation(graph, key, annotation='Subgraph', aggregator=None)
Groups the graph into subgraphs and assigns each subgraph a score based on the average of all node values for the given node key.
Parameters:  graph (pybel.BELGraph) – A BEL graph
 key (str) – The key in the node data dictionary representing the experimental data
 annotation (str) – A BEL annotation to use to group nodes
 aggregator (lambda) – A function from list of values -> aggregate value. Defaults to taking the average of a list of floats.
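The role of the aggregator can be sketched as follows. The dictionaries, the helper name `score_groups_sketch`, and the choice to skip nodes that lack the key are all assumptions for illustration, not the library's behavior.

```python
from statistics import mean

# Hypothetical inputs: nodes grouped by annotation value, and per-node data
groups = {"Apoptosis": {"A", "B"}, "Inflammation": {"B", "C"}}
node_data = {"A": {"weight": 1.0}, "B": {"weight": 3.0}, "C": {}}

def score_groups_sketch(groups, node_data, key, aggregator=None):
    """Aggregate the values stored under `key` for the nodes in each group."""
    aggregator = aggregator or mean  # default: average of the values
    scores = {}
    for name, nodes in groups.items():
        values = [node_data[n][key] for n in nodes if key in node_data[n]]
        if values:  # assumption: groups without any value are left out
            scores[name] = aggregator(values)
    return scores

averages = score_groups_sketch(groups, node_data, "weight")
maxima = score_groups_sketch(groups, node_data, "weight", aggregator=max)
```

Any callable that maps a list of values to a single value, such as `max` or `statistics.median`, fits the aggregator slot in this sketch.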

pybel_tools.selection.group_nodes_by_annotation_filtered(graph, node_filters=None, annotation='Subgraph')
Groups the nodes occurring in edges by the given annotation, with a node filter applied.
Parameters:  graph (pybel.BELGraph) – A BEL graph
 node_filters (types.FunctionType or iter[types.FunctionType]) – A predicate or list of predicates (graph, node) -> bool
 annotation – The annotation to use for grouping
Returns: A dictionary of {annotation value: set of nodes}
Return type:

pybel_tools.selection.get_subgraph_by_induction(graph, nodes)
Induces a graph over the given nodes. Returns None if none of the nodes are in the given graph.
Parameters:  graph (pybel.BELGraph) – A BEL graph
 nodes (iter[tuple]) – A list of BEL nodes in the graph
Return type: Optional[pybel.BELGraph]
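The induction and the None result can be pictured with a small sketch over an edge list. The edge-list representation and the helper name `induce_sketch` are hypothetical stand-ins rather than the BELGraph API.

```python
# Hypothetical directed graph as a list of (source, target) edges
edges = [("A", "B"), ("B", "C"), ("C", "D")]

def induce_sketch(edges, nodes):
    """Keep edges whose endpoints are both requested; None if no node is present."""
    present = {n for edge in edges for n in edge} & set(nodes)
    if not present:
        return None
    return [(u, v) for u, v in edges if u in present and v in present]

induced = induce_sketch(edges, ["A", "B", "D"])
missing = induce_sketch(edges, ["Z"])
```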

pybel_tools.selection.get_subgraph_by_edge_filter(graph, edge_filters)
Induces a subgraph on all edges that pass the given filters.
Parameters:  graph (pybel.BELGraph) – A BEL graph
 edge_filters ((pybel.BELGraph, tuple, tuple, int) -> bool or list[(pybel.BELGraph, tuple, tuple, int) -> bool]) – A predicate or list of predicates (graph, node, node, key, data) -> bool
Returns: A BEL subgraph induced over the edges passing the given filters
Return type:

pybel_tools.selection.get_subgraph_by_node_filter(graph, node_filters)
Induces a graph on the nodes that pass all filters.
Parameters:  graph (pybel.BELGraph) – A BEL graph
 node_filters (types.FunctionType or iter[types.FunctionType]) – A node filter or list/tuple of node filters
Returns: A subgraph induced over the nodes passing the given filters
Return type:
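A node filter is a predicate taking the graph and a node. The sketch below shows how a single predicate or several might be combined so that a node must pass all of them. The dict-based graph, the two predicates, and `filter_nodes_sketch` are hypothetical names for illustration.

```python
# Hypothetical stand-in: a "graph" is a dict of node -> data dictionary
graph = {
    ("Protein", "HGNC", "CD33"): {"degree": 3},
    ("Gene", "HGNC", "CD33"): {"degree": 1},
    ("Protein", "HGNC", "APP"): {"degree": 0},
}

def is_protein(graph, node):
    return node[0] == "Protein"

def is_connected(graph, node):
    return graph[node]["degree"] > 0

def filter_nodes_sketch(graph, node_filters):
    """Yield nodes passing every predicate; accepts one predicate or several."""
    if callable(node_filters):
        node_filters = [node_filters]
    for node in graph:
        if all(f(graph, node) for f in node_filters):
            yield node

kept = set(filter_nodes_sketch(graph, [is_protein, is_connected]))
proteins = set(filter_nodes_sketch(graph, is_protein))
```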

pybel_tools.selection.get_subgraph_by_neighborhood(graph, nodes)
Gets a BEL graph around the neighborhoods of the given nodes. Returns None if none of the nodes are in the graph.
Parameters:  graph (pybel.BELGraph) – A BEL graph
 nodes (iter[tuple]) – An iterable of BEL nodes
Returns: A BEL graph induced around the neighborhoods of the given nodes
Return type: Optional[pybel.BELGraph]

pybel_tools.selection.get_subgraph_by_second_neighbors(graph, nodes, filter_pathologies=False)
Gets a BEL graph around the neighborhoods of the given nodes, and expands to the neighborhood of those nodes.
Parameters:  graph (pybel.BELGraph) – A BEL graph
 nodes (iter[tuple]) – An iterable of BEL nodes
 filter_pathologies (bool) – Should expansion take place around pathologies?
Returns: A BEL graph induced around the neighborhoods of the given nodes
Return type: Optional[pybel.BELGraph]
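The difference between a first and a second neighborhood can be sketched over a plain edge list. Both helper names are hypothetical, and pathology filtering is not modeled here.

```python
# Hypothetical directed graph as a list of (source, target) edges
edges = [("A", "B"), ("B", "C"), ("C", "D"), ("E", "A")]

def neighborhood_edges(edges, nodes):
    """Edges with at least one endpoint in `nodes`."""
    nodes = set(nodes)
    return [(u, v) for u, v in edges if u in nodes or v in nodes]

def second_neighborhood_edges(edges, nodes):
    """Expand once more around every node reached in the first step."""
    first = neighborhood_edges(edges, nodes)
    reached = {n for edge in first for n in edge}
    return neighborhood_edges(edges, reached)

first = neighborhood_edges(edges, ["A"])
second = second_neighborhood_edges(edges, ["A"])
```

In this toy graph, the second expansion picks up the edge from "B" to "C" but still excludes the edge from "C" to "D".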

pybel_tools.selection.get_subgraph_by_all_shortest_paths(graph, nodes, weight=None, remove_pathologies=True)
Induces a subgraph over the nodes in the pairwise shortest paths between all of the nodes in the given list.
Parameters:  graph (pybel.BELGraph) – A BEL graph
 nodes (set[tuple]) – A set of nodes over which to calculate shortest paths
 weight (str) – Edge data key corresponding to the edge weight. If None, performs unweighted search
 remove_pathologies (bool) – Should the pathology nodes be deleted before getting shortest paths?
Returns: A BEL graph induced over the nodes appearing in the shortest paths between the given nodes
Return type: Optional[pybel.BELGraph]

pybel_tools.selection.get_subgraph_by_annotation_value(graph, annotation, value)
Builds a new subgraph induced over all edges whose annotations match the given key and value.
Parameters:  graph (pybel.BELGraph) – A BEL graph
 annotation (str) – The annotation to group by
 value (str) – The value for the annotation
Returns: A subgraph of the original BEL graph
Return type:

pybel_tools.selection.get_subgraph_by_annotations(graph, annotations, or_=None)
Returns the subgraph given an annotations filter.
Parameters:  graph (pybel.BELGraph) – A BEL graph
 annotations (dict[str,set[str]]) – Annotation filters (match all with pybel.utils.subdict_matches())
 or_ (bool) – If True, any annotation should be present. If False, all annotations should be present in the edge. Defaults to True.
Returns: A subgraph of the original BEL graph
Return type:
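The any/all semantics of the or_ flag can be sketched as below. The flat annotation dictionary and the helper `matches_sketch` are assumptions for illustration; the library's own matching goes through pybel.utils.subdict_matches() and may differ in detail.

```python
# Hypothetical edge annotations and a filter mapping each key to allowed values
edge_annotations = {"Species": "9606", "Subgraph": "Apoptosis"}
query = {"Species": {"9606"}, "Subgraph": {"Inflammation"}}

def matches_sketch(edge_annotations, query, or_=True):
    """Check an edge's annotations against the query with any/all semantics."""
    hits = (
        edge_annotations.get(key) in values
        for key, values in query.items()
    )
    return any(hits) if or_ else all(hits)

any_match = matches_sketch(edge_annotations, query, or_=True)
all_match = matches_sketch(edge_annotations, query, or_=False)
```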

pybel_tools.selection.get_subgraph_by_data(graph, annotations)
Returns the subgraph filtering for Citation, Evidence or Annotation in the edges.
Parameters:  graph (pybel.BELGraph) – A BEL graph
 annotations (dict) – Annotation filters (match all with pybel.utils.subdict_matches())
Returns: A subgraph of the original BEL graph
Return type:

pybel_tools.selection.get_subgraph_by_pubmed(graph, pubmed_identifiers)
Induces a subgraph over the edges retrieved from the given PubMed identifier(s).
Parameters:  graph (pybel.BELGraph) – A BEL graph
 pubmed_identifiers (str or list[str]) – A PubMed identifier or list of PubMed identifiers
Return type:

Induces a subgraph over the edges retrieved from publications by the given author(s).
Parameters:  graph (pybel.BELGraph) – A BEL graph
 authors (str or list[str]) – An author or list of authors
Return type:

pybel_tools.selection.get_subgraph_by_node_search(graph, query)
Gets a subgraph induced over all nodes matching the query string.
Parameters:  graph (pybel.BELGraph) – A BEL graph
 query (str or iter[str]) – A query string or iterable of query strings for node names
Returns: A subgraph induced over the original BEL graph
Return type:
Thinly wraps search_node_names() and get_subgraph_by_induction().

pybel_tools.selection.get_causal_subgraph(graph)
Builds a new subgraph induced over all edges that are causal.
Parameters:  graph (pybel.BELGraph) – A BEL graph
Returns: A subgraph of the original BEL graph
Return type: pybel.BELGraph

pybel_tools.selection.get_subgraph(graph, seed_method=None, seed_data=None, expand_nodes=None, remove_nodes=None)
Runs pipeline query on graph with multiple subgraph filters and expanders.
Order of Operations:
 1. Seeding by given function name and data
 2. Add nodes
 3. Remove nodes
Parameters:  graph (pybel.BELGraph) – A BEL graph
 seed_method (str) – The name of the get_subgraph_by_* function to use
 seed_data – The argument to pass to the get_subgraph function
 expand_nodes (list[tuple]) – Add the neighborhoods around all of these nodes
 remove_nodes (list[tuple]) – Remove these nodes and all of their in/out edges
Return type: Optional[pybel.BELGraph]
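The order of operations (seed, then add, then remove) can be sketched over a plain edge list. The sketch seeds by neighborhood only; the helper names and the edge-set representation are hypothetical, and the real function dispatches on seed_method.

```python
# Hypothetical directed graph as a list of (source, target) edges
edges = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "E")]

def neighborhood(edges, nodes):
    """Edges with at least one endpoint in `nodes`."""
    nodes = set(nodes)
    return {(u, v) for u, v in edges if u in nodes or v in nodes}

def pipeline_sketch(edges, seed_nodes, expand_nodes=(), remove_nodes=()):
    # 1. Seed: here by neighborhood, one of several possible seed methods
    result = neighborhood(edges, seed_nodes)
    # 2. Add the neighborhoods around the expansion nodes
    result |= neighborhood(edges, expand_nodes)
    # 3. Remove nodes together with their in/out edges
    removed = set(remove_nodes)
    return {(u, v) for u, v in result if u not in removed and v not in removed}

result = pipeline_sketch(edges, ["A"], expand_nodes=["D"], remove_nodes=["E"])
```

Because removal runs last, a node listed in both expand_nodes and remove_nodes would not survive in this sketch.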

pybel_tools.selection.get_multi_causal_upstream(graph, nbunch)
Gets the union of all the 2-level deep causal upstream subgraphs from the nbunch.
Parameters:  graph (pybel.BELGraph) – A BEL graph
 nbunch (tuple or list[tuple]) – A BEL node or list of BEL nodes
Returns: A subgraph of the original BEL graph
Return type:

pybel_tools.selection.get_multi_causal_downstream(graph, nbunch)
Gets the union of all of the 2-level deep causal downstream subgraphs from the nbunch.
Parameters:  graph (pybel.BELGraph) – A BEL graph
 nbunch (tuple or list[tuple]) – A BEL node or list of BEL nodes
Returns: A subgraph of the original BEL graph
Return type:

pybel_tools.selection.get_random_subgraph(graph, number_edges=None, number_seed_edges=None, seed=None)
Randomly picks a node from the graph, and performs a weighted random walk to sample the given number of edges around it.
Parameters:  graph (pybel.BELGraph) –
 number_edges (Optional[int]) – Maximum number of edges. Defaults to pybel_tools.constants.SAMPLE_RANDOM_EDGE_COUNT (250).
 number_seed_edges (Optional[int]) – Number of nodes to start with (which likely results in different components in large graphs). Defaults to SAMPLE_RANDOM_EDGE_SEED_COUNT.
 seed (Optional[int]) – A seed for the random state
Return type:
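To illustrate how a seed makes such sampling reproducible, here is a simplified sketch. It walks uniformly rather than with weights, starts from a single node, and uses a hypothetical helper name, so it only approximates the behavior described above.

```python
import random

# Hypothetical directed graph as a list of (source, target) edges
edges = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "A"), ("B", "D")]

def random_walk_edges_sketch(edges, number_edges, seed=None):
    """Collect up to `number_edges` distinct edges by walking from a random node."""
    rng = random.Random(seed)  # private generator, so a given seed is reproducible
    out = {}
    for u, v in edges:
        out.setdefault(u, []).append((u, v))
    current = rng.choice(sorted(out))
    sampled = []
    for _ in range(10 * number_edges):  # cap the steps so the walk terminates
        if len(sampled) >= number_edges or current not in out:
            break
        edge = rng.choice(out[current])
        if edge not in sampled:
            sampled.append(edge)
        current = edge[1]
    return sampled

sample = random_walk_edges_sketch(edges, number_edges=3, seed=0)
```

Calling the helper twice with the same seed returns the same edges, which is the property the seed parameter is meant to provide.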

pybel_tools.selection.get_upstream_leaves(graph)
Gets all leaves of the graph (with no incoming edges and only one outgoing edge).
See also upstream_leaf_predicate()
Parameters:  graph (pybel.BELGraph) – A BEL graph
Returns: An iterator over nodes that are upstream leaves
Return type: iter[tuple]
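The leaf condition (no incoming edges and exactly one outgoing edge) can be sketched with degree counts over an edge list. The representation and the helper name are hypothetical.

```python
from collections import Counter

# Hypothetical directed graph as a list of (source, target) edges
edges = [("A", "B"), ("B", "C"), ("D", "C"), ("D", "B")]

def upstream_leaves_sketch(edges):
    """Yield nodes with in-degree 0 and out-degree 1."""
    in_degree = Counter(v for _, v in edges)
    out_degree = Counter(u for u, _ in edges)
    for node in sorted(out_degree):
        if in_degree[node] == 0 and out_degree[node] == 1:
            yield node

leaves = list(upstream_leaves_sketch(edges))
```

Here "D" has no incoming edges but two outgoing ones, so only "A" qualifies.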

pybel_tools.selection.get_unweighted_upstream_leaves(graph, key)
Gets all leaves of the graph with no incoming edges, one outgoing edge, and without the given key in its data dictionary.
See also data_does_not_contain_key_builder()
Parameters:  graph (pybel.BELGraph) – A BEL graph
 key (str) – The key in the node data dictionary representing the experimental data
Returns: An iterable over leaves (nodes with an indegree of 0) that don’t have the given annotation
Return type:

pybel_tools.selection.get_gene_leaves(graph)
Iterates over all genes that have only one connection, which is a transcription to their RNA.
Parameters:  graph (pybel.BELGraph) – A BEL graph
Return type: iter[tuple]

pybel_tools.selection.get_rna_leaves(graph)
Iterates over all RNAs that have only one connection, which is a translation to their protein.
Parameters:  graph (pybel.BELGraph) – A BEL graph
Return type: iter[tuple]

pybel_tools.selection.get_leaves_by_type(graph, function=None, prune_threshold=1)
Returns an iterable over all nodes in graph (inplace) with only a connection to one node. Useful for gene and RNA. Allows for optional filter by function type.
Parameters:  graph (pybel.BELGraph) – A BEL graph
 function (str) – If set, filters by the node’s function from pybel.constants like pybel.constants.GENE, pybel.constants.RNA, pybel.constants.PROTEIN, or pybel.constants.BIOPROCESS
 prune_threshold (int) – Removes nodes with less than or equal to this number of connections. Defaults to 1
Returns: An iterable over nodes with only a connection to one node
Return type:

pybel_tools.selection.get_nodes_in_all_shortest_paths(graph, nodes, weight=None, remove_pathologies=True)
Gets all shortest paths from all nodes to all other nodes in the given list and returns the set of all nodes contained in those paths using networkx.all_shortest_paths().
Parameters:  graph (pybel.BELGraph) – A BEL graph
 nodes (iter[tuple]) – The list of nodes to use to find all shortest paths
 weight (str) – Edge data key corresponding to the edge weight. If None, uses unweighted search.
 remove_pathologies (bool) – Should pathology nodes be removed first?
Returns: A set of nodes appearing in the shortest paths between nodes in the BEL graph
Return type:
Note
This can be trivially parallelized using networkx.single_source_shortest_path()
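For readers who want to see the idea without networkx, the sketch below enumerates unweighted shortest paths with a breadth-first search and collects the nodes on them. The adjacency dict and both helper names are hypothetical; weights and pathology removal are not modeled.

```python
from collections import deque
from itertools import permutations

# Hypothetical directed graph as an adjacency dict
graph = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": ["E"], "E": [], "F": ["A"]}

def all_shortest_paths(graph, source, target):
    """Enumerate every unweighted shortest path via breadth-first search."""
    dist, preds, queue = {source: 0}, {source: []}, deque([source])
    while queue:
        node = queue.popleft()
        for nxt in graph[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                preds[nxt] = [node]
                queue.append(nxt)
            elif dist[nxt] == dist[node] + 1:
                preds[nxt].append(node)  # another equally short way in
    if target not in dist:
        return []
    paths = [[target]]
    while paths[0][0] != source:  # all partial paths share the same length
        paths = [[p] + path for path in paths for p in preds[path[0]]]
    return paths

def nodes_in_all_shortest_paths_sketch(graph, nodes):
    """Union of nodes over shortest paths between every ordered pair."""
    found = set()
    for source, target in permutations(nodes, 2):
        for path in all_shortest_paths(graph, source, target):
            found.update(path)
    return found

result = nodes_in_all_shortest_paths_sketch(graph, ["A", "D"])
```

Both "B" and "C" are returned because two equally short paths lead from "A" to "D".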

pybel_tools.selection.get_shortest_directed_path_between_subgraphs(graph, a, b)
Calculates the shortest path that occurs between two disconnected subgraphs A and B going through nodes in the source graph.
Parameters:  graph (pybel.BELGraph) – A BEL graph
 a (pybel.BELGraph) – A subgraph of graph, disjoint from b
 b (pybel.BELGraph) – A subgraph of graph, disjoint from a
Returns: A list of the shortest paths between the two subgraphs
Return type:

pybel_tools.selection.get_shortest_undirected_path_between_subgraphs(graph, a, b)
Gets the shortest path between two disconnected subgraphs A and B, disregarding directionality of edges in graph.
Parameters:  graph (pybel.BELGraph) – A BEL graph
 a (pybel.BELGraph) – A subgraph of graph, disjoint from b
 b (pybel.BELGraph) – A subgraph of graph, disjoint from a
Returns: A list of the shortest paths between the two subgraphs
Return type:

pybel_tools.selection.get_random_path(graph)
Gets a random path from the graph as a list of nodes.
Parameters:  graph (pybel.BELGraph) – A BEL graph
Return type: list[tuple]

pybel_tools.selection.search_node_names(graph, query)
Searches for nodes whose names contain the given string(s).
Parameters:  graph (pybel.BELGraph) – A BEL graph
 query (str or iter[str]) – The search query
Returns: An iterator over nodes whose names match the search query
Return type:
Example:
>>> from pybel.examples import sialic_acid_graph
>>> from pybel_tools.selection import search_node_names
>>> list(search_node_names(sialic_acid_graph, 'CD33'))
[('Protein', 'HGNC', 'CD33'), ('Protein', 'HGNC', 'CD33', ('pmod', ('bel', 'Ph')))]

pybel_tools.selection.search_node_namespace_names(graph, query, namespace)
Searches for nodes with the given namespace(s) whose names contain the given string(s).
Parameters:
Returns: An iterator over nodes whose names match the search query
Return type:

pybel_tools.selection.search_node_hgnc_names(graph, query)
Searches for nodes with the HGNC namespace whose names contain the given string(s).
Parameters:  graph (pybel.BELGraph) – A BEL graph
 query (str or iter[str]) – The search query
Returns: An iterator over nodes whose names match the search query
Return type:

pybel_tools.selection.search_node_cnames(graph, query)
Searches for nodes whose canonical names contain the given string(s).
Parameters:  graph (pybel.BELGraph) – A BEL graph
 query (str or iter[str]) – The search query
Returns: An iterator over nodes whose canonical names match the search query
Return type:

pybel_tools.selection.convert_path_to_metapath(graph, nodes)
Converts a list of nodes to their corresponding functions.
Parameters:  nodes (list[tuple]) – A list of BEL node tuples
Return type: list[str]
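A metapath replaces each node in a path with its function. The sketch below assumes the function is the first element of the node tuple, as in the tuples shown in the search_node_names() example; the real function takes the graph as well, and the helper name here is hypothetical.

```python
# Hypothetical path of BEL node tuples
path = [
    ("Gene", "HGNC", "CD33"),
    ("RNA", "HGNC", "CD33"),
    ("Protein", "HGNC", "CD33"),
]

def path_to_metapath_sketch(nodes):
    """Replace each node tuple with its function (assumed to be element 0)."""
    return [node[0] for node in nodes]

metapath = path_to_metapath_sketch(path)
```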

pybel_tools.selection.get_walks_exhaustive
Gets all walks under a given length starting at a given node.
Parameters:  graph (networkx.Graph) – A graph
 node – Starting node
 length (int) – The length of walks to get
Returns: A list of paths
Return type:
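Exhaustive walk enumeration can be sketched recursively. This version assumes that length counts edges, that walks may revisit nodes, and that only walks of exactly that length are returned; those choices and the helper name are assumptions, since the description above leaves them open.

```python
# Hypothetical directed graph as an adjacency dict
graph = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}

def walks_sketch(graph, node, length):
    """Return all walks of exactly `length` edges starting at `node`."""
    if length == 0:
        return [(node,)]
    return [
        (node,) + rest
        for neighbor in graph[node]
        for rest in walks_sketch(graph, neighbor, length - 1)
    ]

walks = walks_sketch(graph, "A", 2)
```

The number of walks can grow exponentially with the length, so small values are advisable on dense graphs.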

pybel_tools.selection.match_simple_metapath(graph, node, simple_metapath)
Matches a simple metapath starting at the given node.
Parameters:  graph (pybel.BELGraph) – A BEL graph
 node (tuple) – A BEL node
 simple_metapath (list[str]) – A list of BEL Functions
Returns: An iterable over paths from the node matching the metapath
Return type:
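Metapath matching can be sketched as a recursive search in which each step must land on a neighbor with the next function in the list. The adjacency dict, the assumption that the function is the first tuple element, and the assumption that the metapath describes the nodes after the start node are all illustrative choices, not confirmed library behavior.

```python
# Hypothetical graph: node tuple -> list of successor node tuples
gene = ("Gene", "HGNC", "CD33")
rna = ("RNA", "HGNC", "CD33")
protein = ("Protein", "HGNC", "CD33")
graph = {gene: [rna], rna: [protein], protein: []}

def match_metapath_sketch(graph, node, metapath):
    """Yield paths from `node` whose successive nodes have the listed functions."""
    if not metapath:
        yield (node,)
        return
    for neighbor in graph[node]:
        if neighbor[0] == metapath[0]:
            for rest in match_metapath_sketch(graph, neighbor, metapath[1:]):
                yield (node,) + rest

matches = list(match_metapath_sketch(graph, gene, ["RNA", "Protein"]))
no_match = list(match_metapath_sketch(graph, gene, ["Protein"]))
```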