Selection

This module contains functions to help select data from networks.

pybel_tools.selection.group_nodes_by_annotation(graph, annotation='Subgraph')[source]

Groups the nodes occurring in edges by the given annotation

Parameters:
  • graph (pybel.BELGraph) – A BEL graph
  • annotation (str) – An annotation to use to group edges
Returns:

dict of sets of BELGraph nodes

Return type:

dict
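
The grouping idea can be illustrated without PyBEL. The block below is only a sketch, assuming edges are plain (source, target, data) triples whose data dictionary maps an annotation name to a single value. The helper name group_nodes_sketch and that data layout are hypothetical and are not the pybel_tools implementation.

```python
from collections import defaultdict


def group_nodes_sketch(edges, annotation='Subgraph'):
    """Map each annotation value to the set of nodes on edges carrying it."""
    groups = defaultdict(set)
    for source, target, data in edges:
        value = data.get(annotation)
        if value is None:
            continue  # this edge is not annotated with the requested key
        groups[value].update((source, target))
    return dict(groups)


# Hypothetical toy edges; node tuples follow the (function, namespace, name) shape
cd33 = ('Protein', 'HGNC', 'CD33')
ptpn6 = ('Protein', 'HGNC', 'PTPN6')
syk = ('Protein', 'HGNC', 'SYK')
trem2 = ('Protein', 'HGNC', 'TREM2')

edges = [
    (cd33, ptpn6, {'Subgraph': 'A'}),
    (ptpn6, syk, {'Subgraph': 'B'}),
    (syk, trem2, {}),  # unannotated edge, ignored by the grouping
]
groups = group_nodes_sketch(edges)
```

A node that sits on edges from two groups, such as ptpn6 here, appears in both sets.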

pybel_tools.selection.average_node_annotation(graph, key, annotation='Subgraph', aggregator=None)[source]

Groups the graph into subgraphs and assigns each subgraph a score based on the average of its nodes' values for the given node key

Parameters:
  • graph (pybel.BELGraph) – A BEL graph
  • key (str) – The key in the node data dictionary representing the experimental data
  • annotation (str) – A BEL annotation to use to group nodes
  • aggregator (lambda) – A function from list of values -> aggregate value. Defaults to taking the average of a list of floats.
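
As a rough illustration of the scoring step, the sketch below takes an already computed grouping and a node data dictionary, both plain Python dicts. The function name average_by_group_sketch, the 'score' key, and the choice to skip nodes that lack the key are assumptions made for the example, not documented behavior.

```python
from statistics import mean


def average_by_group_sketch(groups, node_data, key, aggregator=None):
    """Score each group by aggregating the values its nodes hold under `key`."""
    aggregator = mean if aggregator is None else aggregator
    scores = {}
    for name, nodes in groups.items():
        # Assumption: nodes without the key are skipped rather than counted as zero
        values = [node_data[n][key] for n in nodes if key in node_data.get(n, {})]
        if values:
            scores[name] = aggregator(values)
    return scores


groups = {'A': {'x', 'y'}, 'B': {'z'}}
node_data = {'x': {'score': 1.0}, 'y': {'score': 3.0}, 'z': {}}

averaged = average_by_group_sketch(groups, node_data, 'score')
maxima = average_by_group_sketch(groups, node_data, 'score', aggregator=max)
```

Passing a different aggregator, such as max, swaps the summary statistic without changing the grouping.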

pybel_tools.selection.group_nodes_by_annotation_filtered(graph, node_filters=None, annotation='Subgraph')[source]

Groups the nodes occurring in edges by the given annotation, with a node filter applied

Parameters:
  • graph (pybel.BELGraph) – A BEL graph
  • node_filters (types.FunctionType or iter[types.FunctionType]) – A predicate or list of predicates (graph, node) -> bool
  • annotation (str) – The annotation to use for grouping
Returns:

A dictionary of {annotation value: set of nodes}

Return type:

dict[str,set[tuple]]

pybel_tools.selection.get_subgraph_by_induction(graph, nodes)[source]

Induce a sub-graph over the given nodes or return None if none of the nodes are in the given graph.

Parameters:
  • graph (pybel.BELGraph) – A BEL graph
  • nodes (iter[tuple]) – A list of BEL nodes in the graph
Return type:

Optional[pybel.BELGraph]
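
Induction keeps the requested nodes that exist in the graph, plus every edge whose two endpoints were both kept. A minimal sketch over a plain adjacency dictionary follows; the name induce_subgraph_sketch and the dict-of-sets representation are assumptions for illustration only.

```python
def induce_subgraph_sketch(adjacency, nodes):
    """Return the sub-adjacency over `nodes`, or None if none are in the graph."""
    kept = {n for n in nodes if n in adjacency}
    if not kept:
        return None
    # Keep only edges whose source and target were both retained
    return {n: {m for m in adjacency[n] if m in kept} for n in kept}


adjacency = {'a': {'b', 'c'}, 'b': {'c'}, 'c': set(), 'd': {'a'}}

sub = induce_subgraph_sketch(adjacency, ['a', 'c', 'not-in-graph'])
missing = induce_subgraph_sketch(adjacency, ['not-in-graph'])
```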

pybel_tools.selection.get_subgraph_by_node_filter(graph, node_filters)[source]

Induces a graph on the nodes that pass all filters

Parameters:
  • graph (pybel.BELGraph) – A BEL graph
  • node_filters (types.FunctionType or iter[types.FunctionType]) – A node filter or list/tuple of node filters
Returns:

A subgraph induced over the nodes passing the given filters

Return type:

pybel.BELGraph
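
A node filter is a predicate with the shape (graph, node) -> bool, and a node is kept only when every filter returns True. The sketch below shows that composition on a plain list of node tuples; passes_all_sketch, is_protein, and is_hgnc are hypothetical helpers written for this example.

```python
def passes_all_sketch(graph, node, node_filters):
    """Return True if `node` passes a single filter or every filter in an iterable."""
    if callable(node_filters):
        node_filters = [node_filters]
    return all(node_filter(graph, node) for node_filter in node_filters)


def is_protein(graph, node):
    return node[0] == 'Protein'


def is_hgnc(graph, node):
    return node[1] == 'HGNC'


# Stand-in for a graph: just a list of (function, namespace, name) tuples
graph = [
    ('Protein', 'HGNC', 'CD33'),
    ('Gene', 'HGNC', 'CD33'),
    ('Protein', 'MGI', 'Cd33'),
]

kept = [n for n in graph if passes_all_sketch(graph, n, [is_protein, is_hgnc])]
proteins = [n for n in graph if passes_all_sketch(graph, n, is_protein)]
```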

pybel_tools.selection.get_subgraph_by_neighborhood(graph, nodes)[source]

Get a BEL graph around the neighborhoods of the given nodes. Returns None if none of the nodes are in the graph.

Parameters:
  • graph (pybel.BELGraph) – A BEL graph
  • nodes (iter[tuple]) – An iterable of BEL nodes
Returns:

A BEL graph induced around the neighborhoods of the given nodes

Return type:

Optional[pybel.BELGraph]

pybel_tools.selection.get_subgraph_by_second_neighbors(graph, nodes, filter_pathologies=False)[source]

Get a graph around the neighborhoods of the given nodes and expand to the neighborhood of those nodes.

Returns None if none of the nodes are in the graph.

Parameters:
  • graph (pybel.BELGraph) – A BEL graph
  • nodes (iter[tuple]) – An iterable of BEL nodes
  • filter_pathologies (bool) – Should expansion take place around pathologies?
Returns:

A BEL graph induced around the neighborhoods of the given nodes

Return type:

Optional[pybel.BELGraph]

pybel_tools.selection.get_subgraph_by_all_shortest_paths(graph, nodes, weight=None, remove_pathologies=False)[source]

Induce a subgraph over the nodes in the pairwise shortest paths between all of the nodes in the given list.

Parameters:
  • graph (pybel.BELGraph) – A BEL graph
  • nodes (iter[tuple]) – A set of nodes over which to calculate shortest paths
  • weight (str) – Edge data key corresponding to the edge weight. If None, performs unweighted search
  • remove_pathologies (bool) – Should the pathology nodes be deleted before getting shortest paths?
Returns:

A BEL graph induced over the nodes appearing in the shortest paths between the given nodes

Return type:

Optional[pybel.BELGraph]

pybel_tools.selection.get_subgraph_by_annotation_value(graph, annotation, values)[source]

Induce a sub-graph over all edges whose annotations match the given key and value.

Parameters:
  • graph (pybel.BELGraph) – A BEL graph
  • annotation (str) – The annotation to group by
  • values (str or iter[str]) – The value(s) for the annotation
Returns:

A subgraph of the original BEL graph

Return type:

pybel.BELGraph

pybel_tools.selection.get_subgraph_by_annotations(graph, annotations, or_=None)[source]

Induce a sub-graph given an annotations filter.

Parameters:
  • graph (pybel.BELGraph) – A BEL graph
  • annotations (dict[str,iter[str]]) – Annotation filters (match all with pybel.utils.subdict_matches())
  • or_ (bool) – If True, any one of the annotations must be present in the edge; if False, all of them must be present. Defaults to True.
Returns:

A subgraph of the original BEL graph

Return type:

pybel.BELGraph
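
The any/all switch can be sketched as a predicate over one edge's annotations. The example assumes each edge stores a single string value per annotation; edge_matches_sketch is a hypothetical helper and does not reproduce pybel.utils.subdict_matches(). It defaults or_ to True, following the parameter description above.

```python
def edge_matches_sketch(edge_annotations, annotations, or_=True):
    """Check an edge's annotations against a {name: allowed values} filter."""
    checks = [
        edge_annotations.get(name) in set(values)
        for name, values in annotations.items()
    ]
    # or_=True: one matching annotation suffices; or_=False: all must match
    return any(checks) if or_ else all(checks)


edge = {'Subgraph': 'Inflammation', 'Species': '10090'}
query = {'Subgraph': ['Inflammation', 'Apoptosis'], 'Species': ['9606']}

matches_any = edge_matches_sketch(edge, query, or_=True)
matches_all = edge_matches_sketch(edge, query, or_=False)
```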

pybel_tools.selection.get_subgraph_by_pubmed(graph, pubmed_identifiers)[source]

Induce a sub-graph over the edges retrieved from the given PubMed identifier(s).

Parameters:
  • graph (pybel.BELGraph) – A BEL graph
  • pubmed_identifiers (str or list[str]) – A PubMed identifier or list of PubMed identifiers
Return type:

pybel.BELGraph

pybel_tools.selection.get_subgraph_by_authors(graph, authors)[source]

Induce a sub-graph over the edges retrieved from publications by the given author(s).

Parameters:
  • graph (pybel.BELGraph) – A BEL graph
  • authors (str or list[str]) – An author or list of authors
Return type:

pybel.BELGraph

Gets a subgraph induced over all nodes matching the query string

Parameters:
  • graph (pybel.BELGraph) – A BEL Graph
  • query (str or iter[str]) – A query string or iterable of query strings for node names
Returns:

A subgraph induced over the original BEL graph

Return type:

pybel.BELGraph

Thinly wraps search_node_names() and get_subgraph_by_induction().

pybel_tools.selection.get_causal_subgraph(graph)[source]

Builds a new subgraph induced over all edges that are causal

Parameters: graph (pybel.BELGraph) – A BEL graph
Returns: A subgraph of the original BEL graph
Return type: pybel.BELGraph

pybel_tools.selection.get_subgraph(graph, seed_method=None, seed_data=None, expand_nodes=None, remove_nodes=None)[source]

Run a pipeline query on graph with multiple sub-graph filters and expanders.

Order of Operations:

  1. Seeding by given function name and data
  2. Add nodes
  3. Remove nodes
Parameters:
  • graph (pybel.BELGraph) – A BEL graph
  • seed_method (str) – The name of the get_subgraph_by_* function to use
  • seed_data – The argument to pass to the get_subgraph function
  • expand_nodes (list[tuple]) – Add the neighborhoods around all of these nodes
  • remove_nodes (list[tuple]) – Remove these nodes and all of their in/out edges
Return type:

Optional[pybel.BELGraph]
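
The order of operations above (seed, add, remove) can be sketched over a plain adjacency dictionary. Here the seeding step is simplified to a given list of seed nodes rather than a named get_subgraph_by_* function; pipeline_sketch and neighborhood_sketch are hypothetical helpers, and every node is assumed to appear as a key of the dictionary.

```python
def neighborhood_sketch(adjacency, node):
    """Return the node together with its direct successors and predecessors."""
    successors = set(adjacency.get(node, ()))
    predecessors = {n for n, targets in adjacency.items() if node in targets}
    return {node} | successors | predecessors


def pipeline_sketch(adjacency, seed_nodes, expand_nodes=None, remove_nodes=None):
    # 1. Seed: keep the seed nodes that exist in the graph
    kept = {n for n in seed_nodes if n in adjacency}
    if not kept:
        return None
    # 2. Add the neighborhoods around the expansion nodes
    for node in expand_nodes or ():
        if node in adjacency:
            kept |= neighborhood_sketch(adjacency, node)
    # 3. Remove nodes; their edges vanish when the subgraph is induced
    kept -= set(remove_nodes or ())
    return {n: {m for m in adjacency[n] if m in kept} for n in kept}


adjacency = {'a': {'b'}, 'b': {'c'}, 'c': set(), 'd': {'c'}}
result = pipeline_sketch(adjacency, ['a', 'b'], expand_nodes=['c'], remove_nodes=['b'])
```

Because removal runs last, a node named in remove_nodes is dropped even if the seed or an expansion had brought it in.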

pybel_tools.selection.get_multi_causal_upstream(graph, nbunch)[source]

Get the union of all the 2-level deep causal upstream subgraphs from the nbunch.

Parameters:
Returns:

A subgraph of the original BEL graph

Return type:

pybel.BELGraph
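
One way to read "2-level deep causal upstream" is: collect the causal edges pointing into the given nodes, then the causal edges pointing into the sources of those edges. The sketch below implements that reading on (source, target, relation) triples. The helper name and the edge layout are assumptions, and the actual pybel_tools traversal may differ in detail. The downstream variant would mirror it by following edges out of the frontier instead.

```python
# The four causal relations defined by BEL
CAUSAL_RELATIONS = {'increases', 'directlyIncreases', 'decreases', 'directlyDecreases'}


def causal_upstream_sketch(edges, nbunch, depth=2):
    """Collect causal edges leading into `nbunch`, walking `depth` levels upstream."""
    frontier = set(nbunch)
    collected = set()
    for _ in range(depth):
        layer = {
            (source, target, relation)
            for source, target, relation in edges
            if target in frontier and relation in CAUSAL_RELATIONS
        }
        collected |= layer
        # The next level starts from the sources just reached
        frontier = {source for source, _target, _relation in layer}
    return collected


edges = [
    ('z', 'a', 'increases'),    # three levels above 'c', so not collected
    ('a', 'b', 'increases'),
    ('x', 'b', 'association'),  # not causal, so not collected
    ('b', 'c', 'decreases'),
    ('c', 'd', 'increases'),    # downstream of 'c', so not collected
]
upstream = causal_upstream_sketch(edges, ['c'])
```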

pybel_tools.selection.get_multi_causal_downstream(graph, nbunch)[source]

Get the union of all of the 2-level deep causal downstream subgraphs from the nbunch.

Parameters:
Returns:

A subgraph of the original BEL graph

Return type:

pybel.BELGraph

pybel_tools.selection.get_random_subgraph(graph, number_edges=None, number_seed_edges=None, seed=None, invert_degrees=None)[source]

Generate a random subgraph based on weighted random walks from random seed edges.

Parameters:
  • number_edges (Optional[int]) – Maximum number of edges. Defaults to pybel_tools.constants.SAMPLE_RANDOM_EDGE_COUNT (250).
  • number_seed_edges (Optional[int]) – Number of nodes to start with (which likely results in different components in large graphs). Defaults to SAMPLE_RANDOM_EDGE_SEED_COUNT (5).
  • seed (Optional[int]) – A seed for the random state
  • invert_degrees (Optional[bool]) – Should the degrees be inverted? Defaults to True.
Return type:

pybel.BELGraph

pybel_tools.selection.get_leaves_by_type(graph, func=None, prune_threshold=1)[source]

Returns an iterable over all nodes in graph (in-place) that are connected to only one other node. Useful for gene and RNA. Allows for an optional filter by function type.

Parameters:
Returns:

An iterable over nodes with only a connection to one node

Return type:

iter[tuple]

pybel_tools.selection.get_nodes_in_all_shortest_paths(graph, nodes, weight=None, remove_pathologies=False)[source]

Get a set of nodes in all shortest paths between the given nodes.

Thinly wraps networkx.all_shortest_paths().

Parameters:
  • graph (pybel.BELGraph) – A BEL graph
  • nodes (iter[tuple]) – The list of nodes to use to find all shortest paths
  • weight (Optional[str]) – Edge data key corresponding to the edge weight. If None, uses unweighted search.
  • remove_pathologies (bool) – Should pathology nodes be removed first?
Returns:

A set of nodes appearing in the shortest paths between nodes in the BEL graph

Return type:

set[tuple]

Note

This can be trivially parallelized using networkx.single_source_shortest_path()
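
For readers without networkx at hand, the unweighted case can be sketched in plain Python: a breadth-first search records every parent that reaches a node at its minimal distance, and the paths are then rebuilt backwards from the target. Both helper names are hypothetical, edge weights and pathology removal are left out, and the directed adjacency dictionary stands in for a BEL graph.

```python
from collections import deque
from itertools import permutations


def all_shortest_paths_sketch(adjacency, source, target):
    """Yield every shortest directed path from source to target (unweighted)."""
    dist = {source: 0}
    parents = {source: []}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in adjacency.get(node, ()):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                parents[nxt] = [node]
                queue.append(nxt)
            elif dist[nxt] == dist[node] + 1:
                parents[nxt].append(node)  # another equally short route
    if target not in dist:
        return
    stack = [[target]]
    while stack:
        path = stack.pop()
        if path[-1] == source:
            yield path[::-1]
            continue
        for parent in parents[path[-1]]:
            stack.append(path + [parent])


def nodes_in_all_shortest_paths_sketch(adjacency, nodes):
    """Union of the nodes on all shortest paths between every ordered pair."""
    found = set()
    for source, target in permutations(nodes, 2):
        for path in all_shortest_paths_sketch(adjacency, source, target):
            found.update(path)
    return found


adjacency = {'a': ['b', 'c'], 'b': ['d'], 'c': ['d'], 'd': ['e'], 'e': [], 'f': ['a']}
result = nodes_in_all_shortest_paths_sketch(adjacency, ['a', 'd'])
```

Each source's search is independent of the others, which is what makes the per-source parallelization mentioned in the note straightforward.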

pybel_tools.selection.get_shortest_directed_path_between_subgraphs(graph, a, b)[source]

Calculate the shortest path that occurs between two disconnected subgraphs A and B going through nodes in the source graph

Parameters:
Returns:

A list of the shortest paths between the two subgraphs

Return type:

list

pybel_tools.selection.get_shortest_undirected_path_between_subgraphs(graph, a, b)[source]

Get the shortest path between two disconnected subgraphs A and B, disregarding directionality of edges in graph

Parameters:
Returns:

A list of the shortest paths between the two subgraphs

Return type:

list

pybel_tools.selection.search_node_names(graph, query)[source]

Search for nodes containing a given string(s).

Parameters:
  • graph (pybel.BELGraph) – A BEL graph
  • query (str or iter[str]) – The search query
Returns:

An iterator over nodes whose names match the search query

Return type:

iter

Example:

>>> from pybel.examples import sialic_acid_graph
>>> from pybel_tools.selection import search_node_names
>>> list(search_node_names(sialic_acid_graph, 'CD33'))
[('Protein', 'HGNC', 'CD33'), ('Protein', 'HGNC', 'CD33', ('pmod', ('bel', 'Ph')))]

pybel_tools.selection.search_node_namespace_names(graph, query, namespace)[source]

Search for nodes with the given namespace(s) whose names contain the given string(s).

Parameters:
  • graph (pybel.BELGraph) – A BEL graph
  • query (str or iter[str]) – The search query
  • namespace (str or iter[str]) – The namespace(s) to filter
Returns:

An iterator over nodes whose names match the search query

Return type:

iter

pybel_tools.selection.search_node_hgnc_names(graph, query)[source]

Search for nodes with the HGNC namespace whose names contain the given string(s).

Parameters:
  • graph (pybel.BELGraph) – A BEL graph
  • query (str or iter[str]) – The search query
Returns:

An iterator over nodes whose names match the search query

Return type:

iter

pybel_tools.selection.convert_path_to_metapath(graph, nodes)[source]

Converts a list of nodes to their corresponding functions

Parameters: nodes (list[tuple]) – A list of BEL node tuples
Return type: list[str]

pybel_tools.selection.get_walks_exhaustive[source]

Gets all walks under a given length starting at a given node

Parameters:
  • graph (networkx.Graph) – A graph
  • node – Starting node
  • length (int) – The length of walks to get
Returns:

A list of paths

Return type:

list[tuple]
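
Exhaustive walk enumeration is naturally recursive: a walk of length n from a node is that node followed by a walk of length n - 1 from one of its neighbors. The sketch below assumes that "length" counts edges, returns only walks of exactly that length, and allows nodes to repeat. Those are readings chosen for the example, since the description above does not settle them, and walks_sketch is a hypothetical name.

```python
def walks_sketch(adjacency, node, length):
    """Return all walks with exactly `length` edges that start at `node`."""
    if length == 0:
        return [(node,)]
    return [
        (node,) + rest
        for neighbor in sorted(adjacency.get(node, ()))
        for rest in walks_sketch(adjacency, neighbor, length - 1)
    ]


adjacency = {'a': ['b', 'c'], 'b': ['c'], 'c': ['a']}
walks = walks_sketch(adjacency, 'a', 2)
```

The number of walks can grow exponentially with the length, so small values are advisable on dense graphs.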

pybel_tools.selection.match_simple_metapath(graph, node, simple_metapath)[source]

Matches a simple metapath starting at the given node

Parameters:
Returns:

An iterable over paths from the node matching the metapath

Return type:

iter[tuple]
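
To make the metapath idea concrete, the sketch below treats the first element of each node tuple as its function, converts a path into its list of functions, and walks outward from a start node while the next node's function matches the next metapath entry. It assumes the metapath lists the functions of the nodes after the start node and that matched paths never revisit a node. Both helpers are hypothetical and operate on a plain adjacency dictionary.

```python
def path_to_metapath_sketch(nodes):
    """Return the function (first tuple element) of each node in the path."""
    return [node[0] for node in nodes]


def match_metapath_sketch(adjacency, node, metapath):
    """Yield paths from `node` whose following nodes have the given functions."""
    if not metapath:
        yield (node,)
        return
    for neighbor in adjacency.get(node, ()):
        if neighbor[0] != metapath[0]:
            continue
        for rest in match_metapath_sketch(adjacency, neighbor, metapath[1:]):
            if node not in rest:  # keep the path free of repeated nodes
                yield (node,) + rest


gene = ('Gene', 'HGNC', 'CD33')
rna = ('RNA', 'HGNC', 'CD33')
protein = ('Protein', 'HGNC', 'CD33')
adjacency = {gene: [rna], rna: [protein], protein: []}

paths = list(match_metapath_sketch(adjacency, gene, ['RNA', 'Protein']))
functions = path_to_metapath_sketch(paths[0])
```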