Integration

This module contains functions that help add more data to the network.

class pybel_tools.integration.NodeAnnotator(namespace)[source]

The base class for node annotators.

Parameters:namespace (str or list[str]) – The name of the namespace or namespaces that this node annotator services
annotate(graph)[source]

Annotates all nodes in this annotator’s namespace

Parameters:graph (pybel.BELGraph) – A BEL graph
get_description(name)[source]

Gets the description for the given name in this annotator’s namespace.

get_label(name)[source]

Gets the label for the given name in this annotator’s namespace. If not overridden, uses each node’s name as its label.

populate_by_graph(graph)[source]

Optional hook for populating the annotator based on the nodes in a graph. Override this if your node annotator downloads data ahead of time, such as grouping requests to an external API.

Parameters:graph (pybel.BELGraph) – A BEL graph
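
The override pattern described for the base class can be illustrated with a minimal sketch. It uses a stand-in class rather than the real pybel_tools.integration.NodeAnnotator, so it runs without PyBEL installed. Only the method names come from the documentation above; SketchAnnotator, ExampleAnnotator, the lookup table, and the behavior of returning None for a missing description are hypothetical.

```python
class SketchAnnotator:
    """Stand-in for a node annotator (hypothetical, not the pybel_tools class)."""

    def __init__(self, namespace):
        # Accept either a single namespace name or a list of names
        self.namespaces = [namespace] if isinstance(namespace, str) else list(namespace)

    def get_label(self, name):
        # Documented default: use the node's name as its label
        return name

    def get_description(self, name):
        # Assumption: no description unless a subclass provides one
        return None


class ExampleAnnotator(SketchAnnotator):
    """Hypothetical annotator backed by a small in-memory table."""

    # Placeholder entries of the form {name: (label, description)}
    TABLE = {
        'AKT1': ('AKT1 (example label)', 'Placeholder description for AKT1'),
    }

    def get_label(self, name):
        entry = self.TABLE.get(name)
        # Fall back to the default behavior when the name is unknown
        return entry[0] if entry else super().get_label(name)

    def get_description(self, name):
        entry = self.TABLE.get(name)
        return entry[1] if entry else None
```

A subclass that fetches data from an external service would presumably also override populate_by_graph to fill its table before annotate is called.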
class pybel_tools.integration.HGNCAnnotator(preload=True)[source]

Annotates the labels and descriptions of Genes with HGNC identifiers using a mapping provided by HGNC and then the Entrez Gene Service.

Parameters:preload (bool) – Should the data be pre-downloaded?
get_unpopulated_entrez(entrez_ids)[source]

Gets the Entrez Gene Identifiers from this list that aren’t already cached

load_hgnc_entrez_map()[source]

Preloads the HGNC-Entrez map

map_entrez_ids(entrez_ids)[source]

Maps a list of Entrez Gene Identifiers to HGNC Gene Symbols

map_hgnc(hgnc_symbols)[source]

Maps a list of HGNC Gene Symbols to Entrez Gene Identifiers

populate(entrez_ids, group_size=200, sleep_time=1)[source]

Download the descriptions from Entrez Gene Service for a given list of Entrez Gene Identifiers

Parameters:
  • entrez_ids (iter) – An iterable of Entrez Gene Identifiers
  • group_size (int) – The number of Entrez Gene Identifiers to send per query
  • sleep_time (int) – The number of seconds to sleep between queries
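
The roles of group_size and sleep_time can be sketched as follows. This is not the library's implementation: iter_groups, populate_sketch, and the fetch callable are hypothetical names, and the assumption is that identifiers are sent in fixed-size groups with a pause between consecutive queries.

```python
import itertools
import time


def iter_groups(identifiers, group_size=200):
    """Yield lists of at most group_size identifiers from any iterable."""
    iterator = iter(identifiers)
    while True:
        group = list(itertools.islice(iterator, group_size))
        if not group:
            return
        yield group


def populate_sketch(entrez_ids, fetch, group_size=200, sleep_time=1):
    """Call fetch once per group, sleeping between consecutive queries.

    fetch is a hypothetical callable that takes a list of identifiers and
    returns a dict of {identifier: description}.
    """
    descriptions = {}
    for index, group in enumerate(iter_groups(entrez_ids, group_size)):
        if index > 0:
            # Pause between queries, but not before the first one
            time.sleep(sleep_time)
        descriptions.update(fetch(group))
    return descriptions
```

Grouping keeps the number of requests low, while the pause is one simple way to stay within the rate limits of a remote service.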
populate_by_graph(graph)[source]

Downloads the gene information only for genes in the given graph

Parameters:graph (pybel.BELGraph) – A BEL graph
populate_constrained(hgnc_symbols, group_size=200, sleep_time=1)[source]

Downloads the gene information only for genes in the list of HGNC Gene Symbols

populate_unconstrained(group_size=200, sleep_time=1)[source]

Downloads all descriptions for all Entrez Gene Identifiers

pybel_tools.integration.overlay_data(graph, data, label, overwrite=False)[source]

Overlays tabular data on the network

Parameters:
  • graph (pybel.BELGraph) – A BEL Graph
  • data (dict) – A dictionary of {tuple node: data for that node}
  • label (str) – The annotation label to put in the node dictionary
  • overwrite (bool) – Should old annotations be overwritten?
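
The effect of label and overwrite can be illustrated with a plain dictionary standing in for the graph's node dictionary. This is a sketch under stated assumptions, not the pybel_tools code: overlay_data_sketch is a hypothetical name, the node tuples are illustrative, and skipping data for nodes absent from the graph is an assumption.

```python
def overlay_data_sketch(nodes, data, label, overwrite=False):
    """Write data values into node attribute dicts under the given label.

    nodes is a dict of {node tuple: attribute dict}, a stand-in for the
    node dictionary of a BEL graph.
    """
    for node, value in data.items():
        if node not in nodes:
            # Assumption: data for nodes absent from the graph is skipped
            continue
        if label in nodes[node] and not overwrite:
            # Keep the existing annotation unless overwriting is requested
            continue
        nodes[node][label] = value


# Illustrative node tuples; the exact tuple layout is an assumption
nodes = {
    ('Protein', 'HGNC', 'AKT1'): {'score': 0.1},
    ('Protein', 'HGNC', 'EGFR'): {},
}
data = {
    ('Protein', 'HGNC', 'AKT1'): 0.9,
    ('Protein', 'HGNC', 'EGFR'): 0.5,
}
overlay_data_sketch(nodes, data, 'score')
```

With overwrite left at False, the existing score on the AKT1 node is kept and only the EGFR node receives a new value.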
pybel_tools.integration.overlay_type_data(graph, data, label, function, namespace, overwrite=False, impute=None)[source]

Overlays tabular data on the network for data that comes from a data set with identifiers that lack namespaces.

For example, if you want to overlay differential gene expression data from a table, that table probably has HGNC identifiers, but no specific annotations that they are in the HGNC namespace or that the entities to which they refer are RNA.

Parameters:
  • graph (pybel.BELGraph) – A BEL Graph
  • data (dict) – A dictionary of {name: data}
  • label (str) – The annotation label to put in the node dictionary
  • function (str) – The function of the keys in the data dictionary
  • namespace (str) – The namespace of the keys in the data dictionary
  • overwrite (bool) – Should old annotations be overwritten?
  • impute – The value to use for missing data
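
How function, namespace, and impute might interact can be sketched with the same kind of dictionary stand-in. The function name overlay_type_data_sketch and the attribute keys 'function', 'namespace', and 'name' are hypothetical, and this sketch assumes that a value of None for impute means nodes with missing data are left unannotated; the library may behave differently.

```python
def overlay_type_data_sketch(nodes, data, label, function, namespace,
                             overwrite=False, impute=None):
    """Annotate nodes of one function and namespace using name-keyed data."""
    for attrs in nodes.values():
        # Only consider nodes with the requested function and namespace
        if attrs.get('function') != function or attrs.get('namespace') != namespace:
            continue
        name = attrs.get('name')
        if name in data:
            value = data[name]
        elif impute is not None:
            # Assumption: the impute value fills in for names missing from data
            value = impute
        else:
            continue
        if label in attrs and not overwrite:
            continue
        attrs[label] = value


# Hypothetical nodes; keys are arbitrary identifiers for this sketch
nodes = {
    1: {'function': 'RNA', 'namespace': 'HGNC', 'name': 'AKT1'},
    2: {'function': 'RNA', 'namespace': 'HGNC', 'name': 'EGFR'},
    3: {'function': 'Protein', 'namespace': 'HGNC', 'name': 'AKT1'},
}
expression = {'AKT1': 2.5}  # placeholder values keyed by bare gene symbol
overlay_type_data_sketch(nodes, expression, 'log_fc', 'RNA', 'HGNC', impute=0.0)
```

Here only the two RNA nodes are annotated: AKT1 takes its value from the table, EGFR receives the imputed value, and the protein node is left untouched.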