Integration

This module contains functions for adding external data to a network.

pybel_tools.integration.overlay_data(graph, data, label=None, overwrite=False)

Overlay tabular data on the network.

Parameters:
  • graph (pybel.BELGraph) – A BEL Graph
  • data (dict) – A dictionary mapping each node tuple to the data for that node
  • label (Optional[str]) – The annotation label to put in the node dictionary
  • overwrite (bool) – Should old annotations be overwritten?
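
The effect of the overwrite flag can be illustrated with a small standard-library sketch. This is not the pybel_tools implementation: overlay_data_sketch is a hypothetical stand-in, the network is modeled as a plain dictionary of node tuples to attribute dictionaries, and skipping data for nodes that are not in the network is an assumption. The label is passed explicitly because the documentation above does not state what label=None resolves to.

```python
def overlay_data_sketch(nodes, data, label, overwrite=False):
    """Hypothetical stand-in: copy values from `data` onto matching node dictionaries."""
    for node, value in data.items():
        if node not in nodes:
            continue  # assumption: data for nodes outside the network is ignored
        if label in nodes[node] and not overwrite:
            continue  # keep the existing annotation
        nodes[node][label] = value


# The network, modeled as node tuples mapped to attribute dictionaries.
nodes = {
    ("Protein", "HGNC", "APP"): {"score": 0.1},
    ("Protein", "HGNC", "MAPT"): {},
}
data = {
    ("Protein", "HGNC", "APP"): 2.0,
    ("Protein", "HGNC", "MAPT"): -0.5,
    ("Protein", "HGNC", "BACE1"): 1.2,  # not in the network
}

# Without overwrite, the existing "score" on APP is left alone.
overlay_data_sketch(nodes, data, label="score")
kept = nodes[("Protein", "HGNC", "APP")]["score"]

# With overwrite=True, the old annotation is replaced.
overlay_data_sketch(nodes, data, label="score", overwrite=True)
replaced = nodes[("Protein", "HGNC", "APP")]["score"]
```

In this sketch the first call leaves the existing APP annotation untouched and fills in MAPT; only the second call, with overwrite=True, replaces the APP value.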
pybel_tools.integration.overlay_type_data(graph, data, func, namespace, label=None, overwrite=False, impute=None)

Overlay tabular data on the network when the data comes from a data set whose identifiers lack namespaces.

For example, a table of differential gene expression data probably uses HGNC identifiers, but it does not state that they belong to the HGNC namespace or that the entities they refer to are RNA. The func and namespace arguments supply that missing information.

Parameters:
  • graph (pybel.BELGraph) – A BEL Graph
  • data (dict[str,float]) – A dictionary of {name: data}
  • func (str) – The function of the keys in the data dictionary
  • namespace (str) – The namespace of the keys in the data dictionary
  • label (Optional[str]) – The annotation label to put in the node dictionary
  • overwrite (bool) – Should old annotations be overwritten?
  • impute (Optional[float]) – The value to use for missing data
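
The matching step described above can be sketched with plain dictionaries. This is an illustration under stated assumptions, not the library's code: overlay_type_data_sketch and the "function", "namespace", and "name" keys are hypothetical, and the network is again modeled as a dictionary of attribute dictionaries. Every node with the requested function and namespace receives the value stored under its name, or the impute value when the table has no entry for it.

```python
def overlay_type_data_sketch(nodes, data, func, namespace, label,
                             overwrite=False, impute=None):
    """Hypothetical stand-in: annotate nodes of one function/namespace by name."""
    for attrs in nodes.values():
        # Only nodes of the requested type are candidates for annotation.
        if attrs.get("function") != func or attrs.get("namespace") != namespace:
            continue
        if label in attrs and not overwrite:
            continue  # keep the existing annotation
        # Names missing from the table fall back to the imputed value.
        attrs[label] = data.get(attrs["name"], impute)


nodes = {
    "n1": {"function": "RNA", "namespace": "HGNC", "name": "APP"},
    "n2": {"function": "RNA", "namespace": "HGNC", "name": "MAPT"},
    "n3": {"function": "Protein", "namespace": "HGNC", "name": "APP"},
}
logfc = {"APP": 2.0}  # the table has no entry for MAPT

overlay_type_data_sketch(nodes, logfc, "RNA", "HGNC", label="logFC", impute=0.0)
```

Here the RNA node for APP receives its table value, the RNA node for MAPT receives the imputed value, and the protein node for APP is left unannotated because its function does not match.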
pybel_tools.integration.load_differential_gene_expression(path, gene_symbol_column='Gene.symbol', logfc_column='logFC', aggregator=None)

Load and preprocess differential gene expression data.

Parameters:
  • path (str) – The path to the CSV
  • gene_symbol_column (str) – The header of the gene symbol column in the data frame
  • logfc_column (str) – The header of the log-fold-change column in the data frame
  • aggregator (Optional[list[float] -> float]) – A function that aggregates a list of differential gene expression values into a single value. Defaults to numpy.median(). Alternatives include numpy.mean(), numpy.average(), numpy.min(), and numpy.max()
Returns: A dictionary of {gene symbol: log fold change}

Return type: dict[str,float]
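
The role of the aggregator is easiest to see when a gene symbol appears on several rows of the table. The following is a minimal sketch of that idea and not the library's implementation: load_dge_sketch is a hypothetical name, it reads from any iterable of CSV lines rather than a path, it uses statistics.median from the standard library in place of numpy.median(), and skipping rows with an empty gene symbol is an assumption.

```python
import csv
import io
from collections import defaultdict
from statistics import median


def load_dge_sketch(lines, gene_symbol_column="Gene.symbol",
                    logfc_column="logFC", aggregator=None):
    """Hypothetical stand-in: map each gene symbol to one aggregated log fold change."""
    aggregator = median if aggregator is None else aggregator
    values = defaultdict(list)
    for row in csv.DictReader(lines):
        symbol = row[gene_symbol_column]
        if not symbol:
            continue  # assumption: rows without a gene symbol are skipped
        values[symbol].append(float(row[logfc_column]))
    # Collapse the list of values for each symbol into a single number.
    return {symbol: aggregator(v) for symbol, v in values.items()}


# APP appears twice, so its two values are aggregated into one.
example_csv = io.StringIO(
    "Gene.symbol,logFC\n"
    "APP,1.0\n"
    "APP,3.0\n"
    "MAPT,-0.5\n"
)
result = load_dge_sketch(example_csv)
```

Passing a different callable, such as max, as the aggregator would keep the largest value per gene symbol instead of the median.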