Integration

This module contains functions that help add more data to the network.

pybel_tools.integration.overlay_data(graph, data, label=None, overwrite=False)

Overlay tabular data on the network.

Parameters
  • graph (BELGraph) – A BEL Graph

  • data (Mapping[BaseEntity, Any]) – A dictionary of {node: data for that node}

  • label (Optional[str]) – The annotation label to put in the node dictionary

  • overwrite (bool) – Should old annotations be overwritten?

Return type

None
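The effect of the label and overwrite parameters can be illustrated without PyBEL installed. The following is a minimal sketch, assuming the node dictionaries behave like nested plain dicts. The function overlay_sketch, the string node keys, and the fallback label "weight" are hypothetical stand-ins chosen for illustration, not the library's implementation.

```python
from typing import Any, Dict, Hashable, Mapping, Optional


def overlay_sketch(
    nodes: Dict[Hashable, Dict[str, Any]],
    data: Mapping[Hashable, Any],
    label: Optional[str] = None,
    overwrite: bool = False,
) -> None:
    """Annotate node dictionaries in place (plain dicts stand in for a graph)."""
    if label is None:
        label = "weight"  # assumed fallback label, for illustration only
    for node, value in data.items():
        if node not in nodes:
            continue  # ignore data for nodes that are not in the network
        if label in nodes[node] and not overwrite:
            continue  # keep the existing annotation
        nodes[node][label] = value


# AKT1 already carries a "score"; TP53 does not; EGFR is not in the network.
nodes = {"AKT1": {"score": 0.1}, "TP53": {}}
overlay_sketch(nodes, {"AKT1": 2.5, "TP53": -1.0, "EGFR": 0.7}, label="score")
print(nodes)

# With overwrite=True the old annotation on AKT1 is replaced.
overlay_sketch(nodes, {"AKT1": 2.5}, label="score", overwrite=True)
print(nodes)
```

In this sketch the first call leaves the existing annotation on AKT1 untouched and only the second call, with overwrite=True, replaces it.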

pybel_tools.integration.overlay_type_data(graph, data, func, namespace, label=None, overwrite=False, impute=None)

Overlay tabular data on the network for data that comes from a data set with identifiers that lack namespaces.

For example, if you want to overlay differential gene expression data from a table, that table probably has HGNC identifiers, but no specific annotations that they are in the HGNC namespace or that the entities to which they refer are RNA.

Parameters
  • graph (BELGraph) – A BEL Graph

  • data (dict) – A dictionary of {name: data}

  • func (str) – The function of the keys in the data dictionary

  • namespace (str) – The namespace of the keys in the data dictionary

  • label (Optional[str]) – The annotation label to put in the node dictionary

  • overwrite (bool) – Should old annotations be overwritten?

  • impute (Optional[float]) – The value to use for missing data

Return type

None
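The role of func, namespace, and impute can be sketched in the same spirit. This is a minimal illustration under stated assumptions: nodes are modelled as hypothetical (function, namespace, name) tuples mapped to plain annotation dicts, and overlay_type_sketch is an invented name, not part of pybel_tools.

```python
from typing import Any, Dict, Mapping, Optional, Tuple

# Simplified, hypothetical node key: (function, namespace, name).
Node = Tuple[str, str, str]


def overlay_type_sketch(
    nodes: Dict[Node, Dict[str, Any]],
    data: Mapping[str, Any],
    func: str,
    namespace: str,
    label: str,
    overwrite: bool = False,
    impute: Optional[float] = None,
) -> None:
    """Annotate only the nodes whose function and namespace both match."""
    for (node_func, node_namespace, name), annotations in nodes.items():
        if node_func != func or node_namespace != namespace:
            continue  # the data says nothing about this kind of node
        if label in annotations and not overwrite:
            continue  # keep the existing annotation
        # Names missing from the table receive the imputed value.
        annotations[label] = data.get(name, impute)


nodes = {
    ("RNA", "HGNC", "AKT1"): {},
    ("RNA", "HGNC", "TP53"): {},
    ("Protein", "HGNC", "AKT1"): {},
}
# The table only has bare gene symbols, as in the example above.
overlay_type_sketch(nodes, {"AKT1": 1.2}, "RNA", "HGNC", label="logFC", impute=0.0)
print(nodes)
```

Here only the two RNA nodes are annotated: AKT1 takes its value from the table, TP53 takes the imputed value, and the protein node is left unchanged.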

pybel_tools.integration.load_differential_gene_expression(path, gene_symbol_column='Gene.symbol', logfc_column='logFC', aggregator=None)

Load and pre-process differential gene expression data.

Parameters
  • path (str) – The path to the CSV

  • gene_symbol_column (str) – The header of the gene symbol column in the data frame

  • logfc_column (str) – The header of the log-fold-change column in the data frame

  • aggregator (Optional[Callable[[List[float]], float]]) – A function that aggregates a list of differential gene expression values for the same gene symbol into a single value. Defaults to numpy.median(). Alternatives include numpy.mean(), numpy.average(), numpy.min(), and numpy.max()

Return type

Mapping[str, float]

Returns

A dictionary of {gene symbol: log fold change}
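The aggregation step matters when a gene symbol appears on more than one row of the table. The following is a rough standard-library sketch of that idea, not the library's implementation: load_dge_sketch is a hypothetical name, it reads from an open text handle rather than a path, it uses statistics.median in place of numpy.median(), and skipping rows with an empty gene symbol is an assumption.

```python
import csv
import io
import statistics
from collections import defaultdict
from typing import Callable, Dict, List, Optional, TextIO


def load_dge_sketch(
    handle: TextIO,
    gene_symbol_column: str = "Gene.symbol",
    logfc_column: str = "logFC",
    aggregator: Optional[Callable[[List[float]], float]] = None,
) -> Dict[str, float]:
    """Collapse repeated gene symbols into one log fold change each."""
    if aggregator is None:
        aggregator = statistics.median  # stand-in for numpy.median()
    values: Dict[str, List[float]] = defaultdict(list)
    for row in csv.DictReader(handle):
        symbol = row[gene_symbol_column]
        if not symbol:
            continue  # assumption: rows without a gene symbol are dropped
        values[symbol].append(float(row[logfc_column]))
    return {symbol: aggregator(fcs) for symbol, fcs in values.items()}


# AKT1 appears twice; the last row has no gene symbol.
csv_text = "Gene.symbol,logFC\nAKT1,1.0\nAKT1,3.0\nTP53,-2.0\n,0.5\n"
result = load_dge_sketch(io.StringIO(csv_text))
print(result)

# A different aggregator keeps the largest value per gene instead.
print(load_dge_sketch(io.StringIO(csv_text), aggregator=max))
```

The resulting dictionary has the {gene symbol: log fold change} shape described above, so it could serve as the data argument of an overlay function.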