Summary¶

These scripts are designed to assist in the analysis of errors within BEL documents and provide some suggestions for fixes.

pybel_tools.summary.count_relations(graph)[source]¶

Return a histogram over all relationships in a graph.

Parameters:	graph (pybel.BELGraph) – A BEL graph
Returns:	A Counter from {relation type: frequency}
Return type:	collections.Counter

pybel_tools.summary.get_edge_relations(graph)[source]¶

Builds a dictionary of {node pair: set of edge types}

Parameters:	graph (pybel.BELGraph) – A BEL graph
Returns:	A dictionary of {(node, node): set of edge types}
Return type:	dict[tuple[tuple,tuple],set[str]]

pybel_tools.summary.count_unique_relations(graph)[source]¶

Returns a histogram of the different types of relations present in a graph.

Note: this operation only counts each type of edge once for each pair of nodes

Parameters:	graph (pybel.BELGraph) – A BEL graph
Returns:	Counter from {relation type: frequency}
Return type:	collections.Counter
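
As a rough illustration of how the two histograms differ, the sketch below uses a plain list of (source, target, relation) tuples as a stand-in for the edges of a BEL graph. The edge list and variable names are assumptions made for illustration; the code does not call pybel_tools.

```python
from collections import Counter, defaultdict

# Hypothetical stand-in for the edges of a BEL graph: (source, target, relation)
edges = [
    ("A", "B", "increases"),
    ("A", "B", "increases"),  # the same statement from a second citation
    ("A", "B", "decreases"),
    ("B", "C", "association"),
]

# Analogue of count_relations: every edge contributes to the histogram
all_relations = Counter(relation for _, _, relation in edges)

# Analogue of get_edge_relations: {(source, target): set of relations}
edge_relations = defaultdict(set)
for source, target, relation in edges:
    edge_relations[source, target].add(relation)

# Analogue of count_unique_relations: each relation counts once per node pair
unique_relations = Counter(
    relation
    for relations in edge_relations.values()
    for relation in relations
)

print(all_relations["increases"], unique_relations["increases"])
```

Here the duplicated "increases" edge between A and B is counted twice in the first histogram but only once in the second.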

pybel_tools.summary.count_annotations(graph)[source]¶

Counts how many times each annotation is used in the graph

Parameters:	graph (pybel.BELGraph) – A BEL graph
Returns:	A Counter from {annotation key: frequency}
Return type:	collections.Counter

pybel_tools.summary.get_annotations(graph)[source]¶

Gets the set of annotations used in the graph

Parameters:	graph (pybel.BELGraph) – A BEL graph
Returns:	A set of annotation keys
Return type:	set[str]

pybel_tools.summary.get_annotations_containing_keyword(graph, keyword)[source]¶

Gets annotation/value pairs whose values contain the search string as a substring

Parameters:	graph (pybel.BELGraph) – A BEL graph
	keyword (str) – Search for annotations whose values have this as a substring
Return type:	list[dict[str,str]]

pybel_tools.summary.count_annotation_values(graph, annotation)[source]¶

Counts the number of edges on which each value of the given annotation appears in a graph

Parameters:	graph (pybel.BELGraph) – A BEL graph
	annotation (str) – The annotation to count
Returns:	A Counter from {annotation value: frequency}
Return type:	collections.Counter
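
The counting idea can be sketched without a graph at all. The example below assumes each edge carries a simple {annotation key: value} dictionary; that layout and the helper name count_values are hypothetical stand-ins, not the PyBEL edge data format.

```python
from collections import Counter

# Hypothetical stand-in: one {annotation key: value} dict per edge
edge_annotations = [
    {"Species": "9606", "Subgraph": "Tau protein subgraph"},
    {"Species": "9606"},
    {"Species": "10090", "Subgraph": "Tau protein subgraph"},
    {},  # an edge without any annotations
]


def count_values(edge_annotations, annotation):
    """Count on how many edges each value of ``annotation`` appears."""
    return Counter(
        annotations[annotation]
        for annotations in edge_annotations
        if annotation in annotations
    )


species_counts = count_values(edge_annotations, "Species")
```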

pybel_tools.summary.count_annotation_values_filtered(graph, annotation, source_filter=None, target_filter=None)[source]¶

Counts the number of edges on which each value of the given annotation appears in a graph, considering only edges whose source and target nodes pass the given filters

See pybel_tools.utils.keep_node() for a basic filter.

Parameters:	graph (pybel.BELGraph) – A BEL graph
	annotation (str) – The annotation to count
	source_filter (types.FunctionType) – A predicate (graph, node) -> bool for keeping source nodes
	target_filter (types.FunctionType) – A predicate (graph, node) -> bool for keeping target nodes
Returns:	A Counter from {annotation value: frequency}
Return type:	collections.Counter

pybel_tools.summary.pair_is_consistent(graph, u, v)[source]¶

Return if the edges between the given nodes are consistent, meaning they all have the same relation.

Parameters:	graph (pybel.BELGraph) – A BEL graph
	u (tuple) – The source BEL node
	v (tuple) – The target BEL node
Returns:	If the edges aren’t consistent, return false, otherwise return the relation type
Return type:	bool or str
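
The "bool or str" return convention can be sketched as follows. The helper name and its input (a list of the relations found on the edges between one pair of nodes) are assumptions for illustration, not the library's signature.

```python
def relation_if_consistent(relations):
    """Return the shared relation if all edges agree, otherwise False."""
    unique = set(relations)
    if len(unique) != 1:
        return False
    return next(iter(unique))


same = relation_if_consistent(["increases", "increases"])
mixed = relation_if_consistent(["increases", "decreases"])
```

Because the result is either False or a non-empty string, callers can test it for truthiness and still recover the relation when the pair is consistent.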

pybel_tools.summary.get_consistent_edges(graph)[source]¶

Yields pairs of (source node, target node) for which all of their edges have the same type of relation.

Parameters:	graph (pybel.BELGraph) – A BEL graph
Returns:	An iterator over (source, target) node pairs whose edges all have the same relation
Return type:	iter[tuple]

pybel_tools.summary.pair_has_contradiction(graph, u, v)[source]¶

Checks if a pair of nodes has any contradictions in their causal relationships.

Parameters:	graph (pybel.BELGraph) – A BEL graph
	u (tuple) – The source BEL node
	v (tuple) – The target BEL node
Returns:	Do the edges between these nodes have a contradiction?
Return type:	bool

pybel_tools.summary.get_contradictory_pairs(graph)[source]¶

Iterates over contradictory node pairs in the graph based on their causal relationships

Parameters:	graph (pybel.BELGraph) – A BEL graph
Returns:	An iterator over (source, target) node pairs that have contradictory causal edges
Return type:	iter

pybel_tools.summary.count_pathologies(graph)[source]¶

Returns a counter of all of the mentions of pathologies in a network

Parameters:	graph (pybel.BELGraph) – A BEL graph
Return type:	collections.Counter

pybel_tools.summary.relation_set_has_contradictions(relations)[source]¶

Return if the set of relations contains a contradiction.

Parameters:	relations (set[str]) – A set of relations
Return type:	bool
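
One way to think about a contradiction is that the relations between a pair of nodes point in more than one causal direction. The sketch below encodes that reading; the relation groupings and the helper name are assumptions for illustration, and the library's own constants and rules may differ.

```python
# Assumed groupings for illustration; the library's own constants may differ
INCREASING = {"increases", "directlyIncreases"}
DECREASING = {"decreases", "directlyDecreases"}
NO_CHANGE = {"causesNoChange"}


def has_contradiction(relations):
    """Return True if the set of relations spans more than one causal direction."""
    directions = [
        bool(relations & INCREASING),
        bool(relations & DECREASING),
        bool(relations & NO_CHANGE),
    ]
    return sum(directions) > 1


conflicting = has_contradiction({"increases", "decreases"})
agreeing = has_contradiction({"increases", "directlyIncreases"})
```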

pybel_tools.summary.get_unused_annotations(graph)[source]¶

Gets the set of all annotations that are defined in a graph, but are never used.

Parameters:	graph (pybel.BELGraph) – A BEL graph
Returns:	A set of annotations
Return type:	set[str]

pybel_tools.summary.get_unused_list_annotation_values(graph)[source]¶

Gets all of the unused values for list annotations

Parameters:	graph (pybel.BELGraph) – A BEL graph
Returns:	A dictionary of {str annotation: set of str values that aren’t used}
Return type:	dict[str, set[str]]
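
Conceptually this is a per-annotation set difference between the values that were defined and the values that were used. The sketch below assumes both are available as plain dictionaries of sets, which is a hypothetical stand-in for what is read from the graph; whether fully used annotations appear in the result with an empty set is also an assumption.

```python
# Hypothetical stand-ins for the defined and used list annotation values
defined = {
    "Species": {"9606", "10090", "10116"},
    "Confidence": {"High", "Low"},
}
used = {
    "Species": {"9606"},
    "Confidence": {"High", "Low"},
}

unused = {}
for annotation, values in defined.items():
    leftover = values - used.get(annotation, set())
    if leftover:  # this sketch omits annotations whose values are all used
        unused[annotation] = leftover
```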

pybel_tools.summary.count_error_types(graph)[source]¶

Counts the occurrence of each type of error in a graph

Parameters:	graph (pybel.BELGraph) – A BEL graph
Returns:	A Counter of {error type: frequency}
Return type:	collections.Counter

pybel_tools.summary.count_naked_names(graph)[source]¶

Counts the frequency of each naked name (names without namespaces)

Parameters:	graph (pybel.BELGraph) – A BEL graph
Returns:	A Counter from {name: frequency}
Return type:	collections.Counter

pybel_tools.summary.get_naked_names(graph)[source]¶

Gets the set of naked names in the graph

Parameters:	graph (pybel.BELGraph) – A BEL graph
Return type:	set[str]

pybel_tools.summary.get_incorrect_names_by_namespace(graph, namespace)[source]¶

Returns the set of all incorrect names from the given namespace in the graph

Parameters:	graph (pybel.BELGraph) – A BEL graph
	namespace (str) – The namespace to filter by
Returns:	The set of all incorrect names from the given namespace in the graph
Return type:	set[str]

pybel_tools.summary.get_incorrect_names(graph)[source]¶

Returns a dictionary of {namespace: set of incorrect names} for all namespaces in the graph

Parameters:	graph (pybel.BELGraph) – A BEL graph
Returns:	A dictionary of {namespace: set of incorrect names from that namespace}
Return type:	dict[str,set[str]]

pybel_tools.summary.get_undefined_namespaces(graph)[source]¶

Gets all namespaces that aren’t actually defined

Parameters:	graph (pybel.BELGraph) – A BEL graph
Returns:	The set of all undefined namespaces
Return type:	set[str]

pybel_tools.summary.get_undefined_namespace_names(graph, namespace)[source]¶

Gets the names from a namespace that wasn’t actually defined

Parameters:	graph (pybel.BELGraph) – A BEL graph
	namespace (str) – The namespace to filter by
Returns:	The set of all names from the undefined namespace
Return type:	set[str]

pybel_tools.summary.calculate_incorrect_name_dict(graph)[source]¶

Groups all of the incorrect identifiers in a dict of {namespace: list of erroneous names}

Parameters:	graph (pybel.BELGraph) – A BEL graph
Returns:	A dictionary of {namespace: list of erroneous names}
Return type:	dict[str, list[str]]

pybel_tools.summary.calculate_error_by_annotation(graph, annotation)[source]¶

Groups the graph by a given annotation and builds lists of errors for each

Parameters:	graph (pybel.BELGraph) – A BEL graph
	annotation (str) – The annotation to group errors by
Returns:	A dictionary of {annotation value: list of errors}
Return type:	dict[str, list[str]]

pybel_tools.summary.group_errors(graph)[source]¶

Groups the errors together for analysis of the most frequent error

Parameters:	graph (pybel.BELGraph) – A BEL graph
Returns:	A dictionary of {error string: list of line numbers}
Return type:	dict[str, list[int]]
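
The grouping can be sketched with a defaultdict. The example assumes the errors are available as (line number, error string) pairs, which is a simplified, hypothetical stand-in for the warnings stored on a graph.

```python
from collections import defaultdict

# Hypothetical stand-in for the warnings of a graph: (line number, error string)
errors = [
    (12, "Missing namespace: XYZ"),
    (40, "Naked name: tau"),
    (57, "Missing namespace: XYZ"),
]

grouped = defaultdict(list)
for line_number, message in errors:
    grouped[message].append(line_number)

# Sort by how often each error occurs, most frequent first
by_frequency = sorted(grouped.items(), key=lambda item: len(item[1]), reverse=True)
```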

pybel_tools.summary.get_names_including_errors(graph)[source]¶

For each namespace, takes the names from the graph and the erroneous names from the same namespace and returns them together as a unioned set

Parameters:	graph (pybel.BELGraph) – A BEL graph
Returns:	A dictionary of {namespace: set of all correct and incorrect names from that namespace}
Return type:	dict[str,set[str]]

pybel_tools.summary.get_names_including_errors_by_namespace(graph, namespace)[source]¶

Takes the names from the graph in a given namespace (pybel.struct.summary.get_names_by_namespace()) and the erroneous names from the same namespace (get_incorrect_names_by_namespace()) and returns them together as a unioned set

Parameters:	graph (pybel.BELGraph) – A BEL graph
	namespace (str) – The namespace to filter by
Returns:	The set of all correct and incorrect names from the given namespace in the graph
Return type:	set[str]

pybel_tools.summary.get_undefined_annotations(graph)[source]¶

Gets all annotations that aren’t actually defined

Parameters:	graph (pybel.BELGraph) – A BEL graph
Returns:	The set of all undefined annotations
Return type:	set[str]

pybel_tools.summary.get_namespaces_with_incorrect_names(graph)[source]¶

Returns the set of all namespaces with incorrect names in the graph

Parameters:	graph (pybel.BELGraph) – A BEL graph
Return type:	set[str]

pybel_tools.summary.get_most_common_errors(graph, number=20)[source]¶

Gets the most common errors in a graph

Parameters:	graph (pybel.BELGraph) – A BEL graph
	number (int) – The number of most common errors to return. Defaults to 20
Return type:	collections.Counter

pybel_tools.summary.plot_summary_axes(graph, lax, rax, logx=True)[source]¶

Plots your graph summary statistics on the given axes.

Afterwards, you should run plt.tight_layout(), and you must run plt.show() to view the plot.

Shows:

1. Count of nodes, grouped by function type
2. Count of edges, grouped by relation type

Parameters:	graph (pybel.BELGraph) – A BEL graph
	lax – An axis object from matplotlib
	rax – An axis object from matplotlib

Example usage:

>>> import matplotlib.pyplot as plt
>>> from pybel import from_pickle
>>> from pybel_tools.summary import plot_summary_axes
>>> graph = from_pickle('~/dev/bms/aetionomy/parkinsons.gpickle')
>>> fig, axes = plt.subplots(1, 2, figsize=(10, 4))
>>> plot_summary_axes(graph, axes[0], axes[1])
>>> plt.tight_layout()
>>> plt.show()

pybel_tools.summary.plot_summary(graph, plt, logx=True, **kwargs)[source]¶

Plots your graph summary statistics. This function is a thin wrapper around plot_summary_axes(). It automatically takes care of building figures given matplotlib’s pyplot module as an argument. Afterwards, you need to run plt.show().

plt is given as an argument to avoid needing matplotlib as a dependency for this function

Shows:

Count of nodes, grouped by function type
Count of edges, grouped by relation type

Parameters:	graph (pybel.BELGraph) – A BEL graph
	plt – Give `matplotlib.pyplot` to this parameter
	kwargs – keyword arguments to give to `plt.subplots()`

Example usage:

>>> import matplotlib.pyplot as plt
>>> from pybel import from_pickle
>>> from pybel_tools.summary import plot_summary
>>> graph = from_pickle('~/dev/bms/aetionomy/parkinsons.gpickle')
>>> plot_summary(graph, plt, figsize=(10, 4))
>>> plt.show()

pybel_tools.summary.info_list(graph)[source]¶

Returns useful information about the graph as a list of tuples

Parameters:	graph (pybel.BELGraph) – A BEL graph
Return type:	list

pybel_tools.summary.info_str(graph)[source]¶

Puts useful information about the graph in a string

Parameters:	graph (pybel.BELGraph) – A BEL graph
Return type:	str

pybel_tools.summary.info_json(graph)[source]¶

Returns useful information about the graph as a dictionary

Parameters:	graph (pybel.BELGraph) – A BEL graph
Return type:	dict

pybel_tools.summary.print_summary(graph, file=None)[source]¶

Prints useful information about the graph

Parameters:	graph (pybel.BELGraph) – A BEL graph
	file – A writeable file or file-like object. If None, defaults to `sys.stdout`

pybel_tools.summary.is_causal_relation(data)[source]¶

Check if the given relation is causal.

Parameters:	data (dict) – The PyBEL edge data dictionary
Return type:	bool

pybel_tools.summary.get_causal_out_edges(graph, nbunch)[source]¶

Gets the causal out-edges from the given node or nodes

Parameters:	graph (pybel.BELGraph) – A BEL graph
	nbunch (tuple) – A BEL node or iterable of BEL nodes
Returns:	A set of (source, target) pairs where the source is the given node
Return type:	set[tuple]

pybel_tools.summary.get_causal_in_edges(graph, node)[source]¶

Gets the in-edges to the given node that are causal

Parameters:	graph (pybel.BELGraph) – A BEL graph
	node (tuple) – A BEL node
Returns:	A set of (source, target) pairs where the target is the given node
Return type:	set

pybel_tools.summary.is_causal_source(graph, node)[source]¶

Return true if the node is a causal source.

Doesn’t have any causal in edge(s)
Does have causal out edge(s)

Parameters:	graph (pybel.BELGraph) – A BEL graph
	node (tuple) – A BEL node
Returns:	If the node is a causal source
Return type:	bool

pybel_tools.summary.is_causal_central(graph, node)[source]¶

Return true if the node is neither a causal sink nor a causal source.

Does have causal in edge(s)
Does have causal out edge(s)

Parameters:	graph (pybel.BELGraph) – A BEL graph
	node (tuple) – A BEL node
Returns:	If the node is neither a causal sink nor a causal source
Return type:	bool

pybel_tools.summary.is_causal_sink(graph, node)[source]¶

Return true if the node is a causal sink.

Does have causal in edge(s)
Doesn’t have any causal out edge(s)

Parameters:	graph (pybel.BELGraph) – A BEL graph
	node (tuple) – A BEL node
Returns:	If the node is a causal sink
Return type:	bool
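
The three predicates above partition nodes by whether they have causal in-edges and causal out-edges. The sketch below illustrates that classification on a plain edge list; the set of causal relations, the edge list, and the helper name are assumptions for illustration and do not use the pybel_tools API.

```python
# Assumed set of causal relations for illustration
CAUSAL = {"increases", "directlyIncreases", "decreases", "directlyDecreases"}

# Hypothetical stand-in for the edges of a BEL graph: (source, target, relation)
edges = [
    ("drug", "kinase", "decreases"),
    ("kinase", "substrate", "directlyIncreases"),
    ("substrate", "phenotype", "increases"),
    ("kinase", "partner", "association"),  # not causal
]


def causal_role(edges, node):
    """Classify a node as a causal 'source', 'central', or 'sink', else None."""
    has_in = any(t == node and r in CAUSAL for _, t, r in edges)
    has_out = any(s == node and r in CAUSAL for s, _, r in edges)
    if has_out and not has_in:
        return "source"
    if has_in and has_out:
        return "central"
    if has_in and not has_out:
        return "sink"
    return None  # the node takes part in no causal edges


roles = {n: causal_role(edges, n) for n in ("drug", "kinase", "phenotype", "partner")}
```

Note that "partner" is none of the three, since its only edge is a non-causal association.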

pybel_tools.summary.get_causal_source_nodes(graph, function)[source]¶

Returns a set of all nodes that have a causal in-degree of 0, which likely means that they are external perturbagens and are not known to have any causal origin from within the biological system.

These nodes are useful to identify because they generally don’t provide any mechanistic insight.

Parameters:	graph (pybel.BELGraph) – A BEL graph
	function (str) – The BEL function to filter by
Returns:	A set of source nodes
Return type:	set[tuple]

pybel_tools.summary.get_causal_central_nodes(graph, function)[source]¶

Returns a set of all nodes that have both an in-degree > 0 and out-degree > 0. This means that they are an integral part of a pathway, since they are both produced and consumed.

Parameters:	graph (pybel.BELGraph) – A BEL graph
	function (str) – The BEL function to filter by
Returns:	A set of central ABUNDANCE nodes
Return type:	set

pybel_tools.summary.get_causal_sink_nodes(graph, function)[source]¶

Returns a set of all ABUNDANCE nodes that have a causal out-degree of 0, which likely means that the knowledge assembly is incomplete, or there is a curation error.

Parameters:	graph (pybel.BELGraph) – A BEL graph
	function (str) – The BEL function to filter by
Returns:	A set of sink ABUNDANCE nodes
Return type:	set[tuple]

pybel_tools.summary.get_degradations(graph)[source]¶

Gets all nodes that are degraded

Parameters:	graph (pybel.BELGraph) – A BEL graph
Returns:	A set of nodes that are degraded
Return type:	set[tuple]

pybel_tools.summary.get_activities(graph)[source]¶

Gets all nodes that have molecular activities

Parameters:	graph (pybel.BELGraph) – A BEL graph
Returns:	A set of nodes that have molecular activities
Return type:	set[tuple]

pybel_tools.summary.get_translocated(graph)[source]¶

Gets all nodes that are translocated

Parameters:	graph (pybel.BELGraph) – A BEL graph
Returns:	A set of nodes that are translocated
Return type:	set[tuple]

pybel_tools.summary.count_top_centrality(graph, number=30)[source]¶

Gets top centrality dictionary

Parameters:	graph – A BEL graph
	number (int) – The number of top entries to return. Defaults to 30
Return type:	dict[tuple,int]

pybel_tools.summary.get_modifications_count(graph)[source]¶

Gets a modifications count dictionary

Parameters:	graph (pybel.BELGraph) – A BEL graph
Return type:	dict[str,int]

pybel_tools.summary.count_subgraph_sizes(graph, annotation='Subgraph')[source]¶

Counts the number of nodes in each subgraph induced by an annotation

Parameters:	graph (pybel.BELGraph) – A BEL graph
	annotation (str) – The annotation to group by and compare. Defaults to ‘Subgraph’
Returns:	A dictionary from {annotation value: number of nodes}
Return type:	dict[str, int]

pybel_tools.summary.calculate_subgraph_edge_overlap(graph, annotation='Subgraph')[source]¶

Builds a dataframe to show the overlap between different subgraphs

Options:

1. Total number of edges overlap (intersection)
2. Percentage overlap (Tanimoto similarity)

Parameters:	graph (pybel.BELGraph) – A BEL graph
	annotation (str) – The annotation to group by and compare. Defaults to ‘Subgraph’
Returns:	Four dictionaries: {subgraph: set of edges}, {(subgraph 1, subgraph 2): set of intersecting edges}, {(subgraph 1, subgraph 2): set of unioned edges}, and {(subgraph 1, subgraph 2): Tanimoto similarity}
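
The Tanimoto similarity of two edge sets is the size of their intersection divided by the size of their union. The sketch below computes it pairwise over hypothetical subgraphs; the subgraph names, edge sets, and helper name are illustrative assumptions rather than library output.

```python
from itertools import combinations

# Hypothetical stand-in: {subgraph name: set of (source, target) edges}
subgraph_edges = {
    "Apoptosis": {("A", "B"), ("B", "C"), ("C", "D")},
    "Inflammation": {("B", "C"), ("C", "D"), ("D", "E")},
    "Autophagy": {("X", "Y")},
}


def tanimoto(a, b):
    """Return |a & b| / |a | b|, or 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


similarity = {
    (x, y): tanimoto(subgraph_edges[x], subgraph_edges[y])
    for x, y in combinations(sorted(subgraph_edges), 2)
}
```

In this toy data, Apoptosis and Inflammation share two of four distinct edges, giving a similarity of 0.5, while Autophagy overlaps with neither.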

pybel_tools.summary.summarize_subgraph_edge_overlap(graph, annotation='Subgraph')[source]¶

Returns a similarity matrix between all subgraphs (or other given annotation)

Parameters:	graph (pybel.BELGraph) – A BEL graph
	annotation (str) – The annotation to group by and compare. Defaults to `"Subgraph"`
Returns:	A similarity matrix in a dict of dicts
Return type:	dict

pybel_tools.summary.rank_subgraph_by_node_filter(graph, node_filters, annotation='Subgraph', reverse=True)[source]¶

Ranks subgraphs by which have the most nodes matching a given filter

Parameters:	graph (pybel.BELGraph) – A BEL graph
	node_filters (types.FunctionType or iter[types.FunctionType]) – A predicate or list of predicates (graph, node) -> bool
	annotation (str) – The annotation to group by. Defaults to ‘Subgraph’
	reverse (bool) –
Return type:	list

A use case for this function would be to identify which subgraphs contain the most differentially expressed genes.

>>> from pybel import from_pickle
>>> from pybel.constants import *
>>> from pybel_tools.integration import overlay_type_data
>>> from pybel_tools.summary import rank_subgraph_by_node_filter
>>> import pandas as pd
>>> graph = from_pickle('~/dev/bms/aetionomy/alzheimers.gpickle')
>>> df = pd.read_csv('~/dev/bananas/data/alzheimers_dgxp.csv', usecols=['Gene', 'log2fc'])
>>> data = {gene: log2fc for _, gene, log2fc in df.itertuples()}
>>> overlay_type_data(graph, data, 'log2fc', GENE, 'HGNC', impute=0)
>>> results = rank_subgraph_by_node_filter(graph, lambda g, n: 1.3 < abs(g.node[n]['log2fc']))
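
The ranking idea itself can be sketched without a graph. The example below assumes nodes are already grouped by subgraph and that a fold change is known per node; the gene names, values, and helper name are hypothetical, and the (name, count) result format is an assumption rather than the library's documented output.

```python
# Hypothetical stand-ins: nodes grouped by subgraph and a per-node log2 fold change
subgraph_nodes = {
    "Apoptosis": ["CASP3", "BAX", "BCL2"],
    "Inflammation": ["IL6", "TNF"],
    "Autophagy": ["BECN1"],
}
log2fc = {"CASP3": 2.1, "BAX": -1.8, "BCL2": 0.2, "IL6": 1.5, "TNF": 0.4, "BECN1": 0.1}


def rank_by_filter(subgraph_nodes, node_filter, reverse=True):
    """Rank subgraphs by the number of their nodes that pass ``node_filter``."""
    counts = {
        name: sum(1 for node in nodes if node_filter(node))
        for name, nodes in subgraph_nodes.items()
    }
    return sorted(counts.items(), key=lambda item: item[1], reverse=reverse)


# Keep nodes whose absolute fold change exceeds 1.3, as in the use case above
ranking = rank_by_filter(subgraph_nodes, lambda node: 1.3 < abs(log2fc.get(node, 0)))
```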

pybel_tools.summary.summarize_subgraph_node_overlap(graph, node_filters=None, annotation='Subgraph')[source]¶

Calculates the Tanimoto similarity between subgraphs based on the nodes passing the given filter

Provides an alternate view on subgraph similarity, from a more node-centric view

pybel_tools.summary.count_pmids(graph)[source]¶

Counts the frequency of PubMed documents in a graph

Parameters:	graph (pybel.BELGraph) – A BEL graph
Returns:	A Counter from {(pmid, name): frequency}
Return type:	collections.Counter

pybel_tools.summary.get_pmid_by_keyword(keyword, graph=None, pubmed_identifiers=None)[source]¶

Gets the set of PubMed identifiers beginning with the given keyword string

Parameters:	graph (pybel.BELGraph) – A BEL graph
	keyword (str) – The beginning of a PubMed identifier
	pubmed_identifiers (set[str]) – A set of pre-cached PubMed identifiers
Returns:	A set of PubMed identifiers starting with the given string
Return type:	set[str]
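
The prefix match described here amounts to a filter over a set of identifier strings. The sketch below uses made-up identifiers and a hypothetical helper name to show the idea when a pre-cached set is supplied.

```python
# Made-up identifiers standing in for a pre-cached set of PubMed identifiers
pubmed_identifiers = {"12345678", "12349999", "22334455"}


def pmids_starting_with(keyword, pubmed_identifiers):
    """Return the identifiers that begin with ``keyword``."""
    return {pmid for pmid in pubmed_identifiers if pmid.startswith(keyword)}


matches = pmids_starting_with("1234", pubmed_identifiers)
```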

pybel_tools.summary.count_citations(graph, **annotations)[source]¶

Counts the citations in a graph based on a given filter

Parameters:	graph (pybel.BELGraph) – A BEL graph
	annotations (dict) – The annotation filters to use
Returns:	A counter from {(citation type, citation reference): frequency}
Return type:	collections.Counter

pybel_tools.summary.count_citations_by_annotation(graph, annotation)[source]¶

Groups the citation counters by subgraphs induced by the annotation

Parameters:	graph (pybel.BELGraph) – A BEL graph
	annotation (str) – The annotation to use to group the graph
Returns:	A dictionary of Counters {subgraph name: Counter from {citation: frequency}}

pybel_tools.summary.count_authors(graph)[source]¶

Counts the contributions of each author to the given graph

Parameters:	graph (pybel.BELGraph) – A BEL graph
Returns:	A Counter from {author name: frequency}
Return type:	collections.Counter

pybel_tools.summary.count_unique_authors(graph)[source]¶

Counts all authors in the given graph

Parameters:	graph (pybel.BELGraph) – A BEL graph
Returns:	The number of unique authors whose publications contributed to the graph
Return type:	int

pybel_tools.summary.count_author_publications(graph)[source]¶

Counts the number of publications of each author to the given graph

Parameters:	graph (pybel.BELGraph) – A BEL graph
Returns:	A Counter from {author name: frequency}
Return type:	collections.Counter

pybel_tools.summary.count_unique_citations(graph)[source]¶

Returns the number of unique citations

Parameters:	graph (pybel.BELGraph) – A BEL graph
Returns:	The number of unique citations in the graph.
Return type:	int

pybel_tools.summary.get_authors(graph)[source]¶

Gets the set of all authors in the given graph

Parameters:	graph (pybel.BELGraph) – A BEL graph
Returns:	A set of author names
Return type:	set[str]

pybel_tools.summary.get_authors_by_keyword(keyword, graph=None, authors=None)[source]¶

Gets authors for whom the search term is a substring

Parameters:	graph (pybel.BELGraph) – A BEL graph
	keyword (str) – The keyword to search the author strings for
	authors (set[str]) – An optional set of pre-cached authors calculated from the graph
Returns:	A set of authors with the keyword as a substring
Return type:	set[str]

pybel_tools.summary.count_authors_by_annotation(graph, annotation='Subgraph')[source]¶

Groups the author counters by subgraphs induced by the annotation

Parameters:	graph (pybel.BELGraph) – A BEL graph
	annotation (str) – The annotation to use to group the graph
Returns:	A dictionary of Counters {subgraph name: Counter from {author: frequency}}
Return type:	dict

pybel_tools.summary.get_evidences_by_pmid(graph, pmids)[source]¶

Gets a dictionary from the given PubMed identifiers to the sets of all evidence strings associated with each in the graph

Parameters:	graph (pybel.BELGraph) – A BEL graph
	pmids (str or iter[str]) – An iterable of PubMed identifiers, as strings. Is consumed and converted to a set.
Returns:	A dictionary of {pmid: set of all evidence strings}
Return type:	dict

pybel_tools.summary.count_citation_years(graph)[source]¶

Counts the number of citations in each year

Parameters:	graph (pybel.BELGraph) – A BEL graph
Returns:	A Counter of {int year: int frequency}
Return type:	collections.Counter

pybel_tools.summary.create_timeline(year_counter)[source]¶

Completes the Counter timeline

Parameters:	year_counter (Counter) – counter dict for each year
Returns:	complete timeline
Return type:	list[tuple[int,int]]
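
One plausible reading of "completing" a timeline is filling in the years that have no citations with a count of zero, so that the result is a gap-free list of (year, count) pairs. The sketch below implements that reading only; the helper name and the choice of the year range are assumptions, and the library's actual behavior may differ.

```python
from collections import Counter


def complete_timeline(year_counter):
    """Return (year, count) pairs for every year from the first to the last seen."""
    if not year_counter:
        return []
    first, last = min(year_counter), max(year_counter)
    return [(year, year_counter.get(year, 0)) for year in range(first, last + 1)]


timeline = complete_timeline(Counter({2001: 2, 2004: 1}))
```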

pybel_tools.summary.get_citation_years(graph)[source]¶

Creates a citation timeline counter

Parameters:	graph (pybel.BELGraph) – A BEL graph
Return type:	list[tuple[int,int]]