Document Utilities

Creating Definition Documents

Utilities for serializing to BEL namespace and BEL annotation files

pybel_tools.definition_utils.make_namespace_header(name, keyword, domain, query_url=None, description=None, species=None, version=None, created=None)

Makes the [Namespace] section of a BELNS file

Parameters:
  • name (str) – The namespace name
  • keyword (str) – Preferred BEL Keyword, maximum length of 8
  • domain (str) – One of: pybel.constants.NAMESPACE_DOMAIN_BIOPROCESS, pybel.constants.NAMESPACE_DOMAIN_CHEMICAL, pybel.constants.NAMESPACE_DOMAIN_GENE, or pybel.constants.NAMESPACE_DOMAIN_OTHER
  • query_url (str) – HTTP URL to query for details on namespace values (must be valid URL)
  • description (str) – Namespace description
  • species (str) – Comma-separated list of species taxonomy IDs
  • version (str) – Namespace version. Defaults to current date in YYYYMMDD format.
  • created (str) – Namespace public timestamp, ISO 8601 datetime
Returns:

An iterator over the lines of the [Namespace] section of a BELNS file

Return type:

iter[str]
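
The two date formats named in the parameters above can be sketched as follows. This assumes the default version is simply today's date rendered as YYYYMMDD and that a creation timestamp is a plain ISO 8601 datetime string. The helper names default_version and default_created are hypothetical and are not part of pybel_tools.

```python
from datetime import datetime


def default_version(now=None):
    """Return a date stamp in YYYYMMDD format (hypothetical helper)."""
    now = now if now is not None else datetime.now()
    return now.strftime('%Y%m%d')


def default_created(now=None):
    """Return an ISO 8601 datetime string (hypothetical helper)."""
    now = now if now is not None else datetime.now()
    return now.isoformat()


# A fixed moment keeps the example reproducible.
moment = datetime(2017, 3, 5, 12, 30, 0)
print(default_version(moment))
print(default_created(moment))
```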

pybel_tools.definition_utils.make_author_header(name=None, contact=None, copyright_str=None)

Makes the [Author] section of a BELNS file

Parameters:
  • name (str) – Namespace’s authors
  • contact (str) – Namespace author’s contact info/email address
  • copyright_str (str) – Namespace’s copyright/license information. Defaults to Other/Proprietary
Returns:

An iterable over the lines of the [Author] section of a BELNS file

Return type:

iter[str]

pybel_tools.definition_utils.make_citation_header(name, description=None, url=None, version=None, date=None)

Makes the [Citation] section of a BEL config file.

Parameters:
  • name (str) – Citation name
  • description (str) – Citation description
  • url (str) – URL to more citation information
  • version (str) – Citation version
  • date (str) – Citation publish timestamp, ISO 8601 Date
Returns:

An iterable over the lines of the [Citation] section of a BEL config file

Return type:

iter[str]

pybel_tools.definition_utils.make_properties_header(case_sensitive=True, delimiter='|', cacheable=True)

Makes the [Processing] section of a BEL config file.

Parameters:
  • case_sensitive (bool) – Should this config file be interpreted as case-sensitive?
  • delimiter (str) – The delimiter between names and labels in this config file
  • cacheable (bool) – Should this config file be cached?
Returns:

An iterable over the lines of the [Processing] section of a BEL config file

Return type:

iter[str]
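
The shape of such a section can be sketched with a stand-in generator. This is not the library's implementation: the key names CaseSensitiveFlag, DelimiterString and CacheableFlag, and the yes/no encoding of the booleans, are assumptions based on the common BELNS layout and should be checked against a real file.

```python
def sketch_processing_section(case_sensitive=True, delimiter='|', cacheable=True):
    """Yield the lines of a [Processing] section (illustrative stand-in).

    The key names and the yes/no encoding are assumptions.
    """
    yield '[Processing]'
    yield 'CaseSensitiveFlag={}'.format('yes' if case_sensitive else 'no')
    yield 'DelimiterString={}'.format(delimiter)
    yield 'CacheableFlag={}'.format('yes' if cacheable else 'no')


# Because the result is an iterator of lines, it can be consumed lazily.
for line in sketch_processing_section(cacheable=False):
    print(line)
```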

pybel_tools.definition_utils.write_namespace(namespace_name, namespace_keyword, namespace_domain, author_name, citation_name, values, namespace_description=None, namespace_species=None, namespace_version=None, namespace_query_url=None, namespace_created=None, author_contact=None, author_copyright=None, citation_description=None, citation_url=None, citation_version=None, citation_date=None, case_sensitive=True, delimiter='|', cacheable=True, functions=None, file=None, value_prefix='', sort_key=None)

Writes a BEL namespace (BELNS) to a file

Parameters:
  • namespace_name (str) – The namespace name
  • namespace_keyword (str) – Preferred BEL Keyword, maximum length of 8
  • namespace_domain (str) – One of: pybel.constants.NAMESPACE_DOMAIN_BIOPROCESS, pybel.constants.NAMESPACE_DOMAIN_CHEMICAL, pybel.constants.NAMESPACE_DOMAIN_GENE, or pybel.constants.NAMESPACE_DOMAIN_OTHER
  • author_name (str) – The namespace’s authors
  • citation_name (str) – The name of the citation
  • values (iter[str]) – An iterable of values (strings)
  • namespace_query_url (str) – HTTP URL to query for details on namespace values (must be valid URL)
  • namespace_description (str) – Namespace description
  • namespace_species (str) – Comma-separated list of species taxonomy IDs
  • namespace_version (str) – Namespace version
  • namespace_created (str) – Namespace public timestamp, ISO 8601 datetime
  • author_contact (str) – Namespace author’s contact info/email address
  • author_copyright (str) – Namespace’s copyright/license information
  • citation_description (str) – Citation description
  • citation_url (str) – URL to more citation information
  • citation_version (str) – Citation version
  • citation_date (str) – Citation publish timestamp, ISO 8601 Date
  • case_sensitive (bool) – Should this config file be interpreted as case-sensitive?
  • delimiter (str) – The delimiter between names and labels in this config file
  • cacheable (bool) – Should this config file be cached?
  • functions (str) – The encoding for the elements in this namespace. See pybel.constants.belns_encodings
  • file (file) – A writable file or file-like
  • value_prefix (str) – a prefix for each name
  • sort_key – A function to sort the values with sorted(). Give False to not sort
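
How the values, delimiter, functions, value_prefix and sort_key parameters might interact can be sketched as below. This is a hedged illustration, not the code of write_namespace: the helper iter_value_lines is hypothetical, the line layout (prefix, value, delimiter, encoding) is an assumption, and the encoding strings used here are only examples.

```python
def iter_value_lines(values, delimiter='|', functions='A', value_prefix='', sort_key=None):
    """Yield one line per namespace value (hypothetical helper).

    Passing sort_key=False keeps the input order; any other value is
    handed to sorted() as its key function (None means natural order).
    """
    if sort_key is not False:
        values = sorted(values, key=sort_key)
    for value in values:
        yield '{}{}{}{}'.format(value_prefix, value, delimiter, functions)


# Natural ordering, then case-insensitive ordering, then no sorting at all.
print(list(iter_value_lines(['b', 'a'], functions='GRP')))
print(list(iter_value_lines(['B', 'a'], sort_key=str.lower)))
print(list(iter_value_lines(['b', 'a'], sort_key=False)))
```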
pybel_tools.definition_utils.make_annotation_header(keyword, description=None, usage=None, version=None, created=None)

Makes the [AnnotationDefinition] section of a BELANNO file

Parameters:
  • keyword (str) – Preferred BEL Keyword, maximum length of 8
  • description (str) – A description of this annotation
  • usage (str) – How to use this annotation
  • version (str) – Annotation version. Defaults to date in YYYYMMDD format.
  • created (str) – Annotation public timestamp, ISO 8601 datetime
Returns:

An iterator over the lines for the [AnnotationDefinition] section

Return type:

iter[str]

pybel_tools.definition_utils.write_annotation(keyword, values, citation_name, description=None, usage=None, version=None, created=None, author_name=None, author_copyright=None, author_contact=None, case_sensitive=True, delimiter='|', cacheable=True, file=None, value_prefix='')

Writes a BEL annotation (BELANNO) to a file

Parameters:
  • keyword (str) – The annotation keyword
  • values (dict[str, str]) – A dictionary of {name: label}
  • citation_name (str) – The citation name
  • description (str) – A description of this annotation
  • usage (str) – How to use this annotation
  • version (str) – The version of this annotation. Defaults to date in YYYYMMDD format.
  • created (str) – The annotation’s public timestamp, ISO 8601 datetime
  • author_name (str) – The author’s name
  • author_copyright (str) – The copyright information for this annotation. Defaults to Other/Proprietary
  • author_contact (str) – The contact information for the author of this annotation.
  • case_sensitive (bool) – Should this config file be interpreted as case-sensitive?
  • delimiter (str) – The delimiter between names and labels in this config file
  • cacheable (bool) – Should this config file be cached?
  • file (file) – A writable file or file-like
  • value_prefix (str) – An optional prefix for all values
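
The {name: label} dictionary and the delimiter described above suggest one line per entry. A minimal sketch, assuming each line is the prefixed name, the delimiter, then the label. The helper iter_annotation_lines is hypothetical, and sorting by name is a choice made here for reproducible output, not documented behavior.

```python
def iter_annotation_lines(values, delimiter='|', value_prefix=''):
    """Yield 'name<delimiter>label' lines from a {name: label} dict.

    Hypothetical helper; sorted by name only to keep output stable.
    """
    for name, label in sorted(values.items()):
        yield '{}{}{}{}'.format(value_prefix, name, delimiter, label)


tissues = {'liver': 'Liver tissue', 'brain': 'Brain tissue'}
for line in iter_annotation_lines(tissues):
    print(line)
```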
pybel_tools.definition_utils.export_namespace(graph, namespace, directory=None, cacheable=False)

Exports all names and missing names from the given namespace to its own BEL namespace file in the given directory.

This can be useful during quick and dirty curation, where planned namespace building is not a priority.

Parameters:
  • graph (pybel.BELGraph) – A BEL graph
  • namespace (str) – The namespace to process
  • directory (str) – The path to the directory where to output the namespace. Defaults to the current working directory returned by os.getcwd()
  • cacheable (bool) – Should the namespace be cacheable? Defaults to False because, in general, this operation will probably be used for evil, and users won’t want to reload their entire cache after each iteration of curation.
pybel_tools.definition_utils.export_namespaces(graph, namespaces, directory=None, cacheable=False)

Thinly wraps export_namespace() for an iterable of namespaces.

Parameters:
  • graph (pybel.BELGraph) – A BEL graph
  • namespaces (iter[str]) – An iterable of strings for the namespaces to process
  • directory (str) – The path to the directory where to output the namespaces. Defaults to the current working directory returned by os.getcwd()
  • cacheable (bool) – Should the namespaces be cacheable? Defaults to False because, in general, this operation will probably be used for evil, and users won’t want to reload their entire cache after each iteration of curation.
pybel_tools.definition_utils.get_merged_namespace_names(locations, check_keywords=True)

Loads many namespaces and combines their names.

Parameters:
  • locations (iter[str]) – An iterable of URLs or file paths pointing to BEL namespaces.
  • check_keywords (bool) – Should all the keywords be the same? Defaults to True
Returns:

A dictionary of {names: labels}

Return type:

dict[str, str]

Example Usage

>>> from pybel_tools.definition_utils import export_namespace, get_merged_namespace_names, write_namespace
>>> graph = ...
>>> original_ns_url = ...
>>> export_namespace(graph, 'MBS')  # Outputs in current directory to MBS.belns
>>> value_dict = get_merged_namespace_names([original_ns_url, 'MBS.belns'])
>>> with open('merged_namespace.belns', 'w') as f:
...     write_namespace('MyBrokenNamespace', 'MBS', 'Other', 'Charles Hoyt', 'PyBEL Citation', value_dict, file=f)
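
The merging step itself, including the check_keywords option, can be sketched without any file or network access. This sketch assumes each loaded namespace is represented as a dictionary with a 'Keyword' entry and a 'Values' dictionary. That in-memory layout, the helper name merge_names and the ValueError are all illustrative choices, not the library's API.

```python
def merge_names(resources, check_keywords=True):
    """Combine the {name: label} values of several namespaces.

    Hypothetical helper working on in-memory dicts of the assumed form
    {'Keyword': str, 'Values': dict}. Later resources win on conflicts.
    """
    resources = list(resources)
    if check_keywords:
        keywords = {resource['Keyword'] for resource in resources}
        if len(keywords) > 1:
            raise ValueError('Keywords differ: {}'.format(sorted(keywords)))
    merged = {}
    for resource in resources:
        merged.update(resource['Values'])
    return merged


original = {'Keyword': 'MBS', 'Values': {'alpha': 'A'}}
exported = {'Keyword': 'MBS', 'Values': {'beta': 'A'}}
print(merge_names([original, exported]))
```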
pybel_tools.definition_utils.merge_namespaces(input_locations, output_path, namespace_name, namespace_keyword, namespace_domain, author_name, citation_name, namespace_description=None, namespace_species=None, namespace_version=None, namespace_query_url=None, namespace_created=None, author_contact=None, author_copyright=None, citation_description=None, citation_url=None, citation_version=None, citation_date=None, case_sensitive=True, delimiter='|', cacheable=True, functions=None, value_prefix='', sort_key=None, check_keywords=True)

Merges namespaces from multiple locations to one.

Parameters:
  • input_locations (iter[str]) – An iterable of URLs or file paths pointing to BEL namespaces.
  • output_path (str) – The path to the file to write the merged namespace
  • namespace_name (str) – The namespace name
  • namespace_keyword (str) – Preferred BEL Keyword, maximum length of 8
  • namespace_domain (str) – One of: pybel.constants.NAMESPACE_DOMAIN_BIOPROCESS, pybel.constants.NAMESPACE_DOMAIN_CHEMICAL, pybel.constants.NAMESPACE_DOMAIN_GENE, or pybel.constants.NAMESPACE_DOMAIN_OTHER
  • author_name (str) – The namespace’s authors
  • citation_name (str) – The name of the citation
  • namespace_query_url (str) – HTTP URL to query for details on namespace values (must be valid URL)
  • namespace_description (str) – Namespace description
  • namespace_species (str) – Comma-separated list of species taxonomy IDs
  • namespace_version (str) – Namespace version
  • namespace_created (str) – Namespace public timestamp, ISO 8601 datetime
  • author_contact (str) – Namespace author’s contact info/email address
  • author_copyright (str) – Namespace’s copyright/license information
  • citation_description (str) – Citation description
  • citation_url (str) – URL to more citation information
  • citation_version (str) – Citation version
  • citation_date (str) – Citation publish timestamp, ISO 8601 Date
  • case_sensitive (bool) – Should this config file be interpreted as case-sensitive?
  • delimiter (str) – The delimiter between names and labels in this config file
  • cacheable (bool) – Should this config file be cached?
  • functions (iterable of characters) – The encoding for the elements in this namespace
  • value_prefix (str) – a prefix for each name
  • sort_key – A function to sort the values with sorted()
  • check_keywords (bool) – Should all the keywords be the same? Defaults to True
pybel_tools.definition_utils.check_cacheable(config)

Checks the config returned by pybel.utils.get_bel_resource() to determine if the resource should be cached.

If this cannot be determined, returns False

Parameters: config (dict) – A configuration dictionary representing a BEL resource
Returns: Whether this resource should be cached
Return type: bool
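
A minimal sketch of such a check, including the fall-back to False when the answer cannot be determined. It assumes the configuration dictionary nests a 'CacheableFlag' entry under a 'Processing' section and that truthy flags are spelled yes or true. Both the key names and the accepted spellings are assumptions, and sketch_check_cacheable is a hypothetical stand-in.

```python
def sketch_check_cacheable(config):
    """Return True only if the config clearly marks the resource cacheable.

    Missing sections or keys fall through to False. Key names and the
    accepted flag spellings are assumptions for illustration.
    """
    flag = config.get('Processing', {}).get('CacheableFlag')
    return flag in {'yes', 'Yes', 'True', 'true'}


print(sketch_check_cacheable({'Processing': {'CacheableFlag': 'yes'}}))
print(sketch_check_cacheable({}))
```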

Creating Knowledge Documents

Utilities to merge multiple BEL documents on the same topic

pybel_tools.document_utils.merge(output_path, input_paths, merged_name=None, merged_contact=None, merged_description=None, merged_author=None)

Merges multiple BEL documents and maintains author information in comments

Steps:

  1. Load all documents
  2. Identify document metadata information and namespace/annotation definitions
  3. Append “- {author email}” to all statement groups and add comments with document information
Parameters:
  • output_path (str) – Path to file to write merged BEL document
  • input_paths (iter[str]) – List of paths to input BEL document files
  • merged_name (str) – Name for the combined document
  • merged_contact (str) – Contact information for the combined document
  • merged_description (str) – Description of the combined document
  • merged_author (str) – Author of the combined document
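
Step 3 above can be sketched for a single line of a BEL document. This assumes statement groups are opened with a line of the form SET STATEMENT_GROUP = "name". The regular expression and the helper tag_statement_group are illustrative only, not the code used by merge.

```python
import re

# Matches a line such as: SET STATEMENT_GROUP = "Group 1"
STATEMENT_GROUP_RE = re.compile(r'^SET\s+STATEMENT_GROUP\s*=\s*"(?P<name>.*)"\s*$')


def tag_statement_group(line, contact):
    """Append ' - {contact}' to a statement group name (hypothetical helper).

    Lines that do not open a statement group are returned unchanged.
    """
    match = STATEMENT_GROUP_RE.match(line)
    if match is None:
        return line
    return 'SET STATEMENT_GROUP = "{} - {}"'.format(match.group('name'), contact)


print(tag_statement_group('SET STATEMENT_GROUP = "Group 1"', 'author@example.com'))
```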
pybel_tools.document_utils.make_document_metadata(name, contact, description, authors, version=None, copyright=None, licenses=None)

Builds a list of lines for the document metadata section of a BEL document

Parameters:
  • name (str) – The unique name for this BEL document
  • contact (str) – The email address of the maintainer
  • description (str) – A description of the contents of this document
  • authors (str) – The authors of this document
  • version (str) – The version. Defaults to date in YYYYMMDD format.
  • copyright (str) – Copyright information about this document
  • licenses (str) – The license applied to this document
Returns:

An iterator over the lines for the document metadata section

Return type:

iter[str]
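
A rough picture of what such metadata lines look like, assuming the BEL script convention of SET DOCUMENT statements. The field names, their order and the stand-in sketch_document_metadata are assumptions for illustration. Only the YYYYMMDD default for the version comes from the parameter description above.

```python
from datetime import datetime


def sketch_document_metadata(name, contact, description, authors,
                             version=None, copyright=None, licenses=None):
    """Yield SET DOCUMENT lines (illustrative stand-in, assumed field names)."""
    if version is None:
        version = datetime.now().strftime('%Y%m%d')
    yield 'SET DOCUMENT Name = "{}"'.format(name)
    yield 'SET DOCUMENT Version = "{}"'.format(version)
    yield 'SET DOCUMENT Description = "{}"'.format(description)
    yield 'SET DOCUMENT ContactInfo = "{}"'.format(contact)
    yield 'SET DOCUMENT Authors = "{}"'.format(authors)
    # Optional fields are only emitted when a value was supplied.
    if copyright is not None:
        yield 'SET DOCUMENT Copyright = "{}"'.format(copyright)
    if licenses is not None:
        yield 'SET DOCUMENT Licenses = "{}"'.format(licenses)


for line in sketch_document_metadata('Example', 'a@example.com', 'Demo', 'A. Author', version='1.0.0'):
    print(line)
```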

pybel_tools.document_utils.make_document_namespaces(namespace_dict=None, namespace_patterns=None)

Builds a list of lines for the namespace definitions

Parameters:
  • namespace_dict (dict[str,str]) – A dictionary of {str name: str URL} of namespaces
  • namespace_patterns (dict[str,str]) – A dictionary of {str name: str regex}
Returns:

An iterator over the lines for the namespace definitions

Return type:

iter[str]
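
The two dictionaries map naturally onto two kinds of definition lines. The sketch below assumes the DEFINE NAMESPACE ... AS URL and DEFINE NAMESPACE ... AS PATTERN statement forms. The helper sketch_namespace_definitions, the sorted ordering and the example URL are hypothetical.

```python
def sketch_namespace_definitions(namespace_dict=None, namespace_patterns=None):
    """Yield DEFINE NAMESPACE lines for URL- and regex-backed namespaces.

    Illustrative stand-in; entries are sorted by name for stable output.
    """
    for name, url in sorted((namespace_dict or {}).items()):
        yield 'DEFINE NAMESPACE {} AS URL "{}"'.format(name, url)
    for name, pattern in sorted((namespace_patterns or {}).items()):
        yield 'DEFINE NAMESPACE {} AS PATTERN "{}"'.format(name, pattern)


definitions = sketch_namespace_definitions(
    namespace_dict={'HGNC': 'https://example.com/hgnc.belns'},  # placeholder URL
    namespace_patterns={'DBSNP': 'rs[0-9]+'},
)
for line in definitions:
    print(line)
```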

pybel_tools.document_utils.make_document_annotations(annotation_dict=None, annotation_patterns=None)

Builds a list of lines for the annotation definitions

Parameters:
  • annotation_dict (dict[str,str]) – A dictionary of {str name: str URL} of annotations
  • annotation_patterns (dict[str,str]) – A dictionary of {str name: str regex}
Returns:

An iterator over the lines for the annotation definitions

Return type:

iter[str]

pybel_tools.document_utils.make_pubmed_abstract_group(pmids)

Builds a skeleton for the citations’ statements

Parameters: pmids (iter[str] or iter[int]) – A list of PubMed identifiers
Returns: An iterator over the lines of the citation section
Return type: iter[str]
pybel_tools.document_utils.get_entrez_gene_data(entrez_ids)

Gets gene info from Entrez

pybel_tools.document_utils.make_pubmed_gene_group(entrez_ids)

Builds a skeleton for gene summaries

Parameters: entrez_ids (list[str]) – A list of Entrez IDs to query the PubMed service
Returns: An iterator over statement lines for NCBI Entrez gene summaries
Return type: iter[str]
pybel_tools.document_utils.write_boilerplate(document_name, contact, description, authors, version=None, copyright=None, licenses=None, namespace_dict=None, namespace_patterns=None, annotations_dict=None, annotations_patterns=None, pmids=None, entrez_ids=None, file=None)

Writes a boilerplate BEL document with standard document metadata and definitions. Optionally, if a list of PubMed identifiers is given, the citations and abstracts will be written for each.

Parameters:
  • document_name (str) – The unique name for this BEL document
  • contact (str) – The email address of the maintainer
  • description (str) – A description of the contents of this document
  • authors (str) – The authors of this document
  • version (str) – The version. Defaults to current date in format YYYYMMDD.
  • copyright (str) – Copyright information about this document
  • licenses (str) – The license applied to this document
  • namespace_dict (dict[str, str]) – An optional dictionary of {str name: str URL} of namespaces
  • namespace_patterns (dict[str, str]) – An optional dictionary of {str name: str regex} namespaces
  • annotations_dict (dict[str, str]) – An optional dictionary of {str name: str URL} of annotations
  • annotations_patterns (dict[str, str]) – An optional dictionary of {str name: str regex} annotations
  • pmids (iter[str] or iter[int]) – A list of PubMed identifiers to auto-populate with citation and abstract
  • entrez_ids (iter[str] or iter[int]) – A list of Entrez identifiers to auto-populate the gene summary as evidence
  • file (file) – A writable file or file-like. If None, defaults to sys.stdout
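
The file parameter follows a common pattern: any writable file-like object is accepted, and None falls back to standard output. A small sketch of that pattern, using a hypothetical write_lines helper and an in-memory buffer in place of a real file:

```python
import io
import sys


def write_lines(lines, file=None):
    """Print each line to file, or to sys.stdout when file is None.

    Hypothetical helper illustrating the file-like default.
    """
    out = sys.stdout if file is None else file
    for line in lines:
        print(line, file=out)


# io.StringIO is a writable file-like object, handy for capturing output.
buffer = io.StringIO()
write_lines(['SET DOCUMENT Name = "Example"', ''], file=buffer)
captured = buffer.getvalue()
```

The same call with file omitted would print the lines to the console instead.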