Document Utilities

Creating Definition Documents

pybel_tools.definition_utils.get_merged_namespace_names(locations, check_keywords=True)

Loads many namespaces and combines their names.

Parameters:
  • locations (iter[str]) – An iterable of URLs or file paths pointing to BEL namespaces.
  • check_keywords (bool) – Should all the keywords be the same? Defaults to True
Returns:

A dictionary of {names: labels}

Return type:

dict[str, str]

Example Usage

>>> from pybel.resources.definitions import write_namespace
>>> from pybel_tools.definition_utils import export_namespace, get_merged_namespace_names
>>> graph = ...
>>> original_ns_url = ...
>>> export_namespace(graph, 'MBS') # Outputs in current directory to MBS.belns
>>> value_dict = get_merged_namespace_names([original_ns_url, 'MBS.belns'])
>>> with open('merged_namespace.belns', 'w') as f:
...     write_namespace('MyBrokenNamespace', 'MBS', 'Other', 'Charles Hoyt', 'PyBEL Citation', value_dict, file=f)
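
The combining step can be pictured with plain dictionaries. The sketch below is not the library's implementation: merge_name_dicts is a hypothetical helper, each namespace is assumed to be already parsed into a (keyword, {name: label}) pair, and check_keywords is assumed to mean "raise an error when the keywords differ".

```python
def merge_name_dicts(parsed_namespaces, check_keywords=True):
    """Combine (keyword, {name: label}) pairs into one dictionary.

    Hypothetical helper for illustration only; not part of pybel_tools.
    """
    parsed_namespaces = list(parsed_namespaces)
    keywords = {keyword for keyword, _ in parsed_namespaces}
    if check_keywords and len(keywords) > 1:
        raise ValueError('namespaces use different keywords: {}'.format(sorted(keywords)))

    merged = {}
    for _, names in parsed_namespaces:
        merged.update(names)  # on a name clash, the later namespace wins
    return merged


# Made-up names and labels standing in for two parsed namespace files
original = ('MBS', {'alpha': 'GRP', 'beta': 'A'})
exported = ('MBS', {'beta': 'B', 'gamma': 'O'})
merged = merge_name_dicts([original, exported])
```

Letting the later namespace win on a clash is a choice made for this sketch; the source does not say how conflicting labels are resolved.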

pybel_tools.definition_utils.merge_namespaces(input_locations, output_path, namespace_name, namespace_keyword, namespace_domain, author_name, citation_name, namespace_description=None, namespace_species=None, namespace_version=None, namespace_query_url=None, namespace_created=None, author_contact=None, author_copyright=None, citation_description=None, citation_url=None, citation_version=None, citation_date=None, case_sensitive=True, delimiter='|', cacheable=True, functions=None, value_prefix='', sort_key=None, check_keywords=True)

Merges namespaces from multiple locations to one.

Parameters:
  • input_locations (iter[str]) – An iterable of URLs or file paths pointing to BEL namespaces.
  • output_path (str) – The path to the file to write the merged namespace
  • namespace_name (str) – The namespace name
  • namespace_keyword (str) – Preferred BEL Keyword, maximum length of 8
  • namespace_domain (str) – One of: pybel.constants.NAMESPACE_DOMAIN_BIOPROCESS, pybel.constants.NAMESPACE_DOMAIN_CHEMICAL, pybel.constants.NAMESPACE_DOMAIN_GENE, or pybel.constants.NAMESPACE_DOMAIN_OTHER
  • author_name (str) – The namespace’s authors
  • citation_name (str) – The name of the citation
  • namespace_query_url (str) – HTTP URL to query for details on namespace values (must be valid URL)
  • namespace_description (str) – Namespace description
  • namespace_species (str) – Comma-separated list of species taxonomy IDs
  • namespace_version (str) – Namespace version
  • namespace_created (str) – Namespace public timestamp, ISO 8601 datetime
  • author_contact (str) – Namespace author’s contact info/email address
  • author_copyright (str) – Namespace’s copyright/license information
  • citation_description (str) – Citation description
  • citation_url (str) – URL to more citation information
  • citation_version (str) – Citation version
  • citation_date (str) – Citation publish timestamp, ISO 8601 Date
  • case_sensitive (bool) – Should this config file be interpreted as case-sensitive?
  • delimiter (str) – The delimiter between names and labels in this config file
  • cacheable (bool) – Should this config file be cached?
  • functions (iterable of characters) – The encoding for the elements in this namespace
  • value_prefix (str) – A prefix for each name
  • sort_key – A key function passed to sorted() to order the values
  • check_keywords (bool) – Should all the keywords be the same? Defaults to True
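
A call with only the required arguments might look like the sketch below. The keyword names come from the signature above; the URL, file names and metadata values are placeholders, and 'Other' is assumed to be the value of pybel.constants.NAMESPACE_DOMAIN_OTHER. The import is deferred into a function so that the snippet loads even where pybel_tools is not installed.

```python
# Required arguments from the signature; all values are placeholders.
merge_kwargs = dict(
    input_locations=['https://example.com/original.belns', 'MBS.belns'],
    output_path='merged_namespace.belns',
    namespace_name='MyBrokenNamespace',
    namespace_keyword='MBS',    # documented maximum length is 8
    namespace_domain='Other',   # assumed value of NAMESPACE_DOMAIN_OTHER
    author_name='Charles Hoyt',
    citation_name='PyBEL Citation',
    check_keywords=True,
)


def run_merge(kwargs):
    """Call merge_namespaces with the given keyword arguments."""
    # Imported lazily so this sketch can be loaded without pybel_tools
    from pybel_tools.definition_utils import merge_namespaces
    merge_namespaces(**kwargs)
```

Calling run_merge(merge_kwargs) would then write the merged namespace to output_path, provided the input locations exist.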

pybel_tools.definition_utils.export_namespace(graph, namespace, directory=None, cacheable=False)

Exports all names and missing names from the given namespace to its own BEL Namespace files in the given directory.

Could be useful during quick and dirty curation, where planned namespace building is not a priority.

Parameters:
  • graph (pybel.BELGraph) – A BEL graph
  • namespace (str) – The namespace to process
  • directory (str) – The path to the directory in which to output the namespace. Defaults to the current working directory returned by os.getcwd()
  • cacheable (bool) – Should the namespace be cacheable? Defaults to False because, in general, this operation will probably be used for evil, and users won’t want to reload their entire cache after each iteration of curation.

pybel_tools.definition_utils.export_namespaces(graph, namespaces, directory=None, cacheable=False)

Thinly wraps export_namespace() for an iterable of namespaces.

Parameters:
  • graph (pybel.BELGraph) – A BEL graph
  • namespaces (iter[str]) – An iterable of strings for the namespaces to process
  • directory (str) – The path to the directory in which to output the namespaces. Defaults to the current working directory returned by os.getcwd()
  • cacheable (bool) – Should the namespaces be cacheable? Defaults to False because, in general, this operation will probably be used for evil, and users won’t want to reload their entire cache after each iteration of curation.
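
The "thin wrapper" pattern described here amounts to a loop that forwards the shared options. In the sketch below, export_each and record_call are hypothetical names; the per-namespace function is passed in as export_one so the example runs without pybel_tools, and resolving the default directory inside the wrapper is an assumption rather than documented behaviour.

```python
import os


def export_each(graph, namespaces, export_one, directory=None, cacheable=False):
    """Call export_one once per namespace, forwarding the shared options.

    export_one stands in for export_namespace in this sketch.
    """
    if directory is None:
        directory = os.getcwd()  # documented default location
    for namespace in namespaces:
        export_one(graph, namespace, directory=directory, cacheable=cacheable)


calls = []


def record_call(graph, namespace, directory=None, cacheable=False):
    """Stand-in exporter that only records how it was called."""
    calls.append((namespace, directory, cacheable))


# No real graph is needed because the stand-in never inspects it
export_each(None, ['MBS', 'ADO'], record_call)
```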

Creating Knowledge Documents

pybel_tools.document_utils.get_entrez_gene_data(entrez_ids)

Gets gene info from Entrez

pybel_tools.document_utils.write_boilerplate(name, version=None, description=None, authors=None, contact=None, copyright=None, licenses=None, disclaimer=None, namespace_url=None, namespace_owl=None, namespace_patterns=None, annotation_url=None, annotation_owl=None, annotation_patterns=None, annotation_list=None, pmids=None, entrez_ids=None, file=None)

Writes a boilerplate BEL document with standard document metadata and definitions.

Parameters:
  • name (str) – The unique name for this BEL document
  • contact (str) – The email address of the maintainer
  • description (str) – A description of the contents of this document
  • authors (str) – The authors of this document
  • version (str) – The version. Defaults to current date in format YYYYMMDD.
  • copyright (str) – Copyright information about this document
  • licenses (str) – The license applied to this document
  • disclaimer (str) – The disclaimer for this document
  • namespace_url (dict[str,str]) – An optional dictionary of {str name: str URL} of namespaces
  • namespace_owl (dict[str,str]) – An optional dictionary of {str name: str URL} of OWL namespaces
  • namespace_patterns (dict[str,str]) – An optional dictionary of {str name: str regex} of regex namespaces
  • annotation_url (dict[str,str]) – An optional dictionary of {str name: str URL} of annotations
  • annotation_owl (dict[str,str]) – An optional dictionary of {str name: str URL} of OWL annotations
  • annotation_patterns (dict[str,str]) – An optional dictionary of {str name: str regex} of regex annotations
  • annotation_list (dict[str,set[str]]) – An optional dictionary of {str name: set of names} of list annotations
  • pmids (iter[str] or iter[int]) – A list of PubMed identifiers to auto-populate with citation and abstract
  • entrez_ids (iter[str] or iter[int]) – A list of Entrez identifiers to auto-populate the gene summary as evidence
  • file (file) – A writable file or file-like. If None, defaults to sys.stdout
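
To give a feel for how these parameters map onto a BEL header, here is a much-reduced sketch. write_minimal_header is a hypothetical stand-in, not the library function: it covers only name, version, namespace_url and annotation_list, and the exact lines and ordering that write_boilerplate emits may differ. The two documented defaults (version as the current date in YYYYMMDD, file as sys.stdout) are mirrored.

```python
import io
import sys
import time


def write_minimal_header(name, version=None, namespace_url=None, annotation_list=None, file=None):
    """Write a few BEL header lines; a simplified stand-in for write_boilerplate."""
    file = sys.stdout if file is None else file
    version = time.strftime('%Y%m%d') if version is None else version

    print('SET DOCUMENT Name = "{}"'.format(name), file=file)
    print('SET DOCUMENT Version = "{}"'.format(version), file=file)
    print(file=file)

    # {keyword: URL} entries become URL-backed namespace definitions
    for keyword, url in sorted((namespace_url or {}).items()):
        print('DEFINE NAMESPACE {} AS URL "{}"'.format(keyword, url), file=file)

    # {keyword: set of values} entries become list annotations
    for keyword, values in sorted((annotation_list or {}).items()):
        quoted = ', '.join('"{}"'.format(value) for value in sorted(values))
        print('DEFINE ANNOTATION {} AS LIST {{{}}}'.format(keyword, quoted), file=file)


buffer = io.StringIO()
write_minimal_header(
    'Example Document',
    version='20170101',
    namespace_url={'HGNC': 'https://example.com/hgnc.belns'},  # placeholder URL
    annotation_list={'Confidence': {'High', 'Low'}},
    file=buffer,
)
header = buffer.getvalue()
```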

pybel_tools.document_utils.lint_file(in_file, out_file=None)

Helps remove extraneous whitespace from the lines of a file

Parameters:
  • in_file (file) – A readable file or file-like
  • out_file (file) – A writable file or file-like
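
A minimal sketch of this kind of whitespace cleanup follows. strip_lines is a hypothetical name, "extraneous whitespace" is assumed to mean leading and trailing whitespace on each line, and falling back to sys.stdout when out_file is None is an assumption; the actual function may apply different rules.

```python
import io
import sys


def strip_lines(in_file, out_file=None):
    """Copy in_file to out_file with leading/trailing whitespace removed per line."""
    out_file = sys.stdout if out_file is None else out_file  # assumed default
    for line in in_file:
        print(line.strip(), file=out_file)


source = io.StringIO('SET DOCUMENT Name = "Example"   \n\t\n  p(HGNC:AKT1) -> p(HGNC:MTOR)\t\n')
cleaned = io.StringIO()
strip_lines(source, cleaned)
```

Whitespace-only lines are kept as empty lines here so that the line count of the document does not change.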

pybel_tools.document_utils.lint_directory(source, target)

Adds a linted version of each document in the source directory to the target directory

Parameters:
  • source (str) – Path to directory to lint
  • target (str) – Path to directory to output
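
The directory-level operation can be sketched as a loop over the files in source. lint_tree is a hypothetical name; it assumes that every regular file in source is a document, that target may need to be created, and that linting means stripping leading and trailing whitespace from each line. None of these details is confirmed by the source.

```python
import os
import tempfile


def lint_tree(source, target):
    """Write a whitespace-stripped copy of each regular file in source to target."""
    os.makedirs(target, exist_ok=True)
    for filename in sorted(os.listdir(source)):
        in_path = os.path.join(source, filename)
        if not os.path.isfile(in_path):
            continue  # skip subdirectories
        out_path = os.path.join(target, filename)
        with open(in_path) as in_file, open(out_path, 'w') as out_file:
            for line in in_file:
                out_file.write(line.strip() + '\n')


# Exercise the sketch in a throwaway directory
with tempfile.TemporaryDirectory() as workdir:
    source_dir = os.path.join(workdir, 'raw')
    target_dir = os.path.join(workdir, 'linted')
    os.makedirs(source_dir)
    with open(os.path.join(source_dir, 'example.bel'), 'w') as f:
        f.write('SET DOCUMENT Name = "Example"   \n')
    lint_tree(source_dir, target_dir)
    with open(os.path.join(target_dir, 'example.bel')) as f:
        linted_text = f.read()
```

The originals in source are left untouched, matching the description that a linted version is added to the target directory.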