Document Utilities

Creating Definition Documents

pybel_tools.definition_utils.get_merged_namespace_names(locations, check_keywords=True)

Loads many namespaces and combines their names.

Parameters:

locations (iter[str]) – An iterable of URLs or file paths pointing to BEL namespaces.
check_keywords (bool) – Should all the keywords be the same? Defaults to True
Returns:	A dictionary of {names: labels}
Return type:	dict[str, str]

Example Usage

>>> from pybel.resources.definitions import write_namespace
>>> from pybel_tools.definition_utils import export_namespace, get_merged_namespace_names
>>> graph = ...
>>> original_ns_url = ...
>>> export_namespace(graph, 'MBS') # Outputs in current directory to MBS.belns
>>> value_dict = get_merged_namespace_names([original_ns_url, 'MBS.belns'])
>>> with open('merged_namespace.belns', 'w') as f:
...     write_namespace('MyBrokenNamespace', 'MBS', 'Other', 'Charles Hoyt', 'PyBEL Citation', value_dict, file=f)
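
The merging step can be pictured as a dictionary union over the per-namespace {name: label} mappings, with an optional check that every source declares the same keyword. The following is a minimal sketch under that assumption, not the pybel_tools implementation; `merge_name_dicts` and its (keyword, values) input shape are hypothetical and stand in for namespaces that have already been loaded.

```python
def merge_name_dicts(namespaces, check_keywords=True):
    """Combine several (keyword, {name: label}) pairs into one dictionary.

    Hypothetical helper for illustration only; not part of pybel_tools.
    """
    namespaces = list(namespaces)
    keywords = {keyword for keyword, _ in namespaces}
    if check_keywords and len(keywords) > 1:
        raise ValueError('keywords differ: {}'.format(sorted(keywords)))

    merged = {}
    for _, values in namespaces:
        # When a name repeats, the label from the later source wins.
        merged.update(values)
    return merged


# Two already-loaded namespaces that share the keyword 'MBS'.
original = ('MBS', {'alpha': 'O', 'beta': 'O'})
exported = ('MBS', {'beta': 'O', 'gamma': 'O'})
value_dict = merge_name_dicts([original, exported])
```

Passing check_keywords=False in this sketch skips the keyword comparison, which may be what is wanted when deliberately folding differently named namespaces together.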

pybel_tools.definition_utils.merge_namespaces(input_locations, output_path, namespace_name, namespace_keyword, namespace_domain, author_name, citation_name, namespace_description=None, namespace_species=None, namespace_version=None, namespace_query_url=None, namespace_created=None, author_contact=None, author_copyright=None, citation_description=None, citation_url=None, citation_version=None, citation_date=None, case_sensitive=True, delimiter='|', cacheable=True, functions=None, value_prefix='', sort_key=None, check_keywords=True)

Merges namespaces from multiple locations to one.

Parameters:

input_locations (iter) – An iterable of URLs or file paths pointing to BEL namespaces.
output_path (str) – The path to the file to write the merged namespace
namespace_name (str) – The namespace name
namespace_keyword (str) – Preferred BEL Keyword, maximum length of 8
namespace_domain (str) – One of: pybel.constants.NAMESPACE_DOMAIN_BIOPROCESS, pybel.constants.NAMESPACE_DOMAIN_CHEMICAL, pybel.constants.NAMESPACE_DOMAIN_GENE, or pybel.constants.NAMESPACE_DOMAIN_OTHER
author_name (str) – The namespace’s authors
citation_name (str) – The name of the citation
namespace_query_url (str) – HTTP URL to query for details on namespace values (must be valid URL)
namespace_description (str) – Namespace description
namespace_species (str) – Comma-separated list of species taxonomy IDs
namespace_version (str) – Namespace version
namespace_created (str) – Namespace public timestamp, ISO 8601 datetime
author_contact (str) – Namespace author’s contact info/email address
author_copyright (str) – Namespace’s copyright/license information
citation_description (str) – Citation description
citation_url (str) – URL to more citation information
citation_version (str) – Citation version
citation_date (str) – Citation publish timestamp, ISO 8601 Date
case_sensitive (bool) – Should this config file be interpreted as case-sensitive?
delimiter (str) – The delimiter between names and labels in this config file
cacheable (bool) – Should this config file be cached?
functions (iterable of characters) – The encoding for the elements in this namespace
value_prefix (str) – A prefix for each name
sort_key – A function to sort the values with sorted()
check_keywords (bool) – Should all the keywords be the same? Defaults to True
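
To make the delimiter, value_prefix, and sort_key parameters concrete, here is a rough sketch of how the values section of a namespace file could be laid out from a {name: label} dictionary. `format_values` is a hypothetical helper written for this illustration, and the exact output format is an assumption rather than what merge_namespaces is confirmed to write.

```python
def format_values(values, delimiter='|', value_prefix='', sort_key=None):
    """Yield one 'prefix + name + delimiter + label' line per entry.

    Hypothetical helper for illustration only; not part of pybel_tools.
    """
    # sort_key is handed straight to sorted(), as the parameter list describes.
    for name in sorted(values, key=sort_key):
        yield '{}{}{}{}'.format(value_prefix, name, delimiter, values[name])


values = {'beta': 'O', 'Zeta': 'O', 'alpha': 'O'}

# Default ordering is plain string comparison, so uppercase names come first.
default_lines = list(format_values(values))

# A case-insensitive key and a prefix change both the order and the names.
folded_lines = list(format_values(values, value_prefix='X:', sort_key=str.lower))
```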

pybel_tools.definition_utils.export_namespace(graph, namespace, directory=None, cacheable=False)

Exports all names and missing names from the given namespace to its own BEL Namespace files in the given directory.

Could be useful during quick and dirty curation, where planned namespace building is not a priority.

Parameters:

graph (pybel.BELGraph) – A BEL graph
namespace (str) – The namespace to process
directory (str) – The path to the directory where to output the namespace. Defaults to the current working directory returned by os.getcwd()
cacheable (bool) – Should the namespace be cacheable? Defaults to False because, in general, this operation will probably be used for evil, and users won’t want to reload their entire cache after each iteration of curation.

pybel_tools.definition_utils.export_namespaces(graph, namespaces, directory=None, cacheable=False)

Thinly wraps export_namespace() for an iterable of namespaces.

Parameters:

graph (pybel.BELGraph) – A BEL graph
namespaces (iter[str]) – An iterable of strings for the namespaces to process
directory (str) – The path to the directory where to output the namespaces. Defaults to the current working directory returned by os.getcwd()
cacheable (bool) – Should the namespaces be cacheable? Defaults to False because, in general, this operation will probably be used for evil, and users won’t want to reload their entire cache after each iteration of curation.
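
The directory default and the one-file-per-namespace behaviour can be sketched without a graph. The helpers below are hypothetical and only compute where files would land; the `.belns` file name follows the MBS.belns example earlier in this section, and treating that as the general naming rule is an assumption.

```python
import os


def namespace_output_path(namespace, directory=None):
    """Return the path a namespace file would be written to.

    Hypothetical helper for illustration only; not part of pybel_tools.
    """
    # Mirror the documented default: fall back to the current working directory.
    directory = os.getcwd() if directory is None else directory
    return os.path.join(directory, '{}.belns'.format(namespace))


def namespace_output_paths(namespaces, directory=None):
    """Thin wrapper that applies namespace_output_path to each namespace."""
    return [namespace_output_path(namespace, directory) for namespace in namespaces]


planned_paths = namespace_output_paths(['MBS', 'OTHER'], directory='curation')
```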

Creating Knowledge Documents

pybel_tools.document_utils.write_boilerplate(name, version=None, description=None, authors=None, contact=None, copyright=None, licenses=None, disclaimer=None, namespace_url=None, namespace_patterns=None, annotation_url=None, annotation_patterns=None, annotation_list=None, pmids=None, entrez_ids=None, file=None)

Writes a boilerplate BEL document with standard document metadata and definitions.

Parameters:

name (str) – The unique name for this BEL document
contact (str) – The email address of the maintainer
description (str) – A description of the contents of this document
authors (str) – The authors of this document
version (str) – The version. Defaults to current date in format YYYYMMDD.
copyright (str) – Copyright information about this document
licenses (str) – The license applied to this document
disclaimer (str) – The disclaimer for this document
namespace_url (dict[str,str]) – An optional dictionary of {str name: str URL} of namespaces
namespace_patterns (dict[str,str]) – An optional dictionary of {str name: str regex} namespaces
annotation_url (dict[str,str]) – An optional dictionary of {str name: str URL} of annotations
annotation_patterns (dict[str,str]) – An optional dictionary of {str name: str regex} of regex annotations
annotation_list (dict[str,set[str]]) – An optional dictionary of {str name: set of names} of list annotations
pmids (iter[str] or iter[int]) – A list of PubMed identifiers to auto-populate with citation and abstract
entrez_ids (iter[str] or iter[int]) – A list of Entrez identifiers to auto-populate the gene summary as evidence
file (file) – A writable file or file-like. If None, defaults to sys.stdout
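
Two of the defaults above (version falling back to the current date as YYYYMMDD, and file falling back to sys.stdout) can be illustrated with a much smaller writer. This is a sketch under stated assumptions, not the write_boilerplate implementation: `write_header` is hypothetical, it covers only a few of the parameters, and the namespace URL is a placeholder.

```python
import io
import sys
import time


def write_header(name, version=None, description=None, namespace_url=None, file=None):
    """Print a minimal BEL document header.

    Hypothetical helper for illustration only; not part of pybel_tools.
    """
    # Resolve the documented defaults at call time.
    file = sys.stdout if file is None else file
    version = time.strftime('%Y%m%d') if version is None else version

    print('SET DOCUMENT Name = "{}"'.format(name), file=file)
    print('SET DOCUMENT Version = "{}"'.format(version), file=file)
    if description is not None:
        print('SET DOCUMENT Description = "{}"'.format(description), file=file)

    # One DEFINE line per {keyword: URL} entry, in a stable order.
    for keyword, url in sorted((namespace_url or {}).items()):
        print('DEFINE NAMESPACE {} AS URL "{}"'.format(keyword, url), file=file)


buffer = io.StringIO()
write_header(
    'Example Document',
    version='20170101',
    namespace_url={'HGNC': 'https://example.com/hgnc.belns'},  # placeholder URL
    file=buffer,
)
header_lines = buffer.getvalue().splitlines()
```

Writing to an in-memory buffer as above is also a convenient way to inspect the output before committing it to a file.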

pybel_tools.document_utils.lint_file(in_file, out_file=None)

Helps remove extraneous whitespace from the lines of a file.

Parameters:

in_file (file) – A readable file or file-like
out_file (file) – A writable file or file-like
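
As a rough picture of what such a whitespace pass might look like, the sketch below strips each line and writes it back out. It assumes that "extraneous" means leading and trailing whitespace and that a missing out_file falls back to sys.stdout; neither is confirmed by the description above, and `strip_extra_whitespace` is a hypothetical name.

```python
import io
import sys


def strip_extra_whitespace(in_file, out_file=None):
    """Write each line of in_file to out_file without surrounding whitespace.

    Hypothetical helper for illustration only; not part of pybel_tools.
    """
    # Assumption: default to standard output when no target is given.
    out_file = sys.stdout if out_file is None else out_file
    for line in in_file:
        print(line.strip(), file=out_file)


messy = io.StringIO('  SET DOCUMENT Name = "Example"   \n\tSET DOCUMENT Version = "1"\n')
cleaned = io.StringIO()
strip_extra_whitespace(messy, cleaned)
cleaned_text = cleaned.getvalue()
```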

pybel_tools.document_utils.lint_directory(source, target)

Adds a linted version of each document in the source directory to the target directory.

Parameters:

source (str) – Path to directory to lint
target (str) – Path to directory to output
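
The directory-level operation can be sketched as a loop that reads every regular file in the source directory and writes a cleaned copy under the same name in the target directory. `lint_tree` is a hypothetical stand-in; creating the target directory on demand, skipping subdirectories, and stripping both ends of each line are all assumptions of this sketch.

```python
import os
import tempfile


def lint_tree(source, target):
    """Write a whitespace-stripped copy of each file in source into target.

    Hypothetical helper for illustration only; not part of pybel_tools.
    """
    os.makedirs(target, exist_ok=True)
    for file_name in sorted(os.listdir(source)):
        in_path = os.path.join(source, file_name)
        if not os.path.isfile(in_path):
            continue  # only top-level regular files are handled here
        out_path = os.path.join(target, file_name)
        with open(in_path) as in_file, open(out_path, 'w') as out_file:
            for line in in_file:
                out_file.write(line.strip() + '\n')


# Demonstrate on a throwaway directory so nothing on disk is touched.
with tempfile.TemporaryDirectory() as root:
    source = os.path.join(root, 'raw')
    target = os.path.join(root, 'linted')
    os.makedirs(source)
    with open(os.path.join(source, 'example.bel'), 'w') as f:
        f.write('  SET DOCUMENT Name = "Example"  \n')
    lint_tree(source, target)
    with open(os.path.join(target, 'example.bel')) as f:
        linted_text = f.read()
```

Keeping the source directory untouched and writing to a separate target makes it easy to diff the two before replacing anything.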