Orthology Resolution Workflow

This module provides tools for downloading and structuring gene orthology data from HGNC, RGD, and MGI.

pybel_tools.orthology.FULL_RESOURCE = 'http://www.genenames.org/cgi-bin/download?col=gd_hgnc_id&col=gd_app_sym&col=gd_mgd_id&col=md_mgd_id&col=md_rgd_id&status=Approved&status_opt=2&where=&order_by=gd_app_sym_sort&format=text&limit=&submit=submit'

Annotations from HGNC. Columns: HGNC ID, HGNC Symbol, MGI Curated, MGI Dump, RGD Dump
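The col parameters in the query string determine which columns each row of the download contains. As a quick check, the standard library can list them in order. This is only a sketch: the helper name requested_columns is hypothetical and not part of pybel_tools.

```python
from urllib.parse import parse_qs, urlsplit

# Copied from pybel_tools.orthology.FULL_RESOURCE as documented above.
FULL_RESOURCE = (
    'http://www.genenames.org/cgi-bin/download?'
    'col=gd_hgnc_id&col=gd_app_sym&col=gd_mgd_id&col=md_mgd_id&col=md_rgd_id'
    '&status=Approved&status_opt=2&where=&order_by=gd_app_sym_sort'
    '&format=text&limit=&submit=submit'
)


def requested_columns(url):
    """Return the column codes requested by a download URL, in order.

    Hypothetical helper for illustration; not part of pybel_tools.
    """
    query = urlsplit(url).query
    # parse_qs collects repeated keys (here, 'col') into a list.
    return parse_qs(query).get('col', [])


columns = requested_columns(FULL_RESOURCE)
```

The same helper applied to MGI_ONLY or RGD_ONLY shows which two columns those narrower downloads request.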

pybel_tools.orthology.MGI_ONLY = 'http://www.genenames.org/cgi-bin/download?col=gd_app_sym&col=md_mgd_id&status=Approved&status_opt=2&where=&order_by=gd_app_sym_sort&format=text&limit=&submit=submit'

Annotations from HGNC. Columns: HGNC Symbol, MGI Symbols

pybel_tools.orthology.RGD_ONLY = 'http://www.genenames.org/cgi-bin/download?col=gd_app_sym&col=md_rgd_id&status=Approved&status_opt=2&where=&order_by=gd_app_sym_sort&format=text&limit=&submit=submit'

Annotations from HGNC. Columns: HGNC Symbol, RGD Symbols

pybel_tools.orthology.MGI_ANNOTATIONS = 'http://www.informatics.jax.org/downloads/reports/MGI_MRK_Coord.rpt'

Annotations from the Jackson Laboratory, which maintains the Mouse Genome Informatics (MGI) database

pybel_tools.orthology.MGI_ORTHOLOGY = 'http://www.informatics.jax.org/downloads/reports/HMD_HumanPhenotype.rpt'

Columns: Human Marker Symbol, Human Entrez Gene ID, HomoloGene ID, Mouse Marker Symbol, MGI Marker Accession ID, High-level Mammalian Phenotype ID (space-delimited)

See: http://www.informatics.jax.org/downloads/reports/index.html#pheno

pybel_tools.orthology.RGD_ORTHOLOGY = 'ftp://ftp.rgd.mcw.edu/pub/data_release/RGD_ORTHOLOGS.txt'

Columns: RAT_GENE_SYMBOL, RAT_GENE_RGD_ID, RAT_GENE_NCBI_GENE_ID, HUMAN_ORTHOLOG_SYMBOL, HUMAN_ORTHOLOG_RGD, HUMAN_ORTHOLOG_NCBI_GENE_ID, HUMAN_ORTHOLOG_SOURCE, MOUSE_ORTHOLOG_SYMBOL, MOUSE_ORTHOLOG_RGD, MOUSE_ORTHOLOG_NCBI_GENE_ID, MOUSE_ORTHOLOG_MGI, MOUSE_ORTHOLOG_SOURCE, HUMAN_ORTHOLOG_HGNC_ID

The first 52 rows are comments beginning with #, and line 53 is the header.
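Because the number of leading comment rows may change between releases, it can be safer to skip lines by their # prefix than to hard-code a row count. The sketch below does this with the standard library. It assumes the file is tab-delimited and that the first non-comment line is the header. The function name read_rgd_orthologs and the sample rows are made up for illustration.

```python
import csv
import io


def read_rgd_orthologs(lines):
    """Yield one dict per data row, skipping '#' comment lines.

    Hypothetical helper; assumes tab-delimited input whose first
    non-comment line is the header row.
    """
    data_lines = (line for line in lines if not line.startswith('#'))
    yield from csv.DictReader(data_lines, delimiter='\t')


# Made-up sample with only a few of the real columns.
sample = io.StringIO(
    '# made-up comment line\n'
    '# another made-up comment line\n'
    'RAT_GENE_SYMBOL\tHUMAN_ORTHOLOG_SYMBOL\tMOUSE_ORTHOLOG_SYMBOL\n'
    'Genea\tGENEA\tGenea\n'
)
rows = list(read_rgd_orthologs(sample))
```

Each row is keyed by the header names, so downstream code can refer to columns such as HUMAN_ORTHOLOG_SYMBOL by name rather than by position.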

pybel_tools.orthology.download_orthologies_from_hgnc(path)

Downloads the full dump to the given path.

Parameters: path – output path
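A download of this kind can be sketched with the standard library alone. The following is not the pybel_tools implementation, only a minimal illustration of streaming a URL to a file; the name download_to_path is hypothetical.

```python
import shutil
import urllib.request


def download_to_path(url, path):
    """Stream the resource at ``url`` into the file at ``path``.

    Hypothetical helper for illustration; not part of pybel_tools.
    """
    with urllib.request.urlopen(url) as response, open(path, 'wb') as file:
        # Copy in chunks so the whole dump is never held in memory.
        shutil.copyfileobj(response, file)
```

Called as download_to_path(FULL_RESOURCE, 'hgnc_orthologies.tsv'), this would write the full HGNC dump to a local file, where the output file name is only an example.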
pybel_tools.orthology.structure_orthologies_from_hgnc(lines=None)

Structures the orthology data into two lists of pairs: (HGNC, MGI) identifiers and (HGNC, RGD) identifiers.

Parameters: lines – An iterable over the downloaded orthologies from HGNC. If None, downloads from HGNC
Returns: two lists of pairs, (HGNC, MGI) and (HGNC, RGD)
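To make the shape of the result concrete, here is a rough sketch of turning downloaded rows into the two pair lists. It is not the library's code and rests on several assumptions: the rows are tab-delimited with the five FULL_RESOURCE columns, the first row is a header, multiple identifiers in one cell are comma-separated, and the HGNC symbol is used as the human identifier. The function name pair_up_orthologs and all sample identifiers are made up.

```python
import csv


def pair_up_orthologs(lines):
    """Split HGNC rows into (HGNC, MGI) and (HGNC, RGD) pair lists.

    Hypothetical sketch; the column layout and the comma-separated
    cell format are assumptions, not confirmed by pybel_tools.
    """
    mgi_pairs, rgd_pairs = [], []
    reader = csv.reader(lines, delimiter='\t')
    next(reader, None)  # skip the header row
    for row in reader:
        # Pad short rows so unpacking never fails on missing trailing cells.
        _hgnc_id, symbol, _mgi_curated, mgi_dump, rgd_dump = (row + [''] * 5)[:5]
        for mgi_id in mgi_dump.split(','):
            if mgi_id.strip():
                mgi_pairs.append((symbol, mgi_id.strip()))
        for rgd_id in rgd_dump.split(','):
            if rgd_id.strip():
                rgd_pairs.append((symbol, rgd_id.strip()))
    return mgi_pairs, rgd_pairs


# Made-up rows; the identifiers are placeholders, not real records.
sample_lines = [
    'HGNC ID\tSymbol\tMGI Curated\tMGI Dump\tRGD Dump\n',
    'HGNC:1\tGENEA\tMGI:1\tMGI:1, MGI:2\tRGD:1\n',
    'HGNC:2\tGENEB\t\t\t\n',
]
mgi_pairs, rgd_pairs = pair_up_orthologs(sample_lines)
```

A gene with several mouse orthologs contributes one pair per ortholog, and a gene with none contributes nothing to either list.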
pybel_tools.orthology.add_orthology_statements(graph, orthologies, namespace)

Adds orthology statements linking orthologous nodes to their HGNC nodes.

Parameters:
  • graph (pybel.BELGraph) – A BEL Graph
  • orthologies (list) – An iterable over pairs of (HGNC, ORTHOLOG) identifiers
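pybel_tools.orthology.integrate_orthologies_from_hgnc(graph, lines=None)

Adds orthology statements to graph using HGNC symbols, MGI IDs, and RGD IDs.

For MGI symbols and RGD symbols, use integrate_orthologies_from_rgd().

Parameters:
  • graph (pybel.BELGraph) – A BEL Graph
  • lines – An iterable over the downloaded orthologies from HGNC. If None, downloads from HGNC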
pybel_tools.orthology.integrate_orthologies_from_rgd(graph, path=None)

Adds orthology statements to graph using HGNC symbols, MGI symbols, and RGD symbols.

For MGI IDs and RGD IDs, use integrate_orthologies_from_hgnc().

Parameters:
  • graph (pybel.BELGraph) – A BEL Graph
  • path – optional path to a local RGD_ORTHOLOGS.txt. Defaults to downloading directly from the RGD FTP server with pandas
pybel_tools.orthology.collapse_orthologies(graph)

Collapses all orthology relations.

Assumes: orthologies are annotated for edge (u, v), where u is the higher-priority node.

Parameters: graph (pybel.BELGraph) – A BEL Graph

Warning

This won’t work for two-way orthology annotations, so it’s best to use integrate_orthologies_from_rgd() first.
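The reason two-way annotations are a problem can be seen in a toy model. The sketch below works on plain (source, relation, target) tuples rather than a pybel.BELGraph, and it is not the library's algorithm; collapse_pairs and the node names are hypothetical. Each orthology pair (u, v) replaces the lower-priority node v with u.

```python
def collapse_pairs(edges, orthologies):
    """Replace each lower-priority node with its higher-priority ortholog.

    Hypothetical toy model on tuples; not the pybel_tools implementation.
    ``orthologies`` holds pairs (u, v) where u is the higher-priority node.
    """
    replacement = {v: u for u, v in orthologies}
    collapsed = set()
    for source, relation, target in edges:
        source = replacement.get(source, source)
        target = replacement.get(target, target)
        if source != target:  # drop self-loops created by merging
            collapsed.add((source, relation, target))
    return collapsed


# Made-up nodes: one human gene, its mouse ortholog, and two targets.
edges = [
    ('HGNC:GENEA', 'increases', 'x'),
    ('MGI:Genea', 'increases', 'y'),
]

# One-way annotation: both edges end up on the human node.
one_way = collapse_pairs(edges, [('HGNC:GENEA', 'MGI:Genea')])

# Two-way annotation: the nodes swap places instead of merging.
two_way = collapse_pairs(
    edges,
    [('HGNC:GENEA', 'MGI:Genea'), ('MGI:Genea', 'HGNC:GENEA')],
)
```

With a one-way annotation the mouse node disappears and its edges move to the human node. With annotations in both directions, neither node has priority, so in this toy model the two nodes simply trade edges and nothing is collapsed.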