Help

General Questions

Welcome to MetaPhOrs

MetaPhOrs is a public repository of phylogeny-based orthologs and paralogs computed from phylogenetic trees available in 12 public repositories. Currently, over 110,970,320 unique homologs are deposited in the MetaPhOrs database. These predictions were retrieved from 13,097,138 Maximum Likelihood trees for 6,384 species. For each prediction, MetaPhOrs provides a Consistency Score and an Evidence Level.

The latest MetaPhOrs paper:
MetaPhOrs 2.0: integrative, phylogeny-based inference of orthology and paralogy across the tree of life
Uciel Chorostecki, Manuel Molina, Leszek P Pryszcz, Toni Gabaldón
Nucleic Acids Research, gkaa282

The MetaPhOrs consistency-based algorithm is described here:
MetaPhOrs: orthology and paralogy predictions from multiple phylogenetic evidence using a consistency-based confidence score
LP Pryszcz, J Huerta-Cepas, T Gabaldon
Nucleic acids research 39 (5), e32-e32

Phylogenetic databases included in MetaPhOrs

Recently, several projects have addressed the reconstruction of large collections of high-quality phylogenetic trees from which orthology can be inferred. This provides us with the opportunity to infer the evolutionary relationships of two genes from multiple, independent, phylogenetic trees and use the consistency across predictions as a reliability measure of an orthology assignment.

MetaPhOrs derives orthology information from gene trees present in different databases. Orthology and paralogy are derived at the species level and not at lower taxonomic levels such as subspecies or strains.

MetaPhOrs uses genome annotations provided by other databases such as UniProt, Ensembl, or GenBank, and is not responsible for any annotation errors.

MetaPhOrs plans a major new release every year. It currently uses the following databases:
Database | No. of trees | Reference
Ensembl vertebrates | 63,517 | Vilella AJ, Severin J, Ureta-Vidal A, Heng L, Durbin R, Birney E. EnsemblCompara GeneTrees: Complete, duplication-aware phylogenetic trees in vertebrates. Genome Res. 2009 Feb;19(2):327-35. doi: 10.1101/gr.073585.107. Epub 2008 Nov 24.
Ensembl bacteria | 37,806 | Vilella AJ, Severin J, Ureta-Vidal A, Heng L, Durbin R, Birney E. EnsemblCompara GeneTrees: Complete, duplication-aware phylogenetic trees in vertebrates. Genome Res. 2009 Feb;19(2):327-35. doi: 10.1101/gr.073585.107. Epub 2008 Nov 24.
Ensembl fungi | 190,142 | Vilella AJ, Severin J, Ureta-Vidal A, Heng L, Durbin R, Birney E. EnsemblCompara GeneTrees: Complete, duplication-aware phylogenetic trees in vertebrates. Genome Res. 2009 Feb;19(2):327-35. doi: 10.1101/gr.073585.107. Epub 2008 Nov 24.
Ensembl metazoa | 124,442 | Vilella AJ, Severin J, Ureta-Vidal A, Heng L, Durbin R, Birney E. EnsemblCompara GeneTrees: Complete, duplication-aware phylogenetic trees in vertebrates. Genome Res. 2009 Feb;19(2):327-35. doi: 10.1101/gr.073585.107. Epub 2008 Nov 24.
Ensembl pan | 115,449 | Vilella AJ, Severin J, Ureta-Vidal A, Heng L, Durbin R, Birney E. EnsemblCompara GeneTrees: Complete, duplication-aware phylogenetic trees in vertebrates. Genome Res. 2009 Feb;19(2):327-35. doi: 10.1101/gr.073585.107. Epub 2008 Nov 24.
Ensembl plants | 127,876 | Vilella AJ, Severin J, Ureta-Vidal A, Heng L, Durbin R, Birney E. EnsemblCompara GeneTrees: Complete, duplication-aware phylogenetic trees in vertebrates. Genome Res. 2009 Feb;19(2):327-35. doi: 10.1101/gr.073585.107. Epub 2008 Nov 24.
Ensembl protists | 125,350 | Vilella AJ, Severin J, Ureta-Vidal A, Heng L, Durbin R, Birney E. EnsemblCompara GeneTrees: Complete, duplication-aware phylogenetic trees in vertebrates. Genome Res. 2009 Feb;19(2):327-35. doi: 10.1101/gr.073585.107. Epub 2008 Nov 24.
PhylomeDB | 7,551,464 | Huerta-Cepas J, Capella-Gutiérrez S, Pryszcz LP, Marcet-Houben M, Gabaldón T. PhylomeDB v4: zooming into the plurality of evolutionary histories of a genome. Nucleic Acids Res. 2014;42(Database issue):D897–D902. doi:10.1093/nar/gkt1177
Eggnog | 4,120,483 | Huerta-Cepas J, Szklarczyk D, Forslund K, et al. eggNOG 4.5: a hierarchical orthology framework with improved functional annotations for eukaryotic, prokaryotic and viral sequences. Nucleic Acids Res. 2016;44(D1):D286–D293. doi:10.1093/nar/gkv1248
Hogenom | 564,333 | Penel S, Arigon AM, Dufayard JF, et al. Databases of homologous gene families for comparative genomics. BMC Bioinformatics. 2009;10 Suppl 6(Suppl 6):S3. Published 2009 Jun 16. doi:10.1186/1471-2105-10-S6-S3
Treefam | 15,321 | Schreiber F, Patricio M, Muffato M, Pignatelli M, Bateman A. TreeFam v9: a new website, more species and orthology-on-the-fly. Nucleic Acids Res. 2014;42(Database issue):D922–D925. doi:10.1093/nar/gkt1055
EvolclustDB | 60,955 | Marcet-Houben M, Gabaldón T. EvolClust: automated inference of evolutionary conserved gene clusters in eukaryotes. Bioinformatics. 2020 Feb 15;36(4):1265-1266. doi: 10.1093/bioinformatics/btz706.

FAQs

My search for a gene name does not find a hit despite the species being in the database

Try searching by “Gene description” instead; you can select this option under “Advanced Search”.

Is there a tutorial covering the options that MetaPhOrs offers?


Tree processing

All trees are processed with a common bioinformatics pipeline to retrieve phylogeny-based orthology and paralogy predictions.

Speciation/duplication detection

Duplications and speciations are computed using the species-overlap algorithm.
(Huerta-Cepas J, Dopazo H, Dopazo J, Gabaldón T. The human phylome. Genome Biol. 2007;8(6):R109.)

The species-overlap algorithm is an alternative approach to inferring evolutionary events from gene phylogenies. The only evolutionary information it requires is a rooted gene tree; it needs neither a fully resolved species phylogeny nor reconciliation steps. To decide whether a given node represents a speciation or a duplication event, the algorithm uses the level of overlap between the species represented under the node's two descendant nodes. In brief, a species-overlap score (SOS) is calculated for every node as the number of species shared between the child branches divided by the total number of organisms under the node. If the SOS is higher than a given threshold, the parental node is mapped as a duplication; otherwise it is mapped as a speciation event. The best performance of the algorithm has been reported with an SOS threshold of 0.0, so that speciation is only assumed if no species overlap is detected between the descendant nodes. This is the SOS threshold used in MetaPhOrs.
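The rule described above can be illustrated with a short Python sketch. This is not the MetaPhOrs pipeline code: the `Node` class and function names are hypothetical, the tree is assumed to be binary and rooted, and "total number of organisms under the node" is interpreted here as the number of distinct species under the node.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Node:
    """Hypothetical minimal gene-tree node; leaves carry a species name."""
    species: Optional[str] = None
    children: List["Node"] = field(default_factory=list)
    event: Optional[str] = None  # filled in by label_events()


def leaf_species(node: Node) -> Set[str]:
    """Return the set of species found at the leaves under a node."""
    if not node.children:
        return {node.species}
    found: Set[str] = set()
    for child in node.children:
        found |= leaf_species(child)
    return found


def species_overlap_score(node: Node) -> float:
    """Shared species between the two child branches over all species under the node."""
    left, right = (leaf_species(child) for child in node.children)
    return len(left & right) / len(left | right)


def label_events(node: Node, threshold: float = 0.0) -> None:
    """Label every internal node as 'duplication' or 'speciation'."""
    if not node.children:
        return
    sos = species_overlap_score(node)
    node.event = "duplication" if sos > threshold else "speciation"
    for child in node.children:
        label_events(child, threshold)


# Toy tree: ((human, mouse), (human, fly))
root = Node(children=[
    Node(children=[Node("human"), Node("mouse")]),
    Node(children=[Node("human"), Node("fly")]),
])
label_events(root)
print(root.event, [child.event for child in root.children])
```

In this toy tree, human appears under both child branches of the root, so the root has a non-zero SOS and is labelled as a duplication with the 0.0 threshold, while the two inner nodes share no species and are labelled as speciations.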


Computing meta-homologs

MetaPhOrs combines information from multiple strains into a single meta-proteome for each species. As a result, the phylogenetic signals from multiple strains of one species present in a given tree are counted multiple times, and the number of trees in the orthology tables may be slightly larger than the number of trees retrieved in the tree page.

Consistency score

Orthology/paralogy assignments from different trees are combined into a single orthology/paralogy prediction using a consistency-based approach. For this, a Consistency Score (CS) is computed. CS ranges from 0 to 1. In brief, the closer the value of CS is to 1.0, the more confident the prediction.

The Consistency Score is the ratio of the number of trees confirming a given relationship to the total number of trees used to infer the relationship between a particular protein pair. The orthology Consistency Score (CSo) is calculated for orthology searches and the paralogy Consistency Score (CSp) for paralogy queries, as follows:

  • CSo = To / (To + Tp)
  • CSp = Tp / (To + Tp)

where: 

  • To stands for the number of trees confirming orthology
  • Tp stands for the number of trees confirming paralogy.

The recommended CSo threshold for orthology prediction is 0.5. The CS cut-off can be changed by the user to adjust the sensitivity/specificity of each query. All homology relationships are returned when a CS cut-off of 0.0 is applied, while a CS cut-off of 1.0 returns only fully consistent predictions.
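As a worked illustration of the formulas above, the following Python sketch computes CSo and CSp from tree counts and applies the recommended 0.5 cut-off. The function name is hypothetical, and the use of "greater than or equal to" for the cut-off is an assumption inferred from the statement that a cut-off of 0.0 returns all relationships.

```python
from typing import Tuple


def consistency_scores(trees_orthology: int, trees_paralogy: int) -> Tuple[float, float]:
    """Return (CSo, CSp) for one protein pair from the supporting tree counts."""
    total = trees_orthology + trees_paralogy
    if total == 0:
        raise ValueError("at least one tree is needed to score a protein pair")
    return trees_orthology / total, trees_paralogy / total


# Example: 8 trees support orthology (To) and 2 support paralogy (Tp).
cso, csp = consistency_scores(8, 2)
is_ortholog = cso >= 0.5  # recommended CSo threshold; '>=' is an assumption
print(cso, csp, is_ortholog)
```

Because CSo and CSp share the same denominator, they always sum to 1 for a given protein pair.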

Evidence level

The evidence level is the number of independent sources (databases) that support the prediction. In general, the higher the evidence level, the more reliable the prediction, as more sources were used to infer it.

The evidence level may vary from 1 to 12 (as trees were retrieved from 12 databases). The Evidence Level cut-off should be changed with care, as external databases overlap only partially, and for some pairs of species there is only one source of data (Evidence Level of 1). We recommend starting queries with an Evidence Level cut-off of 1 and then increasing it if needed.
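The sketch below shows one way the two cut-offs could be combined for a single protein pair. It is only an illustration: the input format (a list of database name and per-tree call) and the function names are hypothetical, and the evidence level is assumed here to be the number of distinct databases that contributed at least one tree for the pair.

```python
from typing import List, Tuple

# Each entry is (source database, relationship called by one tree).
TreeCall = Tuple[str, str]


def evidence_level(calls: List[TreeCall]) -> int:
    """Number of distinct databases that contributed trees (assumed definition)."""
    return len({database for database, _ in calls})


def passes_cutoffs(calls: List[TreeCall], min_cs: float = 0.5, min_evidence: int = 1) -> bool:
    """Check an orthology prediction against the CS and Evidence Level cut-offs."""
    if not calls:
        return False
    orthology_trees = sum(1 for _, call in calls if call == "orthology")
    cso = orthology_trees / len(calls)
    return cso >= min_cs and evidence_level(calls) >= min_evidence


calls = [
    ("PhylomeDB", "orthology"),
    ("PhylomeDB", "orthology"),
    ("Eggnog", "orthology"),
    ("Treefam", "paralogy"),
]
print(evidence_level(calls))                  # three distinct databases
print(passes_cutoffs(calls))                  # default cut-offs
print(passes_cutoffs(calls, min_evidence=4))  # stricter Evidence Level cut-off
```

Starting with `min_evidence=1` and raising it gradually mirrors the recommendation above, since a high cut-off can exclude pairs of species covered by a single database.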

  • Logomakr is licensed under CC BY 3.0
  • © COPYRIGHT 2018-2023
 
All data available on this website is released under a CC-BY-NC license.