MetaPhOrs is a public repository of phylogeny-based orthologs and paralogs computed from phylogenetic trees available in twelve public repositories. Currently, over 6.8 billion unique homologs are deposited in the MetaPhOrs database. These predictions were retrieved from 7 million Maximum Likelihood trees for 2,714 species. For each prediction, MetaPhOrs provides a Consistency Score and an Evidence Level describing its reliability, together with the number of supporting trees and links to their source databases.
The MetaPhOrs consistency-based algorithm is described here:
MetaPhOrs: orthology and paralogy predictions from multiple phylogenetic evidence using a consistency-based confidence score
LP Pryszcz, J Huerta-Cepas, T Gabaldon
Nucleic acids research 39 (5), e32-e32
Duplications and speciations are computed using the species-overlap algorithm (Huerta-Cepas et al. 2007).
The species-overlap algorithm is an alternative approach to inferring evolutionary events from gene phylogenies. The only evolutionary information it requires is a rooted gene tree; it needs neither a fully resolved species phylogeny nor reconciliation steps. To decide whether a given node represents a speciation or a duplication event, the algorithm uses the level of overlap between the species represented under the node's two descendant branches. In brief, a species-overlap score (SOS) is calculated for every node as the number of species shared between the two child branches divided by the total number of species under the node. If the SOS is higher than a given threshold, the node is mapped as a duplication; otherwise, it is mapped as a speciation event. The best performance of the algorithm has been reported with an SOS threshold of 0.0, so that speciation is only assumed if no species overlap is detected between the descendant branches of a node. This is the SOS threshold used in MetaPhOrs.
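The species-overlap rule described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the MetaPhOrs implementation: the Node class, the function names and the toy tree are hypothetical, and the tree is assumed to be rooted and strictly bifurcating.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Node:
    """Hypothetical minimal gene-tree node; leaves carry a species name."""
    species: Optional[str] = None
    children: List["Node"] = field(default_factory=list)
    event: Optional[str] = None  # filled in for internal nodes only


def species_under(node: Node) -> Set[str]:
    """Collect the set of species found at the leaves below a node."""
    if not node.children:
        return {node.species}
    found: Set[str] = set()
    for child in node.children:
        found |= species_under(child)
    return found


def species_overlap_score(node: Node) -> float:
    """Shared species between the two child branches / all species under the node."""
    left, right = (species_under(child) for child in node.children)
    return len(left & right) / len(left | right)


def label_events(node: Node, threshold: float = 0.0) -> None:
    """Mark each internal node as a duplication (SOS > threshold) or a speciation."""
    if not node.children:
        return
    sos = species_overlap_score(node)
    node.event = "duplication" if sos > threshold else "speciation"
    for child in node.children:
        label_events(child, threshold)


# Toy gene tree: ((human, mouse), (human, fly))
root = Node(children=[
    Node(children=[Node("human"), Node("mouse")]),
    Node(children=[Node("human"), Node("fly")]),
])
label_events(root)  # threshold 0.0, the value used in MetaPhOrs
print(root.event, [child.event for child in root.children])
```

In the toy tree, human appears on both sides of the root, so the root has a non-zero SOS and is labelled a duplication, while both of its children separate distinct species and are labelled speciations.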
MetaPhOrs combines information from multiple strains into a single meta-proteome for each species. As a result, the phylogenetic signals from multiple strains of one species present in a given tree are counted multiple times, and the number of trees in the orthology tables may be slightly larger than the number of trees retrieved on the tree page.
Orthology/paralogy assignment in MetaPhOrs is based on the Consistency Score (CS), which ranges from 0 to 1. In brief, the closer the CS is to 1.0, the more confident the prediction.
The Consistency Score is the ratio of the number of trees confirming a given relationship to the total number of trees used to infer the relationship between a particular protein pair. The orthology Consistency Score (CSo) is calculated for orthology searches and the paralogy Consistency Score (CSp) for paralogy queries, as follows, where To and Tp are the numbers of trees supporting orthology and paralogy, respectively:
CSo = To / (To + Tp)
CSp = Tp / (To + Tp)
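As a worked example of the two formulas, the following sketch computes both scores for a single protein pair. The function name and the tree counts are made up for illustration; they are not taken from MetaPhOrs.

```python
from typing import Tuple


def consistency_scores(trees_orthology: int, trees_paralogy: int) -> Tuple[float, float]:
    """Return (CSo, CSp) given To and Tp, the tree counts supporting each relationship."""
    total = trees_orthology + trees_paralogy
    if total == 0:
        raise ValueError("at least one tree is needed to score a protein pair")
    return trees_orthology / total, trees_paralogy / total


# Hypothetical pair: 3 trees support orthology, 1 tree supports paralogy.
cso, csp = consistency_scores(3, 1)
print(cso, csp)  # 0.75 0.25
```

Because both scores share the same denominator, CSo and CSp always sum to 1 for a given protein pair.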
The recommended CSo threshold for orthology prediction is 0.5. The CS cut-off may be altered by the user to adjust the sensitivity/specificity of each query. All homology relationships are returned when a CS cut-off of 0.0 is applied, while a CS cut-off of 1.0 returns only fully consistent predictions.
The Evidence Level is the number of independent sources (databases) in which trees confirming a prediction have been found. In general, the higher the Evidence Level, the more reliable the prediction, as more sources were used to infer it.
The Evidence Level may vary from 1 to 12 (as trees were retrieved from 12 databases). The Evidence Level cut-off has to be altered with care, as the external databases partially overlap, and for some pairs of species there is only one source of data (Evidence Level of 1). It is recommended to start queries with an Evidence Level cut-off of 1 and then increase the cut-off if needed.
Note that in the first releases (200909 and 200911), the Evidence Level counted different phylomes as independent sources. From release 201405 on, only different databases are counted.
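The interplay of the two cut-offs can be illustrated with a short sketch. It is a hedged example rather than the MetaPhOrs query code: the Prediction class, the protein identifiers and the database labels (db_a, db_b, db_c) are hypothetical, and both cut-offs are assumed to be inclusive, which is consistent with a CS cut-off of 0.0 returning all relationships.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Prediction:
    """Hypothetical record for one predicted orthologous protein pair."""
    pair: Tuple[str, str]
    cso: float
    source_databases: List[str]  # one entry per supporting tree


def evidence_level(prediction: Prediction) -> int:
    """Count distinct databases; several trees from one database count once."""
    return len(set(prediction.source_databases))


def filter_predictions(predictions: List[Prediction],
                       cs_cutoff: float = 0.5,
                       el_cutoff: int = 1) -> List[Prediction]:
    """Keep predictions meeting both cut-offs (assumed inclusive)."""
    return [p for p in predictions
            if p.cso >= cs_cutoff and evidence_level(p) >= el_cutoff]


predictions = [
    Prediction(("protA", "protB"), 0.9, ["db_a", "db_b", "db_a"]),
    Prediction(("protA", "protC"), 0.4, ["db_a"]),
    Prediction(("protA", "protD"), 0.5, ["db_c"]),
]

default_hits = filter_predictions(predictions)              # CS >= 0.5, EL >= 1
strict_hits = filter_predictions(predictions, el_cutoff=2)  # also require 2 databases
print([p.pair for p in default_hits])
print([p.pair for p in strict_hits])
```

Starting with an Evidence Level cut-off of 1 keeps pairs that are covered by a single database, such as protA/protD here; raising it to 2 leaves only the pair confirmed by two distinct sources.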