Help & Documentation

Disease associations

Once the statistical analysis has associated phenotypes with each mutant line, the mouse phenotype profiles are compared against descriptions of human diseases using an algorithm called [Phenodigm](https://academic.oup.com/database/article/doi/10.1093/database/bat025/333089). This procedure uses phenotype annotations, encoded as terms in the [Human Phenotype Ontology](https://hpo.jax.org/), for all diseases described in [OMIM](https://www.omim.org/), [ORPHANET](https://www.orpha.net/), and [DECIPHER](https://decipher.sanger.ac.uk/).

In brief, the Phenodigm algorithm proceeds in two stages. The first stage compares individual disease phenotypes (HP terms) to individual mouse phenotypes (MP terms). The outcome of each comparison depends on the semantic similarity of the two terms and on their prevalence. Thus, pairs of phenotypes that are biologically similar and relatively rare (e.g. human ‘Ataxia’ and mouse ‘ataxia’) obtain a high score, while pairs that refer to different biological entities and are broadly defined (e.g. human ‘Abnormality of the cardiovascular system’ and mouse ‘adipose tissue phenotype’) obtain a very low score. Scores from this stage have no upper bound and fall in the range [0, ∞), although typical values lie within [0, 6].
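The interplay between similarity and prevalence in the first stage can be illustrated with a minimal sketch. It assumes, following the general approach of the Phenodigm publication, that a pairwise score is the geometric mean of a Jaccard similarity (overlap of the two terms' ontology ancestors) and the information content of their most specific shared ancestor. The function names, the set-based representation of ancestors, and the frequency argument are hypothetical and chosen for illustration only; they are not the IMPC implementation.

```python
import math


def information_content(term_frequency: float) -> float:
    """Information content of a term: log(1 / p).

    `term_frequency` is the fraction of annotations that use the term
    (or one of its descendants). Rare terms yield high values; a term
    that applies to everything (p = 1) yields 0.
    """
    if not 0.0 < term_frequency <= 1.0:
        raise ValueError("term_frequency must be in (0, 1]")
    return math.log(1.0 / term_frequency)


def jaccard(a: set, b: set) -> float:
    """Fraction of shared elements between two sets, in [0, 1]."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def pairwise_score(hp_ancestors: set, mp_ancestors: set,
                   shared_term_frequency: float) -> float:
    """Assumed pairwise score: sqrt(similarity * information content).

    `hp_ancestors` / `mp_ancestors` are the ancestor sets of the human
    and mouse terms in a common cross-species ontology (hypothetical
    input); `shared_term_frequency` is the prevalence of their most
    specific shared ancestor.
    """
    similarity = jaccard(hp_ancestors, mp_ancestors)
    return math.sqrt(similarity * information_content(shared_term_frequency))
```

Under these assumptions, two terms with identical ancestors and a rare shared ancestor score highly, while terms that overlap only through a very common ancestor score close to zero. Because information content grows without bound as a term becomes rarer, the score has no fixed upper limit.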

The second stage of the calculation aggregates the pairwise phenotype scores into a single value. Conceptually, this aggregation is an attempt to summarize the overall similarity between the mouse and disease phenotypes, given the available data and annotations. The outcome of this stage is the Phenodigm score, which falls in the range [0, 100]: low values indicate low concordance between the mouse and disease phenotypes, and high values indicate high concordance.
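One way to picture the aggregation is the sketch below. It assumes, loosely following the Phenodigm publication, that the best pairwise score and the average of the best matches in both directions are each expressed as a percentage of the value an ideal match would achieve, and that the two percentages are then averaged. The function name and the `ideal_max` and `ideal_avg` parameters are hypothetical; how the ideal values are obtained is not described here and is left to the caller.

```python
def aggregate_score(scores: list[list[float]],
                    ideal_max: float, ideal_avg: float) -> float:
    """Combine pairwise scores into one value in [0, 100] (illustrative).

    `scores[i][j]` is the pairwise score between disease phenotype i
    and mouse phenotype j. `ideal_max` and `ideal_avg` are the values
    a perfect match would reach (assumed to be supplied by the caller).
    """
    if not scores or not scores[0]:
        return 0.0

    # Best mouse match for each disease phenotype, and vice versa.
    best_per_disease_term = [max(row) for row in scores]
    best_per_mouse_term = [max(col) for col in zip(*scores)]

    max_score = max(best_per_disease_term)
    best_matches = best_per_disease_term + best_per_mouse_term
    avg_score = sum(best_matches) / len(best_matches)

    # Express both summaries relative to the ideal match, then average.
    max_pct = 100.0 * max_score / ideal_max
    avg_pct = 100.0 * avg_score / ideal_avg
    return min(100.0, (max_pct + avg_pct) / 2.0)
```

For example, with `scores = [[2.0, 0.0], [0.0, 1.0]]` and both ideal values set to 2.0, the best match reaches 100% of the ideal and the averaged best matches reach 75%, giving a combined value of 87.5.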
