Help & Documentation

Statistical analysis

IMPC data originate from many types of assays. Each assay has different characteristics, so the data cannot all be analysed with a single statistical approach. Therefore, the IMPC uses a toolkit called [PhenStat](https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0131274), available as an [R package](https://bioconductor.org/packages/release/bioc/html/PhenStat.html), to apply appropriate statistical methods for each data type.

Categorical data

Categorical data are analysed using the Fisher exact test. This method involves constructing a contingency table by counting the number of animals with and without a particular feature, among mutants and among controls. Examples of this type of analysis include [abnormal coat color](https://www.mousephenotype.org/data/phenotypes/MP:0002075) and [abnormal eye morphology](https://www.mousephenotype.org/data/phenotypes/MP:0002092).
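The counting step and the test can be illustrated with a minimal Python sketch. This is not the PhenStat implementation (PhenStat is an R package); the function names, the "mutant"/"control" labels, and the input layout are assumptions made for this example. The two-sided p-value sums the hypergeometric probabilities of all tables that are no more likely than the observed one.

```python
from math import comb


def contingency_table(animals):
    """Count animals with and without a feature, among mutants and controls.

    `animals` is an iterable of (group, has_feature) pairs, where group is
    "mutant" or "control" (hypothetical labels chosen for this sketch).
    Returns ([mutant_with, mutant_without], [control_with, control_without]).
    """
    counts = {"mutant": [0, 0], "control": [0, 0]}
    for group, has_feature in animals:
        counts[group][0 if has_feature else 1] += 1
    return counts["mutant"], counts["control"]


def fisher_exact_p(table):
    """Two-sided Fisher exact p-value for a 2x2 table [[a, b], [c, d]]."""
    (a, b), (c, d) = table
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / total

    p_observed = prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    # Sum every table that is at most as likely as the observed one;
    # the small tolerance guards against floating-point ties.
    p_value = sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * (1 + 1e-9)
    )
    return min(1.0, p_value)


# Invented example data: 3 mutants with the feature, 3 controls without it.
animals = [("mutant", True)] * 3 + [("control", False)] * 3
p = fisher_exact_p(contingency_table(animals))
```

With only three animals per group, even a complete separation between mutants and controls gives a two-sided p-value of 0.1, which shows how strongly the result depends on sample size.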

Continuous data

Continuous data are analysed using a linear mixed model. This method uses a linear equation to model measurements as a function of genotype and additional variables – in particular gender, weight, and batch. This allows the statistical procedure to determine significant shifts that can be attributed to the genotype as opposed to the other covariates. Examples of this type of analysis include [hypoactivity in open field](https://www.mousephenotype.org/data/phenotypes/MP:0001402) and [circulating insulin levels in clinical chemistry](https://www.mousephenotype.org/data/phenotypes/MP:0001560).
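As a rough illustration, a model of this kind can be written as follows. This is a simplified sketch and not the exact PhenStat specification: the symbols are chosen for this example, batch is assumed to enter as a random intercept, and any interaction terms are omitted. Here $y_{ij}$ is the measurement for animal $i$ in batch $j$.

```latex
y_{ij} = \beta_0
       + \beta_G \,\mathrm{genotype}_{ij}
       + \beta_S \,\mathrm{gender}_{ij}
       + \beta_W \,\mathrm{weight}_{ij}
       + b_j + \varepsilon_{ij},
\qquad
b_j \sim \mathcal{N}(0, \sigma_b^2),
\quad
\varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2)
```

Under these assumptions, the question of interest is whether the genotype coefficient $\beta_G$ differs significantly from zero once gender, weight, and batch-to-batch variation have been accounted for.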

Possible citations: http://europepmc.org/abstract/MED/25343444, http://europepmc.org/abstract/MED/22829707

Continuous data with fixed time

Continuous data that are collected over several time points are analysed using a technique that is related to, albeit distinct from, the method described above. This technique also uses a linear model to study the effects of genotype, gender, weight, and batch, but treats the batch variable as a fixed effect. Mathematical details are described in the [PhenStat publication](https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0131274). This approach is not applied by default, but is used when mutant data were collected over many time points due to operational idiosyncrasies.

Continuous data with low N

Continuous data for which the number of mutant animals is lower than expected can be treated with a reference range model. This technique involves comparing measurements to predefined ranges that are deemed to capture acceptable variability in wildtypes. These comparisons, which lead to discrete values (within range vs. out-of-range), are then assessed statistically using a Fisher exact test.
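The two steps described above (discretising measurements against a range, then testing the resulting counts) can be sketched in Python. This is an illustration only, not the PhenStat implementation: the function names, the example measurements, and the range limits are invented for this sketch, and how the range itself is derived is left outside the example.

```python
from math import comb


def count_in_out(values, low, high):
    """Return (within_range, out_of_range) counts for a predefined range."""
    within = sum(1 for v in values if low <= v <= high)
    return within, len(values) - within


def fisher_exact_p(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / total

    p_observed = prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    # Sum every table that is at most as likely as the observed one.
    p_value = sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * (1 + 1e-9)
    )
    return min(1.0, p_value)


def reference_range_test(mutants, controls, low, high):
    """Discretise both groups against [low, high], then apply Fisher's test."""
    m_in, m_out = count_in_out(mutants, low, high)
    c_in, c_out = count_in_out(controls, low, high)
    return fisher_exact_p(m_out, m_in, c_out, c_in)


# Invented measurements and an invented reference range of 4 to 7 units.
mutants = [10.0, 11.0, 12.0]
controls = [5.0, 5.5, 6.0]
p = reference_range_test(mutants, controls, low=4.0, high=7.0)
```

Reducing each measurement to a within-range or out-of-range label discards magnitude information, which is the price paid for a test that remains usable with few mutant animals.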

Viability and fertility data

Viability and fertility data are treated separately with a custom approach. These data are collected and processed by the phenotyping centers, which use statistical methods appropriate for their breeding scheme.
