PhenStat, a dedicated statistical package
The statistical methods used by the IMPC have been formalised in an R package called PhenStat, which provides methods for identifying abnormal phenotypes, with an emphasis on high-throughput dataflows. The tool suite was designed around known variation in experimental workflows and in the design of phenotyping pipelines.
The PhenStat package contains:
- Dataset checks and cleaning in preparation for the analysis
- Four statistical frameworks for genotype-to-phenotype identification:
  - Fisher’s exact (FE) test for categorical data
  - Linear mixed model (MM) for continuous data
  - Time as Fixed Effect (TF) method
  - Reference range plus (RR) model for low-N continuous data
- Tools to visualise data and results
- Additional functions that help to choose the correct analysis method
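To make the first framework in the list above concrete, the following is a minimal, illustrative sketch of a two-sided Fisher's exact test on a 2x2 table of counts. It is written in plain Python and is not PhenStat's R code. The function name `fisher_exact_two_sided`, the table layout (rows as genotype, columns as abnormal/normal) and the counts are all assumptions made for illustration only.

```python
from math import comb


def fisher_exact_two_sided(table):
    """Two-sided Fisher's exact test p-value for a 2x2 table of counts.

    `table` is [[a, b], [c, d]]. Illustrative only; not part of PhenStat.
    """
    (a, b), (c, d) = table
    row1, row2 = a + b, c + d
    col1 = a + c
    total_tables = comb(row1 + row2, col1)

    def prob(k):
        # Hypergeometric probability of seeing k in the top-left cell
        # when all row and column totals are held fixed.
        return comb(row1, k) * comb(row2, col1 - k) / total_tables

    p_observed = prob(a)
    k_min = max(0, col1 - row2)
    k_max = min(row1, col1)

    # Sum the probabilities of every table that is as likely as, or less
    # likely than, the observed one. The small relative tolerance guards
    # against floating-point ties.
    p_value = sum(
        p
        for p in (prob(k) for k in range(k_min, k_max + 1))
        if p <= p_observed * (1 + 1e-7)
    )
    return min(1.0, p_value)


# Hypothetical counts: rows = (mutant, control), columns = (abnormal, normal)
counts = [[3, 1], [1, 3]]
print(f"p = {fisher_exact_two_sided(counts):.4f}")
```

With only four animals per group, even a 3-versus-1 split is not statistically significant here, which illustrates why sample size matters when interpreting categorical calls.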
The statistical analysis applied to each dataset is indicated on the Chart pages, which depict the actual phenotype data collected by the IMPC for a particular gene and parameter. The method used can also be retrieved from the data files available through the FTP site or the API.