Starting with DR 15.1, the statistical analysis shown on the IMPC website uses a soft windowing approach. The approach is applied to continuous data that are not time dependent, such as HDL-cholesterol (Figure 1). It is not applied to time-dependent parameters such as body weight. Results are visible on the chart pages of the IMPC website.
Soft windowing is a novel approach that gives more weight to controls collected close in time to the mutant mice. While mutants are collected in a narrow time frame, controls are collected over long periods, typically years in the case of the IMPC. The approach reduces unwanted time-dependent noise in the data, so that the analysis is less affected by fluctuations caused by unknown effects, such as an experimenter effect.
The soft windowing approach was published in the Journal of Bioinformatics (for the original publication, please refer to “Soft Windowing Application to Improve Analysis of High-throughput Phenotyping Data”, Journal of Bioinformatics, 2019). Soft windowing applies a weighting function that produces weights in the form of a window of time. Control data collected close in time to the mutants are assigned the maximal weight, while data collected earlier or later receive less weight. The method can produce individual windows as well as merge windows that intersect. Moreover, it was implemented to select window size and shape automatically.
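To make the idea of a window-shaped weighting function concrete, here is a minimal sketch in Python. It is not the published implementation: the window built from two logistic curves, the parameter names (`centre`, `half_width`, `sharpness`) and the choice to merge windows by taking the maximum weight are all assumptions made for illustration.

```python
import math


def _logistic(x):
    """Numerically stable logistic function 1 / (1 + exp(-x))."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def soft_window_weight(t, centre, half_width, sharpness):
    """Weight for a control measured at time t (e.g. in days).

    Hypothetical parameters, chosen for this sketch:
      centre     - time at which the mutants were collected
      half_width - distance from the centre to each edge of the window
      sharpness  - how steeply the weight drops at the edges
    """
    rise = _logistic(sharpness * (t - (centre - half_width)))
    fall = _logistic(sharpness * (t - (centre + half_width)))
    # Close to 1 inside the window, decaying smoothly to 0 outside it.
    return rise * (1.0 - fall)


def merged_weight(t, centres, half_width, sharpness):
    """Combine several windows (assumption: take the largest weight)."""
    return max(soft_window_weight(t, c, half_width, sharpness) for c in centres)


if __name__ == "__main__":
    # Mutants collected around day 100, controls spread over time.
    for day in (0, 60, 100, 140, 400):
        print(day, round(soft_window_weight(day, 100.0, 30.0, 0.5), 4))
```

With these illustrative settings, controls measured near the mutants receive a weight close to one, and the weight falls smoothly towards zero for controls measured much earlier or later. A larger `sharpness` gives a window with harder edges, while a smaller value gives a more gradual decay.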
Further, the soft windowing approach addresses the scaling issues associated with analysing an ever-increasing set of control data, a characteristic of any long-term project such as the IMPC. Once a window of control data has been determined for a dataset, controls with weights sufficiently close to zero can be eliminated from future analysis, and there should be no further need to re-analyse the dataset with each subsequent data release. This will reduce the computational resources needed, and the resulting gene-phenotype associations will remain stable over time, greatly facilitating data exchange with other research groups using IMPC-generated data.
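The effect of dropping near-zero-weight controls can be sketched as follows. This is an illustration only: the function names, the threshold of `1e-3` and the example numbers are made up for this sketch and are not taken from the IMPC pipeline.

```python
def prune_controls(values, weights, threshold=1e-3):
    """Keep only controls whose weight exceeds a (hypothetical) threshold."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    kept = [(v, w) for v, w in zip(values, weights) if w > threshold]
    return [v for v, _ in kept], [w for _, w in kept]


def weighted_mean(values, weights):
    """Weighted mean of the control measurements."""
    total = sum(weights)
    if total == 0:
        raise ValueError("weights sum to zero")
    return sum(v * w for v, w in zip(values, weights)) / total


if __name__ == "__main__":
    # Made-up control measurements and window weights, for illustration only.
    values = [1.0, 1.2, 1.1, 3.0, 2.9]
    weights = [1.0, 0.9, 0.8, 1e-6, 0.0]

    kept_values, kept_weights = prune_controls(values, weights)
    print(len(values), "controls before,", len(kept_values), "after pruning")
    print(weighted_mean(values, weights), weighted_mean(kept_values, kept_weights))
```

In this toy example the weighted mean is almost unchanged after pruning, because the removed controls contributed almost nothing to it. That is the intuition behind results staying stable when new controls arrive far outside an existing window.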
Figure 1. Scatterplots from two chart pages on the IMPC website showing the implementation of the soft windowing approach. The parameter measured is HDL-cholesterol in wildtype (orange) and knockout (blue) mice for Far2 (left panel, chart page here) and Dym (right panel, chart page here). Circles and triangles in the scatterplots represent individual female and male mice, respectively.
Data release 15.1 is now available, and the full notes can be found here.