Data archiving strategy
Data for the International Mouse Phenotyping Consortium (IMPC) is generated at geographically distributed phenotyping centers. This data is then sent to the IMPC Data Coordination Center (DCC) at MRC Harwell for aggregation and quality control checking. Periodically, the data is released from the IMPC DCC to the IMPC Central Data Archive (CDA).
The IMPC CDA stores the data and maintains a web portal and APIs for data access. The released IMPC data can be accessed through the web portal at https://www.mousephenotype.org or through several APIs, including the Raw data API and the Genotype-phenotype API. The portal and the APIs serve the current data release of the IMPC production process. As new data and new analysis methods become available, new IMPC data releases are deployed, and the previous data release is then available only from an archive.
For long-term storage, the IMPC data archive is hosted on the European Bioinformatics Institute FTP server. A directory is created for each release.
Depending on the release, the archive includes:
- A set of reports focusing on different aspects of the data
- A set of genotype-phenotype CSV formatted files with full results, useful for end users or for programmatic access to the data
- A full copy of the MySQL database used to generate the indexes required to serve the web portal and APIs
- A full copy of the Solr cores required to run the web portal
- A Docker container encapsulating the IMPC web portal and data
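
As an illustration of the programmatic access that the genotype-phenotype CSV files allow, the following is a minimal Python sketch that groups phenotype terms by gene using the standard csv module. The column names (marker_symbol, phenotype_term, p_value), the gene names, and the sample rows are hypothetical placeholders, not the actual header or content of the archived files; check the header of the file in the release directory you are using and adjust the column arguments accordingly.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample standing in for a genotype-phenotype CSV file.
# The column names and rows are assumptions for illustration only;
# they are not taken from the real archive.
SAMPLE_CSV = """marker_symbol,phenotype_term,p_value
GeneA,decreased body weight,0.00001
GeneA,abnormal heart morphology,0.002
GeneB,decreased body weight,0.0003
"""


def phenotypes_by_gene(csv_file, gene_column="marker_symbol",
                       term_column="phenotype_term"):
    """Group phenotype terms by gene from an open CSV file object."""
    grouped = defaultdict(list)
    # DictReader uses the first row of the file as the column names.
    for row in csv.DictReader(csv_file):
        grouped[row[gene_column]].append(row[term_column])
    return dict(grouped)


# In-memory text is used here so the sketch runs without any download.
result = phenotypes_by_gene(io.StringIO(SAMPLE_CSV))
for gene, terms in sorted(result.items()):
    print(gene, "->", ", ".join(terms))
```

To run the same function on a file downloaded from a release directory, pass an open file handle instead of the in-memory sample, for example open(path, newline="") where path points to the local copy.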