The International Mouse Phenotype Consortium (IMPC) is composed of several international research institutions. At each institute, mutant mice are phenotyped and the data produced is sent to a Data Coordination Center (DCC). Once the data has passed a set of Quality Control (QC) checks, the data is statistically analyzed.
Each institute has its own operating standards and must track specimen and phenotype data to allow downstream analysis. To support this, the DCC defined a method to uniquely identify a phenotyping effort. One key data point used for this is the colony ID. A colony ID groups the mutants phenotyped for a single allele on a single background strain, and it is associated with an allele, gene and background strain for display and dissemination. Each center can use a different scheme for generating colony IDs, but every colony ID is unique across the IMPC effort.
Internally, the IMPC uses the colony ID (among other attributes) to assemble mutant and control data points into an appropriate data set for analysis. If the analysis registers a significant change in the mutant due to the genotype effect, the IMPC generates an association between the mutant gene and a Mammalian Phenotype (MP) ontology term specific to the measured parameter, as defined in IMPReSS.
The genotype-phenotype REST API provides programmatic access to the genotype-to-phenotype associations that the IMPC produces. The API also includes data from legacy projects (EuroPhenome and MGP legacy) which have been analyzed using the IMPC statistical pipeline. There are many ways to get information about the MP terms associated with the different knockout (KO) genes. You can select data per:
The genotype-phenotype REST API provides the fields described in the table below. Each field may be used to restrict the set of data you wish to retrieve. The full Solr select syntax is available for querying the REST API. See http://wiki.apache.org/solr/SolrQuerySyntax and http://wiki.apache.org/solr/CommonQueryParameters for a more complete list of query options.
|doc_id||int||the unique ID of the document|
|mp_term_id||string||the term ID of the associated mammalian phenotype term|
|mp_term_name||string||the term name of the associated mammalian phenotype term|
|top_level_mp_term_id||string||a list of the top level term ids of the associated mammalian phenotype term|
|top_level_mp_term_name||string||a list of the top level term names of the associated mammalian phenotype term|
|top_level_mp_term_definition||string||a list of the top level term definitions of the associated mammalian phenotype term|
|top_level_mp_term_synonym||string||a list of alternate strings for the top level term name of the associated mammalian phenotype term|
|intermediate_mp_term_id||string||a list of the intermediate level term ids of the associated mammalian phenotype term|
|intermediate_mp_term_name||string||a list of the intermediate level term names of the associated mammalian phenotype term|
|intermediate_mp_term_definition||string||a list of the intermediate level term definitions of the associated mammalian phenotype term|
|intermediate_mp_term_synonym||string||a list of alternate strings for the intermediate level term name of the associated mammalian phenotype term|
|marker_symbol||string||the associated marker symbol|
|marker_accession_id||string||the associated marker accession ID|
|colony_id||string||the colony ID|
|allele_name||string||the name of the allele|
|allele_symbol||string||the allele symbol|
|allele_accession_id||string||the allele accession ID|
|string||Deprecated. Please see genetic_background description|
|strain_accession_id||string||the background strain MGI accession ID (or IMPC ID when MGI accession is not available)|
|genetic_background||string||the background strain name of the specimen|
|phenotyping_center||string||the center at which the phenotyping was performed|
|project_external_id||string||(legacy) the identifier of the project at the phenotyping center at which the work was performed|
|project_name||string||the shortname of the project for which the phenotyping was performed|
|project_fullname||string||the full name of the project for which the phenotyping was performed|
|resource_name||string||the resource for which the phenotyping was performed|
|resource_fullname||string||the full name of the resource for which the phenotyping was performed|
|sex||string||the sex of the mutant specimens on which the association was made|
|zygosity||string||the zygosity of the mutant specimens on which the association was made|
|pipeline_name||string||the name of the IMPReSS pipeline|
|pipeline_stable_id||string||the stable ID of the IMPReSS pipeline|
|pipeline_stable_key||string||the stable key of the IMPReSS pipeline|
|procedure_name||string||the name of the IMPReSS procedure performed|
|procedure_stable_id||string||the stable ID of the IMPReSS procedure performed|
|procedure_stable_key||string||the stable key of the IMPReSS procedure performed|
|parameter_name||string||the name of the IMPReSS parameter measured|
|parameter_stable_id||string||the stable ID of the IMPReSS parameter measured|
|parameter_stable_key||string||the stable key of the IMPReSS parameter measured|
|statistical_method||string||the statistical method used to determine the P value|
|percentage_change||string||for continuous data, a standardized effect measure|
|p_value||double||the statistical significance of the association|
|effect_size||double||the size of the effect|
|external_id||string||(legacy) internal ID of the association at the phenotyping center|
This is the basic request to get all the results from the Solr service in JSON format.
A bit of explanation:
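As a rough illustration of how such a request is put together, the sketch below builds a select URL in Python without sending it. It reuses the endpoint from the curl example in this document. The match-all query `q=*:*` and the `rows` and `wt` parameters are standard Solr conventions assumed here, and the helper name `build_select_url` is hypothetical.

```python
from urllib.parse import quote, urlencode

# Endpoint taken from the curl example in this document.
BASE_URL = "http://www.ebi.ac.uk/mi/impc/solr/genotype-phenotype/select"


def build_select_url(q="*:*", rows=10, wt="json"):
    """Return a select URL (hypothetical helper; nothing is sent).

    q    -- the Solr query; '*:*' matches every document
    rows -- maximum number of documents to return
    wt   -- response writer, 'json' for JSON output
    """
    params = {"q": q, "rows": rows, "wt": wt}
    # quote() encodes spaces as %20; ':' and '*' are left readable.
    return BASE_URL + "?" + urlencode(params, quote_via=quote, safe=":*")


print(build_select_url())
```

Changing only the `q` argument is enough to move from the match-all request to a field-restricted one.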
We'll now constrain the results by adding a condition to the q (query) parameter using the marker_symbol field. For example, for Akt2, simply specify q=marker_symbol:Akt2.
curl \
  --basic \
  -X GET \
  'http://www.ebi.ac.uk/mi/impc/solr/genotype-phenotype/select?q=marker_symbol:Akt2&wt=json'
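The same request can be sketched in Python using only the standard library. The function names are hypothetical, and the `response` / `docs` keys assume the usual layout of a Solr JSON response rather than anything specific to this service. The network call is left commented out so the snippet runs offline.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import urlopen

# Endpoint taken from the curl example above.
BASE_URL = "http://www.ebi.ac.uk/mi/impc/solr/genotype-phenotype/select"


def marker_query_url(marker_symbol):
    """Build the select URL for one marker symbol (hypothetical helper)."""
    params = {"q": f"marker_symbol:{marker_symbol}", "wt": "json"}
    return BASE_URL + "?" + urlencode(params, quote_via=quote, safe=":")


def fetch_docs(url):
    """Send the request and return the list of matching documents."""
    with urlopen(url) as response:
        payload = json.load(response)
    # Assumption: standard Solr JSON keeps the hits under response -> docs.
    return payload["response"]["docs"]


print(marker_query_url("Akt2"))
# docs = fetch_docs(marker_query_url("Akt2"))  # uncomment to query the live service
```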
Now we constrain the results by adding a condition to the q (query) parameter using the mp_term_name field. To retrieve the genotypes associated with "decreased body weight", simply specify q=mp_term_name:"decreased%20body%20weight". Note the use of %20 replacing the spaces between the words.
Alternatively, we may filter by the MP term identifier by specifying the mp_term_id:
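Both variants can be sketched with one small helper that wraps the value in double quotes, so that Solr treats it as a single phrase, and percent-encodes it. The helper name is hypothetical, and MP:0001262 is assumed here to be the identifier of "decreased body weight"; check the MP ontology for the term you need. Quoting the identifier also keeps the colon inside it from being read as a field separator.

```python
from urllib.parse import quote, urlencode

# Endpoint taken from the curl example in this document.
BASE_URL = "http://www.ebi.ac.uk/mi/impc/solr/genotype-phenotype/select"


def phrase_query_url(field, phrase):
    """Build a URL matching `field` against an exact phrase (hypothetical helper)."""
    # Double quotes make Solr treat the value as one phrase.
    params = {"q": f'{field}:"{phrase}"', "wt": "json"}
    # quote() turns spaces into %20 and double quotes into %22.
    return BASE_URL + "?" + urlencode(params, quote_via=quote, safe=":")


print(phrase_query_url("mp_term_name", "decreased body weight"))
# Assumed identifier for the same term, used for illustration only.
print(phrase_query_url("mp_term_id", "MP:0001262"))
```

Here the double quotes are sent as %22, which the server decodes to the same query as the literal quotes used in the text.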
Now we constrain the results by adding a condition to the q (query) parameter using the top_level_mp_term_name field. This also works with top_level_mp_term_id if you pass an identifier instead of the MP term name. To retrieve the genotypes associated with a nervous system phenotype, simply specify q=top_level_mp_term_name:"nervous%20system%20phenotype"
Note the use of %20 replacing the spaces between the words.
In this example, we apply a cut-off to the previous query by adding a second condition to the q (query) parameter. In Solr, you can specify a range to retrieve results. For instance, if you want P values of 0.0001 or lower, add the condition p_value:[0 TO 0.0001]; the square brackets make both ends of the range inclusive. Combined with the previous condition, this retrieves the genotypes associated with a nervous system phenotype at a P value cut-off of 0.0001.
Note the use of %20 replacing the spaces between the words, and %5b and %5d replacing the "[" and "]" characters. Alternatively, you can pass the -g flag to curl, which switches off URL globbing so that the square brackets can be written literally.
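A minimal Python sketch of the range query follows. It assumes the two conditions are joined with Solr's `AND` operator; the function name is hypothetical. The cut-off is passed as a string so that it appears in the URL exactly as written and is not reformatted in scientific notation.

```python
from urllib.parse import quote, urlencode

# Endpoint taken from the curl example in this document.
BASE_URL = "http://www.ebi.ac.uk/mi/impc/solr/genotype-phenotype/select"


def significant_hits_url(top_level_term, max_p_value):
    """Build a URL for one top-level MP term with a P value cut-off.

    max_p_value is a string such as "0.0001" (hypothetical helper).
    """
    # [a TO b] is an inclusive Solr range; AND requires both conditions.
    q = (
        f'top_level_mp_term_name:"{top_level_term}" '
        f"AND p_value:[0 TO {max_p_value}]"
    )
    params = {"q": q, "wt": "json"}
    # quote() encodes spaces as %20 and the brackets as %5B and %5D.
    return BASE_URL + "?" + urlencode(params, quote_via=quote, safe=":")


print(significant_hits_url("nervous system phenotype", "0.0001"))
```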
Now we constrain the results by adding a condition to the q (query) parameter using the phenotyping_center field. To retrieve all MP associations produced at the "WTSI" (Wellcome Trust Sanger Institute) phenotyping center, specify q=phenotyping_center:"WTSI".
Start with a simple request to get the different phenotyping resource names (EuroPhenome, MGP, IMPC). This is the basis for filtering on historical phenotyping resources like EuroPhenome or active resources like the IMPC project.
Solr queries are based on filters and facets. Facets return the distinct values of a specific field. Filters let us select which fields to retrieve or, alternatively, return all the fields of a Solr document. In this example we want to retrieve the distinct phenotyping resource names. The fields we are interested in are resource_name and resource_fullname.
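One way to ask for distinct values is sketched below. It assumes the standard Solr faceting parameters `facet=on` and `facet.field`, with `rows=0` so that only facet counts come back and no documents; the exact request used on the original page may differ, and the helper name is hypothetical.

```python
from urllib.parse import quote, urlencode

# Endpoint taken from the curl example in this document.
BASE_URL = "http://www.ebi.ac.uk/mi/impc/solr/genotype-phenotype/select"


def distinct_values_url(facet_field):
    """Build a URL that facets on one field (hypothetical helper)."""
    params = [
        ("q", "*:*"),                 # match every document
        ("rows", 0),                  # return facet counts only, no documents
        ("facet", "on"),              # switch faceting on
        ("facet.field", facet_field), # field whose distinct values we want
        ("wt", "json"),
    ]
    return BASE_URL + "?" + urlencode(params, quote_via=quote, safe=":*")


print(distinct_values_url("resource_name"))
```

Calling the helper with `resource_fullname` instead gives the full names of the same resources.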
If you look carefully at the request:
Next, we look at more advanced query parameter examples.
In this example, only the selected field changes. Use the project_name and/or project_fullname fields.
To retrieve all the phenotyping pipelines from EUMODIC, we'll use the fq (filter query) parameter to filter the query on project_name:EUMODIC. As we are only interested in the distinct pipeline names, we'll use the facet.field parameter to facet on pipeline_name.
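The combination of fq and facet.field can be sketched as follows. The `facet.mincount=1` parameter is an addition assumed here, not part of the description above; it is a standard Solr option that hides values with no matching documents after filtering. The helper name is hypothetical.

```python
from urllib.parse import quote, urlencode

# Endpoint taken from the curl example in this document.
BASE_URL = "http://www.ebi.ac.uk/mi/impc/solr/genotype-phenotype/select"


def filtered_facet_url(filter_query, facet_field):
    """Facet on one field within a filtered set of documents (hypothetical helper)."""
    params = [
        ("q", "*:*"),
        ("rows", 0),                  # facet counts only
        ("fq", filter_query),         # restrict the documents being counted
        ("facet", "on"),
        ("facet.field", facet_field),
        ("facet.mincount", 1),        # assumption: drop values with zero hits
        ("wt", "json"),
    ]
    return BASE_URL + "?" + urlencode(params, quote_via=quote, safe=":*")


print(filtered_facet_url("project_name:EUMODIC", "pipeline_name"))
```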
Again, we'll use the fq parameter to filter the query on pipeline_name, with the pipeline name in double quotes, and select procedure_name as the facet.field. Note the use of %20 replacing the spaces between the words.
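To close the loop, the sketch below builds a procedure facet request for one pipeline and shows one way to read the facet counts out of the JSON response. Several things are assumptions: the pipeline name "EUMODIC Pipeline 1" is only an example value, the function names are hypothetical, the sample payload is invented to show the shape of the data, and the flat value/count list is Solr's default JSON facet layout rather than something confirmed for this service.

```python
from urllib.parse import quote, urlencode

# Endpoint taken from the curl example in this document.
BASE_URL = "http://www.ebi.ac.uk/mi/impc/solr/genotype-phenotype/select"


def procedures_url(pipeline_name):
    """Facet on procedure_name within one pipeline (hypothetical helper)."""
    params = [
        ("q", "*:*"),
        ("rows", 0),
        # Double quotes keep a multi-word pipeline name together.
        ("fq", f'pipeline_name:"{pipeline_name}"'),
        ("facet", "on"),
        ("facet.field", "procedure_name"),
        ("wt", "json"),
    ]
    return BASE_URL + "?" + urlencode(params, quote_via=quote, safe=":*")


def facet_values(payload, field):
    """Return {value: count} for one facet field, skipping zero counts."""
    flat = payload["facet_counts"]["facet_fields"][field]
    # Default Solr JSON alternates value, count, value, count, ...
    pairs = zip(flat[0::2], flat[1::2])
    return {value: count for value, count in pairs if count > 0}


# Invented payload, for illustration only; real responses come from the service.
sample = {
    "facet_counts": {
        "facet_fields": {"procedure_name": ["Procedure A", 12, "Procedure B", 0]}
    }
}

print(procedures_url("EUMODIC Pipeline 1"))
print(facet_values(sample, "procedure_name"))
```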