REST API documentation for Genotype associated phenotype calls

The International Mouse Phenotype Consortium (IMPC) is composed of several international research institutions. At each institute, mutant mice are phenotyped and the data produced is sent to a Data Coordination Center (DCC). Once the data has passed a set of Quality Control (QC) checks, the data is statistically analyzed.

Each institute has its own operating standards and must track specimen and phenotype data to allow downstream analysis. To support this, the DCC defined a method to uniquely identify a phenotyping effort. One key data point used for this is the Colony ID, which groups the phenotyped mutants carrying a single allele on a given background strain. The Colony ID is associated with an allele, gene, and background strain for display and dissemination. Each center can use its own scheme for generating Colony IDs, but Colony IDs are unique across the IMPC effort.

Internally, the IMPC uses the Colony ID (among other attributes) to assemble mutant and control data points into an appropriate data set used for analysis. If the analysis registers a significant change in the mutant due to the genotype effect, the IMPC generates an association between the mutant gene and a mammalian phenotype ontology (MP) term specific to the measured parameter as defined in IMPReSS.

The genotype-phenotype REST API provides programmatic access to the genotype to phenotype associations that the IMPC produces. The API also includes access to data from legacy projects (EuroPhenome and MGP legacy) which have been analyzed using the IMPC statistical pipeline. There are many ways to get information about the MP terms associated with the different knockout (KO) genes. The examples in this document select data by marker, MP term, top level MP term, P value cut-off, phenotyping center, resource, project, pipeline, and procedure.

The genotype-phenotype REST API provides the fields described in the table below. Each field may be used to restrict the set of data you wish to retrieve. The full Solr select syntax is available for querying the REST API. See http://wiki.apache.org/solr/SolrQuerySyntax and http://wiki.apache.org/solr/CommonQueryParameters for a more complete list of query options.

| Field name | Datatype | Description |
|---|---|---|
| doc_id | int | the unique ID of the document |
| mp_term_id | string | the term ID of the associated mammalian phenotype term |
| mp_term_name | string | the term name of the associated mammalian phenotype term |
| top_level_mp_term_id | string | a list of the top level term ids of the associated mammalian phenotype term |
| top_level_mp_term_name | string | a list of the top level term names of the associated mammalian phenotype term |
| top_level_mp_term_definition | string | a list of the top level term definitions of the associated mammalian phenotype term |
| top_level_mp_term_synonym | string | a list of alternate strings for the top level term name of the associated mammalian phenotype term |
| intermediate_mp_term_id | string | a list of the intermediate level term ids of the associated mammalian phenotype term |
| intermediate_mp_term_name | string | a list of the intermediate level term names of the associated mammalian phenotype term |
| intermediate_mp_term_definition | string | a list of the intermediate level term definitions of the associated mammalian phenotype term |
| intermediate_mp_term_synonym | string | a list of alternate strings for the intermediate level term name of the associated mammalian phenotype term |
| marker_symbol | string | the associated marker symbol |
| marker_accession_id | string | the associated marker accession ID |
| colony_id | string | the colony ID |
| allele_name | string | the name of the allele |
| allele_symbol | string | the allele symbol |
| allele_accession_id | string | the allele accession ID |
| strain_name | string | Deprecated. Please see genetic_background description |
| strain_accession_id | string | The background strain MGI accession ID (or IMPC ID when MGI accession is not available) |
| genetic_background | string | The background strain name of the specimen |
| phenotyping_center | string | the center at which the phenotyping was performed |
| project_external_id | string | (legacy) the identifier of the project at the phenotyping center at which the work was performed |
| project_name | string | the shortname of the project for which the phenotyping was performed |
| project_fullname | string | the full name of the project for which the phenotyping was performed |
| resource_name | string | the resource for which the phenotyping was performed |
| resource_fullname | string | the full name of the resource for which the phenotyping was performed |
| sex | string | the sex of the mutant specimens on which the association was made |
| zygosity | string | the zygosity of the mutant specimens on which the association was made |
| pipeline_name | string | the name of the IMPReSS pipeline |
| pipeline_stable_id | string | the stable ID of the IMPReSS pipeline |
| pipeline_stable_key | string | the stable key of the IMPReSS pipeline |
| procedure_name | string | the name of the IMPReSS procedure performed |
| procedure_stable_id | string | the stable ID of the IMPReSS procedure performed |
| procedure_stable_key | string | the stable key of the IMPReSS procedure performed |
| parameter_name | string | the name of the IMPReSS parameter measured |
| parameter_stable_id | string | the stable ID of the IMPReSS parameter measured |
| parameter_stable_key | string | the stable key of the IMPReSS parameter measured |
| statistical_method | string | the statistical method used to determine the P value |
| percentage_change | string | for continuous data, a standardized effect measure |
| p_value | double | the statistical significance of the association |
| effect_size | double | the size of the effect |
| external_id | string | (legacy) internal ID of the association at the phenotyping center |

Retrieve all genotype-phenotype associations

This basic request matches every document in the Solr service and returns the first 10 results in JSON format (you can open the link in a browser):

        http://www.ebi.ac.uk/mi/impc/solr/genotype-phenotype/select?q=*:*&rows=10&wt=json&indent=1
        

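A bit of explanation: q=*:* matches every document, rows=10 limits the response to 10 documents, wt=json selects the JSON response format, and indent=1 pretty-prints the output.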
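
The same request can be assembled programmatically. The following is a minimal Python sketch, not part of the IMPC tooling: the helper name build_select_url is hypothetical, and the start parameter (the offset of the first returned document in Solr) is included on the assumption that you may want to page through results.

```python
from urllib.parse import urlencode

# Base URL copied from the examples in this document.
BASE_URL = "http://www.ebi.ac.uk/mi/impc/solr/genotype-phenotype/select"


def build_select_url(query="*:*", rows=10, start=0):
    """Return a select URL; urlencode percent-encodes every value."""
    params = {"q": query, "rows": rows, "start": start, "wt": "json", "indent": 1}
    return BASE_URL + "?" + urlencode(params)


url = build_select_url()
print(url)
```

The resulting URL can be opened in a browser or passed to any HTTP client. Note that urlencode also encodes the "*" and ":" characters, which Solr decodes back to *:* on receipt.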

Retrieve all genotype-phenotype associations for a specific marker using a command line tool, curl

We'll now constrain the results by adding a condition to the q (query) parameter using the specific marker_symbol field. For example, for Akt2, simply specify q=marker_symbol:Akt2

        curl \
        --basic \
        -X GET \
        'http://www.ebi.ac.uk/mi/impc/solr/genotype-phenotype/select?q=marker_symbol:Akt2&wt=json'
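
For scripted access, a rough Python equivalent of the curl call might look like the sketch below. The function names are hypothetical, and the sample payload is hand-written in the standard Solr JSON layout (response, numFound, docs) rather than taken from the live service.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "http://www.ebi.ac.uk/mi/impc/solr/genotype-phenotype/select"


def fetch_marker_associations(marker_symbol, rows=10):
    """Query the service for one marker symbol and return the parsed JSON."""
    params = {"q": f"marker_symbol:{marker_symbol}", "rows": rows, "wt": "json"}
    with urlopen(BASE_URL + "?" + urlencode(params)) as response:
        return json.load(response)


def summarize(payload):
    """Return (total hit count, list of (MP term name, P value)) from a Solr reply."""
    body = payload["response"]
    pairs = [(doc.get("mp_term_name"), doc.get("p_value")) for doc in body["docs"]]
    return body["numFound"], pairs


# Hand-written sample in the standard Solr JSON layout, not real IMPC data.
sample = {"response": {"numFound": 1, "start": 0, "docs": [
    {"marker_symbol": "Akt2", "mp_term_name": "decreased body weight", "p_value": 1e-5}
]}}
print(summarize(sample))
```

Against the live service, summarize(fetch_marker_associations("Akt2")) would apply the same parsing to the real response.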
        

Retrieve all genotype-phenotype associations for a specific MP term

Now we constrain the results by adding a condition to the q (query) parameter using the specific mp_term_name field. To retrieve the genotypes associated with "decreased body weight", simply specify q=mp_term_name:"decreased%20body%20weight". Note the use of %20 replacing the spaces between the words.

        http://www.ebi.ac.uk/mi/impc/solr/genotype-phenotype/select?q=mp_term_name:"decreased%20body%20weight"&wt=json&indent=1
        

Alternatively, we may filter by the MP term identifier by specifying the mp_term_id:

        http://www.ebi.ac.uk/mi/impc/solr/genotype-phenotype/select?q=mp_term_id:"MP:0001262"&wt=json&indent=1
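
Encoding phrase queries by hand is error-prone, so a small helper can do it. This is a sketch under the assumption that values are matched as quoted Solr phrases; phrase_query is a hypothetical name. It produces the fully encoded form (%22 for the double quote, %20 for the space).

```python
from urllib.parse import quote


def phrase_query(field, value):
    """Build field:"value" and percent-encode it for use in the q parameter."""
    # Escape backslashes and double quotes so the value stays one Solr phrase.
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    # Keep ':' and '/' readable; spaces become %20 and quotes become %22.
    return quote(f'{field}:"{escaped}"', safe=":/")


print(phrase_query("mp_term_name", "decreased body weight"))
print(phrase_query("mp_term_id", "MP:0001262"))
```

The returned string can be appended after "select?q=" in the URLs shown above.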
        

Retrieve all genotype-phenotype associations for a top level MP term

Now we constrain the results by adding a condition to the q (query) parameter using the specific top_level_mp_term_name field. This also works with top_level_mp_term_id if you pass an identifier instead of the MP term name. To retrieve the genotypes associated with "growth/size/body region phenotype", simply specify q=top_level_mp_term_name:"growth/size/body%20region%20phenotype"

Note the use of %20 replacing the spaces between "body", "region", and "phenotype".

        http://www.ebi.ac.uk/mi/impc/solr/genotype-phenotype/select?q=top_level_mp_term_name:"growth/size/body%20region%20phenotype"&wt=json&indent=1
        

Retrieve all genotype-phenotype associations with a P value cut-off

In this example, we apply a cut-off to the previous query by adding a condition to the q (query) parameter. In Solr, you can specify a range to retrieve results. For instance, if you want P values at or below 0.00005, you can add the condition p_value:[0 TO 0.00005] (square brackets make both bounds inclusive). The request below retrieves the genotypes associated with a growth/size/body region phenotype with a P value cut-off of 0.00005.

Note the use of %20 replacing the spaces between the words, and %5b and %5d replacing the "[" and "]" characters.

        http://www.ebi.ac.uk/mi/impc/solr/genotype-phenotype/select?q=top_level_mp_term_name:%22growth/size/body%20region%20phenotype%22%20AND%20p_value:%5b0%20TO%200.00005%5d&wt=json&indent=1
        
Alternatively, you can leave the [ and ] characters unencoded and pass the -g (--globoff) flag to curl, which stops curl from interpreting the brackets as a URL range:
        http://www.ebi.ac.uk/mi/impc/solr/genotype-phenotype/select?q=top_level_mp_term_name:%22growth/size/body%20region%20phenotype%22%20AND%20p_value:[0%20TO%200.00005]&wt=json&indent=1
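
The combined condition can also be built and encoded in code. The sketch below is illustrative only: p_value_at_most and encode_query are hypothetical helpers, and the cut-off is passed as a string so that it appears in the URL exactly as written.

```python
from urllib.parse import quote


def p_value_at_most(cutoff):
    """Return an inclusive Solr range clause; cutoff is a string such as "0.00005"."""
    return f"p_value:[0 TO {cutoff}]"


def encode_query(query):
    """Percent-encode a Solr query, leaving ':' and '/' readable as in the examples."""
    return quote(query, safe=":/")


query = ('top_level_mp_term_name:"growth/size/body region phenotype"'
         " AND " + p_value_at_most("0.00005"))
print(encode_query(query))
```

Python emits uppercase hex digits (%5B, %5D), which are equivalent to the lowercase %5b and %5d used above.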
        

Retrieve all genotype-phenotype associations for a specific phenotyping center

Now we constrain the results by adding a condition to the q (query) parameter using the specific phenotyping_center field. To retrieve all MP associations to the "WTSI" (Wellcome Trust Sanger Institute) phenotyping center, specify q=phenotyping_center:"WTSI"

        http://www.ebi.ac.uk/mi/impc/solr/genotype-phenotype/select?q=phenotyping_center:"WTSI"&wt=json&indent=1
        

Get the phenotyping resource names

Start with a simple request to get the different phenotyping resource names (EuroPhenome, MGP, IMPC). This will be the basis for filtering on historical phenotyping resources like EuroPhenome or active resources like the IMPC project.

Solr queries are based on filters and facets. Using facets enables the retrieval of the distinct values of a specific field. The fl (field list) parameter sub-selects the fields to retrieve; without it, all the fields of a Solr document are returned. In this example we want to retrieve the distinct phenotyping resource names. The fields we are interested in are resource_name and resource_fullname.

        http://www.ebi.ac.uk/mi/impc/solr/genotype-phenotype/select/?q=*:*&version=2.2&start=0&rows=0&indent=on&wt=json&fl=resource_name&fl=resource_fullname&facet=on&facet.field=resource_fullname&facet.field=resource_name
        

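If you look carefully at the request: rows=0 suppresses the documents themselves so that only the facet counts are returned, facet=on enables faceting, each facet.field parameter names a field whose distinct values are counted, and fl lists the fields that would be returned for any documents.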
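
In Solr's default JSON output, each faceted field comes back as a flat list that alternates value and count. The sketch below turns that list into a dictionary; facet_counts is a hypothetical helper, and the sample counts are invented for illustration, not real IMPC figures.

```python
def facet_counts(payload, field):
    """Convert Solr's flat [value, count, value, count, ...] list into a dict."""
    flat = payload["facet_counts"]["facet_fields"][field]
    return dict(zip(flat[0::2], flat[1::2]))


# Invented sample in Solr's default facet layout, not real IMPC counts.
sample = {"facet_counts": {"facet_fields": {
    "resource_name": ["EuroPhenome", 12, "MGP", 7, "IMPC", 3]
}}}
print(facet_counts(sample, "resource_name"))
```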

Next, we look at more advanced query parameter examples.

Retrieve all the phenotyping projects

In this example, only the selected field changes. Use the project_name and/or project_fullname fields.

        http://www.ebi.ac.uk/mi/impc/solr/genotype-phenotype/select/?q=*:*&version=2.2&start=0&rows=0&indent=on&wt=json&fl=project_name&facet=on&facet.field=project_name
        

Retrieve all pipelines from a specific project

To retrieve all the phenotyping pipelines from EUMODIC, we'll use the fq (filter query) parameter to filter the query on project_name:EUMODIC. As we are only interested in the distinct pipeline names, we'll use the facet.field parameter to facet on pipeline_name.

        http://www.ebi.ac.uk/mi/impc/solr/genotype-phenotype/select?q=*:*&fq=project_name:EUMODIC&rows=0&fl=project_name,pipeline_name&facet=on&facet.field=pipeline_name&facet.mincount=1&wt=json&indent=1
        

Retrieve all procedures from a specific pipeline

Again, we'll use the fq parameter to filter the query on pipeline_name using double-quotes and select the facet.field called procedure_name.

Note the use of  %20 replacing the spaces between the words.
        http://www.ebi.ac.uk/mi/impc/solr/genotype-phenotype/select?q=*:*&fq=pipeline_name:"EUMODIC%20Pipeline%201"&rows=0&fl=procedure_name,pipeline_name&facet=on&facet.field=procedure_name&facet.mincount=1&wt=json&indent=1
        

Retrieve all parameters from a specific procedure which produced an MP call

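Here we filter on both pipeline_name and procedure_name with two fq parameters and facet on parameter_name; facet.limit=-1 removes the cap on the number of facet values returned. Note the use of %20 replacing the spaces between the words.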

        http://www.ebi.ac.uk/mi/impc/solr/genotype-phenotype/select?q=*:*&fq=pipeline_name:"EUMODIC%20Pipeline%201"&fq=procedure_name:"Calorimetry"&rows=0&fl=procedure_name,parameter_name&facet=on&facet.field=parameter_name&facet.mincount=1&facet.limit=-1&wt=json&indent=1
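
Requests that repeat the fq parameter are easiest to build from a list of key/value pairs. The following sketch assumes each filter is a quoted exact match; facet_url is a hypothetical helper name.

```python
from urllib.parse import quote, urlencode

BASE_URL = "http://www.ebi.ac.uk/mi/impc/solr/genotype-phenotype/select"


def facet_url(facet_field, filters):
    """Build a facet-only request with one fq parameter per (field, value) filter."""
    params = [
        ("q", "*:*"),
        ("rows", 0),
        ("facet", "on"),
        ("facet.field", facet_field),
        ("facet.mincount", 1),
        ("facet.limit", -1),
        ("wt", "json"),
    ]
    params += [("fq", f'{field}:"{value}"') for field, value in filters]
    # quote_via=quote encodes spaces as %20, matching the URLs in this document.
    return BASE_URL + "?" + urlencode(params, quote_via=quote)


url = facet_url("parameter_name", [
    ("pipeline_name", "EUMODIC Pipeline 1"),
    ("procedure_name", "Calorimetry"),
])
print(url)
```

Using a list of tuples rather than a dictionary is what allows the fq key to appear more than once.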
        

Retrieve all MP calls grouped by top level MP terms first and then by resources (MGP, EuroPhenome)

        http://www.ebi.ac.uk/mi/impc/solr/genotype-phenotype/select/?q=*:*&version=2.2&start=0&rows=0&indent=on&wt=json&fq=-resource_name:%22IMPC%22&fl=top_level_mp_term_name&facet=on&facet.pivot=top_level_mp_term_name,resource_name
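
In this request, fq=-resource_name:"IMPC" excludes the IMPC resource, and facet.pivot nests the resource_name counts inside each top level MP term. Solr returns pivot facets as nested objects with field, value, count, and pivot keys; the sketch below flattens them into rows. The helper name pivot_rows is hypothetical and the sample counts are invented.

```python
def pivot_rows(payload, pivot_key):
    """Flatten a two-level Solr pivot facet into (outer value, inner value, count) rows."""
    rows = []
    for outer in payload["facet_counts"]["facet_pivot"][pivot_key]:
        for inner in outer.get("pivot", []):
            rows.append((outer["value"], inner["value"], inner["count"]))
    return rows


# Invented sample in Solr's pivot facet layout, not real IMPC counts.
sample = {"facet_counts": {"facet_pivot": {"top_level_mp_term_name,resource_name": [
    {"field": "top_level_mp_term_name",
     "value": "growth/size/body region phenotype",
     "count": 5,
     "pivot": [
         {"field": "resource_name", "value": "EuroPhenome", "count": 3},
         {"field": "resource_name", "value": "MGP", "count": 2},
     ]},
]}}}
for row in pivot_rows(sample, "top_level_mp_term_name,resource_name"):
    print(row)
```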