Help & Documentation

Retrieve phenotype data

The genotype-phenotype REST API provides programmatic access to the genotype to phenotype associations that the IMPC produces. The API also includes access to data from legacy projects (EuroPhenome and MGP legacy), which have been analyzed using the IMPC statistical pipeline.

When a statistical analysis registers a significant change in the mutant due to the genotype effect, the IMPC generates an association between the mutant gene and a mammalian phenotype ontology (MP) term specific to the measured parameter as defined in IMPReSS. Each individual data point is associated with an IMPReSS parameter. Parameters are organised into procedures, and procedures are organised into pipelines.

There are many ways to get information about the MP terms associated with the different knockout genes. You can select data per:

  • phenotyping center (e.g., UCD, Wellcome Trust Sanger Institute, JAX, etc.)
  • phenotyping procedure or parameter (e.g., Procedure IMPC_VIA_001; Parameter IMPC_VIA_001_001; Parameter IMPC_ABR_006_001). For a full list of procedures and parameters, see the phenotyping pipelines at IMPReSS.
  • allele name or MGI allele ID
  • background strain name or MGI strain ID
  • gene symbol or MGI gene ID
  • any combination of these fields

The genotype-phenotype REST API provides the fields described in the table below. Each field may be used to restrict the set of data you wish to retrieve. The full Solr select syntax is available for querying the REST API.

See the Solr Wiki, Solr Query Syntax and Common Query Parameters, for a more complete list of query options.

| Field name | Datatype | Description |
|---|---|---|
| doc_id | int | the unique ID of the document |
| mp_term_id | string | the term ID of the associated mammalian phenotype term |
| mp_term_name | string | the term name of the associated mammalian phenotype term |
| top_level_mp_term_id | string | a list of the top level term ids of the associated mammalian phenotype term |
| top_level_mp_term_name | string | a list of the top level term names of the associated mammalian phenotype term |
| top_level_mp_term_definition | string | a list of the top level term definitions of the associated mammalian phenotype term |
| top_level_mp_term_synonym | string | a list of alternate strings for the top level term name of the associated mammalian phenotype term |
| intermediate_mp_term_id | string | a list of the intermediate level term ids of the associated mammalian phenotype term |
| intermediate_mp_term_name | string | a list of the intermediate level term names of the associated mammalian phenotype term |
| intermediate_mp_term_definition | string | a list of the intermediate level term definitions of the associated mammalian phenotype term |
| intermediate_mp_term_synonym | string | a list of alternate strings for the intermediate level term name of the associated mammalian phenotype term |
| marker_symbol | string | the associated marker symbol |
| marker_accession_id | string | the associated marker accession ID |
| colony_id | string | the colony ID |
| allele_name | string | the name of the allele |
| allele_symbol | string | the allele symbol |
| allele_accession_id | string | the allele accession ID |
| strain_name | string | Deprecated. Please see genetic_background description |
| strain_accession_id | string | The background strain MGI accession ID (or IMPC ID when MGI accession is not available) |
| genetic_background | string | The background strain name of the specimen |
| phenotyping_center | string | the center at which the phenotyping was performed |
| project_external_id | string | (legacy) the identifier of the project at the phenotyping center at which the work was performed |
| project_name | string | the shortname of the project for which the phenotyping was performed |
| project_fullname | string | the full name of the project for which the phenotyping was performed |
| resource_name | string | the resource for which the phenotyping was performed |
| resource_fullname | string | the full name of the resource for which the phenotyping was performed |
| sex | string | the sex of the mutant specimens on which the association was made |
| zygosity | string | the zygosity of the mutant specimens on which the association was made |
| pipeline_name | string | the name of the IMPReSS pipeline |
| pipeline_stable_id | string | the stable ID of the IMPReSS pipeline |
| pipeline_stable_key | string | the stable key of the IMPReSS pipeline |
| procedure_name | string | the name of the IMPReSS procedure performed |
| procedure_stable_id | string | the stable ID of the IMPReSS procedure performed |
| procedure_stable_key | string | the stable key of the IMPReSS procedure performed |
| parameter_name | string | the name of the IMPReSS parameter measured |
| parameter_stable_id | string | the stable ID of the IMPReSS parameter measured |
| parameter_stable_key | string | the stable key of the IMPReSS parameter measured |
| statistical_method | string | the statistical method used to determine the P value |
| percentage_change | string | for continuous data, a standardized effect measure |
| p_value | double | the statistical significance of the association |
| effect_size | double | the size of the effect |
| external_id | string | (legacy) internal ID of the association at the phenotyping center |

 

Retrieve all genotype-phenotype associations

This is the basic request to get all the results from the Solr service in JSON format. Open this link in a browser:

https://www.ebi.ac.uk/mi/impc/solr/genotype-phenotype/select?q=*:*&rows=10&wt=json&indent=1

Note that, in the above query, the number of rows has been limited to 10 with the rows parameter.

A bit of explanation:

  • genotype-phenotype is the name of the Solr core service to query
  • select is the method used to query the Solr REST interface
  • q=*:* queries everything without any filtering on any field
  • rows limits the number of results returned
  • wt=json is the response format (try “csv” or “xml” instead of “json”)
  • indent=1 or indent=true indents the output into a more human-readable form
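The same request can be assembled programmatically. The following is a minimal sketch using only the Python standard library; the helper name build_select_url is hypothetical and not part of the API. It percent-encodes every parameter value, so characters such as * and : appear as %2A and %3A in the resulting URL, which decodes back to the same query.

```python
from urllib.parse import quote, urlencode

# Base URL of the genotype-phenotype core, copied from the examples on this page.
BASE_URL = "https://www.ebi.ac.uk/mi/impc/solr/genotype-phenotype/select"


def build_select_url(query="*:*", rows=10, wt="json", **extra):
    """Return a select URL (hypothetical helper, not part of the API).

    Extra keyword arguments (for example indent=1) are appended as-is.
    quote_via=quote encodes spaces as %20, matching the URLs on this page.
    """
    params = {"q": query, "rows": rows, "wt": wt}
    params.update(extra)
    return BASE_URL + "?" + urlencode(params, quote_via=quote)


# Builds the URL only; no request is sent.
print(build_select_url(indent=1))
```

Passing wt="csv" or wt="xml" switches the response format in the same way as editing the URL by hand.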

 

Retrieve all genotype-phenotype associations for a specific marker using curl in the command line

We’ll now constrain the results by adding a condition to the q (query) parameter using the specific marker_symbol field. For example, for Akt2, simply specify q=marker_symbol:Akt2

curl \
--basic \
-X GET \
'https://www.ebi.ac.uk/mi/impc/solr/genotype-phenotype/select?q=marker_symbol:Akt2&wt=json'
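For scripts, the curl call above could be reproduced in Python roughly as follows. This is a sketch under stated assumptions: the function names marker_query_url and fetch_docs are hypothetical, the rows argument is added for illustration, and the code relies on the standard Solr JSON layout in which the matching documents sit under response.docs.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import urlopen

# Base URL copied from the examples on this page.
BASE_URL = "https://www.ebi.ac.uk/mi/impc/solr/genotype-phenotype/select"


def marker_query_url(marker_symbol, rows=10):
    """Build the select URL for one marker symbol (hypothetical helper)."""
    params = {"q": f"marker_symbol:{marker_symbol}", "rows": rows, "wt": "json"}
    return BASE_URL + "?" + urlencode(params, quote_via=quote)


def fetch_docs(url, timeout=30):
    """Send the GET request and return the list of matching documents.

    Assumes the standard Solr JSON layout: {"response": {"docs": [...]}}.
    """
    with urlopen(url, timeout=timeout) as response:
        payload = json.load(response)
    return payload["response"]["docs"]


# Example (needs network access):
#   for doc in fetch_docs(marker_query_url("Akt2")):
#       print(doc.get("mp_term_name"), doc.get("p_value"))
```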

 

Retrieve all genotype-phenotype associations for a specific MP term

Now we constrain the results by adding a condition to the q (query) parameter using the specific mp_term_name field. To retrieve the genotypes associated with “decreased body weight”, simply specify q=mp_term_name:"decreased%20body%20weight"

Note the use of %20 replacing the spaces between the words, and the straight double quotes that keep the words together as one phrase.

https://www.ebi.ac.uk/mi/impc/solr/genotype-phenotype/select?q=mp_term_name:"decreased%20body%20weight"&wt=json&indent=1

Alternatively, we may filter by the MP term identifier by specifying the mp_term_id:

https://www.ebi.ac.uk/mi/impc/solr/genotype-phenotype/select?q=mp_term_id:"MP:0001262"&wt=json&indent=1
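When the term name or identifier comes from a variable, the quoting and encoding can be left to code. The sketch below uses hypothetical helpers (field_phrase and select_url): it wraps the value in double quotes so Solr treats it as one phrase, then percent-encodes the whole query. Spaces become %20 as in the URLs above; the quotes and colons are also encoded (%22, %3A), which is equivalent.

```python
from urllib.parse import quote, urlencode

# Base URL copied from the examples on this page.
BASE_URL = "https://www.ebi.ac.uk/mi/impc/solr/genotype-phenotype/select"


def field_phrase(field, value):
    """Return field:"value", escaping backslashes and quotes inside the value."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'{field}:"{escaped}"'


def select_url(query):
    """Build an indented JSON select URL for an already-assembled q string."""
    params = {"q": query, "wt": "json", "indent": 1}
    return BASE_URL + "?" + urlencode(params, quote_via=quote)


# Builds the two URLs only; no request is sent.
print(select_url(field_phrase("mp_term_name", "decreased body weight")))
print(select_url(field_phrase("mp_term_id", "MP:0001262")))
```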

 

Retrieve all genotype-phenotype associations for a top level MP term

Now we constrain the results by adding a condition to the q (query) parameter using the specific top_level_mp_term_name field. This also works with top_level_mp_term_id if you pass an identifier instead of the MP term name. To retrieve the genotypes associated with the top level term “growth/size/body region phenotype”, simply specify q=top_level_mp_term_name:"growth/size/body%20region%20phenotype"

Note the use of %20 replacing the spaces between “body”, “region” and “phenotype”.

https://www.ebi.ac.uk/mi/impc/solr/genotype-phenotype/select?q=top_level_mp_term_name:"growth/size/body%20region%20phenotype"&wt=json&indent=1

 

Retrieve all genotype-phenotype associations with a P value cut-off

In this example, we apply a cut-off to the previous query by adding a second condition to the q (query) parameter. In Solr, you can specify a range of values to retrieve: for instance, the condition p_value:[0 TO 0.00005] matches P values from 0 to 0.00005 inclusive. The query below retrieves the genotypes associated with a growth/size/body region phenotype with a P value cut-off of 0.00005.

Note the use of %20 replacing the spaces between the words, and %5b and %5d replacing the [ and ] characters.

https://www.ebi.ac.uk/mi/impc/solr/genotype-phenotype/select?q=top_level_mp_term_name:%22growth/size/body%20region%20phenotype%22%20AND%20p_value:%5b0%20TO%200.00005%5d&wt=json&indent=1

Alternatively, when using curl, you can leave the [ and ] characters unencoded by passing the -g flag, which switches off curl's URL globbing:

https://www.ebi.ac.uk/mi/impc/solr/genotype-phenotype/select?q=top_level_mp_term_name:%22growth/size/body%20region%20phenotype%22%20AND%20p_value:[0%20TO%200.00005]&wt=json&indent=1
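Combining a phrase condition with a range condition can be sketched as below. The helper names are hypothetical. The cut-off is passed as a string so that it reaches Solr exactly as written (Python would render the float 0.00005 as 5e-05). The encoder takes care of the brackets, so neither manual %5b/%5d substitution nor the -g flag is needed.

```python
from urllib.parse import quote, urlencode

# Base URL copied from the examples on this page.
BASE_URL = "https://www.ebi.ac.uk/mi/impc/solr/genotype-phenotype/select"


def significant_query(top_level_term, cutoff, lower="0"):
    """Join a top level MP term phrase and an inclusive P value range with AND.

    cutoff and lower are strings so the values are sent exactly as typed.
    """
    term = f'top_level_mp_term_name:"{top_level_term}"'
    p_range = f"p_value:[{lower} TO {cutoff}]"
    return f"{term} AND {p_range}"


def select_url(query):
    """Build an indented JSON select URL for an already-assembled q string."""
    params = {"q": query, "wt": "json", "indent": 1}
    return BASE_URL + "?" + urlencode(params, quote_via=quote)


# Builds the URL only; no request is sent.
query = significant_query("growth/size/body region phenotype", "0.00005")
print(select_url(query))
```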

 

Retrieve all genotype-phenotype associations for a specific phenotyping center

Now we constrain the results by adding a condition to the q (query) parameter using the specific phenotyping_center field. To retrieve all MP associations for the “WTSI” (Wellcome Trust Sanger Institute) phenotyping center, specify q=phenotyping_center:"WTSI"

https://www.ebi.ac.uk/mi/impc/solr/genotype-phenotype/select?q=phenotyping_center:"WTSI"&wt=json&indent=1

 

Get the phenotyping resource names

Start with a simple request to get the different phenotyping resource names (EuroPhenome, MGP, IMPC). This will be the basis for filtering on historical phenotyping resources like EuroPhenome or active resources like the IMPC project.

Solr queries can be refined with facets and field lists. Facets return the distinct values of a specific field, with a count for each value. The field list (fl) selects which fields of each Solr document are returned; when it is omitted, all the fields are returned. In this example we want to retrieve the distinct phenotyping resource names. The fields we are interested in are resource_name and resource_fullname.

https://www.ebi.ac.uk/mi/impc/solr/genotype-phenotype/select/?q=*:*&start=0&rows=0&indent=on&wt=json&fl=resource_name&fl=resource_fullname&facet=on&facet.field=resource_fullname&facet.field=resource_name

If you look carefully at the request:

  • parameter fl is the field list: it restricts the fields returned for each document to resource_name and resource_fullname
  • parameter rows=0 suppresses the list of documents, so the response contains only the facet counts
  • parameter facet=on means we want to have faceted results
  • each facet.field parameter asks for the distinct values of that field (here resource_fullname and resource_name), with the number of documents for each value
  • parameter q is the query parameter. q=*:* means we don’t want any text matching and want the facets computed over all documents.
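In the JSON response, the facet values appear under facet_counts.facet_fields. Assuming the service uses Solr's default JSON list style (json.nl=flat), each facet is a flat list that alternates value and count. The sketch below turns that list into a dictionary; the helper name is hypothetical and the counts in the sample payload are invented placeholders, not real IMPC figures.

```python
def facet_to_dict(payload, field):
    """Convert Solr's flat [value, count, value, count, ...] list to a dict."""
    flat = payload["facet_counts"]["facet_fields"][field]
    return dict(zip(flat[0::2], flat[1::2]))


# Hand-written sample in the shape of a Solr response.
# The counts are placeholders for illustration only.
SAMPLE_PAYLOAD = {
    "response": {"numFound": 6, "start": 0, "docs": []},
    "facet_counts": {
        "facet_fields": {
            "resource_name": ["IMPC", 3, "EuroPhenome", 2, "MGP", 1],
        }
    },
}

for name, count in facet_to_dict(SAMPLE_PAYLOAD, "resource_name").items():
    print(name, count)
```

With a live response, pass the parsed JSON document in place of SAMPLE_PAYLOAD.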

Next, we look at more advanced query parameter examples.

Retrieve all the phenotyping projects

In this example, only the selected field changes. Use the project_name and/or project_fullname fields.

https://www.ebi.ac.uk/mi/impc/solr/genotype-phenotype/select/?q=*:*&start=0&rows=0&indent=on&wt=json&fl=project_name&facet=on&facet.field=project_name

 

Retrieve all pipelines from a specific project

To retrieve all the phenotyping pipelines from EUMODIC, we’ll use the fq (filter query) parameter to filter the query on project_name:EUMODIC. As we are only interested in the distinct pipeline names, we’ll use the facet.field parameter to facet on pipeline_name. Setting facet.mincount=1 drops the values that have no matching documents.

https://www.ebi.ac.uk/mi/impc/solr/genotype-phenotype/select?q=*:*&fq=project_name:EUMODIC&rows=0&fl=project_name,pipeline_name&facet=on&facet.field=pipeline_name&facet.mincount=1&wt=json&indent=1

 

Retrieve all procedures from a specific pipeline

Again, we’ll use the fq parameter to filter the query on pipeline_name, enclosing the name in double quotes, and set facet.field to procedure_name.

Note the use of  %20 replacing the spaces between the words.

https://www.ebi.ac.uk/mi/impc/solr/genotype-phenotype/select?q=*:*&fq=pipeline_name:"EUMODIC%20Pipeline%201"&rows=0&fl=procedure_name,pipeline_name&facet=on&facet.field=procedure_name&facet.mincount=1&wt=json&indent=1

 

Retrieve all parameters from a specific procedure which produced an MP call

Note the use of  %20 replacing the spaces between the words.

https://www.ebi.ac.uk/mi/impc/solr/genotype-phenotype/select?q=*:*&fq=pipeline_name:"EUMODIC%20Pipeline%201"&fq=procedure_name:"Calorimetry"&rows=0&fl=procedure_name,parameter_name&facet=on&facet.field=parameter_name&facet.mincount=1&facet.limit=-1&wt=json&indent=1
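The three drill-down requests above (pipelines, then procedures, then parameters) differ only in their fq filters and in the faceted field. A sketch of a helper that captures this pattern follows; facet_url is a hypothetical name. Because fq may be repeated, the parameters are built as a list of pairs rather than a dictionary.

```python
from urllib.parse import quote, urlencode

# Base URL copied from the examples on this page.
BASE_URL = "https://www.ebi.ac.uk/mi/impc/solr/genotype-phenotype/select"


def facet_url(facet_field, filters=()):
    """Build a facet-only request (rows=0) with one fq parameter per filter."""
    params = [
        ("q", "*:*"),
        ("rows", 0),              # no documents, only facet counts
        ("facet", "on"),
        ("facet.field", facet_field),
        ("facet.mincount", 1),    # hide values without matching documents
        ("facet.limit", -1),      # return every facet value
        ("wt", "json"),
    ]
    params += [("fq", condition) for condition in filters]
    return BASE_URL + "?" + urlencode(params, quote_via=quote)


# Builds the three URLs only; no request is sent.
pipelines = facet_url("pipeline_name", ["project_name:EUMODIC"])
procedures = facet_url("procedure_name", ['pipeline_name:"EUMODIC Pipeline 1"'])
parameters = facet_url(
    "parameter_name",
    ['pipeline_name:"EUMODIC Pipeline 1"', 'procedure_name:"Calorimetry"'],
)
print(parameters)
```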

 

Retrieve all MP calls grouped by top level MP terms first and then by resources (MGP, EuroPhenome)

https://www.ebi.ac.uk/mi/impc/solr/genotype-phenotype/select/?q=*:*&start=0&rows=0&indent=on&wt=json&fq=-resource_name:%22IMPC%22&fl=top_level_mp_term_name&facet=on&facet.pivot=top_level_mp_term_name,resource_name
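In this request, fq=-resource_name:"IMPC" excludes the IMPC resource, and facet.pivot nests the resource counts inside each top level MP term. Solr returns pivot results under facet_counts.facet_pivot as a list of entries, each with value, count and a nested pivot list. The sketch below flattens that structure into rows; the helper name is hypothetical and the sample counts are invented placeholders, not real IMPC figures.

```python
def pivot_rows(payload, pivot_key):
    """Flatten a two-level facet.pivot result into (parent, child, count) tuples."""
    rows = []
    for parent in payload["facet_counts"]["facet_pivot"][pivot_key]:
        # Each parent entry carries its children in a nested "pivot" list.
        for child in parent.get("pivot", []):
            rows.append((parent["value"], child["value"], child["count"]))
    return rows


PIVOT_KEY = "top_level_mp_term_name,resource_name"

# Hand-written sample in the shape of a Solr pivot response.
# The counts are placeholders for illustration only.
SAMPLE_PAYLOAD = {
    "facet_counts": {
        "facet_pivot": {
            PIVOT_KEY: [
                {
                    "field": "top_level_mp_term_name",
                    "value": "growth/size/body region phenotype",
                    "count": 5,
                    "pivot": [
                        {"field": "resource_name", "value": "MGP", "count": 3},
                        {"field": "resource_name", "value": "EuroPhenome", "count": 2},
                    ],
                }
            ]
        }
    }
}

for term, resource, count in pivot_rows(SAMPLE_PAYLOAD, PIVOT_KEY):
    print(term, resource, count)
```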
