Validation Documentation

Introduction


What is this API

This service is a complete, standalone version of the new Validation module included in version 2.0 of our data capturing and processing tool, also known as the Crawler.

How it works

The service validates single xml files, not zip files.

Whenever a request is made it is added to a temporary queue awaiting processing.

You can submit multiple requests but the files will be queued and processed one at a time.

The results are stored in memory.

By default the service runs on port 4567; this can be changed in the properties file.

How to use it

  1. To validate an xml file, submit it to the /validation/submit endpoint, as described further in this document.
  2. Monitor the progress of your files using the /validation/status endpoint, which returns the number of files in the queue and the number of results already collected.
  3. Finally, get the actual results using the /validation/results endpoint. You can either get all results or filter them by filename, as described further in this document.
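
The three steps above correspond to three URLs. The following is a minimal Python sketch of how a client might build them. The `BASE_URL` value (a local instance on the default port) and the helper names are assumptions made for illustration; they are not part of the service.

```python
from urllib.parse import urlencode

# Assumed base URL: a local instance listening on the default port 4567.
BASE_URL = "http://localhost:4567"


def submit_url(file_type, filename, base=BASE_URL):
    """Step 1: URL to POST the xml file to."""
    query = urlencode({"filename": filename})
    return f"{base}/validation/submit/{file_type}?{query}"


def status_url(base=BASE_URL):
    """Step 2: URL that reports the queue and the collected results."""
    return f"{base}/validation/status"


def results_url(filename=None, base=BASE_URL):
    """Step 3: URL for all results, or only those for one filename."""
    url = f"{base}/validation/results"
    if filename is not None:
        url += "?" + urlencode({"filename": filename})
    return url
```

Building the query string with `urlencode` keeps filenames containing reserved characters from breaking the URL.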

Requirements


  • Java 8 JRE
  • A minimum of 1GB of RAM; 4GB or more is suggested, depending on the number and size of files submitted
  • A properties file with at least a username and password for the iMits API. If you do not have such an account, please contact iMits
  • It is recommended that you specify your own path to a directory in which the requests will be temporarily cached. Otherwise the files will be stored in the executable's current working directory.

Properties


| property key | default value | use |
|---|---|---|
| port.number | 4567 | Which port the service should run on |
| storage.location | ./ | Full path to a location where the requests will be temporarily stored on disk before being processed. Files are deleted immediately after validation |
| inactive.wait | 600 | How long to wait (in seconds) before shutting down the service if no file has been processed within this time |
| maximum.results | 100 | The maximum number of results to maintain; excess results will be discarded |
| imits.username | (none) | An iMits username – Mandatory |
| imits.password | (none) | Password for imits.username – Mandatory |
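
As an illustration of how these keys fit together, the following Python sketch renders the text of a properties file from the defaults listed above. It assumes the usual Java `key=value` properties syntax, which this document does not spell out. The function name and the convention of writing overrides with `_` in place of `.` are hypothetical, and the sketch does not escape special characters in values.

```python
# Defaults as listed in the properties table; the iMits keys have no default.
DEFAULTS = {
    "port.number": "4567",
    "storage.location": "./",
    "inactive.wait": "600",
    "maximum.results": "100",
}


def render_properties(username, password, **overrides):
    """Return properties-file text (assumed key=value syntax).

    Override keys use '_' instead of '.', e.g. port_number=8080.
    """
    if not username or not password:
        raise ValueError("imits.username and imits.password are mandatory")
    values = dict(DEFAULTS)
    for key, value in overrides.items():
        dotted = key.replace("_", ".")
        if dotted not in DEFAULTS:
            raise KeyError(f"unknown property: {dotted}")
        values[dotted] = str(value)
    values["imits.username"] = username
    values["imits.password"] = password
    return "\n".join(f"{k}={v}" for k, v in values.items()) + "\n"
```

For example, `render_properties("my-user", "my-password", storage_location="/tmp/validation")` yields a file that keeps the default port but caches requests under `/tmp/validation`; the credentials shown are placeholders.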

Running the executable


The commands below assume that the executable and the properties file are in the same directory; otherwise you will need to provide the full path to the properties file.

# UNIX/Linux
java -jar phenodcc-validation-api.jar -p $(pwd)/phenodcc-validation-api.properties
# Windows
java -jar phenodcc-validation-api.jar -p %CD%\phenodcc-validation-api.properties

Status


| id | status | notes |
|---|---|---|
| 0 | Success | When a submission is successfully added to the queue |
| 1 | Failure | When :fileType is incorrect, or the filename or the content of the file is empty |
| 100 | No errors | But it can still contain warnings |
| 200 | Contains validation errors | At least one error was detected in the submitted xml |
| 300 | Contains XSD errors | The file contains structural errors like wrong xml tags or incorrect values for fields like centreId etc. No further validation will be done if a file fails because of XSD errors |
| 500 | Unknown status | Not related to validation. May be encountered if unexpected behaviour occurs within the service but not because of the validation of the request |
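
A client will usually want to translate these ids into something readable. The following Python sketch is one way to do that, copying the ids and messages from the table above; the function names are hypothetical and the fallback text for unlisted ids is an assumption of this sketch.

```python
# Status ids and messages copied from the status table.
STATUS_MESSAGES = {
    0: "Success",
    1: "Failure",
    100: "No errors",
    200: "Contains validation errors",
    300: "Contains XSD errors",
    500: "Unknown status",
}


def describe_status(status):
    """Return 'id: message'; ids missing from the table get a fallback text."""
    message = STATUS_MESSAGES.get(status, "not listed in the status table")
    return f"{status}: {message}"


def has_no_errors(status):
    """True only for status 100; such a file may still carry warnings."""
    return status == 100
```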

Level


The level of an issue detected during validation.

  • Errors will prevent data from being published.
  • Warnings will not prevent the data from being published but indicate that similar data have a higher chance of failing in the future when/if validation rules become stricter.

| id | level | notes |
|---|---|---|
| 0 | error | One or more validation errors will give a file a status of 200 – Contains validation errors |
| 1 | warning | A file may contain warnings but still be valid (100 – No errors) |

Calls


GET /validation/strain/list

Returns an array of all strains accepted by the DCC; only these strains will pass validation.

Example request using curl

# request
$ curl -X GET http://www.mousephenotype.org:4567/validation/strain/list

Example response

[
    {
        "strainId": 15,
        "strain": "B6Brd;B6Dnk;B6N-Tyr<c-Brd>",
        "mgiStrainId": "MGI:5446362"
    },
    {
        "strainId": 18,
        "strain": "C57BL/6N",
        "mgiStrainId": "MGI:2159965"
    },
    ...
]
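
Since only listed strains pass validation, a client could check strain names locally before submitting a file. The following Python sketch assumes the response body has the shape shown above and that comparing on the `strain` field is appropriate; this document does not say which field the validator matches on, and the function name is hypothetical.

```python
import json


def accepted_strain_names(response_text):
    """Collect the 'strain' field of every entry in a strain list response."""
    return {entry["strain"] for entry in json.loads(response_text)}
```

With the set in hand, a test such as `"C57BL/6N" in names` is enough to flag an unlisted strain early.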

GET /validation/status

  • pending: number of files in the queue
  • queueList: list of file-names in the queue
  • currentFile: the file-name of the file currently being processed
  • processed: number of files that have been processed (results)
  • resultList: list of files that have been processed (results) including the timestamp of when the file finished processing, the file-name and the status. To see issues and further details use the /validation/results endpoint

Example request using curl

# request
$ curl -X GET http://www.mousephenotype.org:4567/validation/status

Example response

{
    "pending": 2,
    "queueList": [
        "file4.xml",
        "file5.xml"
    ],
    "currentFile": "file3.xml",
    "processed": 2,
    "resultList": [
        {
            "timeStamp": "1543490038622",
            "fileName": "file1.xml",
            "status": 200
        },
        {
            "timeStamp": "1543490040352",
            "fileName": "file2.xml",
            "status": 200
        }
    ]
}
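
The fields above are enough to tell where a given file currently is. The following Python sketch classifies a filename from an already parsed status response (a `dict` shaped like the example). The function name and the returned labels are hypothetical. Queue membership is checked before the result list because a filename that was processed earlier may have been submitted again.

```python
def file_state(status, filename):
    """Classify a filename as 'processing', 'queued', 'processed' or 'unknown'."""
    if status.get("currentFile") == filename:
        return "processing"
    if filename in status.get("queueList", []):
        return "queued"
    # resultList entries carry timeStamp, fileName and status.
    if any(r["fileName"] == filename for r in status.get("resultList", [])):
        return "processed"
    return "unknown"
```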

GET /validation/results?[filename={xxx}.xml]

Returns the results of all validated files or, if the filename option is used, all results for the given filename. If a filename was submitted multiple times, a list of all instances for that filename is returned.

  • Each result has a timestamp as its unique id
  • file: the filename that was used during the submission
  • result
    • status: one of the available statuses as defined in Status
    • statusMessage: human readable form of the status
    • logs[]: array holding all warnings and errors produced during validation
  • logs[]
    • Validation error logs – status 200: parameterType and parameterKey are populated when the log relates to an error or warning for a parameter that exists and is present in the file. For example, logs about missing mandatory parameters or unknown parameters will not have a parameterType or parameterKey.
      • specimenId: null for line level procedures
      • centreId: as defined in the xml <centre> tag
      • phenotypingCentre: null for experiment and line procedures
      • colonyId: null for experiment and line procedures. Can be null for specimens
      • procedureId: null for specimens
      • sequenceId: null for specimens. Can be null for experiment or line procedures
      • issues[]
        • parameterType: values defined in the XSD
        • parameterKey: values defined in IMPReSS
        • message: the validation error or warning
        • level: defines if this is an error or warning, values as defined in Level
        • specimenId: null for line level procedures
    • XSD error logs – status 300
      • message: The XSD error message
      • lineNumber: The number of the line where the error was detected

Example request using curl

# request
$ curl -X GET 'http://www.mousephenotype.org:4567/validation/results?filename=procedure.xml'

Example response with validation warnings

{
    "1542380690444": {
        "file": "procedure.xml",
        "result": {
            "status": 100,
            "statusMessage": "No errors",
            "logs": [
                {
                    "specimenId": "ABC123",
                    "centreId": "TEST",
                    "phenotypingCentre": "TEST",
                    "colonyId": null,
                    "procedureId": "IMPC_OFD_001",
                    "sequenceId": null,
                    "issues": [
                        {
                            "id": 1,
                            "parameterType": "SimpleParameter",
                            "parameterKey": "IMPC_OFD_011_001",
                            "message": "Badly encoded delimiter for status code IMPC_PARAMSC_005### in parameter IMPC_OFD_011_001",
                            "level": 1,
                            "specimenId": "12345678"
                        }
                    ]
                }
            ]
        }
    }
}

Example response with XSD errors

{
    "1543419612049": {
        "file": "a-procedure.xml",
        "result": {
            "status": 300,
            "statusMessage": "Contains XSD errors",
            "logs": [
                {
                    "message": "cvc-enumeration-valid: Value 'ABC' is not facet-valid with respect to enumeration '[Bcm, Gmc, H, Ics, J, Krb, Ning, Rbrc, Tcp, Ucd, Wtsi, CDTA, Crl, Kmpc, Ph, Biat, Ccpcz]'. It must be a value from the enumeration.",
                    "lineNumber": 3
                },
                {
                    "message": "cvc-attribute.3: The value 'ABC' of attribute 'centreID' on element 'centre' is not valid with respect to its type, 'CentreILARcode'.",
                    "lineNumber": 3
                }
            ]
        }
    }
}
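
The two log shapes shown above can be flattened into one list for reporting. The following Python sketch takes an already parsed results response (a `dict` keyed by timestamp) and tells validation logs from XSD logs by the presence of an `issues` array. That test, the function name and the `"xsd"` label are assumptions of this sketch, not part of the service.

```python
# Level ids as defined in the Level section.
LEVEL_NAMES = {0: "error", 1: "warning"}


def collect_issues(results):
    """Return (timestamp, file, kind, message) tuples for every logged issue."""
    rows = []
    for timestamp, entry in results.items():
        for log in entry["result"]["logs"]:
            if "issues" in log:
                # Validation log: one entry per issue, with a level id.
                for issue in log["issues"]:
                    kind = LEVEL_NAMES.get(issue["level"], "unknown")
                    rows.append((timestamp, entry["file"], kind, issue["message"]))
            else:
                # XSD log: only message and lineNumber are present.
                text = f"line {log['lineNumber']}: {log['message']}"
                rows.append((timestamp, entry["file"], "xsd", text))
    return rows
```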

POST /validation/submit/:fileType?filename={xxx}.xml

Submits a file to the validation queue.

The :fileType can be one of the following two options:

  • centre-specimen-set
  • centre-procedure-set

The filename option is mandatory and can be any alphanumeric name with a .xml extension.

The xml file itself needs to be posted as the raw body of the request. For added clarity, the Content-Type of the request can be set to application/xml.

Example request using curl

The @ symbol before the file path tells curl to read the request body from that file; the --data-binary option sends the content unmodified.

# request
$ curl -X POST -H 'Content-type: application/xml' --data-binary @/path/to/file/some-file.xml 'http://www.mousephenotype.org:4567/validation/submit/centre-specimen-set?filename=some-file.xml'

Example response

{
    "status":0,
    "statusMessage":"File some-file.xml added to queue",
    "logs":[]
}
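
The same submission can be prepared from Python with the standard library only. The following sketch builds the request without sending it. The function name and the `base_url` parameter are hypothetical, and the local checks mirror the failure conditions listed under Status; the checks performed by the service itself may differ.

```python
from urllib.parse import urlencode
from urllib.request import Request

# The two file types accepted by /validation/submit/:fileType.
FILE_TYPES = ("centre-specimen-set", "centre-procedure-set")


def build_submit_request(base_url, file_type, filename, xml_bytes):
    """Build (but do not send) a POST request carrying the raw xml body."""
    if file_type not in FILE_TYPES:
        raise ValueError(f"fileType must be one of {FILE_TYPES}")
    if not filename or not xml_bytes:
        raise ValueError("filename and file content must not be empty")
    query = urlencode({"filename": filename})
    url = f"{base_url}/validation/submit/{file_type}?{query}"
    return Request(
        url,
        data=xml_bytes,
        headers={"Content-Type": "application/xml"},
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would send it; the response body is the JSON shown above.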

Queuing and filenames

  1. A request replaces any previous request with the same filename that is still in the queue.
  2. After a request is processed its results will remain available and not be overridden by any future request.
  3. Any future request under the same filename will be added to the queue as described in point 1.

Therefore the use of unique filenames for each request is strongly encouraged.
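
One simple way to obtain unique filenames is to append a random suffix to the original name before submitting. The following Python sketch does this with a UUID. The function name is hypothetical and the naming scheme is only a suggestion, not something the service requires.

```python
import uuid
from pathlib import PurePath


def unique_filename(original):
    """Return the original stem plus a random hex suffix and a .xml extension."""
    stem = PurePath(original).stem
    return f"{stem}{uuid.uuid4().hex}.xml"
```

The caller needs to remember the generated name, since the /validation/results endpoint filters on the filename used at submission.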


Copyright 2018-2019 Medical Research Council Harwell, Licensed under the Apache License, Version 2.0