Showing posts with label wsdl. Show all posts

16 November 2011

"VCF annotation" with the NHLBI GO Exome Sequencing Project (JAX-WS)

The NHLBI Exome Sequencing Project (ESP) has released a web service to query their data. "The goal of the NHLBI GO Exome Sequencing Project (ESP) is to discover novel genes and mechanisms contributing to heart, lung and blood disorders by pioneering the application of next-generation sequencing of the protein coding regions of the human genome across diverse, richly-phenotyped populations and to share these datasets and findings with the scientific community to extend and enrich the diagnosis, management and treatment of heart, lung and blood disorders."
In the current post, I'll show how I've used this web service to annotate a VCF file with this information.
The web service provided by the ESP is based on the SOAP protocol.
Here is an example of the XML response:
We can generate the Java classes for a client invoking this web service by using ${JAVA_HOME}/bin/wsimport:

$ wsimport -keep "http://evs.gs.washington.edu/wsEVS/EVSDataQueryService?wsdl"

parsing WSDL...
generating code...
compiling code...

Here is the Java code running this client. It scans the VCF, calls the web service for each variation and inserts the annotation as JSON in a new column.
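As a rough sketch of that scanning loop only (this is not the original client): the SOAP call is replaced here by a hypothetical lookup function, and the class and method names are assumptions made for illustration. Meta lines pass through unchanged, the column header gets an extra EVS column, and every variant line gets the annotation appended.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.function.BiFunction;

// Hypothetical sketch: the real client would call the stubs generated by wsimport.
class VcfAnnotatorSketch {
    /** Returns the VCF line with one extra tab-separated annotation column. */
    static String annotate(String line, BiFunction<String, Integer, String> lookup) {
        if (line.startsWith("##")) return line;           // meta-information: unchanged
        if (line.startsWith("#")) return line + "\tEVS";  // column header: name the new column
        String[] tokens = line.split("\t");
        String chrom = tokens[0];
        int pos = Integer.parseInt(tokens[1]);
        return line + "\t" + lookup.apply(chrom, pos);
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the web service call (assumption): echoes the queried position as JSON.
        BiFunction<String, Integer, String> fakeService =
            (chrom, pos) -> "{\"chromosome\":\"" + chrom + "\",\"start\":" + pos + "}";
        BufferedReader in = new BufferedReader(new InputStreamReader(System.in));
        String line;
        while ((line = in.readLine()) != null) {
            System.out.println(annotate(line, fakeService));
        }
    }
}
```

Passing the lookup as a function keeps the VCF handling testable without a network connection.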
... and the makefile:

Result (some columns have been cut)

curl -s "ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20100804/supporting/EUR.2of4intersection_allele_freq.20100804.sites.vcf.gz" |\
 gunzip -c |\
 java -jar evsclient.jar 



##fileformat=VCFv4.0
##filedat=20101112
##datarelease=20100804
##samples=629
##description="Where BI calls are present, genotypes and alleles are from BI.  In there absence, UM genotypes are used.  If neither are available, no genotype information is present and the alleles are from the NCBI calls."
(...)
#CHROM POS ID EVS
1 10469 rs117577454 {"start":10469,"chromosome":"1","stop":10470,"strand":"+","snpList":[],"setOfSiteCoverageInfo":[]}
1 10583 rs58108140 {"start":10583,"chromosome":"1","stop":10584,"strand":"+","snpList":[],"setOfSiteCoverageInfo":[]}
1 11508 . {"start":11508,"chromosome":"1","stop":11509,"strand":"
(...)
1 69511 . {"start":69511,"chromosome":"1","stop":69512,"strand":"+","snpList":[{"chromosome":"1","conservationScore":"1.0","conservationScoreGERP":"0.5","refAllele":"A","ancestralAllele":"G","filters":"PASS","clinicalLink":"unknown","positionString":"1:69511","chrPosition":69511,"alleles":"G/A","uaAlleleCounts":"1373/47","aaAlleleCounts":"880/600","totalAlleleCounts":"2253/647","uaAlleleAndCount":"G=1373/A=47","aaAlleleAndCount":"G=880/A=600","totalAlleleAndCount":"G=2253/A=647","uaMAF":3.3099,"aaMAF":40.5405,"totalMAF":22.3103,"avgSampleReadDepth":185,"geneList":"OR4F5","snpFunction":{"chromosome":"1","position":69511,"conservationScore":"1.0","conservationScoreGERP":"0.5","snpFxnList":[{"mrnaAccession":"NM_001005484","fxnClassGVS":"missense","aminoAcids":"THR,ALA","proteinPos":"141/306","cdnaPos":421,"pphPrediction":"benign","granthamScore":"58"}],"refAllele":"A","ancestralAllele":"G","firstRsId":75062661,"secondRsId":0,"filters":"PASS","clinicalLink":"unknown"},"altAlleles":"G","hasAtLeastOneAccession":"true","rsIds":"rs75062661"}],"setOfSiteCoverageInfo":[{"chromosome":"1","position":69511,"avgSampleReadDepth":185.0,"totalSamplesCovered":1452,"eaSamplesCovered":712,"avgEaSampleReadDepth":157.0,"aaSamplesCovered":740,"avgAaSampleReadDepth":211.0},{"chromosome":"1","position":69512,"avgSampleReadDepth":180.0,"totalSamplesCovered":1501,"eaSamplesCovered":739,"avgEaSampleReadDepth":153.0,"aaSamplesCovered":762,"avgAaSampleReadDepth":207.0}]}
(...)
1 901923 . {"start":901923,"chromosome":"1","stop":901924,"strand":"+","snpList":[{"chromosome":"1","conservationScore":"1.0","conservationScoreGERP":"5.0","refAllele":"C","ancestralAllele":"C","filters":"PASS","clinicalLink":"unknown","positionString":"1:901923","chrPosition":901923,"alleles":"A/C","uaAlleleCounts":"2/2542","aaAlleleCounts":"52/1934","totalAlleleCounts":"54/4476","uaAlleleAndCount":"A=2/C=2542","aaAlleleAndCount":"A=52/C=1934","totalAlleleAndCount":"A=54/C=4476","uaMAF":0.0786,"aaMAF":2.6183,"totalMAF":1.1921,"avgSampleReadDepth":35,"geneList":"PLEKHN1","snpFunction":{"chromosome":"1","position":901923,"conservationScore":"1.0","conservationScoreGERP":"5.0","snpFxnList":[{"mrnaAccession":"NM_032129","fxnClassGVS":"missense","aminoAcids":"SER,ARG","proteinPos":"4/612","cdnaPos":12,"pphPrediction":"probably-damaging","granthamScore":"110"}],"refAllele":"C","ancestralAllele":"C","firstRsId":0,"secondRsId":0,"filters":"PASS","clinicalLink":"unknown"},"altAlleles":"A","hasAtLeastOneAccession":"true","rsIds":"none"}],"setOfSiteCoverageInfo":[{"chromosome":"1","position":901923,"avgSampleReadDepth":35.0,"totalSamplesCovered":2280,"eaSamplesCovered":1272,"avgEaSampleReadDepth":32.0,"aaSamplesCovered":1008,"avgAaSampleReadDepth":38.0},{"chromosome":"1","position":901924,"avgSampleReadDepth":35.0,"totalSamplesCovered":2283,"eaSamplesCovered":1273,"avgEaSampleReadDepth":32.0,"aaSamplesCovered":1010,"avgAaSampleReadDepth":38.0}]}
1 902069 rs116147894 {"start":902069,"chromosome":"1","stop":902070,"strand":"+","snpList":[{"chromosome":"1","conservationScore":"0.0","conservationScoreGERP":"1.0","refAllele":"T","ancestralAllele":"T","filters":"PASS","clinicalLink":"unknown","positionString":"1:902069","chrPosition":902069,"alleles":"C/T","uaAlleleCounts":"2/320","aaAlleleCounts":"18/212","totalAlleleCounts":"20/532","uaAlleleAndCount":"C=2/T=320","aaAlleleAndCount":"C=18/T=212","totalAlleleAndCount":"C=20/T=532","uaMAF":0.6211,"aaMAF":7.8261,"totalMAF":3.6232,"avgSampleReadDepth":13,"geneList":"PLEKHN1","snpFunction":{"chromosome":"1","position":902069,"conservationScore":"0.0","conservationScoreGERP":"1.0","snpFxnList":[{"mrnaAccession":"NM_032129","fxnClassGVS":"intron","aminoAcids":"none","proteinPos":"NA","cdnaPos":-1,"pphPrediction":"unknown","granthamScore":"NA"}],"refAllele":"T","ancestralAllele":"T","firstRsId":0,"secondRsId":0,"filters":"PASS","clinicalLink":"unknown"},"altAlleles":"C","hasAtLeastOneAccession":"true","rsIds":"none"}],"setOfSiteCoverageInfo":[{"chromosome":"1","position":902069,"avgSampleReadDepth":13.0,"totalSamplesCovered":304,"eaSamplesCovered":169,"avgEaSampleReadDepth":13.0,"aaSamplesCovered":135,"avgAaSampleReadDepth":12.0},{"chromosome":"1","position":902070,"avgSampleReadDepth":12.0,"totalSamplesCovered":338,"eaSamplesCovered":190,"avgEaSampleReadDepth":13.0,"aaSamplesCovered":148,"avgAaSampleReadDepth":12.0}]}
1 902108 rs62639981 {"start":902108,"chromosome":"1","stop":902109,"strand":"+","snpList":[{"chromosome":"1","conservationScore":"0.0","conservationScoreGERP":"-8.7","refAllele":"C","ancestralAllele":"unknown","filters":"PASS","clinicalLink":"unknown","positionString":"1:902108","chrPosition":902108,"alleles":"T/C","uaAlleleCounts":"5/333","aaAlleleCounts":"0/248","totalAlleleCounts":"5/581","uaAlleleAndCount":"T=5/C=333","aaAlleleAndCount":"T=0/C=248","totalAlleleAndCount":"T=5/C=581","uaMAF":1.4793,"aaMAF":0.0,"totalMAF":0.8532,"avgSampleReadDepth":13,"geneList":"PLEKHN1","snpFunction":{"chromosome":"1","position":902108,"conservationScore":"0.0","conservationScoreGERP":"-8.7","snpFxnList":[{"mrnaAccession":"NM_032129","fxnClassGVS":"coding-synonymous","aminoAcids":"none","proteinPos":"36/612","cdnaPos":108,"pphPrediction":"unknown","granthamScore":"NA"}],"refAllele":"C","ancestralAllele":"unknown","firstRsId":62639981,"secondRsId":0,"filters":"PASS","clinicalLink":"unknown"},"altAlleles":"T","hasAtLeastOneAccession":"true","rsIds":"rs62639981"}],"setOfSiteCoverageInfo":[{"chromosome":"1","position":902108,"avgSampleReadDepth":13.0,"totalSamplesCovered":294,"eaSamplesCovered":170,"avgEaSampleReadDepth":13.0,"aaSamplesCovered":124,"avgAaSampleReadDepth":13.0},{"chromosome":"1","position":902109,"avgSampleReadDepth":13.0,"totalSamplesCovered":309,"eaSamplesCovered":177,"avgEaSampleReadDepth":13.0,"aaSamplesCovered":132,"avgAaSampleReadDepth":13.0}]}
(...)
That's it
Pierre

05 January 2011

Coding a CXF web service translating a DNA to a protein. My notebook

Apache CXF is a Web Services framework. In this post, I'll describe how I implemented a web service translating a DNA sequence to a protein using the web server Apache Tomcat and the CXF libraries.

Defining the interface

First a simple java interface bio.Translate is needed to describe the service. This simple service receives a string (the dna) and returns a string (the peptide). The annotations will be used by CXF to name the parameters in the WSDL file (see later):

Implementing the service

bio.TranslateImpl implements bio.Translate. The setter/getter for ncbiString will be used by a configuration file to specify a genetic code (standard, mitochondrial) for this service. The methods initIt and cleanUp could be used to acquire and to release some resources for the service when it is created and/or disposed.
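A minimal sketch of the translation step, leaving out the web-service annotations and the Spring wiring. It assumes that ncbiString holds a 64-character NCBI genetic-code string with the bases ordered T, C, A, G; the class name, the helper names and the 'X' placeholder for unknown bases are choices made for this illustration, not taken from the original bio.TranslateImpl.

```java
// Hypothetical sketch, not the original bio.TranslateImpl.
class TranslateSketch {
    // NCBI translation table 1 (standard) and table 2 (vertebrate mitochondrial), base order TCAG.
    static final String STANDARD =
        "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG";
    static final String VERTEBRATE_MITOCHONDRIAL =
        "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSS**VVVVAAAADDEEGGGG";

    /** Maps a base to its rank in the TCAG order, or -1 for anything else. */
    static int baseIndex(char c) {
        switch (Character.toUpperCase(c)) {
            case 'T': case 'U': return 0;
            case 'C': return 1;
            case 'A': return 2;
            case 'G': return 3;
            default: return -1;
        }
    }

    /** Translates the DNA codon by codon; a trailing incomplete codon is ignored. */
    static String translate(String dna, String ncbiString) {
        StringBuilder peptide = new StringBuilder(dna.length() / 3);
        for (int i = 0; i + 2 < dna.length(); i += 3) {
            int b1 = baseIndex(dna.charAt(i));
            int b2 = baseIndex(dna.charAt(i + 1));
            int b3 = baseIndex(dna.charAt(i + 2));
            if (b1 < 0 || b2 < 0 || b3 < 0) {
                peptide.append('X'); // unknown base in the codon
            } else {
                peptide.append(ncbiString.charAt(b1 * 16 + b2 * 4 + b3));
            }
        }
        return peptide.toString();
    }

    public static void main(String[] args) {
        String dna = args.length > 0 ? args[0] : "ATGGCCTGA";
        System.out.println(translate(dna, STANDARD));
        System.out.println(translate(dna, VERTEBRATE_MITOCHONDRIAL));
    }
}
```

Keeping the genetic code as a plain string is what makes it easy to inject through a setter from a configuration file: one bean per table, same class.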

Configuring the service

CXF uses the libraries of the Spring framework (I blogged about Spring in a previous post). An XML config file, beans.xml, makes it easy to configure two Java beans for the 'standard genetic code' and the 'mitochondrial code'. In this config file, we also tell Spring about the two methods initIt and cleanUp. Those two beans will be used by two web services.

Defining the CXF application for Tomcat

The following web.xml file only tells Tomcat, the web server, to use the CXFServlet to listen for the SOAP queries.

Compile & Deploy

Installing a CXF web service requires many libraries and, in the end, the size of the deployed 'war' file was 8.5 MB (!). The current structure of the project is:
./translate/WEB-INF/classes/bio/TranslateImpl.java
./translate/WEB-INF/classes/bio/Translate.java
./translate/WEB-INF/beans.xml
./translate/WEB-INF/web.xml
The service was compiled and deployed using the following Makefile:
cxf.lib=apache-cxf-2.3.1/lib
all:
	mkdir -p translate/WEB-INF/lib
	javac -d translate/WEB-INF/classes -sourcepath translate/WEB-INF/classes translate/WEB-INF/classes/bio/TranslateImpl.java
	cp ${cxf.lib}/cxf-2.3.1.jar \
		${cxf.lib}/geronimo-activation_1.1_spec-1.1.jar \
		${cxf.lib}/geronimo-annotation_1.0_spec-1.1.1.jar \
		${cxf.lib}/geronimo-javamail_1.4_spec-1.7.1.jar \
		${cxf.lib}/geronimo-servlet_3.0_spec-1.0.jar \
		${cxf.lib}/geronimo-ws-metadata_2.0_spec-1.1.3.jar \
		${cxf.lib}/geronimo-jaxws_2.2_spec-1.0.jar \
		${cxf.lib}/geronimo-stax-api_1.0_spec-1.0.1.jar \
		${cxf.lib}/jaxb-api-2.2.1.jar \
		${cxf.lib}/jaxb-impl-2.2.1.1.jar \
		${cxf.lib}/neethi-2.0.4.jar \
		${cxf.lib}/saaj-api-1.3.jar \
		${cxf.lib}/saaj-impl-1.3.2.jar \
		${cxf.lib}/wsdl4j-1.6.2.jar \
		${cxf.lib}/XmlSchema-1.4.7.jar \
		${cxf.lib}/xml-resolver-1.2.jar \
		${cxf.lib}/aopalliance-1.0.jar \
		${cxf.lib}/spring-core-3.0.5.RELEASE.jar \
		${cxf.lib}/spring-beans-3.0.5.RELEASE.jar \
		${cxf.lib}/spring-context-3.0.5.RELEASE.jar \
		${cxf.lib}/spring-web-3.0.5.RELEASE.jar \
		${cxf.lib}/commons-logging-1.1.1.jar \
		${cxf.lib}/spring-asm-3.0.5.RELEASE.jar \
		${cxf.lib}/spring-expression-3.0.5.RELEASE.jar \
		${cxf.lib}/spring-aop-3.0.5.RELEASE.jar \
		translate/WEB-INF/lib
	jar cvf translate.war -C translate .
	mv translate.war path-to-tomcat/webapps

Checking the URL

We can check that the service was correctly deployed by pointing a web browser at http://localhost:8080/translate/, where we can see the two services:
Available SOAP services:
Translate
  • translate
Endpoint address: http://localhost:8080/translate/translateMit
WSDL : {http://bio/}TranslateService
Target namespace: http://bio/
Translate
  • translate
Endpoint address: http://localhost:8080/translate/translateStd
WSDL : {http://bio/}TranslateService
Target namespace: http://bio/

Here, the URLs link to the WSDL definition for the web service:

Creating a client

To create a client consuming this service, I first used the code generated by CXF's wsdl2java, but there was a bug with one of the generated classes (it is a known bug, or 'feature'), so here I'm going to use the standard ${JAVA_HOME}/bin/wsimport.
> wsimport -p generated -d client -keep "http://localhost:8080/translate/translateStd?wsdl"
parsing WSDL...
generating code...
compiling code...
I wrote a java client MyClient.java using this generated API:

Compiling

> cd client
> javac MyClient.java

Running

> java MyClient
EFIDHSIAC*


That's it,

Pierre

17 May 2010

The poor state of the java web services for Bioinformatics

In his latest post, Brad Chapman cited Jessica Kissinger, who wished the Galaxy community could access the web services listed in http://www.biocatalogue.org/. This reminded me of a thread I started on http://biostar.stackexchange.com/: "Anyone using 'Biomart + Java Web Services' ?", where Michael Dondrup and I realized that there was poor support for the Java Web Services API in Biomart.

I wanted to test ${JAVA_HOME}/bin/wsimport against all the services in the biocatalogue: I created a small Java program using the biocatalogue API (see below) that extracts the web services having a WSDL file. Each WSDL URI was processed with ${JAVA_HOME}/bin/wsimport and I observed whether any class was generated. The wsimport '-version' was JAX-WS RI 2.1.6 in JDK 6.

The result is available as a Google spreadsheet at :

Result


Number of services: 1644
  Can't access the service, something went wrong: 6
  No WSDL: 6
  Found a WSDL: 1590


Number of services where wsimport failed to parse the WSDL: 1179 (74%)

Common Errors:
  690 : [ERROR] rpc/encoded wsdls are not supported in JAXWS 2.0.
  119 : [ERROR] undefined simple or complex type 'soapenc:Array'
  96 : [ERROR] 'EndpointReference' is already defined
  7 : [ERROR] only one "types" element allowed in "definitions"
  6 : [ERROR] undefined simple or complex type 'apachesoap:DataHandler'
  4 : [ERROR] only one of the "element" or "type" attributes is allowed in part "inDoc"
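A table like the one above could be tallied along the following lines. This is only a sketch under assumptions: it supposes that the output of each wsimport run was captured as text and that only the first "[ERROR]" line of each run is counted; the class and method names are hypothetical and this is not the program used for the survey.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: counts the first [ERROR] line found in each captured wsimport log.
class WsimportErrorTally {
    /** Returns the first line starting with "[ERROR]", or null if the log has none. */
    static String firstError(String log) {
        for (String line : log.split("\\R")) {
            String trimmed = line.trim();
            if (trimmed.startsWith("[ERROR]")) return trimmed;
        }
        return null;
    }

    /** Maps each distinct error message to the number of logs in which it came first. */
    static Map<String, Integer> tally(List<String> logs) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String log : logs) {
            String error = firstError(log);
            if (error != null) counts.merge(error, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> logs = Arrays.asList(
            "parsing WSDL...\n[ERROR] rpc/encoded wsdls are not supported in JAXWS 2.0.",
            "parsing WSDL...\ngenerating code...\ncompiling code...",
            "parsing WSDL...\n[ERROR] rpc/encoded wsdls are not supported in JAXWS 2.0.");
        tally(logs).forEach((message, n) -> System.out.println(n + " : " + message));
    }
}
```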


Number of services successfully parsed by wsimport: 411 (26%)

Count by host:


Source Code




That's it
Pierre

07 December 2009

Playing with SOAP. Implementing a WebService for the LocusTree Server

In a previous post I described the LocusTree server and showed how the wsimport command can be used to generate the Java code that will query a web service. Today, I've implemented a few web services in the LocusTree server, but I wrote the entire code generating the SOAP messages myself rather than using the Java API for XML Web Services (JAX-WS) because 1) I wanted to learn about the SOAP internals and 2) I wanted to return a large volume of data to the client by writing a stream of data rather than building an XML response and then echoing the XML tree (DOM).
Ok. In this example I'm going to describe a WebService returning a list of chromosomes for a given organism-id:

The WSDL file.

The signature of our function is something like getChromosomesByOrganismId(int orgId). In the WSDL file (the file describing our web services), the operation is called getChromosomes. The input for this function will be a tns:getChromosomes and the object returned by this function is a tns:GetChromosomesResponse. The prefix 'tns' refers to the namespace of an XML schema that will be defined later.
<portType name="LocusTree">
<operation name="getChromosomes">
<input message="tns:getChromosomes"/>
<output message="tns:GetChromosomesResponse"/>
</operation>
</portType>

We now define those two messages (input and output parameters) for this web service. The input value (the organism-id) is defined in an external XML schema as an element named 'tns:getChromosomes'. The output value (a list of chromosomes) is defined in the same external XML schema as an element named 'tns:Chromosomes'.
<message name="getChromosomes">
<part name="parameters" element="tns:getChromosomes"/>
</message>
<message name="GetChromosomesResponse">
<part name="parameters" element="tns:Chromosomes"/>
</message>

But where can we find this external schema? It is referenced in the WSDL file under the <types> element. The following 'types' element says that the schema describing our objects is available at 'http://localhost:8080//locustree/static/ws/schema.xsd':
<types>
<xsd:schema>
<xsd:import namespace="http://webservices.cephb.fr/locustree/" schemaLocation="http://localhost:8080//locustree/static/ws/schema.xsd"/>
</xsd:schema>
</types>
HTTP will be used to send and receive the SOAP messages, so we have to bind our method 'getChromosomesByOrganismId' (the operation getChromosomes) to this protocol.
<binding name="LocusTreePortBinding" type="tns:LocusTree">
<soap:binding transport="http://schemas.xmlsoap.org/soap/http"
style="document"/>
<operation name="getChromosomes">
<soap:operation/>
<input>
<soap:body use="literal"/>
</input>
<output>
<soap:body use="literal"/>
</output>
</operation>
</binding>
Finally, we tell the client where the server is located
<service name="LocusTreeService">
<port name="LocusTreeSePort" binding="tns:LocusTreePortBinding">
<soap:address location="http://localhost:8080//locustree/locustree/soap"/>
</port>
</service>

The XSD Schema

This XML schema describes the structures that will be used and returned by the server. We need to describe what is...

A Chromosome

A Chromosome is a structure holding an ID, a name, a length, an organism-id etc...
<xs:complexType name="Chromosome">
<xs:annotation>
<xs:documentation>A Chromosome</xs:documentation>
</xs:annotation>
<xs:sequence>
<xs:element name="id" type="xs:int" nillable="false" />
<xs:element name="organismId" type="xs:int" nillable="false" />
<xs:element name="name" type="xs:string" nillable="false"/>
<xs:element name="length" type="xs:int" nillable="false"/>
<xs:element name="metadata" type="xs:string"/>
</xs:sequence>
</xs:complexType>

A List of Chromosomes

.. is just a sequence of Chromosomes
<xs:complexType name="Chromosomes">
<xs:annotation>
<xs:documentation>Set of Chromosomes</xs:documentation>
</xs:annotation>
<xs:sequence>
<xs:element ref="tns:Chromosome" maxOccurs="unbounded"/>
</xs:sequence>
</xs:complexType>

getChromosomes

This is the structure that is used as a parameter for our web service. It just holds an organism-id.
<xs:complexType name="getChromosomes">
<xs:annotation>
<xs:documentation>return the chromosomes for a given organism </xs:documentation>
</xs:annotation>
<xs:sequence>
<xs:element name="organismId" type="xs:int" nillable="false">
<xs:annotation>
<xs:documentation>The Organism Id</xs:documentation>
</xs:annotation>
</xs:element>
</xs:sequence>
</xs:complexType>



All in one, here is the WSDL file:
<definitions
xmlns="http://schemas.xmlsoap.org/wsdl/"
xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:soap="http://schemas.xmlsoap.org/wsdl/soap/"
xmlns:tns="http://webservices.cephb.fr/locustree/"
targetNamespace="http://webservices.cephb.fr/locustree/"
name="LocusTreeWebServices">

<types>
<xsd:schema>
<xsd:import namespace="http://webservices.cephb.fr/locustree/" schemaLocation="http://localhost:8080//locustree/static/ws/schema.xsd"/>
</xsd:schema>
</types>
<message name="getChromosomes">
<part name="parameters" element="tns:getChromosomes"/>
</message>
<message name="GetChromosomesResponse">
<part name="parameters" element="tns:Chromosomes"/>
</message>
<portType name="LocusTree">
<operation name="getChromosomes">
<input message="tns:getChromosomes"/>
<output message="tns:GetChromosomesResponse"/>
</operation>
</portType>
<binding name="LocusTreePortBinding" type="tns:LocusTree">
<soap:binding transport="http://schemas.xmlsoap.org/soap/http" style="document"/>
<operation name="getChromosomes">
<soap:operation/>
<input>
<soap:body use="literal"/>
</input>
<output>
<soap:body use="literal"/>
</output>
</operation>
</binding>
<service name="LocusTreeService">
<port name="LocusTreeSePort" binding="tns:LocusTreePortBinding">
<soap:address location="http://localhost:8080//locustree/locustree/soap"/>
</port>
</service>
</definitions>

...and the XSD/Schema file:
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns:tns="http://webservices.cephb.fr/locustree/"
targetNamespace="http://webservices.cephb.fr/locustree/"
elementFormDefault="qualified">


<xs:annotation>
<xs:documentation>XML schema for LocusTreeWebServices</xs:documentation>
</xs:annotation>

<xs:element name="Chromosomes" type="tns:Chromosomes"/>
<xs:element name="Chromosome" type="tns:Chromosome"/>
<xs:element name="getChromosomes" type="tns:getChromosomes"/>

<xs:complexType name="getChromosomes">
<xs:annotation>
<xs:documentation>return the chromosomes for a given organism </xs:documentation>
</xs:annotation>
<xs:sequence>
<xs:element name="organismId" type="xs:int" nillable="false">
<xs:annotation>
<xs:documentation>The Organism Id</xs:documentation>
</xs:annotation>
</xs:element>
</xs:sequence>
</xs:complexType>

<xs:complexType name="Chromosomes">
<xs:annotation>
<xs:documentation>Set of Chromosomes</xs:documentation>
</xs:annotation>
<xs:sequence>
<xs:element ref="tns:Chromosome" maxOccurs="unbounded"/>
</xs:sequence>
</xs:complexType>

<xs:complexType name="Chromosome">
<xs:annotation>
<xs:documentation>A Chromosome</xs:documentation>
</xs:annotation>
<xs:sequence>
<xs:element name="id" type="xs:int" nillable="false"/>
<xs:element name="organismId" type="xs:int" nillable="false"/>
<xs:element name="name" type="xs:string" nillable="false"/>
<xs:element name="length" type="xs:int" nillable="false"/>
<xs:element name="metadata" type="xs:string"/>
</xs:sequence>
</xs:complexType>


</xs:schema>

Generating the client

The stubs on the client side are generated using the ${JAVA_HOME}/bin/wsimport command:
> wsimport -keep "http://localhost:8080/locustree/locustree/soap?wsdl"
parsing WSDL...
generating code...
compiling code...
> find fr
fr/cephb/webservices/locustree/Chromosomes.java
fr/cephb/webservices/locustree/ObjectFactory.java
fr/cephb/webservices/locustree/LocusTreeService.java
fr/cephb/webservices/locustree/LocusTree.java
fr/cephb/webservices/locustree/GetChromosomes.java
fr/cephb/webservices/locustree/Chromosome.java
(...)
> more fr/cephb/webservices/locustree/LocusTree.java
package fr.cephb.webservices.locustree;
(...)
public interface LocusTree
{
(...)
public List<Chromosome> getChromosomes(int organismId);
}
Ok, the function was successfully generated. Let's test it with a tiny java program:

file Test.java
import fr.cephb.webservices.locustree.*;

public class Test
{
public static void main(String args[])
{
LocusTreeService service=new LocusTreeService();
LocusTree locustree=service.getLocusTreeSePort();
final int organismId=36;
for(Chromosome chrom:locustree.getChromosomes(organismId))
{
System.out.println(
chrom.getId()+"\t"+
chrom.getName()+"\t"+
chrom.getOrganismId()+"\t"+
chrom.getLength()
);
}

}
}

Compiling and running:
javac -cp . Test.java
java -cp . Test
1 chr1 36 247249719
2 chr2 36 242951149
3 chr3 36 199501827
4 chr4 36 191273063
5 chr5 36 180857866
6 chr6 36 170899992
7 chr7 36 158821424
8 chr8 36 146274826
9 chr9 36 140273252
10 chr10 36 135374737
11 chr11 36 134452384
12 chr12 36 132349534
13 chr13 36 114142980
14 chr14 36 106368585
15 chr15 36 100338915
(...)

SOAP internals


Here is the XML/SOAP query for getChromosomes sent to the server via a POST query.
<S:Envelope xmlns:S="http://schemas.xmlsoap.org/soap/envelope/">
<S:Body>
<getChromosomes xmlns='http://webservices.cephb.fr/locustree/'>
<organismId>36</organismId>
</getChromosomes>
</S:Body>
</S:Envelope>
This can be checked using curl:
curl \
-X POST \
-H "Content-Type: text/xml" \
-d '<?xml version="1.0" ?><S:Envelope xmlns:S="http://schemas.xmlsoap.org/soap/envelope/"><S:Body><getChromosomes xmlns="http://webservices.cephb.fr/locustree/"><organismId>36</organismId></getChromosomes></S:Body></S:Envelope>' \
'http://localhost:8080/locustree/locustree/soap'
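The same request can also be sent from Java without any generated stub. The following is a minimal sketch using only java.net; the class and method names are hypothetical, and it assumes the endpoint accepts a plain text/xml POST, as in the curl call above.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: hand-written SOAP request, no JAX-WS involved.
class SoapPostSketch {
    /** Builds the getChromosomes envelope for the given organism id. */
    static String envelope(int organismId) {
        return "<?xml version=\"1.0\" ?>"
            + "<S:Envelope xmlns:S=\"http://schemas.xmlsoap.org/soap/envelope/\">"
            + "<S:Body>"
            + "<getChromosomes xmlns=\"http://webservices.cephb.fr/locustree/\">"
            + "<organismId>" + organismId + "</organismId>"
            + "</getChromosomes>"
            + "</S:Body>"
            + "</S:Envelope>";
    }

    /** POSTs the XML to the endpoint and returns the response body as text. */
    static String post(String endpoint, String xml) throws IOException {
        HttpURLConnection con = (HttpURLConnection) URI.create(endpoint).toURL().openConnection();
        con.setRequestMethod("POST");
        con.setDoOutput(true);
        con.setRequestProperty("Content-Type", "text/xml");
        try (OutputStream out = con.getOutputStream()) {
            out.write(xml.getBytes(StandardCharsets.UTF_8));
        }
        try (InputStream in = con.getInputStream()) {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            byte[] chunk = new byte[4096];
            int n;
            while ((n = in.read(chunk)) != -1) buffer.write(chunk, 0, n);
            return new String(buffer.toByteArray(), StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        String xml = envelope(36);
        // With no argument, only print the request; pass the endpoint URL to actually send it.
        System.out.println(args.length == 0 ? xml : post(args[0], xml));
    }
}
```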
And here is the (my) response from the server:
<Envelope xmlns="http://schemas.xmlsoap.org/soap/envelope/" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/1999/XMLSchema-instance" >
<Body>
<ceph:Chromosomes>
<ceph:Chromosome>
<ceph:id>1</ceph:id>
<ceph:organismId>36</ceph:organismId>
<ceph:name>chr1</ceph:name>
<ceph:length>247249719</ceph:length>
<ceph:metadata>{'type':'autosomal','size':247249719}</ceph:metadata>
</ceph:Chromosome>
<ceph:Chromosome>
<ceph:id>2</ceph:id>
<ceph:organismId>36</ceph:organismId>
<ceph:name>chr2</ceph:name>
<ceph:length>242951149</ceph:length>
<ceph:metadata>{'type':'autosomal','size':242951149}</ceph:metadata>
</ceph:Chromosome>
<ceph:Chromosome>
<ceph:id>3</ceph:id>
<ceph:organismId>36</ceph:organismId>
<ceph:name>chr3</ceph:name>
<ceph:length>199501827</ceph:length>
<ceph:metadata>{'type':'autosomal','size':199501827}</ceph:metadata>
</ceph:Chromosome>
<ceph:Chromosome>
<ceph:id>4</ceph:id>
<ceph:organismId>36</ceph:organismId>
<ceph:name>chr4</ceph:name>
<ceph:length>191273063</ceph:length>
<ceph:metadata>{'type':'autosomal','size':191273063}</ceph:metadata>
</ceph:Chromosome>
(...)
</ceph:Chromosomes>
</Body>
</Envelope>

On the server/servlet side I've decoded the SOAP query using javax.xml.soap.MessageFactory. It looks like this:
(...)
MimeHeaders headers=new MimeHeaders();
Enumeration<?> e=req.getHeaderNames();
(... copy the http headers to 'headers' ...);
SOAPMessage message=getMessageFactory().createMessage(headers,req.getInputStream());
SOAPBody body=message.getSOAPBody();
Iterator<?> iter=body.getChildElements();
while(iter.hasNext())
{
SOAPElement child =SOAPElement.class.cast(iter.next());
Name name= child.getElementName();
if(!name.getURI().equals(child.getNamespaceURI())) continue;
if(name.getLocalName().equals("getChromosomes"))
{
processGetChromosomes(w,message,child,req,res);
return;
}
}
And I'm streaming the response using the Streaming API for XML (StAX).
(...)
w.writeStartElement(pfx, "Chromosomes", getTargetNamespace());
w.writeAttribute(XMLConstants.XMLNS_ATTRIBUTE,XMLConstants.XML_NS_URI,pfx,getTargetNamespace());
for(ChromInfo ci:model.getChromsomesByOrganismId(getTransaction(), organismId))
{
w.writeStartElement(pfx, "Chromosome", getTargetNamespace());
w.writeStartElement(pfx, "id", getTargetNamespace());
w.writeCharacters(String.valueOf(ci.getId()));
w.writeEndElement();
w.writeStartElement(pfx, "organismId", getTargetNamespace());
w.writeCharacters(String.valueOf(ci.getOrganismId()));
w.writeEndElement();
(...)
w.writeEndElement();
}
w.writeEndElement();(...)
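For readers who want to try the streaming part in isolation, here is a self-contained sketch of the same idea. It is not the servlet code: the data is passed in as plain arrays instead of coming from the model, the output goes to any java.io.Writer, the 'ceph' prefix is taken from the response shown earlier, and the namespace is declared with writeNamespace. All class and method names are hypothetical.

```java
import java.io.StringWriter;
import java.io.Writer;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

// Hypothetical sketch: streams a Chromosomes element without building a DOM tree.
class ChromosomeStreamSketch {
    static final String NS = "http://webservices.cephb.fr/locustree/";
    static final String PFX = "ceph";

    /** Writes one simple child element such as <ceph:name>chr1</ceph:name>. */
    static void field(XMLStreamWriter w, String name, String value) throws XMLStreamException {
        w.writeStartElement(PFX, name, NS);
        w.writeCharacters(value);
        w.writeEndElement();
    }

    /** Writes the chromosomes one by one to 'out'. */
    static void write(Writer out, int organismId, String[] names, int[] lengths)
            throws XMLStreamException {
        XMLStreamWriter w = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
        w.writeStartElement(PFX, "Chromosomes", NS);
        w.writeNamespace(PFX, NS); // declares xmlns:ceph on the root element
        for (int i = 0; i < names.length; i++) {
            w.writeStartElement(PFX, "Chromosome", NS);
            field(w, "id", String.valueOf(i + 1));
            field(w, "organismId", String.valueOf(organismId));
            field(w, "name", names[i]);
            field(w, "length", String.valueOf(lengths[i]));
            w.writeEndElement();
        }
        w.writeEndElement();
        w.flush();
    }

    /** Convenience wrapper returning the XML as a string (for small demos only). */
    static String toXml(int organismId, String[] names, int[] lengths) {
        try {
            StringWriter sw = new StringWriter();
            write(sw, organismId, names, lengths);
            return sw.toString();
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(toXml(36, new String[] {"chr1", "chr2"},
                                 new int[] {247249719, 242951149}));
    }
}
```

In a servlet, the Writer would be the response writer, so each chromosome is sent as soon as it is written.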

And as a final note, I'll cite this tweet I received today from Paul Joseph Davis :-)


@yokofakun Everytime someone uses SOAP, an angel cries.
Mon Dec 07 16:50:41



That's it !
Pierre

10 April 2009

Resolving LSID: my notebook

This post is about LSID (The Life Science Identifier) and was inspired by the recent activity of Roderic Page on Twitter and by Roderic's paper "LSID Tester, a tool for testing Life Science Identifier resolution services".

OK.
At the beginning, there is an LSID

urn:lsid:ubio.org:namebank:11815

ubio.org is the authority. It is followed by a database and an id.
We need to resolve this authority to find some metadata about this LSID object. On Unix, we put _lsid._tcp before this authority, and the host command is used to ask the "DNS for the lsid service record for pdb.org with TCP as the network protocol" (I'm not really sure what that means, and I guess this can be a problem for other bioinformaticians too).
%host -t srv _lsid._tcp.ubio.org
_lsid._tcp.ubio.org has SRV record 1 0 80 ANIMALIA.ubio.org.
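The first step, splitting the LSID and building the name to ask the DNS for, can be sketched as follows. The class and field names are hypothetical, and the sketch stops before the DNS query itself.

```java
// Hypothetical sketch: splits an LSID and derives the SRV record name for its authority.
class LsidSketch {
    final String authority;
    final String namespace; // the "database" part
    final String objectId;

    LsidSketch(String authority, String namespace, String objectId) {
        this.authority = authority;
        this.namespace = namespace;
        this.objectId = objectId;
    }

    /** Parses urn:lsid:authority:namespace:id (an optional trailing revision is ignored). */
    static LsidSketch parse(String lsid) {
        String[] tokens = lsid.split(":");
        if (tokens.length < 5
                || !tokens[0].equalsIgnoreCase("urn")
                || !tokens[1].equalsIgnoreCase("lsid")) {
            throw new IllegalArgumentException("Not an LSID: " + lsid);
        }
        return new LsidSketch(tokens[2], tokens[3], tokens[4]);
    }

    /** The DNS name queried with 'host -t srv'. */
    String srvName() {
        return "_lsid._tcp." + authority;
    }

    public static void main(String[] args) {
        LsidSketch lsid = parse("urn:lsid:ubio.org:namebank:11815");
        System.out.println(lsid.authority + "\t" + lsid.namespace + "\t" + lsid.objectId);
        System.out.println(lsid.srvName());
    }
}
```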

So http://ANIMALIA.ubio.org is the location of the LSID service. We append /authority and we get a WSDL file at http://animalia.ubio.org/authority/ (this WSDL is another issue for me: are there that many bioinformaticians who know how to read such a format?).

<wsdl:definitions targetNamespace="http://www.hyam.net/lsid/Authority">
<import namespace="http://www.omg.org/LSID/2003/AuthorityServiceHTTPBindings"
location="LSIDAuthorityServiceHTTPBindings.wsdl"
/>

<wsdl:service name="MyAuthorityHTTPService">
<wsdl:port name="MyAuthorityHTTPPort" binding="httpsns:LSIDAuthorityHTTPBinding">
<httpsns:address location="http://animalia.ubio.org/authority/index.php"/>
</wsdl:port>
</wsdl:service>
</wsdl:definitions>

At http://animalia.ubio.org/authority/LSIDAuthorityServiceHTTPBindings.wsdl we get the HTTP bindings.
<definitions targetNamespace="http://www.omg.org/LSID/2003/AuthorityServiceHTTPBindings">
<import namespace="http://www.omg.org/LSID/2003/Standard/WSDL" location="LSIDPortTypes.wsdl"/>
<binding name="LSIDAuthorityHTTPBinding" type="sns:LSIDAuthorityServicePortType">
<http:binding verb="GET"/>
<operation name="getAvailableServices">
<http:operation location="/authority/"/>
<input>
<http:urlEncoded/>
</input>
<output>
<mime:multipartRelated>
<mime:part>
<mime:content part="wsdl" type="application/octet-stream"/>
</mime:part>
</mime:multipartRelated>
</output>
</operation>
</binding>
</definitions>

This WSDL tells us that http://animalia.ubio.org/authority/ is the URL where we can find some metadata about the LSID, using HTTP GET. By appending metadata.php (why this php extension? This is not clear to me), you'll get the following RDF metadata about urn:lsid:ubio.org:namebank:11815 (very cool, I like this idea of getting RDF from one identifier). The process of resolving the WSDL can be done once and cached.

<rdf:RDF>
<rdf:Description rdf:about="urn:lsid:ubio.org:namebank:11815">
<dc:identifier>urn:lsid:ubio.org:namebank:11815</dc:identifier>
<dc:creator rdf:resource="http://www.ubio.org"/>
<dc:subject>Pternistis leucoscepus (Gray, GR) 1867</dc:subject>
<ubio:taxonomicGroup>Aves</ubio:taxonomicGroup>
<ubio:recordVersion>4</ubio:recordVersion>
<ubio:canonicalName>Pternistis leucoscepus</ubio:canonicalName>
<dc:title>Pternistis leucoscepus</dc:title>
<dc:type>Scientific Name</dc:type>
<ubio:lexicalStatus>Unknown (Default)</ubio:lexicalStatus>
<gla:rank>Species</gla:rank>
<gla:vernacularName rdf:resource="urn:lsid:ubio.org:namebank:954940"/>
<gla:vernacularName rdf:resource="urn:lsid:ubio.org:namebank:954941"/>
<gla:vernacularName rdf:resource="urn:lsid:ubio.org:namebank:1564236"/>
<gla:vernacularName rdf:resource="urn:lsid:ubio.org:namebank:783787"/>
<gla:vernacularName rdf:resource="urn:lsid:ubio.org:namebank:1580313"/>
<gla:mapping rdf:resource="http://starcentral.mbl.edu/microscope/portal.php?pagetitle=classification&BLCHID=12-4498"/>
<gla:mapping rdf:resource="http://www.cbif.gc.ca/pls/itisca/next?v_tsn=553857&taxa=&p_format=&p_ifx=cbif&p_lang="/>
<gla:hasBasionym rdf:resource="urn:lsid:ubio.org:namebank:12292"/>
<gla:objectiveSynonym rdf:resource="urn:lsid:ubio.org:namebank:12292"/>
<gla:objectiveSynonym rdf:resource="urn:lsid:ubio.org:namebank:1762007"/>
<gla:objectiveSynonym rdf:resource="urn:lsid:ubio.org:namebank:1762032"/>
<gla:objectiveSynonym rdf:resource="urn:lsid:ubio.org:namebank:1762051"/>
<gla:objectiveSynonym rdf:resource="urn:lsid:ubio.org:namebank:3408791"/>
<ubio:hasCAVConcept rdf:resource="urn:lsid:ubio.org:classificationbank:1116259"/>
<ubio:hasCAVConcept rdf:resource="urn:lsid:ubio.org:classificationbank:1137821"/>
<ubio:hasCAVConcept rdf:resource="urn:lsid:ubio.org:classificationbank:1173817"/>
<ubio:hasCAVConcept rdf:resource="urn:lsid:ubio.org:classificationbank:1174615"/>
<ubio:hasCAVConcept rdf:resource="urn:lsid:ubio.org:classificationbank:1416177"/>
<ubio:hasCAVConcept rdf:resource="urn:lsid:ubio.org:classificationbank:1672192"/>
<ubio:hasCAVConcept rdf:resource="urn:lsid:ubio.org:classificationbank:2233032"/>
<ubio:hasCAVConcept rdf:resource="urn:lsid:ubio.org:classificationbank:13853963"/>
<ubio:hasCAVConcept rdf:resource="urn:lsid:ubio.org:classificationbank:1909656"/>
<ubio:hasCAVConcept rdf:resource="urn:lsid:ubio.org:classificationbank:2304281"/>
<dcterms:bibliographicCitation>Sclater, W.L., Systema Avium Ethiopicarum, p. 91</dcterms:bibliographicCitation>
</rdf:Description>
</rdf:RDF>
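Since the metadata points to other LSIDs, a client can follow those links. Below is a minimal sketch that collects the rdf:resource targets of a given element with the JDK's DOM parser. It is an assumption-laden shortcut: the class name RdfLinksSketch is hypothetical, and the parser is deliberately not namespace-aware so that prefixed names are matched as plain strings (a real client would rather use a namespace-aware parser or an RDF library).

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class RdfLinksSketch {
    /** Returns the rdf:resource value of every element named qName. */
    public static List<String> resources(String rdfXml, String qName) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // Prefixes are treated as part of the tag name (no namespace handling).
            factory.setNamespaceAware(false);
            Document dom = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(rdfXml)));
            NodeList nodes = dom.getElementsByTagName(qName);
            List<String> targets = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                String target = ((Element) nodes.item(i)).getAttribute("rdf:resource");
                if (!target.isEmpty()) targets.add(target);
            }
            return targets;
        } catch (Exception err) {
            throw new IllegalStateException("Cannot parse RDF", err);
        }
    }

    public static void main(String[] args) {
        // A shortened copy of the metadata shown above.
        String rdf =
            "<rdf:RDF>"
          + "<rdf:Description rdf:about=\"urn:lsid:ubio.org:namebank:11815\">"
          + "<gla:vernacularName rdf:resource=\"urn:lsid:ubio.org:namebank:954940\"/>"
          + "<gla:vernacularName rdf:resource=\"urn:lsid:ubio.org:namebank:954941\"/>"
          + "<gla:hasBasionym rdf:resource=\"urn:lsid:ubio.org:namebank:12292\"/>"
          + "</rdf:Description>"
          + "</rdf:RDF>";
        for (String lsid : resources(rdf, "gla:vernacularName")) {
            System.out.println(lsid);
        }
    }
}
```

Each LSID returned this way could in turn be resolved to fetch the metadata of the vernacular names.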


notebook EOF.

22 November 2008

A Web Service for ONSolubility.

This post is about the ONSolubility project (for references, search FriendFeed for Solubility) and how I've used Egon's code to create a web service querying the solubility data. Egon has already done a great job by using the Google Java spreadsheet API to download Jean-Claude's solubility data. On his side, Rajarshi Guha wrote an HTML page querying those data with the Google Query API. Here I show how I created a web service that searches the measurements by solvent, solute and concentration.

Server Side


Classes


I've added some JAXB (Java Architecture for XML Binding) annotations to Egon's Measurement.java. Those annotations help the web-service compiler (wsgen) to understand how the data will be transmitted to the client.
@javax.xml.bind.annotation.XmlRootElement(name="Measurement")
public class Measurement
implements Serializable
{
(...)

Then we create the web service, ONService.java. This service is just a Java class that also carries a few annotations. First we flag the class as a web service:
@javax.jws.WebService(
name="onsolubility",
serviceName="ons"
)
public class ONService
{
Then comes the function search provided by this service. This function downloads the data from Google using Egon's API and returns a collection of Measurement selected by solute/solvent/concentration. Again, the Java annotations help the compiler to implement the service:
@WebMethod(action="urn:search",operationName="search")
public List<Measurement> search(
@WebParam(name="solute")String solute,
@WebParam(name="solvent")String solvent,
@WebParam(name="concMin")Double concMin,
@WebParam(name="concMax")Double concMax
) throws Exception
{....
The web service is launched with only 3 lines of code (!):
ONService service=new ONService();
Endpoint endpoint = Endpoint.create(service);
endpoint.publish("http://localhost:8080/onsolubility");
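The body of search is elided above. As a rough sketch of what such a filter could look like, here is a plain-Java version that treats a null argument as "no constraint" (an assumption, suggested by the test calls where null is passed for unused criteria). The class SearchFilterSketch and its Row type are hypothetical stand-ins for ONService and Egon's Measurement; the sample rows come from the output shown further down.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SearchFilterSketch {
    /** Hypothetical stand-in for Measurement, reduced to the searched fields. */
    public static final class Row {
        public final String solute;
        public final String solvent;
        public final double concentration;

        public Row(String solute, String solvent, double concentration) {
            this.solute = solute;
            this.solvent = solvent;
            this.concentration = concentration;
        }
    }

    /** Keeps the rows matching every non-null criterion (bounds are inclusive). */
    public static List<Row> search(List<Row> all, String solute, String solvent,
                                   Double concMin, Double concMax) {
        List<Row> hits = new ArrayList<>();
        for (Row row : all) {
            if (solute != null && !solute.equals(row.solute)) continue;
            if (solvent != null && !solvent.equals(row.solvent)) continue;
            if (concMin != null && row.concentration < concMin) continue;
            if (concMax != null && row.concentration > concMax) continue;
            hits.add(row);
        }
        return hits;
    }

    public static void main(String[] args) {
        List<Row> all = Arrays.asList(
            new Row("D-Glucose", "THF", 0.00222),
            new Row("D-Mannitol", "Methanol", 0.00548),
            new Row("D-Mannitol", "THF", 0.01098),
            new Row("4-nitrobenzaldehyde", "Methanol", 0.38));
        for (Row row : search(all, "4-nitrobenzaldehyde", null, 0.3, 0.4)) {
            System.out.println(row.solute + "\t" + row.solvent + "\t" + row.concentration);
        }
    }
}
```

Using the boxed Double rather than double for the bounds is what makes the "null means unbounded" convention possible, which matches the signature of the web method.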

Compilation


I've created an ant file that invokes wsgen to generate the stubs and then installs the web service. Here is the output:
compile-webservice:
[javac] Compiling 1 source file to /home/pierre/tmp/onssolubility/ons.solubility.data/bin
[wsgen] command line: wsgen -classpath (...) -verbose ons.solubility.ws.ONService
[wsgen] Note: ap round: 1
[wsgen] [ProcessedMethods Class: ons.solubility.ws.ONService]
[wsgen] [should process method: search hasWebMethods: true ]
[wsgen] [endpointReferencesInterface: false]
[wsgen] [declaring class has WebSevice: true]
[wsgen] [returning: true]
[wsgen] [WrapperGen - method: search(java.lang.String,java.lang.String,java.lang.Double,java.lang.Double)]
[wsgen] [method.getDeclaringType(): ons.solubility.ws.ONService]
[wsgen] [requestWrapper: ons.solubility.ws.jaxws.Search]
[wsgen] [should process method: main hasWebMethods: true ]
[wsgen] [webMethod == null]
[wsgen] [ProcessedMethods Class: java.lang.Object]
[wsgen] ons/solubility/ws/jaxws/ExceptionBean.java
[wsgen] ons/solubility/ws/jaxws/Search.java
[wsgen] ons/solubility/ws/jaxws/SearchResponse.java
[wsgen] Note: ap round: 2

publish-webservice:
[java] Publishing Service on http://localhost:8080/onsolubility?WSDL
And... that's it. When I open my browser on http://localhost:8080/onsolubility?WSDL, I can now see the WSDL description/schema of this service.

Client Side


Writing a client for this API works the same way as in a previous post about the IntAct/EBI API: the wsimport command generates the stubs from the WSDL file. I then wrote a simple test, ONServiceTest.java, invoking our service several times.
private void test(
String solute,
String solvent,
Double concMin,
Double concMax)
{
try
{
Ons service=new Ons();
Onsolubility port=service.getOnsolubilityPort();
List<Measurement> data=port.search(solute, solvent, concMin, concMax);

for(Measurement measure:data)
{
System.out.println(
" sample :\t"+measure.getSample()+"\n"+
" solute :\t"+measure.getSolute()+"\n"+
" solvent :\t"+measure.getSolvent()+"\n"+
" experiment:\t"+measure.getExperiment()+"\n"+
" reference :\t"+measure.getReference()+"\n"+
" conc :\t"+measure.getConcentration()+"\n"
);
}
} catch(Throwable err)
{
System.err.println("#error:"+err.getMessage());
}
}
private void test()
{
test(null,null,null,null);
test("4-nitrobenzaldehyde",null,null,null);
test("4-nitrobenzaldehyde",null,0.3,0.4);
}
Here is the output:
ant test-webservice
Buildfile: build.xml
test-webservice
[wsimport] parsing WSDL...
[wsimport] generating code...
[javac] Compiling 1 source file to onssolubility/ons.solubility.data/bin
[java] ##Searching solute: null solvent: null conc: null-null
[java] sample : 9
[java] solute : D-Glucose
[java] solvent : THF
[java] experiment: 1
[java] reference : http://onschallenge.wikispaces.com/JennyHale-1
[java] conc : 0.00222
[java]
[java] sample : 6
[java] solute : D-Mannitol
[java] solvent : Methanol
[java] experiment: 1
[java] reference : http://onschallenge.wikispaces.com/JennyHale-1
[java] conc : 0.00548
[java]
(...)
[java]
[java] sample : 10
[java] solute : D-Mannitol
[java] solvent : THF
[java] experiment: 1
[java] reference : http://onschallenge.wikispaces.com/JennyHale-1
[java] conc : 0.01098
[java] ##Searching solute: 4-nitrobenzaldehyde solvent: null conc: 0.3-0.4
[java] sample : 2b
[java] solute : 4-nitrobenzaldehyde
[java] solvent : Methanol
[java] experiment: 212
[java] reference : http://usefulchem.wikispaces.com/exp212
[java] conc : 0.38

That's it and that's enough code for the week-end.

Pierre

30 October 2008

The EBI/IntAct Web-Service API, my notebook

This post covers my experience with the IntAct API at EBI. IntAct provides a freely available, open source database system and analysis tools for protein interaction data. All interactions are derived from literature curation or direct user submissions and are freely available.

This web service is used to search binary interactions. It is described (but not documented...) as a WSDL file at http://www.ebi.ac.uk/intact/binary-search-ws/binarysearch?wsdl

GlassFish, the Java Application Server from Sun, comes with a tool called wsimport. From the WSDL file, it generates the set of Java files needed to call this web service.



Here are the generated java files :
./uk/ac/ebi/intact/binarysearch/wsclient/generated/BinarySearchService.java
./uk/ac/ebi/intact/binarysearch/wsclient/generated/ObjectFactory.java
./uk/ac/ebi/intact/binarysearch/wsclient/generated/FindBinaryInteractionsResponse.java
./uk/ac/ebi/intact/binarysearch/wsclient/generated/FindBinaryInteractionsLimitedResponse.java
./uk/ac/ebi/intact/binarysearch/wsclient/generated/FindBinaryInteractionsLimited.java
./uk/ac/ebi/intact/binarysearch/wsclient/generated/SimplifiedSearchResult.java
./uk/ac/ebi/intact/binarysearch/wsclient/generated/GetVersionResponse.java
./uk/ac/ebi/intact/binarysearch/wsclient/generated/package-info.java
./uk/ac/ebi/intact/binarysearch/wsclient/generated/FindBinaryInteractions.java
./uk/ac/ebi/intact/binarysearch/wsclient/generated/GetVersion.java
./uk/ac/ebi/intact/binarysearch/wsclient/generated/BinarySearch.java


AFAIK, the WSDL file contains almost no documentation about this service, but Eclipse helped me find the correct methods thanks to the code completion of its editor.
Here is the short program I just wrote: it connects to the web service and retrieves all the binary interactions with NSP3:
import uk.ac.ebi.intact.binarysearch.wsclient.generated.BinarySearch;
import uk.ac.ebi.intact.binarysearch.wsclient.generated.BinarySearchService;
import uk.ac.ebi.intact.binarysearch.wsclient.generated.SimplifiedSearchResult;

public class IntActClient
{
/**
* @param args
*/
public static void main(String[] args) {
try
{
final String query="NSP3";
BinarySearchService service=new BinarySearchService();
BinarySearch port=service.getBinarySearchPort();
SimplifiedSearchResult ssr= port.findBinaryInteractionsLimited(query, 0,500);
System.out.println("#first-result "+ssr.getFirstResult());
System.out.println("#max-results "+ssr.getMaxResults());
System.out.println("#total-results "+ssr.getTotalResults());
System.out.println("#luceneQuery "+ssr.getLuceneQuery());
for(String line:ssr.getInteractionLines())
{
System.out.println(line);
}
}
catch(Throwable err)
{
err.printStackTrace();
}
}
}


The result:
#first-result 0
#max-results 500
#total-results 7
#luceneQuery identifiers:nsp3 pubid:nsp3 pubauth:nsp3 species:nsp3 type:nsp3 detmethod:nsp3 interact
uniprotkb:Q8N5H7|intact:EBI-745980 uniprotkb:O43281|intact:EBI-718488 uniprotkb:SH2D3C uniprotkb:EFS
intact:EBI-1263954 uniprotkb:Q00721|intact:EBI-1263962 - uniprotkb:S7 - uniprotkb:NCVP4|uniprotkb:vn....
uniprotkb:Q00721|intact:EBI-1263962 intact:EBI-1263971 uniprotkb:S7 - uniprotkb:NCVP4|uniprotkb:vn34....
uniprotkb:Q00721|intact:EBI-1263962 uniprotkb:Q00721|intact:EBI-1263962 uniprotkb:S7 uniprotkb:S7 un...
uniprotkb:Q00721|intact:EBI-1263962 uniprotkb:Q9UGR2|intact:EBI-948845 uniprotkb:S7 uniprotkb:ZC3H7B....
uniprotkb:Q04637|intact:EBI-73711 uniprotkb:Q00721|intact:EBI-1263962 uniprotkb:EIF4G1 uniprotkb:S7....
uniprotkb:Q04637|intact:EBI-73711 uniprotkb:P03536|intact:EBI-296448 uniprotkb:EIF4G1 uniprotkb:S7 u...


OK, it was easy, but I'm a little bit disappointed because the result is 'just' a set of tab-delimited lines (and where is the documentation about those columns?); I would rather have expected a set of XML objects.
Update: the format of the columns is described here: ftp://ftp.ebi.ac.uk/pub/databases/intact/current/psimitab/README.
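Since the service returns tab-delimited lines, the client has to split them itself. Here is a minimal sketch, assuming the layout visible in the output above: columns separated by tabs, several database:identifier pairs joined by '|' within a column, and '-' for an empty column. The class and method names are hypothetical, and any extra decoration a cell may carry is left untouched.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class MitabLineSketch {
    /** Splits one result line into its columns, keeping empty trailing ones. */
    public static String[] columns(String line) {
        return line.split("\t", -1);
    }

    /**
     * Parses a cell such as "uniprotkb:Q8N5H7|intact:EBI-745980" into
     * database -> identifiers. A database may occur more than once in a cell.
     */
    public static Map<String, List<String>> parseCell(String cell) {
        Map<String, List<String>> ids = new LinkedHashMap<>();
        if (cell.equals("-")) return ids; // '-' marks an empty column
        for (String xref : cell.split("\\|")) {
            int colon = xref.indexOf(':');
            if (colon < 0) continue; // not a database:identifier pair
            String database = xref.substring(0, colon);
            String identifier = xref.substring(colon + 1);
            ids.computeIfAbsent(database, k -> new ArrayList<>()).add(identifier);
        }
        return ids;
    }

    public static void main(String[] args) {
        // First two columns of the first interaction line shown above.
        String line = "uniprotkb:Q8N5H7|intact:EBI-745980\tuniprotkb:O43281|intact:EBI-718488";
        String[] cols = columns(line);
        System.out.println("interactor A: " + parseCell(cols[0]));
        System.out.println("interactor B: " + parseCell(cols[1]));
    }
}
```

In the real client, each String returned by ssr.getInteractionLines() could be passed to columns() in place of the hard-coded line.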

That's it for tonight....

Pierre