11 May 2006

Data Visualization: gapminder

If you're looking for a new way to visualize your microarray data interactively, have a look at the charts from Gapminder (hosted at tools.google.com).



New BMC Journal: "Source Code for Biology and Medicine"

As discovered on chem-bla-ics:

Source Code for Biology and Medicine is an open access, peer-reviewed, online journal soon to be launched by BioMed Central. Source Code for Biology and Medicine will encompass all aspects of workflow for information systems, decision support systems, client user networks, database management, and data mining. Source Code for Biology and Medicine aims to publish source code for distribution and use in the public domain in order to advance biological and medical research. Through this dissemination, it may be possible to shorten the time required for solving certain computational problems for which there is limited source code availability or resources.

Unlike sourceforge, submitting a paper to this journal will give scientists a new peer-reviewed publication (which may be critical for a career). Moreover, I hope this journal will allow programmers (not pure theorists) to get a paper published.


10 May 2006

Playing with the connotea API (2/2)

A few months ago Ben Lund gave me the opportunity to test a beta version of the connotea API, and I wondered whether I could build an Annozilla server acting as a bridge between the Firefox web browser and the connotea server, allowing scientists to see and share comments about a web site or a paper. As the Annozilla site puts it:

The Annozilla project is designed to view and create annotations associated with a web page, as defined by the W3C Annotea project. The idea is to store annotations as RDF on a server, using XPointer (or at least XPointer-like constructs) to identify the region of the document being annotated. The intention of Annozilla is to use Mozilla's native facilities to manipulate annotation data - its built-in RDF handling to parse the annotations, and nsIXmlHttpRequest to submit data when creating annotations..

[Screenshot: annozilla02]
In this example: firefox opened the NCBI home page (right side). Once the page is loaded, annozilla fetches the bookmarks about NCBI from connotea (top left). Double clicking in the annotations makes annozilla download the body of those bookmarks (bottom left).


The JAVA servlet I wrote is available at :



But wait, there is a problem: for "security reasons" Annozilla does not use the "GET" parameters of a URL (I really understand that). So when the following URL is submitted:

http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=pubmed&dopt=Abstract&list_uids=14870871

Annozilla ignores all the parameters and inserts the comments for:

http://www.ncbi.nlm.nih.gov/entrez/query.fcgi



which is far less interesting! Anyway, this was still nice to write, and it was a proof of concept of how to use the connotea API.
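To make the limitation concrete, here is a minimal sketch (not part of the servlet) of what dropping the GET parameters does to the lookup key. It assumes that the URL is simply cut at the first '?', which is only one reading of the behaviour described above. The class and method names are hypothetical; the MD5 step mirrors what the servlet's toMD5 method is meant to compute.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical demo class; not part of the servlet. */
class QueryStripDemo
{
    /** Cut a URL at the first '?' (assumed to be what happens to the annotated URL). */
    static String stripQuery(String url)
    {
        int q = url.indexOf('?');
        return q == -1 ? url : url.substring(0, q);
    }

    /** Hex-encoded MD5 of a string, the key used for the connotea lookups. */
    static String md5Hex(String s)
    {
        try
        {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(s.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) hex.append(String.format("%02x", b));
            return hex.toString();
        }
        catch (NoSuchAlgorithmException err)
        {
            // MD5 is one of the algorithms every Java platform must provide
            throw new IllegalStateException(err);
        }
    }

    public static void main(String[] args)
    {
        String full = "http://www.ncbi.nlm.nih.gov/entrez/query.fcgi"
            + "?cmd=Retrieve&db=pubmed&dopt=Abstract&list_uids=14870871";
        String stripped = stripQuery(full);
        System.out.println(stripped);
        // Different strings give different keys, so the comments
        // end up attached to the bare script URL, not to the abstract.
        System.out.println(md5Hex(full));
        System.out.println(md5Hex(stripped));
    }
}
```

Every PubMed abstract served by query.fcgi collapses onto the same stripped URL, hence onto the same key.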

update: 2010-08-12: source code

/*
Copyright (c) 2006 Pierre Lindenbaum PhD

Permission is hereby granted, free of charge, to any person obtaining
a copy of this software and associated documentation files (the
``Software''), to deal in the Software without restriction, including
without limitation the rights to use, copy, modify, merge, publish,
distribute, sublicense, and/or sell copies of the Software, and to
permit persons to whom the Software is furnished to do so, subject to
the following conditions:

The above copyright notice and this permission notice shall be included
in all copies or substantial portions of the Software.

The name of the authors when specified in the source files shall be
kept unmodified.

THE SOFTWARE IS PROVIDED ``AS IS'', WITHOUT WARRANTY OF ANY KIND, EXPRESS
OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
IN NO EVENT SHALL 4XT.ORG BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE
USE OR OTHER DEALINGS IN THE SOFTWARE.


$Id: $
$Author: $
$Revision: $
$Date: $
$Locker: $
$RCSfile: $
$Source: $
$State: $
$Name: $
$Log: $


*************************************************************************/
package org.lindenb.annotea.server;

import java.io.BufferedReader;
import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileWriter;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.StringWriter;
import java.net.MalformedURLException;
import java.net.Socket;
import java.net.URL;
import java.net.URLConnection;
import java.net.URLEncoder;
import java.security.MessageDigest;
import java.util.Enumeration;
import java.util.HashSet;
import java.util.Iterator;

import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.Result;
import javax.xml.transform.Source;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;


import org.w3c.dom.CDATASection;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.Text;
import org.xml.sax.SAXException;

import com.oreilly.servlet.Base64Decoder;


/**
* @author lindenb
* http://islande:8080/annotea/Annotea
*/
public class AnnoteaServer extends HttpServlet
{
/**
* serialVersionUID
*/
private static final long serialVersionUID = 1L;
/** flag for debugging on/off */
private static boolean DEBUG=false;

//static declaration of xml namespaces
static public final String RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";
static public final String RDFS = "http://www.w3.org/2000/01/rdf-schema#";
static public final String DC = "http://purl.org/dc/elements/1.1/";
static public final String XMLNS= "http://www.w3.org/2000/xmlns/";
static public final String AN = "http://www.w3.org/2000/10/annotation-ns#";
static public final String XHTML = "http://www.w3.org/1999/xhtml" ;
static public final String HTTP = "http://www.w3.org/1999/xx/http#";
static public final String CONNOTEA ="http://www.connotea.org/2005/01/schema#";
static public final String TERMS = "http://purl.org/dc/terms/";



/** query parameter name as specified in the spec */
static final String QUERY_ANNOTATION_PARAMETER="w3c_annotates";
/** default number of rows to fetch from http://www.connotea.org */
static final int CONNOTEA_DEFAULT_NUM_ROWS=10;
/** number of rows to fetch from http://www.connotea.org */
int connoteaNumRowsToFetch =CONNOTEA_DEFAULT_NUM_ROWS;



/** @see javax.servlet.GenericServlet#init() */
public void init() throws ServletException
{
//init number of rows
String s = getInitParameter("connoteaNumRowsToFetch");
if(s!=null)
{
try
{
this.connoteaNumRowsToFetch=Integer.parseInt(s);
if(this.connoteaNumRowsToFetch<=0) this.connoteaNumRowsToFetch=CONNOTEA_DEFAULT_NUM_ROWS;
}
catch(Exception err)
{
throw new ServletException(err);
}
}
}


/** convert a string to MD5 */
static String toMD5(String url) throws ServletException
{
StringBuffer result= new StringBuffer();
try
{
MessageDigest md = MessageDigest.getInstance("MD5");

md.update(url.getBytes());
byte[] digest = md.digest();

for (int k=0; k<digest.length; k++)
{
String hex = Integer.toHexString(digest[k]);
if (hex.length() == 1) hex = "0" + hex;
result.append(hex.substring(hex.length()-2));
}
}
catch(Exception err)
{
throw new ServletException(err);
}
return result.toString();
}

/** @see javax.servlet.http.HttpServlet#doGet(javax.servlet.http.HttpServletRequest, javax.servlet.http.HttpServletResponse) */
protected void doGet(HttpServletRequest req, HttpServletResponse res) throws ServletException, IOException
{
res.setContentType("text/xml");
String urlquery= req.getParameter(QUERY_ANNOTATION_PARAMETER);
PrintWriter out=res.getWriter();

String pathInfo = req.getPathInfo(); // /a/b;c=123

/**
*
* WE FOUND URLQUERY
*
*/
if(urlquery!=null)
{
debug("query is "+urlquery);
String authorizationUTF8= URLEncoder.encode(getAuthorization(req),"UTF-8");
String md5=toMD5(urlquery);
Document doc= getConnoteaRDF(
"http://www.connotea.org/data/bookmarks/uri/"+md5,
req
);

String postCount=null;
String created=null;
if(doc!=null)
{
Element root= doc.getDocumentElement();
if(root==null || !isA(root,RDF,"RDF")) throw new ServletException("Bad XML root from connotea");

for(Node n1= root.getFirstChild();n1!=null;n1=n1.getNextSibling())
{
if(!isA(n1,TERMS,"URI")) continue;
for(Node n2= n1.getFirstChild();n2!=null;n2=n2.getNextSibling())
{
if(isA(n2,CONNOTEA,"postCount"))
{
postCount=textContent(n2).replace('\n',' ').trim();
}
else if(isA(n2,CONNOTEA,"created"))
{
created=textContent(n2).replace('\n',' ').trim();
}
}
break;
}
}

debug("postcount="+postCount);

out.print(
"<?xml version=\"1.0\" ?>\n" +
"<r:RDF xmlns:r=\""+RDF+"\"\n" +
" xmlns:a=\""+AN+"\"\n" +
" xmlns:d=\""+DC+"\">\n"
);
if(postCount!=null)
{
out.print(
" <r:Description r:about=\""+getBaseURL(req)+"/"+md5+"\">\n" +
" <r:type r:resource=\""+ AN +"Annotation\"/>\n" +
" <r:type r:resource=\"http://www.w3.org/2000/10/annotationType#Comment\"/>\n" +
//the annotated URL may contain '&': escape it before writing it into the XML
" <a:annotates r:resource=\""+escape(urlquery)+"\"/>\n" +
" <d:title>"+postCount+" Annotation(s) of "+escape(urlquery)+" on connotea</d:title>\n" +
" <a:context>" + escape(urlquery)+"#xpointer(/html[1])</a:context>\n" +
" <d:creator>Connotea</d:creator>\n" +
" <a:created>"+created+"</a:created>\n" +
" <d:date>"+created+"</d:date>\n" +
//a:body is an empty element: close it so that the RDF stays well-formed
" <a:body r:resource=\""+getBaseURL(req)+"/body/"+md5+"?authorization="+authorizationUTF8+"\"/>\n"+
"</r:Description>\n"
);
}

out.print("</r:RDF>\n");
}
/**
*
* /BODY/ in pathinfo
*
*/
else if(pathInfo!=null && pathInfo.startsWith("/body/"))
{
String md5=pathInfo.substring(6);
Document doc=getConnoteaRDF("http://www.connotea.org/data/uri/"+md5+
"?num="+this.connoteaNumRowsToFetch,
req);
if(doc==null)
{
throw new ServletException("Cannot get http://www.connotea.org/data/uri/"+md5);
}
Element root= doc.getDocumentElement();
if(root==null || !isA(root,RDF,"RDF")) throw new ServletException("Bad XML root from connotea");


out.print("<?xml version=\"1.0\" ?>\n"+
"<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Strict//EN\" \"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd\">"
);

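//debugging only: also dump the generated page to the log file
if(DEBUG)
{
PrintWriter log=new PrintWriter(new FileWriter("/tmp/logAnnotea.txt"));
printHTMLBodyFromConnotea(log,md5,root);
log.close();
}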
printHTMLBodyFromConnotea(out,md5,root);
}
/**
*
* other
*
*/
else
{
debug("pathinfo not handled ?");
out.print(
"<?xml version=\"1.0\" ?>\n" +
"<r:RDF xmlns:r=\""+RDF+"\"\n" +
" xmlns:a=\""+AN+"\"\n" +
" xmlns:d=\""+DC+"\">\n" +
"<a:annotates r:resource=\""+
req.getRequestURL()+ "\"/>"+
"</r:RDF>\n"
);
}


out.flush();
}

/** return and parse an HTML annotation from connotea */
private void printHTMLBodyFromConnotea(PrintWriter out,String md5,Element root)
{

out.print("<html xmlns=\"http://www.w3.org/1999/xhtml\" xml:lang=\"en\" lang=\"en\">"+
"<body>");
out.print("<img src=\"http://www.connotea.org/connotealogo.gif\" alt=\"connotea\"/><br/>");

for(Node n1= root.getFirstChild();n1!=null;n1=n1.getNextSibling())
{
if(isA(n1,CONNOTEA,"Post"))
{

String title=null;
String user=null;
String uri=null;
String created=null;
HashSet subject= new HashSet();
for(Node n2= n1.getFirstChild();n2!=null;n2=n2.getNextSibling())
{
if(isA(n2,DC,"creator")) { user=textContent(n2).trim();}
else if(isA(n2,CONNOTEA,"title")) { title=textContent(n2).trim();}
else if(isA(n2,CONNOTEA,"created")) {
created=textContent(n2).trim();
int T=created.indexOf('T');
if(T!=-1) created= created.substring(0,T);
}
else if(isA(n2,DC,"subject")) { subject.add(textContent(n2).trim());}
else if(isA(n2,CONNOTEA,"uri"))
{
for(Node n3= n2.getFirstChild();n3!=null;n3=n3.getNextSibling())
{
if(isA(n3,TERMS,"URI"))
{
Element e3=(Element)n3;
uri=e3.getAttributeNS(RDF,"about").trim();
}
}
}
}
out.print("<div>");

if(title!=null)
{
//always open the <h4>: it is closed below even when no URI was found
out.print("<h4>");
if(uri!=null) out.print("<a target=\"ext\" title=\""+escape(uri)+"\" href=\""+escape(uri)+"\">");
out.print(""+escape(title));
if(uri!=null) out.print("</a>");
out.print(" (<a target=\"ext\" title=\"http://www.connotea.org/uri/"+md5+"\" href=\"http://www.connotea.org/uri/"+md5+"\">info</a>)");
out.print("</h4>");
}


if(user!=null)
{
out.print("Posted by <a target=\"ext\" href=\"http://www.connotea.org/user/"+escape(user)+"\">"+escape(user)+"</a>");
if(!subject.isEmpty())
{
out.print(" to");
for (Iterator iter = subject.iterator(); iter.hasNext();)
{
String sbj=escape((String)iter.next());
out.print(" <a target=\"ext\" href=\"http://www.connotea.org/tag/"+sbj+"\">"+sbj+"</a>");

}
}
if(created!=null ) out.print(" on <a target=\"ext\" href=\"http://www.connotea.org/date/"+created+"\">"+created+"</a>");
out.print("<br/>");
}
out.print("</div><hr/>");
}


}
out.print("<a title=\"mailto:[email protected]\" href=\"mailto:[email protected]\">[email protected]</a> Pierre Lindenbaum PhD<br/>");
out.print("<div align=\"center\"><img border=\"1\" src=\"http://www.integragen.com/img/title.png\"/></div>");
out.print("</body></html>");
out.flush();
}


/**
* @see javax.servlet.http.HttpServlet#doPost(javax.servlet.http.HttpServletRequest, javax.servlet.http.HttpServletResponse) */
protected void doPost(HttpServletRequest req, HttpServletResponse res) throws ServletException, IOException
{
Document doc=null;
try
{
doc= newDocumentBuilder().parse(req.getInputStream());
}
catch(SAXException err)
{
debug("cannot get document from input stream");
throw new ServletException(err);
}

String context=null;
String content=null;
Element root= doc.getDocumentElement();
if(root==null) throw new ServletException("Cannot find root element of input");
debug("DOM1 content is "+DOM2String(root));
for(Node c1= root.getFirstChild();c1!=null;c1=c1.getNextSibling())
{
if(!isA(c1,RDF,"Description")) continue;
for(Node c2= c1.getFirstChild();c2!=null;c2=c2.getNextSibling())
{
if(c2.hasChildNodes()&& isA(c2,AN,"context"))
{
Element e2=(Element)c2;
context=textContent(e2).trim();
int xpointer=context.indexOf("#xpointer");
if(xpointer!=-1) context=context.substring(0,xpointer);
}
else if(c2.hasChildNodes()&& isA(c2,AN,"body"))
{
for(Node c3= c2.getFirstChild();c3!=null;c3=c3.getNextSibling())
{
if(!(c3.hasChildNodes()&& isA(c3,HTTP,"Message"))) continue;

for(Node c4= c3.getFirstChild();c4!=null;c4=c4.getNextSibling())
{
if(!(c4.hasChildNodes()&& isA(c4,HTTP,"Body"))) continue;
content= textContent(c4).trim().replaceAll("[ ]+"," ");
debug("DOM content is "+DOM2String(c4));
}

}
}
}
}
if(context==null)
{
throw new ServletException("Cannot find context in "+root);
}
if(content==null)
{
throw new ServletException("Cannot find content in "+DOM2String(root));
}


/** check if the bookmark already exists */
try
{
String usertitle=null;
String comment=null;
StringBuffer description=new StringBuffer();
HashSet tags= new HashSet();

doc= getConnoteaRDF(
"http://www.connotea.org/data/user/"+
getLogin(req)+"/uri/"+
toMD5(context),req
);
if(doc!=null)
{
root= doc.getDocumentElement();
if(root!=null)
{
for(Node n1= root.getFirstChild();n1!=null;n1=n1.getNextSibling())
{
if(isA(n1,CONNOTEA,"Post"))
{
for(Node n2= n1.getFirstChild();n2!=null;n2=n2.getNextSibling())
{
if(isA(n2,CONNOTEA,"title")) { usertitle=textContent(n2).trim();}
else if(isA(n2,DC,"subject")) { tags.add(textContent(n2).trim().toLowerCase());}
else if(isA(n2,CONNOTEA,"description")) { description= new StringBuffer(textContent(n2).trim()+"\n");}
}
break;
}
}
}
debug("existed: title="+usertitle+" subject="+tags+ " desc="+description);
}


/* construct URL to connotea */

BufferedReader buffReader= new BufferedReader(new StringReader(content));
debug("content is "+content);
String line=null;

while((line=buffReader.readLine())!=null)
{
line= line.trim();
String upper= line.toUpperCase();
//parse tags
if(upper.startsWith("TAG:"))
{
try
{
StreamTokenizer parser= new StreamTokenizer(
new StringReader(line.substring(4))
);
parser.quoteChar('"');
parser.quoteChar('\'');
parser.eolIsSignificant(false);
parser.lowerCaseMode(true);
parser.ordinaryChars('0','9');
parser.wordChars('0','9');
parser.whitespaceChars(',',',');
parser.whitespaceChars(';',';');
parser.whitespaceChars('(','(');
parser.whitespaceChars(')',')');

while ( parser.nextToken() != StreamTokenizer.TT_EOF )
{
if ( parser.ttype == StreamTokenizer.TT_WORD)
{
tags.add(parser.sval.toLowerCase());
}
else if ( parser.ttype == StreamTokenizer.TT_NUMBER )
{
//hum... ignoring number
//items.add(""+parser.nval);
}
else if ( parser.ttype == StreamTokenizer.TT_EOL )
{
continue;
}
else if(parser.sval!=null)
{
tags.add(parser.sval.toLowerCase()) ;
}
}
}
catch(Exception ex)
{

}
}
else if(upper.startsWith("TI:"))
{
usertitle= line.substring(3).trim();
if(usertitle.length()==0) usertitle=null;
}
else if(upper.startsWith("COM:"))
{
comment= line.substring(4).trim();
if(comment.length()==0) comment=null;
}
else
{
description.append(line+"\n");
}
}
//put one tag if no tags was declared
if(tags.isEmpty()) tags.add("annoteated");

if(description.length()==0) description= new StringBuffer(context);
//build tags parameter
StringBuffer tagStr= new StringBuffer();
for (Iterator iter = tags.iterator(); iter.hasNext();)
{
if(tagStr.length()!=0) tagStr.append(",");
tagStr.append(iter.next().toString());
}

debug("tagstr="+tagStr);


URL url=new URL("http://www.connotea.org:80/data/add");

String postbody=
"uri=" + URLEncoder.encode(context,"UTF-8")+
(usertitle==null?"":"&usertitle="+URLEncoder.encode(usertitle,"UTF-8"))+
"&description="+URLEncoder.encode(description.toString(),"UTF-8")+
(comment==null?"":"&comment="+URLEncoder.encode(comment,"UTF-8"))+
"&tags=" + URLEncoder.encode(tagStr.toString(),"UTF-8")+
"&annoteaflag=hiBenThisIsPierre"
;

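//HTTP header lines must end with CRLF; "Connection: close" asks the server
//to close the socket after answering, so that the read loop below terminates
StringBuffer poststring= new StringBuffer();
poststring.append("POST "+url.getFile()+" HTTP/1.1\r\n");
poststring.append("Host: "+url.getHost()+"\r\n");
poststring.append("Authorization: "+getAuthorization(req)+"\r\n");
poststring.append("Content-Type: application/x-www-form-urlencoded\r\n");
poststring.append("Content-Length: "+postbody.length()+"\r\n");
poststring.append("Connection: close\r\n");
poststring.append("\r\n");
poststring.append(postbody);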


Socket socket= new Socket(url.getHost(),url.getPort());
InputStream from_server= socket.getInputStream();
PrintWriter to_server= new PrintWriter(
new OutputStreamWriter(socket.getOutputStream()));


to_server.print(poststring.toString());
to_server.flush();


StringBuffer response= new StringBuffer();
int c; while((c=from_server.read())!=-1) { response.append((char)c);}


to_server.close();
from_server.close();
debug("sent "+ poststring+" response is:\n"+response+"\n");
}
catch(Exception err)
{
debug("error "+err);
throw new ServletException(err);
}





{
String msg="<?xml version=\"1.0\" encoding=\"UTF-8\" ?>\n" +
"<rdf:RDF xmlns:rdf=\""+RDF+"\"\n" +
" xmlns:a=\""+AN+"\"\n" +
" xmlns:dc=\""+DC+"\">\n" +
" <rdf:Description rdf:about=\""+
getBaseURL(req)+"/"+toMD5(context)
+"\">\n" +
" <dc:creator>connotea user</dc:creator>\n"+
" <a:created>2006-01-31</a:created>\n"+
" <dc:date>2006-01-31</dc:date>\n"+
" <a:annotates rdf:resource=\""+escape(context)+"\"/>\n" +
" <rdf:type rdf:resource=\""+ AN +"Annotation\"/>\n" +
" <rdf:type rdf:resource=\"http://www.w3.org/2000/10/annotationType#Comment\"/>\n" +
//" <a:body rdf:resource=\""+getBaseURL(req)+"/body/"+toMD5(context) +"\"/>\n" +
" </rdf:Description>\n" +
"</rdf:RDF>";



res.setStatus(HttpServletResponse.SC_CREATED);
res.setHeader("Connection","close");
res.setHeader("Pragma","no-cache");
res.setContentType("text/xml");
res.setContentLength(msg.length());



PrintWriter out= res.getWriter();
out.print(msg);
debug("message sent is "+msg);
out.flush();
}
}




/** return the BASE URL of this servlet */
private static String getBaseURL(HttpServletRequest req) throws ServletException
{
String scheme = req.getScheme(); // http
String serverName = req.getServerName(); // hostname.com
int serverPort = req.getServerPort(); // 80
String contextPath = req.getContextPath(); // /mywebapp
String servletPath = req.getServletPath(); // /servlet/MyServlet
//String pathInfo = req.getPathInfo(); // /a/b;c=123
//String queryString = req.getQueryString(); // d=789
return scheme+"://"+serverName+":"+serverPort+contextPath+servletPath;
}

/** creates a new namespace aware DocumentBuilder parsing DOM */
private static DocumentBuilder newDocumentBuilder() throws ServletException
{
DocumentBuilderFactory factory= DocumentBuilderFactory.newInstance();
factory.setCoalescing(true);
factory.setNamespaceAware(true);
factory.setExpandEntityReferences(true);
factory.setIgnoringComments(true);
factory.setValidating(false);
try
{
return factory.newDocumentBuilder();
} catch (ParserConfigurationException error)
{
throw new ServletException(error);
}
}

/** simple escape XML function */
public static String escape(String s)
{
if(s==null) return s;
StringBuffer buff= new StringBuffer();
for(int i=0;i< s.length();++i)
{
switch(s.charAt(i))
{
case('\"') : buff.append("&quot;"); break;
case('\'') : buff.append("&apos;"); break;
case('&') : buff.append("&amp;"); break;
case('<') : buff.append("&lt;"); break;
case('>') : buff.append("&gt;"); break;
default: buff.append(s.charAt(i)); break;
}
}
return buff.toString();
}

/** simple test for XML element */
static public boolean isA(Node e,String ns, String localname)
{
if(e==null) return false;
return ns.equals(e.getNamespaceURI()) && e.getLocalName().equals(localname);
}

/** get text content of a DOM node */
static public String textContent(Node node)
{
return textContent(node,new StringBuffer()).toString();
}

/** get text content of a DOM node */
static private StringBuffer textContent(Node node,StringBuffer s)
{
if(node==null) return s;
for(Node c= node.getFirstChild();c!=null;c=c.getNextSibling())
{
if(isA(c,XHTML,"br"))
{
s.append("\n");
}
else if(c.getNodeType()==Node.CDATA_SECTION_NODE)
{
s.append(((CDATASection)c).getNodeValue());
}
else if(c.getNodeType()==Node.TEXT_NODE)
{
s.append(((Text)c).getNodeValue());
}
else
{
textContent(c,s);
}
}
return s;
}


/** download a document from connotea or null if the url was not found */
private Document getConnoteaRDF(String url,HttpServletRequest req) throws ServletException
{
DocumentBuilder builder= newDocumentBuilder();
try
{
return builder.parse(
openSecureURLInputStream(
url,
req
)
);
}
catch (FileNotFoundException e)
{
debug("file not found for "+url+" returning empty rdf document");
return null;
}
catch(Exception err)
{
debug("cannot download :"+url+ " "+err);
throw new ServletException(err);
}
}


/** get login from header authorization */
static private String getLogin(HttpServletRequest req) throws ServletException
{
return getLoginAndPassword(req)[0];
}

/** get login and password from header authorization */
static private String[] getLoginAndPassword(HttpServletRequest req) throws ServletException
{
String s64= getAuthorization(req);
if(!s64.startsWith("Basic ")) throw new ServletException("header \"authorization\" does not start with \"Basic \" ");
s64=s64.substring(6);
String decoded= Base64Decoder.decode(s64);
int loc= decoded.indexOf(':');
if(loc==-1) throw new ServletException("no \":\" in decoded authorization "+s64);
return new String[]{decoded.substring(0,loc),decoded.substring(loc+1)};
}

/** find parameter authorization in http request */
static String getAuthorization(HttpServletRequest req) throws ServletException
{
String s64= req.getHeader("authorization");
if(s64==null)
{
s64=req.getParameter("authorization");
}
if(s64==null)
{
s64=req.getParameter("Authorization");
}
if(s64==null)
{
Enumeration e= req.getHeaderNames();
StringBuffer headers= new StringBuffer();
while(e.hasMoreElements())
{
String key=(String)e.nextElement();
headers.append(key).append("=").append(req.getHeader(key)).append(";\n");
}
throw new ServletException("no header \"authorization\" was found in\n"+headers);
}
return s64;
}

/** open a URL, filling authorization header */
static private InputStream openSecureURLInputStream(String urlstr,HttpServletRequest req)
throws ServletException,FileNotFoundException
{
URL url=null;
String s64=null;
try
{
s64= getAuthorization(req);
url= new URL(urlstr);
URLConnection uc = url.openConnection();
uc.setRequestProperty("Authorization", s64);
InputStream content = uc.getInputStream();
return content;
}
catch (MalformedURLException e)
{
throw new ServletException(e);
}
catch (FileNotFoundException e)
{
throw e;
}
catch (IOException e)
{
throw new ServletException(e);
}
}


/** quick n dirty debugging function: append the message to "/tmp/logAnnotea.txt" */
static private void debug(Object o)
{
if(!DEBUG) return;
try
{
File f= new File("/tmp/logAnnotea.txt");
PrintWriter pout= new PrintWriter(new FileWriter(f,true));
pout.println("###"+System.currentTimeMillis()+"########");
pout.println(o);
pout.flush();
pout.close();

}
catch (Exception err)
{
err.printStackTrace();
}

}

private String DOM2String(Node doc)throws ServletException
{
StringWriter out= new StringWriter();
printDOM(new PrintWriter(out,true),doc);
return out.toString();
}

/* print DOM document for debugging purpose...*/
private static void printDOM(PrintWriter log,Node doc) throws ServletException
{
Source source = new DOMSource(doc);
Result result = new StreamResult(log);


try
{
Transformer xformer = TransformerFactory.newInstance().newTransformer();
xformer.transform(source, result);
}
catch(TransformerException error)
{
error.printStackTrace();
error.printStackTrace(log);
log.flush();
throw new ServletException(error);
}
}


}
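For readers who do not want to dig through doPost: the servlet reads the text of a new annotation line by line. Lines starting with TAG:, TI: and COM: become the tags, the title and the comment of the connotea bookmark, and every other line goes into the description. The following is a simplified sketch of that convention only, not the servlet's code: it splits tags with a regular expression instead of the StreamTokenizer used above (so quoted multi-word tags are not handled), and the class name AnnotationBody is hypothetical.

```java
import java.util.LinkedHashSet;
import java.util.Set;

/** Hypothetical, simplified version of the line parsing done in doPost. */
class AnnotationBody
{
    final Set<String> tags = new LinkedHashSet<String>();
    String title = null;
    String comment = null;
    final StringBuilder description = new StringBuilder();

    static AnnotationBody parse(String content)
    {
        AnnotationBody body = new AnnotationBody();
        for (String raw : content.split("\n"))
        {
            String line = raw.trim();
            String upper = line.toUpperCase();
            if (upper.startsWith("TAG:"))
            {
                // Assumption: a plain split replaces the servlet's StreamTokenizer
                for (String t : line.substring(4).split("[,;()\\s]+"))
                {
                    if (!t.isEmpty()) body.tags.add(t.toLowerCase());
                }
            }
            else if (upper.startsWith("TI:"))
            {
                String t = line.substring(3).trim();
                body.title = t.isEmpty() ? null : t;
            }
            else if (upper.startsWith("COM:"))
            {
                String c = line.substring(4).trim();
                body.comment = c.isEmpty() ? null : c;
            }
            else
            {
                body.description.append(line).append("\n");
            }
        }
        // Same fallback tag as the servlet when no tag was declared
        if (body.tags.isEmpty()) body.tags.add("annoteated");
        return body;
    }
}
```

So an annotation typed in Annozilla as "TI: NCBI home", "TAG: genomics, pubmed" followed by free text would be posted with that title, those two tags and the free text as description.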

Playing with the connotea API (1/2)

Connotea is a free online reference management service. It allows you to save links to all your favourite articles, references, websites and other online resources with one click. Connotea is also a social bookmarking tool, so you can view other people's collections to discover new, interesting content. Playing with Connotea is a pretext for me to learn new technologies, and as the connotea web API has just been released, I wanted to learn AJAX. So I turned my previous 'old' tool, connotea explorer, into a dynamic javascript page available at:

http://www.urbigene.com/cx2/connotea.xhtml


An old screenshot of connotea explorer; the result with the html/javascript page looks much the same...


The page displays data from Connotea using SVG, javascript, AJAX, XSLT and the Connotea API using a treemap algorithm. As Firefox now supports the SVG format, this drawing can be displayed in your web browser.
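The post does not say which treemap variant the page uses, so the following is only a sketch of the simplest one, "slice and dice", for a single level: a rectangle is cut along its longer side into strips whose areas are proportional to the weights (for instance, the number of bookmarks per tag). It is written in Java rather than javascript to stay consistent with the servlet; the class SliceTreemap and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical one-level slice-and-dice treemap layout. */
class SliceTreemap
{
    /** A rectangle in SVG user units. */
    static final class Rect
    {
        final double x, y, width, height;

        Rect(double x, double y, double width, double height)
        {
            this.x = x;
            this.y = y;
            this.width = width;
            this.height = height;
        }

        /** Render this rectangle as an SVG element. */
        String toSvg()
        {
            return "<rect x=\"" + x + "\" y=\"" + y + "\" width=\"" + width
                + "\" height=\"" + height + "\"/>";
        }
    }

    /**
     * Split 'area' into one rectangle per weight.
     * Each rectangle gets an area proportional to its weight.
     */
    static List<Rect> layout(double[] weights, Rect area)
    {
        double total = 0;
        for (double w : weights) total += w;
        List<Rect> out = new ArrayList<Rect>();
        if (total <= 0) return out;

        // Cut along the longer side to avoid very thin strips
        boolean horizontal = area.width >= area.height;
        double offset = 0;
        for (double w : weights)
        {
            double fraction = w / total;
            if (horizontal)
            {
                double width = area.width * fraction;
                out.add(new Rect(area.x + offset, area.y, width, area.height));
                offset += width;
            }
            else
            {
                double height = area.height * fraction;
                out.add(new Rect(area.x, area.y + offset, area.width, height));
                offset += height;
            }
        }
        return out;
    }
}
```

A full treemap would call layout again inside each rectangle for the next level of the hierarchy, and each Rect can be written out with toSvg for the browser to draw.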

Security issue: as this script is not hosted on the connotea server, the user is asked to allow the web browser to connect to connotea (UniversalBrowserRead). The user has to open about:config in the browser and check that signed.applets.codebase_principal_support is set to true in order to get a dialog for enabling the connection to the connotea server. Yes, this is ugly... :-)

27 April 2006

April 2006

I've been away from this blog for a few days. Here are a few thoughts:

I've created a project on sourceforge to store the sources of the MyFOAFExplorer and SciFOAF programs. The project is hosted at http://urbigene.sourceforge.net and the sources can be downloaded via CVS. I still need to write the documentation (the worst part of programming...), and I also plan to include the sources of the tools I wrote for connotea.

Scientific FOAF profiles can now use Benjamin Nowack's "Semantic Campus" vocabulary, with the URI "http://semanticcampus.org/ns/sc#", to introduce themselves. I also found DOAC to describe a career.

Nature introduced OTMI (Open Text Mining Interface), a draft XML format that describes a paper and can be used by computers for text mining. OTMI is trying to expose the information that publishers already have about already published material. I guess this format could be used to get more (structured) information than a pubmed abstract, but a text analysis still needs to be performed. Last year I wrote a note about semantic abstracts: publishers could ask authors to attach an RDF-based version of their abstract; the information would be curated by the author and directly available to the community.

I wonder why Science is not as innovative as Nature ?

There is a lot of excitement around microformats at the moment. Maybe there is something for the bioinformatics community here...

The latest build (36.1/hg18) of the Human Genome has been released on the UCSC genome browser.

Two new products from google: calendar and Sketchup.

That's all for tonight !


12 April 2006

Windows Live Academic

After NCBI pubmed, Elsevier's scirus and Google's Scholar, here is now Microsoft's Academic Live.

(it indexes) content related to computer science, physics, electrical engineering, and related subject areas. Academic search enables you to search for peer reviewed journal articles contained in journal publisher portals and on the web in locations like citeseer.

My first query was "Rotavirus Roxan". It returned no result today on Academic Live whereas I got the correct answers on Scholar. Nevertheless Microsoft's product seems more interactive than google's one.


Systems Biology with iHOP

A few months ago, I registered with Siphs, a social network for researchers. I didn't play with it for long, as it was a little bit buggy and not very populated. However, I still receive e-mails from this site, and today I received one about an impressive text mining tool called Information Hyperlinked over Proteins (iHOP).



A network of concurring genes and proteins extends through the scientific literature touching on phenotypes, pathologies and gene function. iHOP provides this network as a natural way of accessing millions of PubMed abstracts. By using genes and proteins as hyperlinks between sentences and abstracts, the information in PubMed can be converted into one navigable resource, bringing all advantages of the internet to scientific literature research.

iHOP was written by Dr Robert Hoffmann and published in Nature Genetics 36, 664 (2004): Hoffmann, R., Valencia, A. A gene network for navigating the literature.


11 April 2006

Writing a paper for dummies

A few weeks ago I introduced the method I use to draw pedigrees using the dot program. Dr Zhao recently had the same idea, as his paper using dot and R has just been published in Bioinformatics:

Zhao JH. Pedigree-drawing with R and graphviz.
Bioinformatics. 2006 Apr 15;22(8):1013-4. Epub 2006 Feb 17.
PMID: 16488908
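For readers who have not seen the idea, here is a minimal sketch of how a pedigree can be turned into input for dot. It is neither the code from the earlier post nor Dr Zhao's method: the input layout (id, father, mother, sex) and the class name PedigreeDot are assumptions, and squares for males and circles for females simply follow the usual pedigree convention.

```java
/** Hypothetical helper turning a pedigree into a Graphviz dot graph. */
class PedigreeDot
{
    /**
     * Each row is {id, fatherId, motherId, sex}.
     * A null parent marks a founder; sex is "M" or "F".
     */
    static String toDot(String[][] individuals)
    {
        StringBuilder dot = new StringBuilder("digraph pedigree {\n");
        // One node per individual: box for males, circle for females
        for (String[] ind : individuals)
        {
            String shape = "M".equals(ind[3]) ? "box" : "circle";
            dot.append("  \"").append(ind[0]).append("\" [shape=")
               .append(shape).append("];\n");
        }
        // One edge from each known parent to the child
        for (String[] ind : individuals)
        {
            for (int parent = 1; parent <= 2; ++parent)
            {
                if (ind[parent] == null) continue;
                dot.append("  \"").append(ind[parent]).append("\" -> \"")
                   .append(ind[0]).append("\";\n");
            }
        }
        dot.append("}\n");
        return dot.toString();
    }

    public static void main(String[] args)
    {
        String[][] family = {
            {"father", null, null, "M"},
            {"mother", null, null, "F"},
            {"child", "father", "mother", "F"}
        };
        // Save the output to a file and run: dot -Tpng family.dot -o family.png
        System.out.print(toDot(family));
    }
}
```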


It's unfair, it's really too unfair
Sometimes you find a nice piece of code and you don't even think that it could be used to write a paper :-)...



07 April 2006

Nature Network Boston: an online social space for scientists

On Nascent Timo Hannay spoke about Web 2.0 in science. He also announced:

Nature Network Boston, an online social space for scientists will be launching in May

We might expect a new social network for scientists, along the lines of LinkedIn, OpenBC, etc. :-) ! This is not a new idea, as SIPHS already tried to build such a network. And is it a good idea to create another network only for scientists? But there might be good ideas, such as linking it to Nature's connotea. Nature might also have enough influence to link authors in pubmed to its site...

wait and see.

06 April 2006

Social Scientific Community with Connotea.

Connotea is a free online reference management service. It allows you to save links to all your favourite articles, references, websites and other online resources with one click. Connotea is also a social bookmarking tool, like del.icio.us, so you can view other people's collections to discover new, interesting content. Some features have been requested by users on the connotea mailing list and, among them, Pedro Horna suggested that anonymity does not suit connotea well.

Although nicknames are popular in social bookmarking, chatrooms and poetry, anonymity has always been a very rare event in science basically because it does not make any sense. We scientists constantly struggle to let ourselves known and go to great lengths to find out who we might be interacting with in order to strengthen our network of potential collaborators. I bet most researchers are more than willing to put some info about themselves, probably as a "user note" or "user profile"...

Other people suggested to use a WIKI to store such information.

Today, the Connotea team released a new version which uses a wiki to store your scientific profile ([here is mine]) or to describe your team/group...

This also can now be a nice place to define your unique URI and to create your FOAF profile...

Great !