Showing posts with label canvas. Show all posts

09 June 2013

How to fit a sentence in a rectangle with the Hershey vectorial font.

Via Wikipedia: "The Hershey fonts are a collection of vector fonts developed circa 1967 by Dr. A. V. Hershey (...). Vector fonts are easily scaled and rotated in two or three dimensions; consequently the Hershey fonts have been widely used in computer graphics and computer-aided design programs." When programming, I often have to fit a sentence in a rectangle (for example, to write the name of a short read in the graphical view of a BAM file), so I wrote an XML version of the Hershey font.

<?xml version="1.0"?>
<hershey>
  <letter id="1" count="9" left="-5" right="5" char="a">
    <moveto x="0" y="-5"/>
    <lineto x="-4" y="4"/>
    <moveto x="0" y="-5"/>
    <lineto x="4" y="4"/>
    <moveto x="-2" y="1"/>
    <lineto x="2" y="1"/>
  </letter>
  <letter id="2" count="16" left="-5" right="5" char="b">
    <moveto x="-3" y="-5"/>
    <lineto x="-3" y="4"/>
    <moveto x="-3" y="-5"/>
    <lineto x="1" y="-5"/>
    <lineto x="3" y="-4"/>
    <lineto x="3" y="-2"/>
    <lineto x="1" y="-1"/>
    <moveto x="-3" y="-1"/>
    <lineto x="1" y="-1"/>
    <lineto x="3" y="0"/>
    <lineto x="3" y="3"/>
    <lineto x="1" y="4"/>
    <lineto x="-3" y="4"/>
  </letter>
  (...)
</hershey>
From there, I can generate some bindings for various programming languages using XSLT, for example JavaScript:
{
 "1":[{t:'M',x:0,y:-5},{t:'L',x:-4,y:4},{t:'M',x:0,y:-5},{t:'L',x:4,y:4},{t:'M',x:-2,y:1},{t:'L',x:2,y:1}],
 "2":[{t:'M',x:-3,y:-5},{t:'L',x:-3,y:4},{t:'M',x:-3,y:-5},{t:'L',x:1,y:-5},{t:'L',x:3,y:-4},{t:'L',x:3,y:-2},{t:'L',x:1,y:-1},{t:'M',x:-3,y:-1},{t:'L',x:1,y:-1},{t:'L',x:3,y:0},{t:'L',x:3,y:3},{t:'L',x:1,y:4},{t:'L',x:-3,y:4}], ...

In the Javascript example below, I'm generating some random rectangles where a sentence is written:
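The fitting step itself can be sketched in Java as follows. This is a minimal sketch under my own assumptions, not the code of the original demo: the class and method names (HersheyFit, Op, Glyph, fit) are hypothetical, the vertical extent of the glyphs is assumed to be y=-5..4 (the range used by the two letters shown above), and the text is simply stretched to fill the whole rectangle. The horizontal advance of each glyph is taken from its left/right attributes.

```java
import java.util.List;
import java.util.Locale;

class HersheyFit {
    /** One drawing instruction: 'M' (moveto) or 'L' (lineto). */
    static final class Op {
        final char type; final int x; final int y;
        Op(char type, int x, int y) { this.type = type; this.x = x; this.y = y; }
    }

    /** A glyph and its horizontal extent (the left/right attributes of <letter>). */
    static final class Glyph {
        final int left; final int right; final Op[] ops;
        Glyph(int left, int right, Op... ops) { this.left = left; this.right = right; this.ops = ops; }
    }

    // Assumption: every glyph lives between these two y values.
    static final int TOP = -5;
    static final int BOTTOM = 4;

    /** Returns an SVG-like path ("M x y L x y ...") stretched to fill the rectangle. */
    static String fit(List<Glyph> sentence, double rx, double ry, double rw, double rh) {
        int totalWidth = 0;
        for (Glyph g : sentence) totalWidth += g.right - g.left;
        if (totalWidth == 0) return "";
        double sx = rw / totalWidth;       // horizontal scale
        double sy = rh / (BOTTOM - TOP);   // vertical scale
        StringBuilder path = new StringBuilder();
        double penX = rx;                  // left edge of the current glyph
        for (Glyph g : sentence) {
            for (Op op : g.ops) {
                double x = penX + (op.x - g.left) * sx;
                double y = ry + (op.y - TOP) * sy;
                path.append(op.type).append(String.format(Locale.ROOT, "%.2f %.2f ", x, y));
            }
            penX += (g.right - g.left) * sx;
        }
        return path.toString().trim();
    }
}
```

With two copies of the letter "a" above and a 100x90 rectangle at the origin, each glyph gets a 50-unit-wide cell, so the first point of the second letter lands at x=75. Using two independent scale factors distorts the letters; taking the smaller of the two for both axes would keep their proportions instead.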


That's it,
Pierre

07 March 2011

Drawing a protein (Biostar #6172)

This post is my answer to this question on Biostar, "Drawing a protein":
Dear all I often find protein's image like this (...) Do you know if there's a program to draw them (I mean circles with letters).

I wrote a Java-Swing application named WirePeptide that displays a draggable peptide. This application is available on GitHub at https://github.com/lindenb/jsandbox/blob/master/src/sandbox/WirePeptide.java. The user can save the image as PNG, SVG or HTML+Canvas.

Compile & Run

cd jsandbox
ant wirepeptide
java -jar dist/wirepeptide.jar


Result (Canvas)



That's it,

Pierre

22 September 2010

A Simple tool to get the sex ratio in pubmed.

Just for fun, I wrote a simple Java tool to get the sex ratio of the authors in PubMed. This program fetches a list of names/genders I found in the following Perl module: http://cpansearch.perl.org/src/EDALY/Text-GenderFromName-0.33/GenderFromName.pm. The source code is available at the end of this post.

(In the following examples, the many names that couldn't be associated with a gender were ignored).

Bioinformatics


Here is the result for "Bioinformatics[journal]"
Women: 3178 (19%) Men: 13149 (80%)
Bioinformatics[Journal]
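The two percentages add up to 99% rather than 100% because the program casts each ratio to an int, which truncates instead of rounding (see the source code further down). A minimal sketch of that arithmetic, using the counts above; the class and method names are hypothetical:

```java
class PercentDemo {
    /** Same expression as in the program: the (int) cast drops the decimals. */
    static int truncatedPercent(int count, float total) {
        return (int) ((count / total) * 100.0);
    }

    public static void main(String[] args) {
        int women = 3178;
        int men = 13149;
        float total = women + men; // 16327 authors with a known gender
        // prints 19% / 80%
        System.out.println(truncatedPercent(women, total) + "% / " + truncatedPercent(men, total) + "%");
    }
}
```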


The 'Lancet' in 2009

Women: 579 (30%) Men: 1331 (69%)
Lancet[Journal] 2009[Date]


Nature in 2009

Women: 1616 (30%) Men: 3768 (69%)
Nature[Journal] 2009[Date]


Nursing in 2009

Women: 29 (70%) Men: 12 (29%)
Nursing[Journal] 2009[Date]



Articles about Charles Darwin

Women: 25 (17%) Men: 118 (82%)
"Darwin C"[PS]



etc... etc..

Source code

/**
* Author:
* Pierre Lindenbaum PhD
* [email protected]
* Source of data:
* http://cpansearch.perl.org/src/EDALY/Text-GenderFromName-0.33/GenderFromName.pm
*/
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLEncoder;
import java.text.Collator;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;
import javax.xml.stream.events.XMLEvent;

/**
* PubmedGender
*/
public class PubmedGender
{
private Map<String,Float> males=null;
private Map<String,Float> females=null;
private int limit=1000;
private String query="";
private int canvasSize=200;
private boolean ignoreUndefined=false;
private PubmedGender()
{
Collator collator= Collator.getInstance(Locale.US);
collator.setStrength(Collator.PRIMARY);
this.males=new TreeMap<String, Float>(collator);
this.females=new TreeMap<String, Float>(collator);
}

private void loadNames()
throws IOException
{
BufferedReader in=new BufferedReader(new InputStreamReader(new URL("http://cpansearch.perl.org/src/EDALY/Text-GenderFromName-0.33/GenderFromName.pm").openStream()));
String line;
Map<String,Float> map=null;
int posAssign=-1;
while((line=in.readLine())!=null)
{
if(line.startsWith("$Males = {"))
{
map=this.males;
}
else if(line.startsWith("$Females = {"))
{
map=this.females;
}
else if(line.contains("}"))
{
map=null;
}
else if(map!=null && ((posAssign=line.indexOf("=>"))!=-1))
{
String name=line.substring(0,posAssign).replaceAll("'","").toLowerCase().trim();
Float freq=Float.parseFloat(line.substring(posAssign+2).replaceAll("[',]","").toLowerCase().trim());
map.put(name, freq);
}
else
{
map=null;
}
}
in.close();
}
private XMLEventReader newReader(URL url) throws IOException,XMLStreamException
{
XMLInputFactory f= XMLInputFactory.newInstance();
f.setProperty(XMLInputFactory.IS_COALESCING, Boolean.TRUE);
f.setProperty(XMLInputFactory.IS_NAMESPACE_AWARE,Boolean.FALSE);
f.setProperty(XMLInputFactory.IS_REPLACING_ENTITY_REFERENCES,Boolean.TRUE);
f.setProperty(XMLInputFactory.IS_VALIDATING,Boolean.FALSE);
f.setProperty(XMLInputFactory.SUPPORT_DTD,Boolean.FALSE);
XMLEventReader reader=f.createXMLEventReader(url.openStream());
return reader;
}

private void run() throws Exception
{
int countMales=0;
int countFemales=0;
int countUnknown=0;

URL url= new URL(
"http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&term="+
URLEncoder.encode(this.query, "UTF-8")+
"&retstart=0&retmax="+this.limit+"&usehistory=y&retmode=xml&email=plindenbaum_at_yahoo.fr&tool=gender");

XMLEventReader reader= newReader(url);
XMLEvent evt;
String QueryKey=null;
String WebEnv=null;
int countId=0;
while(!(evt=reader.nextEvent()).isEndDocument())
{
if(!evt.isStartElement()) continue;
String tag= evt.asStartElement().getName().getLocalPart();
if(tag.equals("QueryKey"))
{
QueryKey= reader.getElementText().trim();
}
else if(tag.equals("WebEnv"))
{
WebEnv= reader.getElementText().trim();
}
else if(tag.equals("Id"))
{
++countId;
}
}
reader.close();

if(countId!=0)
{
url= new URL("http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=pubmed&WebEnv="+
URLEncoder.encode(WebEnv,"UTF-8")+
"&query_key="+URLEncoder.encode(QueryKey,"UTF-8")+
"&retmode=xml&retmax="+this.limit+"&email=plindenbaum_at_yahoo.fr&tool=mail");

reader= newReader(url);


while(reader.hasNext())
{
evt=reader.nextEvent();
if(!evt.isStartElement()) continue;
if(!evt.asStartElement().getName().getLocalPart().equals("Author")) continue;
String firstName=null;
String initials=null;

while(reader.hasNext())
{
evt=reader.nextEvent();
if(evt.isStartElement())
{
String localName=evt.asStartElement().getName().getLocalPart();
if(localName.equals("ForeName") || localName.equals("FirstName"))
{
firstName=reader.getElementText().toLowerCase();
}
else if(localName.equals("Initials"))
{
initials=reader.getElementText().toLowerCase();
}
}
else if(evt.isEndElement())
{
if(evt.asEndElement().getName().getLocalPart().equals("Author")) break;
}
}
if( firstName==null ) continue;
if( firstName.length()==1 ||
firstName.equals(initials)) continue;

String tokens[]=firstName.split("[ ]+");
firstName="";
for(String s:tokens)
{
if(s.length()> firstName.length())
{
firstName=s;
}
}


if( firstName.length()==1 ||
firstName.equals(initials)) continue;

Float male= this.males.get(firstName);
Float female= this.females.get(firstName);

if(male==null && female==null)
{
//System.err.println("Undefined "+firstName+" / "+lastName);
countUnknown++;
}
else if(male!=null && female==null)
{
countMales++;
}
else if(male==null && female!=null)
{
countFemales++;
}
else if(male < female)
{
countFemales++;
}
else if(female < male)
{
countMales++;
}
else
{
//System.err.println("Undefined "+firstName+" / "+lastName);
countUnknown++;
}
}
reader.close();
}
if(ignoreUndefined) countUnknown=0;

float total= countMales+countFemales+countUnknown;

double radMale=(countMales/total)*Math.PI*2.0;
double radFemale=(countFemales/total)*Math.PI*2.0;
int radius= (canvasSize-2)/2;
String id= "ctx"+System.currentTimeMillis()+""+(int)(Math.random()*1000);
XMLOutputFactory xmlfactory= XMLOutputFactory.newInstance();
XMLStreamWriter w= xmlfactory.createXMLStreamWriter(System.out,"UTF-8");
w.writeStartElement("html");
w.writeStartElement("body");
w.writeStartElement("div");
w.writeAttribute("style","margin:10px;padding:10px;text-align:center;");
w.writeStartElement("div");
w.writeEmptyElement("canvas");
w.writeAttribute("width", String.valueOf(canvasSize+1));
w.writeAttribute("height", String.valueOf(canvasSize+1));
w.writeAttribute("id", id);
w.writeStartElement("script");
w.writeCharacters(
"function paint"+id+"(){var canvas=document.getElementById('"+id+"');"+
"if (!canvas.getContext) return;var c=canvas.getContext('2d');"+
"c.fillStyle='white';c.strokeStyle='black';"+
"c.fillRect(0,0,"+canvasSize+","+canvasSize+");"+
"c.fillStyle='gray';c.beginPath();c.arc("+(canvasSize/2)+","+(canvasSize/2)+","+radius+",0,Math.PI*2,true);c.fill();c.stroke();"+
"c.fillStyle='blue';c.beginPath();c.moveTo("+(canvasSize/2)+","+(canvasSize/2)+");c.arc("+(canvasSize/2)+","+(canvasSize/2)+","+radius+",0,"+radMale+",false);c.closePath();c.fill();c.stroke();"+
"c.fillStyle='pink';c.beginPath();c.moveTo("+(canvasSize/2)+","+(canvasSize/2)+");c.arc("+(canvasSize/2)+","+(canvasSize/2)+","+radius+","+radMale+","+(radMale+radFemale)+",false);c.closePath();c.fill();c.stroke();}"+
"window.addEventListener('load',function(){ paint"+id+"(); },true);"
);
w.writeEndElement();
w.writeEndElement();

w.writeStartElement("span");
w.writeAttribute("style","color:pink;");
w.writeCharacters("Women: "+countFemales+" ("+(int)((countFemales/total)*100.0)+"%)");
w.writeEndElement();
w.writeCharacters(" ");
w.writeStartElement("span");
w.writeAttribute("style","color:blue;");
w.writeCharacters("Men: "+countMales+" ("+(int)((countMales/total)*100.0)+"%)");
w.writeEndElement();
w.writeCharacters(" ");

if(!this.ignoreUndefined)
{
w.writeStartElement("span");
w.writeAttribute("style","color:gray;");
w.writeCharacters("Undefined : "+countUnknown+" ("+(int)((countUnknown/total)*100.0)+"%)");
w.writeEndElement();
}
w.writeEmptyElement("br");

w.writeStartElement("a");
w.writeAttribute("target","_blank");
// XMLStreamWriter escapes '&' itself, so the URL must contain a plain '&' here
w.writeAttribute("href","http://www.ncbi.nlm.nih.gov/sites/entrez?db=pubmed&cmd=search&term="+URLEncoder.encode(this.query,"UTF-8"));
w.writeCharacters(this.query);
w.writeEndElement();


w.writeEndElement();
w.writeEndElement();
w.writeEndElement();
w.flush();
w.close();
}

public static void main(String[] args)
{
try
{
PubmedGender app=new PubmedGender();

int optind=0;
while(optind< args.length)
{
if(args[optind].equals("-h") ||
args[optind].equals("-help") ||
args[optind].equals("--help"))
{
System.err.println("Options:");
System.err.println(" -h help; This screen.");
System.err.println(" -w <int> canvas size default:"+app.canvasSize);
System.err.println(" -L <int> limit number default:"+app.limit);
System.err.println(" -i ignore undefined default:"+app.ignoreUndefined);
System.err.println(" query terms...");
return;
}
else if(args[optind].equals("-L"))
{
app.limit=Integer.parseInt(args[++optind]);
}
else if(args[optind].equals("-w"))
{
app.canvasSize=Integer.parseInt(args[++optind]);
}
else if(args[optind].equals("-i"))
{
app.ignoreUndefined=true;
}
else if(args[optind].equals("--"))
{
optind++;
break;
}
else if(args[optind].startsWith("-"))
{
System.err.println("Unknown option "+args[optind]);
return;
}
else
{
break;
}
++optind;
}
if(optind==args.length)
{
System.err.println("Query missing");
return;
}
app.query="";
while(optind< args.length)
{
if(!app.query.isEmpty()) app.query+=" ";
app.query+=args[optind++];
}
app.query=app.query.trim();
if(app.query.trim().isEmpty())
{
System.err.println("Query is empty");
return;
}
app.loadNames();

app.run();

}
catch (Exception e)
{
e.printStackTrace();
}
}
}


That's it

Pierre

26 November 2009

A Java implementation of Jan Aerts' LocusTree

This post is a description of my implementation of Jan Aerts' LocusTree algorithm (I want to thank Jan: our discussion and his comments were a great source of inspiration), based on BerkeleyDB-JE, a key/value datastore. This implementation has been used to build a genome browser that displays its data in the SVG format. In brief: splitting each chromosome using a dichotomic approach makes it possible to quickly find all the features in a given genomic region at a given resolution. A count of the total number of objects among the descendants of each child node is used to produce a histogram of the number of objects smaller than the given resolution.
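To make the idea concrete, here is a minimal in-memory sketch in Java. It is not the BerkeleyDB implementation described in this post: the class LocusTreeSketch, its fields and the placement rule (a feature is stored in the deepest node that wholly contains it) are simplifying assumptions of the sketch. Each node keeps a count of the features stored below it, so a query can stop at nodes shorter than the requested resolution and report a count instead of the individual features.

```java
import java.util.ArrayList;
import java.util.List;

class LocusTreeSketch {
    /** A genomic feature on the half-open interval [start, end). */
    static final class Feature {
        final String name; final int start; final int end;
        Feature(String name, int start, int end) { this.name = name; this.start = start; this.end = end; }
    }

    /** A node covering [start, end); children are created lazily. */
    static final class Node {
        final int start; final int end;
        Node left; Node right;
        final List<Feature> features = new ArrayList<>();
        int descendantCount = 0; // features stored strictly below this node
        Node(int start, int end) { this.start = start; this.end = end; }
        int length() { return end - start; }
    }

    private final Node root;
    private final int minNodeLength;

    LocusTreeSketch(int chromLength, int minNodeLength) {
        this.root = new Node(0, chromLength);
        this.minNodeLength = minNodeLength;
    }

    /** Stores the feature in the deepest node that wholly contains it. */
    void insert(Feature f) {
        Node n = root;
        while (n.length() / 2 >= minNodeLength) {
            int mid = n.start + n.length() / 2;
            if (f.end <= mid) {
                n.descendantCount++;
                if (n.left == null) n.left = new Node(n.start, mid);
                n = n.left;
            } else if (f.start >= mid) {
                n.descendantCount++;
                if (n.right == null) n.right = new Node(mid, n.end);
                n = n.right;
            } else {
                break; // the feature straddles the midpoint: keep it here
            }
        }
        n.features.add(f);
    }

    /**
     * Adds to 'out' the features overlapping [qStart, qEnd) that are held by nodes
     * at least 'resolution' long. Returns the number of features held by smaller
     * nodes; that number is approximate at the edges of the region.
     */
    int query(int qStart, int qEnd, int resolution, List<Feature> out) {
        return query(root, qStart, qEnd, resolution, out);
    }

    private int query(Node n, int qStart, int qEnd, int resolution, List<Feature> out) {
        if (n == null || n.end <= qStart || n.start >= qEnd) return 0;
        if (n.length() < resolution) return n.features.size() + n.descendantCount;
        for (Feature f : n.features) {
            if (f.start < qEnd && f.end > qStart) out.add(f);
        }
        return query(n.left, qStart, qEnd, resolution, out)
             + query(n.right, qStart, qEnd, resolution, out);
    }
}
```

For example, with a gene at [100,600) and two SNPs at [10,11) and [12,13) on a chromosome of length 1024, a query over the whole chromosome at resolution 64 returns the gene and a count of 2 for the hidden SNPs, while the same query at resolution 1 returns all three features.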

JSON/Metadata

All the information is stored in BerkeleyDB and I've used JSON to add some metadata about each object. The JSON is serialized, gzipped and stored in BerkeleyDB.
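The serialize-then-gzip step can be sketched with the standard java.util.zip classes. This is only an illustration of the idea: the class name GzipJson and the choice of UTF-8 are assumptions of the sketch, and the resulting byte array is what would be handed to the key/value store as the value.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

class GzipJson {
    /** JSON text -> gzipped bytes, ready to be stored as a value. */
    static byte[] compress(String json) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bytes)) {
            gz.write(json.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** Gzipped bytes read back from the store -> JSON text. */
    static String decompress(byte[] data) {
        try (GZIPInputStream gz = new GZIPInputStream(new ByteArrayInputStream(data))) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[4096];
            int n;
            while ((n = gz.read(buffer)) != -1) out.write(buffer, 0, n);
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Note that for very short JSON strings the gzip header can make the stored value larger than the original text.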

Organism

Each organism is defined by an ID and a Name. The Key of the BerkeleyDB is the organism.id.

The organisms are loaded in the database using a simple XML file:
<organisms>
<organism id="36">
<name>hg18</name>
<description>Human Genome Build v.36</description>
<metadata>{"taxon-id":9606}</metadata>
</organism>
</organisms>
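Reading such a file takes only a few lines with StAX. The following is a hedged sketch, not the loader used in this project: the class OrganismXml is hypothetical, and it only extracts the id and the name of each organism into a map, where the real loader would write the record to the database.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

class OrganismXml {
    /** Returns organism id -> organism name, in document order. */
    static Map<Integer, String> parse(String xml) {
        Map<Integer, String> names = new LinkedHashMap<>();
        try {
            XMLStreamReader r = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            int currentId = -1; // id of the <organism> being read
            while (r.hasNext()) {
                if (r.next() != XMLStreamConstants.START_ELEMENT) continue;
                String tag = r.getLocalName();
                if (tag.equals("organism")) {
                    currentId = Integer.parseInt(r.getAttributeValue(null, "id"));
                } else if (tag.equals("name") && currentId != -1) {
                    names.put(currentId, r.getElementText());
                }
            }
            r.close();
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("Malformed organisms file", e);
        }
        return names;
    }
}
```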


Chromosome

Each chromosome is defined by an ID, a Name, its length and its organism-ID. The Key in BerkeleyDB is the chromosome ID.

The chromosomes are loaded in the database using a simple XML file:
<chromosomes organism-id="36">
<chromosome id="1">
<name>chr1</name>
<metadata>{"size":247249719,"type":"autosomal"}</metadata>
</chromosome>
<chromosome id="10">
<name>chr10</name>
<metadata>{"size":135374737,"type":"autosomal"}</metadata>
</chromosome>
(...)
</chromosomes>



Track

Each track is defined by an ID and a Name. The Key in BerkeleyDB is the track ID.

The descriptions of the tracks are loaded in the database using a simple XML file:
<tracks>
<track id="1">
<name>cytobands</name>
<description>UCSC cytobands</description>
</track>
<track id="2">
<name>knownGene</name>
<description>UCSC knownGene</description>
</track>
<track id="3">
<name>snp130</name>
<description>dbSNP v.130</description>
</track>
<track id="4">
<name>snp130Coding</name>
<description>UCSC coding Snp</description>
</track>
<track id="5">
<name>all_mrna</name>
<description>UCSC All mRNA</description>
</track>
</tracks>

Nodes


Each LocusTree Node (LTNode) is linked to a Chromosome and a Track using a database named 'TrackChrom'. Here the Key of the BerkeleyDB is a composite key (chromosome/track).
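One way to build such a composite key is to pack both identifiers into a fixed-size byte array. The sketch below is an assumption about how this could be done, not the actual 'TrackChrom' binding: the class TrackChromKey is hypothetical and both IDs are assumed to be non-negative ints. With big-endian encoding, a store that compares keys byte by byte keeps all the tracks of a chromosome next to each other.

```java
import java.nio.ByteBuffer;

class TrackChromKey {
    /** Packs (chromosome ID, track ID) into 8 big-endian bytes. */
    static byte[] encode(int chromosomeId, int trackId) {
        return ByteBuffer.allocate(8).putInt(chromosomeId).putInt(trackId).array();
    }

    /** Returns {chromosome ID, track ID}. */
    static int[] decode(byte[] key) {
        ByteBuffer b = ByteBuffer.wrap(key);
        return new int[] { b.getInt(), b.getInt() };
    }
}
```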

Each LTNode contains a link to its parent, links to its children, and a set of genomic entities whose length is greater than or equal to 'this.length'.


To load the content of each LocusTree, I defined a simple Java interface called LTStreamLoader, which looks like this:
public interface LTLoader
{
public MappedObject getMappedObject();
public String getChromosome();
public Set<String> getKeywords();
}
public interface LTStreamLoader
extends LTLoader
{
public void open(String uri) throws IOException;
public void close() throws IOException;
public boolean next() throws IOException;
}
An instance of this interface is used to load the content of a tab delimited file as defined in the following XML file:
<loaders>
<load organism-id="36" track-id="5" class-loader="fr.cephb.locustree.loaders.UCSCAllMrnaLoader" limit="10000">
http://hgdownload.cse.ucsc.edu/goldenPath/hg18/database/all_mrna.txt.gz
</load>
<load organism-id="36" track-id="4" class-loader="fr.cephb.locustree.loaders.UCSCSnpCodingLoader" limit="10000">
http://hgdownload.cse.ucsc.edu/goldenPath/hg18/database/snp130CodingDbSnp.txt.gz
</load>
<load organism-id="36" track-id="1" class-loader="fr.cephb.locustree.loaders.UCSCCytoBandsLoader">
http://hgdownload.cse.ucsc.edu/goldenPath/hg18/database/cytoBand.txt.gz
</load>
<load organism-id="36" track-id="2" class-loader="fr.cephb.locustree.loaders.UCSCKnownGeneLoader">
http://hgdownload.cse.ucsc.edu/goldenPath/hg18/database/knownGene.txt.gz
</load>
<load organism-id="36" track-id="3" class-loader="fr.cephb.locustree.loaders.UCSCSnpLoader" limit="10000">
http://hgdownload.cse.ucsc.edu/goldenPath/hg18/database/snp130.txt.gz
</load>
</loaders>
It took about 3 hours to load 'snp130.txt.gz' and the size of the indexed BerkeleyDB/LocusTree database was 16 GB (ouch!).

Building the Genome Browser

The locus tree database was used to create (yet another) Genome Browser. My current implementation runs smoothly under Apache Tomcat. The SVG vector format was used to draw and hyperlink the data. Here is a screenshot of the first version I wrote one week ago. As you can see, the objects that were too small to be drawn were displayed within a histogram.
Later, I added some labels.
And my latest version uses the JSON metadata available in each object to display the spliced structure of the genes:
The browser is fast (sorry, I cannot show it at this time) but I need to play with the BerkeleyDB configuration to speed up the insertions and reduce the size of the database.

That's it.
Pierre

NB: The figures of this post were created using SVGToCanvas.