Showing posts with label mesh. Show all posts
Showing posts with label mesh. Show all posts

11 October 2010

Playing with the Wordle algorithm: a tag cloud of Mesh Terms

The paper describing Wordle has been recently published (http://www.research.ibm.com/visual/papers/wordle_final2.pdf ). The algorithm was briefly described: “The most distinctive geometric aspect of a Wordle is the layout algorithm, which packs words to make efficient use of space. While many space-filling visualizations exist, they typically work by recursiveLayout proceeds according to this pseudocode:

sort words by weight, decreasing
for each word w:
w.position := makeInitialPosition(w);
while w intersects other words:
updatePosition(w);

The two key procedures here are "makeInitialPosition" and "updatePosition". The makeInitialPosition routine picks a point at random according to a distribution that takes into account the desired overall shape, and, if desired, alphabetical order. The updatePosition routine moves the word on a spiral of increasing radius, radiating from the word's starting position. The updatePosition routine is aware of constraints on the overall shape of the Wordle. Constraining the layout to a rectangular shape causes updatePosition to prefer positions inside of the strict boundaries of the playing field; a blobby overall shape accepts boundary violations. The rectangular constraint is relaxed when the spiral radius exceeds either playing field dimension. ”
Your browser does not support the <CANVAS> element !


Just for fun , I've implemented my own version of the Wordle Alogrithm. The Java code is available on github at http://github.com/lindenb/jsandbox/blob/master/src/sandbox/MyWordle.java. I won't describe the program here, but I'll just say that the code invokes a java.awt.font.TextLayout class to get the shape of the text:
Graphics2D g=(...)
FontRenderContext frc = g.getFontRenderContext();
Font font=new Font("Dialog",Font.BOLD,fontSize);
TextLayout textLayout=new TextLayout(w.getText(), font, frc);
Shape shape=textLayout.getOutline(null);
and a java.awt.geom.Area to test if two shapes intersects.

Ok, let's test this code. FIrst I'm going to dump a pubmed query as XML with another simple tool named PubmedDump. This XML file is then parsed with a javascript program called from the SAX parser saxstream.jar (previously described here):
mesh.js
importPackage(Packages.sandbox);
importPackage(Packages.java.io);
importPackage(Packages.java.awt);
var content=null;
var mesh2count={};

function startElement(uri,localName,name,atts)
{
if(name=="DescriptorName")
{
content="";
}
}
function characters(s)
{
if(content!=null) content+=s;
}
function endElement(uri,localName,name)
{
if(content!=null)
{
var count=mesh2count[content];
if(count===undefined) count=0;
mesh2count[content]=count+1;
}
content=null;
}
function endDocument()
{
var w= new MyWordle();
for(var s in mesh2count)
{
var word= new MyWordle.Word(s,mesh2count[s]);
w.add(word);
}
w.setUseArea(true);/* use shape area instead of bounding boxes */
w.setAllowRotate(true);
w.setSortType(1);/* sort by weight */
w.doLayout();

var f=new File("result.svg");
w.saveAsSVG(f);
}
This javascript code counts the occurence of each MESH term (<DescriptorName>), and when the document has been parsed, a new instance of MyWordle class is created, filled and the result is saved to a SVG file.

Invocation


Here is an example for the query "Rotavirus NSP3 NSP1":
java -jar pubmeddump.jar "Rotavirus NSP3 NSP1" |\
java -cp mywordle.jar:saxscript.jar org.lindenb.tinytools.SAXScript -n -f mesh.js


Result





That's it,
Pierre

30 August 2008

Mesh Fequencies & Pubmed Articles

In a recent post on his blog, David Rothman was asked for a 3rd Party PubMed/MEDLINE Tool: What I’d like to do is to be able to enter the PMIDs of several citations and have the tool search MEDLINE via PubMed for the assigned MeSH terms, and return a single list of the terms used by any of the entered citations with a measurement of frequency. For example, if I input PMIDs 16234728, 15674923, and 17443536, the tool would return results telling me that 100% or 3 of 3 use the term “Catheters, Indwelling”, 2 of 3 use “Time Factors,” 1 of the 3 uses “Urination Disorders,” and so on. Although this example uses 3 PMIDs, I’d like to be able to input at least 10, just based on personal experience.

It took less than half an hour to write this tool using java. The source code is available here and an executable jar file is available here.

The input is a standard pubmed query :

java -jar pubmedfrequencies.jar -term "Lindenbaum P[Author]" -n 20 > ~/page.html
or a list of pmid
java -jar pubmedfrequencies.jar -pmid 8985320,16027742,15047801,18053270,9682060 > ~/page.html


The output is a simple html table:











Mesh89853201602774215047801180532709682060
HumansXXX
AnimalsXX
Autistic DisorderXX
(...)
RNA-Binding ProteinsXX
RotavirusXX
SoftwareX
Two-Hybrid System TechniquesX
Zinc FingersX


well... that was not Big Science....

Note: Rajarshi Guha also suggested another solution: http://www.chembiogrid.org/cheminfo/rest/mesh/16234728,15674923,17443536

Pierre


Addendum:After the first comments, I've added a gui support and a count of the mesh terms in the result.