Playing with the Wordle algorithm: a tag cloud of Mesh Terms
The paper describing Wordle has been recently published (http://www.research.ibm.com/visual/papers/wordle_final2.pdf ). The algorithm was briefly described: “The most distinctive geometric aspect of a Wordle is the layout algorithm, which packs words to make efficient use of space. While many space-filling visualizations exist, they typically work by recursiveLayout proceeds according to this pseudocode:
sort words by weight, decreasing
for each word w:
w.position := makeInitialPosition(w);
while w intersects other words:
updatePosition(w);
The two key procedures here are "makeInitialPosition" and "updatePosition". The makeInitialPosition routine picks a point at random according to a distribution that takes into account the desired overall shape, and, if desired, alphabetical order. The updatePosition routine moves the word on a spiral of increasing radius, radiating from the word's starting position. The updatePosition routine is aware of constraints on the overall shape of the Wordle. Constraining the layout to a rectangular shape causes updatePosition to prefer positions inside of the strict boundaries of the playing field; a blobby overall shape accepts boundary violations. The rectangular constraint is relaxed when the spiral radius exceeds either playing field dimension. ”
Just for fun , I've implemented my own version of the Wordle Alogrithm. The Java code is available on github at http://github.com/lindenb/jsandbox/blob/master/src/sandbox/MyWordle.java. I won't describe the program here, but I'll just say that the code invokes a java.awt.font.TextLayout class to get the shape of the text:
FontRenderContext frc = g.getFontRenderContext();
Font font=new Font("Dialog",Font.BOLD,fontSize);
TextLayout textLayout=new TextLayout(w.getText(), font, frc);
Shape shape=textLayout.getOutline(null);
Ok, let's test this code. FIrst I'm going to dump a pubmed query as XML with another simple tool named PubmedDump. This XML file is then parsed with a javascript program called from the SAX parser saxstream.jar (previously described here):
mesh.js
importPackage(Packages.java.io);
importPackage(Packages.java.awt);
var content=null;
var mesh2count={};
function startElement(uri,localName,name,atts)
{
if(name=="DescriptorName")
{
content="";
}
}
function characters(s)
{
if(content!=null) content+=s;
}
function endElement(uri,localName,name)
{
if(content!=null)
{
var count=mesh2count[content];
if(count===undefined) count=0;
mesh2count[content]=count+1;
}
content=null;
}
function endDocument()
{
var w= new MyWordle();
for(var s in mesh2count)
{
var word= new MyWordle.Word(s,mesh2count[s]);
w.add(word);
}
w.setUseArea(true);/* use shape area instead of bounding boxes */
w.setAllowRotate(true);
w.setSortType(1);/* sort by weight */
w.doLayout();
var f=new File("result.svg");
w.saveAsSVG(f);
}
Invocation
Here is an example for the query "Rotavirus NSP3 NSP1":
java -cp mywordle.jar:saxscript.jar org.lindenb.tinytools.SAXScript -n -f mesh.js
Result
That's it,
Pierre