Showing posts with label spreadsheet. Show all posts
Showing posts with label spreadsheet. Show all posts

13 November 2012

Creating a virtual RDF graph describing a set of OpenOffice spreadsheets with Apache Jena and Fuseki

In the current post, I will use the Jena API for RDF to implement a virtual RDF graph describing the content of a set of openoffice/libreoffice spreasheets.

Fact: An openoffice file (*.ods) is a Zip file

An openoffice file is nothing but a zip file:
$ unzip -t jeter.ods 
Archive:  jeter.ods
    testing: mimetype                 OK
    testing: meta.xml                 OK
    testing: settings.xml             OK
    testing: content.xml              OK
    testing: Thumbnails/thumbnail.png   OK
    testing: Configurations2/images/Bitmaps/   OK
    testing: Configurations2/popupmenu/   OK
    testing: Configurations2/toolpanel/   OK
    testing: Configurations2/statusbar/   OK
    testing: Configurations2/progressbar/   OK
    testing: Configurations2/toolbar/   OK
    testing: Configurations2/menubar/   OK
    testing: Configurations2/accelerator/current.xml   OK
    testing: Configurations2/floater/   OK
    testing: styles.xml               OK
    testing: META-INF/manifest.xml    OK
No errors detected in compressed data of jeter.ods.

The entry content.xml is a XML file describing the tables in the spreadsheet:
$ unzip -c jeter.ods content.xml |\
grep -v Archive |\
grep -v inflating | xmllint --format - |\
head -n 20


<?xml version="1.0" encoding="UTF-8"?>
<office:document-content xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0" xmlns:style="urn:oasis:names:tc:opendocument:xmlns:style:1.0" xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0" xmlns:table="urn:oasis:names:tc:opendocument:xmlns:table:1.0" xmlns:draw="urn:oasis:names:tc:opendocument:xmlns:drawing:1.0" xmlns:fo="urn:oasis:names:tc:opendocument:xmlns:xsl-fo-compatible:1.0" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:meta="urn:oasis:names:tc:opendocument:xmlns:meta:1.0" xmlns:number="urn:oasis:names:tc:opendocument:xmlns:datastyle:1.0" xmlns:presentation="urn:oasis:names:tc:opendocument:xmlns:presentation:1.0" xmlns:svg="urn:oasis:names:tc:opendocument:xmlns:svg-compatible:1.0" xmlns:chart="urn:oasis:names:tc:opendocument:xmlns:chart:1.0" xmlns:dr3d="urn:oasis:names:tc:opendocument:xmlns:dr3d:1.0" xmlns:math="http://www.w3.org/1998/Math/MathML" xmlns:form="urn:oasis:names:tc:opendocument:xmlns:form:1.0" xmlns:script="urn:oasis:names:tc:opendocument:xmlns:script:1.0" xmlns:ooo="http://openoffice.org/2004/office" xmlns:ooow="http://openoffice.org/2004/writer" xmlns:oooc="http://openoffice.org/2004/calc" xmlns:dom="http://www.w3.org/2001/xml-events" xmlns:xforms="http://www.w3.org/2002/xforms" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:rpt="http://openoffice.org/2005/report" xmlns:of="urn:oasis:names:tc:opendocument:xmlns:of:1.2" xmlns:xhtml="http://www.w3.org/1999/xhtml" xmlns:grddl="http://www.w3.org/2003/g/data-view#" xmlns:tableooo="http://openoffice.org/2009/table" xmlns:field="urn:openoffice:names:experimental:ooo-ms-interop:xmlns:field:1.0" xmlns:formx="urn:openoffice:names:experimental:ooxml-odf-interop:xmlns:form:1.0" xmlns:css3t="http://www.w3.org/TR/css3-text/" office:version="1.2">
  <office:scripts/>
  <office:font-face-decls>
    <style:font-face style:name="Liberation Sans" svg:font-family="'Liberation Sans'" style:font-family-generic="swiss" style:font-pitch="variable"/>
    <style:font-face style:name="DejaVu Sans" svg:font-family="'DejaVu Sans'" style:font-family-generic="system" style:font-pitch="variable"/>
    <style:font-face style:name="Lohit Hindi" svg:font-family="'Lohit Hindi'" style:font-family-generic="system" style:font-pitch="variable"/>
    <style:font-face style:name="WenQuanYi Micro Hei" svg:font-family="'WenQuanYi Micro Hei'" style:font-family-generic="system" style:font-pitch="variable"/>
  </office:font-face-decls>
  <office:automatic-styles>
    <style:style style:name="co1" style:family="table-column">
      <style:table-column-properties fo:break-before="auto" style:column-width="0.889in"/>
    </style:style>
    <style:style style:name="ro2" style:family="table-row">
      <style:table-row-properties style:row-height="0.178in" fo:break-before="auto" style:use-optimal-row-height="true"/>
    </style:style>
    <style:style style:name="ro3" style:family="table-row">
      <style:table-row-properties style:row-height="0.1681in" fo:break-before="auto" style:use-optimal-row-height="true"/>
    </style:style>
    <style:style style:name="ta1" style:family="table" style:master-page-name="Default">

Fact: Implementing a simple virtual RDF graph with Jena is easy

By virtual I mean that there is no RDFStore, the triples are created on the fly.
Implementing a simple virtual RDF graph with Jena is easy: you simply have to extend the class com.hp.hpl.jena.graph.impl.GraphBase and only implement the method graphBaseFind which returns all the RDF Triples matching a TripleMatch.

(...)
 @Override
    protected ExtendedIterator<Triple> graphBaseFind(TripleMatch matcher)
        {
        return ...;
        }
(...)

The code

My implementation of a RDFGraph for a set of OpenOffice Calc is not effective but it works fine: for each call of graphBaseFind, it creates an "Iterator<Triple>" scanning each content.xml entry of each openoffice file. This iterator creates some new Triples, add them to a list of Triples that will be filtered by the TripleMatcher.

Compilation

the Makefile:
CP=...#path to the jars of JENA/ARQ/etc... e.g: =`find ${ARQ} -name "*.jar" |  | tr "\n" ":"`
.PHONY: all
all:
 javac -cp ${CP} -sourcepath src src/oocalc/OpenOfficeCalcGraph.java
 jar cvf dist/openoffice2rdf.jar -C src .

Querying using sparql

Now that the Graph has been implemented and compiled, one can query it using ARQ, the sparql engine of Jena:

The spreadsheet

I've created the following spreadsheet and saved it in a file named "jeter.ods":
CHROMSTARTENDNAME
chr1100200rs654
chr1150250rs264
chr1200300rs610
chr1250350rs929
chr1300400rs408
chr1350450rs346
chr1400500rs430
chr1450550rs735
chr1500600rs575
chr1550650rs891
chr1600700rs627
chr1650750rs650
chr1700800rs715
chr1750850rs467
chr1800900rs882
chr1850950rs301
chr19001000rs643
chr19501050rs246
chr110001100rs178
chr110501150rs928
chr111001200rs213

The sparql query

The following SPARQL returns the informations about the cells in the 3rd row of the spreadsheet:


Invoke:
java -cp `find /home/lindenb/.ivy2/cache -name "*.jar" | tr "\n" ":"`:dist/openoffice2rdf.jar  \
 oocalc.OpenOfficeCalcGraph test.sparql /home/lindenb/jeter.ods

Result:
-----------------------------------------------------------------------------------------------------------------------------------
| s                                       | p                                                 | o                                 |
===================================================================================================================================
| <file:/home/lindenb/jeter.ods/t1/y3/x1> | office:table                                      | <file:/home/lindenb/jeter.ods/t1> |
| <file:/home/lindenb/jeter.ods/t1/y3/x1> | <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> | office:Cell                       |
| <file:/home/lindenb/jeter.ods/t1/y3/x1> | office:X                                          | "1"^^xsd:int                      |
| <file:/home/lindenb/jeter.ods/t1/y3/x1> | office:Y                                          | "3"^^xsd:int                      |
| <file:/home/lindenb/jeter.ods/t1/y3/x1> | office:value                                      | "chr1"                            |
| <file:/home/lindenb/jeter.ods/t1/y3/x2> | office:table                                      | <file:/home/lindenb/jeter.ods/t1> |
| <file:/home/lindenb/jeter.ods/t1/y3/x2> | <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> | office:Cell                       |
| <file:/home/lindenb/jeter.ods/t1/y3/x2> | office:X                                          | "2"^^xsd:int                      |
| <file:/home/lindenb/jeter.ods/t1/y3/x2> | office:Y                                          | "3"^^xsd:int                      |
| <file:/home/lindenb/jeter.ods/t1/y3/x2> | office:value                                      | "150"^^xsd:float                  |
| <file:/home/lindenb/jeter.ods/t1/y3/x3> | office:table                                      | <file:/home/lindenb/jeter.ods/t1> |
| <file:/home/lindenb/jeter.ods/t1/y3/x3> | <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> | office:Cell                       |
| <file:/home/lindenb/jeter.ods/t1/y3/x3> | office:X                                          | "3"^^xsd:int                      |
| <file:/home/lindenb/jeter.ods/t1/y3/x3> | office:Y                                          | "3"^^xsd:int                      |
| <file:/home/lindenb/jeter.ods/t1/y3/x3> | office:value                                      | "250"^^xsd:float                  |
| <file:/home/lindenb/jeter.ods/t1/y3/x4> | office:table                                      | <file:/home/lindenb/jeter.ods/t1> |
| <file:/home/lindenb/jeter.ods/t1/y3/x4> | <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> | office:Cell                       |
| <file:/home/lindenb/jeter.ods/t1/y3/x4> | office:X                                          | "4"^^xsd:int                      |
| <file:/home/lindenb/jeter.ods/t1/y3/x4> | office:Y                                          | "3"^^xsd:int                      |
| <file:/home/lindenb/jeter.ods/t1/y3/x4> | office:value                                      | "rs264"                           |
-----------------------------------------------------------------------------------------------------------------------------------

Serving the OpenOffice spreadsheets as RDF over HTTP

Fuseki is a SPARQL server. It provides REST-style SPARQL HTTP Update, SPARQL Query, and SPARQL Update using the SPARQL protocol over HTTP. We're going to deploy the OpenOfficeCalcGraph in Fuseki to query a set of OpenOffice files.

Download an install Fuseki

wget https://repository.apache.org/content/repositories/releases/org/apache/jena/jena-fuseki/0.2.5/jena-fuseki-0.2.5-distribution.tar.gz
tar xfz jena-fuseki-0.2.5-distribution.tar.gz
rm jena-fuseki-0.2.5-distribution.tar.gz

Tell Fuseki about our OpenOfficeCalcGraph

We need to create a config file for Fuseki. That was the most complicated part as the process is not clearly documented:

The line:
[] ja:loadClass "oocalc.OpenOfficeCalcGraph" .
loads the class oocalc.OpenOfficeCalcGraph. The class OpenOfficeCalcGraph contains a static initialisation method:
(...)
static { init() ; }
    private static void init()
        {
        (...)
In this static method, a Jena Assembler for OpenOfficeCalcGraph is registered under the resource named: "http://rdf.lindenb.org/build".
public static OpenOfficeAssembler assembler = new OpenOfficeAssembler();
(...)
private static final Resource buildRsrc=ResourceFactory.createResource(NS+"build");
(...)
Assembler.general.implementWith(buildRsrc,assembler);
(...)
An Assembler configures a Graph from a RDF config file. In our example, the config contains the path to the OpenOffice spreadsheets:
<#ooservice> rdf:type openoffice:build ;
    openoffice:file "/home/lindenb/jeter.ods" ;
    openoffice:file "/home/lindenb/jeter2.ods" ;
.
This config is read in the Assembler:
public static class OpenOfficeAssembler extends AssemblerBase implements Assembler
      {
      @Override
      public Object open( Assembler a, Resource root, Mode mode )
            {
            Property fileRsrc=ResourceFactory.createProperty(NS+"file");
            //read the configuration an get the files
            List<File> files=new ArrayList<File>();
            StmtIterator iter=root.listProperties(fileRsrc);
     (...)

Start Fuseki with the config file:

$ cd jena-fuseki-0.2.5
$ java -cp fuseki-server.jar:/path/to/openoffice2rdf.jar  org.apache.jena.fuseki.FusekiCmd \
    --debug  -v --config /path/to/openoffice.ttl
14:11:50 INFO  Config               :: Configuration file: ../openoffice.ttl
14:11:50 INFO  Config               :: Service: :service1
14:11:50 INFO  Config               ::   name = ds
14:11:50 INFO  Config               ::   query = /ds/query
14:11:50 INFO  Config               ::   query = /ds/sparql
14:11:50 INFO  Config               ::   update = /ds/update
14:11:50 INFO  Config               ::   upload = /ds/upload
14:11:50 INFO  Config               ::   graphStore(RW) = /ds/data
14:11:50 INFO  Config               ::   graphStore(R) = /ds/get
14:11:50 INFO  ooffice2rdf          :: Calling OpenOfficeCalcGraph init
14:11:50 INFO  Config               :: Service: OpenOffice Service (R)
14:11:50 INFO  Config               ::   name = openoffice
14:11:50 INFO  Config               ::   query = /openoffice/sparql
14:11:50 INFO  Config               ::   query = /openoffice/query
14:11:50 INFO  Config               ::   update = /openoffice/update
14:11:50 INFO  Config               ::   graphStore(R) = /openoffice/get
14:11:50 INFO  Config               ::   graphStore(R) = /openoffice/data
14:11:51 INFO  Server               :: Dataset path = /ds
14:11:51 INFO  Server               :: Dataset path = /openoffice
14:11:51 INFO  Server               :: Fuseki 0.2.5 2012-10-20T17:03:29+0100
14:11:51 INFO  Server               :: Started 2012/11/13 14:11:51 CET on port 3030
Open your browser at http://localhost:3030, select the control panel at http://localhost:3030/control-panel.tpl and select /openoffice:
Fuseki Control Panel
Dataset:

The following form is displayed:
SPARQL Query




Output:


XSLT style sheet (blank for none):




Force the accept header to text/plain regardless



You can now copy, paste and run the previous sparql query:
--------------------------------------------------------------------------------------------------------------------------------------------------
| s                                        | p                                                 | o                                               |
==================================================================================================================================================
| <file:/home/lindenb/jeter.ods/t1/y3/x1>  | <http://rdf.lindenb.org/table>                    | <file:/home/lindenb/jeter.ods/t1>               |
| <file:/home/lindenb/jeter.ods/t1/y3/x1>  | <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> | <http://rdf.lindenb.org/Cell>                   |
| <file:/home/lindenb/jeter.ods/t1/y3/x1>  | <http://rdf.lindenb.org/X>                        | "1"^^<http://www.w3.org/2001/XMLSchema#int>     |
| <file:/home/lindenb/jeter.ods/t1/y3/x1>  | <http://rdf.lindenb.org/Y>                        | "3"^^<http://www.w3.org/2001/XMLSchema#int>     |
| <file:/home/lindenb/jeter.ods/t1/y3/x1>  | <http://rdf.lindenb.org/value>                    | "chr1"                                          |
| <file:/home/lindenb/jeter.ods/t1/y3/x2>  | <http://rdf.lindenb.org/table>                    | <file:/home/lindenb/jeter.ods/t1>               |
| <file:/home/lindenb/jeter.ods/t1/y3/x2>  | <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> | <http://rdf.lindenb.org/Cell>                   |
| <file:/home/lindenb/jeter.ods/t1/y3/x2>  | <http://rdf.lindenb.org/X>                        | "2"^^<http://www.w3.org/2001/XMLSchema#int>     |
| <file:/home/lindenb/jeter.ods/t1/y3/x2>  | <http://rdf.lindenb.org/Y>                        | "3"^^<http://www.w3.org/2001/XMLSchema#int>     |
| <file:/home/lindenb/jeter.ods/t1/y3/x2>  | <http://rdf.lindenb.org/value>                    | "150"^^<http://www.w3.org/2001/XMLSchema#float> |
| <file:/home/lindenb/jeter.ods/t1/y3/x3>  | <http://rdf.lindenb.org/table>                    | <file:/home/lindenb/jeter.ods/t1>               |
| <file:/home/lindenb/jeter.ods/t1/y3/x3>  | <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> | <http://rdf.lindenb.org/Cell>                   |
| <file:/home/lindenb/jeter.ods/t1/y3/x3>  | <http://rdf.lindenb.org/X>                        | "3"^^<http://www.w3.org/2001/XMLSchema#int>     |
| <file:/home/lindenb/jeter.ods/t1/y3/x3>  | <http://rdf.lindenb.org/Y>                        | "3"^^<http://www.w3.org/2001/XMLSchema#int>     |
| <file:/home/lindenb/jeter.ods/t1/y3/x3>  | <http://rdf.lindenb.org/value>                    | "250"^^<http://www.w3.org/2001/XMLSchema#float> |
| <file:/home/lindenb/jeter.ods/t1/y3/x4>  | <http://rdf.lindenb.org/table>                    | <file:/home/lindenb/jeter.ods/t1>               |
| <file:/home/lindenb/jeter.ods/t1/y3/x4>  | <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> | <http://rdf.lindenb.org/Cell>                   |
| <file:/home/lindenb/jeter.ods/t1/y3/x4>  | <http://rdf.lindenb.org/X>                        | "4"^^<http://www.w3.org/2001/XMLSchema#int>     |
| <file:/home/lindenb/jeter.ods/t1/y3/x4>  | <http://rdf.lindenb.org/Y>                        | "3"^^<http://www.w3.org/2001/XMLSchema#int>     |
| <file:/home/lindenb/jeter.ods/t1/y3/x4>  | <http://rdf.lindenb.org/value>                    | "rs264"                                         |
| <file:/home/lindenb/jeter2.ods/t1/y3/x1> | <http://rdf.lindenb.org/table>                    | <file:/home/lindenb/jeter2.ods/t1>              |
| <file:/home/lindenb/jeter2.ods/t1/y3/x1> | <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> | <http://rdf.lindenb.org/Cell>                   |
| <file:/home/lindenb/jeter2.ods/t1/y3/x1> | <http://rdf.lindenb.org/X>                        | "1"^^<http://www.w3.org/2001/XMLSchema#int>     |
| <file:/home/lindenb/jeter2.ods/t1/y3/x1> | <http://rdf.lindenb.org/Y>                        | "3"^^<http://www.w3.org/2001/XMLSchema#int>     |
| <file:/home/lindenb/jeter2.ods/t1/y3/x1> | <http://rdf.lindenb.org/value>                    | "1"^^<http://www.w3.org/2001/XMLSchema#float>   |
| <file:/home/lindenb/jeter2.ods/t1/y3/x2> | <http://rdf.lindenb.org/table>                    | <file:/home/lindenb/jeter2.od
| <file:/home/lindenb/jeter2.ods/t1/y3/x2> | <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> | <http://rdf.lindenb.org/Cell>                   |
| <file:/home/lindenb/jeter2.ods/t1/y3/x2> | <http://rdf.lindenb.org/X>                        | "2"^^<http://www.w3.org/2001/XMLSchema#int>     |
| <file:/home/lindenb/jeter2.ods/t1/y3/x2> | <http://rdf.lindenb.org/Y>                        | "3"^^<http://www.w3.org/2001/XMLSchema#int>     |
| <file:/home/lindenb/jeter2.ods/t1/y3/x2> | <http://rdf.lindenb.org/value>                    | "2"^^<http://www.w3.org/2001/XMLSchema#float>   |
| <file:/home/lindenb/jeter2.ods/t1/y3/x3> | <http://rdf.lindenb.org/table>                    | <file:/home/lindenb/jeter2.ods/t1>              |
| <file:/home/lindenb/jeter2.ods/t1/y3/x3> | <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> | <http://rdf.lindenb.org/Cell>                   |
| <file:/home/lindenb/jeter2.ods/t1/y3/x3> | <http://rdf.lindenb.org/X>                        | "3"^^<http://www.w3.org/2001/XMLSchema#int>     |
| <file:/home/lindenb/jeter2.ods/t1/y3/x3> | <http://rdf.lindenb.org/Y>                        | "3"^^<http://www.w3.org/2001/XMLSchema#int>     |
| <file:/home/lindenb/jeter2.ods/t1/y3/x3> | <http://rdf.lindenb.org/value>                    | "3"^^<http://www.w3.org/2001/XMLSchema#float>   |
| <file:/home/lindenb/jeter2.ods/t1/y3/x4> | <http://rdf.lindenb.org/table>                    | <file:/home/lindenb/jeter2.ods/t1>              |
| <file:/home/lindenb/jeter2.ods/t1/y3/x4> | <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> | <http://rdf.lindenb.org/Cell>                   |
| <file:/home/lindenb/jeter2.ods/t1/y3/x4> | <http://rdf.lindenb.org/X>                        | "4"^^<http://www.w3.org/2001/XMLSchema#int>     |
| <file:/home/lindenb/jeter2.ods/t1/y3/x4> | <http://rdf.lindenb.org/Y>                        | "3"^^<http://www.w3.org/2001/XMLSchema#int>     |
| <file:/home/lindenb/jeter2.ods/t1/y3/x4> | <http://rdf.lindenb.org/value>                    | "4"^^<http://www.w3.org/2001/XMLSchema#float>   |
--------------------------------------------------------------------------------------------------------------------------------------------------

That's it,

Pierre

13 September 2012

Translating a DNA sequence in Google Spreadsheet using #GoogleAppScript

Google has released Google App Script.
"Google Apps Script is a JavaScript cloud scripting language that lets you extend Google Apps and build web applications. Scripts are developed in Google Apps Script’s browser-based script editor, and they are stored in and run from Google's servers.
Google Apps Script is very versatile. Here are some examples of things you can do with Google Apps Script":
  • Build custom functions in a Google Spreadsheet
  • Extend certain Google Apps products by creating custom menus linked to scripts
  • Create and publish web applications, which can run on their own or embedded within a Google Site
  • Schedule tasks like report creation and distribution and run them on a custom schedule
  • Automate workflows such as document or expense approvals, order fulfillment, time-tracking, and more

In the current post, I'll show how to create a custom javascript function that will

Translate a
DNA
to a
Protein
into a
Google Spreadsheet

Create a new Google Spreadhseet. Open the menu "Tools" > "Script Manager...". Click on New...

Click on "Create Blank Project".

A new editor is opened. Copy the following javascript code (https://gist.github.com/3716137) into the editor. Save the javascript projet.

Close the script, go back to the spreasheet. You can now use your new function =translateDNA(dna):

That's it,

Pierre

17 December 2008

Putting semantics in the spreadsheets.

Just a few ideas

I've been recently asked to find a way to store a set of heterogeneous files ( pedigrees, linkage, results of unix pipelines.... ). My first idea was to upload the file in a wiki and to append some well choosen categories to then easily retrieve the file later. I also imagined to use a Template to create a form where the user would add some semi-structured annotations (see my test on openwetware.org here ).

But, of course, the users want always more. There must be a Murphy's law for this....

Now I should create a robot that could find any file of a given type (say a linkage file) containing a given information (say a snp defined by its rs-id). So I've started to create a set of two RDFS-based ontologies that could be used to describe what is this file about (e.g. File -> Plain Text -> Tab-Delimited -> Pedigree) , and what are the columns about (e.g. xsd:string -> biological entity -> genetic marker -> snp -> rs-id ). A robot would then be able to identify and parse the files and , for example, would find the columns containing "SNP" or "Microsattelite" if I ask for the columns containing a 'Genetic Marker' "
The two drafts are available here:

  • http://code.google.com/p/fileontology/source/browse/trunk/files/ont/columns.rdf
  • http://code.google.com/p/fileontology/source/browse/trunk/files/ont/files.rdf
  • .

    I don't know if this idea has already been implemented elsewhere. Nevertheless Frank Gibson suggested me to have a look at Information-artifact-ontology: The Information Artifact Ontology (IAO) is a new ontology of information entities, originally driven by work by the OBI digital entity and realizable information entity branch.. Lots of information here...

    I'm still exploring this subject.


    Pierre

    11 December 2008

    Random notes 2008-12:

    Genetic Algorithm


    Evolution of Charles Darwin. I've implemented my own version of the Genetic Algorithm described by Roger Alsing in his blog ( http://rogeralsing.com/2008/12/07/genetic-programming-evolution-of-mona-lisa ). This algorithm finds the best set of colored triangles that could be used to re-create an original image.



    On the left : the original image (via wikipdia), on the right the current image generated by the genetic algorithm at generation 240 (population:20 individuals of 50 triangles). My algorithm is currently running .
    The source is available here: http://tinyurl.com/57xaeb
    A short doc is available here: http://code.google.com/p/lindenb/wiki/GAMonaLisa
    I've also uploaded an executable jar here: http://code.google.com/p/lindenb/downloads/list

    Workbench


    I've uploaded a beta version of a spreadsheet-like program that I wrote for the people of my lab.
    It was designed to help people with handling large tables in a rich graphical environment. It currently performs a few tasks that are common under unix. For example, it can finds the information about a column of SNP and I've implemented a grep/awk function filtering the rows with a simple javascript expression.The data are stored with the help of the Java berkeleyDB API to create an index of each row in a table.


    This screenshot is a java JTable displaying the hapmap genotypes for chr1/build36/CEU. The size of the original file is 146Mo

    The tool is available as a java webstart application. See http://code.google.com/p/cephlib/wiki/Workbench.

    Wiki


    I've done a presentation on how to use a wiki in a lab. Used both OWW and wikipedia. I showed them how to edit/follow/track a page ( http://tinyurl.com/6ejw35), how to create/discuss a page with templates and categories ( http://tinyurl.com/5l5bw5 ), how files can be uploaded in a wiki and commented ( http://tinyurl.com/5ouc7y ). A demo of the wikipedia API ( http://tinyurl.com/2dp5r4 ).
    People were then interested in storing+annotating (linkage) files in a wiki.

    FiendFeed


    Thank you to all the crowd in FriendFeed. Really motivating.


    Pierre