An SWL is a hyperlink on Wikipedia that allows the editor to explicitly specify the type of relationship between the concept described on the page being edited and the concept that is being linked to (http://en.wikipedia.org/wiki/Template:SWL). These SWLs are implemented using MediaWiki templates. (...) any programmer can now write computer programs to parse Wikipedia content for SWLs and import them into third-party tools (e.g. triplestores, etc.)
The protein encoded by this gene is found as a pentamer and is a major substrate for the cAMP-dependent protein kinase ({{SWL|type=substrate_for|target=protein kinase A|label=PKA}}) in cardiac muscle.
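As a rough illustration of what such a parser might look like, here is a minimal sketch that pulls SWL templates out of a chunk of wiki-text and returns them as subject/predicate/object records. The parameter names (`type`, `target`, `label`) are the ones used in the example above; the function name `extractSWLs`, the output shape and the page title are assumptions made for this sketch, and nested templates inside a parameter are not handled.

```javascript
// Sketch only: extract {{SWL|...}} templates from wiki-text.
// 'extractSWLs' and the returned object shape are illustrative assumptions.
function extractSWLs(pageTitle, wikitext) {
  const triples = [];
  // matches {{SWL|...}} as long as the parameters contain no nested braces
  const pattern = /\{\{SWL\|([^{}]*)\}\}/g;
  let match;
  while ((match = pattern.exec(wikitext)) !== null) {
    const params = {};
    for (const part of match[1].split("|")) {
      const eq = part.indexOf("=");
      if (eq === -1) continue; // skip positional parameters
      params[part.slice(0, eq).trim()] = part.slice(eq + 1).trim();
    }
    if (params.type && params.target) {
      triples.push({
        subject: pageTitle,
        predicate: params.type,
        object: params.target,
        label: params.label || params.target
      });
    }
  }
  return triples;
}

// usage, with a hypothetical page title
const text = "a major substrate for the cAMP-dependent protein kinase " +
  "({{SWL|type=substrate_for|target=protein kinase A|label=PKA}}) in cardiac muscle.";
console.log(extractSWLs("Example page", text));
```

The records could then be serialized to whatever format a triplestore expects.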
Using Entrez-Ajax (Loman et al.) and the Wikipedia API, I wrote an HTML+JS interface to speed up the creation of semantic SWL wiki-text from a PubMed ID:
Via Wikipedia: The Hershey fonts are a collection of vector fonts developed circa 1967 by Dr. A. V. Hershey (...). Vector fonts are easily scaled and rotated in two or three dimensions; consequently the Hershey fonts have been widely used in computer graphics and computer-aided design programs. When programming, I often have to fit a sentence in a rectangle (for example, to write the name of a short read in the graphical view of a BAM), so I wrote an XML version of the Hershey fonts.
Now we can convert the XML to whatever we want using XSLT. I wrote a stylesheet, picardmetrics2json.xsl, that converts the XML to JSON (though I should escape the quotes in the strings).
I've created a lightweight CGI-based web application for samtools tview. This C++ program, named ngsproject.cgi, uses the samtools API and lets any user visualize all the alignments in a given NGS project. The projects and their BAMs are defined on the server side using a simple XML document, e.g.:
In the current post, I'm using the new HTML5 File API. This new API can read the content of a file on the client side without needing a remote server. Let me repeat this:
YOU DO NOT NEED A SERVER. YOU DO NOT NEED TO COPY AND PASTE THE CONTENT OF THE FILE IN A TEXTAREA.
As an example, the following code reads a whole DNA FASTA file stored on your computer and translates each DNA sequence to a protein. When the user selects a new file, a FileReader object is created, and a callback function translating the DNA is invoked once the FASTA file has been loaded.
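A minimal sketch of that flow, not the original code of the post: the element ids `fastafile` and `result` and all function names are assumptions, only the standard genetic code is used, and any codon containing a character other than A, C, G or T is reported as `X`.

```javascript
// Standard genetic code, codons enumerated with the base order T, C, A, G
const BASES = "TCAG";
const AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG";

/** translate a DNA string in frame 1; incomplete trailing codons are ignored */
function translate(dna) {
  const seq = dna.toUpperCase().replace(/\s+/g, "");
  let protein = "";
  for (let i = 0; i + 3 <= seq.length; i += 3) {
    const a = BASES.indexOf(seq[i]);
    const b = BASES.indexOf(seq[i + 1]);
    const c = BASES.indexOf(seq[i + 2]);
    protein += (a < 0 || b < 0 || c < 0) ? "X" : AMINO_ACIDS[a * 16 + b * 4 + c];
  }
  return protein;
}

/** translate every record of a FASTA text and return a protein FASTA text */
function translateFasta(text) {
  const records = [];
  let current = null;
  for (const line of text.split(/\r?\n/)) {
    if (line.startsWith(">")) {
      current = { name: line.slice(1).trim(), dna: "" };
      records.push(current);
    } else if (current !== null) {
      current.dna += line.trim();
    }
  }
  return records.map(r => ">" + r.name + "\n" + translate(r.dna)).join("\n");
}

/** called when the user selects a file: read it locally, no server involved */
function onFileSelected(event) {
  const file = event.target.files[0];
  if (!file) return;
  const reader = new FileReader();
  // callback invoked once the whole file has been loaded
  reader.onload = function () {
    document.getElementById("result").textContent = translateFasta(reader.result);
  };
  reader.readAsText(file);
}

// wire the (hypothetical) <input type="file" id="fastafile"/> when running in a browser
if (typeof document !== "undefined") {
  const input = document.getElementById("fastafile");
  if (input) input.addEventListener("change", onFileSelected);
}
```

Keeping `translate` and `translateFasta` free of any DOM access means they can be checked outside the browser; only `onFileSelected` depends on the File API.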
I wrote an XSLT stylesheet for the following question on Biostar: "I'd like to create an HTML file (from the XML file and XSL stylesheet) similar to what can be obtained when we perform a BLAST search on the NCBI server."
<BlastOutput> <BlastOutput_program>blastp</BlastOutput_program> <BlastOutput_version>BLASTP 2.2.25+</BlastOutput_version> <BlastOutput_reference>Alejandro A. Schäffer, L. Aravind, Thomas L. Madden, Sergei Shavirin, John L. Spouge, Yuri I. Wolf, Eugene V. Koonin, and Stephen F. Altschul (2001), "Improving the accuracy of PSI-BLAST protein database searches with composition-based statistics and other refinements", Nucleic Acids Res. 29:2994-3005.</BlastOutput_reference> <BlastOutput_db>N/A</BlastOutput_db> <BlastOutput_query-ID>gi|187956781|gb|AAI40897.1|</BlastOutput_query-ID> <BlastOutput_query-def>EIF4G1 protein [Homo sapiens]</BlastOutput_query-def> <BlastOutput_query-len>1606</BlastOutput_query-len> <BlastOutput_param> <Parameters> <Parameters_matrix>BLOSUM62</Parameters_matrix> <Parameters_expect>10</Parameters_expect> <Parameters_gap-open>11</Parameters_gap-open> <Parameters_gap-extend>1</Parameters_gap-extend> <Parameters_filter>F</Parameters_filter> </Parameters> </BlastOutput_param> <BlastOutput_iterations> <Iteration> <Iteration_iter-num>1</Iteration_iter-num> <Iteration_query-ID>gi|187956781|gb|AAI40897.1|</Iteration_query-ID> <Iteration_query-def>EIF4G1 protein [Homo sapiens]</Iteration_query-def> <Iteration_query-len>1606</Iteration_query-len> <Iteration_hits> <Hit> <Hit_num>1</Hit_num> <Hit_id>gi|293340930|ref|XP_002724789.1|</Hit_id> <Hit_def>PREDICTED: eukaryotic translation initiation factor 4 gamma, 1 isoform 2 [Rattus norvegicus] >gi|293352298|ref|XP_002727969.1| PREDICTED: eukaryotic translation initiation factor 4, gamma 1 isoform 1 [Rattus norvegicus]</Hit_def> <Hit_accession>XP_002727969</Hit_accession> <Hit_len>1584</Hit_len> <Hit_hsps> <Hsp> <Hsp_num>1</Hsp_num> <Hsp_bit-score>2715.64</Hsp_bit-score> <Hsp_score>7038</Hsp_score> <Hsp_evalue>0</Hsp_evalue> <Hsp_query-from>1</Hsp_query-from> <Hsp_query-to>1606</Hsp_query-to> <Hsp_hit-from>1</Hsp_hit-from> <Hsp_hit-to>1584</Hsp_hit-to> <Hsp_query-frame>0</Hsp_query-frame> 
<Hsp_hit-frame>0</Hsp_hit-frame> <Hsp_identity>1450</Hsp_identity> <Hsp_positive>1450</Hsp_positive> <Hsp_gaps>36</Hsp_gaps> <Hsp_align-len>1613</Hsp_align-len> <Hsp_qseq>MNKAPQSTGPPPAPSPGLPQPAFPPGQTAPVVFSTPQATQMNTPSQPRQGGFRSLQHFYPSRAQPPSSAASRVQSAAPARPGPAAHVYPAGSQVMMIPSQISYPASQGAYYIPGQGRSTYVVPTQQYPVQPGAPGFYPGASPTEFGTYAGAYYPAQGVQQFPTGVAPAPVLMNQPPQIAPKRERKTIRIRDPNQGGKDITEEIMSGARTASTPTPPQTGGGLEPQANGETPQVAVIVRPDDRSQGAIIADRPGLPGPEHSP-SESQPSSPSPTPSPSPVLEPGSEPNLAVLSIPGDTMTT--IQMSVEESTPISRETGEPYRLSPEPTPLAEPILEVEVTLSKPVPESEFSSSPLQAPTPLASHTVEIHEPNGMVPSEDLEPEVESSPELAPPP--ACPSESPVPIAPTAQPEELLNGAPSPPAVDLSPVSEPEEQAKEV-TASMAPPTIPSATPATAPSATSPAQEEEMEEEEEEEEGEAGEAGEAESEKGGEELLPPESTPIPANLSQNLEAAAATQVAVSVPKRRRKIKELNKKEAVGDLLDAFKEANPAVPEVENQPPAGSNPGPESEGSGVPPRPEEADETWDSKEDKIHNAENIQPGEQKYEYKSDQWKPLNLEEKKRYDREFLLGFQFIFASMQKPEGLPHISDVVLDKANKTPLRPLDPTRLQGINCGPDFTPSFANLGRTTLSTRGPPRGGPGGELPRGPAGLGPRRSQQGPRKEPRKIIATVLMTEDIKLNKAEKAWKPSSKRTAADKDRGEEDADGSKTQDLFRRVRSILNKLTPQMFQQLMKQVTQLAIDTEERLKGVIDLIFEKAISEPNFSVAYANMCRCLMALKVPTTEKPTVTVNFRKLLLNRCQKEFEKDKDDDEVFEKKQKEMDEAATAEERGRLKEELEEARDIARRRSLGNIKFIGELFKLKMLTEAIMHDCVVKLLKNHDEESLECLCRLLTTIGKDLDFEKAKPRMDQYFNQMEKIIKEKKTSSRIRFMLQDVLDLRGSNWVPRRGDQGPKTIDQIHKEAEMEEHREHIKVQQLMAKGSDKRRGGPPGPPISRGLPLVDDGGWNTVPISKGSRPIDTSRLTKITKPGSIDSNNQLFAPGGRLSWGKGSSGGSGAKPSDAASEAARPATSTLNRFSALQQAVPTESTDNRRVVQRSSLSRERGEKAGDRGDRLERSERGGDRGDRLDRARTPATKRSFSKEVEERSRERPSQPEGLRKAASLTEDRDRGRDAVKREAALPPVSPLKAALSEEELEKKSKAIIEEYLHLNDMKEAVQCVQELASPSLLFIFVRHGVESTLERSAIAREHMGQLLHQLLCAGHLSTAQYYQGLYEILELAEDMEIDIPHVWLYLAELVTPILQEGGVPMGELFREITKPLRPLGKAASLLLEILGLLCKSMGPKKVGTLWREAGLSWKEFLPEGQDIGAFVAEQKVEYTLGEESEAPGQRALPSEELNRQLEKLLKEGSSNQRVFDWIEANLSEQQIVSNTLVRALMTAVCYSAIIFETPLRVDVAVLKARAKLLQKYLCDEQKELQALYALQALVVTLEQPPNLLRMFFDALYDEDVVKEDAFYSWESSKDPAEQQGKGVALKSVTAFFKWLREAE-EESDHN</Hsp_qseq> 
<Hsp_hseq>MNKAPQPTGPPPARSPGLPQPAFPPGQTAPVVFSTPQATQMNTPSQPRQ-------HFYPSRAQPPSSAASRVQSAAPARPGPAPHVYPAGSQVMMIPSQISYSASQGAYYIPGQGRSTYVVPTQQYPVQPGAPGFYPGASPTEFGTYAGAYYPAQSVQQFPASVAPAPVLMNQPPQIAPKRERKTIRIRDPNQGGKDITEEIMSGARTASTPTPPQTGGSLEPQPNGESPQVAVIIRPDDRSQGAAIGGRPGLPGPEHSPGTESQPSSPSPTPSPPPILEPGSESNLGVLSIPGDTMTTGMIPISVEESTPISCESGEPYCLSPEPT-LAEPILEVEVTLSKPIPESEFSSSPLQVSTSLVPHRAETHEPNGVIPSEDLEPEVESSTEPAPPPLSACASESLVPIAPTAQPEELLNGAPSPPAVDLSPVSEPEEQAKEVPSAALA--SIVSPTPPVAPSDTSAAQEEEIEED-------EDEDGEAESEKGGEDL-PLDSTPVPAQLSQNLEVAAAPQVAVSVPKRRRKIKELNKKEAVGDLLDAFKEVDPAVPEVENQPPTGSNPSPESEGSAALPQPEEAEETWDSKEDKIHNAENIQPGEQKYEYKSDQWKPLNLEEKKRYDREFLLGFQFIFASMQKPEGLPHITDVVLDKANKTPLRSLDPSRLPGINCGPDFTPSFANLGRPTLSSRGPPRGGPGGELPRGPAGLGPRRSQQGPRKETRKIISSVIMTEDIKLNKAEKAWKPSSKRTAADKDRGEEDADGSKTQDLFRRVRSILNKLTPQMFQQLMKQVTQLAIDTEERLKGVIDLIFEKAISEPNFSVAYANMCRCLMALKVPTTEKPTVTVNFRKLLLNRCQKEFEKDKDDDEVFEKKQKEMDEAATAEERGRLKEELEEARDIARRRSLGNIKFIGELFKLKMLTEAIMHDCVVKLLKNHDEESLECLCRLLTTIGKDLDFAKAKPRMDQYFNQMEKIIKEKKTSSRIRFMLQDVLDLRQSNWVPRRGDQGPKTIDQIHKEAEMEEHREHIKVQQLMAKGGDKRRGGPPGPP-------VNDGGWNTVPISKGSRPIDTSRLTKITKPGSIDSNNQLFAPGGRLSWGKGSSGGSGAKPSDTASEATRPA--TLNRFSALQQTLPVENTDNRRVVQRSSLSRERGEKAGDRGDRLERSERGGDRGDRLDRARTPATKRSFSKEVEERSRERPSQPEGLRKAASLTE--DRGRDPVKREATLPPVSPPKAALAVDEVERKSKAIIEEYLHLNDMKEAVQCVQELASPSLLFIFVRLGIESTLERSTIAREHMGRLLHQLLCAGHLSTAQYYQGLYETLELAEDMEIDIPHVWLYLAELITPILQEDGVPMGELFREITKPLRPMGKATSLLLEILGLLCKSMGPKKVGMLWREAGLSWREFLAEGQDVGSFVAEKKVEYTLGEESEAPGQRALAFEELRRQLEKLLKDGGSNQRVFDWIEANLNEQQIASNTLVRALMTTVCYSAIIFETPLRVDVQVLKVRARLLQKYLSDEQKELQALYALQALVVTLEQPANLLRMFFDALYDEDVVKEDAFYSWESSKDPAEQQGKGVALKSVTAFFNWLREAEDEESDHN</Hsp_hseq> <Hsp_midline>MNKAPQ TGPPPA SPGLPQPAFPPGQTAPVVFSTPQATQMNTPSQPRQ HFYPSRAQPPSSAASRVQSAAPARPGPA HVYPAGSQVMMIPSQISY ASQGAYYIPGQGRSTYVVPTQQYPVQPGAPGFYPGASPTEFGTYAGAYYPAQ VQQFP VAPAPVLMNQPPQIAPKRERKTIRIRDPNQGGKDITEEIMSGARTASTPTPPQTGG LEPQ NGE PQVAVI RPDDRSQGA I RPGLPGPEHSP ESQPSSPSPTPSP P LEPGSE NL VLSIPGDTMTT I SVEESTPIS E GEPY LSPEPT LAEPILEVEVTLSKP PESEFSSSPLQ T L H 
E HEPNG PSEDLEPEVESS E APPP AC SES VPIAPTAQPEELLNGAPSPPAVDLSPVSEPEEQAKEV A A I S TP APS TS AQEEE EE E GEAESEKGGE L P STP PA LSQNLE AAA QVAVSVPKRRRKIKELNKKEAVGDLLDAFKE PAVPEVENQPP GSNP PESEGS P PEEA ETWDSKEDKIHNAENIQPGEQKYEYKSDQWKPLNLEEKKRYDREFLLGFQFIFASMQKPEGLPHI DVVLDKANKTPLR LDP RL GINCGPDFTPSFANLGR TLS RGPPRGGPGGELPRGPAGLGPRRSQQGPRKE RKII V MTEDIKLNKAEKAWKPSSKRTAADKDRGEEDADGSKTQDLFRRVRSILNKLTPQMFQQLMKQVTQLAIDTEERLKGVIDLIFEKAISEPNFSVAYANMCRCLMALKVPTTEKPTVTVNFRKLLLNRCQKEFEKDKDDDEVFEKKQKEMDEAATAEERGRLKEELEEARDIARRRSLGNIKFIGELFKLKMLTEAIMHDCVVKLLKNHDEESLECLCRLLTTIGKDLDF KAKPRMDQYFNQMEKIIKEKKTSSRIRFMLQDVLDLR SNWVPRRGDQGPKTIDQIHKEAEMEEHREHIKVQQLMAKG DKRRGGPPGPP V DGGWNTVPISKGSRPIDTSRLTKITKPGSIDSNNQLFAPGGRLSWGKGSSGGSGAKPSD ASEA RPA TLNRFSALQQ P E TDNRRVVQRSSLSRERGEKAGDRGDRLERSERGGDRGDRLDRARTPATKRSFSKEVEERSRERPSQPEGLRKAASLTE DRGRD VKREA LPPVSP KAAL E E KSKAIIEEYLHLNDMKEAVQCVQELASPSLLFIFVR G ESTLERS IAREHMG LLHQLLCAGHLSTAQYYQGLYE LELAEDMEIDIPHVWLYLAEL TPILQE GVPMGELFREITKPLRP GKA SLLLEILGLLCKSMGPKKVG LWREAGLSW EFL EGQD G FVAE KVEYTLGEESEAPGQRAL EEL RQLEKLLK G SNQRVFDWIEANL EQQI SNTLVRALMT VCYSAIIFETPLRVDV VLK RA LLQKYL DEQKELQALYALQALVVTLEQP NLLRMFFDALYDEDVVKEDAFYSWESSKDPAEQQGKGVALKSVTAFF WLREAE EESDHN</Hsp_midline> </Hsp> </Hit_hsps> </Hit> </Iteration_hits> <Iteration_stat> <Statistics> <Statistics_db-num>0</Statistics_db-num> <Statistics_db-len>0</Statistics_db-len> <Statistics_hsp-len>0</Statistics_hsp-len> <Statistics_eff-space>0</Statistics_eff-space> <Statistics_kappa>-1</Statistics_kappa> <Statistics_lambda>-1</Statistics_lambda> <Statistics_entropy>-1</Statistics_entropy> </Statistics> </Iteration_stat> </Iteration> </BlastOutput_iterations> </BlastOutput>
Query 241 DRSQGAIIADRPGLPGPEHSP-SESQPSSPSPTPSPSPVLEPGSEPNLAVLSIPGDTMTT 299 DRSQGA I RPGLPGPEHSP ESQPSSPSPTPSP P LEPGSE NL VLSIPGDTMTT Sbjct 234 DRSQGAAIGGRPGLPGPEHSPGTESQPSSPSPTPSPPPILEPGSESNLGVLSIPGDTMTT 293
Query 300 --IQMSVEESTPISRETGEPYRLSPEPTPLAEPILEVEVTLSKPVPESEFSSSPLQAPTP 357 I SVEESTPIS E GEPY LSPEPT LAEPILEVEVTLSKP PESEFSSSPLQ T Sbjct 294 GMIPISVEESTPISCESGEPYCLSPEPT-LAEPILEVEVTLSKPIPESEFSSSPLQVSTS 352
Query 358 LASHTVEIHEPNGMVPSEDLEPEVESSPELAPPP--ACPSESPVPIAPTAQPEELLNGAP 415 L H E HEPNG PSEDLEPEVESS E APPP AC SES VPIAPTAQPEELLNGAP Sbjct 353 LVPHRAETHEPNGVIPSEDLEPEVESSTEPAPPPLSACASESLVPIAPTAQPEELLNGAP 412
Query 416 SPPAVDLSPVSEPEEQAKEV-TASMAPPTIPSATPATAPSATSPAQEEEMEEEEEEEEGE 474 SPPAVDLSPVSEPEEQAKEV A A I S TP APS TS AQEEE EE Sbjct 413 SPPAVDLSPVSEPEEQAKEVPSAALA--SIVSPTPPVAPSDTSAAQEEEIEED------- 463
Query 475 AGEAGEAESEKGGEELLPPESTPIPANLSQNLEAAAATQVAVSVPKRRRKIKELNKKEAV 534 E GEAESEKGGE L P STP PA LSQNLE AAA QVAVSVPKRRRKIKELNKKEAV Sbjct 464 EDEDGEAESEKGGEDL-PLDSTPVPAQLSQNLEVAAAPQVAVSVPKRRRKIKELNKKEAV 522
Today, we had a lecture about the "human induced pluripotent stem cells", presented by John De Vos. He introduced Amazonia, a free web atlas that allows an easy query of public human transcriptome data. Although there is no web service (REST/SOAP) to access this data, I was interested in getting some profiles of expression from this database as it is something I've failed to achieve with NCBI/GEO.
I wrote the following Java scraper:
Line 84: we search for a gene name
88: if there is a http redirection, the gene has been found
96: the HTML page is downloaded
100-112: fix the HTML to create a valid XML document
133: transform the HTML page to a DOM document
135-151: use XPATH to find the images and the labels
Today I wrote an extension for MediaWiki that displays an HTML <iframe/> pointing to the UCSC Genome Browser. This extension will help my colleagues annotate some candidate genes through our local wiki.
This extension handles a new tag <ucsciframe> composed of three required parameters: 'chrom', 'start' and 'end'.
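The extension itself is written for MediaWiki, but the core of the idea, turning the three parameters into an iframe on the genome browser, can be sketched in a few lines of JavaScript. The `hgTracks` URL with its `db` and `position` parameters is the usual way to link to a region of the UCSC Genome Browser; the function name, the default assembly (`hg19`) and the iframe size are assumptions chosen for this sketch.

```javascript
/**
 * Sketch: build the <iframe/> markup for a genomic region.
 * 'ucscIframe', the default 'db' and the iframe size are illustrative assumptions.
 */
function ucscIframe(chrom, start, end, db = "hg19") {
  // validate the three required parameters before putting them in a URL
  if (!/^chr[0-9A-Za-z_]+$/.test(String(chrom))) {
    throw new Error("bad chrom: " + chrom);
  }
  if (!/^\d+$/.test(String(start)) || !/^\d+$/.test(String(end))) {
    throw new Error("start and end must be non-negative integers");
  }
  if (Number(end) < Number(start)) {
    throw new Error("end must not be smaller than start");
  }
  const position = encodeURIComponent(chrom + ":" + start + "-" + end);
  const src = "http://genome.ucsc.edu/cgi-bin/hgTracks?db=" +
    encodeURIComponent(db) + "&position=" + position;
  // '&' must be escaped when the URL is written inside an HTML attribute
  return '<iframe src="' + src.replace(/&/g, "&amp;") +
    '" width="100%" height="400"></iframe>';
}

console.log(ucscIframe("chr1", 100000, 200000));
```

Validating the parameters matters here because the values come straight from text typed into a wiki page.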
My data are stored in a big database, and it might take some time before all the data are processed and displayed. So my idea was to call the server with some asynchronous AJAX queries, retrieve the data in chunks, and display each chunk as soon as it is returned by the server.
The code below is a proof of concept. This code is ugly; I wouldn't write things like this for a real piece of software. As a source of data, I've used the snp129 and the knownGene tables of the UCSC, stored in a MySQL database. The server was implemented using PHP.
Client Side
When the document is loaded, the <canvas> element is resized. A first AJAX query is sent to retrieve an array of SNP densities along human chromosome 1. The JSON response is processed: the maximum number of SNPs is found, and each item of the array is drawn on the canvas. After that, a second AJAX query is sent to retrieve the density of the genes.
<html xmlns="http://www.w3.org/1999/xhtml"><head>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1"/>
<script><![CDATA[
/** the canvas element */
var canvas = null;
/** radius of the canvas */
var radius = 500;
/** AJAX request */
var httpRequest = null;
/** Graphics context */
var g = null;
/** length of chrom1 */
var CHR1_LENGTH = 248000000.0;
/** window length (bp) */
var windowLength = 0;
/** first track is snp129 */
var database = "snp129";

/** ajax callback */
function paintSnps() {
  if (httpRequest.readyState == 4) {
    // the response has been received
    if (httpRequest.status == 200) {
      // eval() is kept in case the server output is not strict JSON;
      // prefer JSON.parse() if it is
      var jsondata = eval("(" + httpRequest.responseText + ")");
      var counts = jsondata.counts;
      // get the maximum count
      var max = 0;
      for (var i = 0; i < counts.length; ++i) {
        if (counts[i].count > max) max = counts[i].count * 1.0;
      }
      // avoid a division by zero when every window is empty
      if (max == 0) max = 1.0;
      // inner radius of the track: knownGene is drawn outside snp129
      var r1 = radius / 2.0;
      if (database == "knownGene") {
        r1 += 2 + radius / 4.0;
      }
      // loop over the items
      for (var i = 0; i < counts.length; ++i) {
        var a1 = Math.PI * 2.0 * i / (1.0 * counts.length);
        var a2 = Math.PI * 2.0 * (i + 1) / (1.0 * counts.length);
        var r2 = r1 + (counts[i].count / max) * (radius / 4.0);
        // draw the item
        g.beginPath();
        g.moveTo(radius + Math.cos(a1) * r1, radius + Math.sin(a1) * r1);
        g.lineTo(radius + Math.cos(a1) * r2, radius + Math.sin(a1) * r2);
        g.lineTo(radius + Math.cos(a2) * r2, radius + Math.sin(a2) * r2);
        g.lineTo(radius + Math.cos(a2) * r1, radius + Math.sin(a2) * r1);
        g.stroke();
        g.fill();
      }
      // if it was snp129, then look for knownGene and change the colors
      if (database == "snp129") {
        database = "knownGene";
        g.fillStyle = "yellow";
        g.strokeStyle = "blue";
        setTimeout(fetchDB, 100);
      }
    } else {
      // boum!!
    }
  } else {
    // still not ready
  }
}

/** calls the AJAX request */
function fetchDB() {
  httpRequest = new XMLHttpRequest();
  httpRequest.onreadystatechange = paintSnps;
  // a GET request carries no body, so the parameters go in the query string
  httpRequest.open('GET', 'ucsc.php?length=' + windowLength +
    '&database=' + encodeURIComponent(database), true);
  httpRequest.send(null);
}

/** init document */
function init() {
  canvas = document.getElementById("genome");
  // resize canvas
  canvas.setAttribute("width", 2 * radius);
  canvas.setAttribute("height", 2 * radius);
  if (!canvas.getContext) return;
  g = canvas.getContext('2d');
  // paint background
  var lineargradient = g.createLinearGradient(radius, 0, radius, 2 * radius);
  lineargradient.addColorStop(0, 'white');
  lineargradient.addColorStop(1, 'black');
  g.fillStyle = lineargradient;
  g.fillRect(0, 0, 2 * radius, 2 * radius);
  g.strokeStyle = "black";
  g.strokeRect(0, 0, 2 * radius, 2 * radius);
  g.fillStyle = "red";
  g.strokeStyle = "green";

  // one window per pixel along the inner circle
  var perimeter = 2 * Math.PI * (radius / 2.0);
  windowLength = Math.round(CHR1_LENGTH / perimeter);

  // launch the first ajax request
  setTimeout(fetchDB, 100);
}
$sql="SELECT CAST(ROUND(".$nameStart."/".$length.") AS SIGNED INTEGER )*".$length.",count(*) from ".$database." where ".
  " chrom=\"chr1\" ".
  " group by CAST(ROUND(".$nameStart."/".$length.") AS SIGNED INTEGER )*".$length.
  " order by 1" ;
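To make the arithmetic of that query explicit, here is a small JavaScript sketch of the same binning: every start position is rounded to the nearest multiple of the window length, and the items are counted per window. The function name and the `position` field are assumptions; the `count` field matches what the client code reads from `jsondata.counts`.

```javascript
/**
 * Sketch of the binning done by the SQL query:
 * ROUND(start / length) * length, then count(*) per group, ordered by position.
 * 'countPerWindow' and the 'position' field name are illustrative assumptions.
 */
function countPerWindow(starts, windowLength) {
  const counts = new Map();
  for (const start of starts) {
    // for non-negative positions Math.round behaves like MySQL's ROUND
    const bin = Math.round(start / windowLength) * windowLength;
    counts.set(bin, (counts.get(bin) || 0) + 1);
  }
  return [...counts.entries()]
    .sort((a, b) => a[0] - b[0])
    .map(([position, count]) => ({ position, count }));
}

console.log(countPerWindow([10, 40, 60, 149, 151], 100));
```

Note that windows containing no item are simply absent from the result, exactly as with the `group by` in the query.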
Firefox 3.5 includes a new CSS property called -moz-transform. The -moz-transform CSS property lets you modify the coordinate space of the CSS visual formatting model. Using it, elements can be translated, rotated, scaled, and skewed, as this text.
I've used this new property to draw a 3D histogram: