My Semantics of “Train”

    <owl:Thing rdf:about="#LevisTrain">
        <rdf:type rdf:resource="#Train"/>
        <rdfs:label>Levi’s Train</rdfs:label>
        <madeOf rdf:resource="#Plastic"/>
    </owl:Thing>

    <owl:Thing rdf:about="#LevisTrain">
        <rdf:type rdf:resource="#Train"/>
        <rdfs:label>Levi’s Train</rdfs:label>
        <madeOf rdf:resource="#Wood"/>
    </owl:Thing>
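Both snippets describe the same resource, `#LevisTrain`, yet disagree on what it is made of. A minimal sketch of how that clash surfaces once the two descriptions are merged. It represents triples as plain Python tuples transcribed by hand from the snippets (no RDF library), and `conflicting_values` is a hypothetical helper that assumes `madeOf` is meant to hold a single value, which the slide itself does not state:

```python
from collections import defaultdict

# Each description as a set of (subject, predicate, object) triples,
# transcribed by hand from the two RDF/XML snippets.
speaker_a = {
    ("#LevisTrain", "rdf:type", "#Train"),
    ("#LevisTrain", "rdfs:label", "Levi’s Train"),
    ("#LevisTrain", "madeOf", "#Plastic"),
}
speaker_b = {
    ("#LevisTrain", "rdf:type", "#Train"),
    ("#LevisTrain", "rdfs:label", "Levi’s Train"),
    ("#LevisTrain", "madeOf", "#Wood"),
}


def conflicting_values(triples, predicate):
    """Return {subject: values} for subjects with more than one value.

    Hypothetical helper. Assumption: `predicate` is intended to be
    single-valued, so several values signal diverging semantics.
    """
    values = defaultdict(set)
    for s, p, o in triples:
        if p == predicate:
            values[s].add(o)
    return {s: objs for s, objs in values.items() if len(objs) > 1}


merged = speaker_a | speaker_b  # union of both descriptions
print(conflicting_values(merged, "madeOf"))
```

Each description is consistent on its own; the disagreement only appears in the merged data, which is where usage by a second party brings it to light.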
  
A Usage-dependent Life Cycle

Request to put away the “train”

• toy train
  • made of plastic
  • SELECT * WHERE { ?t a:madeOf a:Plastic }
• toy train
  • made of wood
  • SELECT * WHERE { ?t b:madeOf b:Wood }

Enter the room
Negotiate understanding
USAGE
  
Yet another…

• OTK
• …
• The NeOn
• Maintenance
• METHONTOLOGY
• Black Box
• DILIGENT

Make it less of a methodology, but support people in getting their “Things” done!
  
Who is hurt by that?

• rather small/simple ontologies
  – minimal effort for ontology engineering (OE)
  – “under-engineered”
• unknown user requirements
  
Hey “LOD people”, do you think that ontology engineering matters?

Usage-based ontology engineering
  
Survey covering approx. 25% of all LOD cloud datasets

• size
• complexity
• engineering methodology
• …

Publishers of 75% of the datasets do not feel 99% responsible for their data?

Survey ran in October 2010
  
Concrete Example of a Usage-based Approach

digging in log files
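Digging in log files usually starts with pulling the SPARQL queries out of an endpoint's HTTP access log. A minimal sketch, assuming a hypothetical log in which SPARQL requests appear as `GET /sparql?query=...` with a URL-encoded query; the `/sparql` path, the log lines and the helper name `extract_queries` are illustrative assumptions, and real log formats vary:

```python
import re
from urllib.parse import parse_qs, urlsplit

# Matches the request part of a common-log-format line, e.g.
# "GET /sparql?query=... HTTP/1.1". The /sparql path is an assumption.
REQUEST = re.compile(r'"GET (?P<target>/sparql\?[^ ]+) HTTP/[0-9.]+"')


def extract_queries(lines):
    """Yield decoded SPARQL query strings found in access-log lines."""
    for line in lines:
        match = REQUEST.search(line)
        if not match:
            continue  # not a SPARQL GET request
        params = parse_qs(urlsplit(match.group("target")).query)
        for query in params.get("query", []):
            yield query  # parse_qs already URL-decodes the value


log = [
    '127.0.0.1 - - [01/Oct/2010:10:00:00 +0200] '
    '"GET /sparql?query=SELECT+*+WHERE+%7B+%3Ft+%3Fp+%3Fo+%7D HTTP/1.1" 200 512',
    '127.0.0.1 - - [01/Oct/2010:10:00:01 +0200] "GET /index.html HTTP/1.1" 200 128',
]
print(list(extract_queries(log)))  # → ['SELECT * WHERE { ?t ?p ?o }']
```

The decoded queries are the input for any further usage analysis; queries sent via POST would need a different source, such as server-side query logging.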
  
Usage?

Request to put away the “train”

• SELECT * WHERE { ?t a:madeOf a:Plastic }
• SELECT * WHERE { ?t b:madeOf b:Wood }

Yes*! But beyond?

USAGE

• What about the future of SPARQL endpoints on the WoD (Web of Data)?

* W.r.t. an architecture proposed by a famous “Web-Extremist”
  
You should have a query endpoint!

• You get something valuable which helps you to play your role on the WoD!

[Figure: “Effort Distribution between Publisher and Consumer” (Pay-As-You-Go); labels include “Publisher provides links”, “Links as hints” and “Consumer generates/data mines links”. Source: Christian Bizer: Pay-as-you-go Data Integration (21/9/2010)]
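For a consumer, a query endpoint is simply an HTTP resource. Under the SPARQL 1.1 Protocol, a query can be sent as a URL-encoded `query` parameter in a GET request, and JSON results can be asked for with the `Accept` header `application/sparql-results+json`. A minimal standard-library sketch. The endpoint URL and the helper name `build_query_request` are placeholders, and the request is only built here, not sent:

```python
from urllib.parse import urlencode
from urllib.request import Request

ENDPOINT = "http://example.org/sparql"  # hypothetical endpoint URL


def build_query_request(endpoint, query):
    """Build a SPARQL Protocol GET request that asks for JSON results."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


req = build_query_request(ENDPOINT, "SELECT * WHERE { ?t ?p ?o } LIMIT 10")
print(req.get_method(), req.full_url)
# To actually send it: urllib.request.urlopen(req), then parse the JSON body.
```

Every such request also leaves a trace in the publisher's logs, which is what makes the usage-based approach possible.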
Usage Analysis

• queries
  • patterns
    • triples
      • primitives

visualize heat maps

zoom in and see details
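The drill-down from queries to patterns, triples and primitives can be approximated by counting how often each triple pattern occurs in the logged queries, and how often each term occurs in subject, predicate or object position. Counts like these are the raw material for a heat map. A minimal sketch under strong assumptions. The helpers `normalize`, `triple_patterns` and `usage_counts` are hypothetical, and the parser is deliberately naive: it only handles a WHERE block made of whitespace-separated triples joined by ` .`, with no OPTIONAL, FILTER or literals containing spaces. It is not the method used in the original analysis:

```python
import re
from collections import Counter


def normalize(term):
    """Abstract over variable names so ?b and ?band count as the same slot."""
    return "?var" if term.startswith("?") else term


def triple_patterns(query):
    """Split the WHERE { ... } block of a simple query into (s, p, o) tuples."""
    body = re.search(r"\{(.*)\}", query, re.DOTALL)
    if body is None:
        return []
    triples = []
    for chunk in body.group(1).split(" ."):
        terms = chunk.split()
        if len(terms) == 3:  # skip anything the naive parser cannot read
            triples.append(tuple(normalize(t) for t in terms))
    return triples


def usage_counts(queries):
    """Count triple patterns and position-tagged primitives across queries."""
    patterns, primitives = Counter(), Counter()
    for query in queries:
        for s, p, o in triple_patterns(query):
            patterns[(s, p, o)] += 1
            primitives.update([("s", s), ("p", p), ("o", o)])
    return patterns, primitives


log = [
    "SELECT * WHERE { ?b ns:genre ?y . ?b ns:instrument ?x }",
    "SELECT * WHERE { ?band ns:genre ?g }",
]
patterns, primitives = usage_counts(log)
print(patterns.most_common(1))  # → [(('?var', 'ns:genre', '?var'), 2)]
```

One way to get a heat map is to plot the counters as a matrix, for example predicates against positions. The slides do not say how the original visualizations were computed.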
  
Some Results (DBpedia Analysis)

inconsistent data:
• ns:Band ns:instrument ?x
• ns:Band ns:genre ?y
• ns:Band ns:associatedBand ?z

missing facts:
• ns:Band ns:knownFor ?x
• ns:Band ns:nationality ?y

Complete analysis can be found at http://page.mi.fu-berlin.de/mluczak/pub/visual-analysis-of-web-of-data-usage-dbpedia33/

Some Thoughts about Benefit

• usage analysis helps to acquire new knowledge
  – links between data
  – external schema
• usage analysis helps to increase the quality of data on the Web
• lightweight approach helps to bootstrap linked data

It is not necessary to automate everything if the result has enough (business) value in a problem domain anyway.
  
Take Away

make it less of a methodology
• LOD vocabularies are specific ontologies
• and need specific life cycle support
• usage analysis can help to maintain them (and implicitly the data)

provide (query) access to your data
• endpoint, and play your role on the WoD
• it is not necessary to automate
• things that enable automation
• this is the benefit for the dataset publisher and the Web of data as a whole

Hey “LOD people”, do you think that dataset maintenance matters?
  
Markus Luczak-Rösch (luczak@inf.fu-berlin.de)

Freie Universität Berlin, Networked Information Systems (www.ag-nbi.de)
  
Current Addition

• “15,500,000 people in Germany are not willing to use the internet”
  – adding emphasis to the ESWC discussion: bridging the gap (directly or indirectly) between these people and the internet/Web has a high potential to influence societal transformation (they are not going to use a browser or an iPhone, and they do not care about semantics)

Source: ARD-Morgenmagazin, 08-07-2011
  

STI Summit 2011 - Mlr-sm
