Towards an Open Analytics Environment Ian Foster Computation Institute Argonne National Lab & University of Chicago
The Computation Institute A joint institute of Argonne and the University of Chicago, focused on furthering system-level science via the development and use of advanced computational methods. Solutions to many grand challenges facing science and society today require the analysis and understanding of entire systems, not just individual components. They require not reductionist approaches but the synthesis of knowledge from multiple levels of a system, whether biological, physical, or social (or all three). www.ci.uchicago.edu Faculty, fellows, staff, students, computers, projects.
The Good Old Days: Astronomy ~1600 30 years ? years 10 years 6 years 2 years
Astronomy, from 1600 to 2000
  Automation: 10^-1 to 10^8 Hz data capture
  Community: 10^0 to 10^4 astronomers (10^6 amateur)
  Computation: 10^-1 to 10^15 Hz peak
  Data: 10^6 to 10^15 B aggregate
  Literature: 10^1 to 10^5 pages/year
Biomedical Research ~1600
Biomedical Research ~2000 ... atcgaattccaggcgtcacattctcaattcca... MPMILGYWDIRGLAHAIRLLLEYTDSSYEEKKYT... Protein-Protein Interactions metabolism pathways receptor-ligand 4º structure Polymorphism and Variants genetic variants individual patients epidemiology Physiology Cellular biology Biochemistry Neurobiology Endocrinology etc. >10^6 ESTs Expression patterns Large-scale screens Genetics and Maps Linkage Cytogenetic Clone-based From John Wooley >10^6 >10^9 >10^6 >10^5 >10^9 DNA sequences alignments Proteins sequence 2º structure 3º structure
Growth of Sequences and Annotations since 1982 Folker Meyer, Genome Sequencing vs. Moore’s Law: Cyber Challenges for the Next Decade, CTWatch, August 2006.
The Analyst in Denial “I just need a bigger disk (and workstation)”
An Open Analytics Environment
  Data in; programs & rules in; results out
  “No limits”: storage, computing, format, program
  Allowing for: versioning, provenance, collaboration, annotation
o·pen [oh-puhn] adjective
  having the interior immediately accessible
  relatively free of obstructions to sight, movement, or internal arrangement
  generous, liberal, or bounteous
  in operation; live
  readily admitting new members
  not constipated
What Goes In (1)
What Goes In (2) Rules Workflows Dryad MapReduce Parallel programs SQL BPEL Swift SCFL R MatLab Octave
How it Cooks
  Virtualization: run any program, store any data
  Indexing: automated maintenance
  Provisioning: policy-driven allocation of resources to competing demands
What Comes Out Data Data Virtual Data Schema
Analysis as (Collaborative) Process Transform Annotate  Search Add to Tag Visualize Discover Extend Group Share
Centralized or Distributed? Both
Towards an Open Analysis Environment: (1) Applications
  Astrophysics, cognitive science, East Asian studies, economics, environmental science, epidemiology, genomic medicine, neuroscience, political science, sociology, solid state physics
Towards an Open Analysis Environment: (2) Hardware
  SiCortex: 6K cores, 6 Top/s
  IBM BG/P: 160K cores, 500 Top/s
  PADS: 10-40 Gbit/s
PADS: Petascale Active Data Store
  500 TB reliable storage (data & metadata)
  180 TB, 180 GB/s
  17 Top/s analysis
  1000 TB tape backup
  Data ingest, dynamic provisioning, parallel analysis, remote access, offload to remote data centers
  Diverse users, diverse data sources
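As a rough sanity check on the "180 TB, 180 GB/s" figure, the arithmetic below estimates how long one full pass over that store would take. It assumes decimal units and that the quoted bandwidth is an aggregate rate sustained across the whole 180 TB; the slide states neither, so treat the result as a back-of-envelope sketch.

```python
# Figures quoted on the PADS slide.
STORAGE_TB = 180          # capacity, in terabytes
BANDWIDTH_GB_PER_S = 180  # bandwidth, in gigabytes per second


def full_scan_seconds(storage_tb: float, bandwidth_gb_per_s: float) -> float:
    """Time to read the whole store once (assumes 1 TB = 1000 GB)."""
    return storage_tb * 1000 / bandwidth_gb_per_s


scan_s = full_scan_seconds(STORAGE_TB, BANDWIDTH_GB_PER_S)
print(f"full scan: {scan_s:.0f} s (~{scan_s / 60:.1f} min)")
```

Under these assumptions a complete scan takes on the order of a quarter of an hour, which is what makes whole-dataset analysis practical as an interactive activity rather than a batch job.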
Towards an Open Analysis Environment: (3) Methods
  HPC systems software (MPICH, PVFS, etc.)
  Collaborative data tagging (GLOSS)
  Data integration (XDTM)
  HPC data analytics and visualization
  Loosely coupled parallelism (Swift, Hadoop)
  Dynamic provisioning (Falkon)
  Service authoring (Introduce, caGrid, gRAVI)
  Provenance recording and query (Swift)
  Service composition and workflow (Taverna)
  Virtualization management
  Distributed data management (GridFTP, etc.)
Tagging & Social Networking GLOSS: Generalized Labels Over Scientific data Sources
XDTM: XML Data Typing & Mapping
(Panel labels: Logical, Physical)
./group23:
drwxr-xr-x  4 yongzh users     2048 Nov 12 14:15 AA
drwxr-xr-x  4 yongzh users     2048 Nov 11 21:13 CH
drwxr-xr-x  4 yongzh users     2048 Nov 11 16:32 EC
./group23/AA:
drwxr-xr-x  5 yongzh users     2048 Nov  5 12:41 04nov06aa
drwxr-xr-x  4 yongzh users     2048 Dec  6 12:24 11nov06aa
./group23/AA/04nov06aa:
drwxr-xr-x  2 yongzh users     2048 Nov  5 12:52 ANATOMY
drwxr-xr-x  2 yongzh users    49152 Dec  5 11:40 FUNCTIONAL
./group23/AA/04nov06aa/ANATOMY:
-rw-r--r--  1 yongzh users      348 Nov  5 12:29 coplanar.hdr
-rw-r--r--  1 yongzh users 16777216 Nov  5 12:29 coplanar.img
./group23/AA/04nov06aa/FUNCTIONAL:
-rw-r--r--  1 yongzh users      348 Nov  5 12:32 bold1_0001.hdr
-rw-r--r--  1 yongzh users   409600 Nov  5 12:32 bold1_0001.img
-rw-r--r--  1 yongzh users      348 Nov  5 12:32 bold1_0002.hdr
-rw-r--r--  1 yongzh users   409600 Nov  5 12:32 bold1_0002.img
-rw-r--r--  1 yongzh users      496 Nov 15 20:44 bold1_0002.mat
-rw-r--r--  1 yongzh users      348 Nov  5 12:32 bold1_0003.hdr
-rw-r--r--  1 yongzh users   409600 Nov  5 12:32 bold1_0003.img
fMRI Type Definitions
type Study { Group g[ ]; }
type Group { Subject s[ ]; }
type Subject { Volume anat; Run run[ ]; }
type Run { Volume v[ ]; }
type Volume { Image img; Header hdr; }
type Image {};
type Header {};
type Warp {};
type Air {};
type AirVec { Air a[ ]; }
type NormAnat { Volume anat; Warp aWarp; Volume nHires; }
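The idea behind XDTM is that the logical types above are kept separate from the physical file layout, with a mapping between the two. The Python sketch below illustrates that separation for a single subject directory. It is not the XDTM or Swift mapper: the class layout mirrors only the Subject, Run and Volume types, and the conventions it relies on (an `ANATOMY` and a `FUNCTIONAL` subdirectory, file names of the form `<run>_<index>.img|hdr`) are assumptions read off the directory listing. The function name `map_subject` is hypothetical.

```python
from dataclasses import dataclass, field
from pathlib import PurePosixPath


@dataclass
class Volume:
    img: str = ""  # path of the image file
    hdr: str = ""  # path of the header file


@dataclass
class Run:
    v: list[Volume] = field(default_factory=list)


@dataclass
class Subject:
    anat: Volume = field(default_factory=Volume)
    run: dict[str, Run] = field(default_factory=dict)  # keyed by run name, e.g. "bold1"


def map_subject(paths: list[str]) -> Subject:
    """Build a logical Subject from file paths relative to one subject directory."""
    subject = Subject()
    volumes: dict[tuple[str, int], Volume] = {}
    for raw in paths:
        p = PurePosixPath(raw)
        ext = p.suffix.lstrip(".")
        if ext not in ("img", "hdr"):
            continue  # e.g. bold1_0002.mat is not part of a Volume
        if p.parts[0] == "ANATOMY":
            setattr(subject.anat, ext, raw)
        elif p.parts[0] == "FUNCTIONAL":
            run_name, index = p.stem.rsplit("_", 1)
            vol = volumes.setdefault((run_name, int(index)), Volume())
            setattr(vol, ext, raw)
    # Append volumes to their runs in (run name, index) order.
    for (run_name, _), vol in sorted(volumes.items(), key=lambda kv: kv[0]):
        subject.run.setdefault(run_name, Run()).v.append(vol)
    return subject


listing = [
    "ANATOMY/coplanar.hdr",
    "ANATOMY/coplanar.img",
    "FUNCTIONAL/bold1_0001.hdr",
    "FUNCTIONAL/bold1_0001.img",
    "FUNCTIONAL/bold1_0002.hdr",
    "FUNCTIONAL/bold1_0002.img",
    "FUNCTIONAL/bold1_0002.mat",
    "FUNCTIONAL/bold1_0003.hdr",
    "FUNCTIONAL/bold1_0003.img",
]
subject = map_subject(listing)
print(subject.anat.img, len(subject.run["bold1"].v))
```

Analysis code written against `Subject`, `Run` and `Volume` never touches path names, so the same logic could in principle be pointed at a different physical layout by swapping only the mapping function.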
High-Performance Data Analytics Functional MRI Ben Clifford, Mihael Hategan, Mike Wilde, Yong Zhao
SwiftScript for fMRI Data Analysis
(Run snr) functional (Run r, NormAnat a, Air shrink) {
  Run yroRun = reorientRun(r, "y");
  Run roRun = reorientRun(yroRun, "x");
  Volume std = roRun[0];
  Run rndr = random_select(roRun, 0.1);
  AirVector rndAirVec = align_linearRun(rndr, std, 12, 1000, 1000, "81 3 3");
  Run reslicedRndr = resliceRun(rndr, rndAirVec, "o", "k");
  Volume meanRand = softmean(reslicedRndr, "y", "null");
  Air mnQAAir = alignlinear(a.nHires, meanRand, 6, 1000, 4, "81 3 3");
  Warp boldNormWarp = combinewarp(shrink, a.aWarp, mnQAAir);
  …
}
(Run or) reorientRun (Run ir, string direction) {
  foreach Volume iv, i in ir.v {
    or.v[i] = reorient(iv, direction);
  }
}
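The `foreach` in `reorientRun` has no dependencies between iterations, which is what lets a runtime execute them concurrently. The Python sketch below mimics that pattern with a thread pool. It is only an analogy, not how Swift executes the script: `reorient` here is a hypothetical stand-in that tags a name instead of invoking an external program, and volumes are represented as plain strings.

```python
from concurrent.futures import ThreadPoolExecutor


def reorient(volume: str, direction: str) -> str:
    """Hypothetical stand-in for the external reorient program."""
    return f"{volume}.reoriented_{direction}"


def reorient_run(run: list[str], direction: str) -> list[str]:
    """Apply reorient to every volume; iterations are independent, so they may overlap."""
    with ThreadPoolExecutor() as pool:
        # map() preserves input order, matching or.v[i] = reorient(ir.v[i], direction).
        return list(pool.map(lambda v: reorient(v, direction), run))


run = ["bold1_0001", "bold1_0002", "bold1_0003"]
yro_run = reorient_run(run, "y")     # first step of functional()
ro_run = reorient_run(yro_run, "x")  # second step consumes the first step's output
print(ro_run[0])
```

The second call depends on the first call's result, so the two steps run in sequence while the volumes within each step can be processed in parallel. That data-dependency structure is the kind of loosely coupled parallelism the script expresses.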
Provenance Data Model
Multi-level Scheduling
[Diagram labels: Specification, Scheduling, Execution, Provisioning; SwiftScript; abstract computation; Virtual Data Catalog; SwiftScript compiler; execution engine (Karajan w/ Swift runtime); Swift runtime callouts; status reporting; virtual node(s); worker nodes; launcher; App F1, App F2; file1, file2, file3; provenance collector; provenance data; Falkon resource provisioner; Amazon EC2]
DOCK on SiCortex
  CPU cores: 5760
  Power: 15,000 W
  Tasks: 92160
  Elapsed time: 12821 sec
  Compute time: 1.94 CPU years
  (does not include ~800 sec to stage input data)
  Ioan Raicu, Zhao Zhang
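The figures on this slide are enough to estimate how busy the machine was kept. The arithmetic below derives core utilization, mean task length and tasks per core from the quoted numbers; the only added assumptions are a 365-day year for the "CPU years" conversion and that all 5760 cores were allocated for the full elapsed time.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600  # assumption: a 365-day year

cores = 5760
tasks = 92160
elapsed_s = 12821
compute_cpu_years = 1.94

compute_s = compute_cpu_years * SECONDS_PER_YEAR  # CPU-seconds of useful work
available_s = cores * elapsed_s                   # CPU-seconds the cores were held

utilization = compute_s / available_s
mean_task_s = compute_s / tasks
tasks_per_core = tasks / cores

print(f"utilization ~{utilization:.0%}")
print(f"mean task ~{mean_task_s:.0f} s, {tasks_per_core:.0f} tasks per core")
```

With these inputs the run works out to roughly 83% utilization, with each core handling 16 tasks of about 11 minutes each, excluding the input staging time noted on the slide.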
LIGO Gravitational Wave Observatory
  >1 Terabyte/day to 8 sites
  770 TB replicated to date: >120 million replicas
  MTBF = 1 month
  Ann Chervenak et al., ISI; Scott Koranda et al., LIGO
  (Map labels: Birmingham, Cardiff, AEI/Golm)
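To put the replication numbers in more familiar units, the arithmetic below converts the daily volume into a sustained network rate and derives an average replica size. It treats the ">" figures as exact values and uses decimal units. The slide does not say whether 1 TB/day is per site or in aggregate, so the rate is computed for 1 TB/day without multiplying by the number of sites.

```python
SECONDS_PER_DAY = 24 * 3600

daily_bytes = 1e12         # ">1 Terabyte/day", taken as exactly 1 TB (decimal)
replicated_bytes = 770e12  # "770 TB replicated to date"
replica_count = 120e6      # ">120 million replicas", taken as exactly 120 million

sustained_mbit_s = daily_bytes * 8 / SECONDS_PER_DAY / 1e6
mean_replica_mb = replicated_bytes / replica_count / 1e6

print(f"sustained rate: at least ~{sustained_mbit_s:.1f} Mbit/s")
print(f"mean replica size: at most ~{mean_replica_mb:.1f} MB")
```

Because both source figures are lower bounds, the rate is a floor and the mean replica size is a ceiling. Many small files moved continuously suggests that bookkeeping and failure recovery, not raw bandwidth, are the harder parts of the problem.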
Lag Plot for Data Transfers to Caltech Credit: Kevin Flasch, LIGO
SIDGrid: B. Bertenthal et al., U.Chicago, IU, UIC
Social Informatics Data Grid (SIDgrid) TeraGrid PADS … SIDgrid Collaborative, multi-modal analysis of cognitive science data Diverse experimental data & metadata  Browse data Search Content preview Transcode Download Analyze
ELAN SIDGrid Portal
 
A Community Integrated Model for Economic and Resource Trajectories for Humankind (CIM-EARTH) Dynamics, foresight, uncertainty, resolution, … Agriculture, transport, taxation, … Data (global, local, …) (Super) computers CIM-EARTH Framework Community process Open code, data
Alleviating Poverty in Thailand: Modeling Entrepreneurship
  Consider only wealth, access to capital
  Consider also distance to 6 major cities
  (Legend: Match, High, Low)
  Rob Townsend, Victor Zhorin, et al.
Text Mining
GeneWays (Andrey Rzhetsky et al.)
  Online journals, screening, pathways
  250,000 journal articles
  2.5M reasoning chains
  4M statements
Evidence Integration: Genetics & Disease Susceptibility Identify Genes Phenotype 1  Phenotype 2  Phenotype 3  Phenotype 4 Predictive Disease Susceptibility Physiology Metabolism Endocrine Proteome Immune Transcriptome Biomarker Signatures Morphometrics Pharmacokinetics Ethnicity Environment Age Gender Source: Terry Magnuson
James Evans, U.Chicago Arabidopsis  articles
An Open Analytics Environment
  Data in; programs & rules in; results out
  “No limits”: storage, computing, format, program
  Allowing for: versioning, provenance, collaboration, annotation
 

Better Information Faster: Programming the ContinuumBetter Information Faster: Programming the Continuum
Better Information Faster: Programming the Continuum
Ian Foster
 
ESnet6 and Smart Instruments
ESnet6 and Smart InstrumentsESnet6 and Smart Instruments
ESnet6 and Smart Instruments
Ian Foster
 
Linking Scientific Instruments and Computation
Linking Scientific Instruments and ComputationLinking Scientific Instruments and Computation
Linking Scientific Instruments and Computation
Ian Foster
 
A Global Research Data Platform: How Globus Services Enable Scientific Discovery
A Global Research Data Platform: How Globus Services Enable Scientific DiscoveryA Global Research Data Platform: How Globus Services Enable Scientific Discovery
A Global Research Data Platform: How Globus Services Enable Scientific Discovery
Ian Foster
 
Foster CRA March 2022.pptx
Foster CRA March 2022.pptxFoster CRA March 2022.pptx
Foster CRA March 2022.pptx
Ian Foster
 
Big Data, Big Computing, AI, and Environmental Science
Big Data, Big Computing, AI, and Environmental ScienceBig Data, Big Computing, AI, and Environmental Science
Big Data, Big Computing, AI, and Environmental Science
Ian Foster
 
AI at Scale for Materials and Chemistry
AI at Scale for Materials and ChemistryAI at Scale for Materials and Chemistry
AI at Scale for Materials and Chemistry
Ian Foster
 
Coding the Continuum
Coding the ContinuumCoding the Continuum
Coding the Continuum
Ian Foster
 
Data Tribology: Overcoming Data Friction with Cloud Automation
Data Tribology: Overcoming Data Friction with Cloud AutomationData Tribology: Overcoming Data Friction with Cloud Automation
Data Tribology: Overcoming Data Friction with Cloud Automation
Ian Foster
 
Research Automation for Data-Driven Discovery
Research Automation for Data-Driven DiscoveryResearch Automation for Data-Driven Discovery
Research Automation for Data-Driven Discovery
Ian Foster
 
Scaling collaborative data science with Globus and Jupyter
Scaling collaborative data science with Globus and JupyterScaling collaborative data science with Globus and Jupyter
Scaling collaborative data science with Globus and Jupyter
Ian Foster
 
Learning Systems for Science
Learning Systems for ScienceLearning Systems for Science
Learning Systems for Science
Ian Foster
 
Data Automation at Light Sources
Data Automation at Light SourcesData Automation at Light Sources
Data Automation at Light Sources
Ian Foster
 
Team Argon Summary
Team Argon SummaryTeam Argon Summary
Team Argon Summary
Ian Foster
 
Thoughts on interoperability
Thoughts on interoperabilityThoughts on interoperability
Thoughts on interoperability
Ian Foster
 
Computing Just What You Need: Online Data Analysis and Reduction at Extreme ...
Computing Just What You Need: Online Data Analysis and Reduction  at Extreme ...Computing Just What You Need: Online Data Analysis and Reduction  at Extreme ...
Computing Just What You Need: Online Data Analysis and Reduction at Extreme ...
Ian Foster
 
NIH Data Commons Architecture Ideas
NIH Data Commons Architecture IdeasNIH Data Commons Architecture Ideas
NIH Data Commons Architecture Ideas
Ian Foster
 
Going Smart and Deep on Materials at ALCF
Going Smart and Deep on Materials at ALCFGoing Smart and Deep on Materials at ALCF
Going Smart and Deep on Materials at ALCF
Ian Foster
 
Open Analytics Environment

  • 1. Towards an Open Analytics Environment Ian Foster Computation Institute Argonne National Lab & University of Chicago
  • 2. The Computation Institute A joint institute of Argonne and the University of Chicago, focused on furthering system-level science via the development and use of advanced computational methods. Solutions to many grand challenges facing science and society today require the analysis and understanding of entire systems, not just individual components. They require not reductionist approaches but the synthesis of knowledge from multiple levels of a system, whether biological, physical, or social (or all three). www.ci.uchicago.edu Faculty, fellows, staff, students, computers, projects.
  • 3. The Good Old Days: Astronomy ~1600 30 years ? years 10 years 6 years 2 years
  • 4. Astronomy, from 1600 to 2000. Automation: 10^-1 → 10^8 Hz data capture. Community: 10^0 → 10^4 astronomers (10^6 amateur). Computation: 10^-1 → 10^15 Hz peak. Data: 10^6 → 10^15 B aggregate. Literature: 10^1 → 10^5 pages/year.
  • 6. Biomedical Research ~2000 ... atcgaattccaggcgtcacattctcaattcca... MPMILGYWDIRGLAHAIRLLLEYTDSSYEEKKYT... Protein-Protein Interactions: metabolism, pathways, receptor-ligand, 4º structure. Polymorphism and Variants: genetic variants, individual patients, epidemiology. Physiology, Cellular biology, Biochemistry, Neurobiology, Endocrinology, etc. >10^6 ESTs: Expression patterns, Large-scale screens. Genetics and Maps: Linkage, Cytogenetic, Clone-based. From John Wooley. >10^6 >10^9 >10^6 >10^5 >10^9 DNA: sequences, alignments. Proteins: sequence, 2º structure, 3º structure.
  • 7. Growth of Sequences and Annotations since 1982. Folker Meyer, Genome Sequencing vs. Moore’s Law: Cyber Challenges for the Next Decade, CTWatch, August 2006.
  • 8. The Analyst in Denial: “I just need a bigger disk (and workstation)”
  • 9. An Open Analytics Environment. Data in. “No limits”: Storage, Computing, Format, Program. Allowing for: Versioning, Provenance, Collaboration, Annotation. Results out. Programs & rules in.
  • 10. o·pen [oh-puhn] adjective: (1) having the interior immediately accessible; (2) relatively free of obstructions to sight, movement, or internal arrangement; (3) generous, liberal, or bounteous; (4) in operation; live; (5) readily admitting new members; (6) not constipated
  • 12. What Goes In (2) Rules Workflows Dryad MapReduce Parallel programs SQL BPEL Swift SCFL R MatLab Octave
  • 13. How it Cooks Virtualization Run any program, store any data Indexing Automated maintenance Provisioning Policy-driven allocation of resources to competing demands
  • 14. What Comes Out Data Data Virtual Data Schema
  • 15. Analysis as (Collaborative) Process Transform Annotate Search Add to Tag Visualize Discover Extend Group Share
  • 17. Towards an Open Analysis Environment: (1) Applications Astrophysics Cognitive science East Asian studies Economics Environmental science Epidemiology Genomic medicine Neuroscience Political science Sociology Solid state physics
  • 18. Towards an Open Analysis Environment: (2) Hardware SiCortex 6K cores, 6 Top/s IBM BG/P 160K cores, 500 Top/s PADS 10-40 Gbit/s
  • 19. PADS: Petascale Active Data Store 500 TB reliable storage (data & metadata) 180 TB, 180 GB/s 17 Top/s analysis Data ingest Dynamic provisioning Parallel analysis Remote access Offload to remote data centers P A D S Diverse users Diverse data sources 1000 TB tape backup
  • 20. Towards an Open Analysis Environment: (3) Methods. HPC systems software (MPICH, PVFS, etc.); Collaborative data tagging (GLOSS); Data integration (XDTM); HPC data analytics and visualization; Loosely coupled parallelism (Swift, Hadoop); Dynamic provisioning (Falkon); Service authoring (Introduce, caGrid, gRAVI); Provenance recording and query (Swift); Service composition and workflow (Taverna); Virtualization management; Distributed data management (GridFTP, etc.)
  • 21. Tagging & Social Networking GLOSS : Generalized Labels Over Scientific data Sources
  • 22. XDTM: XML Data Typing & Mapping
      ./group23:
      drwxr-xr-x 4 yongzh users 2048 Nov 12 14:15 AA
      drwxr-xr-x 4 yongzh users 2048 Nov 11 21:13 CH
      drwxr-xr-x 4 yongzh users 2048 Nov 11 16:32 EC
      ./group23/AA:
      drwxr-xr-x 5 yongzh users 2048 Nov 5 12:41 04nov06aa
      drwxr-xr-x 4 yongzh users 2048 Dec 6 12:24 11nov06aa
      ./group23/AA/04nov06aa:
      drwxr-xr-x 2 yongzh users 2048 Nov 5 12:52 ANATOMY
      drwxr-xr-x 2 yongzh users 49152 Dec 5 11:40 FUNCTIONAL
      ./group23/AA/04nov06aa/ANATOMY:
      -rw-r--r-- 1 yongzh users 348 Nov 5 12:29 coplanar.hdr
      -rw-r--r-- 1 yongzh users 16777216 Nov 5 12:29 coplanar.img
      ./group23/AA/04nov06aa/FUNCTIONAL:
      -rw-r--r-- 1 yongzh users 348 Nov 5 12:32 bold1_0001.hdr
      -rw-r--r-- 1 yongzh users 409600 Nov 5 12:32 bold1_0001.img
      -rw-r--r-- 1 yongzh users 348 Nov 5 12:32 bold1_0002.hdr
      -rw-r--r-- 1 yongzh users 409600 Nov 5 12:32 bold1_0002.img
      -rw-r--r-- 1 yongzh users 496 Nov 15 20:44 bold1_0002.mat
      -rw-r--r-- 1 yongzh users 348 Nov 5 12:32 bold1_0003.hdr
      -rw-r--r-- 1 yongzh users 409600 Nov 5 12:32 bold1_0003.img
      Figure labels: Logical, Physical
  • 23. fMRI Type Definitions
      type Study { Group g[ ]; }
      type Group { Subject s[ ]; }
      type Subject { Volume anat; Run run[ ]; }
      type Run { Volume v[ ]; }
      type Volume { Image img; Header hdr; }
      type Image {};
      type Header {};
      type Warp {};
      type Air {};
      type AirVec { Air a[ ]; }
      type NormAnat { Volume anat; Warp aWarp; Volume nHires; }
  • 24. High-Performance Data Analytics: Functional MRI. Ben Clifford, Mihael Hategan, Mike Wilde, Yong Zhao
  • 25. SwiftScript for fMRI Data Analysis
      (Run snr) functional (Run r, NormAnat a, Air shrink) {
        Run yroRun = reorientRun(r, "y");
        Run roRun = reorientRun(yroRun, "x");
        Volume std = roRun[0];
        Run rndr = random_select(roRun, 0.1);
        AirVec rndAirVec = align_linearRun(rndr, std, 12, 1000, 1000, "81 3 3");
        Run reslicedRndr = resliceRun(rndr, rndAirVec, "o", "k");
        Volume meanRand = softmean(reslicedRndr, "y", "null");
        Air mnQAAir = alignlinear(a.nHires, meanRand, 6, 1000, 4, "81 3 3");
        Warp boldNormWarp = combinewarp(shrink, a.aWarp, mnQAAir);
        …
      }
      (Run or) reorientRun (Run ir, string direction) {
        foreach Volume iv, i in ir.v {
          or.v[i] = reorient(iv, direction);
        }
      }
  • 27. Multi-level Scheduling. Diagram labels: Specification (SwiftScript, abstract computation, SwiftScript compiler, Virtual Data Catalog); Scheduling (execution engine: Karajan w/ Swift runtime; Swift runtime callouts; status reporting); Execution (virtual node(s), worker nodes, launchers, App F1, App F2, file1, file2, file3, provenance collector, provenance data); Provisioning (Falkon resource provisioner, Amazon EC2)
  • 28. DOCK on SiCortex CPU cores: 5760 Power: 15,000 W Tasks: 92160 Elapsed time: 12821 sec Compute time: 1.94 CPU years (does not include ~800 sec to stage input data) Ioan Raicu, Zhao Zhang
  • 29. LIGO Gravitational Wave Observatory. >1 Terabyte/day to 8 sites; 770 TB replicated to date: >120 million replicas; MTBF = 1 month. Ann Chervenak et al., ISI; Scott Koranda et al., LIGO. Map labels: Birmingham, Cardiff, AEI/Golm
  • 30. Lag Plot for Data Transfers to Caltech Credit: Kevin Flasch, LIGO
  • 31. SIDGrid: B. Bertenthal et al., U.Chicago, IU, UIC
  • 32. Social Informatics Data Grid (SIDgrid) TeraGrid PADS … SIDgrid Collaborative, multi-modal analysis of cognitive science data Diverse experimental data & metadata Browse data Search Content preview Transcode Download Analyze
  • 35. A Community Integrated Model for Economic and Resource Trajectories for Humankind (CIM-EARTH). Dynamics, foresight, uncertainty, resolution, … Agriculture, transport, taxation, … Data (global, local, …). (Super) computers. CIM-EARTH Framework. Community process. Open code, data
  • 36. Alleviating Poverty in Thailand: Modeling Entrepreneurship Consider only wealth, access to capital Consider also distance to 6 major cities Rob Townsend, Victor Zhorin, et al. Match High Low
  • 38. GeneWays Online Journals Pathways GeneWays Andrey Rzhetsky et al. Screening 250,000 journal articles 2.5M reasoning chains 4M statements
  • 39. Evidence Integration: Genetics & Disease Susceptibility Identify Genes Phenotype 1 Phenotype 2 Phenotype 3 Phenotype 4 Predictive Disease Susceptibility Physiology Metabolism Endocrine Proteome Immune Transcriptome Biomarker Signatures Morphometrics Pharmacokinetics Ethnicity Environment Age Gender Source: Terry Magnuson
  • 40. James Evans, U.Chicago Arabidopsis articles
  • 41. An Open Analytics Environment. Data in. “No limits”: Storage, Computing, Format, Program. Allowing for: Versioning, Provenance, Collaboration, Annotation. Results out. Programs & rules in.
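
The figures on slide 28 (DOCK on SiCortex) can be cross-checked with a little arithmetic. The sketch below is not from the talk: it assumes a CPU year means one core running for 365 days, and the function name `dock_run_summary` is made up for illustration. Under that assumption the run had roughly 2.3 CPU years of core time available, so the quoted 1.94 CPU years of compute corresponds to utilization of about 83%.

```python
# Figures quoted on the "DOCK on SiCortex" slide.
CORES = 5760
POWER_W = 15_000
TASKS = 92_160
ELAPSED_S = 12_821
COMPUTE_CPU_YEARS = 1.94

# Assumption (not stated on the slide): a CPU year is one core busy for 365 days.
SECONDS_PER_YEAR = 365 * 24 * 3600


def dock_run_summary():
    """Derive utilization, task length, throughput and energy from the slide figures."""
    available_cpu_years = CORES * ELAPSED_S / SECONDS_PER_YEAR
    return {
        "available_cpu_years": available_cpu_years,
        "utilization": COMPUTE_CPU_YEARS / available_cpu_years,
        "mean_task_seconds": COMPUTE_CPU_YEARS * SECONDS_PER_YEAR / TASKS,
        "tasks_per_second": TASKS / ELAPSED_S,
        # 1 kWh = 3.6e6 J; power draw is treated as constant over the run.
        "energy_kwh": POWER_W * ELAPSED_S / 3.6e6,
    }


if __name__ == "__main__":
    for name, value in dock_run_summary().items():
        print(f"{name}: {value:.2f}")
```

The mean task length of roughly eleven minutes is the kind of short, loosely coupled job that the dynamic provisioning layer (Falkon) on slide 27 is meant to dispatch efficiently.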
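
Slides 22 and 23 pair a physical directory listing with logical fMRI types. A minimal Python sketch of that logical-to-physical mapping idea follows. It is not XDTM or Swift itself: the class names mirror the slide's `Subject`, `Run` and `Volume` types, but the function `map_subject`, the rule that `bold1_0001` means volume 1 of run `bold1`, and the use of a dict keyed by run name (the slide declares an array) are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from pathlib import PurePosixPath


@dataclass
class Volume:
    img: str = ""
    hdr: str = ""


@dataclass
class Run:
    v: list = field(default_factory=list)


@dataclass
class Subject:
    anat: Volume = field(default_factory=Volume)
    run: dict = field(default_factory=dict)  # run name -> Run (assumed keying)


def map_subject(paths):
    """Build a logical Subject from the physical file paths of one subject directory."""
    subject = Subject()
    volumes = {}  # (run name, volume index) -> Volume
    for p in paths:
        path = PurePosixPath(p)
        ext = path.suffix.lstrip(".")
        if ext not in ("img", "hdr"):
            continue  # e.g. the .mat file has no slot in the Volume type
        if path.parent.name == "ANATOMY":
            setattr(subject.anat, ext, p)
        elif path.parent.name == "FUNCTIONAL":
            run_name, index = path.stem.rsplit("_", 1)
            vol = volumes.setdefault((run_name, int(index)), Volume())
            setattr(vol, ext, p)
    # Append volumes to their run in index order.
    for (run_name, _), vol in sorted(volumes.items(), key=lambda kv: kv[0]):
        subject.run.setdefault(run_name, Run()).v.append(vol)
    return subject


# File names taken from the slide 22 listing.
BASE = "group23/AA/04nov06aa"
PATHS = [
    f"{BASE}/ANATOMY/coplanar.hdr",
    f"{BASE}/ANATOMY/coplanar.img",
    f"{BASE}/FUNCTIONAL/bold1_0001.hdr",
    f"{BASE}/FUNCTIONAL/bold1_0001.img",
    f"{BASE}/FUNCTIONAL/bold1_0002.hdr",
    f"{BASE}/FUNCTIONAL/bold1_0002.img",
    f"{BASE}/FUNCTIONAL/bold1_0002.mat",
    f"{BASE}/FUNCTIONAL/bold1_0003.hdr",
    f"{BASE}/FUNCTIONAL/bold1_0003.img",
]

if __name__ == "__main__":
    subject = map_subject(PATHS)
    print(subject.anat)
    print(len(subject.run["bold1"].v), "volumes in run bold1")
```

Once the files are addressed through typed objects like these, a procedure such as `reorientRun` on slide 25 can iterate over `run.v` without knowing anything about directory names.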