DATA AS A SERVICE
How Web APIs and Data-Centric
Tools Power the Materials Project
Shreyas Cholia (scholia@lbl.gov)
Dan Gunter (dkgunter@lbl.gov)
Lawrence Berkeley National Laboratory
PyData 2013
Outline
•  Data driven science
•  Materials Project Overview
•  Open data and APIs
•  Dropping APIs on your data
•  Things to think about in your API
•  Writing libraries for code AND data (pymatgen REST
interface)
•  Science stories to back this up
•  IPython notebook demo
About Us
•  Dan and Shreyas are Computer Scientists/Engineers at
Berkeley Lab.
•  We work with science teams to help build software and
computing infrastructure that facilitates awesome
SCIENCE
Science
•  Science is now a collaborative effort
•  Large teams of people
•  Lots of computational power
The Fourth Paradigm
Big Data
Science is
increasingly data-
driven
Computational
cycles are cheap
Take an –omics
approach to
science
Compute all
interesting things
first, ask questions
later
The –omics approach
•  Instead of trying to derive a solution and compute the
results, just compute the space of all possibilities and look
for the optimal result in there.
•  OK – so we are generating more data than we know what
to do with but that is ok
•  (and might be a topic for another talk …)
An open science initiative that makes available
a huge database of computed materials
properties for all materials researchers.
The Materials Project
Wordcloud showing
frequencies of elements
in Materials Project's
database
..except Oxygen, which appears
12,751 times (3.5x as much as the
next most frequent, Phosphorus)
The Materials Project http://materialsproject.org/
"Need for speed" in new materials
18 years from creation to commercial manufacture!
[Figure: timeline, 1950 to 2000, of materials from invention to commercial manufacture: Teflon, Titanium, Velcro, Polycarbonate, GaAs, Diamond-like Thin Films, and Lithium ion (S. Whittingham, Sony)]
Materials Data from: Eagar, T.; King, M. Technology Review (00401692) 1995, 98, 42.
Materials have strategic importance
Sept 7, 2010
Japan arrests
Chinese boat captain
after collision in
disputed waters
China blocks
shipments of Rare
Earth Metals to
Japan
Sept 22, 2010
Japan releases
captain
Sept 24, 2010
Japan invests in induction motors… coincidence?
“Toyota Readying Motors That Don’t Use Rare Earths…”
Jan 14, 2011 1:50 PM PT
Content for this slide courtesy Gerbrand Ceder, MIT & Kristin Persson, LBNL
2010 "Senkaku Boat Collision Incident"
Solution: Computation
Many materials properties can be computed
[Figure: computed vs. experimental (literature) voltage (V), axis 0 to 4.5 V; labeled regions: stage I, stage II, stage I+II, stage III+II]
[Figure: structure X + structure Y = compound XY]
ΔH = [ E(X) + E(Y) ] – E(XY)
Photovoltaics, Thermoelectrics,
Energy Storage, Hydrogen,
Catalysts, Magnets….
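The reaction-energy formula on this slide, ΔH = [E(X) + E(Y)] – E(XY), is plain arithmetic once the total energies are known. The sketch below follows the slide's sign convention, under which a positive ΔH means XY is lower in energy than X and Y taken separately. The function name and the energy values are hypothetical, chosen only to show the calculation.

```python
def reaction_enthalpy(e_x, e_y, e_xy):
    """Slide's convention: dH = [E(X) + E(Y)] - E(XY)."""
    return (e_x + e_y) - e_xy


# Hypothetical total energies in eV, not real computed values.
delta_h = reaction_enthalpy(e_x=-1.5, e_y=-2.0, e_xy=-4.0)
print(delta_h)
```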
Infrastructure
Submitted
Materials
Materials
Data
Materials Properties
Supercomputers
•  Over 10 million CPU hours of
calculations in < 6 months
•  Over 40,000 successful VASP
runs
(30,000+ materials)
•  Generalizable to other high-
throughput codes/analyses
Calculation
Workflows
Supercomputers Codes to run
(in sequence)
Atomic positions
[Figure: number of runs by state (successful, failed) over time, Jul 2011 to Oct 2012; y-axis 0 to 60,000]
Computation
•  Run VASP on NERSC
supercomputing
resources
•  Use Fireworks to
manage large groups
of runs
•  Results in … data for
MP
Just sit back and
enjoy the
automation..
Data Islands
•  Data is still heavily siloed and inaccessible.
•  Data sits on a machine somewhere, and you give people
local ssh/DB accounts to access it.
•  Good luck combining multiple datasets
•  Does not scale!
•  This is 2013 – we can do better!
Sharing (your data) is important!
•  The Most Important Scientific Result Published in the
Last Year
J.M. Wicherts, M. Bakker, and D. Molenaar:
Willingness to Share Research Data Is Related to the
Strength of the Evidence and the Quality of Reporting of
Statistical Results
PLoS ONE, 6(11): e26828, 2011, doi:10.1371/journal.pone.0026828.
Content for this slide courtesy Greg Wilson, Software Carpentry
Data Sharing
•  Open access to data through programmatic interfaces
•  Sub-select the data on demand rather than pulling down
the entire dataset
•  Use your own local tools with centrally managed data
•  Everyone sees the same data – better collaboration
Web portal
•  Materials data stored in MongoDB
•  The http://materialsproject.org web portal makes materials data
easily accessible
•  Materials Explorer
•  Phase Diagrams
•  Crystal Toolkit
•  Battery Explorer
•  Reaction Calculator
•  Structure Predictor
•  Focus on a highly functional and usable website to query
materials data. (We heart Django!)
•  Additionally we distribute the tools used to compute and
analyze the data as an open source library – pymatgen
API access
•  But we quickly found that scientists wanted programmatic
access to data
•  e.g., give me property X for all materials containing Li and O so
that I can pass it through my own codes
•  Lesson – make your data available through an API and
people will start to do amazing things
Why Web APIs?
•  Big push towards HTTP APIs across the web.
•  Web APIs provide programmatic access to data and
resources to developers over the web
•  Access to data as well-defined objects allows users to
develop their own custom applications and code
Enables a thriving COMMUNITY built around data.
What is The Materials API?
An open platform for
accessing Materials
Project data over the
web.
Flexible and scalable, to cater to a large number
of collaborators with different access privileges.
Simple to use and code
agnostic.
HTTP API design
https://www.materialsproject.org/rest/v1/materials/Fe2O3/vasp/energy
•  Preamble URL
•  Unique identifier, e.g. a formula (Fe2O3), id (1234), or chemical system (Li-Fe-O)
•  Data type (vasp, exp, etc.)
•  Property
Materials API maps URLs to data objects
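Because the URL is just preamble, identifier, data type, and property joined by slashes, building one from its parts takes a few lines. This is a minimal sketch: the helper name `materials_url` is hypothetical, and the preamble is copied from the example URL on the slide.

```python
from urllib.parse import quote

# Preamble taken from the example URL on the slide.
PREAMBLE = "https://www.materialsproject.org/rest/v1"


def materials_url(identifier, data_type, prop):
    """Join preamble, identifier, data type, and property into one URL."""
    parts = [PREAMBLE, "materials", quote(identifier, safe=""), data_type, quote(prop, safe="")]
    return "/".join(parts)


print(materials_url("Fe2O3", "vasp", "energy"))
print(materials_url("Li-Fe-O", "vasp", "energy"))
```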
Access via an API key
•  To maintain privileged access, each user has an
associated API key (with certain defined access
privileges).
•  To get your key, login to materialsproject.org and go to
www.materialsproject.org/profile
•  All MP https requests must supply the API key as either:
•  An X-API-KEY header, e.g., {'X-API-KEY': 'MYKEY'}, or
•  A GET or POST variable, e.g., {'API_KEY': 'MYKEY'}
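As a rough illustration of the header option, the sketch below builds (but does not send) a request with Python's standard library. `MYKEY` is the placeholder from the slide, and the helper name `build_request` is hypothetical.

```python
import urllib.request

API_KEY = "MYKEY"  # placeholder; use the key from your profile page
URL = "https://www.materialsproject.org/rest/v1/materials/Fe2O3/vasp/energy"


def build_request(url, api_key):
    """Return a GET request carrying the key in the X-API-KEY header."""
    return urllib.request.Request(url, headers={"X-API-KEY": api_key})


request = build_request(URL, API_KEY)
# urllib stores header names via str.capitalize(), hence "X-api-key";
# HTTP header names are case-insensitive, so the server sees the same header.
print(request.get_method(), request.full_url)
print(request.header_items())

# Sending it needs network access and a valid key:
# with urllib.request.urlopen(request) as response:
#     body = response.read().decode("utf-8")
```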
Sample JSON output
GET https://www.materialsproject.org/rest/v1/materials/Fe2O3/vasp/energy
{
  "created_at": "2013-03-17T09:14:58.158081",
  "valid_response": true,
  "version": {
    "pymatgen": "2.5.4",
    "db": "2013.02.25",
    "rest": "1.0"
  },
  "response": [{
    "energy": -132.33005625,
    "material_id": 542309
  }, {
    "energy": -66.62512425,
    "material_id": 24972
  }],
  "copyright": "Copyright 2012, The Materials Project"
}
Just the energy and the
id of the material
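On the client side, a response like this is easy to unpack with the standard `json` module. The sketch below parses a copy of the sample output above and keeps only the id-to-energy mapping. The variable names are illustrative.

```python
import json

# Copy of the sample response from the slide.
raw = """
{
  "created_at": "2013-03-17T09:14:58.158081",
  "valid_response": true,
  "version": {"pymatgen": "2.5.4", "db": "2013.02.25", "rest": "1.0"},
  "response": [
    {"energy": -132.33005625, "material_id": 542309},
    {"energy": -66.62512425, "material_id": 24972}
  ],
  "copyright": "Copyright 2012, The Materials Project"
}
"""

payload = json.loads(raw)
if not payload["valid_response"]:
    raise ValueError("the API flagged this response as invalid")

# Map each material id to its energy.
energies = {item["material_id"]: item["energy"] for item in payload["response"]}
print(energies)
```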
Getting started – Hello World API
> pip install flask
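A "hello world" API in Flask might look like the sketch below. The route `/hello/<name>` and the JSON shape are assumptions made for illustration, not the Materials Project's own code. Flask's built-in test client exercises the endpoint without starting a server.

```python
from flask import Flask, jsonify

app = Flask(__name__)


@app.route("/hello/<name>")
def hello(name):
    """Map GET /hello/<name> to a function that returns JSON."""
    return jsonify(message="Hello, {}!".format(name))


# Call the endpoint in-process with the test client.
with app.test_client() as client:
    response = client.get("/hello/world")
    print(response.status_code, response.get_json())
```

To serve it for real, call `app.run()` or use the `flask` command-line tool.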
Our dirty little secret
•  It involves a certain language that ends with “uby” that we
don’t like to talk about in these parts
•  Version 0.0.0 of the Materials Project was coded in
Sinatra
•  Sinatra is a microframework much like Flask
•  But it proves that this approach is viable and can be the
onramp to more amazing things.
Un-considerations
•  Don’t worry too much about pure REST
•  Initially just think of how URLs and verbs can map to functions
•  Don’t worry too much about data formats
•  JSON is easy and a great place to start
•  Feel free to avoid XML unless you really need it
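To make "URLs and verbs map to functions" concrete, here is a framework-free sketch: a dictionary keyed on (verb, path) dispatches to plain Python functions, and every result goes out as JSON. All names and data values are made up for illustration. A microframework such as Flask or Sinatra does the same job with more care.

```python
import json

ROUTES = {}


def route(verb, path):
    """Register a function for an (HTTP verb, URL path) pair."""
    def register(func):
        ROUTES[(verb, path)] = func
        return func
    return register


# Toy in-memory data; the energy values are invented.
MATERIALS = {"Fe2O3": {"energy": -1.0}}


@route("GET", "/materials")
def list_materials():
    return sorted(MATERIALS)


@route("POST", "/materials")
def add_material(formula, energy):
    MATERIALS[formula] = {"energy": energy}
    return {"added": formula}


def handle(verb, path, **params):
    """Dispatch a request to its function and encode the result as JSON."""
    func = ROUTES.get((verb, path))
    if func is None:
        return json.dumps({"error": "not found"})
    return json.dumps(func(**params))


print(handle("GET", "/materials"))
print(handle("POST", "/materials", formula="LiFePO4", energy=-2.0))
print(handle("GET", "/materials"))
```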
Our Stack
•  Apache + mod_wsgi
•  Django
•  pymatgen
•  pymongo + Mongo DB
pymatgen
•  The open source python library that powers the Materials
Project.
•  Defines core Python objects for materials data representation.
•  Provides a well-tested set of structure and thermodynamic analysis
tools relevant to many applications.
•  Establishes an open platform for researchers to collaboratively
develop sophisticated analyses of materials data obtained both
from first principles calculations and experiments.
Integration with pymatgen
The Materials API
Powerful Materials
Analytics Tool
Where we’re going with this
•  Libraries that integrate data with computation!
•  The scientific python ecosystem has a ton of data analysis
tools and libraries
•  Just starting to think about baking in datasets directly into
these tools
•  Pymatgen allows you to access core MP data directly
from the library
Compute + data
pymatgen has hooks into the materials data so you can do
stuff like this:
entries = api.get_entries_in_chemsys(['Li', 'Fe', 'O'])
But it also has computational tools that you can then use to
act on the data
pd = PhaseDiagram(entries)
Blurring the lines
•  Yes – we are blurring the lines between compute and data
•  But this is not a new idea
•  Think of all the tools built around commercial APIs
•  Twitter, Netflix etc. - python clients built around the API
Write First Class Science Functions
•  Web APIs are extremely useful, but ultimately you want to
encapsulate core science functionality as python functions
so that scientists aren’t worrying about things like
How do I set the
X-API-KEY header?
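One way to hide the HTTP details is to wrap the call in a function that speaks in science terms. In the slides pymatgen's MPRester plays this role. The sketch below is a simplified stand-in with hypothetical names (`get_energies`, `http_get_json`, `fake_fetch`), and it injects the transport so the example runs offline against the sample output shown earlier.

```python
import json
import urllib.request

BASE_URL = "https://www.materialsproject.org/rest/v1/materials"


def http_get_json(url, api_key):
    """Fetch a URL with the X-API-KEY header set and decode the JSON body."""
    request = urllib.request.Request(url, headers={"X-API-KEY": api_key})
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


def get_energies(formula, api_key, fetch=http_get_json):
    """Return {material_id: energy} for a formula; callers never touch HTTP."""
    payload = fetch("{}/{}/vasp/energy".format(BASE_URL, formula), api_key)
    if not payload.get("valid_response"):
        raise ValueError("Materials API returned an invalid response")
    return {item["material_id"]: item["energy"] for item in payload["response"]}


def fake_fetch(url, api_key):
    """Offline stand-in for the network call, based on the sample output."""
    return {
        "valid_response": True,
        "response": [{"energy": -132.33005625, "material_id": 542309}],
    }


print(get_energies("Fe2O3", "MYKEY", fetch=fake_fetch))
```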
Sample use cases
•  Screening for CO2 sorbents (with Clare Grey)
•  Using the Materials API (MAPI) + pymatgen to calculate reaction
energies of thousands of oxides with CO2.
•  Calculation of XAFS, XANES and other spectra for
clusters of atoms (with Alan Dozier)
•  Alan wrote an io add-on to pymatgen for FEFF input/output.
•  Uses MAPI + pymatgen to extract structures.
•  Defects (with Maciej Haranczyk)
•  Uses MAPI + pymatgen to pull structures to perform Voronoi
analysis to find possible interstitial sites.
IPython Notebook Examples
•  http://nbviewer.ipython.org/5199610
•  http://nbviewer.ipython.org/5022735
from pymatgen.matproj.rest import MPRester
# Phase-diagram imports were missing; these paths match 2013-era pymatgen
# (newer releases moved both classes to pymatgen.analysis.phase_diagram).
from pymatgen.phasediagram.pdmaker import PhaseDiagram
from pymatgen.phasediagram.plotter import PDPlotter

# This initializes the REST adaptor. Put your own API key in.
a = MPRester("YOUR_API_KEY")

# This gives you the Structure corresponding to material id 2254 in
# the Materials Project.
structure = a.get_structure_by_material_id(2254)

# Entries are the basic unit for thermodynamic and other analyses
# in pymatgen.
# This gets all entries belonging to the Ca-C-O system.
entries = a.get_entries_in_chemsys(['Ca', 'C', 'O'])

# With entries, you can do many sophisticated analyses,
# like creating phase diagrams.
pd = PhaseDiagram(entries)
plotter = PDPlotter(pd)
plotter.show()
Materials API + pymatgen example
Sandboxes
•  A virtual private dataset
•  Useful for
•  Everyone as a sort of "scratch"
space
•  Industry partners who want to use
the tools but not share their data
Import format: Structure Notation
Language (SNL)
•  Contains structure/molecule object, and provenance
about: created_at, authors, projects, references, remarks, data, history
Another way to remember the acronym..
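As a rough picture of such a record, the sketch below pairs a structure with the provenance fields listed on this slide and checks that none are missing. The nesting under an `about` key, the `structure` stand-in, and every value are assumptions for illustration, not the exact SNL schema.

```python
PROVENANCE_FIELDS = (
    "created_at", "authors", "projects", "references",
    "remarks", "data", "history",
)


def missing_provenance(record):
    """Return the provenance fields absent from record['about']."""
    about = record.get("about", {})
    return [field for field in PROVENANCE_FIELDS if field not in about]


# Hypothetical record; all values are placeholders.
snl_like = {
    "structure": {"formula": "Fe2O3"},  # stand-in for a full structure object
    "about": {
        "created_at": "2013-03-17T00:00:00",
        "authors": [{"name": "A. Researcher", "email": "a.researcher@example.org"}],
        "projects": ["example-project"],
        "references": "",
        "remarks": ["illustrative record"],
        "data": {},
        "history": [],
    },
}

print(missing_provenance(snl_like))
print(missing_provenance({"structure": {}, "about": {"authors": []}}))
```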
Fireworks
•  FireWorks is a code for defining, managing, and executing
scientific workflows
•  It can be used to automate most types of calculations over
arbitrary computing resources, including those that have a
queueing system
•  It is very dynamic: FireWorks can beget other FireWorks at
runtime
http://pythonhosted.org/FireWorks/
Pymatgen-db
•  Sick of MongoHub et al.? We were. So we wrote a simple
Web UI using prettytable, pymatgen, and Django
•  https://github.com/materialsproject/pymatgen-db
Which we
proceeded to
use for deep
scientific inquiry
We’re not the only ones …
•  Bioinformatics
•  KBase (http://kbase.us) - DOE predictive and systems biology.
•  Astronomy
•  Sloan Digital Sky Survey (http://skyserver.sdss.org)
•  Spectroscopy
•  Advanced Light Source (ALS), Advanced Photon Source (APS)
•  According to ProgrammableWeb, ~130 others
http://www.programmableweb.com/apis/directory/1?apicat=Science&protocol=REST
..though probably many of these are
More information
•  Materials API + pymatgen examples
•  https://gist.github.com/gists/search?q=materials+api+pymatgen
•  The Materials API wiki
•  https://materialsproject.org/wiki/index.php/The_Materials_API
•  Python Materials Genomics
•  http://packages.python.org/pymatgen/
•  Shyue Ping Ong, William Davidson Richards, Anubhav Jain, Geoffroy
Hautier, Michael Kocher, Shreyas Cholia, Dan Gunter, Vincent
Chevrier, Kristin A. Persson, Gerbrand Ceder. Python Materials
Genomics (pymatgen): A Robust, Open-Source Python Library for
Materials Analysis. (submitted)
•  These slides:
•  https://speakerdeck.com/shreddd/data-as-a-service-pydata-2013
Takeaways
•  Make scientific data easily available to end-users
•  Friendly, powerful Web UI is a great way to engage, but then..
•  Build APIs around your data to make it easily accessible
•  Write scientific libraries with *both* analysis and data, by
hooking them up to APIs.
We’re hiring
•  Talented, science-loving, web-savvy, math-anything
Python programming code-slingers who would rather pass
a Nobel prize winner on the way to lunch than get free
dry-cleaning
•  downside: or even free coffee (groan)
•  upside: some of your tax dollars go towards your own salary!
•  http://jobs.materialsproject.org/
Contact Us
•  Shreyas Cholia – scholia@lbl.gov
•  Dan Gunter – dkgunter@lbl.gov
•  Materials Project Team – feedback@materialsproject.org
How Web APIs and Data Centric Tools Power the Materials Project (PyData SV 2013)

More Related Content

What's hot (20)

Data Science in Future Tense
Data Science in Future TenseData Science in Future Tense
Data Science in Future Tense
Paco Nathan
 
Webinar: Deep Learning with H2O
Webinar: Deep Learning with H2OWebinar: Deep Learning with H2O
Webinar: Deep Learning with H2O
Sri Ambati
 
Scalable Machine Learning in R and Python with H2O
Scalable Machine Learning in R and Python with H2OScalable Machine Learning in R and Python with H2O
Scalable Machine Learning in R and Python with H2O
Sri Ambati
 
AI Development with H2O.ai
AI Development with H2O.aiAI Development with H2O.ai
AI Development with H2O.ai
Yalçın Yenigün
 
Tiny Batches, in the wine: Shiny New Bits in Spark Streaming
Tiny Batches, in the wine: Shiny New Bits in Spark StreamingTiny Batches, in the wine: Shiny New Bits in Spark Streaming
Tiny Batches, in the wine: Shiny New Bits in Spark Streaming
Paco Nathan
 
Strata EU 2014: Spark Streaming Case Studies
Strata EU 2014: Spark Streaming Case StudiesStrata EU 2014: Spark Streaming Case Studies
Strata EU 2014: Spark Streaming Case Studies
Paco Nathan
 
Intro to H2O in Python - Data Science LA
Intro to H2O in Python - Data Science LAIntro to H2O in Python - Data Science LA
Intro to H2O in Python - Data Science LA
Sri Ambati
 
Scaling PyData Up and Out
Scaling PyData Up and OutScaling PyData Up and Out
Scaling PyData Up and Out
Travis Oliphant
 
Sparkling Water 5 28-14
Sparkling Water 5 28-14Sparkling Water 5 28-14
Sparkling Water 5 28-14
Sri Ambati
 
Jupyter for Education: Beyond Gutenberg and Erasmus
Jupyter for Education: Beyond Gutenberg and ErasmusJupyter for Education: Beyond Gutenberg and Erasmus
Jupyter for Education: Beyond Gutenberg and Erasmus
Paco Nathan
 
Deep Learning with MXNet - Dmitry Larko
Deep Learning with MXNet - Dmitry LarkoDeep Learning with MXNet - Dmitry Larko
Deep Learning with MXNet - Dmitry Larko
Sri Ambati
 
Big Data Science with H2O in R
Big Data Science with H2O in RBig Data Science with H2O in R
Big Data Science with H2O in R
Anqi Fu
 
Spark streaming
Spark streamingSpark streaming
Spark streaming
Noam Shaish
 
ArnoCandelAIFrontiers011217
ArnoCandelAIFrontiers011217ArnoCandelAIFrontiers011217
ArnoCandelAIFrontiers011217
Sri Ambati
 
Data Science in 2016: Moving Up
Data Science in 2016: Moving UpData Science in 2016: Moving Up
Data Science in 2016: Moving Up
Paco Nathan
 
Stacked Ensembles in H2O
Stacked Ensembles in H2OStacked Ensembles in H2O
Stacked Ensembles in H2O
Sri Ambati
 
GalvanizeU Seattle: Eleven Almost-Truisms About Data
GalvanizeU Seattle: Eleven Almost-Truisms About DataGalvanizeU Seattle: Eleven Almost-Truisms About Data
GalvanizeU Seattle: Eleven Almost-Truisms About Data
Paco Nathan
 
Machine Learning with Spark
Machine Learning with SparkMachine Learning with Spark
Machine Learning with Spark
elephantscale
 
Apache Spark and the Emerging Technology Landscape for Big Data
Apache Spark and the Emerging Technology Landscape for Big DataApache Spark and the Emerging Technology Landscape for Big Data
Apache Spark and the Emerging Technology Landscape for Big Data
Paco Nathan
 
PyData Barcelona Keynote
PyData Barcelona KeynotePyData Barcelona Keynote
PyData Barcelona Keynote
Travis Oliphant
 
Data Science in Future Tense
Data Science in Future TenseData Science in Future Tense
Data Science in Future Tense
Paco Nathan
 
Webinar: Deep Learning with H2O
Webinar: Deep Learning with H2OWebinar: Deep Learning with H2O
Webinar: Deep Learning with H2O
Sri Ambati
 
Scalable Machine Learning in R and Python with H2O
Scalable Machine Learning in R and Python with H2OScalable Machine Learning in R and Python with H2O
Scalable Machine Learning in R and Python with H2O
Sri Ambati
 
Tiny Batches, in the wine: Shiny New Bits in Spark Streaming
Tiny Batches, in the wine: Shiny New Bits in Spark StreamingTiny Batches, in the wine: Shiny New Bits in Spark Streaming
Tiny Batches, in the wine: Shiny New Bits in Spark Streaming
Paco Nathan
 
Strata EU 2014: Spark Streaming Case Studies
Strata EU 2014: Spark Streaming Case StudiesStrata EU 2014: Spark Streaming Case Studies
Strata EU 2014: Spark Streaming Case Studies
Paco Nathan
 
Intro to H2O in Python - Data Science LA
Intro to H2O in Python - Data Science LAIntro to H2O in Python - Data Science LA
Intro to H2O in Python - Data Science LA
Sri Ambati
 
Scaling PyData Up and Out
Scaling PyData Up and OutScaling PyData Up and Out
Scaling PyData Up and Out
Travis Oliphant
 
Sparkling Water 5 28-14
Sparkling Water 5 28-14Sparkling Water 5 28-14
Sparkling Water 5 28-14
Sri Ambati
 
Jupyter for Education: Beyond Gutenberg and Erasmus
Jupyter for Education: Beyond Gutenberg and ErasmusJupyter for Education: Beyond Gutenberg and Erasmus
Jupyter for Education: Beyond Gutenberg and Erasmus
Paco Nathan
 
Deep Learning with MXNet - Dmitry Larko
Deep Learning with MXNet - Dmitry LarkoDeep Learning with MXNet - Dmitry Larko
Deep Learning with MXNet - Dmitry Larko
Sri Ambati
 
Big Data Science with H2O in R
Big Data Science with H2O in RBig Data Science with H2O in R
Big Data Science with H2O in R
Anqi Fu
 
ArnoCandelAIFrontiers011217
ArnoCandelAIFrontiers011217ArnoCandelAIFrontiers011217
ArnoCandelAIFrontiers011217
Sri Ambati
 
Data Science in 2016: Moving Up
Data Science in 2016: Moving UpData Science in 2016: Moving Up
Data Science in 2016: Moving Up
Paco Nathan
 
Stacked Ensembles in H2O
Stacked Ensembles in H2OStacked Ensembles in H2O
Stacked Ensembles in H2O
Sri Ambati
 
GalvanizeU Seattle: Eleven Almost-Truisms About Data
GalvanizeU Seattle: Eleven Almost-Truisms About DataGalvanizeU Seattle: Eleven Almost-Truisms About Data
GalvanizeU Seattle: Eleven Almost-Truisms About Data
Paco Nathan
 
Machine Learning with Spark
Machine Learning with SparkMachine Learning with Spark
Machine Learning with Spark
elephantscale
 
Apache Spark and the Emerging Technology Landscape for Big Data
Apache Spark and the Emerging Technology Landscape for Big DataApache Spark and the Emerging Technology Landscape for Big Data
Apache Spark and the Emerging Technology Landscape for Big Data
Paco Nathan
 
PyData Barcelona Keynote
PyData Barcelona KeynotePyData Barcelona Keynote
PyData Barcelona Keynote
Travis Oliphant
 

Similar to How Web APIs and Data Centric Tools Power the Materials Project (PyData SV 2013) (20)

The Materials API
The Materials APIThe Materials API
The Materials API
University of California, San Diego
 
The Materials Project Ecosystem - A Complete Software and Data Platform for M...
The Materials Project Ecosystem - A Complete Software and Data Platform for M...The Materials Project Ecosystem - A Complete Software and Data Platform for M...
The Materials Project Ecosystem - A Complete Software and Data Platform for M...
University of California, San Diego
 
Materials Project computation and database infrastructure
Materials Project computation and database infrastructureMaterials Project computation and database infrastructure
Materials Project computation and database infrastructure
Anubhav Jain
 
The Materials Project - Combining Science and Informatics to Accelerate Mater...
The Materials Project - Combining Science and Informatics to Accelerate Mater...The Materials Project - Combining Science and Informatics to Accelerate Mater...
The Materials Project - Combining Science and Informatics to Accelerate Mater...
University of California, San Diego
 
NANO266 - Lecture 12 - High-throughput computational materials design
NANO266 - Lecture 12 - High-throughput computational materials designNANO266 - Lecture 12 - High-throughput computational materials design
NANO266 - Lecture 12 - High-throughput computational materials design
University of California, San Diego
 
Open-source tools for generating and analyzing large materials data sets
Open-source tools for generating and analyzing large materials data setsOpen-source tools for generating and analyzing large materials data sets
Open-source tools for generating and analyzing large materials data sets
Anubhav Jain
 
Discovering new functional materials for clean energy and beyond using high-t...
Discovering new functional materials for clean energy and beyond using high-t...Discovering new functional materials for clean energy and beyond using high-t...
Discovering new functional materials for clean energy and beyond using high-t...
Anubhav Jain
 
Open Source Tools for Materials Informatics
Open Source Tools for Materials InformaticsOpen Source Tools for Materials Informatics
Open Source Tools for Materials Informatics
Anubhav Jain
 
The Materials Project: Applications to energy storage and functional materia...
The Materials Project: Applications to energy storage and functional materia...The Materials Project: Applications to energy storage and functional materia...
The Materials Project: Applications to energy storage and functional materia...
Anubhav Jain
 
Software tools, crystal descriptors, and machine learning applied to material...
Software tools, crystal descriptors, and machine learning applied to material...Software tools, crystal descriptors, and machine learning applied to material...
Software tools, crystal descriptors, and machine learning applied to material...
Anubhav Jain
 
Software Tools, Methods and Applications of Machine Learning in Functional Ma...
Software Tools, Methods and Applications of Machine Learning in Functional Ma...Software Tools, Methods and Applications of Machine Learning in Functional Ma...
Software Tools, Methods and Applications of Machine Learning in Functional Ma...
Anubhav Jain
 
Discovering and Exploring New Materials through the Materials Project
Discovering and Exploring New Materials through the Materials ProjectDiscovering and Exploring New Materials through the Materials Project
Discovering and Exploring New Materials through the Materials Project
Anubhav Jain
 
The Materials Project: overview and infrastructure
The Materials Project: overview and infrastructureThe Materials Project: overview and infrastructure
The Materials Project: overview and infrastructure
Anubhav Jain
 
Software tools for high-throughput materials data generation and data mining
Software tools for high-throughput materials data generation and data miningSoftware tools for high-throughput materials data generation and data mining
Software tools for high-throughput materials data generation and data mining
Anubhav Jain
 
Materials informatics
Materials informaticsMaterials informatics
Materials informatics
Sergey Sozykin
 
MAVRL Workshop 2014 - pymatgen-db & custodian
MAVRL Workshop 2014 - pymatgen-db & custodianMAVRL Workshop 2014 - pymatgen-db & custodian
MAVRL Workshop 2014 - pymatgen-db & custodian
University of California, San Diego
 
The Materials Project: An Electronic Structure Database for Community-Based M...
The Materials Project: An Electronic Structure Database for Community-Based M...The Materials Project: An Electronic Structure Database for Community-Based M...
The Materials Project: An Electronic Structure Database for Community-Based M...
Anubhav Jain
 
Data Mining to Discovery for Inorganic Solids: Software Tools and Applications
Data Mining to Discovery for Inorganic Solids: Software Tools and ApplicationsData Mining to Discovery for Inorganic Solids: Software Tools and Applications
Data Mining to Discovery for Inorganic Solids: Software Tools and Applications
aimsnist
 
The Materials Project: Experiences from running a million computational scien...
The Materials Project: Experiences from running a million computational scien...The Materials Project: Experiences from running a million computational scien...
The Materials Project: Experiences from running a million computational scien...
Anubhav Jain
 
Software tools to facilitate materials science research
Software tools to facilitate materials science researchSoftware tools to facilitate materials science research
Software tools to facilitate materials science research
Anubhav Jain
 
The Materials Project Ecosystem - A Complete Software and Data Platform for M...
The Materials Project Ecosystem - A Complete Software and Data Platform for M...The Materials Project Ecosystem - A Complete Software and Data Platform for M...
The Materials Project Ecosystem - A Complete Software and Data Platform for M...
University of California, San Diego
 
Materials Project computation and database infrastructure
Materials Project computation and database infrastructureMaterials Project computation and database infrastructure
Materials Project computation and database infrastructure
Anubhav Jain
 
The Materials Project - Combining Science and Informatics to Accelerate Mater...
The Materials Project - Combining Science and Informatics to Accelerate Mater...The Materials Project - Combining Science and Informatics to Accelerate Mater...
The Materials Project - Combining Science and Informatics to Accelerate Mater...
University of California, San Diego
 
NANO266 - Lecture 12 - High-throughput computational materials design
NANO266 - Lecture 12 - High-throughput computational materials designNANO266 - Lecture 12 - High-throughput computational materials design
NANO266 - Lecture 12 - High-throughput computational materials design
University of California, San Diego
 
Open-source tools for generating and analyzing large materials data sets
Open-source tools for generating and analyzing large materials data setsOpen-source tools for generating and analyzing large materials data sets
Open-source tools for generating and analyzing large materials data sets
Anubhav Jain
 
Discovering new functional materials for clean energy and beyond using high-t...
Discovering new functional materials for clean energy and beyond using high-t...Discovering new functional materials for clean energy and beyond using high-t...
Discovering new functional materials for clean energy and beyond using high-t...
Anubhav Jain
 
Open Source Tools for Materials Informatics
Open Source Tools for Materials InformaticsOpen Source Tools for Materials Informatics
Open Source Tools for Materials Informatics
Anubhav Jain
 
The Materials Project: Applications to energy storage and functional materia...
The Materials Project: Applications to energy storage and functional materia...The Materials Project: Applications to energy storage and functional materia...
The Materials Project: Applications to energy storage and functional materia...
Anubhav Jain
 
Software tools, crystal descriptors, and machine learning applied to material...
Software tools, crystal descriptors, and machine learning applied to material...Software tools, crystal descriptors, and machine learning applied to material...
Software tools, crystal descriptors, and machine learning applied to material...
Anubhav Jain
 
Software Tools, Methods and Applications of Machine Learning in Functional Ma...
Software Tools, Methods and Applications of Machine Learning in Functional Ma...Software Tools, Methods and Applications of Machine Learning in Functional Ma...
Software Tools, Methods and Applications of Machine Learning in Functional Ma...
Anubhav Jain
 
Discovering and Exploring New Materials through the Materials Project
Discovering and Exploring New Materials through the Materials ProjectDiscovering and Exploring New Materials through the Materials Project
Discovering and Exploring New Materials through the Materials Project
Anubhav Jain
 
The Materials Project: overview and infrastructure
The Materials Project: overview and infrastructureThe Materials Project: overview and infrastructure
The Materials Project: overview and infrastructure
Anubhav Jain
 
Software tools for high-throughput materials data generation and data mining
Software tools for high-throughput materials data generation and data miningSoftware tools for high-throughput materials data generation and data mining
Software tools for high-throughput materials data generation and data mining
Anubhav Jain
 
The Materials Project: An Electronic Structure Database for Community-Based M...
The Materials Project: An Electronic Structure Database for Community-Based M...The Materials Project: An Electronic Structure Database for Community-Based M...
The Materials Project: An Electronic Structure Database for Community-Based M...
Anubhav Jain
 
Data Mining to Discovery for Inorganic Solids: Software Tools and Applications
Data Mining to Discovery for Inorganic Solids: Software Tools and ApplicationsData Mining to Discovery for Inorganic Solids: Software Tools and Applications
Data Mining to Discovery for Inorganic Solids: Software Tools and Applications
aimsnist
 
The Materials Project: Experiences from running a million computational scien...
The Materials Project: Experiences from running a million computational scien...The Materials Project: Experiences from running a million computational scien...
The Materials Project: Experiences from running a million computational scien...
Anubhav Jain
 
Software tools to facilitate materials science research
Software tools to facilitate materials science researchSoftware tools to facilitate materials science research
Software tools to facilitate materials science research
Anubhav Jain
 
Ad

How Web APIs and Data Centric Tools Power the Materials Project (PyData SV 2013)

  • 1. DATA AS A SERVICE How Web APIs and Data-Centric Tools Power the Materials Project Shreyas Cholia (scholia@lbl.gov) Dan Gunter (dkgunter@lbl.gov) Lawrence Berkeley National Laboratory PyData 2013
  • 2. Outline •  Data-driven science •  Materials Project Overview •  Open data and APIs •  Dropping APIs on your data •  Things to think about in your API •  Writing libraries for code AND data (pymatgen REST interface) •  Science stories to back this up •  IPython Notebook demo
  • 3. About Us •  Dan and Shreyas are Computer Scientists/Engineers at Berkeley Lab. •  We work with science teams to help build software and computing infrastructure that facilitates awesome SCIENCE
  • 4. Science •  Science is now a collaborative effort •  Large teams of people •  Lots of computational power
  • 6. Big Data •  Science is increasingly data-driven •  Computational cycles are cheap •  Take an –omics approach to science •  Compute all interesting things first, ask questions later
  • 7. The –omics approach •  Instead of trying to derive a solution and compute the results, just compute the space of all possibilities and look for the optimal result in there. •  OK – so we are generating more data than we know what to do with but that is ok •  (and might be a topic for another talk …)
  • 8. The Materials Project •  An open science initiative that makes available a huge database of computed materials properties for all materials researchers. •  [Figure: wordcloud showing frequencies of elements in the Materials Project's database, except Oxygen, which appears 12,751 times (3.5x as much as the next most frequent, Phosphorus)]
  • 9. The Materials Project http://materialsproject.org/
  • 10. "Need for speed" in new materials •  18 years from creation to commercial manufacture! •  [Figure: timeline, 1950 to 2000, of materials: Teflon, Titanium, Velcro, Polycarbonate, GaAs, Diamond-like Thin Films, Lithium ion (S. Whittingham, Sony)] •  Materials data from: Eagar, T.; King, M. Technology Review (00401692) 1995, 98, 42.
  • 11. Materials have strategic importance •  2010 "Senkaku Boat Collision Incident" •  Sept 7, 2010: Japan arrests Chinese boat captain after collision in disputed waters •  Sept 22, 2010: China blocks shipments of Rare Earth Metals to Japan •  Sept 24, 2010: Japan releases captain •  Japan invests in induction motors… coincidence? “Toyota Readying Motors That Don’t Use Rare Earths…” (Jan 14, 2011 1:50 PM PT) •  Content for this slide courtesy Gerbrand Ceder, MIT & Kristin Persson, LBNL
  • 12. Solution: Computation •  Many materials properties can be computed •  [Figure: computed vs. experimental (literature) voltage (V), axis 0 to 4.5 V, with stages I, II and III marked] •  ΔH = [E(X) + E(Y)] – E(XY) •  Photovoltaics, Thermoelectrics, Energy Storage, Hydrogen, Catalysts, Magnets….
  • 13. Infrastructure •  Over 10 million CPU hours of calculations in < 6 months •  Over 40,000 successful VASP runs (30,000+ materials) •  Generalizable to other high-throughput codes/analyses •  [Diagram labels: Submitted Materials, Atomic positions, Codes to run (in sequence), Calculation Workflows, Supercomputers, Materials Data, Materials Properties]
  • 14. Computation •  Run VASP on NERSC supercomputing resources •  Use FireWorks to manage large groups of runs •  Results in … data for MP •  Just sit back and enjoy the automation.. •  [Figure: number of runs by state (Failed, Successful, Total) over time, Jul 2011 to Oct 2012, axis 0 to 60,000]
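The enthalpy expression on slide 12 can be evaluated directly once total energies are available. The following is a minimal sketch that follows the slide's sign convention; the function name and the numeric energies are made up for illustration and are not Materials Project data.

```python
def reaction_enthalpy(e_x, e_y, e_xy):
    """Return dH = [E(X) + E(Y)] - E(XY), the sign convention used on the slide."""
    return (e_x + e_y) - e_xy


# Hypothetical total energies in eV (illustrative values only).
delta_h = reaction_enthalpy(e_x=-2.0, e_y=-5.0, e_xy=-8.5)
print(delta_h)
```

With this convention a positive value means the compound XY is lower in energy than its separated constituents.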
  • 15. Data Islands •  Data is still heavily siloed and inaccessible. •  Data sits on a machine somewhere, and you give people local ssh/DB accounts to access it. •  Good luck combining multiple datasets •  Does not scale! •  This is 2013 – we can do better!
  • 16. Sharing (your data) is important! •  The Most Important Scientific Result Published in the Last Year: J.M. Wicherts, M. Bakker, and D. Molenaar, "Willingness to Share Research Data Is Related to the Strength of the Evidence and the Quality of Reporting of Statistical Results," PLoS ONE, 6(11): e26828, 2011, doi:10.1371/journal.pone.0026828. •  Content for this slide courtesy Greg Wilson, Software Carpentry
  • 17. Data Sharing •  Open access to data through programmatic interfaces •  Sub-select the data on demand rather than pulling down the entire dataset •  Use your own local tools with centrally managed data •  Everyone sees the same data – better collaboration
  • 18. Web portal •  Materials data stored in MongoDB •  The http://materialsproject.org web portal makes materials data easily accessible: Materials Explorer, Phase Diagrams, Crystal Toolkit, Battery Explorer, Reaction Calculator, Structure Predictor •  Focus on a highly functional and usable website to query materials data. (We heart Django!) •  Additionally we distribute the tools used to compute and analyze the data as an open source library – pymatgen
  • 19. API access •  But we quickly found that scientists wanted programmatic access to data •  e.g., "Give me property X for all materials with Li and O so that I can pass it through my own codes" •  Lesson – make your data available through an API and people will start to do amazing things
  • 20. Why Web APIs? •  Big push towards HTTP APIs across the web. •  Web APIs provide programmatic access to data and resources to developers over the web •  Access to data as well-defined objects allows users to develop their own custom applications and code •  Enables a thriving COMMUNITY built around data.
  • 21. What is The Materials API? •  An open platform for accessing Materials Project data over the web. •  Flexible and scalable to cater to a large number of collaborators, with different access privileges. •  Simple to use and code agnostic.
  • 22. HTTP API design •  Materials API maps URLs to data objects •  https://www.materialsproject.org/rest/v1/materials/Fe2O3/vasp/energy •  URL parts, in order: preamble; unique identifier, e.g. a formula (Fe2O3), id (1234) or chemical system (Li-Fe-O); data type (vasp, exp, etc.); property
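The URL layout on slide 22 can be expressed as a small helper. This is a sketch only: the preamble is copied from the example URL on the slide, while the function name `materials_url` is hypothetical and not part of any Materials Project library.

```python
from urllib.parse import quote

# Preamble taken from the example URL on the slide.
PREAMBLE = "https://www.materialsproject.org/rest/v1/materials"


def materials_url(identifier, data_type, prop):
    """Join preamble, unique identifier, data type and property into one URL."""
    # Percent-encode each path segment so unusual identifiers stay valid.
    parts = [quote(str(part), safe="") for part in (identifier, data_type, prop)]
    return "/".join([PREAMBLE] + parts)


print(materials_url("Fe2O3", "vasp", "energy"))    # a formula
print(materials_url("Li-Fe-O", "vasp", "energy"))  # a chemical system
```

Keeping URL construction in one place means client code never has to remember the order of the segments.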
  • 23. Access via an API key •  To maintain privileged access, each user has an associated API key (with certain defined access privileges). •  To get your key, log in to materialsproject.org and go to www.materialsproject.org/profile •  All MP HTTPS requests must supply the API key as: •  An X-API-KEY header, e.g., {'X-API-KEY': 'MYKEY'}, or •  A GET or POST variable, e.g., {'API_KEY': 'MYKEY'}
  • 24. Sample JSON output •  GET https://www.materialsproject.org/rest/v1/materials/Fe2O3/vasp/energy •  Response (just the energy and the id of the material): { "created_at": "2013-03-17T09:14:58.158081", "valid_response": true, "version": { "pymatgen": "2.5.4", "db": "2013.02.25", "rest": "1.0" }, "response": [{ "energy": -132.33005625, "material_id": 542309 }, { "energy": -66.62512425, "material_id": 24972 }], "copyright": "Copyright 2012, The Materials Project" }
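The two ways of supplying the key on slide 23 can be sketched with the Python standard library alone. The requests are only constructed here, not sent; `MYKEY` is a placeholder, and the URL is the example from the slides (it may no longer be served).

```python
from urllib.parse import urlencode
from urllib.request import Request

URL = "https://www.materialsproject.org/rest/v1/materials/Fe2O3/vasp/energy"
API_KEY = "MYKEY"  # placeholder; substitute the key from your profile page

# Option 1: send the key as an X-API-KEY header.
# urllib stores the name as "X-api-key"; HTTP header names are case-insensitive.
header_request = Request(URL, headers={"X-API-KEY": API_KEY})

# Option 2: send the key as a GET variable in the query string.
query_request = Request(URL + "?" + urlencode({"API_KEY": API_KEY}))

# Neither request is sent here; urllib.request.urlopen(request) would do that.
print(header_request.header_items())
print(query_request.full_url)
```

The header form keeps the key out of URLs, which tend to end up in logs and browser history.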
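A response like the one on slide 24 is straightforward to consume. The sketch below parses the sample JSON from the slide with the standard library and pulls out the energies; the variable names are illustrative.

```python
import json

# The sample response from the slide, reproduced as a string.
sample = """
{"created_at": "2013-03-17T09:14:58.158081", "valid_response": true,
 "version": {"pymatgen": "2.5.4", "db": "2013.02.25", "rest": "1.0"},
 "response": [{"energy": -132.33005625, "material_id": 542309},
              {"energy": -66.62512425, "material_id": 24972}],
 "copyright": "Copyright 2012, The Materials Project"}
"""

payload = json.loads(sample)
if not payload["valid_response"]:
    raise ValueError("API reported an invalid response")

# Map each material id to its energy, then pick the id with the lowest energy.
energies = {entry["material_id"]: entry["energy"] for entry in payload["response"]}
lowest_id = min(energies, key=energies.get)
print(energies)
print(lowest_id)
```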
  • 25. Getting started – Hello World API > pip install flask
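The code shown on slide 25 did not survive in this transcript. As a stand-in, here is a minimal Flask "Hello World" API; the route and the JSON shape are assumptions for illustration, not the speakers' original example.

```python
from flask import Flask, jsonify

app = Flask(__name__)


@app.route("/hello/<name>")
def hello(name):
    """Map GET /hello/<name> to a small JSON document."""
    return jsonify({"message": "Hello, " + name})
```

Saved as `hello.py`, this can be served locally with a recent Flask release using `flask --app hello run`, after which `GET /hello/world` returns the JSON greeting.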
  • 26. Our dirty little secret •  It involves a certain language that ends with “uby” that we don’t like to talk about in these parts •  Version 0.0.0 of the Materials Project was coded in Sinatra •  Sinatra is a microframework much like Flask •  But it proves that this approach is viable and can be the onramp to more amazing things.
  • 27. Un-considerations •  Don’t worry too much about pure REST •  Initially just think of how URLs and verbs can map to functions •  Don’t worry too much about data formats •  JSON is easy and a great place to start •  Feel free to avoid XML unless you really need it
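The advice on slide 27 to "think of how URLs and verbs can map to functions" can be shown without any framework. The sketch below is a toy dispatcher; the handlers, route table and status codes are hypothetical, and a framework such as Flask or Django normally does this routing for you.

```python
def get_energy(identifier):
    """Placeholder handler: a real one would query the database."""
    return {"identifier": identifier, "property": "energy"}


def get_structure(identifier):
    """Placeholder handler for a second property."""
    return {"identifier": identifier, "property": "structure"}


# (HTTP verb, property) -> function
ROUTES = {
    ("GET", "energy"): get_energy,
    ("GET", "structure"): get_structure,
}


def dispatch(verb, path):
    """Route e.g. GET /materials/Fe2O3/energy to the matching function."""
    parts = path.strip("/").split("/")
    if len(parts) != 3 or parts[0] != "materials":
        return 404, {"error": "not found"}
    _, identifier, prop = parts
    handler = ROUTES.get((verb.upper(), prop))
    if handler is None:
        return 405, {"error": "unsupported verb or property"}
    return 200, handler(identifier)


print(dispatch("GET", "/materials/Fe2O3/energy"))
```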
  • 28. Our Stack •  Apache + mod_wsgi •  Django •  pymatgen •  pymongo + MongoDB
  • 29. pymatgen •  The open source Python library that powers the Materials Project. •  Defines core Python objects for materials data representation. •  Provides a well-tested set of structure and thermodynamic analysis tools relevant to many applications. •  Establishes an open platform for researchers to collaboratively develop sophisticated analyses of materials data obtained both from first principles calculations and experiments.
  • 30. Integration with pymatgen •  The Materials API •  Powerful Materials Analytics Tool
  • 31. Where we’re going with this •  Libraries that integrate data with computation! •  The scientific Python ecosystem has a ton of data analysis tools and libraries •  Just starting to think about baking datasets directly into these tools •  pymatgen allows you to access core MP data directly from the library
  • 32. Compute + data •  pymatgen has hooks into the materials data so you can do stuff like this: entries = api.get_entries_in_chemsys(['Li', 'Fe', 'O']) •  But it also has computational tools that you can then use to act on the data: pd = PhaseDiagram(entries)
  • 33. Blurring the lines •  Yes – we are blurring the lines between compute and data •  But this is not a new idea •  Think of all the tools built around commercial APIs •  Twitter, Netflix etc. - Python clients built around the API
  • 34. Write First Class Science Functions •  Web APIs are extremely useful, but ultimately you want to encapsulate core science functionality as Python functions so that scientists aren’t worrying about things like "How do I set the X-API-KEY header?"
  • 35. Sample use cases •  Screening for CO2 sorbents (with Clare Grey) •  Using the Materials API (MAPI) + pymatgen to calculate reaction energies of thousands of oxides with CO2. •  Calculation of XAFS, XANES and other spectra for clusters of atoms (with Alan Dozier) •  Alan wrote an I/O add-on to pymatgen for FEFF input/output. •  Uses MAPI + pymatgen to extract structures. •  Defects (with Maciej Haranczyk) •  Uses MAPI + pymatgen to pull structures to perform Voronoi analysis to find possible interstitial sites.
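One way to read slide 34: hide the HTTP details behind a function that speaks in scientific terms. The following is a minimal sketch, not pymatgen's actual implementation; `fetch_json`, `get_vasp_energies` and the injectable `fetch` parameter are hypothetical names, and the v1 endpoint from the slides may no longer be served.

```python
import json
from urllib.request import Request, urlopen

BASE_URL = "https://www.materialsproject.org/rest/v1/materials"


def fetch_json(url, api_key):
    """Send a GET request with the X-API-KEY header and decode the JSON body."""
    request = Request(url, headers={"X-API-KEY": api_key})
    with urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


def get_vasp_energies(formula, api_key, fetch=fetch_json):
    """Return {material_id: energy} for a formula; callers never touch headers."""
    url = "{}/{}/vasp/energy".format(BASE_URL, formula)
    payload = fetch(url, api_key)
    if not payload.get("valid_response"):
        raise ValueError("Materials API returned an invalid response")
    return {entry["material_id"]: entry["energy"] for entry in payload["response"]}


def fake_fetch(url, api_key):
    """Offline stand-in for the network call, using a value from the sample output."""
    return {"valid_response": True,
            "response": [{"energy": -66.62512425, "material_id": 24972}]}


print(get_vasp_energies("Fe2O3", "MYKEY", fetch=fake_fetch))
```

Passing the transport in as a parameter keeps the science function testable without network access.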
  • 36. IPython Notebook Examples •  http://nbviewer.ipython.org/5199610 •  http://nbviewer.ipython.org/5022735
  • 37. Materials API + pymatgen example
      from pymatgen.matproj.rest import MPRester
      # The next two import paths are an assumption based on the pymatgen 2.x
      # layout of the time; newer releases have moved these classes.
      from pymatgen.phasediagram.pdmaker import PhaseDiagram
      from pymatgen.phasediagram.plotter import PDPlotter

      # This initializes the REST adaptor. Put your own API key in.
      a = MPRester("YOUR_API_KEY")

      # This gives you the Structure corresponding to material id 2254 in the Materials Project.
      structure = a.get_structure_by_material_id(2254)

      # Entries are the basic unit for thermodynamic and other analyses in pymatgen.
      # This gets all entries belonging to the Ca-C-O system.
      entries = a.get_entries_in_chemsys(['Ca', 'C', 'O'])

      # With entries, you can do many sophisticated analyses,
      # like creating phase diagrams.
      pd = PhaseDiagram(entries)
      plotter = PDPlotter(pd)
      plotter.show()
  • 38. Sandboxes •  A virtual private dataset •  Useful for •  Everyone as a sort of "scratch" space •  Industry partners who want to use the tools but not share their data
  • 39. Import format: Structure Notation Language (SNL) •  Contains structure/molecule object, and provenance: created_at, authors, projects, references, remarks, data, history •  Another way to remember the acronym..
  • 40. FireWorks •  FireWorks is a code for defining, managing, and executing scientific workflows •  It can be used to automate most types of calculations over arbitrary computing resources, including those that have a queueing system •  It is very dynamic: FireWorks can beget other FireWorks at runtime •  http://pythonhosted.org/FireWorks/
  • 41. Pymatgen-db •  Sick of MongoHub et al.? We were. So we wrote a simple Web UI using prettytable, pymatgen, and Django •  https://github.com/materialsproject/pymatgen-db •  Which we proceeded to use for deep scientific inquiry
  • 42. We’re not the only ones … •  Bioinformatics •  KBase (http://kbase.us) - DOE predictive and systems biology. •  Astronomy •  Sloan Digital Sky Survey (http://skyserver.sdss.org) •  Spectroscopy •  Advanced Light Source (ALS), Advanced Photon Source (APS) •  According to ProgrammableWeb, ~130 others http://www.programmableweb.com/apis/directory/1?apicat=Science&protocol=REST ..though probably many of these are
  • 43. More information •  Materials API + pymatgen examples •  https://gist.github.com/gists/search?q=materials+api+pymatgen •  The Materials API wiki •  https://materialsproject.org/wiki/index.php/The_Materials_API •  Python Materials Genomics •  http://packages.python.org/pymatgen/ •  Shyue Ping Ong, William Davidson Richards, Anubhav Jain, Geoffroy Hautier, Michael Kocher, Shreyas Cholia, Dan Gunter, Vincent Chevrier, Kristin A. Persson, Gerbrand Ceder. Python Materials Genomics (pymatgen): A Robust, Open-Source Python Library for Materials Analysis. (submitted) •  These slides: •  https://speakerdeck.com/shreddd/data-as-a-service-pydata-2013
  • 44. Takeaways •  Make scientific data easily available to end-users •  Friendly, powerful Web UI is a great way to engage, but then.. •  Build APIs around your data to make it easily accessible •  Write scientific libraries with *both* analysis and data, by hooking them up to APIs.
  • 45. We’re hiring •  Talented, science-loving, web-savvy, math-anything Python programming code-slingers who would rather pass a Nobel prize winner on the way to lunch than get free dry-cleaning •  downside: or even free coffee (groan) •  upside: some of your tax dollars go towards your own salary! •  http://jobs.materialsproject.org/
  • 46. Contact Us •  Shreyas Cholia – scholia@lbl.gov •  Dan Gunter – dkgunter@lbl.gov •  Materials Project Team – [email protected]