Scalable Machine Learning
in R and Python with H2O
Erin LeDell, Ph.D.
Machine Learning Scientist
H2O.ai
Berkeley, CA April 2017
East Bay AI and DL Meetup
Introduction
• Statistician & Machine Learning Scientist at H2O.ai, in Mountain View, California, USA
• Ph.D. in Biostatistics with Designated Emphasis in Computational Science and Engineering from UC Berkeley (focus on Machine Learning)
• Worked as a data scientist at several startups
Agenda
• Who/What is H2O?
• H2O Machine Learning Platform
• H2O in R and Python
• H2O Tutorials: Intro, Grid, DL, Stacking
• New/active developments in H2O
H2O.ai
H2O.ai, the Company
• Founded in 2012
• Stanford & Purdue Math & Systems Engineers
• Headquarters: Mountain View, California, USA
H2O, the Platform
• Open Source Software (Apache 2.0 Licensed)
• R, Python, Scala, Java and Web Interfaces
• Distributed Algorithms that Scale to Big Data
Scientific Advisory Council
Dr. Trevor Hastie
• John A. Overdeck Professor of Mathematics, Stanford University
• PhD in Statistics, Stanford University
• Co-author, The Elements of Statistical Learning: Data Mining, Inference, and Prediction
• Co-author with John Chambers, Statistical Models in S
• Co-author, Generalized Additive Models
Dr. Robert Tibshirani
• Professor of Statistics and Health Research and Policy, Stanford University
• PhD in Statistics, Stanford University
• Co-author, The Elements of Statistical Learning: Data Mining, Inference, and Prediction
• Author, Regression Shrinkage and Selection via the Lasso
• Co-author, An Introduction to the Bootstrap
Dr. Stephen Boyd
• Professor of Electrical Engineering and Computer Science, Stanford University
• PhD in Electrical Engineering and Computer Science, UC Berkeley
• Co-author, Distributed Optimization and Statistical Learning via the Alternating Direction Method of Multipliers
• Co-author, Linear Matrix Inequalities in System and Control Theory
• Co-author, Convex Optimization
H2O Platform
H2O Platform Overview
• Distributed implementations of cutting-edge ML algorithms.
• Core algorithms written in high-performance Java.
• APIs available in R, Python, Scala, REST/JSON.
• Interactive Web GUI called H2O Flow.
• Easily deploy models to production with H2O Steam.
H2O Platform Overview
• Write code in a high-level language like R (or use the web GUI) and output production-ready models in Java.
• To scale, just add nodes to your H2O cluster.
• Works with Hadoop, Spark and your laptop.
H2O Distributed Computing
H2O Cluster
• Multi-node cluster with shared memory model.
• All computations in memory.
• Each node sees only some rows of the data.
• No limit on cluster size.
H2O Frame
• Distributed data frames (collection of vectors).
• Columns are distributed (across nodes) arrays.
• Works just like R’s data.frame or a Python pandas DataFrame.
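As a rough illustration of the row-partitioned layout described above, the following pure-Python sketch splits each column into per-node chunks and computes a column mean from per-node partial results. The `ToyFrame` class and its methods are hypothetical names used only for illustration. They are not part of the H2O API, and H2O's own implementation is written in Java.

```python
from typing import Dict, List


class ToyFrame:
    """Toy stand-in for a distributed frame: each column is split by rows across nodes."""

    def __init__(self, columns: Dict[str, List[float]], n_nodes: int) -> None:
        self.n_nodes = n_nodes
        # chunks[name][node] holds the rows of column `name` stored on that node.
        self.chunks: Dict[str, List[List[float]]] = {
            name: self._split(values, n_nodes) for name, values in columns.items()
        }

    @staticmethod
    def _split(values: List[float], n_nodes: int) -> List[List[float]]:
        # Contiguous row ranges, so every node sees only some rows.
        size = -(-len(values) // n_nodes)  # ceiling division
        return [values[i * size:(i + 1) * size] for i in range(n_nodes)]

    def mean(self, name: str) -> float:
        # "Map": each node reduces its own rows to a (sum, count) pair.
        partials = [(sum(chunk), len(chunk)) for chunk in self.chunks[name]]
        # "Reduce": combine the small per-node results into one answer.
        total = sum(s for s, _ in partials)
        count = sum(c for _, c in partials)
        return total / count


frame = ToyFrame(
    {"age": [20.0, 30.0, 40.0, 50.0, 60.0], "score": [1.0, 2.0, 3.0, 4.0, 5.0]},
    n_nodes=2,
)
print(frame.chunks["age"])
print(frame.mean("age"))
```

Only the small `(sum, count)` pairs cross node boundaries in this sketch, which is the property that lets a computation of this kind scale by adding nodes.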
Current Algorithm Overview
Statistical Analysis
• Linear Models (GLM)
• Naïve Bayes
Ensembles
• Random Forest
• Distributed Trees
• Gradient Boosting Machine
• Stacked Ensembles
Deep Neural Networks
• Multi-layer Feed-Forward Neural Network
• Auto-encoder
• Anomaly Detection
• Deep Features
Clustering
• K-Means
Dimension Reduction
• Principal Component Analysis
• Generalized Low Rank Models
Solvers & Optimization
• Generalized ADMM Solver
• L-BFGS (Quasi Newton Method)
• Ordinary Least-Squares Solver
• Stochastic Gradient Descent
Data Munging
• Scalable Data Frames
• Sort, Slice, Log Transform
H2O in R & Python
h2o R Package
Installation
• Java 7 or later; R 3.1 and above; Linux, Mac, Windows
• The easiest way to install the h2o R package is CRAN.
• Latest version: http://www.h2o.ai/download/h2o/r
Design
All computations are performed in highly optimized
Java code in the H2O cluster, initiated by REST calls
from R.
h2o Python Module
Installation
• Java 7 or later; Python 2.7, 3.5; Linux, Mac, Windows
• The easiest way to install the h2o Python module is PyPI.
• Latest version: http://www.h2o.ai/download/h2o/python
Design
All computations are performed in highly optimized
Java code in the H2O cluster, initiated by REST calls
from Python.
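To make the "initiated by REST calls" design concrete, here is a minimal sketch that builds an HTTP request of the kind a client could send to a running H2O node. The address `localhost:54321` and the `/3/Cloud` cluster-status route are assumptions about a default local H2O-3 setup, and both helper function names are hypothetical. In normal use the h2o R package and Python module issue these calls for you.

```python
import json
import urllib.request

# Assumption: an H2O node listening on the default local address and port.
BASE_URL = "http://localhost:54321"


def build_cloud_status_request(base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) a GET request for the assumed cluster-status route."""
    return urllib.request.Request(f"{base_url}/3/Cloud", method="GET")


def fetch_json(request: urllib.request.Request, timeout: float = 5.0) -> dict:
    """Send the request and decode the JSON body; requires a running cluster."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


request = build_cloud_status_request()
print(request.get_method(), request.full_url)
```

The client only describes what to compute. The work itself happens inside the cluster, which is why the R and Python interfaces can stay thin.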
H2O R & Python
Tutorials
https://github.com/h2oai/h2o-tutorials
R & Py Tutorial: Intro to H2O Algorithms
The “Intro to H2O” tutorial introduces five popular supervised machine learning algorithms in the context of a binary classification problem.
The training module demonstrates how to train models and evaluate model performance on a test set.
• Generalized Linear Model (GLM)
• Random Forest (RF)
• Gradient Boosting Machine (GBM)
• Deep Learning (DL)
• Naive Bayes (NB)
R & Py Tutorial: Grid Search for Model Selection
The second training module demonstrates how to find the best set of model parameters for each model using Grid Search.
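The idea behind an exhaustive (Cartesian) grid search can be sketched in a few lines of plain Python. This is not the H2O grid search API: `grid_search` and `toy_validation_score` are hypothetical names, the parameter names are only labels, and the scoring function stands in for "train a model with these parameters, then score it on validation data".

```python
import itertools
from typing import Callable, Dict, List, Tuple

Params = Dict[str, float]


def grid_search(
    grid: Dict[str, List[float]], score: Callable[[Params], float]
) -> Tuple[Params, float]:
    """Evaluate every combination in `grid` and return the best (params, score)."""
    names = sorted(grid)
    best_params: Params = {}
    best_score = float("-inf")
    for values in itertools.product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        current = score(params)
        if current > best_score:  # higher is better in this sketch
            best_params, best_score = params, current
    return best_params, best_score


def toy_validation_score(params: Params) -> float:
    # Hypothetical stand-in for training a model and scoring it on held-out data.
    return -((params["max_depth"] - 5) ** 2) - 100 * (params["learn_rate"] - 0.1) ** 2


grid = {"max_depth": [3, 5, 7], "learn_rate": [0.01, 0.1, 0.3]}
best_params, best_score = grid_search(grid, toy_validation_score)
print(best_params, best_score)
```

The cost grows with the product of the list lengths (nine models here), which is why larger search spaces are often sampled at random instead.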
R Tutorial: Deep Learning
The “Deep Learning in R” tutorial gives an overview of how to train H2O deep neural networks in R.
• Deep Learning via Multilayer Perceptrons (MLPs)
• Early Stopping
• Random Grid Search
• Deep Learning Autoencoders
• Unsupervised Pretraining
• Deep Features
• Anomaly Detection
Tutorial: Stacked Ensembles
There are two H2O Ensemble tutorials: one for the new Stacked Ensemble method in h2o and one for the h2oEnsemble R package, which extends the h2o R API.
H2O Resources
• Documentation: http://docs.h2o.ai
• Online Training: http://learn.h2o.ai
• Tutorials: https://github.com/h2oai/h2o-tutorials
• Slidedecks: https://github.com/h2oai/h2o-meetups
• Video Presentations: https://www.youtube.com/user/0xdata
• Events & Meetups: http://h2o.ai/events
H2O Booklets
http://docs.h2o.ai/
Thank you!
@ledell on GitHub, Twitter
erin@h2o.ai
http://www.stat.berkeley.edu/~ledell
Appendix: New & Active Developments in H2O
Deep Water
https://github.com/h2oai/deepwater
• Native implementation of Deep Learning models for GPU-optimized backends (mxnet, Caffe, TensorFlow, etc.)
• State-of-the-art Deep Learning models trained from the H2O Platform
• Provides an easy-to-use interface to any of the Deep Water backends.
• Extends the H2O platform to include Convolutional Neural Nets (CNNs) and Recurrent Neural Nets (RNNs) including LSTMs
rsparkling
https://github.com/h2oai/rsparkling
• This provides an interface to H2O's machine learning algorithms on Spark, using R.
• This is an extension package for RStudio’s sparklyr package that creates an R front-end for a Spark package (e.g. Sparkling Water).
• This package implements only the most basic functionality (creating an H2OContext, showing the H2O Flow interface, and converting a Spark DataFrame to an H2OFrame or vice versa).
H2O AutoML
Preview: https://github.com/h2oai/h2o-3/tree/automl
• AutoML stands for “Automatic Machine Learning”
• The idea here is to remove most (or all) of the parameters from the algorithm, as well as automatically generate derived features that will aid in learning.
• Single algorithms are tuned automatically using a carefully constructed random grid search (future: Bayesian Optimization algorithms).
• Optionally, a Stacked Ensemble can be constructed.
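A random grid search of the kind mentioned above can be sketched as follows. This is a plain uniform random search under a fixed model budget, not AutoML's actual search strategy, and every name in it (`random_grid_search`, `toy_validation_score`, `max_models`, the parameter labels) is a hypothetical choice made for illustration.

```python
import random
from typing import Callable, Dict, List, Tuple

Params = Dict[str, float]


def random_grid_search(
    grid: Dict[str, List[float]],
    score: Callable[[Params], float],
    max_models: int,
    seed: int,
) -> Tuple[Params, float, int]:
    """Score up to `max_models` distinct random combinations drawn from `grid`."""
    rng = random.Random(seed)
    names = sorted(grid)
    # Drop duplicate values so the number of distinct combinations is exact.
    choices = {name: list(dict.fromkeys(grid[name])) for name in names}
    n_combos = 1
    for name in names:
        n_combos *= len(choices[name])
    budget = min(max_models, n_combos)

    seen = set()
    best_params: Params = {}
    best_score = float("-inf")
    # Rejection sampling: fine for small grids, wasteful when the budget
    # approaches the size of a very large grid.
    while len(seen) < budget:
        combo = tuple(rng.choice(choices[name]) for name in names)
        if combo in seen:
            continue
        seen.add(combo)
        params = dict(zip(names, combo))
        current = score(params)
        if current > best_score:  # higher is better in this sketch
            best_params, best_score = params, current
    return best_params, best_score, len(seen)


def toy_validation_score(params: Params) -> float:
    # Hypothetical stand-in for training a model and scoring it on held-out data.
    return -((params["max_depth"] - 5) ** 2) - 100 * (params["learn_rate"] - 0.1) ** 2


grid = {"max_depth": [3, 5, 7], "learn_rate": [0.01, 0.1, 0.3]}
best_params, best_score, n_evaluated = random_grid_search(
    grid, toy_validation_score, max_models=4, seed=42
)
print(best_params, best_score, n_evaluated)
```

With a budget of four, only four of the nine combinations are scored, so the result is the best of a sample and not necessarily the best in the grid.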

Takahiro Harada
 
Agentic AI Explained: The Next Frontier of Autonomous Intelligence & Generati...
Agentic AI Explained: The Next Frontier of Autonomous Intelligence & Generati...Agentic AI Explained: The Next Frontier of Autonomous Intelligence & Generati...
Agentic AI Explained: The Next Frontier of Autonomous Intelligence & Generati...
Aaryan Kansari
 
Contributing to WordPress With & Without Code.pptx
Contributing to WordPress With & Without Code.pptxContributing to WordPress With & Without Code.pptx
Contributing to WordPress With & Without Code.pptx
Patrick Lumumba
 
Cyber security cyber security cyber security cyber security cyber security cy...
Cyber security cyber security cyber security cyber security cyber security cy...Cyber security cyber security cyber security cyber security cyber security cy...
Cyber security cyber security cyber security cyber security cyber security cy...
pranavbodhak
 
6th Power Grid Model Meetup - 21 May 2025
6th Power Grid Model Meetup - 21 May 20256th Power Grid Model Meetup - 21 May 2025
6th Power Grid Model Meetup - 21 May 2025
DanBrown980551
 
The case for on-premises AI
The case for on-premises AIThe case for on-premises AI
The case for on-premises AI
Principled Technologies
 
Create Your First AI Agent with UiPath Agent Builder
Create Your First AI Agent with UiPath Agent BuilderCreate Your First AI Agent with UiPath Agent Builder
Create Your First AI Agent with UiPath Agent Builder
DianaGray10
 
ECS25 - The adventures of a Microsoft 365 Platform Owner - Website.pptx
ECS25 - The adventures of a Microsoft 365 Platform Owner - Website.pptxECS25 - The adventures of a Microsoft 365 Platform Owner - Website.pptx
ECS25 - The adventures of a Microsoft 365 Platform Owner - Website.pptx
Jasper Oosterveld
 
Multistream in SIP and NoSIP @ OpenSIPS Summit 2025
Multistream in SIP and NoSIP @ OpenSIPS Summit 2025Multistream in SIP and NoSIP @ OpenSIPS Summit 2025
Multistream in SIP and NoSIP @ OpenSIPS Summit 2025
Lorenzo Miniero
 
Gihbli AI and Geo sitution |use/misuse of Ai Technology
Gihbli AI and Geo sitution |use/misuse of Ai TechnologyGihbli AI and Geo sitution |use/misuse of Ai Technology
Gihbli AI and Geo sitution |use/misuse of Ai Technology
zainkhurram1111
 
Co-Constructing Explanations for AI Systems using Provenance
Co-Constructing Explanations for AI Systems using ProvenanceCo-Constructing Explanations for AI Systems using Provenance
Co-Constructing Explanations for AI Systems using Provenance
Paul Groth
 
Offshore IT Support: Balancing In-House and Offshore Help Desk Technicians
Offshore IT Support: Balancing In-House and Offshore Help Desk TechniciansOffshore IT Support: Balancing In-House and Offshore Help Desk Technicians
Offshore IT Support: Balancing In-House and Offshore Help Desk Technicians
john823664
 
Grannie’s Journey to Using Healthcare AI Experiences
Grannie’s Journey to Using Healthcare AI ExperiencesGrannie’s Journey to Using Healthcare AI Experiences
Grannie’s Journey to Using Healthcare AI Experiences
Lauren Parr
 
Dev Dives: System-to-system integration with UiPath API Workflows
Dev Dives: System-to-system integration with UiPath API WorkflowsDev Dives: System-to-system integration with UiPath API Workflows
Dev Dives: System-to-system integration with UiPath API Workflows
UiPathCommunity
 
Droidal: AI Agents Revolutionizing Healthcare
Droidal: AI Agents Revolutionizing HealthcareDroidal: AI Agents Revolutionizing Healthcare
Droidal: AI Agents Revolutionizing Healthcare
Droidal LLC
 
Agentic AI - The New Era of Intelligence
Agentic AI - The New Era of IntelligenceAgentic AI - The New Era of Intelligence
Agentic AI - The New Era of Intelligence
Muzammil Shah
 
Supercharge Your AI Development with Local LLMs
Supercharge Your AI Development with Local LLMsSupercharge Your AI Development with Local LLMs
Supercharge Your AI Development with Local LLMs
Francesco Corti
 
GDG Cloud Southlake #43: Tommy Todd: The Quantum Apocalypse: A Looming Threat...
GDG Cloud Southlake #43: Tommy Todd: The Quantum Apocalypse: A Looming Threat...GDG Cloud Southlake #43: Tommy Todd: The Quantum Apocalypse: A Looming Threat...
GDG Cloud Southlake #43: Tommy Todd: The Quantum Apocalypse: A Looming Threat...
James Anderson
 
End-to-end Assurance for SD-WAN & SASE with ThousandEyes
End-to-end Assurance for SD-WAN & SASE with ThousandEyesEnd-to-end Assurance for SD-WAN & SASE with ThousandEyes
End-to-end Assurance for SD-WAN & SASE with ThousandEyes
ThousandEyes
 
Data Virtualization: Bringing the Power of FME to Any Application
Data Virtualization: Bringing the Power of FME to Any ApplicationData Virtualization: Bringing the Power of FME to Any Application
Data Virtualization: Bringing the Power of FME to Any Application
Safe Software
 
LSNIF: Locally-Subdivided Neural Intersection Function
LSNIF: Locally-Subdivided Neural Intersection FunctionLSNIF: Locally-Subdivided Neural Intersection Function
LSNIF: Locally-Subdivided Neural Intersection Function
Takahiro Harada
 
Agentic AI Explained: The Next Frontier of Autonomous Intelligence & Generati...
Agentic AI Explained: The Next Frontier of Autonomous Intelligence & Generati...Agentic AI Explained: The Next Frontier of Autonomous Intelligence & Generati...
Agentic AI Explained: The Next Frontier of Autonomous Intelligence & Generati...
Aaryan Kansari
 
Contributing to WordPress With & Without Code.pptx
Contributing to WordPress With & Without Code.pptxContributing to WordPress With & Without Code.pptx
Contributing to WordPress With & Without Code.pptx
Patrick Lumumba
 
Cyber security cyber security cyber security cyber security cyber security cy...
Cyber security cyber security cyber security cyber security cyber security cy...Cyber security cyber security cyber security cyber security cyber security cy...
Cyber security cyber security cyber security cyber security cyber security cy...
pranavbodhak
 
6th Power Grid Model Meetup - 21 May 2025
6th Power Grid Model Meetup - 21 May 20256th Power Grid Model Meetup - 21 May 2025
6th Power Grid Model Meetup - 21 May 2025
DanBrown980551
 
Create Your First AI Agent with UiPath Agent Builder
Create Your First AI Agent with UiPath Agent BuilderCreate Your First AI Agent with UiPath Agent Builder
Create Your First AI Agent with UiPath Agent Builder
DianaGray10
 
ECS25 - The adventures of a Microsoft 365 Platform Owner - Website.pptx
ECS25 - The adventures of a Microsoft 365 Platform Owner - Website.pptxECS25 - The adventures of a Microsoft 365 Platform Owner - Website.pptx
ECS25 - The adventures of a Microsoft 365 Platform Owner - Website.pptx
Jasper Oosterveld
 
Multistream in SIP and NoSIP @ OpenSIPS Summit 2025
Multistream in SIP and NoSIP @ OpenSIPS Summit 2025Multistream in SIP and NoSIP @ OpenSIPS Summit 2025
Multistream in SIP and NoSIP @ OpenSIPS Summit 2025
Lorenzo Miniero
 
Gihbli AI and Geo sitution |use/misuse of Ai Technology
Gihbli AI and Geo sitution |use/misuse of Ai TechnologyGihbli AI and Geo sitution |use/misuse of Ai Technology
Gihbli AI and Geo sitution |use/misuse of Ai Technology
zainkhurram1111
 
Co-Constructing Explanations for AI Systems using Provenance
Co-Constructing Explanations for AI Systems using ProvenanceCo-Constructing Explanations for AI Systems using Provenance
Co-Constructing Explanations for AI Systems using Provenance
Paul Groth
 
Offshore IT Support: Balancing In-House and Offshore Help Desk Technicians
Offshore IT Support: Balancing In-House and Offshore Help Desk TechniciansOffshore IT Support: Balancing In-House and Offshore Help Desk Technicians
Offshore IT Support: Balancing In-House and Offshore Help Desk Technicians
john823664
 
Grannie’s Journey to Using Healthcare AI Experiences
Grannie’s Journey to Using Healthcare AI ExperiencesGrannie’s Journey to Using Healthcare AI Experiences
Grannie’s Journey to Using Healthcare AI Experiences
Lauren Parr
 

Scalable Machine Learning in R and Python with H2O

  • 1. Scalable Machine Learning in R and Python with H2O
    Erin LeDell Ph.D., Machine Learning Scientist, H2O.ai
    Berkeley, CA, April 2017
    East Bay AI and DL Meetup
  • 2. Introduction
    • Statistician & Machine Learning Scientist at H2O.ai, in Mountain View, California, USA
    • Ph.D. in Biostatistics with Designated Emphasis in Computational Science and Engineering from UC Berkeley (focus on Machine Learning)
    • Worked as a data scientist at several startups
  • 3. Agenda
    • Who/What is H2O?
    • H2O Machine Learning Platform
    • H2O in R and Python
    • H2O Tutorials: Intro, Grid, DL, Stacking
    • New/active developments in H2O
  • 4. H2O.ai
    H2O.ai, the Company
    • Founded in 2012
    • Stanford & Purdue Math & Systems Engineers
    • Headquarters: Mountain View, California, USA
    H2O, the Platform
    • Open Source Software (Apache 2.0 Licensed)
    • R, Python, Scala, Java and Web Interfaces
    • Distributed Algorithms that Scale to Big Data
  • 5. Scientific Advisory Council
    Dr. Trevor Hastie
    • John A. Overdeck Professor of Mathematics, Stanford University
    • PhD in Statistics, Stanford University
    • Co-author, The Elements of Statistical Learning: Prediction, Inference and Data Mining
    • Co-author with John Chambers, Statistical Models in S
    • Co-author, Generalized Additive Models
    Dr. Robert Tibshirani
    • Professor of Statistics and Health Research and Policy, Stanford University
    • PhD in Statistics, Stanford University
    • Co-author, The Elements of Statistical Learning: Prediction, Inference and Data Mining
    • Author, Regression Shrinkage and Selection via the Lasso
    • Co-author, An Introduction to the Bootstrap
    Dr. Stephen Boyd
    • Professor of Electrical Engineering and Computer Science, Stanford University
    • PhD in Electrical Engineering and Computer Science, UC Berkeley
    • Co-author, Distributed Optimization and Statistical Learning via the Alternating Direction Method of Multipliers
    • Co-author, Linear Matrix Inequalities in System and Control Theory
    • Co-author, Convex Optimization
  • 7. H2O Platform Overview
    • Distributed implementations of cutting edge ML algorithms.
    • Core algorithms written in high performance Java.
    • APIs available in R, Python, Scala, REST/JSON.
    • Interactive Web GUI called H2O Flow.
    • Easily deploy models to production with H2O Steam.
  • 8. H2O Platform Overview
    • Write code in high-level language like R (or use the web GUI) and output production-ready models in Java.
    • To scale, just add nodes to your H2O cluster.
    • Works with Hadoop, Spark and your laptop.
  • 9. H2O Distributed Computing
    H2O Cluster
    • Multi-node cluster with shared memory model.
    • All computations in memory.
    • Each node sees only some rows of the data.
    • No limit on cluster size.
    H2O Frame
    • Distributed data frames (collection of vectors).
    • Columns are distributed (across nodes) arrays.
    • Works just like R’s data.frame or Python Pandas DataFrame
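The row-wise split described on this slide can be illustrated with a small standard-library sketch. This is not H2O's implementation: the `Chunk` class and the helper names `partition_rows`, `local_stats` and `distributed_mean` are hypothetical, and the "nodes" are simply list entries in one process. It only shows the idea that each node holds some rows of every column and that a global statistic is assembled from per-node partial results.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Chunk:
    """Rows of the frame held by one (hypothetical) node, stored column-wise."""
    columns: Dict[str, List[float]]


def partition_rows(columns: Dict[str, List[float]], n_nodes: int) -> List[Chunk]:
    """Split every column at the same row boundaries, one chunk per node."""
    n_rows = len(next(iter(columns.values())))
    size = -(-n_rows // n_nodes)  # ceiling division: rows per node
    return [
        Chunk({name: values[start:start + size] for name, values in columns.items()})
        for start in range(0, n_rows, size)
    ]


def local_stats(chunk: Chunk, name: str) -> Tuple[float, int]:
    """Map step: each node computes a partial sum and count on its own rows."""
    values = chunk.columns[name]
    return sum(values), len(values)


def distributed_mean(chunks: List[Chunk], name: str) -> float:
    """Reduce step: combine the per-node partial results into a global mean."""
    partials = [local_stats(chunk, name) for chunk in chunks]
    total = sum(s for s, _ in partials)
    count = sum(n for _, n in partials)
    return total / count


if __name__ == "__main__":
    frame = {"x": [1.0, 2.0, 3.0, 4.0, 5.0], "y": [10.0, 20.0, 30.0, 40.0, 50.0]}
    chunks = partition_rows(frame, n_nodes=2)
    print(len(chunks), distributed_mean(chunks, "x"))
```

No node ever needs the full column here, which is the property that lets this style of computation scale by adding nodes.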
  • 10. Current Algorithm Overview
    Statistical Analysis
    • Linear Models (GLM)
    • Naïve Bayes
    Ensembles
    • Random Forest
    • Distributed Trees
    • Gradient Boosting Machine
    • Stacked Ensembles
    Deep Neural Networks
    • Multi-layer Feed-Forward Neural Network
    • Auto-encoder
    • Anomaly Detection
    • Deep Features
    Clustering
    • K-Means
    Dimension Reduction
    • Principal Component Analysis
    • Generalized Low Rank Models
    Solvers & Optimization
    • Generalized ADMM Solver
    • L-BFGS (Quasi Newton Method)
    • Ordinary Least-Square Solver
    • Stochastic Gradient Descent
    Data Munging
    • Scalable Data Frames
    • Sort, Slice, Log Transform
  • 11. H2O in R & Python
  • 12. h2o R Package
    Installation
    • Java 7 or later; R 3.1 and above; Linux, Mac, Windows
    • The easiest way to install the h2o R package is CRAN.
    • Latest version: http://www.h2o.ai/download/h2o/r
    Design
    All computations are performed in highly optimized Java code in the H2O cluster, initiated by REST calls from R.
  • 13. h2o Python Module
    Installation
    • Java 7 or later; Python 2.7, 3.5; Linux, Mac, Windows
    • The easiest way to install the h2o Python module is PyPI.
    • Latest version: http://www.h2o.ai/download/h2o/python
    Design
    All computations are performed in highly optimized Java code in the H2O cluster, initiated by REST calls from Python.
  • 14. H2O R & Python Tutorials
    https://github.com/h2oai/h2o-tutorials
  • 15. R & Py Tutorial: Intro to H2O Algorithms
    The “Intro to H2O” tutorial introduces five popular supervised machine learning algorithms in the context of a binary classification problem. The training module demonstrates how to train models and evaluate model performance on a test set.
    • Generalized Linear Model (GLM)
    • Random Forest (RF)
    • Gradient Boosting Machine (GBM)
    • Deep Learning (DL)
    • Naive Bayes (NB)
  • 16. R & Py Tutorial: Grid Search for Model Selection The second training module demonstrates how to find the best set of model parameters for each model using Grid Search.
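As a rough illustration of what a grid search does, the sketch below enumerates every combination of a small hyperparameter grid and ranks the combinations by a score. It uses only the Python standard library and is not the H2O grid search API: `cartesian_grid`, `grid_search` and `toy_score` are hypothetical names, the parameter names are illustrative, and `toy_score` stands in for training a model and measuring a validation metric.

```python
from itertools import product
from typing import Callable, Dict, Iterator, List, Tuple

Params = Dict[str, float]


def cartesian_grid(hyper_params: Dict[str, List[float]]) -> Iterator[Params]:
    """Yield every combination of the listed hyperparameter values."""
    names = list(hyper_params)
    for values in product(*(hyper_params[name] for name in names)):
        yield dict(zip(names, values))


def grid_search(hyper_params: Dict[str, List[float]],
                score_fn: Callable[[Params], float]) -> List[Tuple[float, Params]]:
    """Score each combination and return (score, params) pairs, best first."""
    results = [(score_fn(params), params) for params in cartesian_grid(hyper_params)]
    results.sort(key=lambda pair: pair[0], reverse=True)
    return results


def toy_score(params: Params) -> float:
    """Hypothetical stand-in for a validation metric (higher is better)."""
    return -abs(params["max_depth"] - 5) - 10 * abs(params["learn_rate"] - 0.1)


if __name__ == "__main__":
    grid = {"max_depth": [3, 5, 7], "learn_rate": [0.01, 0.1]}
    for score, params in grid_search(grid, toy_score):
        print(round(score, 3), params)
```

The number of models grows as the product of the list lengths, which is why exhaustive grids are usually kept small.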
  • 17. R Tutorial: Deep Learning
    The “Deep Learning in R” tutorial gives an overview of how to train H2O deep neural networks in R.
    • Deep Learning via Multilayer Perceptrons (MLPs)
    • Early Stopping
    • Random Grid Search
    • Deep Learning Autoencoders
    • Unsupervised Pretraining
    • Deep Features
    • Anomaly Detection
  • 18. Tutorial: Stacked Ensembles There are two H2O Ensemble tutorials: One for the new Stacked Ensemble method in h2o and one for the h2oEnsemble R package which extends the h2o R API.
  • 19. H2O Resources
    • Documentation: http://docs.h2o.ai
    • Online Training: http://learn.h2o.ai
    • Tutorials: https://github.com/h2oai/h2o-tutorials
    • Slidedecks: https://github.com/h2oai/h2o-meetups
    • Video Presentations: https://www.youtube.com/user/0xdata
    • Events & Meetups: http://h2o.ai/events
  • 21. Thank you!
    @ledell on GitHub, Twitter
    [email protected]
    http://www.stat.berkeley.edu/~ledell
  • 22. Appendix: New & Active Developments in H2O
  • 23. Deep Water
    https://github.com/h2oai/deepwater
    • Native implementation of Deep Learning models for GPU-optimized backends (mxnet, Caffe, TensorFlow, etc.)
    • State-of-the-art Deep Learning models trained from the H2O Platform
    • Provides an easy to use interface to any of the Deep Water backends.
    • Extends the H2O platform to include Convolutional Neural Nets (CNNs) and Recurrent Neural Nets (RNNs) including LSTMs
  • 24. rsparkling
    https://github.com/h2oai/rsparkling
    • This provides an interface to H2O's machine learning algorithms on Spark, using R.
    • This is an extension package for RStudio’s sparklyr package that creates an R front-end for a Spark package (e.g. Sparkling Water).
    • This package implements only the most basic functionality (creating an H2OContext, showing the H2O Flow interface, and converting a Spark DataFrame to an H2OFrame or vice versa).
  • 25. H2O AutoML
    Preview: https://github.com/h2oai/h2o-3/tree/automl
    • AutoML stands for “Automatic Machine Learning”
    • The idea here is to remove most (or all) of the parameters from the algorithm, as well as automatically generate derived features that will aid in learning.
    • Single algorithms are tuned automatically using a carefully constructed random grid search (future: Bayesian Optimization algorithms).
    • Optionally, a Stacked Ensemble can be constructed.
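The random grid search mentioned on this slide can be sketched as sampling a fixed budget of distinct combinations from a hyperparameter space instead of trying them all. The code below is a standard-library illustration under that assumption only; it is not how H2O AutoML is implemented, and `random_grid_search`, `toy_score`, the `max_models` budget argument and the parameter names are hypothetical.

```python
import random
from itertools import product
from typing import Callable, Dict, List, Tuple

Params = Dict[str, float]


def random_grid_search(hyper_params: Dict[str, List[float]],
                       score_fn: Callable[[Params], float],
                       max_models: int,
                       seed: int = 0) -> List[Tuple[float, Params]]:
    """Score a random subset of the grid (no repeats), best first."""
    names = list(hyper_params)
    all_combos = [dict(zip(names, values))
                  for values in product(*(hyper_params[name] for name in names))]
    rng = random.Random(seed)  # seeded so the same run can be reproduced
    budget = min(max_models, len(all_combos))
    sampled = rng.sample(all_combos, budget)  # sampling without replacement
    scored = [(score_fn(params), params) for params in sampled]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored


def toy_score(params: Params) -> float:
    """Hypothetical stand-in for a validation metric (higher is better)."""
    return (-abs(params["max_depth"] - 5)
            - 10 * abs(params["learn_rate"] - 0.1)
            - abs(params["sample_rate"] - 0.8))


if __name__ == "__main__":
    space = {
        "max_depth": [3, 5, 7],
        "learn_rate": [0.01, 0.05, 0.1],
        "sample_rate": [0.6, 0.8],
    }
    for score, params in random_grid_search(space, toy_score, max_models=4, seed=42):
        print(round(score, 3), params)
```

With a budget smaller than the grid, the best sampled model may not be the best in the grid; the trade-off is a fixed, predictable training cost.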