BigML Summer 2016 Release
Introducing
Logistic Regression
BigML, Inc Summer Release Webinar - September 2016
Summer 2016 Release
Moderator and Speaker: Poul Petersen (CIO), Atakan Cetinsoy (VP Predictive Applications)
Questions: enter them into the chat box. We'll answer some via chat and others at the end of the session.
Resources: https://bigml.com/releases
Contact: info@bigml.com
Twitter: @bigmlcom
Logistic Regression
Logistic Regression
• Introduced by David Cox in 1958
• BigML API since 2015
• Now fully "BigML"
BigML Resources
• Data Exploration: SOURCE, DATASET, CORRELATION, STATISTICAL TEST
• Supervised Learning: MODEL, ENSEMBLE, LOGISTIC REGRESSION, EVALUATION
• Unsupervised Learning: CLUSTER, ANOMALY DETECTOR, ASSOCIATION DISCOVERY
• Scoring: PREDICTION, BATCH PREDICTION
• Automation: SCRIPT, LIBRARY, EXECUTION
Supervised Learning
(Figure: a table of instances, each row with feature columns and a label column.)
• Learn from instances
• Each instance has features
• And a known label
Classification: the label is categorical
• Will this customer churn?
• What item should I recommend?
• Does this patient have diabetes?
Regression: the label is numeric
• How many customers will churn?
• How much will they spend?
• What is your life expectancy?
Logistic Regression
• Classification implies a discrete objective. How can this be a regression?
• Why do we need another classification algorithm?
• More questions…
Logistic Regression is a classification algorithm
Linear Regression
Linear Regression
Polynomial Regression
Regression
• What function can we fit to discrete data?
Key Take-Away: Fitting a function to the data
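To make "fitting a function to the data" concrete, here is a minimal pure-Python sketch (illustrative only, not BigML code) of a closed-form least-squares line fit. On points that lie on y = 1 + 2x it recovers the line exactly; fitted to made-up 0/1 class labels, the same line predicts values below 0 and above 1, which is one way to see why discrete data calls for a different function.

```python
def fit_line(xs, ys):
    """Closed-form ordinary least squares fit of y = b0 + b1 * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    return b0, b1


# Points lying exactly on y = 1 + 2x: the fit recovers b0 = 1, b1 = 2
b0, b1 = fit_line([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
print(b0, b1)

# The same fit applied to made-up 0/1 class labels
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
labels = [0, 0, 0, 1, 1, 1]
c0, c1 = fit_line(xs, labels)
# The fitted line leaves [0, 1] at the edges, so it cannot be read as a probability
print(c0 + c1 * 0.0, c0 + c1 * 5.0)
```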
Discrete Data Function?
Discrete Data Function?
????
Logistic Function
$$f(x) = \frac{1}{1 + e^{-x}}$$
• x → −∞ : f(x) → 0
• x → ∞ : f(x) → 1
• Looks promising, but still not "discrete"
Probabilities
(Figure: regions of the logistic curve where P ≈ 0, where 0 < P < 1, and where P ≈ 1.)
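A minimal Python sketch of the logistic function, using only the standard library. It shows the slide's limits numerically: outputs stay between 0 and 1, which is what lets the curve be read as a probability. The sign-based branch is a common guard against overflow in `math.exp`, not anything specific to BigML.

```python
import math


def logistic(x: float) -> float:
    """Logistic function f(x) = 1 / (1 + e^(-x)); mathematically its range is (0, 1)."""
    # Branch on the sign so math.exp never receives a large positive argument
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


for x in (-20.0, -2.0, 0.0, 2.0, 20.0):
    print(f"f({x:+.0f}) = {logistic(x):.6f}")
```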
Logistic Regression
• Assumes that the log-odds of the output is linearly related to the "predictors" … but we can "fix" this with feature engineering
• How do we "fit" the logistic function to real data?
LR is a classification algorithm … that models the probability of the output class.
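Logistic Regression
$$P(x) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x)}}$$
β₀ is the "intercept"
β₁ is the "coefficient"
The inverse of the logistic function is called the "logit":
$$\ln\frac{P(x)}{1 - P(x)} = \beta_0 + \beta_1 x$$
In which case the model is linear in the log-odds, just like a linear regression.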
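The logit relationship can be checked numerically. In this pure-Python sketch, with made-up coefficients β₀ = -3 and β₁ = 1.5, applying the logit to P(x) returns the linear predictor β₀ + β₁x. Note that the logit is undefined at exactly 0 and 1, so observed 0/1 labels cannot be transformed directly; this is why logistic regression coefficients are usually fit by maximum likelihood rather than by a least-squares fit on the logit.

```python
import math


def logistic(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def logit(p: float) -> float:
    """Inverse of the logistic function: the log-odds ln(p / (1 - p))."""
    if not 0.0 < p < 1.0:
        raise ValueError("logit is only defined for 0 < p < 1")
    return math.log(p / (1.0 - p))


# Hypothetical coefficients, chosen only for illustration
beta0, beta1 = -3.0, 1.5

for x in (0.0, 1.0, 2.0, 4.0):
    p = logistic(beta0 + beta1 * x)
    # Taking the logit of P(x) recovers the linear predictor beta0 + beta1 * x
    print(f"x={x}: P={p:.4f}, logit(P)={logit(p):.4f}")
```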
Logistic Regression
If we have multiple dimensions, add more coefficients:
$$P(x) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_n x_n)}}$$
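One common way to fit the multi-dimensional model is gradient descent on the log-loss, which is equivalent to maximizing the likelihood. The sketch below illustrates that idea in pure Python; it is not BigML's solver, and the learning rate, epoch count, and toy data are arbitrary assumptions.

```python
import math


def predict_proba(beta, x):
    """P(y=1 | x), with beta[0] as the intercept and beta[1:] as the coefficients."""
    z = beta[0] + sum(b * xi for b, xi in zip(beta[1:], x))
    return 1.0 / (1.0 + math.exp(-z))


def fit(X, y, lr=0.1, epochs=5000):
    """Fit the coefficients by batch gradient descent on the mean log-loss."""
    n_features = len(X[0])
    beta = [0.0] * (n_features + 1)
    for _ in range(epochs):
        grad = [0.0] * (n_features + 1)
        for xi, yi in zip(X, y):
            err = predict_proba(beta, xi) - yi  # derivative of the log-loss w.r.t. z
            grad[0] += err
            for j, v in enumerate(xi):
                grad[j + 1] += err * v
        beta = [b - lr * g / len(X) for b, g in zip(beta, grad)]
    return beta


# Made-up data: the label is 1 when x1 + x2 is large
X = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [2.0, 2.0], [3.0, 2.0], [2.0, 3.0]]
y = [0, 0, 0, 1, 1, 1]
beta = fit(X, y)
print([round(predict_proba(beta, x), 3) for x in X])
```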
Logistic Regression Demo #1
LR Parameters
1. Bias: allows an intercept term. Important if P(x=0) ≠ 0.5 (without an intercept, the model always predicts P = 0.5 at x = 0).
2. Regularization:
• L1: prefers zeroing individual coefficients
• L2: prefers pushing all coefficients towards zero
3. EPS: the stopping tolerance; training stops when the change in error between steps falls below this value.
4. Auto-scaling: rescales the features to a common scale so that all of them contribute on an equal footing.
• Unless there is a specific need not to auto-scale, it is recommended.
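The parameters above can be illustrated with a small pure-Python sketch. The names `bias`, `l2`, `eps` and `standardize` are hypothetical stand-ins chosen for this example, not BigML API parameters. Only L2 regularization is shown (L1 needs a subgradient or proximal step and is left out), and auto-scaling is approximated by standardizing each column to mean 0 and standard deviation 1.

```python
import math


def standardize(X):
    """Auto-scaling sketch: rescale every column to mean 0 and standard deviation 1."""
    cols = list(zip(*X))
    means = [sum(c) / len(c) for c in cols]
    # Fall back to 1.0 for a constant column to avoid dividing by zero
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in X]


def _linear(beta, x):
    return beta[0] + sum(b * v for b, v in zip(beta[1:], x))


def _loss(beta, X, y, l2):
    """Mean log-loss plus an L2 penalty on the coefficients (not the intercept)."""
    total = 0.0
    for xi, yi in zip(X, y):
        p = 1.0 / (1.0 + math.exp(-_linear(beta, xi)))
        p = min(max(p, 1e-12), 1.0 - 1e-12)  # keep log() finite
        total -= yi * math.log(p) + (1 - yi) * math.log(1.0 - p)
    return total / len(X) + 0.5 * l2 * sum(b * b for b in beta[1:])


def fit(X, y, bias=True, l2=0.0, eps=1e-6, lr=0.5, max_iter=10000):
    """Gradient-descent LR with hypothetical knobs mirroring the slide.

    bias: learn an intercept beta[0]; if False it stays at 0.
    l2:   L2 regularization strength (pushes all coefficients towards zero).
    eps:  stop once the loss changes by less than eps between steps.
    """
    n, d = len(X), len(X[0])
    beta = [0.0] * (d + 1)
    prev = _loss(beta, X, y, l2)
    for _ in range(max_iter):
        grad = [0.0] * (d + 1)
        for xi, yi in zip(X, y):
            err = 1.0 / (1.0 + math.exp(-_linear(beta, xi))) - yi
            grad[0] += err / n
            for j, v in enumerate(xi):
                grad[j + 1] += err * v / n
        for j in range(1, d + 1):
            grad[j] += l2 * beta[j]
        if not bias:
            grad[0] = 0.0
        beta = [b - lr * g for b, g in zip(beta, grad)]
        cur = _loss(beta, X, y, l2)
        if abs(prev - cur) < eps:
            break
        prev = cur
    return beta


# Made-up data: the second feature is on a much larger scale than the first
X_raw = [[1, 1000], [2, 1500], [3, 1200], [4, 3000], [5, 2800], [6, 3500]]
y = [0, 0, 0, 1, 1, 1]
X = standardize(X_raw)

plain = fit(X, y)
ridge = fit(X, y, l2=1.0)
no_bias = fit(X, y, bias=False)
print("no regularization:", [round(b, 3) for b in plain])
print("L2 = 1.0:         ", [round(b, 3) for b in ridge])
print("bias disabled:    ", [round(b, 3) for b in no_bias])
```

On this made-up data the L2-regularized fit ends with much smaller coefficients than the unregularized one, and disabling the bias pins the intercept at 0.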
Logistic Regression
• How do we handle multiple classes?
• What about non-numeric inputs?
LR Multi-Class
• Instead of a binary class, e.g. [ true, false ], we have a multi-class problem, e.g. [ red, green, blue, … ]
• Consider “k” classes
• Solve “k” one-vs-rest LRs
• Result: coefficients βᵢ for each of the “k” classes
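A minimal one-vs-rest sketch in pure Python, assuming that a simple gradient-descent binary LR is good enough for illustration: it trains one model per class and predicts the class whose model reports the highest probability. The data, learning rate, and epoch count are made up.

```python
import math


def _prob(beta, x):
    z = beta[0] + sum(b * v for b, v in zip(beta[1:], x))
    return 1.0 / (1.0 + math.exp(-z))


def train_binary(X, y, lr=0.5, epochs=5000):
    """Plain gradient-descent logistic regression for 0/1 labels."""
    beta = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(beta)
        for xi, yi in zip(X, y):
            err = _prob(beta, xi) - yi
            grad[0] += err
            for j, v in enumerate(xi):
                grad[j + 1] += err * v
        beta = [b - lr * g / len(X) for b, g in zip(beta, grad)]
    return beta


def train_one_vs_rest(X, labels):
    """Fit k binary LRs: class c (label 1) versus every other class (label 0)."""
    return {c: train_binary(X, [1 if lab == c else 0 for lab in labels])
            for c in sorted(set(labels))}


def predict(models, x):
    """Return the class whose one-vs-rest model assigns the highest probability."""
    return max(models, key=lambda c: _prob(models[c], x))


# Made-up 2-D data: three small clusters, one per class
X = [[0.0, 0.0], [0.0, 0.2], [0.2, 0.0],   # red
     [1.0, 0.0], [0.8, 0.0], [1.0, 0.2],   # green
     [0.5, 1.0], [0.4, 0.8], [0.6, 0.8]]   # blue
labels = ["red"] * 3 + ["green"] * 3 + ["blue"] * 3

models = train_one_vs_rest(X, labels)
print({c: [round(b, 2) for b in beta] for c, beta in models.items()})
print(predict(models, [0.1, 0.1]), predict(models, [0.9, 0.1]), predict(models, [0.5, 0.9]))
```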
LR Field Codings
• LR expects numeric values to perform regression.
• How do we handle categorical values, or text?
One-hot encoding: only one feature is "hot" for each class

Class  color=red  color=blue  color=green  color=NULL
red        1          0           0            0
blue       0          1           0            0
green      0          0           1            0
NULL       0          0           0            1
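One-hot encoding is short enough to write directly. This Python sketch reproduces the table above; the function name is hypothetical, and `None` stands in for a missing (NULL) value.

```python
def one_hot(values, categories):
    """Return one 0/1 column per category; exactly one column is 'hot' per value."""
    return [[1 if v == c else 0 for c in categories] for v in values]


categories = ["red", "blue", "green", None]  # None stands in for NULL
rows = one_hot(["red", "blue", "green", None], categories)
for cat, row in zip(categories, rows):
    print(cat, row)
```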
LR Field Codings
Dummy Encoding
• Chooses a reference class (here *red*), which is encoded as all zeros
• Requires one fewer degree of freedom (3 columns instead of 4)

Class    color_1  color_2  color_3
*red*       0        0        0
blue        1        0        0
green       0        1        0
NULL        0        0        1
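Dummy encoding is one-hot encoding with the reference class's column dropped. This hypothetical Python helper reproduces the table above with red as the reference class (`None` again stands in for NULL).

```python
def dummy_encode(values, categories, reference):
    """Dummy coding: drop the reference category's column, so it encodes as all zeros."""
    kept = [c for c in categories if c != reference]
    return [[1 if v == c else 0 for c in kept] for v in values]


categories = ["red", "blue", "green", None]
rows = dummy_encode(categories, categories, reference="red")
for cat, row in zip(categories, rows):
    print(cat, row)
```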
LR Field Codings
Contrast Encoding
• Field values must sum to zero
• Allows comparison between classes

Class   field   influence
red      0.5    positive
blue    -0.25   negative
green   -0.25   negative
NULL     0      excluded

…so which one?
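Contrast encoding replaces each category with a single weight. This Python sketch uses the slide's weights; mapping a category with no listed weight to 0 is an assumption of the sketch, and the sum-to-zero check mirrors the rule on the slide.

```python
def contrast_encode(values, weights):
    """Map each category to its contrast weight; categories without a weight get 0."""
    if abs(sum(weights.values())) > 1e-9:
        raise ValueError("contrast weights must sum to zero")
    return [weights.get(v, 0.0) for v in values]


# Weights from the slide: red is contrasted against blue and green; NULL is excluded
weights = {"red": 0.5, "blue": -0.25, "green": -0.25, None: 0.0}
print(contrast_encode(["red", "green", None, "blue"], weights))
```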
LR Field Codings: Text / Items?
• The "text" type gives us new features that count the number of times each token occurs in the text field. "Items" can be treated the same way.

token        "hippo"  "safari"  "zebra"
instance_1      3        0         1
instance_2      0       11         4
instance_3      0        0         0
instance_4      1        0         3
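A bag-of-words sketch of the token counts in the table above. The lowercase, letters-only tokenizer is a simplifying assumption (real text analysis typically adds stemming, stop words, and so on), and the texts are invented so that their counts match instances 1, 3 and 4.

```python
import re
from collections import Counter


def token_counts(texts, vocabulary):
    """Bag-of-words sketch: one count column per vocabulary token."""
    rows = []
    for text in texts:
        counts = Counter(re.findall(r"[a-z]+", text.lower()))
        rows.append([counts[token] for token in vocabulary])
    return rows


vocabulary = ["hippo", "safari", "zebra"]
texts = [
    "Hippo, hippo, zebra and another hippo",
    "",
    "Zebra, zebra, zebra and one hippo",
]
rows = token_counts(texts, vocabulary)
print(rows)
```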
Logistic Regression Demo #2
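Curvilinear LR
Instead of
$$P(x) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x_1)}}$$
we could add a feature $x_2$:
$$P(x) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x_1 + \beta_2 x_2)}}$$
where $x_2$ = ????
It is possible to add any higher-order terms (such as $x_1^2$) or other functions to match the shape of the data.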
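The idea of adding a higher-order feature can be checked with made-up data where the class depends on the distance from zero. In this pure-Python sketch (illustrative only; the data and training settings are assumptions), a logistic regression on x alone cannot separate the classes, while adding x² as a second feature lets it classify every training point correctly.

```python
import math


def _prob(beta, x):
    z = beta[0] + sum(b * v for b, v in zip(beta[1:], x))
    return 1.0 / (1.0 + math.exp(-z))


def train(X, y, lr=0.5, epochs=5000):
    """Plain gradient-descent logistic regression for 0/1 labels."""
    beta = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(beta)
        for xi, yi in zip(X, y):
            err = _prob(beta, xi) - yi
            grad[0] += err
            for j, v in enumerate(xi):
                grad[j + 1] += err * v
        beta = [b - lr * g / len(X) for b, g in zip(beta, grad)]
    return beta


def accuracy(beta, X, y):
    hits = sum((_prob(beta, xi) >= 0.5) == (yi == 1) for xi, yi in zip(X, y))
    return hits / len(y)


# Made-up data: the class is 1 far from the origin and 0 near it,
# a "curved" pattern that no single threshold on x can capture
xs = [-1.5, -1.2, -1.0, -0.4, -0.2, 0.0, 0.2, 0.4, 1.0, 1.2, 1.5]
y = [1 if abs(x) >= 1.0 else 0 for x in xs]

X_linear = [[x] for x in xs]          # original feature only
X_curved = [[x, x * x] for x in xs]   # plus the engineered feature x^2

acc_linear = accuracy(train(X_linear, y), X_linear, y)
acc_curved = accuracy(train(X_curved, y), X_curved, y)
print(f"x only: {acc_linear:.2f}, x and x^2: {acc_curved:.2f}")
```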
Logistic Regression Demo #3
LR versus DT
Logistic Regression
• Expects a "smooth" linear relationship with the predictors.
• Concerned with the probability of a binary outcome.
• Lots of parameters to get wrong: regularization, scaling, codings.
• Slightly less prone to over-fitting.
• Because it fits a shape, it might work better when less data is available.
Decision Tree
• Adapts well to ragged, non-linear relationships.
• No concern: classification, regression, and multi-class are all fine.
• Virtually parameter-free.
• Slightly more prone to over-fitting.
• Prefers decision boundaries parallel to the feature axes, but given enough data will discover any shape.
DT Boundaries
(Figure: axis-parallel decision-tree boundaries. Splits: x <= 0.5, y > -0.29, x < -0.18; z = 1.)
Logistic Regression
BigML Education
• 78 BigML ambassadors, and growing every day…
BigML Education
• Many students from over 620 universities are learning with
the education program.
BigML Education
• Enjoy the BigML PRO subscription plan, worth $300 per month, free of charge for a full year.
• Promote BigML on your campus and spread the word.
• We help you organize Machine Learning events, workshops, meetups, etc., and provide you with learning material. We are open to new ideas.
• Get a BigML t-shirt and other merchandise.
• Be part of the BigML community!
Questions?
Twitter: @bigmlcom
Mail: info@bigml.com
Docs: https://bigml.com/releases
