Lecture 2.2 Example Data Preparation Feature Engineering
Simple definition: How machines learn rules from examples.
1. Determine question of interest, get informative data.
2. Split data into training and test sets.
3. Tune model parameters. Fit model to training set.
4. Use the tuned model to form predictions about your test set.
5. Compare the predictions with the actual values for the test set.
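The five steps above can be sketched end to end in a few lines. This is a minimal illustration, not the lecture's own code: the toy data (number of exclamation marks in a message, spam label) and the one-threshold "model" are assumptions chosen to keep the example small.

```python
# Step 1: question of interest and data.
# Hypothetical toy data: (exclamation marks in a message, 1 = spam, 0 = not spam).
data = [(0, 0), (1, 0), (2, 0), (3, 0), (5, 1), (6, 1), (7, 1), (9, 1),
        (1, 0), (8, 1), (2, 0), (6, 1)]

# Step 2: split into a training set and a test set.
train, test = data[:8], data[8:]

def predict(threshold, x):
    """Predict spam (1) when the count reaches the threshold."""
    return 1 if x >= threshold else 0

def accuracy(threshold, rows):
    """Share of rows whose prediction matches the known label."""
    hits = sum(predict(threshold, x) == y for x, y in rows)
    return hits / len(rows)

# Step 3: fit the model by picking the threshold that works best on the training set.
candidates = sorted({x for x, _ in train})
best_threshold = max(candidates, key=lambda t: accuracy(t, train))

# Step 4: use the fitted model to predict labels for the test set.
predictions = [predict(best_threshold, x) for x, _ in test]

# Step 5: compare predictions with the actual test labels.
test_accuracy = accuracy(best_threshold, test)
print(best_threshold, predictions, test_accuracy)
```

The test set is never used while choosing the threshold, so the final accuracy is an estimate of how the rule behaves on examples it has not seen.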
Example: A spam classifier learns rules from a training set of emails.
Goal:
• Use examples with known output values to learn patterns in the inputs.
• Predict the output value of new examples.
Linear regression
• Models output as linear combination of inputs
• Fast to train, effective on high-dimensional data.
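To make "linear combination of inputs" concrete, here is a small sketch. The function names and the four data points are illustrative assumptions; the fit uses the standard least-squares formulas for the one-input case.

```python
def predict(weights, bias, inputs):
    """Output = bias + w1*x1 + w2*x2 + ... (a linear combination of the inputs)."""
    return bias + sum(w * x for w, x in zip(weights, inputs))

def fit_one_input(xs, ys):
    """Least-squares slope and intercept for a single input variable."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical training data that follows y = 2x + 1 exactly.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

slope, intercept = fit_one_input(xs, ys)
prediction = predict([slope], intercept, [5.0])  # predict for a new input x = 5
print(slope, intercept, prediction)
```

With more than one input the prediction step stays the same; only the fitting step needs a more general solver.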
Neural networks
• Algorithms inspired by structure and function of the brain.
• Scalable, highly accurate on tasks like image recognition.
Building a model
Use Case & Data → Model training → Tune → Predict → Evaluate
Build labeled dataset for question of interest
Split training and test data
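A random split is one common way to do this. The sketch below is an assumption about how the split might be done, not the lecture's own procedure; the helper name `train_test_split`, the 20% test fraction, and the seed are all illustrative.

```python
import random

def train_test_split(rows, test_fraction=0.2, seed=0):
    """Shuffle a copy of the rows and return (train, test)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

# Hypothetical dataset: 100 examples identified by index.
examples = list(range(100))
train, test = train_test_split(examples, test_fraction=0.2, seed=42)
print(len(train), len(test))
```

Shuffling before splitting avoids a test set that only contains, say, the most recent examples; every example ends up in exactly one of the two sets.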
Build model on the training set
* Overfitting: a complex model that memorizes the training set (including the noise in it) but fails to generalize to new data.
Complexity vs. accuracy
We can build models of lower or higher complexity by changing their hyperparameters.
Aim for the ‘sweet spot’ that maximizes performance on unseen data but avoids overfitting.
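One way to see this trade-off is with a k-nearest-neighbours classifier, where the hyperparameter k controls complexity (smaller k means a more complex, more flexible model). This model and the toy data are assumptions for illustration; the true rule is "label 1 when x > 5.5", and two training labels are deliberately noisy.

```python
def knn_predict(train, x, k):
    """Majority vote among the k training points closest to x."""
    nearest = sorted(train, key=lambda row: abs(row[0] - x))[:k]
    votes = sum(label for _, label in nearest)
    return 1 if votes * 2 > k else 0

def accuracy(train, rows, k):
    """Share of rows that the k-NN model labels correctly."""
    hits = sum(knn_predict(train, x, k) == y for x, y in rows)
    return hits / len(rows)

# Hypothetical training set; the labels at x = 3 and x = 8 are noise.
train = [(1, 0), (2, 0), (3, 1), (4, 0), (5, 0),
         (6, 1), (7, 1), (8, 0), (9, 1), (10, 1)]
# Hypothetical test set that follows the true rule without noise.
test = [(2.9, 0), (3.2, 0), (4.6, 0), (5.4, 0),
        (6.4, 1), (7.8, 1), (8.1, 1), (9.6, 1)]

results = {}
for k in (1, 3):
    results[k] = (accuracy(train, train, k), accuracy(train, test, k))
    print(k, results[k])
```

On this data, k = 1 reproduces every training label, including the two noisy ones, yet gets only half of the test examples right; k = 3 misses the two noisy training labels but classifies every test example correctly.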
Make predictions
Image: Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks (2016)
EXAMPLE: Predicting mode of transport and music taste
Decision tree model for transport planning
Scenario: The World Bank has hired a cohort of 100 new staff, who start this
summer. GSD needs to decide how many bike racks or parking spaces to build for
them.
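As a rough sketch of how a decision tree could feed this decision: the tree below is written by hand, and its features (`commute_km`, `owns_car`), its 5 km threshold, and the five staff records are all hypothetical. A real tree would be learned from data on existing staff.

```python
from collections import Counter

def predict_transport(commute_km, owns_car):
    """Hypothetical hand-written tree: each 'if' is one split."""
    if commute_km <= 5:          # short commute: assume cycling
        return "bike"
    if owns_car:                 # longer commute with a car: assume driving
        return "car"
    return "public transit"      # longer commute without a car

# Hypothetical records for a few of the new staff.
new_staff = [
    {"commute_km": 2.0, "owns_car": False},
    {"commute_km": 4.5, "owns_car": True},
    {"commute_km": 12.0, "owns_car": True},
    {"commute_km": 8.0, "owns_car": False},
    {"commute_km": 30.0, "owns_car": True},
]

# Predict a mode for each person, then count how many spaces of each kind are needed.
counts = Counter(predict_transport(**person) for person in new_staff)
print(counts["bike"], counts["car"], counts["public transit"])
```

Counting the predicted modes across the whole cohort turns individual predictions into the planning numbers: bike racks for the "bike" group, parking spaces for the "car" group.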