Machine Learning (Unit I)

The document provides an overview of machine learning (ML), emphasizing its reliance on data to optimize performance and make predictions without explicit programming. It outlines the machine learning process, including problem exploration, data engineering, model engineering, and ML operations, highlighting the importance of quality data and iterative model training. Additionally, it touches on statistical concepts and decision theory that support machine learning algorithms in making informed predictions.

MACHINE LEARNING
UNIT I

Dr S Devidhanshrii
Assistant Professor
Data Science & Analytics
Why “Learn”?
• Machine learning is programming computers to optimize a performance criterion using example data or past experience.
• There is no need to “learn” to calculate payroll. Learning is used when:
• Human expertise does not exist (navigating on Mars)
• Humans are unable to explain their expertise (speech recognition)
• The solution changes over time (routing on a computer network)
• The solution must be adapted to particular cases (user biometrics)
• Learning builds general models from data of particular examples.
• Data is cheap and abundant (data warehouses, data marts); knowledge is expensive and scarce.
• Example in retail, from customer transactions to consumer behavior: people who bought “Da Vinci Code” also bought “The Five People You Meet in Heaven” (www.amazon.com).
• Goal: build a model that is a good and useful approximation to the data.
Machine Learning
• Machine learning (ML) allows computers to learn and make
decisions without being explicitly programmed.
• It involves feeding data into algorithms to identify patterns
and make predictions on new data.
• It is used in various applications like image recognition,
speech processing, language translation, recommender
systems, etc.
• Machine Learning solves these problems by learning from
examples and making predictions without fixed rules.
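As a minimal sketch of “learning from examples,” the snippet below fits a classifier on a few labeled examples and then predicts a label for an input it has never seen. The toy features (height, weight) and labels are invented for illustration; scikit-learn is assumed since the slides mention it later.

```python
# A classifier learns a rule from labeled examples instead of being
# programmed with fixed rules. Toy data is purely illustrative.
from sklearn.neighbors import KNeighborsClassifier

# Training examples: [height_cm, weight_kg] -> label
X_train = [[150, 50], [160, 60], [180, 85], [190, 95]]
y_train = ["small", "small", "large", "large"]

model = KNeighborsClassifier(n_neighbors=1)
model.fit(X_train, y_train)

# Predict for a new example the model has never seen.
print(model.predict([[185, 90]])[0])  # prints "large"
```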
Importance of Data in Machine Learning
Data is the foundation of machine learning (ML); without quality data, ML models cannot learn, perform, or make accurate predictions.
• Data provides the examples from which models learn patterns and relationships.
• High-quality and diverse data improves how well models perform and generalize to new situations.
• It helps models understand real-world scenarios and adapt to practical uses.
• Features extracted from data are important for effective training.
• Separate datasets for validation and testing measure how well models generalize to unseen data.

MACHINE LEARNING PROCESS

• The machine learning process defines the flow of work that a data science team executes to create and deliver a machine learning model.
• In addition, the ML process defines how the team works and collaborates to create the most useful predictive model.
• A High Level Machine Learning Process
• At a high level, this workflow includes problem exploration, data engineering, model engineering, and ML Ops, as described in the machine learning life cycle.
Problem Exploration
First, focus on how the model will be used. In the process, assess the desired model accuracy and explore other details, such as whether false positives are worse than false negatives. This phase also includes understanding what data might be available.
• Define Success: Define the problem to be solved, for example, what should be predicted. This helps define what data will be needed. Also, make sure it is clear how success will be measured.
• Evaluate Data: Determine the relevant data sources. In other words, evaluate what data the team will need, how that data is collected, and where the data is stored.
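The trade-off between false positives and false negatives mentioned above can be made concrete with two standard metrics. The labels below are invented for illustration.

```python
# Precision penalizes false positives; recall penalizes false negatives.
# Which metric matters more depends on the problem being explored.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]  # made-up ground truth
y_pred = [1, 0, 0, 1, 1, 0, 1, 0]  # made-up model predictions

tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

precision = tp / (tp + fp)  # of everything flagged, how much was right?
recall = tp / (tp + fn)     # of everything real, how much was found?
print(f"FP={fp}, FN={fn}, precision={precision:.2f}, recall={recall:.2f}")
```

For example, in medical screening a false negative (a missed disease) is usually worse than a false positive, so recall would be weighted more heavily when defining success.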
DATA ENGINEERING
Design and build data pipelines. These pipelines get, clean, and transform data into a format that is more easily used to build a predictive model. Note that this data might come from multiple data sources, so merging the data is also a key aspect of data engineering. This is often where the most time is spent in an ML project.
• Obtain Data: Assemble the data. This includes connecting to remote data stores and databases, which might be in different formats. For example, some data might be in CSV format, while other data could be available as JSON via web services.
• Scrub Data: Re-format particular attributes and correct errors in the data, such as imputing missing values. Datasets are often missing values, or they may contain values of the wrong type or range. Cleaning can include removing duplicates, correcting errors, dealing with missing values, normalization, and handling data type conversions.
• Explore / Validate Data: Get a basic understanding of the data. This exploratory analysis includes data profiling to obtain information about the content and structure of the data. The goal is to understand both the data attributes and the quality of the data.
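The scrubbing steps above can be sketched with pandas; the column names and values below are invented for illustration.

```python
# Minimal data-scrubbing sketch: remove duplicates, convert types,
# and impute missing values, as described in the Scrub Data step.
import pandas as pd

df = pd.DataFrame({
    "age": ["25", "30", None, "30"],   # wrong type (strings) + a missing value
    "city": ["Pune", "Pune", "Delhi", "Pune"],
})

df = df.drop_duplicates()                        # remove duplicate rows
df["age"] = pd.to_numeric(df["age"])             # data type conversion
df["age"] = df["age"].fillna(df["age"].mean())   # missing-value imputation

print(df)
```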
MODEL ENGINEERING

This is the phase that most people associate with building a machine learning model. During this phase, data is used to train and evaluate the model. This is often an iterative task, where different models are tried and the chosen model is tuned.
• Select & Train Model: The process of identifying an appropriate model, and then building / training
the model (on training data). The goal of training is to answer a question or make a prediction
correctly as often as possible.
• Test Model: Run the model on data that the model has not yet seen (such as testing data). In other
words, perform model testing by using data that was withheld from training (i.e., backtesting).
• Evaluate & Interpret Model: Objectively measure the performance of the model. Note that basic
evaluation explores metrics such as accuracy and precision, to determine if the model is useable,
and which model is best for the specific problem being explored. This evaluation also includes an
understanding of when the model makes mistakes. More generally, validating the trained model
helps to ensure the model meets original organizational objectives before the ML model is put into
production.
• Tune Model: This step refers to parameter tuning, which, depending on the model being used, can be more an art than a science. In short, models typically have parameters (i.e., dials for tuning the model) that allow the model to achieve improved performance via parameter refinement. Simple model parameters may include attributes such as the number of training steps and the initialization of certain values.
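The select/train/test/evaluate loop above can be sketched with scikit-learn. The bundled iris dataset is used here only as a stand-in; any labeled dataset works the same way.

```python
# Train on one portion of the data, then test on withheld data the
# model has never seen, and evaluate with an objective metric.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)

# Withhold 25% of the data for testing (backtesting).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

# max_iter is one example of a tunable parameter ("a dial on the model").
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

acc = accuracy_score(y_test, model.predict(X_test))
print(f"held-out accuracy: {acc:.2f}")
```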
ML Ops

Broadly defined, machine learning operations (ML Ops) spans a wide set of practices,
systems, and responsibilities that data scientists, data engineers, cloud engineers, IT
operations, and business stakeholders use to deploy, scale, and maintain machine
learning solutions.
• Deploy Model: Package and put the model to use (i.e., into production). While this
varies from one group to another, the team needs to understand the expected
model performance, how the model will be monitored, and in general, key
performance indicators (KPIs) of the model.
• Monitor Model: Maintain the model in production. This includes monitoring the KPIs
and proactively working to ensure stable and robust predictions.
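One small piece of the deploy step can be sketched as serializing a trained model so a separate serving process can load it. `pickle` is used here purely for illustration; production systems often use joblib, ONNX, or a model registry instead.

```python
# "Package" a trained model for deployment, then restore it the way a
# serving process would before handling prediction requests.
import pickle
from sklearn.neighbors import KNeighborsClassifier

# A trivially small model trained on toy 1-D data.
model = KNeighborsClassifier(n_neighbors=1).fit([[0], [10]], [0, 1])

blob = pickle.dumps(model)     # serialize: what deployment ships
restored = pickle.loads(blob)  # deserialize: what the server loads

print(restored.predict([[9]])[0])  # prints 1
```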
Preliminaries for ML
• Understand domain and data.
• Prepare datasets (cleaning, transformation).
• Select evaluation metrics (accuracy, F1-score, etc.).

Testing Machine Learning Algorithms
• Train/Test split
• Cross-validation
• Hyperparameter tuning
• Bias-Variance tradeoff
• Tools: sklearn, TensorFlow, PyTorch

Turning Data into Probabilities
• ML often uses probabilities to make decisions.
• E.g., Logistic Regression predicts the probability of class membership.
• Probabilistic models deal with uncertainty in predictions.

Statistics for Machine Learning
• Statistics help in understanding and preparing data:
• Mean, Median, Mode
• Variance, Standard Deviation
• Correlation, Covariance

Probability Distributions
• Discrete distributions: Bernoulli, Binomial
• Continuous distributions: Normal (Gaussian)
• Important for modeling uncertainties in ML.

Decision Theory
• Helps machines make optimal decisions under uncertainty.
• Expected Value, Loss Function, Risk Minimization
• Foundation for models like decision trees and Bayesian classifiers.
THANK YOU