
Machine learning

Machine Learning (ML) is a branch of artificial intelligence (AI) that enables systems to learn
and improve their performance on a specific task without being explicitly programmed. Instead
of relying on hard-coded rules, machine learning models analyze data, identify patterns, and
make decisions or predictions based on that data.

Key Concepts in Machine Learning

1. Data: The foundation of ML. Models learn from historical data to make future
predictions or decisions.

o Features: Input variables used by the model to learn.

o Labels: Output or target variables (in supervised learning).

2. Model: A mathematical representation of a real-world process, built to make
predictions or understand patterns.
3. Training: The process of teaching a machine learning model using data.

4. Testing: Evaluating the performance of the model on unseen data.

How Machine Learning Works (Simplified)

1. Input Data: Collect and preprocess data.

2. Feature Selection: Identify important input variables.

3. Model Training: Fit a model to the training data.

4. Evaluation: Measure performance using metrics.

5. Prediction: Use the model to make predictions on new data.
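To make the workflow above concrete, here is a minimal sketch using scikit-learn (assumed to be installed); the built-in diabetes dataset and the linear model are stand-ins for whatever data and model you actually use:

```python
# Minimal sketch of the ML workflow above, assuming scikit-learn is available.
from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

# 1. Input Data: load a small built-in dataset (stand-in for your own data).
X, y = load_diabetes(return_X_y=True)

# 2. Feature Selection: here we simply keep all features for brevity.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# 3. Model Training: fit a model to the training data.
model = LinearRegression()
model.fit(X_train, y_train)

# 4. Evaluation: measure performance on held-out data.
mse = mean_squared_error(y_test, model.predict(X_test))
print(f"Test MSE: {mse:.2f}")

# 5. Prediction: use the trained model on new data.
print(model.predict(X_test[:3]))
```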

Types of Machine Learning

Machine learning is categorized based on the nature of the learning process and the type of data
used. Below are the main types and their subtypes, along with the methods used in each
category.

1. Supervised Learning

Definition:

Supervised learning involves training a model on labeled data, where the input data has
corresponding output labels. The goal is to map inputs to outputs accurately.

Subtypes:

1. Regression:
Predicting continuous values.

o Methods:

▪ Linear Regression
▪ Polynomial Regression
▪ Ridge and Lasso Regression
▪ Support Vector Regression (SVR)

2. Classification:
Predicting discrete categories.

o Methods:

▪ Logistic Regression
▪ Decision Trees
▪ Random Forest
▪ Support Vector Machines (SVM)
▪ Naïve Bayes
▪ k-Nearest Neighbours (KNN)

2. Unsupervised Learning

Definition:

Unsupervised learning deals with unlabeled data. The goal is to identify patterns, structures, or
groupings within the data.

Subtypes:

1. Clustering:
Grouping data points into clusters.

o Methods:

▪ K-Means
▪ K-Medoids
▪ Hierarchical Clustering
▪ DBSCAN (Density-Based Spatial Clustering)
▪ Gaussian Mixture Models (GMM)

2. Dimensionality Reduction:
Reducing the number of features while retaining important information.

o Methods:

▪ Principal Component Analysis (PCA)
▪ t-SNE (t-Distributed Stochastic Neighbor Embedding)
▪ Autoencoders

3. Anomaly Detection:
Identifying data points that deviate significantly from the norm.

o Methods:

▪ Isolation Forest
▪ One-Class SVM
▪ Clustering-based techniques
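As a sketch of the unsupervised methods above (assuming scikit-learn is available; the data here is synthetic), K-Means groups the points into clusters and Isolation Forest flags outliers:

```python
# Minimal sketch of clustering and anomaly detection, assuming scikit-learn is available.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
# Unlabeled data: two loose groups of points plus a few injected outliers.
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(6, 1, (50, 2)), [[20, 20], [-15, 18]]])

# Clustering: group the points into 2 clusters with K-Means.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(kmeans.cluster_centers_)

# Anomaly detection: Isolation Forest labels suspected outliers as -1.
iso = IsolationForest(random_state=0).fit(X)
print(iso.predict(X[-2:]))  # the two injected outliers should come out as -1
```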

Regression

Regression in machine learning is a supervised learning technique used to model and predict
continuous outcomes. It involves identifying the relationship between dependent (target) and
independent (input) variables.

Key Characteristics of Regression:

• Goal: Predict a numerical value.

• Data Requirement: Requires labeled data (input-output pairs).

• Output: A continuous value (e.g., house prices, temperature, stock prices).

Steps in Regression Analysis

1. Data Preparation:
o Collect and preprocess data (handle missing values, scaling, etc.).
2. Model Selection:
o Choose the appropriate regression model based on the problem and data
characteristics.
3. Training:
o Fit the model to the training dataset.
4. Evaluation:
o Use metrics like Mean Absolute Error (MAE), Mean Squared Error (MSE), and R-
squared to evaluate performance.
5. Prediction:
o Make predictions on unseen (test) data.
Linear Regression

Linear Regression is a supervised learning technique in machine learning used to model the
relationship between a dependent variable (target) and one or more independent variables
(features). It predicts a continuous output based on this relationship.

How Linear Regression Works

1. The Linear Relationship:

o Linear regression assumes a linear relationship between the input variables
(features) and the target variable. The model fits a straight line or a hyperplane to
the data points.

The equation for simple linear regression (one feature) is:

y = β₀ + β₁x + ϵ

For multiple features, it becomes:

y = β₀ + β₁x₁ + β₂x₂ + … + βₙxₙ + ϵ

o y: Target variable.

o xᵢ: Independent variables (features).

o β₀: Intercept (where the line crosses the y-axis).

o βᵢ: Coefficients (slopes) of the features.

o ϵ: Error term (difference between actual and predicted values).

2. Objective:

o The goal is to find the coefficients (β₀, β₁, …) that minimize the difference
between the actual and predicted target values.

3. Error Minimization:

o Linear regression minimizes the error using a cost function, typically the Mean
Squared Error (MSE):

MSE = (1/n) Σᵢ (yᵢ − ŷᵢ)²

o Here, yᵢ are the actual values and ŷᵢ are the predicted values. The coefficients
are chosen to minimize this MSE.

4. Fitting the Model:

o The coefficients are calculated using methods like Ordinary Least Squares (OLS)
or Gradient Descent. Once fitted, the model uses the learned coefficients to
make predictions on new data.
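To make the fitting step concrete, here is a minimal NumPy sketch of both approaches mentioned above, on synthetic data with a known intercept and slope:

```python
# Sketch: fitting simple linear regression by OLS (normal equation) and by gradient descent.
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 100)
y = 3.0 + 2.0 * x + rng.normal(0, 1, 100)   # true intercept 3, slope 2, plus noise

X = np.column_stack([np.ones_like(x), x])   # design matrix with an intercept column

# Ordinary Least Squares: solve (X^T X) beta = X^T y
beta_ols = np.linalg.solve(X.T @ X, X.T @ y)
print("OLS coefficients:", beta_ols)

# Gradient descent on the MSE cost
beta = np.zeros(2)
lr = 0.01
for _ in range(5000):
    grad = 2 / len(y) * X.T @ (X @ beta - y)  # gradient of the MSE w.r.t. beta
    beta -= lr * grad
print("Gradient-descent coefficients:", beta)
```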

Accuracy Checking in Linear Regression

To evaluate the accuracy of a linear regression model, the following metrics and analyses are
used:

1. R-squared (R²):

• This metric indicates the proportion of variance in the target variable explained by the
independent variables.

• R² ranges from 0 to 1:

o R² = 1: Perfect fit (all variance is explained).

o R² = 0: The model does not explain any variance.

• Higher R² values generally indicate better model performance.

2. Mean Absolute Error (MAE):

• MAE measures the average magnitude of errors in the predictions, without considering
their direction.

• Lower MAE values indicate a better model.

3. Mean Squared Error (MSE):

• MSE calculates the average squared differences between the actual and predicted
values.

• It penalizes larger errors more than smaller ones, making it sensitive to outliers.

• Lower MSE values indicate better accuracy.

4. Root Mean Squared Error (RMSE):

• RMSE is the square root of MSE. It provides an error metric in the same units as the
target variable.

• Lower RMSE values indicate better accuracy.

5. Residual Analysis:

• Residuals are the differences between actual and predicted values.

• A good linear regression model will have residuals that:

o Are randomly distributed.


o Have a mean close to zero.

• If residuals show patterns, the linear model may not be appropriate.
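A short sketch of computing these metrics with scikit-learn (the actual and predicted values below are illustrative, not from a real model):

```python
# Sketch: evaluating a regression model with R², MAE, MSE, RMSE, and a residual check.
import numpy as np
from sklearn.metrics import r2_score, mean_absolute_error, mean_squared_error

y_true = np.array([3.1, 4.8, 7.2, 9.9, 12.1])   # actual values (illustrative)
y_pred = np.array([3.0, 5.0, 7.0, 10.0, 12.5])  # model predictions (illustrative)

print("R^2 :", r2_score(y_true, y_pred))
print("MAE :", mean_absolute_error(y_true, y_pred))
mse = mean_squared_error(y_true, y_pred)
print("MSE :", mse)
print("RMSE:", np.sqrt(mse))

# Residual analysis: residuals should look random with a mean near zero.
residuals = y_true - y_pred
print("Residual mean:", residuals.mean())
```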

Key Assumptions of Linear Regression

For linear regression to perform well, these assumptions should hold:

1. Linearity: The relationship between features and the target is linear.

2. Independence: Observations are independent of each other.

3. Homoscedasticity: The variance of residuals is constant across all levels of the
independent variables.

4. Normality of Residuals: Residuals are normally distributed.

5. No Multicollinearity: Features are not highly correlated with each other.

Advantages of Linear Regression

1. Simple and interpretable.

2. Efficient for small and medium-sized datasets.

3. Works well when the relationship is approximately linear.

Limitations of Linear Regression

1. Assumes linear relationships, which might not hold in real-world data.

2. Sensitive to outliers, which can significantly skew results.

3. Does not perform well with multicollinearity.

4. Limited to predicting continuous values and struggles with complex, non-linear
relationships.

Regularization

Regularization is a technique used in machine learning and statistics to prevent overfitting by
adding a penalty to the model's complexity. It helps the model generalize better to unseen data,
reducing the chances of overfitting while training on limited data. Overfitting occurs when a
model learns not only the underlying patterns but also the noise and random fluctuations in the
training data, leading to poor performance on new data.

Why Regularization is Important:

1. Overfitting: In machine learning, if a model is too complex (e.g., with many features), it
might memorize the training data, leading to overfitting. This means that while the model
performs well on the training set, its performance on the test set (or any new data) is
poor.
2. Bias-Variance Tradeoff: Regularization helps to manage the tradeoff between bias
(error due to overly simple models) and variance (error due to overly complex models).
By penalizing complexity, regularization helps to find a balance that leads to better
generalization.

Types of Regularization:

1. L1 Regularization (Lasso Regression)

• Concept: In L1 regularization, the penalty term added to the loss function is the sum of
the absolute values of the model parameters (weights). This results in a sparsity effect,
where some of the model coefficients become exactly zero. This can be interpreted as
automatic feature selection, as it tends to eliminate irrelevant features from the model.

• Mathematical Formulation: For linear regression, the objective function is:

Loss = RSS + λ Σᵢ |wᵢ|

Where:

o RSS is the Residual Sum of Squares (the standard loss function).

o wᵢ are the model weights.

o λ is the regularization parameter (controls the strength of the penalty).

• Effect: L1 regularization can lead to sparse models where some features are effectively
ignored (with weights set to zero). It is useful when you suspect that many features are
irrelevant.

• Use Cases: Lasso (Least Absolute Shrinkage and Selection Operator) regression is
commonly used when feature selection is important, and you want to automatically
reduce the number of variables.
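A minimal sketch of the sparsity effect with scikit-learn's Lasso (the data is synthetic, and alpha plays the role of λ above):

```python
# Sketch: L1 regularization (Lasso) driving some coefficients exactly to zero.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
# Only the first two features actually matter; the other eight are irrelevant.
y = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(0, 0.5, 200)

lasso = Lasso(alpha=0.1)   # alpha corresponds to λ
lasso.fit(X, y)
print(lasso.coef_)          # most irrelevant coefficients should come out as exactly 0
```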

2. L2 Regularization (Ridge Regression)

• Concept: In L2 regularization, the penalty term added to the loss function is the sum of
the squares of the model parameters. This shrinks the coefficients, but unlike L1, it does
not set them exactly to zero. L2 regularization leads to a smoother model where all
features are still included, but their influence is reduced.

• Mathematical Formulation: For linear regression, the objective function is:

Loss = RSS + λ Σᵢ wᵢ²

Where:

o wᵢ are the model weights.

o λ is the regularization parameter.


• Effect: L2 regularization encourages the model to have smaller weights and prevents
overfitting by distributing the penalty more evenly across all parameters. The weights are
shrunk, but they rarely become exactly zero.

• Use Cases: Ridge regression is used when you believe that most features should
contribute to the prediction but with reduced magnitude. It’s often applied when there is
multicollinearity (correlation among features) in the data.
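A minimal sketch of ridge shrinkage with scikit-learn, on synthetic data with two nearly collinear features (alpha plays the role of λ):

```python
# Sketch: L2 regularization (Ridge) shrinking coefficients without zeroing them out.
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
X[:, 4] = X[:, 3] + rng.normal(0, 0.01, 200)   # two nearly collinear features
y = X @ np.array([1.0, 0.5, -0.5, 2.0, 2.0]) + rng.normal(0, 0.5, 200)

ols = LinearRegression().fit(X, y)
ridge = Ridge(alpha=1.0).fit(X, y)             # alpha corresponds to λ

print("OLS coefficients:  ", ols.coef_)        # may be unstable on the collinear pair
print("Ridge coefficients:", ridge.coef_)      # smaller and more stable, none exactly zero
```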

3. Elastic Net Regularization

• Concept: Elastic Net is a combination of L1 and L2 regularization. It adds both L1 and


L2 penalties to the objective function, allowing it to inherit both feature selection from
L1 and the stability provided by L2.

• Mathematical Formulation: For linear regression, the objective function is:

Loss = RSS + λ₁ Σᵢ |wᵢ| + λ₂ Σᵢ wᵢ²

Where:

o λ₁ and λ₂ control the relative importance of the L1 and L2 regularization terms.

• Effect: Elastic Net is useful when there are many correlated features. It will both shrink
the coefficients and perform feature selection, while addressing situations where L1 or
L2 alone might not work effectively.

• Use Cases: Elastic Net is commonly used when you have a large number of features
and some of them are highly correlated.
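A minimal sketch with scikit-learn's ElasticNet on synthetic correlated features (alpha and l1_ratio together play the role of λ₁ and λ₂ in the formulation above):

```python
# Sketch: Elastic Net combining L1 and L2 penalties on correlated features.
import numpy as np
from sklearn.linear_model import ElasticNet

rng = np.random.default_rng(0)
base = rng.normal(size=(200, 1))
# Three highly correlated copies of the same signal plus five irrelevant noise features.
X = np.hstack([base + rng.normal(0, 0.05, (200, 1)) for _ in range(3)] +
              [rng.normal(size=(200, 5))])
y = base[:, 0] * 4 + rng.normal(0, 0.5, 200)

# l1_ratio balances the L1 and L2 terms.
enet = ElasticNet(alpha=0.1, l1_ratio=0.5).fit(X, y)
print(enet.coef_)   # correlated features share the weight; irrelevant ones shrink toward 0
```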

4. Dropout (for Neural Networks)

• Concept: Dropout is a regularization method used in neural networks. During training,
dropout randomly sets a fraction of the input units (neurons) to zero at each update
step. This prevents the network from becoming overly reliant on certain neurons,
encouraging it to learn more robust features.

• Effect: Dropout helps prevent overfitting by forcing the network to generalize better. It
also prevents neurons from "co-adapting," where two or more neurons learn to work
together too specifically for the training set.

• Use Cases: Dropout is widely used in deep learning models, especially in convolutional
and fully connected layers of neural networks.
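A minimal sketch of dropout layers in Keras (assuming TensorFlow is installed; the layer sizes and dropout rates are illustrative):

```python
# Sketch: dropout layers in a small Keras network (assumes TensorFlow/Keras is installed).
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.Input(shape=(20,)),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dropout(0.5),   # randomly zero out 50% of the units during training
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dropout(0.3),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
# Dropout is active only during training; at inference time all units are used.
```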

5. Early Stopping

• Concept: Early stopping is a technique used in iterative training processes (e.g.,
gradient descent). It involves stopping the training process before the model reaches the
point of overfitting. The model is evaluated on a validation set, and training is halted
once the validation error starts to increase, even though the training error may still be
decreasing.
• Effect: Early stopping prevents the model from fitting too closely to the training data,
thus avoiding overfitting.

• Use Cases: Early stopping is particularly useful in training deep neural networks where
overfitting is a common problem.
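A minimal sketch of early stopping with a Keras callback (assuming TensorFlow is installed; the model and data are synthetic placeholders):

```python
# Sketch: early stopping in Keras, monitoring validation loss on synthetic data.
import numpy as np
import tensorflow as tf

X = np.random.rand(500, 10)
y = (X.sum(axis=1) > 5).astype("float32")

model = tf.keras.Sequential([
    tf.keras.Input(shape=(10,)),
    tf.keras.layers.Dense(16, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy")

early_stop = tf.keras.callbacks.EarlyStopping(
    monitor="val_loss", patience=5, restore_best_weights=True)

# Training halts once validation loss stops improving, even if training loss keeps falling.
model.fit(X, y, validation_split=0.2, epochs=200, callbacks=[early_stop], verbose=0)
```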

How to Choose Regularization

• L1 Regularization is useful when you expect many features to be irrelevant and want to
eliminate them completely.

• L2 Regularization is useful when you expect most features to be relevant but want to
reduce their impact.

• Elastic Net is useful when you have many correlated features and want to benefit from
both L1 and L2 penalties.

• Dropout is used specifically in neural networks, especially in deep learning, to prevent
overfitting.

• Early Stopping is typically used in iterative models and can be a very effective way to
avoid overfitting without adjusting the model's architecture.

Logistic Regression

Logistic regression is a supervised learning algorithm used for classification tasks. It predicts
the probability that a given input belongs to a specific category (class). Despite its name, logistic
regression is used for classification problems rather than regression problems.

Key Concepts:

1. Binary Classification: Logistic regression is commonly used for binary classification
problems (e.g., predicting if an email is spam or not).

2. Logit Function: Logistic regression models the relationship between the dependent
variable (target) and independent variables (features) using the logit function:

logit(p) = ln(p / (1 − p)) = w·x + b

Here, p is the probability of the positive class.

3. Sigmoid Function: The output of logistic regression is obtained by applying the sigmoid
function to a linear combination of the input features:

ŷ = σ(z) = 1 / (1 + e^(−z)), where z = w·x + b

The sigmoid function ensures the output is in the range [0, 1], representing a probability.

4. Cost Function: Logistic regression uses the log-loss (or cross-entropy) as its cost
function to optimize the parameters w and b:

J(w, b) = −(1/m) Σᵢ [ y⁽ⁱ⁾ log(ŷ⁽ⁱ⁾) + (1 − y⁽ⁱ⁾) log(1 − ŷ⁽ⁱ⁾) ]

Where m is the number of samples, y⁽ⁱ⁾ is the true label, and ŷ⁽ⁱ⁾ is the predicted
probability.

5. Prediction: A threshold (e.g., 0.5) is applied to the output probabilities to assign a class
label: predict class 1 if ŷ ≥ 0.5, otherwise class 0.
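A small NumPy sketch of these pieces (the weights, bias, and inputs below are illustrative, not trained values):

```python
# Sketch: sigmoid output, log-loss, and thresholding for logistic regression (NumPy only).
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Illustrative weights, bias, features, and labels (not values from a trained model).
w, b = np.array([0.8, -0.3]), 0.1
X = np.array([[2.0, 1.0], [0.5, 3.0]])
y = np.array([1, 0])

y_hat = sigmoid(X @ w + b)                       # predicted probabilities in [0, 1]
log_loss = -np.mean(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))
labels = (y_hat >= 0.5).astype(int)              # apply the 0.5 threshold

print(y_hat, log_loss, labels)
```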

Solved Example:

Problem:

Predict whether a student passes an exam based on their study hours. We have the following
dataset:

Study Hours (x) Passed (1/0) (y)

1 0

2 0

3 0

4 1

5 1

Solution:

1. Model Representation: The logistic regression model is:

ŷ = σ(w·x + b) = 1 / (1 + e^(−(w·x + b)))

2. Initialize Parameters: Start with random values of w and b.

3. Optimization (Gradient Descent): Using the cost function, optimize w and b
using gradient descent.

4. Final Model: After training, the fitted values of w and b define the model. For a
given number of study hours, the model outputs a probability ŷ of passing; since
ŷ > 0.5 for the students with more study hours, the model predicts they will pass.

This example demonstrates how logistic regression models probabilities and makes predictions
based on the decision boundary defined by the threshold ŷ = 0.5.
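The same worked example can be reproduced with scikit-learn; this is a sketch, and the fitted coefficients depend on the solver and regularization defaults:

```python
# Sketch: logistic regression on the study-hours dataset from the example above.
import numpy as np
from sklearn.linear_model import LogisticRegression

X = np.array([[1], [2], [3], [4], [5]])   # study hours
y = np.array([0, 0, 0, 1, 1])             # passed (1) or failed (0)

clf = LogisticRegression().fit(X, y)

# Predicted probability of passing for a student who studies 4 hours.
prob_pass = clf.predict_proba([[4]])[0, 1]
print(f"P(pass | 4 hours) = {prob_pass:.2f}")
print("Predicted class:", clf.predict([[4]])[0])   # 1 if the probability exceeds 0.5
```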

Decision Tree in Machine Learning

A decision tree is a supervised learning algorithm used for both classification and regression
tasks. It works by splitting the data into subsets based on the value of input features. The
structure resembles a tree, where:

• Root Node: Represents the entire dataset.

• Internal Nodes: Represent decisions based on feature values.

• Leaf Nodes: Represent outcomes or predictions.

Key Concepts:

1. Splitting: At each node, the algorithm chooses the best feature and value to split the
data to achieve the highest information gain or lowest impurity.

2. Criteria for Splitting:

o Gini Impurity:

Gini = 1 − Σᵢ pᵢ²

Where pᵢ is the probability of class i in a subset, and c is the number of
classes (the sum runs over i = 1 … c).

o Entropy (used in information gain):

Entropy = − Σᵢ pᵢ log₂(pᵢ)

o Information Gain:

IG = Entropy(parent) − Σₖ (nₖ / N) · Entropy(childₖ)

Where nₖ is the number of samples in child node k, and N is the total
number of samples.

3. Stopping Criteria: Splitting stops when:

o All instances belong to the same class.

o Maximum depth is reached.

o Minimum number of samples per node is below a threshold.

4. Prediction:

o For classification: The majority class of a leaf node is the prediction.

o For regression: The average of the target values in a leaf node is the prediction.
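A minimal sketch with scikit-learn's DecisionTreeClassifier, using the built-in iris dataset and the Gini criterion:

```python
# Sketch: a small decision tree classifier with the Gini criterion (scikit-learn assumed).
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# criterion can be "gini" or "entropy"; max_depth is one of the stopping criteria above.
tree = DecisionTreeClassifier(criterion="gini", max_depth=3, random_state=0)
tree.fit(X_train, y_train)

print("Test accuracy:", tree.score(X_test, y_test))
print(export_text(tree))   # the learned splits, printed as text
```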
