
Week-7 DS Practical

The document provides an overview of machine learning, detailing its types, key components, and the implementation of linear and logistic regression. It explains concepts such as supervised and unsupervised learning, overfitting, and underfitting, along with evaluation metrics for regression and classification models. Additionally, it covers hyperparameter tuning using GridSearchCV for optimizing logistic regression.

Uploaded by vimalraj17r


Machine Learning Basics & Linear Regression

Introduction to Machine Learning

Definition:
○ Machine Learning (ML) is a branch of artificial intelligence that enables
systems to learn from data and make predictions without being explicitly
programmed.

Types of ML:

○ Supervised Learning – Uses labeled data (e.g., classification, regression).

○ Unsupervised Learning – Works with unlabeled data (e.g., clustering, dimensionality reduction).

○ Reinforcement Learning – Learning through rewards and penalties (e.g., self-driving cars).

Key Components of ML Models:

○ Features (X) – Input variables used for prediction.

○ Target (Y) – The output or label the model aims to predict.
○ Training and Testing Data – The dataset is split so the model is fit on the training set and evaluated on a held-out test set it has not seen.

What is Linear Regression?

○ A supervised learning algorithm used for predicting continuous values.

○ Example: Predicting house prices based on features like size, location, etc.

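Mathematical Representation:

○ $Y = mX + b$ (where m is the slope and b is the intercept). With several features, each feature gets its own slope: $Y = b + m_1X_1 + m_2X_2 + \dots + m_nX_n$.

Cost Function & Gradient Descent:

○ The model is fit by minimizing a cost function, the Mean Squared Error (MSE): $\text{MSE} = \frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2$, where $\hat{y}_i = mx_i + b$ is the prediction for sample i.

○ Gradient descent minimizes the MSE iteratively, repeatedly moving m and b a small step (set by the learning rate) in the direction that lowers the cost. Scikit-learn's LinearRegression instead solves this least-squares problem directly.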
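To make the gradient descent idea concrete, here is a minimal sketch for the single-feature case $Y = mX + b$. It is written with NumPy only; the synthetic data, the learning rate of 0.1 and the 2000 iterations are assumed values chosen for illustration, not settings from this practical. The result is compared against NumPy's closed-form least-squares fit, which should agree closely once gradient descent has converged.

```python
import numpy as np


def gradient_descent(x, y, lr=0.1, n_iters=2000):
    """Fit y ≈ m*x + b by batch gradient descent on the MSE cost."""
    m, b = 0.0, 0.0
    n = len(x)
    for _ in range(n_iters):
        error = (m * x + b) - y  # prediction minus target
        # Partial derivatives of MSE = (1/n) * sum((m*x + b - y)**2)
        grad_m = (2.0 / n) * np.dot(error, x)
        grad_b = (2.0 / n) * np.sum(error)
        # Step against the gradient, scaled by the learning rate
        m -= lr * grad_m
        b -= lr * grad_b
    return m, b


# Hypothetical synthetic data: true slope 2, true intercept 1, small noise
rng = np.random.default_rng(0)
x = rng.uniform(0, 1, 200)
y = 2.0 * x + 1.0 + rng.normal(0, 0.05, 200)

m, b = gradient_descent(x, y)
m_ls, b_ls = np.polyfit(x, y, 1)  # closed-form least-squares fit for comparison
print(f"gradient descent: m={m:.3f}, b={b:.3f}")
print(f"least squares:    m={m_ls:.3f}, b={b_ls:.3f}")
```

A learning rate that is too large can make the updates diverge, while one that is too small needs many more iterations; 0.1 is simply a value that works for inputs in the range 0 to 1.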

Overfitting vs. Underfitting:

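○ Overfitting – The model fits the training data too closely, including its noise, so it scores well on training data but poorly on unseen data (low bias, high variance).

○ Underfitting – The model is too simple to capture the underlying pattern, so it scores poorly on both training and test data (high bias, low variance).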
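One way to see both effects is to fit polynomials of increasing degree to a small noisy dataset and compare training error with error on held-out data. The sketch below assumes a hypothetical sine-curve dataset and the degrees 1, 4 and 15; it uses scikit-learn's PolynomialFeatures and LinearRegression in a pipeline. Exact numbers depend on the random seed, but the usual pattern is that degree 1 has high error on both sets (underfitting), while degree 15 has the lowest training error yet typically a much higher test error (overfitting).

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(42)


def make_data(n):
    """Hypothetical data: a sine curve plus Gaussian noise."""
    x = rng.uniform(0, 1, n)
    y = np.sin(2 * np.pi * x) + rng.normal(0, 0.2, n)
    return x.reshape(-1, 1), y


X_small_train, y_small_train = make_data(30)  # small training set
X_big_test, y_big_test = make_data(200)       # larger held-out set

results = {}
for degree in (1, 4, 15):
    # Higher polynomial degree = more flexible model
    poly_model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    poly_model.fit(X_small_train, y_small_train)
    train_mse = mean_squared_error(y_small_train, poly_model.predict(X_small_train))
    test_mse = mean_squared_error(y_big_test, poly_model.predict(X_big_test))
    results[degree] = (train_mse, test_mse)
    print(f"degree {degree:2d}: train MSE = {train_mse:.3f}, test MSE = {test_mse:.3f}")
```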

Implementing Linear Regression

Implementation Steps:
● Load the dataset (California housing data).

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

# Load the California Housing dataset (downloaded and cached on first use)
from sklearn.datasets import fetch_california_housing

data = fetch_california_housing()
df = pd.DataFrame(data.data, columns=data.feature_names)
# Target: median house value per district, in units of $100,000
df['Price'] = data.target
df.head()

● Preprocess the data (select features and target variable).

# Select features and target
X = df[['MedInc', 'HouseAge', 'AveRooms']]  # median income, median house age, average rooms
y = df['Price']

● Split data into training and testing sets.

# Train-test split: 80% training, 20% testing (random_state makes the split reproducible)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

● Train a Linear Regression model using Scikit-learn.

# Fit the model: learns one slope per feature plus an intercept
model = LinearRegression()
model.fit(X_train, y_train)

● Make predictions on test data.

● Evaluate model performance using RMSE and R² score.

# Predict on the held-out test set
y_pred = model.predict(X_test)

# Performance metrics
rmse = np.sqrt(mean_squared_error(y_test, y_pred))
r2 = r2_score(y_test, y_pred)
print(f'RMSE: {rmse:.2f}')
print(f'R-squared: {r2:.2f}')

● Visualize predictions using a scatter plot.

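plt.scatter(y_test, y_pred, alpha=0.3)
# Dashed diagonal: points on this line are perfect predictions
lims = [y_test.min(), y_test.max()]
plt.plot(lims, lims, 'r--')
plt.xlabel("Actual Prices ($100,000s)")
plt.ylabel("Predicted Prices ($100,000s)")
plt.title("Actual vs Predicted Prices")
plt.show()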
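The fitted model stores the slopes and intercept from $Y = b + m_1X_1 + \dots + m_nX_n$ in `coef_` and `intercept_`. The sketch below checks this on hypothetical synthetic data generated from known values (intercept 3, slopes 2 and -1), so the recovered parameters can be compared with the truth; it does not use the California data.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical synthetic data generated from y = 3 + 2*x1 - 1*x2 + noise
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(500, 2))
y_demo = 3.0 + 2.0 * X_demo[:, 0] - 1.0 * X_demo[:, 1] + rng.normal(0, 0.1, 500)

lin = LinearRegression().fit(X_demo, y_demo)
# coef_ holds one slope (m) per feature column; intercept_ is b
print("slopes m1, m2:", np.round(lin.coef_, 2))
print("intercept b:", round(float(lin.intercept_), 2))
```

For the California model above, `dict(zip(X_train.columns, model.coef_))` pairs each selected feature name with its learned slope.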

Understanding Logistic Regression

● What is Logistic Regression?

○ A supervised learning algorithm used for classification problems (predicts categorical values).

○ Example: Predicting if an email is spam or not.

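● Sigmoid Function & Decision Boundary:

○ The sigmoid function $\sigma(z) = \frac{1}{1 + e^{-z}}$ maps the linear score $z = mX + b$ to a probability between 0 and 1.

○ If probability > 0.5 → Class 1, otherwise Class 0. Because $\sigma(z) > 0.5$ exactly when $z > 0$, the decision boundary is where $mX + b = 0$.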

● Difference from Linear Regression:

○ Linear Regression predicts continuous values, while Logistic Regression predicts class probabilities, which are then thresholded to assign a class.
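The sigmoid and the 0.5 threshold described above can be sketched in a few lines of NumPy. The scores `z` below are hypothetical values standing in for $mX + b$; note that a score of exactly 0 gives probability 0.5, which falls into Class 0 under the strict "> 0.5" rule.

```python
import numpy as np


def sigmoid(z):
    """Logistic (sigmoid) function: maps any real z into the interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))


# Hypothetical linear scores z = mX + b for five samples
z = np.array([-4.0, -1.0, 0.0, 1.0, 4.0])
p = sigmoid(z)                  # predicted probability of Class 1
labels = (p > 0.5).astype(int)  # threshold: Class 1 only if p > 0.5

print(np.round(p, 3))  # approx. 0.018, 0.269, 0.5, 0.731, 0.982
print(labels)          # [0 0 0 1 1]
```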

Implementing Logistic Regression

Implementation Steps:

● Load the dataset (Breast Cancer Wisconsin dataset from scikit-learn).

from sklearn.datasets import load_breast_cancer

data = load_breast_cancer()
df = pd.DataFrame(data.data, columns=data.feature_names)
# Target: 0 = malignant, 1 = benign
df['Target'] = data.target
df.head()

● Preprocess data (select features and target).

● Split the dataset into training and testing sets.

# Features and target
X = df.iloc[:, :-1]  # all 30 feature columns (everything except Target)
y = df['Target']

# Train-test split: 80% training, 20% testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

● Train a Logistic Regression model using Scikit-learn.

from sklearn.linear_model import LogisticRegression

# max_iter raised from the default (100) to give the lbfgs solver room to converge
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

● Make predictions and evaluate model performance.

● Use classification metrics such as accuracy, confusion matrix, and

classification report.

y_pred = model.predict(X_test)

# Performance metrics
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report

accuracy = accuracy_score(y_test, y_pred)
print(f'Accuracy: {accuracy:.2f}')
print("Confusion Matrix:")
print(confusion_matrix(y_test, y_pred))
print("Classification Report:")
print(classification_report(y_test, y_pred))


● Visualize the confusion matrix using a heatmap.

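import seaborn as sns

# Rows = true labels, columns = predicted labels
cm = confusion_matrix(y_test, y_pred)
sns.heatmap(cm, annot=True, fmt="d", cmap="Blues",
            xticklabels=data.target_names, yticklabels=data.target_names)
plt.xlabel("Predicted Label")
plt.ylabel("True Label")
plt.title("Confusion Matrix")
plt.show()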
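`predict()` hides the probabilities and the 0.5 threshold; `predict_proba()` exposes them, which makes it possible to choose a different cut-off. The sketch below is self-contained and reloads the breast cancer data. As an assumption not made in the steps above, it standardizes the features inside a pipeline so the solver converges without warnings, and the 0.9 threshold is a hypothetical value for illustration.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X_bc, y_bc = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X_bc, y_bc, test_size=0.2, random_state=42)

# Assumption: features are standardized so the lbfgs solver converges quietly
clf = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
clf.fit(X_tr, y_tr)

proba = clf.predict_proba(X_te)[:, 1]       # P(class 1 = benign) per test sample
default_labels = (proba > 0.5).astype(int)  # the 0.5 threshold applied by hand
print("Same as predict():", np.array_equal(default_labels, clf.predict(X_te)))

# Hypothetical stricter rule: predict benign only when P(benign) > 0.9
strict_labels = (proba > 0.9).astype(int)
print("Predicted benign at 0.5:", default_labels.sum(), "| at 0.9:", strict_labels.sum())
```

Raising the threshold means fewer samples are called benign, which typically reduces malignant tumors mislabeled as benign at the cost of more benign cases being flagged.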
Model Evaluation & Comparison
Regression Model (Linear Regression) Evaluation Metrics:

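● Root Mean Squared Error (RMSE): The square root of the average squared difference between actual and predicted values. It is in the same units as the target; lower is better.

● R² Score: The proportion of the target's variance explained by the model, $R^2 = 1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - \bar{y})^2}$. A value of 1 is a perfect fit, 0 is no better than always predicting the mean, and it can be negative for worse models.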
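Both regression metrics can be computed by hand and checked against scikit-learn. The four actual/predicted values below are hypothetical, chosen small enough to verify on paper: the squared residuals sum to 1.5 and the total squared deviation from the mean is 12.6875.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

# Small hypothetical example
y_true = np.array([3.0, 5.0, 2.5, 7.0])
y_pred = np.array([2.5, 5.0, 3.0, 8.0])

residuals = y_true - y_pred
rmse = np.sqrt(np.mean(residuals ** 2))         # root mean squared error
ss_res = np.sum(residuals ** 2)                 # unexplained variation
ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total variation around the mean
r2 = 1 - ss_res / ss_tot

print(f"RMSE = {rmse:.3f}, R^2 = {r2:.3f}")  # RMSE = 0.612, R^2 = 0.882
```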

Classification Model (Logistic Regression) Evaluation Metrics:

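● Accuracy Score: The fraction of predictions that are correct.

● Confusion Matrix: Counts True Positives (TP), False Positives (FP), True Negatives (TN), and False Negatives (FN). In scikit-learn, rows are true labels and columns are predicted labels.

● Precision, Recall, and F1 Score: Precision = TP / (TP + FP), recall = TP / (TP + FN), and F1 is their harmonic mean. They are more informative than accuracy on imbalanced datasets, where a model can reach high accuracy simply by always predicting the majority class.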
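The sketch below derives these metrics from the four confusion-matrix counts and cross-checks them against scikit-learn's own functions. The ten labels are hypothetical: 3 positives are found, 1 is missed, and 1 negative is wrongly flagged.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, f1_score, precision_score, recall_score

# Hypothetical labels for 10 samples (1 = positive class)
y_true = np.array([1, 1, 1, 1, 0, 0, 0, 0, 0, 0])
y_pred = np.array([1, 1, 1, 0, 1, 0, 0, 0, 0, 0])

# For binary labels, scikit-learn lays the matrix out as [[TN, FP], [FN, TP]]
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)  # of predicted positives, how many are correct
recall = tp / (tp + fn)     # of actual positives, how many were found
f1 = 2 * precision * recall / (precision + recall)

print(f"TP={tp} FP={fp} FN={fn} TN={tn}")  # TP=3 FP=1 FN=1 TN=5
print(f"accuracy={accuracy:.2f} precision={precision:.2f} recall={recall:.2f} F1={f1:.2f}")
# accuracy=0.80 precision=0.75 recall=0.75 F1=0.75
```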

Hyperparameter Tuning

Optimizing Logistic Regression using GridSearchCV:

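● GridSearchCV tries every listed hyperparameter value with cross-validation and keeps the best one. Here it tunes C, the inverse of regularization strength in Logistic Regression (smaller C means stronger regularization).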

● Example:

○ Try different values of C = [0.1, 1, 10, 100].

○ Select the best performing model.

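from sklearn.model_selection import GridSearchCV

# C is the inverse of regularization strength: smaller C = stronger regularization
param_grid = {'C': [0.1, 1, 10, 100]}

# 5-fold cross-validation on the training set for each candidate C
grid = GridSearchCV(LogisticRegression(max_iter=1000), param_grid, cv=5)
grid.fit(X_train, y_train)

print(f"Best Parameters: {grid.best_params_}")
# best_score_ is the mean cross-validated accuracy, not test-set accuracy
print(f"Best Cross-Validation Accuracy: {grid.best_score_:.2f}")
# The best model is refit on all training data; evaluate it on the held-out test set
print(f"Test Accuracy: {grid.score(X_test, y_test):.2f}")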
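The 30 breast cancer features sit on very different scales, so a common refinement is to standardize them before fitting. Putting the scaler and the classifier in a Pipeline and searching over the pipeline means the scaler is fit only on the training folds within each cross-validation split. This is a sketch under that assumption, not part of the original practical; the step names `scaler` and `clf` are arbitrary, and pipeline parameters are addressed as `<step name>__<parameter>`, hence `clf__C`.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

pipe = Pipeline([
    ("scaler", StandardScaler()),                 # standardize each feature
    ("clf", LogisticRegression(max_iter=1000)),  # then fit logistic regression
])
# Same C values as above, addressed through the "clf" step
param_grid = {"clf__C": [0.1, 1, 10, 100]}

grid = GridSearchCV(pipe, param_grid, cv=5, scoring="accuracy")
grid.fit(X_train, y_train)

print("Best C:", grid.best_params_["clf__C"])
print(f"Mean CV accuracy: {grid.best_score_:.2f}")
# refit=True (the default) retrains the best pipeline on all training data
print(f"Held-out test accuracy: {grid.score(X_test, y_test):.2f}")
```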
