Week-7 DS Practical

The document provides an overview of machine learning, detailing its types, key components, and the implementation of linear and logistic regression. It explains concepts such as supervised and unsupervised learning, overfitting, and underfitting, along with evaluation metrics for regression and classification models. Additionally, it covers hyperparameter tuning using GridSearchCV for optimizing logistic regression.

Uploaded by

vimalraj17r

Machine Learning Basics & Linear Regression

Introduction to Machine Learning

Definition:
○ Machine Learning (ML) is a branch of artificial intelligence that enables
systems to learn from data and make predictions without being explicitly
programmed.

Types of ML:

○ Supervised Learning – Uses labeled data (e.g., classification, regression).

○ Unsupervised Learning – Works with unlabeled data (e.g., clustering,
dimensionality reduction).

○ Reinforcement Learning – Learning through rewards and penalties (e.g.,
self-driving cars).

Key Components of ML Models:

○ Features (X) – Input variables used for prediction.


○ Target (Y) – The output or label the model aims to predict.
○ Training and Testing Data – Splitting data to train and evaluate the
model.
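The idea of holding out test data can be sketched in plain Python. This is only an illustration of the concept, not how scikit-learn implements it; the function name simple_train_test_split and the toy rows are hypothetical.

```python
import random

def simple_train_test_split(rows, test_size=0.2, seed=42):
    """Shuffle rows and split them into (train, test) lists."""
    shuffled = list(rows)                   # copy so the caller's data is untouched
    random.Random(seed).shuffle(shuffled)   # a fixed seed makes the split repeatable
    n_test = int(round(len(shuffled) * test_size))
    return shuffled[n_test:], shuffled[:n_test]

rows = list(range(10))                      # toy data: ten sample IDs
train, test = simple_train_test_split(rows, test_size=0.2)
print(len(train), len(test))
```

The model is trained only on the first list and evaluated only on the second, so the score reflects data the model has not seen.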

What is Linear Regression?

○ A supervised learning algorithm used for predicting continuous values.

○ Example: Predicting house prices based on features like size, location, etc.

Mathematical Representation:

○ Y = mX + b (where m is the slope and b is the intercept).
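Cost Function & Gradient Descent:

○ The model is fit by minimizing a cost function, the Mean Squared Error
(MSE): the average of the squared differences between actual and predicted
values.

○ Gradient descent minimizes the MSE iteratively, adjusting m and b a little at
each step in the direction that lowers the error.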
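A minimal sketch of gradient descent on the MSE for a single feature follows. The function name, learning rate, step count, and toy data are assumptions chosen for illustration, not values from this practical.

```python
def fit_line_gradient_descent(xs, ys, learning_rate=0.05, n_steps=2000):
    """Fit y = m*x + b by gradient descent on the mean squared error."""
    m, b = 0.0, 0.0
    n = len(xs)
    for _ in range(n_steps):
        # Prediction error for every sample at the current m and b
        errors = [(m * x + b) - y for x, y in zip(xs, ys)]
        # Partial derivatives of MSE with respect to m and b
        grad_m = (2.0 / n) * sum(e * x for e, x in zip(errors, xs))
        grad_b = (2.0 / n) * sum(errors)
        # Step against the gradient
        m -= learning_rate * grad_m
        b -= learning_rate * grad_b
    return m, b

xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]   # generated from y = 2x + 1
m, b = fit_line_gradient_descent(xs, ys)
print(f"m = {m:.3f}, b = {b:.3f}")
```

A learning rate that is too large makes the updates diverge, and one that is too small makes convergence slow, so in practice it has to be tuned to the data.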

Overfitting vs. Underfitting:

○ Overfitting – Model fits noise in the training data and generalizes poorly to
new data (low bias, high variance).

○ Underfitting – Model is too simple to capture the underlying patterns (high
bias, low variance).

Implementing Linear Regression

Implementation Steps:
● Load the dataset (California housing data).

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

# Load the California housing dataset into a DataFrame
data = fetch_california_housing()
df = pd.DataFrame(data.data, columns=data.feature_names)
df['Price'] = data.target  # median house value, in units of $100,000
df.head()

● Preprocess the data (select features and target variable).

# Select features and target
X = df[['MedInc', 'HouseAge', 'AveRooms']]  # a subset of the available features
y = df['Price']

● Split data into training and testing sets.

# Hold out 20% of the rows for testing; random_state makes the split reproducible
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

● Train a Linear Regression model using Scikit-learn.

model = LinearRegression()

model.fit(X_train, y_train)
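Scikit-learn's LinearRegression fits by ordinary least squares rather than by running gradient descent. For a single feature, the least-squares slope and intercept have a closed form, sketched below in plain Python. The function name and toy data are hypothetical, and the model above uses three features, so this only illustrates the one-feature case.

```python
def fit_line_least_squares(xs, ys):
    """Closed-form least-squares fit of y = m*x + b for one feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sxy / sxx
    # The fitted line always passes through the point of means
    b = mean_y - m * mean_x
    return m, b

xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]   # generated from y = 2x + 1
m, b = fit_line_least_squares(xs, ys)
print(f"m = {m:.2f}, b = {b:.2f}")
```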

● Make predictions on test data.

● Evaluate model performance using RMSE and R² score.

y_pred = model.predict(X_test)

# Performance metrics
rmse = np.sqrt(mean_squared_error(y_test, y_pred))
r2 = r2_score(y_test, y_pred)
print(f'RMSE: {rmse:.2f}')
print(f'R-squared: {r2:.2f}')

● Visualize predictions using a scatter plot.


plt.scatter(y_test, y_pred)
plt.xlabel("Actual Prices")
plt.ylabel("Predicted Prices")
plt.title("Actual vs Predicted Prices")
plt.show()

Understanding Logistic Regression

● What is Logistic Regression?

○ A supervised learning algorithm used for classification problems (predicts
categorical values).

○ Example: Predicting if an email is spam or not.

● Sigmoid Function & Decision Boundary:

○ The sigmoid function, σ(z) = 1 / (1 + e^(-z)), converts the model's
continuous output z into a probability between 0 and 1.

○ If probability > 0.5 → Class 1, otherwise Class 0.
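The sigmoid and the 0.5 threshold can be sketched in a few lines of plain Python. The function names and the sample scores are hypothetical; the two-branch form is just one common way to avoid overflow for very negative inputs.

```python
import math

def sigmoid(z):
    """Map a real-valued score z to a probability in [0, 1]."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    # For negative z, use an equivalent form that never exponentiates a large number
    ez = math.exp(z)
    return ez / (1.0 + ez)

def predict_class(z, threshold=0.5):
    """Return 1 if the probability exceeds the threshold, else 0."""
    return 1 if sigmoid(z) > threshold else 0

for z in (-2.0, 0.0, 2.0):
    print(z, round(sigmoid(z), 3), predict_class(z))
```

A score of z = 0 gives a probability of exactly 0.5, so the decision boundary is the set of inputs where the model's linear score is zero.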

● Difference from Linear Regression:

○ Linear Regression predicts continuous values, while Logistic Regression
predicts class probabilities, which are then thresholded into class labels.

Implementing Logistic Regression

Implementation Steps:

● Load the dataset (breast cancer dataset from scikit-learn).

from sklearn.datasets import load_breast_cancer

data = load_breast_cancer()
df = pd.DataFrame(data.data, columns=data.feature_names)
df['Target'] = data.target
df.head()

● Preprocess data (select features and target).

# Features and target
X = df.iloc[:, :-1]  # all columns except the last one ('Target')
y = df['Target']

● Split the dataset into training and testing sets.

# Train-test split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

● Train a Logistic Regression model using Scikit-learn.

from sklearn.linear_model import LogisticRegression

# max_iter is raised above the default of 100 to give the solver more room to converge
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

● Make predictions and evaluate model performance.

● Use classification metrics such as accuracy, confusion matrix, and
classification report.

from sklearn.metrics import accuracy_score, confusion_matrix, classification_report

y_pred = model.predict(X_test)

# Performance metrics
accuracy = accuracy_score(y_test, y_pred)
print(f'Accuracy: {accuracy:.2f}')
print("Confusion Matrix:")
print(confusion_matrix(y_test, y_pred))
print("Classification Report:")
print(classification_report(y_test, y_pred))

● Visualize the confusion matrix using a heatmap.

import seaborn as sns

cm = confusion_matrix(y_test, y_pred)
sns.heatmap(cm, annot=True, fmt="d", cmap="Blues")
plt.xlabel("Predicted Label")
plt.ylabel("True Label")
plt.title("Confusion Matrix")
plt.show()

Model Evaluation & Comparison

Regression Model (Linear Regression) Evaluation Metrics:

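● Root Mean Squared Error (RMSE): The square root of the average squared
difference between actual and predicted values, in the same units as the
target. Lower is better.

● R² Score: The proportion of variance in the target that the model explains. A
score of 1 is a perfect fit; 0 is no better than always predicting the mean.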
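Both regression metrics can be computed by hand, which makes their definitions concrete. The sketch below uses plain Python; the function names and the four toy values are hypothetical and are not results from the housing model.

```python
import math

def root_mean_squared_error(y_true, y_pred):
    """Square root of the mean squared difference between targets and predictions."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)

def r_squared(y_true, y_pred):
    """1 minus (residual sum of squares / total sum of squares)."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

y_true = [3.0, 5.0, 7.0, 9.0]   # toy actual values
y_pred = [2.0, 5.0, 8.0, 9.0]   # toy predictions
print(f"RMSE: {root_mean_squared_error(y_true, y_pred):.3f}")
print(f"R-squared: {r_squared(y_true, y_pred):.3f}")
```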

Classification Model (Logistic Regression) Evaluation Metrics:

● Accuracy Score: The fraction of all predictions that are correct.

● Confusion Matrix: Shows True Positives (TP), False Positives (FP), True
Negatives (TN), and False Negatives (FN).

● Precision, Recall, and F1 Score: Precision is TP / (TP + FP), recall is
TP / (TP + FN), and F1 is their harmonic mean. They are more informative than
accuracy on imbalanced datasets.
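To see why accuracy alone can mislead on imbalanced data, the metrics can be computed directly from the four confusion-matrix counts. The function name and the counts below are made up for illustration: 100 samples, of which only 10 are positive.

```python
def classification_scores(tp, fp, tn, fn):
    """Compute accuracy, precision, recall, and F1 from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    # Guard against division by zero when a class is never predicted or never present
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Hypothetical model that finds only 2 of the 10 positives
scores = classification_scores(tp=2, fp=0, tn=90, fn=8)
for name, value in scores.items():
    print(f"{name}: {value:.2f}")
```

Here accuracy is high because the negatives dominate, while recall exposes that most positives were missed.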

Hyperparameter Tuning

Optimizing Logistic Regression using GridSearchCV:

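● GridSearchCV tries every combination in a parameter grid, scores each one
with cross-validation, and keeps the best. Here the grid covers C, the inverse
of regularization strength in Logistic Regression (smaller C means stronger
regularization).

● Example:

○ Try different values of C = [0.1, 1, 10, 100].

○ Select the model with the best cross-validated score.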

from sklearn.model_selection import GridSearchCV

param_grid = {'C': [0.1, 1, 10, 100]}
grid = GridSearchCV(LogisticRegression(max_iter=1000), param_grid, cv=5)
grid.fit(X_train, y_train)

print(f"Best Parameters: {grid.best_params_}")
print(f"Best CV Accuracy: {grid.best_score_:.2f}")  # mean accuracy across the 5 folds
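The cv=5 argument means each candidate C is trained and scored five times, each time holding out a different fifth of the training data. The sketch below shows the basic fold bookkeeping in plain Python. It is a simplification: the function name is hypothetical, the folds are contiguous and unshuffled, and it ignores the class stratification that scikit-learn applies for classifiers.

```python
def k_fold_indices(n_samples, k=5):
    """Yield (train_idx, val_idx) pairs for k contiguous folds."""
    # Spread any remainder over the first folds so sizes differ by at most one
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, val_idx
        start += size

folds = list(k_fold_indices(12, k=5))
for train_idx, val_idx in folds:
    print(f"train on {len(train_idx)} samples, validate on {val_idx}")
```

Every sample is used for validation exactly once, and the score reported for a candidate is the average over the folds.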
