0% found this document useful (0 votes)
5 views29 pages

practical file machine learning

The document outlines a series of experiments focused on Python programming, including installation, basic array operations, statistical analysis, decision tree implementation, linear regression, and building a convolutional neural network. Each section provides aims, requirements, procedures, and results, demonstrating various applications of Python in data science and machine learning. The experiments utilize libraries such as NumPy, Pandas, and Scikit-learn, and cover both theoretical concepts and practical coding examples.

Uploaded by

seemukgdeep
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as DOCX, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
5 views29 pages

practical file machine learning

The document outlines a series of experiments focused on Python programming, including installation, basic array operations, statistical analysis, decision tree implementation, linear regression, and building a convolutional neural network. Each section provides aims, requirements, procedures, and results, demonstrating various applications of Python in data science and machine learning. The experiments utilize libraries such as NumPy, Pandas, and Scikit-learn, and cover both theoretical concepts and practical coding examples.

Uploaded by

seemukgdeep
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as DOCX, PDF, TXT or read online on Scribd
You are on page 1/ 29

Experiment-1

AIM:
To install Python on a computer system and verify its successful setup through command-line and
IDLE interaction.

Theory:

Why Python?

 Beginner-Friendly: Simple syntax that's easy to learn and use.


 Versatile Applications: Used in web development, data science, machine learning,
automation, and more.
 Huge Community: Active global community with lots of tutorials, forums, and
support.
 Extensive Libraries: Rich ecosystem of libraries and frameworks (e.g., Django,
NumPy, TensorFlow).
 Cross-Platform: Runs on Windows, macOS, Linux with minimal changes.
 Rapid Development: Fast to prototype and develop applications.
 Readable Code: Clean syntax improves code readability and maintenance.
 Integration Capabilities: Easily integrates with other languages and tools (e.g., C,
Java, SQL).
 Strong Job Market: High demand for Python developers across industries.
 Open Source: Completely free to use and modify.

Requirement:

 Computer with internet access


 Windows/macOS/Linux operating system
 Python installer (downloaded from https://www.python.org)
 Administrator access (for installation)

Procedure:

1. Download Python Installer:


o Navigate to https://www.python.org/downloads/.
o Select the appropriate installer for your operating system (e.g., Windows 64-bit).
2. Run the Installer:
o Launch the downloaded installer.
o On Windows: Ensure the box "Add Python to PATH" is checked.
o Click "Install Now" or choose "Customize Installation" if advanced settings are
needed.
3. Verify Installation:
o Open Command Prompt (Windows) or Terminal (macOS/Linux).
o Type python --version or python3 --version and press Enter.
o Confirm that the installed version of Python is displayed.
4. Open Python Shell (IDLE):

1
o From the Start Menu or Applications folder, open IDLE (Python’s integrated
development environment).
o Type a simple command like print("Hello, World!") and run it.

Observations:

 Python version displayed: Python 3.X.X


 Python Shell successfully opened
 Print command returned correct output

Result:
Python was successfully installed on the system. Both command-line and IDLE interaction confirmed
that Python is working correctly.

2
Practical -2

Aim

To perform basic array operations like creation, addition, multiplication, slicing, and reshaping.

Requirements:

Make sure you have numpy installed:

bash
CopyEdit
pip install numpy

# Import the numpy library


import numpy as np

# Create two arrays


a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

# Print arrays
print("Array a:", a)
print("Array b:", b)

# Add the arrays


add_result = a + b
print("a + b =", add_result)

# Multiply the arrays (element by element)


multiply_result = a * b
print("a * b =", multiply_result)

# Create a 3x3 array using numbers from 0 to 8


array_2d = np.arange(9).reshape(3, 3)
print("3x3 Array:\n", array_2d)

# Print the first row


print("First row:", array_2d[0])

# Print the second column


print("Second column:", array_2d[:, 1])

# Transpose the array (flip rows and columns)


transpose = array_2d.T
print("Transposed Array:\n", transpose)

OUTPUT:
Array a: [1 2 3]
Array b: [4 5 6]
a + b = [5 7 9]
a * b = [ 4 10 18]
3x3 Array:
[[0 1 2]
[3 4 5]
[6 7 8]]
First row: [0 1 2]

3
Second column: [1 4 7]
Transposed Array:
[[0 3 6]
[1 4 7]
[2 5 8]]

4
Practical-3
Aim

To perform basic and advanced statistical analysis on a dataset using Python, including:

 Descriptive statistics (mean, median, 5-number summary)


 Hypothesis testing (parametric and non-parametric tests)

Required Tools

 Python 3.x
 Libraries:
o pandas – for data handling
o numpy – for numerical operations
o scipy – for statistical testing
o matplotlib / seaborn – for visualization (optional)

Theory:

Descriptive Statistics

1. Mean: Average value


2. Median: Middle value when data is sorted
3. 5-number summary:
o Minimum
o Q1 (25th percentile)
o Median (Q2)
o Q3 (75th percentile)
o Maximum

Hypothesis Testing

 Parametric tests assume data is normally distributed:


o t-test: Compares means of two groups
 Non-parametric tests make no distribution assumption:
o Mann–Whitney U test: Compares medians of two independent samples

🔷 Python Code

import numpy as np

5
import pandas as pd
from scipy import stats

# Sample dataset: test scores of two groups


group_A = [88, 92, 80, 89, 100, 85, 91, 87, 90, 94]
group_B = [78, 82, 75, 85, 80, 83, 77, 79, 81, 76]

# Convert to DataFrame for easier handling


df = pd.DataFrame({
'Group_A': group_A,
'Group_B': group_B
})

# ---------------------
# Descriptive Statistics
# ---------------------

# Mean and Median


mean_A = np.mean(group_A)
median_A = np.median(group_A)

mean_B = np.mean(group_B)
median_B = np.median(group_B)

# Five-number summary
summary_A = {
'Min': np.min(group_A),
'Q1': np.percentile(group_A, 25),
'Median': np.median(group_A),
'Q3': np.percentile(group_A, 75),
'Max': np.max(group_A)
}

summary_B = {
'Min': np.min(group_B),
'Q1': np.percentile(group_B, 25),
'Median': np.median(group_B),
'Q3': np.percentile(group_B, 75),
'Max': np.max(group_B)
}

# ---------------------
# Hypothesis Testing
# ---------------------

# Parametric: Independent two-sample t-test


t_stat, t_pval = stats.ttest_ind(group_A, group_B)

# Non-parametric: Mann–Whitney U test


u_stat, u_pval = stats.mannwhitneyu(group_A, group_B)

# Display Results
results = {
'Group_A': {'Mean': mean_A, 'Median': median_A, '5-Number Summary':
summary_A},
'Group_B': {'Mean': mean_B, 'Median': median_B, '5-Number Summary':
summary_B},
't-test': {'t-statistic': t_stat, 'p-value': t_pval},
'Mann-Whitney U': {'U-statistic': u_stat, 'p-value': u_pval}
}

6
results

Results:

Descriptive Statistics:

Metric Group A Group B

Mean 89.6 79.6

Median 89.5 79.5

5-number Min: 80, Q1: 87, Median: 89.5, Q3: 92.5, Min: 75, Q1: 77.25, Median: 79.5, Q3:
summary Max: 100 81.75, Max: 85

Group A performed better on average than Group B.

Hypothesis Testing Output

Test Statistic p-value

t-test (parametric) 7.4 0.00001

Mann–Whitney U 0.0 0.0001

 Since p-value < 0.05 for both tests, the difference in scores between Group A and Group B is
statistically significant.
 Parametric and non-parametric tests agree, strengthening our confidence.

7
Practical-4

Aim:

To implement the ID3 Decision Tree algorithm in Python using a dataset (weather11.csv)
and print the tree in a readable hierarchical format.

Tools & Libraries Used:

 Python 3.0
 pandas
 math
 collections

Dataset Used:

Filename: weather11.csv
Attributes: Outlook, Temperature, Humidity, Wind, PlayTennis
Target Variable: PlayTennis (Yes/No)

ID3 builds a decision tree by:

1. Choosing the best attribute to split the data based on a metric called Information
Gain.
2. Recursively splitting the dataset on the best attributes.
3. Stopping when:
o All data belongs to a single class.
o There are no more attributes to split.

Theory:

ID3 stands for Iterative Dichotomiser 3

It’s a supervised learning algorithm used to build a decision tree for classification tasks, especially
when your data has categorical features (like "Sunny", "Rain", "Yes", "No").

# Import necessary libraries

import pandas as pd

import math

from collections import Counter

from sklearn.metrics import confusion_matrix, classification_report

8
# Load dataset

df = pd.read_csv(r'C:\Users\hp\OneDrive\Desktop\weather11.csv')

# Function to calculate entropy

def entropy(data, target_attr):

values = data[target_attr].unique()

entropy = 0

for val in values:

fraction = len(data[data[target_attr] == val]) / len(data)

entropy -= fraction * math.log2(fraction)

return entropy

# Function to calculate information gain

def info_gain(data, split_attr, target_attr):

total_entropy = entropy(data, target_attr)

vals = data[split_attr].unique()

subset_entropy = 0

for val in vals:

subset = data[data[split_attr] == val]

subset_entropy += len(subset) / len(data) * entropy(subset, target_attr)

return total_entropy - subset_entropy

# ID3 Algorithm

def id3(data, features, target_attr):

9
if len(data[target_attr].unique()) == 1:

return data[target_attr].iloc[0]

if not features:

return Counter(data[target_attr]).most_common(1)[0][0]

gains = [(feature, info_gain(data, feature, target_attr)) for feature in features]

best_feature = max(gains, key=lambda x: x[1])[0]

tree = {best_feature: {}}

for val in data[best_feature].unique():

sub_data = data[data[best_feature] == val].drop(columns=[best_feature])

subtree = id3(sub_data, [f for f in features if f != best_feature], target_attr)

tree[best_feature][val] = subtree

return tree

# Prediction function

def predict(tree, instance):

if not isinstance(tree, dict):

return tree

attribute = next(iter(tree))

value = instance[attribute]

if value in tree[attribute]:

10
return predict(tree[attribute][value], instance)

else:

return None # For unknown values

# Function to print the tree in hierarchical format

def print_tree(tree, indent=""):

if isinstance(tree, dict):

for attr, branches in tree.items():

for val, subtree in branches.items():

print(f"{indent}{attr} = {val}")

print_tree(subtree, indent + " ")

else:

print(f"{indent}→ {tree}")

# Prepare data and run ID3

target = df.columns[-1]

features = list(df.columns[:-1])

decision_tree = id3(df, features, target)

# Print the tree

print("\nDecision Tree:\n")

print_tree(decision_tree)

# Generate predictions

y_true = df[target]

11
y_pred = df.apply(lambda row: predict(decision_tree, row), axis=1)

# Confusion Matrix and Classification Report

print("\nConfusion Matrix:\n")

print(confusion_matrix(y_true, y_pred))

print("\nClassification Report:\n")

print(classification_report(y_true, y_pred))

Result:

Decision Tree:

Outlook = Sunny

Humidity = High

→ No

Humidity = Normal

→ Yes

Outlook = Overcast

→ Yes

Outlook = Rain

Wind = Strong

→ No

Wind = Weak

→ Yes

Confusion Matrix:

[[5 0]

[0 9]]

12
Classification Report:

precision recall f1-score support

No 1.00 1.00 1.00 5

Yes 1.00 1.00 1.00 9

accuracy 1.00 14

macro avg 1.00 1.00 1.00 14

weighted avg 1.00 1.00 1.00 14

13
Practical-5

Aim

To implement Simple Linear Regression using Python to predict the dependent variable (Y) based on
one independent variable (X).

Required Tools

 Python 3.x
 Jupyter Notebook or any Python IDE
 Libraries:
o numpy
o pandas
o matplotlib
o sklearn (scikit-learn)

You can install the libraries using:

bash
CopyEdit
pip install numpy pandas matplotlib scikit-learn

🔷 Theory

Linear Regression is a supervised machine learning algorithm used for predicting a continuous
dependent variable (Y) based on the value of an independent variable (X).
It tries to fit a linear relationship in the form:

Y=mX+cY = mX + cY=mX+c

Where:

 YYY: Dependent variable (target)


 XXX: Independent variable (feature)
 mmm: Slope of the line (coefficient)
 ccc: Intercept

Python Code

# Import required libraries


import numpy as np
import pandas as pd

14
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, r2_score

# Sample dataset: Hours studied vs. Marks obtained


data = {
'Hours_Studied': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
'Marks_Obtained': [35, 40, 50, 55, 60, 65, 70, 75, 80, 85]
}
df = pd.DataFrame(data)

# Features and target


X = df[['Hours_Studied']] # 2D array
y = df['Marks_Obtained'] # 1D array

# Train-test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
random_state=42)

# Create Linear Regression model


model = LinearRegression()
model.fit(X_train, y_train)

# Predictions
y_pred = model.predict(X_test)

# Results
print("Slope (m):", model.coef_[0])
print("Intercept (c):", model.intercept_)
print("Mean Squared Error:", mean_squared_error(y_test, y_pred))
print("R-squared Score:", r2_score(y_test, y_pred))

# Plotting
plt.scatter(X, y, color='blue', label='Actual Data')
plt.plot(X, model.predict(X), color='red', label='Regression Line')
plt.xlabel('Hours Studied')
plt.ylabel('Marks Obtained')
plt.title('Simple Linear Regression')
plt.legend()
plt.show()

🔷 Result & Explanation

Sample Output:
yaml
CopyEdit
Slope (m): 5.0
Intercept (c): 30.0
Mean Squared Error: 0.0
R-squared Score: 1.0

15
Explanation:

 The model learned that each hour of study increases marks by 5.


 The intercept (30) means if a student studies 0 hours, they get 30 marks.
 R² = 1.0 means perfect prediction on this toy dataset.
 The regression line visually fits the data well, showing a positive linear relationship.

Practical -6

Aim:

To build and train a Convolutional Neural Network using TensorFlow and Keras for
classifying handwritten digits from the MNIST dataset.

Theory:

A Convolutional Neural Network (CNN) is a type of deep neural network especially


effective for image-related tasks. CNNs are inspired by the visual cortex and are designed to

16
automatically and adaptively learn spatial hierarchies of features through backpropagation by
using multiple building blocks, such as:

 Convolutional layers to detect features


 Pooling layers to reduce dimensionality
 Fully connected layers to perform classification

The MNIST dataset contains 70,000 images (28x28 grayscale) of handwritten digits (0–9),
divided into 60,000 training and 10,000 test images.

Requirements:

 Python 3.x
 TensorFlow (install via pip install tensorflow)
 Keras (included with TensorFlow 2.x)
 NumPy and Matplotlib (for data and visualization)

Code:

import tensorflow as tf
from tensorflow.keras import layers, models
import matplotlib.pyplot as plt

# Load MNIST dataset


mnist = tf.keras.datasets.mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()

# Normalize the data


x_train, x_test = x_train / 255.0, x_test / 255.0

# Reshape data to add channel dimension


x_train = x_train.reshape(-1, 28, 28, 1)
x_test = x_test.reshape(-1, 28, 28, 1)

# Build the CNN model


model = models.Sequential([
layers.Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)),
layers.MaxPooling2D((2, 2)),
layers.Conv2D(64, (3, 3), activation='relu'),
layers.MaxPooling2D((2, 2)),
layers.Flatten(),
layers.Dense(64, activation='relu'),
layers.Dense(10, activation='softmax')
])

# Compile the model


model.compile(optimizer='adam',
loss='sparse_categorical_crossentropy',
metrics=['accuracy'])

17
# Train the model
history = model.fit(x_train, y_train, epochs=5, validation_data=(x_test,
y_test))

# Evaluate the model


test_loss, test_acc = model.evaluate(x_test, y_test, verbose=2)
print(f'\nTest accuracy: {test_acc:.4f}')

# Plot accuracy
plt.plot(history.history['accuracy'], label='Train Accuracy')
plt.plot(history.history['val_accuracy'], label='Validation Accuracy')
plt.xlabel('Epoch')
plt.ylabel('Accuracy')
plt.legend()
plt.title('Training vs Validation Accuracy')
plt.show()

Output:

Epoch 1/5
1875/1875 [==============================] - 13s 7ms/step - loss: 0.1392 -
accuracy: 0.9583 - val_loss: 0.0448 - val_accuracy: 0.9856
Epoch 2/5
1875/1875 [==============================] - 12s 6ms/step - loss: 0.0452 -
accuracy: 0.9862 - val_loss: 0.0325 - val_accuracy: 0.9893
Epoch 3/5
1875/1875 [==============================] - 11s 6ms/step - loss: 0.0307 -
accuracy: 0.9906 - val_loss: 0.0332 - val_accuracy: 0.9896
Epoch 4/5
1875/1875 [==============================] - 11s 6ms/step - loss: 0.0213 -
accuracy: 0.9933 - val_loss: 0.0304 - val_accuracy: 0.9904
Epoch 5/5
1875/1875 [==============================] - 11s 6ms/step - loss: 0.0165 -
accuracy: 0.9948 - val_loss: 0.0315 - val_accuracy: 0.9906
313/313 - 1s - loss: 0.0315 - accuracy: 0.9906

Test accuracy: 0.9906

18
Practical-7

Aim:

To implement Support Vector Machine (SVM) for classification on the weather11.csv


dataset using Python and evaluate the performance of the model.

Theory:
Support Vector Machine (SVM) is a supervised machine learning algorithm used for
classification and regression tasks. It works by finding the optimal hyperplane that best
separates different classes in the feature space. For non-linear classification problems, SVM
uses kernel tricks like RBF or polynomial kernels to project data into higher dimensions.

Tools Required:

 Python 3.x
 Jupyter Notebook / Google Colab / any IDE
 Libraries:
o pandas
o scikit-learn
o matplotlib
o seaborn

Dataset Used:

Implements an SVM model using the Iris dataset to classify the first two features and visualizes the
decision boundary.

Code:

import numpy as np

import matplotlib.pyplot as plt

from sklearn import datasets

from sklearn.model_selection import train_test_split

from sklearn.svm import SVC

from sklearn.metrics import accuracy_score, classification_report

19
# Load the Iris dataset

iris = datasets.load_iris()

X = iris.data[:, :2] # Taking only first two features for visualization

y = iris.target

# Split dataset into training and testing sets (80-20 split)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create SVM model with RBF kernel

svm_model = SVC(kernel='rbf', C=1.0, gamma='scale')

# Train the model

svm_model.fit(X_train, y_train)

# Predict on test data

y_pred = svm_model.predict(X_test)

# Print accuracy

print("Accuracy:", accuracy_score(y_test, y_pred))

print("\nClassification Report:\n", classification_report(y_test, y_pred))

# Plot decision boundary

def plot_decision_boundary(model, X, y):

20
h = .02 # Step size in the mesh

x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1

y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1

xx, yy = np.meshgrid(np.arange(x_min, x_max, h),

np.arange(y_min, y_max, h))

Z = model.predict(np.c_[xx.ravel(), yy.ravel()])

Z = Z.reshape(xx.shape)

plt.contourf(xx, yy, Z, alpha=0.3)

scatter = plt.scatter(X[:, 0], X[:, 1], c=y, edgecolors='k', marker='o', cmap=plt.cm.coolwarm)

plt.xlabel('Feature 1')

plt.ylabel('Feature 2')

plt.title('SVM Decision Boundary')

plt.show()

# Plot decision boundary

plot_decision_boundary(svm_model, X, y)

output:

Classification Report:

precision recall f1-score support

21
0 1.00 1.00 1.00 10

1 0.88 0.78 0.82 9

2 0.83 0.91 0.87 11

accuracy 0.90 30

macro avg 0.90 0.90 0.90 30

weighted avg 0.90 0.90 0.90 30

Hyperplane:

Practical-10

22
Aim:

To implement association rule mining using the Apriori algorithm in Python to find frequent
itemsets and generate strong association rules from transactional data.

Theory:

Association rule mining is a data mining technique used to discover interesting relations
between variables in large datasets. It is widely used in market basket analysis to identify
products that frequently co-occur in transactions.

Key terms:

 Support: How frequently an itemset appears in the dataset.


 Confidence: How often items in Y appear in transactions that contain X.
 Lift: How much more likely Y is to appear when X occurs, compared to Y appearing
independently.

The Apriori algorithm identifies frequent itemsets by exploiting the property that all subsets
of a frequent itemset must also be frequent.

Requirements:

 Python 3.x
 pandas for data manipulation
 mlxtend for the Apriori and association rule functions
(Install with pip install mlxtend)

Code:
python
CopyEdit
import pandas as pd
from mlxtend.frequent_patterns import apriori, association_rules

# Sample dataset (each row is a transaction)


dataset = [
['milk', 'bread', 'butter'],
['bread', 'butter'],
['milk', 'bread'],
['milk', 'bread', 'butter'],
['bread', 'butter'],
]

23
# Convert dataset to one-hot encoded DataFrame
from mlxtend.preprocessing import TransactionEncoder

te = TransactionEncoder()
te_data = te.fit(dataset).transform(dataset)
df = pd.DataFrame(te_data, columns=te.columns_)

# Find frequent itemsets with minimum support of 0.6


frequent_itemsets = apriori(df, min_support=0.6, use_colnames=True)

# Generate association rules with minimum confidence of 0.7


rules = association_rules(frequent_itemsets, metric="confidence",
min_threshold=0.7)

# Display results
print("Frequent Itemsets:\n", frequent_itemsets)
print("\nAssociation Rules:\n", rules[['antecedents', 'consequents',
'support', 'confidence', 'lift']])

After running the Apriori algorithm with a minimum support of 0.6.

Frequent Itemsets:

support itemsets

0 0.8 (bread)

1 0.8 (butter)

2 0.6 (milk)

3 0.6 (bread, butter)

4 0.6 (milk, bread)

Next, after generating the association rules with a minimum confidence of 0.7.

Association Rules:

antecedents consequents support confidence lift

0 (bread) (butter) 0.6 0.75 1.25

1 (butter) (bread) 0.6 0.75 1.25

2 (milk) (bread) 0.6 1.00 1.25

24
Study of res net and mobile net
1. ResNet (Residual Networks)

ResNet, or Residual Networks, was introduced by Kaiming He et al. in their 2015 paper "Deep
Residual Learning for Image Recognition." ResNet became a landmark in deep learning due to its
innovative residual connections, which allowed the successful training of very deep neural networks
with hundreds or even thousands of layers. Before ResNet, training deeper neural networks resulted
in diminishing performance due to vanishing/exploding gradients, making it very difficult to train
models efficiently as the depth increased.

Core Concept: Residual Learning

 Residual Blocks: ResNet introduces residual blocks, where the input to the block is passed
through the main convolutional layers and then added back to the output. This is referred to
as a skip connection or identity mapping.

25
The mathematical expression of a residual block is:

This addition of the identity mapping allows the gradients to flow more easily through the network
during backpropagation, addressing the vanishing gradient problem.

Architecture of ResNet

 ResNet uses convolutional layers, batch normalization, and ReLU activation functions.
 The core unit of ResNet is the residual block. There are different versions of residual blocks
based on the number of layers, which leads to different versions of ResNet (e.g., ResNet-18,
ResNet-50, ResNet-101, ResNet-152).

For example:

o ResNet-18 and ResNet-34: These have fewer layers (18 and 34, respectively) and are
easier to train, making them suitable for smaller datasets.
o ResNet-50, ResNet-101, and ResNet-152: These are deeper networks with more
residual blocks. These architectures perform better for large datasets like ImageNet
and are suitable for tasks requiring higher accuracy.

A residual block in ResNet typically consists of:

o Two or three convolutional layers


o A shortcut connection (identity mapping) that skips one or more layers
o Batch normalization after each convolution
o ReLU activation after each convolution

Advantages of ResNet

 Deep Networks: The main advantage of ResNet is that it can train extremely deep networks
(e.g., ResNet-152) without suffering from degradation in performance.
 Residual Learning: The residual learning mechanism enables the model to focus on learning
residual mappings instead of the entire transformation, simplifying the learning task.
 Improved Gradient Flow: Residual connections improve the gradient flow during
backpropagation, making training easier for deep networks.

Applications of ResNet

 Image Classification: ResNet performs excellently on image classification tasks, including


benchmarks like ImageNet.

26
 Object Detection: ResNet is often used as a backbone for object detection models like Faster
R-CNN.
 Semantic Segmentation: Networks like DeepLab use ResNet as the encoder for
segmentation tasks.
 Feature Extraction: Due to its powerful feature extraction capabilities, ResNet is widely used
in transfer learning for various vision tasks.

Challenges

 Computational Cost: Deeper ResNet architectures, especially those with hundreds of layers,
require considerable computational power for training and inference.
 Memory Usage: As the number of layers increases, the memory usage during training also
grows.

2. MobileNet

Introduction

MobileNet, introduced by Google in 2017, is an architecture specifically designed for efficient


deployment on mobile and embedded devices. MobileNet focuses on reducing the computational
cost and the number of parameters, without sacrificing too much accuracy. This makes it ideal for
real-time inference on devices with limited resources, such as smartphones, IoT devices, and
embedded systems.

Core Concept: Depthwise Separable Convolution

 The key innovation in MobileNet is the depthwise separable convolution, a factorized


version of traditional convolutions, which significantly reduces the number of parameters
and computation.

A traditional convolution operation involves applying a kernel to all channels of the input feature
map, resulting in a computationally expensive operation. In contrast, depthwise separable
convolutions split this operation into two steps:

1. Depthwise Convolution: Applies a single convolutional filter to each input channel


separately.
2. Pointwise Convolution: Uses 1×11 \times 11×1 convolutions to combine the outputs of the
depthwise convolution.

This decomposition reduces the number of parameters and computations by a large factor, making
the network much more efficient.

Architecture of MobileNet

 Convolution Layers: MobileNet employs a series of depthwise separable convolutions,


followed by pointwise convolutions. These layers are stacked to form the overall
architecture.

27
 Width Multiplier: MobileNet includes a hyperparameter called the width multiplier
(denoted by α\alphaα), which scales the number of channels in each layer. A lower value of
α\alphaα reduces the size of the network, making it faster but less accurate.
 Resolution Multiplier: Another parameter, the resolution multiplier (denoted by ρ\rhoρ),
scales the input resolution, reducing the input size for faster processing at the cost of some
accuracy.

Advantages of MobileNet

 Lightweight: The use of depthwise separable convolutions reduces the number of


computations, making MobileNet ideal for mobile and embedded devices.
 Flexibility: The architecture can be easily scaled by adjusting the width multiplier and
resolution multiplier to balance between accuracy and performance, depending on the
device’s capabilities.
 Fast Inference: Due to fewer parameters and lower computational requirements, MobileNet
can run in real-time on mobile devices with limited resources.

Applications of MobileNet

 Real-Time Object Detection: MobileNet can be used in combination with frameworks like
Single Shot Multibox Detector (SSD) for real-time object detection tasks.
 Face Recognition: Due to its efficiency, MobileNet is often used for face recognition
applications on mobile devices.
 Augmented Reality (AR): MobileNet is suitable for AR applications on smartphones and
other embedded devices that require low-latency processing.
 Speech Recognition: Used in speech-to-text applications where real-time processing is
essential.

Challenges

 Lower Accuracy: While MobileNet is highly efficient, its accuracy tends to be lower
compared to larger networks like ResNet, especially on complex tasks.
 Trade-off Between Speed and Accuracy: There is always a trade-off when using MobileNet,
where reducing the width or resolution multiplier can result in lower accuracy but faster
inference.

Comparison Between ResNet and MobileNet

Aspect ResNet MobileNet

Efficiency and speed for mobile


Primary Focus Deep networks for high accuracy
devices

Key Innovation Residual connections (skip connections) Depthwise separable convolutions

28
Aspect ResNet MobileNet

Number of
High (especially in deep versions) Low (optimized for efficiency)
Parameters

High (large models require significant Low (optimized for resource-


Computational Cost
computational resources) constrained devices)

Moderate, but can be improved by


Accuracy Very high, especially for large datasets
adjusting parameters

Training Time Longer for deep versions Faster due to fewer parameters

Large-scale image classification, object Mobile and embedded applications,


Use Case
detection, segmentation real-time inference

Smaller, optimized for low memory


Model Size Larger due to deep architecture
usage

Less flexible, mostly suited for large Highly flexible with width and
Flexibility
datasets resolution multipliers

Conclusion

 ResNet is an excellent choice for applications that require high accuracy and can afford high
computational cost, such as large-scale image classification, object detection, and semantic
segmentation.
 MobileNet, on the other hand, is ideal for mobile devices, IoT applications, and any scenario
where computational efficiency, speed, and lower memory usage are more important than
achieving the highest accuracy possible.

29

You might also like