Week 3 - Machine Learning

This document outlines the fundamentals of machine learning, including its definitions, types, and workflows, emphasizing its integration with data science. It explains key concepts such as supervised, unsupervised, and reinforcement learning, as well as essential terminologies like features, labels, and model evaluation metrics. Additionally, it introduces Python libraries NumPy and Pandas for data manipulation and provides sample code for practical applications.

Uploaded by

UK Creation

Week 3: Machine Learning Fundamentals

Diploma in Computer Science & Engineering – Course Code: 20CS51I

Learning Objectives

This week introduces the core principles of machine learning and its integration with data
science. Students will explore how machines learn from data, understand the structure of ML
workflows, and become familiar with essential terminology used in academic and industry
settings.

What Is Machine Learning?

Machine Learning is a branch of Artificial Intelligence that enables systems to learn from
data and improve performance over time without being explicitly programmed. It powers
intelligent applications like recommendation engines, fraud detection systems, autonomous
vehicles, and chatbots.

In real-world scenarios, machine learning helps Netflix suggest movies, Google Maps predict
traffic, and platforms like CyberRaksha Grid detect cyber fraud patterns. These systems rely
on historical data and algorithms to make predictions or decisions.

Types of Machine Learning

Machine learning is broadly categorized into three types:

Supervised Learning involves training models on labeled data, where both inputs and
expected outputs are known. It’s commonly used for tasks like spam detection, disease
prediction, and price forecasting.
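
As a rough illustration of learning from labeled data, the sketch below classifies a new point by copying the label of its nearest training example (a 1-nearest-neighbour rule). The data set, the feature meanings, and the function names are all hypothetical, chosen only to keep the example small.

```python
# Hypothetical labeled data: (hours studied, hours slept) -> exam result.
training_inputs = [(1.0, 4.0), (2.0, 5.0), (6.0, 7.0), (8.0, 8.0)]
training_labels = ["fail", "fail", "pass", "pass"]


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def predict(point):
    """Return the label of the training example closest to `point`."""
    distances = [squared_distance(point, example) for example in training_inputs]
    nearest_index = distances.index(min(distances))
    return training_labels[nearest_index]


print(predict((7.0, 6.0)))  # nearest example is (6.0, 7.0), labeled "pass"
print(predict((1.5, 4.5)))  # nearest examples are both labeled "fail"
```

The inputs and their known outputs are what make this supervised: the model never sees a rule for passing, only examples of it.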

Unsupervised Learning works with unlabeled data to uncover hidden patterns or groupings.
It’s useful for customer segmentation, anomaly detection, and market basket analysis.
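
To make the idea of finding groupings without labels concrete, here is a minimal sketch of k-means clustering with two clusters on one-dimensional data. The spending figures and the function name two_means are assumptions for illustration, and the code assumes the data contains at least two distinct values.

```python
# Hypothetical unlabeled data: monthly spend of eight customers.
spend = [12.0, 15.0, 14.0, 11.0, 95.0, 102.0, 99.0, 110.0]


def two_means(values, iterations=10):
    """Split 1-D values into two groups with a basic k-means loop (k = 2)."""
    low_center, high_center = min(values), max(values)  # simple starting guess
    for _ in range(iterations):
        # Assignment step: each value joins the nearer center.
        low_group = [v for v in values if abs(v - low_center) <= abs(v - high_center)]
        high_group = [v for v in values if abs(v - low_center) > abs(v - high_center)]
        # Update step: move each center to the mean of its group.
        low_center = sum(low_group) / len(low_group)
        high_center = sum(high_group) / len(high_group)
    return low_group, high_group


low_spenders, high_spenders = two_means(spend)
print("Low spenders:", low_spenders)
print("High spenders:", high_spenders)
```

No customer was ever marked as a low or high spender; the two segments emerge from the values alone.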

Reinforcement Learning teaches agents to make decisions by interacting with an
environment and receiving feedback in the form of rewards or penalties. This approach is
popular in robotics, gaming, and autonomous navigation.
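
The reward-feedback loop can be sketched with a very small agent that chooses between two actions. This is an epsilon-greedy strategy, one simple approach among many; the fixed reward values, the exploration rate, and the variable names are assumptions made for the example.

```python
import random

# Hypothetical environment: two actions with fixed rewards the agent does not know in advance.
REWARDS = [1.0, 5.0]

rng = random.Random(0)   # seeded so the run is repeatable
estimates = [0.0, 0.0]   # the agent's current estimate of each action's reward
counts = [0, 0]          # how often each action has been tried
epsilon = 0.2            # chance of exploring instead of exploiting

for step in range(200):
    if step < len(REWARDS):
        action = step                             # try every action once first
    elif rng.random() < epsilon:
        action = rng.randrange(len(REWARDS))      # explore: pick a random action
    else:
        action = estimates.index(max(estimates))  # exploit: best estimate so far
    reward = REWARDS[action]                      # feedback from the environment
    counts[action] += 1
    # Move the estimate toward the observed reward (running average).
    estimates[action] += (reward - estimates[action]) / counts[action]

print("Estimated rewards:", estimates)
print("Times each action was chosen:", counts)
```

After a few steps the agent has learned which action pays more and chooses it most of the time, while still exploring occasionally.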

Machine Learning Workflow

A machine learning project typically follows a structured workflow that transforms raw data into
actionable insights. It begins with defining the problem and collecting relevant data from
sources like sensors, APIs, or logs. The data is then cleaned and transformed to prepare it for
modeling.

Choosing the right algorithm is crucial. Models are trained to learn patterns from the data and
then evaluated using performance metrics. Once validated, the model is deployed into a real-
world application and monitored for accuracy and reliability. Retraining may be necessary as
new data becomes available.

Data Science and Its Role
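
The stages described above can be walked through in miniature. The sketch below uses made-up sensor readings and a deliberately simple threshold model; the data, the 75/25 split, and all names are assumptions for illustration rather than a recommended design.

```python
import random

# 1. Collect: hypothetical (temperature, overheated?) readings; None marks a failed read.
raw = [(61, 0), (65, 0), (None, 0), (70, 0), (72, 0),
       (88, 1), (91, 1), (None, 1), (95, 1), (99, 1)]

# 2. Clean: drop readings with a missing temperature.
clean = [(temp, label) for temp, label in raw if temp is not None]

# 3. Split: shuffle with a fixed seed, then hold out 25% for testing.
random.Random(0).shuffle(clean)
cut = int(len(clean) * 0.75)
train, test = clean[:cut], clean[cut:]


def mean(values):
    return sum(values) / len(values)


# 4. Train: place the threshold halfway between the two class means.
normal_mean = mean([temp for temp, label in train if label == 0])
overheated_mean = mean([temp for temp, label in train if label == 1])
threshold = (normal_mean + overheated_mean) / 2


def predict(temp):
    """Predict 1 (overheated) when the temperature exceeds the learned threshold."""
    return 1 if temp > threshold else 0


# 5. Evaluate: fraction of held-out readings predicted correctly.
accuracy = mean([1 if predict(temp) == label else 0 for temp, label in test])
print("Threshold:", threshold)
print("Test accuracy:", accuracy)
```

Deployment and monitoring are left out here; in practice the trained threshold would be saved and the accuracy tracked as new readings arrive.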

Data science is the foundation that supports machine learning. It combines statistical analysis,
programming, and domain expertise to extract meaningful insights from data. A data
scientist’s workflow includes data wrangling, visualization, modeling, and storytelling.

Python is one of the most widely used languages in data science. Libraries like Pandas help
manipulate data, Matplotlib and Seaborn enable visualization, and Scikit-learn provides tools
for building and evaluating machine learning models.

Data science ensures that the data fed into ML models is clean, relevant, and structured for
optimal learning.

ML Pipeline: From Data to Deployment

A machine learning pipeline is a step-by-step process that automates the journey from raw
data to deployed model. It typically starts with data engineering, where missing values are
handled, categories are encoded, and features are scaled.
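
The three data engineering steps named above can be sketched with Pandas as follows. The column names and values are hypothetical, and mean imputation, one-hot encoding, and min-max scaling are just one common choice for each step.

```python
import pandas as pd

# Hypothetical raw data with one missing age and a text category.
df = pd.DataFrame({
    "age": [20.0, None, 40.0, 60.0],
    "city": ["Mysuru", "Hubli", "Mysuru", "Hubli"],
})

# Handle missing values: fill the gap with the column mean.
df["age"] = df["age"].fillna(df["age"].mean())

# Encode categories: one 0/1 indicator column per city.
df = pd.get_dummies(df, columns=["city"], dtype=int)

# Scale features: min-max scaling maps age onto the range [0, 1].
age_min, age_max = df["age"].min(), df["age"].max()
df["age_scaled"] = (df["age"] - age_min) / (age_max - age_min)

print(df)
```

The resulting table is entirely numeric, which is the form most modeling libraries expect.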

Modeling involves selecting and training algorithms on the processed data. Once the model
performs well, it’s deployed using tools like Flask or FastAPI. For scalability, Docker
containers or cloud platforms such as AWS or Azure are used.

This pipeline ensures reproducibility, efficiency, and scalability in real-world ML
applications.

Essential Terminology

Understanding machine learning requires fluency in key terms:

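• A feature is an input variable used for prediction, such as age or income.
• A label is the output variable the model learns to predict, like “spam” or “not spam.”
• The training set is the portion of data used to teach the model.
• The test set is held-out data that evaluates how well the model generalizes to new data.
• Overfitting occurs when a model memorizes training data but fails on unseen data.
• Underfitting happens when a model is too simple to capture patterns in the data.
• Accuracy, precision, and recall are metrics used to assess model performance. Accuracy is
  the share of all predictions that are correct, precision is the share of predicted
  positives that are truly positive, and recall is the share of actual positives that the
  model finds.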

These terms form the vocabulary of every data scientist and ML engineer.
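
The three evaluation metrics can be computed by hand from a list of true labels and predictions. The sketch below assumes a binary spam task with made-up labels; the variable names are illustrative.

```python
# Hypothetical true labels and model predictions (1 = spam, 0 = not spam).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

pairs = list(zip(y_true, y_pred))
tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # spam correctly flagged
fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # normal mail wrongly flagged
fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # spam that slipped through
tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # normal mail correctly passed

accuracy = (tp + tn) / len(pairs)  # correct predictions out of all predictions
precision = tp / (tp + fp)         # of the mail flagged as spam, how much was spam
recall = tp / (tp + fn)            # of the actual spam, how much was flagged

print("Accuracy:", accuracy)    # 0.8
print("Precision:", precision)  # 0.75
print("Recall:", recall)        # 0.75
```

Accuracy alone can mislead when one class is rare, which is why precision and recall are usually reported alongside it.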

Suggested Activities

Students can deepen their understanding through hands-on practice and reflection.
Comparing supervised and unsupervised learning with real-world examples helps clarify their
differences. Sketching a machine learning pipeline diagram reinforces the workflow stages.
Creating a glossary of ML terms with definitions and examples builds technical fluency.
Using Scikit-learn to train a basic classifier and visualizing data distributions with Seaborn
provides practical exposure. Splitting data into training and test sets and evaluating model
accuracy completes the learning loop.

Python NumPy & Pandas – Notes for Beginners

What is NumPy?

NumPy (Numerical Python) is a Python library used for numerical computations. It
provides support for:

• Multi-dimensional arrays and matrices
• Mathematical functions like sum, mean, dot product, etc.
• Efficient operations on large datasets

Key Features

• Fast array operations
• Broadcasting (arithmetic between arrays of different but compatible shapes)
• Linear algebra support
• Random number generation
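
A short sketch of three of these features follows. The arrays, the system of equations, and the seed value are arbitrary choices made for the example.

```python
import numpy as np

# Broadcasting: the 1-D row is added to every row of the 2-D array.
matrix = np.array([[1, 2, 3], [4, 5, 6]])
row = np.array([10, 20, 30])
print(matrix + row)  # [[11 22 33], [14 25 36]]

# Linear algebra: solve the system 2x + y = 5 and x + 3y = 10.
coefficients = np.array([[2.0, 1.0], [1.0, 3.0]])
constants = np.array([5.0, 10.0])
solution = np.linalg.solve(coefficients, constants)
print(solution)  # x = 1, y = 3 (up to floating-point rounding)

# Random number generation: a seeded generator gives repeatable draws.
rng = np.random.default_rng(seed=42)
samples = rng.integers(low=1, high=7, size=5)  # five dice rolls, each from 1 to 6
print(samples)
```

Note that the upper bound passed to rng.integers is exclusive, so high=7 produces values up to 6.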

What is Pandas?

Pandas is a Python library for data manipulation and analysis. It provides two main data
structures:

• Series: One-dimensional labeled array
• DataFrame: Two-dimensional labeled data table (like Excel)
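
The relationship between the two structures can be seen in a few lines. The names and marks below are sample values chosen for illustration.

```python
import pandas as pd

# A Series: one column of values, each with a label (the index).
marks = pd.Series([85, 92, 78], index=["Alice", "Bob", "Charlie"])
print(marks["Bob"])  # look up a value by its label

# A DataFrame: several named columns sharing one index.
table = pd.DataFrame({"Marks": marks, "Passed": marks >= 80})
print(table)

# Selecting a single column of a DataFrame gives back a Series.
print(type(table["Marks"]).__name__)
```

A DataFrame can therefore be thought of as a collection of Series that share the same row labels.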

Key Features

• Easy data loading from CSV, Excel, JSON
• Data filtering, grouping, and aggregation
• Handling missing values
• Powerful indexing and slicing
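
The sketch below touches on missing values, grouping, and label-based indexing in turn. The table contents are made up, and dropping incomplete rows is only one of several ways to treat missing data.

```python
import pandas as pd

# Hypothetical marks table; None marks a missing score.
df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "Divya"],
    "Section": ["A", "B", "A", "B"],
    "Marks": [85.0, 92.0, None, 70.0],
})

# Handling missing values: count the gaps, then drop incomplete rows.
print(df["Marks"].isna().sum())
complete = df.dropna(subset=["Marks"])

# Grouping and aggregation: average marks per section.
section_means = complete.groupby("Section")["Marks"].mean()
print(section_means)

# Indexing and slicing: label-based selection with .loc.
by_name = complete.set_index("Name")
print(by_name.loc["Bob", "Marks"])            # a single cell
print(by_name.loc["Alice":"Bob", ["Marks"]])  # a slice of rows, end label included
```

Unlike ordinary Python slices, a label slice with .loc includes its end label.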

Installation Steps

To install both libraries, use:

pip install numpy pandas

Alternatively, use Google Colab, where both libraries come preinstalled.

Sample Programs

NumPy Examples

Create and Manipulate Arrays

import numpy as np

arr = np.array([10, 20, 30])

print("Array:", arr)
print("Sum:", np.sum(arr))
print("Mean:", np.mean(arr))
print("Squared:", arr ** 2)

Matrix Operations

A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])

print("Dot Product:\n", np.dot(A, B))
print("Transpose of A:\n", A.T)
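
A common point of confusion is the difference between the matrix product and element-wise multiplication. The sketch below reuses the same two matrices to show both; the variable names are illustrative.

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])

# Matrix product: rows of A combined with columns of B.
# For 2-D arrays, A @ B gives the same result as np.dot(A, B).
matrix_product = A @ B
print(matrix_product)  # [[19 22], [43 50]]

# Element-wise product: each entry multiplied by the entry in the same position.
elementwise = A * B
print(elementwise)  # [[5 12], [21 32]]
```

The * operator never performs matrix multiplication on NumPy arrays, so the choice between * and @ matters.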

Pandas Examples

Create a DataFrame

import pandas as pd

data = {
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Marks': [85, 92, 78]
}

df = pd.DataFrame(data)
print(df)

Analyze and Filter Data

print(df.describe())  # Summary statistics
print(df[df['Marks'] > 80])  # Filter rows
df['Grade'] = ['B', 'A', 'C']  # Add column
print(df)

Read CSV File

df = pd.read_csv('[Link]')  # replace '[Link]' with the path to your CSV file
print(df.head())  # First five rows
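
To try CSV loading without a file on disk, the text can be supplied through io.StringIO, which read_csv accepts in place of a path. The CSV content below is sample data that mirrors the DataFrame used earlier.

```python
import io

import pandas as pd

# Hypothetical CSV content; io.StringIO lets read_csv treat a string as a file.
csv_text = """Name,Marks
Alice,85
Bob,92
Charlie,78
"""

df = pd.read_csv(io.StringIO(csv_text))
print(df.head(2))          # first two rows
print(df.shape)            # (rows, columns)
print(df["Marks"].mean())  # average of the Marks column
```

With a real file, the only change is passing the file path to pd.read_csv instead of the StringIO object.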
