
Module-1 Introduction Notes – ML

Machine Learning (Visvesvaraya Technological University)


Module-1
Introduction: Need for Machine Learning, Machine Learning Explained, Machine Learning in Relation
to other Fields, Types of Machine Learning, Challenges of Machine Learning, Machine Learning Process,
Machine Learning Applications.
Understanding Data – 1: Introduction, Big Data Analysis Framework, Descriptive Statistics, Univariate
Data Analysis and Visualization.
Chapter-1, 2 (2.1-2.5)
Need for Machine Learning
 Historical Challenges
o Data was scattered across different archive systems, making integration difficult.
o Lack of awareness about software tools to extract useful insights.
 Reasons for Machine Learning Popularity
1. High Data Volume: Companies like Facebook, Twitter, and YouTube generate vast amounts
of data, doubling every year.
2. Reduced Storage Cost: Lower hardware costs make data capture, processing, storage, and
transmission easier.
3. Advanced Algorithms: The rise of deep learning has introduced complex and efficient
machine learning algorithms.
 Knowledge Pyramid
o Data: Raw facts (numbers, text, etc.).
o Information: Processed data (patterns, associations).
o Knowledge: Condensed information (historical patterns, trends).
o Intelligence: Applied knowledge for actions.
o Wisdom: Human-like decision-making ability.
 Data: A list of temperatures recorded hourly (e.g., 30°C, 32°C, 31°C, 29°C).
 Information: The average temperature for the day is 30.5°C.
 Knowledge: The temperature tends to drop in the evening based on past records.
 Intelligence: Carry an umbrella or wear light clothing based on the weather forecast.
 Wisdom: Choosing the best time to step out based on weather, personal health, and planned
activities.
 Need for Machine Learning
o Helps organizations analyze archival data for better decision-making.
o Aids in designing new products and improving business processes.
o Supports the development of effective decision support systems.


Machine Learning Explained


 Definition
o Machine Learning (ML) is a subfield of Artificial Intelligence (AI).
o Arthur Samuel: "ML enables computers to learn without explicit programming."
 Traditional vs. ML Approach
o Conventional Programming: Requires predefined logic, rules, and expert knowledge (e.g.,
expert systems like MYCIN).
o Machine Learning: Uses data-driven models to automatically learn patterns for prediction.

 Statistical Learning Model


o Relationship between input (x) and output (y) is modeled as y = f(x).
o Learning function f maps inputs to outputs.
 Types of ML Models
o Mathematical Equations
o Relational Diagrams (Trees/Graphs)
o Logical Rules (If/Else Statements)
o Clusters (Grouping of Data)


 Key ML Concept (Tom Mitchell’s Definition)
o A computer program is said to learn from experience E with respect to a class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E.
o Example: Object detection improves as more labeled images (experience) are provided.

 ML Process
1. Data Collection: Gathering relevant data.
2. Abstraction: Identifying key concepts from data.
3. Generalization: Converting abstract concepts into actionable intelligence.
4. Heuristic Formation: Making educated guesses based on patterns.
5. Evaluation & Course Correction: Refining models for better accuracy.

 Human vs. Machine Learning


o Humans learn through experience, observation, and trial & error.
o Machines learn through data collection, abstraction, generalization, and heuristics.
o Both require evaluation and correction to improve accuracy.
Machine Learning in Relation to Other Fields
1.3.1 Machine Learning and Artificial Intelligence
 AI is a broader field aimed at developing intelligent agents (robots, humans, autonomous systems).
 Machine Learning (ML) is a subfield of AI focused on extracting patterns for prediction.
 Deep Learning (DL) is a subbranch of ML, utilizing neural networks inspired by human neurons.
 The shift from rule-based AI to data-driven AI led to the resurgence of AI and ML.

1.3.2 Machine Learning, Data Science, Data Mining, and Data Analytics

 Data Science: An umbrella term covering multiple fields, including ML. It focuses on data
collection and analysis.
 Big Data: Deals with massive datasets characterized by:
1. Volume – Large amounts of data (e.g., Facebook, YouTube).
2. Variety – Multiple formats (text, images, videos).
3. Velocity – High-speed data generation and processing.
 Data Mining: Extracts hidden patterns from data, while ML uses these patterns for prediction.
 Data Analytics: Converts raw data into useful insights. Predictive analytics is closely related to
ML.
 Pattern Recognition: Uses ML algorithms to classify and analyze patterns in data.
1.3.3 Machine Learning and Statistics
 Statistics: A mathematical field that sets hypotheses, validates them, and finds relationships in data.
 Machine Learning: Focuses more on automation and requires fewer assumptions than traditional
statistics.
 Key Differences:
o Statistics relies on rigorous mathematical models and theoretical foundations.
o ML is more tool-based, focusing on learning from data with minimal manual intervention.
 Some view ML as an evolution of "old statistics," recognizing their deep connection.

Types of Machine Learning


1.4.1 Supervised Learning

 Uses labeled data with input-output pairs.


 Involves a teacher-student analogy where a model is trained on labeled data and tested on unseen
data.


 Two main types:


o Classification: Predicts discrete labels (e.g., dog vs. cat).

o Regression: Predicts continuous values (e.g., house price).

 Key algorithms: Decision Trees, Random Forest, SVM, Naïve Bayes, Neural Networks.
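As a minimal illustration of these two task types, the sketch below trains a classifier and a regressor with scikit-learn (assumed available); the tiny datasets and label meanings are invented for illustration.
```python
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LinearRegression

# Classification: predict a discrete label (0 = cat, 1 = dog) from two features.
X_cls = [[4.0, 20.0], [4.5, 22.0], [9.0, 60.0], [9.5, 65.0]]   # hypothetical [height, weight]
y_cls = [0, 0, 1, 1]
clf = DecisionTreeClassifier().fit(X_cls, y_cls)
print(clf.predict([[8.8, 58.0]]))        # expected: [1], i.e. "dog"

# Regression: predict a continuous value (price) from a single feature (area).
X_reg = [[500], [1000], [1500], [2000]]  # hypothetical area in sq. ft
y_reg = [50.0, 100.0, 150.0, 200.0]      # hypothetical price
reg = LinearRegression().fit(X_reg, y_reg)
print(reg.predict([[1200]]))             # roughly [120.]
```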

1.4.2 Unsupervised Learning

 No labeled data; model finds patterns in the data.


 Often used for clustering and dimensionality reduction.
Cluster Analysis

 Example: Grouping customers based on purchasing behavior.


Key algorithms used: k-Means, hierarchical clustering, and DBSCAN (density-based clustering).

 Dimensionality Reduction: Reduces data features while preserving important patterns.


 Differences from supervised learning: No supervisor, focuses on grouping rather than labeling.
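A minimal clustering sketch using scikit-learn's KMeans (assumed available); the customer-purchase data is hypothetical.
```python
from sklearn.cluster import KMeans

# Each row: [purchases per month, average bill amount] for one customer (made-up values).
customers = [[2, 200], [3, 250], [2, 180],
             [20, 5000], [22, 5200], [19, 4800]]

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(customers)
print(km.labels_)           # cluster id assigned to each customer (no labels were supplied)
print(km.cluster_centers_)  # centre of each discovered group
```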


1.4.3 Semi-Supervised Learning


 Combination of labeled and unlabeled data.
 Helps when labeling data is expensive.
 Uses pseudo-labeling techniques to enhance learning.
1.4.4 Reinforcement Learning

 Agent interacts with the environment, receives rewards or penalties.


 Used for decision-making tasks like robotics, gaming (e.g., AlphaGo), and autonomous systems.
 Key concept: Maximizing cumulative rewards over time.
 Example: Navigating a grid world by learning the best path to maximize rewards.
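A minimal sketch of this idea: tabular Q-learning on a one-dimensional grid, where the agent learns to move right toward a rewarding goal cell. The grid, reward, and hyper-parameters are invented for illustration.
```python
import random

n_states, goal = 5, 4
actions = [-1, +1]                        # move left or move right
Q = [[0.0, 0.0] for _ in range(n_states)] # Q-value table: one row per state, one column per action
alpha, gamma, epsilon = 0.5, 0.9, 0.2     # learning rate, discount factor, exploration rate

for episode in range(500):
    s = 0                                 # start at the leftmost cell
    while s != goal:
        # epsilon-greedy: explore occasionally, otherwise pick the best-known action
        a = random.randrange(2) if random.random() < epsilon else Q[s].index(max(Q[s]))
        s_next = min(max(s + actions[a], 0), n_states - 1)
        reward = 1.0 if s_next == goal else 0.0
        # Q-learning update: nudge Q(s, a) toward reward + discounted best future value
        Q[s][a] += alpha * (reward + gamma * max(Q[s_next]) - Q[s][a])
        s = s_next

print([q.index(max(q)) for q in Q])       # best action per state; "move right" (1) dominates
```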

Challenges of Machine Learning


1. Well-posed vs. Ill-posed Problems
o Machine learning requires well-defined problems with complete specifications.
o Ill-posed problems, like ambiguous mathematical models, require more data for validation.
Example of an Ill-posed Problem (Table 1.3)
Consider the dataset:
Input (x₁, x₂) Output (y)
(1, 1) 1
(2, 1) 2
(3, 1) 3
(4, 1) 4
(5, 1) 5
Now, we need to determine a mathematical model that fits this data. One possible function is
y = x₁ × x₂
which holds true for the given inputs.

However, other possible functions that fit this dataset are:


1. y = x₁ + x₂ − 1
2. y = x₁
Since multiple valid models exist for the same dataset, the problem is ill-posed—there is no unique solution.
To make it well-posed, we need more data points or additional constraints to confirm the correct
relationship between inputs and outputs.
Ill-posed problems are common in puzzles, games, and scientific computations where insufficient
information makes it difficult to determine a single correct answer. (A short code sketch after this list of challenges verifies that all three candidate functions reproduce Table 1.3.)

2. Huge Data Requirement


o ML models need large, high-quality datasets with minimal missing or incorrect data.
3. High Computational Power
o Big Data processing demands GPUs or TPUs for efficient execution.
o Increasing model complexity leads to higher time complexity.
4. Algorithm Complexity
o Choosing, implementing, and optimizing the right algorithm is challenging.
o Continuous evaluation and comparison are needed for optimal performance.
5. Bias-Variance Tradeoff
o Overfitting: Model performs well on training data but poorly on new data.
o Underfitting: Model fails to capture patterns even in training data.
o Balancing bias and variance is critical for generalization.
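Returning to the ill-posed example of Table 1.3, the sketch below checks that all three candidate functions reproduce the table exactly, which is why extra data points or constraints are needed to single out one model.
```python
# Table 1.3: ((x1, x2), y) pairs.
data = [((1, 1), 1), ((2, 1), 2), ((3, 1), 3), ((4, 1), 4), ((5, 1), 5)]

candidates = {
    "y = x1 * x2":     lambda x1, x2: x1 * x2,
    "y = x1 + x2 - 1": lambda x1, x2: x1 + x2 - 1,
    "y = x1":          lambda x1, x2: x1,
}

for name, f in candidates.items():
    fits = all(f(x1, x2) == y for (x1, x2), y in data)
    print(name, "fits Table 1.3:", fits)   # all three print True
```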

1.6 Machine Learning Process (CRISP-DM Model)


CRISP-DM (Cross Industry Standard Process for Data Mining) is a widely used model for data mining
and machine learning projects. It consists of six steps:


1. Understanding the Business – Identify business objectives, define the problem statement, and
determine whether a single algorithm is sufficient.
2. Understanding the Data – Collect data, analyze its characteristics, and match patterns to
hypotheses.
3. Data Preparation – Clean the raw data, handle missing values, and prepare it for model training.
4. Modeling – Apply machine learning algorithms to detect patterns and generate predictive models.
5. Evaluation – Assess model performance using statistical analysis, accuracy metrics, and domain
expertise.
6. Deployment – Implement the trained model to improve business processes or handle new situations.
1.7 Applications of Machine Learning
Machine learning is widely used across various domains. Some key applications include:
1. Sentiment Analysis – NLP-based analysis of emotions in text (e.g., product and movie reviews).
2. Recommendation Systems – Personalized suggestions in e-commerce (Amazon), streaming
platforms (Netflix), etc.
3. Voice Assistants – AI-powered assistants like Alexa, Siri, and Google Assistant.
4. Navigation & Transportation – Google Maps, Uber route optimization.


Chapter 2:
Understanding Data – 1: Introduction
Facts and Data in Computer Systems
 All facts are considered data.
 Bits encode facts in numbers, text, images, audio, and video.
 Data can be human-interpretable (numbers, text) or machine-interpretable (images, videos).
 Organizations store large volumes of data (Gigabytes, Terabytes, Exabytes).
 Byte = 8 bits, KB = 1024 bytes, MB ≈ 1000 KB, GB ≈ 1,000,000 KB, 1TB = 1000 GB, 1EB =
1,000,000 TB.
Data Sources
 Flat Files, Databases, Data Warehouses store data.
 Operational Data: Used in daily business processes (e.g., sales records).
 Non-Operational Data: Used for decision-making (e.g., historical trends).
Data vs. Information

 Data alone is meaningless; it must be processed to become information.


 Example: A list of numbers becomes useful when labeled (e.g., "Height of students").
 Processed data reveals patterns and insights (e.g., best-selling product in a quarter).

Elements of Big Data (6 Vs)


1. Volume
o Massive growth of data due to cheaper storage devices.
o Measured in PB (Petabytes) and EB (Exabytes) instead of just GB or TB.
2. Velocity
o Data is generated and processed at high speed.
o IoT devices and the internet contribute to this rapid increase.
3. Variety
o Form: Data can be text, graphs, images, audio, video, etc.
o Function: Data sources include human conversations, transactions, and archives.
o Source: Data comes from public/open data, social media, and multimodal sources.
4. Veracity
o Ensures data accuracy, truthfulness, and reliability.
o Errors may come from technical, human, or typographical mistakes.
5. Validity
o Ensures data is accurate for decision-making.
6. Value
o Determines how useful extracted data insights are for business or research.
Data Quality Factors
 Precision: Closeness of repeated measurements.
 Bias: Systematic error due to faulty assumptions.
 Accuracy: Measurement closeness to the true value.

Types of Data in Big Data


1. Structured Data
 Stored in databases (tables, rows, columns).
 Retrieved using SQL.
 Common in machine learning (record data, data matrices, graph data, ordered data).
Structured Data Types
 Record Data: Data arranged in rows (entities) and columns (attributes/features).
 Data Matrix: A numeric variation of record data, used in mathematical computations.
 Graph Data: Represents relationships (e.g., web pages linked via hyperlinks).

 Ordered Data: Involves attributes with implicit order.


Examples of Ordered Data
 Temporal Data: Time-based attributes (e.g., sales trends during festivals).
 Sequence Data: Ordered without timestamps (e.g., DNA sequences).
 Spatial Data: Geographic attributes (e.g., maps with location points).
2. Unstructured Data
 Includes videos, images, audio, text documents, blogs.
 Often estimated to make up around 80% of all data.
3. Semi-Structured Data
 Partially structured and unstructured.
 Examples: XML, JSON, RSS feeds, hierarchical data.

Data Storage and Representation


1. Flat Files
 Simple, plain text storage (ASCII, EBCDIC format).
 Best for small datasets but not suitable for big data.
Popular Spreadsheet Formats
 CSV (Comma-Separated Values): Data values separated by commas.
 TSV (Tab-Separated Values): Data values separated by tabs.
 Supported by Excel, Google Sheets.
2. Database Systems
 Contains database files and DBMS (Database Management System).
 Data is stored in tables (rows = records, columns = attributes).
 SQL is used for querying and data manipulation.
Types of Databases
1. Transactional Database: Stores records of transactions (time-stamped, item-based).
2. Time-Series Database: Stores time-dependent data (e.g., hourly, daily sales).
3. Spatial Database: Stores raster (bitmaps, images) and vector (maps, polygons) data.
3. Web-Based Data Sources
 World Wide Web (WWW): Huge global data source for mining.
 XML (Extensible Markup Language): Used for structured data sharing.
4. Data Streams
 Dynamic data flow that continuously changes.
 Example: IoT sensor readings, stock market data.
5. RSS Feeds & JSON
 RSS (Really Simple Syndication): Delivers live data updates.

 JSON (JavaScript Object Notation): Common data exchange format in machine learning.

Big Data Analytics and Types of Analytics


Purpose of Data Analysis
 Helps businesses make informed decisions (e.g., identifying the fastest-selling product).
 Converts raw data into meaningful insights for decision-making.
Difference Between Data Analysis and Data Analytics
 Data Analytics: Encompasses data collection, preprocessing, analysis, and prediction.
 Data Analysis: A subset of data analytics focused on analyzing historical data.
Types of Data Analytics
1. Descriptive Analytics
o Summarizes and quantifies collected data.
o Focuses on what happened rather than why it happened.
o Example: Monthly sales reports, website traffic analysis.
2. Diagnostic Analytics
o Answers the question "Why did it happen?"
o Involves causal analysis (identifying reasons behind trends).
o Example: Finding reasons for a decline in product sales.
3. Predictive Analytics
o Forecasts future outcomes using historical data and algorithms.
o Answers "What will happen next?"
o Example: Predicting customer demand, stock market trends.
4. Prescriptive Analytics
o Recommends actions to improve future outcomes.
o Goes beyond prediction to suggest best strategies.
o Example: Suggesting marketing campaigns to boost sales.

Big Data Analysis Framework and Processing Cycle


Big Data Framework Architecture
Big Data frameworks use a 4-layer architecture to efficiently handle vast amounts of data:
1. Data Connection Layer – Responsible for collecting raw data from various sources, such as
databases, IoT devices, and web scraping. It includes ETL (Extract, Transform, Load) processes to
prepare data for storage.


2. Data Management Layer – Manages storage, indexing, and retrieval of data. This includes
preprocessing tasks like data cleaning, deduplication, and transformation for efficient query
execution.
3. Data Analytics Layer – The core of Big Data processing, where machine learning algorithms,
statistical models, and predictive analytics are applied. The processing can run on platforms such as cloud computing, grid computing, or high-performance computing, described below:

Cloud Computing
 Definition: Pay-per-usage model providing shared processing power, storage, and services over the
Internet.
 Service Models:
o SaaS (Software as a Service): Access to software applications via the cloud.
o PaaS (Platform as a Service): Platform to develop and run applications.
o IaaS (Infrastructure as a Service): Access to infrastructure like servers, storage, and OS.
 Deployment Models:
o Public Cloud: Open to the public, managed by a third-party vendor.
o Private Cloud: Owned by a single organization, providing secure access.
o Community Cloud: Shared by multiple organizations for common goals.
o Hybrid Cloud: Combination of two or more cloud types.
 Characteristics:
1. Shared Infrastructure – Shared physical resources (storage, networking).
2. Dynamic Provisioning – Resource allocation based on demand.
3. Dynamic Scaling – Automatic expansion and contraction of resources.
4. Network Access – Access through the Internet.
5. Utility-based Metering – Billing based on usage.
6. Multitenancy – Supports multiple customers.
7. Reliability – Ensures consistent, reliable service.

Grid Computing
 Definition: Parallel and distributed computing model connecting multiple nodes to function as a
virtual supercomputer.
 Features:
o Connects thousands of nodes as a cluster using middleware software.
o Tasks are divided and processed in parallel across multiple nodes.
o Suitable for complex applications requiring high computing power.

High-Performance Computing (HPC)



 Definition: Uses parallel processing to solve complex scientific, engineering, and business problems
at high speed.
 Components:
1. Compute – Networked servers to process tasks.
2. Network – Connects compute nodes for communication.
3. Storage – Stores and retrieves data outputs.
 Features:
o Combines thousands of compute nodes working in parallel.
o Suitable for tasks requiring large-scale, fast computations.

4. Presentation Layer – The final stage that involves visualization techniques such as dashboards and
applications to interpret and display results effectively.

Big Data Processing Cycle


The processing cycle in Big Data analytics is an iterative process involving:
1. Data Collection – Acquiring high-quality data for analysis.
2. Data Preprocessing – Cleaning and transforming data to improve accuracy.
3. Application of Machine Learning Algorithms – Applying predictive models and analytics.
4. Interpretation and Visualization – Understanding patterns and presenting results through charts,
graphs, or dashboards.
Each step ensures data is suitable for machine learning and data mining applications and improves
decision-making efficiency.

2.3.1 Data Collection


Data collection is the first and most crucial step in Big Data processing. The quality of collected data
significantly impacts the accuracy of the results.
Characteristics of Good Data
A dataset is considered "Good" if it meets the following criteria:
1. Timeliness – Data should be up-to-date and not outdated or obsolete.
2. Relevancy – Data must be relevant and free from bias. It should contain all necessary attributes for
analysis.
3. Interpretability – Data should be well-understood and structured, making it useful for domain
experts.
Types of Data Sources
Data sources are categorized into:

1. Open/Public Data
These datasets are freely available and have minimal copyright restrictions. Examples include:
 Government Census Data – Demographic, economic, and social statistics collected by
governments.
 Digital Libraries – Large repositories containing textual and image-based documents.
 Scientific Databases – Collections of genomic, biological, and research data.
 Healthcare Databases – Patient records, insurance data, and medical research information.
2. Social Media Data
Generated by platforms like Twitter, Facebook, YouTube, and Instagram, social media data includes:
 Text posts, comments, and messages.
 Images and videos.
 Likes, shares, and interactions.
3. Multimodal Data
Includes diverse formats such as text, images, audio, and video. Examples:
 Image Archives – Large repositories of labeled images, combined with metadata.
 World Wide Web (WWW) – A vast source of structured and unstructured data distributed across
the internet.

2.3.2 Data Preprocessing


Data collected from various sources is often incomplete, inconsistent, and redundant. Preprocessing is
essential to refine data before analysis.
Handling Dirty Data
Dirty data includes:
 Incomplete Data – Missing values in records.
 Inconsistent Data – Contradictory or incorrect values.
 Noisy Data – Random errors or distortions in datasets.
 Duplicate Data – Repeated records affecting accuracy.

Handling Missing Data


Several techniques are used to deal with missing values:


1. Ignore the Tuple – Discard records with missing values (useful if missing data is minimal).
2. Manual Filling – Domain experts manually enter missing values (time-consuming for large
datasets).
3. Global Constant Substitution – Replace missing values with a generic label like "Unknown" or
"Infinity".
4. Attribute Mean Substitution – Fill missing values with the mean of the attribute.
5. Class-Specific Mean – Use the average of similar class records for substitution.
6. Predictive Methods – Use machine learning models like decision trees to estimate missing values.
These methods help reduce bias and improve dataset quality but may introduce estimation errors.
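Two of these strategies, ignoring the tuple and attribute-mean substitution, can be sketched with pandas (assumed available); the small table below is hypothetical.
```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"age": [25, 30, np.nan, 40],
                   "salary": [30000, np.nan, 45000, 50000]})

dropped = df.dropna()                           # 1. Ignore the tuple: drop rows with missing values
filled = df.fillna(df.mean(numeric_only=True))  # 4. Attribute mean substitution

print(dropped)
print(filled)
```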

Removal of Noisy and Outlier Data


Noise – Random errors or variance in measured values.
Techniques to Remove Noise:
Binning Method – Groups data into equal-sized bins and smooths noisy values using statistical methods.
Example 2.1: Binning Techniques
Given dataset S = {12, 14, 19, 22, 24, 26, 28, 31, 34}, assume bin size = 3.
1. Binning by Mean:
o Bin 1: (12, 14, 19) → 15, 15, 15
o Bin 2: (22, 24, 26) → 24, 24, 24
o Bin 3: (28, 31, 34) → 31, 31, 31
2. Binning by Boundaries:
o Bin 1: (12, 14, 19) → 12, 12, 19
o Bin 2: (22, 24, 26) → 22, 22, 26
o Bin 3: (28, 31, 34) → 28, 34, 34
This method smooths data while preserving key trends.
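A short sketch reproducing Example 2.1 in Python; note that a value lying exactly midway between the two boundaries is sent to the lower boundary here, while the worked example above breaks the tie in bin 3 upward.
```python
S = [12, 14, 19, 22, 24, 26, 28, 31, 34]
bins = [S[i:i + 3] for i in range(0, len(S), 3)]   # equal-sized bins, bin size = 3

# Smoothing by bin means: every value is replaced by its bin's mean.
by_mean = [[sum(b) / len(b)] * len(b) for b in bins]

# Smoothing by bin boundaries: every value is replaced by the nearer of the two boundaries.
by_boundaries = [[b[0] if (v - b[0]) <= (b[-1] - v) else b[-1] for v in b] for b in bins]

print(by_mean)        # [[15.0, 15.0, 15.0], [24.0, 24.0, 24.0], [31.0, 31.0, 31.0]]
print(by_boundaries)  # [[12, 12, 19], [22, 22, 26], [28, 28, 34]]  (31 -> 28 under this tie rule)
```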

Data Integration & Transformation


Data Integration
Combines multiple datasets and eliminates redundancy.
Data Transformation
Converts raw data into a usable format, often through normalization.
Normalization Techniques
1. Min-Max Normalization
Scales values within a fixed range (e.g., 0–1) using the formula:
v' = ((v − min) / (max − min)) × (new_max − new_min) + new_min

Example 2.2: Min-Max Normalization


Given marks V = {88, 90, 92, 94}, with min = 88, max = 94, and new range [0, 1], each value maps to (v − 88)/6:
Mapped values: {0, 0.33, 0.67, 1}


2. z-Score Normalization
Scales values using the mean (μ) and standard deviation (σ):
v' = (v − μ) / σ
Example 2.3: z-Score Normalization


Given marks V = {10, 20, 30}, with μ = 20 and σ = 10:

Mapped range: {-1, 0, 1}


z-Scores help detect outliers (values beyond ±3 indicate potential anomalies).
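Both normalization techniques can be sketched in plain Python. The helper functions below reproduce Examples 2.2 and 2.3; the sample standard deviation (dividing by n − 1) is used so that σ = 10 for {10, 20, 30}.
```python
def min_max(values, new_min=0.0, new_max=1.0):
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) * (new_max - new_min) + new_min for v in values]

def z_score(values):
    n = len(values)
    mu = sum(values) / n
    sigma = (sum((v - mu) ** 2 for v in values) / (n - 1)) ** 0.5  # sample standard deviation
    return [(v - mu) / sigma for v in values]

print([round(v, 2) for v in min_max([88, 90, 92, 94])])  # [0.0, 0.33, 0.67, 1.0]
print(z_score([10, 20, 30]))                             # [-1.0, 0.0, 1.0]
```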

Data Reduction
Reduces dataset size without losing significant information.
Techniques:
1. Data Aggregation – Summarizing data.
2. Feature Selection – Removing irrelevant attributes.
3. Dimensionality Reduction – Techniques like PCA (Principal Component Analysis) reduce data
dimensions for better processing.
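A minimal dimensionality-reduction sketch using scikit-learn's PCA (assumed available); the three-feature dataset is hypothetical.
```python
from sklearn.decomposition import PCA

X = [[2.5, 2.4, 0.5], [0.5, 0.7, 1.9], [2.2, 2.9, 0.4],
     [1.9, 2.2, 0.6], [3.1, 3.0, 0.2]]

pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X)       # 3 original features compressed to 2 principal components
print(X_reduced.shape)                 # (5, 2)
print(pca.explained_variance_ratio_)   # share of the total variance retained by each component
```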

Descriptive Statistics
Definition
 Summarizes and describes datasets.
 Does not focus on machine learning algorithms.
 Helps in Exploratory Data Analysis (EDA) for understanding data before applying ML techniques.
Dataset and Data Types
 A dataset consists of multiple data objects (records, vectors, patterns, etc.).
 Each data object has multiple attributes (characteristics of an object).


Broad Classification of Data

1. Categorical (Qualitative) Data


o Nominal Data (No meaningful order)
 Example: Patient ID (101, 102, 103) – These are just labels and cannot be averaged.
o Ordinal Data (Has a meaningful order)
 Example: Fever Level (Low, Medium, High) – The order matters but the difference
between them is not quantifiable.
2. Numeric (Quantitative) Data
o Interval Data (Meaningful differences, no true zero)
 Example: Temperature in Celsius (30°C, 40°C) – The difference is meaningful, but
zero does not mean "no temperature."
o Ratio Data (Meaningful differences and ratios, true zero exists)
 Example: Weight (50 kg, 100 kg) – 100 kg is twice as heavy as 50 kg, and zero
means no weight.

Alternative Classification of Data

1. Discrete Data (Whole numbers, countable)


o Example: Number of students in a class (10, 20, 30).


2. Continuous Data (Decimal values, measurable)
o Example: A patient's age (25.5 years), height (170.2 cm).

Classification Based on Variables


 Univariate Data: Contains a single variable.
 Bivariate Data: Contains two variables.
 Multivariate Data: Contains three or more variables.

Univariate Data Analysis and Visualization


1. Univariate Analysis
 Simplest form of statistical analysis involving a single variable.
 Focuses on describing data and identifying patterns.
 Does not deal with relationships or causes.
 Includes frequency distributions, central tendency measures, dispersion, and shape of data.
2. Data Visualization
 Essential for understanding and presenting data effectively.
 Helps in summarization, description, exploration, and comparison of data.
Common Graphs Used:
 Bar Chart: Displays frequency distribution of discrete variables; useful for comparing groups.


 Pie Chart: Represents percentage frequency distribution; useful for showing proportions.


 Histogram: Depicts frequency distributions; helps identify data shape, skewness, and mode.

 Dot Plot: Similar to bar charts but less cluttered; visually identifies high and low values.

3. Central Tendency
Summarizes data by finding the central point.
I. Mean: Arithmetic average; affected by extreme values.


1. Weighted mean considers different importance levels.

2. Geometric mean is used for multiplicative relationships.
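For reference, the standard formulas for these three means, where x₁, …, xₙ are the observations and wᵢ their weights:
```latex
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad
\bar{x}_w = \frac{\sum_{i=1}^{n} w_i x_i}{\sum_{i=1}^{n} w_i}, \qquad
\mathrm{GM} = \left( \prod_{i=1}^{n} x_i \right)^{1/n}
```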


II. Median
The median is the middle value in an ordered dataset.
 If the total number of values is odd, the median is the middle value.
 If the total number of values is even, the median is the average of the two middle values.
 In grouped data, the median is found using the formula:
Median = L1 + ((N/2 − cf) / f) × i
where:
 L1 = lower boundary of the median class
 N= total number of observations
 cf = cumulative frequency before the median class
 f = frequency of the median class
 i = class width
Example
Consider a dataset: 10, 20, 30, 40, 50
 The median is 30 (middle value).
For an even dataset: 10, 20, 30, 40, 50, 60
 The median is (30+40)/2 = 35

III. Mode
The mode is the most frequently occurring value in a dataset.


 If a dataset has one mode, it is unimodal (e.g., 10, 20, 20, 30 → mode = 20).
 If it has two modes, it is bimodal (e.g., 10, 20, 20, 30, 30 → modes = 20, 30).
 If it has three or more modes, it is trimodal/multimodal.
Example
Dataset: 5, 7, 8, 8, 10, 12, 12, 12, 15
 The mode is 12 (occurs most frequently).
For grouped data, mode is calculated using a specific formula similar to median.
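The three measures can be checked quickly with Python's built-in statistics module, reusing the mode example above.
```python
import statistics

data = [5, 7, 8, 8, 10, 12, 12, 12, 15]
print(statistics.mean(data))    # about 9.89
print(statistics.median(data))  # 10 (middle value of the ordered data)
print(statistics.mode(data))    # 12 (most frequent value, as in the example)
```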

4. Dispersion
Measures the spread of data around the central value.

 Range: Difference between maximum and minimum values.

 Standard Deviation: Measures variation from the mean.


 Interquartile Range (IQR): Difference between the third (Q3) and first quartile (Q1); helps detect
outliers.
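A minimal sketch of these dispersion measures with numpy (assumed available); the dataset is hypothetical, with 80 placed deliberately as an outlier.
```python
import numpy as np

data = np.array([12, 15, 17, 19, 22, 25, 28, 31, 80])

data_range = data.max() - data.min()
std_dev = data.std(ddof=1)                 # sample standard deviation
q1, q3 = np.percentile(data, [25, 75])
iqr = q3 - q1
outliers = data[(data < q1 - 1.5 * iqr) | (data > q3 + 1.5 * iqr)]

print(data_range, round(std_dev, 2), iqr)
print(outliers)                            # [80] is flagged as an outlier
```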


Five-Point Summary (Box Plot)
 Definition: A box plot is a graphical method to display data distribution using five key values.
 Five-Number Summary:

 Minimum: Smallest value


 Q1 (First Quartile): 25th percentile
 Median (Q2): Middle value
 Q3 (Third Quartile): 75th percentile
 Maximum: Largest value

 Box Representation:

 The box represents the middle 50% of the data (from Q1 to Q3).
 A line inside the box represents the median (Q2).

 Whiskers:

 Extend from Q1 to the minimum and from Q3 to the maximum.


 Indicate the spread of the data.

 Skewness:

 If the median is not in the center of the box, the data is skewed.

 Example:

 Data: {5, 7, 12, 15, 20, 22, 28, 30, 35}


 Q1 = 12, Median = 20, Q3 = 28
 Min = 5, Max = 35
 The box spans 12 to 28, whiskers extend to 5 and 35.
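The same five-number summary can be computed, and the plot drawn, with numpy and matplotlib (both assumed available); numpy's default quartile interpolation also gives Q1 = 12 and Q3 = 28 for this data.
```python
import numpy as np
import matplotlib.pyplot as plt

data = [5, 7, 12, 15, 20, 22, 28, 30, 35]
q0, q1, q2, q3, q4 = np.percentile(data, [0, 25, 50, 75, 100])
print(q0, q1, q2, q3, q4)   # 5.0 12.0 20.0 28.0 35.0

plt.boxplot(data)           # box from Q1 to Q3, median line at 20, whiskers reaching 5 and 35
plt.show()
```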


 5. Shape of Data Distribution


 Skewness: Measures symmetry.
o Positive Skew: Right tail longer; mean > median.
o Negative Skew: Left tail longer; mean < median.


 Kurtosis: Measures the peakedness (tail heaviness) of the data distribution.


o High Kurtosis: More extreme outliers.
o Low Kurtosis: Flatter distribution.


6. Special Univariate Plots


 Stem and Leaf Plot: Splits values into "stem" (leading digits) and "leaf" (last digit).

 Q-Q Plot: Assesses if data follows a normal distribution; points align along the 45-degree reference
line in normal cases.
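A minimal Q-Q plot sketch with scipy and matplotlib (both assumed available), using synthetic, roughly normal data so the points fall near the reference line.
```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

sample = np.random.normal(loc=50, scale=10, size=200)  # synthetic, approximately normal data
stats.probplot(sample, dist="norm", plot=plt)          # quantiles of the sample vs. the normal
plt.show()
```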

Exercise questions:

Short Questions & Answers


1. Why is machine learning needed for business organizations?
o Machine learning helps businesses analyze large datasets, make accurate predictions,
optimize operations, improve customer experiences, and automate decision-making.
2. List out the factors that drive the popularity of machine learning.
o Growth of Big Data, advancements in computational power (GPU/TPU), improved
algorithms, widespread AI adoption, and automation needs.
3. What is a model?
o A model in machine learning is a mathematical representation that learns patterns from
training data and makes predictions on new data.
4. Distinguish between the terms: Data, Information, Knowledge, and Intelligence.
o Data: Raw facts and figures.
o Information: Processed data with meaning.
o Knowledge: Understanding gained from information.
o Intelligence: Ability to apply knowledge for decision-making.
5. How is machine learning linked to AI, Data Science, and Statistics?
o Machine learning is a subset of AI that enables systems to learn. It is part of data science for
extracting insights and uses statistical methods for pattern recognition.
6. List out the types of machine learning.
o Supervised learning, unsupervised learning, semi-supervised learning, and reinforcement
learning.
7. List out the differences between a model and a pattern.
o A model is a global representation of the entire dataset, while a pattern is a local relationship
found within subsets of data.
8. Are classification and clustering the same or different? Justify.
o Different. Classification is supervised learning with labeled data, whereas clustering is
unsupervised learning that groups data based on similarities without predefined labels.
9. List out the differences between labeled and unlabeled data.
o Labeled data has predefined output labels, while unlabeled data does not have any assigned
labels and needs clustering or pattern detection.
10. Point out the differences between supervised and unsupervised learning.
o Supervised learning uses labeled data to train models, whereas unsupervised learning finds
patterns in unlabeled data.
11. What are the differences between classification and regression?
o Classification predicts discrete categories (e.g., spam or not spam), while regression predicts
continuous values (e.g., house prices).

12. What is semi-supervised learning?


o A hybrid approach that uses a small amount of labeled data and a large amount of unlabeled
data to train a model.
13. List out the differences between reinforcement learning and supervised learning.
o Reinforcement learning is based on rewards and penalties for decision-making, while
supervised learning relies on labeled examples for learning.
14. List out important classification and clustering algorithms.
o Classification algorithms: Decision Trees, SVM, Random Forest, Naïve Bayes, Neural
Networks.
o Clustering algorithms: K-Means, DBSCAN, Hierarchical Clustering, Gaussian Mixture
Model.
15. List at least five major applications of machine learning.
o Sentiment analysis, recommendation systems, fraud detection, self-driving cars, medical
diagnosis.

Long Questions
1. Explain in detail the machine learning process model.
Machine learning follows a structured process to build models that can analyze data and make
predictions. One of the most widely used process models for machine learning is CRISP-DM (Cross-
Industry Standard Process for Data Mining). It consists of six important steps that ensure a
systematic approach to solving business problems using machine learning techniques.
1.1 Business Understanding
The first step is to clearly define the business objective. This involves identifying the problem that needs to
be solved and understanding how machine learning can provide a solution.
Example: A retail company wants to predict customer churn. The goal is to develop a model that identifies
customers likely to leave and implement strategies to retain them.
1.2 Data Understanding
In this phase, data is collected from multiple sources, and an exploratory analysis is conducted. Key
activities include identifying missing values, data distribution, and potential relationships between variables.
Example: The retail company gathers past customer transaction records, demographics, and engagement
levels to understand which factors influence customer retention.
1.3 Data Preparation
The raw data collected is cleaned and transformed into a usable format. This includes handling missing
values, removing duplicates, encoding categorical data, normalizing numerical values, and feature
engineering.


Example: Converting customer transaction logs into structured numerical features such as purchase
frequency, average spending, and time since last purchase.
1.4 Modeling
This step involves selecting an appropriate machine learning algorithm and training it using prepared data.
Various algorithms like Decision Trees, Neural Networks, and Support Vector Machines are tested to
determine the best model.
Example: The company trains multiple models to predict customer churn and compares their accuracy.
1.5 Evaluation
The trained models are tested using validation datasets to measure their accuracy, precision, recall, and other
performance metrics. Cross-validation techniques are also used to check for overfitting.
Example: If a model predicts customer churn with 85% accuracy, it is further analyzed to ensure it
generalizes well to unseen data.
1.6 Deployment
The final step involves deploying the model into production. The model is integrated into the business
process to make real-time predictions and assist in decision-making. Regular monitoring ensures that the
model continues to perform well.
Example: The churn prediction model is integrated into the company’s CRM system to alert managers about
customers at risk of leaving.
2. List out and briefly explain the classification algorithms.
Classification algorithms are supervised learning techniques used to categorize data into predefined classes.
They are widely used in spam detection, medical diagnosis, and customer segmentation. Below are some
important classification algorithms:
2.1 Decision Trees
 Decision trees use a tree-like structure where each internal node represents a decision based on a
feature, and each leaf node represents a class label.
 Strengths: Easy to interpret, handles both numerical and categorical data.
 Weaknesses: Prone to overfitting, especially on small datasets.
2.2 Support Vector Machines (SVM)
 SVM finds the optimal hyperplane that separates different classes in a high-dimensional space.
 Strengths: Effective in high-dimensional spaces, robust against overfitting.
 Weaknesses: Computationally expensive for large datasets.
2.3 Random Forest
 An ensemble learning method that builds multiple decision trees and combines their predictions. The
final output is determined by majority voting.
 Strengths: Reduces overfitting, performs well on complex datasets.
 Weaknesses: Requires more computational power compared to a single decision tree.

2.4 Naïve Bayes


 A probabilistic classifier based on Bayes' theorem that assumes independence between features.
 Strengths: Works well with large datasets and text classification (e.g., spam filtering).
 Weaknesses: Assumes that features are independent, which may not always be true.
2.5 Neural Networks
 Inspired by the human brain, neural networks consist of layers of interconnected nodes (neurons) that
learn patterns in data.
 Strengths: Can model complex relationships, used in deep learning for image and speech recognition.
 Weaknesses: Requires large amounts of data and high computational resources.
2.6 Logistic Regression
 A statistical method that models the probability of a binary outcome (e.g., pass/fail, spam/not spam).
 Strengths: Simple, interpretable, and works well for binary classification tasks.
 Weaknesses: Assumes linear relationships between features and outcomes.

3. List out and briefly explain the unsupervised learning algorithms.


Unsupervised learning is used when the dataset does not have labeled outputs. The goal is to find hidden
patterns or groupings in the data. Common techniques include clustering and dimensionality reduction.
3.1 K-Means Clustering
 K-Means is a clustering algorithm that partitions data into K clusters based on similarity. The
algorithm iteratively updates the cluster centers until convergence.
 Strengths: Fast and scalable, easy to interpret.
 Weaknesses: Requires predefining the number of clusters (K), sensitive to outliers.
3.2 DBSCAN (Density-Based Spatial Clustering of Applications with Noise)
 Unlike K-Means, DBSCAN groups data based on density, making it effective for identifying clusters
of varying shapes.
 Strengths: Can detect outliers, does not require specifying the number of clusters.
 Weaknesses: Struggles with varying densities and high-dimensional data.
3.3 Hierarchical Clustering
 Builds a tree-like hierarchy of clusters through either an agglomerative (bottom-up) or divisive (top-
down) approach.
 Strengths: Does not require specifying the number of clusters, useful for hierarchical relationships.
 Weaknesses: Computationally expensive for large datasets.
3.4 Principal Component Analysis (PCA)
 PCA is a dimensionality reduction technique that transforms high-dimensional data into a smaller
number of principal components while preserving important patterns.
 Strengths: Reduces computational complexity, removes redundant features.

 Weaknesses: Can lose interpretability, assumes linear relationships in data.


3.5 Gaussian Mixture Model (GMM)
 A probabilistic model that assumes data is generated from a mixture of multiple Gaussian
distributions.
 Strengths: Can model complex distributions and works well with overlapping clusters.
 Weaknesses: Sensitive to initialization, requires specifying the number of components.

Numerical Problems and Activities


1. Let us assume a regression algorithm generates the model y = 0.54 + 0.66x for weekly sales data of a
product, where x is the week and y is the product sales. Find the prediction for the 5th and 8th
week (a quick code check follows this list). Prediction using the regression model:
o For week 5: y = 0.54 + 0.66(5) = 3.84
o For week 8: y = 0.54 + 0.66(8) = 5.82
2. Give two examples of patterns and models.
o Pattern: Customers who buy laptops often buy mousepads.
o Model: A trained recommendation system that suggests mousepads based on past purchases.
3. Survey and find at least five latest applications of machine learning.
o AI chatbots (ChatGPT), deepfake detection, AI-driven medical imaging, automated financial
trading, AI-powered cybersecurity.
4. Survey and list out at least five products that use machine learning.
o Amazon Alexa, Netflix recommendations, Google Photos, Tesla Autopilot, Apple Face ID.
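As a quick check of Problem 1, the fitted model can be evaluated directly in Python (a minimal sketch; the helper name is illustrative).
```python
def predict_sales(week):               # y = 0.54 + 0.66x from Problem 1
    return 0.54 + 0.66 * week

print(round(predict_sales(5), 2))      # 3.84
print(round(predict_sales(8), 2))      # 5.82
```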

Chapter 2 : Exercise:
Question 1: What is univariate data?
Answer: Univariate data consists of a single variable and is analyzed to understand its distribution, central
tendency, and dispersion. Examples include students' test scores, height measurements, or weights.

Question 2: What are the types of data?


Answer: Data types include:
Nominal Data – Categorical data without a specific order (e.g., colors, gender).
Ordinal Data – Categorical data with a specific order (e.g., rankings, satisfaction levels).
Discrete Data – Countable numerical values (e.g., number of students in a class).
Continuous Data – Measurable numerical values (e.g., height, weight, temperature).

Question 3: Distinguish between ‘good’ and ‘bad’ data.



Answer: Good Data: Accurate, complete, consistent, timely, and relevant.


Bad Data: Incomplete, inconsistent, outdated, and irrelevant.

Question 4: What are the problems of data collection?


Answer: Common problems include missing data, biased sampling, measurement errors, high cost, and time
consumption.

Question 5: Explain missing data analysis.


Answer: Missing data analysis involves identifying gaps in datasets and handling them using methods like
deletion, mean substitution, regression imputation, and multiple imputations.

Question 6: What are the measures of central tendencies?


Answer: The measures include:
Mean – The average value.
Median – The middle value.
Mode – The most frequently occurring value.

Question 7: Why are central tendency and dispersion measures important for data miners?
Answer: They help understand data distribution, detect outliers, and make predictions for decision-making.

Question 8: What are the measures of skewness and kurtosis?


Answer: Skewness measures data asymmetry. A common moment-based formula is:
Skewness = Σ(xᵢ − x̄)³ / (n·σ³)
Kurtosis measures the heaviness of the data's tails. A common moment-based formula is:
Kurtosis = Σ(xᵢ − x̄)⁴ / (n·σ⁴), often reported as excess kurtosis by subtracting 3.

Question 9: How is the interquartile range (IQR) useful in eliminating outliers?


Answer:
IQR = Q3 - Q1.
Outliers are detected if data points fall outside the range [Q1 - 1.5×IQR, Q3 + 1.5×IQR].

Problems:


Sol: 1. Mean … 4. Variance (σ²) 5. Standard Deviation (σ = √σ²)
Sol: 1. Mean 2. Geometric Mean


Sol:
Step 1: Find Minimum and Maximum
 Minimum = 5
 Maximum = 30
Step 2: Find the Median (Q2)
Since the dataset {5, 10, 15, 20, 25, 30} has 6 values (even count), the median is the average of the
two middle values:
Q2 = (15 + 20) / 2 = 17.5
Step 3: Find First Quartile (Q1) and Third Quartile (Q3)

 Q1: Median of lower half {5,10,15}

Q1=Median(5,10,15)=10

 Q3: Median of upper half {20,25,30}

Q3=Median(20,25,30)=25


Step 4: Box Plot
The box plot is drawn from the five-number summary: Min = 5, Q1 = 10, Median = 17.5, Q3 = 25, Max = 30.
The box spans 10 to 25 with the median line at 17.5, and the whiskers extend to 5 and 30.

Skewness and Kurtosis
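For a quick numerical illustration of these two shape measures, scipy.stats (assumed available) can be applied to a hypothetical, right-skewed dataset.
```python
from scipy.stats import kurtosis, skew

data = [2, 4, 4, 4, 5, 5, 7, 9, 25]      # the large value 25 stretches the right tail

print(skew(data))                        # positive -> right-skewed (mean > median)
print(kurtosis(data))                    # excess kurtosis; > 0 means heavier tails than a normal curve
print(kurtosis(data, fisher=False))      # "plain" kurtosis, where a normal distribution gives 3
```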
