Data-Science

Data Science is an interdisciplinary field that integrates statistics, AI, programming, and domain expertise to solve complex problems. The Data Science process involves stages such as problem formulation, data acquisition, preparation, analysis, and communication of insights. Understanding different types of data attributes, including nominal, binary, ordinal, and numeric, is crucial for effective data analysis and modeling.

Uploaded by

Muhammad Maaz Rabi

Lecture # 1: Introduction to Data Science

1. What is Data Science?

Data Science is an interdisciplinary field that combines several key areas:
 Statistics
 Artificial Intelligence (AI)
 Programming
 Domain Expertise
This interdisciplinary approach allows Data Science professionals to work on
complex problems across various industries, utilizing a range of tools and
methodologies to extract insights from data.

2. Data Science vs. Machine Learning and Deep Learning

A common question arises about how Data Science differs from Machine Learning
(ML) and Deep Learning (DL). Here's a comparison to clarify:
 Machine Learning: Focuses primarily on studying algorithms and models
that learn from data. Deep Learning is the subset of ML built on multi-layer
neural networks.
 Data Science: Encompasses an end-to-end process that addresses real-world
problems. It goes beyond just the model to include:
 Problem formulation
 Data acquisition
 Data preparation
 Model development and deployment
 Presentation of findings
In short, while ML and DL are focused on model development, Data Science spans
the entire process, from defining the problem to presenting solutions to
stakeholders.

3. The Data Science Process

The Data Science process involves several key steps, as outlined below:
1. Problem Formulation
 The first task is defining the problem you want to solve. For example,
predicting house prices. You will determine the inputs (features) and
the desired outputs (predictions).
2. Data Acquisition
 Once the problem is defined, the next step is gathering the relevant
data. This involves identifying data sources and acquiring the
necessary datasets.
3. Data Preparation
 After acquiring data, it must be cleaned and pre-processed. This step
ensures that the data is in a usable format for analysis.
4. Data Analysis
 At this stage, models are developed, evaluated, and fine-tuned to solve
the problem. This includes applying statistical and machine learning
techniques.
5. Model Deployment
 Once the model is ready, it is deployed to generate real-world insights.
This may involve putting the model into production for use by others.
6. Presentation of Insights
 After generating insights from the model, these findings need to be
communicated to non-technical stakeholders, such as CEOs, directors,
and management. Visualization tools are often used to make the
insights more understandable.

4. Essential Skill Set for Data Scientists

A Data Scientist needs a diverse set of skills to be effective in this field. These skills
can be categorized into technical knowledge and soft skills:
Technical Skills
 Statistics: A solid foundation in statistics is crucial for understanding data
distributions, hypothesis testing, and building models.
 Machine Learning: Knowledge of machine learning algorithms is essential
for data analysis and building predictive models.
 Programming (Python): Python is the most commonly used programming
language in Data Science due to its rich libraries (such as pandas, numpy,
scikit-learn) for data manipulation and modeling.
 Data Visualization: Proficiency in visualization tools like Matplotlib,
Seaborn, or Tableau is necessary to communicate insights effectively.
Soft Skills
 Communication Skills: Data Scientists must be able to clearly present their
findings to non-technical audiences. This includes the ability to tell a
compelling story with data and make recommendations that drive decisions.
 Problem-Solving and Critical Thinking: A Data Scientist must be able to
approach problems analytically, determine the best methods for solving
them, and think critically about the results.
 Storytelling with Data: The ability to craft a narrative around the data
findings is crucial for influencing stakeholders.

5. Final Thoughts
Data Science is an exciting and versatile field that blends technical knowledge and
creative problem-solving skills. It encompasses the entire process from
understanding the problem to deploying solutions and communicating them
effectively to decision-makers. The key to success in Data Science lies not only in
mastering the technical skills but also in developing the soft skills needed to
communicate insights and influence decision-making.

We are about to embark on a learning journey that will cover these aspects in more
detail. Stay excited and ready to delve deeper into the fascinating world of Data
Science.

Lecture # 2: Process of Data Science

Overview of Data Science Process

Data Science is an interdisciplinary field that includes:
 Statistics
 Artificial Intelligence (AI)
 Programming
 Domain Expertise
In this field, the data science process is broken down into several stages that guide
data scientists through the steps needed to derive insights from data.

Key Stages in the Data Science Process

The main stages in Data Science are as follows:
1. Problem Formulation
2. Data Acquisition
3. Data Preparation
4. Data Analysis
5. Communication and Visualization
Each of these stages plays a pivotal role in solving real-world problems using data.

1. Problem Formulation
 Defining the Problem
In this stage, the problem that needs to be solved is formulated. Domain
expertise is essential for understanding the specific industry and what
features or data are important for solving the problem.
 Input and Output Identification
You need to define what data (input) is required to solve the problem and
what output is expected. For example, in the case of house price prediction,
inputs might include features like the number of bedrooms, location, and
area, while the output would be the predicted price.
2. Data Acquisition
 Finding the Right Data
Once the problem is formulated, you need to acquire relevant data from
various sources:
 Repositories like Kaggle
 University databases
 Government data sources (e.g., US or European governments)
The goal is to obtain a dataset that can help solve the problem identified in
the previous stage.

3. Data Preparation
Data preparation consists of two sub-stages:
1. Understanding the Data (Exploratory Data Analysis - EDA)
2. Pre-processing the Data

Exploratory Data Analysis (EDA)

o Understanding the Structure
EDA involves exploring the dataset to understand its structure, data types,
and the presence of any missing or inconsistent values. This stage helps in
identifying if the data is structured or unstructured and the types of values
within it (e.g., numerical, categorical, ordinal).
o Statistical Methods
During EDA, various statistical methods are applied, such as:
 Measures of central tendency (mean, median)
 Measures of dispersion (variance, standard deviation)
 Checking for outliers or inconsistencies
o This helps determine the quality and structure of the data.
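
The statistical checks listed above can be sketched with Python's standard library. This is a minimal illustration, not a prescribed method: the `areas` values are made up, and the interquartile-range (IQR) rule used to flag outliers is one common convention chosen here as an assumption; the lecture does not name a specific outlier test.

```python
import statistics

# Hypothetical house areas; the last value is a deliberate outlier.
areas = [80, 95, 100, 110, 120, 125, 130, 900]

# Measures of central tendency
mean_area = statistics.mean(areas)
median_area = statistics.median(areas)

# Measures of dispersion (sample variance and sample standard deviation)
variance = statistics.variance(areas)
std_dev = statistics.stdev(areas)

# Outlier check with the IQR rule (an assumed convention):
# flag values more than 1.5 * IQR outside the first or third quartile.
q1, _, q3 = statistics.quantiles(areas, n=4)
iqr = q3 - q1
outliers = [a for a in areas if a < q1 - 1.5 * iqr or a > q3 + 1.5 * iqr]

print(mean_area, median_area)  # 207.5 115.0
print(outliers)                # [900]
```

Note how the single extreme value pulls the mean far above the median; a gap like this is often the first hint that outliers are present.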

Data Pre-processing
 Data Cleaning
In the pre-processing stage, any inconsistencies, missing values, or errors
(e.g., a house price listed as 50 rupees or 70 bedrooms) are addressed.
 Data Transformation
If necessary, data transformation techniques like normalization (scaling data
to a range between 0 and 1) are applied to make the data suitable for
machine learning algorithms.
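
As a rough sketch of the normalization mentioned above, min-max scaling maps each value x to (x - min) / (max - min). The function name `min_max_scale` and the sample values are hypothetical, chosen only for illustration.

```python
def min_max_scale(values):
    """Rescale values linearly so the smallest maps to 0 and the largest to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are identical, so there is no spread to scale.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


areas = [50, 75, 100, 150]  # hypothetical house areas
scaled = min_max_scale(areas)
print(scaled)  # [0.0, 0.25, 0.5, 1.0]
```

The guard for identical values avoids a division by zero; how to treat a constant column is a design choice, and mapping it to 0.0 is only one option.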

4. Data Analysis
Data analysis involves developing and evaluating models to address the defined
problem. Key steps in this stage include:
 Choosing the Right Technique
Depending on the problem at hand, you select an appropriate model or
technique. For example, if predicting house prices, regression is a suitable
technique as the output is continuous (a price).
 Model Development
Various models are developed using different algorithms, such as:
 Linear Regression
 Random Forest Regressor
 Polynomial Regression
 Deep Learning Regressors
 Each algorithm is trained using the dataset and generates models that can
predict or classify data.
 Model Evaluation
After developing multiple models, they are evaluated to identify which model
performs the best. For regression problems, metrics such as mean squared
error (MSE) or R-squared are typical; accuracy applies to classification.
 Deploying the Model
After selecting the best model, it must be deployed so that other
stakeholders or users can interact with it. Model deployment involves making
the model available on a server or web/mobile application, allowing users to
input data and receive predictions or estimates.
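
To make the model development and evaluation steps concrete, here is a minimal sketch of single-feature linear regression fitted by least squares, followed by the MSE and R-squared metrics. The data and the helper names (`fit_line`, `mse`, `r_squared`) are hypothetical; in practice a library such as scikit-learn would normally be used, and the metrics would be computed on held-out data rather than on the training data as done here for brevity.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept for one feature."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def mse(actual, predicted):
    """Mean squared error: average of squared residuals."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def r_squared(actual, predicted):
    """Share of the variance in `actual` explained by the predictions."""
    mean_a = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_a) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


# Hypothetical data: house area (input) and price (output).
areas = [50, 80, 100, 120, 150]
prices = [110, 160, 210, 240, 310]

slope, intercept = fit_line(areas, prices)
predicted = [slope * a + intercept for a in areas]

print(slope, intercept)       # 2.0 6.0
print(mse(prices, predicted)) # 24.0
```

A lower MSE and an R-squared closer to 1 indicate a better fit, which is how competing models would be ranked at the evaluation step.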

5. Communication and Visualization

 Importance of Visualization
Once the model has been developed and evaluated, it's time to present the
results to stakeholders, including non-technical decision-makers like CEOs or
directors. This is done using visualizations such as:
 Bar Plots
 Scatter Plots
 Pie Charts
 These help convey complex data in an easy-to-understand format.
 Tools for Visualization
Various tools are used for visualization, including Python libraries (e.g.,
Matplotlib, Seaborn), as well as other platforms like Power BI and Tableau.
 Effective Communication
Besides creating visualizations, communication skills are essential for
pitching the solution. The ability to explain the problem, the approach, and
the solution effectively to non-technical audiences is crucial for successful
implementation.

Final Thoughts
In summary, Data Science is a comprehensive process that involves a range of
activities, from problem formulation to model deployment and effective
communication. The key stages—problem formulation, data acquisition, data
preparation, data analysis, and visualization—are all critical in transforming raw
data into actionable insights. Mastering these stages ensures that data scientists
can solve complex problems and communicate their findings effectively to
stakeholders. As we continue our journey through Data Science, we will dive deeper
into the tools and techniques used in each stage, ensuring a thorough
understanding of the subject.

Lecture # 3: Understanding Data – Types of Attributes

Key Concepts

1. Data Objects and Attributes

Data science typically works with tabular datasets, in which each row
represents a data object (or sample) and each column represents an
attribute or feature of that object. In simpler terms:

 Rows represent individual data samples (observations).

 Columns represent attributes or features of these samples.

For example, consider a dataset of people:

 Rows: Represent different people.

 Columns: Represent attributes like height, weight, hair color, profession, etc.

2. Types of Attributes
There are primarily four types of attributes, each playing a distinct role in data
analysis. Let’s go through each of them in detail:

Nominal Attributes

 Definition: Nominal attributes are used to describe categorical data, where
the values are names or labels with no inherent order.

 Example: Hair color (black, brown, red, etc.) is a nominal attribute because
the colors don't have a natural order.

 Key Features:

 No inherent order among categories.

 Arithmetic operations on nominal attributes, such as addition, do not
have meaningful results.

 Can be represented by symbols or codes, but calculations like addition
or subtraction aren't valid.

Binary Attributes

 Definition: A specialized form of nominal attributes where there are only two
possible values.

 Example: Gender (Male or Female), COVID-19 Positive or Negative, Smoker
or Non-Smoker.

 Key Features:

 Only two possible values.

 Can be divided into two types:

1. Symmetric Binary Attribute: Both classes are equally
important (e.g., gender).

2. Asymmetric Binary Attribute: One class is more important
than the other (e.g., being COVID-19 positive is more critical
than being negative).

 Encoding: Often represented using 0 (for one category) and 1 (for the
other category).
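
The 0/1 encoding can be sketched as follows. The function name `encode_binary` and the sample values are hypothetical. For an asymmetric attribute, a common convention (assumed here) is to code the more important outcome as 1.

```python
def encode_binary(values, positive):
    """Map the `positive` label to 1 and every other label to 0."""
    return [1 if v == positive else 0 for v in values]


# Hypothetical test results for four people.
covid_status = ["Negative", "Positive", "Negative", "Negative"]
encoded = encode_binary(covid_status, positive="Positive")
print(encoded)  # [0, 1, 0, 0]
```

The codes are still labels: summing them gives a count of positive cases, but 1 is not "greater than" 0 in any ranking sense.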

3. Data Science Process: Data Preparation

The data preparation phase is crucial in data science, where we attempt to
understand and organize the data before applying any algorithms or models. This
phase is split into two parts:

Exploratory Data Analysis (EDA)

 Objective: The goal of EDA is to explore and understand the data by
visualizing distributions, detecting outliers, and uncovering relationships
between attributes. EDA helps answer questions like:

 What kind of data do we have?

 Are there any outliers?

 How do different attributes relate to each other?

 What are the key patterns in the data?

Data Cleaning and Preprocessing

 Objective: After understanding the data, it is cleaned and preprocessed. This
includes handling missing values, converting data types, and normalizing or
scaling data where necessary. Proper data cleaning ensures that the data is
ready for analysis or model training.
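
A small sketch of these cleaning steps is shown below, under several assumptions: the records are invented, the `MAX_BEDROOMS` threshold is an arbitrary plausibility limit, and filling missing prices with the median is just one of several possible strategies.

```python
import statistics

# Hypothetical raw records: values arrive as strings and one price is missing.
raw = [
    {"bedrooms": "3", "price": "250000"},
    {"bedrooms": "4", "price": None},
    {"bedrooms": "2", "price": "180000"},
    {"bedrooms": "70", "price": "300000"},  # implausible bedroom count
]

MAX_BEDROOMS = 20  # assumed plausibility threshold

# 1. Convert data types (strings to numbers), keeping missing values as None.
rows = [
    {
        "bedrooms": int(r["bedrooms"]),
        "price": float(r["price"]) if r["price"] is not None else None,
    }
    for r in raw
]

# 2. Drop records that fail the plausibility check.
rows = [r for r in rows if r["bedrooms"] <= MAX_BEDROOMS]

# 3. Fill missing prices with the median of the known prices.
known_prices = [r["price"] for r in rows if r["price"] is not None]
fill_value = statistics.median(known_prices)
for r in rows:
    if r["price"] is None:
        r["price"] = fill_value

print([r["price"] for r in rows])  # [250000.0, 215000.0, 180000.0]
```

Whether to drop, correct, or impute a problematic record depends on the dataset and the problem, so each step here should be read as an example rather than a rule.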

4. Understanding Data Attributes

In data science, we often encounter terms like attributes, features,
and variables. These terms are interchangeable and refer to the same concept.
They are used to describe columns in a dataset.

Attribute (or Feature or Variable)

 Definition: An attribute is any piece of information that helps describe a data
object. For example, in a dataset about people, attributes could
be name, height, weight, and hair color.

 Example:

 Name: John

 Height: 5'9"

 Weight: 150 lbs

 Hair color: Brown

Data Object (or Observation)

 Definition: A data object represents a single instance of a dataset. In the
previous example, each person’s data would be a separate data object (or
observation). In a tabular format, each row represents a data object.
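
The row/column view can be illustrated with a list of dictionaries, where each dictionary is one data object and each key is an attribute. The first record follows the John example above (5'9" is 69 inches); the second record and the key names are made up for illustration.

```python
# Each dictionary is one data object (row); each key is an attribute (column).
people = [
    {"name": "John", "height_in": 69, "weight_lb": 150, "hair_color": "Brown"},
    {"name": "Sara", "height_in": 64, "weight_lb": 130, "hair_color": "Black"},  # hypothetical
]

# The attributes are the keys shared by every data object.
attributes = list(people[0].keys())

# Reading one attribute across all objects gives a column.
hair_colors = [person["hair_color"] for person in people]

print(attributes)   # ['name', 'height_in', 'weight_lb', 'hair_color']
print(hair_colors)  # ['Brown', 'Black']
```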

Final Thoughts
Data science is a multifaceted field that involves understanding data, cleaning it,
and preparing it for further analysis. A solid understanding of attributes and data
objects is essential for any data scientist, as it forms the foundation of data analysis
and modeling. Whether working with nominal, binary, or other types of attributes,
it's important to understand their roles and how they influence your analysis.

 Attributes represent important features of data objects and can be classified
into various types, including nominal and binary.

 Data preparation, particularly exploratory data analysis, is a critical step in
the data science process, helping to uncover patterns and relationships
within the data.

 Effective data cleaning and preprocessing ensure that data is ready for
analysis or machine learning algorithms.

By following the proper data science processes and understanding key concepts like
attributes and data objects, you can improve your approach to data analysis and
enhance your ability to derive meaningful insights from your data.

Types of Attributes

1. Nominal Attributes
 Definition: Nominal attributes are categorical variables that represent
different categories without any specific order or ranking.

 Example: Hair color (Black, Brown, Red, White).

 Key Characteristics:

 No inherent order.

 Categories are simply labels or names.

 Arithmetic operations are not meaningful (e.g., you cannot add black
hair to red hair).

 Operations:

 Frequency Count: Count how often each category occurs (e.g., how
many people have black hair).
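
A frequency count for a nominal attribute can be sketched with `collections.Counter` from the standard library. The sample values are hypothetical.

```python
from collections import Counter

# Hypothetical hair colors observed for six people.
hair = ["Black", "Brown", "Black", "Red", "Black", "Brown"]

counts = Counter(hair)

# The most frequent category (the mode) is also meaningful for nominal data.
mode_value, mode_count = counts.most_common(1)[0]

print(counts["Black"], counts["Brown"], counts["Red"])  # 3 2 1
print(mode_value, mode_count)                           # Black 3
```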

2. Binary Attributes
 Definition: A type of nominal attribute with only two possible values.

 Example: Gender (Male, Female), COVID status (Positive, Negative), Smoking
status (Smoker, Non-smoker).

 Key Characteristics:

 Two distinct categories or values.

 Types:

 Symmetric Binary Attribute: Both categories are equally important
(e.g., Male and Female in gender classification).

 Asymmetric Binary Attribute: One category is more important (e.g.,
COVID-Positive is more important than COVID-Negative).

 Operations:

 Frequency Count: Count how many of each binary value is present.

3. Ordinal Attributes

 Definition: Ordinal attributes are categorical variables where the categories
have a meaningful order or ranking.

 Example: Academic rank (Junior, Assistant Professor, Associate Professor,
Professor).

 Key Characteristics:

 Categories have a specific order.

 The size of the gap between consecutive categories is not defined, so
differences between categories cannot be measured.

 Operations:

 Most Frequent Value (Mode): Identify the most common category.

 Median: The middle value in an ordered list of categories.

 Conversion from Numeric: You can convert a numeric attribute to an
ordinal one by defining ranges (e.g., Temperature as Low, Medium, High).
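
The ordinal operations above can be sketched as follows. The cut-off temperatures in `to_level` and the sample readings are assumptions made for illustration; the lecture does not specify where Low, Medium, and High begin.

```python
import statistics

# The category order is what makes the attribute ordinal.
ORDER = ["Low", "Medium", "High"]
rank = {label: i for i, label in enumerate(ORDER)}


def to_level(temp_c):
    """Bin a Celsius reading into an ordinal level (cut-offs are assumed)."""
    if temp_c < 15:
        return "Low"
    if temp_c < 28:
        return "Medium"
    return "High"


readings = [10, 22, 31, 18, 25]  # hypothetical temperatures in Celsius
levels = [to_level(t) for t in readings]

# Mode: the most common category.
mode_level = statistics.mode(levels)

# Median: sort by rank and take the middle category. median_low always
# returns an actual data point, so the result is a real category.
median_level = ORDER[statistics.median_low([rank[lv] for lv in levels])]

print(levels)                    # ['Low', 'Medium', 'High', 'Medium', 'Medium']
print(mode_level, median_level)  # Medium Medium
```

The integer ranks are used only for ordering; averaging them would treat the gaps between categories as equal, which an ordinal scale does not guarantee.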

4. Numeric Attributes
Numeric attributes are quantitative and can be subjected to arithmetic operations.
These can be further divided into two types:

 Interval-Scaled Attributes:

 Definition: Numeric attributes measured on a scale of equal-sized
units but with no absolute (true) zero point.

 Example: Temperature (Celsius or Fahrenheit).

 Key Characteristics:

 No absolute zero.

 Equal intervals between values.

 Operations:

 Addition and Subtraction: Differences are meaningful (e.g.,
30°C - 20°C = 10°C), but ratios are not: 20°C is not twice as
warm as 10°C.

 Ratio-Scaled Attributes:

 Definition: Numeric attributes with a defined scale and an absolute
zero.

 Example: Height (0 cm), Weight (0 kg), Years of Experience (0 years).

 Key Characteristics:

 Absolute zero exists.

 Ratios and multiples are meaningful (e.g., 2 meters is twice as
long as 1 meter).

 Operations:

 Addition, Subtraction, Multiplication, and Division: All
arithmetic operations are meaningful (e.g., 2 meters is twice the
length of 1 meter).
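
One way to see the difference between the two numeric scales is to change units and check which comparisons survive. The sketch below uses the standard Celsius-to-Fahrenheit and meter-to-centimeter conversions; the helper names are hypothetical.

```python
def c_to_f(celsius):
    """Convert Celsius to Fahrenheit."""
    return celsius * 9 / 5 + 32


def m_to_cm(meters):
    """Convert meters to centimeters."""
    return meters * 100


# Interval scale: the ratio of two temperatures depends on the unit,
# so "twice as hot" has no fixed meaning.
ratio_c = 20 / 10                  # 2.0
ratio_f = c_to_f(20) / c_to_f(10)  # 68 / 50 = 1.36

# Equal differences stay equal after conversion, so subtraction is meaningful.
diff_low = c_to_f(20) - c_to_f(10)   # 18.0
diff_high = c_to_f(30) - c_to_f(20)  # 18.0

# Ratio scale: the ratio of two lengths is the same in any unit.
ratio_m = 2 / 1
ratio_cm = m_to_cm(2) / m_to_cm(1)

print(ratio_c, ratio_f)   # 2.0 1.36
print(ratio_m, ratio_cm)  # 2.0 2.0
```

The ratio changes for temperature because the zero point of Celsius and Fahrenheit is arbitrary, whereas zero length means no length in every unit.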

5. Discrete and Continuous Attributes

 Discrete Attributes: These attributes take a finite or countable set of
values, typically whole numbers without any fractional values.

 Example: Number of children in a family (1, 2, 3, etc.).

 Continuous Attributes: These attributes can take any value, including
fractions or decimals.

 Example: Height (e.g., 5.5 cm), Weight (e.g., 60.3 kg).

Final Thoughts
Understanding the different types of attributes is essential in data science for
effective data analysis and processing. Each type of attribute – whether nominal,
binary, ordinal, or numeric – requires specific methods for analysis, and recognizing
these differences is key to drawing accurate conclusions from the data.

By correctly identifying the types of attributes, data scientists can choose the most
appropriate methods for data cleaning, exploration, and modeling, ensuring that the
data is handled efficiently and effectively.

Lecture # 4: Understanding Statistical Description in Data Science
