
2. Types of Data_students

The document outlines the different types of data, categorizing them into qualitative and quantitative data. Qualitative data is further divided into nominal and ordinal data, while quantitative data includes interval and ratio data, each with distinct characteristics and examples. The document also explains how to analyze these data types and the operations applicable to them.

Uploaded by

Rohith Saindla

Types of data

• Dataset
• Qualitative data
1. Nominal data
2. Ordinal data
• Quantitative data
1. Interval data
2. Ratio data

[Figure: Dataset splits into Qualitative data (Nominal data, Ordinal data) and Quantitative data (Interval data, Ratio data)]
Qualitative data
• Qualitative data provides information about the quality of
an object or information which cannot be measured.
• For example, if we consider the quality of performance of
students in terms of ‘Good’, ‘Average’, and ‘Poor’, it falls
under the category of qualitative data.
• Also, the name or roll number of a student is information
that cannot be measured on a scale of measurement, so it
falls under qualitative data. Qualitative data is also
called categorical data.
• E.g. human behavior, intentions, attitudes, experience,
etc.
Qualitative Variables: Sometimes referred to as “categorical”
variables, these are variables that take on names or labels and
can fit into categories. Examples include:
• Eye color (e.g. “blue”, “green”, “brown”)
• Gender (e.g. “male”, “female”)
• Breed of dog (e.g. “lab”, “bulldog”, “poodle”)
• Level of education (e.g. “high school”, “Associate’s degree”,
“Bachelor’s degree”)
• Marital status (e.g. “married”, “single”, “divorced”)
Qualitative data can be further subdivided into two types
as follows:

1. Nominal data
2. Ordinal data
• Nominal data has no numeric value, only a named value.
• It is used for assigning named values to attributes.
• Nominal values cannot be quantified.
• Examples of nominal data are:
1. Blood group: A, B, O, AB, etc.
2. Nationality: Indian, American, British, etc.
3. Gender: Male, Female, Other
Nominal data
• Nominal data is a type of categorical data that
represents labels or names without any inherent order
or ranking. It is used to classify data into distinct
categories that are mutually exclusive and exhaustive.
Characteristics of Nominal Data

1. Categorical: Represents categories or labels, not numbers.
   Example: Colors (red, blue, green), gender (male, female, other).
2. No Order or Ranking: Categories have no logical order or
   sequence.
   Example: "Apple" is not higher or lower than "Orange."
3. Non-Quantitative: Cannot perform arithmetic operations
   (like addition or subtraction) on nominal data.
4. Mutually Exclusive: Each data point belongs to only one
   category.
5. Qualitative: Focuses on descriptive attributes rather than
   measurements.
Examples of Nominal Data
• Demographics: Gender, marital status, nationality.
• Product Features: Types of fruits (apple, orange,
banana), car brands (Toyota, Honda, Ford).
• Survey Responses: Yes/No answers, preferred browser
(Chrome, Firefox, Safari).
How to Analyze Nominal Data
• Since nominal data is non-numeric, analysis relies on counting:
• Mode: The most frequently occurring category.
• Frequency Distribution: Counting the occurrences of
each category.
• Visualization: Use bar charts or pie charts to display
distributions.
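The mode and frequency distribution described above can be sketched in a few lines of Python. This is only an illustration: the `browsers` list is made-up survey data, not something taken from the slides.

```python
from collections import Counter

# Hypothetical survey answers: preferred browser (nominal data).
browsers = ["Chrome", "Firefox", "Chrome", "Safari", "Chrome", "Firefox"]

# Frequency distribution: count how often each category occurs.
frequency = Counter(browsers)

# Mode: the category with the highest count.
mode, mode_count = frequency.most_common(1)[0]

print(dict(frequency))
print(mode, mode_count)
```

Note that nothing here adds or averages the labels; counting is the only operation used.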
• A special case of nominal data is when only two labels
are possible, e.g. pass/fail as a result of an examination.
• This sub-type of nominal data is called ‘dichotomous’.
• Mathematical operations such as addition, subtraction,
multiplication, etc. cannot be performed on nominal data.
For that reason, statistical functions such as mean,
variance, etc. cannot be applied to nominal data either.
However, a basic count is possible, so the mode, i.e. the
most frequently occurring value, can be identified for
nominal data.
Ordinal data:
• Ordinal data, in addition to possessing the properties of nominal
data, can also be naturally ordered.
• This means ordinal data also assigns named values to attributes.
• Unlike nominal data, the values can be arranged in a sequence of
increasing or decreasing value so that we can say whether a
value is better than or greater than another value.
• Examples of ordinal data:
1. Customer satisfaction: ‘Very Happy’, ‘Happy’, ‘Unhappy’,
etc.
2. Grades: A, B, C, etc.
3. Hardness of Metal: ‘Very Hard’, ‘Hard’, ‘Soft’, etc.

• Like nominal data, basic counting is possible for ordinal data.
• Hence, the mode can be identified.

Ordinal data
• Ordinal data classifies data while introducing an order,
or ranking. For instance, measuring economic status
using the hierarchy: ‘wealthy’, ‘middle income’ or ‘poor.’
However, there is no clearly defined interval between
these categories.
• Ordinal data is a type of categorical data that has a
clear, meaningful order or ranking among its categories.
However, the intervals between the ranks are not
necessarily equal or meaningful.
Like nominal data, basic counting is possible for ordinal data. Hence, the
mode can be identified. Since ordering is possible in the case of ordinal
data, the median and quartiles can be identified in addition. The mean still
cannot be calculated.
Characteristics of ordinal data
• Categorical with Order:
  • Ordinal data represents categories that can be ranked or ordered.
  • Example: Educational level (High School < Bachelor’s < Master’s < PhD).
• No Consistent Interval:
  • The difference between ranks is not standardized or equal.
  • Example: The gap in satisfaction between "Neutral" and "Satisfied" may not equal the gap
    between "Satisfied" and "Very Satisfied."
• No Arithmetic Operations:
  • While ranks can be compared (greater or lesser), arithmetic operations like addition or
    subtraction are not meaningful.
  • Example: "Very Happy" - "Neutral" doesn’t make sense numerically.
• Qualitative with Order:
  • Although ordinal data can be coded numerically (e.g., 1 = Poor, 2 = Fair, 3 = Good), the
    numbers are labels, not actual values.
• Analysis:
  • Measures like median and percentiles are appropriate.
  • A meaningful average (mean) cannot be calculated.
• Visualization:
  • Best visualized with bar charts, stacked bar charts, or histograms.
Examples of Ordinal Data
1. Rankings:
   • Movie ratings: Excellent, Good, Average, Poor.
   • Competition placement: 1st, 2nd, 3rd.
2. Scales:
   • Likert scale in surveys: Strongly Agree, Agree, Neutral, Disagree,
     Strongly Disagree.
3. Grades:
   • Academic performance: A, B, C, D, F.
4. Socioeconomic Levels:
   • Low income, middle income, high income.
Difference Between Nominal and Other Data Types
1. Nominal vs. Ordinal:
   • Nominal: Categories have no order (e.g., Colors: Red, Green,
     Blue).
   • Ordinal: Categories have a logical order (e.g., Size: Small,
     Medium, Large).
Difference Between Ordinal and Other Data Types
1. Ordinal vs. Nominal:
   • Ordinal: Has order (e.g., Small, Medium, Large).
   • Nominal: No order (e.g., Red, Blue, Green).
2. Ordinal vs. Interval/Ratio:
   • Ordinal: No equal intervals or true zero point (e.g.,
     Happiness levels).
   • Interval/Ratio: Numeric, ordered, with equal intervals (e.g.,
     Temperature, Age).
Operations on Ordinal data
1. Counting (Frequency)
•Description: You can count the occurrences of each rank or
category.
•Example: For customer satisfaction ratings (Very Unsatisfied,
Unsatisfied, Neutral, Satisfied, Very Satisfied):
•Very Satisfied: 10
•Satisfied: 15
•Neutral: 5
2. Median
•Description: The middle value or rank can be identified when the data is ordered.
•Example: For rankings: Very Unsatisfied, Unsatisfied, Neutral, Satisfied, Very
Satisfied:
•If the dataset has 9 responses in this order:
Very Satisfied, Satisfied, Neutral, Neutral, Neutral, Satisfied, Very Satisfied,
Unsatisfied, Neutral
•After ordering: Unsatisfied, Neutral, Neutral, Neutral, Neutral,
Satisfied, Satisfied, Very Satisfied, Very Satisfied
•The median is Neutral (the 5th value in the ordered list).
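The median example can be checked with a short Python sketch. It assumes the five-level satisfaction scale named in the example and sorts the nine responses by their rank on that scale rather than alphabetically; the names `SCALE` and `RANK` are illustrative only.

```python
# Order of the satisfaction scale, from lowest to highest rank.
SCALE = ["Very Unsatisfied", "Unsatisfied", "Neutral", "Satisfied", "Very Satisfied"]
RANK = {label: position for position, label in enumerate(SCALE)}

# The nine responses from the example, in their original (unsorted) order.
responses = [
    "Very Satisfied", "Satisfied", "Neutral", "Neutral", "Neutral",
    "Satisfied", "Very Satisfied", "Unsatisfied", "Neutral",
]

# Sort by rank on the scale, then take the middle element.
ordered = sorted(responses, key=RANK.__getitem__)
median = ordered[len(ordered) // 2]  # 5th of 9 values (index 4)

print(ordered)
print(median)
```

With an odd number of responses the middle element is unique. With an even number, the two middle labels may differ, and since ordinal labels cannot be averaged, a convention (such as reporting the lower one) would be needed.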
3. Percentiles and Quartiles
•Description: You can divide the data into percentiles or quartiles based
on rank.
•Example: For 100 students ranked on performance (Poor, Average,
Good, Excellent):
•The top 25% (4th quartile) might fall in the Excellent category.

4. Mode
•Description: The most frequently occurring category can be identified.
•Example: In a survey with responses like Neutral, Neutral, Satisfied,
Satisfied, Neutral, the mode is Neutral.

5. Ordering
•Description: Sorting ordinal data in ascending or descending order is
meaningful.
•Example: Sorting rankings of hotels by star ratings (1-star, 2-star, 3-star,
4-star, 5-star) in descending order.
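As a rough illustration of the mode and ordering operations on ordinal data, the sketch below uses the five survey responses from the mode example and a made-up set of hotels (`hotels` and the hotel names are hypothetical).

```python
from collections import Counter

# Mode: the five survey responses from the mode example.
responses = ["Neutral", "Neutral", "Satisfied", "Satisfied", "Neutral"]
mode = Counter(responses).most_common(1)[0][0]

# Ordering: hypothetical hotels labelled with star ratings.
STAR_ORDER = ["1-star", "2-star", "3-star", "4-star", "5-star"]
hotels = {
    "Hotel A": "3-star",
    "Hotel B": "5-star",
    "Hotel C": "1-star",
    "Hotel D": "4-star",
}

# Sort hotel names by the position of their rating on the scale, highest first.
ranked = sorted(hotels, key=lambda name: STAR_ORDER.index(hotels[name]), reverse=True)

print(mode)
print(ranked)
```

The sort key is the position of each label on the scale, so the result depends only on the order of the categories, not on any arithmetic between them.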
Quantitative data
• Quantitative data relates to information about the
quantity of an object – hence it can be measured. For
example, if we consider the attribute ‘marks’, it can be
measured using a scale of measurement.
• Quantitative data is data that represents measurable
quantities or numerical values. Quantitative data is also
termed as numeric data. There are two types of
quantitative data:
1. Interval data
2. Ratio data
Characteristics of Quantitative data
1. Numerical Nature:
   • Quantitative data consists of numbers or quantities.
   • Example: Height (170 cm), weight (65 kg), test scores (95%).
2. Measurable:
   • Represents measurable attributes like length, time, temperature, or cost.
3. Arithmetic Operations:
   • You can perform mathematical operations like addition, subtraction, and averaging.
4. Two Subtypes:
   • Discrete Data: Finite or countable values (e.g., number of students in a class).
   • Continuous Data: Infinite or uncountable values within a range (e.g., time taken to
     complete a task).
5. Objective:
   • Based on concrete measurements, not subjective interpretations.
6. Visualization:
   • Represented using histograms, scatter plots, line graphs, and box plots.
Interval Data
• Interval data is numeric data for which not only the
order is known, but the exact difference between values
is also known. A typical example of interval data is
Celsius temperature, where the difference between
adjacent values is always the same. For example, the
difference between 12°C and 18°C is measurable and is
6°C, the same as the difference between 15.5°C and
21.5°C. Other examples include date, time, etc.
For interval data, mathematical operations such as addition
and subtraction are possible. For that reason, for interval data,
the central tendency can be measured by mean, median, or
mode. Standard deviation can also be calculated.
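A small Python sketch of these operations on interval data is given below. It uses the four Celsius readings mentioned above plus one extra reading (18.0) that is an assumption added only so that the mode is well defined.

```python
import statistics

# Celsius readings: 12, 18, 15.5 and 21.5 come from the text;
# the second 18.0 is a hypothetical extra reading.
temps_c = [12.0, 18.0, 15.5, 21.5, 18.0]

# Differences are meaningful on an interval scale.
diff_a = 18.0 - 12.0
diff_b = 21.5 - 15.5

# Central tendency and dispersion are also allowed.
mean = statistics.mean(temps_c)
median = statistics.median(temps_c)
mode = statistics.mode(temps_c)
stdev = statistics.stdev(temps_c)  # sample standard deviation

print(diff_a, diff_b)
print(mean, median, mode, round(stdev, 2))
```

All of these rely only on sums and differences of values, which is exactly what an interval scale supports.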
However, interval data does not have a ‘true zero’ value. For
example, 0°C does not mean ‘no temperature’. Hence, only
addition and subtraction apply to interval data; ratios cannot
be applied. This means we can say that 40°C is 20°C warmer
than 20°C. However, we cannot say that 40°C is twice as hot
as 20°C.
Examples of Interval Data

1. Temperature
   • Measured in Celsius or Fahrenheit.
   • Example: The difference between 30°C and 40°C is 10°C, but 40°C is not "twice as hot"
     as 20°C.
   • Reason: Zero (0°C or 0°F) is arbitrary and doesn't represent the absence of temperature.
2. IQ Scores
   • Example: A person with an IQ of 120 is not "twice as intelligent" as someone with an IQ of
     60.
   • Reason: The scale has equal intervals, but no true zero point.
3. Dates and Time (Calendar Years)
   • Example: The year 2000 and the year 1900 have a meaningful difference of 100 years.
   • Reason: The zero point (e.g., year 0) is arbitrary and does not represent the beginning of
     time.
Overview of operations for interval data:
• Percentiles and quartiles can be calculated on
interval data
Operations in interval data
Why Ratios Cannot Be Applied to Temperatures in
Celsius
Interval Scale Properties: Celsius is an interval scale, meaning it has equal
intervals between points (e.g., the difference between 20°C and 30°C is the
same as the difference between 40°C and 50°C). However, interval scales lack a
true zero point. The zero on the Celsius scale (0°C) does not represent the
complete absence of temperature; it is just the freezing point of water.
Implications: Because there is no true zero, ratios are not meaningful. For
instance, 40°C cannot be interpreted as "twice as hot" as 20°C. The "doubling"
logic requires a true zero point, which Celsius does not have.
Valid Operations: In Celsius, you can calculate differences (e.g., 40°C - 20°C =
20°C) because the intervals are consistent. However, you cannot perform
meaningful multiplicative comparisons like "twice" or "half."
• Contrast with Ratio Scales
• A ratio scale has a true zero point and meaningful
ratios.
• Example: Kelvin temperature scale. Zero Kelvin (0 K)
represents the absence of thermal energy, so 40 K is indeed
twice as hot as 20 K.
• Example to Clarify
• Celsius:
• 40°C − 20°C = 20°C (a valid difference).
• But 40°C is not "2 × 20°C" in any physical sense, because the scale has no true zero.
• Kelvin:
• If 20 K and 40 K are compared, 40 K = 2 × 20 K is valid
because Kelvin has an absolute zero.
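The contrast can be made concrete with a short Python sketch. It assumes the standard conversion K = °C + 273.15; the helper name `celsius_to_kelvin` is illustrative.

```python
def celsius_to_kelvin(celsius: float) -> float:
    """Convert a Celsius reading to Kelvin (K = °C + 273.15)."""
    return celsius + 273.15


cold_c, hot_c = 20.0, 40.0
cold_k, hot_k = celsius_to_kelvin(cold_c), celsius_to_kelvin(hot_c)

# Differences agree on both scales: interval arithmetic is safe.
diff_c = hot_c - cold_c
diff_k = hot_k - cold_k

# Ratios disagree: the Celsius ratio is an artifact of where zero was placed.
ratio_c = hot_c / cold_c
ratio_k = hot_k / cold_k

print(diff_c, round(diff_k, 2))
print(ratio_c, round(ratio_k, 3))
```

The same two temperatures give a ratio of 2 in Celsius but only about 1.07 in Kelvin, which is why "twice as hot" is not a valid statement on the Celsius scale.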
Ratio Data
• Ratio data represents numeric data for which an exact
value can be measured. An absolute zero is available for
ratio data. These variables can also be added,
subtracted, multiplied, or divided. The central tendency
can be measured by mean, median, or mode, and
dispersion by measures such as standard deviation.
Examples of ratio data include height, weight, age,
salary, etc.
• Ratio data is the most precise and informative type of
quantitative data. It builds on the properties of nominal,
ordinal, and interval data by including a true zero point,
allowing for meaningful comparisons using both
Key Characteristics of Ratio Data
• True Zero Point:
  • A value of zero in ratio data indicates the absence of the measured attribute.
  • Example: Weight (0 kg means no weight), income (0 dollars means no income).
• Numerical and Measurable:
  • Ratio data is expressed in numbers and is continuous or discrete.
  • Example: Height, weight, speed, and age.
• Arithmetic Operations:
  • All mathematical operations, including addition, subtraction, multiplication, and division, are valid.
  • Example: A person weighing 60 kg is twice as heavy as someone weighing 30 kg.
• Equal Intervals:
  • The intervals between values are consistent and meaningful.
  • Example: In Kelvin, the difference between 20 K and 30 K is the same as between 50 K and 60 K.
• Ratios Are Meaningful:
  • Because of the true zero, ratios can be interpreted.
  • Example: A temperature of 40 Kelvin is twice as hot as 20 Kelvin.
• Continuous or Discrete:
  • Can take any value within a range (continuous) or specific countable values (discrete).
  • Continuous Example: Height (e.g., 170.5 cm).
  • Discrete Example: Number of students in a class (e.g., 20 students).
• Visualization:
  • Best represented using histograms, line graphs, scatter plots, or box plots.
Examples of Ratio Data

1. Physical Measurements:
   • Height, weight, distance, speed, temperature in Kelvin.
2. Financial Data:
   • Income, revenue, expenses.
3. Demographic Data:
   • Age, number of children, years of education.
Overview of operations for ratio data:
• Percentiles and quartiles can be calculated on ratio
data because ratio data is quantitative, continuous (or
sometimes discrete), and allows for meaningful
ordering of values.
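As a minimal sketch of quartiles on ratio data, the example below uses Python's `statistics.quantiles` on a made-up list of body weights (`weights_kg` is hypothetical data). The `"inclusive"` method is chosen here so that the list is treated as the whole population; other interpolation conventions give slightly different cut points.

```python
import statistics

# Hypothetical body weights in kilograms (ratio data with a true zero).
weights_kg = [50, 55, 60, 65, 70, 75, 80, 85, 90]

# Three cut points that split the ordered data into four equal parts.
q1, q2, q3 = statistics.quantiles(weights_kg, n=4, method="inclusive")

# Ratios are meaningful because 0 kg means no weight at all.
heaviest_to_lightest = max(weights_kg) / min(weights_kg)

print(q1, q2, q3)
print(heaviest_to_lightest)
```

The second quartile equals the median, and the ratio shows that the heaviest value is 1.8 times the lightest, a statement that would not be valid for interval data.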
• Apart from the approach detailed above, attributes can also be
categorized into types based on the number of values that can be
assigned.
• The attributes can be either discrete or continuous based on this
factor.
• Discrete attributes can assume a finite or countably infinite
number of values.
• Nominal attributes such as roll number, street number, pin code,
etc. can have a finite number of values, whereas numeric
attributes such as count, rank of students, etc. can have
countably infinite values.
• A special type of discrete attribute which can assume only two
values is called a binary attribute.
• Examples of binary attributes include: male/female,
positive/negative, yes/no, etc.
• Continuous attributes can assume any possible value which is a
Discrete Data

Discrete data consists of countable, distinct values. It cannot take on values
between two specific points.
Key Characteristics:
• Countable: Values are finite or countable.
• Whole Numbers: Typically, the data is represented as integers.
• No Fractions or Decimals: Cannot include partial values.
• Examples:
• Number of students in a class (e.g., 25, 30).
• Number of cars in a parking lot (e.g., 10, 15).
• Number of goals scored in a soccer match (e.g., 1, 2, 3).
• Common Visualizations:
Bar charts, pie charts.
Continuous data

Continuous data can take any value within a given range, including fractions and
decimals. It represents measurements rather than counts.
Key Characteristics:
• Measurable: Values can be measured, not just counted.
• Infinite Possibilities: Can include any value within a range.
• Fractions and Decimals Allowed: Examples include 5.5, 10.75, etc.
• Examples:
• Height of a person (e.g., 170.5 cm, 165.2 cm).
• Weight of an object (e.g., 65.3 kg, 72.8 kg).
• Time taken to complete a task (e.g., 12.5 seconds, 14.8 seconds).
• Common Visualizations:
Histograms, line graphs, scatter plots, box plots.
Common Applications

1. Discrete Data:
   • Business: Counting products sold, number of employees.
   • Sports: Counting goals, fouls, or wins.
2. Continuous Data:
   • Healthcare: Measuring patient blood pressure, weight, or
     height.
   • Physics: Recording speed, temperature, or distance.
Note:

• In general, nominal and ordinal attributes are discrete.
• On the other hand, interval and ratio attributes are continuous,
barring a few exceptions, e.g. the ‘count’ attribute.
Comparison with Discrete Data
The following figure gives a summarized view of different types of data
that we may find in a typical machine learning problem.
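Independently of that figure, the operations this deck allows for each of the four data types can be collected into a small lookup. The sketch below is only an illustrative summary of the statements made in the earlier sections; the names `PERMITTED` and `is_permitted` are hypothetical.

```python
# Statistics allowed at each level; each level inherits from the one before it.
NOMINAL = {"count", "mode"}
ORDINAL = NOMINAL | {"median", "quartiles"}
INTERVAL = ORDINAL | {"mean", "standard deviation", "difference"}
RATIO = INTERVAL | {"ratio"}

PERMITTED = {
    "nominal": NOMINAL,
    "ordinal": ORDINAL,
    "interval": INTERVAL,
    "ratio": RATIO,
}


def is_permitted(level: str, operation: str) -> bool:
    """Return True if the operation is meaningful for the given data type."""
    return operation in PERMITTED[level]


print(is_permitted("nominal", "mode"))
print(is_permitted("ordinal", "mean"))
print(is_permitted("interval", "ratio"))
print(is_permitted("ratio", "ratio"))
```

Building each set from the previous one mirrors the way each data type keeps the properties of the one before it and adds something new.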
State the type of data in these datasets
