Data Visualization

This document discusses predictive analytics and different types of data. It begins by explaining the importance of understanding data types when applying statistics. It then describes different levels of measurement for data: nominal, ordinal, interval, and ratio scales. Specific examples are provided for each. Common data visualization tools and graphs like pie charts, bar charts, and line graphs are also outlined. The document emphasizes that knowing the appropriate data type enables applying the correct analysis and visualization.

Uploaded by

ravi gupta
Predictive Analytics

Dr. Aanchal Anant Awasthi


Prerequisite
• Data Types
• Data Visualization
Why Data Types?

Knowing the type of data enables us to apply the correct statistics


Dataset 1: Student Register
Levels of Measurement

Nominal

Ordinal

Interval

Ratio
Nominal Scale
• “Nominal” scales could simply be called “labels.”

• Objects fall into unordered categories

• Collect information through frequencies

• Examples

• Hair colour: Brown, Red, Black, etc.


• Race: Caucasian, African, American, Asian etc.
• Smoking status: Smoker, Non-Smoker
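
Because nominal categories have no order, they are summarised by counting. The sketch below illustrates this with Python's standard library; the list of observations is made up for the example and does not come from the slides.

```python
from collections import Counter

# Hypothetical smoking-status observations (illustrative data only)
smoking_status = ["Smoker", "Non-Smoker", "Non-Smoker", "Smoker", "Non-Smoker"]

# Nominal data has no order, so the natural summary is a count per category
frequencies = Counter(smoking_status)

# The mode (most frequent label) is a meaningful summary; a mean is not
mode_label, mode_count = frequencies.most_common(1)[0]

print(dict(frequencies))
print(mode_label, mode_count)
```

The same counts are what a pie chart or bar chart of a nominal variable would display.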
Ordinal Scale
• The order of the values is what is important and significant, but the differences between them are not really known

• Ordinal scales deal with relative differences rather than quantitative differences

• Quantitative comparisons are impossible

• Examples: Educational level, Satisfaction, Happiness, Discomfort

Category              Level
Illiterate            0
Pre Primary           1
Primary               2
Junior High School    3
High School           4
Intermediate          5
Graduation            6
Post Graduation       7

Likert Scale
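
A minimal sketch of how the Category/Level coding can be used: the codes support ordering and a median, but not arithmetic on the differences. The sample of respondents is hypothetical.

```python
# Codes follow the Category/Level table for educational level
EDUCATION_LEVEL = {
    "Illiterate": 0,
    "Pre Primary": 1,
    "Primary": 2,
    "Junior High School": 3,
    "High School": 4,
    "Intermediate": 5,
    "Graduation": 6,
    "Post Graduation": 7,
}

# Hypothetical sample of respondents (illustrative only)
sample = ["Primary", "Graduation", "High School", "Illiterate", "Post Graduation"]

# Ordering is meaningful on an ordinal scale
ranked = sorted(sample, key=EDUCATION_LEVEL.get)

# The median category is a valid summary for an odd-sized sample
median_category = ranked[len(ranked) // 2]

print(ranked)
print(median_category)
```

Note that the gap between codes 6 and 7 is not claimed to equal the gap between codes 0 and 1; only the order carries meaning.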
Interval Scale
• Possess a constant interval size

Examples: Temperature in centigrade

- The distance between 940°C and 960°C is the same as the distance between 1000°C and 1020°C

• No true zero (the zero point is arbitrary)

- So it cannot be said that a temperature of 20°C is twice as hot as a temperature of 10°C
True zero point (Absence)
• Consider the temperature scale

F = (9/5)C + 32

°C    5    10    15    0
°F    41   50    59    32

• 0°C corresponds to 32°F, so zero on either scale does not mean an absence of temperature
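
The conversion formula can be used to check why ratios are not meaningful on an interval scale. This is a small sketch; the function name is chosen here for illustration.

```python
def celsius_to_fahrenheit(celsius):
    """Convert Celsius to Fahrenheit using F = (9/5)C + 32."""
    return celsius * 9 / 5 + 32

# Reproduce the conversion table
for c in (5, 10, 15, 0):
    print(c, celsius_to_fahrenheit(c))

# Differences are preserved up to a constant factor (intervals are meaningful)
diff_a = celsius_to_fahrenheit(20) - celsius_to_fahrenheit(10)
diff_b = celsius_to_fahrenheit(30) - celsius_to_fahrenheit(20)

# Ratios are not preserved: 20/10 is 2 in Celsius but not in Fahrenheit
ratio_celsius = 20 / 10
ratio_fahrenheit = celsius_to_fahrenheit(20) / celsius_to_fahrenheit(10)

print(diff_a, diff_b)
print(ratio_celsius, ratio_fahrenheit)
```

Since the "twice as hot" ratio changes with the unit, it reflects the arbitrary zero point rather than a property of the temperatures.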
Circular Scales
• The interval between 2:00 pm (1400 hr) and 3:30 pm (1530 hr) is the same as the interval between 8:00 am (0800 hr) and 9:30 am (0930 hr).

• But one cannot speak of ratios of times of day because the zero point (midnight) on the scale is arbitrary.

• Circular biological data are usually like compass directions, where the designation of north as 0° is arbitrary.
Ratio Scale
• Ratio scales tell us the order and the exact difference between values, and they also have an absolute zero

Examples

• Weight

• Height

• Length of time (hr, days, year etc.)

• Volume
Qualitative Data

• Qualitative data deals with characteristics and descriptors that can't be easily measured, but can be observed subjectively

• Smell, Taste, Attractiveness, Colour


Quantitative data
• Quantitative data deals with numbers and things you can measure objectively

• Quantitative data is numerical information (numbers)

• It can be discrete or continuous

• Height, width, length, temperature, humidity, price, area and volume


Data Visualization
[Figure: five example charts. Panels: "Stages of Cancer" (stages I to IV); "Gender" (Male, Female; Smoker, Non Smoker); "Distribution of Smokers according to Cancer Staging" (stages I to IV); "Trends in Breast Cancer (per 100,000)" (Incidence and Mortality of Breast Cancer, 2017 to 2021); "Distribution of marks and study time" (Self Study Time (Hours) against Marks (out of 100))]
*All the graphs are based upon dummy data
Why Data Visualization?

Pattern

Trends

Relationship between variables

Easy to understand

Fastest way of summarising big data


Frequently used Graphs

• Pie-Chart
• Bar chart
• Column chart
• Line Graph
• Histogram
• Box-Whisker Plot
• Scatter Plot
Data Visualization Tools
• Tableau
• Microsoft Power BI
• R (ggplot2…)
• Python (matplotlib)
• Microsoft Excel
Pie Charts

Pie graphs show parts or percentages of a whole

[Figure 1: Distribution of grades of 5th grade students in a school in the state of Bihar. Chart title: "Student Grades"; grades A, B, C; slices 50 (50%), 40 (40%), 10 (10%)]

[Figure 2: Type of movie liked by members of a family. Chart title: "Type of Movie"; Comedy 400 (42%), Action 280 (29%), Drama 170 (18%), Romantic 100 (11%)]
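
As a rough sketch of what a pie chart encodes, the percentages and slice angles can be derived from raw counts. The counts below are the ones shown for the movie-type chart; the variable names are illustrative.

```python
# Counts as shown for the "Type of Movie" pie chart
movie_counts = {"Comedy": 400, "Action": 280, "Drama": 170, "Romantic": 100}

total = sum(movie_counts.values())

# Each slice's share of the whole, rounded to a whole percent
shares = {name: round(100 * count / total) for name, count in movie_counts.items()}

# Each slice's angle in degrees; the angles add up to a full circle
angles = {name: 360 * count / total for name, count in movie_counts.items()}

print(total)
print(shares)
```

Slices of similar size differ by only a few degrees, which is one reason small differences are hard to judge by eye on a pie chart.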
Limitations

• It is difficult to see the differences between estimates of nearly equal size.

• Pie graphs work poorly when the goal is to compare values with each other.


Donut Charts

A donut chart can contain more than one data series

[Figure 1: Distribution of grades of 5th grade students in a school in the state of Bihar. Chart title: "Type of Movie"; one data series across Comedy, Drama, Romantic, Action]

[Figure 2: Type of movie liked by youth of Lucknow. Chart title: "Type of Movie"; two data series (Male, Female) across Comedy, Drama, Romantic, Action]
Bar Charts
• Vertical Bar Graph

• Horizontal Bar Graph

• Clustered Bar Graph

• Stacked Bar Graph


Vertical Bar Graphs

• Use vertical bars rising from the bottom
• Lengths are proportional to the quantities they represent
• Vertical bar graphs are best for comparing estimates between 2 and 7 groups

Figure 3: Distribution of favorite music among students of higher education
Horizontal Bar Graphs

• These are the same as vertical bar graphs, but turned on their side
• Horizontal bar graphs are best to use when we have eight or more different groups
• Horizontal bar graphs are also appropriate to use when the category labels are too long to appear neatly on the x-axis

Figure 4: Favorite snacks of children aged 5 to 9 completed years
Clustered Bar Graphs

• Clustered or grouped bar graphs are bar graphs that show two or more categories on one graph
• Plotting multiple categories on one graph increases the amount of information

Figure 5: Sales of product this year vs last year
Stacked Bar Graphs

• They stack the segments of multiple datasets on top of each other within each bar
• They are used to show how a larger category is divided into smaller categories and how each part relates to the total amount

Major Flaw

• They become harder to read the more segments each bar has
• Comparing segments with each other is also difficult, as they are not aligned on a common baseline
Simple Stacked Bar Graphs

• They place each segment's value after the previous one
• The total value of the bar is all the segment values added together
• Ideal for comparing the total amounts across each group/segmented bar

Figure 6: Defining stacked bar graphs
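
The stacking rule can be sketched as a running sum: each segment starts where the previous one ended, and the last running total is the height of the bar. The segment values below are hypothetical.

```python
from itertools import accumulate

# Hypothetical segment values for one bar (illustrative only)
segments = [30, 20, 50]

# Top edge of each segment is the running total of the values so far
tops = list(accumulate(segments))

# Bottom edge of each segment is the top edge of the previous one
bottoms = [0] + tops[:-1]

# The bar's total height is the sum of all segment values
bar_total = tops[-1]

print(bottoms, tops, bar_total)
```

Only the first segment starts at zero, which is why the other segments are hard to compare across bars.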
Stacked Bar Diagram Examples

Figure 8: Favorite sports of grade 8 students

Figure 9: Quarterly sales of garments manufactured by ABC company in India
Line Graphs
• Used to illustrate trends over time for continuous data
• They can also be used to compare two different variables over time

Figure 10: Average onion prices in various Maharashtra districts in July (Rs/Quintal)
Histogram

• A histogram shows the underlying frequency distribution (shape) of a set of continuous data
• Data should be grouped into exclusive ranges
• They are connected bars
• The width of each bar is proportional to the width of each category, and the height is proportional to the frequency of that category

Figure 11: Distribution of heights of Black Cherry trees in a village in the state of Kerala
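
Grouping into exclusive ranges can be sketched as follows. This is one possible convention, assumed here: each bin includes its lower edge and excludes its upper edge, except the last bin, which also includes the maximum. The heights are hypothetical and the function name is illustrative.

```python
def histogram_counts(values, lower, width, n_bins):
    """Count values falling in n_bins equal-width bins starting at `lower`."""
    counts = [0] * n_bins
    upper = lower + width * n_bins
    for value in values:
        index = int((value - lower) // width)
        # Assumption: the largest edge belongs to the last bin
        if value == upper:
            index = n_bins - 1
        if 0 <= index < n_bins:
            counts[index] += 1
    return counts

# Hypothetical tree heights (illustrative only)
heights = [61, 63, 64, 66, 69, 70, 72, 74, 75, 80]

# Bins: 60-65, 65-70, 70-75, 75-80
counts = histogram_counts(heights, lower=60, width=5, n_bins=4)
print(counts)
```

Every observation lands in exactly one bin, so the bar heights sum to the number of observations.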
Optimum number of class intervals (Bins)

Sturges' rule

k = 1 + 3.322 log10(n)

Where:
k = the number of bins
n = the number of observations in the data set.

• Min Bin Width = (Max Observed Value – Min Observed Value) / k

n       k       Rounded k
30      5.91    6
40
50
100
1000
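
Sturges' rule and the bin-width formula can be evaluated with a few lines of code. This is a sketch under two assumptions: the logarithm is base 10 (which is what the constant 3.322 implies), and k is rounded to the nearest integer as in the n = 30 row. The function names are illustrative.

```python
import math

def sturges_bins(n):
    """Number of bins suggested by Sturges' rule: k = 1 + 3.322 * log10(n)."""
    return 1 + 3.322 * math.log10(n)

def bin_width(min_value, max_value, k):
    """Minimum bin width: (max - min) / k."""
    return (max_value - min_value) / k

# Evaluate the rule for the sample sizes listed in the table
for n in (30, 40, 50, 100, 1000):
    k = sturges_bins(n)
    print(n, round(k, 2), round(k))

# Hypothetical data range from 60 to 84 with n = 30 observations
width = bin_width(60, 84, round(sturges_bins(30)))
print(width)
```

Running the loop fills in the remaining rows of the table; the rule grows slowly, so even large data sets get a modest number of bins.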
• Histograms clearly show outliers and skewness

Figure 12: Positively Skewed, Negatively Skewed & Normally distributed data, from left to right
Difference between Bar Diagram & Histogram

Bar Diagram
• Gaps between bars
• Plots categorical data
• Bars can be reordered
• Height of the bar represents frequency
• Width of the bar is immaterial

Histogram
• Bars are adjacent to each other
• Shows frequency distribution of numerical data
• Bars cannot be reordered
• Height of the bar is proportional to frequency
• Width of the bar is equal to interval range
Box Whisker Plot
• Often used in exploratory data analysis

• Five number summary:

✓ the minimum value
✓ the lower quartile
✓ the median value
✓ the upper quartile
✓ the maximum value

Figure 13: Annotated Box Plot
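
The five-number summary can be computed with Python's standard library. Quartile conventions differ between textbooks and tools; this sketch assumes the "inclusive" method of `statistics.quantiles`, and the data are hypothetical.

```python
import statistics

def five_number_summary(values):
    """Return (minimum, lower quartile, median, upper quartile, maximum)."""
    # Assumption: quartiles use the "inclusive" interpolation method
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return min(values), q1, median, q3, max(values)

# Hypothetical observations (illustrative only)
data = [1, 2, 3, 4, 5, 6, 7, 8, 9]

summary = five_number_summary(data)
print(summary)
```

The box of a box plot spans the lower to upper quartile, with a line at the median; the whiskers extend toward the minimum and maximum.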
Boxplots show robust measures of location and spread as well as providing information about symmetry and outliers

Figure 14: Distribution of heights of males & females in a class
Scatter Plot
• A graph in which the values of two variables are plotted along two axes

• The pattern of the resulting points reveals any correlation, if present

Figure 15: Correlation between height & weight of children aged 1 to 4 years
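
The strength of a linear pattern in a scatter plot is commonly summarised by Pearson's correlation coefficient. The sketch below computes it from first principles; the heights and weights are made up for illustration and are not the data behind Figure 15.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products of deviations from the means
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Square roots of the sums of squared deviations
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cross / (spread_x * spread_y)

# Hypothetical heights (cm) and weights (kg) of four children
heights_cm = [75, 85, 95, 102]
weights_kg = [9.5, 12.0, 14.0, 16.0]

r = pearson_r(heights_cm, weights_kg)
print(round(r, 3))
```

Values of r near +1 or -1 correspond to points lying close to a straight line; values near 0 indicate no linear relationship.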
Conclusion
An appropriate and properly prepared graph can be a powerful tool to convey
statistical information.

Features of an Ideal Graph

• Shows what you aimed to present

• Graph should be clear

• Define chart title & legends

• Name & number each graph

• Label axes
Bibliography
• Jerrold H. Zar. Biostatistical Analysis, Fourth Edition, Pearson Education India, 1999

• S Manikandan. Frequency distribution. J Pharmacol Pharmacother. 2011 Jan-Mar; 2(1): 54–56. Available from: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3117575/
