
Basic Applied Visualizations: Data Visualization

The document outlines various data visualization techniques, categorizing them into one-dimensional (1D) and two-dimensional (2D) methods, including pictograms, pie charts, bar charts, histograms, and scatter plots. It emphasizes the goals of data visualization, such as exploration, confirmation, and communication, while also discussing common tools and the data visualization process. Additionally, it provides detailed explanations on how to create and interpret specific visualizations like pictographs and pie charts.

UNIT-II

Basic Applied Visualizations: Data Visualization: One-dimensional
(Pictogram, Pie Chart, Bar Chart), two-dimensional (Histogram,
Line plot, frequency curves & polygons, ogive curves, Scatter Plot)
and other data visualization techniques. Gantt Chart, Heat Map, Box and
Whisker Plot, Waterfall Chart, Area Chart, Stacked Bar Charts –
Sub Plots – Matplotlib, Seaborn Styles, Box plot – Density Plot
– Tree map – Graph Networks.

Data visualization is the process of creating graphical
representations of data to better understand and communicate
information. It involves using visual elements like charts,
graphs, maps, and more to help people quickly grasp complex
data insights.

Data visualization techniques


Data visualization techniques are methods used to represent
data graphically, making it easier to understand, analyze, and
communicate insights. Here are some key points about data
visualization techniques:
Types of techniques:
• One-dimensional (1D) Techniques : Display a single variable or
category
• Examples: Line plots, Bar charts, Histograms, Density plots
• Two-dimensional (2D) Techniques : Display two variables or
categories
• Examples: Scatter plots, Bubble charts, Heatmaps, Contour plots
• Multi-dimensional (Multi-D) Techniques: Display three or more variables or
categories
• Examples: Scatter plot matrices, Parallel coordinates, Radar charts, 3D
plots

Goals of data visualization:

• Exploration: Understand data distribution, trends, and patterns


• Confirmation: Validate hypotheses or assumptions

• Communication: Share insights and findings with others


Common data visualization tools:

• Tableau

• Power BI

• Matplotlib

• Seaborn

• Plotly
• D3.js
Data visualization process:

• Data preparation: Clean, transform, and aggregate data


• Visualization selection: Choose appropriate technique

• Design and implementation: Create and refine visualization


• Interpretation and communication: Analyze and share insights
One-dimensional (1D)
One-dimensional (1D) data visualization represents data using
only one axis or dimension. This type of visualization is used to
display the distribution, trend, or pattern of a single variable or
category.

Common 1D data visualization techniques include:

• Pictogram
• Pie chart
• Bar charts

Characteristics of 1D techniques:

• Use a single axis (x or y)

• Display a single variable or category


• Show distribution, trend, or pattern

• Often used for time-series data, categorical data, or continuous data

Pictogram in One dimensional data visualization

A pictogram (or pictograph) is a type of one-dimensional (1D)
data visualization technique that uses icons, images, or
symbols to represent data. It is also known as a pictorial chart
or icon chart.
Pictograms are drawn with the help of pictures and are very
frequently used to represent statistical data.
Characteristics of pictograms:

• Use icons or images to represent data

• One-dimensional, displaying a single variable or category


• Icons are often scaled or repeated to represent quantity
• Can be used for categorical or numerical data

Types of pictograms:

• Icon-based pictograms: Use simple icons to represent data

• Image-based pictograms: Use images to represent data


• Composite pictograms: Combine icons and images to represent data
Parts of Pictograph

Parts of a pictograph include the following:


Title: Describes what the pictograph is about.
Icon or Symbol: The visual element that represents
individual data points or categories.
Data values: The quantity of each data point.
Labels: Provide more details about the data points.
Color (optional): Adds more meaning or readability to the
pictograph.

How to Make a Pictograph?

To create a pictograph, follow the steps listed below:


Understand the data: The first step in creating a pictograph
is to understand the data, i.e. the type of data given.
Choose icons or symbols: Select the icon or image that will
represent the data.
Provide a scaling factor: State the quantity or value that
one icon represents.
Use colours: This optional step can enhance your pictograph.
Present your pictograph: After performing the above steps,
the pictograph is ready to be presented.
Example 1: Represent the number of books four students read
in a month using a pictograph.

Students   Student A   Student B   Student C   Student D
Books      4           8           6           8

• To create a pictograph, we first need to understand the
data. In our table, the categories are the four students
and the values are the number of books each one read.
• Next, we choose an icon for the pictograph (here, a book)
and a scaling factor, which is 2 in this example: each icon
stands for 2 books.
• Finally, we present the pictograph.

To read the pictograph, identify the icon (a book in this
case), look up the scaling factor, count the icons for every
student, and multiply the number of icons by the scaling
factor to find each value. Conclusions can then be drawn from
these values.

Since each book picture represents two books and the
pictograph shows two book pictures for Student A, Student A
read 2 × 2 = 4 books in total. Similarly, Student B read
4 × 2 = 8 books, Student C read 3 × 2 = 6 books, and Student
D read 4 × 2 = 8 books.
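The scaling arithmetic in this example can be sketched in a few lines of Python. This is only an illustration: the helper names `icon_counts` and `render` and the use of `#` as a stand-in for the book icon are assumptions, not part of the original notes.

```python
# Books read per student, taken from the table in Example 1.
books = {"Student A": 4, "Student B": 8, "Student C": 6, "Student D": 8}
SCALE = 2  # each icon stands for 2 books, as chosen in the example


def icon_counts(data, scale):
    """Return how many icons each category needs at the given scale."""
    return {name: value / scale for name, value in data.items()}


def render(data, scale, icon="#"):
    """Build a text pictograph, one row of icons per category.

    Fractional icons are simply dropped in this sketch.
    """
    rows = []
    for name, count in icon_counts(data, scale).items():
        rows.append(f"{name}: " + " ".join([icon] * int(count)))
    return "\n".join(rows)


counts = icon_counts(books, SCALE)
print(render(books, SCALE))

# Reading the chart back: icons x scale recovers the original values.
recovered = {name: int(n * SCALE) for name, n in counts.items()}
```

Multiplying each icon count by the scaling factor gives back the table values, which is exactly the reading procedure described above.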
Example 2

Draw a pictogram for the data of production of tea (in
hundred kg) in a particular area of Assam from year 2006 to
2010.
Pie chart in One dimensional data visualization

A pie chart is a type of one-dimensional (1D) data visualization
technique that displays categorical data as a circle divided into
segments, each representing a category’s proportion.

Characteristics of pie charts:

• Circular shape, divided into segments (slices)

• Each slice represents a category or group or a portion of the data.


• Slice size indicates the proportion of data in each category

• One-dimensional, displaying a single variable or category.


• Pie charts work best with a limited number of categories.
Too many slices can make the chart difficult to interpret, as
small slices can become indistinguishable.

Steps used for constructing a pie chart.

The total value of the pie is always 100%.

Step 1: Write down the available data in tabular form.
Step 2: Find the sum of all the given data.
Step 3: Calculate the percentage of each sector by dividing
each sector value by the total and multiplying by 100.
Step 4: Calculate the degrees corresponding to each slice:

Central Angle of Each Component = (Given Data / Total Value of Data) × 360°

Step 5: With the help of a protractor, measure each angle
from the central point and draw the circle’s sectors.

Advantages of pie charts:

• Easy to understand and interpret


• Effective for displaying small number of categories (2-5)
• Can be used to show how different categories contribute to a whole
• Visually appealing and engaging

Disadvantages of pie charts:

• Difficult to compare slice sizes accurately

• Not effective for displaying large number of categories

• Can be misleading if slice sizes are similar


• Not suitable for displaying continuous data

Example 1 :

A teacher surveyed a group of students to find each student's
favorite hobby.

Hobbies          Number of students
Singing          16
Reading books    20
Dancing          10
Painting         30
Others           24

Step 1: Write down the available data in tabular form as follows:

Singing   Reading books   Dancing   Painting   Others
16        20              10        30         24

Step 2: Find the sum of all the given data. Here, the Sum of All
Data = (16 + 20 + 10 + 30 + 24) = 100
Step 3: Calculate the percentage of each sector by dividing
each sector value by the total and multiplying by 100.

Hobby            Calculation        Percentage
Singing          (16/100) × 100     16%
Reading books    (20/100) × 100     20%
Dancing          (10/100) × 100     10%
Painting         (30/100) × 100     30%
Others           (24/100) × 100     24%

Step 4: Calculate the degrees corresponding to each slice:

Central Angle of Each Component = (Given Data / Total Value of Data) × 360°

Hence, the values are as follows:

Hobby            Calculation        Central angle
Singing          (16/100) × 360     57.6°
Reading books    (20/100) × 360     72°
Dancing          (10/100) × 360     36°
Painting         (30/100) × 360     108°
Others           (24/100) × 360     86.4°

Step 5: With the help of a protractor, measure each angle from
the central point and draw the circle’s sectors. The resultant
pie chart will be:
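Steps 2 to 4 can be checked with a short Python sketch. The function names `pie_table` and `draw_pie` are illustrative assumptions; `draw_pie` additionally assumes Matplotlib is installed and is offered only as one possible way to render the chart, not as the method used in these notes.

```python
# Survey data from Example 1.
hobbies = {
    "Singing": 16,
    "Reading books": 20,
    "Dancing": 10,
    "Painting": 30,
    "Others": 24,
}


def pie_table(data):
    """Return {category: (percentage, central angle in degrees)}."""
    total = sum(data.values())
    return {
        name: (value / total * 100, value / total * 360)
        for name, value in data.items()
    }


def draw_pie(data):
    """Optional rendering step (assumes Matplotlib is available)."""
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    ax.pie(list(data.values()), labels=list(data.keys()), autopct="%1.0f%%")
    ax.set_title("Favourite hobbies")
    return fig


table = pie_table(hobbies)
for name, (pct, angle) in table.items():
    print(f"{name}: {pct:.0f}% -> {angle:.1f} degrees")
```

The central angles always add up to 360 degrees, which is a quick sanity check before drawing the sectors by hand.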
How to Read Pie Chart

To read a pie chart, first note how the data is presented. If
the data is given in percentages, convert it back to the
original values in order to analyze and interpret it. Let’s
take a look at an example of how to interpret pie charts.

Example: In a survey of 300 people, each person was asked
which genre they prefer. The pie chart of the survey is shown
below. Analyze and interpret the pie chart to find the original data.
Solution:
The data in the pie chart is given in percentages. Let’s
convert it to obtain the original values.

Number of people who like comedy = 20/100 × 300 = 60 people.
Number of people who like action = 25/100 × 300 = 75 people.
Number of people who like romance = 30/100 × 300 = 90 people.
Number of people who like drama = 5/100 × 300 = 15 people.
Number of people who like sci-fi = 20/100 × 300 = 60 people.
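The same conversion from percentages back to counts can be written as a small Python sketch. The function name `counts_from_percentages` is a hypothetical helper introduced only for illustration.

```python
SURVEY_SIZE = 300  # number of people surveyed in the example

# Slice sizes read off the pie chart, in percent.
shares = {"Comedy": 20, "Action": 25, "Romance": 30, "Drama": 5, "Sci-fi": 20}


def counts_from_percentages(percentages, total):
    """Convert percentage slices back to the original counts."""
    return {name: round(pct * total / 100) for name, pct in percentages.items()}


people = counts_from_percentages(shares, SURVEY_SIZE)
for genre, n in people.items():
    print(f"{genre}: {n} people")
```

Because the percentages sum to 100, the recovered counts sum to the survey size of 300.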

Bar chart
A bar chart is a type of one-dimensional (1D) data visualization
technique that displays categorical data using rectangular bars,
each representing a category’s value.
The length or height of each bar is proportional to the value it
represents, making it easy to compare different categories at a
glance.
Bar charts are called one-dimensional diagrams because only the
length of the bar matters and not the width. That is, the width
of each bar remains the same within a diagram, but it may vary
from diagram to diagram depending on the space available and the
number of bars to be presented.

Bar charts are a popular and effective choice for 1D data
visualization, allowing for easy comparison and analysis of
categorical data.
Characteristics of bar charts:

• Rectangular bars, each representing a category

• Bar length or height indicates the value or magnitude

• One-dimensional, displaying a single variable or category


• Can be horizontal (bar chart) or vertical (column chart)
Types of bar charts:

• Simple bar chart: Displays categorical data with
rectangular bars. It is used when the data is based on a
single variable.
• Stacked bar chart: Divides each bar into sub-bars
representing different sub-categories. This allows for the
visualization of both the total value and the contribution
of each sub-category within each main category.
• Grouped bar chart: Displays multiple bars side by side
for each category. Each bar in a group represents a
different sub-category, enabling comparisons within and
across categories.
• Clustered bar chart: Similar to grouped bar charts,
but with a gap between clusters.
• Horizontal bar chart: Bars are displayed
horizontally instead of vertically.
• Vertical bar chart: Bars are displayed vertically
(also known as a column chart).
• Percentage bar chart: Bars represent percentage values.
Advantages of bar charts:

• Easy to read and compare values


• Effective for displaying categorical data

• Can display large number of categories

• Accurate and precise representation of data


• Bar charts can be used for both qualitative and quantitative
data, making them a versatile tool for various types of data.
Disadvantages of bar charts

• Can be cluttered if too many categories

• May not show trends or patterns as clearly as line charts

• Can be misleading if categories are not clearly labeled


Best practices for creating bar charts:

• Use a clear and concise title

• Label categories and values accurately


• Use contrasting colors to differentiate bars

• Avoid 3D or stacked bar charts, as they can be misleading


• Consider using sorted or grouped bars for clarity
Bar Graph Examples

Question 1: The average speed of some vehicles is shown below.
Represent it through a bar graph.

Solution:

The average speed given in the table determines the length of
each bar. Therefore, if the graph is vertical, the average
speed is shown on the y-axis and the types of vehicles on the
x-axis.

Two-dimensional (2D)
Two-dimensional (2D) data visualization techniques are used to
display data with two variables or dimensions, i.e. length and
width, using visualizations like charts, plots, and graphs.

In one-dimensional diagrams only the length of the bar is
important, and bars are compared on the basis of their lengths
only. In two-dimensional diagrams both the length and the width
of the bars are considered, i.e. the given numerical figures are
represented by the areas of the bars. Two-dimensional diagrams
are therefore also known as “Area Diagrams.”

This type of visualization helps to:

• Identify relationships and correlations

• Detect patterns and trends


• Compare distributions and densities
• Visualize high-dimensional data

Common 2D data visualization techniques include:

• Histogram
• Line plot
• Frequency curves and polygons
• Ogive curves
• Scatter plot
Histogram in two dimensional data visualization
A histogram is a type of two-dimensional (2D) data visualization
that displays the distribution of a continuous variable using
rectangular bars.

In an ordinary histogram, the x-axis represents the variable, and
the y-axis represents the frequency. (A 2D histogram of two
variables instead divides the plane into a grid of bins along
both the x-axis and the y-axis.)

It’s a popular technique for:
• Showing distribution: Histograms help understand the
shape of the data distribution, including skewness, modality,
and outliers.
• Identifying patterns: They reveal patterns, such as
peaks, troughs, and plateaus, in the data.
• Comparing datasets: Histograms enable comparison of
multiple datasets to identify similarities and differences.

Key components of a histogram:

• Bins: Rectangular bars, each representing a range of
values of the variable. (In a 2D histogram of two variables,
the plane is divided into a grid of bins along both the
x-axis and the y-axis, and each bin represents a range of
values for both variables.)
• Frequency: The height of each bar indicates the number
of data points within that range.
• Axes: The x-axis represents the variable, and the y-axis
represents the frequency.
Types of histograms:

• Simple histogram: Displays a single variable.


• Stacked histogram: Displays multiple variables, with
each variable represented by a different color.
• Grouped histogram: Displays multiple variables, with
each variable represented by a separate bar.

Construction and interpretation of Histogram

• First, mark the class intervals on the X-axis and the
frequencies on the Y-axis.
• Make sure that the scale along each axis is uniform.
• The class intervals shall always be exclusive.
• Create bars with class intervals on the x-axis and
corresponding frequencies on the y-axis.
• The height of each bar reflects the frequency when intervals
are equal.
• The area of each bar is proportional to its respective
frequency when intervals are unequal.

Histogram Examples

In a park, there are 28 trees of different heights. The heights
are measured in centimeters and lie in the range 100-350 cm.
Draw the histogram for the following data.
Solution:

Since the heights of the trees lie between 100 and 350 cm, we
start by marking the heights on the x-axis in groups of 50 cm
each, and the number of trees on the y-axis.
Therefore, if a tree has a height of 230 cm, it will lie in the
rectangle 200-250.
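The grouping into 50 cm classes can be sketched in Python as follows. The original data table is not reproduced here, so `sample_heights` is a made-up list used only to exercise the code, and the helper names `bin_label` and `frequency_table` are assumptions.

```python
BIN_WIDTH = 50        # class width in cm, as in the example
LOW, HIGH = 100, 350  # range of tree heights in cm


def bin_label(height):
    """Return the exclusive class interval, e.g. '200-250', containing height."""
    if not LOW <= height < HIGH:
        raise ValueError("height outside the 100-350 cm range")
    start = LOW + (int(height) - LOW) // BIN_WIDTH * BIN_WIDTH
    return f"{start}-{start + BIN_WIDTH}"


def frequency_table(heights):
    """Count how many heights fall into each class interval."""
    table = {f"{s}-{s + BIN_WIDTH}": 0 for s in range(LOW, HIGH, BIN_WIDTH)}
    for h in heights:
        table[bin_label(h)] += 1
    return table


# Hypothetical heights, NOT the 28 trees from the original table.
sample_heights = [120, 160, 230, 245, 250, 310]
freq = frequency_table(sample_heights)
for interval, count in freq.items():
    print(f"{interval}: {count}")
```

With exclusive intervals, a height of exactly 250 cm is counted in 250-300 rather than 200-250; the resulting counts are the bar heights of the histogram.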
Line plot in two dimensional data visualization

A line plot is one of the most common and effective techniques
in two-dimensional (2D) data visualization.

A line plot is a graphical representation of the relationship
between two continuous variables. It displays the trend or
pattern between the variables using a series of connected points.

Key components:

• X-axis: Represents the independent variable.

• Y-axis: Represents the dependent variable.

• Data points: Individual observations or measurements.

• Line: Connects the data points to show the trend or pattern.

Types of line plots:

• Simple line plot: Displays a single line.

• Multi-line plot: Displays multiple lines for comparison.

Best practices:

• Choose appropriate scales: Ensure accurate representation.

• Label axes and title: Clearly indicate variables and purpose.

• Use color effectively: Differentiate lines and highlight important
features.

• Avoid clutter: Limit the number of lines and data points.


• Consider alternative charts: Depending on the data
and message, another chart type may be more suitable.

Example:

Time(hr) Rahul dist.(km) Mahesh dist. (km)


0.5 180 200
1 360 400
1.5 540 600
2 720 800
2.5 900 1000
3 1080 1200
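The table above can be explored with a short Python sketch. Dividing distance by time shows that each line has a constant slope, so both plots are straight lines. The function names `average_speeds` and `draw_lines` are illustrative; `draw_lines` assumes Matplotlib is installed and is only one possible way to draw the multi-line plot.

```python
# Data from the table above.
hours = [0.5, 1, 1.5, 2, 2.5, 3]
rahul_km = [180, 360, 540, 720, 900, 1080]
mahesh_km = [200, 400, 600, 800, 1000, 1200]


def average_speeds(times, distances):
    """Distance divided by time for each row (km per hour)."""
    return [d / t for t, d in zip(times, distances)]


def draw_lines():
    """Optional rendering step (assumes Matplotlib is available)."""
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    ax.plot(hours, rahul_km, marker="o", label="Rahul")
    ax.plot(hours, mahesh_km, marker="o", label="Mahesh")
    ax.set_xlabel("Time (hr)")        # independent variable
    ax.set_ylabel("Distance (km)")    # dependent variable
    ax.set_title("Distance covered over time")
    ax.legend()
    return fig


rahul_speed = average_speeds(hours, rahul_km)
mahesh_speed = average_speeds(hours, mahesh_km)
print(rahul_speed[0], mahesh_speed[0])
```

Time goes on the x-axis as the independent variable and distance on the y-axis as the dependent variable, following the key components listed earlier.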

Frequency curves in two dimensional data visualization


Frequency curves are a type of two-dimensional (2D) data
visualization that represent the distribution of a data set.
A frequency curve is a limiting form of a histogram. A
frequency curve for a given distribution can be obtained by
drawing a smooth, freehand curve through the midpoints of
the upper sides of the rectangles forming the histogram.
Characteristics:

• X-axis : Represents the continuous variable.

• Y-axis : Represents the density or frequency.

• Curve: A smooth line that estimates the distribution. The
curve is plotted by connecting points that represent the
frequency (or density) of data points within specific intervals
along the x-axis.

Frequency polygons in two dimensional data visualization

A frequency polygon is a type of two-dimensional (2D) data
visualization that is used to represent the distribution of a
dataset. It is similar to a frequency curve but uses straight
lines instead of a smooth curve.
Characteristics:

• X-axis: Represents the continuous variable.

• Y-axis: Represents the frequency or density.

• Polygon: A line graph that connects the midpoints of histogram
bars.
Purpose of Frequency curves and Frequency polygons

They are used to:

• Show distribution shape: Visualize the underlying
distribution of the data.
• Identify patterns: Reveal modes, skewness, and outliers.
• Compare distributions: Display multiple distributions for
comparison.

Let us consider an example. Consider the following table:

Classes 0-10 10-20 20-30 30-40 40-50


Class mark 5 15 25 35 45
Frequencies 2 6 4 3 1

For the example we have considered, a frequency curve looks like the
following,
And for the same example, a frequency polygon looks like the
following,
We can see that the frequency polygon and the frequency
curve, both depend on the class mark to be expressed as
graphs.
The only difference between a frequency polygon and a
frequency curve is the following,

A frequency curve is a smooth, free hand drawn curve.


A frequency polygon is drawn by joining the class marks with line
segments.
Note:
The information given about grouped data and its frequencies
can be represented in more than one way. The frequency curve
can be drawn from the frequency polygon. The area under the
frequency polygon is equal to the area of the histogram. This
area tells us the frequency values that would be displayed in
the distribution table.
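The class marks and the equal-area property can be verified on the example table with a short Python sketch. It assumes equal class widths and that the polygon is closed by adding a zero-frequency class mark on each side, which is the usual textbook convention but is not spelled out in these notes.

```python
# Classes and frequencies from the example table.
classes = [(0, 10), (10, 20), (20, 30), (30, 40), (40, 50)]
frequencies = [2, 6, 4, 3, 1]

# Class mark = midpoint of each class interval.
class_marks = [(lo + hi) / 2 for lo, hi in classes]
width = classes[0][1] - classes[0][0]

# Close the polygon with a zero-frequency point one class width
# before the first class mark and one after the last (assumption).
xs = [class_marks[0] - width] + class_marks + [class_marks[-1] + width]
ys = [0] + frequencies + [0]

# Area of the histogram: sum of frequency x class width.
histogram_area = sum(f * (hi - lo) for f, (lo, hi) in zip(frequencies, classes))

# Area under the polygon: sum of trapezoids between consecutive points.
polygon_area = sum(
    (xs[i + 1] - xs[i]) * (ys[i] + ys[i + 1]) / 2 for i in range(len(xs) - 1)
)

print(class_marks, histogram_area, polygon_area)
```

Both areas come out to 160 here, that is, the total frequency (16) times the class width (10).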
Ogive curves in two dimensional data visualization
Ogive curves, also known as cumulative frequency curves, are a type
of two-dimensional (2D) data visualization used to display the
cumulative distribution of a continuous variable.

Characteristics:
• X-axis: Represents the continuous variable.
• Y-axis : Represents the cumulative frequency or percentage.

• Curve: A smooth or stepped curve that displays the cumulative
distribution.

Types of ogive curves:


• Less than ogive: Displays the cumulative frequency
of values less than or equal to the x-axis value.
A less than ogive curve is an increasing curve that
slopes upwards from left to right.
• More than ogive: Displays the cumulative frequency of
values greater than or equal to the x-axis value.
A more than ogive curve is a decreasing curve that
slopes downwards from left to right.
How to Construct an Ogive Curve:

• Collect Data: Start with your dataset, organized into
intervals or bins. Calculate the frequency for each bin.
• Calculate Cumulative Frequency: For each bin, calculate
the cumulative frequency.
• For a less than ogive, add the frequencies of all the
preceding class intervals to the frequency of a class.
• For a more than ogive, add the frequencies of all the
succeeding class intervals to the frequency of a class.
• Plot Points: Plot the cumulative frequency against
the upper boundary of each bin on the x-axis for a less
than ogive, or against the lower boundary for a more
than ogive.
• Connect the Points: Connect the plotted points with
straight lines (or a smooth curve) to form the ogive curve.
Example: Draw a ‘less than’ and a ‘more than’ ogive curve
from the following distribution of the marks of 50 students
in a class.

Marks            10-20   20-30   30-40   40-50   50-60   60-70   70-80
No of students   6       4       15      5       8       7       5

Solution:
Less than ogive curve: First, convert the frequency
distribution into a less than cumulative frequency distribution.
More than ogive curve: Similarly, convert the frequency
distribution into a more than cumulative frequency distribution.
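The two cumulative frequency distributions for this example can be computed with a brief Python sketch. The variable names are illustrative; the plotting convention used (upper class boundary for the less than ogive, lower class boundary for the more than ogive) is the standard one and is stated here as an assumption.

```python
from itertools import accumulate

# Marks distribution of the 50 students from the example.
classes = [(10, 20), (20, 30), (30, 40), (40, 50), (50, 60), (60, 70), (70, 80)]
students = [6, 4, 15, 5, 8, 7, 5]

# Less than: running total from the first class onwards.
less_than = list(accumulate(students))

# More than: running total from the last class backwards.
more_than = list(accumulate(reversed(students)))[::-1]

# Points to plot: (class boundary, cumulative frequency).
less_than_points = [(hi, cf) for (_, hi), cf in zip(classes, less_than)]
more_than_points = [(lo, cf) for (lo, _), cf in zip(classes, more_than)]

print(less_than)
print(more_than)
```

The less than series rises to the total of 50 students, while the more than series starts at 50 and falls, matching the increasing and decreasing shapes described above.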
Scatter plot in two dimensional data visualization
A scatter plot is a two-dimensional data visualization that
displays the relationship between two continuous variables.

Scatter plots are graphs that present the relationship
between two variables in a data set. The independent
variable or attribute is plotted on the X-axis, while the
dependent variable is plotted on the Y-axis.
A scatter plot is also called a scatter chart, scattergram, or
XY graph. It’s used to:


• Show relationship: Visualize the relationship between two variables.

• Identify patterns: Reveal trends, correlations, and outliers.


• Compare groups : Compare relationships between different groups
or categories.

Characteristics:

• X-axis : Represents the independent variable.

• Y-axis: Represents the dependent variable.

• Points: Each point represents an observation, and its
position indicates the values of the variables.

Types of scatter plots:

• Simple scatter plot: Displays the relationship between two
variables.
• Multiple scatter plot: Displays the relationship between more than
two variables.
• Colored scatter plot: Uses colors to represent different groups or
categories.

Interpreting a Scatter Plot:

• Positive Correlation: If the points tend to rise
from the lower left to the upper right, it suggests
a positive correlation, meaning that as the
independent variable increases, the dependent variable also
increases.
• Negative Correlation: If the points fall from the upper
left to the lower right, it indicates a negative
correlation, meaning that as the independent variable
increases, the dependent variable decreases.
• No Correlation: If the points are scattered randomly
without any discernible pattern, it suggests no
correlation, meaning there is no apparent relationship
between the variables.
• Clusters: Sometimes points may form distinct groups
or clusters, indicating subgroups within the data that
may need further analysis.
• Outliers: Outliers are points that fall far from the
general pattern of the data. They can indicate unusual
observations, errors, or important exceptions that merit
further investigation.
Example :
The line drawn in a scatter plot that lies close to almost all
the points in the plot is known as the “line of best fit” or
“trend line”.
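One common way to obtain a line of best fit is ordinary least squares. The notes do not say which method is intended, so the sketch below is an assumption, and the data points are hypothetical values chosen only for illustration.

```python
def best_fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line y = slope*x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical observations (independent x, dependent y).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

slope, intercept = best_fit_line(x, y)
direction = "positive" if slope > 0 else "negative" if slope < 0 else "none"
print(f"y = {slope:.2f}x + {intercept:.2f} ({direction} correlation)")
```

The sign of the slope matches the interpretation rules above: a rising trend line indicates a positive correlation and a falling one a negative correlation.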

DV TECHNIQUES

Gantt chart
A Gantt chart is a type of bar chart used in two-dimensional
(2D) data visualization, primarily for project management and
scheduling.
It visually represents the timeline of tasks, activities, or
events within a project, showing their start and end dates,
duration, and sometimes dependencies.
On the left of the chart is a list of the activities and along
the top is a suitable time scale. Each activity is represented
by a bar; the position and length of the bar reflect the start
date, duration and end date of the activity.
Using Gantt charts to display timelines can be very helpful
and enables team members to keep track of every aspect of a
project. Even if you are not a project management
professional, familiarizing yourself with Gantt charts can
help you stay organized.

Dimensions or key features of gantt chart

• Time (X-axis): Represents the project timeline,
divided into days, weeks, months, or other relevant
time units.

• Tasks (Y-axis): Lists the tasks, activities, or project milestones.

• Bars: Each task or activity is represented by a
horizontal bar. The length and position of the bar indicate
the start and end dates and the duration of the task.
Bars may also show the progress of tasks, often by shading or
color-coding the portion of the bar that represents completed
work.

• Dependencies: Arrows or lines connect tasks to show
dependencies.

• Milestones: Important dates or project milestones can be
marked with special symbols, like diamonds, on the chart.

Purpose:

• Project planning : Visualize the project schedule and tasks.

• Task management : Show dependencies and timelines.


• Progress tracking : Highlight completed tasks and milestones

Types of Gantt charts:

• Simple Gantt chart : Shows the basic project schedule.

• Detailed Gantt chart: Shows additional
details like resources and dependencies.

Types of Gantt Chart Dependencies


• Finish to Start (FS): A task must be completed before
the next task begins (You must complete Task A
before you start Task B).
• Start to Start (SS): A task must be started before the
next task starts (You must start Task A before you start
Task B).
• Finish to Finish (FF): A task can only be completed
once another task is completed (You can only
complete Task A once you complete Task B).
• Start to Finish (SF): A task can only be completed
once another task starts (You can only complete Task A
once you start Task B).
In a Gantt chart you’ll see a list of the activities on the
chart’s y-axis and a suitable time scale along the x-axis
(either on the top or the bottom). Within the chart, a
horizontal bar represents each activity. The location and
length of the bar correspond to the activity's start date,
duration, and end date, and shading can show how far along
each task is at any given point.
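A Finish to Start schedule can be sketched in Python as follows. The task names, durations, and project start date are hypothetical, the helper `schedule` is an illustrative name, and the sketch assumes calendar days and that every task is listed after its predecessor.

```python
from datetime import date, timedelta

# Hypothetical tasks: name -> (duration in days, predecessor or None).
tasks = {
    "Design": (5, None),
    "Build": (10, "Design"),
    "Test": (4, "Build"),
}
PROJECT_START = date(2024, 1, 1)  # hypothetical start date


def schedule(task_table, project_start):
    """Finish to Start: each task starts when its predecessor ends."""
    plan = {}
    for name, (days, predecessor) in task_table.items():
        start = project_start if predecessor is None else plan[predecessor][1]
        plan[name] = (start, start + timedelta(days=days))
    return plan


plan = schedule(tasks, PROJECT_START)

# Text Gantt chart: bar position = start offset, bar length = duration.
for name, (start, end) in plan.items():
    offset = (start - PROJECT_START).days
    length = (end - start).days
    print(f"{name:<8}" + " " * offset + "#" * length)
```

Each printed row is one horizontal bar: the leading spaces place the bar at its start date and the number of `#` characters is its duration in days.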

Heat maps
• Heat maps are two-dimensional techniques.
• A heat map is a graphical representation of data where
values are depicted using colors. The data is typically
arranged in a grid or matrix format, with each cell
assigned a color based on its value. Heatmaps are
particularly useful for visualizing large datasets and
identifying areas of interest or concentration.
• The most common color schemes range from warm colors
(such as red) to cool colors (such as blue), with warm
colors typically representing higher values and cool
colors representing lower values.

Key characteristics:

• Matrix layout: Rows and columns represent different dimensions
of the data.
• Color encoding : Colors indicate the magnitude of the
values, with darker colors typically representing higher values.
• Values : Cells contain numerical values, which are used to
determine the color.
• Legends: A legend or color scale is usually included to
indicate the range of values that the colors represent,
allowing viewers to interpret the data accurately.

A heatmap should typically contain a legend describing how
the colours correspond to numerical values.

Best practices:

• Choose appropriate colors: Select colors that are
easily distinguishable and convey the intended meaning.

• Use a clear legend : Provide a legend to explain the color encoding.

• Avoid 3D effects : Keep the Heat Map flat, as 3D effects can distort
the data.
How to Create a Heat Map:
• Prepare Data: Organize your data into a matrix or
table format, where each row and column represents
a category or variable, and the intersecting cells
contain the data values.
• Assign Colors: Determine a color scale that
represents the range of data values. Assign colors to
each cell based on the corresponding data value.
• Plot the Heat Map: Plot the heat map using software
or a visualization tool that supports heat map creation.
Tools like Excel, Python (with libraries like Seaborn or
Matplotlib), and R are commonly used.
• Add a Legend: Include a color legend that
indicates what the colors represent in terms of
data values.
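The four steps above can be sketched in Python. The sales figures, region names, and month names are hypothetical. `normalise` shows how a value is mapped onto a 0 to 1 colour scale, and `draw_heatmap` assumes Matplotlib is installed and uses its `coolwarm` colormap as one possible warm-to-cool scheme.

```python
# Step 1 - hypothetical data in matrix form: rows = regions, columns = months.
regions = ["North", "South", "East"]
months = ["Jan", "Feb", "Mar"]
sales = [
    [10, 40, 25],
    [55, 30, 70],
    [20, 100, 45],
]


def normalise(matrix):
    """Step 2 - scale every value to 0..1 so it can index a colour scale."""
    flat = [v for row in matrix for v in row]
    lo, hi = min(flat), max(flat)
    return [[(v - lo) / (hi - lo) for v in row] for row in matrix]


def draw_heatmap(matrix, row_labels, col_labels):
    """Steps 3 and 4 - plot the grid and add a legend (assumes Matplotlib)."""
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    image = ax.imshow(matrix, cmap="coolwarm")  # blue = low, red = high
    ax.set_xticks(range(len(col_labels)))
    ax.set_xticklabels(col_labels)
    ax.set_yticks(range(len(row_labels)))
    ax.set_yticklabels(row_labels)
    fig.colorbar(image, ax=ax, label="Sales")   # the colour legend
    return fig


scaled = normalise(sales)
for region, row in zip(regions, scaled):
    print(region, [f"{v:.2f}" for v in row])
```

The smallest value maps to 0 (coolest colour) and the largest to 1 (warmest colour); the colour bar plays the role of the legend.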
Example 1 :

Example 2 : Region-wise monthly sale of a SKU (stock-keeping unit)


Box and Whisker Plot
• A Box and Whisker Plot, also known simply as a box plot,
is a visual representation of the five-number summary of
a dataset: minimum, first quartile, median, third
quartile, and maximum.
• It consists of a rectangular “box” and two “whiskers.”
• It is particularly useful for comparing the distribution
of data across different categories or groups.
Key Components of a Box and Whisker Plot:

• Box :
• The central part of the plot is a rectangle
(the “box”) that spans from the first
quartile (Q1) to the third quartile (Q3) of
the data.
• The height (or width, if the plot is
horizontal) of the box represents the
interquartile range (IQR), which covers the
middle 50% of the data.
• Median Line:
• A line inside the box represents the median
(Q2), which is the middle value of the
dataset. This divides the data into two equal
parts.
• Whiskers:
• The “whiskers” extend from the edges of the
box to the minimum and maximum values
within a specified range.

• Typically, the whiskers extend to the
smallest and largest data points within 1.5
times the IQR from the first and third
quartiles, respectively. Data points beyond
this range are considered potential outliers.
• Outliers:
• Outliers are data points that lie outside the
range covered by the whiskers. They are
usually plotted as individual dots or asterisks
beyond the whiskers.
• Axes:
• The x-axis usually represents the
categories or groups being compared.
• The y-axis represents the data values.

How to Make Box and Whisker Plot

The following steps are involved in making Box and Whisker Plot:
• Gather Information:
• Collect the dataset and sort the data in ascending
order.
• Calculate Quartiles:
• Calculate the minimum, maximum, first
quartile (Q1), third quartile (Q3), and
median (Q2) from the sorted data.
• Identify any outliers using the 1.5 times IQR rule.
• Draw the Box:
• Draw a rectangle from Q1 to Q3.
• Inside the box, draw a line at the median (Q2).
• Add Whiskers:
• Extend lines (whiskers) from Q1 to the
minimum value within the 1.5 times IQR
range and from Q3 to the maximum
value within this range.
• Identify Outliers:
• Plot any data points outside the
whiskers as individual points.
• Label the Axes:
• Label the x-axis with the categories or
groups and the y-axis with the data
values.
Example of Box and Whisker Plot
Suppose we have a dataset representing the test scores of
a group of students: Data (test scores): 78, 85, 90, 92, 95,
96, 97, 98, 99, 100, 105, 110, 120.

Solution:
Step 1: Collect Data

Dataset: 78, 85, 90, 92, 95, 96, 97, 98, 99, 100, 105, 110, 120

Step 2: Calculate Quartiles


To create a Box and Whisker Plot, we need to calculate the
quartiles (Q1 and Q3) and the median (Q2).

-Q1 (the first quartile) is the median of the lower half of the
data (78, 85, 90, 92, 95, 96) = 91
-Q2 (the median) is the median of the entire dataset = 97
-Q3 (the third quartile) is the median of the upper half of
the data: (98, 99, 100, 105, 110, 120) = 102.5

Step 3: Determine Whiskers

To find the whiskers, first compute the 1.5 times IQR limits,
then take the smallest and largest values that lie within them.

IQR = Q3 - Q1 = 102.5 - 91 = 11.5
Lower limit = 91 - 1.5*11.5 = 73.75
Upper limit = 102.5 + 1.5*11.5 = 119.75

The smallest value within the limits is 78 and the largest is
110. The value 120 lies above the upper limit, so it is a
potential outlier.

The five-number summary of the full dataset is 78, 91, 97, 102.5, 120.

Step 4: Plot the Box and Whiskers
Now, we can create the Box and Whisker Plot:

-Draw a box from Q1 (91) to Q3 (102.5).

-Draw a line inside the box at Q2 (97).

-Extend the left whisker from Q1 (91) down to 78.

-Extend the right whisker from Q3 (102.5) up to 110.

Step 5: Identify Outliers

Any data points that fall outside the whiskers are considered
outliers. In this case, 120 is the only one, and it is plotted as
an individual point beyond the right whisker. This Box and
Whisker Plot gives a visual summary of the test scores, showing
the median (Q2) at 97, the interquartile range (IQR) from Q1 to
Q3 (91 to 102.5), and a single high outlier. It summarizes the
central tendency, spread, and distribution of the dataset.
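The hand calculation can be checked with a short script. This sketch uses only the Python standard library and assumes the same quartile convention as the worked example (the median of each half, leaving the overall median out of both halves). Other tools use different quartile conventions and may give slightly different values. Under the 1.5 times IQR rule, the upper limit is 119.75, so the score of 120 is flagged as a potential outlier.

```python
from statistics import median

scores = sorted([78, 85, 90, 92, 95, 96, 97, 98, 99, 100, 105, 110, 120])

# Split around the median; with an odd count the median itself is left out
# of both halves, which is the convention used in the worked example.
half = len(scores) // 2
lower_half = scores[:half]
upper_half = scores[half + 1:] if len(scores) % 2 else scores[half:]

q1 = median(lower_half)
q2 = median(scores)
q3 = median(upper_half)
iqr = q3 - q1

# 1.5 x IQR limits: points outside them are flagged as potential outliers.
lower_limit = q1 - 1.5 * iqr
upper_limit = q3 + 1.5 * iqr

inside = [s for s in scores if lower_limit <= s <= upper_limit]
outliers = [s for s in scores if s < lower_limit or s > upper_limit]

# Whiskers end at the most extreme points that are still inside the limits.
whisker_low, whisker_high = min(inside), max(inside)

print("Q1, Q2, Q3:", q1, q2, q3)
print("IQR:", iqr)
print("Limits:", lower_limit, upper_limit)
print("Whiskers:", whisker_low, whisker_high)
print("Potential outliers:", outliers)
```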
Waterfall Chart
A Waterfall Chart is a data visualization tool used to show
how an initial value is affected by a series of intermediate
positive or negative values, leading to a final value.

It's useful for:

• Showing cumulative effects : Illustrate how each value
contributes to the total.
• Identifying key contributors : Highlight the most
significant positive or negative values.
• Visualizing data flow : Display how the initial value is
transformed through various stages.

Key components:
• Initial value : Starting point of the waterfall.
• Positive values : Increases added to the running total.
• Negative values : Decreases subtracted from the running total.
• Final value : Resulting value after all positive and negative
values are applied.
Types of Waterfall Charts:

• Simple Waterfall : Displays a single series of values.


• Stacked Waterfall : Displays multiple series of values
stacked on top of each other.

• Grouped Waterfall : Displays multiple series of values
grouped by categories.
How to Create a Waterfall Chart:
1. Identify Data Points:
Determine the initial value, all intermediate values (both
positive and negative), and the final value. The intermediate
values are the key contributors that show the changes.

2. Plot the Initial Value:
Start by plotting the initial value as a bar on the left side of the
chart.

3. Add Intermediate Bars:
Add bars for each intermediate value. Each bar will either
add to (positive value) or subtract from (negative value) the
previous total.

4. Plot the Final Value:
Finally, plot the ending value as the last bar, showing the
cumulative effect of all the intermediate values.

5. Color Code the Bars:
Use colors to differentiate between positive and negative
changes. Typically, green is used for positive contributions,
red for negative, and a neutral color for the initial and final
totals.
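Matplotlib has no dedicated waterfall function, so one common approach is to draw ordinary bars that start at the running total. The sketch below follows the steps above. The starting balance, category names, and amounts are invented for illustration.

```python
import matplotlib.pyplot as plt

# Hypothetical figures: a starting balance and a series of changes.
initial_value = 1000
changes = [("Sales", 400), ("Returns", -150), ("Costs", -200), ("Interest", 50)]

labels = ["Start"]
heights = [initial_value]
bottoms = [0]
colors = ["gray"]

running_total = initial_value
for name, delta in changes:
    labels.append(name)
    heights.append(abs(delta))
    # A rise starts at the previous total; a fall starts at the new, lower total.
    bottoms.append(running_total if delta >= 0 else running_total + delta)
    colors.append("green" if delta >= 0 else "red")
    running_total += delta

# The final bar shows the cumulative result, drawn from zero like the first.
labels.append("End")
heights.append(running_total)
bottoms.append(0)
colors.append("gray")

fig, ax = plt.subplots()
ax.bar(labels, heights, bottom=bottoms, color=colors)
ax.set_title("Waterfall chart (illustrative data)")
ax.set_ylabel("Value")
```

Each intermediate bar floats between the previous total and the new total, which is what gives the chart its stepped look. Call plt.show() to display it.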

Area charts
An area chart is similar to a line chart, except the region below
the lines in an area chart is filled with color or shading, making
it simple to view the overall value across multiple data series.

Area charts are frequently used to display trends, to show
changes in composition over time, and to compare the
contributions of different categories to the total.

Key components:
• _X-axis_: Typically represents time or another
continuous variable. This axis is used to plot the data points
in sequential order.
• _Y-axis_: Represents the quantitative values. The height
of the area at each point along the x-axis corresponds to the
value of the data point.
• _Area_:
• The area between the plotted line and the x-axis is
filled with color, creating a visual emphasis on the
magnitude of the values.
• In a stacked area chart, multiple data series are
represented, with each area stacked on top of the
others. The color fill helps distinguish between
different categories or series.

Types of Area Charts:

• _Simple Area Chart_: Displays a single data series with
the area under the line filled with color.
• _Stacked Area Chart_: Displays multiple data series of
values stacked on top of each other.
• _100% Stacked Area Chart_: Displays multiple series of
values as a percentage of the total.

Best practices:

• _Use clear labels_: Label axes, title, and legend.

• _Choose appropriate colors_: Use colors to differentiate
between series.
• _Avoid clutter_: Limit the number of series and use
filtering or aggregation if necessary.
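As a rough illustration of the simple and stacked variants, the following Matplotlib sketch draws both side by side. The quarter numbers and revenue values are invented, and the product names are placeholders.

```python
import matplotlib.pyplot as plt

# Hypothetical quarterly revenue for three product lines.
quarters = [1, 2, 3, 4, 5, 6]
product_a = [10, 12, 15, 18, 20, 23]
product_b = [5, 7, 8, 8, 10, 12]
product_c = [2, 3, 3, 5, 6, 6]

fig, (ax_simple, ax_stacked) = plt.subplots(1, 2, figsize=(10, 4))

# Simple area chart: fill the region between the line and the x-axis.
ax_simple.plot(quarters, product_a, color="tab:blue")
ax_simple.fill_between(quarters, product_a, color="tab:blue", alpha=0.4)
ax_simple.set_title("Simple area chart")

# Stacked area chart: each series is drawn on top of the previous ones.
layers = ax_stacked.stackplot(
    quarters, product_a, product_b, product_c,
    labels=["Product A", "Product B", "Product C"],
)
ax_stacked.set_title("Stacked area chart")
ax_stacked.legend(loc="upper left")

# Label both panels the same way.
for ax in (ax_simple, ax_stacked):
    ax.set_xlabel("Quarter")
    ax.set_ylabel("Revenue")

fig.tight_layout()
```

For a 100% stacked version, each series would first be divided by the per-quarter total before being passed to stackplot.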
Stacked Bar Charts

Stacked bar charts are a type of data visualization technique used to
compare the composition of different categories across multiple
groups.
Each bar in a standard bar chart is divided into a number of
sub-bars stacked end to end, each one corresponding to a level
of the second categorical variable.
They are particularly effective in showing how individual
components contribute to the whole within each category,
allowing for comparisons both within and between categories.

They are useful for:

• Comparing categories : Show how different categories contribute to
a total value.
• Identifying trends : Illustrate how categories
change over time or across dimensions.
• Displaying hierarchical data : Show how lower-level
categories contribute to higher-level categories.

Key components:

• _X-axis_: Dimensions (e.g., time, categories).

• _Y-axis_: Total value.

• _Bars_: Stacked bars represent the contribution of each category.


• _Colors_: Different colors represent different categories.

Types of Stacked Bar Charts:

• _Simple Stacked Bar Chart_: Displays a single set of categories.


• _Multi-series Stacked Bar Chart_: Displays multiple sets of
categories.

• _100% Stacked Bar Chart_: Displays categories as a percentage of
the total.
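A simple stacked bar chart can be sketched in Matplotlib by passing a bottom offset to each bar call, so that every new segment starts where the previous one ended. The quarters, regions, and sales numbers below are made up for illustration.

```python
import matplotlib.pyplot as plt

# Hypothetical sales per quarter, split by region.
quarters = ["Q1", "Q2", "Q3", "Q4"]
sales_by_region = {
    "North": [20, 25, 30, 35],
    "South": [15, 18, 22, 20],
    "East": [10, 12, 9, 14],
}

fig, ax = plt.subplots()

# Track the running height of each bar so the next segment starts on top of it.
bottoms = [0] * len(quarters)
for region, values in sales_by_region.items():
    ax.bar(quarters, values, bottom=bottoms, label=region)
    bottoms = [b + v for b, v in zip(bottoms, values)]

ax.set_xlabel("Quarter")
ax.set_ylabel("Total sales")
ax.set_title("Stacked bar chart (illustrative data)")
ax.legend(title="Region")
```

After the loop, bottoms holds the total height of each bar, which is the value read off the y-axis.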

Sub Plots
Subplots in data visualization are used to display
multiple plots in a single figure, allowing for:

• Comparing different variables : Show relationships between
multiple variables.
• Analyzing different dimensions : Examine trends and
patterns across different dimensions.
• Highlighting insights : Emphasize key findings by
displaying related plots together.

Types of Subplots:
• _Small Multiples_: Multiple small plots displaying the
same variable across different dimensions.

• _Panel Charts_: Multiple plots displaying different
variables or dimensions in separate panels.
• _Faceted Charts_: Multiple plots displaying the same
variable across different dimensions, with each plot
representing a subset of the data.

Strengths of Subplots:

Comparative Analysis: Subplots make it easy to compare
different datasets or variables within a single figure.

Cohesive Presentation: They allow for a cohesive presentation of
multiple visualizations, which can be more informative than
presenting each plot separately.
Space Efficiency: Subplots save space by combining multiple
plots into one figure, making it ideal for reports or
presentations.

Limitations of Subplots:

Complexity: Too many subplots in a single figure can make
the visualization complex and difficult to interpret.

Small Plot Size: If too many subplots are included, each plot
might become too small, reducing the clarity and readability
of the data.

Overwhelming for the Audience: If not carefully designed, subplots
can overwhelm the audience, especially if there are too many
variables or datasets to compare.
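In Matplotlib, a grid of subplots can be created with plt.subplots. The sketch below places four views of the same small dataset in a 2 x 2 grid. The data values are arbitrary and only serve to show the layout.

```python
import matplotlib.pyplot as plt

# Illustrative data shared by all four panels.
x = [1, 2, 3, 4, 5, 6]
y = [3, 5, 4, 7, 8, 6]

# A 2 x 2 grid of axes inside one figure.
fig, axes = plt.subplots(nrows=2, ncols=2, figsize=(8, 6))

axes[0, 0].plot(x, y)
axes[0, 0].set_title("Line plot")

axes[0, 1].bar(x, y)
axes[0, 1].set_title("Bar chart")

axes[1, 0].scatter(x, y)
axes[1, 0].set_title("Scatter plot")

axes[1, 1].hist(y, bins=3)
axes[1, 1].set_title("Histogram")

# One overall title, then let Matplotlib space the panels to avoid overlap.
fig.suptitle("Four views of the same data")
fig.tight_layout()
```

Keeping the grid small, as here, avoids the small-plot-size problem described above.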
Matplotlib
Matplotlib is a versatile and widely used data visualization
library in Python. It allows users to create static, interactive,
and animated visualizations.
The library is built on top of NumPy arrays, which makes it
efficient for handling large datasets, and it offers several
plot types such as line charts, bar charts, and histograms. It
provides a lot of flexibility, but at the cost of writing more
code.
Pyplot in Matplotlib
• Pyplot is a Matplotlib module that provides a MATLAB-like
interface. Matplotlib is designed to be as usable as MATLAB,
with the ability to use Python and the
advantage of being free and open-source.
• Each pyplot function makes some changes to a figure:
e.g., creates a figure, creates a plotting area in a figure,
plots some lines in a plotting area, decorates the
plot with labels, etc. The plots available through
Pyplot include Line Plot, Histogram, Scatter, 3D Plot, and Image.

Basic Usage of Matplotlib:


To use Matplotlib, you typically start by importing the
library, often alongside NumPy for numerical operations:
Code :

import matplotlib.pyplot as plt
import numpy as np

Example Plots:
• Line Plot:
A line plot is used to display data points connected by
straight lines. It’s commonly used to visualize trends over
time.

import matplotlib.pyplot as plt

# initializing the data
x = [10, 20, 30, 40]
y = [20, 25, 35, 55]

# plotting the data
plt.plot(x, y)

# Adding title to the plot
plt.title("Line Chart")

# Adding label on the y-axis
plt.ylabel('Y-Axis')

# Adding label on the x-axis
plt.xlabel('X-Axis')

plt.show()

Output :

• Bar Chart:
Bar charts are used to represent categorical data with
rectangular bars. Each bar’s length or height corresponds
to the value it represents.
Code :

import matplotlib.pyplot as plt
import pandas as pd

# Reading the tips.csv file
data = pd.read_csv('tips.csv')

# initializing the data
x = data['day']
y = data['total_bill']

# plotting the data
plt.bar(x, y)

# Adding title to the plot
plt.title("Tips Dataset")

# Adding label on the y-axis
plt.ylabel('Total Bill')

# Adding label on the x-axis
plt.xlabel('Day')

plt.show()

Output :
Seaborn Styles
Seaborn is a well-known Python library for data visualization
that offers a user-friendly interface for producing visually
appealing and informative statistical graphics. It is designed
to work with Pandas dataframes, making it easy to visualize and
explore data quickly and effectively.

Seaborn offers a variety of powerful tools for visualizing data,
including scatter plots, line plots, bar plots, heat maps, and
many more. It also provides support for statistical
visualization, such as regression plots, distribution plots,
and categorical plots.

Seaborn has five built-in themes to style its plots: darkgrid,
whitegrid, dark, white, and ticks. The darkgrid theme is the
default style applied by sns.set_theme(), but you can change
this styling to better suit your presentation needs.

Here are some of the styles available in Seaborn:


• Darkgrid: A gray background with white gridlines.
The darkgrid style is characterized by a gray background
with white grid lines. The grid helps the reader estimate
values without competing with the data for attention.
• Whitegrid : A white background with gray gridlines.
The whitegrid style is similar to the darkgrid style but with a
white background. It combines a clean appearance with grid
lines, allowing for clear visual separation between data
points.
• Dark: A gray background with no gridlines.
The dark style uses the same gray background as darkgrid
but without grid lines. It suits plots where grid lines
would distract from the data.
• White : A white background with no gridlines.
The white style features a white background with no grid lines. It
creates a simple and clean look, suitable for plots where the
focus is on the data itself.
• Ticks : A white background with tick marks on the axes.
The ticks style is like the white style but adds small tick
marks on the axes. It keeps the plot uncluttered while
still providing necessary axis information. Removing the
top and right spines is a separate step, done with
sns.despine().
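One way to see what each theme actually changes is to inspect the settings it would apply. The sketch below assumes Seaborn is installed and uses sns.axes_style, which returns the Matplotlib parameters for a named style. It prints the background color and whether the grid is switched on.

```python
import seaborn as sns

style_names = ["darkgrid", "whitegrid", "dark", "white", "ticks"]

# axes_style(name) returns the Matplotlib settings that the theme would apply.
styles = {name: sns.axes_style(name) for name in style_names}

for name, params in styles.items():
    background = params["axes.facecolor"]
    grid_on = params["axes.grid"]
    print(f"{name:10s} background={background!s:10s} grid={grid_on}")
```

Only the two styles with "grid" in their name turn the grid on. The remaining three differ in background color and tick marks.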

To use any of the preset themes, pass its name to
sns.set_style().

import seaborn as sns
import matplotlib.pyplot as plt

sns.set_style("darkgrid")
# Load the example tips dataset that ships with Seaborn
tips = sns.load_dataset("tips")

sns.stripplot(x="day", y="total_bill", data=tips)
plt.show()

Background Color

When thinking about the look of your visualization, one thing to
consider is the background color of your plot. The higher the
contrast between the color palette of your plot and your figure
background, the more legible your data visualization will
be.

The dark background themes provide a nice change from the
Matplotlib styling norms, but they don't offer as much contrast:

sns.set_style("dark")
sns.stripplot(x="day", y="total_bill", data=tips)

The white and ticks themes allow the colors of your
dataset to show more visibly and provide higher contrast, so
your plots are more legible:

sns.set_style("ticks")
sns.stripplot(x="day", y="total_bill", data=tips)

Grids
It’s a good choice to use a grid when you want your audience to
be able to draw their own conclusions about data. A grid allows
the audience to read your chart and get
specific information about certain values. Research papers
and reports are a good example of when you would want to
include a grid.

import seaborn as sns
import matplotlib.pyplot as plt

sns.set_style("whitegrid")
# Load the example tips dataset that ships with Seaborn
tips = sns.load_dataset("tips")

sns.stripplot(x="day", y="total_bill", data=tips)
plt.show()

Despine

In addition to changing the background color, you can also
control the use of spines. Spines are the borders of the plot
area that contains the visualization. By default, a plot has
four spines.
You may want to remove some or all of the spines for various
reasons. A figure with the left and bottom spines resembles
traditional graphs. You can automatically take away the top
and right spines using the sns.despine() function. Note: this
function must be called after you have drawn your plot.

import seaborn as sns
import matplotlib.pyplot as plt

sns.set_style("white")
# Load the example tips dataset that ships with Seaborn
tips = sns.load_dataset("tips")

sns.stripplot(x="day", y="total_bill", data=tips)

# Remove the top and right spines (the default behaviour)
sns.despine()
plt.show()
Not including any spines at all may be an aesthetic decision.
You can also choose which spines to remove by
calling despine() and passing in the spines you want to get rid
of, such as: left, bottom, top, right.

import seaborn as sns

sns.set_style("whitegrid")
tips = sns.load_dataset("tips")
sns.stripplot(x="day", y="total_bill", data=tips)

# Also remove the left and bottom spines
sns.despine(left=True, bottom=True)

Box plot
Box Plot is a graphical method to visualize data distribution
for gaining insights and making informed decisions. Box plot
is a type of chart that depicts a group of numerical data
through their quartiles.

Box plot is also known as a whisker plot, box-and-whisker
plot, or simply a box-and-whisker diagram. It displays key
summary statistics such as the median, quartiles, and
potential outliers in a concise and visual manner. By using a
Box plot you can summarize the distribution, identify
potential outliers, and compare different datasets in a
compact form.

Elements of Box Plot

A box plot gives a five-number summary of a set of data which is-

Minimum – It is the minimum value in the dataset excluding the
outliers.

First Quartile (Q1) – 25% of the data lies below the First (lower)
Quartile.

Median (Q2) – It is the mid-point of the dataset. Half of the
values lie below it and half above.

Third Quartile (Q3) – 75% of the data lies below the Third (Upper)
Quartile.

Maximum – It is the maximum value in the dataset excluding the
outliers.

The area inside the box (50% of the data) is known as the
Inter Quartile Range. The IQR is calculated as –
IQR = Q3 - Q1

Outliers are the data points below the lower limit or above the
upper limit. The lower and upper limits are calculated as –

Lower Limit = Q1 - 1.5*IQR
Upper Limit = Q3 + 1.5*IQR

The values below and above these limits are considered
outliers, and the minimum and maximum values are
taken from the points which lie within the lower and
upper limits.
Density Plot
• A density plot is a data visualization technique used to
show the distribution of a continuous variable. It is a
smoothed, continuous version of a histogram, with
a smooth curve instead of bars, and it helps to estimate
the probability density function of the variable.
• It displays the underlying distribution of the data by
plotting the density of the data points. The x-axis
represents the variable, and the y-axis represents the
density.
• It uses a kernel density estimate to show the probability
density function of the variable.
• In this method, a kernel (a continuous curve) is drawn at every
individual data point, and all these curves are then added
together to make a single smooth density estimate.
Histograms become hard to read when we want to compare the
distribution of a single variable across multiple
categories. In that case, a density plot is useful for
visualizing the data.

Figure: Density estimates of the butterfat percentage in the milk
of four cattle breeds.

Key Features of a Density Plot:

Smooth Curve:
A density plot represents the distribution as a smooth curve,
which is generated using kernel density estimation (KDE). The
curve is continuous and shows the
probability density of the data across the entire range of the
variable.

Probability Density Function (PDF):
The y-axis of a density plot represents the probability density,
which indicates the relative likelihood of different values in
the dataset. Unlike histograms, the y-axis
does not represent raw counts but instead shows how dense
the data is at different values.

KDE (Kernel Density Estimation):
The smoothing of the curve is controlled by KDE, which
estimates the density by placing small curves (kernels) at
each data point and summing them to create the smooth
distribution.

Comparison of Distributions:
Density plots are effective for comparing the distributions of
two or more datasets, making it easier to see differences or
similarities in their shape, spread, and central tendency.
Less Dependence on Bins:
Unlike histograms, which require choosing specific bin sizes,
density plots do not have bins. The smoothness of the plot
depends on the bandwidth parameter used in KDE, which
determines the amount of smoothing applied.
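To make the kernel idea concrete, here is a small hand-written kernel density estimate using only the Python standard library. It is a sketch under stated assumptions: the helper name make_kde is hypothetical, the kernel is Gaussian, and the sample values and the bandwidth of 0.4 are arbitrary choices for illustration. Libraries such as Seaborn pick the bandwidth automatically.

```python
import math

def make_kde(data, bandwidth):
    """Return a function that estimates the density at any point x."""
    n = len(data)
    norm = 1.0 / (n * bandwidth * math.sqrt(2 * math.pi))

    def density(x):
        # Sum one Gaussian bump (kernel) per data point, centred on that point.
        return norm * sum(
            math.exp(-0.5 * ((x - xi) / bandwidth) ** 2) for xi in data
        )

    return density

# Illustrative sample; the bandwidth controls how much smoothing is applied.
sample = [4.1, 4.4, 4.5, 4.8, 5.0, 5.1, 5.6, 6.2]
density = make_kde(sample, bandwidth=0.4)

# Evaluate the curve on a grid wide enough to cover both tails (2.0 to 8.0).
step = 0.01
grid = [2.0 + i * step for i in range(601)]
curve = [density(x) for x in grid]

# The area under a density curve should be close to 1.
area = sum(curve) * step
print("Approximate area under the curve:", round(area, 3))
```

Plotting grid against curve with plt.plot gives the density plot. A larger bandwidth produces a smoother, flatter curve, and a smaller one follows the individual points more closely.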

Tree map
A tree map is a type of data visualization that displays
hierarchical data using nested rectangles. Each branch of the
hierarchy is represented as a rectangle,
which is then subdivided into smaller rectangles that
represent sub-branches. The size and color of the rectangles
can be used to represent different variables.

Key Features of a Tree Map:


• Hierarchical Representation: Displays relationships between data
points
Tree maps are used to visualize data that is organized hierarchically.
Each rectangle represents a node in the hierarchy, and the structure of
the data is shown through the nesting of rectangles.
• Rectangular representation: Size and color represent value and
category
The size of each rectangle is proportional to a specific data
metric, such as revenue, population, or quantity. This makes
it easy to compare the relative sizes of different elements
within the hierarchy.
• Color Coding: Represents the category or group of the data point
Colors can be used to represent an additional variable, such as
performance, growth, or categories. Different shades of colors
help to convey more information at a glance.
• Space Efficiency:
Tree maps make efficient use of space by filling the
entire plotting area with rectangles. This is
particularly useful when visualizing large datasets, as
the available space is maximized without any wasted
areas.

• Summarizing Complex Data:
Tree maps are effective for summarizing complex datasets with
multiple levels of categories or subcategories. They provide a quick
overview of how various elements contribute to the whole.
Interpretation:
• Size: Represents the magnitude or value of the data point

• Color: Represents the category or group of the data point


• Hierarchy: Shows the relationships and structure of the data
• Proportions: Reveals the relative importance of each data point

Advantages:
• Visualizes complex hierarchical data
• Displays multiple variables and relationships

• Facilitates exploration and drill-down capabilities


• Space-efficient and scalable

Common uses:
• Business intelligence

• Financial analysis
• Marketing and sales

• Social network analysis

Best practices:
• Use clear and consistent colors
• Label rectangles and provide tooltips

• Use size and color effectively to represent data


• Keep the hierarchy simple and intuitive
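The idea that rectangle size is proportional to value can be shown with a tiny layout routine. The sketch below uses the slice-and-dice approach, one of the simplest tree map layouts, in plain Python. The function name, the department and product names, and the revenue figures are all hypothetical. Dedicated libraries usually use more refined layouts that keep rectangles closer to square.

```python
def slice_and_dice(items, x, y, width, height, vertical=True):
    """Split a rectangle into strips whose areas are proportional to the values.

    items is a list of (label, value) pairs; returns (label, x, y, w, h) tuples.
    """
    total = sum(value for _, value in items)
    rectangles = []
    offset = 0.0
    for label, value in items:
        share = value / total
        if vertical:
            # Side-by-side strips: each takes a share of the width.
            rectangles.append((label, x + offset, y, width * share, height))
            offset += width * share
        else:
            # Stacked strips: each takes a share of the height.
            rectangles.append((label, x, y + offset, width, height * share))
            offset += height * share
    return rectangles

# Hypothetical two-level hierarchy: departments containing product revenues.
hierarchy = {
    "Electronics": [("Phones", 50), ("Laptops", 30)],
    "Clothing": [("Shirts", 15), ("Shoes", 5)],
}

# Level 1: one rectangle per department, sized by the department total.
departments = [(name, sum(v for _, v in children))
               for name, children in hierarchy.items()]
layout = []
for name, dx, dy, dw, dh in slice_and_dice(departments, 0, 0, 100, 100):
    # Level 2: subdivide each department in the other direction.
    layout.extend(slice_and_dice(hierarchy[name], dx, dy, dw, dh, vertical=False))

for label, rx, ry, rw, rh in layout:
    print(f"{label:8s} x={rx:5.1f} y={ry:5.1f} w={rw:5.1f} h={rh:5.1f}")
```

Each leaf rectangle ends up with an area proportional to its value, and the four rectangles together fill the whole 100 x 100 canvas with no wasted space.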

Graph Networks.
Graph networks are a powerful tool in data visualization,
used to represent relationships between large numbers of
entities as nodes and edges.

Graph Networks are also known as Network
Visualisation, Network Graphs, or Network Mapping.

A network visualisation displays undirected and directed graph
structures.

Key Components in Graph Networks:


• Nodes: Represent the entities in the data (people, devices,
locations, etc.).
• Edges: The lines that connect nodes, representing
relationships, interactions, or dependencies.

• Directed vs. Undirected Graphs:
Directed Graphs: The edges have a direction, showing the
flow or influence from one node to another (e.g., followers
on Twitter). Arrows represent directed relationships.
Undirected Graphs: The edges don’t have a direction,
representing mutual or bidirectional relationships (e.g.,
Facebook friendships). No arrows are drawn.
• Weighted vs. Unweighted Graphs:
Weighted Graphs: Edges have weights or values.
Unweighted Graphs: Edges have no weights.
• Weighted Edges: Some graphs use weights on edges to
represent the strength of a relationship, such as the
frequency of interactions or the strength of a
connection.
• Node and Edge Attributes:
Nodes can have various attributes like color, size, or shape to
represent different characteristics.
Edges can also have thickness or color gradients to represent
different strengths or types of relationships.

Entities are displayed as round nodes and lines show the
relationships between them.
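Before a network can be drawn, it has to be stored as nodes and edges. The plain-Python sketch below represents a small directed, weighted graph as an adjacency dictionary. The people and message counts are invented for illustration. A drawing library would take an edge list like this as input.

```python
# A directed, weighted graph stored as an adjacency dictionary.
# Hypothetical example: who messages whom, weighted by message count.
graph = {
    "Asha": {"Ben": 5, "Chen": 2},
    "Ben": {"Chen": 1},
    "Chen": {"Asha": 4},
    "Dev": {},
}

# Flatten to an edge list: (source, target, weight).
edges = [(src, dst, w) for src, targets in graph.items()
         for dst, w in targets.items()]

# Out-degree counts edges leaving a node; in-degree counts edges arriving.
out_degree = {node: len(targets) for node, targets in graph.items()}
in_degree = {node: 0 for node in graph}
for _, dst, _ in edges:
    in_degree[dst] += 1

# Ignoring direction gives the undirected view of the same relationships.
undirected = {frozenset((src, dst)) for src, dst, _ in edges}

print("Edges:", edges)
print("Out-degree:", out_degree)
print("In-degree:", in_degree)
print("Undirected pairs:", len(undirected))
```

In a drawing, the weights could set edge thickness and the degrees could set node size, as described under node and edge attributes.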

Common Uses of Graph Networks in Data Visualization:

Social Network Analysis
Purpose: To visualize relationships between individuals or
organizations.
Example: Facebook friends, LinkedIn connections.

Knowledge Graphs
Purpose: To connect concepts, entities, or ideas in a
structured, interconnected manner.
Example: Google’s Knowledge Graph connects search queries to
real-world entities.

Biological Networks
Purpose: To visualize and analyze relationships
between biological entities.
Example: Protein-protein interaction networks, gene
regulatory networks.

Supply Chain Networks
Purpose: To map and monitor the flow of goods, services, or
information between different entities.
Example: Visualizing how raw materials move from suppliers
to manufacturers to retailers.

Fraud Detection
Purpose: To identify suspicious behavior in financial or online
transactions.
Example: Detecting fraudulent credit card transactions or
fake accounts in social media.
