Lab Manual for Students
Aim: Data visualization is the presentation of data in graphical form. It helps people
understand the significance of data by summarizing large amounts of it in a simple,
easy-to-understand format, and it helps communicate information clearly and
effectively.
Bar plot
A bar plot, or bar chart, is a graph that represents categories of data with rectangular bars whose
lengths or heights are proportional to the values they represent. Bar plots can be drawn
horizontally or vertically. A bar chart shows comparisons between discrete categories. One axis
of the plot represents the specific categories being compared, while the other axis represents
the measured values corresponding to those categories.
Program1:
import pandas as pd
import matplotlib.pyplot as plt
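As a minimal sketch of the bar plot described above, the listing below draws one bar per category. The column names and values are illustrative assumptions, not data from the lab's CSV file.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Illustrative data (an assumption, not taken from Book1.csv)
df = pd.DataFrame({
    "Category": ["A", "B", "C", "D"],
    "Sales": [23, 17, 35, 29],
})

fig, ax = plt.subplots()
# Vertical bars: categories on the x-axis, measured values on the y-axis
bars = ax.bar(df["Category"], df["Sales"])
ax.set_xlabel("Category")
ax.set_ylabel("Sales")
ax.set_title("Sales by category")
plt.show()
```

Replacing ax.bar with ax.barh (and swapping the axis labels) should give the horizontal version of the same chart.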
Scatter Plot
The scatter() function plots one dot for each observation. It needs two arrays of the same length,
one for the values of the x-axis, and one for values on the y-axis.
Program2:
import pandas as pd
import matplotlib.pyplot as plt
df = pd.read_csv("C:\\Users\\GOPAL\\Desktop\\Book1.csv")
# scatter plot between income and age
plt.scatter(df['Income'], df['Age'])
plt.xlabel("Income")
plt.ylabel("Age")
plt.show()
# scatter plot between income and sales
plt.scatter(df['Income'], df['Sales'])
plt.xlabel("Income")
plt.ylabel("Sales")
plt.show()
# scatter plot between sales and age
plt.scatter(df['Sales'], df['Age'])
plt.xlabel("Sales")
plt.ylabel("Age")
plt.show()
Program3:
import pandas as pd
import matplotlib.pyplot as plt
df = pd.read_csv("C:\\Users\\GOPAL\\Desktop\\Book1.csv")
# create histogram for numeric data
df.hist()
# show plot
plt.show()
Box plot
A box plot is a way to visualize the distribution of data using a box and some lines
(whiskers). It is also known as a box-and-whisker plot. The plot is built from five key values,
which are as follows:
1. Minimum (lower whisker limit): Q1 - 1.5*IQR
2. 1st quartile (Q1): 25th percentile
3. Median: 50th percentile
4. 3rd quartile (Q3): 75th percentile
5. Maximum (upper whisker limit): Q3 + 1.5*IQR
Here IQR is the interquartile range, the distance from the first quartile (Q1) to the third
quartile (Q3), that is, IQR = Q3 - Q1.
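The five values listed above can be computed directly with NumPy. The sketch below is only an illustration: the income figures are made up, and np.percentile is used with its default linear interpolation, which may differ slightly from other quartile conventions.

```python
import numpy as np

# Illustrative values (an assumption); 95 is deliberately far from the rest
income = np.array([22, 25, 27, 30, 31, 33, 35, 38, 40, 95])

# Quartiles and interquartile range
q1, median, q3 = np.percentile(income, [25, 50, 75])
iqr = q3 - q1

# Whisker limits: points beyond them are treated as outliers
lower_limit = q1 - 1.5 * iqr
upper_limit = q3 + 1.5 * iqr
outliers = income[(income < lower_limit) | (income > upper_limit)]

print("Q1:", q1, "Median:", median, "Q3:", q3, "IQR:", iqr)
print("Limits:", lower_limit, upper_limit)
print("Outliers:", outliers)
```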
In the box plot, points that lie below the minimum or above the maximum defined above are
called outliers. We can create the box plot of the data to determine the following:
Program4:
import pandas as pd
import matplotlib.pyplot as plt
df = pd.read_csv("C:\\Users\\GOPAL\\Desktop\\Book1.csv")
# For each numeric attribute of dataframe
#df.plot.box()
# individual attribute box plot
plt.boxplot(df['Income'])
plt.show()
Seaborn Library:
Seaborn is a library for making statistical graphics in Python. It builds on top of matplotlib and
integrates closely with pandas data structures. Seaborn helps you explore and understand your
data.
Program5:
Plotly library
Plotly is a Python library used to build graphs, especially interactive ones. It can plot many
kinds of graphs and charts, such as histograms, bar plots, box plots, spread plots, and many more.
It is mainly used in data analysis as well as financial analysis.
import plotly.express as px
import matplotlib.pyplot as plt
import pandas as pd
df = pd.read_csv('C:\\Users\\GOPAL\\Desktop\\Book1.csv')
fig = px.scatter(df, x="Income", y="Age")
fig.update_traces(marker=dict(color='red'))
fig.show()
Aim: Getting started with Tableau Software using data file formats, connecting your data to
Tableau, creating basic charts (line charts, bar charts, tree maps), and using the Show Me panel.
Tableau is a data visualization tool that provides pictorial and graphical representations of data.
It is used for data analytics and business intelligence. Tableau lets users explore data freely
without interrupting the flow of analysis. With its intuitive drag-and-drop interface, users can
uncover hidden insights in data and make smarter decisions faster.
Before connecting your data to Tableau, ensure that your data is in a suitable format. Common
data file formats that Tableau supports include Excel (.xlsx), CSV (.csv), and text files (.txt).
Make sure your data is organized with headers for each column.
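Before loading a file into Tableau, the header requirement above can be checked with a few lines of Python. This is a minimal sketch using only the standard library; the file contents and column names are hypothetical.

```python
import csv
import io

# Hypothetical file contents; in practice this text would be read from a .csv file
raw = (
    "Region,Category,Sales\n"
    "East,Furniture,1200\n"
    "West,Technology,950\n"
)

rows = list(csv.reader(io.StringIO(raw)))
headers, records = rows[0], rows[1:]

# Every column needs a non-empty, unique header
headers_ok = all(h.strip() for h in headers) and len(set(headers)) == len(headers)
# Every record should have one value per header
shape_ok = all(len(r) == len(headers) for r in records)

print(headers, headers_ok, shape_ok)
```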
Steps to follow –
Run Tableau. On the left side of the start screen, there is a Connect Panel.
Go to the file section of the Connect panel and choose the type of data file you want. Let's
choose an Excel file.
Tableau now creates a connection to the data file, and the data appears at the bottom of the
screen.
File Menu: As in any Windows program, the File menu contains the New, Open, Close, Save,
Save As, and Print functions. The most frequently used feature in this menu is the
Print to PDF option, which exports a dashboard or worksheet as a PDF. If
you don't remember where Tableau places files, or you want to change the default file-save
location, use the Repository Location option to review the location and change it. The
Export Packaged Workbook option quickly creates a packaged workbook.
Data Menu: The Data menu is useful if you find some interesting tabular data on a
website that you want to analyze with Tableau. Highlight and copy the data from the site,
then use the Paste Data option to bring it into Tableau. Tableau then
copies the data from the Windows clipboard and adds a data source in the Data window. The
Edit Relationships menu option is used in data blending. It is needed when
the field names are not identical in the two data sources, and it lets you define the
related fields correctly.
Worksheet Menu: The Export option allows you to export the worksheet as an Excel
crosstab, an image, or in Access database file format. The Duplicate as Crosstab option
creates a crosstab version of the worksheet and places it in a new worksheet.
Dashboard Menu: The Action Menu is a useful feature that is reachable from both the
Worksheet Menu and the Dashboard Menu.
Analysis Menu: In this menu, you can access the Stack Marks and Aggregate Measures
options. These switches let you adjust default Tableau behaviors, which is useful if
you need to build non-standard chart types. The Create Calculated Field and Edit
Calculated Field options are used to make new measures and dimensions that don't exist
in your data source.
Map Menu: The Map menu is used to alter the base map color schemes. The other
items in this menu relate to replacing Tableau's standard maps with other map
sources. You can also import geocoding for custom locations using the Geocoding
menu.
Format Menu: This menu is not used very often, because pointing at anything and
right-clicking gets you to a context-specific formatting menu more quickly. You will
rarely need to alter the cell size in a worksheet. If you don't like the default workbook
theme, use the Workbook Theme menu to select one of the other two options.
Toolbar Icons: The toolbar icons below the menu bar can be used to edit the workbook with
features like undo, redo, new data source, save, slideshow, and so on.
Dimension Shelf: The dimensions present in the data source, for example customer (customer
name, segment), order (order date, order ID, ship date, and ship mode), and location (country,
state, and city), can all be viewed in the dimension shelf.
Measure Shelf: The measures present in the data source, for example Discount, Profit, Profit
Ratio, Quantity, and Sales, can all be viewed in the measure shelf.
Page Shelf: The page shelf is used to play the visualization like a video by keeping the related
filter on the page shelf.
Filter Shelf: The filter shelf is used to filter the graphical view with the help of measures and
dimensions.
Marks Card: The marks card is used to design the visualization. Components of the
visualization such as size, color, path, shape, label, and tooltip can be modified in the
marks card.
Worksheet: The worksheet is the space where the actual visualization, design, and
functionalities are viewed in the workbook.
Tableau Repository: Tableau repository is used to store all the files related to the Tableau
desktop. It includes various folders like Connectors, Bookmarks, Data sources, Logs, Extensions,
Map sources, Shapes, Services, Tab Online Sync Client, and Workbooks. My Tableau repository
is located in the file path C:\Users\User\Documents\My Tableau Repository.
Aim: Getting started with Tableau Software creating basic charts (line, bar charts, Tree maps),
using the Show me panel.
Once your data is connected, the Data Source Pane will appear on the left-hand side of the
Tableau interface. Here, you can see a preview of your data and perform data transformations or
join multiple data sources if necessary.
a. Line Chart:
1. From the "Data Source pane", drag and drop the date field to the Columns shelf and a numeric
field (e.g., sales, revenue) to the Rows shelf.
b. Bar Chart:
1. Drag and drop a categorical field (e.g., product category, region) to the Columns shelf and a
numeric field to the Rows shelf.
2. Tableau will then create a bar chart. You can adjust the orientation and formatting as needed.
To display labels on the bars, click on Label and select "Show mark labels".
c. Tree Map:
3. Tableau will create a tree map visualization. You can further customize it by adjusting colors
and labels.
The Show Me panel in Tableau helps you explore various chart types based on your data and the
fields you select. Here's how to use it:
1. After adding fields to the Rows and Columns shelves, click on the "Show Me" button
located at the top right of the Tableau interface.
2. In the Show Me panel, you'll see a variety of chart options that Tableau recommends
based on your data. Click on a chart type to create it.
3. Tableau will automatically generate the selected chart type with your data. You can
further customize it as needed.
4. To go back to the regular worksheet view, click the "Clear" button in the Show Me panel.
Aim: Tableau Calculations, Overview of SUM, AVG, and Aggregate features, Creating custom
calculations and fields.
Aggregate Function: An aggregate function is a function in which the values of multiple rows
are grouped together to form a single summary value. Typical aggregate functions include
Average (i.e., arithmetic mean), Count, etc.
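Returns the sum of all values in the expression. SUM can be used with numeric fields only.
Syntax:
SUM(Expression)
Returns the average of all values in the expression. AVG can be used with numeric fields only.
Syntax:
AVG(Expression)
Returns the smaller of the two expressions for each record. With a single expression, MIN
returns the minimum across all records.
Syntax:
MIN(Expression1, Expression2)
Returns the larger of the two expressions for each record. With a single expression, MAX
returns the maximum across all records.
Syntax:
MAX(Expression1, Expression2)
Returns the statistical variance of all values in the given expression based on a sample of the
population.
Syntax:
VAR(Expression)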
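To make the aggregate functions above concrete, the following sketch computes the same quantities in plain Python. It is only an analogy to the Tableau functions, not Tableau code, and the sales and target figures are illustrative assumptions.

```python
import statistics

# Illustrative values (an assumption, not data from the lab workbook)
sales = [120.0, 80.0, 100.0, 140.0, 60.0]
target = [100.0, 90.0, 100.0, 150.0, 50.0]

total = sum(sales)                       # analogous to SUM(Expression)
average = statistics.mean(sales)         # analogous to AVG(Expression)

# MIN and MAX with two expressions compare the two values record by record
row_min = [min(s, t) for s, t in zip(sales, target)]
row_max = [max(s, t) for s, t in zip(sales, target)]

# Analogous to VAR(Expression): sample variance, which divides by n - 1
sample_variance = statistics.variance(sales)

print(total, average, sample_variance)
print(row_min)
print(row_max)
```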
Dashboards
A dashboard is a way of displaying various types of visual data in one place. Usually, a
dashboard is intended to convey different but related information in an easy-to-digest form.
Often this includes key performance indicators (KPIs) or other important
business metrics that stakeholders need to see and understand at a glance.
Dashboards are useful across different industries and verticals because they’re highly
customizable. They can include data of all sorts with varying date ranges to help you understand:
what happened, why it happened, what may happen, and what action should be taken.
Example: A car dashboard provides real-time information about a car's speed, fuel volume,
RPM, and other engine-related indicators. Similarly, a data dashboard provides information
about company historical sales, key performance indicators (KPIs), sales growth, operational
indicators, and customer feedback. This information is presented in a precise manner so that
managers or executives can understand the situation and make appropriate decisions.
There are several ways to customize the dashboard, and they all fall into one of three categories -
Worksheet1
The first worksheet includes insights related to sales and profit (the key performance
indicators, or KPIs).
Worksheet2
In the second worksheet, let's create another visualization that provides insights into sales by
state in the USA.
Worksheet3
In the third worksheet, let's create another visualization that provides insights into sales by
month.
Worksheet4
In the fourth worksheet, let's create another visualization that provides insights into sales by
product.
Dashboard pane
In the window where we can create our dashboard, we get a lot of tabs and options related to
dashboarding. On the left, we have a Dashboard pane which shows the dashboard size, list of
available sheets in a workbook, objects, etc.
Layout pane
Right next to the Dashboard pane is the Layout pane where we can enhance the appearance and
layout of the dashboard by setting the position, size, border, background, and paddings.
We can add as many sheets as we require and arrange them on the dashboard properly.
Final dashboard
Now, we move towards making a final dashboard in Tableau with all its elements in place.
This opens our dashboard in the presentation mode. So far we were working in the Edit Mode. In
the presentation mode, it neatly shows all the visuals and objects that we have added on the
dashboard. We can see how the dashboard will look when we finally present it to others or share
it with other people for analysis.
Interactive plots in Python are a great way to enhance data visualization by allowing users to
explore data dynamically. These plots can be zoomed, panned, and modified in real time,
offering a richer experience compared to static visualizations.
Plotly
The plotly Python library is an interactive, open-source plotting library that supports over 40
unique chart types covering a wide range of statistical, financial, geographic, scientific, and
3-dimensional use-cases.
Program1:
import plotly.express as px
import pandas as pd
# Sample dataset
df = pd.DataFrame({
'x': [1, 2, 3, 4, 5],
'y': [10, 11, 12, 13, 14],
'category': ['A', 'B', 'C', 'D', 'E'] })
# Create an interactive scatter plot
fig = px.scatter(df, x='x', y='y', color='category', title="Interactive Scatter Plot")
fig.show()
Program2:
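import plotly.express as px
import pandas as pd
# Sample DataFrame
df = pd.DataFrame({
    'time': [1, 2, 3, 4, 5],
    'value': [10, 20, 30, 40, 50] })
# Interactive line chart of value over time (assumed chart type; the listing ends at the DataFrame)
fig = px.line(df, x='time', y='value', title="Interactive Line Chart")
fig.show()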
Program3:
import plotly.express as px
# using the iris dataset
df = px.data.iris()
# plotting the bar chart
fig = px.bar(df, x="sepal_width", y="sepal_length")
# showing the plot
fig.show()
Program4:
import plotly.express as px
import pandas as pd
# Sample DataFrame
df = pd.DataFrame({
'x': [1, 2, 3, 4, 5],
'y': [10, 11, 12, 13, 14],
'size': [100, 200, 300, 400, 500],
'category': ['A', 'B', 'C', 'D', 'E']
})
# Bubble Chart
fig = px.scatter(df, x='x', y='y', size='size', color='category', title="Interactive Bubble Chart")
fig.show()
Bokeh is a Python library for creating interactive visualizations for modern web browsers. It
helps you build beautiful graphics, ranging from simple plots to complex dashboards with
streaming datasets. With Bokeh, you can create JavaScript-powered visualizations without
writing any JavaScript yourself.
Program5:
Program6:
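from bokeh.plotting import figure, show
# prepare some data
x = [1, 2, 3, 4, 5]
y1 = [6, 7, 2, 4, 5]
y2 = [2, 3, 4, 5, 6]
y3 = [4, 5, 5, 7, 2]
# create a new plot with a title and axis labels
p = figure(title="Multiple line example", x_axis_label="x", y_axis_label="y")
# add multiple renderers, one line per data series
p.line(x, y1, legend_label="Temp.", color="blue", line_width=2)
# legend labels and colors for y2 and y3 are assumed
p.line(x, y2, legend_label="Rate", color="red", line_width=2)
p.line(x, y3, legend_label="Objects", color="green", line_width=2)
# show the results
show(p)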
Program7:
from bokeh.plotting import figure, show
x = [1, 2, 3, 4, 5]
y = [4, 5, 5, 7, 2]
# the glyphs drawn below are scatter markers, so the title says so
p = figure(title="scatter plot", x_axis_label='x', y_axis_label='y')
p.scatter(x, y, legend_label='objects', color='#aa54ff', size=12)
show(p)
Mpld3
The mpld3 project brings together Matplotlib, the popular Python-based graphing library,
and D3js, the popular JavaScript library for creating interactive data visualizations for the web.
The result is a simple API for exporting your matplotlib graphics to HTML code which can be
used within the browser, within standard web pages, blogs, or tools such as the IPython
notebook.
Program8:
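import mpld3
import matplotlib.pyplot as plt
import numpy as np
fig, ax = plt.subplots()
np.random.seed(0)
# 200 random points with random colors and sizes
x, y = np.random.normal(size=(2, 200))
color, size = np.random.random((2, 200))
ax.scatter(x, y, c=color, s=500 * size, alpha=0.3)
# convert the current figure to HTML/D3 and open it in the browser
mpld3.show()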
Program9:
Pygal
Pygal is an open-source Python library used for creating visual and interactive
charts. The library is based on SVG (Scalable Vector Graphics) technology, which ensures that
the charts can be scaled without any loss in quality. SVG is a vector graphics format written in
XML that can be edited in any text editor. Pygal can create graphs with a few lines of code that
are easy to write and understand.
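To illustrate the point that SVG is plain XML text, the sketch below builds a tiny bar chart by hand with the standard library. It does not use Pygal and is not how Pygal is implemented; the values and sizes are illustrative assumptions.

```python
import xml.etree.ElementTree as ET

values = [3, 7, 5]                       # illustrative values (an assumption)
bar_width, gap, height = 30, 10, 100     # drawing sizes in pixels

# Root <svg> element; everything inside it is ordinary XML
svg = ET.Element(
    "svg",
    xmlns="http://www.w3.org/2000/svg",
    width=str(len(values) * (bar_width + gap)),
    height=str(height),
)

# One <rect> per value; SVG's y-axis points down, so bars start at height - bar_height
for i, v in enumerate(values):
    bar_height = v * 10
    ET.SubElement(
        svg,
        "rect",
        x=str(i * (bar_width + gap)),
        y=str(height - bar_height),
        width=str(bar_width),
        height=str(bar_height),
        fill="steelblue",
    )

svg_text = ET.tostring(svg, encoding="unicode")
print(svg_text)
```

Saving svg_text to a file with the .svg extension should give an image that a browser can open and a text editor can modify.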
Program10:
import pygal
Program11:
import pygal
import numpy as np
# creating line chart object
line_chart = pygal.Line()
# naming the title
line_chart.title = 'Line chart'
# adding lines
line_chart.add('A', np.random.rand(5))
line_chart.add('B', np.random.rand(5))
line_chart.add('C', np.random.rand(5))
line_chart.add('D', np.random.rand(5))
line_chart
Chord diagram
A chord diagram represents flows or connections between several entities (called nodes). Each
entity is represented by a fragment on the outer part of the circular layout. Arcs are then drawn
between the entities. The size of each arc is proportional to the importance of the flow.
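The data behind a chord diagram is usually a square matrix of flows between nodes. The sketch below prepares such a matrix and works out how much of the circle each node's fragment would take. The node names and flow values are made up, and sizing each fragment by its total outgoing flow is one common convention assumed here.

```python
import numpy as np

nodes = ["A", "B", "C"]  # hypothetical entities

# flows[i][j] is the size of the flow from nodes[i] to nodes[j] (made-up values)
flows = np.array([
    [0, 5, 3],
    [2, 0, 4],
    [1, 6, 0],
])

# Total outgoing flow per node (row sums)
outgoing = flows.sum(axis=1)

# Share of the 360-degree circle given to each node's fragment
angles = 360 * outgoing / outgoing.sum()

for name, total, angle in zip(nodes, outgoing, angles):
    print(f"{name}: outgoing={total}, fragment={angle:.1f} degrees")
```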
Program1:
Treemaps
Treemaps display hierarchical (tree-structured) data as a set of nested rectangles. Each branch of
the tree is given a rectangle, which is then tiled with smaller rectangles representing
sub-branches. A leaf node's rectangle has an area proportional to a specified dimension of the
data. Often the leaf nodes are colored to show a separate dimension of the data.
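The area rule described above can be sketched with a simple "slice" layout, which cuts one rectangle into vertical strips. This is a simplified illustration with made-up values; plotting libraries typically use more elaborate tiling (for example squarified layouts), so the function below is a hypothetical helper, not a library routine.

```python
def slice_layout(values, x, y, width, height):
    """Split the rectangle (x, y, width, height) into vertical strips
    whose areas are proportional to the given values."""
    total = sum(values)
    rects = []
    for v in values:
        strip_width = width * v / total
        rects.append((x, y, strip_width, height))
        x += strip_width  # the next strip starts where this one ends
    return rects


leaf_values = [50, 30, 20]  # made-up leaf sizes
rects = slice_layout(leaf_values, 0, 0, 100, 60)
areas = [w * h for _, _, w, h in rects]

print(rects)
print(areas)
```

Applying the same function again inside each strip, with the roles of width and height swapped, would give the nested rectangles of a multi-level tree.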
Program2:
Program3:
import plotly.express as px
fig = px.treemap(
names = ["Eve","Cain", "Seth", "Enos", "Noam", "Abel", "Awan", "Enoch", "Azura"],
parents = ["", "Eve", "Eve", "Seth", "Seth", "Eve", "Eve", "Awan", "Eve"]
)
fig.update_traces(root_color="lightgrey")
fig.update_layout(margin = dict(t=50, l=25, r=25, b=25))
fig.show()
Sankey diagrams
A Sankey Diagram is a visualisation technique that allows displaying flows. Several entities
(nodes) are represented by rectangles or text. Their links are represented with arrows or arcs that
have a width proportional to the importance of the flow.
Program4:
"""As from figure we can see there are 6 nodes(label in code above) A1(0), A2(1),
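B1(2), B2(3), C1(4) and C2(5). Each node is numbered by its position in the label list.
Now let us create the source list. A1(0), A2(1), B1(2) and B2(3) act as sources, so the source list is
[0(A1), 0(A1), 1(A2), 2(B1), 3(B2), 3(B2)] => [0, 0, 1, 2, 3, 3].
The target list is built the same way from the node where each link ends:
[2(B1), 3(B2), 3(B2), 4(C1), 4(C1), 5(C2)] => [2, 3, 3, 4, 4, 5].
The value list gives the size of each flow."""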
import plotly.graph_objects as go
fig = go.Figure(data=[go.Sankey(
node = dict(
pad = 15,
thickness = 20,
line = dict(color = "black", width = 0.5),
label = ["A1", "A2", "B1", "B2", "C1", "C2"]
),
link = dict(
source = [0, 0, 1, 2, 3, 3],
target = [2, 3, 3, 4, 4, 5],
value = [8, 2, 4, 8, 4, 2],
))])
fig.update_layout(title_text="Basic Sankey Diagram", font_size=10,width=600, height=400)
fig.show()
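The source, target, and value lists used above can be checked without drawing anything. The following sketch uses only the standard library to translate the index lists back into node names and to add up the flow entering and leaving each node.

```python
from collections import defaultdict

# Same lists as in the Sankey program above
label = ["A1", "A2", "B1", "B2", "C1", "C2"]
source = [0, 0, 1, 2, 3, 3]
target = [2, 3, 3, 4, 4, 5]
value = [8, 2, 4, 8, 4, 2]

links = []
outflow = defaultdict(int)  # total value leaving each node
inflow = defaultdict(int)   # total value arriving at each node

for s, t, v in zip(source, target, value):
    links.append((label[s], label[t], v))
    outflow[label[s]] += v
    inflow[label[t]] += v

for src, dst, v in links:
    print(f"{src} -> {dst}: {v}")
```

For the middle nodes B1 and B2, the inflow equals the outflow, which is why their rectangles have the same height on both sides in the diagram.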
Radar charts
A radar chart displays multivariate data on axes that start from the same central point. The
chart features three or more quantitative variables for comparison; the axes for these variables
are known as radii. The chart looks similar to a spider web, which is why it is also called a
spider chart.
Program5:
import plotly.graph_objects as go
import plotly.offline as pyo
genre = ['Action', 'Comedy', 'Drama', 'Horror', 'Mystery', 'Romance']
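As a minimal sketch of a radar chart, the listing below uses matplotlib's polar axes rather than Plotly. The genre list matches the one above, while the scores are hypothetical values chosen only for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt

genre = ['Action', 'Comedy', 'Drama', 'Horror', 'Mystery', 'Romance']
scores = [7, 5, 8, 3, 6, 4]  # hypothetical ratings, not data from the lab

# One angle per variable, evenly spaced around the circle
angles = np.linspace(0, 2 * np.pi, len(genre), endpoint=False).tolist()

# Repeat the first point at the end so the outline closes
closed_angles = angles + angles[:1]
closed_scores = scores + scores[:1]

fig, ax = plt.subplots(subplot_kw={"projection": "polar"})
ax.plot(closed_angles, closed_scores)
ax.fill(closed_angles, closed_scores, alpha=0.25)
ax.set_xticks(angles)
ax.set_xticklabels(genre)
ax.set_title("Radar chart sketch")
plt.show()
```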
3D Plot
Program6:
import plotly.graph_objects as go
import numpy as np
np.random.seed(1)
N = 70
fig = go.Figure(data=[go.Mesh3d(x=(70*np.random.randn(N)),
y=(55*np.random.randn(N)),
z=(40*np.random.randn(N)),
color='rgba(244,22,100,0.6)'
)])
# width is a layout-level setting, so it is passed directly to update_layout
fig.update_layout(width=700)
fig.show()