Lecture 3 - Data Manipulation
Data Manipulation
• Why do we need to manipulate data?
• Data in the real world is rarely formatted the way an analysis needs it
• Different visualization tools use different data formats
• Reformatting helps us re-purpose data
Jupyter Notebooks
Python-based data manipulation environment
Runs in a web browser, where you manage files and write and run code
Makes loading and manipulating large data files easy
Importing Libraries
matplotlib – Library for 2D plotting in Python
pyplot – matplotlib's MATLAB-like plotting interface (matplotlib.pyplot)
seaborn – Library for attractive and informative statistical graphics, built on matplotlib
os – Provides access to operating-system-dependent functionality, such as file paths
%matplotlib inline – Jupyter magic command that displays plots inline, below the code cell
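The slide's import cell is not reproduced in this text. A minimal sketch of such a cell might look like the following. It assumes pandas is also imported, since the later steps use pandas objects. The `%matplotlib inline` magic is left as a comment because it is IPython syntax and only runs inside Jupyter.

```python
# First notebook cell (sketch). Inside Jupyter, also run the magic:
# %matplotlib inline
import os

import matplotlib
import matplotlib.pyplot as plt   # MATLAB-like plotting interface
import pandas as pd               # assumption: pandas is used for the DataFrame steps
import seaborn as sns             # statistical graphics built on matplotlib

# os exposes OS-dependent functionality, e.g. listing the working directory
files = os.listdir(os.getcwd())
print(matplotlib.__version__, sns.__version__, len(files))
```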
Data Loading
df is a pandas ExcelFile object that reads and loads the Excel workbook
sheet_names is an attribute of the ExcelFile object
sheet_names lists the names of all the sheets in the Excel workbook
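A minimal sketch of loading a workbook with `pd.ExcelFile` and reading `sheet_names`. The file name, sheet names and data are hypothetical. The workbook is created first so the example is self-contained, and writing or reading `.xlsx` files assumes the openpyxl package is installed. The slide names the ExcelFile object `df`. Here it is called `xls` to keep it distinct from the DataFrames parsed from it.

```python
import os
import tempfile

import pandas as pd

# Build a small two-sheet workbook (hypothetical data; requires openpyxl)
path = os.path.join(tempfile.mkdtemp(), "example.xlsx")
with pd.ExcelWriter(path) as writer:
    pd.DataFrame({"ID": [1, 2], "Sales": [100, 250]}).to_excel(
        writer, sheet_name="Orders", index=False)
    pd.DataFrame({"ID": [1, 2], "Country": ["India", "Greece"]}).to_excel(
        writer, sheet_name="Customers", index=False)

# Open the workbook once (the slide calls this object df)
xls = pd.ExcelFile(path)
print(xls.sheet_names)          # names of all sheets, in workbook order

# Parse one sheet into a DataFrame
orders = xls.parse("Orders")
print(orders)
```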
Data Frame Definition
data.frame() (R) – A list of variables (vectors) that all have the same number of rows
A data frame is used for storing data tables
[Figure: vectors forming the columns of a data frame]
Source: R Help, r-tutor.com
DataFrame
A pandas DataFrame is a 2-dimensional labeled data structure with columns of potentially different types.
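A short sketch of that definition using hypothetical sample data. Each column has a label and its own dtype, and all columns have the same length, as in the R definition.

```python
import pandas as pd

# Hypothetical sample: three labeled columns of equal length, each with its own type
df = pd.DataFrame({
    "Country": ["Greece", "India", "China"],
    "Year": [2011, 2012, 2013],
    "Sales": [120.5, 98.0, 143.25],
})
print(df.shape)    # (rows, columns)
print(df.dtypes)   # one dtype per column: text, integer, float
```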
Viewing Data
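The slide's code is not reproduced here. A common way to view a DataFrame in pandas is to look at its first and last rows. A minimal sketch with hypothetical data:

```python
import pandas as pd

df = pd.DataFrame({"Country": ["Greece", "India", "China", "Ireland", "India", "China"],
                   "Sales": [120, 98, 143, 87, 101, 150]})
first_rows = df.head(3)   # first 3 rows (default is 5)
last_rows = df.tail(2)    # last 2 rows (default is 5)
print(first_rows)
print(last_rows)
print(df.columns.tolist())
```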
Exploring Data
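A minimal exploration sketch, assuming the usual pandas summary calls are what this step covers. `info()` prints the column types and non-null counts. `describe()` summarizes the numeric columns. The data is hypothetical.

```python
import pandas as pd

df = pd.DataFrame({"Country": ["Greece", "India", "China", "Ireland"],
                   "Sales": [120.0, 98.0, None, 87.0]})
df.info()                   # column names, non-null counts and dtypes
summary = df.describe()     # count, mean, std, min, quartiles, max (numeric columns only)
print(summary)
```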
Nulls in DataFrame
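One way to locate nulls in pandas is `isnull()`, which marks missing values (`NaN` or `None`). Summing that mask counts them per column. A sketch with hypothetical data:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Country": ["Greece", None, "China", "Ireland"],
                   "Sales": [120.0, np.nan, np.nan, 87.0]})
null_mask = df.isnull()           # True where a value is missing
null_counts = df.isnull().sum()   # number of missing values per column
print(null_counts)
print(df.isnull().values.any())   # is anything missing at all?
```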
Dropping NA values
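A sketch of dropping missing values with `dropna()` on hypothetical data. By default it drops every row that contains any NaN. The `subset` parameter restricts the check to particular columns. It returns a new DataFrame and leaves the original unchanged.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Country": ["Greece", "India", None, "Ireland"],
                   "Sales": [120.0, np.nan, 143.0, 87.0]})
rows_complete = df.dropna()                 # drop rows with at least one NaN
sales_known = df.dropna(subset=["Sales"])   # only look at the Sales column
print(rows_complete)
print(sales_known)
```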
Plotting Histogram
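A minimal histogram sketch with matplotlib and hypothetical sales values. The Agg backend is selected only so the example also runs outside a notebook. In Jupyter, `%matplotlib inline` displays the figure instead.

```python
import matplotlib
matplotlib.use("Agg")   # non-interactive backend; not needed in Jupyter
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({"Sales": [87, 98, 101, 120, 143, 150, 410]})  # hypothetical values
fig, ax = plt.subplots()
ax.hist(df["Sales"], bins=5, edgecolor="black")   # 5 equal-width bins
ax.set_title("Distribution of Sales")
ax.set_xlabel("Sales")
ax.set_ylabel("Frequency")
counts = [patch.get_height() for patch in ax.patches]  # bar heights = count per bin
print(counts)
```

The single large value leaves the middle bins empty and produces a long right tail, which is the kind of shape a histogram makes visible.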
Skewness
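Skewness measures how asymmetric a distribution is. In pandas, `Series.skew()` returns a positive value for a long right tail, a negative value for a long left tail, and roughly zero for a symmetric distribution. A sketch with hypothetical values. A common rule of thumb, not stated on the slide, is to prefer the median over the mean for imputing strongly skewed columns.

```python
import pandas as pd

sales = pd.Series([87, 98, 101, 120, 143, 150, 410])   # hypothetical, long right tail
symmetric = pd.Series([1, 2, 3, 4, 5])

print(sales.skew())                   # > 0: right (positive) skew
print(symmetric.skew())               # 0: symmetric
print(sales.mean(), sales.median())   # the tail pulls the mean above the median
```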
Imputing NA (null) values
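A minimal imputation sketch using `fillna()` on hypothetical data. The choice of median for the numeric column and mode for the text column is an assumption, not taken from the slide. The median is less sensitive to outliers and skew than the mean.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Country": ["Greece", "India", "China", None, "India"],
                   "Sales": [87.0, np.nan, 101.0, 120.0, 410.0]})

# Numeric column: fill with the median, which is robust to outliers and skew
df["Sales"] = df["Sales"].fillna(df["Sales"].median())
# Categorical column: fill with the most frequent value (the mode)
df["Country"] = df["Country"].fillna(df["Country"].mode()[0])
print(df)
```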
Unique Values
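Listing a column's unique values is a quick way to spot inconsistent entries such as misspellings. A sketch with a hypothetical Country column that includes the misspelling "Inida":

```python
import pandas as pd

country = pd.Series(["Greece", "India", "China", "India", "Ireland", "Inida"])
print(country.unique())         # distinct values, in order of first appearance
print(country.nunique())        # how many distinct values
print(country.value_counts())   # frequency of each value
```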
Mapping Key Values
[Figure: sample Country values Greece, India, China and Ireland]
The Country column contains misspelled names, which can be corrected by mapping each wrong spelling (key) to the correct name (value)
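A minimal sketch of mapping misspelled keys to correct values with a dictionary. The misspellings and the `corrections` dictionary are hypothetical. `replace()` is used here because it changes only the keys it finds. `map()` with the same dictionary would turn every value that is not a key into NaN.

```python
import pandas as pd

df = pd.DataFrame({"Country": ["Greece", "Inida", "China", "Irland", "India", "Chnia"]})

# Hypothetical mapping: misspelled name (key) -> correct name (value)
corrections = {"Inida": "India", "Irland": "Ireland", "Chnia": "China"}

# Values that are not keys (e.g. "Greece") are left untouched
df["Country"] = df["Country"].replace(corrections)
print(sorted(df["Country"].unique()))
```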
Handling Duplicates
duplicated() returns a Boolean Series that is True for each row repeating an earlier row
duplicated().sum() returns the total number of duplicate rows
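A sketch of both calls on hypothetical data, plus `drop_duplicates()` to remove the repeats. By default, the first occurrence of each row is kept and later copies are flagged.

```python
import pandas as pd

df = pd.DataFrame({"ID": [1, 2, 2, 3, 3, 3],
                   "Country": ["Greece", "India", "India", "China", "China", "China"]})
dup_mask = df.duplicated()       # True for each row that repeats an earlier row
n_dups = df.duplicated().sum()   # total number of duplicate rows
deduped = df.drop_duplicates()   # keep the first occurrence of each row
print(dup_mask.tolist(), n_dups, len(deduped))
```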
Merging Datasets
[Figure: example tables DataFrame1 and DataFrame2]
merge() joins two DataFrames on common columns or indexes; by default it performs an inner join
Merging Datasets With Parameters
Merging DataFrame1 with DataFrame2 on the ID column (on="ID")
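A sketch of merging on `ID` with two hypothetical frames standing in for DataFrame1 and DataFrame2. The default inner join keeps only IDs present in both frames. `how="left"` keeps every row of the left frame and fills unmatched values with NaN.

```python
import pandas as pd

# Hypothetical stand-ins for DataFrame1 and DataFrame2
df1 = pd.DataFrame({"ID": [1, 2, 3], "Country": ["Greece", "India", "China"]})
df2 = pd.DataFrame({"ID": [2, 3, 4], "Sales": [98, 143, 87]})

inner = pd.merge(df1, df2, on="ID")              # default how="inner": IDs in both
left = pd.merge(df1, df2, on="ID", how="left")   # keep every row of df1
print(inner)
print(left)
```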
Merging Datasets Without Parameters
Merging DataFrame1 with DataFrame2 without passing the on parameter; pandas then joins on all columns the two DataFrames have in common
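A sketch showing that omitting `on` joins on every shared column. The frames are hypothetical and share both `ID` and `Country`. A row matches only if both columns agree, so ID 3 ("China" vs "Chile") is dropped.

```python
import pandas as pd

df1 = pd.DataFrame({"ID": [1, 2, 3], "Country": ["Greece", "India", "China"],
                    "Year": [2016, 2016, 2017]})
df2 = pd.DataFrame({"ID": [1, 2, 3], "Country": ["Greece", "India", "Chile"],
                    "Sales": [120, 98, 143]})

# No on= argument: pandas joins on the shared columns, ID and Country
implicit = pd.merge(df1, df2)
explicit = pd.merge(df1, df2, on=["ID", "Country"])
print(implicit)
```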
Sorting Values
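A sketch of `sort_values()` on hypothetical data. It sorts by one column in descending order, and by two columns, where the second key breaks ties in the first.

```python
import pandas as pd

df = pd.DataFrame({"Country": ["India", "Greece", "India", "China"],
                   "Sales": [98, 120, 150, 143]})
by_sales = df.sort_values(by="Sales", ascending=False)   # largest first
by_two = df.sort_values(by=["Country", "Sales"])         # Country, then Sales within ties
print(by_sales)
print(by_two)
```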
Workspace Area
Analysis worksheets are created first
Dashboards are constructed from worksheets
Stories are constructed from worksheets and dashboards
Visualization Worksheet
A view is constructed from visually encoded data elements
Data elements are dragged to cards or shelves to form rows and columns
One or more data elements can be dragged to create single or multiple axes
Visual Data Encoding
• Automatic adjusts the mark type to the data selected
• Bar chart shows comparison
• Line chart shows trend
• Area shows trend and comparison
• Shape shows complex comparison
• Map shows proximity in space
• Pie shows percentage contribution
• Gantt shows the relationship of measures over time
• Polygon creates data areas
Visualization Standard View
Ask Data Functionality
Builds views automatically from questions typed in natural language
Questions can be refined with clarifications such as “as a bar chart” or “by country”
Worksheet – Customer Scatter
Customer data from 2011 to 2017 across all US regions is used
Sales are plotted against the corresponding profit
The scatter plot uses color to represent the profit ratio
Worksheet – Customer Overview
For each of the measures, data is plotted against each region
The tooltip gives an overview of all the measure values
What is a Dashboard?
“A visual display of the most important information needed to achieve one or more
objectives; consolidated and arranged on a single screen so the information can be
monitored at a glance”
(Stephen Few)
Create a Dashboard
Fixed size (default): The dashboard remains
the same size, regardless of the size of the
window used to display it
Source: https://onlinehelp.tableau.com/current/pro/desktop/en-us/dashboards_organize_floatingandtiled.htm 39
Critical Aspects of the Dashboard
• Essential Strategic Metrics
• Monitored at a Glance
Hospitality Design Metaphor
Cold
Warm
Hot
Opportunity
Dashboard Layouts
Source: https://onlinehelp.tableau.com/current/pro/desktop/en-us/dashboards_organize_floatingandtiled.htm 43
Dashboard – Customer Analysis
It combines the three previous worksheets into a single analytical view for comparison, using a tiled layout
Source: https://onlinehelp.tableau.com/current/pro/desktop/en-us/dashboards_create.htm 49
Add Interactivity to Dashboard
Enable highlighting: a highlighter lets users highlight parts of a view based on what they enter or select
Source: https://onlinehelp.tableau.com/current/pro/desktop/en-us/dashboards_best_practices.htm 50
Dashboard Actions
• Use a single view to filter other views in a dashboard
• Use multiple views to filter other views in a dashboard
• Navigate from one view to another view, dashboard, or story
• Interactively display a web page in a dashboard
Source: https://onlinehelp.tableau.com/current/pro/desktop/en-us/actions_dashboards.htm
Modify Layout for Mobile Devices
“Use automatic layout” keeps the Phone layout automatically synchronized with any changes to the Default dashboard
“Edit layout myself” makes the Phone layout fully independent, so items must be added and arranged manually to reflect changes to the Default dashboard
Source: https://onlinehelp.tableau.com/current/pro/desktop/en-us/dashboards_dsd_create.htm 55
Summary
• Define your titles and axes around what you want to say
• Reformat data files for the visualization tools you wish to use
• Clean and format data into a consistent, usable file
• Build knowledge of the data domain
• Define the user experience
• Visually encode the data