ADV Module 1


Advanced Data Visualization
21ADS73

MODULE 1
Data Modelling in Tableau
Contents
⚫ Data Modelling
⚫ Data Sources and Connections in Tableau
⚫ Data Preparation and Cleaning
⚫ Layers of the data model
⚫ Understanding the data model
⚫ Building Relationships in Tableau
⚫ Joining Tables in Tableau
⚫ Blending Data from Multiple Sources
⚫ Build charts and analyze data

2 01/27/25
Data Modelling

⚫ A data model in Tableau is a structured representation of how data is
organized, connected, and related within a dataset. It defines how tables,
fields, and their relationships interact, forming the foundation for creating
insightful visualizations.
⚫ Tableau's data model accommodates various data sources. By defining
relationships between tables, users can seamlessly join, blend, and
transform data to create meaningful insights. The data model supports
hierarchical, categorical, and numerical data, enabling users to create
dynamic dashboards and visualizations that reveal patterns and trends.
⚫ A well-designed data model ensures accuracy, consistency, and optimal
performance in data analysis, helping users derive actionable insights from
complex datasets.
⚫ The data model is a diagram that tells Tableau how it should query data in the
connected database tables.
⚫ The tables that you add to the canvas in the Data Source page create the structure of
the data model.
⚫ The data model has two layers:
⚫ The logical layer is the default view that you first see in the Data Source page
canvas. You combine data in the logical layer using relationships ("noodles").
⚫ The physical layer is the next layer down. You combine data between tables at the
physical layer using joins and unions.

Data Sources and Connections in Tableau



⚫ Data sources and connections are the components that let Tableau users access
and analyze diverse datasets for visualizations and reports.
Data Sources:
⚫ Data sources are the origin points of the information that Tableau analyzes.
They include databases (SQL, NoSQL), spreadsheets (Excel, Google Sheets), cloud
platforms (Amazon Redshift, Google BigQuery), web services, and more. This wide
compatibility gives flexibility in handling different types of information.
Data Connections:
⚫ Data connections are the bridges that link Tableau to these data sources,
allowing Tableau to retrieve, process, and visualize the data. Tableau provides
two main ways to establish connections:
Live Connection:
⚫ Tableau queries the data source directly, so the view reflects current data. It's
suitable when data freshness is crucial, but performance depends on the database
server and the network.
Extract Connection:
⚫ Tableau creates a static snapshot (extract) of the data, which is optimized for
faster performance. Extracts can be refreshed on a schedule, providing a balance
between performance and data freshness.
Key Aspects:

Schema Mapping:
⚫ Tableau automatically detects field names and data types in the data source
schema, simplifying the process of creating visualizations.
Data Preparation:
⚫ Users can cleanse, reshape, and transform data within Tableau using
calculated fields, pivots, splits, and other features.
Data Blending:
⚫ Tableau can blend data from multiple sources, allowing for holistic
analysis and correlation.
Join and Relationship:
⚫ Tables from the same or different data sources can be joined or related
based on common fields to create cohesive datasets.
Custom SQL:
⚫ Advanced users can use Custom SQL to write tailored queries that retrieve
specific data.
Performance Optimization:
⚫ Tableau's data engine (Hyper) accelerates queries against extracts, improving
performance even with large datasets.
Data Preparation and Cleaning
Transforming Data for Analysis
⚫Transforming data for analysis in Tableau is a pivotal step that ensures your
data is clean, structured, and ready to reveal meaningful insights. Here's a
comprehensive guide on how to transform data effectively within Tableau:
Connect to Data Source:
⚫Begin by connecting to your data source in Tableau Desktop, whether it's a
database, spreadsheet, or cloud-based repository.
Data Profiling:
⚫Review the data grid on the Data Source page (or the Profile pane in Tableau
Prep) to gain an initial understanding of your data's characteristics, such as
data types, null values, and unique values.
Data Cleansing:
⚫Address inconsistencies, errors, and missing values in your dataset. Utilize
tools like calculated fields and data cleaning functions to clean and standardize
your data.
Data Reshaping:
⚫Pivot and unpivot data as needed to reshape it into a format suitable for
analysis. This is particularly useful for time-series data and comparison analysis.
Data Aggregation:
⚫Aggregate data to higher levels (sum, average, count) for summary analysis.
Tableau's aggregation functions facilitate this process.
Creating Calculated Fields:
⚫Craft calculated fields to derive new insights from existing data. You can
perform calculations, apply logic, and generate new metrics based on your
requirements.
Creating Parameters:
⚫Parameters allow you to introduce dynamic elements to your analysis. Users
can adjust parameters to view different scenarios or compare variables.
Data Blending:
⚫If working with multiple data sources, use data blending to combine datasets
and discover relationships that might not be apparent in individual sources.
Data Grouping and Binning:
⚫Group categorical data into logical clusters or create bins to categorize
numerical data, enhancing visualization and analysis.
Hierarchies and Drill-Downs:
⚫Establish hierarchies to enable users to drill down from high-level summaries
to granular details, enhancing interactivity.
Filters and Sets:
⚫Apply filters and sets to focus on specific data subsets. This enhances
visualization clarity and supports targeted analysis.
Data Aggregations:
⚫Utilize data aggregation functions like SUM, AVG, MAX, and MIN to calculate
metrics based on your analysis goals.
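The grouping, binning, and aggregation steps above can be sketched outside Tableau. The snippet below is a minimal plain-Python illustration, not Tableau's implementation; the `orders` rows, the `sales_bin` helper, and the bin size of 100 are all hypothetical.

```python
from collections import defaultdict

# Hypothetical order rows; field names are assumptions for illustration.
orders = [
    {"region": "East", "sales": 35.0},
    {"region": "East", "sales": 240.0},
    {"region": "West", "sales": 110.0},
]

def sales_bin(value, size=100):
    """Return the lower edge of the fixed-width bin that a value falls into."""
    return int(value // size) * size

# Group the measure by a categorical field.
by_region = defaultdict(list)
for row in orders:
    by_region[row["region"]].append(row["sales"])

# Aggregate each group, mirroring SUM, AVG, MAX, and MIN.
summary = {
    region: {
        "SUM": sum(values),
        "AVG": sum(values) / len(values),
        "MAX": max(values),
        "MIN": min(values),
    }
    for region, values in by_region.items()
}

# Bin the numerical measure into buckets of width 100.
bins = [sales_bin(row["sales"]) for row in orders]

print(summary)
print(bins)
```

In Tableau the same ideas are expressed by dragging a dimension into the view (grouping), choosing an aggregation on the measure pill, and creating bins from a measure.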
Layers of the data model
⚫ The top-level view that you see of a data source is the logical layer of
the data model. You can also think of it as the Relationships canvas,
because you combine tables here using relationships instead of joins.
⚫ When you combine data from multiple tables, each table that you drag
to the canvas in the logical layer must have a relationship to another
table.
⚫ The physical layer of the data model is where you can combine data
using joins and unions. You can think of it as the Join/Union canvas.

⚫ The Tableau data model is split into two layers, logical and physical. You can
build a data model using either layer or both, and both can be used within the
same workbook.
⚫ The logical layer is the view in which you create relationships between two or
more tables. Relationships are a familiar concept from the SQL world: a
relationship connects two or more normalized tables based on a common column.
⚫ The physical layer is the view in which you create a union or a join between
two or more tables. A union makes the table longer by stacking rows from tables
that share the same columns.
⚫ A join resembles a relationship in that it connects two tables based on a
common column, but it merges them into a single table.
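To make the difference between a union and a join concrete, here is a minimal plain-Python sketch. It is not how Tableau executes either operation; the tables (`orders_2023`, `orders_2024`, `customers`) and the helper names `union` and `inner_join` are hypothetical.

```python
# Illustrative stand-ins for what a union and a join do to tables.
orders_2023 = [{"order_id": 1, "customer_id": "C1", "sales": 100}]
orders_2024 = [{"order_id": 2, "customer_id": "C2", "sales": 250}]
customers = [
    {"customer_id": "C1", "name": "Asha"},
    {"customer_id": "C2", "name": "Ravi"},
]

def union(*tables):
    """Stack tables that share the same columns: the result gets longer."""
    return [dict(row) for table in tables for row in table]

def inner_join(left, right, key):
    """Match rows on a common column: the result gets wider."""
    return [
        {**l_row, **r_row}
        for l_row in left
        for r_row in right
        if l_row[key] == r_row[key]
    ]

all_orders = union(orders_2023, orders_2024)
orders_with_names = inner_join(all_orders, customers, "customer_id")

print(len(all_orders))       # number of rows after the union
print(orders_with_names[0])  # order columns plus the customer's name
```

The union keeps the same three columns and adds rows, while the join keeps the rows and adds the `name` column.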
Logical Layer | Physical Layer
Relationships canvas in the Data Source page | Join/Union canvas in the Data Source page
Tables that you drag here are called logical tables | Tables that you drag here are called physical tables
Logical tables can be related to other logical tables | Physical tables can be joined or unioned to other physical tables
Logical tables are like containers for physical tables | Double-click a logical table to see its physical tables
Level of detail is at the row level of the logical table | Level of detail is at the row level of merged physical tables
Logical tables remain distinct (normalized), not merged in the data source | Physical tables are merged into a single, flat table that defines the logical table
Understanding the data model
⚫ In previous versions of Tableau (pre-2020.2), the data model had a physical layer only. In Tableau
2020.2 and later, the data model has the logical (semantic) layer and a physical layer.
⚫ In earlier versions of Tableau (pre-2020.2), the data model in your data source consisted of a
single, physical layer where you could specify joins and unions. Tables added to the physical layer
(joined or unioned) create a single, flattened table (denormalized) for analysis.
⚫ In Tableau 2020.2 and later, the data model in your data source includes a new semantic layer
above the physical layer, called the logical layer, where you can add multiple tables and relate
them to each other. Tables at the logical layer are not merged in the data source; they remain
distinct (normalized) and maintain their native level of detail.
⚫ Logical tables act like containers for merged physical tables. A logical table can contain a single,
physical table. Or it can contain multiple physical tables merged together through joins or unions.
Building Relationships in Tableau
Build a new model

⚫When you add one or more tables to the logical layer, you are essentially
building the data model for your data source. A data source can be made of a
single, logical table, or you can drag multiple tables to the canvas to create a
more complex model.
⚫The first table that you drag to the canvas becomes the root table for the
data model in your data source.
⚫After you drag out the root table, you can drag out additional tables in any
order. You will need to consider which tables should be related to each other,
and the matching field pairs that you define for each relationship.
⚫ Deleting a table in the canvas automatically deletes its related descendants
as well. If you delete the root table, all other tables in the model are also
removed.
⚫ Each relationship must be made of at least one matched pair of fields.
Relationships can be based on calculated fields.
⚫ You can specify how fields used in the relationships should be compared by
using operators when you define the relationship.
Multi Table Model
⚫ Tables that you drag to the logical layer of the Data Source page canvas must be related
to each other.

⚫ When you drag additional tables to the logical layer canvas, Tableau automatically
attempts to create the relationship based on existing key constraints and matching fields
to define the relationship.

⚫ If no constraints are detected, a Many-to-many relationship is created and referential
integrity is set to Some records match.

⚫ You can add more data inside any logical table by double-clicking the table. This opens
the physical layer of the Data Source page canvas.

⚫ If you need to use joins or unions, you can drag the tables you want to join or union into
the physical layer canvas. The physical tables are merged in their logical table.
Single Table Model

⚫ To create a single-table model, drag a table into the logical layer canvas of the Data Source
page. You can then use the fields from that table in the Data pane for analysis.
Single-table model that contains other tables
⚫You can add more data inside the single, logical table by
double-clicking the table. This opens the physical layer of the
Data Source page canvas. If you need to use joins or unions, you
can drag the tables you want to join or union into the physical
layer canvas. The physical tables are merged in their logical
table.
⚫This example shows the Book table in the Relationships canvas
(logical layer) of the data source. Double-clicking the Book
logical table opens the Join/Union canvas (physical layer).
Star and snowflake
⚫In enterprise data warehouses, it is common to have data structured in star or
snowflake schemas where measures are contained in a central fact table and
dimensions are stored separately in independent dimension tables. This
organization of data supports many common analysis flows including rollup and
drill down.
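A star schema and the rollup and drill-down flows it supports can be sketched in plain Python. This is only an illustration under assumed data: the `fact_sales` table, the `dim_product` dimension, and the `total_by` helper are hypothetical names, not part of Tableau.

```python
from collections import defaultdict

# Hypothetical star schema: one fact table keyed to a product dimension.
fact_sales = [
    {"product_id": 1, "amount": 120.0},
    {"product_id": 2, "amount": 80.0},
    {"product_id": 3, "amount": 50.0},
]
dim_product = {
    1: {"product": "Laptop", "category": "Technology"},
    2: {"product": "Phone", "category": "Technology"},
    3: {"product": "Desk", "category": "Furniture"},
}

def total_by(level):
    """Sum the fact measure at a chosen dimension level."""
    totals = defaultdict(float)
    for row in fact_sales:
        label = dim_product[row["product_id"]][level]
        totals[label] += row["amount"]
    return dict(totals)

print(total_by("category"))  # rolled up to the category level
print(total_by("product"))   # drilled down to the product level
```

The measures stay in the fact table and the descriptive attributes stay in the dimension table; changing the `level` argument is the analogue of rolling up or drilling down.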
⚫ Single-table
⚫ Analysis over a single logical table that contains a mixture
of dimensions and measures works just as in Tableau pre-2020.2.
You can build a logical table using a combination of
joins, unions, custom SQL, and so on.
Relate your data

⚫ Relationships are a dynamic, flexible way to combine data from multiple
tables for analysis. A relationship describes how two tables relate to each
other, based on common fields, but doesn’t merge the tables together. When
a relationship is created between tables, the tables remain separate,
maintaining their individual level of detail and domains.
What are relationships?
⚫ Relationships are the flexible, connecting lines created between the logical
tables in your data source. Some people affectionately call relationships
"noodles," but we usually refer to them as "relationships" in our help
documentation.
⚫ Relationships use joins, but they’re automatic. Tableau automatically selects
join types based on the fields being used in the visualization. During
analysis, Tableau adjusts join types intelligently and preserves the native
level of detail in your data.
REQUIREMENTS FOR RELATIONSHIPS

• When relating tables, the fields that define the relationships must have the
same data type.
• You can't define relationships based on geographic fields.
• You can't define relationships between published data sources.
Create and define relationships
• For a single base table model, after you drag the first table to the top-level
canvas of the data source, each new table that you drag to the canvas must be
related to an existing table. When you create relationships between tables in
the logical layer, you’re building the data model for your data source.
• You create relationships in the logical layer of the data source. This is the
default view of the canvas that you see in the Data Source page.
⚫ Drag a table to the canvas. For a single base table model: The first table that you
add to the canvas becomes the base table. All other tables that you add will be
related to that table.
⚫ For a multiple base table model: You will need to decide which tables are base
tables. To create another base table, drag a table from the left pane to the New Base
Table drop area.
⚫ Drag another table to the canvas. When you see the "noodle" between the two
tables that you want to relate, drop that table. The relationship settings open below
the canvas in the Table Details pane. Tableau automatically attempts to create the
relationship based on existing key constraints and matching fields to define the
relationship. If it can't determine the matching fields, you’ll need to select them.
⚫ To change the fields: Select a field pair, and then select from the list of fields
below to set a new pair of matching fields.
⚫ To add multiple field pairs: After you select the first pair, select Close, and then
select Add more fields.
⚫ If no constraints are detected, a Many-to-many relationship is
created and referential integrity is set to Some records match.
⚫ Add more tables following the same steps, as needed.
Move a table to create a different relationship
⚫ To move a table, drag it next to a different table. Or, hover over a table,
select the arrow, and then select Move.
Remove a table from a relationship
⚫ To remove a table, hover over a table, select the arrow, and then select Remove.
View a relationship
⚫ Hover over the relationship line (noodle) to see the matching fields that define it.
You can also hover over any logical table to see what it contains.
Edit a relationship
⚫ Select a relationship line to open the relationship settings in the Table Details pane. You can add,
change, or remove the fields used to define the relationship. Add more field pairs to create a
compound relationship.
⚫ To add multiple field pairs: After you select the first pair, select Close, and then select Add more
fields.
• Relationships (logical tables) versus joins (physical tables)
• You create relationships between logical tables at the top-level, logical layer
of your data source. You create joins between physical tables in the
physical layer of your data source.
• Joins merge data from two tables into a single table before your analysis
begins. Merging the tables together can cause data to be duplicated or
filtered from one or both tables; it can also cause NULL rows to be added to
your data if you use a left, right, or full outer join. When analyzing joined
data, you need to make sure that you correctly handle the effects of the join
on your data.
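The duplication effect of a join can be seen with two tiny tables at different levels of detail. The following is a plain-Python sketch with hypothetical data (`orders`, `line_items`), meant only to show why a measure can be inflated after a join and why keeping tables separate avoids it; it does not reproduce Tableau's query generation.

```python
# Hypothetical data: one order row, two line-item rows for that order.
orders = [{"order_id": 1, "shipping_cost": 10.0}]
line_items = [
    {"order_id": 1, "product": "Pen", "amount": 5.0},
    {"order_id": 1, "product": "Book", "amount": 20.0},
]

# A physical-layer style join merges both tables into one flat table first.
joined = [
    {**order, **item}
    for order in orders
    for item in line_items
    if order["order_id"] == item["order_id"]
]

# The order row now appears twice, so its measure is double-counted.
shipping_after_join = sum(row["shipping_cost"] for row in joined)

# Aggregating each table at its own level of detail, which is the idea
# behind relationships, avoids the inflation.
shipping_separate = sum(order["shipping_cost"] for order in orders)
amount_separate = sum(item["amount"] for item in line_items)

print(shipping_after_join, shipping_separate, amount_separate)
```

After the join the shipping cost sums to twice its true value, while the line-item amounts are unaffected; this is the kind of join side effect you have to handle yourself when working in the physical layer.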
⚫ Relationships versus blends

Relationships | Blends
Defined in the data source | Defined in the worksheet between a primary and a secondary data source
Can be published | Can't be published
All tables are equal semantically | Depend on selection of primary and secondary data sources, and how those data sources are structured
Support full outer joins | Only support left joins
Computed locally | Computed as part of the SQL query
Related fields are fixed | Related fields vary by sheet (can be customized on a sheet-by-sheet basis)
Features of different options for combining data: Relationships, joins, and blends

Relate
• Use when combining data from different levels of detail.
• Requires matching fields between two logical tables.
• Supports many-to-many and outer joins.
• Relationships are consistent for the entire workbook.
• Can be published, but you can't combine published data sources by using
relationships.

Join
• Use when you want to add more columns of data across the same row structure.
• Requires common fields between two physical tables.
• Joined physical tables are merged into a single logical table with a fixed
combination of data.
• May cause data duplication if fields are at different levels of detail.
• Can use data source filters.

Union
• Use when you want to add more rows of data with the same column structure.
• Based on matching columns between two tables.
• Unioned physical tables are merged into a single logical table with a fixed
combination of data.

Blend
• Use when combining data from different levels of detail.
• Can be used to combine published data sources, but can't be published.
• Data sources can be blended on a per-sheet basis.
• Are always effectively left joins.
Joining Tables in Tableau



⚫Joining tables in Tableau is a fundamental process that allows you to combine
data from multiple tables based on shared fields. This enables you to create
comprehensive datasets for analysis and visualization. Here's a step-by-step
guide on how to join tables in Tableau:
⚫Connect to Data Source:
⚫Start by connecting to your data source(s) in Tableau Desktop.
⚫Drag and Drop Tables:
⚫Drag the first table from the left pane of the Data Source page onto the canvas,
double-click it to open the join canvas (physical layer), and then drag in the
tables you want to join.
⚫Identify Key Fields:
⚫Identify the common fields (keys) between the tables that you'll use to join
them. These fields should have the same data type and matching values.
⚫ Add Join Clause:
⚫ Click the join icon between the two tables to open the Join dialog, and then
select the matching field from each table.
⚫ Select Join Type:
⚫ Choose the appropriate join type:
⚫ Inner Join: Returns only matching rows from both tables.
⚫ Left Join: Returns all rows from the left table and matching rows from
the right table.
⚫ Right Join: Returns all rows from the right table and matching rows
from the left table.
⚫ Full Outer Join: Returns all rows from both tables, including
unmatched rows.
⚫ Preview and Adjust:
⚫ Preview the results of the join to ensure it's correct. Adjust join conditions or
types if needed.
⚫ Join More Tables:
⚫ If you need to join more tables, repeat the process, ensuring you connect the
appropriate fields.
⚫ Blend Data:
⚫ If you're working with data from different data sources, use data blending to
combine data from separate connections.
⚫ Create Visualizations:
⚫ Once tables are joined, you can start creating visualizations and analysis
using the combined dataset.
⚫ Maintain Hierarchy:
⚫ If joining hierarchical data, ensure that your join logic maintains the desired
hierarchy.
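The four join types listed above can be compared side by side with a small plain-Python sketch. The `join` helper and the `books` and `sales` tables are hypothetical and exist only for illustration; Tableau itself delegates joins to the database or its data engine.

```python
def join(left, right, key, how="inner"):
    """Tiny illustrative join over lists of dicts; how is inner, left, right, or full."""
    left_cols = {col for row in left for col in row}
    right_cols = {col for row in right for col in row}
    result = []
    matched_right = set()
    for l_row in left:
        hits = [(i, r_row) for i, r_row in enumerate(right) if r_row[key] == l_row[key]]
        for i, r_row in hits:
            matched_right.add(i)
            result.append({**l_row, **r_row})
        if not hits and how in ("left", "full"):
            # Unmatched left row: right-side columns become None (NULL).
            result.append({**{col: None for col in right_cols}, **l_row})
    if how in ("right", "full"):
        for i, r_row in enumerate(right):
            if i not in matched_right:
                # Unmatched right row: left-side columns become None (NULL).
                result.append({**{col: None for col in left_cols}, **r_row})
    return result

# Hypothetical tables: book 1 has no sales, and sale 3 has no book.
books = [{"book_id": 1, "title": "A"}, {"book_id": 2, "title": "B"}]
sales = [{"book_id": 2, "units": 7}, {"book_id": 3, "units": 4}]

for how in ("inner", "left", "right", "full"):
    print(how, len(join(books, sales, "book_id", how)))
```

Only the inner join drops unmatched rows; the outer variants keep them and fill the missing side with NULLs, which is why previewing the joined data before analysis is worthwhile.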
STEPS TO CREATE A JOIN

⚫To create a join, connect to the relevant data source or sources.
⚫These can be in the same data source (such as tables in a database or
sheets in an Excel spreadsheet) or different data sources (this is known as a
cross-database join). If you combined tables using a cross-database join,
Tableau colors the tables in the canvas and the columns in the data grid to
show you which connection the data comes from.
⚫Drag the first table to the canvas.
⚫Select Open from the menu or double-click the first table to open the join
canvas (physical layer).
⚫Double-click or drag another table to the join canvas.
⚫Click the join icon to configure the join. Add one or more join clauses by
selecting a field from one of the available tables used in the data source,
choosing a join operator, and a field from the added table.
⚫When finished, close the join dialog and join canvas.
Cross-database joins

⚫ Once you've connected to the first source of data, use the Add option in the left pane
of the Data Source page to add another connection.
⚫ This creates a second connection rather than an entirely different data
source. You can switch between the two (or more) connections while on the
data source tab.
⚫ Once you move to a worksheet and begin analysis, the data source
functions as a single, combined data source. This is in contrast to two
independent data sources that can be toggled between on a worksheet.
Blending Data from Multiple Sources



⚫Blending data from multiple sources in Tableau is a powerful technique that
enables you to combine data from different databases, spreadsheets, or even
cloud-based platforms to create unified visualizations and insights. Here's a
step-by-step guide on how to blend data in Tableau:
⚫Connect to Data Sources:
⚫Start by connecting to your primary data source in Tableau Desktop, as you
normally would.
⚫Add Secondary Data Source: After connecting to the primary source, select
Data > New Data Source to connect to your secondary data source. Blending
requires separate data sources; the "Add" button in the "Connections" pane
adds a connection to the same data source instead, which is used for
cross-database joins.
⚫Define Common Dimensions:
⚫Identify common dimensions (fields) that exist in both the primary and
secondary data sources. These dimensions will serve as the basis for blending.
⚫ Create Relationship:
⚫ Tableau automatically links fields that have the same name in both sources. To
link other fields, define the blend relationship with Data > Edit Blend Relationships.
⚫ Configure Blending Options:
⚫ In the Data pane of the secondary source, click the link icon next to a field to
turn that linking field on or off for the current sheet.
⚫ Join Type and Aggregation:
⚫ A blend always behaves like a left join from the primary source, so the join type
cannot be changed. Measures from the secondary source are aggregated (SUM, AVG,
MAX, MIN, etc.) before they are combined with the primary data.

Add to Visualization:
⚫ Create a visualization that uses data from both sources. Fields from the primary
and secondary sources will be available in the "Data" pane for building
visualizations.
⚫ Visualize and Analyze:
⚫ Use the blended data to create visualizations that showcase insights from
multiple sources combined seamlessly.
⚫ Additional Blending:
⚫ If needed, you can blend data from multiple secondary sources by
establishing relationships for each one.
⚫ Data Validation:
⚫ Ensure that the blended results are accurate by validating the combined
data against your expectations and original sources.
⚫ Data Source Filters:
⚫ Apply filters to the primary and secondary data sources independently to
limit data before blending. This can improve performance and data
accuracy.
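Conceptually, a blend aggregates the secondary source to the level of the linking field and then attaches the result to the primary source like a left join. The sketch below illustrates that idea in plain Python under assumed data; the `blend` helper and the `primary` and `secondary` rows are hypothetical and do not reflect Tableau's internal implementation.

```python
from collections import defaultdict

# Hypothetical primary source (one row per book) and secondary source
# (several sales rows per book).
primary = [
    {"title": "Dune", "adaptations": 2},
    {"title": "Emma", "adaptations": 5},
]
secondary = [
    {"title": "Dune", "units": 30},
    {"title": "Dune", "units": 12},
    {"title": "Solaris", "units": 9},
]

def blend(primary_rows, secondary_rows, link, measure):
    """Aggregate the secondary source to the linking field, then left-join."""
    totals = defaultdict(int)
    for row in secondary_rows:
        totals[row[link]] += row[measure]  # SUM per linking value
    return [
        {**row, measure: totals.get(row[link])}  # None when there is no match
        for row in primary_rows
    ]

blended = blend(primary, secondary, link="title", measure="units")
print(blended)
```

Every primary row survives, the secondary measure is summed per linking value, and secondary rows without a match in the primary source (here the hypothetical "Solaris") do not appear, which matches the left-join behavior of blends.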
Steps for blending data

⚫ Ensure that the workbook has multiple data sources. The second data
source should be added by going to Data > New data source.
⚫ Drag a field to the view. This data source will be the primary data source.
⚫ Switch to another data source and verify there is a blend relationship to the
primary data source.
⚫ If there is a linking field icon, the data sources are automatically linked. As
long as there is at least one active link, the data can be blended.
⚫ If there is a broken link icon, click the icon next to the field that should
link the two data sources. The slash will go away, representing an active
link.
⚫ If a link icon does not appear next to the desired field, define the blend
relationship manually (Data > Edit Blend Relationships).
⚫ Drag a field into the view from the secondary data source.
⚫ As soon as this second data source is used in the same view, a blend is
established. In the example below, our primary data source is Movie
Adaptations and the secondary data source is Bookshop.
⚫ The primary data source is indicated with a blue check mark on the data
source. Fields from the primary data source used in the view have no
indication.
⚫ The secondary data source is indicated with an orange check mark on the
data source and an orange bar down the side of the Data pane. Fields from
the secondary data source used in the view have an orange check mark.
Build Charts and Analyze Data
Predictive Modelling
⚫ Predictive modeling functions in Tableau use linear regression to build
predictive models and generate predictions about your data.
⚫ Two table calculations, MODEL_PERCENTILE and MODEL_QUANTILE, can generate
predictions and surface relationships within your data.
⚫ These can be used to identify outliers, estimate values for sparse or missing
data, and predict values for future time periods.
⚫ Previously, users had to integrate Tableau with R and Python in order to
perform advanced statistical calculations and visualize them in Tableau. Now,
you can use the predictive modeling functions to make predictions from your
data by including them in a table calculation.
⚫ With these predictive modeling functions, you can select targets and
predictors by updating the variables and visualizing multiple models with
different combinations of predictors.
⚫ The data can be filtered, aggregated, and transformed at any level of detail,
and the model, and thus the prediction, will automatically recalculate to match
your data.
Predictive modeling functions available in Tableau
MODEL_PERCENTILE

Syntax: MODEL_PERCENTILE(model_specification (optional), target_expression, predictor_expression(s))
Definition: Returns the probability (between 0 and 1) of the expected value being
less than or equal to the observed mark, defined by the target expression and
other predictors. This is the Posterior Predictive Distribution Function, also
known as the Cumulative Distribution Function (CDF).
Example: MODEL_PERCENTILE(SUM([Sales]), COUNT([Orders]))

MODEL_QUANTILE

Syntax: MODEL_QUANTILE(model_specification (optional), quantile, target_expression, predictor_expression(s))
Definition: Returns a target numeric value within the probable range defined by
the target expression and other predictors, at a specified quantile. This is the
Posterior Predictive Quantile.
Example: MODEL_QUANTILE(0.5, SUM([Sales]), COUNT([Orders]))
Syntax of predictive modeling functions in detail

⚫ What is MODEL_QUANTILE?
⚫ MODEL_QUANTILE calculates the posterior predictive quantile, or the
expected value at a specified quantile.
⚫ Quantile: The first argument is a number between 0 and 1, indicating what
quantile should be predicted. For example, 0.5 specifies that the median will
be predicted.
⚫ Target expression: The second argument is the measure to predict or
“target.”
⚫ Predictor expression(s): The third argument is the predictor used to make
the prediction. Predictors can be dimensions, measures, or both
⚫ The result is a number within the probable range.
⚫ You can use MODEL_QUANTILE to generate a confidence interval, to estimate
missing values such as those for future dates, or to generate categories that
don't exist in your underlying data set.
⚫ What is MODEL_PERCENTILE?
⚫ MODEL_PERCENTILE calculates the posterior predictive distribution
function, also known as the Cumulative Distribution Function (CDF). This
calculates the quantile of a particular value between 0 and 1, the inverse of
MODEL_QUANTILE.
⚫ Target expression: The first argument is the measure to target, identifying
which values to assess.
⚫ Predictor expression(s): The second argument is the predictor used to
make the prediction.
⚫ Additional arguments are optional and are included to control the prediction.
⚫ Notice that the calculation syntax is similar, with MODEL_QUANTILE having the
extra argument of a defined quantile.
⚫ The result is the probability of the expected value being less than or equal to the
observed value expressed in the mark.
⚫ You can use MODEL_PERCENTILE to surface correlations and relationships
within your database. If MODEL_PERCENTILE returns a value close to 0.5, the
observed mark is near the median of the range of predicted values, given the other
predictors that you've selected. If MODEL_PERCENTILE returns a value close to
0 or to 1, the observed mark is near the lower or upper range of what the model
expects, given the other predictors that you've selected.
Example - Explore Female Life Expectancy with
Predictive Modeling Functions
⚫ This example uses the World Indicators saved data source, which comes with
Tableau. We’ll use the MODEL_QUANTILE and MODEL_PERCENTILE predictive modeling
functions to explore the relationships between health spending per capita,
female life expectancy, and birth rate.
⚫ Let’s start with a visualization that compares each country’s health spending with
its female life expectancy. To follow along and access the pre-built views and
dashboards, or to view the solution, download the following workbook from
Tableau Public: Predictive Modeling of Female Life Expectancy.
Step 1: Create the prediction calculation
⚫ If you also have Tableau Server or Tableau Cloud and you want to do your authoring on the web
instead of in Tableau Desktop, publish the workbook to your Tableau server, click Workbooks,
select the workbook, then under Actions, choose Edit Workbook.
⚫ Once you open the workbook, you'll see that it has several sheets. You'll be using those sheets to
build your views.
⚫ In the starter workbook, click the Percentile Starter sheet.
⚫ Open the Analysis menu at the top, and then select Create Calculated Field.
⚫ In the Calculation Editor, do the following:
⚫ Name the calculation: Percentile Expectancy vs Spending
⚫ Enter the following formula:
MODEL_PERCENTILE(AVG([Life Expectancy Female]), LOG(MEDIAN([Health Exp/Capita])))
⚫ This calculation uses average life expectancy as the target expression, and median health
expenditure as the predictor. In this case, we used a logarithmic transformation on the health
spending axis, as well as for the predictor.
⚫ Click OK.
⚫ The prediction calculation is now added as a calculated field in the Data pane.
Step 2: Add the prediction calculation to the view
⚫ In the viz above, you can see each country's health spending against its female life
expectancy, filtered to 2012.
⚫ Now, let’s add the MODEL_PERCENTILE calculation to the view and see what
insights we can gain.
⚫ Drag Percentile Expectancy vs Spending to Color on the Marks card.
⚫ Click the drop-down arrow on the pill and select Compute Using > Country/Region.
⚫ Click Color on the Marks card, and then click Edit Colors.
⚫ Under Palette, select Orange-Blue Diverging.
⚫ Select the Stepped Color checkbox.
⚫ Select the Reversed checkbox.
⚫ Click OK.
 You can see the distribution of countries where life
expectancy is higher or lower than expected based on
the level of spending. Notice that generally, the dark red
marks indicate that life expectancy is high relative to
healthcare spending, dark blue means that life expectancy is
low relative to healthcare spending, and grey means that life
expectancy is close to what the model expects, based on the
level of healthcare spending.
Step 3: Group the results by color
To simplify analysis, let’s use the prediction calculation within a new calculation to
group the results. We’ll build groups so that marks above the 90th percentile and
below the 10th percentile are grouped together, marks in the 80th-90th percentile
range and 10th-20th percentile range are grouped together, and so on. We’ll also
highlight marks with a null value and address those later using the other predictive
modeling function, MODEL_QUANTILE.
1. In the Calculation Editor, do the following:
• Name the calculation: Percentile by Color
• Enter the following formula:
IF ISNULL([Percentile Expectancy vs Spending])
    THEN "Null"
ELSEIF [Percentile Expectancy vs Spending] >= 0.9
    OR [Percentile Expectancy vs Spending] <= 0.1
    THEN "<10th & >90th percentile"
ELSEIF [Percentile Expectancy vs Spending] >= 0.8
    OR [Percentile Expectancy vs Spending] <= 0.2
    THEN "<20th & >80th percentile"
ELSEIF [Percentile Expectancy vs Spending] >= 0.7
    OR [Percentile Expectancy vs Spending] <= 0.3
    THEN "<30th & >70th percentile"
ELSEIF [Percentile Expectancy vs Spending] >= 0.6
    OR [Percentile Expectancy vs Spending] <= 0.4
    THEN "<40th & >60th percentile"
ELSE "50th percentile +-10"
END
 Add the new calculation to Color on the Marks card.
 Click the drop-down arrow on the pill and select Compute
Using > Country/Region.
 Click Color on the Marks card, and then click Edit Colors.
 Adjust the colors to better see the trend. In this case, let’s
use the Traffic Light color palette, and use gray for Nulls.
 Click OK.
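The grouping logic of the Percentile by Color calculation can also be read as an ordinary chain of conditions. The Python sketch below mirrors the formula branch by branch for a single percentile value. The function name is hypothetical, and `None` stands in for a Tableau null.

```python
def percentile_group(p):
    """Return the color group label for a percentile value p (0 to 1),
    or "Null" when no prediction is available (p is None)."""
    if p is None:
        return "Null"
    if p >= 0.9 or p <= 0.1:
        return "<10th & >90th percentile"
    if p >= 0.8 or p <= 0.2:
        return "<20th & >80th percentile"
    if p >= 0.7 or p <= 0.3:
        return "<30th & >70th percentile"
    if p >= 0.6 or p <= 0.4:
        return "<40th & >60th percentile"
    return "50th percentile +-10"
```

Because the branches are checked in order, each mark lands in the outermost band it qualifies for: a value of 0.95 matches the first test and never reaches the 80th or 70th percentile branches.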
Looking at the orange mark in the corner, notice that the U.S.
spends $8,895 per capita for a female life expectancy of 81 years.
Moving along the X-axis to the left, you can see that other
countries spend less and have the same life expectancy.
The model evaluates the strength of the relationship at each
point, and the U.S. is close to the upper end of the model’s
expected range.
Step 4: Compare life expectancy with birth rate
Next, let’s look at a viz that compares female life expectancy
with birth rate. Notice that there is a negative correlation
between birth rates and female life expectancy; however, this
does not mean that higher birth rates cause lower female life
expectancy. There are likely additional factors that affect both
birth rates and female life expectancy that are not visible in this
view of the data. But let’s add the model and see where the
model expects female life expectancy to be higher or lower given
health expenditures.
 On the Birth Rate sheet, add the Percentile by Color prediction
calculation to Color on the Marks card to bring it into the view.
 Click the drop-down arrow on the pill and select Compute Using
> Country/Region.
 Click Color on the Marks card, and click Edit Colors. Edit the
colors as before, using the Traffic Light palette and gray for
Null.
 Click OK.
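The negative correlation between birth rate and female life expectancy noted at the start of Step 4 can be quantified with a Pearson correlation coefficient. The Python below is a minimal sketch: the `pearson` helper is hypothetical, and the five data points are invented for illustration, not taken from the workbook.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Invented values: life expectancy falls as birth rate rises.
birth_rate = [0.010, 0.015, 0.025, 0.035, 0.045]
female_life_expectancy = [83, 79, 72, 65, 58]
r = pearson(birth_rate, female_life_expectancy)  # negative for this data
```

A coefficient near -1 describes a strong negative association only. As the text above stresses, it says nothing about whether one variable causes the other.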
Now the data is much more distributed. The red band in the lower
right corner is where life expectancy is lowest but the birth rate is
highest, and healthcare spending relative to life expectancy is low.
Singling out the two red marks in the top left quadrant, which pertain
to Albania and Armenia, you’ll notice that both countries have high
female life expectancy, lower birth rates, and low health
expenditures.
As you can see, we were able to use MODEL_PERCENTILE to
identify that these two countries are outliers: although both
have relatively low healthcare spending, they still have relatively high
life expectancies when viewed in the context of birth rate.
Now, let’s see how you can use the other predictive modeling
function, MODEL_QUANTILE, to continue your analysis.
THANK YOU
