ADV Module 1
Visualization
21ADS73
MODULE 1
Data Modelling in Tableau
Contents
⚫ Data Modelling
⚫ Data Sources and Connections in Tableau
⚫ Data Preparation and Cleaning
⚫ Layers of the data model
⚫ Understanding the data model
⚫ Building Relationships in Tableau
⚫ Joining Tables in Tableau
⚫ Blending Data from Multiple Sources
⚫ Build charts and analyze data
2 01/27/25
Data Modelling
Data Sources and Connections in Tableau
Schema Mapping:
⚫Tableau automatically detects and maps data source schema,
simplifying the process of creating visualizations.
Data Preparation:
⚫Users can cleanse, reshape, and transform data within Tableau using
calculated fields, calculated tables, and other features.
Data Blending:
⚫Tableau can blend data from multiple sources, allowing for holistic
analysis and correlation.
Join and Relationship:
⚫Tables from the same or different data sources can be joined or related
based on common fields to create cohesive datasets.
Custom SQL:
⚫Advanced users can use Custom SQL to create tailored queries to retrieve
specific data.
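As a rough sketch of the kind of tailored query the Custom SQL option accepts, here is a hypothetical example run against an in-memory SQLite table (the `orders` table, columns, and filter are invented for illustration; in Tableau you would paste only the SQL into the Custom SQL dialog):

```python
import sqlite3

# Hypothetical in-memory table standing in for a connected database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (region TEXT, sales REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [("East", 100.0), ("East", 250.0), ("West", 80.0)])

# A tailored query like one you might paste into Tableau's Custom SQL dialog:
# filter and pre-aggregate before the data ever reaches the visualization.
rows = conn.execute(
    "SELECT region, SUM(sales) AS total_sales "
    "FROM orders WHERE sales > 90 GROUP BY region"
).fetchall()
print(rows)  # [('East', 350.0)]
```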
Performance Optimization:
⚫Tableau's "Data Engine" accelerates query execution, enhancing
performance even with large datasets.
Data Preparation and Cleaning
Transforming Data for Analysis
⚫Transforming data for analysis in Tableau is a pivotal step that ensures your
data is clean, structured, and ready to reveal meaningful insights. Here's a
comprehensive guide on how to transform data effectively within Tableau:
Connect to Data Source:
⚫Begin by connecting to your data source in Tableau Desktop, whether it's a
database, spreadsheet, or cloud-based repository.
Data Profiling:
⚫Use the data profiling feature to gain an initial understanding of your data's
characteristics, such as data types, null values, and unique values.
Data Cleansing:
⚫Address inconsistencies, errors, and missing values in your dataset. Utilize
tools like calculated fields and data cleaning functions to clean and standardize
your data.
Data Reshaping:
⚫Pivot and unpivot data as needed to reshape it into a format suitable for
analysis. This is particularly useful for time-series data and comparison analysis.
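Tableau performs pivots interactively on the Data Source page; as a sketch of the same idea, here is what an unpivot looks like in pandas (the table and column names are hypothetical):

```python
import pandas as pd

# Hypothetical wide table: one column per year (awkward for time-series charts).
wide = pd.DataFrame({"Country": ["A", "B"],
                     "2020": [10, 20],
                     "2021": [15, 25]})

# Unpivot ("melt") the year columns into rows, like Tableau's pivot feature,
# producing one row per country-year suitable for a time-series view.
long = wide.melt(id_vars="Country", var_name="Year", value_name="Sales")
print(long)
#   Country  Year  Sales
# 0       A  2020     10
# 1       B  2020     20
# 2       A  2021     15
# 3       B  2021     25
```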
Data Aggregation:
⚫Aggregate data to higher levels (sum, average, count) for summary analysis.
Tableau's aggregation functions facilitate this process.
Creating Calculated Fields:
⚫Craft calculated fields to derive new insights from existing data. You can
perform calculations, apply logic, and generate new metrics based on your
requirements.
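A Tableau calculated field derives a new column from existing fields, e.g. `[Profit] / [Sales]` or an `IF ... THEN ... ELSE ... END` expression. The pandas sketch below mirrors that idea with hypothetical field names and a threshold chosen purely for illustration:

```python
import pandas as pd

df = pd.DataFrame({"Sales": [100.0, 200.0], "Profit": [20.0, 90.0]})

# Analogue of the calculated field: [Profit] / [Sales]
df["Profit Ratio"] = df["Profit"] / df["Sales"]

# Analogue of: IF [Profit Ratio] > 0.3 THEN "High" ELSE "Low" END
df["Band"] = df["Profit Ratio"].apply(lambda r: "High" if r > 0.3 else "Low")
print(df["Band"].tolist())  # ['Low', 'High']
```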
Creating Parameters:
⚫Parameters allow you to introduce dynamic elements to your analysis. Users
can adjust parameters to view different scenarios or compare variables.
Data Blending:
⚫If working with multiple data sources, use data blending to combine datasets
and discover relationships that might not be apparent in individual sources.
Data Grouping and Binning:
⚫Group categorical data into logical clusters or create bins to categorize
numerical data, enhancing visualization and analysis.
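Tableau's "Create Bins" dialog does this from a right-click menu; the pandas equivalent below shows the same categorization of a numeric field into fixed-width bins (the ages and bin edges are hypothetical):

```python
import pandas as pd

ages = pd.Series([5, 17, 34, 62, 78])

# Fixed-width bins, like Tableau's "Create Bins" on a numeric field
# (bin edges chosen here for illustration).
bins = pd.cut(ages, bins=[0, 20, 40, 60, 80],
              labels=["0-20", "21-40", "41-60", "61-80"])
print(bins.tolist())  # ['0-20', '0-20', '21-40', '61-80', '61-80']
```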
Hierarchies and Drill-Downs:
⚫Establish hierarchies to enable users to drill down from high-level summaries
to granular details, enhancing interactivity.
Filters and Sets:
⚫Apply filters and sets to focus on specific data subsets. This enhances
visualization clarity and supports targeted analysis.
Data Aggregations:
⚫Utilize data aggregation functions like SUM, AVG, MAX, and MIN to calculate
metrics based on your analysis goals.
Layers of the data model
⚫ The top-level view that you see of a data source is the logical layer of
the data model. You can also think of it as the Relationships canvas,
because you combine tables here using relationships instead of joins.
⚫ When you combine data from multiple tables, each table that you drag
to the canvas in the logical layer must have a relationship to another
table
⚫ The physical layer of the data model is where you can combine data
using joins and unions
⚫ you can think of it as the Join/Union canvas
⚫ The Tableau Data Model is split into two layers. You can build the data model using the logical layer, the physical layer, or both in conjunction with each other, within the same workbook.
⚫ Within the Tableau Data Model, the logical layer represents the view that creates a relationship between two or more tables. Relationships are a common concept in the SQL world: a relationship simply means connecting two or more normalized tables based on a common column.
⚫ The physical layer represents the view that creates a union or a join between two or more tables. A union appends rows, making the table longer, and requires tables that share the same columns.
⚫ A join, like a relationship, connects two tables based on a common column, but it merges them into a single table.
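The difference between a union (longer table) and a join (wider table) can be sketched in pandas with hypothetical monthly order tables:

```python
import pandas as pd

jan = pd.DataFrame({"Order": [1, 2], "Sales": [100, 150]})
feb = pd.DataFrame({"Order": [3], "Sales": [200]})
regions = pd.DataFrame({"Order": [1, 2, 3], "Region": ["E", "W", "E"]})

# Union: stack tables that share the same columns, making the table longer.
union = pd.concat([jan, feb], ignore_index=True)

# Join: widen the table by matching rows on a common column.
joined = union.merge(regions, on="Order")
print(len(union), list(joined.columns))  # 3 ['Order', 'Sales', 'Region']
```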
Logical Layer vs. Physical Layer
⚫ Logical layer: Tables that you drag here are called logical tables. Logical tables are like containers for physical tables. Level of detail is at the row level of the logical table.
⚫ Physical layer: Tables that you drag here are called physical tables. Double-click a logical table to see its physical tables. Level of detail is at the row level of the merged physical tables.
⚫ In Tableau 2020.2 and later, the data model has a logical (semantic) layer and a physical layer.
⚫ In earlier versions of Tableau (pre-2020.2), the data model in your data source consisted of a single, physical layer where you could specify joins and unions. Tables added to the physical layer are merged into a single, flattened table.
⚫ In Tableau 2020.2 and later, the data model in your data source includes a new semantic layer above the physical layer, called the logical layer, where you can add multiple tables and relate them to each other. Tables at the logical layer are not merged in the data source; they remain distinct (normalized) and maintain their native level of detail.
⚫ Logical tables act like containers for merged physical tables. A logical table can contain a single physical table, or it can contain multiple physical tables merged together through joins or unions.
Building Relationships in Tableau
Build a new model
⚫When you add one or more tables to the logical layer, you are essentially
building the data model for your data source. A data source can be made of a
single, logical table, or you can drag multiple tables to the canvas to create a
more complex model.
⚫The first table that you drag to the canvas becomes the root table for the
data model in your data source.
⚫After you drag out the root table, you can drag out additional tables in any
order. You will need to consider which tables should be related to each other,
and the matching field pairs that you define for each relationship.
⚫ Deleting a table in the canvas automatically deletes its related descendants as well.
⚫ When you drag additional tables to the logical layer canvas, Tableau automatically
attempts to create the relationship based on existing key constraints and matching fields
to define the relationship.
⚫ You can add more data inside any logical table by double-clicking the table. This opens
the physical layer of the Data Source page canvas.
⚫ If you need to use joins or unions, you can drag the tables you want to join or union into
the physical layer canvas. The physical tables are merged in their logical table.
Single Table Model
⚫ To create a single-table model, drag a table into the logical layer canvas of the Data Source
page. You can then use the fields from that table in the Data pane for analysis.
Single-table model that contains other tables
⚫You can add more data inside the single, logical table by
double-clicking the table. This opens the physical layer of the
Data Source page canvas. If you need to use joins or unions, you
can drag the tables you want to join or union into the physical
layer canvas. The physical tables are merged in their logical
table.
⚫This example shows the Book table in the Relationships canvas
(logical layer) of the data source. Double-clicking the Book
logical table opens the Join/Union canvas (physical layer).
Star and snowflake
⚫In enterprise data warehouses, it is common to have data structured in star or
snowflake schemas where measures are contained in a central fact table and
dimensions are stored separately in independent dimension tables. This
organization of data supports many common analysis flows including rollup and
drill down.
⚫ Single-table
⚫ Analysis over a single logical table that contains a mixture
of dimensions and measures works just as in Tableau pre-
2020.2. You can build a logical table using a combination of
joins, unions, custom SQL, and so on.
Relate your data
• When relating tables, the fields that define the relationships must have the
same data type.
• You can't define relationships based on geographic fields.
• You can't define relationships between published data sources
• Create and define relationships
• For a single base table model, after you drag the first table to the top-level
canvas of the data source, each new table that you drag to the canvas must be
related to an existing table. When you create relationships between tables in
the logical layer, you’re building the data model for your data source
• You create relationships in the logical layer of the data source. This is the
default view of the canvas that you see in the Data Source page.
⚫ Drag a table to the canvas. For a single base table model: The first table that you
add to the canvas becomes the base table. All other tables that you add will be
related to that table.
⚫ For a multiple base table model: You will need to decide which tables are base
tables. To create another base table, drag a table from the left pane to the New Base
Table drop area.
⚫ Drag another table to the canvas. When you see the "noodle" between the two
tables that you want to relate, drop that table. The relationship settings open below
the canvas in the Table Details pane. Tableau automatically attempts to create the
relationship based on existing key constraints and matching fields to define the
relationship. If it can't determine the matching fields, you’ll need to select them.
⚫ To change the fields: Select a field pair, and then select from the list of fields
below to set a new pair of matching fields.
⚫ To add multiple field pairs: After you select the first pair, select Close, and then
select Add more fields.
⚫ If no constraints are detected, a Many-to-many relationship is
created and referential integrity is set to Some records match
⚫ Add more tables following the same steps, as needed.
⚫ Move a table to create a different relationship
⚫ To move a table, drag it next to a different table. Or, hover over a table,
select the arrow, and then select Move.
⚫ Remove a table from a relationship
⚫ To remove a table, hover over a table, select the arrow, and then select Remove.
⚫ View a relationship
⚫ Hover over the relationship line (noodle) to see the matching fields that define it.
You can also hover over any logical table to see what it contains
⚫ Edit a relationship
⚫ Select a relationship line to open the relationship settings in the Table Details pane. You can add,
change, or remove the fields used to define the relationship. Add more field pairs to create a
compound relationship.
⚫ To add multiple field pairs: After you select the first pair, select Close, and then select Add more
fields
• Relationships (logical tables) versus joins (physical tables)
• You create relationships between logical tables at the top-level, logical layer
of your data source. You create joins between physical tables in the
physical layer of your data source.
• Joins merge data from two tables into a single table before your analysis
begins. Merging the tables together can cause data to be duplicated or
filtered from one or both tables; it can also cause NULL rows to be added to
your data if you use a left, right, or full outer join. When analyzing joined
data, you need to make sure that you correctly handle the effects of the join
on your data.
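The duplication risk described above can be sketched in pandas with a hypothetical one-to-many pairing of authors and books:

```python
import pandas as pd

authors = pd.DataFrame({"AuthorID": [1], "Name": ["Austen"]})
books = pd.DataFrame({"AuthorID": [1, 1, 1],
                      "Title": ["Emma", "Persuasion", "Sanditon"]})

# A join merges the tables up front: the single author row is repeated
# once per matching book, so a SUM over author-level values would inflate.
merged = authors.merge(books, on="AuthorID")
print(len(merged))  # 3

# A relationship, by contrast, keeps the tables separate and combines them
# at query time at the appropriate level of detail, so the author row is
# never duplicated in author-level aggregations.
```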
⚫ Relationships versus blends
⚫ Relationships: all tables are equal semantically.
⚫ Blends: depend on the selection of primary and secondary data sources, and on how those data sources are structured.
⚫ Once you've connected to the first source of data, use the Add option in the data pane to
add another connection.
⚫ This creates a second connection rather than an entirely different data
source. You can switch between the two (or more) connections while on the
data source tab
⚫ Once you move to a worksheet and begin analysis, the data source
functions as a single, combined data source. This is in contrast to two
independent data sources that can be toggled between on a worksheet.
Blending Data from Multiple Sources
Add to Visualization:
⚫ Create a visualization that uses data from both sources. Fields from the primary
and secondary sources will be available in the "Data" pane for building
visualizations.
⚫ Visualize and Analyze:
⚫ Use the blended data to create visualizations that showcase insights from
multiple sources combined seamlessly.
⚫ Additional Blending:
⚫ If needed, you can blend data from multiple secondary sources by
establishing relationships for each one.
⚫ Data Validation:
⚫ Ensure that the blended results are accurate by validating the combined
data against your expectations and original sources.
⚫ Data Source Filters:
⚫ Apply filters to the primary and secondary data sources independently to
limit data before blending. This can improve performance and data
accuracy.
Steps for blending data
⚫ Ensure that the workbook has multiple data sources. The second data
source should be added by going to Data > New data source.
⚫ Drag a field to the view. This data source will be the primary data source.
⚫ Switch to another data source and verify there is a blend relationship to the
primary data source
⚫ If there is an active link icon next to a field, the data sources are automatically linked. As long as there is at least one active link, the data can be blended.
⚫ If there is a broken link icon (a link with a slash through it), click the icon next to the field that should link the two data sources. The slash will go away, representing an active link.
⚫ If a link icon does not appear next to the desired field, choose Data > Edit Blend Relationships to define the linking fields manually.
⚫ Drag a field into the view from the secondary data source.
⚫ As soon as this second data source is used in the same view, a blend is
established. In the example below, our primary data source is Movie
Adaptations and the secondary data source is Bookshop.
⚫ The primary data source is indicated with a blue check mark on the data
source. Fields from the primary data source used in the view have no
indication.
⚫ The secondary data source is indicated with an orange check mark on the
data source and an orange bar down the side of the Data pane. Fields from
the secondary data source used in the view have an orange check mark.
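Under the hood, a blend aggregates the secondary source at the level of the linking field and then attaches the result to the primary source, similar to a left join on pre-aggregated data. A pandas sketch, reusing the Movie Adaptations / Bookshop naming from the example above (the columns and values are invented):

```python
import pandas as pd

# Primary source (e.g., Movie Adaptations) and secondary source (e.g., Bookshop).
primary = pd.DataFrame({"Book": ["Emma", "Dune"], "BoxOffice": [5.0, 400.0]})
secondary = pd.DataFrame({"Book": ["Emma", "Emma", "Dune"],
                          "Sales": [10, 15, 30]})

# Blending aggregates the secondary source at the linking field first,
# then left-joins the result onto the primary source, so primary rows
# are never duplicated.
agg = secondary.groupby("Book", as_index=False)["Sales"].sum()
blended = primary.merge(agg, on="Book", how="left")
print(blended["Sales"].tolist())  # [25, 30]
```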
Build Charts and Analyze Data
Predictive Modelling
Predictive modeling functions in Tableau use linear
regression to build predictive models and generate
predictions about your data.
Two table calculations, MODEL_PERCENTILE and
MODEL_QUANTILE, can generate predictions and
surface relationships within your data.
These can be used to identify outliers, estimate values for
sparse or missing data, and predict values for future time
periods
⚫ Previously, users had to integrate Tableau with R and Python
in order to perform advanced statistical calculations and
visualize them in Tableau. Now, you can use the predictive
modeling functions to make predictions from your data by
including them in a table calculation.
⚫ With these predictive modeling functions, you can select
targets and predictors by updating the variables and
visualizing multiple models with different combinations of
predictors.
⚫ The data can be filtered, aggregated, and transformed at any
level of detail, and the model—and thus the prediction—will
automatically recalculate to match your data
Predictive modeling functions available in Tableau
MODEL_PERCENTILE
Syntax MODEL_PERCENTILE(
model_specification (optional),
target_expression,
predictor_expression(s))
Definition Returns the probability (between 0 and 1) of
the expected value being less than or equal to
the observed mark, defined by the target
expression and other predictors. This is the
Posterior Predictive Distribution Function, also
known as the Cumulative Distribution Function
(CDF).
Example MODEL_PERCENTILE( SUM([Sales]),COUN
T([Orders]))
⚫ MODEL_QUANTILE
Syntax: MODEL_QUANTILE(model_specification (optional), quantile, target_expression, predictor_expression(s))
Definition: Returns a target numeric value within the probable range defined by the target expression and other predictors, at a specified quantile. This is the Posterior Predictive Quantile.
Example: MODEL_QUANTILE(0.5, SUM([Sales]), COUNT([Orders]))
Syntax of predictive modeling functions in detail
⚫ What is MODEL_QUANTILE?
⚫ MODEL_QUANTILE calculates the posterior predictive quantile, or the
expected value at a specified quantile.
⚫ Quantile: The first argument is a number between 0 and 1, indicating what
quantile should be predicted. For example, 0.5 specifies that the median will
be predicted.
⚫ Target expression: The second argument is the measure to predict or
“target.”
⚫ Predictor expression(s): The third argument is the predictor used to make
the prediction. Predictors can be dimensions, measures, or both
⚫ The result is a number within the probable range.
⚫ You can use MODEL_QUANTILE to generate a confidence interval, to estimate missing values such as future dates, or to generate categories that don't exist in your underlying data set.
⚫ What is MODEL_PERCENTILE?
⚫ MODEL_PERCENTILE calculates the posterior predictive distribution
function, also known as the Cumulative Distribution Function (CDF). This
calculates the quantile of a particular value between 0 and 1, the inverse of
MODEL_QUANTILE.
⚫ Target expression: The first argument is the measure to target, identifying
which values to assess.
⚫ Predictor expression(s): The second argument is the predictor used to
make the prediction.
⚫ Additional arguments are optional and are included to control the prediction.
Notice that the calculation syntax is similar, with MODEL_QUANTILE having the
extra argument of a defined quantile.
The result is the probability of the expected value being less than or equal to the
observed value expressed in the mark.
You can use MODEL_PERCENTILE to surface correlations and relationships
within your database. If MODEL_PERCENTILE returns a value close to 0.5, the
observed mark is near the median of the range of predicted values, given the other
predictors that you've selected. If MODEL_PERCENTILE returns a value close to
0 or to 1, the observed mark is near the lower or upper range of what the model
expects, given the other predictors that you've selected.
Example - Explore Female Life Expectancy with
Predictive Modeling Functions
This example uses the World Indicators saved data source, which comes with
Tableau. We’ll use the MODEL_QUANTILE and
MODEL_PERCENTILE predictive modeling functions to explore the relationships
between health spending per capita, female life expectancy, and birth rate.
Let’s start with a visualization that compares each country’s health spending with
its female life expectancy. To follow along and access the pre-built views and
dashboards, or to view the solution, download the following workbook from
Tableau Public: Predictive Modeling of Female Life Expectancy.
Step 1: Create the prediction calculation
If you also have Tableau Server or Tableau Cloud and you want to do your authoring on the web
instead of in Tableau Desktop, publish the workbook to your Tableau server, click Workbooks,
select the workbook, then under Actions, choose Edit Workbook.
Once you open the workbook, you'll see that it has several sheets. You'll be using those sheets to
build your views.
In the starter workbook, click the Percentile Starter sheet.
Open the Analysis menu at the top, and then select Create Calculated Field.
In the Calculation Editor, do the following:
Name the calculation: Percentile Expectancy vs Spending
Enter the following formula:
MODEL_PERCENTILE(AVG([Life Expectancy Female]), LOG(MEDIAN([Health
Exp/Capita])))
This calculation uses average life expectancy as the target expression, and median health
expenditure as the predictor. In this case, we used a logarithmic transformation on the health
spending axis, as well as for the predictor
Click OK.
The prediction calculation is now added as a calculated field in the Data pane.
Step 2: Add the prediction calculation to the view
In the viz above, you can see each country's health spending against its female life
expectancy, filtered to 2012.
Now, let’s add the MODEL_PERCENTILE calculation to the view and see what
insights we can gain.
Drag Percentile Expectancy vs Spending to Color on the Marks card.
Click the drop-down arrow on the pill and select Compute Using > Country/Region.
Click Color on the Marks card, and then click Edit Colors.
Under Palette, select Orange-Blue Diverging.
Select the Stepped Color checkbox.
Select the Reversed checkbox.
Click OK.
You can see the distribution of countries where health
expectancy is both higher and lower than expected based on
the level of spending. Notice that generally, the dark red
marks indicate that life expectancy is high relative to
healthcare spending, dark blue means that life expectancy is
low relative to healthcare spending, and grey means that life
expectancy is close to what the model expects, based on the
level of healthcare spending
Step 3: Group the results by color
To simplify analysis, let’s use the prediction calculation within a new calculation to
group the results. We’ll build groups so that marks above the 90th percentile and
below the 10th percentile are grouped together, marks in the 80th-90th percentile
range and 10th-20th percentile range are grouped together, and so on. We’ll also
highlight marks with a null value and address those later using the other predictive
modeling function, MODEL_QUANTILE.
1.In the Calculation Editor, do the following:
•Name the calculation: Percentile by Color.
•Enter the following formula:
IF
ISNULL([Percentile Expectancy vs Spending])
THEN "Null"
ELSEIF [Percentile Expectancy vs Spending] >=0.9 OR
[Percentile Expectancy vs Spending] <=0.1
THEN "<10th & >90th percentile"
ELSEIF [Percentile Expectancy vs Spending] >=0.8 OR
[Percentile Expectancy vs Spending] <=0.2
THEN "<20th & >80th percentile"
ELSEIF [Percentile Expectancy vs Spending] >=0.7 OR
[Percentile Expectancy vs Spending] <=0.3
THEN "<30th & >70th percentile"
ELSEIF [Percentile Expectancy vs Spending] >=0.6 OR
[Percentile Expectancy vs Spending] <=0.4
THEN "<40th & >60th percentile"
ELSE "50th percentile +-10"
END
Add the new calculation to Color on the Marks card.
Click the drop-down arrow on the pill and select Compute
Using > Country/Region.
Click Color on the Marks card, and then click Edit Colors.
Adjust the colors to better see the trend. In this case, let’s
use the Traffic Light color palette, and use gray for Nulls.
Click OK.
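The banding logic in the Percentile by Color formula above can be mirrored in Python to check its edge cases (a direct translation, not part of the Tableau workbook):

```python
def percentile_band(p):
    """Mirror of the 'Percentile by Color' calculated field above."""
    if p is None:
        return "Null"
    if p >= 0.9 or p <= 0.1:
        return "<10th & >90th percentile"
    if p >= 0.8 or p <= 0.2:
        return "<20th & >80th percentile"
    if p >= 0.7 or p <= 0.3:
        return "<30th & >70th percentile"
    if p >= 0.6 or p <= 0.4:
        return "<40th & >60th percentile"
    return "50th percentile +-10"

print(percentile_band(0.95))  # <10th & >90th percentile
print(percentile_band(0.5))   # 50th percentile +-10
```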
Looking at the orange mark in the corner, notice that the U.S.
spends $8,895 per female for a life expectancy of 81 years.
Moving along the X-axis to the left, you can see that other
countries spend less and have the same life expectancy.
The model evaluates the strength of the relationship at each
point, where the U.S. is close to the upper end of the model’s
expected range
Step 4: Compare life expectancy with birth rate
Next, let’s look at a viz that compares female life expectancy
with birth rate. Notice that there is a negative correlation
between birth rates and female life expectancy; however, this
does not mean that higher birth rates cause lower female life
expectancy. There are likely additional factors that affect both
birth rates and female life expectancy that are not visible in this
view of the data. But let’s add the model and see where the
model expects female life expectancy to be higher or lower given
health expenditures.
On the Birth Rate sheet, add the Percentile by Color prediction
calculation to Color on the Marks card to bring it into the view.
Click the drop-down arrow on the pill and select Compute Using
> Country/Region.
Click Color on the Marks card, and click Edit Colors. Edit the
colors as before, using the Traffic Light palette and gray for
Null.
Click OK.
Now the data is much more distributed. The red band in the lower
right corner is where life expectancy is lowest but the birth rate is
highest, and healthcare spending relative to life expectancy is low.
Singling out the two red marks in the top left quadrant, which pertain
to Albania and Armenia, you’ll notice that both countries have high
female life expectancy, lower birth rates, and low health
expenditures.
As you can see, we were able to use MODEL_PERCENTILE to
identify that these two countries are outliers: Even though they both
had relatively low healthcare spending, they still have relatively high
life expectancies, placed in the context of birth rate.
Now, let’s see how you can use the other predictive modeling
function, MODEL_QUANTILE, to continue your analysis.
THANK YOU