
Week 2 Stats

This document outlines the process of cleaning, transforming, and loading data in Power BI, emphasizing the importance of data integrity and user-friendly practices. It covers various tasks such as profiling data, handling duplicates, changing data types, and merging tables to create a semantic model that enhances report accuracy and efficiency. Additionally, it discusses the creation of date tables and the use of DAX functions for data analysis.

Uploaded by

banathi nkosi

DATA ANALYTICS

CODAA2
Week 2
Lessons 3-4
Lesson Outcomes:
By the end of this lesson, you should be able to:

• Clean and transform data in Power BI.
• Enforce data integrity using relationships.
• Resolve inconsistencies, unexpected or null values, and data quality issues.
• Apply user-friendly value replacements.
• Profile data so you can learn more about a specific column before using it.
• Evaluate and transform column data types.
• Apply data shape transformations to table structures.
• Combine queries.
• Apply user-friendly naming conventions to columns and queries.


Clean, Transform & Load Data to Power BI

Dirty data can be overwhelming and frustrating, but it is worth the effort to make your semantic
model as clean as possible. Clean data has the following advantages:

•Measures and columns produce more accurate results when they perform aggregations
and calculations.

•Tables are organized, so users can find the data in an intuitive manner.

•Duplicates are removed, making data navigation simpler and producing columns that
can be used in slicers and filters.

•A complicated column can be split into two simpler columns, and multiple columns can be
combined into one column for readability.

•Codes and integers can be replaced with human-readable values.


Step   Task                  Tool
1      Profile data          Column profiling
2      Clean errors/nulls    Replace / Remove
3      Fix data types        Data type selector
4      Handle duplicates     Remove Duplicates
5      Reshape columns       Split / Merge
6      Replace codes         Replace Values / Merge
7      Rename                Rename tool
8      Combine tables        Append / Merge
9      Advanced control      M code editor
Transform Initial Data to Power BI

Power Query Editor in Power BI Desktop allows you to transform your imported data. You can
accomplish actions such as renaming columns or tables, changing text to numbers, removing rows,
setting the first row as headers, and much more. It is important to transform your data to ensure
that it meets your needs and is suitable for use in reports.
• To start shaping your data, open Power Query Editor by selecting the Transform data option on
the Home tab of Power BI Desktop.
• The first step in shaping your initial data is to identify the column headers and names within the
data and then evaluate where they are located to ensure that they are in the right place.
• When a table is created in Power BI Desktop, Power Query Editor assumes that all data belongs in
table rows. However, a data source might have a first row that contains column names. To correct
this inaccuracy, promote the first table row into column headers. You can promote headers in two
ways: by selecting the Use First Row as Headers option on the Home tab, or by selecting the
drop-down button next to Column1 and then selecting Use First Row as Headers.
• Next, examine the column headers. You can rename column headers in two ways. One approach
is to right-click the header, select Rename, edit the name, and then press Enter. Alternatively,
you can double-click the column header and overwrite the name with the correct name.
• Next, remove any top rows that you do not need, for example, rows that are blank or that contain
data that you do not need in your reports. To remove these excess rows, select Remove Rows >
Remove Top Rows on the Home tab.
• Finally, remove unnecessary columns. You can remove columns in two ways. The first method is to
select the columns that you want to remove and then, on the Home tab, select Remove Columns.
Alternatively, you can select the columns that you want to keep and then, on the Home tab,
select Remove Columns > Remove Other Columns.
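
The shaping steps above are carried out through the Power Query Editor interface. Purely as an illustration of what each step does to the data, here is a minimal Python sketch. It is not Power BI code, and the sample rows, column names, and helper function names are all hypothetical.

```python
# Hypothetical raw import: a blank top row, then a header row, then data.
raw_rows = [
    ["", "", ""],
    ["Region", "Year", "Notes"],
    ["North", "2024", "n/a"],
    ["South", "2025", "n/a"],
]

def remove_top_rows(rows, count):
    """Drop the first `count` rows (the idea behind Remove Top Rows)."""
    return rows[count:]

def promote_headers(rows):
    """Use the first remaining row as column names (Use First Row as Headers)."""
    headers, *data = rows
    return [dict(zip(headers, row)) for row in data]

def keep_columns(records, wanted):
    """Keep only the named columns (the idea behind Remove Other Columns)."""
    return [{name: rec[name] for name in wanted} for rec in records]

shaped = keep_columns(promote_headers(remove_top_rows(raw_rows, 1)), ["Region", "Year"])
print(shaped)
```

The order matters in this sketch: the blank row is removed first so that the row promoted to headers is the one that actually holds the column names.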
Simplify the Data Structure
Rename a query: In Power Query Editor, in the Queries pane to the left of your data,
select the query that you want to rename. Right-click the query and select Rename. Edit the
current name or type a new name, and then press Enter.

Replace values: Select the column that contains the value that you want to replace
(Attribute in this case), and then select Replace Values on the Transform tab.

Replace null values: Use the same Replace Values option, entering null as the value to find
and the value that should take its place (for example, 0) as the replacement.

Remove duplicates: Select a column, right-click the header of the column, and then select
the Remove Duplicates option.
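
As a rough illustration of these three clean-up actions, the following Python sketch applies them to a small hypothetical table. It is not Power BI code: the column names and values are invented, None stands in for null, and duplicates are judged on whole rows here, which is a simplifying assumption.

```python
# Hypothetical rows; None stands in for a null value.
rows = [
    {"Attribute": "M", "SalesAmount": 100},
    {"Attribute": "F", "SalesAmount": None},
    {"Attribute": "M", "SalesAmount": 100},  # exact duplicate of the first row
]

def replace_values(records, column, old, new):
    """Return new records with `old` swapped for `new` in one column."""
    return [{**r, column: new if r[column] == old else r[column]} for r in records]

def remove_duplicates(records):
    """Keep the first occurrence of each identical row, preserving order."""
    seen, result = set(), []
    for r in records:
        key = tuple(sorted(r.items()))
        if key not in seen:
            seen.add(key)
            result.append(r)
    return result

cleaned = replace_values(rows, "Attribute", "M", "Male")      # code -> readable value
cleaned = replace_values(cleaned, "Attribute", "F", "Female")
cleaned = replace_values(cleaned, "SalesAmount", None, 0)     # null -> 0
cleaned = remove_duplicates(cleaned)
print(cleaned)
```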
Evaluate and Change Column Data Types
Evaluate column data types and change them if necessary.
Implications of incorrect data types: Incorrect data types will prevent you from creating certain
calculations, deriving hierarchies, or creating proper relationships with other tables.

Change the column data type: You can change the data type of a column in two places: in Power
Query Editor and in the Power BI Desktop Report view by using the column tools. It is best to change the
data type in the Power Query Editor before you load the data.

Change the column data type in Power Query Editor: In Power Query Editor, you can change the
column data type in two ways. One method is to select the column, select Data Type on the Transform
tab, and then select the correct data type from the list. Another method is to select the data type icon
next to the column header and then select the correct data type from the list.
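
To see why an incorrect data type blocks calculations, consider this small Python sketch. It is an analogy rather than Power BI behaviour, and the column and its values are hypothetical: a quantity column that arrived as text cannot be summed until it is converted to a numeric type.

```python
# Hypothetical column that was imported as text rather than as whole numbers.
quantity_as_text = ["10", "5", "20"]

try:
    total = sum(quantity_as_text)  # adding text to a number raises TypeError
except TypeError:
    total = None                   # the aggregation is not possible on text

# Converting the column to a numeric type makes the aggregation possible.
quantity = [int(value) for value in quantity_as_text]
total_after_fix = sum(quantity)
print(total, total_after_fix)
```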
Combine Multiple Tables Into A Single Table

You can combine the tables in two different ways: Merge and Append.

Append queries: When you append queries, you'll be adding rows of data to another table or
query. For example, you could have two tables, one with 300 rows and another with 100 rows, and
when you append queries, you'll end up with 400 rows. When you merge queries, you'll be adding
columns from one table (or query) into another. To merge two tables, you must have a column that
is the key between the two tables.
On the Home tab on the Power Query Editor ribbon, select the drop-down list for Append Queries.
You can select Append Queries as New, which means that the output of appending will result in a
new query or table, or you can select Append Queries, which will add the rows from an existing
table into another.
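
The row arithmetic of an append can be sketched in a few lines of Python. This is only an illustration under assumed data (two hypothetical tables with identical columns), not how Power Query implements the operation.

```python
# Hypothetical tables with the same columns: 300 rows and 100 rows.
sales_2024 = [{"OrderID": i, "Year": 2024} for i in range(300)]
sales_2025 = [{"OrderID": i, "Year": 2025} for i in range(300, 400)]

# Like "Append Queries as New": the result is a new table; the inputs are unchanged.
appended_as_new = sales_2024 + sales_2025

# Like "Append Queries": rows from the second table are added to an existing one.
existing = list(sales_2024)
existing.extend(sales_2025)

print(len(appended_as_new), len(existing))
```

Appending adds rows and leaves the set of columns as it was, which is the key difference from a merge.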
Merge queries

When you merge queries, you're combining the data from multiple tables into one based on a
column that is common between the tables. This process is similar to the JOIN clause in SQL.

Go to Home on the Power Query Editor ribbon and select the Merge Queries drop-down menu,
where you can select Merge Queries as New. This selection will open a new window, where you
can choose the tables that you want to merge from the drop-down list, and then select the column
that is matching between the tables.

You can also choose how to join the two tables together, a process that is also similar to JOIN
statements in SQL. These join options include:

Left Outer - Displays all rows from the first table and only the matching rows from the second.

Full Outer - Displays all rows from both tables.

Inner - Displays the matched rows between the two tables.
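
The difference between the three join kinds is easiest to see on a tiny example. The Python sketch below is a hand-written illustration, not Power Query's merge. The tables, the ProductID key, and the merge helper are hypothetical, and the helper assumes the key is unique in the second table.

```python
# Hypothetical tables that share a ProductID key column.
orders = [
    {"OrderID": 1, "ProductID": "A"},
    {"OrderID": 2, "ProductID": "B"},
    {"OrderID": 3, "ProductID": "Z"},  # no matching product
]
products = [
    {"ProductID": "A", "Name": "Apple"},
    {"ProductID": "B", "Name": "Banana"},
    {"ProductID": "C", "Name": "Cherry"},  # never ordered
]

def merge(left, right, key, kind):
    """Join two lists of records on `key`; kind is 'left', 'inner', or 'full'.

    Assumes `key` is unique in `right`, as it would be in a lookup table.
    """
    right_by_key = {r[key]: r for r in right}
    right_cols = [c for c in right[0] if c != key]
    result, matched = [], set()
    for row in left:
        match = right_by_key.get(row[key])
        if match is not None:
            matched.add(row[key])
            result.append({**row, **match})
        elif kind in ("left", "full"):
            # Keep the unmatched left row and fill the right-hand columns with nulls.
            result.append({**row, **{c: None for c in right_cols}})
    if kind == "full":
        left_cols = [c for c in left[0] if c != key]
        for r in right:
            if r[key] not in matched:
                result.append({**{c: None for c in left_cols}, **r})
    return result

left_outer = merge(orders, products, "ProductID", "left")
inner = merge(orders, products, "ProductID", "inner")
full_outer = merge(orders, products, "ProductID", "full")
print(len(left_outer), len(inner), len(full_outer))
```

Unlike an append, each of these results is wider than the first table: the Name column from the second table is added alongside the original columns.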
Profile Data in Power BI
Examine data structures, and find data anomalies and data statistics.

To understand data anomalies and statistics, select the Column Distribution, Column Quality,
and Column Profile options on the View tab of Power Query Editor.

Column distribution shows you the distribution of the data within the column and the counts of
distinct and unique values, which helps you spot anomalies. Distinct is the number of different
values in the column, with a repeated value counted once; unique is the number of values that
occur exactly once.

Column quality shows you the percentages of data that is valid, in error, and empty.

Column profile gives you a more in-depth look into the statistics within the columns for the first
1,000 rows of data. Value distribution graph tells you the counts for each distinct value in that
specific column. On a numeric column, Column Statistics will also include how many zeroes and
null values exist, along with the average value in the column, the standard deviation of the values
in the column, and how many even and odd values are in the column.
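
The profiling figures described above can be reproduced by hand on a small column, which helps when interpreting what Power Query Editor reports. The Python sketch below is an illustration only. The sample column is hypothetical, None stands in for null, and nulls are left out of the distinct and unique counts as a simplifying assumption.

```python
from collections import Counter
from statistics import mean

# Hypothetical numeric column; None stands in for null.
column = [4, 7, 7, 0, None, 10, 4, 3]

values = [v for v in column if v is not None]
counts = Counter(values)

profile = {
    "distinct": len(counts),                              # different values, repeats counted once
    "unique": sum(1 for n in counts.values() if n == 1),  # values that occur exactly once
    "empty_pct": 100 * column.count(None) / len(column),  # share of null entries
    "zeroes": values.count(0),
    "average": mean(values),
    "even": sum(1 for v in values if v % 2 == 0),
    "odd": sum(1 for v in values if v % 2 == 1),
}
print(profile)
```

In this column the values 4 and 7 each repeat, so they count toward distinct but not toward unique.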
Use Advanced Editor to Modify M Code

Each time you shape data in Power Query, you create a step in the Power Query process. Those
steps can be reordered, deleted, and modified where it makes sense. Each cleaning step that you
made was likely created by using the graphical interface, but Power Query uses the M language
behind the scenes. The combined steps are available to read by using the Power Query Advanced
Editor. The M language is always available to be read and modified directly. It is not required that
you use M code to take advantage of Power Query. You will rarely need to write M code, but it can
still prove useful.
Design a Semantic Model in Power BI

Creating a great semantic model is one of the most important tasks that a data analyst can
perform in Microsoft Power BI.
A good semantic model offers the following benefits:

•Data exploration is faster.

•Aggregations are simpler to build.

•Reports are more accurate.

•Writing reports takes less time.

•Reports are easier to maintain in the future.



Relationships are defined between tables through primary and foreign keys. Primary keys are
column(s) that identify each unique, non-null data row. For instance, if you have a Customers table,
you could have an index that identifies each unique customer. The first row has an ID of 1, the
second row an ID of 2, and so on. Each row is assigned a unique value, which can be referred to by
this simple value: the primary key. This process becomes important when you are referencing rows
in a different table, which is what foreign keys do. Relationships between tables are formed when
you have primary and foreign keys in common between different tables.
Star schemas

In a star schema, each table within your semantic model is defined as a dimension or a fact table:

Fact tables contain observational or event data values: sales orders, product counts, prices,
transactional dates and times, and quantities. Fact tables can contain several repeated values.
For example, one product can appear multiple times in multiple rows, for different customers on
different dates. These values can be aggregated to create visuals.

Dimension tables contain the details about the data in fact tables: products, locations,
employees, and order types. These tables are connected to the fact table through key columns.
Dimension tables are used to filter and group the data in fact tables. The fact tables, on the other
hand, contain the measurable data, such as sales and revenue, and each row represents a unique
combination of values from the dimension tables.
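
How a dimension table filters and groups a fact table can be sketched outside Power BI. The Python example below is a minimal illustration with hypothetical tables (dim_product and fact_sales) and a hypothetical helper, not a description of how the Power BI engine evaluates a model.

```python
from collections import defaultdict

# Hypothetical dimension table: one row per product, keyed by ProductID.
dim_product = {
    1: {"Name": "Road Bike", "Category": "Bikes"},
    2: {"Name": "Helmet", "Category": "Accessories"},
}

# Hypothetical fact table: one row per sale; ProductID repeats across rows.
fact_sales = [
    {"ProductID": 1, "OrderDate": "2025-05-01", "Quantity": 2, "Amount": 1800.0},
    {"ProductID": 2, "OrderDate": "2025-05-01", "Quantity": 1, "Amount": 45.0},
    {"ProductID": 1, "OrderDate": "2025-05-02", "Quantity": 1, "Amount": 900.0},
]

def total_by(attribute):
    """Group fact rows by a dimension attribute and sum the Amount column."""
    totals = defaultdict(float)
    for sale in fact_sales:
        totals[dim_product[sale["ProductID"]][attribute]] += sale["Amount"]
    return dict(totals)

# Grouping: the dimension attribute supplies the groups, the fact table the numbers.
amount_by_category = total_by("Category")

# Filtering: keep only the fact rows whose product belongs to one category.
bikes_only = [s for s in fact_sales if dim_product[s["ProductID"]]["Category"] == "Bikes"]
print(amount_by_category, len(bikes_only))
```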

Work with Dimensions


When building a star schema, you will have dimension and fact tables. Fact tables contain
information about events such as sales orders, shipping dates, resellers, and suppliers. Dimension
tables store details about business entities, such as products or time, and are connected back to fact
tables through a relationship.

You can use hierarchies as one source to help you find detail in dimension tables. These hierarchies
form through natural segments in your data. For instance, you can have a hierarchy of dates in
which your dates can be segmented into years, months, weeks, and days. Hierarchies are useful
because they allow you to drill down into the specifics of your data instead of only seeing the data at
a high level.

Hierarchies:

You can manually create hierarchies. To create a hierarchy, go to the Fields pane in Power BI and
then right-click the column that you want the hierarchy for. Select New hierarchy, as shown in the
following figure.
Next, drag and drop the subcategory column into the new hierarchy that you've created. This
column will be added as a sublevel of the hierarchy.

Now, you can build the visual by selecting a stacked bar chart in the Visualizations pane.
Hierarchies allow you to view increasing levels of detail in a single visual.

Parent-child hierarchies: The column that determines the hierarchy (e.g. Manager) is the parent,
while the sublevel column (e.g. Employee) is the child.

You need to ensure that, before you begin working on building reports, your semantic model and
table structure are simplified. A simple table structure will:

•Be simple to navigate because of column and table properties that are specific and user-friendly.

•Have merged or appended tables to simplify the tables within your data structure.

•Have good-quality relationships between tables that make sense.



The following sections further explain how you might work with your tables to ensure a simple and
readable table structure.

Configure semantic model and build relationships between tables: Assuming that you've
already retrieved your data and cleaned it in Power Query, you can then go to the Model tab,
where the semantic model is located. To manage relationships among your tables, go
to Manage Relationships on the ribbon. There you can create, edit, and delete relationships
between tables and also autodetect relationships that already exist.

While the Manage Relationships feature allows you to configure relationships between tables,
you can also configure table and column properties to ensure organization in your table structure.
Configure table and column properties

The Model view in Power BI Desktop provides many options within the column properties that
you can view or update. A simple method to get to this menu to update the tables and fields is
by Ctrl+clicking or Shift+clicking items on this page.

Let’s evaluate the General, Formatting and Advanced tabs in the figure to your right.

Under the General tab, you can:

•Edit the name and description of the column.

•Add synonyms that can be used to identify the column when you are using the Q&A feature.

•Add a column into a folder to further organize the table structure.

•Hide or show the column.

Under the Formatting tab, you can:

•Change the data type.

•Format the date.



Under the Advanced tab, you can:

•Sort by a specific column.

•Assign a specific category to the data.

•Summarize the data.

•Determine if the column or table contains null values.



Create a date table:


Ways that you can build a common date table are:

•Source data

•DAX – Data Analysis Expressions

•Power Query

1. Source data

Occasionally, source databases and data warehouses already have their own date tables. Source data
tables are mature and ready for immediate use. If you have a table as such, bring it into your semantic
model and don't use any other methods that are outlined in this section.

2. DAX

You can use the Data Analysis Expressions (DAX) functions CALENDARAUTO() or CALENDAR() to build
your common date table. The CALENDAR() function returns a contiguous range of dates based on a
start and end date that are entered as arguments in the function. Alternatively, the
CALENDARAUTO() function returns a contiguous, complete range of dates that are automatically
determined from your semantic model. The starting date is chosen as the earliest date that exists in
your semantic model, and the ending date is the latest date that exists in your semantic model plus
data that has been populated to the fiscal month that you can choose to include as an argument in
the CALENDARAUTO() function.

DAX for the CALENDAR() function:
Dates = CALENDAR(DATE(yyyy, mm, dd), DATE(yyyy, mm, dd))
You may also want to see columns for just the year, the month number, the week of the year, and
the day of the week. You can accomplish this task by selecting New Column on the ribbon and
entering the following DAX equation, which will retrieve the year from your Date table.

Year = YEAR(Dates[Date])   (Year is the name of the new column)

You can perform the same process to retrieve the month number, week number, and day of the
week:

MonthNum = MONTH(Dates[Date])

WeekNum = WEEKNUM(Dates[Date])

DayoftheWeek = FORMAT(Dates[Date], "DDDD")
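
To make the idea of a contiguous date table with derived columns concrete, here is a Python sketch of the same shape of result. It is an analogy, not DAX: the calendar and add_date_columns helpers are hypothetical, the 2025 date range is an arbitrary choice, and the week number is deliberately left out because week-numbering conventions differ between tools.

```python
from datetime import date, timedelta

DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]

def calendar(start, end):
    """Return every date from start to end inclusive, with no gaps."""
    days = (end - start).days
    return [start + timedelta(days=offset) for offset in range(days + 1)]

def add_date_columns(dates):
    """Attach year, month number, and day name to each date."""
    return [
        {
            "Date": d,
            "Year": d.year,
            "MonthNum": d.month,
            "DayoftheWeek": DAY_NAMES[d.weekday()],  # weekday(): Monday is 0
        }
        for d in dates
    ]

date_table = add_date_columns(calendar(date(2025, 1, 1), date(2025, 12, 31)))
print(len(date_table), date_table[0])
```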


3. Power Query

You can use M-language, the development language that is used to build queries in Power Query, to
define a common date table.
Select Transform Data in Power BI Desktop, which will direct you to Power Query. In the blank space of
the left Queries pane, right-click to open the following drop-down menu, where you will select New
Query > Blank Query. In the resulting New Query view, enter the following M-formula to build a
calendar table:

= List.Dates(#date(YYYY,MM, DD), 365*10, #duration(1,0,0,0))



This approach ensures that, as new sales data flows in, you won't have to re-create this table.

After the formula runs, you will notice that you have a list of dates instead of a table of dates. To
correct this, go to the Transform tab on the ribbon and select Convert > To Table. As the name
suggests, this feature converts your list into a table. Your next task is to change the column type by
selecting the icon next to the name of the column and, in the resulting drop-down menu, selecting
the Date type.

You can mark your created date table as the official date table. The first task in marking your table as
the official date table is to find the new table on the Fields pane. Right-click the name of the table and
then select Mark as date table, as shown in the following figure. When you mark your table as a date
table, Power BI performs validations to ensure that the date column contains no null values, contains
only unique values, and contains continuous date values over a period.
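
The three conditions named above (no nulls, unique values, continuous dates) can be expressed as a simple check. The Python sketch below is only an illustration of those conditions. It is not Power BI's validation logic, and the function name and sample data are hypothetical.

```python
from datetime import date, timedelta

def is_valid_date_column(dates):
    """Check three conditions: no nulls, all values unique, no gaps between days."""
    if not dates or any(d is None for d in dates):
        return False
    if len(set(dates)) != len(dates):
        return False
    ordered = sorted(dates)
    return all(b - a == timedelta(days=1) for a, b in zip(ordered, ordered[1:]))

# Hypothetical date columns: one valid, three that each break one condition.
good = [date(2025, 1, 1) + timedelta(days=i) for i in range(31)]
with_gap = [d for d in good if d != date(2025, 1, 15)]
with_duplicate = good + [date(2025, 1, 1)]
with_null = good + [None]

print(is_valid_date_column(good), is_valid_date_column(with_gap))
```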

Alternatively, you can also choose specific columns in your table to mark as the date, which can be
useful when you have many columns within your table. Right-click the table, select Mark as date
table, and then select Date table settings. A window will appear, where you can choose which
column should be marked as Date.

Build your visual

Before building your visuals, ensure you have created relationships between your tables if you have
more than one table.

Data granularity
Data granularity is the level of detail that is represented within your data: the more granular your
data, the greater the level of detail. Generally, the fewer the records that you are working with, the
faster your reports and visuals will function. Reducing granularity by summarizing the data therefore
translates to a faster refresh rate for the entire semantic model, which might mean that you can
refresh more frequently.

However, that approach has a drawback. If your users want to drill into every single transaction,
summarizing the granularity will prevent them from doing that, which can have a negative impact on
the user experience. It is important to negotiate the level of data granularity with report users, so
they understand the implications of these choices.
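
The trade-off can be seen on a handful of rows. In the Python sketch below, which uses hypothetical transaction data and is not Power BI code, summarizing to daily totals reduces the row count but discards the individual transactions.

```python
from collections import defaultdict

# Hypothetical transaction-level rows (the finest granularity).
transactions = [
    {"Date": "2025-05-01", "Amount": 20.0},
    {"Date": "2025-05-01", "Amount": 35.0},
    {"Date": "2025-05-02", "Amount": 10.0},
    {"Date": "2025-05-02", "Amount": 15.0},
    {"Date": "2025-05-02", "Amount": 5.0},
]

# Summarize to one row per day (a coarser granularity).
daily = defaultdict(float)
for t in transactions:
    daily[t["Date"]] += t["Amount"]
daily_totals = [{"Date": d, "Amount": a} for d, a in sorted(daily.items())]

print(len(transactions), "rows reduced to", len(daily_totals))
```

After summarizing, a user can still see the total for 2 May but can no longer tell that it was made up of three separate transactions.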
Relationships and Cardinality
Power BI has the concept of directionality to a relationship. This directionality plays an important role
in filtering data between multiple tables. When you load data, Power BI automatically looks for
relationships that exist within the data by matching column names. You can also use Manage
Relationships to edit these options manually.

The following are different types of relationships that you'll find in Power BI:

1. Many-to-one (*:1) or one-to-many (1:*) relationship
2. Many-to-many (*:*) relationship
3. One-to-one (1:1) relationship
1. Many-to-one (*:1) or one-to-many (1:*) relationship:

•Describes a relationship in which you have many instances of a value in one column that are related
to only one unique corresponding instance in another column.

•Describes the directionality between fact and dimension tables.

•Is the most common type of cardinality and is the Power BI default when you are automatically
creating relationships.

An example of a one-to-many relationship would be between the Managers and Employees tables,
where you can have many employees that are associated with one unique manager.
2. Many-to-many (*:*) relationship:

•Describes a relationship where many values are in common between two tables.

•Does not require unique values in either table in a relationship.

•Is not recommended; a lack of unique values introduces ambiguity, and your users might not know
which column of values is referring to what.

For instance, consider a many-to-many relationship between the Sales and Order tables of
McDonald's. Both tables have an OrderDate column, and multiple sales can be associated with
multiple orders and vice versa. Ambiguity is introduced because both tables can have the same
order date.
3. One-to-one (1:1) relationship:

•Describes a relationship in which only one instance of a value is common between two tables.

•Requires unique values in both tables.

•Is not recommended because this relationship stores redundant information and suggests that the
model is not designed correctly. It is better practice to combine the tables.

An example of a one-to-one relationship would be if you had products and product IDs in two
different tables. Creating a one-to-one relationship is redundant and these two tables should be
combined.
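
The three relationship types differ only in whether the related column holds unique values on each side. The Python sketch below classifies a relationship on that basis. It is a simplified illustration with hypothetical key columns, and it does not describe how Power BI autodetects relationships.

```python
def cardinality(left_keys, right_keys):
    """Classify a relationship by whether each key column holds unique values."""
    left_unique = len(set(left_keys)) == len(left_keys)
    right_unique = len(set(right_keys)) == len(right_keys)
    if left_unique and right_unique:
        return "1:1"
    if left_unique:
        return "1:*"
    if right_unique:
        return "*:1"
    return "*:*"

# Hypothetical key columns.
employee_manager_ids = [1, 1, 2, 2, 2]   # many employees share a manager
manager_ids = [1, 2]                     # each manager appears once
sales_order_dates = ["2025-05-01", "2025-05-01", "2025-05-02"]
order_order_dates = ["2025-05-01", "2025-05-02", "2025-05-02"]

print(cardinality(employee_manager_ids, manager_ids))      # Employees to Managers
print(cardinality(sales_order_dates, order_order_dates))   # repeated dates on both sides
```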
Cross-filter direction
Data can be filtered on one or both sides of a relationship.
With a single cross-filter direction:
•Only one table in a relationship can be used to filter the data. For instance, Table 1 can be filtered by
Table 2, but Table 2 cannot be filtered by Table 1.

•For a one-to-many or many-to-one relationship, the cross-filter direction will be from the "one" side,
meaning that the filtering will occur in the table that has many values.
With both cross-filter directions, or bi-directional cross-filtering:
•Either table in a relationship can be used to filter the other. For instance, a dimension table can be
filtered through the fact table, and the fact table can be filtered through the dimension table.

•You might have lower performance when using bi-directional cross-filtering with many-to-many
relationships.
REFERENCES
1. Microsoft Data Analytics (2025), Clean, transform, and load data in Power BI. Microsoft Learn.
Available at: https://learn.microsoft.com/en-gb/training/modules/clean-data-power-bi/
[Accessed 9th May 2025].

2. Microsoft Data Analytics (2025), Design a Semantic Model in Power BI. Microsoft Learn.
Available at: https://learn.microsoft.com/en-gb/training/modules/design-model-power-bi/
[Accessed 9th May 2025].
What Happens Next?

Week 3: Basics of DAX language, Measures and Calculated Tables

Get comfortable with building semantic models, identifying the various types of relationships,
and establishing relationships in Power BI.

• Do the Knowledge Check to ensure you cover all bases of the content.

• Do Quiz 2, which is available from Friday, 16 May 2025.


WHAT’S NEXT?

• You are expected to go through the content for Week 3 before the next lesson. Cover all activities
on myLMS in preparation for the next lesson.
• During class, the application of the content covered for Week 2 will be covered.
• A test of the understanding of the topics covered in Week 2 will be done.
• Take time to go through all activities; your lecturer is tracking your progress.
