KPMG Data Analyst
Interview Questions
SQL Questions
1. Write a SQL query to find the second-highest salary from an
employee table.
Given an employee table with columns:
id | name | salary
---+-------+--------
1 | John | 50000
2 | Alice | 60000
3 | Bob | 70000
4 | David | 70000
5 | Emma | 80000
Solution 1: Using LIMIT with OFFSET (MySQL, PostgreSQL)
SELECT DISTINCT salary
FROM employee
ORDER BY salary DESC
LIMIT 1 OFFSET 1;
• ORDER BY salary DESC: Sorts the salaries in descending order.
• LIMIT 1 OFFSET 1: Skips the highest salary and returns the second-highest salary.
Solution 2: Using MAX() with WHERE
SELECT MAX(salary) AS second_highest_salary
FROM employee
WHERE salary < (SELECT MAX(salary) FROM employee);
• The inner query finds the highest salary.
• The outer query finds the highest salary below the maximum, which is the second-highest.
Solution 3: Using DENSE_RANK() (Works in SQL Server, PostgreSQL, MySQL 8+)
SELECT salary
FROM (
SELECT salary, DENSE_RANK() OVER (ORDER BY salary DESC) AS salary_rank
FROM employee
) ranked_salaries
WHERE salary_rank = 2;
• DENSE_RANK() assigns ranks to unique salaries.
• The outer query filters the row where rank is 2 (second-highest salary).
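The LIMIT/OFFSET and MAX() approaches can be sanity-checked with Python's built-in sqlite3 module; the in-memory table below mirrors the sample data above (SQLite stands in here for whichever database the interviewer has in mind):

```python
import sqlite3

# In-memory SQLite database seeded with the sample employee table above.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (id INTEGER, name TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?)",
    [(1, "John", 50000), (2, "Alice", 60000), (3, "Bob", 70000),
     (4, "David", 70000), (5, "Emma", 80000)],
)

# Solution 1: skip the top distinct salary, take the next one.
offset_result = conn.execute(
    "SELECT DISTINCT salary FROM employee ORDER BY salary DESC LIMIT 1 OFFSET 1"
).fetchone()[0]

# Solution 2: highest salary strictly below the overall maximum.
max_result = conn.execute(
    "SELECT MAX(salary) FROM employee "
    "WHERE salary < (SELECT MAX(salary) FROM employee)"
).fetchone()[0]

print(offset_result, max_result)  # 70000 70000
```

Both approaches agree on 70000 even though 70000 appears twice in the data, which is exactly why DISTINCT (or DENSE_RANK) matters here.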
2. How do you optimize a slow-running SQL query?
Here are key optimization techniques:
1. Use Indexing Efficiently
• Index the columns used in WHERE, JOIN, and ORDER BY.
• Example:
CREATE INDEX idx_salary ON employee(salary);
2. Avoid SELECT *, Use Specific Columns
• Instead of:
SELECT * FROM employee;
• Use:
SELECT id, name FROM employee;
3. Optimize Joins Using Proper Indexing
• Ensure indexed columns are used in JOIN conditions.
4. Use EXPLAIN or EXPLAIN ANALYZE
• Helps analyze query execution.
EXPLAIN SELECT * FROM employee WHERE salary > 50000;
5. Avoid Using Functions in WHERE Clause
• Bad:
SELECT * FROM employee WHERE YEAR(hire_date) = 2023;
• Good:
SELECT * FROM employee WHERE hire_date BETWEEN '2023-01-01' AND '2023-12-31';
6. Optimize Subqueries and Use Joins Instead
• Bad:
SELECT name FROM employee WHERE salary = (SELECT MAX(salary) FROM employee);
• Good:
SELECT name
FROM employee e
JOIN (SELECT MAX(salary) AS max_salary FROM employee) m
  ON e.salary = m.max_salary;
7. Use Proper Data Types & Partitioning for Large Tables
• Use partitioning for large datasets to improve query performance.
3. Explain the difference between JOIN and UNION.
• Purpose: JOIN combines data from multiple tables based on a relationship; UNION combines the results of two or more queries into a single result set.
• Output: JOIN returns columns from multiple tables; UNION returns rows from multiple queries (which must share the same column structure).
• Condition: JOIN uses ON or USING to define relationships; UNION requires the same number of columns (with compatible types) in both queries.
• Duplicate rows: JOIN can return duplicates (INNER JOIN, LEFT JOIN, etc.); UNION removes duplicates (UNION ALL keeps them).
• Example (JOIN): SELECT e.id, e.name, d.department FROM employee e JOIN department d ON e.dept_id = d.id;
• Example (UNION): SELECT name FROM employee UNION SELECT name FROM contractor;
4. Write a query to find duplicate records in a table.
Given a table employees:
id | name | department
---+-------+-----------
1 | John | HR
2 | Alice | IT
3 | John | HR
4 | Bob | IT
5 | John | HR
Solution: Using GROUP BY with HAVING
To find duplicates based on the name and department:
SELECT name, department, COUNT(*) as count
FROM employees
GROUP BY name, department
HAVING COUNT(*) > 1;
• GROUP BY groups records with the same values.
• HAVING COUNT(*) > 1 filters groups with more than one record, indicating
duplicates.
To Find Complete Duplicate Rows
If GROUP BY included the unique id column, every group would contain exactly one row and no duplicates would ever be reported. Instead, group by all non-key columns; in this table those are name and department, so the query above already identifies complete duplicate rows and returns:
name | department | count
------+------------+------
John | HR | 3
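The GROUP BY / HAVING pattern can be verified with Python's built-in sqlite3 module against the sample employees data:

```python
import sqlite3

# In-memory table mirroring the sample employees data above.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (id INTEGER, name TEXT, department TEXT)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [(1, "John", "HR"), (2, "Alice", "IT"), (3, "John", "HR"),
     (4, "Bob", "IT"), (5, "John", "HR")],
)

# Duplicates = (name, department) groups that appear more than once.
duplicates = conn.execute(
    "SELECT name, department, COUNT(*) FROM employees "
    "GROUP BY name, department HAVING COUNT(*) > 1"
).fetchall()
print(duplicates)  # [('John', 'HR', 3)]
```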
5. What are window functions in SQL? Give an example.
Definition
Window functions perform calculations across a set of table rows related to the current
row. Unlike aggregate functions, they do not collapse rows into a single output.
Common Window Functions
1. ROW_NUMBER(): Assigns a unique number to each row.
2. RANK(): Assigns a rank, with gaps for ties.
3. DENSE_RANK(): Assigns a rank without gaps.
4. SUM(), AVG(), COUNT(): Aggregate functions used over a window.
Example: Using ROW_NUMBER()
Given a sales table:
id | employee | sales_amount
---+----------+-------------
1 | John | 500
2 | Alice | 700
3 | Bob | 600
4 | John | 800
5 | Alice | 900
Find the top sales for each employee:
SELECT
    employee,
    sales_amount,
    ROW_NUMBER() OVER (PARTITION BY employee ORDER BY sales_amount DESC) AS sales_rank
FROM
    sales;
(The alias sales_rank is used instead of rank because RANK is a reserved word in MySQL 8+.)
Output:
employee | sales_amount | sales_rank
---------+--------------+-----------
Alice | 900 | 1
Alice | 700 | 2
Bob | 600 | 1
John | 800 | 1
John | 500 | 2
Explanation:
• PARTITION BY employee: Resets the row numbering for each employee.
• ORDER BY sales_amount DESC: Orders sales in descending order for each
partition.
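ROW_NUMBER() can be demonstrated end to end with Python's sqlite3 module (window functions require SQLite 3.25+, which ships with recent Python builds):

```python
import sqlite3

# In-memory copy of the sample sales table above.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (id INTEGER, employee TEXT, sales_amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [(1, "John", 500), (2, "Alice", 700), (3, "Bob", 600),
     (4, "John", 800), (5, "Alice", 900)],
)

# Numbering restarts for each employee, highest sale first.
rows = conn.execute(
    "SELECT employee, sales_amount, "
    "ROW_NUMBER() OVER (PARTITION BY employee ORDER BY sales_amount DESC) AS rn "
    "FROM sales ORDER BY employee, rn"
).fetchall()
print(rows)
```

The result matches the output table above: each employee's sales are numbered 1, 2, … in descending order of amount.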
6. How would you handle missing or null values in SQL?
Common Approaches:
1. Use COALESCE():
o Returns the first non-null value in a list.
o Example:
SELECT COALESCE(phone_number, 'N/A') AS contact_number FROM employees;
2. Use IS NULL / IS NOT NULL:
o Filters records with null or non-null values.
o Example:
SELECT * FROM employees WHERE phone_number IS NULL;
3. Use IFNULL() (MySQL) or NVL() (Oracle):
o Similar to COALESCE() but for two values only.
o Example:
SELECT IFNULL(salary, 0) AS salary FROM employees;
4. Replace Nulls with Aggregate Values:
o Fill nulls with average, sum, or other aggregate values.
o Example:
UPDATE employees
SET salary = (SELECT AVG(salary) FROM employees)
WHERE salary IS NULL;
5. Use CASE Statement:
o Provides more control over how to handle nulls.
o Example:
SELECT
name,
CASE
WHEN phone_number IS NULL THEN 'No Contact'
ELSE phone_number
END AS contact_info
FROM employees;
6. Delete Rows with Null Values:
o Remove incomplete records if necessary.
o Example:
DELETE FROM employees WHERE phone_number IS NULL;
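The COALESCE() and IS NULL patterns above can be tried with Python's sqlite3 module (SQLite supports both; the two-row table here is invented for the demo):

```python
import sqlite3

# Tiny invented table: one employee with a phone number, one without.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, phone_number TEXT)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?)",
    [("Alice", "555-0100"), ("Bob", None)],
)

# COALESCE substitutes a default for NULLs.
contacts = conn.execute(
    "SELECT name, COALESCE(phone_number, 'N/A') FROM employees ORDER BY name"
).fetchall()

# IS NULL finds the incomplete records.
missing = conn.execute(
    "SELECT name FROM employees WHERE phone_number IS NULL"
).fetchall()

print(contacts)  # [('Alice', '555-0100'), ('Bob', 'N/A')]
print(missing)   # [('Bob',)]
```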
7. Explain the difference between HAVING and WHERE clauses.
• Purpose: WHERE filters rows before grouping; HAVING filters groups after GROUP BY.
• Used with: WHERE works with SELECT, FROM, JOIN; HAVING works with GROUP BY and aggregate functions (SUM(), COUNT()).
• Aggregates: WHERE cannot use aggregate functions (SUM(), AVG(), etc.); HAVING exists precisely to filter on aggregated values.
• Example (WHERE): SELECT * FROM sales WHERE amount > 500;
• Example (HAVING): SELECT customer_id, SUM(amount) FROM sales GROUP BY customer_id HAVING SUM(amount) > 500;
Example Demonstration:
Given a sales table:
id | customer_id | amount
---+------------+-------
1 | 101 | 200
2 | 102 | 400
3 | 101 | 300
4 | 103 | 800
5 | 102 | 600
Using WHERE (Filters before grouping):
SELECT customer_id, amount
FROM sales
WHERE amount > 300;
Filters out rows where amount is less than 300 before aggregation.
Using HAVING (Filters after grouping):
SELECT customer_id, SUM(amount) AS total_spent
FROM sales
GROUP BY customer_id
HAVING SUM(amount) > 500;
Groups data first and then filters out customers with SUM(amount) ≤ 500.
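Both queries can be verified against the sample sales table with Python's built-in sqlite3 module:

```python
import sqlite3

# In-memory copy of the sample sales table above.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (id INTEGER, customer_id INTEGER, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [(1, 101, 200), (2, 102, 400), (3, 101, 300), (4, 103, 800), (5, 102, 600)],
)

# WHERE filters individual rows before any grouping happens.
where_rows = conn.execute(
    "SELECT customer_id, amount FROM sales WHERE amount > 300 ORDER BY id"
).fetchall()

# HAVING filters whole groups after aggregation.
having_rows = conn.execute(
    "SELECT customer_id, SUM(amount) FROM sales "
    "GROUP BY customer_id HAVING SUM(amount) > 500 ORDER BY customer_id"
).fetchall()

print(where_rows)   # [(102, 400), (103, 800), (102, 600)]
print(having_rows)  # [(102, 1000), (103, 800)]
```

Note that customer 101 totals exactly 500, so HAVING SUM(amount) > 500 drops that group.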
8. What is a Common Table Expression (CTE)? How is it different
from a subquery?
Common Table Expression (CTE)
• A temporary result set defined using WITH.
• Improves query readability and reuse.
• Can be used multiple times within the same query.
Example of CTE:
WITH total_sales AS (
    SELECT customer_id, SUM(amount) AS total_spent
    FROM sales
    GROUP BY customer_id
)
SELECT customer_id, total_spent
FROM total_sales
WHERE total_spent > 500;
• The WITH clause creates a temporary named result set (total_sales).
• The main query then filters customers who spent more than 500.
Difference Between CTE and Subquery
• Readability: a CTE is more readable and reusable; nested subqueries are harder to read.
• Reusability: a CTE can be referenced multiple times within the same query; a subquery is defined once and cannot be reused.
• Performance: CTEs are well suited to recursion and complex queries; subqueries can be slower in complex cases.
• Recursion support: CTEs support recursion (WITH RECURSIVE); subqueries do not.
Example of Subquery (Alternative to CTE):
SELECT customer_id, total_spent
FROM (
SELECT customer_id, SUM(amount) AS total_spent
FROM sales
GROUP BY customer_id
) AS total_sales
WHERE total_spent > 500;
• Works but is harder to read compared to a CTE.
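A quick way to confirm that the CTE and the subquery versions produce the same result is to run the CTE in SQLite via Python (sample data as in question 7):

```python
import sqlite3

# Same invented sales data as in the HAVING example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (id INTEGER, customer_id INTEGER, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [(1, 101, 200), (2, 102, 400), (3, 101, 300), (4, 103, 800), (5, 102, 600)],
)

rows = conn.execute("""
    WITH total_sales AS (
        SELECT customer_id, SUM(amount) AS total_spent
        FROM sales
        GROUP BY customer_id
    )
    SELECT customer_id, total_spent
    FROM total_sales
    WHERE total_spent > 500
    ORDER BY customer_id
""").fetchall()
print(rows)  # [(102, 1000), (103, 800)]
```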
9. Write a SQL query to calculate the Customer Lifetime Value
(CLV).
What is CLV?
Customer Lifetime Value (CLV) estimates the total revenue a business can expect from a
customer over their lifetime.
Formula:
CLV = Average Purchase Value × Purchase Frequency × Customer Lifespan
SQL Query to Calculate CLV:
WITH customer_data AS (
    SELECT
        customer_id,
        SUM(amount) AS total_revenue,
        COUNT(DISTINCT order_id) AS total_orders,
        COUNT(DISTINCT YEAR(order_date)) AS years_active
    FROM sales
    GROUP BY customer_id
)
SELECT
    customer_id,
    (total_revenue / total_orders) * (total_orders / years_active) * 5 AS estimated_CLV
FROM customer_data;
Explanation:
• total_revenue / total_orders → Average purchase value.
• total_orders / years_active → Purchase frequency per year.
• 5 → Assumed customer lifespan (adjust based on business model).
Example Output:
customer_id | estimated_CLV
------------+--------------
101 | 1500.00
102 | 2000.00
103 | 2500.00
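The arithmetic behind the query can be sketched in plain Python; the per-customer figures below are assumed for illustration, not taken from a real dataset:

```python
LIFESPAN_YEARS = 5  # assumed customer lifespan, as in the query above

def estimated_clv(total_revenue, total_orders, years_active,
                  lifespan=LIFESPAN_YEARS):
    avg_purchase_value = total_revenue / total_orders   # revenue per order
    purchase_frequency = total_orders / years_active    # orders per year
    return avg_purchase_value * purchase_frequency * lifespan

# Hypothetical customer: 1200 revenue over 4 orders in 2 active years.
clv = estimated_clv(total_revenue=1200.0, total_orders=4, years_active=2)
print(clv)  # (1200/4) * (4/2) * 5 = 3000.0
```

Notice that the orders term cancels algebraically: the formula reduces to (revenue per active year) × lifespan.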
10. What are indexes, and how do they improve query
performance?
What is an Index?
• An index is a data structure that improves the speed of data retrieval operations on
a table.
• Similar to a book index—it helps locate information quickly.
Types of Indexes in SQL
Type Description
Primary Index Automatically created on PRIMARY KEY.
Unique Index Ensures column values are unique.
Composite Index Index on multiple columns.
Full-text Index Used for text searches (MySQL, PostgreSQL).
Clustered Index Data is stored in sorted order (SQL Server).
Non-clustered Index Stores pointers to data (MySQL, PostgreSQL).
Creating an Index
CREATE INDEX idx_customer ON sales(customer_id);
Using EXPLAIN to Check Index Usage
EXPLAIN SELECT * FROM sales WHERE customer_id = 101;
If an index is used, it significantly reduces search time.
Performance Improvement:
• SELECT * FROM sales WHERE customer_id = 101; — without an index: full table scan (slow); with an index: index lookup (fast).
• SELECT * FROM sales ORDER BY amount; — without an index: sorting required; with an index: faster sorting.
When to Use Indexes?
Use Indexes When:
• Columns are frequently used in WHERE, JOIN, ORDER BY, GROUP BY.
• Large tables need faster lookups.
Avoid Indexes When:
• Table is small (overhead is unnecessary).
• Columns have low uniqueness (e.g., gender).
• Frequent INSERT, UPDATE, DELETE (Indexes slow down writes).
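The before/after effect of an index can be observed locally with SQLite's EXPLAIN QUERY PLAN (SQLite's analogue of MySQL's EXPLAIN; the table and values here are invented for the demo):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (id INTEGER, customer_id INTEGER, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [(i, i % 100, i * 10) for i in range(1000)],
)

def plan(sql):
    # EXPLAIN QUERY PLAN rows carry the human-readable detail in column 3.
    return " ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

before = plan("SELECT * FROM sales WHERE customer_id = 101")
conn.execute("CREATE INDEX idx_customer ON sales(customer_id)")
after = plan("SELECT * FROM sales WHERE customer_id = 101")

print(before)  # reports a full table scan of sales
print(after)   # reports a search using idx_customer
```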
PYTHON Questions
11. How do you handle missing values in Pandas?
Missing values occur in datasets due to various reasons like data entry errors, sensor
failures, or missing records. Pandas provides several ways to handle missing values.
Checking for Missing Values
import pandas as pd
df = pd.DataFrame({
'Name': ['Alice', 'Bob', 'Charlie', None],
'Age': [25, None, 30, 22],
'Salary': [50000, 60000, None, 45000]
})
print(df.isnull()) # Checks for missing values (True if missing)
print(df.isnull().sum()) # Count of missing values per column
Output:
Name Age Salary
0 False False False
1 False True False
2 False False True
3 True False False
Name 1
Age 1
Salary 1
dtype: int64
Methods to Handle Missing Values
1. Removing Missing Values (dropna())
• Removes rows or columns with missing values.
df.dropna(inplace=True)   # Removes rows with NaN values
df = df.dropna(axis=1)    # Removes columns with NaN values (assign back; this call is not in-place)
Use When: Data loss is acceptable.
2. Filling Missing Values (fillna())
• Replace missing values with a specific value or strategy.
df['Age'] = df['Age'].fillna(df['Age'].mean())             # Fill with mean
df['Salary'] = df['Salary'].fillna(df['Salary'].median())  # Fill with median
df['Name'] = df['Name'].fillna("Unknown")                  # Fill categorical column
(Assign back rather than using inplace=True on a column selection, which is deprecated in recent pandas.)
Use When: Data should be retained, and an estimate is acceptable.
3. Forward & Backward Fill (ffill(), bfill())
• Forward Fill (ffill) → Replaces missing values with the previous row value.
• Backward Fill (bfill) → Replaces missing values with the next row value.
df = df.ffill()  # Uses previous row value
df = df.bfill()  # Uses next row value
(fillna(method='ffill') is deprecated in recent pandas; call ffill()/bfill() directly.)
Use When: Time-series data needs continuity.
4. Interpolation (interpolate())
• Estimates missing values using interpolation methods.
df.interpolate(method='linear', inplace=True)
Use When: Missing values follow a trend.
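The fillna() strategies above can be combined in one short sketch, using assignment instead of the deprecated inplace/method patterns (same example frame as before):

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", None],
    "Age": [25, None, 30, 22],
    "Salary": [50000, 60000, None, 45000],
})

# Numeric columns: fill with a statistic; categorical column: fill with a sentinel.
df["Age"] = df["Age"].fillna(df["Age"].mean())            # mean of non-null ages
df["Salary"] = df["Salary"].fillna(df["Salary"].median()) # median of non-null salaries
df["Name"] = df["Name"].fillna("Unknown")

print(df.isnull().sum().sum())  # 0 -> no missing values remain
```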
12. Explain the difference between a list and a tuple.
• Definition: a list is an ordered, mutable sequence; a tuple is an ordered, immutable sequence.
• Syntax: lst = [1, 2, 3] vs tup = (1, 2, 3).
• Mutability: a list can be modified (append(), remove()); a tuple cannot be modified once created.
• Performance: lists are slower (modification overhead); tuples are faster (fixed size).
• Memory usage: lists use more memory; tuples use less.
• Usage: lists when modifications are required; tuples when data should remain constant.
Example:
# List Example
my_list = [1, 2, 3]
my_list.append(4) # Allowed
print(my_list) # [1, 2, 3, 4]
# Tuple Example
my_tuple = (1, 2, 3)
# my_tuple.append(4)  # Not allowed: tuples have no append() (raises AttributeError)
print(my_tuple) # (1, 2, 3)
When to Use?
• Lists: When the data needs to be changed dynamically.
• Tuples: When the data should remain unchanged (e.g., coordinates, database
records).
13. What are lambda functions in Python? Give an example.
What is a Lambda Function?
A lambda function in Python is an anonymous function (without a name) that is used for
short, simple operations. It is defined using the lambda keyword.
Syntax:
lambda arguments: expression
Example:
square = lambda x: x ** 2
print(square(5)) # Output: 25
Equivalent to:
def square(x):
return x ** 2
Why Use Lambda Functions?
• Concise: Reduces the need for defining separate functions.
• Useful for One-time Operations: Often used in functions like map(), filter(), and
sorted().
• Improves Code Readability for short functions.
Common Use Cases of Lambda Functions
1. Using lambda with map()
Applies a function to every element in an iterable.
numbers = [1, 2, 3, 4, 5]
squared_numbers = list(map(lambda x: x ** 2, numbers))
print(squared_numbers) # [1, 4, 9, 16, 25]
Equivalent to using a loop but shorter.
2. Using lambda with filter()
Filters elements based on a condition.
numbers = [1, 2, 3, 4, 5, 6]
even_numbers = list(filter(lambda x: x % 2 == 0, numbers))
print(even_numbers) # [2, 4, 6]
Filters only even numbers.
3. Using lambda with sorted() (Custom Sorting)
students = [('Alice', 25), ('Bob', 20), ('Charlie', 23)]
students_sorted = sorted(students, key=lambda x: x[1]) # Sort by age
print(students_sorted)
Sorts students by age.
Summary Table
• Definition: lists are mutable sequences; tuples are immutable sequences; lambdas are anonymous functions.
• Syntax: [1, 2, 3] vs (1, 2, 3) vs lambda x: x + 1.
• Mutability: lists yes; tuples no; lambdas are limited to a single expression.
• Performance: lists are slower; tuples faster; lambdas are fast for short functions.
• Common use: lists when data needs modification; tuples for fixed data; lambdas with map(), filter(), sorted().
14. How do you read a CSV file using Pandas?
Pandas provides the read_csv() function to load CSV files into a DataFrame.
Basic Syntax:
import pandas as pd
df = pd.read_csv('data.csv') # Load CSV file into a DataFrame
print(df.head()) # Display first 5 rows
Handling Different Scenarios While Reading a CSV
• CSV has no headers: df = pd.read_csv('data.csv', header=None)
• Use custom column names: df = pd.read_csv('data.csv', names=['A', 'B', 'C'])
• Specify a delimiter (e.g., ; instead of ,): df = pd.read_csv('data.csv', delimiter=';')
• Skip rows while reading: df = pd.read_csv('data.csv', skiprows=2) (skips the first 2 rows)
• Read only specific columns: df = pd.read_csv('data.csv', usecols=['Name', 'Salary'])
• Handle missing values: df = pd.read_csv('data.csv', na_values=['NA', '?', '-'])
• Read a large file in chunks: df_chunk = pd.read_csv('data.csv', chunksize=1000) (processes in batches)
15. Explain the difference between apply(), map(), and
vectorization in Pandas.
• Works on: apply() works on DataFrames and Series; map() on Series only; vectorization on an entire column/array.
• Function applied: apply() row-wise (axis=1) or column-wise (axis=0); map() element-wise; vectorization element-wise using built-in NumPy/Pandas operations.
• Performance: apply() is slower than vectorization; map() is faster than apply(); vectorization is fastest (uses NumPy).
• Use case: apply() for complex operations; map() for element-wise transformation; vectorization for mathematical operations.
Example for apply() (Row-wise or Column-wise Transformation)
df['Salary_After_Tax'] = df['Salary'].apply(lambda x: x * 0.9) # Apply a function
Example for map() (Only for Series, Single Value Change)
df['Category'] = df['Category'].map({'A': 'Excellent', 'B': 'Good', 'C': 'Average'})
Example for Vectorization (Fastest Method)
df['New_Salary'] = df['Salary'] * 1.1 # Direct operation on the column
Best Practice: Prefer vectorization when possible for better performance.
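All three approaches give identical numbers on a small frame; the difference is speed, so the snippet below only checks equivalence (the salary values are made up):

```python
import pandas as pd

df = pd.DataFrame({"Salary": [50000, 60000, 70000]})

via_apply = df["Salary"].apply(lambda x: x * 1.1)   # element-wise via apply()
via_map = df["Salary"].map(lambda x: x * 1.1)       # Series-only map()
via_vector = df["Salary"] * 1.1                     # vectorized: fastest

print(via_vector.tolist())
```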
16. How would you perform data transformation using Python?
Data transformation involves modifying, aggregating, or restructuring data for better
analysis.
1. Handling Missing Values
df.fillna(df.mean(), inplace=True) # Replace NaNs with column mean
2. Data Type Conversion
df['Age'] = df['Age'].astype(int) # Convert float to integer
3. String Transformations
df['Name'] = df['Name'].str.upper() # Convert names to uppercase
4. Feature Scaling (Normalization)
from sklearn.preprocessing import MinMaxScaler
scaler = MinMaxScaler()
df[['Salary']] = scaler.fit_transform(df[['Salary']])
5. One-Hot Encoding (Categorical to Numeric)
df = pd.get_dummies(df, columns=['Gender'], drop_first=True)  # Converts 'Male'/'Female' to 0/1
6. Aggregation (Grouping Data)
df.groupby('Department')['Salary'].mean() # Get average salary per department
17. What is the difference between NumPy and Pandas?
Both NumPy and Pandas are Python libraries for data manipulation, but they serve
different purposes.
• Definition: NumPy is a numerical computing library; Pandas is a data analysis and manipulation library.
• Data structure: NumPy uses ndarray (N-dimensional array); Pandas uses DataFrame (tabular) and Series (1D).
• Performance: NumPy is faster for numerical operations; Pandas is slightly slower due to its additional features.
• Use case: NumPy for mathematical operations and ML preprocessing; Pandas for data wrangling, analysis, and visualization.
• Indexing: NumPy uses numerical (0-based) indexing; Pandas uses labeled row/column indexing.
• Built-in functions: NumPy offers mathematical functions (np.mean(), np.sum()); Pandas offers data manipulation (df.groupby(), df.merge()).
Example in NumPy (Array Operations)
import numpy as np
arr = np.array([1, 2, 3, 4, 5])
print(arr * 2) # [2 4 6 8 10]
Example in Pandas (Tabular Operations)
import pandas as pd
df = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Salary': [50000, 60000]})
print(df['Salary'].mean()) # 55000
Best Practice:
• Use NumPy for numerical computations.
• Use Pandas for working with structured data (CSV, Excel, databases).
POWER BI Questions
21. What are Calculated Columns and Measures in Power BI?
Power BI allows users to create Calculated Columns and Measures using DAX (Data
Analysis Expressions), but they serve different purposes.
• Definition: a calculated column is a new column added to a table via a DAX formula; a measure is a dynamic calculation applied to aggregated data.
• Stored in: a calculated column is stored in the table (each row gets a value); a measure is computed dynamically in visuals.
• Calculation type: calculated columns are row-level (computed during data load); measures are aggregate-level (computed at runtime).
• Performance: calculated columns can slow down large datasets (they consume memory); measures are optimized for performance.
• Example: calculated column: TotalPrice = Sales[Quantity] * Sales[UnitPrice]; measure: Total Sales = SUM(Sales[Amount]).
Example of a Calculated Column:
• Used when a new field is needed in the data model.
TotalPrice = Sales[Quantity] * Sales[UnitPrice]
Example of a Measure:
• Used when calculations need to be aggregated dynamically in reports.
Total Sales = SUM(Sales[Amount])
22. Explain Power Query and How It Is Used for Data
Transformation.
Power Query is a data transformation tool in Power BI used to clean, reshape, and load
data from multiple sources.
Key Functions of Power Query:
✔ Import data from multiple sources (Excel, SQL, APIs, etc.)
✔ Clean and transform data (remove duplicates, change data types)
✔ Merge & append datasets
✔ Apply custom formulas
Example Workflow in Power Query:
1. Load Data: Import an Excel file or SQL database.
2. Transform Data:
o Remove duplicates
o Fill missing values
o Convert data types
3. Apply Custom Columns:
o Add a column to calculate Profit = Sales - Cost
4. Load to Power BI:
o Apply changes and load the transformed data into Power BI reports.
Power Query Formula Language (M Language)
Power Query uses the M Language for advanced transformations. Example:
= Table.TransformColumns(Sales, {{"Price", each _ * 1.1, type number}})
23. What Are the Different Types of Filters in Power BI?
Power BI provides multiple filter types to refine reports dynamically.
• Visual filters: applied to a specific chart or table.
• Page filters: applied to all visuals on a single report page.
• Report filters: applied to the entire report (all pages).
• Drillthrough filters: allow navigation from a summary report to a detailed report.
• Cross-filtering & cross-highlighting: interaction between visuals (clicking on one affects another).
• Top N filters: display only the top N values based on a metric.
• Relative date filters: filter data dynamically based on a time period (e.g., the last 30 days).
Example of a Page Filter:
• If a report has sales data from multiple countries, you can filter a single page to
show only India’s sales.
Example of a Top N Filter:
• Show Top 5 Products by sales using the Top N filter.
Best Practice:
Use filters efficiently to improve report performance instead of complex DAX calculations.
24. How Do You Optimize a Power BI Report for Performance?
Optimizing a Power BI report improves speed, efficiency, and user experience. Below are
key optimization techniques:
Data Model Optimization
Use Star Schema instead of a flat table.
Reduce the number of columns by removing unnecessary fields.
Avoid high-cardinality columns (e.g., use Month Name instead of a full Date/Time column).
Query Optimization
Filter data at the source before importing it into Power BI.
Use Power Query transformations instead of complex DAX expressions.
Disable "Auto Date/Time" in Power BI settings to reduce unnecessary tables.
DAX Performance Optimization
Use SUMX, AVERAGEX carefully to avoid row-by-row calculations.
Replace IF conditions with SWITCH for better performance.
Use variables (VAR) to store intermediate calculations.
Report-Level Optimization
Reduce the number of visuals per page (ideally under 8-10 visuals).
Turn off unnecessary interactions between visuals.
Use Aggregations and Summary Tables instead of detailed data.
Example of an Optimized DAX Measure
Iterator version (computes the product row by row at query time):
Total Sales = SUMX(Sales, Sales[Quantity] * Sales[Price])
Optimized: precompute the product once as a column (e.g., a LineTotal = Sales[Quantity] * Sales[Price] calculated column or Power Query step), then aggregate it with SUM:
Total Sales = SUM(Sales[LineTotal])
Note: SUM(Sales[Quantity] * Sales[Price]) is not valid DAX, because SUM accepts only a single column reference; expressions require SUMX.
Result: Less computation at query time, better performance!
25. What is the Difference Between a Star Schema and a
Snowflake Schema?
Both Star Schema and Snowflake Schema are used in Power BI and data warehousing.
• Structure: a star schema has a fact table directly linked to dimension tables; a snowflake schema normalizes dimension tables into sub-tables.
• Complexity: star is a simple structure (faster joins); snowflake is more complex due to normalization.
• Performance: star has faster query performance; snowflake is slightly slower due to more joins.
• Storage: star requires more storage (denormalized); snowflake has optimized storage (normalized).
• Example: star: FactSales → DimCustomer, DimProduct; snowflake: FactSales → DimCustomer → CustomerRegion.
Example in Power BI:
• A Star Schema has a Fact Table (Sales) connected directly to Customers,
Products, and Dates.
• A Snowflake Schema further normalizes Customers into Regions and Products
into Categories.
Which One to Use?
✔ Use Star Schema when performance matters (recommended for Power BI).
✔ Use Snowflake Schema when data consistency is a priority.
26. Explain Row-Level Security (RLS) in Power BI.
Row-Level Security (RLS) in Power BI restricts data access based on user roles.
Why Use RLS?
✔ Prevent users from seeing unauthorized data.
✔ Improve data security without creating multiple reports.
✔ Efficient data access control for different user roles.
Types of RLS in Power BI
• Static RLS: filters are manually assigned (e.g., a Sales Manager sees only their region).
• Dynamic RLS: uses the USERPRINCIPALNAME() function to filter data based on login credentials.
Example: Implementing RLS in Power BI
Suppose we have a Sales Table and a Users Table with Region info.
1. Create a Role in Power BI:
Go to Modeling → Manage Roles → New Role
2. Apply a DAX Filter:
[Region] = LOOKUPVALUE(Users[Region], Users[Email], USERPRINCIPALNAME())
3. Test the Role:
Use View As Role to verify that users only see their assigned region.
Result: Users see only their specific data without modifying the report!
27. What is the Purpose of DAX? Give an Example of a DAX
Function.
DAX (Data Analysis Expressions) is a formula language used in Power BI, Power Pivot, and
Analysis Services for performing calculations and aggregations on data.
Purpose of DAX:
✔ Perform custom calculations on data.
✔ Create calculated columns and measures for deeper insights.
✔ Optimize data modeling and business logic.
✔ Enable time-based calculations like YTD, MTD, and QoQ.
Example of a DAX Function:
Calculate Total Sales (SUMX is required here because SUM accepts only a single column, not an expression):
Total Sales = SUMX(Sales, Sales[Quantity] * Sales[Price])
Calculate Year-to-Date (YTD) Sales:
YTD Sales = TOTALYTD(SUM(Sales[TotalAmount]), Sales[OrderDate])
Result: Enables dynamic calculations in Power BI dashboards!
28. How Do You Create a Relationship Between Tables in Power
BI?
To connect tables in Power BI, we create relationships using primary and foreign keys.
Steps to Create a Relationship:
1. Go to Model View in Power BI.
2. Drag and drop the common field between the two tables.
3. Set the relationship type (e.g., One-to-Many (1:M)).
4. Choose the cross-filter direction (Single or Both).
5. Click OK and verify.
Example: Relationship Between Sales and Customer Tables
• Sales Table (CustomerID, OrderID, Amount)
• Customers Table (CustomerID, Name, Region)
Result: Allows combining Sales Data with Customer Info in reports!
29. Explain How to Handle Large Datasets in Power BI Efficiently.
Handling large datasets efficiently in Power BI improves performance and reduces report
load time.
Best Practices for Large Datasets
Use Import Mode Instead of DirectQuery (if feasible).
Reduce the Number of Columns (Remove unused fields).
Aggregate Data Before Loading (Use summarized tables).
Use Star Schema Instead of Snowflake Schema.
Enable Query Folding in Power Query (Push transformations to the source).
Optimize DAX Measures (Use variables and avoid iterators).
Use Composite Models (Import + DirectQuery for hybrid performance).
Example: Instead of storing millions of transaction records, create a summary table:
Sales Summary = SUMMARIZE(Sales, Sales[Year], Sales[Product], "Total Sales", SUM(Sales[Amount]))
Result: Faster queries and better dashboard performance!
30. How Do You Create a Dynamic Dashboard in Power BI?
A Dynamic Dashboard allows users to interact with visuals, filter data, and drill down
for deeper insights.
Steps to Create a Dynamic Dashboard
1. Use Slicers and Filters for interactivity.
2. Enable Drill-through and Tooltips for deeper insights.
3. Create Dynamic Measures using SELECTEDVALUE() and SWITCH().
4. Use Bookmarks and Buttons to toggle views.
5. Implement Row-Level Security (RLS) for personalized access.
Example: Dynamic Measure Based on User Selection
Suppose a user selects "Sales" or "Profit" from a slicer:
Dynamic Measure = SWITCH(
SELECTEDVALUE(Metrics[Metric]),
"Sales", SUM(Sales[Amount]),
"Profit", SUM(Sales[Profit])
)
Result: Users can switch between Sales and Profit dynamically!