0% found this document useful (0 votes)

199 views30 pages

KPMG Data Analyst Interview Questions

The document provides a comprehensive set of interview questions and answers for a KPMG Data Analyst position, focusing on SQL and Python. It covers various SQL topics including queries for finding the second-highest salary, optimizing slow queries, understanding JOIN vs UNION, handling duplicates, and using window functions. Additionally, it discusses handling missing values in Pandas, the differences between lists and tuples, and the use of lambda functions in Python.

Uploaded by

Lalit Daswani

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

199 views30 pages

KPMG Data Analyst Interview Questions

Uploaded by

Lalit Daswani

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 30

KPMG Data Analyst

Interview Questions

SQL Questions
1. Write a SQL query to find the second-highest salary from an
employee table.
Given an employee table with columns:

id | name | salary

---+-------+--------

1 | John | 50000

2 | Alice | 60000

3 | Bob | 70000

4 | David | 70000

5 | Emma | 80000

Solution 1: Using LIMIT with OFFSET (MySQL, PostgreSQL)

SELECT DISTINCT salary

FROM employee

ORDER BY salary DESC

LIMIT 1 OFFSET 1;

• ORDER BY salary DESC: Sorts the salaries in descending order.

• LIMIT 1 OFFSET 1: Skips the highest salary and returns the second-highest salary.

Solution 2: Using MAX() with WHERE

SELECT MAX(salary) AS second_highest_salary

FROM employee

WHERE salary < (SELECT MAX(salary) FROM employee);

• The inner query finds the highest salary.

• The outer query finds the highest salary below the maximum, which is the second-
highest.

Solution 3: Using DENSE_RANK() (Works in SQL Server, PostgreSQL, MySQL 8+)

SELECT salary

FROM (

SELECT salary, DENSE_RANK() OVER (ORDER BY salary DESC) AS salary_rank

FROM employee

) ranked_salaries

WHERE salary_rank = 2;

• DENSE_RANK() assigns ranks to unique salaries.

• The outer query filters the row where rank is 2 (second-highest salary).

2. How do you optimize a slow-running SQL query?

Here are key optimization techniques:

1. Use Indexing Efficiently

• Index the columns used in WHERE, JOIN, and ORDER BY.

• Example:

CREATE INDEX idx_salary ON employee(salary);

2. Avoid SELECT *, Use Specific Columns

• Instead of:

SELECT * FROM employee;

• Use:

SELECT id, name FROM employee;

3. Optimize Joins Using Proper Indexing

• Ensure indexed columns are used in JOIN conditions.

4. Use EXPLAIN or EXPLAIN ANALYZE

• Helps analyze query execution.

EXPLAIN SELECT * FROM employee WHERE salary > 50000;

5. Avoid Using Functions in WHERE Clause

• Bad:

SELECT * FROM employee WHERE YEAR(hire_date) = 2023;

• Good:

SELECT * FROM employee WHERE hire_date BETWEEN '2023-01-01' AND '2023-12-31';

6. Optimize Subqueries and Use Joins Instead

• Bad:

SELECT name FROM employee WHERE salary = (SELECT MAX(salary) FROM employee);

• Good:

SELECT name FROM employee e JOIN (SELECT MAX(salary) AS max_salary FROM

employee) m ON e.salary = m.max_salary;

7. Use Proper Data Types & Partitioning for Large Tables

• Use partitioning for large datasets to improve query performance.

3. Explain the difference between JOIN and UNION.

Feature JOIN UNION

Combines data from multiple tables based Combines results of two or more
Purpose
on a relationship. queries into a single result set.

Returns rows from multiple

Output Returns columns from multiple tables. queries (same column
structure).

Requires same number of

Condition Uses ON or USING to define relationships.
columns in both queries.

Duplicate Can return duplicates if INNER JOIN, LEFT UNION removes duplicates
Rows JOIN, etc. (UNION ALL keeps them).

SELECT e.id, e.name, d.department FROM SELECT name FROM employee

Example employee e JOIN department d ON UNION SELECT name FROM
e.dept_id = d.id; contractor;

4. Write a query to find duplicate records in a table.

Given a table employees:

id | name | department

---+-------+-----------

1 | John | HR

2 | Alice | IT

3 | John | HR

4 | Bob | IT

5 | John | HR

Solution: Using GROUP BY with HAVING

To find duplicates based on the name and department:

SELECT name, department, COUNT(*) as count

FROM employees
GROUP BY name, department

HAVING COUNT(*) > 1;

• GROUP BY groups records with the same values.

• HAVING COUNT(*) > 1 filters groups with more than one record, indicating
duplicates.

To Find Complete Duplicate Rows

SELECT , COUNT() as count

FROM employees

GROUP BY id, name, department

HAVING COUNT(*) > 1;

This returns:

name | department | count

------+------------+------

John | HR | 3

5. What are window functions in SQL? Give an example.

Definition

Window functions perform calculations across a set of table rows related to the current
row. Unlike aggregate functions, they do not collapse rows into a single output.

Common Window Functions

1. ROW_NUMBER(): Assigns a unique number to each row.

2. RANK(): Assigns a rank, with gaps for ties.

3. DENSE_RANK(): Assigns a rank without gaps.

4. SUM(), AVG(), COUNT(): Aggregate functions used over a window.

Example: Using ROW_NUMBER()

Given a sales table:

id | employee | sales_amount

---+----------+-------------

1 | John | 500

2 | Alice | 700

3 | Bob | 600

4 | John | 800

5 | Alice | 900

Find the top sales for each employee:

SELECT

employee,

sales_amount,

ROW_NUMBER() OVER (PARTITION BY employee ORDER BY sales_amount DESC) as rank

FROM

sales;

Output:

employee | sales_amount | rank

---------+--------------+-----

Alice | 900 | 1

Alice | 700 | 2

Bob | 600 | 1

John | 800 | 1

John | 500 | 2

Explanation:

• PARTITION BY employee: Resets the row numbering for each employee.

• ORDER BY sales_amount DESC: Orders sales in descending order for each

partition.
6. How would you handle missing or null values in SQL?
Common Approaches:

1. Use COALESCE():

o Returns the first non-null value in a list.

o Example:

SELECT COALESCE(phone_number, 'N/A') AS contact_number FROM employees;

2. Use IS NULL / IS NOT NULL:

o Filters records with null or non-null values.

o Example:

SELECT * FROM employees WHERE phone_number IS NULL;

3. Use IFNULL() (MySQL) or NVL() (Oracle):

o Similar to COALESCE() but for two values only.

o Example:

SELECT IFNULL(salary, 0) AS salary FROM employees;

4. Replace Nulls with Aggregate Values:

o Fill nulls with average, sum, or other aggregate values.

o Example:

UPDATE employees

SET salary = (SELECT AVG(salary) FROM employees)

WHERE salary IS NULL;

5. Use CASE Statement:

o Provides more control over how to handle nulls.

o Example:

SELECT

name,
CASE

WHEN phone_number IS NULL THEN 'No Contact'

ELSE phone_number

END AS contact_info

FROM employees;

6. Delete Rows with Null Values:

o Remove incomplete records if necessary.

o Example:

DELETE FROM employees WHERE phone_number IS NULL;

7. Explain the difference between HAVING and WHERE clauses.

Feature WHERE Clause HAVING Clause

Filters rows before

Purpose Filters groups after GROUP BY.
grouping.

GROUP BY, aggregate functions (SUM(),

Used With SELECT, FROM, JOIN.
COUNT()).

Can Use No (Cannot use

Yes (Used for filtering aggregated values).
Aggregates? SUM(), AVG(), etc.).

SELECT customer_id, SUM(amount) FROM sales

SELECT * FROM sales
Example GROUP BY customer_id HAVING SUM(amount)
WHERE amount > 500;
> 500;

Example Demonstration:
Given a sales table:

id | customer_id | amount

---+------------+-------

1 | 101 | 200

2 | 102 | 400
3 | 101 | 300

4 | 103 | 800

5 | 102 | 600

Using WHERE (Filters before grouping):

SELECT customer_id, amount

FROM sales

WHERE amount > 300;

Filters out rows where amount is less than 300 before aggregation.

Using HAVING (Filters after grouping):

SELECT customer_id, SUM(amount) AS total_spent

FROM sales

GROUP BY customer_id

HAVING SUM(amount) > 500;

Groups data first and then filters out customers with SUM(amount) ≤ 500.

8. What is a Common Table Expression (CTE)? How is it different

from a subquery?
Common Table Expression (CTE)

• A temporary result set defined using WITH.

• Improves query readability and reuse.

• Can be used multiple times within the same query.

Example of CTE:

WITH total_sales AS (

SELECT customer_id, SUM(amount) AS total_spent

FROM sales
GROUP BY customer_id

SELECT customer_id, total_spent

FROM total_sales

WHERE total_spent > 500;

• The WITH clause creates a temporary named result set (total_sales).

• The main query then filters customers who spent more than 500.

Difference Between CTE and Subquery

Feature CTE (WITH Clause) Subquery

Harder to read, especially

Readability More readable, reusable
with nesting

Defined once and cannot be

Reusability Can be referenced multiple times
reused

Optimized for recursion & Can be slower in complex

Performance
complex queries cases

Recursion
Yes (RECURSIVE CTE) No recursion
Support

Example of Subquery (Alternative to CTE):

SELECT customer_id, total_spent

FROM (

SELECT customer_id, SUM(amount) AS total_spent

FROM sales

GROUP BY customer_id

) AS total_sales

WHERE total_spent > 500;

• Works but is harder to read compared to a CTE.

9. Write a SQL query to calculate the Customer Lifetime Value
(CLV).
What is CLV?

Customer Lifetime Value (CLV) estimates the total revenue a business can expect from a
customer over their lifetime.

Formula:

CLV=Average Purchase Value×Purchase Frequency×Customer LifespanCLV = \text{Average

Purchase Value} \times \text{Purchase Frequency} \times \text{Customer
Lifespan}CLV=Average Purchase Value×Purchase Frequency×Customer Lifespan

SQL Query to Calculate CLV:

WITH customer_data AS (

SELECT

customer_id,

SUM(amount) AS total_revenue,

COUNT(DISTINCT order_id) AS total_orders,

COUNT(DISTINCT YEAR(order_date)) AS years_active

FROM sales

GROUP BY customer_id

SELECT

customer_id,

(total_revenue / total_orders) * (total_orders / years_active) * 5 AS estimated_CLV

FROM customer_data;

Explanation:

• total_revenue / total_orders → Average purchase value.

• total_orders / years_active → Purchase frequency per year.

• 5 → Assumed customer lifespan (adjust based on business model).

Example Output:

customer_id | estimated_CLV

------------+--------------

101 | 1500.00

102 | 2000.00

103 | 2500.00

10. What are indexes, and how do they improve query

performance?
What is an Index?

• An index is a data structure that improves the speed of data retrieval operations on
a table.

• Similar to a book index—it helps locate information quickly.

Types of Indexes in SQL

Type Description

Primary Index Automatically created on PRIMARY KEY.

Unique Index Ensures column values are unique.

Composite Index Index on multiple columns.

Full-text Index Used for text searches (MySQL, PostgreSQL).

Clustered Index Data is stored in sorted order (SQL Server).

Non-clustered Index Stores pointers to data (MySQL, PostgreSQL).

Creating an Index

CREATE INDEX idx_customer ON sales(customer_id);

Using EXPLAIN to Check Index Usage

EXPLAIN SELECT * FROM sales WHERE customer_id = 101;

If an index is used, it significantly reduces search time.

Performance Improvement:

Query Without Index With Index

SELECT * FROM sales WHERE customer_id = Full table scan Index lookup
101; (Slow) (Fast)

SELECT * FROM sales ORDER BY amount; Sorting required Faster sorting

When to Use Indexes?

Use Indexes When:

• Columns are frequently used in WHERE, JOIN, ORDER BY, GROUP BY.

• Large tables need faster lookups.

Avoid Indexes When:

• Table is small (overhead is unnecessary).

• Columns have low uniqueness (e.g., gender).

• Frequent INSERT, UPDATE, DELETE (Indexes slow down writes).

PYTHON Questions

11. How do you handle missing values in Pandas?

Missing values occur in datasets due to various reasons like data entry errors, sensor
failures, or missing records. Pandas provides several ways to handle missing values.

Checking for Missing Values

import pandas as pd
df = pd.DataFrame({

'Name': ['Alice', 'Bob', 'Charlie', None],

'Age': [25, None, 30, 22],

'Salary': [50000, 60000, None, 45000]

})

print(df.isnull()) # Checks for missing values (True if missing)

print(df.isnull().sum()) # Count of missing values per column

Output:

mathematica

CopyEdit

Name Age Salary

0 False False False

1 False True False

2 False False True

3 True False False

Name 1

Age 1

Salary 1

dtype: int64

Methods to Handle Missing Values

1 .Removing Missing Values (dropna())

• Removes rows or columns with missing values.

df.dropna(inplace=True) # Removes rows with NaN values

df.dropna(axis=1) # Removes columns with NaN values

Use When: Data loss is acceptable.

2 .Filling Missing Values (fillna())

• Replace missing values with a specific value or strategy.

df['Age'].fillna(df['Age'].mean(), inplace=True) # Fill with mean

df['Salary'].fillna(df['Salary'].median(), inplace=True) # Fill with median

df['Name'].fillna("Unknown", inplace=True) # Fill categorical column

Use When: Data should be retained, and an estimate is acceptable.

3 .Forward & Backward Fill (ffill(), bfill())

• Forward Fill (ffill) → Replaces missing values with the previous row value.

• Backward Fill (bfill) → Replaces missing values with the next row value.

df.fillna(method='ffill', inplace=True) # Uses previous row value

df.fillna(method='bfill', inplace=True) # Uses next row value

Use When: Time-series data needs continuity.

4 .Interpolation (interpolate())

• Estimates missing values using interpolation methods.

df.interpolate(method='linear', inplace=True)

Use When: Missing values follow a trend.

12. Explain the difference between a list and a tuple.

Feature List (list) Tuple (tuple)

Definition Ordered, mutable sequence. Ordered, immutable sequence.

Syntax lst = [1, 2, 3] tup = (1, 2, 3)

Can be modified (append(), Cannot be modified once

Mutability
remove()). created.

Performance Slower (modification overhead). Faster (fixed size).

Memory
Uses more memory. Uses less memory.
Usage

When data should remain

Usage When modifications are required.
constant.

Example:

# List Example

my_list = [1, 2, 3]

my_list.append(4) # Allowed

print(my_list) # [1, 2, 3, 4]

# Tuple Example

my_tuple = (1, 2, 3)

# my_tuple.append(4) Not Allowed (Throws Error)

print(my_tuple) # (1, 2, 3)

When to Use?

• Lists: When the data needs to be changed dynamically.

• Tuples: When the data should remain unchanged (e.g., coordinates, database
records).
13. What are lambda functions in Python? Give an example.
What is a Lambda Function?

A lambda function in Python is an anonymous function (without a name) that is used for
short, simple operations. It is defined using the lambda keyword.

Syntax:

lambda arguments: expression

Example:

square = lambda x: x ** 2

print(square(5)) # Output: 25

Equivalent to:

def square(x):

return x ** 2

Why Use Lambda Functions?

• Concise: Reduces the need for defining separate functions.

• Useful for One-time Operations: Often used in functions like map(), filter(), and
sorted().

• Improves Code Readability for short functions.

Common Use Cases of Lambda Functions

1 .Using lambda with map()

Applies a function to every element in an iterable.

numbers = [1, 2, 3, 4, 5]

squared_numbers = list(map(lambda x: x ** 2, numbers))

print(squared_numbers) # [1, 4, 9, 16, 25]

Equivalent to using a loop but shorter.

2 .Using lambda with filter()

Filters elements based on a condition.

numbers = [1, 2, 3, 4, 5, 6]

even_numbers = list(filter(lambda x: x % 2 == 0, numbers))

print(even_numbers) # [2, 4, 6]

Filters only even numbers.

3 .Using lambda with sorted() (Custom Sorting)

students = [('Alice', 25), ('Bob', 20), ('Charlie', 23)]

students_sorted = sorted(students, key=lambda x: x[1]) # Sort by age

print(students_sorted)

Sorts students by age.

Summary Table

Feature Lists Tuples Lambda Functions

Immutable
Definition Mutable sequence Anonymous function
sequence

Syntax [1, 2, 3] (1, 2, 3) lambda x: x + 1

Yes (but for one

Mutability Yes No
expression)

Performance Slower Faster Faster (Short functions)

Common When data needs

Fixed data map(), filter(), sorted()
Use modification
14. How do you read a CSV file using Pandas?
Pandas provides the read_csv() function to load CSV files into a DataFrame.

Basic Syntax:

import pandas as pd

df = pd.read_csv('data.csv') # Load CSV file into a DataFrame

print(df.head()) # Display first 5 rows

Handling Different Scenarios While Reading a CSV

Scenario Solution

CSV has no headers df = pd.read_csv('data.csv', header=None)

Use custom column names df = pd.read_csv('data.csv', names=['A', 'B', 'C'])

Specify delimiter (e.g., ;

df = pd.read_csv('data.csv', delimiter=';')
instead of ,)

Skip rows while reading df = pd.read_csv('data.csv', skiprows=2) (Skips first 2 rows)

Read only specific columns df = pd.read_csv('data.csv', usecols=['Name', 'Salary'])

Handle missing values df = pd.read_csv('data.csv', na_values=['NA', '?', '-'])

df_chunk = pd.read_csv('data.csv', chunksize=1000)

Read a large file in chunks
(Processes in batches)

15. Explain the difference between apply(), map(), and

vectorization in Pandas.
Vectorization
Feature apply() map()
(NumPy/Pandas)

Works on DataFrame & Series Series only Entire column/array

Function Row-wise (axis=1) or Element-wise using built-

Element-wise
Applied column-wise (axis=0) in functions

Slower than
Performance Faster than apply() Fastest (uses NumPy)
vectorization

Element-wise
Use Case Complex operations Mathematical operations
transformation

Example for apply() (Row-wise or Column-wise Transformation)

df['Salary_After_Tax'] = df['Salary'].apply(lambda x: x * 0.9) # Apply a function

Example for map() (Only for Series, Single Value Change)

df['Category'] = df['Category'].map({'A': 'Excellent', 'B': 'Good', 'C': 'Average'})

Example for Vectorization (Fastest Method)

df['New_Salary'] = df['Salary'] * 1.1 # Direct operation on the column

Best Practice: Prefer vectorization when possible for better performance.

16. How would you perform data transformation using Python?

Data transformation involves modifying, aggregating, or restructuring data for better
analysis.

1 .Handling Missing Values

df.fillna(df.mean(), inplace=True) # Replace NaNs with column mean

2 .Data Type Conversion

df['Age'] = df['Age'].astype(int) # Convert float to integer

3 .String Transformations

df['Name'] = df['Name'].str.upper() # Convert names to uppercase

4 .Feature Scaling (Normalization)

from sklearn.preprocessing import MinMaxScaler

scaler = MinMaxScaler()

df[['Salary']] = scaler.fit_transform(df[['Salary']])

5 .One-Hot Encoding (Categorical to Numeric)

df = pd.get_dummies(df, columns=['Gender'], drop_first=True) # Converts 'Male'/'Female'

to 0/1

6 .Aggregation (Grouping Data)

df.groupby('Department')['Salary'].mean() # Get average salary per department

17. What is the difference between NumPy and Pandas?

Both NumPy and Pandas are Python libraries for data manipulation, but they serve
different purposes.

Feature NumPy Pandas

Data analysis & manipulation

Definition Numerical computing library
library

Data Structure ndarray (N-dimensional array) DataFrame (Tabular) & Series (1D)

Slightly slower due to additional

Performance Faster for numerical operations
features

Mathematical operations, ML Data wrangling, analysis,

Use Case
preprocessing visualization

Labeled indexing (row/column

Indexing Uses numerical indexing (0-based)
names)

Built-in Mathematical functions (np.mean(), Data manipulation (df.groupby(),

Functions np.sum()) df.merge())
Example in NumPy (Array Operations)

import numpy as np

arr = np.array([1, 2, 3, 4, 5])

print(arr * 2) # [2 4 6 8 10]

Example in Pandas (Tabular Operations)

import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Salary': [50000, 60000]})

print(df['Salary'].mean()) # 55000

Best Practice:

• Use NumPy for numerical computations.

• Use Pandas for working with structured data (CSV, Excel, databases).

POWER BI Questions
21. What are Calculated Columns and Measures in Power BI?
Power BI allows users to create Calculated Columns and Measures using DAX (Data
Analysis Expressions), but they serve different purposes.

Feature Calculated Column Measure

A new column added to a table A dynamic calculation applied to

Definition
based on a DAX formula aggregated data

Stored in The table (each row gets a value) Computed dynamically in visuals

Calculation Row-level (computed during data Aggregate-level (computed at

Type load) runtime)
Feature Calculated Column Measure

Can slow down large datasets (uses

Performance Optimized for performance
memory)

TotalPrice = Sales[Quantity] *
Example Total Sales = SUM(Sales[Amount])
Sales[UnitPrice]

Example of a Calculated Column:

• Used when a new field is needed in the data model.

TotalPrice = Sales[Quantity] * Sales[UnitPrice]

Example of a Measure:

• Used when calculations need to be aggregated dynamically in reports.

Total Sales = SUM(Sales[Amount])

22. Explain Power Query and How It Is Used for Data

Transformation.
Power Query is a data transformation tool in Power BI used to clean, reshape, and load
data from multiple sources.

Key Functions of Power Query:

✔ Import data from multiple sources (Excel, SQL, APIs, etc.)
✔ Clean and transform data (remove duplicates, change data types)
✔ Merge & append datasets
✔ Apply custom formulas

Example Workflow in Power Query:

1. Load Data: Import an Excel file or SQL database.

2. Transform Data:

o Remove duplicates

o Fill missing values

o Convert data types

3. Apply Custom Columns:

o Add a column to calculate Profit = Sales - Cost

4. Load to Power BI:

o Apply changes and load the transformed data into Power BI reports.

Power Query Formula Language (M Language)

Power Query uses the M Language for advanced transformations. Example:

= Table.TransformColumns(Sales, {{"Price", each _ * 1.1, type number}})

23. What Are the Different Types of Filters in Power BI?

Power BI provides multiple filter types to refine reports dynamically.

Filter Type Description

Visual Filters Applied to a specific chart or table

Page Filters Applied to all visuals on a single report page

Report Filters Applied to the entire report (all pages)

Allows navigation from a summary report to a detailed

Drillthrough Filters
report

Cross-Filtering & Cross- Interaction between visuals (clicking on one affects

Highlighting another)

Top N Filters Displays only the top N values based on a metric

Filters data dynamically based on a time period (e.g.,

Relative Date Filters
last 30 days)

Example of a Page Filter:

• If a report has sales data from multiple countries, you can filter a single page to
show only India’s sales.

Example of a Top N Filter:

• Show Top 5 Products by sales using the Top N filter.

Best Practice:
Use filters efficiently to improve report performance instead of complex DAX calculations.

24. How Do You Optimize a Power BI Report for Performance?

Optimizing a Power BI report improves speed, efficiency, and user experience. Below are
key optimization techniques:

Data Model Optimization

Use Star Schema instead of a flat table.

Reduce the number of columns by removing unnecessary fields.
Avoid high-cardinality columns (e.g., using Month Name instead of Date).

Query Optimization

Filter data at the source before importing it into Power BI.

Use Power Query transformations instead of complex DAX expressions.
Disable "Auto Date/Time" in Power BI settings to reduce unnecessary tables.

DAX Performance Optimization

Use SUMX, AVERAGEX carefully to avoid row-by-row calculations.

Replace IF conditions with SWITCH for better performance.
Use variables (VAR) to store intermediate calculations.

Report-Level Optimization

Reduce the number of visuals per page (ideally under 8-10 visuals).
Turn off unnecessary interactions between visuals.
Use Aggregations and Summary Tables instead of detailed data.

Example of an Optimized DAX Measure

Bad:

Total Sales = SUMX(Sales, Sales[Quantity] * Sales[Price])

Optimized:

Total Sales = SUM(Sales[Quantity] * Sales[Price])

Result: Less computation, better performance!

25. What is the Difference Between a Star Schema and a

Snowflake Schema?
Both Star Schema and Snowflake Schema are used in Power BI and data warehousing.

Feature Star Schema Snowflake Schema ❄

Fact table with directly linked Dimension tables are normalized into
Structure
dimension tables sub-tables

Complexity Simple structure (faster joins) More complex due to normalization

Performance Faster query performance Slightly slower due to more joins

Requires more storage

Storage Optimized storage (normalized)
(denormalized)

FactSales → DimCustomer, FactSales → DimCustomer →

Example
DimProduct CustomerRegion

Example in Power BI:

• A Star Schema has a Fact Table (Sales) connected directly to Customers,

Products, and Dates.

• A Snowflake Schema further normalizes Customers into Regions and Products

into Categories.

Which One to Use?

✔ Use Star Schema when performance matters (recommended for Power BI).
✔ Use Snowflake Schema when data consistency is a priority.

26. Explain Row-Level Security (RLS) in Power BI.

Row-Level Security (RLS) in Power BI restricts data access based on user roles.

Why Use RLS?

✔ Prevent users from seeing unauthorized data.
✔ Improve data security without creating multiple reports.
✔ Efficient data access control for different user roles.

Types of RLS in Power BI

Type Description

Static RLS Filters are manually assigned (e.g., Sales Manager sees only their region).

Dynamic Uses a USERPRINCIPALNAME() function to filter data based on login

RLS credentials.

Example: Implementing RLS in Power BI

Suppose we have a Sales Table and a Users Table with Region info.

1 .Create a Role in Power BI:

Go to Modeling → Manage Roles → New Role

2 .Apply a DAX Filter:

[Region] = LOOKUPVALUE(Users[Region], Users[Email], USERPRINCIPALNAME())

3 .Test the Role:

Use View As Role to verify if users only see their assigned region.

Result: Users see only their specific data without modifying the report!

27. What is the Purpose of DAX? Give an Example of a DAX

Function.
DAX (Data Analysis Expressions) is a formula language used in Power BI, Power Pivot, and
Analysis Services for performing calculations and aggregations on data.

Purpose of DAX:

✔ Perform custom calculations on data.

✔ Create calculated columns and measures for deeper insights.
✔ Optimize data modeling and business logic.
✔ Enable time-based calculations like YTD, MTD, and QoQ.
Example of a DAX Function:

Calculate Total Sales:

Total Sales = SUM(Sales[Quantity] * Sales[Price])

Calculate Year-to-Date (YTD) Sales:

YTD Sales = TOTALYTD(SUM(Sales[TotalAmount]), Sales[OrderDate])

Result: Enables dynamic calculations in Power BI dashboards!

28. How Do You Create a Relationship Between Tables in Power

BI?
To connect tables in Power BI, we create relationships using primary and foreign keys.

Steps to Create a Relationship:

1 .Go to Model View in Power BI.

2 .Drag and drop the common field between two tables.
3 .Set the relationship type (e.g., One-to-Many (1:M)).
4 .Choose Cross-filter direction (Single or Both).
5 .Click OK and verify.

Example: Relationship Between Sales and Customer Tables

• Sales Table (CustomerID, OrderID, Amount)

• Customers Table (CustomerID, Name, Region)

Result: Allows combining Sales Data with Customer Info in reports!

29. Explain How to Handle Large Datasets in Power BI Efficiently.

Handling large datasets efficiently in Power BI improves performance and reduces report
load time.

Best Practices for Large Datasets

Use Import Mode Instead of DirectQuery (if feasible).
Reduce the Number of Columns (Remove unused fields).
Aggregate Data Before Loading (Use summarized tables).
Use Star Schema Instead of Snowflake Schema.
Enable Query Folding in Power Query (Push transformations to the source).
Optimize DAX Measures (Use variables and avoid iterators).
Use Composite Models (Import + DirectQuery for hybrid performance).

Example: Instead of storing millions of transaction records, create a summary table:

Sales Summary = SUMMARIZE(Sales, Sales[Year], Sales[Product], "Total Sales",

SUM(Sales[Amount]))

Result: Faster queries and better dashboard performance!

30. How Do You Create a Dynamic Dashboard in Power BI?

A Dynamic Dashboard allows users to interact with visuals, filter data, and drill down
for deeper insights.

Steps to Create a Dynamic Dashboard

1 .Use Slicers and Filters for interactivity.

2 .Enable Drill-through and Tooltips for deeper insights.
3 .Create Dynamic Measures using SELECTEDVALUE() and SWITCH().
4 .Use Bookmarks and Buttons to toggle views.
5 .Implement Row-Level Security (RLS) for personalized access.

Example: Dynamic Measure Based on User Selection

Suppose a user selects "Sales" or "Profit" from a slicer:

Dynamic Measure = SWITCH(

SELECTEDVALUE(Metrics[Metric]),

"Sales", SUM(Sales[Amount]),

"Profit", SUM(Sales[Profit])

)
Result: Users can switch between Sales and Profit dynamically!

Amazon Data Analyst Interview Questions - 1
No ratings yet
Amazon Data Analyst Interview Questions - 1
22 pages
SQL Retail Sales Project
No ratings yet
SQL Retail Sales Project
5 pages
HAVING Clause Practice Problems
No ratings yet
HAVING Clause Practice Problems
3 pages
KBR Sand Management Presentation
0% (1)
KBR Sand Management Presentation
20 pages
SQL Queries Interview Questions - Oracle Analytical Functions Part 1
No ratings yet
SQL Queries Interview Questions - Oracle Analytical Functions Part 1
10 pages
Safety Gondola
No ratings yet
Safety Gondola
14 pages
SQL Practice Queries
No ratings yet
SQL Practice Queries
4 pages
Select As From: Firstname (First Name) Employeedetail
100% (1)
Select As From: Firstname (First Name) Employeedetail
7 pages
19 Teradata Interview Questions and Answers For Experienced
No ratings yet
19 Teradata Interview Questions and Answers For Experienced
3 pages
Informatica Scenario Based Interview Questions and Answer - Introduction
No ratings yet
Informatica Scenario Based Interview Questions and Answer - Introduction
4 pages
SQL Practice Questions
No ratings yet
SQL Practice Questions
5 pages
SQL Subquery
No ratings yet
SQL Subquery
7 pages
SQL Server Assingment PDF
No ratings yet
SQL Server Assingment PDF
6 pages
Subquery Question
100% (1)
Subquery Question
4 pages
Teradata Interview Questions
No ratings yet
Teradata Interview Questions
25 pages
Etl Testing New Faqs23
No ratings yet
Etl Testing New Faqs23
3 pages
SQL and Hand Book
No ratings yet
SQL and Hand Book
4 pages
Homemadejobs Blogspot Com
No ratings yet
Homemadejobs Blogspot Com
21 pages
SCD Type1
100% (1)
SCD Type1
16 pages
Answers To Exercises Given in 9-DEC-2011 Batch Related To HR Schema in Oracle Database 11g
100% (1)
Answers To Exercises Given in 9-DEC-2011 Batch Related To HR Schema in Oracle Database 11g
5 pages
SSRS Interview Questions (1) .Odt
No ratings yet
SSRS Interview Questions (1) .Odt
11 pages
Teradata Joins
No ratings yet
Teradata Joins
27 pages
School Case Study
No ratings yet
School Case Study
4 pages
Sulfuro Hach Dr3900
No ratings yet
Sulfuro Hach Dr3900
6 pages
Etl Interview Questions
No ratings yet
Etl Interview Questions
14 pages
Social Stratification
No ratings yet
Social Stratification
30 pages
Correlated Subquery
100% (1)
Correlated Subquery
13 pages
Cao Wang FTA EMA
No ratings yet
Cao Wang FTA EMA
5 pages
SQL Query Question
100% (1)
SQL Query Question
9 pages
Hierarchical Queries
No ratings yet
Hierarchical Queries
11 pages
Newgen Management Trainee: Oracle Technical Orientation Program
No ratings yet
Newgen Management Trainee: Oracle Technical Orientation Program
41 pages
Solution Practice 6 Consolidations 3
No ratings yet
Solution Practice 6 Consolidations 3
8 pages
Practice Exercise Power BI: Tables
No ratings yet
Practice Exercise Power BI: Tables
2 pages
Lecture 7 - Using Subqueries To Solve Queries
No ratings yet
Lecture 7 - Using Subqueries To Solve Queries
19 pages
Questions (SQL) : Saved To The Database. DCL
No ratings yet
Questions (SQL) : Saved To The Database. DCL
15 pages
Dabra, Gwalior
No ratings yet
Dabra, Gwalior
3 pages
Gemba Walk
No ratings yet
Gemba Walk
10 pages
Answer Following Questions: TH TH
100% (1)
Answer Following Questions: TH TH
5 pages
Chapter 6, "Automatic Performance Diagnostics"
No ratings yet
Chapter 6, "Automatic Performance Diagnostics"
7 pages
Functions, Operators, Date - Time Lab Manual: Dummy X Select 777 888 From Dual
No ratings yet
Functions, Operators, Date - Time Lab Manual: Dummy X Select 777 888 From Dual
17 pages
What Is RDBMS (Relational Database Management System) ?
No ratings yet
What Is RDBMS (Relational Database Management System) ?
54 pages
BJT AC Analysis Part 1 PDF
100% (1)
BJT AC Analysis Part 1 PDF
9 pages
SQL Case When Statement
100% (1)
SQL Case When Statement
10 pages
SQLQueries First Editon PDF
No ratings yet
SQLQueries First Editon PDF
43 pages
HR Schema Queries - Ans
No ratings yet
HR Schema Queries - Ans
12 pages
SQL Questions
No ratings yet
SQL Questions
14 pages
SQL Queries
No ratings yet
SQL Queries
28 pages
Real DSA and SQL Interview Questions Solutions
No ratings yet
Real DSA and SQL Interview Questions Solutions
15 pages
DataAnalyst Interview Questions
No ratings yet
DataAnalyst Interview Questions
15 pages
SQL Solved Questions
No ratings yet
SQL Solved Questions
23 pages
Appendix-47
No ratings yet
Appendix-47
9 pages
Top Advanced SQL Interview Questions - ThinkETL
No ratings yet
Top Advanced SQL Interview Questions - ThinkETL
28 pages
CS502 Mcqs MidTerm by Vu Topper RM
No ratings yet
CS502 Mcqs MidTerm by Vu Topper RM
45 pages
FAQ Fo ETL TESTING
No ratings yet
FAQ Fo ETL TESTING
9 pages
When Do You Go For Data Transfer Transformation: Benefits
No ratings yet
When Do You Go For Data Transfer Transformation: Benefits
5 pages
Dax 1
No ratings yet
Dax 1
27 pages
T13 - Tableau Handson
No ratings yet
T13 - Tableau Handson
6 pages
Tech Mahindra Data Analyst Interview Questions
No ratings yet
Tech Mahindra Data Analyst Interview Questions
11 pages
Views and Subqueries
100% (1)
Views and Subqueries
22 pages
Ab Final
No ratings yet
Ab Final
32 pages
Leetcode SQL Interview Questions & Solutions
100% (1)
Leetcode SQL Interview Questions & Solutions
34 pages
SQL Project 1
100% (1)
SQL Project 1
40 pages
7.3 Options - Pricing Binomial-1
No ratings yet
7.3 Options - Pricing Binomial-1
25 pages
SQL Interview Questions
No ratings yet
SQL Interview Questions
19 pages
Computer Engineering Technician - Sample Resume
No ratings yet
Computer Engineering Technician - Sample Resume
2 pages
Top 13 SQL Scenario Based Interview Questions With Answers (2024)
No ratings yet
Top 13 SQL Scenario Based Interview Questions With Answers (2024)
10 pages
Declare (@local - Variable Data - Type ( Value) ) : @testvariable 100 @testvariable
100% (1)
Declare (@local - Variable Data - Type ( Value) ) : @testvariable 100 @testvariable
26 pages
IntervalMatch and Slowly Changing Dimensions
No ratings yet
IntervalMatch and Slowly Changing Dimensions
22 pages
Oracle HR Schema Practise Queries
No ratings yet
Oracle HR Schema Practise Queries
10 pages
Classic Cars Script
No ratings yet
Classic Cars Script
4 pages
ACADEMIC_CALENDAR_2025_Approved
No ratings yet
ACADEMIC_CALENDAR_2025_Approved
2 pages
Using Advanced SQL: Department of Information Systems School of Information and Technology
No ratings yet
Using Advanced SQL: Department of Information Systems School of Information and Technology
30 pages
Lesson 1 Economics As Social Science
No ratings yet
Lesson 1 Economics As Social Science
6 pages
Factors and Norms Influencing Unpaid Care Work
No ratings yet
Factors and Norms Influencing Unpaid Care Work
64 pages
PE
No ratings yet
PE
552 pages
Labour Welfare Scheme
No ratings yet
Labour Welfare Scheme
20 pages
SQL Interview Questions
100% (1)
SQL Interview Questions
10 pages
Gland Packing Standards Regulations THINKTANK
No ratings yet
Gland Packing Standards Regulations THINKTANK
9 pages
2022 Q1 OKR Update Supply Chain Tech Update
No ratings yet
2022 Q1 OKR Update Supply Chain Tech Update
5 pages
Herzegovina University Faculty of Social Sciences Dr. Milenko Brkić
No ratings yet
Herzegovina University Faculty of Social Sciences Dr. Milenko Brkić
4 pages
Bro vd10 20140115
No ratings yet
Bro vd10 20140115
2 pages
Industrial Ventilation A Manual of Recommended Practice For Operation and Maintenance 2nd Edition Acgih Download
100% (1)
Industrial Ventilation A Manual of Recommended Practice For Operation and Maintenance 2nd Edition Acgih Download
58 pages
Write The Room
No ratings yet
Write The Room
11 pages
We Tube
No ratings yet
We Tube
38 pages
Review Of: Generated On 2022-12-20
No ratings yet
Review Of: Generated On 2022-12-20
21 pages
Pila ES 100416 BE LCP Assessment 2021 2022
No ratings yet
Pila ES 100416 BE LCP Assessment 2021 2022
4 pages
Geogia Hotel Ghana LTD Vrs Silver Star Auto LTD (J4 34 of 2012) 2012 GHASC 54 (4 December 2012)
No ratings yet
Geogia Hotel Ghana LTD Vrs Silver Star Auto LTD (J4 34 of 2012) 2012 GHASC 54 (4 December 2012)
26 pages
Teradata Vantage SQL Basics 2
No ratings yet
Teradata Vantage SQL Basics 2
62 pages
SQL - Practice Assignments
No ratings yet
SQL - Practice Assignments
3 pages

KPMG Data Analyst Interview Questions

Uploaded by

KPMG Data Analyst Interview Questions

Uploaded by

KPMG Data Analyst

Solution 1: Using LIMIT with OFFSET (MySQL, PostgreSQL)

SELECT DISTINCT salary

ORDER BY salary DESC

• ORDER BY salary DESC: Sorts the salaries in descending order.

Solution 2: Using MAX() with WHERE

SELECT MAX(salary) AS second_highest_salary

WHERE salary < (SELECT MAX(salary) FROM employee);

• The inner query finds the highest salary.

Solution 3: Using DENSE_RANK() (Works in SQL Server, PostgreSQL, MySQL 8+)

SELECT salary, DENSE_RANK() OVER (ORDER BY salary DESC) AS salary_rank

• DENSE_RANK() assigns ranks to unique salaries.

2. How do you optimize a slow-running SQL query?

1. Use Indexing Efficiently

• Index the columns used in WHERE, JOIN, and ORDER BY.

CREATE INDEX idx_salary ON employee(salary);

SELECT * FROM employee;

SELECT id, name FROM employee;

3. Optimize Joins Using Proper Indexing

• Ensure indexed columns are used in JOIN conditions.

4. Use EXPLAIN or EXPLAIN ANALYZE

• Helps analyze query execution.

EXPLAIN SELECT * FROM employee WHERE salary > 50000;

5. Avoid Using Functions in WHERE Clause

SELECT * FROM employee WHERE YEAR(hire_date) = 2023;

SELECT * FROM employee WHERE hire_date BETWEEN '2023-01-01' AND '2023-12-31';

6. Optimize Subqueries and Use Joins Instead

SELECT name FROM employee e JOIN (SELECT MAX(salary) AS max_salary FROM

7. Use Proper Data Types & Partitioning for Large Tables

• Use partitioning for large datasets to improve query performance.

3. Explain the difference between JOIN and UNION.

Returns rows from multiple

Requires same number of

SELECT e.id, e.name, d.department FROM SELECT name FROM employee

4. Write a query to find duplicate records in a table.

Solution: Using GROUP BY with HAVING

To find duplicates based on the name and department:

SELECT name, department, COUNT(*) as count

HAVING COUNT(*) > 1;

• GROUP BY groups records with the same values.

To Find Complete Duplicate Rows

SELECT *, COUNT(*) as count

GROUP BY id, name, department

HAVING COUNT(*) > 1;

name | department | count

5. What are window functions in SQL? Give an example.

Common Window Functions

1. ROW_NUMBER(): Assigns a unique number to each row.

2. RANK(): Assigns a rank, with gaps for ties.

3. DENSE_RANK(): Assigns a rank without gaps.

4. SUM(), AVG(), COUNT(): Aggregate functions used over a window.

Example: Using ROW_NUMBER()

Given a sales table:

Find the top sales for each employee:

ROW_NUMBER() OVER (PARTITION BY employee ORDER BY sales_amount DESC) as rank

employee | sales_amount | rank

• PARTITION BY employee: Resets the row numbering for each employee.

• ORDER BY sales_amount DESC: Orders sales in descending order for each

o Returns the first non-null value in a list.

SELECT COALESCE(phone_number, 'N/A') AS contact_number FROM employees;

2. Use IS NULL / IS NOT NULL:

o Filters records with null or non-null values.

SELECT * FROM employees WHERE phone_number IS NULL;

3. Use IFNULL() (MySQL) or NVL() (Oracle):

o Similar to COALESCE() but for two values only.

SELECT IFNULL(salary, 0) AS salary FROM employees;

4. Replace Nulls with Aggregate Values:

o Fill nulls with average, sum, or other aggregate values.

SET salary = (SELECT AVG(salary) FROM employees)

WHERE salary IS NULL;

5. Use CASE Statement:

o Provides more control over how to handle nulls.

WHEN phone_number IS NULL THEN 'No Contact'

6. Delete Rows with Null Values:

o Remove incomplete records if necessary.

DELETE FROM employees WHERE phone_number IS NULL;

SELECT , COUNT() as count