User's Third Transaction [Uber SQL
Interview Question]
This is the same question as problem #11 in the SQL Chapter of Ace the Data Science Interview!
Assume you are given the table below on Uber transactions made by users. Write a query to
obtain the third transaction of every user. Output the user id, spend and transaction date.
transactions Table:
Column Name Type
user_id integer
spend decimal
transaction_date timestamp
transactions Example Input:
user_id spend transaction_date
111 100.50 01/08/2022 12:00:00
111 55.00 01/10/2022 12:00:00
121 36.00 01/18/2022 12:00:00
145 24.99 01/26/2022 12:00:00
111 89.60 02/05/2022 12:00:00
Example Output:
user_id spend transaction_date
111 89.60 02/05/2022 12:00:00
The dataset you are querying against may have different input & output - this is just an
example!
;with cte
as
(
-- number each user's transactions in chronological order
SELECT user_id,spend,transaction_date,
row_number() over(partition by user_id order by transaction_date) as rk
FROM transactions
)
select user_id,spend,transaction_date
from cte
where rk = 3
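As a quick check, the ROW_NUMBER approach can be run against the example input. This is a minimal sketch that assumes SQLite (through Python's sqlite3 module, SQLite 3.25 or later for window functions) as a stand-in for the engine used by the interview platform; the dates are rewritten in ISO format so that they sort correctly as text.

```python
import sqlite3

# In-memory SQLite database; an assumption, not the platform's actual engine.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE transactions (user_id INTEGER, spend REAL, transaction_date TEXT)"
)
conn.executemany(
    "INSERT INTO transactions VALUES (?, ?, ?)",
    [
        (111, 100.50, "2022-01-08 12:00:00"),
        (111, 55.00, "2022-01-10 12:00:00"),
        (121, 36.00, "2022-01-18 12:00:00"),
        (145, 24.99, "2022-01-26 12:00:00"),
        (111, 89.60, "2022-02-05 12:00:00"),
    ],
)

query = """
WITH cte AS (
    SELECT user_id, spend, transaction_date,
           ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY transaction_date) AS rk
    FROM transactions
)
SELECT user_id, spend, transaction_date
FROM cte
WHERE rk = 3
"""
third = conn.execute(query).fetchall()
print(third)  # [(111, 89.6, '2022-02-05 12:00:00')]
```

Users 121 and 145 have only one transaction each, so they never reach rk = 3 and drop out of the result.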
Sending vs. Opening Snaps [Snapchat SQL
Interview Question]
This is the same question as problem #25 in the SQL Chapter of Ace the Data Science Interview!
Assume you're given tables with information on Snapchat users, including their ages and time
spent sending and opening snaps.
Write a query to obtain a breakdown of the time spent sending vs. opening snaps as a
percentage of total time spent on these activities grouped by age group. Round the percentage
to 2 decimal places in the output.
Notes:
• Calculate the following percentages:
o Time spent sending / (Time spent sending + Time spent opening)
o Time spent opening / (Time spent sending + Time spent opening)
• To avoid integer division in percentages, multiply by 100.0 and not 100.
Effective April 15th, 2023, the solution has been updated and optimised.
activities Table
Column Name Type
activity_id Integer
user_id Integer
activity_type string ('send', 'open', 'chat')
time_spent Float
activity_date Datetime
activities Example Input
activity_id user_id activity_type time_spent activity_date
7274 123 open 4.50 06/22/2022 12:00:00
2425 123 send 3.50 06/22/2022 12:00:00
1413 456 send 5.67 06/23/2022 12:00:00
1414 789 chat 11.00 06/25/2022 12:00:00
2536 456 open 3.00 06/25/2022 12:00:00
age_breakdown Table
Column Name Type
user_id Integer
age_bucket string ('21-25', '26-30', '31-35')
age_breakdown Example Input
user_id age_bucket
123 31-35
456 26-30
789 21-25
Example Output
age_bucket send_perc open_perc
26-30 65.40 34.60
31-35 43.75 56.25
Explanation
Using the age bucket 26-30 as example, the time spent sending snaps was 5.67 and the time
spent opening snaps was 3.
To calculate the percentage of time spent sending snaps, we divide the time spent sending snaps
by the total time spent on sending and opening snaps, which is 5.67 + 3 = 8.67.
So, the percentage of time spent sending snaps is 5.67 / (5.67 + 3) = 65.4%, and the percentage
of time spent opening snaps is 3 / (5.67 + 3) = 34.6%.
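The arithmetic in this explanation can be reproduced in a few lines. The sketch below is plain Python rather than SQL; the per-bucket totals are read off the example input, and the helper name percentages is hypothetical.

```python
from typing import Tuple

# Time spent per age bucket, taken from the example input.
time_by_bucket = {
    "26-30": {"send": 5.67, "open": 3.00},
    "31-35": {"send": 3.50, "open": 4.50},
}


def percentages(send: float, open_: float) -> Tuple[float, float]:
    """Return (send_perc, open_perc), each rounded to 2 decimal places."""
    total = send + open_
    # Multiplying by 100.0 keeps the division in floating point.
    return round(100.0 * send / total, 2), round(100.0 * open_ / total, 2)


breakdown = {
    bucket: percentages(t["send"], t["open"]) for bucket, t in time_by_bucket.items()
}
print(breakdown)  # {'26-30': (65.4, 34.6), '31-35': (43.75, 56.25)}
```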
The dataset you are querying against may have different input & output - this is just an
example!
;with cte
as
(
SELECT b.age_bucket,sum(time_spent) topen FROM activities a
inner join age_breakdown b on a.user_id=b.user_id
where lower(activity_type) = 'open'
group by b.age_bucket
), cte2 as
(
SELECT b.age_bucket,sum(time_spent) as tsend FROM activities a
inner join age_breakdown b on a.user_id=b.user_id
where lower(activity_type) = 'send'
group by b.age_bucket
),cte3 as
(
SELECT b.age_bucket,sum(time_spent) ttotal FROM activities a
inner join age_breakdown b on a.user_id=b.user_id
where lower(activity_type) in ('send','open')
group by b.age_bucket
)
-- one row per age bucket; a bucket with no 'send' or no 'open' time gets 0 for that share
select cte3.age_bucket,
round(cast(coalesce(tsend,0)/ttotal*100.0 as numeric),2) as send_perc,
round(cast(coalesce(topen,0)/ttotal*100.0 as numeric),2) as open_perc
from cte3
left join cte on cte.age_bucket=cte3.age_bucket
left join cte2 on cte2.age_bucket=cte3.age_bucket
order by cte3.age_bucket
Tweets' Rolling Averages [Twitter SQL
Interview Question]
This is the same question as problem #10 in the SQL Chapter of Ace the Data Science Interview!
Given a table of tweet data over a specified time period, calculate the 3-day rolling average of
tweets for each user. Output the user ID, tweet date, and rolling averages rounded to 2 decimal
places.
Notes:
• A rolling average, also known as a moving average or running mean, is a time-series
technique that examines trends in data over a specified period of time.
• In this case, we want to determine how the tweet count for each user changes over a 3-
day period.
Effective April 7th, 2023, the problem statement, solution and hints for this question have been
revised.
tweets Table:
Column Name Type
user_id integer
tweet_date timestamp
tweet_count integer
tweets Example Input:
user_id tweet_date tweet_count
111 06/01/2022 00:00:00 2
111 06/02/2022 00:00:00 1
111 06/03/2022 00:00:00 3
111 06/04/2022 00:00:00 4
111 06/05/2022 00:00:00 5
Example Output:
user_id tweet_date rolling_avg_3d
111 06/01/2022 00:00:00 2.00
111 06/02/2022 00:00:00 1.50
111 06/03/2022 00:00:00 2.00
111 06/04/2022 00:00:00 2.67
111 06/05/2022 00:00:00 4.00
The dataset you are querying against may have different input & output - this is just an
example!
;with cte
as
(
SELECT
user_id,tweet_date,
round(avg(tweet_count) over
(partition by user_id order by tweet_date
ROWS BETWEEN 2 PRECEDING AND CURRENT ROW),2)
as rolling_avg_3d
FROM tweets
)
select * from cte
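The window frame ROWS BETWEEN 2 PRECEDING AND CURRENT ROW can be checked against the example input. A minimal sketch, assuming SQLite (via Python's sqlite3, version 3.25 or later) as a stand-in engine and ISO-formatted dates:

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in engine, an assumption
conn.execute(
    "CREATE TABLE tweets (user_id INTEGER, tweet_date TEXT, tweet_count INTEGER)"
)
conn.executemany(
    "INSERT INTO tweets VALUES (?, ?, ?)",
    [
        (111, "2022-06-01", 2),
        (111, "2022-06-02", 1),
        (111, "2022-06-03", 3),
        (111, "2022-06-04", 4),
        (111, "2022-06-05", 5),
    ],
)

query = """
SELECT user_id, tweet_date,
       ROUND(AVG(tweet_count) OVER (
           PARTITION BY user_id ORDER BY tweet_date
           ROWS BETWEEN 2 PRECEDING AND CURRENT ROW), 2) AS rolling_avg_3d
FROM tweets
ORDER BY user_id, tweet_date
"""
rolling = [row[2] for row in conn.execute(query)]
print(rolling)  # [2.0, 1.5, 2.0, 2.67, 4.0]
```

The first two rows average over fewer than three days because the frame simply contains fewer preceding rows at the start of each partition.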
Highest-Grossing Items [Amazon SQL
Interview Question]
This is the same question as problem #12 in the SQL Chapter of Ace the Data Science Interview!
Assume you're given a table with information on Amazon customers and their spending on
products in different categories. Write a query to identify the top two highest-grossing products
within each category in the year 2022. The output should include the category, product, and total
spend.
product_spend Table:
Column Name Type
category string
product string
user_id integer
spend decimal
transaction_date timestamp
product_spend Example Input:
category product user_id spend transaction_date
appliance refrigerator 165 246.00 12/26/2021 12:00:00
appliance refrigerator 123 299.99 03/02/2022 12:00:00
appliance washing machine 123 219.80 03/02/2022 12:00:00
electronics vacuum 178 152.00 04/05/2022 12:00:00
electronics wireless headset 156 249.90 07/08/2022 12:00:00
electronics vacuum 145 189.00 07/15/2022 12:00:00
Example Output:
category product total_spend
appliance refrigerator 299.99
appliance washing machine 219.80
electronics vacuum 341.00
electronics wireless headset 249.90
The dataset you are querying against may have different input & output - this is just an
example!
;with cte
as
(
SELECT
category,product,
sum(spend) as total_spend
FROM product_spend
where extract(year from transaction_date) = 2022
group by category,product
),
cte2 as
(
select
category,
product,
total_spend,
row_number() over(partition by category order by total_spend desc) as rk
from cte
)
select
category,product,total_spend
from cte2
where rk<=2
order by category,total_spend DESC
Signup Activation Rate [TikTok SQL
Interview Question]
New TikTok users sign up with their emails. They confirm their signup by replying to the text
confirmation to activate their accounts. Users may receive multiple text messages for account
confirmation until they have confirmed their new account.
A senior analyst is interested in knowing the activation rate of specified users in the emails table.
Write a query to find the activation rate. Round the percentage to 2 decimal places.
Definitions:
• emails table contains the information of user signup details.
• texts table contains the users' activation information.
Assumptions:
• The analyst is interested in the activation rate of specific users in the emails table, which
may not include all users that could potentially be found in the texts table.
• For example, user 123 in the emails table may not be in the texts table and vice
versa.
Effective April 4th 2023, we added an assumption to the question to provide additional clarity.
emails Table:
Column Name Type
email_id integer
user_id integer
signup_date datetime
emails Example Input:
email_id user_id signup_date
125 7771 06/14/2022 00:00:00
236 6950 07/01/2022 00:00:00
433 1052 07/09/2022 00:00:00
texts Table:
Column Name Type
text_id integer
email_id integer
signup_action varchar
texts Example Input:
text_id email_id signup_action
6878 125 Confirmed
6920 236 Not Confirmed
6994 236 Confirmed
'Confirmed' in signup_action means the user has activated their account and successfully
completed the signup process.
Example Output:
confirm_rate
0.67
Explanation:
67% of users have successfully completed their signup and activated their accounts. The
remaining 33% have not yet replied to the text to confirm their signup.
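The 0.67 figure is the number of distinct confirmed signups divided by the number of signups in the emails table. The sketch below reproduces it on the example input, assuming SQLite (via Python's sqlite3) as a stand-in engine; the LEFT JOIN keeps email 433, which has no text at all, in the denominator.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in engine, an assumption
conn.executescript("""
CREATE TABLE emails (email_id INTEGER, user_id INTEGER, signup_date TEXT);
CREATE TABLE texts (text_id INTEGER, email_id INTEGER, signup_action TEXT);
INSERT INTO emails VALUES
    (125, 7771, '2022-06-14'), (236, 6950, '2022-07-01'), (433, 1052, '2022-07-09');
INSERT INTO texts VALUES
    (6878, 125, 'Confirmed'), (6920, 236, 'Not Confirmed'), (6994, 236, 'Confirmed');
""")

query = """
SELECT ROUND(COUNT(DISTINCT t.email_id) * 1.0 / COUNT(DISTINCT e.email_id), 2)
FROM emails e
LEFT JOIN texts t
       ON e.email_id = t.email_id
      AND t.signup_action = 'Confirmed'
"""
confirm_rate = conn.execute(query).fetchone()[0]
print(confirm_rate)  # 0.67
```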
The dataset you are querying against may have different input & output - this is just an
example!
;with cte
as
(
-- denominator: every signup in the emails table, even those with no text yet
SELECT count(e.email_id) as total
FROM emails e
),cte2 as
(
-- numerator: signups with at least one 'Confirmed' text
SELECT count(distinct t.email_id) as signup_total
FROM emails e
inner join texts t on e.email_id=t.email_id
where lower(t.signup_action) ='confirmed'
)
select
round(signup_total*1.0/total,2) as confirm_rate
from cte,cte2
Pharmacy Analytics (Part 4) [CVS Health
SQL Interview Question]
CVS Health is trying to better understand its pharmacy sales, and how well different drugs are
selling.
Write a query to find the top 2 drugs sold, in terms of units sold, for each manufacturer. List your
results in alphabetical order by manufacturer.
pharmacy_sales Table:
Column Name Type
product_id integer
units_sold integer
total_sales decimal
cogs decimal
manufacturer varchar
drug varchar
pharmacy_sales Example Input:
product_id units_sold total_sales cogs manufacturer drug
94 132362 2041758.41 1373721.70 Biogen UP and UP
9 37410 293452.54 208876.01 Eli Lilly Zyprexa
50 90484 2521023.73 2742445.9 Eli Lilly Dermasorb
61 77023 500101.61 419174.97 Biogen Varicose Relief
136 144814 1084258.00 1006447.73 Biogen Burkhart
109 118696 1433109.50 263857.96 Eli Lilly Tizanidine Hydrochloride
Example Output:
manufacturer top_drugs
Biogen Burkhart
Biogen UP and UP
Eli Lilly Tizanidine Hydrochloride
Eli Lilly TA Complete Kit
Explanation
Biogen sold 144,814 units of Burkhart drug (ranked 1) followed by the second highest with
132,362 units of UP and UP drug (ranked 2).
Eli Lilly sold 118,696 units of Tizanidine Hydrochloride drug (ranked 1) followed by the second
highest with 90,484 units of TA Complete Kit drug (ranked 2).
The dataset you are querying against may have different input & output - this is just an
example!
;WITH CTE
AS
(
SELECT MANUFACTURER,DRUG,
ROW_NUMBER() OVER(PARTITION BY MANUFACTURER ORDER BY UNITS_SOLD DESC) AS
RK
FROM PHARMACY_SALES
)
SELECT MANUFACTURER, DRUG AS TOP_DRUGS
FROM CTE
WHERE RK<=2
ORDER BY MANUFACTURER, RK
Supercloud Customer [Microsoft SQL
Interview Question]
A Microsoft Azure Supercloud customer is a company which buys at least 1 product from each
product category.
Write a query to report the company ID which is a Supercloud customer.
As of 5 Dec 2022, data in the customer_contracts and products tables were updated.
customer_contracts Table:
Column Name Type
customer_id integer
product_id integer
amount integer
customer_contracts Example Input:
customer_id product_id amount
1 1 1000
1 3 2000
1 5 1500
2 2 3000
2 6 2000
products Table:
Column Name Type
product_id integer
product_category string
product_name string
products Example Input:
product_id product_category product_name
1 Analytics Azure Databricks
2 Analytics Azure Stream Analytics
4 Containers Azure Kubernetes Service
5 Containers Azure Service Fabric
6 Compute Virtual Machines
7 Compute Azure Functions
Example Output:
customer_id
1
Explanation:
Customer 1 bought from Analytics, Containers, and Compute categories of Azure, and thus is a
Supercloud customer. Customer 2 isn't a Supercloud customer, since they don't buy any
container services from Azure.
The dataset you are querying against may have different input & output - this is just an
example!
;with cte
as
(
-- number of product categories that exist
select count(distinct product_category) as pct from products
),cte2 as
(
-- number of product categories each customer has bought from
select customer_id,count(distinct product_category) as ict
from customer_contracts c
inner join products p on c.product_id=p.product_id
group by customer_id
)
select cte2.customer_id from cte
inner join cte2 on cte.pct = cte2.ict
order by cte2.customer_id
User Shopping Sprees [Amazon SQL
Interview Question]
In an effort to identify high-value customers, Amazon asked for your help to obtain data about
users who go on shopping sprees. A shopping spree occurs when a user makes purchases on 3
or more consecutive days.
List the user IDs who have gone on at least 1 shopping spree in ascending order.
transactions Table:
Column Name Type
user_id integer
amount float
transaction_date timestamp
transactions Example Input:
user_id amount transaction_date
1 9.99 08/01/2022 10:00:00
1 55 08/17/2022 10:00:00
2 149.5 08/05/2022 10:00:00
2 4.89 08/06/2022 10:00:00
2 34 08/07/2022 10:00:00
Example Output:
user_id
2
Explanation
In this example, user_id 2 is the only one who has gone on a shopping spree.
The dataset you are querying against may have different input & output - this is just an
example!
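-- a spree exists when the same user also bought on the next day and the day after that
select distinct t1.user_id
from transactions t1
inner join transactions t2 on t2.user_id=t1.user_id
and cast(t2.transaction_date as date) = cast(t1.transaction_date as date) + 1
inner join transactions t3 on t3.user_id=t1.user_id
and cast(t3.transaction_date as date) = cast(t1.transaction_date as date) + 2
order by t1.user_id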
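One way to express "purchases on 3 or more consecutive days" is a double self-join: for a given purchase, look for the same user's purchases one day and two days later. The sketch below runs that idea on the example input, assuming SQLite (via Python's sqlite3) as a stand-in engine with ISO-formatted dates. User 3 is a hypothetical addition, not part of the original example, included to show that purchases two days apart with a gap in between do not count.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in engine, an assumption
conn.execute(
    "CREATE TABLE transactions (user_id INTEGER, amount REAL, transaction_date TEXT)"
)
conn.executemany(
    "INSERT INTO transactions VALUES (?, ?, ?)",
    [
        (1, 9.99, "2022-08-01 10:00:00"),
        (1, 55, "2022-08-17 10:00:00"),
        (2, 149.5, "2022-08-05 10:00:00"),
        (2, 4.89, "2022-08-06 10:00:00"),
        (2, 34, "2022-08-07 10:00:00"),
        # Hypothetical rows: first and last purchase are 2 days apart, but day 2 is missing.
        (3, 10.0, "2022-08-01 10:00:00"),
        (3, 12.0, "2022-08-03 10:00:00"),
    ],
)

query = """
SELECT DISTINCT t1.user_id
FROM transactions t1
JOIN transactions t2
  ON t2.user_id = t1.user_id
 AND date(t2.transaction_date) = date(t1.transaction_date, '+1 day')
JOIN transactions t3
  ON t3.user_id = t1.user_id
 AND date(t3.transaction_date) = date(t1.transaction_date, '+2 days')
ORDER BY t1.user_id
"""
spree_users = [row[0] for row in conn.execute(query)]
print(spree_users)  # [2]
```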
Histogram of Users and Purchases
[Walmart SQL Interview Question]
This is the same question as problem #13 in the SQL Chapter of Ace the Data Science Interview!
Assume you are given the table on Walmart user transactions. Based on a user's most recent
transaction date, write a query to obtain the users and the number of products bought.
Output the user's most recent transaction date, user ID and the number of products sorted by
the transaction date in chronological order.
P.S. As of 10 Nov 2022, the official solution was changed from output of the transaction date,
number of users and number of products to the current output.
user_transactions Table:
Column Name Type
product_id integer
user_id integer
spend decimal
transaction_date timestamp
user_transactions Example Input:
product_id user_id spend transaction_date
3673 123 68.90 07/08/2022 12:00:00
9623 123 274.10 07/08/2022 12:00:00
1467 115 19.90 07/08/2022 12:00:00
2513 159 25.00 07/08/2022 12:00:00
1452 159 74.50 07/10/2022 12:00:00
Example Output:
transaction_date user_id purchase_count
07/08/2022 12:00:00 115 1
07/08/2022 12:00:00 123 2
07/10/2022 12:00:00 159 1
The dataset you are querying against may have different input & output - this is just an
example!
;with cte
as
(
-- most recent transaction date per user
select user_id,max(transaction_date) as maxd
from user_transactions
group by user_id
)
select u.transaction_date,u.user_id,count(u.product_id) as purchase_count
from cte
inner join user_transactions u on cte.user_id=u.user_id
and u.transaction_date = cte.maxd
group by u.transaction_date,u.user_id
order by u.transaction_date,u.user_id
Card Launch Success [JPMorgan Chase SQL
Interview Question]
Your team at JPMorgan Chase is soon launching a new credit card. You are asked to estimate
how many cards you'll issue in the first month.
Before you can answer this question, you want to first get some perspective on how well new
credit card launches typically do in their first month.
Write a query that outputs the name of the credit card, and how many cards were issued in its
launch month. The launch month is the earliest record in the monthly_cards_issued table for
a given card. Order the results starting from the biggest issued amount.
monthly_cards_issued Table:
Column Name Type
issue_month integer
issue_year integer
card_name string
issued_amount integer
monthly_cards_issued Example Input:
issue_month issue_year card_name issued_amount
1 2021 Chase Sapphire Reserve 170000
2 2021 Chase Sapphire Reserve 175000
3 2021 Chase Sapphire Reserve 180000
3 2021 Chase Freedom Flex 65000
4 2021 Chase Freedom Flex 70000
Example Output:
card_name issued_amount
Chase Sapphire Reserve 170000
Chase Freedom Flex 65000
Explanation
Chase Sapphire Reserve card was launched on 1/2021 with an issued amount of 170,000 cards
and the Chase Freedom Flex card was launched on 3/2021 with an issued amount of 65,000
cards.
The dataset you are querying against may have different input & output - this is just an
example!
;with cte
as
(
-- rk = 1 marks each card's launch month
SELECT card_name,issue_year,issued_amount,
row_number() over(partition by card_name order by
issue_year, issue_month) as rk
from monthly_cards_issued
)
select card_name,issued_amount from cte
where rk=1
order by issued_amount DESC
LinkedIn Power Creators (Part 2) [LinkedIn
SQL Interview Question]
The LinkedIn Creator team is looking for power creators who use their personal profile as a
company or influencer page. This means that if someone's LinkedIn page has more followers than
all the companies they work for, we can safely assume that person is a Power Creator. Keep in
mind that if a person works at multiple companies, we should take into account the company
with the most followers.
Write a query to return the IDs of these LinkedIn power creators in ascending order.
Assumptions:
• A person can work at multiple companies.
• In the case of multiple companies, use the one with largest follower base.
This is the second part of the question, so make sure you start with Part 1 if you haven't
completed that yet!
personal_profiles Table:
Column Name Type
profile_id integer
name string
followers integer
personal_profiles Example Input:
profile_id name followers
1 Nick Singh 92,000
2 Zach Wilson 199,000
3 Daliana Liu 171,000
4 Ravit Jain 107,000
5 Vin Vashishta 139,000
6 Susan Wojcicki 39,000
employee_company Table:
Column Name Type
personal_profile_id integer
company_id integer
employee_company Example Input:
personal_profile_id company_id
1 4
1 9
2 2
3 1
4 3
5 6
6 5
company_pages Table:
Column Name Type
company_id integer
name string
followers integer
company_pages Example Input:
company_id name followers
1 The Data Science Podcast 8,000
2 Airbnb 700,000
3 The Ravit Show 6,000
4 DataLemur 200
5 YouTube 1,6000,000
6 DataScience.Vin 4,500
9 Ace The Data Science Interview 4479
Example Output:
profile_id
This output shows that profile IDs 1-5 are all power creators, meaning that they have more
followers than each of their company pages, whether they work for 1 company or 3.
The dataset you are querying against may have different input & output - this is just an
example!
;WITH CTE
AS
(
-- compare against the employer with the most followers, hence MAX rather than SUM
SELECT PERSONAL_PROFILE_ID,
MAX(FOLLOWERS) AS FLW
FROM employee_company EC
INNER JOIN company_pages C ON EC.COMPANY_ID = C.COMPANY_ID
GROUP BY PERSONAL_PROFILE_ID
), CTE2 AS
(
SELECT PROFILE_ID,FOLLOWERS
FROM PERSONAL_PROFILES
)
SELECT PROFILE_ID FROM CTE
INNER JOIN CTE2 ON CTE.PERSONAL_PROFILE_ID=CTE2.PROFILE_ID
WHERE CTE2.FOLLOWERS > CTE.FLW
ORDER BY PROFILE_ID
First Transaction [Etsy SQL Interview
Question]
This is the same question as problem #9 in the SQL Chapter of Ace the Data Science Interview!
Assume you are given the table below on user transactions. Write a query to obtain the list of
customers whose first transaction was valued at $50 or more. Output the number of users.
Clarification:
• Use the transaction_date field to determine which transaction should be labeled as
the first for each user.
• Use a specific function (we can't give too much away!) to account for scenarios where a
user had multiple transactions on the same day, and one of those was the first.
user_transactions Table:
Column Name Type
transaction_id integer
user_id integer
spend decimal
transaction_date timestamp
user_transactions Example Input:
transaction_id user_id spend transaction_date
759274 111 49.50 02/03/2022 00:00:00
850371 111 51.00 03/15/2022 00:00:00
615348 145 36.30 03/22/2022 00:00:00
137424 156 151.00 04/04/2022 00:00:00
248475 156 87.00 04/16/2022 00:00:00
Example Output:
users
1
Explanation: Only user 156 has a first transaction valued over $50.
The dataset you are querying against may have different input & output - this is just an
example!
select count(distinct user_id) as users
from
(
select user_transactions.user_id,SPEND,
rank() over(partition by user_id order by transaction_date asc) as first_tran
from user_transactions
)t
where T.spend >=50.00
and first_tran =1
International Call Percentage [Verizon SQL
Interview Question]
A phone call is considered an international call when the person calling is in a different country
than the person receiving the call.
What percentage of phone calls are international? Round the result to 1 decimal.
Assumption:
• The caller_id in phone_info table refers to both the caller and receiver.
phone_calls Table:
Column Name Type
caller_id integer
receiver_id integer
call_time timestamp
phone_calls Example Input:
caller_id receiver_id call_time
1 2 2022-07-04 10:13:49
1 5 2022-08-21 23:54:56
5 1 2022-05-13 17:24:06
5 6 2022-03-18 12:11:49
phone_info Table:
Column Name Type
caller_id integer
country_id integer
network integer
phone_number string
phone_info Example Input:
caller_id country_id network phone_number
1 US Verizon +1-212-897-1964
2 US Verizon +1-703-346-9529
3 US Verizon +1-650-828-4774
4 US Verizon +1-415-224-6663
5 IN Vodafone +91 7503-907302
6 IN Vodafone +91 2287-664895
Example Output:
international_calls_pct
50.0
Explanation
There is a total of 4 calls with 2 of them being international calls (from caller_id 1 => receiver_id
5, and caller_id 5 => receiver_id 1). Thus, 2/4 = 50.0%
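The 2/4 = 50.0% figure can be reproduced by joining phone_info twice, once for the caller and once for the receiver, and comparing the two countries. A minimal sketch on the example input, assuming SQLite (via Python's sqlite3) as a stand-in engine and text columns for country_id and network:

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in engine, an assumption
conn.executescript("""
CREATE TABLE phone_calls (caller_id INTEGER, receiver_id INTEGER, call_time TEXT);
CREATE TABLE phone_info (caller_id INTEGER, country_id TEXT, network TEXT, phone_number TEXT);
INSERT INTO phone_calls VALUES
    (1, 2, '2022-07-04 10:13:49'), (1, 5, '2022-08-21 23:54:56'),
    (5, 1, '2022-05-13 17:24:06'), (5, 6, '2022-03-18 12:11:49');
INSERT INTO phone_info VALUES
    (1, 'US', 'Verizon', '+1-212-897-1964'), (2, 'US', 'Verizon', '+1-703-346-9529'),
    (3, 'US', 'Verizon', '+1-650-828-4774'), (4, 'US', 'Verizon', '+1-415-224-6663'),
    (5, 'IN', 'Vodafone', '+91 7503-907302'), (6, 'IN', 'Vodafone', '+91 2287-664895');
""")

query = """
SELECT ROUND(100.0 * SUM(CASE WHEN c.country_id <> r.country_id THEN 1 ELSE 0 END)
             / COUNT(*), 1) AS international_calls_pct
FROM phone_calls p
JOIN phone_info c ON p.caller_id = c.caller_id    -- caller's country
JOIN phone_info r ON p.receiver_id = r.caller_id  -- receiver's country
"""
pct = conn.execute(query).fetchone()[0]
print(pct)  # 50.0
```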
The dataset you are querying against may have different input & output - this is just an
example!
;WITH CTE
AS
(
SELECT
SUM(CASE WHEN I.COUNTRY_ID<>I1.COUNTRY_ID THEN 1 ELSE 0 END)
AS INTERNATIONAL_CALLS, COUNT(*) TOTAL_CALLS
FROM PHONE_CALLS P
LEFT JOIN PHONE_INFO I ON P.CALLER_ID = I.CALLER_ID
LEFT JOIN PHONE_INFO I1 ON P.RECEIVER_ID = I1.CALLER_ID
)
SELECT
ROUND(INTERNATIONAL_CALLS*100.0/TOTAL_CALLS,1) AS INTERNATIONAL_CALLS_PCT
FROM CTE
User Session Activity [Twitter SQL
Interview Question]
This is the same question as problem #24 in the SQL Chapter of Ace the Data Science Interview!
Assume you are given the table containing Twitter user session activities.
Write a query that ranks users according to their total session durations (in minutes) in
descending order for each session type between the start date (2022-01-01) and the end date
(2022-02-01).
Output the user ID, session type, and the ranking of the total session duration.
sessions Table:
Column Name Type
session_id Integer
user_id Integer
session_type string ("like", "reply", "retweet")
duration integer (in minutes)
start_date Timestamp
sessions Example Input:
session_id user_id session_type duration start_date
6368 111 Like 3 12/25/2021 12:00:00
1742 111 retweet 6 01/02/2022 12:00:00
8464 222 Reply 8 01/16/2022 12:00:00
7153 111 retweet 5 01/28/2022 12:00:00
3252 333 Reply 15 01/10/2022 12:00:00
Example Output:
user_id session_type ranking
333 reply 1
222 reply 2
111 retweet 1
Explanation: User 333 is listed on the top due to the highest duration of 15 minutes. The ranking
resets on 3rd row as the session type changes.
The dataset you are querying against may have different input & output - this is just an
example!
;WITH CTE
AS
(
-- total duration per user and session type within the date range
SELECT USER_ID,SESSION_TYPE,SUM(DURATION) AS TDU
FROM SESSIONS
WHERE START_date between '2022-01-01' and '2022-02-01'
GROUP BY USER_ID,SESSION_TYPE
)
SELECT
USER_ID,SESSION_TYPE,
RANK() OVER(PARTITION BY SESSION_TYPE ORDER BY TDU DESC) AS RANKING
FROM CTE
Unique Money Transfer Relationships
[PayPal SQL Interview Question]
You are given a table of PayPal payments showing the payer, the recipient, and the amount paid.
A two-way unique relationship is established when two people send money back and forth.
Write a query to find the number of two-way unique relationships in this data.
Assumption:
• A payer can send money to the same recipient multiple times.
payments Table:
Column Name Type
payer_id integer
recipient_id integer
amount integer
payments Example Input:
payer_id recipient_id amount
101 201 30
201 101 10
101 301 20
301 101 80
201 301 70
Example Output:
unique_relationships
2
Explanation
There are 2 unique two-way relationships between:
• ID 101 and ID 201
• ID 101 and ID 301
The dataset you are querying against may have different input & output - this is just an
example!
SELECT COUNT(PAYER_ID)/2 AS UNIQUE_RELATIONSHIPS FROM
(
-- pairs that also appear in the reverse direction; INTERSECT removes duplicates
SELECT PAYER_ID,RECIPIENT_ID FROM PAYMENTS
INTERSECT
SELECT RECIPIENT_ID,PAYER_ID FROM PAYMENTS
) AS T
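The INTERSECT idea can be checked on the example input: a (payer, recipient) pair survives only if the reversed pair also exists, each relationship then appears twice (once per direction), and halving the count gives the answer. A minimal sketch, assuming SQLite (via Python's sqlite3) as a stand-in engine. The last inserted row is a hypothetical repeat payment, added to show that INTERSECT's de-duplication covers the "multiple payments to the same recipient" assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in engine, an assumption
conn.execute(
    "CREATE TABLE payments (payer_id INTEGER, recipient_id INTEGER, amount INTEGER)"
)
conn.executemany(
    "INSERT INTO payments VALUES (?, ?, ?)",
    [
        (101, 201, 30),
        (201, 101, 10),
        (101, 301, 20),
        (301, 101, 80),
        (201, 301, 70),
        (101, 201, 5),  # hypothetical repeat payment, not in the original example
    ],
)

query = """
SELECT COUNT(*) / 2 AS unique_relationships
FROM (
    SELECT payer_id, recipient_id FROM payments
    INTERSECT
    SELECT recipient_id, payer_id FROM payments
) AS t
"""
unique_relationships = conn.execute(query).fetchone()[0]
print(unique_relationships)  # 2
```

The payment from 201 to 301 has no reverse payment, so it is dropped by the INTERSECT.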
Email Table Transformation [Facebook SQL
Interview Question]
Each Facebook user can designate a personal email address, a business email address, and a
recovery email address.
Unfortunately, the table is currently in the wrong format, so you need to transform its structure to
show the following columns (see example output): user id, personal email, business email, and
recovery email. Sort your answer by user id in ascending order.
users Table:
Column Name Type
user_id integer
email_type varchar
email varchar
users Example Input:
user_id email_type email
Example Output:
user_id personal business recovery
Explanation
This task is basically just asking you to pivot/transform the shape of the data. It's all the same
data as the input above, just in different format.
Each row will represent a single user with all three of their emails listed. The first row shows User
ID 123 (who may or may not be Nick Singh); their personal email is [email protected], their
business email is [email protected], and so on.
The dataset you are querying against may have different input & output - this is just an
example!
SELECT
USER_ID,
MAX(CASE WHEN lower(EMAIL_TYPE) = 'personal' THEN Email else null end) AS personal,
MAX(CASE WHEN lower(EMAIL_TYPE) = 'business' THEN Email else null end) AS business,
MAX(CASE WHEN lower(EMAIL_TYPE) = 'recovery' THEN Email else null end) AS recovery
FROM USERS
GROUP BY USER_ID
ORDER BY USER_ID
Photoshop Revenue Analysis [Adobe SQL
Interview Question]
For every customer that bought Photoshop, return a list of the customers, and the total spent on
all the products except for Photoshop products.
Sort your answer by customer ids in ascending order.
adobe_transactions Table:
Column Name Type
customer_id integer
product string
revenue integer
adobe_transactions Example Input:
customer_id product revenue
123 Photoshop 50
123 Premier Pro 100
123 After Effects 50
234 Illustrator 200
234 Premier Pro 100
Example Output:
customer_id revenue
123 150
Explanation: User 123 bought Photoshop, Premier Pro, and After Effects, spending $150 on the two
non-Photoshop products. We don't output user 234 because they didn't buy Photoshop.
The dataset you are querying against may have different input & output - this is just an
example!
SELECT customer_id,sum(revenue) AS revenue FROM ADOBE_TRANSACTIONS
WHERE CUSTOMER_ID IN
(
SELECT CUSTOMER_ID FROM ADOBE_TRANSACTIONS
WHERE PRODUCT ='Photoshop'
)
AND PRODUCT NOT IN ('Photoshop')
group by customer_id
order by customer_id
Repeat Purchases on Multiple Days [Stitch
Fix SQL Interview Question]
This is the same question as problem #7 in the SQL Chapter of Ace the Data Science Interview!
Assume you are given the table below containing information on user purchases. Write a query
to obtain the number of users who purchased the same product on two or more different days.
Output the number of unique users.
PS. On 26 Oct 2022, we expanded the purchases data set, thus the official output may vary from
before.
purchases Table:
Column Name Type
user_id integer
product_id integer
quantity integer
purchase_date datetime
purchases Example Input:
user_id product_id quantity purchase_date
536 3223 6 01/11/2022 12:33:44
827 3585 35 02/20/2022 14:05:26
536 3223 5 03/02/2022 09:33:28
536 1435 10 03/02/2022 08:40:00
827 2452 45 04/09/2022 00:00:00
Example Output:
repeat_purchasers
The dataset you are querying against may have different input & output - this is just an
example!
;WITH CTE
AS
(
-- user/product pairs bought on at least two different calendar days
SELECT USER_ID,PRODUCT_ID
FROM PURCHASES
GROUP BY USER_ID,PRODUCT_ID
HAVING COUNT(DISTINCT CAST(PURCHASE_DATE AS DATE)) >1
)
SELECT COUNT(DISTINCT USER_ID) AS REPEAT_PURCHASERS
FROM CTE
Compressed Mode [Alibaba SQL Interview
Question]
Given a table containing the item count for each order and the frequency of orders with that item
count, write a query to determine the mode of the number of items purchased per order on
Alibaba. If there are several item counts with the same frequency, you should sort them in
ascending order.
Effective April 22nd, 2023, the problem statement and solution have been revised for enhanced
clarity.
items_per_order Table:
Column Name Type
item_count integer
order_occurrences integer
items_per_order Example Input:
item_count order_occurrences
1 500
2 1000
3 800
4 1000
Example Output:
mode
2
4
Explanation
Based on the example output, the order_occurrences value of 1000 corresponds to the
highest frequency among all item counts. Specifically, both item counts of 2 and 4 have occurred
1000 times, making them tied for the most common number of occurrences.
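The tie described here can be reproduced by selecting every item count whose frequency equals the maximum frequency. A minimal sketch on the example input, assuming SQLite (via Python's sqlite3) as a stand-in engine:

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in engine, an assumption
conn.execute(
    "CREATE TABLE items_per_order (item_count INTEGER, order_occurrences INTEGER)"
)
conn.executemany(
    "INSERT INTO items_per_order VALUES (?, ?)",
    [(1, 500), (2, 1000), (3, 800), (4, 1000)],
)

query = """
SELECT item_count AS mode
FROM items_per_order
WHERE order_occurrences = (SELECT MAX(order_occurrences) FROM items_per_order)
ORDER BY item_count
"""
modes = [row[0] for row in conn.execute(query)]
print(modes)  # [2, 4]
```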
The dataset you are querying against may have different input & output - this is just an
example!
SELECT I.ITEM_COUNT AS MODE FROM ITEMS_PER_ORDER AS I
INNER JOIN
(
SELECT MAX(ORDER_OCCURRENCES) AS order_occurrences
FROM items_per_order
) T ON I.ORDER_OCCURRENCES = T.order_occurrences
ORDER BY ITEM_COUNT
Compensation Outliers [Accenture SQL
Interview Question]
Your team at Accenture is helping a Fortune 500 client revamp their compensation and benefits
program. The first step in this analysis is to manually review employees who are potentially
overpaid or underpaid.
An employee is considered to be potentially overpaid if they earn more than 2 times the
average salary for people with the same title. Similarly, an employee might be underpaid if they
earn less than half of the average for their title. We'll refer to employees who are either
underpaid or overpaid as compensation outliers for the purposes of this problem.
Write a query that shows the following data for each compensation outlier: employee ID, salary,
and whether they are potentially overpaid or potentially underpaid (refer to Example Output
below).
employee_pay Table:
Column Name Type
employee_id integer
salary integer
title varchar
employee_pay Example Input:
employee_id salary title
101 80000 Data Analyst
102 90000 Data Analyst
103 100000 Data Analyst
104 30000 Data Analyst
105 120000 Data Scientist
106 100000 Data Scientist
107 80000 Data Scientist
108 310000 Data Scientist
Example Output:
employee_id salary status
104 30000 Underpaid
108 310000 Overpaid
Explanation
In this example, 2 employees qualify as compensation outliers. Employee 104 is a Data Analyst,
and the average salary for this position is $75,000. Meanwhile, the salary of employee 104 is less
than $37,500 (half of $75,000); therefore, they are underpaid.
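The same thresholds can be applied to every employee by joining each row to the average salary for its title. The sketch below runs this on the example input, assuming SQLite (via Python's sqlite3) as a stand-in engine; the alias avg_salary is a name chosen for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in engine, an assumption
conn.execute("CREATE TABLE employee_pay (employee_id INTEGER, salary INTEGER, title TEXT)")
conn.executemany(
    "INSERT INTO employee_pay VALUES (?, ?, ?)",
    [
        (101, 80000, "Data Analyst"),
        (102, 90000, "Data Analyst"),
        (103, 100000, "Data Analyst"),
        (104, 30000, "Data Analyst"),
        (105, 120000, "Data Scientist"),
        (106, 100000, "Data Scientist"),
        (107, 80000, "Data Scientist"),
        (108, 310000, "Data Scientist"),
    ],
)

query = """
SELECT e.employee_id, e.salary,
       CASE WHEN e.salary > 2 * t.avg_salary THEN 'Overpaid'
            ELSE 'Underpaid' END AS status
FROM employee_pay e
JOIN (
    SELECT title, AVG(salary) AS avg_salary
    FROM employee_pay
    GROUP BY title
) t ON e.title = t.title
-- keep only outliers: more than twice, or less than half, the title average
WHERE e.salary > 2 * t.avg_salary OR e.salary < t.avg_salary / 2
ORDER BY e.employee_id
"""
outliers = conn.execute(query).fetchall()
print(outliers)  # [(104, 30000, 'Underpaid'), (108, 310000, 'Overpaid')]
```

For Data Scientists the average is $152,500, so the overpaid threshold is $305,000 and employee 108 at $310,000 exceeds it.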
The dataset you are querying against may have different input & output - this is just an
example!
select
employee_id,salary,
case when salary < avgsal/2 then 'Underpaid'
when salary > avgsal*2 then 'Overpaid' END as status
from employee_pay
inner join
(
-- average salary per title
select employee_pay.title,avg(salary) as avgsal
from employee_pay
group by employee_pay.title
) t on employee_pay.title= t.title
where salary < avgsal/2 or salary > avgsal*2
order by employee_id