Complex SQL Queries

The document describes several SQL interview questions and solutions related to analyzing transactional datasets. It includes questions about obtaining the third transaction for each user, calculating time spent on different app activities by age group, determining 3-day rolling averages of tweets by user, and identifying the highest grossing products by category. Hints and example inputs/outputs are provided for writing SQL queries to address each problem.

Uploaded by

pradhansnehasis382


User's Third Transaction [Uber SQL Interview Question]

This is the same question as problem #11 in the SQL Chapter of Ace the Data Science Interview!

Assume you are given the table below on Uber transactions made by users. Write a query to
obtain the third transaction of every user. Output the user id, spend and transaction date.

transactions Table:

Column Name Type

user_id integer

spend decimal

transaction_date timestamp

transactions Example Input:

user_id spend transaction_date

111 100.50 01/08/2022 12:00:00

111 55.00 01/10/2022 12:00:00

121 36.00 01/18/2022 12:00:00

145 24.99 01/26/2022 12:00:00

111 89.60 02/05/2022 12:00:00

Example Output:

user_id spend transaction_date

111 89.60 02/05/2022 12:00:00

The dataset you are querying against may have different input & output - this is just an
example!

with cte as (
  -- number each user's transactions from oldest to newest
  select user_id, spend, transaction_date,
         row_number() over(partition by user_id order by transaction_date) as rk
  from transactions
)
select user_id, spend, transaction_date
from cte
where rk = 3

Sending vs. Opening Snaps [Snapchat SQL Interview Question]
This is the same question as problem #25 in the SQL Chapter of Ace the Data Science Interview!

Assume you're given tables with information on Snapchat users, including their ages and time
spent sending and opening snaps.

Write a query to obtain a breakdown of the time spent sending vs. opening snaps as a
percentage of total time spent on these activities grouped by age group. Round the percentage
to 2 decimal places in the output.

Notes:

• Calculate the following percentages:

o time spent sending / (Time spent sending + Time spent opening)
o Time spent opening / (Time spent sending + Time spent opening)
• To avoid integer division in percentages, multiply by 100.0 and not 100.

Effective April 15th, 2023, the solution has been updated and optimised.
activities Table

Column Name Type

activity_id Integer

user_id Integer

activity_type string ('send', 'open', 'chat')

time_spent Float

activity_date Datetime

activities Example Input

activity_id user_id activity_type time_spent activity_date

7274 123 open 4.50 06/22/2022 12:00:00

2425 123 send 3.50 06/22/2022 12:00:00

1413 456 send 5.67 06/23/2022 12:00:00

1414 789 chat 11.00 06/25/2022 12:00:00

2536 456 open 3.00 06/25/2022 12:00:00

age_breakdown Table

Column Name Type

user_id Integer

age_bucket string ('21-25', '26-30', '31-35')

age_breakdown Example Input

user_id age_bucket

123 31-35

456 26-30

789 21-25

Example Output

age_bucket send_perc open_perc

26-30 65.40 34.60

31-35 43.75 56.25

Explanation

Using the age bucket 26-30 as example, the time spent sending snaps was 5.67 and the time
spent opening snaps was 3.

To calculate the percentage of time spent sending snaps, we divide the time spent sending snaps
by the total time spent on sending and opening snaps, which is 5.67 + 3 = 8.67.

So, the percentage of time spent sending snaps is 5.67 / (5.67 + 3) = 65.4%, and the percentage
of time spent opening snaps is 3 / (5.67 + 3) = 34.6%.
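
The arithmetic above can be checked against the example input. The following is a minimal sketch, not the official solution: it assumes Python's built-in sqlite3 module as a stand-in database (the problem does not name an engine), and the variable name snap_rows is only illustrative. It loads the sample rows and aggregates send and open time per age bucket.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE activities (activity_id INTEGER, user_id INTEGER,
                         activity_type TEXT, time_spent REAL, activity_date TEXT);
CREATE TABLE age_breakdown (user_id INTEGER, age_bucket TEXT);
INSERT INTO activities VALUES
  (7274, 123, 'open', 4.50, '2022-06-22 12:00:00'),
  (2425, 123, 'send', 3.50, '2022-06-22 12:00:00'),
  (1413, 456, 'send', 5.67, '2022-06-23 12:00:00'),
  (1414, 789, 'chat', 11.00, '2022-06-25 12:00:00'),
  (2536, 456, 'open', 3.00, '2022-06-25 12:00:00');
INSERT INTO age_breakdown VALUES (123, '31-35'), (456, '26-30'), (789, '21-25');
""")

# Conditional sums give send and open time per bucket in one pass.
# 'chat' rows are filtered out, so SUM(time_spent) is send + open only.
snap_rows = conn.execute("""
SELECT b.age_bucket,
       ROUND(100.0 * SUM(CASE WHEN a.activity_type = 'send' THEN a.time_spent ELSE 0 END)
             / SUM(a.time_spent), 2) AS send_perc,
       ROUND(100.0 * SUM(CASE WHEN a.activity_type = 'open' THEN a.time_spent ELSE 0 END)
             / SUM(a.time_spent), 2) AS open_perc
FROM activities a
JOIN age_breakdown b ON a.user_id = b.user_id
WHERE a.activity_type IN ('send', 'open')
GROUP BY b.age_bucket
ORDER BY b.age_bucket
""").fetchall()

for row in snap_rows:
    print(row)
```

Grouping by age_bucket rather than by user matters here: a bucket with several users needs its times summed before the percentage is taken.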

The dataset you are querying against may have different input & output - this is just an
example!

with cte as (
  -- total send and open time per age bucket (not per user)
  select b.age_bucket,
         sum(case when lower(a.activity_type) = 'send' then a.time_spent else 0 end) as tsend,
         sum(case when lower(a.activity_type) = 'open' then a.time_spent else 0 end) as topen
  from activities a
  inner join age_breakdown b on a.user_id = b.user_id
  where lower(a.activity_type) in ('send','open')
  group by b.age_bucket
)
select age_bucket,
       -- the numeric cast is needed in PostgreSQL, where two-argument round() does not accept floats
       round(cast(100.0 * tsend / (tsend + topen) as numeric), 2) as send_perc,
       round(cast(100.0 * topen / (tsend + topen) as numeric), 2) as open_perc
from cte
order by age_bucket

Tweets' Rolling Averages [Twitter SQL Interview Question]
This is the same question as problem #10 in the SQL Chapter of Ace the Data Science Interview!

Given a table of tweet data over a specified time period, calculate the 3-day rolling average of
tweets for each user. Output the user ID, tweet date, and rolling averages rounded to 2 decimal
places.
Notes:

• A rolling average, also known as a moving average or running mean, is a time-series
technique that examines trends in data over a specified period of time.
• In this case, we want to determine how the tweet count for each user changes over a 3-
day period.
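
To make the window concrete, here is a small sketch that runs a 3-row window average over the five sample rows from the example input. It assumes Python's built-in sqlite3 module as the engine and a bundled SQLite of version 3.25 or newer (needed for window functions); dates are rewritten in ISO format so they sort correctly as text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tweets (user_id INTEGER, tweet_date TEXT, tweet_count INTEGER);
INSERT INTO tweets VALUES
  (111, '2022-06-01', 2), (111, '2022-06-02', 1), (111, '2022-06-03', 3),
  (111, '2022-06-04', 4), (111, '2022-06-05', 5);
""")

# The frame covers the current row plus up to two earlier rows of the same
# user, so the first two rows are averaged over fewer than three values.
rolling = conn.execute("""
SELECT user_id, tweet_date,
       ROUND(AVG(tweet_count) OVER (
           PARTITION BY user_id ORDER BY tweet_date
           ROWS BETWEEN 2 PRECEDING AND CURRENT ROW), 2) AS rolling_avg_3d
FROM tweets
ORDER BY user_id, tweet_date
""").fetchall()

for row in rolling:
    print(row)
```

Note that a ROWS frame counts rows, not calendar days, so this matches a true 3-day average only when each user has exactly one row per day.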

Effective April 7th, 2023, the problem statement, solution and hints for this question have been
revised.

tweets Table:

Column Name Type

user_id integer

tweet_date timestamp

tweet_count integer

tweets Example Input:

user_id tweet_date tweet_count

111 06/01/2022 00:00:00 2

111 06/02/2022 00:00:00 1

111 06/03/2022 00:00:00 3

111 06/04/2022 00:00:00 4

111 06/05/2022 00:00:00 5

Example Output:

user_id tweet_date rolling_avg_3d

111 06/01/2022 00:00:00 2.00

111 06/02/2022 00:00:00 1.50

111 06/03/2022 00:00:00 2.00

111 06/04/2022 00:00:00 2.67

111 06/05/2022 00:00:00 4.00

The dataset you are querying against may have different input & output - this is just an
example!

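with cte as (
  select user_id, tweet_date,
         -- average over the current row and the two preceding rows per user
         round(avg(tweet_count) over
           (partition by user_id order by tweet_date
            rows between 2 preceding and current row), 2) as rolling_avg_3d
  from tweets
)
select * from cte
order by user_id, tweet_date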

Highest-Grossing Items [Amazon SQL Interview Question]

This is the same question as problem #12 in the SQL Chapter of Ace the Data Science Interview!

Assume you're given a table with information on Amazon customers and their spending on
products in different categories, write a query to identify the top two highest-grossing products
within each category in the year 2022. The output should include the category, product, and total
spend.

product_spend Table:

Column Name Type

category string

product string

user_id integer

Spend decimal

transaction_date timestamp

product_spend Example Input:

category product user_id spend transaction_date

appliance refrigerator 165 246.00 12/26/2021 12:00:00

appliance refrigerator 123 299.99 03/02/2022 12:00:00

appliance washing machine 123 219.80 03/02/2022 12:00:00

electronics vacuum 178 152.00 04/05/2022 12:00:00

electronics wireless headset 156 249.90 07/08/2022 12:00:00

electronics vacuum 145 189.00 07/15/2022 12:00:00

Example Output:

category product total_spend

appliance refrigerator 299.99

appliance washing machine 219.80

electronics vacuum 341.00

electronics wireless headset 249.90

The dataset you are querying against may have different input & output - this is just an
example!

with cte as (
  -- total 2022 spend per product within each category
  select category, product,
         sum(spend) as total_spend
  from product_spend
  where extract(year from transaction_date) = 2022
  group by category, product
), cte2 as (
  select category, product, total_spend,
         row_number() over(partition by category order by total_spend desc) as rk
  from cte
)
select category, product, total_spend
from cte2
where rk <= 2
order by category, total_spend desc

Signup Activation Rate [TikTok SQL Interview Question]

New TikTok users sign up with their emails. They confirm their signup by replying to the text
confirmation to activate their accounts. Users may receive multiple text messages for account
confirmation until they have confirmed their new account.

A senior analyst is interested to know the activation rate of specified users in the emails table.
Write a query to find the activation rate. Round the percentage to 2 decimal places.

Definitions:

• emails table contains the information of user signup details.

• texts table contains the users' activation information.

Assumptions:

• The analyst is interested in the activation rate of specific users in the emails table, which
may not include all users that could potentially be found in the texts table.
• For example, user 123 in the emails table may not be in the texts table and vice
versa.

Effective April 4th 2023, we added an assumption to the question to provide additional clarity.

emails Table:

Column Name Type

email_id integer

user_id integer

signup_date datetime
emails Example Input:

email_id user_id signup_date

125 7771 06/14/2022 00:00:00

236 6950 07/01/2022 00:00:00

433 1052 07/09/2022 00:00:00

texts Table:

Column Name Type

text_id integer

email_id integer

signup_action varchar

texts Example Input:

text_id email_id signup_action

6878 125 Confirmed

6920 236 Not Confirmed

6994 236 Confirmed

'Confirmed' in signup_action means the user has activated their account and successfully
completed the signup process.

Example Output:

confirm_rate

0.67
Explanation:

67% of users have successfully completed their signup and activated their accounts. The
remaining 33% have not yet replied to the text to confirm their signup.
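
The 67% figure can be reproduced from the example input. This is a hedged sketch that assumes Python's built-in sqlite3 module as the engine; the variable name confirm_rate mirrors the example output column, and the rate is taken as confirmed signups divided by all signups in emails.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE emails (email_id INTEGER, user_id INTEGER, signup_date TEXT);
CREATE TABLE texts (text_id INTEGER, email_id INTEGER, signup_action TEXT);
INSERT INTO emails VALUES
  (125, 7771, '2022-06-14'), (236, 6950, '2022-07-01'), (433, 1052, '2022-07-09');
INSERT INTO texts VALUES
  (6878, 125, 'Confirmed'), (6920, 236, 'Not Confirmed'), (6994, 236, 'Confirmed');
""")

# LEFT JOIN keeps every signup in the denominator, including email 433,
# which has no text at all; only 'Confirmed' texts reach the numerator.
confirm_rate = conn.execute("""
SELECT ROUND(1.0 * COUNT(DISTINCT t.email_id) / COUNT(DISTINCT e.email_id), 2)
FROM emails e
LEFT JOIN texts t
  ON t.email_id = e.email_id AND t.signup_action = 'Confirmed'
""").fetchone()[0]

print(confirm_rate)
```

Counting distinct email IDs on both sides keeps a user who received several texts (email 236 here) from being counted more than once.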

The dataset you are querying against may have different input & output - this is just an
example!

with cte as (
  -- denominator: every signup in emails, with or without a text
  select count(*) as total
  from emails
), cte2 as (
  -- numerator: signups with at least one 'Confirmed' text
  select count(distinct e.email_id) as signup_total
  from emails e
  inner join texts t on e.email_id = t.email_id
  where lower(t.signup_action) = 'confirmed'
)
select
  round(cast(signup_total as numeric) / total, 2) as confirm_rate
from cte, cte2

Pharmacy Analytics (Part 4) [CVS Health SQL Interview Question]

CVS Health is trying to better understand its pharmacy sales, and how well different drugs are
selling.
Write a query to find the top 2 drugs sold, in terms of units sold, for each manufacturer. List your
results in alphabetical order by manufacturer.

pharmacy_sales Table:

Column Name Type

product_id integer

units_sold integer

total_sales decimal

Cogs decimal

manufacturer varchar

Drug varchar

pharmacy_sales Example Input:

product_id units_sold total_sales cogs manufacturer drug

94 132362 2041758.41 1373721.70 Biogen UP and UP

9 37410 293452.54 208876.01 Eli Lilly Zyprexa

50 90484 2521023.73 2742445.9 Eli Lilly Dermasorb

61 77023 500101.61 419174.97 Biogen Varicose Relief

136 144814 1084258.00 1006447.73 Biogen Burkhart

109 118696 1433109.50 263857.96 Eli Lilly Tizanidine Hydrochloride

Example Output:

manufacturer top_drugs

Biogen Burkhart

Biogen UP and UP

Eli Lilly Tizanidine Hydrochloride

Eli Lilly TA Complete Kit

Explanation

Biogen sold 144,814 units of Burkhart drug (ranked 1) followed by the second highest with
132,362 units of UP and UP drug (ranked 2).

Eli Lilly sold 118,696 units of Tizanidine Hydrochloride drug (ranked 1) followed by the second
highest with 90,484 units of TA Complete Kit drug (ranked 2).

The dataset you are querying against may have different input & output - this is just an
example!

WITH CTE AS (
  -- rank each manufacturer's drugs by units sold, highest first
  SELECT MANUFACTURER, DRUG,
         ROW_NUMBER() OVER(PARTITION BY MANUFACTURER ORDER BY UNITS_SOLD DESC) AS RK
  FROM PHARMACY_SALES
)
SELECT MANUFACTURER, DRUG AS TOP_DRUGS
FROM CTE
WHERE RK <= 2
ORDER BY MANUFACTURER, RK

Supercloud Customer [Microsoft SQL Interview Question]
A Microsoft Azure Supercloud customer is a company which buys at least 1 product from each
product category.

Write a query to report the company ID which is a Supercloud customer.

As of 5 Dec 2022, data in the customer_contracts and products tables were updated.

customer_contracts Table:

Column Name Type

customer_id integer

product_id integer

amount integer

customer_contracts Example Input:

customer_id product_id Amount

1 1 1000

1 3 2000

1 5 1500

2 2 3000

2 6 2000

products Table:

Column Name Type

product_id integer

product_category string

product_name string
products Example Input:

product_id product_category product_name

1 Analytics Azure Databricks

2 Analytics Azure Stream Analytics

4 Containers Azure Kubernetes Service

5 Containers Azure Service Fabric

6 Compute Virtual Machines

7 Compute Azure Functions

Example Output:

customer_id

1

Explanation:

Customer 1 bought from Analytics, Containers, and Compute categories of Azure, and thus is a
Supercloud customer. Customer 2 isn't a Supercloud customer, since they don't buy any
container services from Azure.

The dataset you are querying against may have different input & output - this is just an
example!

with cte as (
  -- number of product categories that exist
  select count(distinct product_category) as pct from products
), cte2 as (
  -- number of categories each customer has bought from
  select c.customer_id, count(distinct p.product_category) as ict
  from customer_contracts c
  inner join products p on c.product_id = p.product_id
  group by c.customer_id
)
select cte2.customer_id from cte
inner join cte2 on cte.pct = cte2.ict
order by cte2.customer_id

User Shopping Sprees [Amazon SQL Interview Question]

In an effort to identify high-value customers, Amazon asked for your help to obtain data about
users who go on shopping sprees. A shopping spree occurs when a user makes purchases on 3
or more consecutive days.

List the user IDs who have gone on at least 1 shopping spree in ascending order.
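
One way to test for "3 or more consecutive days" is sketched below. It is an illustration under assumptions, not the official solution: it uses Python's built-in sqlite3 module (SQLite 3.25 or newer for LAG), the sample rows from the example input with ISO-formatted dates, and the hypothetical names days, lagged and d_two_back.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE transactions (user_id INTEGER, amount REAL, transaction_date TEXT);
INSERT INTO transactions VALUES
  (1, 9.99, '2022-08-01 10:00:00'), (1, 55, '2022-08-17 10:00:00'),
  (2, 149.5, '2022-08-05 10:00:00'), (2, 4.89, '2022-08-06 10:00:00'),
  (2, 34, '2022-08-07 10:00:00');
""")

# Reduce to one row per user per calendar day, then compare each day with the
# purchase day two rows earlier: a gap of exactly 2 days means 3 days in a row.
spree_users = [r[0] for r in conn.execute("""
WITH days AS (
  SELECT DISTINCT user_id, DATE(transaction_date) AS d FROM transactions
), lagged AS (
  SELECT user_id, d,
         LAG(d, 2) OVER (PARTITION BY user_id ORDER BY d) AS d_two_back
  FROM days
)
SELECT DISTINCT user_id
FROM lagged
WHERE JULIANDAY(d) - JULIANDAY(d_two_back) = 2
ORDER BY user_id
""")]

print(spree_users)
```

Comparing only a user's first and last purchase dates is not enough, because a user with many purchases spread over months can still contain a 3-day run somewhere in the middle.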

transactions Table:

Column Name Type

user_id integer

amount float

transaction_date timestamp

transactions Example Input:

user_id amount transaction_date

1 9.99 08/01/2022 10:00:00

1 55 08/17/2022 10:00:00

2 149.5 08/05/2022 10:00:00

2 4.89 08/06/2022 10:00:00

2 34 08/07/2022 10:00:00

Example Output:

user_id

2

Explanation

In this example, user_id 2 is the only one who has gone on a shopping spree.

The dataset you are querying against may have different input & output - this is just an
example!

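with days as (
  -- one row per user per calendar day
  select distinct user_id, cast(transaction_date as date) as d
  from transactions
), runs as (
  select user_id, d,
         lag(d, 2) over(partition by user_id order by d) as d_2_back
  from days
)
-- a purchase day exactly 2 days after the day two rows back means 3 consecutive days
select distinct user_id
from runs
where d - d_2_back = 2
order by user_id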

Histogram of Users and Purchases [Walmart SQL Interview Question]

This is the same question as problem #13 in the SQL Chapter of Ace the Data Science Interview!

Assume you are given the table on Walmart user transactions. Based on a user's most recent
transaction date, write a query to obtain the users and the number of products bought.

Output the user's most recent transaction date, user ID and the number of products sorted by
the transaction date in chronological order.

P.S. As of 10 Nov 2022, the official solution was changed from output of the transaction date,
number of users and number of products to the current output.

user_transactions Table:

Column Name Type

product_id integer

user_id integer

Spend decimal

transaction_date timestamp

user_transactions Example Input:

product_id user_id spend transaction_date

3673 123 68.90 07/08/2022 12:00:00

9623 123 274.10 07/08/2022 12:00:00

1467 115 19.90 07/08/2022 12:00:00

2513 159 25.00 07/08/2022 12:00:00

1452 159 74.50 07/10/2022 12:00:00

Example Output:

transaction_date user_id purchase_count

07/08/2022 12:00:00 115 1

07/08/2022 12:00:00 123 2

07/10/2022 12:00:00 159 1

The dataset you are querying against may have different input & output - this is just an
example!

with cte as (
  -- most recent transaction date per user
  select user_id, max(transaction_date) as maxd
  from user_transactions
  group by user_id
)
select u.transaction_date, u.user_id, count(u.product_id) as purchase_count
from cte
inner join user_transactions u on cte.user_id = u.user_id
  and u.transaction_date = cte.maxd
group by u.transaction_date, u.user_id
order by u.transaction_date, u.user_id

Card Launch Success [JPMorgan Chase SQL Interview Question]

Your team at JPMorgan Chase is soon launching a new credit card. You are asked to estimate
how many cards you'll issue in the first month.
Before you can answer this question, you want to first get some perspective on how well new
credit card launches typically do in their first month.

Write a query that outputs the name of the credit card, and how many cards were issued in its
launch month. The launch month is the earliest record in the monthly_cards_issued table for
a given card. Order the results starting from the biggest issued amount.

monthly_cards_issued Table:

Column Name Type

issue_month integer

issue_year integer

card_name string

issued_amount integer

monthly_cards_issued Example Input:

issue_month issue_year card_name issued_amount

1 2021 Chase Sapphire Reserve 170000

2 2021 Chase Sapphire Reserve 175000

3 2021 Chase Sapphire Reserve 180000

3 2021 Chase Freedom Flex 65000

4 2021 Chase Freedom Flex 70000

Example Output:

card_name issued_amount

Chase Sapphire Reserve 170000

Chase Freedom Flex 65000

Explanation

Chase Sapphire Reserve card was launched on 1/2021 with an issued amount of 170,000 cards
and the Chase Freedom Flex card was launched on 3/2021 with an issued amount of 65,000
cards.

The dataset you are querying against may have different input & output - this is just an
example!

with cte as (
  -- rk = 1 marks each card's earliest (launch) month
  select card_name, issued_amount,
         row_number() over(partition by card_name order by
           issue_year, issue_month) as rk
  from monthly_cards_issued
)
select card_name, issued_amount from cte
where rk = 1
order by issued_amount desc

LinkedIn Power Creators (Part 2) [LinkedIn SQL Interview Question]

The LinkedIn Creator team is looking for power creators who use their personal profile as a
company or influencer page. This means that if someone's LinkedIn page has more followers than
all the companies they work for, we can safely assume that person is a Power Creator. Keep in
mind that if a person works at multiple companies, we should take into account the company
with the most followers.

Write a query to return the IDs of these LinkedIn power creators in ascending order.

Assumptions:

• A person can work at multiple companies.

• In the case of multiple companies, use the one with largest follower base.

This is the second part of the question, so make sure you start with Part 1 if you haven't
completed that yet!

personal_profiles Table:

Column Name Type

profile_id integer

Name string

followers integer

personal_profiles Example Input:

profile_id name Followers

1 Nick Singh 92,000

2 Zach Wilson 199,000

3 Daliana Liu 171,000

4 Ravit Jain 107,000

5 Vin Vashishta 139,000

6 Susan Wojcicki 39,000

employee_company Table:

Column Name Type

personal_profile_id integer

company_id integer
employee_company Example Input:

personal_profile_id company_id

1 4

1 9

2 2

3 1

4 3

5 6

6 5

company_pages Table:

Column Name Type

company_id integer

Name string

followers integer

company_pages Example Input:

company_id Name followers

1 The Data Science Podcast 8,000

2 Airbnb 700,000

3 The Ravit Show 6,000

4 DataLemur 200

5 YouTube 1,6000,000

6 DataScience.Vin 4,500

9 Ace The Data Science Interview 4479

Example Output:

profile_id

This output shows that profile IDs 1-5 are all power creators, meaning that they have more
followers than each of their company pages, whether they work for 1 company or 3.

The dataset you are querying against may have different input & output - this is just an
example!

WITH CTE AS (
  -- MAX, not SUM: compare against the employer with the most followers
  SELECT EC.PERSONAL_PROFILE_ID,
         MAX(C.FOLLOWERS) AS FLW
  FROM employee_company EC
  INNER JOIN company_pages C ON EC.COMPANY_ID = C.COMPANY_ID
  GROUP BY EC.PERSONAL_PROFILE_ID
), CTE2 AS (
  SELECT PROFILE_ID, FOLLOWERS
  FROM PERSONAL_PROFILES
)
SELECT CTE2.PROFILE_ID FROM CTE
INNER JOIN CTE2 ON CTE.PERSONAL_PROFILE_ID = CTE2.PROFILE_ID
WHERE CTE2.FOLLOWERS > CTE.FLW
ORDER BY CTE2.PROFILE_ID

First Transaction [Etsy SQL Interview Question]

This is the same question as problem #9 in the SQL Chapter of Ace the Data Science Interview!

Assume you are given the table below on user transactions. Write a query to obtain the list of
customers whose first transaction was valued at $50 or more. Output the number of users.

Clarification:

• Use the transaction_date field to determine which transaction should be labeled as
the first for each user.
• Use a specific function (we can't give too much away!) to account for scenarios where a
user had multiple transactions on the same day, and one of those was the first.

user_transactions Table:

Column Name Type

transaction_id integer

user_id integer

Spend decimal

transaction_date timestamp
user_transactions Example Input:

transaction_id user_id spend transaction_date

759274 111 49.50 02/03/2022 00:00:00

850371 111 51.00 03/15/2022 00:00:00

615348 145 36.30 03/22/2022 00:00:00

137424 156 151.00 04/04/2022 00:00:00

248475 156 87.00 04/16/2022 00:00:00

Example Output:

users

1

Explanation: Only user 156 has a first transaction valued over $50.

The dataset you are querying against may have different input & output - this is just an
example!

select count(distinct user_id) as users
from (
  -- first_tran = 1 marks each user's earliest transaction
  select user_id, spend,
         rank() over(partition by user_id order by transaction_date asc) as first_tran
  from user_transactions
) t
where t.spend >= 50.00
  and t.first_tran = 1

International Call Percentage [Verizon SQL Interview Question]

A phone call is considered an international call when the person calling is in a different country
than the person receiving the call.

What percentage of phone calls are international? Round the result to 1 decimal.

Assumption:

• The caller_id in phone_info table refers to both the caller and receiver.

phone_calls Table:

Column Name Type

caller_id integer

receiver_id integer

call_time timestamp

phone_calls Example Input:

caller_id receiver_id call_time

1 2 2022-07-04 10:13:49

1 5 2022-08-21 23:54:56

5 1 2022-05-13 17:24:06

5 6 2022-03-18 12:11:49
phone_info Table:

Column Name Type

caller_id integer

country_id integer

network integer

phone_number string

phone_info Example Input:

caller_id country_id network phone_number

1 US Verizon +1-212-897-1964

2 US Verizon +1-703-346-9529

3 US Verizon +1-650-828-4774

4 US Verizon +1-415-224-6663

5 IN Vodafone +91 7503-907302

6 IN Vodafone +91 2287-664895

Example Output:

international_calls_pct

50.0

Explanation

There is a total of 4 calls with 2 of them being international calls (from caller_id 1 => receiver_id
5, and caller_id 5 => receiver_id 1). Thus, 2/4 = 50.0%
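
The 2/4 count above can be reproduced by joining phone_info twice, once per side of the call. The sketch below assumes Python's built-in sqlite3 module as the engine and stores country_id as text, since the sample values are country codes; the aliases c and r are only illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE phone_calls (caller_id INTEGER, receiver_id INTEGER, call_time TEXT);
CREATE TABLE phone_info (caller_id INTEGER, country_id TEXT,
                         network TEXT, phone_number TEXT);
INSERT INTO phone_calls VALUES
  (1, 2, '2022-07-04 10:13:49'), (1, 5, '2022-08-21 23:54:56'),
  (5, 1, '2022-05-13 17:24:06'), (5, 6, '2022-03-18 12:11:49');
INSERT INTO phone_info VALUES
  (1, 'US', 'Verizon', '+1-212-897-1964'), (2, 'US', 'Verizon', '+1-703-346-9529'),
  (3, 'US', 'Verizon', '+1-650-828-4774'), (4, 'US', 'Verizon', '+1-415-224-6663'),
  (5, 'IN', 'Vodafone', '+91 7503-907302'), (6, 'IN', 'Vodafone', '+91 2287-664895');
""")

# c supplies the caller's country, r the receiver's; a call is international
# when the two differ. Multiplying by 100.0 avoids integer division.
international_calls_pct = conn.execute("""
SELECT ROUND(100.0 * SUM(CASE WHEN c.country_id <> r.country_id THEN 1 ELSE 0 END)
             / COUNT(*), 1)
FROM phone_calls p
JOIN phone_info c ON c.caller_id = p.caller_id
JOIN phone_info r ON r.caller_id = p.receiver_id
""").fetchone()[0]

print(international_calls_pct)
```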

The dataset you are querying against may have different input & output - this is just an
example!
WITH CTE AS (
  SELECT
    -- I is the caller's record, I1 the receiver's
    SUM(CASE WHEN I.COUNTRY_ID <> I1.COUNTRY_ID THEN 1 ELSE 0 END)
      AS INTERNATIONAL_CALLS,
    COUNT(*) AS TOTAL_CALLS
  FROM PHONE_CALLS P
  LEFT JOIN PHONE_INFO I ON P.CALLER_ID = I.CALLER_ID
  LEFT JOIN PHONE_INFO I1 ON P.RECEIVER_ID = I1.CALLER_ID
)
SELECT
  ROUND(100.0 * INTERNATIONAL_CALLS / TOTAL_CALLS, 1) AS INTERNATIONAL_CALLS_PCT
FROM CTE

User Session Activity [Twitter SQL Interview Question]
This is the same question as problem #24 in the SQL Chapter of Ace the Data Science Interview!

Assume you are given the table containing Twitter user session activities.

Write a query that ranks users according to their total session durations (in minutes) in
descending order for each session type between the start date (2022-01-01) and the end date
(2022-02-01).

Output the user ID, session type, and the ranking of the total session duration.

sessions Table:

Column Name Type

session_id Integer

user_id Integer

session_type string ("like", "reply", "retweet")

duration integer (in minutes)

start_date Timestamp

sessions Example Input:

session_id user_id session_type duration start_date

6368 111 Like 3 12/25/2021 12:00:00

1742 111 retweet 6 01/02/2022 12:00:00

8464 222 Reply 8 01/16/2022 12:00:00

7153 111 retweet 5 01/28/2022 12:00:00

3252 333 Reply 15 01/10/2022 12:00:00

Example Output:

user_id session_type ranking

333 reply 1

222 reply 2

111 retweet 1

Explanation: User 333 is listed on the top due to the highest duration of 15 minutes. The ranking
resets on 3rd row as the session type changes.

The dataset you are querying against may have different input & output - this is just an
example!
WITH CTE AS (
  -- total duration per user and session type within the date range
  SELECT USER_ID, SESSION_TYPE, SUM(DURATION) AS TDU
  FROM SESSIONS
  WHERE START_DATE BETWEEN '2022-01-01' AND '2022-02-01'
  GROUP BY USER_ID, SESSION_TYPE
)
SELECT
  USER_ID, SESSION_TYPE,
  RANK() OVER(PARTITION BY SESSION_TYPE ORDER BY TDU DESC) AS RANKING
FROM CTE
ORDER BY SESSION_TYPE, RANKING

Unique Money Transfer Relationships [PayPal SQL Interview Question]

You are given a table of PayPal payments showing the payer, the recipient, and the amount paid.
A two-way unique relationship is established when two people send money back and forth.
Write a query to find the number of two-way unique relationships in this data.

Assumption:

• A payer can send money to the same recipient multiple times.

payments Table:

Column Name Type

payer_id integer

recipient_id integer

amount integer

payments Example Input:

payer_id recipient_id amount

101 201 30

201 101 10

101 301 20

301 101 80

201 301 70

Example Output:

unique_relationships

2

Explanation

There are 2 unique two-way relationships between:

• ID 101 and ID 201

• ID 101 and ID 301
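
The count of 2 can be verified with a set intersection of (payer, recipient) pairs and their reversed counterparts. This sketch assumes Python's built-in sqlite3 module as the engine. The last inserted row is a hypothetical extra payment, not part of the example input, added only to exercise the assumption that a payer can pay the same recipient more than once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE payments (payer_id INTEGER, recipient_id INTEGER, amount INTEGER);
INSERT INTO payments VALUES
  (101, 201, 30), (201, 101, 10), (101, 301, 20), (301, 101, 80), (201, 301, 70),
  (101, 201, 5);  -- hypothetical repeat payment, not in the example input
""")

# INTERSECT keeps only pairs that also exist in the opposite direction and
# removes duplicates, so each two-way relationship appears exactly twice
# (once per direction); dividing by 2 counts each relationship once.
unique_relationships = conn.execute("""
SELECT COUNT(*) / 2
FROM (
  SELECT payer_id, recipient_id FROM payments
  INTERSECT
  SELECT recipient_id, payer_id FROM payments
)
""").fetchone()[0]

print(unique_relationships)
```

The one-way payment from 201 to 301 drops out because no row exists in the reverse direction.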

The dataset you are querying against may have different input & output - this is just an
example!

SELECT COUNT(PAYER_ID)/2 AS UNIQUE_RELATIONSHIPS FROM
(
  -- pairs that also exist in the reverse direction; INTERSECT removes duplicates
  SELECT PAYER_ID, RECIPIENT_ID FROM PAYMENTS
  INTERSECT
  SELECT RECIPIENT_ID, PAYER_ID FROM PAYMENTS
) AS T

Email Table Transformation [Facebook SQL Interview Question]
Each Facebook user can designate a personal email address, a business email address, and a
recovery email address.

Unfortunately, the table is currently in the wrong format, so you need to transform its structure to
show the following columns (see example output): user id, personal email, business email, and
recovery email. Sort your answer by user id in ascending order.

users Table:

Column Name Type

user_id integer

email_type varchar

email varchar

users Example Input:

user_id email_type Email

123 personal [email protected]

123 business [email protected]

123 recovery [email protected]

234 personal [email protected]

234 business [email protected]

Example Output:

user_id personal business recovery

123 [email protected] [email protected] [email protected]

234 [email protected] [email protected]

Explanation

This task is basically just asking you to pivot/transform the shape of the data. It's all the same
data as the input above, just in different format.

Each row will represent a single user with all three of their emails listed. The first row shows User
ID 123 (who may or may not be Nick Singh); their personal email is [email protected], their
business email is [email protected], and so on.
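
The pivot described above can be sketched with one conditional aggregate per output column. The example assumes Python's built-in sqlite3 module as the engine. Because the addresses in the example input are redacted, the values below (such as p123@example.com) are hypothetical placeholders.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (user_id INTEGER, email_type TEXT, email TEXT);
-- placeholder addresses: the real ones are redacted in the example input
INSERT INTO users VALUES
  (123, 'personal', 'p123@example.com'),
  (123, 'business', 'b123@example.com'),
  (123, 'recovery', 'r123@example.com'),
  (234, 'personal', 'p234@example.com'),
  (234, 'business', 'b234@example.com');
""")

# Each CASE yields the email only for its own type and NULL otherwise;
# MAX then picks the single non-NULL value within each user's group.
pivoted = conn.execute("""
SELECT user_id,
       MAX(CASE WHEN email_type = 'personal' THEN email END) AS personal,
       MAX(CASE WHEN email_type = 'business' THEN email END) AS business,
       MAX(CASE WHEN email_type = 'recovery' THEN email END) AS recovery
FROM users
GROUP BY user_id
ORDER BY user_id
""").fetchall()

for row in pivoted:
    print(row)
```

User 234 has no recovery row, so that column comes back as NULL (None in Python), matching the blank cell in the example output.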

The dataset you are querying against may have different input & output - this is just an
example!

SELECT
  USER_ID,
  -- one conditional aggregate per output column
  MAX(CASE WHEN lower(EMAIL_TYPE) = 'personal' THEN Email else null end) AS personal,
  MAX(CASE WHEN lower(EMAIL_TYPE) = 'business' THEN Email else null end) AS business,
  MAX(CASE WHEN lower(EMAIL_TYPE) = 'recovery' THEN Email else null end) AS recovery
FROM USERS
GROUP BY USER_ID
ORDER BY USER_ID

Photoshop Revenue Analysis [Adobe SQL Interview Question]
For every customer that bought Photoshop, return a list of the customers, and the total spent on
all the products except for Photoshop products.

Sort your answer by customer ids in ascending order.

adobe_transactions Table:

Column Name Type

customer_id integer

product string

revenue integer

adobe_transactions Example Input:

customer_id product revenue

123 Photoshop 50

123 Premier Pro 100

123 After Effects 50

234 Illustrator 200

234 Premier Pro 100

Example Output:

customer_id revenue

123 150

Explanation: User 123 bought Photoshop, Premier Pro + After Effects, spending $150 for those
products. We don't output user 234 because they didn't buy Photoshop.

The dataset you are querying against may have different input & output - this is just an
example!

SELECT customer_id, sum(revenue) AS revenue FROM ADOBE_TRANSACTIONS
WHERE CUSTOMER_ID IN
(
  -- customers who bought Photoshop at least once
  SELECT CUSTOMER_ID FROM ADOBE_TRANSACTIONS
  WHERE PRODUCT = 'Photoshop'
)
AND PRODUCT NOT IN ('Photoshop')
group by customer_id
order by customer_id

Repeat Purchases on Multiple Days [Stitch Fix SQL Interview Question]
This is the same question as problem #7 in the SQL Chapter of Ace the Data Science Interview!

Assume you are given the table below containing information on user purchases. Write a query
to obtain the number of users who purchased the same product on two or more different days.
Output the number of unique users.

PS. On 26 Oct 2022, we expanded the purchases data set, thus the official output may vary from
before.

purchases Table:

Column Name Type

user_id integer

product_id integer

quantity integer

purchase_date datetime
purchases Example Input:

user_id product_id quantity purchase_date

536 3223 6 01/11/2022 12:33:44

827 3585 35 02/20/2022 14:05:26

536 3223 5 03/02/2022 09:33:28

536 1435 10 03/02/2022 08:40:00

827 2452 45 04/09/2022 00:00:00

Example Output:

repeat_purchasers

The dataset you are querying against may have different input & output - this is just an
example!

WITH CTE AS (
  -- user/product pairs bought on two or more distinct calendar days
  SELECT USER_ID, PRODUCT_ID
  FROM PURCHASES
  GROUP BY USER_ID, PRODUCT_ID
  HAVING COUNT(DISTINCT CAST(PURCHASE_DATE AS DATE)) > 1
)
SELECT COUNT(DISTINCT USER_ID) AS REPEAT_PURCHASERS
FROM CTE

Compressed Mode [Alibaba SQL Interview Question]

Given a table containing the item count for each order and the frequency of orders with that item
count, write a query to determine the mode of the number of items purchased per order on
Alibaba. If there are several item counts with the same frequency, you should sort them in
ascending order.

Effective April 22nd, 2023, the problem statement and solution have been revised for enhanced
clarity.

items_per_order Table:

Column Name Type

item_count integer

order_occurrences integer

items_per_order Example Input:

item_count order_occurrences

1 500

2 1000

3 800

4 1000
Example Output:

mode

2

4

Explanation

Based on the example input, the order_occurrences value of 1000 corresponds to the
highest frequency among all item counts. Specifically, both item counts of 2 and 4 have occurred
1000 times, so they tie as the mode and are both returned.

The dataset you are querying against may have different input & output - this is just an
example!

SELECT I.ITEM_COUNT AS MODE FROM ITEMS_PER_ORDER AS I
INNER JOIN
(
  -- highest order frequency in the table
  SELECT MAX(ORDER_OCCURRENCES) AS MAX_OCCURRENCES
  FROM ITEMS_PER_ORDER
) T ON I.ORDER_OCCURRENCES = T.MAX_OCCURRENCES
ORDER BY I.ITEM_COUNT

Compensation Outliers [Accenture SQL Interview Question]

Your team at Accenture is helping a Fortune 500 client revamp their compensation and benefits
program. The first step in this analysis is to manually review employees who are potentially
overpaid or underpaid.

An employee is considered to be potentially overpaid if they earn more than 2 times the
average salary for people with the same title. Similarly, an employee might be underpaid if they
earn less than half of the average for their title. We'll refer to employees who are either
underpaid or overpaid as compensation outliers for the purposes of this problem.

Write a query that shows the following data for each compensation outlier: employee ID, salary,
and whether they are potentially overpaid or potentially underpaid (refer to Example Output
below).

employee_pay Table:

Column Name    Type
employee_id    integer
salary         integer
title          varchar

employee_pay Example Input:

employee_id    salary    title
101            80000     Data Analyst
102            90000     Data Analyst
103            100000    Data Analyst
104            30000     Data Analyst
105            120000    Data Scientist
106            100000    Data Scientist
107            80000     Data Scientist
108            310000    Data Scientist

Example Output:

employee_id    salary    status
104            30000     Underpaid
108            310000    Overpaid

Explanation

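In this example, 2 employees qualify as compensation outliers. Employee 104 is a Data Analyst,
and the average salary for this title is $75,000. Their salary of $30,000 is less than
$37,500 (half of $75,000); therefore, they are potentially underpaid. Employee 108 is a Data
Scientist, and the average salary for this title is $152,500. Their salary of $310,000 is more
than $305,000 (twice $152,500); therefore, they are potentially overpaid.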
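
The thresholds behind this explanation can be reproduced with a few lines of Python. This is only a sketch of the arithmetic on the example input, not part of the original solution; the helper name thresholds_by_title is hypothetical.

```python
from collections import defaultdict
from statistics import mean

# (title, salary) pairs taken from the employee_pay example input.
SALARIES = [
    ("Data Analyst", 80000),
    ("Data Analyst", 90000),
    ("Data Analyst", 100000),
    ("Data Analyst", 30000),
    ("Data Scientist", 120000),
    ("Data Scientist", 100000),
    ("Data Scientist", 80000),
    ("Data Scientist", 310000),
]


def thresholds_by_title(rows):
    """Return {title: (underpaid_below, overpaid_above)} from the title average."""
    grouped = defaultdict(list)
    for title, salary in rows:
        grouped[title].append(salary)
    return {
        title: (mean(values) / 2, mean(values) * 2)
        for title, values in grouped.items()
    }


for title, (low, high) in thresholds_by_title(SALARIES).items():
    print(title, low, high)
```

A salary below the first number or above the second marks an employee as a compensation outlier for that title.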

The dataset you are querying against may have different input & output - this is just an
example!

Gimme a Hint

select
    e.employee_id, e.salary,
    -- compare each salary with the average for the same title
    case when e.salary < t.avgsal / 2 then 'Underpaid'
         when e.salary > t.avgsal * 2 then 'Overpaid' end as status
from employee_pay e
inner join (
    select title, avg(salary) as avgsal
    from employee_pay
    group by title
) t on e.title = t.title
-- keep only the outliers instead of hard-coding employee ids
where e.salary < t.avgsal / 2
   or e.salary > t.avgsal * 2
order by e.employee_id
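
To check an outlier query like this against the example input, it can be run on an in-memory database. The sketch below assumes SQLite through Python's built-in sqlite3 module, which may differ from the engine the original problem uses. The names find_outliers, OUTLIER_QUERY and avg_salary are hypothetical, and the filter is written from the problem statement's definitions (more than twice, or less than half, the title average).

```python
import sqlite3

# Example input from the problem statement: (employee_id, salary, title).
EMPLOYEES = [
    (101, 80000, "Data Analyst"),
    (102, 90000, "Data Analyst"),
    (103, 100000, "Data Analyst"),
    (104, 30000, "Data Analyst"),
    (105, 120000, "Data Scientist"),
    (106, 100000, "Data Scientist"),
    (107, 80000, "Data Scientist"),
    (108, 310000, "Data Scientist"),
]

# The WHERE clause keeps only outliers, so the CASE needs just two branches.
OUTLIER_QUERY = """
SELECT e.employee_id,
       e.salary,
       CASE WHEN e.salary > 2 * t.avg_salary THEN 'Overpaid'
            ELSE 'Underpaid' END AS status
FROM employee_pay AS e
INNER JOIN (
    SELECT title, AVG(salary) AS avg_salary
    FROM employee_pay
    GROUP BY title
) AS t ON e.title = t.title
WHERE e.salary > 2 * t.avg_salary
   OR e.salary < t.avg_salary / 2
ORDER BY e.employee_id
"""


def find_outliers(rows):
    """Load rows into an in-memory SQLite table and return the outlier rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE employee_pay "
        "(employee_id INTEGER, salary INTEGER, title TEXT)"
    )
    conn.executemany("INSERT INTO employee_pay VALUES (?, ?, ?)", rows)
    outliers = conn.execute(OUTLIER_QUERY).fetchall()
    conn.close()
    return outliers


for row in find_outliers(EMPLOYEES):
    print(row)
```

Filtering on the salary thresholds, rather than on a fixed list of employee ids, keeps the query valid when the underlying dataset differs from the example.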
