Interview Prep Guide
Written by Karun Thankachan, Sai Kumar Bysani, Meri Nova, Dawn Choo
About the Authors
I am Sai Kumar. Three years ago I moved to the United States to pursue my master's in data
science at the University of Connecticut. I had 0 work experience in the Data field and 0
corporate work experience. I am currently working as a lead data analyst at Blue Cross Blue
Shield of South Carolina. I also worked as a Data Scientist Intern at LEGO. I have
been consistently creating data content and giving back to the data community
ever since. I have been featured on Times Square, Fox, NBC, and in 20 more articles for my
work in the field of data science. Feel free to send me a message on LinkedIn :)
I am also the founder of Ask Data Dawn – a career coaching service aimed at helping others
get their dream job in the Data field.
Karun Thankachan
I am also a mentor on Topmate! For free Data Science Coaching and Interview Preparation
Resources, check out https://topmate.io/karun
I also recently started a community for folks to continue progressing in Data Science. Check it
out here and follow to get a chance to join this invite-only community:
https://www.linkedin.com/company/buildml
Meri Nova
Nowadays, I am building the largest Data Community called Break Into Data. I invite the
most influential experts in Data for weekly speaker sessions and host practical project-based
workshops and coding challenges. Join us to upskill and land your next job in data!
Let’s Get Started!
Welcome to your one-stop roadmap to cracking Data Analyst (DA) & Data Scientist
(DS) interviews.
This guide is not for the novice. We assume you have experience coding
(Python/SQL) and have done data-related projects. The guide is for those who
are looking to land a DA/DS role over the next few months.
The guide provides an introduction to the three roles - Data Analyst, Applied Data
Scientist, and Product Data Scientist. It's important to understand what projects in
these roles look like to know which role fits best with your current experience.
Applying to relevant roles will improve your chances of landing one.
The guide then goes over the types of interviews you could face in each role and
links to resources to prepare for them. The key benefit of the guide is that it's a
minimal set of resources that you can finish within a limited time. To use it:
● identify the role you want to crack based on the Role Expectations section
● identify the type of interview you will face in that role
● prepare using the provided resources in the Interview Preparation section
Table of contents
Role Expectations
Data Analyst
Applied Data Scientist
Product Data Scientist
Interview Preparation
Coding
Statistics & Experimentation
Data Modelling and Visualization
Machine Learning and Deep Learning
Product Sense
System Design
Behavioral & Culture Fit
Take-Home Assignment
Role Expectations
Every role in the data domain aims to use data to drive decisions that grow the business.
While there are over a dozen roles (you can find more about a few of them here), in
this roadmap we focus on the following three:
Data Analyst
A data analyst transforms raw data into meaningful insights that guide strategic
decisions and drive business growth. They identify inefficiencies and optimize
workflows, enhancing operational efficiency and reducing costs. By analyzing
customer data, they uncover trends and preferences that improve products and
services, boosting customer satisfaction. Data analysts also play a crucial role in risk
mitigation, detecting potential issues early to prevent problems. They support
business development by identifying new market opportunities and optimizing
resource allocation. Additionally, they ensure data integrity and compliance with
regulatory standards, maintaining organizational credibility and trust.
Business Problem
Defining the problem that needs to be solved. This is the foundational step where the
objectives and goals of the analysis are established.
Data Collection
Gathering data from various sources such as databases, spreadsheets, cloud, APIs,
or other systems. This step involves identifying and acquiring the relevant data
needed for analysis.
Data Exploration
Exploring the data that has been collected to understand its structure, content, and
initial patterns. This step helps in gaining a preliminary understanding of the data.
Data Cleaning and Preprocessing
Identifying and addressing issues like missing values, outliers, duplicates, and
inconsistencies in the data. This may involve using tools like Excel, Python, R, or
SQL. It also includes adding calculated fields if required to prepare the data for
analysis.
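The cleaning steps above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the records, the field names (`order_id`, `region`, `units`, `price`), and the choice of median imputation are all hypothetical, and on a real project you would more likely reach for SQL or a dataframe library.

```python
from statistics import median

# Hypothetical raw records: note the duplicate row, the missing price,
# and the inconsistent casing/whitespace in the "region" field.
raw_rows = [
    {"order_id": 1, "region": " East", "units": 3, "price": 10.0},
    {"order_id": 2, "region": "east ", "units": 1, "price": None},
    {"order_id": 2, "region": "east ", "units": 1, "price": None},
    {"order_id": 3, "region": "WEST", "units": 2, "price": 30.0},
]


def clean(rows):
    """Deduplicate, standardize text, impute missing prices, add a revenue field."""
    # 1. Drop duplicates, keeping the first row seen for each order_id.
    seen, deduped = set(), []
    for row in rows:
        if row["order_id"] not in seen:
            seen.add(row["order_id"])
            deduped.append(dict(row))

    # 2. Fix inconsistencies: trim whitespace and normalize casing.
    for row in deduped:
        row["region"] = row["region"].strip().title()

    # 3. Impute missing prices with the median of the observed prices
    #    (an assumption; the right strategy depends on the data).
    observed = [row["price"] for row in deduped if row["price"] is not None]
    fill_value = median(observed)
    for row in deduped:
        if row["price"] is None:
            row["price"] = fill_value

    # 4. Add a calculated field to prepare the data for analysis.
    for row in deduped:
        row["revenue"] = row["units"] * row["price"]
    return deduped


cleaned = clean(raw_rows)
for row in cleaned:
    print(row)
```

In an interview, be ready to justify each choice, for example why you imputed rather than dropped the row with the missing price.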
Data Analysis
Data Visualization
Ad hoc Requests
Responding to unplanned inquiries or urgent needs for data analysis, often requiring
quick turnaround times. Analysts prioritize, gather, analyze, and present data to address
specific queries or issues raised by stakeholders, adjusting their workflow to
accommodate these additional tasks while balancing ongoing projects.
What does the interview look like?
● Screening Round
○ Initial assessment of basic qualifications and fit for the role.
○ Often conducted via phone or video call.
● Hiring Manager Round
○ Interview with the hiring manager to discuss skills, experience, and role
expectations.
○ Opportunity to ask detailed questions about the team and projects.
● Coding Round (Python/SQL – Depends on Company, Team, Role)
○ Technical assessment focusing on coding skills in Python, SQL, or
both.
○ This may include solving data manipulation problems or writing
queries.
● Take-home Assessment (Depends on Company, Team, Role)
○ Extended assignment to be completed at home within a specified
timeframe.
○ Typically involves analyzing data and presenting findings or solving a
specific problem.
● Panel Interview and/or Executive Round
○ Final stage involves interviews with multiple team members or
executives. It assesses collaborative skills, cultural fit, and ability to
communicate effectively.
Applied Data Scientist
For example, when developing a forecasting model, you want to check whether you have
enough data to detect trends, whether there are trends or seasonality you can use, and
whether the data has so many anomalies or null values that prediction is infeasible.
Feature Engineering
For instance, when forecasting, apart from the historical values there could be
regressors you can add, such as a weekend indicator or a holiday indicator.
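As a rough sketch of what such regressors look like in code, the snippet below builds a weekend indicator and a holiday indicator for each date. The `HOLIDAYS` set, the toy sales history, and the extra `lag_7` feature are assumptions made for illustration, not part of any specific project.

```python
from datetime import date, timedelta

# Hypothetical holiday calendar; in practice this would come from the business
# or from a holiday-calendar source.
HOLIDAYS = {date(2024, 1, 1), date(2024, 12, 25)}


def make_features(day, history):
    """Build one feature row for forecasting `day`.

    `history` maps date -> observed value; lag_7 is the value one week earlier
    (None if that day is not in the history).
    """
    return {
        "date": day,
        "is_weekend": int(day.weekday() >= 5),  # Saturday=5, Sunday=6
        "is_holiday": int(day in HOLIDAYS),
        "lag_7": history.get(day - timedelta(days=7)),
    }


# Toy history: sales of 100, 101, ... for the last seven days of December 2023.
start = date(2023, 12, 25)
history = {start + timedelta(days=i): 100 + i for i in range(7)}

features = [make_features(date(2024, 1, 1) + timedelta(days=i), history) for i in range(7)]
for row in features:
    print(row)
```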
In the forecasting example above, you would weigh metrics such as MAPE, sMAPE, and
MASE, then set a mean forecaster as the baseline and experiment with models such as
ARIMA, ETS, or DeepANT based on prior work on problems similar to yours.
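To make the metrics concrete, here is a minimal sketch of MAPE, sMAPE, and MASE scored against a mean-forecaster baseline. The numbers are toy data, and the MASE shown scales by the in-sample one-step naive error, which is one common convention (seasonal variants also exist).

```python
def mape(actual, forecast):
    """Mean absolute percentage error (undefined when an actual value is 0)."""
    return 100 * sum(abs(a - f) / abs(a) for a, f in zip(actual, forecast)) / len(actual)


def smape(actual, forecast):
    """Symmetric MAPE: the denominator is the mean of |actual| and |forecast|."""
    return 100 * sum(
        abs(a - f) / ((abs(a) + abs(f)) / 2) for a, f in zip(actual, forecast)
    ) / len(actual)


def mase(actual, forecast, train):
    """Mean absolute scaled error: MAE divided by the in-sample one-step naive MAE."""
    naive_mae = sum(abs(train[i] - train[i - 1]) for i in range(1, len(train))) / (len(train) - 1)
    mae = sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)
    return mae / naive_mae


train = [100, 110, 120, 130]   # toy training series
test = [100, 200]              # toy hold-out values

# Baseline: forecast the training mean for every future step.
baseline = [sum(train) / len(train)] * len(test)

print("MAPE :", round(mape(test, baseline), 2))
print("sMAPE:", round(smape(test, baseline), 2))
print("MASE :", round(mase(test, baseline, train), 2))
```

A MASE above 1 means the forecast is worse than the naive "repeat the last value" forecast on the training data, which is a useful sanity check before trying heavier models.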
Productionalization and A/B Testing
For example, in the forecasting example above, you may finalize an ARIMA model
by ensuring backtesting gives MAPE values that are likely to support business
goals. Then you would Dockerize the model, create a batch training and prediction
pipeline on AWS SageMaker, and build a Tableau dashboard to track input drift,
prediction drift, and the MAPE of predictions as live data comes in.
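Backtesting is worth being able to sketch on a whiteboard. The snippet below is one simple way to do it, a rolling-origin evaluation with an expanding training window. The `mean_forecaster` is only a stand-in for whatever model you finalize, and the series is toy data.

```python
def mean_forecaster(train, horizon):
    """Placeholder model: forecast the training mean for every future step."""
    return [sum(train) / len(train)] * horizon


def backtest(series, min_train, horizon, model):
    """Rolling-origin evaluation: refit on an expanding window, then score
    the next `horizon` points with MAPE. Returns one score per fold."""
    scores = []
    for cutoff in range(min_train, len(series) - horizon + 1, horizon):
        train, actual = series[:cutoff], series[cutoff:cutoff + horizon]
        forecast = model(train, horizon)
        fold_mape = 100 * sum(abs(a - f) / abs(a) for a, f in zip(actual, forecast)) / horizon
        scores.append(fold_mape)
    return scores


series = [100, 100, 100, 100, 125, 125, 80, 80]   # toy data
fold_scores = backtest(series, min_train=4, horizon=2, model=mean_forecaster)
print([round(s, 2) for s in fold_scores])
```

Looking at the spread of fold scores, not just the average, tells you how stable the model is likely to be once it runs on live data.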
Note: For online inference pipelines, e.g. a recommender system model that
suggests an item similar to the one you are currently viewing, you may also be
required to A/B test as part of productionalization. The goal is to see whether the
ML solution has a positive impact on user experience compared with the
experience without ML.
The above project requirements are reflected in the interview process. As such, a
typical interview process can involve:
● Coding
● SQL
● Statistics & Experimentation
● Machine Learning and Deep Learning
● System Design
● Take-home Assignment
Product Data Scientist
Product Data Scientists typically sit on a Product team where they work with
Engineers, Product Managers, Designers and other cross-functional partners.
In the tech industry, a "product" can be thought of as anything that the company
makes. It could be as broad as an entire app (e.g. Instagram), or it could be a
specific feature in the app (e.g. Instagram Stories). Most Product Data Scientists
would be staffed on a product team, and the product team is responsible for
monitoring and improving the product that they own.
Successful Product Data Scientists are those who can use data to successfully
inform strategic and tactical decisions. They also invest in helping build out the
team’s data and experimentation infrastructure.
Product Data Scientists dig into user behavior and product usage trends. They
uncover opportunities and risks for the business by analyzing patterns in the data.
Experimentation
Product Data Scientists own experiments end-to-end, from experiment set-up and
metric definition through to analysis and recommendations.
They are often also responsible for building out the experimentation infrastructure to
allow the company to iterate on experiments and learn quickly.
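Part of experiment set-up is deciding how many users you need. A minimal sketch, assuming a two-sided test on two conversion rates and the standard normal-approximation formula (the baseline rate and lift below are made-up inputs):

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_arm(baseline_rate, min_detectable_lift, alpha=0.05, power=0.8):
    """Approximate users needed per arm for a two-sided test on two proportions.

    `min_detectable_lift` is absolute (0.02 means +2 percentage points).
    """
    p1 = baseline_rate
    p2 = baseline_rate + min_detectable_lift
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for significance
    z_beta = NormalDist().inv_cdf(power)           # critical value for power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2)


n = sample_size_per_arm(baseline_rate=0.10, min_detectable_lift=0.02)
print("Users needed per arm:", n)
```

Halving the detectable lift roughly quadruples the required sample, which is why the minimum detectable effect is usually the first thing to negotiate with stakeholders.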
Product Data Scientists develop and maintain regular reports on key product metrics.
They are often the owners of metrics, which means they are responsible for checking
on the metrics, communicating them to the company, and investigating any
anomalies. Their goal is to help the company understand the high-level question of
“how are we doing?”
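One simple way to decide which metric movements deserve investigation is a rolling z-score check. This is a sketch under assumptions: the window length, the threshold, and the daily active user numbers are all illustrative, and real monitoring often needs to account for seasonality as well.

```python
from statistics import mean, stdev


def flag_anomalies(values, window=7, threshold=3.0):
    """Return indices whose value is more than `threshold` standard deviations
    away from the mean of the preceding `window` observations."""
    flagged = []
    for i in range(window, len(values)):
        history = values[i - window:i]
        mu, sigma = mean(history), stdev(history)
        if sigma > 0 and abs(values[i] - mu) / sigma > threshold:
            flagged.append(i)
    return flagged


# Toy metric: daily active users with one sharp drop.
daily_active_users = [1000, 1010, 990, 1005, 995, 1002, 998, 1001, 700, 1003]
anomalies = flag_anomalies(daily_active_users)
print("Anomalous day indices:", anomalies)
```

Note that an anomaly inflates the standard deviation of the windows that follow it, so this naive version can miss a second anomaly shortly after the first.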
Product Data Scientists often collaborate with Data Engineers to design and
implement data models that support product analytics. They help create and
maintain data pipelines to ensure data quality and reliability. A successful
collaboration between Data Science and Data Engineering results in robust data
infrastructure that enables efficient analysis and decision-making.
Note: Make sure you prepare for the interview rounds that correspond to your role.
Disclaimer: Interview processes are quite dynamic, so we cannot guarantee these are
the only types of rounds you will face. However, being proficient in the following
should help you crack any other type of interview you may face.
Coding
● Work through questions on the LeetCode Blind 75. The solutions can be
found on NeetCode. Prioritize Arrays, Strings, and Dynamic Programming, as
these are the most commonly asked.
● Dynamic Programming is going to be the hardest to overcome. Check out
these patterns to make it slightly easier.
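To give a feel for the bottom-up pattern that many Dynamic Programming questions share, here is a sketch of the classic Coin Change problem: define the subproblem, set the base case, then fill a table from small to large.

```python
def coin_change(coins, amount):
    """Fewest coins needed to make `amount`, or -1 if it cannot be made."""
    INF = float("inf")
    # dp[a] = fewest coins that sum to a; dp[0] = 0 is the base case.
    dp = [0] + [INF] * amount
    for a in range(1, amount + 1):
        for coin in coins:
            # Using `coin` last means the rest must sum to a - coin.
            if coin <= a and dp[a - coin] + 1 < dp[a]:
                dp[a] = dp[a - coin] + 1
    return dp[amount] if dp[amount] != INF else -1


print(coin_change([1, 2, 5], 11))  # 5 + 5 + 1
print(coin_change([2], 3))         # not possible
```

The time complexity is O(amount × number of coins) and the space is O(amount); stating both unprompted is usually expected in the interview.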
ML Coding
These rounds are less common. Instead of assessing problem-solving, they test
your understanding of ML algorithms.
To prepare for these rounds, work through coding the most popular ML algorithms:
● Linear Regression
● Logistic Regression
● Decision Trees
● kNN
● K-Means
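As an example of what "coding an algorithm" means in these rounds, below is a minimal sketch of single-feature Linear Regression fitted by batch gradient descent, with no ML libraries. The learning rate, epoch count, and toy data are assumptions chosen so the example converges.

```python
def fit_linear_regression(xs, ys, lr=0.05, epochs=2000):
    """Fit y = w * x + b by batch gradient descent on mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Gradients of MSE = (1/n) * sum((w*x + b - y)^2) with respect to w and b.
        errors = [w * x + b - y for x, y in zip(xs, ys)]
        grad_w = 2 / n * sum(e * x for e, x in zip(errors, xs))
        grad_b = 2 / n * sum(errors)
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]   # generated from y = 2x + 1
w, b = fit_linear_regression(xs, ys)
print(round(w, 4), round(b, 4))
```

Interviewers often follow up by asking what happens if the learning rate is too large or the features are on very different scales, so practice explaining the update rule as well as writing it.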
Emma Ding is a great content creator for everything Data Science.
These are focused on how you would use Python, R or SQL on a real Data project.
For example, they could provide you with a few datasets, a vague business question,
and ask you to uncover insights on the data.
● Data cleaning
● Data visualization and analysis
● Interpreting findings and making recommendations
To prepare for these rounds, work through end-to-end portfolio projects that
require you to clean data and extract analyses from the data. As a start, you can use
some guided projects like this -
However, as you get familiar with the process of building DS solutions, make sure
you’re working on your own Data projects.
You can expect a lot more difficult and different questions based on the company &
the role. 😉
Resources to Learn:
3. Simplilearn - A mix of free and paid resources, suitable for structured learning.
○ Complete SQL Playlist -
https://youtube.com/playlist?list=PLEiEAq2VkUUKL3yPbn8yWnatjUg0P0I-Z&si=rZXNcUNYgsYdLN34
4. SQL for Data Science - https://www.coursera.org/learn/sql-for-data-science -
Great for a structured course with certification
1. LeetCode - https://leetcode.com/
○ This list is for the top 50 SQL questions and is free -
https://leetcode.com/studyplan/top-sql-50/
○ This list is for the top 50 advanced SQL questions -
https://leetcode.com/studyplan/premium-sql-50/ - To solve these you
would need a subscription
2. Interview Query - It also has amazing case studies for Data Analytics and
Data Science - Use my code “VENKATA” to get 10% off if you plan on taking
the subscription - https://www.interviewquery.com/?via=venkata
3. Dataford - It has all the SQL interview questions asked in top companies -
Use my code “SAI20” to get 20% off if you plan on taking the subscription -
https://www.dataford.io/?via=VENKATA
4. HackerRank - https://www.hackerrank.com/
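For a flavor of the questions you will meet on these platforms, here is a typical self-join problem ("find employees who earn more than their manager"), run against an in-memory SQLite database from Python so you can try it locally. The `employee` table and its rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT, salary INTEGER, manager_id INTEGER);
    INSERT INTO employee VALUES
        (1, 'Asha', 90000, NULL),
        (2, 'Ben', 95000, 1),
        (3, 'Chen', 70000, 1),
        (4, 'Dee', 72000, 3);
    """
)

# Self-join: pair each employee (e) with their manager (m),
# then keep the employees who out-earn that manager.
query = """
    SELECT e.name
    FROM employee AS e
    JOIN employee AS m ON e.manager_id = m.id
    WHERE e.salary > m.salary
    ORDER BY e.name;
"""
result = [row[0] for row in conn.execute(query)]
print(result)
conn.close()
```

Practicing this way lets you check edge cases yourself, such as the top-level employee whose `manager_id` is NULL and who therefore drops out of the inner join.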
Statistics & Experimentation
● We want to predict which users are likely to upgrade to our premium service.
How would you build and validate a logistic regression model for this
purpose?
● Our e-commerce platform has implemented a new recommendation
algorithm. How would you design an experiment to test if it significantly
improves customer purchase rates?
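For the experiment question above, the analysis step often comes down to comparing two conversion rates. A minimal sketch of a two-sided, pooled two-proportion z-test follows. The conversion counts are invented, and a real analysis would also cover randomization checks, guardrail metrics, and practical significance.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference in conversion rates (B vs. A)."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    # Pooled rate under the null hypothesis that both arms convert equally.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical results: control (old algorithm) vs. treatment (new algorithm).
z, p_value = two_proportion_z_test(conv_a=500, n_a=5000, conv_b=575, n_b=5000)
print(f"z = {z:.2f}, p-value = {p_value:.4f}")
```

Be prepared to explain in plain language what the p-value does and does not tell you, since that is often the real test in this round.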
Resources to Learn:
● "Practical Statistics for Data Scientists" by Peter Bruce and Andrew Bruce
○ Chapter 1: Exploratory Data Analysis
○ Chapter 2: Data and Sampling Distributions
○ Chapter 3: Statistical Experiments and Significance Testing
○ Chapter 4: Regression and Prediction
○ Chapter 5: Classification
● “Trustworthy Online Controlled Experiments” by Ron Kohavi
○ Read the entire book :)
● Familiarize yourself with A/B testing methodologies from experiment design to
analysis
○ Emma Ding’s AB testing cheat sheet
● Practice explaining complex statistical concepts in simple terms
● Use ChatGPT to come up with situations where you would use each statistical
concept in a real-world setting for that specific company.
I am studying for Data Science interviews and I’m currently learning about
[INSERT CONCEPT]. Can you come up with some examples for how a Data
Scientist might use these concepts in [INSERT COMPANY]? Also, can you
come up with some interview questions to test my knowledge on this
concept?
Data Modelling and Visualization
The data modeling and visualization interview aims to assess the candidate's
proficiency in designing data models, their ability to visualize data effectively, and
their understanding of the tools and techniques used in the industry.
Sample Questions
● What visualization would you use to show sales trends over time?
● How would you visualize the relationship between multiple variables?
● How would you design a data model for a retail sales system?
● Describe a situation where you'd use a star schema vs. a snowflake schema
● What are the advantages and disadvantages of denormalization?
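To ground the retail and star schema questions, here is a small sketch of a star schema built in an in-memory SQLite database: one fact table of sales surrounded by dimension tables. All table names, columns, and rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Dimension tables: descriptive attributes, one row per entity.
    CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT, category TEXT);
    CREATE TABLE dim_store   (store_id INTEGER PRIMARY KEY, city TEXT);

    -- Fact table: one row per sale, with keys to each dimension plus numeric measures.
    CREATE TABLE fact_sales (
        sale_id    INTEGER PRIMARY KEY,
        product_id INTEGER REFERENCES dim_product(product_id),
        store_id   INTEGER REFERENCES dim_store(store_id),
        sale_date  TEXT,
        units      INTEGER,
        revenue    REAL
    );

    INSERT INTO dim_product VALUES (1, 'Espresso', 'Coffee'), (2, 'Green Tea', 'Tea');
    INSERT INTO dim_store   VALUES (1, 'Austin'), (2, 'Boston');
    INSERT INTO fact_sales  VALUES
        (1, 1, 1, '2024-01-01', 2, 6.0),
        (2, 2, 1, '2024-01-01', 1, 2.5),
        (3, 1, 2, '2024-01-02', 3, 9.0);
    """
)

# A typical analytical query: one join per dimension, then aggregate the measures.
revenue_by_category = conn.execute(
    """
    SELECT p.category, SUM(f.revenue)
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_id = f.product_id
    GROUP BY p.category
    ORDER BY p.category;
    """
).fetchall()
print(revenue_by_category)
conn.close()
```

In a snowflake schema, `category` would be split out of `dim_product` into its own table. That reduces redundancy but adds a join to queries like the one above, which is the trade-off the denormalization question is probing.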
Tips to learn
Practice Platforms:
● Kaggle - https://www.kaggle.com/
Offers datasets and competitions to practice data modeling and visualization.
● LeetCode - https://leetcode.com/
While primarily for coding, also includes SQL and database design problems.
● DataCamp - https://www.datacamp.com/
Interactive courses and projects to practice data visualization and modeling.
Machine Learning and Deep Learning
These rounds test your ability to fit and tune models and make informed decisions
when it comes to selecting models and metrics.
To prepare for these rounds, focus on the following as these are concepts that are
tested irrespective of the projects you have worked on -
● Read ‘Deep Learning’ by Ian Goodfellow. The following are the main concepts
to learn
○ Chapter 6 - Understanding Basics of Neural Network
○ Chapter 7 - Regularization of Neural Networks
○ Chapter 8 - Optimizing Neural Networks
○ Chapter 9 - CNNs, its variants, and optimizing them
○ Chapter 10 - RNNs, its variants, and optimizing them
In addition to the above, you will be asked questions based on the projects you
have worked on. For your own projects, make sure you can explain the reasoning
behind your ‘design decisions’ i.e.
● Opportunity sizing
● Metric definition
● Metric investigation
Sample Questions
● What are some ways to increase user retention on an online grocery shopping
app?
● You're part of the search team at an e-commerce company. The marketing
team has noticed that the search results for certain product categories are not
as relevant as they could be, leading to lower conversion rates. How would
you approach this problem and improve the search relevance?
This interview tests how you approach a problem that can be solved using machine
learning. A few examples of the type of questions are -
● Read Chip Huyen’s ML System Design book - read everything, but give
special focus to the case studies.
● Check out Jay Feng’s Mock Interview series
● Rarely, companies ask you to code during system design rounds (you can confirm
this with the recruiter before interviews). In case they do, check out these
videos from Exponent.
○ Predict App Deletion
○ Instagram Ranking
○ Predict Netflix Watch Times
○ Fake News Detection System
Behavioral & Culture Fit
Sample Questions
● Tell me about a conflict you had with your co-worker and how you handled it.
● Tell me about a time when you had to convince a partner to do something that
they were initially resistant to.
● Make sure you know every experience and bullet point on your resume, and
are able to talk about it in detail.
● Use this 3-step framework to prepare before every interview
1. Prepare 5 - 10 stories using the STAR framework
● Walmart Sales Forecasting Accuracy - What will the sales be for each store-item
over the next X days?
● Santander Customer Transaction Prediction - Which customers will make a
transaction?
● First, understand the Data Science LifeCycle - use this to structure your
analysis and communicate how you approach a problem
● Next, practice solving problems that require a Data Science solution. You
would typically already have this from your projects/work experience.
However, if you are rusty, brush up on your technical domain of choice using
the ‘Getting Started’ projects on Kaggle
Note: These are not examples of projects you want to put on your resume.
Rather, these can be used to brush up on different technical domains. You
can find more on Kaggle, based on your area of expertise.
● Most importantly, research the company and business domain the team is
working in. Align the direction of solution, business metrics you choose, and
insights you communicate to what would be of interest to the team.
That’s it! Try to relax before your interviews, listen to cues from your interviewers, and
go in with a positive mindset!
Feel free to connect with us on LinkedIn and at Break Into Data PRO Community!
As part of the BID PRO membership, you will gain access to: