Python Interview Questions 1653100147

Uploaded by

Raju Shah


Knowing Python is one of the crucial skills every data scientist should hone, and not without reason. Python's ability, combined with the Pandas library, to manipulate and analyze data in many different ways makes it an ideal tool for a data science job. It comes as no surprise that companies looking for data scientists typically test candidates' Python skills in a job interview.

We'll look at the technical concepts, along with the Python/Pandas functions, you should be familiar with to land a data science job. These are the five topics we'll talk about:

- Aggregation, Grouping, and Ordering Data
- Joining Tables
- Filtering Data
- Text Manipulation
- Datetime Manipulation

It goes without saying that these concepts are rarely tested separately, so solving a single question usually requires knowledge of several Python topics.

Aggregation, Grouping & Ordering Data

These three technical topics often come together, and they are fundamental to creating reports and doing any kind of data analysis. They allow you to perform mathematical operations and present your findings in a presentable and user-friendly way. We'll show you several practical examples to ensure you know what we're talking about.

Python Coding Interview Question #1: Class Performance

This Box interview question asks you:

"You are given a table containing assignment scores of students in a class. Write a query that identifies the largest difference in total score of all assignments. Output just the difference in total score between the two students."

The table you need to use is box_scores, which has the following columns:

    id           int64
    student      object
    assignment1  int64
    assignment2  int64
    assignment3  int64

Data from the table look like this:

    [Table preview: id, student, assignment1, assignment2, assignment3]

As a first step towards answering the question, you should sum the scores from all assignments:

    import pandas as pd
    import numpy as np

    # Sum the three assignment columns into one total score per student
    box_scores['total_score'] = box_scores['assignment1'] + box_scores['assignment2'] + box_scores['assignment3']

This part of the code adds a total_score column to box_scores.

Now that you know the totals, the next step is to find the largest difference between them. You need the max() and min() functions to do that or, to be more specific, the difference between their outputs. Add this to the above code, and you've got the final answer:

    import pandas as pd
    import numpy as np

    box_scores['total_score'] = box_scores['assignment1'] + box_scores['assignment2'] + box_scores['assignment3']
    # Largest total minus smallest total
    box_scores['total_score'].max() - box_scores['total_score'].min()

This is the output you're looking for:

    94

The question asked to output only this difference, so no other columns are needed.

Python Coding Interview Question #2: Inspection Scores For Businesses

The previous question didn't require any data grouping and ordering, unlike the following question by the City of San Francisco:

"Find the median inspection score of each business and output the result along with the business name. Order records based on the inspection score in descending order. Try to come up with your own precise median calculation. In Postgres there is 'percentile_disc' function available, however it's only approximation."

Link to the question: https://platform.stratascratch.com/coding/9741-inspection-scores-for-businesses?python=1

Here, you should use the notnull() function to keep only the businesses that have an inspection score. Additionally, you have to group the data by business_name and calculate the median of inspection_score with the median() function. Finally, use sort_values() to sort the output in descending order.

Python Coding Interview Question #3: Number Of Records By Variety

Take a look at this Microsoft question:

"Find the total number of records that belong to each variety in the dataset. Output the variety along with the corresponding number of records. Order records by the variety in ascending order."

Link to the question: https://platform.stratascratch.com/coding/10168-number-of-records-by-variety?python=1

This shouldn't be hard to solve after the first two examples. First, group by the column variety and pick a column to count, such as sepal_length. To find the number of records per variety, use the count() function, which counts the non-null values in that column. Finally, use sort_values() to sort by variety in alphabetical order.

Joining Tables

In all the previous examples, we were given only one table. We selected these examples so it's easier for you to understand how aggregation, grouping, and ordering data work in Python. However, as a data scientist, you'll more often than not have to write a query that pulls data from several tables.

Python Coding Interview Question #4: Lowest Priced Orders

One of the easiest ways to join two tables in Python is the merge() function. We'll use it to solve this Amazon question:

"Find the lowest order cost of each customer. Output the customer id along with the first name and the lowest order price."

You're given two tables to work with.
The first table is customers:

    id            int64
    first_name    object
    last_name     object
    city          object
    address       object
    phone_number  object

Here's the data:

    [Table preview: id, first_name, last_name, city, address, phone_number]

The second table is named orders and has the following columns:

    id                int64
    cust_id           int64
    order_date        datetime64[ns]
    order_details     object
    total_order_cost  int64

Since you need the data from both tables, you'll have to merge them, which by default is an inner join:

    import pandas as pd
    import numpy as np

    # Inner join: customers.id matches orders.cust_id
    merge = pd.merge(customers, orders, left_on="id", right_on="cust_id")

You join on the column id from the table customers and the column cust_id from the table orders. The result shows the two tables as one:

    [Output: the columns of customers followed by the columns of orders]

Once you've done that, use the groupby() function to group the output by cust_id and first_name. These are the columns the question asks you to show. You need to show the lowest order cost for each customer, too, which you get with the min() function.

The complete answer is thus:

    import pandas as pd
    import numpy as np

    merge = pd.merge(customers, orders, left_on="id", right_on="cust_id")
    # Lowest order cost per customer
    result = merge.groupby(["cust_id", "first_name"])["total_order_cost"].min().reset_index()

This code returns the desired output.

Python Coding Interview Question #5: Income By Title and Gender

Here, we have another question from the City of San Francisco:

"Find the average total compensation based on employee titles and gender. Total compensation is calculated by adding both the salary and bonus of each employee. However, not every employee receives a bonus so disregard employees without bonuses in your calculation. Employee can receive more than one bonus. Output the employee title, gender (i.e., sex), along with the average total compensation."

When answering this question, the first step should be to group the bonuses by worker and use the sum() function to get the total bonus per worker ID. Then you should merge the tables you have at your disposal. This is again an inner join, so employees without bonuses drop out. Once you do that, you can get the total compensation by adding salary and bonus. The last step is to output the employee title, gender, and average total compensation, which you get by using the mean() function.

Python Coding Interview Question #6: Product Transaction Count

Here's a question by Microsoft:

"Find the number of transactions that occurred for each product. Output the product name along with the corresponding number of transactions and order records by the product id in ascending order. You can ignore products without transactions."

Here are some tips on writing the code. First, use the notnull() function to get the products with at least one transaction. Next, inner join this table with the table excel_sql_inventory_data using the merge() function. Use groupby() and transform() to get the number of transactions. Then get rid of the duplicate products and show the number of transactions for every product. Finally, sort the output by product_id.

Data Filtering

When you use Python, you'll usually use it on huge amounts of data. However, you'll rarely need to output all of it. Analyzing data also includes setting criteria to pull only the data you want to see in your output. For that, you need ways of filtering data. While merge() also filters data in a way, here we're talking about using the comparison operators (==, <, >, <=, >=), between(), or other ways to limit the number of rows in the output. Let's see how this is done in Python!

Python Coding Interview Question #7: Find the Top 10 Ranked Songs in 2010

This is a question you could be asked at a Spotify interview:

"What were the top 10 ranked songs in 2010? Output the rank, group name, and song name but do not show the same song twice. Sort the result based on the year_rank in ascending order."

To solve the problem, you need only the table billboard_top_100_year_end:

    id          int64
    year        int64
    year_rank   int64
    group_name  object
    artist      object
    song_name   object

The data from the table looks like this:

    [Table preview: id, year, year_rank, group_name, artist, song_name]

Here's how we approach answering the question:

    import pandas as pd
    import numpy as np

    # Keep rows from 2010 whose rank is between 1 and 10 (inclusive)
    conditions = billboard_top_100_year_end[(billboard_top_100_year_end['year'] == 2010) & (billboard_top_100_year_end['year_rank'].between(1, 10))]

The above code sets up two conditions. The first one uses the '==' operator to select only songs appearing in 2010. The second condition selects only songs that had a ranking between 1 and 10. Running this code returns:

    [Output: all columns for the songs ranked 1 to 10 in 2010]

After that, we need to select only three columns: year_rank, group_name, and song_name. We will also remove duplicates using the drop_duplicates() function. That makes the code complete:

    import pandas as pd
    import numpy as np

    conditions = billboard_top_100_year_end[(billboard_top_100_year_end['year'] == 2010) & (billboard_top_100_year_end['year_rank'].between(1, 10))]
    # Keep the three requested columns and drop repeated songs
    result = conditions[['year_rank', 'group_name', 'song_name']].drop_duplicates()

It will give you the top 10 ranked songs in 2010:

    [Output: year_rank, group_name, song_name]

Python Coding Interview Question #8: Apartments in New York City and Harlem

Try to solve this question by Airbnb:

"Find the search details of 50 apartment searches in the Harlem neighborhood of New York City."

Here are some hints. You need to set three conditions that keep only the apartment category, only listings in Harlem, and only the city NYC. All three conditions are set using the '==' operator. You don't need to show all apartments, so use the head() function to limit the number of rows in the output.

Python Coding Interview Question #9: Duplicate Emails

The last question focused on filtering data is by Salesforce:

"Find all emails with duplicates."

This question is rather simple.
You need to use the groupby() function to group by email and find how many times each email address appears. Then use the '>' operator to keep the email addresses whose count is greater than 1; those are the duplicates.

Manipulating Text

When working with data, you'll have to manipulate it to make it more suitable for your analysis. This is often the case with text data. It includes allocating new values to data according to the text stored, parsing and merging text, or finding its length, the position of a certain letter or sign, and so on.

Python Coding Interview Question #10: Reviews Bins on Reviews Number

The next question is by Airbnb:

"To better understand the effect of the review count on the price of accommodation, categorize the number of reviews into the following groups along with the price.

    0 reviews: NO
    1 to 5 reviews: FEW
    6 to 15 reviews: SOME
    16 to 40 reviews: MANY
    more than 40 reviews: A LOT

Output the price and its categorization. Perform the categorization on accommodation level."

Link to the question: https://platform.stratascratch.com/coding/9628-reviews-bins-on-reviews-number?python=1

You're working with only one table, but one with quite a lot of columns.
The table is airbnb_search_details, and the columns are:

    id                      int64
    price                   float64
    property_type           object
    room_type               object
    amenities               object
    accommodates            int64
    bathrooms               int64
    bed_type                object
    cancellation_policy     object
    cleaning_fee            bool
    city                    object
    host_identity_verified  object
    host_response_rate      object
    host_since              datetime64[ns]
    neighbourhood           object
    number_of_reviews       int64
    review_scores_rating    float64
    zipcode                 int64
    bedrooms                int64
    beds                    int64

Here are the first several rows from the table:

    [Table preview: id, price, property_type, room_type, amenities, accommodates, bathrooms, bed_type, cancellation_policy, ...]

The first step in writing the code should be getting the number of reviews:

    import pandas as pd
    import numpy as np

    num_reviews = airbnb_search_details['number_of_reviews']

You get this:

    [Output: the number_of_reviews column]

Next, you'd want to get the accommodation with 0 reviews, then with 1-5, 6-15, 16-40, and more than 40 reviews. To get that, you'll need a combination of the '==' and '>' operators and the between() function:

    import pandas as pd
    import numpy as np

    num_reviews = airbnb_search_details['number_of_reviews']
    # One boolean condition per review-count bin, in the same order as the categories
    condlist = [num_reviews == 0, num_reviews.between(1, 5), num_reviews.between(6, 15), num_reviews.between(16, 40), num_reviews > 40]

Here's what your current output should look like:

    [Output: one column of TRUE/FALSE values per condition]

Now comes the text part: assigning the categories.
The categories are NO, FEW, SOME, MANY, and A LOT. Your code up until now is:

    import pandas as pd
    import numpy as np

    num_reviews = airbnb_search_details['number_of_reviews']
    condlist = [num_reviews == 0, num_reviews.between(1, 5), num_reviews.between(6, 15), num_reviews.between(16, 40), num_reviews > 40]
    # One label per condition, in the same order
    choicelist = ['NO', 'FEW', 'SOME', 'MANY', 'A LOT']

OK, here are your categories:

    NO  FEW  SOME  MANY  A LOT

The final step is to allocate these categories to the accommodation and list its price:

    import pandas as pd
    import numpy as np

    num_reviews = airbnb_search_details['number_of_reviews']
    condlist = [num_reviews == 0, num_reviews.between(1, 5), num_reviews.between(6, 15), num_reviews.between(16, 40), num_reviews > 40]
    choicelist = ['NO', 'FEW', 'SOME', 'MANY', 'A LOT']
    # A string default avoids mixing the str labels with np.select's numeric default of 0
    airbnb_search_details['reviews_qualification'] = np.select(condlist, choicelist, default='UNKNOWN')
    result = airbnb_search_details[['reviews_qualification', 'price']]

This code will get you the desired output:

    [Output: reviews_qualification and price columns]

Python Coding Interview Question #11: Business Name Lengths

The next question is by the City of San Francisco:

"Find the number of words in each business name. Avoid counting special symbols as words (e.g. &). Output the business name and its count of words."

Link to the question: https://platform.stratascratch.com/coding/10131-business-name-lengths?python=1

When answering the question, you should first find only distinct businesses using the drop_duplicates() function. Then use the replace() function to replace all the special symbols with a blank, so you don't count them later. Use the split() function to split the text into a list of words, and then use the len() function to count them.

Python Coding Interview Question #12: Positions Of Letter 'a'

This question by Amazon asks you to:

"Find the position of the letter 'a' in the first name of the worker 'Amitah'. Use 1-based indexing, e.g. position of the second letter is 2."

There are two main concepts in the solution. The first is filtering the worker 'Amitah' using the '==' operator.
The second one is using the find() function on a string to get the position of the letter 'a'. Because find() returns a 0-based position, add 1 to match the 1-based indexing the question asks for.

Manipulating Datetime

As a data scientist, you'll be working with dates a lot. Depending on the data available, you could be asked to convert data to datetime, extract a certain period of time (such as a month or year), or manipulate datetime in any other suitable way.

Python Coding Interview Question #13: Number of Comments Per User in Past 30 days

Here's a question by Meta/Facebook:

"Return the total number of comments received for each user in the last 30 days. Don't output users who haven't received any comment in the defined time period. Assume today is 2020-02-10."

You can find the data in the table fb_comments_count:

    user_id             int64
    created_at          datetime64[ns]
    number_of_comments  int64

Here is a sample of the data (the number_of_comments values are missing from the preview):

    user_id  created_at
    18       2019-12-29 00:00:00
    25       2019-12-21 00:00:00
    78       2020-01-04 00:00:00
    37       2020-02-01 00:00:00
    41       2019-12-23 00:00:00

Have a look at the solution, and then we'll explain it below:

    import pandas as pd
    from datetime import timedelta

    # Keep comments from the 30 days up to and including 2020-02-10, then total them per user
    result = fb_comments_count[(fb_comments_count['created_at'] >= pd.to_datetime('2020-02-10') - timedelta(days=30)) & (fb_comments_count['created_at'] <= pd.to_datetime('2020-02-10'))].groupby('user_id')['number_of_comments'].sum().reset_index()

To find the comments not older than thirty days from 2020-02-10, you first need to convert this date to datetime using the to_datetime() function. To get the earliest date of the comments you're interested in, subtract 30 days from today using the timedelta() function. All the comments you're interested in have a date equal to or greater than this difference. You also want to exclude all the comments posted after 2020-02-10, which is why there's a second condition. Finally, group by user_id and use the sum() function to get the comments per user.
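The same 30-day window can be tried end to end on a tiny table. The sketch below is only an illustration: the rows are invented (they are not the platform's data), the names today, window_start, and in_window are hypothetical helpers, and the two comparisons are swapped for a single between() call, which is inclusive on both ends by default.

```python
import pandas as pd
from datetime import timedelta

# Hypothetical sample rows, invented for illustration only
fb_comments_count = pd.DataFrame({
    "user_id": [18, 25, 18, 37, 41],
    "created_at": pd.to_datetime(
        ["2020-01-15", "2019-12-21", "2020-02-01", "2020-02-11", "2020-01-11"]
    ),
    "number_of_comments": [3, 4, 2, 5, 1],
})

today = pd.to_datetime("2020-02-10")
window_start = today - timedelta(days=30)  # earliest date still inside the window

# between() keeps both endpoints, matching the >= and <= conditions
in_window = fb_comments_count["created_at"].between(window_start, today)

result = (
    fb_comments_count[in_window]
    .groupby("user_id")["number_of_comments"]
    .sum()
    .reset_index()
)
print(result)
```

Users whose comments all fall outside the window never reach the groupby() step, so they are left out of the result without any extra filtering.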
If you did everything right, the output lists each user_id with its total number_of_comments.

Python Coding Interview Question #14: Finding User Purchases

This is a question by Amazon:

"Write a query that'll identify returning active users. A returning active user is a user that has made a second purchase within 7 days of any other of their purchases. Output a list of user_ids of these returning active users."

Link to the question: https://platform.stratascratch.com/coding/10322-finding-user-purchases?python=1

To solve it, you need to use the strftime() function to get the date of purchase in an MM-DD-YYYY format. Then use sort_values() to sort the output in ascending order according to the user's ID and the date of purchase. To get the previous order, group by user_id and apply the shift() function to the purchase dates. Use to_datetime() to convert the order's and the previous order's dates, and then find the difference between the two. Finally, filter the result so it outputs only users with seven days or less between a purchase and the previous one, and use the unique() function to get only the distinct users.

Python Coding Interview Question #15: Customer Revenue In March

The last question is by Meta/Facebook:

"Calculate the total revenue from each customer in March 2019. Include only customers who were active in March 2019. Output the revenue along with the customer id and sort the results based on the revenue in descending order."

Link to the question: https://platform.stratascratch.com/coding/9782-customer-revenue-in-march?python=1

You'll need to_datetime() on the column order_date. Then extract March and the year 2019 from the same column. Finally, group by cust_id and sum the column total_order_cost, which will be the revenue you're looking for. Use sort_values() to sort the output according to revenue in descending order.
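The steps for Question #15 can be sketched as follows. This is a minimal illustration under stated assumptions, not the platform's reference solution: the table name orders and its rows are invented for the example, and the output column name revenue is a hypothetical choice. Only cust_id, order_date, and total_order_cost come from the text above.

```python
import pandas as pd

# Hypothetical sample rows, invented for illustration only
orders = pd.DataFrame({
    "cust_id": [3, 3, 7, 7, 12],
    "order_date": ["2019-03-04", "2019-03-20", "2019-03-11", "2019-04-02", "2019-02-27"],
    "total_order_cost": [100, 50, 80, 200, 75],
})

# Convert the text dates to datetime so the .dt accessor is available
orders["order_date"] = pd.to_datetime(orders["order_date"])

# Keep only orders placed in March 2019
in_march_2019 = (orders["order_date"].dt.year == 2019) & (orders["order_date"].dt.month == 3)

revenue = (
    orders[in_march_2019]
    .groupby("cust_id")["total_order_cost"]
    .sum()
    .reset_index(name="revenue")          # "revenue" is an assumed column name
    .sort_values("revenue", ascending=False)
)
print(revenue)
```

Filtering before grouping means customers with no March 2019 orders never appear, which covers the "active in March 2019" requirement.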
Conclusion

By showing you 15 interview questions from top companies, we covered five main topics interviewers are interested in when testing your Python skills. We kicked off with aggregation, grouping, and ordering of data. Then we showed you how to join tables and filter your output. Finally, you learned how to manipulate text and datetime data.

These are not the only concepts you should know, of course. But they should give you a sound basis for interview preparation and for answering more Python interview questions.

To practice more Python Pandas functions, check out our post "Python Pandas Interview Questions for Data Science", which gives you an overview of data manipulation with Pandas and the types of Pandas questions asked in data science interviews.
