1. What are the basic data types in Python?
Python has several basic data types:
Numbers: Integers (whole numbers), floats (decimals), booleans (True/False).
Strings: Sequences of characters enclosed in quotes (e.g., "Hello").
Lists: Ordered collections of elements enclosed in square brackets (e.g., [1, 2, "apple"]).
Tuples: Similar to lists but immutable (cannot be changed) and enclosed in parentheses
(e.g., (1, 2, "apple")).
Sets: Unordered collections of unique elements enclosed in curly braces (e.g., {1, 2, 3}).
Dictionaries: Key-value pairs enclosed in curly braces (e.g., {"name": "John", "age": 30}).
2. Explain the difference between lists and tuples.
Lists: Mutable (can be changed), indexed by position (e.g., list[0] accesses the first
element).
Tuples: Immutable, indexed by position, often used for fixed groups of values such as records or coordinates.
3. How do you iterate through a list in Python?
You can use a for loop:
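A minimal sketch of list iteration; the list contents are illustrative. It shows direct iteration and enumerate(), which also yields each element's position:

```python
fruits = ["apple", "banana", "cherry"]

# Iterate directly over the elements
for fruit in fruits:
    print(fruit)

# enumerate() yields (index, element) pairs
labelled = [f"{i}: {fruit}" for i, fruit in enumerate(fruits)]
print(labelled)
```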
4. What is a dictionary and how do you access its elements?
A dictionary stores key-value pairs where keys are unique, hashable identifiers and values can be any Python type.
Access elements by key:
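A short sketch with an illustrative dictionary. Square brackets raise KeyError for a missing key, while get() returns a default instead:

```python
person = {"name": "John", "age": 30}

name = person["name"]                 # direct lookup; raises KeyError if the key is missing
city = person.get("city", "unknown")  # get() returns the default when the key is absent

# items() yields (key, value) pairs
for key, value in person.items():
    print(key, value)
```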
5. Explain the concept of conditional statements in Python.
if, else, and elif statements control program flow based on conditions.
Example:
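A minimal sketch; the function name classify is illustrative:

```python
def classify(number):
    """Return a label describing the sign of a number."""
    if number > 0:
        return "positive"
    elif number < 0:
        return "negative"
    else:
        return "zero"


print(classify(5), classify(-2), classify(0))
```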
6. What are functions and how do you define them?
Reusable blocks of code that take arguments and perform specific tasks.
Defined with def:
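A minimal sketch of a function with one required argument and one default argument; the names are illustrative:

```python
def greet(name, greeting="Hello"):
    """Return a greeting for the given name."""
    return f"{greeting}, {name}!"


print(greet("Alice"))               # uses the default greeting
print(greet("Bob", greeting="Hi"))  # overrides it with a keyword argument
```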
7. What is the purpose of modules and packages in Python?
Modules group related functions and variables.
Packages group modules into hierarchies.
Used for code organization and import for reuse.
8. How would you handle missing values in a dataset?
Handling missing values in a dataset is crucial for ensuring the quality and reliability of analysis.
There are several approaches, depending on the context and the nature of the data:
1. Understanding the Context:
1.1 Analyze the Missing Data: Investigate why the values are missing. Are they missing at
random, or is there a systematic reason? This understanding can influence how we handle
them.
2. Types of Missing Data:
MCAR (Missing Completely at Random): The missingness is unrelated to the data. In this
case, we might remove the affected rows without bias.
MAR (Missing at Random): The missingness relates to other observed data. Here,
imputation methods can be used.
MNAR (Missing Not at Random): The missingness is related to the missing values
themselves. This requires more complex handling, often involving modeling.
3. Handling Strategies for missing values:
3.1 Removal:
Listwise Deletion: Remove rows with missing values if they constitute a small portion of
the dataset and their removal won’t affect the results.
Column Removal: Eliminate columns that have a high percentage of missing values, as
they may not contribute meaningful information.
3.2 Imputation:
Mean/Median/Mode Imputation: For numerical features, replace missing values with the
mean or median; for categorical features, use the mode. This is simple but can introduce
bias.
K-Nearest Neighbors (KNN) Imputation: Use the values from the nearest neighbors to fill in
the missing values, which can preserve data relationships.
Regression Imputation: Predict missing values using regression models based on other
available features.
3.3 Using Algorithms That Handle Missing Values:
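Some tree-based implementations (e.g., XGBoost, LightGBM, and scikit-learn's histogram-based
gradient boosting estimators) can handle missing values intrinsically without requiring imputation.
Whether this works depends on the library, not just on the algorithm family.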
4. Creating Indicator Variables: For certain analyses, create a new binary column indicating
whether a value was missing, which can help capture potential patterns in the data.
5. Data Enrichment: If feasible, explore augmenting the dataset with additional sources of
data to fill in gaps.
6. Documenting the Process: It's essential to document the decisions made regarding missing
values, including the rationale and methods used, as this transparency is vital for reproducibility.
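Mean imputation combined with a missing-value indicator can be sketched in plain Python. The data is made up for illustration; in practice the same idea is usually expressed with pandas (fillna) or scikit-learn (SimpleImputer):

```python
from statistics import mean

# Illustrative column with missing entries represented as None
ages = [25, None, 35, None, 30]

# Indicator variable: True where the original value was missing
was_missing = [age is None for age in ages]

# Mean imputation: compute the mean of the observed values only
observed = [age for age in ages if age is not None]
fill_value = mean(observed)

imputed = [age if age is not None else fill_value for age in ages]
print(imputed, was_missing)
```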
9. How do you handle errors and exceptions in Python?
try and except blocks handle runtime errors.
Example:
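A minimal sketch; safe_divide is an illustrative name:

```python
def safe_divide(a, b):
    """Divide a by b, returning None instead of raising on division by zero."""
    try:
        return a / b
    except ZeroDivisionError:
        print("Cannot divide by zero")
        return None


print(safe_divide(10, 2))
print(safe_divide(1, 0))
```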
10. Explain the difference between a shallow and a deep copy.
Shallow copy: Creates a new outer object but fills it with references to the same nested
objects, so a change to a nested mutable object is visible through both the copy and the original.
Deep copy: Recursively copies the nested objects as well, so the copy is fully independent and
modifying it does not affect the original.
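The difference is easiest to see with a nested list. A short sketch using the standard copy module:

```python
import copy

original = [[1, 2], [3, 4]]
shallow = copy.copy(original)   # new outer list, same inner lists
deep = copy.deepcopy(original)  # new outer list and new inner lists

# Mutate a nested list through the original
original[0].append(99)

print(shallow[0])  # the shallow copy shares the inner list, so it sees the change
print(deep[0])     # the deep copy is unaffected
```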
11. What is the purpose of garbage collection in Python?
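Python automatically frees memory that is no longer in use. CPython relies mainly on reference counting, with a cyclic garbage collector (exposed through the gc module) to reclaim objects that reference each other.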
12. How do you import external libraries in Python?
Use the import statement to import modules (third-party libraries must first be installed, typically with pip):
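The common import forms, sketched with standard-library modules so the example runs without installing anything:

```python
import math                      # import a whole module
from collections import Counter  # import a single name from a module
import datetime as dt            # import a module under an alias

root = math.sqrt(16)
counts = Counter("banana")
day = dt.date(2024, 1, 1)
print(root, counts["a"], day.year)
```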
13. Explain the difference between mutable and immutable objects.
Mutable: Can be changed (e.g., lists, dictionaries).
Immutable: Cannot be changed (e.g., strings, tuples).
14. What is the in operator used for?
The in operator tests membership: it checks whether a value exists in a container such as a
list, tuple, string, set, or dictionary (for dictionaries it checks the keys).
15. How do you write a docstring in Python?
Triple-quoted strings used to document functions, explaining their purpose, arguments,
and return values.
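A minimal sketch; the function and the Args/Returns layout are one common convention, not the only one:

```python
def rectangle_area(width, height):
    """Return the area of a rectangle.

    Args:
        width: Length of the horizontal side.
        height: Length of the vertical side.

    Returns:
        The product of width and height.
    """
    return width * height


# The docstring is available at runtime through __doc__ and help()
print(rectangle_area.__doc__)
```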
16. What is the purpose of the indentation in Python?
Python syntax uses indentation (spaces or tabs) to define code blocks and control program
flow.
17. Explain the difference between == and is operators.
== checks values for equality.
is checks object identity (same location in memory).
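A short sketch with two equal but distinct lists:

```python
a = [1, 2, 3]
b = [1, 2, 3]  # equal contents, but a separate object
c = a          # another name for the same object

values_equal = a == b  # compares contents
same_object = a is b   # compares identity
alias_same = a is c

print(values_equal, same_object, alias_same)
```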
18. What is the use of lambda functions?
Small, anonymous functions defined in one line using lambda keyword. Useful for concise,
one-time function use.
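A typical use is passing a small function as an argument. The data below is illustrative:

```python
words = ["pear", "fig", "banana"]

# Sort by length using a lambda as the key function
by_length = sorted(words, key=lambda word: len(word))

# Apply a lambda to every element with map()
squares = list(map(lambda x: x * x, [1, 2, 3]))

print(by_length, squares)
```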
19. Explain the concept of class and object in Python.
Classes define blueprints for objects (instances) with attributes and methods.
Objects represent specific instances of a class with unique attribute values.
20. What are the __init__ and __str__ methods used for?
__init__: This is the initializer (commonly called the constructor), run automatically when an
object is created. It’s used to initialize the object’s attributes with specific values or
perform necessary setup.
Example:
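A minimal sketch of an initializer; the Person class is illustrative:

```python
class Person:
    def __init__(self, name, age):
        # Runs automatically when Person(...) is called
        self.name = name
        self.age = age


person = Person("John", 30)
print(person.name, person.age)
```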
__str__: This method defines how the object is represented when printed or converted to a
string with str(). It allows you to customize the object’s representation for better readability or
information display.
Python Example Code:
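A minimal sketch of a class that customizes its string form; the Book class is illustrative:

```python
class Book:
    def __init__(self, title, author):
        self.title = title
        self.author = author

    def __str__(self):
        # Used by print() and str()
        return f"{self.title} by {self.author}"


book = Book("Dune", "Frank Herbert")
text = str(book)
print(book)
```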
21. Write Python code to merge datasets based on a common column.
Explanation: pd.merge() combines df1 and df2 on the id column. The how parameter
specifies the type of join: inner, outer, left, or right.
Code:
import pandas as pd
# Sample data
data1 = {'id': [1, 2, 3, 4], 'name': ['Alice', 'Bob', 'Charlie', 'David']}
data2 = {'id': [3, 4, 5, 6], 'age': [25, 30, 35, 40]}
# Creating DataFrames
df1 = pd.DataFrame(data1)
df2 = pd.DataFrame(data2)
# Merging on the 'id' column
# Use 'outer', 'left', or 'right' for different join types
merged_df = pd.merge(df1, df2, on='id', how='inner')
# Display the merged DataFrame
print(merged_df)
22. How do you achieve inheritance in Python?
Inheritance allows creating new classes (subclasses) by inheriting attributes and methods from
existing classes (superclasses). The parent class is named in parentheses after the subclass name.
Promotes code reuse and simplifies building complex object hierarchies.
Python Example:
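A minimal sketch with illustrative class names. Dog reuses the initializer of Animal and overrides one method:

```python
class Animal:
    def __init__(self, name):
        self.name = name

    def speak(self):
        return f"{self.name} makes a sound"


class Dog(Animal):  # Dog inherits from Animal
    def speak(self):
        # Override the parent method
        return f"{self.name} barks"


dog = Dog("Rex")
print(dog.speak())
```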
23. What is the purpose of generators in Python?
Functions that create iterators, producing values one at a time instead of storing the entire
sequence in memory.
Efficient for iterating over large datasets or situations where generating all elements
upfront is unnecessary.
Use yield keyword to return each element sequentially.
Python Example:
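A minimal sketch; countdown is an illustrative name:

```python
def countdown(n):
    """Yield n, n-1, ..., 1 one value at a time."""
    while n > 0:
        yield n  # execution pauses here until the next value is requested
        n -= 1


values = list(countdown(3))
print(values)
```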
24. Explain the concept of context managers in Python.
Provide safe and efficient handling of resources like files, databases, or network
connections.
Use the with statement to automatically perform resource allocation upon entering the
block and deallocation upon exiting, even if exceptions occur.
Ensures resources are properly closed and prevents leaks.
Python Example:
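A short sketch using two standard-library context managers: open() closes the file on exit, and tempfile.TemporaryDirectory removes the directory on exit. The file name is illustrative:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "demo.txt")

    with open(path, "w") as f:
        f.write("hello")
    file_closed = f.closed  # the file was closed when the with block ended

    with open(path) as f:
        content = f.read()

# The temporary directory is deleted once its with block ends
dir_removed = not os.path.exists(tmp_dir)
print(file_closed, content, dir_removed)
```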
25. What is the significance of PEP 8 in Python?
PEP 8 is the official style guide for writing Python code. It recommends formatting
conventions for indentation, spacing, naming, and other aspects. Following PEP 8
improves code readability, maintainability, and consistency within the Python community.
It avoids confusion and makes collaborating on code easier.
Python Example:
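A small sketch of code laid out along PEP 8 naming and spacing conventions. The class and names are hypothetical and exist only to illustrate the style:

```python
MAX_RETRIES = 3  # constants: UPPER_CASE_WITH_UNDERSCORES


class PageFetcher:  # classes: CapWords
    """Hypothetical class used only to illustrate naming and layout."""

    def describe_fetch(self, page_url, retries=MAX_RETRIES):
        # functions, methods, and variables: lowercase_with_underscores
        return f"fetching {page_url} (up to {retries} tries)"


message = PageFetcher().describe_fetch("https://example.com")
print(message)
```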
26. Explain the difference between global and local variables.
Global variables: Defined outside functions and are accessible everywhere within the
program.
Python Example:
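A minimal sketch with illustrative names. The function reads a module-level variable directly:

```python
site_name = "Example Store"  # global: defined at module level


def page_title(page):
    # Reading a global inside a function needs no special declaration
    return f"{page} | {site_name}"


print(page_title("Home"))
```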
Local variables: Defined inside functions and have their scope limited to the specific
function they are defined in. Local variables take precedence over global variables with the
same name within their scope. A function can read a global variable without any declaration;
the global keyword is needed only to assign a new value to a global name from inside the function.
Python Example:
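A short sketch with illustrative names, showing a local variable that shadows a global and the global keyword used for assignment:

```python
count = 0  # global


def shadow():
    count = 10  # creates a new local variable; the global is untouched
    return count


def increment():
    global count  # assignment now targets the global name
    count += 1


shadow_result = shadow()
after_shadow = count  # still 0
increment()
print(shadow_result, after_shadow, count)
```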
27. What are some popular testing frameworks for Python?
Unittest
Built-in with the standard library.
Simple and beginner-friendly.
Best for unit testing individual modules and functions.
Python Example:
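A minimal sketch; the add function is illustrative. The suite is run programmatically here so the snippet works anywhere, whereas a test file would normally end with unittest.main() or be run with python -m unittest:

```python
import unittest


def add(a, b):
    return a + b


class TestAdd(unittest.TestCase):
    def test_add_integers(self):
        self.assertEqual(add(2, 3), 5)

    def test_add_strings(self):
        self.assertEqual(add("a", "b"), "ab")


# Load and run the test case explicitly
suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestAdd)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```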
Pytest
Most popular and flexible framework.
Supports various testing types like unit, integration, and functional tests.
Highly customizable and extensible with plugins.
Python Example:
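A minimal sketch in pytest style; is_even is an illustrative function. pytest collects functions whose names start with test_ and treats a plain assert as the check, so this file needs no pytest import; saved as a file such as test_even.py (an assumed name), it would be run with the pytest command:

```python
def is_even(number):
    return number % 2 == 0


def test_is_even_with_even_number():
    assert is_even(4)


def test_is_even_with_odd_number():
    assert not is_even(7)
```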
Doctest
Extracts and runs test examples from docstrings.
Encourages clear and documented code.
Simple for small projects or quick tests.
Python Example:
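A minimal sketch; square is an illustrative function. The examples are run explicitly through DocTestParser and DocTestRunner so the result can be inspected; the usual shortcuts are doctest.testmod() or python -m doctest on the file:

```python
import doctest


def square(n):
    """Return n squared.

    >>> square(3)
    9
    >>> square(-2)
    4
    """
    return n * n


# Extract the examples from the docstring and run them
parser = doctest.DocTestParser()
test = parser.get_doctest(square.__doc__, {"square": square}, "square", None, 0)
results = doctest.DocTestRunner(verbose=False).run(test)
print(results)
```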
Behave and Lettuce:
Focus on behavior-driven development (BDD).
Write tests in human-readable language like Gherkin.
Good for collaborative testing and non-technical stakeholders.
Python Example:
Selenium:
For testing web applications through browser automation.
Simulates user interactions like clicking buttons and entering text.
Requires a browser-specific driver such as ChromeDriver or geckodriver (recent Selenium releases can fetch it automatically via Selenium Manager).
Python Example:
Technical Interview Questions
Python Automation Interview Questions
1. Explain your approach to automating a simple web login workflow.
Answer: Discuss identifying essential elements (username, password field, login button),
using explicit waits to handle page loads, and capturing successful login confirmation.
Example Code:
2. How would you automate data-driven testing using a CSV file?
Answer: Explain reading data from the CSV, parameterizing test steps with the data, and
reporting results based on success/failure for each data point.
Example Code:
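A minimal data-driven sketch using the standard csv module. The login function is a hypothetical stand-in for the system under test, and the CSV text is inlined with io.StringIO so the example is self-contained; in practice the rows would come from a .csv file:

```python
import csv
import io

# Hypothetical test data; normally read from a CSV file on disk
CSV_DATA = """username,password,expected
alice,secret123,success
bob,,failure
"""


def login(username, password):
    """Stand-in for the system under test (hypothetical)."""
    return "success" if username and password else "failure"


results = []
for row in csv.DictReader(io.StringIO(CSV_DATA)):
    actual = login(row["username"], row["password"])
    passed = actual == row["expected"]
    results.append((row["username"], passed))
    print(f"{row['username']}: {'PASS' if passed else 'FAIL'}")
```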
3. Describe your experience with API testing frameworks like requests or pytest-rest.
Answer: Highlight sending API requests, validating responses with assertions, handling
different status codes, and using data-driven approaches for API test cases.
Example Code:
4. How do you monitor and maintain the health of your automated test scripts?
Answer: Discuss scheduling regular test runs, integrating tests with CI/CD pipelines,
reporting results with tools like pytest-html, and analyzing trends for stability and
identifying regressions.
5. Explain your approach to handling dynamic web elements when automating browser
interactions.
Answer: Mention using WebDriverWait with expected conditions like presence of
element, using CSS selectors with unique identifiers, and leveraging libraries like Selenium
DevTools for dynamic element analysis.
6. How can you ensure the security of your automation scripts and data?
Answer: Discuss avoiding hardcoding sensitive information like credentials, using
environment variables, storing secrets securely, and following secure coding practices for
data handling.
7. How do you integrate AI/ML models into your automation workflows?
Answer: Discuss using libraries like TensorFlow, PyTorch, and Scikit-learn for model
training and deployment. Mention techniques like model prediction within automation
tasks, anomaly detection, and automated parameter tuning.
Example (Model Prediction in Selenium):
8. Describe your experience with containerization for deploying automation scripts.
Answer: Mention containerizing scripts and test environments with Docker, using
orchestration tools like Kubernetes, and ensuring secure deployments within
containerized environments.
9. How do you handle errors when working with large datasets?
Answer:
Data Validation: Implement checks during data ingestion to identify inconsistencies or
missing values early.
Try-Except Blocks: Use Python’s try-except to catch exceptions gracefully, allowing the
program to continue running or log errors without crashing.
Logging: Maintain detailed logs of errors to monitor and troubleshoot issues efficiently.
Chunk Processing: Process data in smaller chunks to isolate errors and reduce memory
load, making it easier to identify problematic segments.
Testing: Perform unit tests and validation checks on a sample of the dataset before full-
scale processing.
10. Explain your experience with performance testing tools like Locust or JMeter.
Answer: Highlight generating load on web applications, analyzing performance metrics like
response times, using scripts to simulate user behavior, and identifying performance
bottlenecks.
Example (Locust Script for Load Testing):
11. How do you stay updated with the latest trends and advancements in Python automation?
Answer: Mention attending conferences, reading blogs and documentation, participating
in online communities, contributing to open-source projects, and exploring new
frameworks and libraries.
Python Basic Coding Interview Questions
1. Reverse a String
Example:
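One common approach, sketched with slicing (a step of -1 walks the string backwards):

```python
def reverse_string(text):
    return text[::-1]


print(reverse_string("hello"))
```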
2. Check if a number is even or odd:
Example:
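A minimal sketch using the modulo operator; parity is an illustrative name:

```python
def parity(number):
    return "even" if number % 2 == 0 else "odd"


print(parity(4), parity(7))
```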
3. Print all factors of a number:
Example:
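A straightforward sketch for positive integers that tests every candidate divisor from 1 to n:

```python
def factors(n):
    """Return all positive factors of a positive integer n."""
    return [i for i in range(1, n + 1) if n % i == 0]


print(factors(12))
```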
4. Calculate the factorial of a number:
Example:
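An iterative sketch (the standard library also provides math.factorial):

```python
def factorial(n):
    if n < 0:
        raise ValueError("factorial is undefined for negative numbers")
    result = 1
    for i in range(2, n + 1):
        result *= i
    return result


print(factorial(5))
```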
5. Swap two numbers without using a temporary variable:
Example:
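In Python this is usually done with tuple unpacking:

```python
a, b = 5, 10

# The right-hand side is evaluated first, then unpacked into a and b
a, b = b, a

print(a, b)
```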
6. Check if a string is a palindrome:
Example:
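A minimal sketch that compares the string with its reverse (case-sensitive, no normalization):

```python
def is_simple_palindrome(text):
    return text == text[::-1]


print(is_simple_palindrome("level"), is_simple_palindrome("python"))
```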
7. Find the sum of all elements in a list
Example:
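A short sketch showing the built-in sum() and the equivalent explicit loop:

```python
numbers = [1, 2, 3, 4, 5]

total = sum(numbers)  # built-in

loop_total = 0
for number in numbers:  # equivalent manual loop
    loop_total += number

print(total, loop_total)
```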
8. Count the occurrences of a character in a string
Example:
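A minimal sketch using str.count(); collections.Counter is an option when every character needs counting:

```python
from collections import Counter

text = "banana"

a_count = text.count("a")  # occurrences of one character
all_counts = Counter(text)  # occurrences of every character

print(a_count, all_counts)
```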
9. Print all Fibonacci numbers up to a given number
Example:
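A minimal sketch that collects Fibonacci numbers not exceeding a limit, starting the sequence at 0:

```python
def fibonacci_up_to(limit):
    result = []
    a, b = 0, 1
    while a <= limit:
        result.append(a)
        a, b = b, a + b
    return result


print(fibonacci_up_to(20))
```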
10. Write Python code to pivot a DataFrame.
df.pivot() reshapes the DataFrame by setting date as the index, category as columns, and
value as the data to fill the new table.
import pandas as pd
# Sample data
data = {
'date': ['2023-01-01', '2023-01-01', '2023-01-02', '2023-01-02'],
'category': ['A', 'B', 'A', 'B'],
'value': [10, 20, 15, 25]
}
# Creating DataFrame
df = pd.DataFrame(data)
# Pivoting the DataFrame
pivot_df = df.pivot(index='date', columns='category', values='value')
# Display the pivoted DataFrame
print(pivot_df)
Professional Level Coding Interview Questions
1. Implement a function to check if a string is a palindrome
Example:
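A sketch of a more robust check, assuming (as interviewers often specify) that case and non-alphanumeric characters should be ignored. It uses two pointers moving inward:

```python
def is_palindrome(text):
    # Keep only letters and digits, compared case-insensitively
    chars = [c.lower() for c in text if c.isalnum()]
    left, right = 0, len(chars) - 1
    while left < right:
        if chars[left] != chars[right]:
            return False
        left += 1
        right -= 1
    return True


print(is_palindrome("A man, a plan, a canal: Panama"))
```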
2. Write Python code to calculate the accuracy, precision, and recall of a classification model.
scikit-learn provides ready-made functions for calculating accuracy, precision, and recall:
from sklearn.metrics import accuracy_score, precision_score, recall_score
# Sample true labels and predicted labels
y_true = [0, 1, 1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [0, 0, 1, 0, 1, 1, 1, 0, 1, 0]
# Calculating metrics
accuracy = accuracy_score(y_true, y_pred)
precision = precision_score(y_true, y_pred)
recall = recall_score(y_true, y_pred)
# Displaying the results
print(f'Accuracy: {accuracy:.2f}, Precision: {precision:.2f}, Recall: {recall:.2f}')
accuracy_score computes the proportion of correct predictions.
precision_score measures the ratio of true positives to predicted positives.
recall_score calculates the ratio of true positives to actual positives.
3. Write a function to find the longest common substring (LCS) of two strings
Example:
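A dynamic-programming sketch for the longest common substring (a contiguous run, not the longest common subsequence that "LCS" often denotes). It keeps only two rows of the table and returns the first longest match found:

```python
def longest_common_substring(a, b):
    best_len = 0
    best_end = 0  # index in a just past the end of the best match
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                # Extend the common run that ended at a[i-2], b[j-2]
                curr[j] = prev[j - 1] + 1
                if curr[j] > best_len:
                    best_len = curr[j]
                    best_end = i
        prev = curr
    return a[best_end - best_len:best_end]


print(longest_common_substring("abcdef", "zcdemf"))
```

The running time is O(len(a) * len(b)) with O(len(b)) extra memory.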
4. Create a function to iterate through a nested dictionary of arbitrary depth
Example:
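A recursive generator sketch that yields each leaf value together with the path of keys leading to it. The function name walk and the sample data are illustrative:

```python
def walk(mapping, path=()):
    """Yield (key_path, value) for every non-dict value in a nested dict."""
    for key, value in mapping.items():
        current = path + (key,)
        if isinstance(value, dict):
            yield from walk(value, current)  # descend into the nested dict
        else:
            yield current, value


config = {
    "db": {"host": "localhost", "ports": {"read": 5432, "write": 5433}},
    "debug": True,
}
flat = list(walk(config))
for key_path, value in flat:
    print(".".join(key_path), "=", value)
```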
5. Design a class to represent and manipulate complex data structures like graphs
Python Code:
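One possible design, sketched as an undirected graph stored in an adjacency dictionary with a breadth-first traversal. The class layout and method names are assumptions made for illustration:

```python
from collections import deque


class Graph:
    def __init__(self):
        self._adjacency = {}  # node -> set of neighbouring nodes

    def add_edge(self, u, v):
        # Undirected: record the edge in both directions
        self._adjacency.setdefault(u, set()).add(v)
        self._adjacency.setdefault(v, set()).add(u)

    def neighbors(self, node):
        # Sorted for a deterministic order (assumes comparable node labels)
        return sorted(self._adjacency.get(node, ()))

    def bfs(self, start):
        """Return nodes reachable from start in breadth-first order."""
        order = [start]
        seen = {start}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for neighbor in self.neighbors(node):
                if neighbor not in seen:
                    seen.add(neighbor)
                    order.append(neighbor)
                    queue.append(neighbor)
        return order


graph = Graph()
graph.add_edge("A", "B")
graph.add_edge("A", "C")
graph.add_edge("B", "D")
print(graph.bfs("A"))
```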
6. Implement a decorator to measure the execution time of a function.
Here’s how to implement a decorator in Python to measure the execution time of a
function:
Example:
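A minimal sketch using time.perf_counter(); the names timed and slow_sum are illustrative. functools.wraps preserves the wrapped function's name and docstring:

```python
import functools
import time


def timed(func):
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = func(*args, **kwargs)
        elapsed = time.perf_counter() - start
        print(f"{func.__name__} took {elapsed:.6f} s")
        return result
    return wrapper


@timed
def slow_sum(n):
    return sum(range(n))


total = slow_sum(1_000_000)
```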
Core Concept Based Interview Questions
1. What are the benefits of using Python as a programming language?
High-level and interpreted: Easy to learn, write, and debug compared to compiled
languages.
Object-oriented: Supports clean and modular code design with classes and objects.
Extensive libraries: Rich ecosystem of libraries for various tasks, reducing development
time.
Dynamically typed: No need for explicit type declarations, offering flexibility.
Large and active community: Abundant resources, tutorials, and support available.
2. Explain the difference between lists and tuples in Python.
Lists: Mutable – elements can be changed after creation. Use square brackets ([]).
Tuples: Immutable – elements cannot be changed after creation. Use parentheses (()).
3. How do you implement a loop in Python? What are the different loop types?
for loop: Iterates through a sequence of elements.
while loop: Executes a block of code repeatedly while a condition is true.
nested loops: Loops within loops for complex iteration needs.
4. Describe how you would handle exceptions in Python.
Use try-except blocks to catch and handle specific exceptions gracefully.
finally block can be used to run cleanup code regardless of exceptions.
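A short sketch showing that the finally block runs on both the success and the failure path. The function and the log list are illustrative:

```python
log = []


def to_int(text):
    try:
        return int(text)
    except ValueError:
        # Catch only the specific exception we expect
        log.append(f"invalid: {text!r}")
        return None
    finally:
        # Runs whether or not an exception occurred
        log.append("done")


first = to_int("42")
second = to_int("x")
print(first, second, log)
```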
5. What is the purpose of functions in Python? How do you define and call a function?
Functions encapsulate code for reusability and modularity.
Use def keyword to define a function, and call it with its name and arguments.
6. Explain the concept of object-oriented programming (OOP) in Python.
OOP focuses on building objects with attributes and methods.
Classes define object blueprints, and instances define specific objects.
7. What are the built-in data structures available in Python?
Lists, tuples, dictionaries, sets, strings, numbers, Booleans. Each has specific properties
and uses.
8. How do you import and use modules in Python?
Use the import keyword to import specific modules or functions from libraries.
Use aliases to avoid long module names.
9. Briefly explain the concept of package management in Python.
Tools like pip and virtual environments help manage dependencies and versions of
libraries.
10. How do you debug Python code? Explain some common debugging techniques.
Use the built-in debugger (pdb) for step-by-step execution and variable inspection.
Print statements can be strategically placed to track program flow.
Error messages should be carefully analyzed and interpreted.
11. How do you handle categorical variables with many levels?
To handle categorical variables with many levels, I use the following strategies:
Frequency Encoding: Replace categories with their frequency counts to reduce
dimensionality.
Target Encoding: Encode categories based on the mean of the target variable for each
category, which captures useful information.
Dimensionality Reduction: Apply techniques like PCA or clustering to group similar
categories and reduce levels.
Grouping Rare Categories: Combine infrequent categories into an "Other" category to
simplify the variable.
Use of Dummy Variables: Create binary (one-hot) encoded variables for categories, but
limit this to manageable levels to avoid high dimensionality.
Pandas Interview Questions for Python
1. Explain the key data structures in Pandas: Series and DataFrame.
Series: One-dimensional labeled array; efficient for holding ordered data.
DataFrame: Two-dimensional labeled data structure with rows and columns; ideal for
tabular data.
2. How do you create a Pandas DataFrame from various data sources (CSV, Excel, SQL)?
Use built-in functions like pd.read_csv, pd.read_excel, and pd.read_sql with appropriate
parameters.
3. Describe data manipulation techniques in Pandas for cleaning and filtering data.
Handling missing values (fillna, dropna), selection (loc, iloc), indexing and slicing, filtering
with conditions (query, mask).
4. Explain how to perform data aggregation and group-by operations in Pandas.
Use groupby with aggregate functions like sum, mean, count, and custom functions.
5. How do you handle merges and joins between two DataFrames based on specific conditions?
Use merge with appropriate join types (inner, left, right, outer) and merge keys.
6. Demonstrate data visualization techniques in Pandas using matplotlib or seaborn.
Create basic plots like histograms, bar charts, line charts, and boxplots.
7. Explain how you would handle time series data in Pandas.
Use DatetimeIndex, resampling, rolling statistics, and specialized plotting functions.
8. Discuss best practices for handling missing values and outliers in Pandas datasets.
Impute missing values with specific strategies, identify and handle outliers using statistical
methods.
9. How do you optimize Pandas code for performance and efficiency?
Vectorized operations, data type optimizations, caching results, using appropriate indexing
methods.
10. Explain the integration of Pandas with other Python libraries like NumPy or Scikit-learn.
Utilize NumPy for fast array operations and Scikit-learn for statistical analysis and machine
learning tasks.
Selenium Interview Questions for Python
1. What are the benefits of using Python Selenium for web automation?
Open-source and free to use.
Supports various web browsers and operating systems.
Provides comprehensive API for interacting with web elements.
Integrates well with other Python libraries for data manipulation and testing frameworks.
2. Explain the different Selenium WebDriver options available (Chrome, Firefox, etc.).
Each WebDriver interacts with a specific browser. Choose based on project requirements
and compatibility.
Some popular options include ChromeDriver, FirefoxDriver, EdgeDriver.
3. How do you locate web elements on a page using Selenium?
Different locators like by ID, name, class name, XPath, CSS selector offer unique targeting
abilities.
Choose the most efficient and robust locator based on the element structure.
4. Explain how to handle dynamic web elements that change their attributes or IDs.
Use WebDriverWait with the expected_conditions module to wait for elements to become visible or
clickable.
Consider using relative locators or XPath constructs that adapt to dynamic changes.
5. How do you automate user interactions like clicking buttons, entering text, and submitting
forms?
Use corresponding methods like click(), send_keys(), and submit() on identified elements.
Consider handling JavaScript alerts and confirmations if encountered.
6. Describe methods for dealing with frames and nested elements in web pages.
Use switch_to.frame() to switch context to frames, then locate and interact with elements
within.
Consider using relative locators that traverse through the element hierarchy.
7. How do you capture screenshots or specific page elements during test execution?
Use the save_screenshot() method to capture the current browser window, or element.screenshot() for a
specific element.
Integrate screenshots into test reports for visual evidence of failures.
8. Explain how to synchronize your automation script with page loading and element appearance.
Use explicit waits with WebDriverWait and expected_conditions to avoid timing issues.
Consider implicit waits as a fallback mechanism for elements that appear consistently.
9. How do you handle different scenarios like pop-ups, alerts, and JavaScript prompts?
Use driver.switch_to.alert to obtain the alert, then call accept() or dismiss() on it (and send_keys() for prompts).
Use WebDriverWait with the alert_is_present condition to wait for an alert, and execute_script() when you need to run JavaScript directly.
10. How do you integrate data and assertions into your Selenium automation scripts?
Read data from external files or APIs, utilize libraries like pandas for data manipulation.
Use assertion libraries like pytest or unittest to verify successful test execution and
expected outcomes.
Scenario Based Interview Questions
1. You’re building a web scraper to collect product details from an e-commerce site. How would
you handle dynamic page elements and potential access blocks?
Answer: I’d use Selenium with WebDriverWait and expected_conditions to handle dynamic
elements. For access blocks, I’d try user-agent rotation, headless browsing, and changing
IP addresses to evade detection. If that fails, I’d consider alternative data sources or APIs.
2. You’re analyzing customer purchase data for a clothing store. How would you identify trends
and segments for targeted marketing campaigns?
Answer: I’d use Pandas for data manipulation and analysis. I’d look for patterns in
purchase history, demographics, and location data using group-by functions and
visualizations. K-means clustering could help identify distinct customer segments for
targeted campaigns.
3. You’re building a financial data dashboard. How would you ensure real-time updates and
handle latency issues?
Answer: I’d use libraries like Flask or Dash to build the interactive dashboard. For real-time
updates, I’d consider WebSockets or SSE (Server-Sent Events) for server-to-client
communication. To minimize latency, I’d cache frequently accessed data, optimize queries,
and leverage asynchronous tasks.
4. You’re dealing with a large CSV file containing messy data. How would you clean and validate it
before further analysis?
Answer: I’d use regular expressions and Pandas utilities to handle missing values,
inconsistencies, and invalid formats. Data validation libraries like Pandas-Schema or PyPI’s
“datachecker” could also be helpful.
5. You’re tasked with optimizing a Python script that takes too long to run. How would you
approach performance improvement?
Answer: I’d analyze the code for bottlenecks using profiling tools like cProfile or
line_profiler. Based on the results, I’d optimize algorithms, utilize vectorized operations,
memoization, and data caching techniques.
6. You’re building a machine learning model for sentiment analysis. How would you prepare and
pre-process your text data for optimal results?
Answer: I’d use Natural Language Processing (NLP) libraries like NLTK or spaCy for
tokenization, cleaning, stop word removal, and stemming/lemmatization. TF-IDF or similar
techniques could be employed for feature engineering.
7. You’re building a REST API endpoint. How would you handle authentication, authorization, and
error handling?
Answer: I’d utilize a library like Flask-JWT-Extended for token-based authentication and
authorization, with Flask-RESTful to structure the endpoints. For error handling, I’d define
custom error codes and responses based on the type of error encountered.
8. You’re tasked with automating repetitive tasks in Excel using Python. How would you approach
this?
Answer: I’d use libraries like openpyxl to manipulate Excel spreadsheets. I’d automate
tasks like data extraction, cleaning, formatting, and report generation using loops and
conditional statements.
9. You encounter a bug in your Python code. How do you debug it effectively?
Answer: I’d use the built-in Python debugger (pdb) to step through the code line by line.
I’d also utilize print statements strategically, analyze error messages, and leverage IDE
features like breakpoint debugging.
10. You’re working on a team project with other developers. How do you ensure consistent coding
style and collaboration on Python code?
Answer: I’d advocate for using a formatter like Black and a linter like Pylint for code style
standardization. Using Git for version control, together with a hosting platform like GitHub for
pull requests and code review, would facilitate collaboration.