50 PySpark Interview Questions.pdf
Asked PySpark Interview Questions in 2024
4. Aggregations
1. How do you use groupBy with aggregations in PySpark?
2. What is countDistinct, and how is it used?
3. How can you calculate multiple aggregations on the same DataFrame?
4. What is the difference between agg() and groupBy()?
5. How can you use window functions in PySpark?
6. Joins
1. What are the different types of joins in PySpark?
2. How do you perform a broadcast join in PySpark?
3. What is the difference between a left join and an inner join?
4. How do you optimize joins in PySpark?
5. Explain semi and anti joins in PySpark.
8. Data Serialization and File Formats
1. What are the different file formats PySpark can read and write?
2. How do you write a DataFrame to Parquet format?
3. How do you read and write data to S3 using PySpark?
4. What is the difference between CSV and ORC file formats?
5. How do you enable compression when writing files in PySpark?
https://www.seekhobigdata.com/