Databricks Certified Data Engineer Associate Exam Cheat Sheet Exam By Dudley - Page 1
Free Questions for Databricks-Certified-Data-Engineer-Associate
Shared by Dudley on 02-09-2025
For More Free Questions and Preparation Resources
Check the Links on Last Page
Question 1
Question Type: MultipleChoice
A data engineering team has two tables. The first table march_transactions is a collection of all
retail transactions in the month of March. The second table april_transactions is a collection of all
retail transactions in the month of April. There are no duplicate records between the tables.
Which of the following commands should be run to create a new table all_transactions that
contains all records from march_transactions and april_transactions without duplicate records?
Options:
A) CREATE TABLE all_transactions AS
SELECT * FROM march_transactions
INNER JOIN SELECT * FROM april_transactions;
B) CREATE TABLE all_transactions AS
SELECT * FROM march_transactions
UNION SELECT * FROM april_transactions;
C) CREATE TABLE all_transactions AS
SELECT * FROM march_transactions
OUTER JOIN SELECT * FROM april_transactions;
D) CREATE TABLE all_transactions AS
SELECT * FROM march_transactions
INTERSECT SELECT * FROM april_transactions;
E) CREATE TABLE all_transactions AS
SELECT * FROM march_transactions
MERGE SELECT * FROM april_transactions;
Answer:
B
Explanation:
The UNION operator combines the results of two queries and removes duplicate rows, so the new
table contains every record from march_transactions and april_transactions exactly once.
INNER JOIN and OUTER JOIN combine columns from matching rows rather than stacking rows, and
as written in options A and C (no join condition, a bare SELECT on the right-hand side) they
are not valid syntax. MERGE is not a set operator; in Databricks SQL it appears as the
MERGE INTO statement, which upserts rows into an existing table. INTERSECT returns only the
rows common to both tables, which here would be none because the tables share no records.
Therefore, option B is the only correct answer.
Reference: Databricks SQL Reference - UNION, Databricks SQL Reference - JOIN,
Databricks SQL Reference - MERGE, Databricks SQL Reference - INTERSECT
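The set-operator behavior described in this explanation can be checked with a small sketch. It uses Python's built-in sqlite3 module as a stand-in for Spark SQL, only so that it runs without a cluster; it assumes the standard SQL semantics of UNION, UNION ALL, and INTERSECT, which both engines follow. The columns (id, amount) and the sample rows are hypothetical and not part of the question.

```python
import sqlite3

# In-memory database standing in for the two monthly tables.
# Column names and rows are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE march_transactions (id INTEGER, amount REAL);
    CREATE TABLE april_transactions (id INTEGER, amount REAL);
    INSERT INTO march_transactions VALUES (1, 10.0), (2, 20.0);
    INSERT INTO april_transactions VALUES (3, 30.0), (4, 40.0);

    -- Option B: stack both tables, dropping duplicate rows.
    CREATE TABLE all_transactions AS
    SELECT * FROM march_transactions
    UNION SELECT * FROM april_transactions;
""")


def count_rows(query: str) -> int:
    """Return the number of rows produced by a SELECT statement."""
    return conn.execute(f"SELECT COUNT(*) FROM ({query})").fetchone()[0]


# UNION keeps every record from both months.
union_count = count_rows("SELECT * FROM all_transactions")

# Option D: INTERSECT keeps only rows present in both tables (none here).
intersect_count = count_rows(
    "SELECT * FROM march_transactions "
    "INTERSECT SELECT * FROM april_transactions"
)

# UNION ALL keeps duplicates; UNION removes them.
union_all_self = count_rows(
    "SELECT * FROM march_transactions "
    "UNION ALL SELECT * FROM march_transactions"
)
union_self = count_rows(
    "SELECT * FROM march_transactions "
    "UNION SELECT * FROM march_transactions"
)

print(union_count, intersect_count, union_all_self, union_self)
```

With these sample rows, UNION yields all four records and INTERSECT yields none. The last two counts show the deduplication that separates UNION from UNION ALL.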
Question 2
Question Type: MultipleChoice
A data analyst has developed a query that runs against a Delta table. They want help from the data
engineering team to implement a series of tests to ensure the data returned by the query is
clean. However, the data engineering team uses Python for its tests rather than SQL.
Which of the following operations could the data engineering team use to run the query and
operate with the results in PySpark?
Options:
A) SELECT * FROM sales
B) spark.delta.table
C) spark.sql
D) There is no way to share data between PySpark and SQL.
E) spark.table
Answer:
C
Explanation:
The spark.sql operation runs a SQL query and returns the result as a PySpark DataFrame. This
way, the data engineering team can reuse the exact query that the data analyst developed and
operate on the results in PySpark. For example, spark.sql("SELECT * FROM sales") returns a
DataFrame of all the records from the sales Delta table, to which the team can then apply
tests or transformations using PySpark APIs. The other options do not work: A is a SQL query,
not a PySpark operation; B (spark.delta.table) is not a PySpark method; D is false, since SQL
and PySpark share the same tables and views; and E (spark.table) returns a named table as a
DataFrame but does not run a query.
Reference: Databricks Documentation - Run SQL queries, Databricks Documentation - Spark SQL
and DataFrames.
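As a minimal sketch of how such tests might look, the helper below passes the analyst's query to spark.sql and runs a few checks on the resulting DataFrame. The function name run_quality_checks and the columns transaction_id and amount are hypothetical; only spark.sql, DataFrame.filter (with a SQL expression string), and DataFrame.count are standard PySpark calls.

```python
def run_quality_checks(spark, query):
    """Run a SQL query through spark.sql and summarize basic data-quality checks.

    `spark` is an active SparkSession (predefined in Databricks notebooks).
    The column names used in the filters are assumptions for illustration.
    """
    # spark.sql executes the SQL text and returns a PySpark DataFrame.
    df = spark.sql(query)

    return {
        # Total number of rows returned by the analyst's query.
        "row_count": df.count(),
        # Rows missing an identifier (hypothetical column).
        "null_ids": df.filter("transaction_id IS NULL").count(),
        # Rows with a negative amount (hypothetical column).
        "negative_amounts": df.filter("amount < 0").count(),
    }
```

In a notebook, calling run_quality_checks(spark, "SELECT * FROM sales") would return a dictionary of counts that a Python test can assert against, for example that null_ids and negative_amounts are both zero.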
To Get Premium Files for Databricks-Certified-Data-Engineer-Associate Visit
https://www.p2pexams.com/products/databricks-certified-data-engineer-associate
For More Free Questions Visit
https://www.p2pexams.com/databricks/pdf/databricks-certified-data-engineer-associate