Databricks Certified Data Engineer Associate - Practice Questions
Advanced Lakehouse Architecture
Q: Which layer handles security and access control?
A. Delta Lake Schema Enforcement
B. Improves read performance on selective queries
C. Data governance layer
D. Avoids scanning irrelevant data files
Answer: C
Q: Why is Delta Lake better than plain Parquet?
A. Delta Lake Transaction Log
B. Z-order clustering and caching
C. Supports ACID and time travel
D. Centralized governance
Answer: C
Q: What feature in Delta Lake enables scalable metadata handling?
A. Data governance layer
B. Delta Lake Transaction Log
C. Unifies analytics and machine learning on one platform
D. Avoids scanning irrelevant data files
Answer: B
Q: What is the Lakehouse paradigm?
A. Stores metadata as part of the transaction log
B. Unifies analytics and machine learning on one platform
C. Centralized governance
D. Data governance layer
Answer: B
Q: What is a primary benefit of Unity Catalog for large organizations?
A. Data governance layer
B. Stores metadata as part of the transaction log
C. Delta Lake Schema Enforcement
D. Centralized governance
Answer: D
Q: How does Delta Lake handle metadata scaling?
A. Improves read performance on selective queries
B. Data governance layer
C. Unifies analytics and machine learning on one platform
D. Stores metadata as part of the transaction log
Answer: D
Q: What is the function of `OPTIMIZE ZORDER BY`?
A. Improves read performance on selective queries
B. Z-order clustering and caching
C. Unifies analytics and machine learning on one platform
D. Delta Lake Transaction Log
Answer: A
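For illustration, a minimal PySpark sketch of running OPTIMIZE with ZORDER BY; the table and column names (sales.transactions, customer_id) are hypothetical.

    # Compact small files and co-locate related rows so selective filters on
    # customer_id read fewer files (hypothetical table and column)
    spark.sql("OPTIMIZE sales.transactions ZORDER BY (customer_id)")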
Q: What is the role of data skipping in Delta Lake?
A. Unifies analytics and machine learning on one platform
B. Centralized governance
C. Supports ACID and time travel
D. Avoids scanning irrelevant data files
Answer: D
Q: Which component ensures strong schema enforcement?
A. Z-order clustering and caching
B. Delta Lake Schema Enforcement
C. Supports ACID and time travel
D. Improves read performance on selective queries
Answer: B
Q: How does the Lakehouse optimize query performance?
A. Z-order clustering and caching
B. Data governance layer
C. Unifies analytics and machine learning on one platform
D. Delta Lake Schema Enforcement
Answer: A
Data Quality & Testing
Q: How do you test SQL transformations?
A. Integration and regression tests
B. Runtime error
C. Use mock data and compare results
D. Ensure code correctness in isolation
Answer: C
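A minimal sketch of the mock-data approach, assuming a hypothetical only_active transformation; the input rows and expected result are made up for the test.

    from pyspark.sql import Row, SparkSession

    spark = SparkSession.builder.getOrCreate()

    # Hypothetical transformation under test
    def only_active(df):
        return df.filter(df.status == "active")

    # Build a small mock input, run the transformation, compare against the expected output
    mock_input = spark.createDataFrame([Row(id=1, status="active"),
                                        Row(id=2, status="inactive")])
    result = only_active(mock_input).collect()
    assert [r.id for r in result] == [1]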
Q: What is a good practice to validate schema before writing?
A. Integration and regression tests
B. Use mock data and compare results
C. Completeness and accuracy
D. Use assert statements or schema checks
Answer: D
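A sketch of an assert-based schema check before a write; the expected columns and target table (finance.events) are hypothetical, and df stands for the DataFrame about to be written.

    # Fail fast if the DataFrame does not match the expected schema (hypothetical names)
    expected_cols = {"id", "event_ts", "amount"}
    actual_cols = set(df.columns)
    assert actual_cols == expected_cols, f"Unexpected columns: {actual_cols}"
    df.write.format("delta").mode("append").saveAsTable("finance.events")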
Q: What is the purpose of unit tests in pipelines?
A. Ensure code correctness in isolation
B. Use assert statements or schema checks
C. Runtime error
D. Use mock data and compare results
Answer: A
Q: Which tool allows data expectations to be defined and validated?
A. Using expectations with 'fail', 'drop', or 'quarantine'
B. Completeness and accuracy
C. Delta Live Tables with expectations
D. Integration and regression tests
Answer: C
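A minimal Delta Live Tables sketch using an expectation decorator; the source table and rule (raw_orders, amount >= 0) are hypothetical.

    import dlt
    from pyspark.sql import functions as F

    # Rows violating the expectation are dropped and the violation count is recorded
    @dlt.table
    @dlt.expect_or_drop("valid_amount", "amount >= 0")
    def clean_orders():
        return dlt.read("raw_orders").where(F.col("order_id").isNotNull())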
Q: How can bad data be redirected during ETL?
A. Runtime error
B. Using expectations with 'fail', 'drop', or 'quarantine'
C. Continuous monitoring and validation
D. Delta Live Tables with expectations
Answer: B
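One way to implement the quarantine option is to split rows on the same rule into a valid table and a quarantine table; this is a sketch with hypothetical table names and rule.

    import dlt
    from pyspark.sql import functions as F

    RULE = "email IS NOT NULL AND amount >= 0"   # hypothetical quality rule

    @dlt.table
    def orders_valid():
        return dlt.read("raw_orders").where(F.expr(RULE))

    @dlt.table
    def orders_quarantine():
        return dlt.read("raw_orders").where(~F.expr(RULE))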
Q: What kind of tests are suitable for production pipelines?
A. Ensure code correctness in isolation
B. Integration and regression tests
C. Runtime error
D. Catch regressions early
Answer: B
Q: What type of error does a schema mismatch cause?
A. Integration and regression tests
B. Runtime error
C. Continuous monitoring and validation
D. Ensure code correctness in isolation
Answer: B
Q: What is a key element of data quality?
A. Continuous monitoring and validation
B. Use assert statements or schema checks
C. Completeness and accuracy
D. Using expectations with 'fail', 'drop', or 'quarantine'
Answer: C
Q: What is the benefit of pipeline test automation?
A. Use mock data and compare results
B. Integration and regression tests
C. Catch regressions early
D. Delta Live Tables with expectations
Answer: C
Q: Which feature in DLT ensures reliability?
A. Completeness and accuracy
B. Continuous monitoring and validation
C. Delta Live Tables with expectations
D. Use mock data and compare results
Answer: B
Deployment & Job Orchestration
Q: What task type runs notebooks in workflows?
A. Notebook task
B. Job clusters
C. Databricks Secrets API
D. Databricks Asset Bundles
Answer: A
Q: What metadata helps with pipeline debugging?
A. Databricks Secrets API
B. Notebook task
C. Run logs and task outputs
D. Repos and deployment APIs
Answer: C
Q: How do you monitor job failures?
A. Use Change Data Feed
B. Notebook task
C. Run logs and task outputs
D. Enable alerts or use audit logs
Answer: D
Q: What mechanism isolates production jobs?
A. Use multi-task jobs in Jobs UI
B. Use Change Data Feed
C. Repos and deployment APIs
D. Job clusters
Answer: D
Q: Which tool allows deployment promotion?
A. Notebook task
B. Run logs and task outputs
C. Databricks Asset Bundles
D. Use Change Data Feed
Answer: C
Q: How do you reprocess only updated data?
A. Run logs and task outputs
B. Notebook task
C. Using Git integration
D. Use Change Data Feed
Answer: D
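A sketch of reading only changed rows with Change Data Feed (CDF must already be enabled on the table); the table name and starting version are hypothetical.

    # Read inserts and post-update images committed since version 5 (hypothetical)
    changes = (spark.read.format("delta")
               .option("readChangeFeed", "true")
               .option("startingVersion", 5)
               .table("sales.orders"))
    updated_rows = changes.filter("_change_type IN ('insert', 'update_postimage')")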
Q: What feature allows CI/CD in Databricks?
A. Use multi-task jobs in Jobs UI
B. Databricks Asset Bundles
C. Notebook task
D. Repos and deployment APIs
Answer: D
Q: What is the best way to schedule complex workflows?
A. Databricks Secrets API
B. Databricks Asset Bundles
C. Use multi-task jobs in Jobs UI
D. Using Git integration
Answer: C
Q: How are secrets managed securely?
A. Use multi-task jobs in Jobs UI
B. Notebook task
C. Repos and deployment APIs
D. Databricks Secrets API
Answer: D
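In a notebook, secrets are typically read through dbutils; the scope and key names here are hypothetical.

    # The value is redacted in notebook output and never hard-coded in the source
    jdbc_password = dbutils.secrets.get(scope="prod-scope", key="warehouse-password")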
Q: How can jobs be version controlled?
A. Enable alerts or use audit logs
B. Repos and deployment APIs
C. Notebook task
D. Using Git integration
Answer: D
Performance Tuning & Optimization
Q: How do you reduce the small-file problem?
A. Join reordering and cost-based optimizer
B. Use OPTIMIZE command
C. Improve performance of repeated queries
D. Broadcast join
Answer: B
Q: What helps reduce shuffle in joins?
A. Spark UI
B. Improves I/O pruning
C. Broadcast join
D. Data skipping
Answer: C
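A sketch of a broadcast join hint, assuming a large fact_df and a small dim_df; both DataFrames are hypothetical.

    from pyspark.sql.functions import broadcast

    # Ship the small dimension table to every executor instead of shuffling the large fact table
    result = fact_df.join(broadcast(dim_df), "customer_id")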
Q: Which command compacts Delta files?
A. OPTIMIZE
B. Spark UI
C. Data skipping
D. Use OPTIMIZE command
Answer: A
Q: What is a common cause of slow queries?
A. Join reordering and cost-based optimizer
B. Broadcast join
C. Skewed data or unnecessary shuffles
D. OPTIMIZE
Answer: C
Q: What tool visualizes Spark DAGs?
A. spark.sql.shuffle.partitions
B. Use OPTIMIZE command
C. Spark UI
D. Join reordering and cost-based optimizer
Answer: C
Q: What parameter sets parallelism in Spark?
A. Improves I/O pruning
B. spark.sql.shuffle.partitions
C. Broadcast join
D. Use OPTIMIZE command
Answer: B
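Setting the shuffle parallelism; the value 64 is only an example and should be tuned to the data volume.

    # Default is 200; lower it for small data, raise it for very large shuffles
    spark.conf.set("spark.sql.shuffle.partitions", "64")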
Q: Why is caching used?
A. Data skipping
B. Improve performance of repeated queries
C. Join reordering and cost-based optimizer
D. Broadcast join
Answer: B
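A caching sketch; the table name and filter are hypothetical.

    # The first action materializes the data in memory; later queries reuse it
    hot_df = spark.table("sales.orders").filter("order_date >= '2024-01-01'").cache()
    hot_df.count()                              # populates the cache
    hot_df.groupBy("region").count().show()     # served from cache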
Q: What improves performance of star schema joins?
A. Join reordering and cost-based optimizer
B. spark.sql.shuffle.partitions
C. Data skipping
D. Broadcast join
Answer: A
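A sketch of enabling the cost-based optimizer and collecting the statistics it relies on for join reordering; the table name is hypothetical.

    # The optimizer needs column statistics to estimate sizes and reorder joins
    spark.conf.set("spark.sql.cbo.enabled", "true")
    spark.conf.set("spark.sql.cbo.joinReorder.enabled", "true")
    spark.sql("ANALYZE TABLE sales.fact_orders COMPUTE STATISTICS FOR ALL COLUMNS")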
Q: How does Z-ordering help performance?
A. Skewed data or unnecessary shuffles
B. Improve performance of repeated queries
C. OPTIMIZE
D. Improves I/O pruning
Answer: D
Q: Which feature avoids scanning irrelevant data?
A. Data skipping
B. Skewed data or unnecessary shuffles
C. OPTIMIZE
D. Improves I/O pruning
Answer: A
Streaming & Incremental Data Processing
Q: What mechanism enables stateful processing in Spark?
A. StateStore
B. Use upserts or deduplication techniques
C. Handles late data gracefully
D. Small batch of streaming data processed at intervals
Answer: A
Q: How is idempotence maintained in streaming?
A. Small batch of streaming data processed at intervals
B. Use upserts or deduplication techniques
C. Use checkpoints and write-ahead logs
D. Set mergeSchema=True during writeStream
Answer: B
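A sketch of idempotent streaming writes via foreachBatch and MERGE; the target table and key column (sales.orders, order_id) are hypothetical, and stream_df stands for an existing streaming DataFrame.

    from delta.tables import DeltaTable

    # Re-running the same micro-batch converges to the same table state (dedup + upsert)
    def upsert_batch(batch_df, batch_id):
        target = DeltaTable.forName(spark, "sales.orders")
        (target.alias("t")
               .merge(batch_df.dropDuplicates(["order_id"]).alias("s"),
                      "t.order_id = s.order_id")
               .whenMatchedUpdateAll()
               .whenNotMatchedInsertAll()
               .execute())

    (stream_df.writeStream
        .foreachBatch(upsert_batch)
        .option("checkpointLocation", "/tmp/checkpoints/orders")
        .start())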
Q: What is the purpose of watermarking in streaming?
A. Handles late data gracefully
B. Set mergeSchema=True during writeStream
C. Use upserts or deduplication techniques
D. start()
Answer: A
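A watermarking sketch; events_stream is a hypothetical streaming DataFrame with an event_time timestamp column.

    from pyspark.sql import functions as F

    # Events arriving more than 10 minutes late (relative to the max event_time seen)
    # are dropped, so the state kept for the windows stays bounded
    windowed_counts = (events_stream
                       .withWatermark("event_time", "10 minutes")
                       .groupBy(F.window("event_time", "5 minutes"), "event_type")
                       .count())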
Q: How is schema evolution handled in streaming ingestion?
A. Delta Lake
B. Set mergeSchema=True during writeStream
C. Use checkpoints and write-ahead logs
D. start()
Answer: B
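A sketch of schema evolution on a streaming write; the target table, checkpoint path, and stream_df are hypothetical.

    # New columns that appear in the stream are added to the Delta target's schema
    (stream_df.writeStream
        .format("delta")
        .option("mergeSchema", "true")
        .option("checkpointLocation", "/mnt/checkpoints/bronze_events")
        .toTable("bronze.events"))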
Q: Which method supports exactly-once delivery in Delta?
A. writeStream with checkpointing
B. Change Data Feed (CDF)
C. Use upserts or deduplication techniques
D. Delta Lake
Answer: A
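A sketch of a checkpointed Delta stream; the source table, sink path, and checkpoint location are hypothetical. The checkpoint records committed offsets, and Delta's transactional commits ensure each micro-batch is written exactly once.

    query = (spark.readStream.table("bronze.events")
             .writeStream
             .format("delta")
             .option("checkpointLocation", "/mnt/checkpoints/silver_events")
             .start("/mnt/delta/silver_events"))   # start() launches the query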
Q: What command triggers a streaming job?
A. Use checkpoints and write-ahead logs
B. writeStream with checkpointing
C. Handles late data gracefully
D. start()
Answer: D
Q: How do you ensure fault tolerance in streaming?
A. Use checkpoints and write-ahead logs
B. Handles late data gracefully
C. StateStore
D. Delta Lake
Answer: A
Q: What format is optimal for streaming ingest?
A. Use checkpoints and write-ahead logs
B. Small batch of streaming data processed at intervals
C. StateStore
D. Delta Lake
Answer: D
Q: What is a micro-batch in Spark Structured Streaming?
A. Use upserts or deduplication techniques
B. Small batch of streaming data processed at intervals
C. start()
D. Change Data Feed (CDF)
Answer: B
Q: What feature enables processing changes only since last run?
A. Use checkpoints and write-ahead logs
B. start()
C. Change Data Feed (CDF)
D. writeStream with checkpointing
Answer: C