Databricks Generative AI Engineer Associate Exam Free Dumps
2.A Generative AI Engineer is creating an agent-based LLM system for their favorite monster truck
team. The system can answer text-based questions about the monster truck team, look up event dates via an API call, or query tables on the team's latest standings.
How could the Generative AI Engineer best design these capabilities into their system?
A. Ingest PDF documents about the monster truck team into a vector store and query it in a RAG
architecture.
B. Write a system prompt for the agent listing available tools and bundle it into an agent system that
runs a number of calls to solve a query.
C. Instruct the LLM to respond with “RAG”, “API”, or “TABLE” depending on the query, then use
text parsing and conditional statements to resolve the query.
D. Build a system prompt with all possible event dates and table information in the system prompt.
Use a RAG architecture to lookup generic text questions and otherwise leverage the information in
the system prompt.
Answer: B
Explanation:
In this scenario, the Generative AI Engineer needs to design a system that can handle different types
of queries about the monster truck team. The queries may involve text-based information, API
lookups for event dates, or table queries for standings. The best solution is to implement a tool-based
agent system.
Here’s how option B works, and why it’s the most appropriate answer:
System Design Using Agent-Based Model:
In modern agent-based LLM systems, you can design a system where the LLM (Large Language
Model) acts as a central orchestrator. The model can "decide" which tools to use based on the query.
These tools can include API calls, table lookups, or natural language searches. The system should
contain a system prompt that informs the LLM about the available tools.
System Prompt Listing Tools:
By creating a well-crafted system prompt, the LLM knows which tools are at its disposal. For instance,
one tool may query an external API for event dates, another might look up standings in a database,
and a third may involve searching a vector database for general text-based information. The agent
will be responsible for calling the appropriate tool depending on the query.
Agent Orchestration of Calls:
The agent system is designed to execute a series of steps based on the incoming query. If a user
asks for the next event date, the system will recognize this as a task that requires an API call. If the
user asks about standings, the agent might query the appropriate table in the database. For text-
based questions, it may call a search function over ingested data. The agent orchestrates this entire
process, ensuring the LLM makes calls to the right resources dynamically.
Generative AI Tools and Context:
This is a standard architecture for integrating multiple functionalities into a system where each query
requires different actions. The core design in option B is efficient because it keeps the system
modular and dynamic by leveraging tools rather than overloading the LLM with static information in a
system prompt (like option D).
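To make the option B design concrete, here is a minimal Python sketch of a tool-calling agent loop. The tool names, the JSON reply convention, and the call_llm helper are hypothetical illustrations, not a specific framework's API; in practice this is usually handled by an agent framework or the model's native function-calling support.

import json

def lookup_event_dates(team: str) -> str:
    # Hypothetical wrapper around the team's events API.
    return json.dumps({"next_event": "2024-07-04"})

def query_standings(team: str) -> str:
    # Hypothetical query against the standings table.
    return json.dumps({"rank": 3, "points": 1520})

def search_team_docs(question: str) -> str:
    # Hypothetical vector-search lookup over ingested team documents.
    return "General text information about the team."

TOOLS = {
    "lookup_event_dates": lookup_event_dates,
    "query_standings": query_standings,
    "search_team_docs": search_team_docs,
}

SYSTEM_PROMPT = (
    "You can call these tools: lookup_event_dates(team), "
    "query_standings(team), search_team_docs(question). "
    'Reply with JSON: {"tool": <name>, "args": {...}} or {"answer": <text>}.'
)

def run_agent(user_query: str, call_llm) -> str:
    # call_llm is a hypothetical function that sends the message list to the
    # LLM and returns its raw text reply.
    messages = [{"role": "system", "content": SYSTEM_PROMPT},
                {"role": "user", "content": user_query}]
    for _ in range(5):  # bound the number of tool-calling rounds
        reply = json.loads(call_llm(messages))
        if "answer" in reply:
            return reply["answer"]
        result = TOOLS[reply["tool"]](**reply["args"])
        messages.append({"role": "tool", "content": result})
    return "Unable to resolve the query within the allotted steps."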
Why Other Options Are Less Suitable:
A (RAG Architecture): While relevant, simply ingesting PDFs into a vector store only helps with text-
based retrieval. It wouldn’t help with API lookups or table queries.
C (Conditional Logic with RAG/API/TABLE): Although this approach works, it relies heavily on manual
text parsing and might introduce complexity when scaling the system.
D (System Prompt with Event Dates and Standings): Hardcoding dates and table information into a
system prompt isn’t scalable. As the standings or events change, the system would need constant
updating, making it inefficient.
By bundling multiple tools into a single agent-based system (as in option B), the Generative AI
Engineer can best handle the diverse requirements of this system.
3.After changing the response generating LLM in a RAG pipeline from GPT-4 to a model with a shorter context length that the company self-hosts, the Generative AI Engineer is getting an error indicating that the prompt token count has exceeded the new model's context length limit.
What TWO solutions should the Generative AI Engineer implement without changing the response
generating model? (Choose two.)
A. Use a smaller embedding model to generate
B. Reduce the maximum output tokens of the new model
C. Decrease the chunk size of embedded documents
D. Reduce the number of records retrieved from the vector database
E. Retrain the response generating model using ALiBi
Answer: C, D
Explanation:
Problem Context: After switching to a model with a shorter context length, the error message
indicating that the prompt token count has exceeded the limit suggests that the input to the model is
too large.
Explanation of Options:
Option A: Use a smaller embedding model to generate. This wouldn't necessarily address the issue of prompt size exceeding the model's token limit.
Option B: Reduce the maximum output tokens of the new model. This option affects the output length, not the size of the input being too large.
Option C: Decrease the chunk size of embedded documents. This would help reduce the size of each document chunk fed into the model, ensuring that the input remains within the model's context length limitations.
Option D: Reduce the number of records retrieved from the vector database. By retrieving fewer records, the total input size to the model can be managed more effectively, keeping it within the allowable token limits.
Option E: Retrain the response generating model using ALiBi. Retraining the model is contrary to the stipulation not to change the response generating model.
Options C and D are the most effective solutions to manage the model’s shorter context length
without changing the model itself, by adjusting the input size both in terms of individual document size
and total documents retrieved.
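As an illustration of options C and D, the sketch below shows the two input-side knobs being adjusted in a typical RAG retrieval step. The chunk size setting and the index object with its similarity_search method are hypothetical stand-ins for whatever text splitter and vector store client the pipeline actually uses.

# Two input-side adjustments that keep the assembled prompt within the
# shorter context window, without touching the response generating model.
CHUNK_SIZE = 256    # option C: re-chunk documents into smaller pieces (was e.g. 1024)
NUM_RESULTS = 3     # option D: retrieve fewer records from the vector database (was e.g. 10)

def build_prompt(question: str, index) -> str:
    # `index` is a hypothetical vector search client exposing similarity_search.
    hits = index.similarity_search(question, num_results=NUM_RESULTS)
    context = "\n\n".join(h["chunk_text"] for h in hits)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"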
4.A Generative AI Engineer has a provisioned throughput model serving endpoint as part of a RAG
application and would like to monitor the serving endpoint’s incoming requests and outgoing
responses. The current approach is to include a micro-service in between the endpoint and the user
interface to write logs to a remote server.
Which Databricks feature should they use instead to perform the same task?
A. Vector Search
B. Lakeview
C. DBSQL
D. Inference Tables
Answer: D
Explanation:
Problem Context: The goal is to monitor the serving endpoint for incoming requests and outgoing
responses in a provisioned throughput model serving endpoint within a Retrieval-Augmented
Generation (RAG) application. The current approach involves using a microservice to log requests
and responses to a remote server, but the Generative AI Engineer is looking for a more streamlined
solution within Databricks.
Explanation of Options:
Option A: Vector Search: This feature is used to perform similarity searches within vector databases.
It doesn’t provide functionality for logging or monitoring requests and responses in a serving
endpoint, so it’s not applicable here.
Option B: Lakeview: Lakeview is Databricks' dashboarding feature for visualizing and sharing data. It is not designed for logging or monitoring request-response cycles for serving endpoints, so it doesn't fulfill the specific monitoring requirement.
Option C: DBSQL: Databricks SQL (DBSQL) is used for running SQL queries on data stored in
Databricks, primarily for analytics purposes. It doesn’t provide the direct functionality needed to
monitor requests and responses in real-time for an inference endpoint.
Option D: Inference Tables: This is the correct answer. Inference Tables in Databricks automatically capture the incoming requests and outgoing responses of a model serving endpoint, along with request metadata, in a Delta table. This allows the system to log requests and responses directly within Databricks, making it an ideal choice for monitoring the behavior of a provisioned throughput serving endpoint. Inference Tables can be queried and analyzed, enabling easier monitoring and debugging compared to a custom microservice.
Thus, Inference Tables are the optimal feature for monitoring request and response logs within the
Databricks infrastructure for a model serving endpoint.
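As a rough illustration of option D, inference tables are turned on through the serving endpoint's configuration rather than through any extra service. The catalog, schema, table-name prefix, and payload column names below are placeholders/assumptions and should be checked against the current Databricks serving endpoints documentation; the query assumes a Databricks notebook where spark is available.

# Fragment of a serving endpoint create/update configuration that enables
# inference tables (field names assumed from the serving-endpoints API;
# catalog/schema/prefix are placeholders).
endpoint_config = {
    "auto_capture_config": {
        "catalog_name": "ml",
        "schema_name": "rag_monitoring",
        "table_name_prefix": "rag_endpoint",
        "enabled": True,
    }
}

# Once enabled, request/response pairs land in a Delta table (assumed name:
# <prefix>_payload) that can be queried directly, replacing the logging
# micro-service.
logs = spark.sql(
    "SELECT request, response FROM ml.rag_monitoring.rag_endpoint_payload LIMIT 10"
)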
5.A Generative AI Engineer would like an LLM to generate formatted JSON from emails.
This will require parsing and extracting the following information: order ID, date, and sender email.
Here’s a sample email:
They will need to write a prompt that will extract the relevant information in JSON format with the
highest level of output accuracy.
Which prompt will do that?
A. You will receive customer emails and need to extract date, sender email, and order ID. You should
return the date, sender email, and order ID information in JSON format.
B. You will receive customer emails and need to extract date, sender email, and order ID. Return the
extracted information in JSON format.
Here’s an example: {“date”: “April 16, 2024”, “sender_email”: “[email protected]”,
“order_id”: “RE987D”}
C. You will receive customer emails and need to extract date, sender email, and order ID. Return the
extracted information in a human-readable format.
D. You will receive customer emails and need to extract date, sender email, and order ID. Return the
extracted information in JSON format.
Answer: B
Explanation:
Problem Context: The goal is to parse emails to extract certain pieces of information and output this in
a structured JSON format. Clarity and specificity in the prompt design will ensure higher accuracy in
the LLM’s responses.
Explanation of Options:
Option A: Provides a general guideline but lacks an example, which helps an LLM understand the
exact format expected.
Option B: Includes a clear instruction and a specific example of the output format. Providing an
example is crucial as it helps set the pattern and format in which the information should be structured,
leading to more accurate results.
Option C: Does not specify that the output should be in JSON format, thus not meeting the
requirement.
Option D: While it correctly asks for JSON format, it lacks an example that would guide the LLM on
how to structure the JSON correctly.
Therefore, Option B is optimal as it not only specifies the required format but also illustrates it with an
example, enhancing the likelihood of accurate extraction and formatting by the LLM.
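For reference, the winning prompt from option B is easy to turn into a reusable template. The template below simply reproduces the wording of option B (the example values, including the redacted email address, come from the answer choice); build_extraction_prompt is a hypothetical helper name.

# One-shot extraction prompt based on option B; double braces escape the
# literal JSON example for str.format.
PROMPT_TEMPLATE = """You will receive customer emails and need to extract date, sender email, and order ID.
Return the extracted information in JSON format.
Here's an example:
{{"date": "April 16, 2024", "sender_email": "[email protected]", "order_id": "RE987D"}}

Email:
{email_body}"""

def build_extraction_prompt(email_body: str) -> str:
    return PROMPT_TEMPLATE.format(email_body=email_body)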
6.A Generative AI Engineer developed an LLM application using the provisioned throughput
Foundation Model API. Now that the application is ready to be deployed, they realize their volume of requests is not high enough to justify their own provisioned throughput endpoint. They
want to choose a strategy that ensures the best cost-effectiveness for their application.
What strategy should the Generative AI Engineer use?
A. Switch to using External Models instead
B. Deploy the model using pay-per-token throughput as it comes with cost guarantees
C. Change to a model with a fewer number of parameters in order to reduce hardware constraint
issues
D. Throttle the incoming batch of requests manually to avoid rate limiting issues
Answer: B
Explanation:
Problem Context: The engineer needs a cost-effective deployment strategy for an LLM application
with relatively low request volume.
Explanation of Options:
Option A: Switching to external models may not provide the required control or integration necessary
for specific application needs.
Option B: Using a pay-per-token model is cost-effective, especially for applications with variable or
low request volumes, as it aligns costs directly with usage.
Option C: Changing to a model with fewer parameters could reduce costs, but might also impact the
performance and capabilities of the application.
Option D: Manually throttling requests is a less efficient and potentially error-prone strategy for
managing costs.
Option B is ideal, offering flexibility and cost control, aligning expenses directly with the application's
usage patterns.
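To illustrate option B, calling a pay-per-token Foundation Model API endpoint is a single request and needs no provisioned capacity. This sketch uses the MLflow deployments client; the endpoint name is a placeholder for whichever pay-per-token model the workspace exposes, and the response is assumed to follow the chat-completions shape.

from mlflow.deployments import get_deploy_client

client = get_deploy_client("databricks")
response = client.predict(
    endpoint="databricks-meta-llama-3-1-70b-instruct",  # placeholder pay-per-token endpoint
    inputs={
        "messages": [{"role": "user", "content": "Summarize our return policy."}],
        "max_tokens": 128,
    },
)
# Assumes a chat-completions style response payload.
print(response["choices"][0]["message"]["content"])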
7.A Generative AI Engineer is helping a cinema extend its website's chatbot to be able to respond to questions about specific showtimes for movies currently playing at their local theater. They already have the location of the user provided by location services to their agent, and a Delta table which is continually updated with the latest showtime information by location. They want to implement this new capability in their RAG application.
Which option will do this with the least effort and in the most performant way?
A. Create a Feature Serving Endpoint from a FeatureSpec that references an online store synced
from the Delta table. Query the Feature Serving Endpoint as part of the agent logic / tool
implementation.
B. Query the Delta table directly via a SQL query constructed from the user's input using a text-to-SQL LLM in the agent logic / tool implementation.
C. Write the Delta table contents to a text column, then embed those texts using an embedding model and store these in the vector index. Look up the information based on the embedding as part of the agent logic / tool implementation.
D. Set up a task in Databricks Workflows to write the information in the Delta table periodically to an
external database such as MySQL and query the information from there as part of the agent logic /
tool implementation.
Answer: A
Explanation:
The task is to extend a cinema chatbot to provide movie showtime information using a RAG
application, leveraging user location and a continuously updated Delta table, with minimal effort and
high performance. Let’s evaluate the options.
Option A: Create a Feature Serving Endpoint from a FeatureSpec that references an online store
synced from the Delta table. Query the Feature Serving Endpoint as part of the agent logic / tool
implementation
Databricks Feature Serving provides low-latency access to real-time data from Delta tables via an
online store. Syncing the Delta table to a Feature Serving Endpoint allows the chatbot to query
showtimes efficiently, integrating seamlessly into the RAG agent’s tool logic. This leverages
Databricks’ native infrastructure, minimizing effort and ensuring performance.
Databricks
Reference: "Feature Serving Endpoints provide real-time access to Delta table data with low latency,
ideal for production systems" ("Databricks Feature Engineering Guide," 2023).
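As an illustration of how option A plugs into the agent, the tool can simply call the Feature Serving Endpoint over REST and pass the user's location as the lookup key. The workspace URL, endpoint name, and key column below are placeholders, and the request format should be checked against the Databricks Feature Serving documentation.

import requests

def get_showtimes(location_id: str, token: str) -> dict:
    # Hypothetical agent tool: look up showtimes for the user's location
    # from the Feature Serving Endpoint backed by the synced Delta table.
    resp = requests.post(
        "https://<workspace-url>/serving-endpoints/<showtimes-endpoint>/invocations",
        headers={"Authorization": f"Bearer {token}"},
        json={"dataframe_records": [{"location_id": location_id}]},
    )
    resp.raise_for_status()
    return resp.json()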
Option B: Query the Delta table directly via a SQL query constructed from the user's input using a text-to-SQL LLM in the agent logic / tool implementation
Using a text-to-SQL LLM to generate queries adds complexity (e.g., ensuring accurate SQL
generation) and latency (LLM inference + SQL execution). While feasible, it’s less performant and
requires more effort than a pre-built serving solution.
Databricks
Reference: "Direct SQL queries are flexible but may introduce overhead in real-time applications"
("Building LLM Applications with Databricks").
Option C: Write the Delta table contents to a text column, then embed those texts using an
embedding model and store these in the vector index. Look up the information based on the
embedding as part of the agent logic / tool implementation
Converting structured Delta table data (e.g., showtimes) into text, embedding it, and using vector
search is inefficient for structured lookups. It’s effort-intensive (preprocessing, embedding) and less
precise than direct queries, undermining performance.
Databricks
Reference: "Vector search excels for unstructured data, not structured tabular lookups" ("Databricks
Vector Search Documentation").
Option D: Set up a task in Databricks Workflows to write the information in the Delta table periodically
to an external database such as MySQL and query the information from there as part of the agent
logic / tool implementation
Exporting to an external database (e.g., MySQL) adds setup effort (workflow, external DB
management) and latency (periodic updates vs. real-time). It’s less performant and more complex
than using Databricks’ native tools.
Databricks
Reference: "Avoid external systems when Delta tables provide real-time data natively" ("Databricks
Workflows Guide").
Conclusion: Option A minimizes effort by using Databricks Feature Serving for real-time, low-latency
access to the Delta table, ensuring high performance in a production-ready RAG chatbot.
8.A Generative AI Engineer has successfully ingested unstructured documents and chunked them by
document sections. They would like to store the chunks in a Vector Search index. The current format
of the dataframe has two columns: (i) original document file name (ii) an array of text chunks for each
document.
What is the most performant way to store this dataframe?
A. Split the data into train and test set, create a unique identifier for each document, then save to a
Delta table
B. Flatten the dataframe to one chunk per row, create a unique identifier for each row, and save to a
Delta table
C. First create a unique identifier for each document, then save to a Delta table
D. Store each chunk as an independent JSON file in Unity Catalog Volume. For each JSON file, the
key is the document section name and the value is the array of text chunks for that section
Answer: B
Explanation:
Problem Context: The engineer needs an efficient way to store chunks of unstructured documents to
facilitate easy retrieval and search. The current dataframe consists of document filenames and
associated text chunks.
Explanation of Options:
Option A: Splitting into train and test sets is more relevant for model training scenarios and not
directly applicable to storage for retrieval in a Vector Search index.
Option B: Flattening the dataframe such that each row contains a single chunk with a unique identifier
is the most performant for storage and retrieval. This structure aligns well with how data is indexed
and queried in vector search applications, making it easier to retrieve specific chunks efficiently.
Option C: Creating a unique identifier for each document only does not address the need to access
individual chunks efficiently, which is critical in a Vector Search application.
Option D: Storing each chunk as an independent JSON file creates unnecessary overhead and
complexity in managing and querying large volumes of files.
Option B is the most efficient and practical approach, allowing for streamlined indexing and retrieval
processes in a Delta table environment, fitting the requirements of a Vector Search index.
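A minimal PySpark sketch of option B is shown below. It assumes df is the dataframe described in the question and that the code runs in a Databricks notebook where spark is available; the column names file_name and chunks and the target table name are assumptions, and Change Data Feed is enabled because a Delta Sync Vector Search index typically requires it on the source table.

from pyspark.sql import functions as F

# Flatten to one chunk per row and derive a unique chunk identifier.
flat_df = (
    df.select("file_name", F.posexplode("chunks").alias("chunk_pos", "chunk_text"))
      .withColumn("chunk_id",
                  F.concat_ws("-", "file_name", F.col("chunk_pos").cast("string")))
)

flat_df.write.format("delta").mode("overwrite").saveAsTable("catalog.schema.document_chunks")

# Delta Sync Vector Search indexes typically require Change Data Feed on the source table.
spark.sql(
    "ALTER TABLE catalog.schema.document_chunks "
    "SET TBLPROPERTIES (delta.enableChangeDataFeed = true)"
)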
9.A Generative AI Engineer is working with a retail company that wants to enhance its customer experience by automatically handling common customer inquiries. They are working on an LLM-powered AI solution that should improve response times while maintaining a personalized interaction.
They want to define the appropriate input and LLM task to do this.
Which input/output pair will do this?
A. Input: Customer reviews; Output: Group the reviews by users and aggregate per-user average rating, then respond
B. Input: Customer service chat logs; Output: Group the chat logs by users, followed by summarizing each user's interactions, then respond
C. Input: Customer service chat logs; Output: Find the answers to similar questions and respond with a summary
D. Input: Customer reviews; Output: Classify review sentiment
Answer: C
Explanation:
The task described in the question involves enhancing customer experience by automatically
handling common customer inquiries using an LLM-powered AI solution. This requires the system to
process input data (customer inquiries) and generate personalized, relevant responses efficiently.
Let’s evaluate the options step-by-step in the context of Databricks Generative AI Engineer
principles, which emphasize leveraging LLMs for tasks like question answering, summarization, and
retrieval-augmented generation (RAG).
Option A: Input: Customer reviews; Output: Group the reviews by users and aggregate per-user
average rating, then respond
This option focuses on analyzing customer reviews to compute average ratings per user. While this
might be useful for sentiment analysis or user profiling, it does not directly address the goal of
handling common customer inquiries or improving response times for personalized interactions.
Customer reviews are typically feedback data, not real-time inquiries requiring immediate responses.
Databricks
Reference: Databricks documentation on LLMs (e.g., "Building LLM Applications with Databricks")
emphasizes that LLMs excel at tasks like question answering and conversational responses, not just
aggregation or statistical analysis of reviews.
Option B: Input: Customer service chat logs; Output: Group the chat logs by users, followed by
summarizing each user's interactions, then respond
This option uses chat logs as input, which aligns with customer service scenarios. However, the
output (grouping by users and summarizing interactions) focuses on user-specific summaries rather
than directly addressing inquiries. While summarization is an LLM capability, this approach lacks the
specificity of finding answers to common questions, which is central to the problem. Databricks
Reference: Per Databricks’ "Generative AI Cookbook," LLMs can summarize text, but for customer
service, the emphasis is on retrieval and response generation (e.g., RAG workflows) rather than user
interaction summaries alone.
Option C: Input: Customer service chat logs; Output: Find the answers to similar questions and
respond with a summary
This option uses chat logs (real customer inquiries) as input and tasks the LLM with identifying
answers to similar questions, then providing a summarized response. This directly aligns with the goal
of handling common inquiries efficiently while maintaining personalization (by referencing past
interactions or similar cases). It leverages LLM capabilities like semantic search, retrieval, and
response generation, which are core to Databricks’ LLM workflows.
Databricks
Reference: From Databricks documentation ("Building LLM-Powered Applications," 2023),
an exact extract states: "For customer support use cases, LLMs can be used to retrieve relevant
answers from historical data like chat logs and generate concise, contextually appropriate responses."
This matches Option C’s approach of finding answers and summarizing them.
Option D: Input: Customer reviews; Output: Classify review sentiment
This option focuses on sentiment classification of reviews, which is a valid LLM task but unrelated to
handling customer inquiries or improving response times in a conversational context. It’s more suited
for feedback analysis than real-time customer service.
Databricks
Reference: Databricks’ "Generative AI Engineer Guide" notes that sentiment analysis is a common
LLM task, but it’s not highlighted for real-time conversational applications like customer support.
Conclusion: Option C is the best fit because it uses relevant input (chat logs) and defines an LLM task
(finding answers and summarizing) that meets the requirements of improving response times and
maintaining personalized interaction. This aligns with Databricks’ recommended practices for LLM-
powered customer service solutions, such as retrieval-augmented generation (RAG) workflows.
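A short sketch of the option C pattern: retrieve similar past inquiries from the indexed chat logs, then have the LLM summarize them into a reply. The index object, its similarity_search method, the resolved_exchange field, and call_llm are all hypothetical stand-ins for the actual retrieval and generation components.

def answer_inquiry(question: str, index, call_llm) -> str:
    # Retrieve similar past chat-log exchanges (hypothetical vector search client).
    similar = index.similarity_search(question, num_results=5)
    history = "\n\n".join(h["resolved_exchange"] for h in similar)
    prompt = (
        "A customer asked: " + question + "\n\n"
        "Here are similar past inquiries and how they were resolved:\n" + history + "\n\n"
        "Write a concise, personalized reply to the customer."
    )
    return call_llm(prompt)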
10.A Generative AI Engineer is designing a RAG application for answering user questions on
technical regulations as they learn a new sport.
What are the steps needed to build this RAG application and deploy it?
A. Ingest documents from a source -> Index the documents and save to Vector Search -> User submits queries against an LLM -> LLM retrieves relevant documents -> Evaluate model -> LLM generates a response -> Deploy it using Model Serving
B. Ingest documents from a source -> Index the documents and save to Vector Search -> User submits queries against an LLM -> LLM retrieves relevant documents -> LLM generates a response -> Evaluate model -> Deploy it using Model Serving
C. Ingest documents from a source -> Index the documents and save to Vector Search -> Evaluate model -> Deploy it using Model Serving
D. User submits queries against an LLM -> Ingest documents from a source -> Index the documents and save to Vector Search -> LLM retrieves relevant documents -> LLM generates a response -> Evaluate model -> Deploy it using Model Serving
Answer: B
Explanation:
The Generative AI Engineer needs to follow a methodical pipeline to build and deploy a Retrieval-
Augmented Generation (RAG) application. The steps outlined in option B accurately reflect this
process:
Ingest documents from a source: This is the first step, where the engineer collects documents (e.g.,
technical regulations) that will be used for retrieval when the application answers user questions.
Index the documents and save to Vector Search: Once the documents are ingested, they need to be embedded using an embedding model (e.g., a pre-trained model such as BERT) and stored in a vector database (such as Databricks Vector Search, Pinecone, or FAISS). This enables fast retrieval based on user queries.
User submits queries against an LLM: Users interact with the application by submitting their queries.
These queries will be passed to the LLM.
LLM retrieves relevant documents: The LLM works with the vector store to retrieve the most relevant
documents based on their vector representations.
LLM generates a response: Using the retrieved documents, the LLM generates a response that is
tailored to the user's question.
Evaluate model: After generating responses, the system must be evaluated to ensure the retrieved
documents are relevant and the generated response is accurate. Metrics such as accuracy,
relevance, and user satisfaction can be used for evaluation.
Deploy it using Model Serving: Once the RAG pipeline is ready and evaluated, it is deployed using a
model-serving platform such as Databricks Model Serving. This enables real-time inference and
response generation for users.
By following these steps, the Generative AI Engineer ensures that the RAG application is both
efficient and effective for the task of answering technical regulation questions.
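The ordering in option B can be summarized as the sketch below, where every callable is a hypothetical stand-in passed in by the caller; the point is simply that retrieval and generation are wired together first, the end-to-end chain is evaluated, and only then is it deployed with Model Serving.

def build_and_deploy_rag_app(ingest, index_docs, retrieve, generate,
                             evaluate, deploy, source, eval_set):
    docs = ingest(source)                    # 1. ingest documents from a source
    index = index_docs(docs)                 # 2. index and save to Vector Search

    def answer(query):
        context = retrieve(index, query)     # 3-4. user query -> retrieve relevant documents
        return generate(query, context)      # 5. generate a response

    metrics = evaluate(answer, eval_set)     # 6. evaluate the end-to-end chain
    return deploy(answer, metrics)           # 7. deploy with Model Serving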