ChatGPT Prompt Engineering
Some of the examples demonstrated here currently work only with our most capable model, gpt-4. In
general, if you find that a model fails at a task and a more capable model is available, it's often worth
trying again with the more capable model.
These models can’t read your mind. If outputs are too long, ask for brief replies. If outputs are too
simple, ask for expert-level writing. If you dislike the format, demonstrate the format you’d like to see.
The less the model has to guess at what you want, the more likely you’ll get it.
Tactics:
Include details in your query to get more relevant answers
Ask the model to adopt a persona
Use delimiters to clearly indicate distinct parts of the input
Specify the steps required to complete a task
Provide examples
Specify the desired length of the output
Language models can confidently invent fake answers, especially when asked about esoteric topics or
for citations and URLs. In the same way that a sheet of notes can help a student do better on a test,
providing reference text to these models can help in answering with fewer fabrications.
Tactics:
Instruct the model to answer with citations from a reference text
Just as it is good practice in software engineering to decompose a complex system into a set of modular
components, the same is true of tasks submitted to a language model. Complex tasks tend to have
higher error rates than simpler tasks. Furthermore, complex tasks can often be re-defined as a workflow
of simpler tasks in which the outputs of earlier tasks are used to construct the inputs to later tasks.
Tactics:
Use intent classification to identify the most relevant instructions for a user query
For dialogue applications that require very long conversations, summarize or filter previous
dialogue
If asked to multiply 17 by 28, you might not know it instantly, but can still work it out with time.
Similarly, models make more reasoning errors when trying to answer right away, rather than taking time
to work out an answer. Asking for a "chain of thought" before an answer can help the model reason its
way toward correct answers more reliably.
Tactics:
Instruct the model to work out its own solution before rushing to a conclusion
Use inner monologue or a sequence of queries to hide the model's reasoning process
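To illustrate the strategy, here is a minimal sketch of asking for a chain of thought via the API. It assumes the openai Python package (v1+) with an API key in the environment; the system wording is just one possible phrasing.

```python
# Minimal chain-of-thought sketch: ask for reasoning before the final answer.
# Assumes the openai Python package (v1+) and OPENAI_API_KEY in the environment.
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {
            "role": "system",
            "content": "Think step by step before answering. Show your "
                       "reasoning, then give the final answer on a line "
                       "starting with 'Answer:'.",
        },
        {"role": "user", "content": "What is 17 * 28?"},
    ],
)

print(response.choices[0].message.content)
```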
Compensate for the weaknesses of the model by feeding it the outputs of other tools. For example, a
text retrieval system (sometimes called RAG or retrieval augmented generation) can tell the model
about relevant documents. A code execution engine like OpenAI's Code Interpreter can help the model
do math and run code. If a task can be done more reliably or efficiently by a tool rather than by a
language model, offload it to get the best of both.
Tactics:
Use code execution to perform more accurate calculations or call external APIs
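For the code-execution tactic, one minimal sketch looks like this (again assuming the openai Python package; the polynomial and the extraction regex are illustrative, and executing model-generated code requires a real sandbox in practice):

```python
# Sketch: tell the model it may write Python in triple backticks, then extract
# and run that code. WARNING: exec() on model output is unsafe outside a sandbox.
import re
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {
            "role": "system",
            "content": "You can write Python code enclosed in triple backticks. "
                       "Use this to perform any calculations.",
        },
        {
            "role": "user",
            "content": "Find all real roots of 3*x**5 - 5*x**4 - 3*x**3 - 7*x - 10.",
        },
    ],
)

reply = response.choices[0].message.content
fence = "`" * 3  # built this way so the pattern doesn't close this code block
match = re.search(fence + r"(?:python)?\s*(.*?)" + fence, reply, re.DOTALL)
if match:
    exec(match.group(1))  # in practice, run inside a sandboxed interpreter
```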
Improving performance is easier if you can measure it. In some cases a modification to a prompt will
achieve better performance on a few isolated examples but lead to worse overall performance on a
more representative set of examples. Therefore to be sure that a change is net positive to performance
it may be necessary to define a comprehensive test suite (also known as an "eval").
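A tiny eval harness might look like the sketch below; the test cases and the substring-based grading rule are hypothetical stand-ins, and real evals use larger, representative case sets and more careful grading.

```python
# Sketch: score a candidate system prompt against a fixed set of test cases,
# so prompt changes are judged on aggregate accuracy rather than anecdotes.
from openai import OpenAI

client = OpenAI()

TEST_CASES = [  # (question, substring expected in a correct answer)
    ("What is 17 * 28?", "476"),
    ("What is the capital of France?", "Paris"),
]

def run_eval(system_prompt: str) -> float:
    correct = 0
    for question, expected in TEST_CASES:
        response = client.chat.completions.create(
            model="gpt-4",
            messages=[
                {"role": "system", "content": system_prompt},
                {"role": "user", "content": question},
            ],
        )
        if expected in response.choices[0].message.content:
            correct += 1
    return correct / len(TEST_CASES)

print("accuracy:", run_eval("Think step by step before answering."))
```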
Tactics
Each of the strategies listed above can be instantiated with specific tactics. These tactics are meant to
provide ideas for things to try. They are by no means fully comprehensive, and you should feel free to
try creative ideas not represented here.
In order to get a highly relevant response, make sure that requests provide any important details or
context. Otherwise you are leaving it up to the model to guess what you mean.
Worse: How do I add numbers in Excel?
Better: How do I add up a row of dollar amounts in Excel? I want to do this automatically for a whole sheet of rows with all the totals ending up on the right in a column called "Total".

Worse: Who's president?
Better: Who was the president of Mexico in 2021, and how frequently are elections held?

Worse: Write code to calculate the Fibonacci sequence.
Better: Write a TypeScript function to efficiently calculate the Fibonacci sequence. Comment the code liberally to explain what each piece does and why it's written that way.

Worse: Summarize the meeting notes.
Better: Summarize the meeting notes in a single paragraph. Then write a markdown list of the speakers and each of their key points. Finally, list the next steps or action items suggested by the speakers, if any.
The system message can be used to specify the persona used by the model in its replies.
SYSTEM When I ask for help to write something, you will reply with a document that
contains at least one joke or playful comment in every paragraph.
USER Write a thank you note to my steel bolt vendor for getting the delivery in on time
and on short notice. This made it possible for us to deliver an important order.
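In API terms, the persona is simply the system message of the request. A minimal sketch, assuming the openai Python package (v1+):

```python
# Sketch: specify a persona via the system message of a chat request.
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {
            "role": "system",
            "content": "When I ask for help to write something, you will reply "
                       "with a document that contains at least one joke or "
                       "playful comment in every paragraph.",
        },
        {
            "role": "user",
            "content": "Write a thank you note to my steel bolt vendor for "
                       "getting the delivery in on time and on short notice.",
        },
    ],
)

print(response.choices[0].message.content)
```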
Delimiters like triple quotation marks, XML tags, section titles, etc. can help demarcate sections of text
to be treated differently.
SYSTEM You will be provided with a pair of articles (delimited with XML tags) about the same
topic. First summarize the arguments of each article. Then indicate which of them makes
a better argument and explain why.
SYSTEM You will be provided with a thesis abstract and a suggested title for it. The thesis title
should give the reader a good idea of the topic of the thesis but should also be eye-
catching. If the title does not meet these criteria, suggest 5 alternatives.
For straightforward tasks such as these, using delimiters might not make a difference in the output
quality. However, the more complex a task is, the more important it is to disambiguate task details. Don't make the model work to understand exactly what you are asking of it.
Some tasks are best specified as a sequence of steps. Writing the steps out explicitly can make it easier
for the model to follow them.
SYSTEM Use the following step-by-step instructions to respond to user inputs.
Step 1 - The user will provide you with text in triple quotes. Summarize this text in one sentence with a prefix that says "Summary: ".
Step 2 - Translate the summary from Step 1 into Spanish, with a prefix that says "Translation: ".
Providing general instructions that apply to all examples is generally more efficient than demonstrating all permutations of a task by example, but in some cases providing examples may be easier. For example, you may intend for the model to copy a particular style of responding to user queries that is difficult to describe explicitly. This is known as "few-shot" prompting.
SYSTEM Answer in a consistent style.
USER Teach me about patience.
ASSISTANT The river that carves the deepest valley flows from a modest spring; the grandest symphony originates from a single note; the most intricate tapestry begins with a solitary thread.
USER Teach me about the ocean.
You can ask the model to produce outputs that are of a given target length. The targeted output length
can be specified in terms of the count of words, sentences, paragraphs, bullet points, etc. Note however
that instructing the model to generate a specific number of words does not work with high precision.
The model can more reliably generate outputs with a specific number of paragraphs or bullet points.
If we can provide a model with trusted information that is relevant to the current query, then we can
instruct the model to use the provided information to compose its answer.
Given that all models have limited context windows, we need some way to dynamically lookup
information that is relevant to the question being asked. Embeddings can be used to implement efficient
knowledge retrieval.
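A minimal retrieval sketch along these lines follows; it assumes the openai Python package and the text-embedding-ada-002 model, uses hypothetical placeholder documents, and does a linear scan where a real system would use a vector store.

```python
# Sketch: embed a small corpus, embed the query, and pick the most similar
# document by cosine similarity to include in the prompt.
import math
from openai import OpenAI

client = OpenAI()

def embed(texts: list[str]) -> list[list[float]]:
    response = client.embeddings.create(model="text-embedding-ada-002", input=texts)
    return [item.embedding for item in response.data]

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

documents = [  # hypothetical knowledge base
    "The MTD-327J router is restarted by holding the red button for 5 seconds.",
    "Refunds are processed within 5-7 business days.",
]
doc_vectors = embed(documents)

query = "How do I restart my router?"
query_vector = embed([query])[0]

best = max(range(len(documents)), key=lambda i: cosine(doc_vectors[i], query_vector))
print("most relevant document:", documents[best])
```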
Tactic: Instruct the model to answer with citations from a reference text
If the input has been supplemented with relevant knowledge, it's straightforward to request that the
model add citations to its answers by referencing passages from provided documents. Note that
citations in the output can then be verified programmatically by string matching within the provided
documents.
SYSTEM You will be provided with a document delimited by triple quotes and a
question. Your task is to answer the question using only the provided
document and to cite the passage(s) of the document used to answer the
question. If the document does not contain the information needed to
answer this question then simply write: "Insufficient information." If an
answer to the question is provided, it must be annotated with a citation. Use
the following format to cite relevant passages ({"citation": …}).
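Because the citation format is structured, the programmatic check mentioned above can be a few lines of string matching. A sketch, assuming the exact ({"citation": …}) convention from the prompt and a hypothetical model answer:

```python
# Sketch: extract each {"citation": "..."} from the answer and verify it is
# an exact substring of the provided document.
import re

def verify_citations(answer: str, document: str) -> list[tuple[str, bool]]:
    citations = re.findall(r'\{"citation":\s*"(.*?)"\}', answer, re.DOTALL)
    return [(citation, citation in document) for citation in citations]

document = "<the triple-quoted source document goes here>"
answer = 'Yes. {"citation": "a passage quoted from the document"}'  # hypothetical output
for citation, found in verify_citations(answer, document):
    print("OK  " if found else "MISS", citation)
```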
Tactic: Use intent classification to identify the most relevant instructions for a user query
For tasks in which lots of independent sets of instructions are needed to handle different cases, it can be
beneficial to first classify the type of query and to use that classification to determine which instructions
are needed. This can be achieved by defining fixed categories and hardcoding instructions that are
relevant for handling tasks in a given category. This process can also be applied recursively to
decompose a task into a sequence of stages. The advantage of this approach is that each query will
contain only those instructions that are required to perform the next stage of a task, which can result in lower error rates compared to using a single query to perform the whole task. This can also result in lower costs, since larger prompts cost more to run.
Suppose for example that for a customer service application, queries could be usefully classified as
follows:
SYSTEM You will be provided with customer service queries. Classify each query into a
primary category and a secondary category. Provide your output in json format
with the keys: primary and secondary.
Based on the classification of the customer query, a set of more specific instructions can be provided to
a model for it to handle next steps. For example, suppose the customer requires help with
"troubleshooting".
SYSTEM You will be provided with customer service inquiries that require troubleshooting in a technical support context. Help the user by:
- Ask them to check that all cables to/from the router are connected. Note that it is common for cables to come loose over time.
- If all cables are connected and the issue persists, ask them which router model they are using.
- Now you will advise them how to restart their device:
-- If the model number is MTD-327J, advise them to push the red button and hold it for 5 seconds, then wait 5 minutes before testing the connection.
-- If the model number is MTD-327S, advise them to unplug and replug it, then wait 5 minutes before testing the connection.
- If the customer's issue persists after restarting the device and waiting 5 minutes, connect them to IT support by outputting {"IT support requested"}.
- If the user starts asking questions that are unrelated to this topic, confirm whether they would like to end the current chat about troubleshooting and classify their request according to the following scheme:
<insert primary/secondary classification scheme from above here>
Notice that the model has been instructed to emit special strings to indicate when the state of the
conversation changes. This enables us to turn our system into a state machine where the state
determines which instructions are injected. By keeping track of state, what instructions are relevant at
that state, and also optionally what state transitions are allowed from that state, we can put guardrails
around the user experience that would be hard to achieve with a less structured approach.
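A minimal sketch of the classify-then-route pattern ties these pieces together; the category names, instruction snippets, and the bare json.loads parse are illustrative simplifications.

```python
# Sketch: classify the query first, then make a second request that carries
# only the instructions relevant to that category.
import json
from openai import OpenAI

client = OpenAI()

INSTRUCTIONS = {  # hypothetical category -> specialized system prompt
    "Technical Support": "You will be provided with customer service inquiries "
                         "that require troubleshooting in a technical support context. ...",
    "Billing": "Help the customer resolve billing and payment questions. ...",
}

def classify(query: str) -> dict:
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {
                "role": "system",
                "content": "You will be provided with customer service queries. "
                           "Classify each query into a primary category and a "
                           "secondary category. Provide your output in json "
                           "format with the keys: primary and secondary.",
            },
            {"role": "user", "content": query},
        ],
    )
    return json.loads(response.choices[0].message.content)  # guard this parse in practice

query = "My internet keeps cutting out."
primary = classify(query)["primary"]
system_prompt = INSTRUCTIONS.get(primary, "Answer the customer's question helpfully.")

answer = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": query},
    ],
)
print(answer.choices[0].message.content)
```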
Tactic: For dialogue applications that require very long conversations, summarize or filter previous
dialogue
Since models have a fixed context length, dialogue between a user and an assistant in which the entire
conversation is included in the context window cannot continue indefinitely.
There are various workarounds to this problem, one of which is to summarize previous turns in the
conversation. Once the size of the input reaches a predetermined threshold length, this could trigger a
query that summarizes part of the conversation and the summary of the prior conversation could be
included as part of the system message. Alternatively, prior conversation could be summarized
asynchronously in the background throughout the entire conversation.
An alternative solution is to dynamically select previous parts of the conversation that are most relevant
to the current query.
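A sketch of the threshold-triggered variant follows. The character count is a crude stand-in for a real token budget (a tokenizer such as tiktoken would be more accurate), and keeping the last four messages verbatim is an arbitrary choice.

```python
# Sketch: once the transcript exceeds a rough size threshold, replace older
# turns with a model-written summary carried in the system message.
from openai import OpenAI

client = OpenAI()
MAX_CHARS = 8000  # crude proxy for the context-window budget

def compact(messages: list[dict]) -> list[dict]:
    if sum(len(m["content"]) for m in messages) <= MAX_CHARS:
        return messages
    older, recent = messages[:-4], messages[-4:]  # keep recent turns verbatim
    transcript = "\n".join(f"{m['role']}: {m['content']}" for m in older)
    summary = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": "Summarize this dialogue:\n" + transcript}],
    ).choices[0].message.content
    return [{"role": "system", "content": "Summary of the conversation so far: " + summary}] + recent
```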
Tactic: Summarize long documents piecewise and construct a full summary recursively
Since models have a fixed context length, they cannot be used to summarize a text longer than the
context length minus the length of the generated summary in a single query.
To summarize a very long document such as a book we can use a sequence of queries to summarize
each section of the document. Section summaries can be concatenated and summarized producing
summaries of summaries. This process can proceed recursively until an entire document is summarized.
If it’s necessary to use information about earlier sections in order to make sense of later sections, then a
further trick that can be useful is to include a running summary of the text that precedes any given point
in the book while summarizing content at that point. The effectiveness of this procedure for
summarizing books has been studied in previous research by OpenAI using variants of GPT-3.
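A minimal recursive sketch is below; the fixed character budget and naive chunking are hypothetical simplifications (a real implementation would split on section boundaries and could add the running-summary trick just described).

```python
# Sketch: summarize fixed-size sections, concatenate the section summaries,
# and recurse until the remaining text fits in a single query.
from openai import OpenAI

client = OpenAI()
CHUNK_CHARS = 6000  # hypothetical per-query budget

def summarize(text: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": "Summarize the following text:\n" + text}],
    )
    return response.choices[0].message.content

def summarize_document(text: str) -> str:
    if len(text) <= CHUNK_CHARS:
        return summarize(text)
    chunks = [text[i:i + CHUNK_CHARS] for i in range(0, len(text), CHUNK_CHARS)]
    combined = "\n".join(summarize(chunk) for chunk in chunks)
    return summarize_document(combined)  # summaries of summaries, recursively
```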
Tactic: Instruct the model to work out its own solution before rushing to a conclusion
Sometimes we get better results when we explicitly instruct the model to reason from first principles before coming to a conclusion. Suppose, for example, that we want a model to evaluate a student's solution to a math problem. The most obvious approach is to simply ask the model whether the student's solution is correct, but a model asked for an immediate verdict can be misled by a plausible-looking incorrect solution. Instructing the model to first work out its own solution yields more reliable evaluations:
SYSTEM First work out your own solution to the problem. Then compare your solution to the
student's solution and evaluate if the student's solution is correct or not. Don't
decide if the student's solution is correct until you have done the problem yourself.
USER Problem Statement: I'm building a solar power installation and I need help working
out the financials.
- Land costs $100 / square foot
- I can buy solar panels for $250 / square foot
- I negotiated a contract for maintenance that will cost me a flat $100k per year, and
an additional $10 / square foot
What is the total cost for the first year of operations as a function of the number of
square feet?
Tactic: Use inner monologue or a sequence of queries to hide the model's reasoning process
The previous tactic demonstrates that it is sometimes important for the model to reason in detail about
a problem before answering a specific question. For some applications, the reasoning process that a
model uses to arrive at a final answer would be inappropriate to share with the user. For example, in
tutoring applications we may want to encourage students to work out their own answers, but a model’s
reasoning process about the student’s solution could reveal the answer to the student.
Inner monologue is a tactic that can be used to mitigate this. The idea of inner monologue is to instruct
the model to put parts of the output that are meant to be hidden from the user into a structured format
that makes parsing them easy. Then before presenting the output to the user, the output is parsed and
only part of the output is made visible.
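A sketch of that parsing step, using a hypothetical <scratchpad> tag as the structured format for the hidden portion:

```python
# Sketch: have the model confine its reasoning to a tagged section, then strip
# that section before showing the output to the user.
import re
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {
            "role": "system",
            "content": "Work through the problem inside <scratchpad> tags, then "
                       "write only the student-facing hint after the closing tag.",
        },
        {"role": "user", "content": "<problem statement and student solution here>"},
    ],
)

full_output = response.choices[0].message.content
visible = re.sub(r"<scratchpad>.*?</scratchpad>", "", full_output, flags=re.DOTALL).strip()
print(visible)  # the hidden reasoning never reaches the user
```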
Alternatively, this can be achieved with a sequence of queries in which all except the last have their
output hidden from the end user.
First, we can ask the model to solve the problem on its own. Since this initial query doesn't require the student's solution, that solution can be omitted from the prompt. This provides the additional advantage that there is no chance that the model's solution will be biased by the student's attempted solution.
Next, we can have the model use all available information to assess the correctness of the student’s
solution.
SYSTEM Compare your solution to the student's solution and evaluate if the student's
solution is correct or not.
USER Problem statement: """<insert problem statement>""" Your solution: """<insert model generated solution>""" Student's solution: """<insert student's solution>"""
Finally, we can let the model use its own analysis to construct a reply in the persona of a helpful tutor.
SYSTEM You are a math tutor. If the student made an error, offer a hint to the student in a
way that does not reveal the answer. If the student did not make an error, simply
offer them an encouraging comment.
USER Problem statement: """<insert problem statement>""" Your solution: """<insert
model generated solution>""" Student’s solution: """<insert student's solution>"""
Analysis: """<insert model generated analysis from previous step>"""
Tactic: Ask the model if it missed anything on previous passes
Suppose that we are using a model to list excerpts from a source which are relevant to a particular
question. After listing each excerpt the model needs to determine if it should start writing another or if
it should stop. If the source document is large, it is common for a model to stop too early and fail to list
all relevant excerpts. In that case, better performance can often be obtained by prompting the model
with followup queries to find any excerpts it missed on previous passes.
SYSTEM You will be provided with a document delimited by triple quotes. Your task is to select excerpts which pertain to the following question: "What significant paradigm shifts have occurred in the history of artificial intelligence?" Ensure that excerpts contain all relevant context needed to interpret them - in other words don't extract small snippets that are missing important context. Provide output in JSON format as follows:
[{"excerpt": "..."},
...
{"excerpt": "..."}]
USER """<insert document here>"""
ASSISTANT [{"excerpt": "the model writes an excerpt here"},
...
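A sketch of the follow-up loop, reusing the excerpt-selection instructions above; the DONE stop phrase and the five-pass cap are arbitrary illustrative choices.

```python
# Sketch: after the first pass, repeatedly ask the model whether it missed
# any excerpts, stopping when it signals that none remain.
from openai import OpenAI

client = OpenAI()

messages = [
    {"role": "system", "content": "<the excerpt-selection instructions above>"},
    {"role": "user", "content": '"""<insert document here>"""'},
]

for _ in range(5):  # cap the number of passes
    reply = client.chat.completions.create(
        model="gpt-4", messages=messages
    ).choices[0].message.content
    if "DONE" in reply:
        break
    print(reply)  # another batch of excerpts
    messages.append({"role": "assistant", "content": reply})
    messages.append({"role": "user", "content": "Are there more relevant excerpts? "
                                                "Take care not to repeat excerpts. "
                                                "If none remain, reply only with DONE."})
```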