
Demystifying LLMs
Subbarao Kambhampati
School of Computing & AI
Talk on 5/4/2024 for PanIIT
Slides: https://bit.ly/3wvT2Mp
1978-83 LC (Alakananda, Jamuna & Tapti)
2022: DAA
Human-AI Interaction
• We have focused on explainable human-AI
interaction.
• Our setting involves collaborative problem
solving, where the AI agents provide decision
support to the human users in the context of
explicit knowledge sequential decision-
making tasks (such as mission planning)
• In contrast, much work in social robotics and HRI
has focused on tacit knowledge tasks (thus
making explanations mostly moot)
• We assume that the AI agent either learns the
human model or has prior access to it.
• We have developed frameworks for proactive
explanations based on model reconciliation
as well as on-demand foil-based explanations
• We have demonstrated the effectiveness of
our techniques with systematic (IRB
approved) human subject studies

Agenda for today
• Trends in AI Technology leading to LLMs
• LLM basics—auto-regressive training of n-gram models
• LLMs as doing Approximate Retrieval
• Hallucinations—always or sometimes?
• Style vs. Content/Form vs. Factuality
• Quest to improve Factuality
• Prompt engineering; Fine-tuning; RAGging
• Data Quality; Synthetic Data
• LLMs as approximate knowledge sources
• ..and resurgence of (approximate) knowledge-based systems
• Are LLMs AI-Complete?
• Planning/Reasoning
• Societal Impacts
Trends in AI Technology-1
From Deep & Narrow to Broad & Shallow
• AI systems used to have deep expertise in narrow domains
  • The old “expert systems”; Deep Blue for Chess; Alpha Go for Go; Alpha Fold for Protein Folding etc.
• Recent trend is to develop systems with broad expertise. But they tend to be shallow in their understanding
  • Large Language Models, Diffusion Models
• (Thinking in terms of Broad & Shallow vs. Deep & Narrow is more instructive than talking of AGI vs AI..)
• The implied guarantees are different..

Trends in AI Technology-2
From Discriminative Classification to Generative Imagination
• AI systems used to focus on “identification” and “classification”
  • Is this a picture of a dog? Is this an x-ray of a malignant tumor? Is this a spam mail?
  • P(dog|Picture); P(tumor|x-ray); P(Spam|text)
• Recent trend is to learn the “distribution of the objects”
  • Draw me a picture of a dog. Write me a spam mail
  • Learning P(tumor, x-ray); P(Spam, text)

LLMs ~ Broad & Shallow Generative Systems for Language (or any other sequential data)
Scope of Today’s Talk
• We will focus mostly on auto-regressively trained Large Language Models (LLMs)
  • LLMs are one part of the Generative AI paradigm of learning distributions of language, vision and audio
• We will focus mostly on capabilities and limitations of LLMs
  • There is a lot to be said about their societal and ethical impacts, but we won’t focus on those here.

I come to leverage LLMs, not to lament them..
.. O judgment! thou art fled to brutish beasts,
And men (& LLMs) have lost their reason.
Agenda for today
• Trends in AI Technology leading to LLMs
• LLM basics—auto-regressive training of n-gram models
• LLMs as doing Approximate Retrieval
• Hallucinations—always or sometimes?
• Style vs. Content/Form vs. Factuality
• Quest to improve Factuality
• Prompt engineering; Fine-tuning; RAGging
• Data Quality; Synthetic Data
• LLMs as approximate knowledge sources
• ..and resurgence of (approximate) knowledge-based systems
• Are LLMs AI-Complete?
• Planning/Reasoning
• Societal Impacts
LLMs are N-gram models on STEROIDS
• Text is a long sequence of words (including spaces,
punctuations)
• An n-gram model of language learns to predict n-th word given
the preceding n-1 words
• Probabilistically speaking it learns Pr(Wn |W1...Wn-1)
• Unigram predicts each word independently (no preceding context)
• Bigram predicts each word given the previous word
• A 3001-gram model learns to predict the next word given the previous
3000 words
• ChatGPT is just a 3001-gram model
• The power of an n-gram model depends on
• How much text it trains on
• How big is the n (context) and
• How high-capacity is the function learning Pr(Wn |W1...Wn-1)
• ChatGPT trains on ~600 gigabytes of text on the Web
• It learns a very high capacity function that has 175 billion parameters
• Learns Pr(Wn |W1...Wn-1) for all possible nth words Wn (Vocabulary of the
language, ~50K in English)
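
To make the counting picture concrete, here is a minimal sketch (illustrative only, not code from the talk) of an n-gram model that estimates Pr(Wn | W1...Wn-1) purely from counts:

```python
# Toy n-gram model: learn Pr(Wn | W1 ... Wn-1) by counting next-word occurrences.
from collections import Counter, defaultdict

def train_ngram(tokens, n):
    """Count how often each word follows each (n-1)-word prefix."""
    counts = defaultdict(Counter)
    for i in range(len(tokens) - n + 1):
        prefix = tuple(tokens[i:i + n - 1])      # the preceding n-1 words
        counts[prefix][tokens[i + n - 1]] += 1   # the word that actually came next
    return counts

def next_word_distribution(counts, prefix):
    """Estimated Pr(Wn | prefix)."""
    c = counts.get(tuple(prefix), Counter())
    total = sum(c.values())
    return {w: k / total for w, k in c.items()}

corpus = "all other iits are in eternal awe of iit madras".split()
trigram_counts = train_ngram(corpus, n=3)
print(next_word_distribution(trigram_counts, ["other", "iits"]))  # {'are': 1.0}
```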
Training LLMs: “All other IITs are in eternal awe of IIT Madras.”
• Each prefix of the sentence is a training example
  • All ____
  • All other ____
  • All other IITs ____
  • ..
  • All other IITs are in eternal awe of IIT ____
• The LLM uses its current function to guess the next word
  • All right
  • Guess: right   Correct: other
  • Error = {other − right}
  • To the LLM, all vocabulary tokens are just vectors in some high-dimensional embedding space; so the difference is well defined as the vector difference
• Propagate this error back through the function, and change the parameters so the error is reduced
  • Using back propagation (aka the Chain Rule of derivatives with dynamic programming), the basic workhorse of all neural networks.
• <Go to the next example>
(GPT-3 vocabulary size: ~50K; word embedding dim: 12K)
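
A toy sketch of the training step just described, assuming PyTorch and a crude mean-pooled embedding in place of a real transformer; the token ids and sizes are made up. The point is only the guess → compare → backpropagate loop:

```python
# Minimal autoregressive training step: predict the next token from a prefix,
# compute the error against the token that actually followed, backpropagate.
import torch
import torch.nn as nn

vocab_size, embed_dim = 50_000, 128   # ~50K-token vocabulary; toy embedding size

class ToyNextWordModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.out = nn.Linear(embed_dim, vocab_size)   # a score for every possible next word

    def forward(self, prefix_ids):                    # prefix_ids: (batch, prefix_len)
        h = self.embed(prefix_ids).mean(dim=1)        # crude stand-in for a transformer
        return self.out(h)                            # (batch, vocab_size) logits

model = ToyNextWordModel()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

prefix = torch.tensor([[11, 42, 7]])   # hypothetical ids for "All other IITs ..."
target = torch.tensor([193])           # id of the word that actually came next

opt.zero_grad()
logits = model(prefix)                 # the model's "guess"
loss = loss_fn(logits, target)         # error between guess and the correct next word
loss.backward()                        # back propagation (chain rule)
opt.step()                             # nudge the parameters to reduce the error
```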
..but the count table is Ginormous! (and is VERY sparse)
• With an n-gram model, you need to keep track of the conditional distributions for (n-1)-sized prefixes.
• With a vocabulary size |V| (~50,000), there are |V|^(n-1) different prefixes!!
  • Easy for unigram (1 prefix), bigram (|V| prefixes) and trigram (|V|^2 prefixes)
  • For ChatGPT’s 3001-gram model, with a 50,000 word vocabulary, we are looking at a whopping (50000)^3000 conditional distributions
  • (and most entries will be zero—as the chance of seeing the same 3000-word sequence again is vanishingly small!)
• What LLMs do is to essentially compress/approximate this ginormous count table with a function
  • That function, while high capacity (175 billion weights!), is still vanishingly small compared to the ginormous count table ((50000)^3000 >> 175 billion, or even a trillion!)
  • ..and oh by the way, the compressed function winds up having fewer zeros
    • It approximates both the non-zero counts and the zero counts, so.. GENERALIZATION!!!
    • In essence the function learns to “abstract” and “cluster” over “similar” sequences
• [Aside: Transformers are a (not particularly principled) parallelization of recurrent neural networks]
(graphic by James Campbell)
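
A quick back-of-the-envelope check of the size claim, using the slide's own numbers (50K vocabulary, 3000-word prefixes, ~175B parameters):

```python
# The full count table for a 3001-gram model dwarfs the parameter count used to approximate it.
import math

vocab_size, prefix_len = 50_000, 3_000
table_digits = int(prefix_len * math.log10(vocab_size)) + 1   # digits in 50000**3000
num_parameters = 175_000_000_000                              # ~ChatGPT-scale parameter count

print(f"(50000)^3000 has about {table_digits} digits")        # ~14,097 digits
print(f"175 billion has {len(str(num_parameters))} digits")   # 12 digits
```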
So ChatGPT is just completing your prompt by repeatedly predicting the next word given the previous 3000 words
• But, the function it learns to predict the next word is a very high capacity one (with 175 billion parameters for ChatGPT and over a trillion for GPT4)
  • This function is learned by analyzing ~500 GB of text
  • The learning phase is very time consuming (and is feasible only because of the extreme computational power utilized)
• And all conversation—whether everyday or deeply philosophical—is, at some level, completing the prompt (saying words in the context window of other words that have already been said!)
• Thus it is that ChatGPT can “converse” with you on any subject!
  • Really?
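
In pseudocode terms, the whole “conversation” is just this loop; `next_word_distribution` here is a hypothetical stand-in for the learned function, not any real API:

```python
# Sketch of "completing the prompt": repeatedly predict the next word from the
# last N words and append it to the running text.
import random

def complete(prompt_tokens, next_word_distribution, context_size=3000, max_new=50):
    tokens = list(prompt_tokens)
    for _ in range(max_new):
        context = tokens[-context_size:]             # only the trailing context matters
        dist = next_word_distribution(context)       # Pr(next word | context)
        words, probs = zip(*dist.items())
        next_word = random.choices(words, weights=probs)[0]   # sample a continuation
        tokens.append(next_word)
        if next_word == "<eos>":                     # stop if an end marker is produced
            break
    return tokens
```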
LLMs look at everything we say as a prompt to be completed..
Whether we think we are asking questions, pouring our hearts out, or talking to them, LLMs just see what we say as text prompts to be completed
• Write an essay on the origins and impacts of Jim Crow
• Write a poem on the Cow in the style of Shakespeare.
• Why did the Silicon Valley Bank fail?
• Explain all the ways Wild Cats envy Sun Devils
• Write some TikZ code to produce a sketch of a unicorn..

If there is “meaning” in these completions—facts, humor, pathos—it is in our heads!


But how can these prompt completion beasts generate such coherent plausible text that also seems SO right sometimes?

Answer: MAGIC..!

Some possible factors:
→ Almost everything we know is also already on the web (and is fodder for LLM training)
→ Completion over large (3000 word) context windows can be more directed (low-entropy) than we have intuitions about. (This is not a 3-gram model completing “left and … ”)

[See also: AI as an Ersatz Natural Science]
LLMs and Approximate Retrieval
• Retrieval in Databases: Given a query (key), retrieve the records that exactly match the query
• Retrieval in IR systems (e.g. Google): Given a (textual) query, retrieve all the records that are similar to the query
  • The records themselves are not modified in any way
• Approximate retrieval in LLMs: Given a (textual) query (prompt), generate the most likely completion
  • Note that the completion is NOT guaranteed to be one of the stored records
  • This generative creativity is the boon/bane of LLMs
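
A toy illustration (invented data) of the three retrieval modes side by side:

```python
# Three "retrieval" modes: exact match, similarity search, and generative completion.
records = {"capital of france": "Paris", "capital of italy": "Rome"}

# 1. Database retrieval: exact key match, or nothing at all.
print(records.get("capital of france"))        # -> Paris
print(records.get("capitol of france"))        # -> None (no exact match)

# 2. IR retrieval: return the stored record most similar to the query, unmodified.
def ir_lookup(query):
    overlap = lambda key: len(set(query.split()) & set(key.split()))
    return records[max(records, key=overlap)]

print(ir_lookup("what is the capital of france"))   # -> Paris (still a stored record)

# 3. LLM "approximate retrieval": generate the most likely completion; the output
#    need not match any stored record (hence both the creativity and the hallucination).
# completion = llm("The capital of France is")       # hypothetical LLM call
```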
Hallucination and
“Approximate Retrieval”
• LLMs are n-gram models, and thus do
not index and retrieve
• All they ever do is hallucinate
completions to the prompt
• Such that the completion is in the same
distribution as the text they have been
trained on
• Prompt engineering doesn’t change
this!
• Whether or not changing the prompt
gives the “factual completion” depends
on the prompter knowing enough to tell
whether the given answer is the accurate
one.
LLMs as Idea Generators
(“Muses”)
• “I get many ideas, and I throw away the bad
ones”
• Linus Pauling on how he managed to get TWO Nobels

Style vs. Content
Form vs. Factuality
• LLMs (and Generative AI in general)
capture the distribution of the data
they are trained on
• Style is a distributional property
• ..and LLMs are able to learn this (they
have been called the vibe machines..)
• Correctness/factuality is an instance
level property
• ..LLMs can’t guarantee this
• Civilizationally, we had always thought
style is harder than content
• And even assumed that good style implies
good content!
• LLMs (and GenAI in general) turn this
intuition on its head!
Agenda for today
• Trends in AI Technology leading to LLMs
• LLM basics—auto-regressive training of n-gram models
• LLMs as doing Approximate Retrieval
• Hallucinations—always or sometimes?
• Style vs. Content/Form vs. Factuality
• Quest to improve Factuality
• Prompt engineering; Fine-tuning; RAGging
• Data Quality; Synthetic Data
• LLMs as approximate knowledge sources
• ..and resurgence of (approximate) knowledge-based systems
• Are LLMs AI-Complete?
• Planning/Reasoning
• Societal Impacts
Standard ways to improve LLM responses

Prompting (“in-context learning”) (doesn’t change LLM parameters)
• If you don’t like what an LLM is giving as an answer to your prompt, you can add additional prompts
• The LLM will then take the new context window (including what it said and what you said) to predict the next sequence of words/tokens
  • Every word in the context window—including the ones the LLM added—is changing the conditional distribution with which the next token is being predicted.
  • Note that all these conditional distributions have been precomputed!
  • Nothing inside the LLM is changing because of your prompts
• The undeniable attraction of “prompting” is that it is natural for us! It is sort of how we interact with each other!
  • There is a whole cottage industry on the “art” of good prompting
  • “How to ask LLMs nicely?”
• If you give k examples of good answers as part of the prompt, it is called “k-shot in-context learning” (sketched below)

Fine Tuning (changes LLM parameters)
• Fine tune the parameters of a pre-trained LLM by making it look specifically at the data of interest to you
  • Give it lots of plan sequences, so it learns better conditional distributions on predicting plans
  • Use labeled <prompt, response> pairs to make its responses “more palatable”
  • Use supervised techniques or RL techniques to improve the parameters to be more consistent with the finetuning data
  • [There is also evidence that big companies use more “polished”/“annotated” data during the fine tuning phase—including paying humans to generate data adorned with derivational information—which is often not included in the web text]
• Because fine tuning is changing the parameters of the LLM, while its performance on the specific task (be a better planner, be less offensive) may improve, it also changes its performance on other tasks in unpredictable ways
  • Microsoft claims that GPT4 had more AGI sparks before it was lobotomized with RLHF to be less offensive!
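
A minimal sketch of what “k-shot in-context learning” amounts to mechanically; the examples and the `llm` call are hypothetical:

```python
# k-shot prompting: the k examples are simply prepended to the query inside the
# context window; no parameter of the LLM changes.
examples = [
    ("Translate to French: cheese", "fromage"),
    ("Translate to French: dog", "chien"),
]
query = "Translate to French: cat"

k_shot_prompt = "\n".join(f"{q}\n{a}" for q, a in examples) + f"\n{query}\n"
print(k_shot_prompt)
# completion = llm(k_shot_prompt)   # hypothetical call; the model just completes the text
```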
Back-Prompting by Humans
(..and the Clever Hans peril..)
• Humans doing the verification & giving helpful prompts to the LLM
  • Okay when the humans know the domain and can correct the plan (with some guarantees)
  • Okay for "this essay looks good enough" kind of critiquing
  • But for planning, with end users not aware of the domain physics, the plans that humans are happy with may still not be actually executable
• When humans know the correct answer (plan), there is also the very significant possibility of the Clever Hans effect
  • Humans unwittingly/unknowingly/non-deliberately giving important hints
RAGging on LLMs to Improve their Factuality
• Given LLMs don’t stick to the script, you might want to
  • Send the user prompt/query to Google (or some IR system)
    • Instead of old style keyword search, do search with embeddings (“Vector DB”)
  • Take the top result(s) and have the LLM summarize them
  • Or alternately, just add those results to the context window to provide more factual background to further queries..
• Not that different from what LLM “search engines” like Perplexity do..
• Pure LLMs do a kind of “search by imagination..”
[Diagram: Prompt → IR (Google) → LLM]
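
A minimal sketch of the RAG loop, assuming hypothetical `embed` and `llm` stand-ins rather than any particular vendor's API:

```python
# Retrieval-augmented generation: embed the query, pull the most similar passages,
# and put them in the context window so the completion is grounded in them.
import numpy as np

def retrieve(query, documents, embed, k=3):
    """Embedding ("vector DB") search: return the k documents closest to the query."""
    q = embed(query)
    doc_vecs = np.array([embed(d) for d in documents])
    sims = doc_vecs @ q / (np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(q) + 1e-9)
    return [documents[i] for i in np.argsort(-sims)[:k]]

def rag_answer(query, documents, embed, llm):
    context = "\n\n".join(retrieve(query, documents, embed))
    prompt = (f"Using only the passages below, answer the question.\n\n"
              f"{context}\n\nQuestion: {query}\nAnswer:")
    return llm(prompt)      # the retrieved text anchors the completion
```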
Impact of Training Data Type/Quality on
LLM Performance
• Quality of training data matters in improving the
quality of LLM completions
• This is why most modern LLMs are trained without 4Chan
(..and upsampling good quality sources like Wikipedia,
NYTimes etc..)
• When the data is not readily available, getting it can
be quite costly
• Web contains “correct data” but not as much “corrections
data”
• Getting derivation/thought behind the data can be quite
expensive.
• But even sticking just to purely factual data for
training still doesn’t eliminate hallucinations
• Remember LLMs are not retrieving stored documents, but
completing the prompt on the fly (approximate retrieval)
Can’t LLMs “Self-Train”
on Synthetic Data?
• If getting quality data is hard, one
alternative would be generating
“synthetic data” and training LLMs on it
• The question is how is synthetic data
generated?
• Idea 1: LLMs generate their own data, test it
for correctness, and use it in further
training
• Idea 2: Use external solvers to generate the
synthetic training data
• Let me compile an outside System 2 to my
System 1
Synthetic Data Conundrums
Solving Blocksworld: GOFAI vs LLaMAI

GOFAI
• Get the domain model
• Get a combinatorial search planner
• Have the planner solve the problem

LLaMAI
• Get the domain model
• Get a combinatorial search planner
• Make a trillion Blocksworld problems
• Make the planner solve them all
• Finetune GPT4 with the problems and solutions
  • (Alternately, index the trillion solutions in a vector DB for later RAG)
• Have the finetuned/RAG’ed GPT4 guess the solution for the given problem
  • (Ensure the correctness of the guess with an external validator/simulator working LLM-Modulo)
• If, by luck, it guesses right, write a NeurIPS/ICLR paper about the effectiveness of synthetic data
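
A hedged sketch of the “LLaMAI” recipe above, with `generate_blocksworld_problem`, `classical_planner`, and `validate_plan` as hypothetical stand-ins for a real problem generator, planner, and plan validator (e.g. VAL):

```python
# An external planner (System 2) produces correct plans; the (problem, plan)
# pairs become synthetic fine-tuning data for the LLM (System 1).
import json

def build_synthetic_dataset(n_problems, generate_blocksworld_problem,
                            classical_planner, validate_plan):
    dataset = []
    for _ in range(n_problems):
        problem = generate_blocksworld_problem()
        plan = classical_planner(problem)            # the solver does the real reasoning
        if validate_plan(problem, plan):             # sound external check (LLM-Modulo style)
            dataset.append({"prompt": problem, "completion": plan})
    return dataset

def write_finetune_file(dataset, path="blocksworld_ft.jsonl"):
    with open(path, "w") as f:
        for row in dataset:
            f.write(json.dumps(row) + "\n")          # one <prompt, completion> pair per line
```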
LLMs as Approximate Knowledge Sources
Avenging Polanyi’s Revenge
Everybody was all against knowledge-based systems
But now everyone is effectively doing knowledge-based systems!
Agenda for today
• Trends in AI Technology leading to LLMs
• LLM basics—auto-regressive training of n-gram models
• LLMs as doing Approximate Retrieval
• Hallucinations—always or sometimes?
• Style vs. Content/Form vs. Factuality
• Quest to improve Factuality
• Prompt engineering; Fine-tuning; RAGging
• Data Quality; Synthetic Data
• LLMs as approximate knowledge sources
• ..and resurgence of (approximate) knowledge-based systems
• Are LLMs AI-Complete?
• Planning/Reasoning
• Societal Impacts
LLMs = AI?
• Clearly LLMs (and GenAI systems)
exhibit impressive abilities in
generating text in response to
prompts
• Is there anything in AI they don’t
cover?
• Reasoning, Planning..
• If so, can we assume that those
abilities will automatically accrue as
LLMs are scaled up? (More
parameters/data)
Little a priori reason to believe that LLMs can reason/plan


LLMs’ Approximate Retrieval upends our intuitions re: their guesses

Computational complexity of the underlying task has no bearing on LLM guesses
• The underlying complexity of the problem has no impact on the LLM’s ability to guess the answer
  • They are just as fast in guessing answers to undecidable questions as they are in guessing answers to constant time questions
  • ..and in neither case do they have any guarantees about their guess
• Corollary: The usual problem characteristics—stochasticity, partial observability etc.—that make a problem computationally harder don’t matter for the LLM’s ability to guess
• After all, they take constant time per token
  • ..and no, asking LLMs to “pause” doesn’t change any of this!

Background knowledge is easier for LLMs (approximately..)
• Much has been made in traditional AI of the difficulty of getting relevant knowledge.
• Having been trained on the web-scale collective knowledge of humanity, LLMs are remarkably better at this
• They are pretty good (with no guarantees—and some brittleness) at
  • Commonsense
  • Domain knowledge
  • Theory of Mind
  • Analogies
  • (In addition, of course, to linguistic abilities such as summarization, elaboration, format change etc.)
Approximate Retrieval of Plans
Planning

What Planning is & What LLMs are good at..

Planning (as used in common parlance) involves
• Planning knowledge
  • Actions, preconditions and effects
  • General recipes: task reduction schemata (e.g. HTN planning)
  • Old examples: case libraries
• Plan generation/verification techniques
  • Interaction analysis/resolution
  • Plan merging techniques
  • Plan modification techniques

Contrasting what AI Planning & LLMs bring to the table
• AI Planning (aka ICAPS planning) assumes that the planning knowledge is given up front, and focuses on generation and verification techniques
  • Emphasis on guaranteeing completeness/correctness of the plans w.r.t. the model
  • By and large the common paradigm—although there have been occasional mutinies
    • Model-Lite Planning approaches
• LLMs, trained as they are on everything ever put on the web, have a kind of “approximate omniscience”. This helps them spit out actions, recipes, or cases
  • But they lack the ability to stitch the recipes together to ensure that the result is actually interaction free!

LLMs accept any planning problem—even if it is not expressible in the PDDL standard—and they don’t give any correctness guarantees.
AI Planners will give formal guarantees, but only accept problems expressible in their language.
LLMs Can’t Plan; But they can help planning in LLM-Modulo Frameworks

LLMs can’t plan in autonomous modes (and many claims to the contrary are questionable)
• LLMs can’t do planning in autonomous mode
• CoT, fine tuning etc. don’t help that much (as they don’t generalize enough)
• They can’t improve by self-verification (since they can’t self-verify!)
• Having humans iteratively prompt is an invitation for the Clever Hans effect..

LLMs can support planning (and expand the range of planning tasks) in LLM-Modulo Frameworks
• LLMs can be used in conjunction with external verifiers and solvers in an LLM-Modulo framework (with the verifiers doing back-prompting)
• In the LLM-Modulo framework, LLMs can play multiple roles
  • Guess plans
  • Guess domain models
  • Help elaborate the problem specification
  • Translate formats
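
A hedged sketch of the LLM-Modulo generate-test-critique loop; `llm`, `verify`, and `format_critique` are hypothetical stand-ins:

```python
# LLM-Modulo: the LLM only guesses; a sound external verifier checks the guess,
# and the verifier's critique is fed back as a new prompt (back-prompting).
def llm_modulo_plan(problem, llm, verify, format_critique, max_rounds=10):
    prompt = f"Propose a plan for the following problem:\n{problem}\nPlan:"
    for _ in range(max_rounds):
        candidate = llm(prompt)                       # LLM as guesser, not reasoner
        ok, errors = verify(problem, candidate)       # external verifier (e.g. a plan validator)
        if ok:
            return candidate                          # correctness comes from the verifier
        prompt += "\n" + format_critique(errors)      # back-prompt with the critique
    return None                                       # no verified plan within the budget
```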
Chain of Thought Prompting
• Chain of Thought prompting (CoT) has become a bit of a religion among LLM aficionados.
  • The basic idea of CoT is to give the LLM a couple of examples showing how to solve the problem—with the expectation that it figures out how to solve other instances
• It is clear (and pretty non-controversial) that CoT involves giving additional task/problem specific knowledge. The question is how general this problem specific knowledge needs to be.
  • The more general the knowledge, the easier it is for the humans to provide it; but the higher the degree of reasoning the LLM has to do to operationalize it.
• Our work (in planning) calls into question the effectiveness of CoT
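
For concreteness, a typical chain-of-thought prompt looks like the sketch below (the worked examples are invented for illustration):

```python
# Chain-of-thought prompting: a few worked problems, with intermediate steps
# spelled out, are prepended, and the model is asked to continue in the same style.
cot_prompt = """\
Q: Roger has 5 balls. He buys 2 cans with 3 balls each. How many balls does he have?
A: He bought 2 * 3 = 6 balls. 5 + 6 = 11. The answer is 11.

Q: A cafeteria had 23 apples. They used 20 and bought 6 more. How many are left?
A: 23 - 20 = 3 apples remained. 3 + 6 = 9. The answer is 9.

Q: A shelf holds 4 rows of 7 books. 5 books are removed. How many books remain?
A:"""
# completion = llm(cot_prompt)   # hypothetical call; the hope is it imitates the worked steps
```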
Acting vs. Planning: The Agentic LLM Goldrush (The Agentification)
• LLMs can obviously be used to invoke external actions
  • Think “Webservice Orchestration Frameworks” which allow you to write your own “agents”
  • LLM as the core controller of external components
    • Which in turn is controlled by human prompting
    • Safety issues include both safety of the outside components and safety of the prompt-based control of LLMs
• LLMs can’t themselves be expected to “plan” this orchestration!
  • The actual orchestration is done with human help (“language” programming)
  • The “planning” part is basically pipelining the right external services—and is done with human help
  • One core external service they all use is “external memory” to write into and retrieve
    • Because LLMs themselves have no memory beyond their context window.
  • Think L2/L3 rather than L5 automation..
• Allowing LLMs to make their own “plans” to invoke external services would be rife with safety concerns!
  • (Think having a gun lying around in a home with a toddler..)

Weng, Lilian. (Jun 2023). “LLM-powered Autonomous Agents”. Lil’Log. https://lilianweng.github.io/posts/2023-06-23-agent/.
LLM-Modulo Frameworks for Planning
(To be presented at ICML 2024)
LLMs as Behavior Critics to catch undesirable robot behaviors
Can LLMs capture human preferences in embodied AI tasks?
• It may be intractable to construct formal verifiers for tasks that have a wide scope.
  • LLMs or VLMs can be a proxy of common human preferences and undesirability
• We evaluated GPT-4V with videos of diverse suboptimal behaviors
  • The GPT-4V critic catches 69% of undesirable behaviors (recall rate) while only 62% of the critiques are valid (precision rate)
• Results confirm the broadness of GPT-4V’s knowledge & the subpar precision of its outputs

“Task Success” is not Enough: Investigating the Use of Video-Language Models as Behavior Critics for Catching Undesirable Agent Behaviors. Lin Guan*, Yifan Zhou*, Denis Liu, Yantian Zha, Heni Ben Amor, Subbarao Kambhampati.
Doesn’t Co-Pilot for Code show that LLMs
can Plan?
• Co-Pilot has humans in the loop
• The incremental interpreters can direct
people’s attention to syntax errors
• Github and General Web are quite
different as training corpora
• People don’t put their non-working
code on github; general web has
4Chan!
Societal Angst about Generative AI
• Plagiarism
• Students writing essays with LLMs like ChatGPT
• Some stop-gap ways to detect text generated by specific LLMs exist
• But they need the buy-in from the LLM suppliers—what incentive do they have?
• Art pieces/styles being copied without consent
• “Deep Fakes”
• Eventually, it will be hard to tell whether a picture or a story was created by humans or by AI systems
• Bias
• These systems learn from our collective (unwashed) subconscious and thus get all our biases.
Getting those biases out of them would be challenging
• LLMs are our Freudian collective Id.. (System 1). They don’t have System 2.
• Existential angst..
• If they are doing well in all our exams, then what are we good for?
• Maybe our exams were not measuring reasoning capability to begin with
Agenda for today
• Trends in AI Technology leading to LLMs
• LLM basics—auto-regressive training of n-gram models
• LLMs as doing Approximate Retrieval
• Hallucinations—always or sometimes?
• Style vs. Content/Form vs. Factuality
• Quest to improve Factuality
• Prompt engineering; Fine-tuning; RAGging
• Data Quality; Synthetic Data
• LLMs as approximate knowledge sources
• ..and resurgence of (approximate) knowledge-based systems
• Are LLMs AI-Complete?
• Planning/Reasoning
• Societal Impacts
Summary
• Thanks to their approximate omniscience, LLMs
present us an amazing resource
• There is a temptation to confuse their
approximate retrieval capabilities for factuality
and AI-Completeness
• They can provide approximate knowledge about
almost any task/area/question
• They are ushering in a new resurgence of
(approximate) Knowledge-based AI systems
• LLMs can play many constructive roles in hybrid
reasoning systems – such as LLM-Modulo
Frameworks
@rao2z on Twitter
