Lecture3__InsideAnAgent
Inside an Agent
• Agent State
• Policy
• Value functions
• Model
Agent Environment Loop
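The agent-environment loop can be sketched in a few lines of Python. The toy environment, its dynamics, and the random policy below are all illustrative assumptions, not part of the lecture:

```python
import random

# A minimal sketch of the agent-environment loop, using a made-up toy
# environment with two actions (everything here is illustrative).
class ToyEnv:
    def reset(self):
        return 0  # initial observation

    def step(self, action):
        # hypothetical dynamics: reward 1 for action 1, else 0
        reward = 1.0 if action == 1 else 0.0
        next_obs = random.randint(0, 3)
        done = False
        return next_obs, reward, done

def random_policy(obs):
    return random.choice([0, 1])

env = ToyEnv()
obs = env.reset()
total_reward = 0.0
for t in range(10):
    action = random_policy(obs)           # agent selects an action
    obs, reward, done = env.step(action)  # environment responds
    total_reward += reward
    if done:
        break
```

At each time step the agent emits an action and the environment returns the next observation and a reward; this loop is the interface everything else in the lecture builds on.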
Agent state
• Everything the agent takes with it from one time step to the next.
Agent state
• Agent state is the information used to determine what happens next
• Formally, the state is a function of the history: St = f(Ht)
History
• The history is the sequence of observations, actions, and rewards: Ht = O1, R1, A1, …, At−1, Ot, Rt
• Football scenario
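The idea that the state is a function of the history, St = f(Ht), can be sketched concretely. The triples below and the choice of f (keep only the latest observation) are illustrative assumptions:

```python
# The history H_t as a list of (observation, action, reward) triples;
# the entries here are made up for illustration.
history = []
history.append((0, 'pass', 0.0))
history.append((1, 'shoot', 1.0))

def state_from_history(h):
    # S_t = f(H_t): here f keeps only the most recent observation,
    # i.e. we assume the latest observation is a sufficient state
    return h[-1][0] if h else None

state = state_from_history(history)  # the latest observation, 1
```

Different choices of f give different agent states, from the full history at one extreme down to just the last observation at the other.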
Trajectory
• A trajectory τ is a sequence of states and actions: τ = S0, A0, S1, A1, …
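As an illustration, a trajectory can be collected by rolling a policy out in an environment. The deterministic transition table and state names below are invented for the sketch:

```python
# Sketch: collect a trajectory tau = [(S0, A0), (S1, A1), ...] by rolling
# out a policy in a toy deterministic MDP (transitions are made up).
def collect_trajectory(policy, transitions, start_state, horizon):
    tau = []
    s = start_state
    for _ in range(horizon):
        a = policy(s)
        tau.append((s, a))
        s = transitions[(s, a)]  # deterministic next state
    return tau

transitions = {('s0', 'right'): 's1',
               ('s1', 'right'): 's2',
               ('s2', 'right'): 's0'}
tau = collect_trajectory(lambda s: 'right', transitions, 's0', 3)
# tau == [('s0', 'right'), ('s1', 'right'), ('s2', 'right')]
```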
State Value function
• The state-value Vπ(s) is the expected total reward when starting from state s and acting according to policy π
• Used to evaluate the goodness/badness of states and therefore to select between actions
• Vπ(s) = E[Rt+1 + γ Rt+2 + γ² Rt+3 + … | St = s, π]
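On a small deterministic example the expected discounted return defining Vπ(s) can be computed directly. The chain of states, the rewards, and γ = 0.9 below are all made-up values for illustration:

```python
# Sketch: compute V_pi(s) exactly on a tiny deterministic chain MDP.
# Under the fixed policy the agent moves s0 -> s1 -> s2 (absorbing, no
# further reward); states, rewards, and gamma are illustrative.
gamma = 0.9
rewards = {'s0': 1.0, 's1': 2.0, 's2': 0.0}
next_state = {'s0': 's1', 's1': 's2', 's2': None}

def v_pi(s):
    # Discounted sum of rewards collected from s onwards
    total, discount = 0.0, 1.0
    while s is not None:
        total += discount * rewards[s]
        discount *= gamma
        s = next_state[s]
    return total

# v_pi('s0') = 1 + 0.9 * 2 + 0.81 * 0 = 2.8
```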
• Extreme cases of the discount factor γ
– γ = 0 (myopic: only the immediate reward counts)
– γ = 1 (undiscounted: all future rewards count equally)
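The two extremes of the discount factor are easy to see numerically. The reward sequence below is invented for illustration:

```python
# Sketch of the discount extremes: gamma = 0 keeps only the immediate
# reward (myopic), gamma = 1 sums all rewards undiscounted.
def discounted_return(rewards, gamma):
    return sum((gamma ** k) * r for k, r in enumerate(rewards))

rewards = [1.0, 10.0, 100.0]  # illustrative reward sequence
print(discounted_return(rewards, 0.0))  # 1.0: only the first reward counts
print(discounted_return(rewards, 1.0))  # 111.0: plain sum of all rewards
```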
Action Value function
• The action-value Qπ(s, a) is the expected total reward when starting from state s, taking arbitrary action a, and thereafter acting according to policy π
• The value function is updated as
Vπ(s) ← E[Rt+1 + γ Vπ(St+1) | St = s, At = π(s)]
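This kind of value-function update can be sketched as iterative policy evaluation on a small example. The 3-state MDP, its rewards, and γ = 0.9 below are illustrative assumptions:

```python
# Sketch: iterative policy evaluation on a made-up 3-state deterministic
# MDP. Each sweep applies the backup
#   V(s) <- R + gamma * V(next state under pi(s))
gamma = 0.9
states = ['s0', 's1', 's2']
pi = {'s0': 'right', 's1': 'right', 's2': 'stay'}  # fixed policy
# deterministic model: (state, action) -> (reward, next_state)
model = {('s0', 'right'): (1.0, 's1'),
         ('s1', 'right'): (2.0, 's2'),
         ('s2', 'stay'):  (0.0, 's2')}

V = {s: 0.0 for s in states}
for _ in range(100):  # repeat the backup until the values converge
    V = {s: model[(s, pi[s])][0] + gamma * V[model[(s, pi[s])][1]]
         for s in states}

# V['s0'] converges to 1 + 0.9 * 2 = 2.8
```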
• Model based
– The agent has a model of the environment
– Optionally also has a value function and/or policy
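A minimal sketch of a model-based agent: it holds an explicit model of the environment's dynamics and plans with it by one-step lookahead. The model entries, the value estimates, and γ = 0.9 are all invented for illustration:

```python
# Sketch: a model-based agent plans using its internal model
#   (state, action) -> (predicted reward, predicted next state)
# rather than acting from a policy alone (all numbers are made up).
gamma = 0.9
model = {('s0', 'left'):  (0.0, 's0'),
         ('s0', 'right'): (1.0, 's1'),
         ('s1', 'left'):  (0.0, 's0'),
         ('s1', 'right'): (2.0, 's1')}
actions = ['left', 'right']

def plan(state, value):
    # One-step lookahead: pick the action with the best predicted
    # reward plus discounted value of the predicted next state
    return max(actions,
               key=lambda a: model[(state, a)][0]
                             + gamma * value[model[(state, a)][1]])

value = {'s0': 0.0, 's1': 10.0}  # assumed value estimates
best = plan('s0', value)  # 'right': 1 + 0.9 * 10 beats 0 + 0.9 * 0
```

Here the optional value function feeds the planner; a model-free agent would instead act directly from a learned policy or value function without ever predicting next states.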
Thank You