0% found this document useful (0 votes)

18 views12 pages

Cognitive Mapping and Episodic Memory

Uploaded by

ucervan

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

18 views12 pages

Cognitive Mapping and Episodic Memory

Uploaded by

ucervan

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 12

Neurocomputing 595 (2024) 127812

Contents lists available at ScienceDirect

Neurocomputing
journal homepage: www.elsevier.com/locate/neucom

Cognitive mapping and episodic memory emerge from simple associative

learning rules
Ekaterina D. Gribkova 1 ,2 ,3 , ∗, Girish Chowdhary 1 ,4 ,5 , Rhanor Gillette 2 ,3 ,6
University of Illinois at Urbana-Champaign, Urbana, IL, 61801, USA

ARTICLE INFO ABSTRACT

Communicated by R. Mao Episodic memory enables animals to map contexts and environmental features in space and time but is
underused in artificial intelligence (AI). Here we show how simple associative learning rules can be expanded
Dataset link: https://github.com/Entience/ASI to basic episodic memory in AI. We augment an agent-based foraging simulation, ASIMOV, modeled on the
MOV-FAM
simple neuronal circuitry of an invertebrate forager, by adding a novel computational module for simple
Keywords: episodic memory, the Feature Association Matrix (FAM). The FAM is a set of computationally light, graph
Episodic memory learning algorithms which functionally resemble the auto- and hetero-associative circuits of the hippocampus
Cognitive mapping for episodic memory. In simulations, FAM enables highly efficient foraging and navigation and shows how
Spatial learning higher-order conditioning mechanisms give rise to spatial cognitive mapping by chaining pair-wise associations
Computational model and encoding them with additional contexts. Thus, FAM demonstrates a biologically inspired, bottom-up
Agent-based simulation
enhancement of AI for higher-order cognition.

1. Introduction reward prediction error, is uniquely associated with the reverse replay
of place cell sequences rather than forward replay [7].
Episodic memory encodes both spatial and temporal contexts of past Thus, episodic memory provides context-dependent encoding and
events and experience [1–3]. It is a major function of the vertebrate recall of spatiotemporal events. Through cognitive mapping of ob-
hippocampus that sustains exploration, learning, and exploitation of jects organized in context, episodic memory provides the associative
new environments. In the hippocampus, experience is represented as substrates for autonoetic consciousness (awareness of the past) [8]
an integrated cognitive map of associations in spatial and tempo- and divergent creative thought [9] in humans, other mammals, and
ral contexts. Events are presented as objects organized in context, birds [10]. Contemporary artificial intelligence (AI) lacks the attributes
and memory is laid down in episodic sequences of events in time– of natural intelligence in terms of episodic memory and formation of
space. In particular, the firing of ‘‘place’’ cells in the hippocampus cognitive maps to be applied and generalized across different contexts
encodes specific spatiotemporal contexts for episodic memories [3]. and environments. Most current algorithms require significant compu-
Place cells in region CA3 of the hippocampus make up a recurrent tational power and resources for training large networks in specific
auto-associative network for learning and formation of sequences (Fig. tasks, and they usually cannot generalize across different tasks. Most
S1), especially for navigation [4,5]. Assignment of salience and reward AIs lack the innate, dynamic, and predetermined neural circuitry of
value to sequences formed in CA3 may occur through region CA1 of animals. As yet there are no AI algorithms with the flexibility and
the hippocampus [6], which consists of a hetero-associative neuronal range of social and creative endeavor seen in even lesser mammals and
network that receives dopaminergic reward input from midbrain struc- insects.
tures, such as the substantia nigra compacta and ventral tegmental area. There is still the question of: ‘‘how did episodic memory emerge?’’
Notably, the processing of reward, like encoding reward magnitude or Episodic memory may have evolved independently multiple times
across the phyla in animals that exploit territories with home bases,

∗ Corresponding author.
E-mail addresses: [email protected] (E.D. Gribkova), [email protected] (G. Chowdhary), [email protected] (R. Gillette).
1
Coordinated Science Laboratory.
2
Neuroscience Program.
3
Center for Artificial Intelligence Innovation.
4
Department of Agricultural and Biological Engineering.
5
Department of Computer Science.
6
Department of Molecular and Integrative Physiology.

https://doi.org/10.1016/j.neucom.2024.127812
Received 20 October 2023; Received in revised form 4 March 2024; Accepted 5 May 2024
Available online 11 May 2024
0925-2312/© 2024 The Author(s). Published by Elsevier B.V. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
E.D. Gribkova et al. Neurocomputing 595 (2024) 127812

Fig. 1. The Feature Association Matrix (FAM). In this example an agent encounters specific inputs as three odors and a reward. The FAM can be configured to receive a variety
of different inputs. In this example, for sensory inputs FAM notes on which side the agent senses an odor (ex: Odor 1 left, Odor 1 right, Odor 2 left, etc.). The FAM assigns
appropriate strength, order, and reward to corresponding pair-wise associations. At bottom right, FAM appears as three different grids, where each arrow in the grids is an input,
with reward input yellow, and all other arrows as sensory inputs (Odor 1 Left, Odor 1 Right, . . . , Odor 3 Left, Odor 3 Right). Each box in the grids is the association between
an input pair. Supplementary Video S1 shows the FAM forming and chaining associations in an animated version. A. Eligibility traces for the sequence of inputs encountered by
the agent. B. The agent explores the environment on the yellow dotted path, encountering specific inputs as odors and a reward. C. Matrix of association Strengths. The Strength
of association for two inputs is determined by how their eligibility traces correlate or overlap in time. Brightest boxes show strongest associations, indicating that the presented
sequence is memorized. D. Matrix of association Orders. Order of an association for two inputs is determined by which comes first and the temporal gap between them. Grid box
colors indicate relative Order between inputs: gray, blue, and yellow indicate zero Order, negative Order, and positive Order, respectively. Brighter colors indicate Order values
of higher magnitude. E. Matrix of association Rewards. The Reward of an association between two inputs is determined by how closely the input pair occurs to a Reward input.
Grid box color indicates the relative reward value for that association: brightest red for highest reward values, and darkest red for the lowest. Note that reward value assignment
occurs after the agent has encountered a reward input (see Supplementary Video S1).

have complex social networks, or engage in complex niche modifica- goal-directed artificial agent, the ASIMOV (Algorithm of Selectivity by
tions like nest-building. An early driving influence in the evolution of Incentive, Motivation and Optimized Valuation) forager [11], which is
simple episodic memory may have been selection for more efficient built with enhanced evaluative abilities onto the Cyberslug model of
exploitation of environments with distinguishable landmark features, decision-making in foraging [12]. The FAM allows the ASIMOV agent
and basic niche modification. In this study, we explore how the simplest to learn landmark features and rewarding items in its spatial environ-
classical conditioning mechanisms in a foraging simulation can be ment, reproducing a potential early phase in the evolution of simpler
expanded to sequence learning and simple episodic memory. episodic memory. In particular, the FAM shows how spatial cognitive
mapping may have evolved from a simple olfactory learning system,
2. Models of memory much like how directional olfaction has been hypothesized as the basis
for evolution of spatial cognitive mapping and hippocampal-like cir-
Presently, effective models of sequence formation follow hippocam- cuitry [13,14], and furthermore, it shows how autonoetic consciousness
pal function and are similarly complex. They use realistic spiking may have evolved from systems controlling decision-making in spatial
conductance-based neuron models to model neuronal network activities foraging [15,16].
of ‘‘theta waves’’ and ‘‘sharp wave ripples’’ in computations requiring
For all simulations presented here, the sensory inputs that the
massive computer power and time. Here we introduce a simpler, ab-
FAM receives consist of simple sensory activation by up to 5 different
stracted model that allows learning sequential presentation of features,
odors, however, the FAM can be configured to receive a variety of
as well as backwards replay of those sequences triggered by reward
different and more complex inputs. For instance, here the sensory
cues. This Feature Association Matrix (FAM) model borrows several
inputs to the FAM consider the side on which the agent senses an
principles from hippocampal function and organization, such as auto-
odor (Fig. 1, Supplementary Video S1), or can even use motor feedback
associative and hetero-associative architecture, but uses non-spiking
association units, with sensory inputs and a reward input for learning from the agent’s turn responses. Enhancement of the inputs to the FAM
associations (Fig. 1, Supplementary Video S1). in terms of internal states and sensorimotor feedback can make for
Presentation of each (sensory or reward) feature input to an asso- more intricate associations and sequence memorization, thus expanding
ciation unit activates a decaying ‘‘eligibility trace’’. During the trace’s possibilities for more complex preferences and behavior in the agent,
decay, associations between different pairs of inputs can be strength- and even building primitive bases for aesthetic creative behaviors, as
ened or weakened, and specific properties (Strength, Order, expected in the construction of nests, burrows, and other structures through
Reward) can be established for a pair-wise association. With pre- sequentially organized motor actions.
sentation of a reward input, a gradient of expected reward among Fig. 2 shows the development and operations of the FAM, starting
linked associations is established proportional to their relational and with ASIMOV’s initial learning algorithms for classical conditioning, up
temporal proximity to the reward input. Thus, the FAM enables mem- to the simple cognitive mapping algorithms of the FAM with encoding
orization and retrieval of sequences of presented sensory cues, par- of additional contexts. It also shows how the FAM uses pairwise as-
ticularly when reinforced by reward input. We joined the FAM to a sociations and their Strength, Order, Expected Reward and additional

2
E.D. Gribkova et al. Neurocomputing 595 (2024) 127812

Fig. 2. Development of FAM architecture supporting increased complexity of memory. Top: Initial ASIMOV versions used Rescorla–Wagner algorithms for classical conditioning to
establish associative strengths between input stimuli S1 , . . . , SN and Reward (R) input. Initial FAM development, with associative Strength, Order and Expected Reward calculations
enabled higher-order conditioning and sequence learning in ASIMOV’s forager, and additional context encoding in the form of a memory vector matrix enabled cognitive graphing
and spatial learning. Bottom: FAM’s use of pairwise associations, where Strength, Order, Expected Reward and additional contexts allowed the FAM to build sequences and cognitive
graph structures. In particular, building sequences requires pair-wise associations with Order, while Expected Reward assignment requires both Strength and Order. A sequence
can be further transformed into a spatial cognitive graph structure by incorporating additional distance and direction information (memory vectors).

context (memory vector), in building sequences and cognitive graph FAM uses both Hebbian-like and reinforcement-like learning to assign
structures. expected rewards, but in addition it employs simple timing dependent
plasticity similar to spike-timing-dependent plasticity and asymmetric
2.1. Related models Hebbian learning rules to determine the order in which inputs occur
in a sequence. In contrast to other network models of hippocampal
An earliest example of an auto-associative network for sequence memory [18,25,26], the FAM does not have spiking units and does not
memorization is an auto-associative correlation matrix memory (CMM) use conductance-based neuron models. While this means that the FAM
[17,18]. The CMM is based on a correlation matrix of component is less biophysically realistic, it also means that the computations are
input signals and uses Hebbian modification of connections to mem- simpler, take less time and computing memory, and may be easier to
orize patterns among binary input elements. This may be one of the expand to a much greater scale than most biophysical memory models.
simplest artificial neural networks (ANNs) using a hippocampal CA3-
3. ASIMOV-FAM simulation
like architecture as well as Hebbian learning. However, the CMM
does not learn sequences of inputs, and uses simple binary inputs.
3.1. FAM model
Unlike CMM models, the FAM does not rely on binary stimulus events
and considers the order and expected reward of each pair-wise com-
The Feature Association Matrix (FAM) memorizes and replays se-
bination of input elements. Another model mimicking hippocampal quences. For a possible set of inputs or features presented to a forager,
structure is that of Lawrence et al. [19], which has CA1 and CA3- such as sensory and reward inputs, the FAM memorizes the sequence
like mechanisms and architecture: it uses recurrent connections, spiking of presentation by changing associative strength and order between
LIF neurons, and Hebbian learning with a delay component, to form pairs of inputs. Further, the association matrix assigns expected reward
robust sequence memory. Auto-associative weights are used for pattern values to each association, allowing for replay of memorized sequences.
completion, while hetero-associative weights drive the state of the Notably, the association matrix is plastic: memorized sequences may be
system from one pattern to the next. This model shows improved unlearned if no longer presented or rewarded, and if there are enough
performance over the Hopfield model, another widely-used auto- and other presentations of competing sequences.
hetero-associative sequence-learning model, by using slower, or de- The eligibility trace of an input (Fig. 1A), like a sensory or reward
layed, hetero-associative synapses to allow the network to retain more input, is a temporary record of the occurrence of that input, which can
of a short-term memory effect. The FAM has a different and simpler last longer than the input itself. Therefore, in consuming a rewarding
architecture, operates in a simple sensory environment, and is closely item the reward input itself may be instantaneous, but the eligibility
related to place graph learning algorithms [20,21] but uses more com- trace of the reward input decays more slowly, thereby facilitating
plex learning rules to establish associations. In contrast to the model of reinforcement learning for inputs separated by short stretches of time.
Lawrence et al. [19], the Tolman–Eichenbaum Machine [22], and most The eligibility trace, 𝐸𝑖 , of an input 𝑖 can be expressed as:
other neural network-based spatial memory models, the basic units in { 1−𝑘𝐸
the FAM are primarily analogs of synapses rather than neurons [20]. 𝑘𝐸 + , if 𝑆𝑖 ≥ 𝑆𝑇 ℎ𝑟𝑒𝑠ℎ𝑜𝑙𝑑
𝐸𝑖 𝑡+1 = 1+𝑒𝑎𝐸 ⋅𝑆𝑖 +𝑏𝐸 , (1)
Like the models of Barrera et al. [23] and Strösslin et al. [24], the 𝐸𝐷𝑒𝑐𝑎𝑦 ⋅ 𝐸𝑖𝑡 , if 𝑆𝑖 < 𝑆𝑇 ℎ𝑟𝑒𝑠ℎ𝑜𝑙𝑑

3
E.D. Gribkova et al. Neurocomputing 595 (2024) 127812

⎧
⎪ 𝑑𝐶 − 𝑑𝑆 ⋅ 𝑆𝑝𝑒𝑒𝑑 , if 𝐷𝑃 𝑎𝑡ℎ = 0
⎪ ∑
𝑁
where 𝐸𝐷𝑒𝑐𝑎𝑦 =⎨ 1 − 𝑑𝐸 𝑆𝑗 , if 𝐷𝑃 𝑎𝑡ℎ > 0 ,
⎪ 𝑗=1
⎪ 𝑗≠𝑖
⎩

values for constants 𝑘𝐸 , 𝑎𝐸 , 𝑏𝐸 , 𝑆𝑇 ℎ𝑟𝑒𝑠ℎ𝑜𝑙𝑑 , 𝑑𝐶 , 𝑑𝑆 , and 𝑑𝐸 are given in

Table S1, and 𝑆𝑖 denotes the intensity of the stimulus input 𝑖. When 𝑆𝑖 is
greater than a sensory threshold for detection, 𝑆𝑇 ℎ𝑟𝑒𝑠ℎ𝑜𝑙𝑑 , the eligibility
trace is activated, and when 𝑆𝑖 is lower than 𝑆𝑇 ℎ𝑟𝑒𝑠ℎ𝑜𝑙𝑑 , the eligibility
trace decays with 𝐸𝐷𝑒𝑐𝑎𝑦 , which depends on path integration distance
(𝐷𝑃 𝑎𝑡ℎ ) and the movement speed of the forager (𝑆𝑝𝑒𝑒𝑑). This ensures
that the eligibility trace is maintained in the absence of other inputs
(𝑆𝑗 ) when the forager is using path integration to travel between distant
landmarks, and otherwise, the eligibility trace decays.
There are three variables in the FAM for each pair-wise combination
of input eligibility traces: Strength, Order, and Reward. Respectively,
these determine (1) the degree of correlation (overlap) between the
two inputs, (2) the order in which they are received and the relative
Fig. 3. ASIMOV’s neural network of foraging decision. Appetitive State (feeding
time interval between them, and (3) the expected reward that their network excitation) sums Incentive with motivation (Satiation) to regulate turn response
combination provides. The Strength of an association between input direction. In parallel, a somatic map of stimuli sets turn trajectory. Incentive sums
𝑖 and input 𝑗 (Fig. 1C) ultimately depends on the co-occurrence and inputs predicting nutritional value (Resource Signal) with learned positive and negative
overlap of the eligibility traces of input 𝑖 and input 𝑗 (Eqs. (2)–(4)): values of prey odor signatures (R+ and R-), and is integrated with motivation, Reward
Experience (HRC output), and input from Pain into Appetitive State (feeding network).
The FAM replaces a Rescorla–Wagner learning algorithm for establishing associations,
𝐶𝑖𝑗𝑡+1 = 𝐶𝑖𝑗𝑡 + 𝛥𝐶𝑖𝑗 , (2) and chains them to form memorized sequences with salience. FAM output contributes
to Incentive and Somatic Map calculations.

𝛥𝐶𝑖𝑗 = 𝑘𝐶 (𝐸𝑖 𝐸𝑗 − 𝐸𝑇 ℎ𝑟𝑒𝑠ℎ𝑜𝑙𝑑 + 𝑘𝑎 ⋅ 𝑅𝑒𝑤𝑎𝑟𝑑 𝑖𝑗 )(𝐸𝑖 + 𝐸𝑗 ) (3)

+ 𝑘𝑑 (𝑆𝑡𝑟𝑒𝑛𝑔𝑡ℎ𝑖𝑗 − 𝐶𝑖𝑗 ) , eligibility traces, 𝐸𝑖 and 𝐸𝑗 (Eq. (6)). Further, to enhance memorization
of a rewarded sequence, an association’s 𝑀 and Order values can be
stabilized when the association has a Reward value assigned to it.
1 This ensures that a memorized sequence which previously provided
𝑆𝑡𝑟𝑒𝑛𝑔𝑡ℎ𝑖𝑗 = , (4)
𝑎 ⋅𝐶 𝑡 +𝑏
1 + 𝑒 𝑆 𝑖𝑗 𝑆 rewards is less easily forgotten than memorized sequences that did not.
As shown in Fig. S2, 𝑀 and Order can be used to estimate the time
where 𝑘𝐶 , 𝐸𝑇 ℎ𝑟𝑒𝑠ℎ𝑜𝑙𝑑 , 𝑘𝑎 , 𝑘𝑑 , 𝑎𝑆 , and 𝑏𝑆 are constants, whose values are
difference between input eligibility traces.
provided in Table S1. 𝐶𝑖𝑗 is primarily a function of the two eligibility
When a reward input is presented, a Reward value is assigned to the
traces 𝐸𝑖 and 𝐸𝑗 . When both traces are active (𝐸𝑖 +𝐸𝑗 > 0), 𝐶𝑖𝑗 generally
associations of a memorized sequence, such that the association occur-
grows if they overlap significantly (𝐸𝑖 𝐸𝑗 > 𝐸𝑇 ℎ𝑟𝑒𝑠ℎ𝑜𝑙𝑑 ), and decreases
ring closest in sequence to the reward is assigned the highest reward
when there is not enough overlap. While 𝐶𝑖𝑗 does not have a strict
value. This value represents the expected reward of an association.
upper bound, 𝑆𝑡𝑟𝑒𝑛𝑔𝑡ℎ𝑖𝑗 is simply a logistic function of 𝐶𝑖𝑗 that is
Thus, a range of Reward values is established across the associations
bounded between 0 and 1. Furthermore, 𝐶𝑖𝑗 dynamics slightly depend of a memorized sequence, indicating each association’s ‘‘proximity’’ to
on the association’s Reward and Strength values (Eq. (3)), in such a way the reward input in the memorized sequence (Fig. 1E). Reward value
that 𝐶𝑖𝑗 slowly decays to 𝑆𝑡𝑟𝑒𝑛𝑔𝑡ℎ𝑖𝑗 to prevent fast unbounded growth, is assigned using the previously calculated Order and Strength values
and high 𝑅𝑒𝑤𝑎𝑟𝑑𝑖𝑗 effectively lowers 𝐶𝑖𝑗 ’s trace overlap threshold and of an association. If a reward input is part of the association, then the
therefore helps stabilize 𝑆𝑡𝑟𝑒𝑛𝑔𝑡ℎ𝑖𝑗 at a maximum of 1. Thus, 𝐶𝑖𝑗 , and Reward value is assigned as 𝑅𝑒𝑤𝑎𝑟𝑑𝑖𝑗 = 𝑘𝑅 ⋅ 𝑆𝑡𝑟𝑒𝑛𝑔𝑡ℎ𝑖𝑗 , where 𝑘𝑅 is a
consequently associative 𝑆𝑡𝑟𝑒𝑛𝑔𝑡ℎ𝑖𝑗 , serve as indicators of traces’ 𝐸𝑖 and constant (Table S1). For each association where neither input directly
𝐸𝑗 co-occurrence and history of overlap. provides reward, the Reward value is calculated as follows:
The Order of an association between inputs 𝑖 and 𝑗 (Fig. 1D)
⎧
indicates the temporal sequence in which the inputs arrive relative to
⎪ ∑
𝑁
𝑘𝑆 ⋅ 𝑆𝑡𝑟𝑒𝑛𝑔𝑡ℎ𝑖𝑗 ⋅ 𝑅𝑒𝑤𝑎𝑟𝑑 𝑙𝑗 , if 𝑂𝑟𝑑𝑒𝑟𝑖𝑗 ≥ 0
each other (Eqs. (5)–(6)). For instance, if input 𝑗 follows input 𝑖, Order ⎪ 𝑙=1
will increase towards a positive value, and if input 𝑗 precedes input 𝑖, 𝑅𝑒𝑤𝑎𝑟𝑑 𝑖𝑗 = ⎨ (8)
⎪ ∑𝑁
Order will decrease towards a negative value. If inputs i and j occur at 𝑘𝑆 ⋅ 𝑆𝑡𝑟𝑒𝑛𝑔𝑡ℎ𝑖𝑗 ⋅ 𝑅𝑒𝑤𝑎𝑟𝑑 𝑖𝑙 , if 𝑂𝑟𝑑𝑒𝑟𝑖𝑗 < 0 ,
⎪
the same time, Order will be close to zero. Order is simply a scaling of ⎩ 𝑙=1

the variable 𝑀 (Eq. (7)), calculated based on the difference between where 𝑁 is the total number of different inputs in the FAM, and
the eligibility traces of input 𝑗 and input 𝑖 (Eqs. (5)–(7)): 𝑘𝑆 is a constant between 0 and 1 (Table S1). The association’s own
Strength is multiplied by the summed Rewards of all associations that
𝑀𝑖𝑗𝑡+1 = 𝑀𝑖𝑗𝑡 + 𝛥𝑀𝑖𝑗 , (5) occur right after, as indicated by Orders (Eq. (8)). For example, in a
memorized sequence of inputs 𝑖, 𝑗, 𝑘, let 𝐴𝑠𝑠𝑜𝑐𝑖𝑎𝑡𝑖𝑜𝑛𝑗𝑘 be the pair-wise
𝑘𝑀 (𝐸𝑖 + 𝐸𝑗 )((𝐸𝑗 − 𝐸𝑖 ) − 𝑀𝑖𝑗𝑡 ) association between inputs 𝑗 and 𝑘, with a Reward value of 𝑅𝑗𝑘 . If it
𝛥𝑀𝑖𝑗 = ⋅𝑅𝑒𝑤𝑎𝑟𝑑 𝑖𝑗 +𝑏𝑀
, (6) is the only association that occurs right after 𝐴𝑠𝑠𝑜𝑐𝑖𝑎𝑡𝑖𝑜𝑛𝑖𝑗 , as indicated
1 + 𝑒𝑎𝑀
by 𝑂𝑟𝑑𝑒𝑟𝑖𝑗 > 0 and 𝑂𝑟𝑑𝑒𝑟𝑗𝑘 > 0, then 𝐴𝑠𝑠𝑜𝑐𝑖𝑎𝑡𝑖𝑜𝑛𝑖𝑗 will simply have the
𝑂𝑟𝑑𝑒𝑟𝑖𝑗 = 𝑘𝑂 ⋅ 𝑀𝑖𝑗𝑡 , (7) assigned Reward value 𝑅𝑖𝑗 = 𝑘𝑆 ⋅ 𝑆𝑡𝑟𝑒𝑛𝑔𝑡ℎ𝑖𝑗 ⋅ 𝑅𝑗𝑘 .
As the agent explores its environment (Fig. 1B, Supplementary
where 𝑘𝑀 , 𝑎𝑀 , 𝑏𝑀 , and 𝑘0 are constants (Table S1). To ensure that Video S1) it receives sensory inputs and uses the FAM to calculate the
𝑂𝑟𝑑𝑒𝑟𝑖𝑗 and 𝑀𝑖𝑗 change only when at least one of the eligibility Strength and Order for each pairwise combination of these inputs, effec-
traces is active, the calculation also depends on the sum of the two tively memorizing sequences of associations. Memorized sequences are

4
E.D. Gribkova et al. Neurocomputing 595 (2024) 127812

plastic and can easily change unless the agent receives a reward input, of logistic functions, over all odors 𝑂, for both real and imagined
as from consuming a rewarding prey item. A reward input stabilizes sensory activations (𝑆 variables) and rewards (𝑅 variables):
the memorized sequences leading to the reward, such that the Strength ∑ (1 + 𝑅𝑂𝑟𝑒𝑎𝑙 )(𝑆𝑂𝐿𝑟𝑒𝑎𝑙 − 𝑆𝑂𝑅𝑟𝑒𝑎𝑙 )
and Order of associations in the sequence are less easily changed. Essen- 𝑆𝑜𝑚𝑎𝑡𝑖𝑐𝑀𝑎𝑝 = − (10)
tially, the Strengths and Orders of associations in the FAM determine a 𝑂 1 + 𝑒−𝑎𝐹 ⋅𝑂𝐹
( )
memorized sequence, and the Rewards of these associations determine ∑ (1 + 𝑅𝑂𝑖𝑚𝑔 )(𝑆𝑂𝐿𝑖𝑚𝑔 − 𝑆𝑂𝑅𝑖𝑚𝑔 ) 𝑅𝑃 𝑆𝑃 𝐿 − 𝑆𝑃 𝑅
− 𝑘𝑔 − ,
the stability and salience of the memorized sequence. Sequences that 1 + 𝑒−𝑎𝐹 ⋅𝑂𝐹 1 + 𝑒−𝑎𝐹 ⋅𝑃
𝑂
do not lead to reward are easily changed and extinguished.
where
| |
3.2. ASIMOV simulation environment 𝑂𝐹 = ||𝑅𝑂𝑟𝑒𝑎𝑙 || ⋅ 𝑆𝑂𝑟𝑒𝑎𝑙 + |𝑅𝑂𝑖𝑚𝑔 | ⋅ 𝑆𝑂𝑖𝑚𝑔 (11)
( | | )
∑
The FAM implements an associative memory system in the agent- − |𝑅𝐾𝑟𝑙 | ⋅ 𝑆𝐾𝑟𝑒𝑎𝑙 + |𝑅𝐾𝑖𝑚𝑔 || ⋅ 𝑆𝐾𝑖𝑚𝑔 − |𝑅𝑃 | ⋅ 𝑃 ,
|
| | | | | |
based model, ASIMOV [11], replacing an earlier Rescorla–Wagner 𝐾≠𝑂
learning algorithm (Fig. 3). In particular, through a series of step-
wise modifications to the ASIMOV agent’s circuitry, the FAM enables 𝑃 = 𝑃 𝑎𝑖𝑛 , (12)
more complex incentive calculations, path integration and encoding,
and recall of memories as search images in the agent’s somatic map. and constant 𝑘𝑔 is provided in Table S1. The variable subscripts indicate
Inputs to ASIMOV’s forager and the FAM include different odor whether the odor is actually being sensed (‘‘real’’) or if it is part of an
inputs, each of which can activate a left or right sensor, and a re- imagined search image (‘‘img’’), and they specify whether it is left (‘‘L’’)
ward input. ASIMOV’s Incentive variable, representing the incentive or right (‘‘R’’) sensor activation. The addition of Incentives to Somatic
potential of a stimulus, integrates sensory information with innate and Map reduces ‘‘misattribution’’ errors, where sensing two different odors
learned valences. In our initial version, we expressed Incentive using together may cause over-valuation of an odor that in fact has less
the learned Order and Reward values of FAM, and stimulus input incentive value by itself. If the agent veers off the path of a memory
intensities as follows: vector, a correction vector is easily calculated with path integration and
simple vector algebra, and used for the imagined sensory activation to
𝑘𝐼
𝑁 ∑
∑ 𝑁
(
𝐼𝑛𝑐𝑒𝑛𝑡𝑖𝑣𝑒 = ⋅ 𝑅𝑒𝑤𝑎𝑟𝑑 𝑖𝑗 ⋅ (𝑆𝑖𝐴 + 𝑆𝑗𝐵 ) (9) guide the agent back towards the next odor landmark. With significant
𝑁 2 𝑖=1 𝑗=1 deviation from the memory vector, the agent can split the memory
( ) ( )) vector into a pair of detour and correction vectors and encode them into
⋅ 𝑇𝐼 − 𝑂𝑟𝑑𝑒𝑟𝑖𝑗 ⋅ 𝑆𝑖𝐴 ⋅ 𝑇𝐼 + 𝑂𝑟𝑑𝑒𝑟𝑖𝑗 ⋅ 𝑆𝑗𝐵
the FAM. This is particularly useful when learning to navigate simple
obstacles. ASIMOV’s Satiation variable was fixed at values of 0.01 and
for A = L , R 0.10 for Temporal Sequence Learning and Spatial Learning simula-
and B = L , R , tions, respectively, to test how the FAM’s learning algorithms affect
the agent’s approach/avoidance and foraging behavior. All simulations
where 𝑘𝐼 and 𝑇𝐼 are constants (Table S1), 𝑁 is the total number of were coded in NetLogo [29], with the extensions for Matrix, CSV,
different inputs in the FAM. The variable subscripts for stimulus input and Qlearning. The NetLogo code for the ASIMOV-FAM simulations is
(‘‘A’’ and ‘‘B’’) can take left (‘‘L’’) or right (‘‘R’’) sensor activations, such available at https://github.com/Entience/ASIMOV-FAM.
that Eq. (9) sums over all four combinations of left and right activation
for stimulus inputs 𝑆𝑖 and 𝑆𝑗 (ex: 𝑆𝑖𝐿 and 𝑆𝑗𝐿 , 𝑆𝑖𝐿 and 𝑆𝑗𝑅 , . . . ). After 4. Results
the ASIMOV’s agent learns a sequence and Reward values, Incentive
values allow it to traverse a learned overlapping sequence. For instance, 4.1. Higher-order conditioning
if the agent encounters the first stimulus in a learned overlapping
sequence, Incentive is initially high and promotes approach. As the The FAM enables artificial agents like ASIMOV to use second-
agent approaches an overlap of the first and second stimuli of the order conditioning mechanisms [30,31]. In second-order conditioning
sequence, Incentive becomes negative, causing the agent to move away a neutral stimulus pairs with an already conditioned stimulus to elicit
from the first encountered stimulus and towards the next one. the same conditioned response. Second-order conditioning is notably
impaired by hippocampal lesions [32].
3.3. Path integration We explore ASIMOV-FAM higher-order conditioning in the experi-
ment mode, Temporal Sequence Learning, where the forager is immo-
To navigate non-overlapping spatial sequences, path integration bilized but retains free turning responses for approach-avoidance, its
was used to calculate overall direction and distance traveled between satiation is fixed at 0.1, and temporal sequences of odors and reward
pairs of landmarks. On every tick, the heading vector, with length 1, are presented to it. Temporal Sequence 1 consists of Odor 1, Odor 2,
was calculated. As the agent traveled between landmarks, all heading and Odor 3, presented in two different phases: Learning Phase and
vectors were summed to produce a memory vector. When the agent Post-Learning Phase. (Fig. 4A).
reached the next landmark, this memory vector was then assigned In the Learning Phase presentations, there is no temporal overlap
to the corresponding pair-wise landmark association, and the path of odor stimuli, but there is overlap in the odors’ eligibility traces
integrator was reset, much as in mammalian navigation [27,28]. This (Fig. 4Ai, Learning Phase Odor and Reward Sequence). Thus, the
allows the FAM to build a cognitive graph of its spatial environment forager establishes Strengths and Orders of associations using the over-
(Fig. 2). With the existence of a memory vector for some odor pair, lapping eligibility traces, memorizes the sequence by chaining the
a corresponding search image can be generated to provide the agent pairwise associations, and establishes expected Reward values during
with ‘‘imagined’’ left and right sensor activation to guide its navigation the reward input. Notably, the forager does not turn towards the odor
from one odor landmark to the next. Basically, the difference in left sources during these presentations (Fig. 4Aii), as Reward values are
and right imagined sensor activation should orient the agent onto not established until the very end of the Learning Phase. In the second
the path of the memory vector. This allows the agent to attempt phase of presentations (Fig. 4Ai, Post-Learning Phase Odor Sequence),
shortcuts between odor landmarks it has visited before. Somatic Map there is no overlap in eligibility traces. With each Post-Learning Phase
was modified to include this imagined sensory activation, as well as presentation, the forager turns towards the odor source (Fig. 4Aii) with
Incentive. Calculations of the Somatic Map variable involve summation a slight oscillatory motion, which is due to its satiation level (fixed

5
E.D. Gribkova et al. Neurocomputing 595 (2024) 127812

Fig. 4. FAM mediates higher-order associative learning of temporal sequences in ASIMOV’s forager. A. In Presentation Mode where the forager is immobilized but freely turns,
a temporal sequence of odors is presented: Odor 1, Odor 2, Odor 3, followed by a reward. Ai. During the learning phase there is no temporal overlap in presentation of odors,
but there is overlap in eligibility traces (ETs). In a second set of the same odor presentations (post-learning sequence), where the eligibility traces do not overlap, Incentive is
increased due to the learned associations. Aii. Turning responses corresponding to odor presentations in (Ai). In the initial learning phase the forager does not turn towards the
presented odors. In the second presented set the forager turns towards the presented odor sources due to the learned associations. Aiii. Visualization of FAM expected rewards
at specific time points is indicated by the blue arrows and corresponding turning responses in (Aii). B. As in (A), the forager forms associations and increases stimulus-specific
Incentive values as it encounters a temporal sequence of odors. Bi. In this case the stimuli are presented as closely occurring pairs during the first 600 ticks: Odor 3 then reward
input, Odor 1 then Odor 3, and Odor 2 then Odor 1. Note that during this period there is no temporal overlap in presentation of odors, but there is overlap in eligibility traces
(ETs). Bii. Turning responses corresponding to odor presentations in (Bi). After encountering all pairs of inputs, the forager shows appetitive turns towards all the odors. Biii.
Visualization of FAM expected rewards at specific time points is indicated by the blue arrows and corresponding turning responses in (Bii).

at 0.1). At lower satiation levels, the turn towards the odor source from the odor source. Notably, each appetitive turn towards the odor
becomes smoother and less oscillatory, while at higher satiation levels, source is due to Incentive (Fig. 4Ai dark blue trace), which is increased
the oscillatory motion can increase and the forager may even turn away because of the associations learned during first phase presentations.

6
E.D. Gribkova et al. Neurocomputing 595 (2024) 127812

Fig. 5. ASIMOV’s forager with FAM, showing spatial learning and navigation of overlapping sequences. In a virtual environment, the forager encounters a sequence of odor sources,
Odor 1, Odor 2, and Odor 3 for 𝑡 = 0 to 240 ticks. The purple lines indicate the forager’s previous movement. At 𝑡 =∼240 ticks, the forager encounters Odor 3 Source and receives
a reward input, establishing Reward values in its FAM. After 200 ticks, the forager comes exclusively in direct contact with the Odor 3 Source, the only one that provided reward
throughout the simulation.

Because Incentive calculation depends on odors’ expected rewards as Odor 3, receiving reward input only when it encountered the source
well as sensory activation, it likewise has a slightly oscillatory increase of Odor 3 at around 200 ticks. Strengths and Orders of the FAM were
during the odor presentations, due to the forager’s oscillatory turning established as the forager encountered the odor sources from 𝑡 = 0 to
motions. While Fig. 4A does not explicitly show the typical second- 240 ticks. FAM Reward values were established only when the forager
order conditioning paradigm, the simulation of Fig. 4B does, using a received reward input at ∼240 ticks. At later times, the FAM showed
different temporal sequence (Temporal Sequence 2). Here, the forager almost no change in Strength, Order, or Reward values. Notably, after
shows similar increases in Incentive and appetitive turning behavior first encountering the odor sequence, the forager no longer had direct
with stimuli presented in closely occurring pairs during the Learning contact with Odor 1 and Odor 2 sources and went exclusively for the
Phase: Odor 3 then reward input, Odor 1 then Odor 3, and Odor 2 Odor 3 source. This was because only the Odor 3 source provided
then Odor 1. In this case, even though the reward input is given early reward throughout the simulation. As seen after 𝑡 = 240 ticks (Fig. 5),
on, after the forager’s first encounter with Odor 1, it can still assign the forager often stayed close to Odor 1 and Odor 2 sources, navigating
expected rewards to subsequently formed associations (Fig. 4Biii), due along the edges of the odors in either direction, until it reached Odor
to Strength and Order calculations. In the Post-Learning Phase, Odor 2, 3 source. Because the forager here has no knowledge of distance or
Odor 1, and Odor 3 are presented without overlap of eligibility traces, direction between landmarks, it must instead use its simple knowledge
eliciting an appetitive turn in every case (Fig. 4Bii). Notably, Incentive of overlaps between landmarks (Strength, Order) to search around a
values decrease with repeated presentations of an odor (Fig. 4Bi) landmark for the next one in the sequence. For instance, as shown
because after the first 200 ticks, all odors are repeatedly presented ∼570 ticks, the forager approached the Odor 1 landmark, circling
without reward, thus decreasing expected reward values as seen in the around its edge. When it encountered the Odor 2 landmark, it was
much fainter FAM expected reward values for 𝑡 = 1254 ticks (Fig. 4Biii), repelled by Odor 1 and attracted by Odor 2, progressing further in its
which affects Incentive calculations. Thus, the forager comes to ignore memorized sequence until it reached the rewarded Odor 3 landmark.
stimuli no longer rewarded. This shows that with the FAM, the forager can learn a sequence of
stimuli in the environment and use the learned associations to navigate
4.2. Spatial sequence learning the environment in ways that provide more reward.

To explore how the FAM affects the agent’s foraging and naviga- 4.3. Spatial map learning
tion behavior, we set up a virtual environment with three different
overlapping odor sources (Fig. 5). In the simulation, ASIMOV’s forager The previous simulations show how FAM’s sequence memoriza-
initially encountered the sequence of odor sources, Odor 1, Odor 2, tion allows learning and traversal of overlapping, unique landmarks

7
E.D. Gribkova et al. Neurocomputing 595 (2024) 127812

Fig. 6. ASIMOV–FAM simulations demonstrating simple cognitive mapping in spatial navigation. Brown traces show the paths traveled by the agent, ‘‘CyberOctopus’’. The addition
of path integration allows learning and traversal of a map of non-overlapping, unique landmarks, as the basis for cognitive mapping. A. After the agent encounters all 3 landmarks
in the environment, the learned map is used to make shortcuts between landmarks (t > 800 ticks), and for orienting correctly even when there is a forced detour from previous
paths (Perturbation and Correction, t > 7000 ticks). B. Visualizing the agent’s spatial map construction in an environment with 5 landmarks. With path integration, it learns
shortcuts (drawn as memory vectors) between pairs of visited landmarks. When the agent encounters the yellow rewarded odor source (t =∼690 ticks), it chains the associations
it has formed to include previously formed memory vectors. This builds a full map of shortcuts, even between unvisited pairs of landmarks, using simple vector algebra.

(Fig. 5). However, this does not allow the agent to travel the gaps Similar calculations occur in the mammalian hippocampal formation,
when landmarks are far apart. For this we added path integration to particularly with grid cells supporting vector calculations, and vector
encode distance and direction (memory vectors) between landmarks trace cells forming a code for vector memory, which may even support
in the FAM, allowing the agent to make shortcuts between pairs of abstract cognitive mapping for path planning [33]. A full map of
odor landmarks it previously visited (Fig. 6A). Even with a forced memory vectors can be constructed by chaining pair-wise associations,
deviation from its original path, the agent is still able to correctly re- much as expected reward values are assigned along the chain. Fig. 6B
orient and make a shortcut to the next landmark (𝑡 > 7000 ticks). shows this process. Encoded memory vectors are drawn between each

8
E.D. Gribkova et al. Neurocomputing 595 (2024) 127812

Fig. 7. Visualizing the agent’s obstacle learning and avoidance. The agent begins with a memory vector already established (yellow vector), which guides its movement from the
light blue odor landmark to the yellow one. It encounters a wall, around which it navigates with the help of the memory vector. Detour and correction vectors are shown in
light blue, as the agent is made to deviate from its memory vector. Once the agent is back on the path of the correction vector (t > 200 ticks), the pair of vectors (detour and
correction) are memorized into the FAM for future use.

pair of landmarks that the agent visits (𝑡 < 690 ticks). When reward distance and direction between stimuli (Fig. 2). Thus, AI can be en-
is encountered (𝑡 =∼690 ticks), a full map of memory vectors is built hanced for higher order cognition in a biologically inspired bottom-up
through pair-wise vector additions, much like how expected rewards approach, starting from the simple neuronal circuitry of an invertebrate
are assigned back along the chained pair-wise associations. In a more forager.
complex environment with obstacles, it is important for the agent to
learn how to navigate around them. Thus, we enabled the agent to 5.1. Neural circuit analog
encode not just a single vector between landmarks, but also simple
sequences of vectors for the odor landmark pair. In Fig. 7 for instance, The FAM abstracts physiological circuits that learn sequences of
given an already established memory vector (yellow vector), if the sensory cues, like those responsible for place field coding in the hip-
agent encounters an obstacle, it attempts to memorize a simple path pocampus (Fig. S1). Each pair-wise association of the matrix is analo-
to avoid it for faster navigation in the future. Specifically, it splits its gous to the synapses of place cells in region CA3 of the hippocampus.
memory vector into a detour vector and correction vector (light blue Similarly to hippocampal processing of reward through reverse replay
vectors) and encodes them into the FAM as it reaches the yellow odor of sequences [7], the FAM assigns reward values through a reverse
landmark. This type of vector sequence memorization can be crucial traversal of a learned sequence using previously established Strength
for navigating maze-like environments. and Order values. Thus, the FAM is an artificial neural network-based
memory model with unsupervised and reward-based learning rules. A
neuronal circuit analog of the Feature Association Matrix, two sensory
5. Discussion
inputs (Odor 1, Odor 2) and one reward input (Reward 1) can be
used as a simple example of association matrix function (Fig. 8). In
The FAM is a simple model for memorizing and replaying sequences Fig. 8, before training, all synapses or associations are weak. O1 , O2 ,
that shows how mechanisms for higher-order conditioning can be and R1 represent neurons receiving input Odor 1, Odor 2, and Reward
expanded to simple cognitive mapping. Specifically, the FAM forms 1, respectively. During training, inputs are presented in the sequence
pair-wise associations, encoding associative Strength and Order, and Odor 1, Odor 2, Reward 1, such that their eligibility traces overlap. The
chains them together to establish a memorized sequence of inputs. synapse from neuron O1 to O2 strengthens, followed by strengthening
Sequence stability and salience are established during reward input, of the synapse from O2 to R1 neuron. The plasticity mechanism by
using the previously learned chain of associations to assign expected which synapses strengthen resembles spike timing dependent plasticity
Reward values for each input pair. This simple model, incorporated into (STDP); this is expressed in the code through the association’s Strength
the ASIMOV agent, enables higher-order associative learning (Fig. 4), and Order variables. Thus, after training, the colored synapses SO2,O1
as well as spatial sequence learning and navigation (Fig. 5). Encoding and SR1,O2 represent the associations that are formed. The neurons that
additional context into the FAM’s pair-wise associations, in particular receive reward (like R1 ) heterosynaptically facilitate these synapses.
the distances and directions from path integration, allows the agent to In this case, positive feedback strengthens the modulatory synapses
build a spatial map of shortcuts and learn to navigate obstacles (Fig. 6– of neuron R1 . If the reward neuron reactivates synapses that have
Fig. 7). This is especially useful for learning to navigate environments already been strengthened (SO2,O1 and SR1,O2 ), it essentially replays
with sparse sensory input [34]. Notably, developing the FAM for more the memorized sequence. This replay can then strengthen the reward
complex memory simply involves encoding additional contexts, here neuron’s synapses, M1 and M2 , based on ‘‘shortest path’’, such that M2

9
E.D. Gribkova et al. Neurocomputing 595 (2024) 127812

Fig. 8. A neuronal circuit analog of the FAM. In this example there are two sensory inputs, Odor 1 and Odor 2, and one reward input, Reward 1. Before training, no associations
are formed yet (gray connections), and no reward values have been assigned yet (light orange connections). In training, a sequence of three inputs is presented as Odor 1, Odor
2, Reward 1. Sequential overlap of the eligibility of these three inputs increases the strengths of the associations SO2,O1 and SR1,O2 (dark orange and solid blue connections) after
training. Following reward input there is also an assignment of reward value to the associations SO2,O1 and SR1,O2 through the strengthening of heterosynaptic connections M1 and
M2.

strengthens more than M1 . This establishes a reversible sequence via the learning process for 𝜖-greedy Q-learning (with initial 𝜖 = 0.5,
the reward neuron, reflected in the code in assignment of a reward 𝜖 decay = 0.08, learning rate = 0.99, and discount factor = 0.75),
value to each association. The FAM’s path integration and memory with no convergence for Q-learning after 5.6 × 105 simulation ticks
vector encoding resembles calculations in the mammalian hippocampal (∼700 trials). In contrast, our 3-landmark simulation with ASIMOV-
formation [35], with grid cells supporting vector calculations, and FAM shows noticeable trajectory convergence after ∼1250 ticks, after
vector trace cells forming a code for vector memory in spatial learning encountering the spatial sequence just once (Fig. 6 A). We also tested
and likely extending to abstract cognitive mapping [33]. The FAM is a toy example of a 2-landmark spatial environment with the same
analogous to the cognitive graphing hypothesis of spatial memory [27, sensory information available (Fig. S3 bottom panels), showing that
36–38], where spatial exploration generates a place graph encoded convergence is possible in a much smaller state space, in a shorter
with local metric (distance and direction) information, which can also time-span (380 trials∕simulation ticks).
flexibly support learning of non-Euclidean virtual environments. The One of the primary reasons that ASIMOV-FAM is able to do one-shot
FAM essentially represents the adjacency matrices of these cognitive learning on an environment such as this is because it is implemented
graphs. with biologically-inspired pre-wired circuitry which directly ties sen-
sory and incentive inputs to motor output. Basically, ASIMOV-FAM only
5.2. Comparisons of model performance needs to learn and establish the Incentive values, thereby significantly
reducing the complexity of the problem and enabling simple one-shot
It is difficult to draw exact comparisons of performance with other learning. It is also able to deal with continuous state–action spaces,
models, due to the significant differences in sensory environments, much like Actor–Critic (AC) reinforcement learning. Implementing AC
the programming languages used, model size differences, as well as algorithms or additional RL algorithms in our simulation for compar-
different training paradigms. Unlike in the majority of hippocampal ison is beyond the scope of this current study, as that would require
model simulations [39], our environment is not a visual one but rather either implementing them from the ground up in NetLogo, or switching
olfactory and our model uses much simpler non-spiking units. Some of to a different programming language (such as Python) to implement
the same sensory environments, path integration system, and visual
the closest comparisons consist of foraging simulations with a focus on
representation as ASIMOV-FAM. We also note that most RL algorithms,
path integration [40,41]. For instance, Goldschmidt et al. [40] present
including AC methods, will require extensive training, and have signif-
a computational model combining path integration and acquisition of
icant difficulties with sparse reward feedback [45,46], which is not the
vector memories in an insect-inspired simulation of foraging. In this
case in our simulation.
model the homing trajectories look fairly close to that of our agent’s
learned shortcut trajectories, and similarly, they are learned relatively
CRediT authorship contribution statement
quickly (<15 trials). However, unlike Goldschmidt et al. and other path
integration-based foraging models, our ASIMOV-FAM model explores Ekaterina D. Gribkova: Conceptualization, Formal analysis, In-
spatial learning in the broader context of sequence learning and the vestigation, Methodology, Software, Visualization, Writing – original
evolution of episodic memory. draft, Writing – review & editing. Girish Chowdhary: Funding acqui-
Reinforcement learning (RL) algorithms also bear some relation sition, Writing – review & editing. Rhanor Gillette: Conceptualiza-
to our FAM algorithm for the aspect of behavioral sequence learn- tion, Formal analysis, Funding acquisition, Investigation, Methodology,
ing. However, RL algorithms typically require extensive training, well- Software, Visualization, Writing – original draft, Writing – review &
defined reward functions, and have difficulties with one-shot learning editing.
and changing environments [42,43]. For a simple comparison, we
implemented a Q-learning algorithm in NetLogo using the same 3- Declaration of competing interest
landmark environment as for ASIMOV-FAM, providing discretized odor
and path integration information for state definitions [44]. We note The authors declare that they have no known competing finan-
that fine-tuning of the reward function was required. In Supplemen- cial interests or personal relationships that could have appeared to
tary Information Fig. S3 (top panels), we provide trajectories during influence the work reported in this paper.

10
E.D. Gribkova et al. Neurocomputing 595 (2024) 127812

Data availability [22] J.C. Whittington, T.H. Muller, S. Mark, G. Chen, C. Barry, N. Burgess, T.E.
Behrens, The Tolman-Eichenbaum machine: Unifying space and relational mem-
ory through generalization in the hippocampal formation, Cell 183 (5) (2020)
The NetLogo code for the ASIMOV-FAM simulation is available at 1249–1263. e23.
https://github.com/Entience/ASIMOV-FAM. [23] A. Barrera, A. Cáceres, A. Weitzenfeld, V. Ramirez-Amaya, Comparative exper-
imental studies on spatial memory and learning in rats and robots, J. Intell.
Acknowledgments Robot. Syst. 63 (3–4) (2011) 361–397.
[24] T. Strösslin, D. Sheynikhovich, R. Chavarriaga, W. Gerstner, Robust self-
localisation and navigation based on hippocampal place cells, Neural Netw. 18
Funding (9) (2005) 1125–1140.
[25] S. Kunec, M.E. Hasselmo, N. Kopell, Encoding and retrieval in the CA3 region
This work was supported by the Office of Naval Research, United of the hippocampus: A model of theta-phase separation, J. Neurophysiol. 94 (1)
States [grant number N00014-19-1-2373]. (2005) 70–82.
[26] G. Buzsáki, Two-stage model of memory trace formation: A role for ‘‘noisy’’ brain
states, Neuroscience 31 (3) (1989) 551–570.
Appendix A. Supplementary data [27] W.H. Warren, Non-euclidean navigation, J. Exp. Biol. 222 (2019).
[28] A.S. Etienne, R. Maurer, V. Boulens, A. Levy, T. Rowe, Resetting the path
Supplementary material related to this article can be found online integrator: A basic condition for route-based navigation, J. Exp. Biol. 207 (9)
(2004) 1491–1508.
at https://doi.org/10.1016/j.neucom.2024.127812.
[29] U. Wilensky, NetLogo. Evanston, IL: Center for connected learning and
computer-based modeling, Northwestern University, 1999.
References [30] R.D. Hawkins, W. Greene, E.R. Kandel, Classical conditioning, differential con-
ditioning, and second-order conditioning of the Aplysia gill-withdrawal reflex in
[1] B. Subagdja, A.H. Tan, Neural modeling of sequential inferences and learning a simplified mantle organ preparation, Behav. Neurosci. 112 (3) (1998) 636.
over episodic memory, Neurocomputing 161 (2015) 229–242. [31] W. Brogden, Higher order conditioning, Am. J. Psychol. (1939) 579–591.
[2] S.J. Gershman, N.D. Daw, Reinforcement learning and episodic memory in [32] A. Gilboa, M. Sekeres, M. Moscovitch, G. Winocur, Higher-order conditioning is
humans and animals: An integrative framework, Annu. Rev. Psychol. 68 (2017) impaired by hippocampal lesions, Curr. Biol. 24 (18) (2014) 2202–2207.
101–128. [33] S. Poulter, S.A. Lee, J. Dachtler, T.J. Wills, C. Lever, Vector trace cells in
[3] D.M. Smith, S.J. Mizumori, Hippocampal place cells, context, and episodic the subiculum of the hippocampal formation, Nature Neurosci. 24 (2) (2021)
memory, Hippocampus 16 (9) (2006) 716–729. 266–275.
[4] N. Van Strien, N. Cappaert, M. Witter, The anatomy of memory: An interactive [34] K. Cheng, P. Schultheiss, S. Schwarz, A. Wystrach, R. Wehner, Beginnings of a
overview of the parahippocampal–hippocampal network, Nat. Rev. Neurosci. 10 synthetic approach to desert ant navigation, Behav. Processes 102 (2014) 51–61.
(4) (2009) 272.
[35] B.L. McNaughton, F.P. Battaglia, O. Jensen, E.I. Moser, M.B. Moser, Path
[5] M.E. Hasselmo, E. Schnell, E. Barkai, Dynamics of learning and recall at integration and the neural basis of the’cognitive map’, Nat. Rev. Neurosci. 7
excitatory recurrent synapses and cholinergic modulation in rat hippocampal (8) (2006) 663–678.
region CA3, J. Neurosci. 15 (7) (1995) 5249–5262.
[36] W.H. Warren, D.B. Rothman, B.H. Schnapp, J.D. Ericson, Wormholes in virtual
[6] M.W. Jung, H. Lee, Y. Jeong, J.W. Lee, I. Lee, Remembering rewarding futures:
space: From cognitive maps to cognitive graphs, Cognition 166 (2017) 152–163.
A simulation-selection model of the hippocampus, Hippocampus 28 (12) (2018)
[37] J.D. Ericson, W.H. Warren, Probing the invariant structure of spatial knowledge:
913–930.
Support for the cognitive graph hypothesis, Cognition 200 (2020) 104276.
[7] R.E. Ambrose, B.E. Pfeiffer, D.J. Foster, Reverse replay of hippocampal place cells
is uniquely modulated by changing reward, Neuron 91 (5) (2016) 1124–1136. [38] M. Peer, I.K. Brunec, N.S. Newcombe, R.A. Epstein, Structuring knowledge with
cognitive maps and cognitive graphs, Trends Cogn. Sci. 25 (1) (2021) 37–54.
[8] E. Tulving, Episodic memory: From mind to brain, Annu. Rev. Psychol. 53 (1)
(2002) 1–25. [39] N. Cazin, P. Scleidorovich, A. Weitzenfeld, P.F. Dominey, Real-time sensory–
[9] P.P. Thakral, K.P. Madore, S.E. Kalinowski, D.L. Schacter, Modulation of hip- motor integration of hippocampal place cell replay and prefrontal sequence
pocampal brain networks produces changes in episodic simulation and divergent learning in simulated and physical rat robots for novel path optimization, Biol.
thinking, Proc. Natl. Acad. Sci. 117 (23) (2020) 12729–12740. Cybernet. 114 (2020) 249–268.
[10] T.A. Allen, N.J. Fortin, The evolution of episodic memory, Proc. Natl. Acad. Sci. [40] D. Goldschmidt, P. Manoonpong, S. Dasgupta, A neurocomputational model of
110 (Supplement 2) (2013) 10379–10386. goal-directed navigation in insect-inspired artificial agents, Front. Neurorobotics
[11] E.D. Gribkova, M. Catanho, R. Gillette, Simple aesthetic sense and addiction 11 (2017) 20.
emerge in neural relations of cost-benefit decision in foraging, Sci. Rep. 10 (1) [41] X. Sun, Q. Fu, J. Peng, S. Yue, An insect-inspired model facilitating autonomous
(2020) 1–11. navigation by incorporating goal approaching and collision avoidance, Neural
[12] J.W. Brown, D. Caetano-Anollés, M. Catanho, E. Gribkova, N. Ryckman, K. Netw. 165 (2023) 106–118.
Tian, M. Voloshin, R. Gillette, Implementing goal-directed foraging decisions [42] Y. Li, Reinforcement learning in practice: Opportunities and challenges, 2022,
of a simpler nervous system in simulation, eNeuro 5 (1) (2018) ENEURO. arXiv preprint arXiv:2202.11296.
0400–17.2018. [43] K. Khetarpal, M. Riemer, I. Rish, D. Precup, Towards continual reinforcement
[13] L.F. Jacobs, From chemotaxis to the cognitive map: the function of olfaction, learning: A review and perspectives, J. Artificial Intelligence Res. 75 (2022)
Proc. Natl. Acad. Sci. 109 (2012) 10693–10700. 1401–1476.
[14] L.F. Jacobs, The PROUST hypothesis: The embodiment of olfactory cognition, [44] K. Doya, Reinforcement learning in continuous time and space, Neural Comput.
Anim. Cogn. 26 (1) (2023) 59–72. 12 (1) (2000) 219–245.
[15] T.T. Hills, Animal foraging and the evolution of goal-directed cognition, Cogn. [45] J. Hare, Dealing with sparse rewards in reinforcement learning, 2019, arXiv
Sci. 30 (1) (2006) 3–41. preprint arXiv:1910.09281.
[16] T.T. Hills, S. Butterfill, From foraging to autonoetic consciousness: The primal [46] G. Matheron, N. Perrin, O. Sigaud, Understanding failures of deterministic
self as a consequence of embodied prospective foraging, Curr. Zool. 61 (2) (2015) actor–critic with continuous action spaces and sparse rewards, in: International
368–381. Conference on Artificial Neural Networks, Springer, 2020, pp. 308–320.
[17] T. Kohonen, Correlation matrix memories, IEEE Trans. Comput. 100 (4) (1972)
353–359.
[18] V. Cutsuridis, T. Wennekers, Hippocampus, microcircuits and associative
memory, Neural Netw. 22 (8) (2009) 1120–1128. Ekaterina D. Gribkova is a postdoctoral research asso-
[19] M. Lawrence, T. Trappenberg, A. Fine, Rapid learning and robust recall of long ciate at the University of Illinois at Urbana-Champaign,
sequences in modular associator networks, Neurocomputing 69 (7–9) (2006) where she received her Ph.D. in Neuroscience. Her research
634–641. is focused on computational neuroscience and neuro-
[20] T. Madl, K. Chen, D. Montaldi, R. Trappl, Computational cognitive models of behavioral studies of marine invertebrates, with specific
spatial memory in navigation space: A review, Neural Netw. 65 (2015) 18–43. interest in biologically-inspired artificial intelligence, and
[21] G. Franz, H.A. Mallot, J.M. Wiener, Graph-based models of space in architecture computational models of behavior, cognition and neural
and cognitive science: A comparative analysis, in: 17th International Conference plasticity.
on Systems Research, Informatics and Cybernetics, INTERSYMP 2005, Interna-
tional Institute for Advanced Studies in Systems Research and Cybernetics, 2005,
pp. 30–38.

11
E.D. Gribkova et al. Neurocomputing 595 (2024) 127812

Girish Chowdhary received his Ph.D. degree in aerospace Rhanor Gillette received his Ph.D. in Neuroscience from
engineering in 2010 from Georgia Institute of Technology. the University of Toronto and is presently Professor of
He is an associate professor and Donald Biggar Willet Molecular & Integrative Physiology at the University of
Faculty Fellow at the University of Illinois at Urbana- Illinois at Urbana-Champaign. He is a systems neuroethol-
Champaign (UIUC), USA. He is a member of the UIUC ogist studying decision mechanisms in simple and com-
Coordinated Science Laboratory and director of the UIUC plex nervous systems, and modeling them in agent-based
Distributed Autonomous Systems Laboratory and the Field computational simulations.
Robotics Engineering and Science Hub. His research inter-
ests include theoretical insights and practical algorithms for
adaptive autonomy, with applications in field robotics.

Databricks Certified Professional Data Engineer 1 1
No ratings yet
Databricks Certified Professional Data Engineer 1 1
16 pages
Computational Exploration in Cognitive Neuroscience
No ratings yet
Computational Exploration in Cognitive Neuroscience
501 pages
Episodic Memory and Beyond
No ratings yet
Episodic Memory and Beyond
33 pages
Beyond The Cognitive Map. From Place Cells To Episodic Memory. A. David Redish
No ratings yet
Beyond The Cognitive Map. From Place Cells To Episodic Memory. A. David Redish
444 pages
Positioning Is All You Need
No ratings yet
Positioning Is All You Need
14 pages
Hippocampus and MEMORY
No ratings yet
Hippocampus and MEMORY
34 pages
Chapter 6 Summary PowerPoint
No ratings yet
Chapter 6 Summary PowerPoint
28 pages
A-tale-of-two-algorithms--Structured-slots-explain
No ratings yet
A-tale-of-two-algorithms--Structured-slots-explain
20 pages
Full
No ratings yet
Full
78 pages
Article Text June 26 2023
No ratings yet
Article Text June 26 2023
9 pages
Episodic Memory: Episodic Memory Is The Memory of Autobiographical Events (Times, Places, Associated Emotions, and Other
No ratings yet
Episodic Memory: Episodic Memory Is The Memory of Autobiographical Events (Times, Places, Associated Emotions, and Other
8 pages
IEEE20101
No ratings yet
IEEE20101
19 pages
2023 04 07 536077v1 Full
No ratings yet
2023 04 07 536077v1 Full
38 pages
Category-Specific Semantic Memory - Converging Evidence From BOLD FMRI and Alzheimer's Disease NeuroImage
No ratings yet
Category-Specific Semantic Memory - Converging Evidence From BOLD FMRI and Alzheimer's Disease NeuroImage
33 pages
TMP F7 CC
No ratings yet
TMP F7 CC
24 pages
Cortical Dynamics of Memory
No ratings yet
Cortical Dynamics of Memory
10 pages
neurobiology of memory 2
No ratings yet
neurobiology of memory 2
65 pages
Neurobiology of Learning and Memory
No ratings yet
Neurobiology of Learning and Memory
37 pages
Nihms 1740454
No ratings yet
Nihms 1740454
37 pages
1-s2.0-S0303264722002064-main
No ratings yet
1-s2.0-S0303264722002064-main
20 pages
Brief Communications: Bidirectional Prefrontal-Hippocampal Interactions Support Context-Guided Memory
No ratings yet
Brief Communications: Bidirectional Prefrontal-Hippocampal Interactions Support Context-Guided Memory
5 pages
Lecture10 11 Combination
No ratings yet
Lecture10 11 Combination
73 pages
Lecture 02
No ratings yet
Lecture 02
16 pages
A neural-level model of spatial memory
No ratings yet
A neural-level model of spatial memory
45 pages
2501.00663v1
No ratings yet
2501.00663v1
27 pages
Schedlbauer 2014
No ratings yet
Schedlbauer 2014
9 pages
Brain Networks Underlying Episodic Memory Retrieval Michael D Rugg and Kaia L Vilberg
No ratings yet
Brain Networks Underlying Episodic Memory Retrieval Michael D Rugg and Kaia L Vilberg
6 pages
memory neuroscience
No ratings yet
memory neuroscience
31 pages
Neural Mechanisms of attending to items in working memory
No ratings yet
Neural Mechanisms of attending to items in working memory
12 pages
An Evolving Perspective On The Dynamic Brain - Notes From The Brain Conference On Dynamics of The Brain - Temporal Aspects of Computation
No ratings yet
An Evolving Perspective On The Dynamic Brain - Notes From The Brain Conference On Dynamics of The Brain - Temporal Aspects of Computation
22 pages
Deep Reniforcement Leanring Paper
No ratings yet
Deep Reniforcement Leanring Paper
13 pages
Localization of Memory
No ratings yet
Localization of Memory
27 pages
Howard 2017
No ratings yet
Howard 2017
6 pages
Memoria y Aprendizaje
No ratings yet
Memoria y Aprendizaje
9 pages
The Cognitive Neuroscience of Memory Encoding and Retrieval Studies in Cognition 1st Edition Amanda Parker - The special ebook edition is available for download now
100% (1)
The Cognitive Neuroscience of Memory Encoding and Retrieval Studies in Cognition 1st Edition Amanda Parker - The special ebook edition is available for download now
84 pages
Nuero-Symbolism
From Everand
Nuero-Symbolism
Larry Lee matthews
No ratings yet
Banino 10096744 Thesis AARAV
No ratings yet
Banino 10096744 Thesis AARAV
162 pages
Baram 2020
No ratings yet
Baram 2020
13 pages
03-Item and source-fMRI-shape
No ratings yet
03-Item and source-fMRI-shape
8 pages
tmp494D TMP
No ratings yet
tmp494D TMP
10 pages
Ororbia Maze Learning
No ratings yet
Ororbia Maze Learning
10 pages
Neurobiologia
No ratings yet
Neurobiologia
20 pages
Introduction To Cognitive Psychology
No ratings yet
Introduction To Cognitive Psychology
11 pages
Transformer for Hippocampal Formation
No ratings yet
Transformer for Hippocampal Formation
20 pages
Neurophys 9 Associative Memory
No ratings yet
Neurophys 9 Associative Memory
23 pages
The Systems Neuroscience of Human Memory
No ratings yet
The Systems Neuroscience of Human Memory
35 pages
Neurophys 10 Hebbian Learning
No ratings yet
Neurophys 10 Hebbian Learning
44 pages
[EBOOK PDF] Download complete The Cognitive Neuroscience of Memory Encoding and Retrieval Studies in Cognition 1st Edition Amanda Parker ebook
No ratings yet
[EBOOK PDF] Download complete The Cognitive Neuroscience of Memory Encoding and Retrieval Studies in Cognition 1st Edition Amanda Parker ebook
86 pages
Visual Features Are Processed Before Navigational Affordances in The Human Brain
No ratings yet
Visual Features Are Processed Before Navigational Affordances in The Human Brain
7 pages
How to build a cognitive map
No ratings yet
How to build a cognitive map
25 pages
Cognição espacial e o cérebro
No ratings yet
Cognição espacial e o cérebro
21 pages
Burgess 08 Spatial
No ratings yet
Burgess 08 Spatial
21 pages
Spatial Memory in Rodents and Humans
No ratings yet
Spatial Memory in Rodents and Humans
9 pages
Chapter 5
No ratings yet
Chapter 5
7 pages
The Hippocampus As A Cognitive Map of Social Space
No ratings yet
The Hippocampus As A Cognitive Map of Social Space
3 pages
Neuroevolution: Fundamentals and Applications for Surpassing Human Intelligence with Neuroevolution
From Everand
Neuroevolution: Fundamentals and Applications for Surpassing Human Intelligence with Neuroevolution
Fouad Sabry
No ratings yet
Kinematics of the Brain Activities
From Everand
Kinematics of the Brain Activities
Mostafa M. Dini
No ratings yet
Supervised Machine Learning for Science: How to stop worrying and love your black box
From Everand
Supervised Machine Learning for Science: How to stop worrying and love your black box
Christoph Molnar
No ratings yet
Hybrid Neural Networks: Fundamentals and Applications for Interacting Biological Neural Networks with Artificial Neuronal Models
From Everand
Hybrid Neural Networks: Fundamentals and Applications for Interacting Biological Neural Networks with Artificial Neuronal Models
Fouad Sabry
No ratings yet
Competitive Learning: Fundamentals and Applications for Reinforcement Learning through Competition
From Everand
Competitive Learning: Fundamentals and Applications for Reinforcement Learning through Competition
Fouad Sabry
No ratings yet
Swarm Intelligence: Fundamentals and Applications
From Everand
Swarm Intelligence: Fundamentals and Applications
Fouad Sabry
No ratings yet
Q1 LE Mathematics 8 Lesson 1 Week 1
No ratings yet
Q1 LE Mathematics 8 Lesson 1 Week 1
10 pages
RGC 2000 Software Operation
No ratings yet
RGC 2000 Software Operation
74 pages
Painting and Drawing Principles and Techniques
75% (4)
Painting and Drawing Principles and Techniques
279 pages
Agile Life Cycle
100% (2)
Agile Life Cycle
41 pages
4024 Paper 1A
No ratings yet
4024 Paper 1A
17 pages
High Speed Tool Steels
No ratings yet
High Speed Tool Steels
3 pages
PSQCA 2WheelerAuto
No ratings yet
PSQCA 2WheelerAuto
11 pages
Writing A Test Strategy: Tor Stålhane
No ratings yet
Writing A Test Strategy: Tor Stålhane
32 pages
Chapter 4
No ratings yet
Chapter 4
13 pages
أجهزة قياس معدل التدفق ومستوى السائل PDF
No ratings yet
أجهزة قياس معدل التدفق ومستوى السائل PDF
15 pages
After Completing This Lesson, You Will Be Able To
No ratings yet
After Completing This Lesson, You Will Be Able To
32 pages
Prabhat Tiwari: Education Skills
No ratings yet
Prabhat Tiwari: Education Skills
1 page
Buchanan 1969 1999 Cost and Choice An Inquiry in Economic Theory
No ratings yet
Buchanan 1969 1999 Cost and Choice An Inquiry in Economic Theory
120 pages
Addition 4digit 1digit All
No ratings yet
Addition 4digit 1digit All
20 pages
Exp 7 Field Density (Sand Replacement Method) PDF
No ratings yet
Exp 7 Field Density (Sand Replacement Method) PDF
21 pages
Am8530H/Am85C30: Serial Communications Controller
No ratings yet
Am8530H/Am85C30: Serial Communications Controller
195 pages
Aws Developer Guide
No ratings yet
Aws Developer Guide
784 pages
JJ207 Sesi Jun2013
No ratings yet
JJ207 Sesi Jun2013
15 pages
Glenn A. Hunter - Global Product Manager, Reciprocating: Previous
No ratings yet
Glenn A. Hunter - Global Product Manager, Reciprocating: Previous
73 pages
USER MANUAL Taurus Polar 3B Polure
No ratings yet
USER MANUAL Taurus Polar 3B Polure
28 pages
JSW Jaigarh Port LTD.: Sap Order No
No ratings yet
JSW Jaigarh Port LTD.: Sap Order No
9 pages
Installation and Configuration Manual: Version 1.6 - October 2020
No ratings yet
Installation and Configuration Manual: Version 1.6 - October 2020
54 pages
10 Pre Board Question Papers (Kvs)
No ratings yet
10 Pre Board Question Papers (Kvs)
85 pages
JETIR1807691
No ratings yet
JETIR1807691
6 pages
Surfer 8 No.3
No ratings yet
Surfer 8 No.3
9 pages
Carbide Tooling
No ratings yet
Carbide Tooling
90 pages
03 - Thomson Et Al - Current Signature Analysis To Detect Induction Motors Faults
No ratings yet
03 - Thomson Et Al - Current Signature Analysis To Detect Induction Motors Faults
9 pages
6731ce27d9ae9d19ebc770e8 ## VPRP Mega Test-07 ROI Dropper 10-Nov-2024 Ph-1 Hs 349 Questions
No ratings yet
6731ce27d9ae9d19ebc770e8 ## VPRP Mega Test-07 ROI Dropper 10-Nov-2024 Ph-1 Hs 349 Questions
20 pages
Photogrammetry Assignment 1
No ratings yet
Photogrammetry Assignment 1
19 pages