
Memory & Cognition (2023) 51:455–472
https://doi.org/10.3758/s13421-022-01361-8

Enhancing learning and retention through the distribution of practice repetitions across multiple sessions

Matthew M. Walsh1 · Michael A. Krusmark2,3 · Tiffany Jastrzembski2 · Devon A. Hansen4,5 · Kimberly A. Honn4,5 · Glenn Gunzelmann2

Accepted: 18 September 2022 / Published online: 3 October 2022
© The Psychonomic Society, Inc. 2022

Abstract

The acquisition and retention of knowledge is affected by a multitude of factors including amount of practice, elapsed time since practice occurred, and the temporal distribution of practice. The third factor, temporal distribution of practice, is at the heart of research on the spacing effect. This research has consistently shown that separating practice repetitions by a delay slows acquisition but enhances retention. The current study addresses an empirical gap in the spacing effects literature. Namely, how does the allocation of a fixed number of practice repetitions among multiple sessions impact learning and retention? To address this question, we examined participants' acquisition and retention of declarative knowledge given different study schedules in which the number of practice repetitions increased, decreased, or remained constant across multiple acquisition sessions. The primary result was that retention depended strongly on the total number of sessions in which an item appeared, but not on how practice repetitions were distributed among those sessions. This outcome was consistent with predictions from a computational cognitive model of skill acquisition and retention called the Predictive Performance Equation (PPE). The success of the model in accounting for the patterns of performance across a large set of study schedules suggests that it can be used to tame the complexity of the design space and to identify schedules to enhance knowledge acquisition and retention.

Keywords Spacing effect · Memory · Cognitive model

* Correspondence: Matthew M. Walsh, [email protected]
1 The RAND Corporation, Pittsburgh, PA, USA
2 Air Force Research Laboratory, Wright-Patterson Air Force Base, Dayton, OH, USA
3 L3Harris Technologies, Wright-Patterson Air Force Base, Dayton, OH, USA
4 Sleep and Performance Research Center, Washington State University Health Sciences, Spokane, WA, USA
5 Elson S. Floyd College of Medicine, Washington State University Health Sciences, Spokane, WA, USA

Introduction

For more than a century, psychologists have conducted experiments to identify factors that impact the acquisition and retention of knowledge, and more basically, to understand how information is organized, stored, and accessed from memory (Ebbinghaus, 1885/1964; Jost, 1897). These studies have revealed that the acquisition and retention of knowledge is impacted by multiple factors (for a review, see Anderson, 1995). One of the most important is the distribution of practice over time. Separating practice repetitions by a delay slows acquisition but enhances retention. This effect, called the spacing effect, is one of the most widely replicated results in psychology (Benjamin & Tullis, 2010; Cepeda et al., 2006; Delaney et al., 2010). In their 1992 review of research on learning, memory, and perception, Bruce and Bahrick identified the spacing effect as one of the most important topics from psychology during the previous century. Things have scarcely changed since; the spacing effect remains an active topic of investigation (for recent reviews, see Carpenter et al., 2012; Walsh, Gluck, Gunzelmann, Jastrzembski, & Krusmark, 2018b), and has been repeatedly identified as an empirically supported principle for effective learning (Dunlosky et al., 2013; Pashler et al., 2007).

Although the spacing effect has been extensively studied and widely replicated, our understanding of this phenomenon remains incomplete. Additionally, the spacing effect is better seen as a collection of effects. Multiple variables including time between repetitions within a session, number of sessions, time between sessions, and time from study to test contribute to the efficiency and effectiveness of learning, retention, and relearning. These spacing-related variables interact with one another and with other factors such as number of repetitions and item difficulty. Further, these effects are modulated by individual differences. The complexity of the design space created by these variables precludes straightforward prediction based on verbal theories of the spacing effect and limits educational prescriptions that can be drawn from the accompanying literature (Dempster, 1989).

Our goals in this paper are twofold. First, we address an empirical gap in the spacing effects literature. Specifically, we examine how the allocation of a fixed number of practice repetitions among multiple sessions impacts learning and retention. Second, we present a computational model of the spacing effect called the Predictive Performance Equation (PPE). Computational models such as PPE can unify existing findings related to the temporal distribution of practice and other study variables and can make quantitative predictions about how different study schedules will impact knowledge acquisition and retention. We evaluate PPE using data from the experiment mentioned above, which encompasses an extremely large number of study schedules and spans multiple days. Computational models such as PPE, if found to be valid representations of the psychological processes underlying knowledge acquisition and retention, can be used to tame the complexity of the instruction design space.

The spacing effect

Most basically, studies of the spacing effect compare memory performance following massed versus spaced practice. When study time devoted to an item is uninterrupted, learning is massed. Alternatively, when measurable time or intervening items separate study repetitions, learning is spaced. Massed practice typically accelerates acquisition, whereas spaced practice improves retention (for recent reviews, see Carpenter et al., 2012; Cepeda et al., 2006; Delaney et al., 2010). Further, following long delays, after which memory performance is low, items initially studied in a spaced manner are relearned more quickly (Walsh et al., 2018a, b).

In his seminal work on the spacing effect, Ebbinghaus found that fewer repetitions were needed to produce errorless memory 1 day after the completion of study when the repetitions were spaced across multiple days rather than massed within 1 day (1885/1964). Following this initial demonstration of the benefits of separating repetitions by one or more days, the majority of spacing effect studies adopted inter-study intervals of less than 1 day (87% of the 323 studies reviewed by Cepeda et al., 2006). These experiments show that the spacing effect, first observed in Ebbinghaus's multi-day studies, also appears along far briefer timescales ranging from seconds to hours (for a review, see Cepeda et al., 2006). Laboratory studies have since returned to designs where spacing is manipulated by varying elapsed time between two or more study sessions delivered on different days (Carpenter et al., 2009; Cepeda et al., 2009; Gerbier et al., 2015; Kapler et al., 2015; Küpper-Tetzel et al., 2014; Rawson & Dunlosky, 2013; Rawson et al., 2013; Rawson et al., 2018; see also Bahrick, 1979). Once again, these studies show that increasing the amount of time (i.e., days) between study sessions benefits retention.

Recently, researchers have turned from the question of how much space to include between sessions to how to distribute that space across multiple sessions. For example, Küpper-Tetzel et al. (2014) conducted an experiment where participants learned and reviewed trivia facts during sessions that occurred on three separate days. They were then tested after a short (15 min) or long (35 days) retention interval. The experiment had three spacing conditions: one with uniform inter-session intervals (ISIs) separating the study sessions (3 days, 3 days), one with contracting ISIs (5 days, 1 day), and one with expanding ISIs (1 day, 5 days). Küpper-Tetzel et al. observed an interaction between retention interval and study schedule; the contracting schedule was best when the retention interval was short, and the expanding schedule was best when the retention interval was long. Results from similar experiments are only partially consistent, however. Expanding schedules sometimes enhance retention relative to uniform schedules (Gerbier et al., 2015; Küpper-Tetzel et al., 2014), and they sometimes produce equivalent retention (Carpenter & DeLosh, 2005; Karpicke & Bauernschmidt, 2011; Karpicke & Roediger III, 2007). The far more consistent finding is that including any amount of time between sessions (i.e., expanding or uniform schedules) is more effective than delivering all practice within one session (Gerbier & Koenig, 2012).

The spacing effect is related to but distinct from the testing effect – that is, the finding that aside from assessing what one knows, taking a memory test enhances retention (Roediger III & Karpicke, 2006). Some studies of the spacing effect involve passive study, whereas others involve active retrieval. The spacing effect holds in both cases, although the interaction between these effects has received relatively little attention.
Theories and models of the spacing effect

Multiple theories have been proposed to account for the basic finding that distributing practice enhances retention. These are broadly aligned with three hypotheses. According to the deficient processing hypothesis, people dedicate less attention or effort to encoding item repetitions that occur after a short delay (Greene, 1989; Hintzman, 1974). Spacing enhances retention by increasing the likelihood that all repetitions are fully processed. According to the study-phase retrieval hypothesis, practice prompts the retrieval and strengthening of an item's original memory trace (Bjork, 1994; Hintzman, 2004). The increase in strength is proportional to the difficulty of the successful retrieval, which is greater when practice is spaced. In this way, the study-phase retrieval hypothesis appeals to the more general notion of desirable difficulties – that is, the idea that appropriate but difficult conditions of learning enhance retention (Bjork & Bjork, 2011). Lastly, according to the context variability hypothesis, items are encoded along with contextual elements present at study (Glenberg, 1979). Because these elements fluctuate over time, items are encoded with a more diverse set of contextual elements when practice is spaced. The greater diversity of contextual elements encoded with items during spaced practice facilitates retrieval.

These hypotheses are not exclusive. For example, deficient processing could impact learning and retention within a session, whereas study-phase retrieval or context variability could impact learning and retention between sessions (Braun & Rubin, 1998; Delaney et al., 2010). This is consistent with the finding that different neural processes are potentiated by spacing along different timescales (Smolen et al., 2016). Additionally, deficient processing could occur during passive study, whereas study-phase retrieval and context variability could occur during passive study and active testing. Finally, study-phase retrieval and contextual variability could both impact memory (Toppino & Gerbier, 2014; Verkoeijen et al., 2004). In fact, in some computational models of the spacing effect, the retrieval of an existing trace in a new context allows it to be associated with different contextual elements (Mozer et al., 2009; Raaijmakers, 2003).

In addition to these theories, researchers have begun to develop cognitive models, instantiated as running computational simulations, to explain how the spacing of practice and other factors affect knowledge acquisition and retention (Mozer et al., 2009; Pavlik & Anderson, 2005; Raaijmakers, 2003). Computational models take as input the number and timing of practice events, and make quantitative predictions about learning, retention, and relearning, providing a means to assess explanations for existing phenomena and to develop hypotheses regarding unexplored manipulations. In this way, researchers can use simulations to explore the implications of theories that are too complex to simply imagine (McClelland, 2009). Computational models also enable application (Gray, 2008). A model, once validated, can predict the effectiveness of different training schedules to select the optimal schedule. Further, by estimating model parameters that correspond to psychological processes (e.g., memory decay, baseline performance, etc.) at the level of individuals, the model can be used to make personalized predictions and training prescriptions (Walsh, Gluck, Gunzelmann, Jastrzembski, Krusmark, Myung, et al., 2018a).

We recently proposed a novel computational model of the spacing effect called PPE (Jastrzembski & Gluck, 2009; Walsh, Gluck, Gunzelmann, Jastrzembski, Krusmark, Myung, et al., 2018a). PPE is an extension of an earlier model called the General Performance Equation (Anderson & Schunn, 2000), which predicted performance based on: (1) amount of practice; and (2) elapsed time between practice and test. Additionally, PPE incorporates information about the history of successive lags between practice repetitions into the calculation of the knowledge decay rate in the model (c.f., Pavlik & Anderson, 2005). In this model, longer lags boost retention by reducing subsequent decay rate. This is conceptually related to accessibility theories, in which repetitions of items that are more difficult to retrieve – such as after a long delay – result in greater improvements in recall due to retrieval difficulty. In addition to accounting for the basic spacing effect, specifically slower acquisition but enhanced retention following temporally distributed practice, PPE accounts for a multitude of other spacing-related phenomena (Walsh, Gluck, Gunzelmann, Jastrzembski, & Krusmark, 2018b).

Experiment overview

The benefits of spacing are well understood, yet basic questions involving the temporal distribution of practice remain unanswered. This paper addresses a theoretical gap in existing research on the spacing effect. Specifically, when a fixed number of acquisition sessions are administered at predetermined intervals, how should item repetitions be allocated among sessions to maximize retention? This is akin to a scenario in which a training manager, operating within the constraints of calendar-based training, can only control how a fixed number of practice repetitions are divided among sessions. Our experiment had four types of practice trial distributions across acquisition sessions, all with the same total number of practice repetitions: (1) massed – all repetitions within one acquisition session; (2) level – repetitions equally divided among the total number of acquisition sessions; (3) ramp – number of repetitions increasing with each acquisition session; and (4) taper – number of repetitions decreasing with each acquisition session. Retention of all items was tested in a session that was 12 h following the last acquisition session for half of the items from each condition, and 36 h after the last acquisition session for the other half of the items.

Based on the existing spacing effect literature, we hypothesized that retention would be lowest in the massed condition. It was less clear, however, whether the different allocations of repetitions in the level, ramped, and tapered conditions would affect retention.
Gradually decreasing the amount of practice in each session (i.e., tapering) may lead to progressively more difficult retrievals. This is related to the expanding condition in studies that compared multi-session schedules with different ISIs (Gerbier et al., 2015; Küpper-Tetzel et al., 2014).

In related work, Finley et al. (2011) manipulated cue informativeness across trials within a single session (see also Fiechter & Benjamin, 2019). They showed participants English–Iñupiaq word pairs. After the initial presentation, they varied the number of visible letters in the target Iñupiaq word from zero to five. The number of visible letters decreased across trials in a diminishing cue condition. Finley et al. (2011) reasoned that decreasing the number of cues would create progressively more difficult retrievals, as with expanding schedules in earlier experiments, and that this would benefit acquisition and long-term retention. Final test performance was in fact higher in the diminishing cue condition.1 Together, these experiments along with the study-phase retrieval hypothesis suggest that tapered practice will improve retention relative to level or ramped practice. By gradually increasing retrieval difficulty, the tapered schedule may enhance retention.

Footnote 1: The effect of cue condition was diminished in Experiment 2 when corrective feedback was provided after each trial. The benefits of diminishing cues in Experiment 1 may relate to the fact that, in the absence of corrective feedback, only successful retrieval attempts were beneficial. Providing more cues early in study would increase the proportion of successful retrievals.

In addition to addressing this theoretical gap, our experiment overcomes a methodological limitation of nearly every existing study of the spacing effect. Most multi-session studies involve mixed or between-subjects designs (Bahrick, 1979; Cepeda et al., 2008; Küpper-Tetzel et al., 2014; Rawson & Dunlosky, 2013; but see Gerbier et al., 2015). In our experiment, all patterns of practice trial distributions across acquisition schedules and retention intervals were crossed and administered within-subjects, creating 17 different schedules that required 11 sessions to complete. Indeed, this was a major motivation for conducting the study – to determine whether existing computational cognitive models could account for the interplay between the collection of spacing-related effects. The main advantage of a within-subject design, in addition to the greater statistical power it affords, is the constraint it provides in terms of evaluating a computational model. Stated differently, the within-subject design effectively increases the number and diversity of observations the model must account for, providing a stronger test of the theory.

Methods

Participants

Thirty-eight individuals (20 males, 18 females) completed the study. Participants' ages ranged from 22 to 40 years (with a mean of 26.4 years and a standard deviation of 4.6). Three participants reported being left-handed, and the remaining 35 reported being right-handed. All participants gave written, informed consent, and the study was approved by the Institutional Review Board of Washington State University. All were physically and psychologically healthy and were free of drugs and alcohol, as assessed by physical examination, blood chemistry, urinalysis, breathalyzer, history, and questionnaires.

Individuals made up the control condition of a larger study that examined the effects of total sleep deprivation on performance in a battery of cognitive tasks. Results from participants who experienced sleep deprivation are not included here. Instead, we focus on the performance of participants in the control condition and their performance on a declarative memory task.

Laboratory setting

The study took place inside the Human Sleep and Cognition Laboratory of the Sleep and Performance Research Center at the Washington State University Health Sciences Spokane campus. Participants were studied in groups of up to four and were assigned their own room for performance testing and for scheduled sleep periods. They were in the laboratory for 4 days (3 nights) for a total of 65 consecutive hours. The laboratory environment and participants' sleep schedule were carefully controlled. Participants were not allowed to use phones, personal computers/tablets, or otherwise communicate with the outside world, and no visitors were allowed during the study. Participants were continuously monitored by trained research personnel.

Experiment task

Participants completed a cued-recall task. The task was presented on a 15-in. LCD screen approximately 21 in. from the participant. The task required participants to memorize pairs made up of two-digit numbers and nonsensical line drawings called "droodles" (Nishimoto et al., 2010). Each digit-droodle pair was made up of a number and a line drawing that appeared only in that pair. When a pair was presented for the first time, the droodle and corresponding number appeared simultaneously on the computer monitor for 6 s. Participants typed the number with their dominant hand using the keypad of a standard keyboard, and they received accuracy feedback (Fig. 1).
Fig. 1  Sequence of events during a trial of the experiment task, with an incorrect response "23" and corrective feedback "16". Panels: inter-trial interval (0.5 s), stimulus onset and response (< 7.0 s), feedback (2 s)

During each subsequent presentation, only the droodle appeared. Participants were instructed to try to recall the corresponding digit and to type it in. After they responded (or after 6 s passed), feedback for correct or incorrect responses in the form of a smiling or frowning face appeared for 2 s. The correct response also appeared below the droodle. The next trial began 0.5 s later.

Procedure and study schedules

Participants completed two familiarization sessions prior to the study with 10 digit-droodle pairs that did not appear again. During familiarization, a trained research assistant was in the room to provide instruction on the task. Data from these sessions were not analyzed. During the main experiment, participants completed 11 sessions spread over 48 h. All sessions occurred between 09:00 and 21:00.

Participants learned a total of 51 digit-droodle pairs divided evenly among 17 study schedules, giving three pairs per schedule (Table 1). All schedules included the exact same number of practice repetitions (20), but they differed in terms of how those repetitions were distributed across acquisition sessions. For example, in the level schedule, each item was studied four times during each of five acquisition sessions. In the tapered schedule, the number of repetitions began at six and decreased across acquisition sessions. In the ramp schedule, the number of repetitions began at two and increased across acquisition sessions.

The sequences of repetitions during the acquisition phase of the experiment were mirrored across two sets of seven schedules on day 1 and day 2 (level, massed early, massed late, taper, ramp, short two, and short three; Table 1). The sets differed in the amount of time before the retention test on day 3 (36 h or 12 h after the last acquisition session).2 For the remaining three study schedules (long ten, long two, and long four; Table 1), repetitions were administered during acquisition sessions spanning both days.

Footnote 2: Given the exact session histories for the different schedules, elapsed time before the final test was slightly longer (e.g., 38 h and 14 h) for the short two and short three schedules (Table 1).

Each digit-droodle pair was presented a total of 20 times across the acquisition sessions and twice during the retention session. When a pair was presented for the first time (i.e., initial study), the droodle and number both appeared. This trial was excluded from analyses. In the remaining trials, the droodle appeared and participants tried to recall the number. They received corrective feedback after responding. Thus, each trial provided a measure of learning and an opportunity for study. Pairs from multiple schedules appeared during each session and were randomly intermixed.3 Participants were not told about the different study schedules, and so trials within a session appeared as a homogeneous sequence. The assignment of schedules to sessions was such that a total of 102 trials occurred during each of the ten acquisition sessions and again during the final retention session. The duration of acquisition sessions depended on how quickly participants responded, and generally lasted from 8 to 10 min.

Footnote 3: For example, at 09:00 on Day 1, participants would see the three pairs from the level study schedule four times each, the three pairs from the massed early study schedule 20 times each, the three pairs from the taper study schedule six times each, the three pairs from the ramp study schedule twice each, and the three pairs from the long ten study schedule twice each. These pairs appeared in a random order and were not identified as part of a particular study schedule.

To reiterate, each participant completed all 17 study schedules depicted in Table 1.
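The single-day schedule structure described above lends itself to a compact encoding. The following R sketch is purely illustrative: the object and element names are ours, only the single-day acquisition patterns are shown, and it simply verifies that every schedule delivers the same 20 acquisition repetitions described in the text and listed in Table 1.

    # Illustrative encoding of the single-day acquisition schedules (repetitions
    # per session at 09:00, 13:00, 15:00, 19:00, and 21:00). Names are ours.
    schedules <- list(
      level        = c(4, 4, 4, 4, 4),
      massed_early = c(20, 0, 0, 0, 0),
      massed_late  = c(0, 0, 0, 0, 20),
      taper        = c(6, 5, 4, 3, 2),
      ramp         = c(2, 3, 4, 5, 6),
      short_two    = c(0, 10, 0, 10, 0),
      short_three  = c(0, 5, 10, 5, 0)
    )
    sapply(schedules, sum)                     # every schedule totals 20 repetitions
    sapply(schedules, function(x) sum(x > 0))  # sessions per schedule: 5, 1, 1, 5, 5, 2, 3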
Table 1  Schedules and number of repetitions during sessions on Days 1, 2, and 3

                 Day 1                               Day 2                               Day 3
Schedule         09:00  13:00  15:00  19:00  21:00   09:00  13:00  15:00  19:00  21:00   09:00
Level                4      4      4      4      4       -      -      -      -      -       2
Massed Early        20      -      -      -      -       -      -      -      -      -       2
Massed Late          -      -      -      -     20       -      -      -      -      -       2
Taper                6      5      4      3      2       -      -      -      -      -       2
Ramp                 2      3      4      5      6       -      -      -      -      -       2
Short Two            -     10      -     10      -       -      -      -      -      -       2
Short Three          -      5     10      5      -       -      -      -      -      -       2
Level                -      -      -      -      -       4      4      4      4      4       2
Massed Early         -      -      -      -      -      20      -      -      -      -       2
Massed Late          -      -      -      -      -       -      -      -      -     20       2
Taper                -      -      -      -      -       6      5      4      3      2       2
Ramp                 -      -      -      -      -       2      3      4      5      6       2
Short Two            -      -      -      -      -       -     10      -     10      -       2
Short Three          -      -      -      -      -       -      5     10      5      -       2
Long Ten             2      2      2      2      2       2      2      2      2      2       2
Long Two             -      -     10      -      -       -      -     10      -      -       2
Long Four            -      5      -      5      -       -      5      -      5      -       2

Note. The first seven rows form the Day 1 set (retention tested 36 h after the last acquisition session); the next seven rows form the Day 2 set (retention tested 12 h after the last acquisition session). The Day 3 09:00 session was the retention test.

The only reason that it was possible to collect data using so many schedules and sessions was because participants remained in the laboratory for three and a half consecutive days. This is the most diverse set of spacing conditions administered to a single group of participants to date, and the dataset provides an exceptionally strong test of both psychological theories and computational models of the spacing effect.

From a theoretical standpoint, the conditions of primary interest were the level, tapered, and ramped schedules. The massed late schedule serves as a comparison to the various spaced schedules. The remaining schedules do not permit as direct a comparison due to confounds, like the duration of the retention intervals. However, we included these schedules to equate the number of items appearing in each session. Additionally, all these schedules are informative for model validation.

Empirical results

We focused on the effects of schedule and retention interval on response accuracy. In what follows, we examine response accuracy during the acquisition and retention sessions.

Acquisition phase

Two sets of schedules (level, massed early, massed late, taper, ramp, short two, and short three) occurred twice – once on Day 1 with a 36-h final retention interval and once on Day 2 with a 12-h final retention interval. Because the retention interval manipulation followed the acquisition phase, we collapsed across sets when analyzing acquisition performance, but treated the sets separately when analyzing retention. Figure 2 shows trial-level accuracy for each schedule, along with the three other schedules that spanned 2 days and had a single retention interval. The first 19 points in each panel show performance during the acquisition phase. Breaks in the curves denote session boundaries. The final two sets of points in each panel show retention 12 and 36 h after the final acquisition session.

As seen in Fig. 2, participants ultimately learned the digit-droodle pairs for all study schedules. Accuracy for the final repetition of each item during the acquisition phase ranged from 92% to 98% by schedule (with a mean of 96%). Yet the efficiency of learning varied widely by schedule. To examine these differences, we performed a logistic mixed-effects regression with the lme4 package (Bates et al., 2014) in the R statistical computing environment. The analysis included fixed effects for schedule and trial and random effects for item and subject. The analysis revealed main effects of trial (z = 53.990, p < .001) and schedule (z = 2918.085, p < .001).4 These effects appeared as increased accuracy by trial, and increased accuracy for schedules with repetitions distributed across fewer sessions (Table 2).

Footnote 4: Coefficient estimates reported in Appendix Table 5.
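For readers who want to see the shape of this analysis, the R sketch below shows one way the trial-level logistic mixed-effects regression described above could be specified with lme4. The data frame and column names (acq_data, correct, trial, schedule, n_sessions, item, subject) are hypothetical, and the paper's exact contrast coding and model options may differ.

    library(lme4)

    # Trial-level accuracy (0/1) during acquisition, with fixed effects for
    # schedule and trial and crossed random intercepts for item and subject.
    # Data frame and column names are hypothetical.
    m_acq <- glmer(correct ~ trial + schedule + (1 | item) + (1 | subject),
                   data = acq_data, family = binomial)
    summary(m_acq)

    # Follow-up model replacing schedule with the number of sessions in which
    # an item appeared, as described in the text.
    m_sessions <- glmer(correct ~ trial + n_sessions + (1 | item) + (1 | subject),
                        data = acq_data, family = binomial)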
Fig. 2  Trial-level accuracy for the 17 study schedules formed by partially crossing acquisition schedules and retention intervals. R1 denotes 12-h retention performance and R2 denotes 36-h retention performance. Breaks in curves denote session boundaries. Black points correspond to observed performance, and red points correspond to model fits

Table 2  Mean Accuracy (± 1 standard error of the mean) during the acquisition phase, and during retention tests 1 and 2

Study schedule    Acquisition    Retention 1    Retention 2
Massed Early      .89 (.01)      .31 (.04)      .25 (.04)
Massed Late       .94 (.01)      .60 (.04)      .35 (.03)
Short Two         .81 (.02)      .74 (.04)      .35 (.03)
Short Three       .80 (.02)      .81 (.04)      .69 (.04)
Ramp              .78 (.02)      .91 (.02)      .78 (.05)
Taper             .81 (.02)      .95 (.02)      .79 (.04)
Level             .78 (.02)      .92 (.02)      .77 (.03)
Long Two          .81 (.02)      .65 (.05)      -
Long Four         .77 (.03)      .89 (.03)      -
Long Ten          .76 (.03)      .93 (.02)      -

To confirm the latter observation, we repeated the logistic mixed-effects analysis, only including number of sessions in which an item appeared instead of schedule. The main effect of number of sessions was significant (z = 277.22, p < .001) and the coefficient was negative (B = -0.10, SE = 0.004), indicating that acquisition performance was generally lower for items spread across multiple sessions.

Differences in acquisition performance arose in part from forgetting that occurred between sessions for all non-massed schedules (Fig. 2). The first occurrence of a digit-droodle pair in each session provides a sort of intermediate retention test. For multi-session schedules, session breaks with their associated forgetting created a set of scalloped learning curves instead of a continuous learning curve. Each session-level learning curve began from a progressively higher intercept. The degree of relearning from the first to the second repetition of a digit-droodle pair within a session was substantial.
For most schedules, performance equaled or exceeded the maximum value from the previous session after one relearning trial.

Retention phase

Points beyond the vertical lines in Fig. 2 show retention performance. We began by analyzing retention performance from the massed, level, ramped, and tapered schedules shown in Fig. 2. The analysis included fixed effects of trial (first or second presentation), retention interval (12- and 36-h), and study schedule, and random effects for item and subject. The analysis revealed main effects of trial (z = 89.344, p < .001), retention interval (z = 68.289, p < .001), and study schedule (z = 91.041, p < .001).5 Accuracy increased by trial and decreased with length of the retention interval. Most importantly, accuracy was lowest for the massed schedule, and accuracy was equally high for the level, ramped, and tapered schedules (Table 2). The benefits of having repeating items across multiple acquisition sessions were substantial. As compared to the massed schedule, the level schedule produced 1.9 times greater accuracy for the first presentation of each studied item on the 12-h retention test (Cohen's d = 1.30), and 5.1 times greater accuracy on the 36-h retention test (Cohen's d = 1.72).

Footnote 5: Coefficient estimates reported in Appendix Table 6.

Given that differences in acquisition and retention performance for the level, taper, and ramp schedules were of primary interest, we performed an additional analysis on average test performance that included fixed effects for study schedule (level, taper, and ramp) and phase (12-h retention and 36-h retention), and random effects for item and subject. The analysis revealed a significant effect of phase (z = 30.139, p < .001), but not of study schedule (z = 0.031, n.s.).6 Thus, these results provide evidence that length of retention interval and total number of learning sessions affect retention, but they do not provide evidence that how repetitions are distributed across sessions affects retention.7

Footnote 6: The coefficient estimates for the effects of the ramp and taper schedules, relative to the level schedule, were nearly zero (Ramp: B = 0.0, SE = .24; Taper: B = .059, SE = .24).

Footnote 7: With a significance criterion of α = .05 and power = .80, the minimum sample size needed to detect a large effect size of f = 0.4 is N = 21 participants for a repeated-measures ANOVA with three measures (i.e., level, taper, and ramped schedules). Thus, there was sufficient statistical power to perform the comparison.

To determine which other characteristics of study schedules affect retention, we fitted linear mixed-effects models that included different combinations of two fixed effects: (1) the number of acquisition sessions in which an item appeared, and (2) the total elapsed time from initial to final acquisition presentation of an item.8 All models also included fixed effects for trial number and length of retention interval, and random effects for item and subject.

Footnote 8: The experiment design was unbalanced. Number of acquisition sessions was moderately correlated with total elapsed study time (r2 = .51). Future studies could seek to disentangle these effects.

Table 3  Comparison of baseline models for retention performance

Model   Fixed effect     Degrees of freedom   Deviance   Comparison model: Δdeviance (df)
A       Null             5                    3922.00    -
B       Session          6                    3358.80    A: 563.2 (1)*
C       Time             6                    3685.20    B: 0.0 (0)
D       Session, Time    7                    3355.60    B: 3.5 (1)+

* p < .0001
+ p < .1

Table 3 compares the four models formed by fully crossing the two fixed effects. Of the single-factor models, the Session model had the smallest deviance. The two-factor model (Session and Time) had lower deviance than the Session model, yet the chi-square ratio test showed that the improvement in fit was not sufficient to justify the increased complexity (p > .05). Thus, the model that only contained a fixed effect for number of acquisition sessions (along with retention interval and trial number) was the favored model.
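A hedged sketch of how such a nested comparison might look in R follows. The paper describes these as linear mixed-effects models; the data frame and column names here (ret_data, correct, n_sessions, study_time, trial, ret_interval, item, subject) are hypothetical, and the exact response coding is not reproduced from the paper.

    library(lme4)

    # Baseline models for retention performance (cf. Table 3). All models share
    # fixed effects for trial number and retention interval and random
    # intercepts for item and subject; they differ in whether they include the
    # number of acquisition sessions, total elapsed study time, or both.
    m_null    <- lmer(correct ~ trial + ret_interval + (1 | item) + (1 | subject),
                      data = ret_data, REML = FALSE)
    m_session <- update(m_null, . ~ . + n_sessions)
    m_time    <- update(m_null, . ~ . + study_time)
    m_both    <- update(m_null, . ~ . + n_sessions + study_time)

    # Chi-square (likelihood-ratio) tests of nested models, as in the text.
    anova(m_null, m_session, m_both)

Here anova() reports each model's deviance and the chi-square test of each added term, paralleling the Δdeviance column of Table 3.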
Fig. 3  Average test performance as a function of retention interval (marker colors) and the logarithm of the number of acquisition sessions in which the digit-droodle pair appeared

Figure 3 (Observed) shows retention (i.e., average test performance) as a function of length of the retention interval and the log of the number of acquisition sessions in which the corresponding digit-droodle pairs appeared. As shown, retention performance decreased with the duration of the retention interval and increased with the number of sessions in which the item was previously studied.

Interim discussion

Retention was higher when practice was spread across multiple sessions rather than massed within one session, replicating the spacing effect, and retention decreased with elapsed time from study to test, reflecting the standard forgetting curve. Surprisingly, the total number of acquisition sessions in which an item appeared directly affected final retention, whereas the way repetitions were allocated among acquisition sessions (i.e., level, ramp, taper) had no discernable effect. This demonstrates the potency of reviewing and relearning material across multiple sessions.

Computational model

PPE is a computational model that accounts for how three factors affect knowledge acquisition, retention, and relearning. The factors are: (1) amount of practice, (2) elapsed time from study to test, and (3) the distribution of practice across time. PPE captures the effects of these factors on memory performance through a series of equations that represent invariant cognitive features. Additionally, PPE contains several free parameters that represent individual differences in the efficiency and effectiveness of various psychological processes involved in knowledge acquisition and retention. PPE is described in detail in the Appendix. In the remainder of this section, we describe the approach to implementing and fitting the model and we evaluate its ability to account for the data from the experiment described above. We also evaluate PPE's predictive capacity by fitting it to partial data from the experiment to test whether it can predict performance from omitted schedules and future time points.

Model fitting

We implemented PPE as a Bayesian hierarchical model and fitted it using fully Bayesian inference. Figure 4 expresses the model in graphical notation. We gave PPE each participant's exact study history (i.e., the number and timing of item presentations by session and schedule). We collapsed across items by trial within each schedule, so the model received a history of 21 trials for each of 17 schedules and for all participants. This is represented by the history node in Fig. 4, which is shaded to denote that the values are observable. For each trial, the dependent variable was the number of correct responses aggregated across the constituent items. This is represented by the y node in Fig. 4, which is also shaded to denote that the values are observable. The model contains four free parameters that vary by individual (the b, m, τ, and s). These appear as unshaded nodes in Fig. 4 to denote that the values are not observable, and that they are estimated to maximize the probability of the observed responses. Further, the model contains four hyper-parameter distributions from which the individual parameter estimates are drawn. These also appear as unshaded nodes in Fig. 4, and they are estimated from the data as well. Hyper-parameters are identified as such because they control how the model parameters themselves are estimated. In this context, the hyper-parameters provide information about the distributions of values of the model parameters in the sample. The model was implemented and fitted using Stan (Carpenter et al., 2017).

We evaluated the model based on three criteria:

1. Fit: Given data from all trials and schedules, can the model account for acquisition and retention?
2. Schedule Prediction: Given data from all trials for some schedules, can the model predict acquisition and retention for other schedules?
3. Temporal Prediction: Given data from some trials for all schedules, can the model predict retention for other, later trials?
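The Appendix containing PPE's equations is not reproduced in this excerpt, so the R sketch below should be read only as an illustration of the components the text names – power-law learning and forgetting, a decay rate that shrinks as long lags between repetitions accumulate, and a logistic mapping governed by τ and s – rather than as the paper's exact specification. The fixed exponents, the recency weighting of elapsed time, the treatment of lags, and the function name are all assumptions of ours based on published descriptions of PPE.

    # A minimal, illustrative PPE-style prediction: practice history in,
    # predicted probability of recall out. Functional details are assumptions,
    # not the paper's Appendix equations.
    ppe_predict <- function(times, test_time, b, m, tau, s, c = 0.1, x = 0.6) {
      n    <- length(times)                      # amount of practice
      t_i  <- pmax(test_time - times, 1e-3)      # elapsed time since each repetition (h)
      w    <- t_i^(-x) / sum(t_i^(-x))           # recency-weighted elapsed time
      T_w  <- sum(w * t_i)
      lags <- diff(times)                        # lags between successive repetitions
      stab <- if (n > 1) mean(1 / log(lags + exp(1))) else 1
      d    <- b + m * stab                       # long lags shrink the decay rate
      M    <- n^c * T_w^(-d)                     # power-law learning and forgetting
      1 / (1 + exp((tau - M) / s))               # logistic mapping to P(correct)
    }

    # Illustrative comparison using the group-level parameter means from Table 4
    # (b = .01, m = .06, tau = .97, s = .03); times are in hours.
    massed <- seq(0, 0.5, length.out = 20)                    # 20 reps in one session
    spaced <- rep(c(0, 4, 6, 10, 12), each = 4) +
              rep(seq(0, 0.3, length.out = 4), times = 5)     # 4 reps in each of 5 sessions
    ppe_predict(massed, test_time = 48, b = .01, m = .06, tau = .97, s = .03)
    ppe_predict(spaced, test_time = 48, b = .01, m = .06, tau = .97, s = .03)
    # The spaced history accumulates long between-session lags, yielding a lower
    # decay rate and higher predicted retention, mirroring the spacing effect.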
Fig. 4  Graphical model for inferring response accuracy using the Predictive Performance Equation (PPE). Shaded boxes indicate the observable values

Table 4  Predictive Performance Equation (PPE) model fit and parameter estimates

Parameter    μ      σ
m            .06    .01
b            .01    .01
τ            .97    .03
s            .03    .02

Model fit

Table 4 shows the estimated means and standard deviations of the four hyper-parameter distributions from which the individual-level parameter estimates were drawn.9 After fitting the model from the complete data, we used it to calculate the probability of responding correctly for all trials and schedules. We aggregated these data across individuals, as was done with the observations. The model fits are shown with the observations in Fig. 2. Across the 21 trials and 17 study schedules, the correlation between group averaged data and PPE performance equaled .95 (root-mean-square error = 5.80).

Footnote 9: The Bayesian inference procedure returns draws (1,000) from the parameters' posterior distributions, and the mean estimates are based on the average of these draws per parameter.
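As a small illustration of how those two fit statistics could be computed, assuming hypothetical vectors obs and pred that hold the group-averaged observed and model-predicted accuracies for the 21 trials of the 17 schedules (presumably on a 0–100 scale, given the reported root-mean-square error of 5.80):

    # obs, pred: hypothetical numeric vectors of length 21 * 17 = 357, holding
    # group-averaged observed and PPE-predicted accuracy per trial and schedule.
    fit_r    <- cor(obs, pred)                 # reported as .95 in the text
    fit_rmse <- sqrt(mean((obs - pred)^2))     # reported as 5.80 in the text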
PPE captured four key effects from the experiment. First, PPE acquired items presented in one session more quickly than items presented in multiple sessions. This occurred because items that appeared in two or more sessions decayed during the long intervals between sessions (T_i in Eq. 3, Appendix), and so level of performance dropped at the onset of each new session. Second, memory performance for previously studied items was somewhat lower at the start of a new session than at the end of the previous session, but one trial was often enough to restore the prior level of performance. This reflects the strong recency effect of the previous exposure on performance (Eqs. 3 and 4, Appendix). Third, retention decreased with the elapsed time from final acquisition session to retention test. This arose from the power law of forgetting (T^-d in Eq. 2, Appendix).

Fourth and finally, the number of sessions in which an item appeared, rather than the way repetitions were allocated among those sessions, primarily determined final retention (Fig. 3, Predicted). For both the model and participants, retention accuracy was greatest for items that appeared in five acquisition sessions (level, taper, and ramp), followed by three sessions (short three), two sessions (short two), and one session (massed early and late).

The lags separating repetitions between acquisition sessions were substantially longer than the lags separating repetitions within sessions (lag_j in Eq. 5, Appendix). In the model, the accumulation of such long lags greatly reduces an item's decay rate. Consequently, an item's decay rate decreased with each new session that it appeared in. For example, average decay rates prior to retention tests were lowest for items that appeared in five acquisition sessions (level, taper, and ramp = -0.23), followed by items that appeared in three sessions (short three = -0.26), two sessions (short two = -0.28), and one session (massed early and late = -0.31). These differences in decay rates produced the predicted differences in retention performance.

PPE did appear to overpredict retention for the massed early schedule (Fig. 2, upper left panel). Observed accuracy was lowest in this condition. Thus, the failure of the model to produce equivalent forgetting may reflect compensation in parameter estimates to capture performance from the remaining 16 schedules. Alternatively, participants may have encoded items in the massed early schedule differently since all repetitions occurred in the very first acquisition session.

As a test of the model's flexibility, we generated 100 permutations of study schedules and randomly assigned the observed responses to these different permutations. We then refitted the model for each permutation. The model's fit was better for the actual pairing of study schedules and observed responses than for any of the permutations. This suggests that the model is not overly flexible – it was unable to account as well for other patterns of results which, though conceivable, were not observed.

Fig. 5  Trial-level accuracy for two individuals on a subset of schedules formed by crossing four study schedules and two retention intervals

Aggregating data across all participants potentially obscures individual differences. A further test of PPE is whether it captures individual differences versus only accounting for average performance. Figure 5 shows data from two participants with very different performance profiles. Although Fig. 5 only presents data from the subset of schedules most central to the experiment, the basic findings hold across all schedules. S16 shows rapid learning, excellent retention between sessions, and excellent retention on the final tests 12 and 36 h later. S28, on the other hand, learns more gradually and has lower retention. The estimates for decay intercept were lower for S16 than for S28 (intercept: .070 vs. .082), resulting in the observed performance differences.

To summarize, PPE accounted for acquisition, retention, and relearning across the 21 trials of the 17 separate study schedules. These results held for group-averaged data and at the level of individuals. Most importantly, PPE accounted for the main effect of number of sessions on final retention. It did so in the following way. Multi-hour intervals from the end of one session to the start of the next produce large values for lag_j in Eq. 5 (Appendix). The accumulation of such long lags for schedules that repeat in multiple sessions reduced decay. Importantly, the decay term is based on the logarithm of the lag.
Although the longest lag between any repetitions occurs in the schedule that repeats in only two sessions, the logarithm term limits the impact of exceptionally long lags. Consequently, the number of long lags, rather than the duration of the longest lag, primarily determines decay and, subsequently, retention.

Fig. 6  Prediction error for each schedule (rows) based on which schedule was used to estimate parameter values (columns)

Model schedule predictions

We examined how well parameters estimated for each individual and based on one schedule generalized to all other schedules. Figure 6 shows the negative log-likelihood of the observed responses for each schedule, aggregating across trials and individuals. For example, in the first column, parameters are estimated using the level spacing with long RI schedule and model performance is assessed for each of the seventeen schedules using those estimates. Beginning along the main diagonal, model error was least when parameter estimation and model fit were based on the same schedule – this is data fitting rather than prediction. The red cells in Columns 6 through 9 reflect poor prediction, indicating that parameter estimates based on massed schedules generalize poorly to spaced schedules. Alternatively, the yellow cells in all other columns reflect good prediction, indicating that parameter estimates based on spaced schedules generalize well to spaced and massed schedules alike. Most massed repetitions occur within one session and are separated by intervals of less than 2 min. Spaced items, in contrast, are separated by short intervals within sessions, and long intervals between sessions. The asymmetric success of predicting performance for massed schedules using data from spaced schedules likely reflects the greater heterogeneity of intervals present in the spaced schedules. In other words, massed schedules do not provide sufficient constraints to estimate model parameter values, leading to overfitting and poor generalization.

Model temporal predictions

As a final test, we fit the model using data from the acquisition phase (i.e., the first 19 trials) and then used the fitted model to predict out-of-sample performance for each individual on the retention test (i.e., the final two trials). To improve prediction, we took advantage of the Bayesian hierarchical modeling approach and fit PPE using complete data from 35 participants to estimate the hyperparameter distributions, and set the hyperparameter distributions as the priors for the 36th participant. We then estimated parameter values for the 36th participant using data from the acquisition phase only. We repeated this procedure for all participants.
Fig. 7  Observed retention (bars), and model retention based on fitting, prediction, hierarchical estimation, and schedule-specific hierarchical estimation

Figure 7 shows observed retention, model fits (i.e., parameter estimates using data from all trials), and model predictions (i.e., parameter estimates using data from the first 19 trials). Across the 17 schedules, the correlation between group averaged data and PPE performance equaled .97 (root-mean-square error = 8.01). The correspondence between observed and predicted performance was high, although PPE slightly overpredicted performance after the longest retention intervals and in the massed conditions. These results show that PPE can predict future performance, in part by pooling an individual's data with information from other individuals.

PPE is one of a class of computational cognitive models that accounts for spacing-related effects (Walsh et al., 2018a, b). A key empirical result from Walsh et al. (2018a) was that two alternate models, the Pavlik and Anderson Model and Search of Associative Memory (SAM) model, failed to capture rapid relearning that occurs after long delays. Such relearning was present between acquisition sessions and before retention sessions in the current experiment, and we expect that it would again present challenges for alternate accounts. Future analyses should test this hypothesis.

Discussion

The goals of this experiment were to evaluate how the allocation of a fixed number of practice repetitions among multiple sessions impacts learning and retention, and to evaluate a computational model of the spacing effect considering those results. Our findings can be summarized simply. Repeatedly testing and restudying items during multiple sessions dramatically enhanced retention measured from 12 to 36 h later. The exact way repetitions were allocated among sessions – that is, whether they increased, decreased, or remained level – mattered less. The PPE accounted for these effects, and for the effects of 17 different study schedules on knowledge acquisition, retention, and relearning given a single set of parameter estimates per participant. This demonstrates the adequacy of PPE's mechanisms to account for an extremely diverse set of spacing-related effects. The study's empirical and modeling results contribute to the limited literature on multi-session spacing and learning and memory more generally.

Empirical contributions and implications

Most studies of the spacing effect present a fixed number of repetitions in one session (for a review, see Cepeda et al., 2006). In contrast, relatively few studies present repetitions in multiple sessions. Yet the few studies of multi-session learning to date have demonstrated impressive retention (Bahrick et al., 1993; Bahrick & Hall, 2005; Rawson et al., 2018; Rawson & Dunlosky, 2013; Vaughn et al., 2016). For example, Rawson et al. (2018) found that the advantage of multi-session learning over single-session learning on one-week retention ranged from 1.5 to 4 standard deviations. Students and educators are not entirely blind to the benefits of multi-session learning. In many real-world domains (e.g., medical education, military training, athletics, and performing arts) people do repeatedly rehearse knowledge and skills during each of multiple sessions that are distributed across time. Yet surprisingly little educational research focuses on multi-session learning.

The current experiment confirmed the advantage of multi-session versus single-session learning.
468 Memory & Cognition (2023) 51:455–472

were distributed across five sessions rather than massed Either of these methodological details may have minimized dif-
within one session, participants showed a 1.9 times improve- ferences related to the distribution of repetitions across sessions.
ment on retention after 12 h, and a 5.1 times improvement On the other hand, varying the number of practice repetitions
after 36 h. In addition to the relative advantage, the absolute across sessions may differ substantively from expanding inter-
level of performance following multi-session study remained high after 12 h (93%) and 36 h (78%). Thus, the benefits of multi-session learning in this experiment are not merely statistically significant; they are educationally meaningful.

The current experiment goes beyond existing studies in two ways. First, sessions were separated by hours rather than by days or weeks. We found that multi-session learning within a single day still produced a substantial benefit relative to single-session learning. Our results do not exclude the possibility that including longer periods of time between sessions further enhances retention. However, they clearly show the benefits of scheduling multiple learning sessions within a single day, which may be more feasible and cost-effective than multi-day study in real-world scenarios.

Second, the current experiment examined how the distribution of repetitions across sessions impacted retention. All previous studies presented either a fixed number of repetitions per session or a variable number of repetitions until participants achieved some criterion level (e.g., one correct response per item). We weakly expected that the tapered schedule, which included fewer repetitions in each subsequent session, would enhance retention relative to the level and ramped schedules, which included equal and increasing numbers of repetitions in each session. Concentrating repetitions early in study might increase the strength of items in memory, reducing retrieval failures. Gradually decreasing the number of repetitions across sessions, in turn, might produce a constant, desirable level of retrieval difficulty. This manipulation is conceptually related to expanding the amount of time between sessions, which has sometimes been found to improve retention (Gerbier et al., 2015; Küpper-Tetzel et al., 2014; but see Carpenter & DeLosh, 2005; Karpicke & Bauernschmidt, 2011; Karpicke & Roediger III, 2007). It is also conceptually related to diminishing the availability of retrieval cues across sessions, which has likewise been found to improve retention (Finley et al., 2011). Despite these conceptual similarities, however, we found that the tapered, level, and ramped schedules produced identical retention.

On the one hand, the divergence between our results and the results from earlier studies may reflect methodological differences. In a review of experiments that varied session spacing, Toppino et al. (2018) noted that expanding schedules most consistently produced a benefit when initial acquisition involved few repetitions and study occurred without testing. Relatedly, Finley et al. (2011) found that providing corrective feedback during initial acquisition attenuated the benefits of fading retrieval cues. Participants in our study completed many practice trials before the retention test. Additionally, they attempted to retrieve answers and received corrective feedback during practice trials. Both factors may have diminished any benefit of expanding session intervals or fading retrieval cues. Future experiments are needed to distinguish between these alternatives.

On the other hand, Rawson et al. (2018) found that the effects of multi-session learning dominate how items are spaced within sessions, an outcome termed the "relearning override." The distribution of practice in our study is a combination of three factors: the number of times an item is practiced in each session; the number of sessions in which an item is practiced; and the interval between sessions in which an item is practiced. Of these, the number of sessions mattered far more than the number of repetitions per session or the interval between sessions, a finding that is consistent with the relearning override effect.

Our experiment used retrieval-based practice with feedback. Although the spacing effect has been obtained in other experiments using passive study, future experiments should replicate these findings using different forms of item presentation and rehearsal. This would also provide a further test of PPE, which does not, at present, distinguish between manners of item presentation.

In addition, the within-subject design, despite its obvious merits, introduces the possibility of inter-item contamination and interference effects. For example, the amount of information presented during early learning sessions may interfere with items acquired during later learning sessions. This could be addressed by replicating aspects of the current experiment using a between-subject design, which would provide a further test of the model's generalizability.

Theoretical contributions and implications

PPE, which builds on the General Performance Equation and an ACT-R model of the spacing effect (Anderson & Schunn, 2000; Pavlik & Anderson, 2005), fit the observed data and provides a grounded theory of the effects of multi-session learning on retention. In PPE, increasing the spacing between successive repetitions of an item reduces the item's decay rate. Successive repetitions can occur within one session, as in many earlier experiments, or within and between sessions, as in the current experiment. When items repeat between sessions, the multi-hour intervals from the end of one session to the start of the next yield large values for lag_j in Eq. 5. These are far longer than the seconds and minutes separating repetitions within a session. The accumulation of such long lags greatly reduces decay rates and minimizes forgetting of items that repeat across multiple sessions.

PPE attributes the greatest gains during multi-session learning to the first repetition of an item within a new session, which follows the longest lag and is the most difficult. This explains why PPE produced nearly equivalent retention for items in the
level, tapered, and ramped schedules. All schedules spanned five sessions and, therefore, all included the same number of trials with multi-hour lags since an item's previous presentation. The effect of these long-lag trials dwarfed that of the other study variables. Schedules with fewer than five sessions included proportionately fewer trials with multi-hour lags, and final retention was correspondingly lower after those schedules.
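
To make this argument concrete, the sketch below evaluates the decay term of Eq. 5 (see the Appendix) for a fixed budget of 15 repetitions arranged into massed, level, tapered, and ramped schedules. The sketch is ours rather than part of the original study: the repetition counts, the within- and between-session lags, and the values of b and m are illustrative placeholders, not the schedules or fitted parameters from the experiment.

```python
import math

# Illustrative lags; the experiment's exact timing is not assumed here.
WITHIN = 60           # assumed seconds between repetitions inside a session
BETWEEN = 12 * 3600   # assumed seconds between sessions

def decay_rate(lags, b=0.3, m=0.3):
    """Eq. 5: decay rate after a sequence of repetitions separated by `lags`
    (in seconds). The values of b and m are placeholders, not fitted estimates."""
    if not lags:
        return b  # decay after the first presentation is b (Footnote 11)
    return b + m * sum(1.0 / math.log(lag + math.e) for lag in lags) / len(lags)

def schedule_lags(reps_per_session):
    """Lag history for a schedule: the first repetition in each later session
    follows a between-session lag; all other repetitions follow within-session lags."""
    lags = []
    for session, reps in enumerate(reps_per_session):
        for rep in range(reps):
            if session == 0 and rep == 0:
                continue  # no lag precedes the very first presentation
            lags.append(BETWEEN if rep == 0 else WITHIN)
    return lags

for name, reps in [("massed (1 session)", [15]),
                   ("level (5 sessions)", [3, 3, 3, 3, 3]),
                   ("tapered (5 sessions)", [5, 4, 3, 2, 1]),
                   ("ramped (5 sessions)", [1, 2, 3, 4, 5])]:
    print(f"{name:21s} decay rate = {decay_rate(schedule_lags(reps)):.3f}")
```

Because the level, tapered, and ramped schedules each contain four between-session lags and ten within-session lags, the sketch assigns them identical decay rates, whereas the massed schedule, whose lags are all short, receives a higher one.
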
Do other theories of the spacing effect account for these results? The deficient processing hypothesis, the contextual variability hypothesis, and the study-phase retrieval hypothesis all predict that increasing the spacing between repetitions will enhance retention. This is consistent with the finding that distributing repetitions across sessions separated by several hours enhanced retention. It is less clear what, if anything, these hypotheses predict regarding the impact of different distributions of repetitions across sessions.

The deficient processing hypothesis may predict an advantage for the level schedule. According to this hypothesis, spaced repetitions are more likely to be fully processed. Building on that idea, participants may be more likely to fully process all repetitions of an item in the level schedule because the item repeats fewer times in each session. The contextual variability hypothesis may predict an advantage for the ramped schedule. According to this hypothesis, context fluctuates over time. Presenting more repetitions at the end of study may increase contextual overlap from study to test. Finally, the study-phase retrieval hypothesis may predict an advantage for the tapered schedule. According to this hypothesis, the successful retrieval of an item strengthens its representation in memory. Presenting more repetitions during early sessions and gradually reducing the number of repetitions may effectively balance the needs for retrieval success and retrieval difficulty. Based on these considerations, the results are not wholly consistent with any of the three accounts. However, a limitation of all three accounts is that they are general. When applied to a novel design, like the one considered here, their predictions are often underspecified.

Conclusion

Most studies of knowledge acquisition and retention focus exclusively on single-session learning conditions. The memory-enhancing techniques examined in these studies, no matter how effective, are unlikely to produce durable long-term retention. In contrast, multi-session learning has been shown to be a highly effective technique (Bahrick et al., 1993; Bahrick & Hall, 2005; Rawson et al., 2018; Rawson & Dunlosky, 2013; Vaughn et al., 2016). In the current experiment, we built on those findings by asking how a fixed number of item repetitions should be distributed among sessions to maximize retention. We found that the number of sessions in which an item appeared, but not the way in which repetitions were distributed across practice sessions, strongly affected retention. PPE accounted for this finding in terms of the effect of the long lags between sessions on an item's decay rate.

These conclusions have two educational implications. First, to enhance long-term retention, item repetitions should be spread across as many sessions as possible. The results from this study suggest that the exact number of repetitions in each session matters far less than the total number of sessions, though further research is needed to evaluate the interaction of these factors in detail across broader ranges. Second, computational cognitive models, once validated, can be used to prospectively simulate outcomes associated with different training schedules when it would be infeasible or unethical to do so with real students. In addition to unifying an otherwise disparate set of psychological phenomena, then, computational cognitive models like PPE can be used to design and evaluate study and training schedules.

Appendix

Supporting Results

Table 5  Coefficient estimates, standard errors, z statistics, and p-values in the final statistical model of response accuracy during acquisition

Fixed effects    B        SE      z        p
(Intercept)     -0.124    0.139   -0.124   n.s.
Trial            0.177    0.003    0.177   < .001
Massed Early     1.009    0.069    1.009   < .001
Massed Late      1.721    0.082    1.721   < .001
Short Two        0.201    0.062    0.201   < .01
Short Three      0.188    0.061    0.188   < .01
Ramp             0.007    0.060    0.007   n.s.
Taper            0.176    0.062    0.176   < .01
Long Two         0.249    0.077    0.249   < .01
Long Four       -0.099    0.074   -0.099   n.s.
Long Ten        -0.152    0.073   -0.152   < .05

Table 6  Coefficient estimates, standard errors, z statistics, and p-values in the final statistical model of response accuracy during retention

Fixed effects               B        SE      z        p
(Intercept)                 0.930    0.203   4.600    < .001
Trial                       1.141    0.121   9.395    < .001
Massed Early               -3.225    0.199   16.174   < .001
Massed Late                -2.164    0.186   11.635   < .001
Ramp                        0.043    0.202   0.213    n.s.
Taper                       0.260    0.208   1.248    n.s.
Retention Interval | Long  -1.141    0.121   9.395    < .001
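
To help interpret these coefficients, the short sketch below, which is ours rather than part of the original analysis code, converts estimates from Table 6 into predicted probabilities. It assumes the final models were mixed-effects logistic regressions whose coefficients are on the log-odds scale, which is consistent with the binary accuracy outcome and the reported z statistics but is not stated in this excerpt, and it holds the Trial covariate at zero and ignores random effects.

```python
import math

def inv_logit(x):
    """Map a log-odds value onto a probability."""
    return 1.0 / (1.0 + math.exp(-x))

# Coefficients copied from Table 6 (retention model). These are illustrative
# model-implied probabilities, not the observed retention rates.
intercept = 0.930       # reference schedule, log-odds of a correct response
massed_early = -3.225   # shift in log-odds for the Massed Early schedule

print(f"Reference schedule: {inv_logit(intercept):.2f}")
print(f"Massed Early:       {inv_logit(intercept + massed_early):.2f}")
```

Under these assumptions, the large negative Massed Early coefficient corresponds to a drop from roughly .72 to roughly .09 in the predicted probability of a correct retention response.
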

Model description

In PPE, the effects of practice and elapsed time on activation (M_n) are multiplicative,

M_n = N^c \cdot T^{-d}    (2)

N is the number of practice repetitions, T is the elapsed time, c is the learning rate, and d is the decay rate.^10 In PPE, activation increases as a power function of the amount of practice (N^c), and activation decreases as a power function of the elapsed time since practice (T^{-d}).

Elapsed time is calculated as the weighted sum of the time since each of the previous study opportunities for an item,

T = \sum_{i=1}^{n} w_i \cdot t_i    (3)

The weight assigned to each study opportunity decreases as a power function of the time since that opportunity,

w_i = \frac{t_i^{-x}}{\sum_{j=1}^{n} t_j^{-x}}    (4)

The variable x controls the steepness of the weighting and is fixed to 0.6 in all reported simulations.

The variable d in Eq. 2 accounts for decay. Decay is calculated based on the history of lags between successive study opportunities of an item (lag_j),

d_n = b + m \cdot \frac{1}{n-1} \sum_{j=1}^{n-1} \frac{1}{\log(lag_j + e)}    (5)

The key innovation in PPE is the way spacing impacts decay rate. This is produced by Eq. 5. The quantity inside the summation approaches zero when lags are long, reducing the decay rate.^11 Conversely, the quantity inside the summation approaches one when lags are short, increasing the decay rate. Spaced practice extends the lags between successive item repetitions, thereby reducing an item's decay rate and increasing retention. The effects of training history are scaled by a decay slope parameter (m) and offset by a decay intercept parameter (b).

The probability of recalling an item (P_n) is a logistic function of its activation,

P_n = \frac{1}{1 + \exp\left(\frac{\tau - M_n}{s}\right)}    (6)

To summarize, PPE has a total of six parameters: c, x, b, m, τ, and s. Four (b, m, τ, and s) are estimated, and two (c and x) are typically fixed based on earlier simulation results reported in Walsh et al. (2018b).

^10 In these and other reported simulations (Walsh et al., 2018a, b), the learning rate (c) is fixed to 0.1.

^11 The lag prior to the first presentation of an item (lag_1) is ∞. Consequently, decay after the first presentation of an item is b (Eq. 5).
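
To make the model concrete, the sketch below composes Eqs. 2-6 into a single prediction function. It is our own illustration rather than the authors' code (their implementation is available from the first author), and the parameter values, practice times, and test time are placeholders chosen for illustration, not fitted values from this study.

```python
import math

def ppe_recall(times, now, b=0.3, m=0.3, tau=0.015, s=0.005, c=0.1, x=0.6):
    """Predicted probability of recalling an item at time `now`, given practice
    at `times` (seconds, ascending). All parameter values are placeholders."""
    n = len(times)
    # Eq. 5: decay rate from the lags between successive repetitions
    if n > 1:
        lags = [later - earlier for earlier, later in zip(times, times[1:])]
        d = b + m * sum(1.0 / math.log(lag + math.e) for lag in lags) / (n - 1)
    else:
        d = b  # decay after a single presentation is b (Footnote 11)
    # Eqs. 3-4: elapsed time as a weighted sum of the ages of past repetitions
    ages = [now - t for t in times]
    raw = [age ** -x for age in ages]
    total = sum(raw)
    T = sum((r / total) * age for r, age in zip(raw, ages))
    # Eq. 2: activation, then Eq. 6: logistic mapping from activation to recall
    M = (n ** c) * (T ** -d)
    return 1.0 / (1.0 + math.exp((tau - M) / s))

HOUR = 3600
massed = [0, 60, 120, 180, 240]                        # five repetitions in one sitting
spread = [0, 3 * HOUR, 6 * HOUR, 9 * HOUR, 12 * HOUR]  # five repetitions across a day
test = 36 * HOUR
print(f"Massed practice: {ppe_recall(massed, test):.2f}")
print(f"Spread practice: {ppe_recall(spread, test):.2f}")
```

With any choice of τ and s, the schedule that spreads its five repetitions across the day yields the higher predicted recall probability at the 36-h test, because its longer lags produce a smaller decay rate in Eq. 5 and its later repetitions shorten the weighted elapsed time in Eq. 3.
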
Acknowledgements The views expressed in this paper are those of the authors and do not represent the official position of the United States Air Force or Department of Defense. This research was supported by the Air Force Research Laboratory's Warfighter Readiness Research Division. Data collection was supported by NIH grant CA167691. We would like to thank Dr. Bella Veksler for implementing the Digit/Droodle task software that was created for this experiment. We would also like to thank Dr. Florian Sense for insightful reviews and comments on an earlier draft of this article.

Code availability The code for the model is available from the first author.

Authors' contributions MMW developed the model, analyzed the data, and prepared the manuscript; MAK designed the study, analyzed the data, and prepared the manuscript; TJ and GG designed the study and prepared the manuscript; DAH and KAH gathered the data and prepared the manuscript.

Funding This research was supported by the Air Force Research Laboratory's Warfighter Readiness Research Division. Data collection was supported by NIH grant CA167691.

Data availability The data and materials for all experiments are available from the first author. The experiment was not preregistered.

Declarations

Conflicts of interest/competing interests None of the authors have conflicts of interest or competing interests.

Ethics approval This study received approvals from the Washington State University and the Air Force Research Laboratory Institutional Review Boards.

Consent to participate Informed consent was obtained from all individual participants included in the study.

Consent for publication The authors affirm that human research participants provided informed consent for publication of study results.

References

Anderson, J. R. (1995). Learning and memory. Wiley.
Anderson, J. R., & Schunn, C. D. (2000). Implications of the ACT-R learning theory: No magic bullets. In R. Glaser (Ed.), Advances in instructional psychology: Educational design and cognitive science (Vol. 5). Erlbaum.
Bahrick, H. P. (1979). Maintenance of knowledge: Questions about memory we forgot to ask. Journal of Experimental Psychology: General, 108(3), 296–308.
Bahrick, H. P., Bahrick, L. E., Bahrick, A. S., & Bahrick, P. E. (1993). Maintenance of foreign language vocabulary and the spacing effect. Psychological Science, 4(5), 316–321.
Bahrick, H. P., & Hall, L. K. (2005). The importance of retrieval failures to long-term retention: A metacognitive explanation of the spacing effect. Journal of Memory and Language, 52(4), 566–577.
Bates, D., Mächler, M., Bolker, B., & Walker, S. (2014). Fitting linear mixed-effects models using lme4. arXiv preprint arXiv:1406.5823.
Benjamin, A. S., & Tullis, J. (2010). What makes distributed practice effective? Cognitive Psychology, 61, 228–247.
Bjork, R. A. (1994). Memory and metamemory considerations in the training of human beings. In J. Metcalfe & A. Shimamura (Eds.), Metacognition: Knowing about knowing (pp. 185–205). MIT Press.
Bjork, E. L., & Bjork, R. A. (2011). Making things hard on yourself, but in a good way: Creating desirable difficulties to enhance learning. In M. A. Gernsbacher, R. W. Pew, L. M. Hough, & J. R. Pomerantz (Eds.), Psychology and the real world: Essays illustrating fundamental contributions to society (pp. 56–64). Worth.
Braun, K., & Rubin, D. C. (1998). The spacing effect depends on an encoding deficit, retrieval, and time in working memory: Evidence from once-presented words. Memory, 6, 37–65.
Bruce, D., & Bahrick, H. P. (1992). Perceptions of past research. American Psychologist, 47(2), 319–328.
Carpenter, S., Cepeda, N., Rohrer, D., Kang, S. K., & Pashler, H. (2012). Using spacing to enhance diverse forms of learning: Review of recent research and implications for instruction. Educational Psychology Review, 24, 369–378.
Carpenter, S. K., & DeLosh, E. L. (2005). Application of the testing and spacing effects to name learning. Applied Cognitive Psychology, 19, 619–636.
Carpenter, B., Gelman, A., Hoffman, M. D., Lee, D., Goodrich, B., Betancourt, M., & Riddell, A. (2017). Stan: A probabilistic programming language. Journal of Statistical Software, 76(1).
Carpenter, S. K., Pashler, H., & Cepeda, N. J. (2009). Using tests to enhance 8th grade students' retention of US history facts. Applied Cognitive Psychology, 23(6), 760–771.
Cepeda, N. J., Coburn, N., Rohrer, D., Wixted, J. T., Mozer, M. C., & Pashler, H. (2009). Optimizing distributed practice: Theoretical analysis and practical implications. Experimental Psychology, 56(4), 236–246.
Cepeda, N. J., Pashler, H., Vul, E., Wixted, J. T., & Rohrer, D. (2006). Distributed practice in verbal recall tasks: A review and quantitative synthesis. Psychological Bulletin, 132, 354–380.
Cepeda, N. J., Vul, E., Rohrer, D., Wixted, J. T., & Pashler, H. (2008). Spacing effects in learning: A temporal ridgeline of optimal retention. Psychological Science, 19, 1095–1102.
Delaney, P. F., Verkoeijen, P. P., & Spirgel, A. (2010). Spacing and testing effects: A deeply critical, lengthy, and at times discursive review of the literature. Psychology of Learning and Motivation, 53, 63–147.
Dempster, F. N. (1989). Spacing effects and their implications for theory and practice. Educational Psychology Review, 1, 309–330.
Dunlosky, J., Rawson, K. A., Marsh, E. J., Nathan, M. J., & Willingham, D. T. (2013). Improving students' learning with effective learning techniques: Promising directions from cognitive and educational psychology. Psychological Science in the Public Interest, 14, 4–58.
Ebbinghaus, H. (1885/1964). Über das Gedächtnis: Untersuchungen zur experimentellen Psychologie. Duncker & Humblot.
Fiechter, J. L., & Benjamin, A. S. (2019). Techniques for scaffolding retrieval practice: The costs and benefits of adaptive versus diminishing cues. Psychonomic Bulletin & Review, 26(5), 1666–1674.
Finley, J. R., Benjamin, A. S., Hays, M. J., Bjork, R. A., & Kornell, N. (2011). Benefits of accumulating versus diminishing cues in recall. Journal of Memory and Language, 64, 289–298.
Gerbier, E., & Koenig, O. (2012). Influence of multiple-day temporal distribution of repetitions on memory: A comparison of uniform, expanding, and contracting schedules. The Quarterly Journal of Experimental Psychology, 65, 514–525.
Gerbier, E., Toppino, T. C., & Koenig, O. (2015). Optimising retention through multiple study opportunities over days: The benefit of an expanding schedule of repetitions. Memory, 23(6), 943–954.
Glenberg, A. M. (1979). Component-levels theory of the effects of spacing of repetitions on recall and recognition. Memory & Cognition, 7, 95–112.
Gray, W. D. (2008). Cognitive modeling for cognitive engineering. In R. Sun (Ed.), The Cambridge handbook of computational psychology (pp. 565–588). Cambridge University Press.
Greene, R. L. (1989). Spacing effects in memory: Evidence for a two-process account. Journal of Experimental Psychology: Learning, Memory, and Cognition, 15, 371–377.
Hintzman, D. L. (1974). Theoretical implications of the spacing effect. In R. L. Solso (Ed.), Theories in cognitive psychology: The Loyola symposium (pp. 77–97). Erlbaum.
Hintzman, D. L. (2004). Judgment of frequency vs. recognition confidence: Repetition and recursive reminding. Memory and Cognition, 32, 336–350.
Jastrzembski, T. S., & Gluck, K. A. (2009). A formal comparison of model variants for performance prediction. Proceedings of the International Conference on Cognitive Modeling, Manchester, UK.
Jost, A. (1897). Die Assoziationsfestigkeit in ihrer Abhängigkeit von der Verteilung der Wiederholungen [The strength of associations in their dependence on the distribution of repetitions]. Zeitschrift für Psychologie und Physiologie der Sinnesorgane, 14, 436–472.
Kapler, I. V., Weston, T., & Wiseheart, M. (2015). Spacing in a simulated undergraduate classroom: Long-term benefits for factual and higher-level learning. Learning and Instruction, 36, 38–45.
Karpicke, J. D., & Bauernschmidt, A. (2011). Spaced retrieval: Absolute spacing enhances learning regardless of relative spacing. Journal of Experimental Psychology: Learning, Memory, and Cognition, 37, 1250–1257.
Karpicke, J. D., & Roediger III, H. L. (2007). Expanding retrieval practice promotes short-term retention, but equally spaced retrieval enhances long-term retention. Journal of Experimental Psychology: Learning, Memory, and Cognition, 33(4), 704–719.
Küpper-Tetzel, C. E., Kapler, I. V., & Wiseheart, M. (2014). Contracting, equal, and expanding learning schedules: The optimal distribution of learning sessions depends on retention interval. Memory & Cognition, 42(5), 729–741.
McClelland, J. L. (2009). The place of modeling in cognitive science. Topics in Cognitive Science, 1(1), 11–38.
Mozer, M. C., Pashler, H., Cepeda, N., Lindsey, R., & Vul, E. (2009). Predicting the optimal spacing of study: A multiscale context model of memory. In Y. Bengio, D. Schuurmans, J. Lafferty, C. K. I. Williams, & A. Culotta (Eds.), Advances in neural information processing systems 22 (pp. 1321–1329). NIPS Foundation.
Nishimoto, T., Ueda, T., Miyawaki, K., & Une, Y. (2010). A normative set of 98 pairs of nonsensical pictures (droodles). Behavior Research Methods, 42(3), 685–691.
Pashler, H., Bain, P. M., Bottge, B. A., Graesser, A., Koedinger, K., McDaniel, M., & Metcalfe, J. (2007). Organizing instruction and study to improve student learning. IES practice guide. NCER 2007-2004. National Center for Education Research.
Pavlik, P. I., & Anderson, J. R. (2005). Practice and forgetting effects on vocabulary memory: An activation-based model of the spacing effect. Cognitive Science, 29, 559–586.
Raaijmakers, J. G. W. (2003). Spacing and repetition effects in human memory: Application of the SAM model. Cognitive Science, 27, 431–452.
Rawson, K. A., & Dunlosky, J. (2013). Relearning attenuates the benefits and costs of spacing. Journal of Experimental Psychology: General, 142, 1113–1129.
Rawson, K. A., Dunlosky, J., & Sciartelli, S. M. (2013). The power of successive relearning: Improving performance on course exams and long-term retention. Educational Psychology Review, 25, 523–548.
Rawson, K., Vaughn, K., Walsh, M., & Dunlosky, J. (2018). Investigating and explaining the effects of successive relearning on long-term retention. Journal of Experimental Psychology: Applied, 24, 57–71.
Roediger III, H. L., & Karpicke, J. D. (2006). Test-enhanced learning: Taking memory tests improves long-term retention. Psychological Science, 17(3), 249–255.
Smolen, P., Zhang, Y., & Byrne, J. H. (2016). The right time to learn: Mechanisms and optimization of spaced learning. Nature Reviews Neuroscience, 17, 77–88.
Toppino, T. C., & Gerbier, E. (2014). About practice: Repetition, spacing, and abstraction. The Psychology of Learning and Motivation, 60, 113–189.
Toppino, T. C., Phelan, H. A., & Gerbier, E. (2018). Level of initial training moderates the effects of distributing practice over multiple days with expanding, contracting, and uniform schedules: Evidence for study-phase retrieval. Memory & Cognition, 46(6), 969–978.
Vaughn, K. E., Dunlosky, J., & Rawson, K. A. (2016). Effects of successive relearning on recall: Does relearning override the effects of initial learning criterion? Memory & Cognition, 1–13.
Verkoeijen, P. P., Rikers, R. M., & Schmidt, H. G. (2004). Detrimental influence of contextual change on spacing effects in free recall. Journal of Experimental Psychology: Learning, Memory, and Cognition, 30, 796–800.
Walsh, M. M., Gluck, K. A., Gunzelmann, G., Jastrzembski, T., Krusmark, M., Myung, J. I., Pitt, M. A., & Zhou, R. (2018a). Mechanisms underlying the spacing effect in learning: A comparison of three computational models. Journal of Experimental Psychology: General, 147, 1325–1348.
Walsh, M. M., Gluck, K. A., Gunzelmann, G., Jastrzembski, T., & Krusmark, M. (2018b). Evaluating the theoretical adequacy and applied potential of computational models of the spacing effect. Cognitive Science, 42, 644–691.

Publisher's note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.