Language Learning - 2022 - Kim - The Effects of Spaced Practice On Second Language Learning A Meta Ana
Introduction
Massed practice involves studying the same items in succession without any
intervening time or items, whereas spaced practice involves studying items
This research received no specific grant from any funding agency. We would like to express our
gratitude to the journal editor, Emma Marsden, and the anonymous Language Learning reviewers
for their insightful feedback and suggestions at every stage of the manuscript writing and revising
process. We would like to thank the following researchers, who kindly provided additional infor-
mation necessary for the current meta-analysis: Irina Elgort, Emilie Gerbier, Sean Kang, Jeffrey
Karpicke, Tatsuya Nakata, Steven Pan, and Ulf Schuetze.
Correspondence concerning this article should be addressed to Su Kyung Kim, University of
Western Ontario, Faculty of Education, 1137 Western Road, London, ON N6G 1G7, Canada.
The handling editor for this article was Emma Marsden.
verbal memory learning tasks such as picture naming, fact recall, and paired-
associate learning (e.g., first language [L1] word pairs, L2−L1 word pairs).
Although previous reviews included L2−L1 word pairs as a verbal memory
task, L2 learning studies were limited in number (no more than seven studies,
e.g., Cepeda et al., 2006) or not clearly mentioned (Donovan & Radosevich,
1999), and thus the effects of spaced practice on L2 learning are less clear.
There has been a great deal of research investigating the effects of spaced
practice on L2 learning, but the effects reported have been inconsistent. Re-
search has revealed that (a) spaced practice benefited learning and retention
of L2 vocabulary (e.g., Bloom & Shuell, 1981) and L2 grammar (e.g., Suzuki,
Yokosawa, & Aline, 2020); (b) spaced practice was as effective as massed prac-
tice on immediate posttests (Lee & Choe, 2014); (c) longer spacing was supe-
rior to shorter spacing on delayed posttests measuring L2 vocabulary (e.g.,
Pashler, Zarow, & Triplett, 2003) and L2 grammar learning (Rogers, 2015);
(d) shorter spacing contributed to greater learning than longer spacing on
delayed posttests (e.g., Küpper-Tetzel, Erdfelder, & Dickhäuser, 2014); (e)
shorter spacing was as effective as longer spacing on delayed posttests (e.g.,
Kasprowicz, Marsden, & Sephton, 2019); (f) equal spacing was more effective
than expanding spacing on delayed posttests (e.g., Çekiç & Bakla, 2019); and
(g) equal spacing was as effective as expanding spacing on delayed posttests
(e.g., Kang, Lindsey, Mozer, & Pashler, 2014).
Given the limited number of L2 studies included in previous reviews and
the inconsistent results obtained from L2 studies on spaced practice, research
aimed at clarifying findings is warranted. Compared to skill learning and
verbal memory, there are arguably even more individual differences (e.g.,
language aptitude, Kasprowicz et al., 2019) and contextual variables (e.g.,
teaching techniques, Rogers & Cheung, 2020a; multiple modes of L2 input,
Serrano & Huang, 2018; type of knowledge to be learned, Suzuki & DeKeyser,
2017a; task complexity, Suzuki et al., 2020) involved in L2 learning. Further-
more, there is abundant evidence of various instructional treatment benefits
(e.g., form-focused instruction, implicit inductive teaching) in L2 learning
(e.g., Norris & Ortega, 2000). It is, therefore, important to clarify the overall
effects of spacing and the different types of spacing on L2 learning in order
to provide pedagogical guidance, as well as to identify useful directions for
future research. In addition, because learner-related variables (e.g., prior L2
knowledge, Nakata & Suzuki, 2019b) and methodological features (e.g., feed-
back, Nakata, 2015a) were noted as reasons for the inconsistency in findings, it
is important to explore whether and to what extent the effect of spaced practice
is moderated by different variables across studies. The present study aims to
Background Literature
Theories of Spaced Practice Effects
Many theories of spaced practice effects have been proposed and examined.
First, spacing between learning opportunities makes learning more difficult,
but desirably so (desirable difficulty framework, e.g., Bjork, 1994; Suzuki,
Nakata, & DeKeyser, 2019). Second, forgetting occurring via spacing cre-
ates more effortful retrieval attempts, which strengthens retention (Bjork,
1975). Third, spacing between learning opportunities enhances subsequent re-
peated learning (consolidation, e.g., Wickelgren, 1972). Fourth, spacing be-
tween learning opportunities results in more attentional processing, but massed
learning results in less processing (deficient processing, e.g., Jacoby, 1978;
Koval, 2019). Fifth, reducing the accessibility of information in memory after
spacing enhances additional learning of that information (accessibility prin-
ciple, e.g., Bjork & Bjork, 1992). Sixth, spacing makes subsequent repeated
learning more distinctive, and the learning in different contexts is better re-
membered (contextual variability theory, e.g., Melton, 1970). Seventh, spacing
manipulated between retrievals (i.e., testing information from memory) pro-
duces benefits on long-term retention (study-phase retrieval, e.g., Toppino &
Bloom, 2002).
the effects of spaced practice, there is as yet no clear description of the ex-
tent to which spaced practice affects L2 learning. This is because they mainly
investigated the relationship between spacing intervals (the interval between
learning opportunities) and retention intervals, and there were few L2 studies
examined.
Uchihara, Webb, and Yanagisawa’s (2019) meta-analysis included spacing
as a moderator variable and found that frequency effects in L2 incidental vo-
cabulary learning (whereby the higher the number of encounters with a word,
the better the learning) were larger when words were encountered in massed
conditions (defined as within one session), r = .38, 95% CI [.31, .45], than
when words were encountered in spaced conditions (defined as learning across
multiple sessions), r = .23, 95% CI [.12, .34]. However, spacing was not exam-
ined as the sole construct, so a clear picture of spacing effects on L2 vocabulary
learning was not obtained.
Learning Target
Most L2 spaced practice studies have investigated L2 vocabulary learning
(e.g., Koval, 2020). Positive effects have also been demonstrated with L2 gram-
mar or morphology (e.g., Suzuki et al., 2020) and L2 pronunciation (e.g., Car-
penter & Mueller, 2013). However, acquisition of vocabulary and grammar
may occur through different processes (Pinker, 1998). For example, Ullman
(2015) reported that declarative memory may play different roles in lexical
and grammatical aspects of learning and processing. Pronunciation learning
is a different skill from vocabulary and grammar learning (Li & DeKeyser,
2019). Therefore, the effects of spaced practice may not be the same among
different domains (vocabulary, grammar, and pronunciation) of an L2.
Number of Sessions
Spaced practice studies involve spacing within a single session or between
multiple sessions. Most single-session studies manipulate item spacing (i.e.,
studying items separated by an interval of other items), and most multiple-
session studies manipulate time spacing (i.e., studying items separated by an
interval of time). It is also possible for multiple-session studies to manipu-
late item spacing (i.e., manipulating item spacing within each session). Spaced
practice benefits have been observed when manipulated within a single session
(e.g., Nakata & Suzuki, 2019b) as well as between multiple sessions (e.g., Li
& DeKeyser, 2019). However, it is not clear whether the number of sessions
affects outcomes. Therefore, it may be methodologically and pedagogically
valuable to see whether it influences learning through spaced practice.
Type of Practice
Spaced practice can involve repeated practice in studying materials (study
trials), retrieving information from memory (test trials), or a combination
of studying and retrieval (test–restudy or study–test trials; e.g., Roediger &
Karpicke, 2006). Several studies have revealed long-term retention benefits
of information relearned in spaced practice (e.g., Verkoeijen, Rikers, & Öz-
soy, 2008). Other studies found that repeatedly assessing information across
time promotes learning (e.g., Lawrence, 2013). This suggests that both spaced
restudy and retrieval practice are effective for learning and retention. However,
studies comparing repeated restudy practice (study trials) to repeated retrieval
followed by feedback across time (test–restudy trials) found that the best re-
tention occurred in the test–restudy trials (e.g., Butler & Roediger, 2007). L2
studies have found positive effects of retrieval relative to restudy on L2 vo-
cabulary learning and retention (e.g., Barcroft, 2007). None of these studies,
Activity Type
Research (e.g., Dunlosky, Rawson, Marsh, Nathan, & Willingham, 2013) has
found the benefits of spaced practice to be general across a range of mate-
rials, such as verbal materials (e.g., word pairs, facts), visual materials (e.g.,
pictures, videos), and educational materials (e.g., lectures, mathematical for-
mulas). However, not all tasks yield large benefits of spaced practice. Dono-
van and Radosevich (1999) found that there was a large spacing effect with
a low level of task complexity, d = 0.97, 95% CI [0.88, 1.06], but a small
effect with a high level of task complexity, d = 0.07, 95% CI [−0.05, 0.18].
Spaced practice for L2 learning has also been studied with a wide range of
activities: paired-associate tasks (e.g., Nakata, 2015a), listening and reading
activities for form–meaning mapping (e.g., Kasprowicz et al., 2019), judgment
tasks (e.g., Li & DeKeyser, 2019), oral description using pictures (e.g., Suzuki
et al., 2020), and exercises such as multiple-choice tasks, fill-in-the-blanks
tasks (e.g., Bloom & Shuell, 1981), and crossword puzzles (e.g., Rogers &
Cheung, 2020b). These activities are used to help L2 learners to comprehend
target items (e.g., multiple-choice tasks, reading texts, listening and identifying
the correct spoken forms of words) and to produce target items (e.g., picture
description, making sentences, pronouncing words). Donovan and Radosevich
(1999) coded foreign language tasks (L2–L1 word pairs) as representing an
average level of task complexity and found a small-to-medium effect of spac-
ing, d = 0.42, 95% CI [0.36, 0.48]. However, there might be a difference in the
level of difficulty that learners experience in comprehending versus producing
target items, and hence this may impact the magnitude of spacing effects.
Provision of Feedback
Studies have demonstrated that spacing effects may be influenced by the pro-
vision of feedback after retrieval (e.g., Roediger & Karpicke, 2006). Cepeda
et al. (2006) reported that feedback may be a variable that explains differences
between equal and expanding spacing; when feedback is provided, expanding
spacing benefits performance because feedback minimizes the chance of for-
getting an item (Pashler, Cepeda, Wixted, & Rohrer, 2005). However, Cepeda
et al. (2006) could not examine the effect of feedback because all three stud-
ies included in their meta-analysis for equal and expanding spacing provided
feedback. It would be useful to examine the effects of feedback because spaced
practice studies that have provided feedback have reported contrasting results.
For example, Kang et al. (2014) failed to find a positive effect for expanding
spacing with feedback relative to equal spacing with feedback, whereas Nakata
(2015a) found expanding spacing with feedback to be superior to equal spac-
ing with feedback. However, it should be noted that Nakata found a significant
effect of expanding spacing only on a posttest involving receptive recall (from
L2 to L1), with very small effect sizes, d = 0.12−0.19, 95% CI [−0.80, 0.53].
Furthermore, given that feedback to correct learners’ responses has generally
been found to be beneficial to L2 learning (e.g., Li, 2010), it would be interest-
ing to see whether the effect of spaced practice is moderated by feedback.
Feedback Timing
The timing of feedback may also moderate learning through spaced practice.
Some studies in cognitive psychology found that delayed feedback (e.g., feed-
back given after all responses) had a greater effect on learning than imme-
diate feedback (e.g., Butler, Karpicke, & Roediger, 2007), but others found
more benefit from immediate feedback (e.g., Brosvic, Epstein, Cook, & Dihoff,
2005). The superiority of delayed feedback can be explained by the fact that
delayed feedback results in more laborious learning circumstances, which fits
with the desirable difficulty framework (e.g., Bjork, 1994; Suzuki et al., 2019).
In contrast, because immediate feedback is generally provided after each re-
sponse, it is more likely to make learners fully process feedback after both
incorrect and correct responses (Butler & Roediger, 2007).
In L2 studies, Nakata (2015b) examined feedback timing (immediate and
delayed) in four different repeated retrieval practice conditions (one, three, five,
or seven retrievals). Sixteen English–Japanese word pairs were divided into
two sets of eight items. One set was assigned to the immediate feedback con-
dition, in which feedback was provided immediately after each response. The
other set was assigned to the delayed feedback condition, in which feedback
was provided after all eight items were performed. The interval between the
last encounter with a given item and the posttest was controlled. Nakata found
no main effect of feedback timing for L2 vocabulary learning on either recep-
tive (from L2 to L1) or productive (from L1 to L2) recall posttests. On the
1-week delayed posttest, he found a significant effect of the immediate feed-
back on only the receptive recall posttest, with a very small effect size, d =
0.14, 95% CI [0.03, 0.51]. However, because this study did not manipulate the
spaced learning conditions, the effect of feedback timing on spaced practice
for L2 vocabulary learning and retention remains unclear. Furthermore, there
has been no empirical research on L2 grammar or pronunciation learning that
has directly investigated the interaction between spacing and feedback timing.
Given that the impact of feedback on learning and memory has been endorsed
by the majority of investigations, it is useful to examine whether immediate or
delayed feedback is more conducive to L2 learning in more versus less spaced
conditions.
Frequency of Practice
Spaced practice studies have included different numbers of encounters with
target items, ranging from one or two (e.g., Pyc & Rawson, 2009) to 27 or 30
(e.g., Suzuki, 2017). Greater frequency of practice can provide learners with
more time to restudy or more attempts to retrieve. Maddox and Balota (2015)
found, in an L1 study using low-associate word pairs (e.g., apple–evil), significant
increases in retrieval practice performance as the number of tests during
the training sessions increased from one to five in a shorter spacing condition,
whereas in a longer spacing condition retrieval practice performance increased
from the one-test to the three-test condition, but did not increase further in the
five-test condition. These findings may suggest that providing more practice
does not always lead to better performance or better retention. Nakata (2017)
looked at the role of retrieval frequency (one, three, five, or seven retrievals)
within a single session for L2 vocabulary learning. He found that five or seven
retrievals led to better performance than one or three retrievals on both im-
mediate and delayed posttests. To our knowledge, there is no L2 empirical re-
search investigating the relationship between spaced conditions and frequency
of practice.
Retention Interval
Spaced practice effects may depend on when knowledge is measured (Cepeda
et al., 2006; Cepeda, Vul, Rohrer, Wixted, & Pashler, 2008; Rohrer & Pash-
ler, 2007). Cepeda et al. (2006) found a positive relationship between spacing
intervals and retention intervals (RIs); the longer the spacing, the greater the
retention. Rohrer and Pashler (2007) reported that spacing effects depended
jointly on spacing intervals and RI, arguing that the learning outcomes of dif-
ferent types of spaced practice may be better or worse depending on when the
final test is taken. Cepeda et al. (2008) found that longer spacing produces
better retention than shorter spacing at long RIs, whereas shorter spacing out-
performed longer spacing at short RIs. These findings suggest that the length
of RI may have a considerable impact on the effects of spaced practice.
Method
Literature Search
First, we comprehensively searched 22 relevant journals of cognitive psychol-
ogy, applied psychology, applied linguistics, and second language acquisition
for different combinations of key words: spacing effect, massed, interleaving,
blocking, lag effect, shorter spacing, longer spacing, absolute spacing, relative
spacing, equal spacing, fixed spacing, uniform spacing, expanding spacing,
second language learning, and foreign language learning. We then employed
the following electronic databases in order to extend the search: Education
Resources Information Center, Linguistics and Language Behavior Abstracts,
PsycINFO, and Google Scholar. In addition, we searched references in review
articles (e.g., Cepeda et al., 2006) and in book chapters (e.g., Carpenter, 2017).
We set 1979 as the starting point because Bahrick’s study from that year is one
of the classic experiments on spaced practice (as observed by Dunlosky et al.,
2013), and because there were very few L2 empirical studies prior to 1979 (cf.
Crothers & Suppes, 1967, Experiments 8, 9, 10, and 11), and those that existed
did not report sufficient statistical information to calculate effect sizes. We set
July 2020 as the completion point for our data collection.
In order to minimize the “file-drawer” problem in research synthesis (the
fact that some studies remain in researchers’ files because of the publication
bias toward studies reporting significant findings; Rosenthal, 1979), we consid-
ered retrieving “fugitive” literature (e.g., unpublished papers, doctoral theses,
conference presentations). However, due to the difficulty involved in retriev-
ing those sources, we decided to include only doctoral theses that are carefully
designed and provide detailed statistical information. We used the electronic
database ProQuest Global Dissertations and Theses to search for doctoral the-
ses, employing the same key words as for published studies.
Inclusion Criteria
All reports that appeared initially eligible for the meta-analysis were then ex-
amined in reference to a set of inclusion criteria. To be included in the meta-
analysis, a study report had to meet all of the following criteria:
1. The study had to examine the effect of spaced practice on L2 learning.
We took L2 learning to include learning of L2 vocabulary such as sin-
gle words or collocations (Snoder, 2017), L2 grammatical structures (e.g.,
past perfect tense; Bird, 2010), L2 morphological features (e.g., Japanese
te-form of the verb; Suzuki & DeKeyser, 2017a), L2 pronunciation (e.g.,
Mandarin monosyllables such as ba with different tones; Li & DeKeyser,
2019), and orthographic and phonological nonsense items (e.g., Nakata &
Elgort, 2021).
2. The study had to feature a comparison of one type of practice with an-
other type of practice in order to examine the effects of spaced practice
(i.e., comparing spaced with massed practice, longer with shorter spacing,
or equal with expanding spacing). For example, Uchihara et al.'s (2019)
meta-analysis included massed and spaced conditions as a moderator to
examine frequency effects in L2 incidental vocabulary learning. However,
the studies included in their meta-analysis were not included in the cur-
rent meta-analysis because none of them qualified as a comparative study
examining the effects of spaced practice.
3. Studies comparing blocking to interleaving were included (Carpenter &
Mueller, 2013; Nakata & Suzuki, 2019b; Pan, Tajran, Lovelett, Osuna, &
Rickard, 2019; Suzuki et al., 2020). Blocking corresponds to massed prac-
tice or shorter spacing (not pure massed practice), whereas interleaving is
equivalent to spaced practice or longer spacing (see Appendix S2 in the
Supporting Information online for the category criteria).
4. The study had to provide clear spacing intervals. For example, we ex-
cluded the study by Lightbown and Spada (1994) that compared 18 hours
per week to 2 hours per week because it was not clear whether the time dis-
tribution was either shorter or longer, or equal or expanding. Additionally,
we excluded studies involving spaced practice with different criterion lev-
els via a dropout method (where items that were correctly retrieved during
a trial were removed from the to-be-practiced list in the subsequent trial),
because the number of test–restudy trials per item was variable between
participants (e.g., five-drop group, Pyc & Rawson, 2007).
5. The study had to control for participants’ preexisting knowledge of tar-
get items (vocabulary, grammatical features, and pronunciation rules).
The PRISMA flow diagram presented in Figure 1 depicts the study inclu-
sion criteria (Moher, Liberati, Tetzlaff, & Altman, 2009) and provides the num-
ber of included and excluded references. More detailed information is reported
in Appendix S1 in the online Supporting Information for this article.
Forty-eight experiments reported in 37 studies (N = 3,411) were selected
for this meta-analysis. The 48 experiments were then divided into three differ-
ent categories of spaced schedules: (a) spaced versus massed, (b) longer versus
shorter, and (c) equal versus expanding comparisons (see Appendix S2 in the
Supporting Information online for the category criteria). Each category was
meta-analyzed separately.
versus shorter category, and immediate and delayed effects in the equal versus
expanding category.
We included a total of nine moderator variables: one learner-related vari-
able (age) and eight methodology-related variables (learning target, number
of sessions, type of practice, activity type, provision of feedback, feedback
timing, frequency of practice, and RI) (see Appendix S3 in the Supporting In-
formation online for the coding scheme). The coding sheet with the data (Kim
& Webb, 2021) is publicly available at http://www.iris-database.org.
Age
Because a limited number of studies reported the age of participants (21 of
37 studies, 57%), age was initially categorized according to grade levels (e.g.,
Grade 3, Rogers & Cheung, 2020a). However, because some studies involved
participants with a wide range of grade levels (e.g., Grades 9−12, Bloom &
Shuell, 1981; Grades 3−8, Lotfolahi & Salehi, 2016) or involved adults rang-
ing from 20 to 63 years (Kang et al., 2014), which makes it difficult to deter-
mine the differential effects of spaced practice on learners of different grade
levels, this variable was coded as young learners (Grades 1−12) and adult
learners (university students or older).
Learning Target
This variable consists of three types of L2 items: vocabulary (both single words
and multiword items), grammar (including morphological structure), and pro-
nunciation (a monosyllabic item with different tones or pronunciation rules).
Number of Sessions
This variable was coded as single session and multiple sessions. Note that the
number of sessions includes only training sessions and does not include testing
(immediate or delayed posttest) sessions. For example, if a study used time
spacing (e.g., a 10-minute interval between trials) within one training session,
followed by testing sessions (e.g., one immediate and two delayed posttests),
the study is coded as single session.
Type of Practice
Practice includes two types of conditions: study trial and test trial. A study trial
refers to an opportunity to restudy the target items that participants learned or
studied. A test trial refers to an opportunity to recall or retrieve the target items
that participants learned or studied. Note that feedback provided after a test
trial can also be an opportunity to restudy the target items that participants
learned in the initial learning session. Type of practice was coded as being one
of five types: test–restudy (all) trial (testing, followed by restudying all target
items); test–restudy (not recalled) trial (testing, followed by restudying only
the items that were not recalled); study trial; test trial; and study–test trial (for
details, see Tables S4.2 and S4.3, Appendix S4, in the Supporting Information
online).
Activity Type
This variable was coded as one of: paired associate; comprehension activi-
ties; production activities; and combined activities that involved both compre-
hension and production activities. Paired-associate learning included learning
from word lists or word cards. As in the descriptions of activities reported in the
meta-analysis by Shintani, Li, and Ellis (2013), L2 activities other than paired-
associate learning were coded as comprehension or production activities. Ad-
ditionally, activities that involved both comprehension (e.g., multiple-choice
tasks) and production (e.g., fill-in-the-blanks tasks) were coded as combined
activities. Note that although a paired-associate task can involve either receptive
retrieval (comprehending the L1 meaning of an L2 word) or productive retrieval
(producing the L2 word corresponding to a given L1 word), we consider
paired-associate tasks as a separate type of activity, distinct from comprehen-
sion, production, and combined activities (for details, see Tables S4.4 and S4.5,
Appendix S4, in the Supporting Information online).
Frequency of Practice
Frequency of practice was reported as the amount of repeated practice (ex-
cluding the initial presentation to learn target items). Thus, this is different
from the total number of sessions, which includes the presentation, practice,
and posttest sessions used in the treatment. For example, Nakata and Suzuki
(2019a) included two sessions: The first session consisted of the pretest, learn-
ing session (presentation followed by three test trials), and immediate posttest,
whereas the second session involved a delayed posttest. Frequency of practice
in this study is 3 and the total number of sessions is 2. Following Suzuki (2017),
when a study administered two posttests (immediate and delayed) and RI was
Retention Interval
RI was coded as the number of days between the last learning session and
the final posttest. In the current meta-analysis, six studies administered multi-
ple delayed posttests (Bird, 2010; Li & DeKeyser, 2019; Lotfolahi & Salehi,
2016; Schuetze, 2014; Suzuki, 2017; Suzuki & DeKeyser, 2017a). Suzuki
(2017) pointed out that the first delayed posttest could influence the reten-
tion of knowledge measured by the second delayed posttest. Hence, the first
delayed posttest was considered another retrieval practice in Suzuki’s (2017)
study. Following Suzuki (2017), if a study involved 7-day and 35-day delayed
posttests, the calculated RI is 28 days (RI of the last delayed posttest − RI of
the delayed posttest administered before the last delayed posttest; 35 days − 7
days = 28 days). It should be noted that this was the case only if the RI was
manipulated within participants.
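The coding rule for retention intervals can be sketched in a few lines of Python. This is only an illustration: the function name and its arguments are hypothetical and not part of the original coding scheme, and the sketch assumes that posttest delays are expressed in days after the last learning session.

```python
def coded_retention_interval(posttest_days, within_participants):
    """Return the retention interval (in days) coded for the final posttest.

    posttest_days: delays of the delayed posttests, in days after the last
        learning session, in chronological order (hypothetical input format).
    within_participants: True if the same participants took every posttest.
    """
    if not posttest_days:
        raise ValueError("at least one delayed posttest is required")
    final = posttest_days[-1]
    # An earlier delayed posttest is treated as additional retrieval practice,
    # so the interval is measured from that posttest. Per the text, this
    # applies only when RI was manipulated within participants.
    if within_participants and len(posttest_days) > 1:
        return final - posttest_days[-2]
    # Otherwise the interval runs from the last learning session.
    return final


# The 7-day and 35-day example from the text.
print(coded_retention_interval([7, 35], within_participants=True))  # 28
```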
Data Analysis
We used Comprehensive Meta-Analysis (version 3.3) software (Borenstein,
Hedges, Higgins, & Rothstein, 2013) to calculate the overall effect sizes and
conduct analyses for nine moderator variables. In order to address the first re-
search question, we aggregated effect sizes from the studies included in the
spaced versus massed comparison to produce a weighted mean effect size. For
the second research question, we aggregated effect sizes from the studies in-
cluded in the longer versus shorter and equal versus expanding categories. To
aggregate effect sizes, we used a random-effects model (using the unrestricted
maximum likelihood method) so that variation in intervention effects across
studies was accommodated (Borenstein, Hedges, Higgins, & Rothstein, 2009).
A significant between-group Q value indicates a heterogeneous distribution
with a common effect size among identified samples and thus facilitates sub-
sequent moderator analyses. However, a nonsignificant Q value is not always
taken as assurance that the effects are consistent, because the Q statistic and
its p value address only the viability of the null hypothesis (Borenstein et al.,
2009). In the current meta-analysis, therefore, we also report I² statistics (the
proportion of variation in effect sizes across studies), tau (the standard devia-
tion of true effects), and prediction interval (how widely the effect sizes vary
across studies), which are intended to quantify heterogeneity (the distribution
of effects; Borenstein, Higgins, Hedges, & Rothstein, 2017). For the last re-
search question, we conducted moderator analyses in all of the three categories
(spaced vs. massed; longer vs. shorter; and equal vs. expanding). A random-
effects meta-regression (using the unrestricted maximum likelihood method)
was performed for continuous variables (frequency of practice and RI). Results
were considered statistically significant when the p value was less than the
prespecified alpha of .05.
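As a rough illustration of the heterogeneity statistics named above, the sketch below pools effect sizes under a random-effects model and reports Q, I², and tau. It rests on stated assumptions: it uses the DerSimonian-Laird moment estimator of the between-study variance rather than the maximum likelihood method applied in Comprehensive Meta-Analysis, and the function name and the three input effect sizes are hypothetical.

```python
import math


def random_effects_summary(effects, variances):
    """Pool effect sizes with a DerSimonian-Laird random-effects model.

    effects: per-study effect sizes (e.g., Hedges's g).
    variances: per-study sampling variances of those effect sizes.
    """
    k = len(effects)
    if k < 2 or k != len(variances):
        raise ValueError("need at least two studies with matching variances")

    # Fixed-effect (inverse-variance) weights and mean, needed for Q.
    w = [1.0 / v for v in variances]
    sum_w = sum(w)
    fixed_mean = sum(wi * yi for wi, yi in zip(w, effects)) / sum_w

    # Q: weighted sum of squared deviations from the fixed-effect mean.
    q = sum(wi * (yi - fixed_mean) ** 2 for wi, yi in zip(w, effects))
    df = k - 1

    # Between-study variance (tau squared), truncated at zero.
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0

    # I squared: share of observed variance not attributable to sampling error.
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0

    # Random-effects weights add tau squared to each sampling variance.
    w_re = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))

    return {"pooled": pooled, "se": se, "q": q, "df": df,
            "tau": math.sqrt(tau2), "tau2": tau2, "i2": i2}


# Hypothetical data: three studies with equal sampling variance.
summary = random_effects_summary([0.2, 0.5, 0.8], [0.04, 0.04, 0.04])
print(summary)
```

With a different estimator of tau squared, the pooled estimate and its standard error would differ slightly, so the output should not be expected to match the software's results exactly.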
deviation units. In the longer versus shorter comparison, longer and shorter
spacing data were coded as treated and baseline data, respectively. In the equal
versus expanding comparison, equal and expanding spacing data were coded
as treated and baseline data, respectively.
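The figures report effect sizes as Hedges's g, which expresses the difference between the treated and baseline conditions in standard deviation units. A minimal sketch of the standard computation for two independent groups follows. The function name and the input values are hypothetical, and the sketch does not attempt to reproduce how the software handled each study design (e.g., within-participants comparisons).

```python
import math


def hedges_g(mean_treated, mean_baseline, sd_treated, sd_baseline,
             n_treated, n_baseline):
    """Return Hedges's g and its sampling variance for two independent groups."""
    df = n_treated + n_baseline - 2
    # Pooled standard deviation across the two groups.
    pooled_sd = math.sqrt(
        ((n_treated - 1) * sd_treated ** 2
         + (n_baseline - 1) * sd_baseline ** 2) / df
    )
    # Standardized mean difference (Cohen's d).
    d = (mean_treated - mean_baseline) / pooled_sd
    # Small-sample correction factor that turns d into g.
    correction = 1 - 3 / (4 * df - 1)
    g = correction * d
    var_d = ((n_treated + n_baseline) / (n_treated * n_baseline)
             + d ** 2 / (2 * (n_treated + n_baseline)))
    var_g = correction ** 2 * var_d
    return g, var_g


# Hypothetical posttest scores: treated (e.g., spaced) vs. baseline (e.g., massed).
g, var_g = hedges_g(12.0, 10.0, 4.0, 4.0, 20, 20)
print(round(g, 3), round(var_g, 3))  # 0.49 0.099
```

A positive g in this sketch means that the condition coded as treated outperformed the condition coded as baseline.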
From 48 experiments, we identified 26 effect sizes in the spaced versus
massed comparison, including 11 with immediate posttests and 15 with de-
layed posttests. In the longer versus shorter spacing comparison, we identi-
fied 49 effect sizes, including 17 with immediate posttests and 32 with de-
layed posttests. Finally, in the equal versus expanding comparison, we identi-
fied 23 effect sizes, including 7 with immediate posttests and 16 with delayed
posttests.
The detection of outliers was performed to ensure the robustness of the re-
sults, because the presence of studies with extreme effect sizes may have an
impact on the results. Following previous meta-analyses (e.g., that by Shintani
et al., 2013), the effect sizes contributed by the included studies were trans-
formed into z scores, and any value (regardless of whether it was positive or
negative) larger than 2.0 was removed from the analysis. Outlier detection was
performed repeatedly until there were no further outliers. One outlier was iden-
tified from the z-score examination (Lotfolahi & Salehi, 2017: z = 2.152).
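The repeated outlier screening described above can be sketched as follows. The function name and the example effect sizes are hypothetical, and the use of the sample standard deviation is an assumption, since the text does not say which standard deviation was used to compute the z scores.

```python
import statistics


def remove_outliers(effect_sizes, threshold=2.0):
    """Repeatedly drop effect sizes whose absolute z score exceeds threshold.

    Returns the retained effect sizes and the removed ones.
    """
    kept = list(effect_sizes)
    removed = []
    while len(kept) > 2:
        mean = statistics.mean(kept)
        sd = statistics.stdev(kept)  # sample SD (assumption)
        if sd == 0:
            break  # all values identical, nothing to flag
        flagged = [x for x in kept if abs((x - mean) / sd) > threshold]
        if not flagged:
            break  # no further outliers, so screening stops
        removed.extend(flagged)
        kept = [x for x in kept if abs((x - mean) / sd) <= threshold]
    return kept, removed


# Hypothetical effect sizes with one extreme value.
kept, removed = remove_outliers([0.1, 0.2, 0.3, 0.2, 0.1, 0.25, 0.15, 3.0])
print(removed)  # [3.0]
```

Because the mean and standard deviation are recomputed after each removal, a value that looked unremarkable in the first pass can be flagged in a later pass, which is why the screening is repeated until no z score exceeds the threshold.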
Finally, we assessed publication bias in the current data sets. Because most
studies in this meta-analysis were published (35, alongside one contribution to
conference proceedings, Khoii & Abed, 2017, and one doctoral thesis, Koval,
2020), our meta-analysis is more likely to include statistically significant
findings than statistically nonsignificant findings (Lipsey & Wilson, 2001);
therefore, a bias might influence the results of our meta-analysis. The
analyses indicated that publication bias is a potential threat to conclusions
drawn about the effects of spaced practice. The true magnitudes of effects of
spaced practice on L2 learning might be smaller than those reported in this
meta-analysis, though it is not known how much smaller and whether it would
affect all three categories (spaced vs. massed, longer vs. shorter, and equal vs.
expanding) of comparisons and all the moderator variables in the same way
(see Appendix S6 in the Supporting Information online for publication bias
analyses).
Results
To What Extent Does Spacing Affect Second Language Learning?
Results showed that spaced practice led to significant improvement in L2
learning and retention compared to massed practice (see Figures 2 and 3).
Spaced practice was significantly more effective than massed practice on the
Figure 2 Overall average effect size (indicated by a diamond) of spaced practice when
compared to massed practice, and effect sizes with 95% confidence intervals for each
study (dependent variable = immediate posttest scores, k = 11). Effect sizes are calcu-
lated as Hedges’s g.
Figure 3 Overall average effect size (indicated by a diamond) of spaced practice when
compared to massed practice, and effect sizes with 95% confidence intervals for each
study (dependent variable = delayed posttest scores, k = 15). Effect sizes are calculated
as Hedges’s g.
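As background for the metric named in the figure captions, the sketch below
shows one common way of computing Hedges's g for two independent groups: a
pooled-standard-deviation mean difference multiplied by a small-sample
correction factor. It is an assumption that this between-groups form applies;
the meta-analysis may have used other formulas (e.g., for within-participant
designs), and the function name and the example numbers are hypothetical.

```python
import math


def hedges_g(mean_t, mean_c, sd_t, sd_c, n_t, n_c):
    """Hedges's g for two independent groups (treated vs. comparison)."""
    df = n_t + n_c - 2
    # Pooled standard deviation across the two groups.
    pooled_sd = math.sqrt(((n_t - 1) * sd_t ** 2 + (n_c - 1) * sd_c ** 2) / df)
    d = (mean_t - mean_c) / pooled_sd
    # Approximate small-sample correction factor J.
    j = 1 - 3 / (4 * df - 1)
    return j * d


# Hypothetical posttest summary statistics, not taken from any included study.
g = hedges_g(mean_t=12.0, mean_c=10.0, sd_t=2.0, sd_c=2.0, n_t=20, n_c=20)
print(round(g, 3))
```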
than sampling error (I² = 81.72%), and the standard deviation of true effects
(tau) was 0.631. We predict that the true effects would fall in the range of
−0.93 to 2.09, and it would make sense to apply moderator analyses or
meta-regression to explain the variance (Borenstein et al., 2009).
A spacing effect was also found on the delayed posttests, g = 0.80, 95% CI
[0.44, 1.17]. The confidence interval, which does not include zero, suggested
that the size of the spacing effect in the long term could be considered
medium to large (Plonsky & Oswald, 2014), and large with reference to Cohen
(1988) and to Schäfer and Schwarz (2019). A significant Q test (Q = 79.83,
p < .001) and a high value of I² (82.46%) indicated that most of the observed
variance reflected true differences among the samples rather than sampling
error. Tau was 0.639, and the prediction interval indicates that most true
effects would fall in the range of −0.64 to 2.24. This justified subsequent
moderator analyses or meta-regression.
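The prediction interval reported for the delayed posttests can be approximated
from the summary statistics in the text. The sketch below assumes the
conventional formula, g ± t × sqrt(tau² + SE²) with k − 2 degrees of freedom,
and recovers the standard error from the reported 95% confidence interval; the
function names are hypothetical, and the critical value 2.160 (two-tailed, df =
13 for k = 15) is supplied by hand rather than computed.

```python
import math


def se_from_ci(lower, upper, z=1.959964):
    """Recover a standard error from a reported 95% confidence interval."""
    return (upper - lower) / (2 * z)


def prediction_interval(g, se, tau, t_crit):
    """Approximate 95% prediction interval for the true effect in a new study."""
    half_width = t_crit * math.sqrt(tau ** 2 + se ** 2)
    return g - half_width, g + half_width


# Delayed posttests, spaced vs. massed: g = 0.80, 95% CI [0.44, 1.17], tau = 0.639.
se = se_from_ci(0.44, 1.17)
low, high = prediction_interval(0.80, se, 0.639, t_crit=2.160)
print(round(low, 2), round(high, 2))  # -0.64 2.24
```

Under these assumptions the result agrees with the reported range of −0.64 to
2.24, which suggests (but does not confirm) that a formula of this kind was
used.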
Figure 4 Overall average effect size of longer spaced practice (treated) when compared
to shorter spaced practice (baseline), and effect sizes with 95% confidence intervals for
each study (dependent variable = immediate posttest scores, k = 17). Effect sizes are
calculated as Hedges’s g.
Figure 5 Overall average effect size of longer spaced practice (treated) when compared
to shorter spaced practice (baseline), and effect sizes with 95% confidence intervals for
each study (dependent variable = delayed posttest scores, k = 32). Effect sizes are
calculated as Hedges’s g.
g = 0.40, 95% CI [0.16, 0.64] (see Figure 5). The confidence interval values,
with the lower bound only just above zero, suggested that the size of longer
spacing effects in the long term could be considered small (Plonsky & Oswald,
2014), or small to medium with reference to Cohen (1988), but in the medium
range within the domain of psychology (Schäfer & Schwarz, 2019). Tau was
0.607, and the prediction interval was −0.87 to 1.67 for the delayed effects.
We would predict that the true effect sizes would fall in this wide range. A
significant Q value (Q = 163.63, p < .001) and an I² value of 81.05% justified
subsequent moderator analyses or meta-regression.
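The I² values in this section follow from the Q statistics and the number of
effect sizes. A minimal sketch, assuming the standard definition I² = (Q − df)
/ Q × 100 with df = k − 1 (the function name is hypothetical):

```python
def i_squared(q, k):
    """Percentage of observed variance attributed to true heterogeneity."""
    df = k - 1
    if q <= 0:
        return 0.0
    # I-squared is truncated at zero when Q is smaller than its degrees of freedom.
    return max(0.0, (q - df) / q) * 100


# Longer vs. shorter spacing, delayed posttests: Q = 163.63, k = 32.
print(round(i_squared(163.63, 32), 2))  # 81.05
# Spaced vs. massed, delayed posttests: Q = 79.83, k = 15.
print(round(i_squared(79.83, 15), 2))  # 82.46
```

Both values match the percentages reported in the text.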
Results showed that equal spacing was as effective as expanding spacing
on both immediate posttests, g = 0.15, 95% CI [−0.07, 0.37], and delayed
posttests, g = −0.15, 95% CI [−0.33, 0.03]; the confidence intervals crossed
zero (see Figures 6 and 7). The I² value was zero on the immediate posttests,
suggesting that almost none of the observed variance reflected true
differences among studies; thus, no subsequent moderator analysis for the
immediate effects is reported. On the delayed posttests, I² was 27.19%,
indicating that only a small part of the observed dispersion reflected true
heterogeneity; tau was 0.188, and the prediction interval was −0.60 to 0.30.
Figure 6 Overall average effect size of equal spaced practice (treated) when compared
to expanding spaced practice (baseline), and effect sizes with 95% confidence intervals
for each study (dependent variable = immediate posttest scores, k = 7). Effect sizes are
calculated as Hedges’s g.
Figure 7 Overall average effect size of equal spaced practice (treated) when compared
to expanding spaced practice (baseline), and effect sizes with 95% confidence intervals
for each study (dependent variable = delayed posttest scores, k = 16). Effect sizes are
calculated as Hedges’s g.
Supporting Information online) showed that the bias is negligible. In the subset
of effects from immediate posttests from the equal versus expanding comparison,
I² and tau were zero, indicating that estimates of p-uniform should be examined.
P-uniform enables testing of the extent of heterogeneity and considers
the statistical significance of effect sizes (van Aert, Wicherts, & van Assen,
2016). However, the results of both p-uniform and the random-effects model
were similar (very small effects with confidence intervals that crossed zero),
which led to the conclusion that random-effects meta-analysis results may be
interpreted as the standard meta-analytic estimates. Because most studies in-
cluded in the current meta-analysis were published studies (published studies
= 35, contribution to conference proceedings = 1, and PhD thesis = 1), a sym-
metrical distribution may not rule out publication bias. Therefore, the overall
effects of spaced practice on L2 learning from the current meta-analysis should
be interpreted with caution.
Age
Spacing promoted better learning when it involved adult learners, g = 0.66,
95% CI [0.13, 1.20], than when it involved young learners, g = 0.39, 95% CI
[−0.44, 1.22]. However, in the long term, the effects were larger with young
learners, g = 0.97, 95% CI [0.11, 1.82], than with adult learners, g = 0.77,
95% CI [0.36, 1.18]. Longer spacing significantly led to better retention than
shorter spacing when it involved adult learners, g = 0.54, 95% CI [0.27, 0.81].
Table 1
                                                     95% CI                 Q tests
Variables                       k      g Variance     LL     UL    pᵃ       Q    pᵃ
Age
  Spaced vs. massed                                                      0.30   .58
    Young                       3   0.39     0.03  −0.44   1.22   .36
    Adult                       8   0.66     0.10   0.13   1.20   .01
  Longer vs. shorter                                                     0.45   .50
    Young                       3  −0.03     0.04  −0.42   0.37   .89
    Adult                      14  −0.19     0.02  −0.44   0.06   .14
  Equal vs. expanding                                                    2.18   .14
    Young                       2   0.35     0.09   0.01   0.69   .05
    Adult                       5   0.01     0.02  −0.29   0.30   .96
Learning target
  Spaced vs. massed                                                      1.71   .19
    Vocabulary                  8   0.76     0.08   0.26   1.25   .00
    Grammar                     3   0.14     0.08  −0.64   0.92   .72
  Longer vs. shorter                                                    15.59   .00
    Vocabulary                  9   0.14     0.02  −0.11   0.38   .28
    Grammar                     4  −0.41     0.02  −0.70  −0.13   .01
    Pronunciation               4  −0.64     0.03  −0.98  −0.30   .00
Number of sessions
  Spaced vs. massed                                                      5.86   .02
    Single session              6   1.04     0.01   0.49   1.59   .00
    Multiple sessions           5   0.04     0.06  −0.55   0.63   .88
  Longer vs. shorter                                                     0.78   .38
    Single session             10  −0.08     0.03  −0.40   0.23   .60
    Multiple sessions           7  −0.27     0.02  −0.52  −0.01   .04
  Equal vs. expanding                                                    0.25   .62
    Single session              4   0.07     0.03  −0.29   0.44   .70
    Multiple sessions           3   0.19     0.06  −0.12   0.51   .23
Type of practice
  Spaced vs. massed                                                      1.34   .72
    Test–restudy (all)          6   0.69     0.13   0.05   1.34   .04
    Test–restudy (no recalled)  2   0.48     0.05  −0.06   1.55   .39
    Study trial                 2   0.81     0.45  −0.34   1.97   .17
  Longer vs. shorter                                                    11.74   .01
    Test–restudy (all)          6   0.22     0.02  −0.08   0.51   .16
    Test–restudy (no recalled)  3  −0.54     0.03  −0.89  −0.18   .00
Table 1 (Continued)
Learning Target
Spacing led to better learning and retention when it involved L2 vocabulary,
g = 0.76 to 1.15, 95% CI [0.26, 1.49], than when it involved L2 grammar,
g = 0.11 to 0.14, 95% CI [−0.64, 0.92]. However, the confidence intervals for
L2 grammar learning crossed zero, suggesting that the spacing effects may not
be statistically reliable when learning involves L2 grammar. Shorter spacing
was significantly more effective for the immediate learning of L2
pronunciation, g = −0.64, 95% CI [−1.06, −0.21] (not passing through zero), and
of grammar, g = −0.41, 95% CI [−0.70, −0.13] (not passing through zero), but
Table 2
                                                     95% CI                 Q tests
Variables                       k      g Variance     LL     UL    pᵃ       Q    pᵃ
Age
  Spaced vs. massed                                                      0.16   .69
    Young                       3   0.97     0.25   0.11   1.82   .03
    Adult                      12   0.77     0.04   0.36   1.18   .00
  Longer vs. shorter                                                     4.35   .04
    Young                       8  −0.04     0.03  −0.52   0.44   .86
    Adult                      24   0.54     0.02   0.27   0.81   .00
Learning target
  Spaced vs. massed                                                     13.78   .00
    Vocabulary                 10   1.15     0.04   0.81   1.49   .00
    Grammar                     5   0.11     0.03  −0.32   0.54   .61
  Longer vs. shorter                                                     0.54   .76
    Vocabulary                 22   0.34     0.02   0.04   0.64   .03
    Grammar                     8   0.56     0.07   0.06   1.06   .03
    Pronunciation               2   0.42     0.06  −0.57   1.42   .41
Number of sessions
  Spaced vs. massed                                                      1.91   .17
    Single session              9   0.61     0.05   0.16   1.05   .01
    Multiple sessions           6   1.12     0.10   0.55   1.69   .00
  Longer vs. shorter                                                     6.83   .01
    Single session             11   0.76     0.04   0.42   1.11   .00
    Multiple sessions          21   0.18     0.02  −0.10   0.45   .21
  Equal vs. expanding                                                    0.68   .41
    Single session              6  −0.04     0.02  −0.35   0.28   .81
    Multiple sessions          10  −0.20     0.02  −0.42   0.02   .08
Type of practice
  Spaced vs. massed                                                      3.35   .34
    Test–restudy (all)         10   0.70     0.05   0.25   1.14   .00
    Test–restudy (no recalled)  2   1.73     0.65   0.67   2.79   .00
    Study trial                 2   0.69     0.43  −0.36   1.73   .20
  Longer vs. shorter                                                    15.86   .00
    Test–restudy (all)         16   0.38     0.02   0.10   0.67   .01
    Test–restudy (no recalled)  6   1.06     0.09   0.61   1.50   .00
    Study trial                 6  −0.12     0.06  −0.62   0.38   .64
    Study–test trial            3   0.40     0.06  −0.23   1.03   .22
  Equal vs. expanding                                                   15.33   .00
    Test–restudy (all)          8  −0.32     0.01  −0.54  −0.10   .00
295 Language Learning 72:1, March 2022, pp. 269–319
Kim and Webb Spacing and Second Language Learning
Table 2 (Continued)
Number of Sessions
We found a large, statistically significant benefit of spacing on immediate L2
performance when it involved a single session, g = 1.04, 95% CI [0.49, 1.59].
However, better retention occurred when it involved multiple sessions, g =
1.12, 95% CI [0.55, 1.69], than when it involved a single session, g = 0.61,
95% CI [0.16, 1.05]. Longer spacing promoted significantly greater retention
than shorter spacing when it involved a single session, g = 0.76, 95% CI [0.42,
1.11]. However, when it involved multiple sessions, longer spacing was as
effective as shorter spacing. Small effects of expanding spacing for retention
were found when it involved a single session, g = −0.04, 95% CI [−0.35,
0.28], and multiple sessions, g = −0.20, 95% CI [−0.42, 0.02], but the effects
were not statistically reliable.
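Subgroup contrasts such as single versus multiple sessions are evaluated with
a between-subgroup Q test. The sketch below illustrates the general idea with
inverse-variance weights; it is an assumption that this simple form resembles
the procedure used, since the exact subgroup model (e.g., how tau was pooled)
is not stated here. The function names are hypothetical, variances are backed
out from the reported confidence intervals, and because those inputs are
rounded the result only approximates the reported Q of 6.83 for the longer
versus shorter comparison on delayed posttests.

```python
import math


def variance_from_ci(lower, upper, z=1.959964):
    """Back out a sampling variance from a reported 95% confidence interval."""
    se = (upper - lower) / (2 * z)
    return se ** 2


def q_between(subgroups):
    """Between-subgroup Q for a list of (effect size, variance) pairs."""
    weights = [1.0 / variance for _, variance in subgroups]
    pooled = sum(w * g for w, (g, _) in zip(weights, subgroups)) / sum(weights)
    return sum(w * (g - pooled) ** 2 for w, (g, _) in zip(weights, subgroups))


def p_value_two_subgroups(q):
    """Upper-tail chi-square probability for df = 1 (two subgroups only)."""
    return math.erfc(math.sqrt(q / 2.0))


# Longer vs. shorter spacing, delayed posttests (values from the text).
single = (0.76, variance_from_ci(0.42, 1.11))
multiple = (0.18, variance_from_ci(-0.10, 0.45))
q = q_between([single, multiple])
print(round(q, 2), round(p_value_two_subgroups(q), 3))
```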
Type of Practice
Spaced practice promoted better learning and retention when it involved a
test–restudy trial, g = 0.48 to 1.73, 95% CI [−0.06, 2.79], than when it
involved a study-only trial, g = 0.69 to 0.81, 95% CI [−0.36, 1.97]. However,
because the sample size for the study-only trial was small (k = 2), the smaller
effect with a study-only trial should be interpreted with caution. Longer
spacing led to significantly greater retention than shorter spacing when it
involved a test–restudy trial, g = 0.38 to 1.06, 95% CI [0.10, 1.50], but
longer spacing was as effective as shorter spacing when it involved a study
trial or study–test trial. Expanding spacing led to greater retention when it
involved a test–restudy trial than when it involved a study trial or test
trial. Although the confidence intervals for the test–restudy trial showed
statistically reliable effects of expanding spacing, the findings from the
equal versus expanding comparison should be interpreted with caution because
of small samples (k = 2 for study trial, k = 2 for test trial).
Activity Type
Spacing promoted better learning on immediate posttests when it involved
comprehension activities, g = 0.97, 95% CI [0.04, 1.91], than when it
involved other activities, g = 0.07 to 0.68, 95% CI [−0.85, 1.78].
Provision of Feedback
Spaced practice relative to massed practice improved immediate L2 perfor-
mance more when feedback was provided, g = 0.69, 95% CI [0.15, 1.23], than
when feedback was not provided, g = 0.02, 95% CI [−0.99, 1.02]. However,
spacing enhanced retention regardless of whether feedback was provided or
not. The effect when there was an absence of feedback should be interpreted
with caution due to small samples (k = 2 at immediate posttests and k = 3 at
delayed posttests in the spaced vs. massed comparison). Longer spacing pro-
duced better retention at delayed posttests when feedback was provided, g =
0.55, 95% CI [0.27, 0.82], than when feedback was not provided, g = 0.24,
95% CI [−0.41, 0.89]. The confidence intervals (95% CI [0.27, 0.82]) for the
presence of feedback did not include zero, suggesting that larger spacing be-
tween feedback and the subsequent trial promotes better retention. Feedback
did not have an impact on the comparative effectiveness of equal and expand-
ing spacing.
Feedback Timing
Spacing led to greater retention when feedback was provided with a delay,
g = 2.35, 95% CI [1.36, 3.34], than when feedback was immediately provided,
g = 0.52, 95% CI [0.10, 0.94]. However, the extreme effect size should be
interpreted with caution due to the small samples (k = 2 for delayed feedback).
Longer spacing led to significantly better retention when delayed feedback
was provided, g = 0.87, 95% CI [0.41, 1.34], than when immediate feedback
was provided, g = 0.39, 95% CI [0.08, 0.71]. An extremely small to negligible
effect in favor of equal spacing was found when immediate feedback was
provided, g = 0.04, 95% CI [−0.44, 0.52], and a small effect was found in
favor of expanding spacing when delayed feedback was provided, g = −0.36,
95% CI [−0.94, 0.22]. However, for both these effects the confidence intervals
crossed zero, indicating that these differential effects between equal and
expanding spacing regarding feedback timing are unlikely to be statistically
reliable.
Frequency of Practice
The random-effects meta-regression analyses showed a positive relationship
between frequency of practice and the immediate effects (i.e., the greater the
frequency of practice, the larger the spacing effects relative to massed practice
on immediate learning), but a negative relationship with the delayed effects
(i.e., the greater the frequency of practice, the smaller the spacing effects rel-
ative to massed practice in the long term). A negative relationship between
frequency of practice and effect sizes was found in the longer versus shorter
comparison (i.e., the greater the frequency of practice, the larger the effects for
shorter spacing). A negative relationship was also found in the equal versus
expanding comparison (i.e., the greater the frequency of practice, the larger
the expanding spacing effects). However, the effects of frequency of practice
in the three comparisons (spaced vs. massed, longer vs. shorter, and equal vs.
expanding) were small to negligible (not statistically significant).
Retention Interval
The random-effects meta-regression analyses showed a positive, albeit small
and negligible (not statistically significant), relationship between RI and ef-
fect sizes in the spaced versus massed comparison (i.e., the longer the RI, the
greater the spacing effects relative to massed practice). In the longer versus
shorter comparison, the analyses indicated that the longer the RI, the greater
the shorter spacing effects; however, the relationship was negligible (not
statistically significant). In the equal versus expanding comparison, the results
showed a significant negative relationship, indicating that the longer the RI,
the larger the effects of expanding spacing schedules.
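The direction of a meta-regression relationship (e.g., between RI or frequency
of practice and effect size) comes from the sign of a weighted slope. The
following is a simplified sketch, not the procedure used in the meta-analysis:
it fits a weighted least-squares line with weights 1 / (variance + tau²),
treating tau² as known. The function name and the data points are hypothetical
and are constructed only to show a negative slope.

```python
def weighted_slope(x, y, variances, tau2=0.0):
    """Weighted least-squares intercept and slope for effect sizes y on moderator x."""
    weights = [1.0 / (v + tau2) for v in variances]
    total = sum(weights)
    x_bar = sum(w * xi for w, xi in zip(weights, x)) / total
    y_bar = sum(w * yi for w, yi in zip(weights, y)) / total
    sxy = sum(w * (xi - x_bar) * (yi - y_bar) for w, xi, yi in zip(weights, x, y))
    sxx = sum(w * (xi - x_bar) ** 2 for w, xi in zip(weights, x))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Hypothetical data: retention interval in days and effect sizes (equal vs. expanding).
ri_days = [1, 7, 14, 28]
effects = [0.5 - 0.02 * d for d in ri_days]
sampling_variances = [0.04, 0.02, 0.05, 0.03]
intercept, slope = weighted_slope(ri_days, effects, sampling_variances, tau2=0.03)
print(round(intercept, 3), round(slope, 3))
```

A negative slope in the equal versus expanding comparison would correspond to
effects shifting toward expanding spacing as the RI lengthens, because equal
spacing is the treated condition in that comparison.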
Discussion
The analyses of comparative effects indicated that spaced practice was signifi-
cantly more effective for L2 learning (g = 0.58) and retention (g = 0.80) than
massed practice. It is notable that spaced practice can lead to better imme-
diate gains than massed practice. The benefits of massed learning have been
demonstrated at extremely short RIs (2 or 4 seconds, e.g., Peterson, Saltzman,
Hillner, & Land, 1962). Our finding contrasts with results obtained by Peterson
et al. (1962) and suggests that spaced practice is a more effective strategy than
massed practice to enhance learners’ L2 performance immediately. Our find-
ing is consistent with previous meta-analyses (Cepeda et al., 2006; Donovan &
Radosevich, 1999). Donovan and Radosevich (1999) found a mean weighted
effect size of 0.45, 95% CI [0.41, 0.50], for immediate learning and 0.51, 95%
CI [0.39, 0.64], for retention, indicating that spaced practice was significantly
more beneficial than massed practice for both immediate learning and reten-
tion. Cepeda et al. (2006) found positive effects of spaced practice at short
RIs ranging from 1 second to less than 1 day (averaged percentage correct on
the final test: 38.5% for massed practice, 47.6% for spaced practice) and at
longer RIs ranging from 1 day to more than 31 days (28.5% for massed prac-
tice, 47.4% for spaced practice). It is also important to note that the effects of
spacing are considered smaller than those of certain types of L2 instruction
(e.g., form-focused or implicit instruction). Norris and Ortega (2000) meta-
analyzed the effectiveness of L2 instruction (i.e., focus on form explicit and
implicit treatments, and focus on forms explicit and implicit treatments) and
found a large effect of all instructional treatments, d = 0.96, 95% CI [0.78,
1.14]. Although the benefits of spaced practice on L2 learning found in the
current meta-analysis were smaller (g = 0.58 to 0.80) than the effects of other
types of L2 instruction (e.g., focus on form explicit, focus on forms explicit)
found by Norris and Ortega, spacing can still be considered to be useful for L2
learning.
The analyses indicated that both shorter and longer spacing have initial
benefits, whereas longer spacing has a greater effect on durable learning.
Cepeda et al. (2006) also found a pattern with the greatest increases in reten-
tion at longer spacing. Consistent with the desirable difficulty framework (e.g.,
Bjork, 1994), better retention occurred under difficult conditions, such as after
longer spacing as opposed to shorter spacing. The overall magnitude of the
longer spacing effect (g = 0.40) from our findings is small to medium, in spite
of a number of previous memory studies (e.g., Cepeda et al., 2005) that have
demonstrated the benefits of longer spacing in the long term. This might be
because some inconsistency was shown regarding the effects of shorter and
longer spacing on L2 learning, suggesting that other variables affecting the
benefits of one type of practice over another could be observable in instructed
L2 learning.
The analyses also revealed that there were no significant differences
between equal and expanding spacing in either the immediate or the delayed
posttests. It should be noted that only a small number of studies included
learning relative to shorter spacing (g = −0.64). However, given that our study
sample size was small (k = 4), there would be value in further exploring the
effects of spacing on L2 pronunciation learning.
Longer spacing promoted better retention for L2 grammar than shorter
spacing. One explanation is that learners’ comprehension can be impaired
by shorter spacing between presentations of different (but related) types of
grammatical rules, leading to undesirable difficulties (Metcalfe, 2011). By
contrast, learners may devote more attention or processing effort under longer
spaced conditions (Jacoby, 1978). Interleaving can benefit the retention of
grammatical features (e.g., Nakata & Suzuki, 2019b). Interleaved practice requires that
learners repeatedly switch between different kinds of intervening tasks for
the target features, which improves discriminability (Taylor & Rohrer, 2010).
However, given that the number of blocked and interleaved practice studies
on grammar learning was small (Nakata & Suzuki, 2019b; Pan et al., 2019;
Suzuki et al., 2020), researchers should be cautious in interpreting the effects
of blocking and interleaving for L2 grammar learning. Shintani et al. (2013)
found large effects of comprehension-based instruction (e.g., error identifica-
tion) on receptive knowledge of L2 grammar, d = 1.09, 95% CI [0.64, 1.55],
and small effects of production-based instruction (e.g., translation) on produc-
tive knowledge, d = −0.21, 95% CI [−0.39, −0.02]. Shintani’s (2015) meta-
analysis revealed very large effects of processing instruction (e.g., structured
input activities) on receptive knowledge, d = 2.60, 95% CI [2.19, 3.00], and
productive knowledge, d = 2.03, 95% CI [1.65, 2.41], of L2 grammar. We
found a small-to-medium effect of spaced practice for L2 grammar learning
(g = 0.56 for overall effect; g = 0.88 for receptive knowledge, g = 0.42 for
productive knowledge), which is smaller than the effects found by Shintani et
al. (2013) for comprehension-based instruction and by Shintani (2015) for
processing instruction but larger than the effect Shintani et al. (2013) found
for production-based instruction (for details, see Table S7.2,
Appendix S7, in the Supporting Information online).
Third, spacing manipulated within one session promoted better immedi-
ate L2 performance than spacing manipulated between sessions, but spacing
manipulated between sessions led to better retention than spacing manipu-
lated within one session. Because within-session spacing inevitably involves
shorter spacing than between-session spacing, spaced practice within a single
session may support higher levels of retrieval success at immediate posttests
than spaced practice between multiple sessions. The effects of between-session
spacing on long-term retention support the distributed practice effect (e.g.,
Bahrick, Bahrick, Bahrick, & Bahrick, 1993), suggesting that longer spacing
(time intervals between multiple sessions are relatively longer than intervals
principle (e.g., Jacoby, 1978), the gradual expansion of spacing between learn-
ing opportunities can lead to greater contextual variation and serve to decrease
the accessibility of a target item but increase reprocessing of the item in spaced
repetitions. Overall, our findings suggest that the timing of the final posttest
and gradual expansion of the spacing interval between learning opportunities
(rather than the timing of the initial retrieval attempt) may have a profound ef-
fect on spaced practice. However, as only two studies controlled for the initial
retrieval attempt, more research is warranted to test this interpretation.
It is pertinent to mention that some of the results of the moderator analyses
(age, learning target, activity type, feedback timing) as interpreted above were
not statistically significant due to small study sample sizes. However, tentative
explanations were offered because the findings could be noteworthy, and we
hope that these explanations will provide some direction for future research
initiatives.
We turn now to the pedagogical implications of our findings. There are
many such implications for both young and adult L2 learners. First, teachers
may need to revisit target words over spaced time intervals. However, the analy-
ses indicated that it might be useful to space the learning of pronunciation rules
with shorter rather than longer spacing, specifically when the rules are not eas-
ily distinguished from each other. This may allow students the time needed to
recognize the patterns and fully comprehend the rules. Second, teachers may
need to revisit target words across a single session. For better retention, teach-
ers could use longer spacing within a single session and/or, for likely even
larger benefits, (also) space items over multiple days. Third, teachers may need
to intersperse spaced retrieval (i.e., tests) with some kind of restudying prac-
tice. For example, teachers could revisit target words that had not been cor-
rectly recalled by students when tested or could provide feedback with a delay
(e.g., feedback given after testing all items). Furthermore, there could be some
value in spaced learning with comprehension activities (e.g., reading sentences
or listening to words, followed by comprehension questions), but teachers may
need to make sure that the activities are desirably challenging for students and
that there is sufficient study (or presentation) time to help students fully com-
prehend target items or features (e.g., Hausman & Kornell, 2014).
learning that (a) involves young learners, (b) targets L2 grammar and pronun-
ciation learning, (c) includes production activities, (d) includes delayed feed-
back, and (e) measures productive knowledge. Moreover, there is a need for
clearer reporting of participants’ L2 proficiency (as also observed in the syn-
thesis by Park, Solon, Dehghan-Chaleshtori, & Ghanbar, 2021), which could
help teachers to understand how learner differences may interact with the ef-
fects of spaced practice. Although learners may be learning through the same
activities across and within courses, their L2 proficiency (and aptitude) will
vary. Differential effects of spacing might be expected for learners of one pro-
ficiency level as compared to learners of a different proficiency level in the
same learning condition (see Serrano, 2011). Finally, we were not able to rule
out publication bias in the current meta-analysis. Therefore, the overall effects
of spaced practice on L2 learning from the current synthesis should be inter-
preted with caution.
Conclusion
This meta-analysis revealed that although the spacing effect was robust, the
size was in the range of small to medium (g = 0.58) for immediate effects (i.e.,
immediately after the last training session) and medium to large (g = 0.80)
for delayed effects (i.e., a delay of one day or greater following the treatment).
It also revealed that longer spacing was more effective than shorter spacing
for long-term retention (small-to-medium effect, g = 0.40), but that learning
gains were not significantly different between the equal and expanding spacing
conditions. Some of the differences between the effects of different spacing
conditions were explained by particular variables (e.g., learning target, number
of sessions).
Notes
1 An anonymous reviewer pointed out that there were some studies (k = 12) that
involved different types of posttests (e.g., receptive and productive) administered as
immediate posttests, and that in such cases each different type of posttest could be
considered as a separate learning session when coding the frequency of practice. To
examine whether this affected the results, we did further analyses. We coded
multiple types of posttests as one learning session and also, separately, we coded
multiple types of posttests as separate learning sessions. We did the analyses in both
ways, and the results showed no difference (see Appendix S9 in the Supporting
Information online for details).
This article has earned Open Data and Open Materials badges for making pub-
licly available the digitally-shareable data and the components of the research
methods needed to reproduce the reported procedure and results. All data and
materials that the authors have used and have the right to share are available
at http://www.iris-database.org. All proprietary materials have been precisely
identified in the manuscript.
References
Note. The full reference list of the studies included in the meta-analysis is available in
Appendix S10 in the Supporting Information online.
Avery, N., & Marsden, E. J. (2019). A meta-analysis of sensitivity to grammatical
information during self-paced reading: Towards a framework of reference for
reading time effect sizes. Studies in Second Language Acquisition, 41(5),
1055–1087. Retrieved from https://doi.org/10.1017/S0272263119000196
Baddeley, A. (1999). Human memory: Theory and practice (rev. ed.). East Sussex:
Psychology Press. Retrieved from https://doi.org/10.1016/s0145-2134(00)00166-6
Baddeley, A., Eysenck, M. W., & Anderson, M. C. (2015). Memory (2nd ed). New
York: Psychology Press. Retrieved from https://doi.org/10.4324/9781315749860
Baddeley, A. D., Thomson, N., & Buchanan, M. (1975). Word length and the structure
of short-term memory. Journal of Verbal Learning and Verbal Behavior, 14,
575–589. Retrieved from https://doi.org/10.1016/S0022-5371(75)80045-4
Bahrick, H. P. (1979). Maintenance of knowledge: Questions about memory we forgot
to ask. Journal of Experimental Psychology: General, 108(3), 296–308. Retrieved
from https://doi.org/10.1037/0096-3445.108.3.296
Bahrick, H. P., Bahrick, L. E., Bahrick, A. S., & Bahrick, P. E. (1993). Maintenance of
foreign language vocabulary and the spacing effect. Psychological Science, 4(5),
316–321. Retrieved from https://doi.org/10.1111/j.1467-9280.1993.tb00571.x
Barcroft, J. (2007). Effects of opportunities for word retrieval during second language
vocabulary learning. Language Learning, 57(1), 35–56. Retrieved from
https://doi.org/10.1111/j.1467-9922.2007.00398.x
Bird, S. (2010). Effects of distributed practice on the acquisition of second language
English syntax. Applied Psycholinguistics, 31, 635–650. Retrieved from
http://doi.org/10.1017/S0142716410000172
Bjork, R. A. (1975). Retrieval as a memory modifier. In R. Solso (Ed.), Information
processing and cognition: The Loyola Symposium (pp. 123–144). Mahwah, NJ:
Lawrence Erlbaum Associates. Retrieved from https://doi.org/10.2307/1421430
Bjork, R. A. (1994). Memory and metamemory considerations in the training of
human beings. In J. Metcalfe & A. Shimamura (Eds.), Metacognition: Knowing
about knowing (pp. 185–205). Cambridge, MA: MIT Press. Retrieved from
https://doi.org/10.7551/mitpress/4561.003.0011
Bjork, R. A., & Bjork, E. L. (1992). A new theory of disuse and an old theory of
stimulus fluctuation. In A. Healy, S. Kosslyn, & R. Shiffrin (Eds.), From learning
processes to cognitive processes: Essays in honor of William K. Estes (Vol. 2, pp.
35–67). Hillsdale, NJ: Erlbaum.
Bloom, K. C., & Shuell, T. J. (1981). Effects of massed and distributed practice on the
learning and retention of second-language vocabulary. Journal of Educational
Research, 74(4), 245–248. Retrieved from
http://doi.org/10.1080/00220671.1981.10885317
Borenstein, M., Hedges, L. V., Higgins, J. P. T., & Rothstein, H. R. (2009).
Introduction to meta-analysis. Chichester: John Wiley and Sons, Ltd. Retrieved
from https://doi.org/10.1002/9780470743386
Borenstein, M., Hedges, L. V., Higgins, J. P. T., & Rothstein, H. R. (2013).
Comprehensive Meta-Analysis Version 3 [Software]. Englewood, NJ: Biostat, Inc.
Borenstein, M., Higgins, J. P. T., Hedges, L. V., & Rothstein, H. R. (2017). Basics of
meta-analysis: I² is not an absolute measure of heterogeneity. Research Synthesis
Methods, 8(1), 5–18. Retrieved from https://doi.org/10.1002/jrsm.1230
Brosvic, G. M., Epstein, M. L., Cook, M. J., & Dihoff, R. E. (2005). Efficacy of error
for the correction of initially incorrect assumptions and of feedback for the
affirmation of correct responding: Learning in the classroom. Psychological
Record, 55(3), 401–418. Retrieved from https://doi.org/10.1007/BF03395518
Butler, A. C., Karpicke, J. D., & Roediger, H. L. (2007). The effect of type and timing
of feedback on learning from multiple-choice tests. Journal of Experimental
Psychology: Applied, 13(4), 273–281. Retrieved from
https://doi.org/10.1037/1076-898X.13.4.273
Nakata, T., & Elgort, I. (2021). Effects of spacing on contextual vocabulary learning:
Spacing facilitates the acquisition of explicit, but not tacit, vocabulary knowledge.
Second Language Research, 37(2), 233–260. Retrieved from
http://doi.org/10.1177/0267658320927764
Nakata, T., & Suzuki, Y. (2019a). Effects of massing and spacing on the learning of
semantically related and unrelated words. Studies in Second Language Acquisition,
41(2), 287–311. Retrieved from http://doi.org/10.1017/S0272263118000219
Nakata, T., & Suzuki, Y. (2019b). Mixing grammar exercises facilitates long-term
retention: Effects of blocking, interleaving, and increasing practice. The Modern
Language Journal, 103(3), 629–647. Retrieved from
http://doi.org/10.1111/modl.12581
Norris, J. M., & Ortega, L. (2000). Effectiveness of L2 instruction: A research
synthesis and quantitative meta-analysis. Language Learning, 50(3), 417–528.
Retrieved from https://doi.org/10.1111/0023-8333.00136
Pan, S. C., Tajran, J., Lovelett, J., Osuna, J., & Rickard, T. C. (2019). Does interleaved
practice enhance foreign language learning? The effects of training schedule on
Spanish verb conjugation skills. Journal of Educational Psychology, 111,
1172–1188. Retrieved from http://doi.org/10.1037/edu0000336
Park, H. I., Solon, M., Dehghan-Chaleshtori, M., & Ghanbar, H. (2021). Proficiency
reporting practices in research on second language acquisition: Have we made any
progress? Language Learning, 72. Retrieved from
https://doi.org/10.1111/lang.12475
Pashler, H., Cepeda, N. J., Wixted, J. T., & Rohrer, D. (2005). When does feedback
facilitate learning of words? Journal of Experimental Psychology: Learning,
Memory, and Cognition, 31(1), 3–8. Retrieved from
https://doi.org/10.1037/0278-7393.31.1.3
Pashler, H., Zarow, G., & Triplett, B. (2003). Is temporal spacing of tests helpful even
when it inflates error rates? Journal of Experimental Psychology: Learning,
Memory, and Cognition, 29(6), 1051–1057. Retrieved from
http://doi.org/10.1037/0278-7393.29.6.1051
Patall, E. A., Cooper, H., & Robinson, J. C. (2008). The effects of choice on intrinsic
motivation and related outcomes: A meta-analysis of research findings.
Psychological Bulletin, 134(2), 270–300. Retrieved from
https://doi.org/10.1037/0033-2909.134.2.270
Peterson, L. R., Saltzman, D., Hillner, K., & Land, V. (1962). Recency and frequency
in paired-associate learning. Journal of Experimental Psychology, 63, 396–403.
Retrieved from https://doi.org/10.1037/h0043571
Pinker, S. (1998). Words and rules. Lingua, 106(1–4), 219–242. Retrieved from
https://doi.org/10.1016/S0024-3841(98)00035-7
Plonsky, L., & Oswald, F. L. (2014). How big is big? Interpreting effect sizes in L2
research. Language Learning, 64(4), 878–912. Retrieved from
https://doi.org/10.1111/lang.12079
Serrano, R., & Huang, H-Y. (2018). Learning vocabulary through assisted repeated
reading: How much time should there be between repetitions of the same text?
TESOL Quarterly, 52(4), 971–994. Retrieved from http://doi.org/10.1002/tesq.445
Shintani, N. (2015). The effectiveness of processing instruction and production-based
instruction on L2 grammar acquisition: A meta-analysis. Applied Linguistics, 36(3),
306–325. Retrieved from https://doi.org/10.1093/applin/amu067
Shintani, N., Li, S., & Ellis, R. (2013). Comprehension-based versus productive-based
grammar instruction: A meta-analysis of comparative studies. Language Learning,
63(2), 296–329. Retrieved from https://doi.org/10.1111/lang.12001
Snoder, P. (2017). Improving English learners’ productive collocation knowledge: The
effects of involvement load, spacing, and intentionality. TESL Canada Journal,
34(3), 140–164. Retrieved from http://doi.org/10.18806/tesl.v34i3.1277
Suzuki, Y. (2017). The optimal distribution of practice for the acquisition of L2
morphology: A conceptual replication and extension. Language Learning, 67(3),
512–545. Retrieved from http://doi.org/10.1111/lang.12236
Suzuki, Y. (2018). The role of procedural learning ability in automatization of L2
morphology under different learning schedules. Studies in Second Language
Acquisition, 40(4), 923–937. Retrieved from
https://doi.org/10.1017/S0272263117000249
Suzuki, Y. (2019). Individualization of practice distribution in second language
grammar learning: A role of metalinguistic rule rehearsal ability and working
memory capacity. Journal of Second Language Studies, 2(2), 170–197. Retrieved
from http://doi.org/10.1075/bct.116.02suz
Suzuki, Y., & DeKeyser, R. (2017a). Effects of distributed practice on the
proceduralization of morphology. Language Teaching Research, 21(2), 166–188.
Retrieved from http://doi.org/10.1177/1362168815617334
Suzuki, Y., & DeKeyser, R. (2017b). Exploratory research on second language practice
distribution: An aptitude × treatment interaction. Applied Psycholinguistics, 38(1),
27–56. Retrieved from https://doi.org/10.1017/S0142716416000084
Suzuki, Y., Nakata, T., & DeKeyser, R. M. (2019). The desirable difficulty framework
as a theoretical foundation for optimizing and researching second language
practice. The Modern Language Journal, 103(3), 713–720. Retrieved from
https://doi.org/10.1111/modl.12585
Suzuki, Y., Yokosawa, S., & Aline, D. (2020). The role of working memory in blocked
and interleaved grammar practice: Proceduralization of L2 syntax. Language
Teaching Research. Retrieved from http://doi.org/10.1177/1362168820913985
Taylor, K., & Rohrer, D. (2010). The effects of interleaved practice. Applied Cognitive
Psychology, 24(6), 837–848. Retrieved from https://doi.org/10.1002/acp.1598
Toppino, T. C., & Bloom, L. C. (2002). The spacing effect, free recall, and two-process
theory: A closer look. Journal of Experimental Psychology: Learning, Memory, and
Cognition, 28(3), 437–444. Retrieved from
https://doi.org/10.1037//0278-7393.28.3.437
Toppino, T. C., & DiGeorge, W. (1984). The spacing effect in free recall emerges with
development. Memory and Cognition, 12(2), 118–122. Retrieved from
https://doi.org/10.3758/bf03198425
Uchihara, T., Webb, S., & Yanagisawa, A. (2019). The effects of repetition on
incidental vocabulary learning: A meta-analysis of correlational studies. Language
Learning, 69(3), 559–599. Retrieved from https://doi.org/10.1111/lang.12343
Ullman, M. T. (2015). The declarative/procedural model: A neurobiologically
motivated theory of first and second language. In B. VanPatten & J. Williams
(Eds.), Theories in second language acquisition: An introduction (pp. 135–158).
New York: Routledge. Retrieved from https://doi.org/10.4324/9780429503986-7
van Aert, R. C. M., Wicherts, J. M., & van Assen, M. A. L. M. (2016). Conducting
meta-analyses based on p values: Reservations and recommendations for applying
p-uniform and p-curve. Perspectives on Psychological Science, 11(5), 713–729.
Retrieved from https://doi.org/10.1177/174569116659874
Verkoeijen, P. P. J. L., Rikers, R. M. J. P., & Özsoy, B. (2008). Distributed rereading
can hurt the spacing effect in text memory. Applied Cognitive Psychology, 22(5),
685–695. Retrieved from https://doi.org/10.1002/acp.1388
Wickelgren, W. A. (1972). Trace resistance and the decay of long-term memory.
Journal of Mathematical Psychology, 9(4), 418–455. Retrieved from
https://doi.org/10.1016/0022-2496(72)90015-6
Wilson, W. P. (1976). Developmental changes in the lag effect: An encoding
hypothesis for repeated word recall. Journal of Experimental Child Psychology,
22(1), 113–122. Retrieved from https://doi.org/10.1016/0022-0965(76)90094-1
Supporting Information
Additional Supporting Information may be found in the online version of this
article at the publisher’s website:
Things to Consider
• This meta-analysis showed significant effects of spaced practice on L2
vocabulary, grammar, and pronunciation learning. However, the majority of
studies examining spacing effects have investigated L2 vocabulary learning,
and there is a need for more research on the effects of spaced practice on L2
grammar and pronunciation.
• Spaced practice benefits L2 learning, but the effects seemed to depend on
what is being learned (e.g., learning target) and how the learning happens
(e.g., number of sessions, type of practice).
Materials, data, open access article: Coding sheet and data are publicly
available at http://www.iris-database.org.
How to cite this summary: Kim, S. K., & Webb, S. (2022). Spaced practice
effects in L2 learning. OASIS Summary of Kim & Webb (2022) in Language
Learning. https://oasis-database.org