May 2024
Naga City, Philippines
Copyright 2024, Kenrick John Harvell B. Nabus, Mary Angelette M. Remos, and Matthew Ethan Wood
The Senior Project entitled
developed by
and submitted in partial fulfillment of the requirements of their respective Bachelor of Science degrees
has been rigorously examined and recommended for approval and acceptance.
developed by
and submitted in partial fulfillment of the requirements of their respective Bachelor of Science degrees
is hereby approved and accepted by the Department of Computer Science, College of Computer
Studies, Ateneo de Naga University.
ABSTRACT
Educational games have been widely used to enhance mathematics education in K-12 settings.
However, the effectiveness of adaptive difficulty algorithms in educational games has not been well
explored. This study aims to conduct a comprehensive comparative analysis of Item Response
Theory (IRT) and Neural Network-Based Adaptive Difficulty Algorithms (NN-ADA) within the
context of educational games. The research focuses on math-based question-and-response gameplay,
which is an important aspect of educational game design. The findings of this study shed light
on the comparative effectiveness of IRT and NN-ADA in terms of student accuracy, response time,
and engagement across different difficulty levels. The results suggest that IRT is more effective in
enhancing the learning experience for students, especially in the classroom setting. This research
provides valuable insights for educators and game developers who are interested in using learning
games as an effective tool for improving math education.
We dedicate this research work to all of humanity.
ACKNOWLEDGEMENTS
The researchers thank everyone who helped them finish this thesis.
TABLE OF CONTENTS

1 Introduction
  1.1 Project Context
  1.2 Purpose and Description
  1.3 Research Questions
  1.4 Objectives
  1.5 Scope and Limitations

3 Technical Framework
  3.1 Terminologies
    3.1.1 Adaptive Difficulty
    3.1.2 Item Response Theory (IRT)
    3.1.3 Neural Network-Based Adaptive Difficulty Algorithms (NN-ADA)
    3.1.4 Educational Game
  3.2 Relevant Algorithms
    3.2.1 Item Response Theory (IRT)
    3.2.2 Neural Network-Based Adaptive Difficulty Algorithms (NN-ADA)
    3.2.3 Comparison of Item Response Theory (IRT) and Neural Network-Based Adaptive Difficulty Algorithms (NN-ADA)
  3.3 Software and Hardware Development Tools

4 Methodology
  4.1 Research Design
    4.1.1 Development Technique
    4.1.2 Experimental Approach
  4.2 Application Development
    4.2.1 Planning Phase
    4.2.2 Analysis Phase
    4.2.3 Design Phase
    4.2.4 Development Phase
    4.2.5 Testing and Maintenance Phase
    4.2.6 Test Cases
    4.2.7 Deployment Phase
  4.3 Data Collection
    4.3.1 Participants Profile and Selection
    4.3.2 Session Implementation
    4.3.3 Actual Game Implementation
    4.3.4 Key Data Points
  4.4 Data Analysis
    4.4.1 Descriptive Statistics
    4.4.2 Comparative Analysis
    4.4.3 Statistical Analysis
    4.4.4 Visualization
  4.5 Research Planning and Schedule

A Question Bank

B Survey Questions
  B.1 In-Game Survey Questions
  B.2 Post-Game Survey Questions
Chapter 1
Introduction
learning styles, ensuring a broader perspective on the two algorithms’ efficacy. This program will
dynamically adjust the game’s difficulty level based on the player’s performance, providing a con-
trolled environment for data collection and analysis. Metrics such as response time, accuracy, and
individualized learning progression will be systematically recorded and analyzed.
In an era where personalized and engaging learning experiences are increasingly valued, AI-driven adaptive difficulty stands out as a promising approach to enhancing educational games. This thesis
project aspires to shed light on the multifaceted implications of this technology on education. By
amalgamating educational theory, technological innovation, and cognitive psychology principles, this
research aims to provide educators and game developers with valuable insights into designing and
deploying AI-driven adaptive educational games.
Ultimately, the anticipated outcome of this study is to contribute to the ongoing discourse sur-
rounding the integration of AI and education, thereby paving the way for more effective and enjoy-
able learning journeys. Through a rigorous comparative analysis of IRT and NN-ADA, this research
seeks to inform educational leaders and relevant groups about the potential benefits and challenges
associated with these adaptive difficulty algorithms, ultimately advancing the field of AI-enhanced
education.
1.3 Research Questions

• To what extent do Item Response Theory (IRT) and Neural Network-Based Adaptive Difficulty Algorithms (NN-ADA) demonstrate variations in adaptability to performance levels within the educational game?

• What significant differences exist in the results of the game sessions between Item Response Theory (IRT) and Neural Network-Based Adaptive Difficulty Algorithms (NN-ADA), and how might these differences influence the choice between the two methods?
1.4 Objectives
The main objective of this research is to perform a comparative analysis of Item Response Theory
(IRT) and Neural Network-Based Adaptive Difficulty Algorithms (NN-ADA) to determine which
of these two adaptive difficulty algorithms proves more effective in enhancing the learning experi-
ence within an educational game environment, particularly focusing on math-based question and
answer gameplay. In order to achieve the main objective, the following specific objectives must be
accomplished:
• To design and develop a math-based educational game that will serve as the controlled envi-
ronment for testing IRT and NN-ADA;
• To gather and analyze quantitative data on player interactions within the game, including
accuracy rates, response times, and player engagement;
• To measure and compare the impact of IRT and NN-ADA on learning outcomes within the
gameplay sessions; and
• To derive insights into the strengths and weaknesses of both IRT and NN-ADA in the context
of educational game design.
Chapter 2

Review of Related Literature

In this chapter, the study explores the essential body of knowledge that underpins the comparative
analysis of Item Response Theory (IRT) and Neural Network-Based Adaptive Difficulty Algorithms
(NN-ADA) in the context of educational games. This chapter serves as a critical bridge between
existing theories, methodologies, and the current study, offering insights into the evolution and
application of adaptive difficulty algorithms in educational gaming. Through this comprehensive
review, the research seeks to identify key trends, challenges, and gaps in the literature, guiding the
investigation and interpretation of empirical findings in subsequent chapters.
enhances learning outcomes but also fosters increased student engagement. Adaptive learning sys-
tems create a more captivating and motivating educational environment by providing customized
learning experiences that align with individual needs and interests [8] [48]. An additional benefit lies
in the ability of adaptive learning to identify knowledge gaps and provide real-time feedback [8]. This
targeted focus on student understanding allows educators to address specific challenges promptly
and offer timely guidance. Moreover, adaptive learning transcends language barriers, making it
inclusive and suitable for diverse groups of learners [8]. This adaptability ensures that appropriate
learning content is provided to all students, irrespective of their background, improving accessibility
and inclusivity in education.
Digital games have been increasingly used to train executive function skills in learners of different
ages. One approach to enhancing the effectiveness of digital games is through adaptive difficulty
adjustment, which customizes gameplay based on the player’s performance. A study investigated
the effectiveness of adaptive difficulty adjustment in a digital game designed to develop executive
function skills in learners of different ages [43]. The results showed that both adaptive and non-
adaptive versions of the game resulted in increased shifting skills for all learners, with adolescents
scoring higher than pre-adolescents and early adolescents. However, a trend suggested that adaptive
treatment may be more effective for adolescents. This study highlights the potential of digital games
to train executive function skills and the importance of adaptive difficulty adjustment in customizing
gameplay for different learners [43].
Despite these merits, implementing adaptive difficulty adjustments still faces certain challenges.
Convincing educators of the advantages of adaptive learning, addressing technical and organizational hurdles, and ensuring adequate support for remote learners are among the obstacles [8] [33] [39].
Nevertheless, research indicates that adaptive learning remains an effective strategy for enhancing
student motivation and improving learning outcomes across various educational settings [47]. As
these challenges are addressed, the potential benefits of adaptive difficulty adjustments in education
continue to make a compelling case for their integration into contemporary learning environments.
Research has also shown that adaptive difficulty adjustments in educational computer games can lead to higher motivation and learning levels than
incremental difficulty adjustments [47]. Furthermore, the strength and direction of difficulty adap-
tation can affect the situational interest in game-based learning [28]. This may affect students in a
way that aids them in learning and mastering concepts they may have a weak grasp on.
Cognitive-behavioral motivation, including adaptive cognition and behavior, has been found
to have a statistically significant positive correlation with student engagement [51]. This means
that students with higher levels of adaptive cognition and behavior tend to be more engaged in
their educational activities. Furthermore, the effectiveness of adaptive difficulty adjustment on
the development of executive function skills for learners of different ages has been studied, with
results showing increases in the change of skills for all learners between the pre-test and post-test
measures [43]. This suggests that when educational challenges are appropriately adjusted to match
the learner’s abilities, it facilitates more effective learning and skill development.
Adaptive learning systems, such as adaptive quizzes, have been shown to improve student learn-
ing outcomes and increase student motivation and engagement [46]. By providing personalized
feedback and adjusting difficulty levels in real-time, adaptive learning systems ensure that students
are neither bored with overly simple tasks nor overwhelmed by overly difficult ones. This tailored
approach keeps students within their optimal learning zone, promoting continuous engagement and
improvement. Moreover, adaptive difficulty influences neural plasticity and training transfer, with
adaptive difficulty mediating the behavioral and neural effects of cognitive training [18]. Research
suggests that by adjusting the difficulty of tasks to match the learner’s current capabilities, adaptive
learning systems can enhance the brain’s ability to reorganize and form new neural connections,
a phenomenon known as neural plasticity [18]. This process is crucial for the effective transfer
of training, where skills and knowledge gained in one context are applied to different, real-world
situations.
Competitive agents and adaptive difficulty within educational video games have been studied,
with findings suggesting that adaptive approaches could optimize learning outcomes by addressing
individual differences and optimizing learning challenges [36]. This personalization ensures that each
learner faces challenges that are neither too easy nor too difficult, thereby maintaining an optimal learning
environment. The presence of competitive agents, which simulate real opponents, further enhances
engagement by adding a dynamic and interactive component to the learning experience. Further-
more, the frequency of difficulty adaptations in adaptive training systems has been examined, show-
ing that adapting difficulty based on performance can manage the intrinsic load of learners and affect
performance gains [31]. When difficulty is adapted frequently and appropriately, it prevents cogni-
tive overload, which can hinder learning, and instead maintains a balance that promotes sustained
engagement and continuous improvement.
In summary, adaptive difficulty performance in education is influenced by various factors, includ-
ing cognitive-behavioral motivation, situational interest, and the effectiveness of adaptive learning
systems. These influences are crucial in shaping students’ motivation, engagement, and learning
outcomes in educational settings.
One study applied IRT to examine students’ learning strategies and academic performance, proposing a framework for personalized student advising based on strat-
egy levels. The study contributes valuable insights for educators and researchers, emphasizing the
significance of tailored interventions for self-regulated learning to enhance academic outcomes. The
results highlighted the efficacy of IRT in elucidating nuanced aspects of student learning approaches.
Another study from Carnegie Mellon University integrated IRT with an educational game to
evaluate students’ skills and behaviors, showing that IRT can conduct an accurate evaluation by
considering the proportional difficulty of each question in the evaluated group [5] [24]. The study
introduces the Psychometric Profile Generator (PPG), a computational model utilizing Item Re-
sponse Theory (IRT) to create user profiles based on skill and behavior levels. Integrated with
an educational game using Brazilian Exam questions, the PPG was evaluated with 113 students
(average age: 11) in a Brazilian school. Results demonstrate the PPG’s accuracy, considering the
number of correct answers and question difficulty. The IRT model effectively analyzed data, proving
reliable in assessing students’ skills. The study concludes that IRT in educational game evaluation
provides an entertaining and dependable method for skill assessment, suggesting broader applica-
tions in computer engineering courses and organizational training with a focus on mobile learning.
Furthermore, IRT has been applied to create user profiles and evaluate students’ skills and behaviors
in educational games, demonstrating its potential for educational evaluation through games [5] [24].
These applications highlight the effectiveness of IRT in assessing and improving learning outcomes
in educational games.
In conclusion, Item Response Theory (IRT) proves to be a highly effective tool in the realm of
educational games, offering profound insights into students’ learning strategies and skill levels. These
findings underscore IRT’s potential to create personalized learning experiences, providing precise
assessments that inform tailored educational interventions. Despite implementation challenges, the
benefits of employing IRT in educational contexts are clear, offering significant enhancements in
student engagement and learning outcomes.
The study also highlights the unique impact of success and failure on self-esteem, especially in multiplayer modes. Furthermore, it explores how individual personality traits, such as extraversion, emotional stability, and openness to experience, influence players’ preferences for higher difficulty levels [11]. This underscores the need for game designers to consider player psychology and tailor difficulty levels accordingly.
The material also provided valuable guidelines for game designers when setting difficulty lev-
els, emphasizing the importance of maintaining a consistent level of challenge, gradually increasing
difficulty, introducing new mechanics effectively, and preventing players from feeling stuck. It un-
derscores the need to tailor the game’s difficulty to its target audience and communicate it clearly in
advertising. Additionally, the text discusses various game genres, from challenging ”soulslike” games
to story-driven ”walking simulators” and casual mobile games, highlighting the different expectations
and needs of players in each category [11]. Furthermore, the text stresses the significance of careful playtesting with representative players to fine-tune difficulty levels, whether the target audience comprises casual gamers, hardcore fans of challenging games, or a broader player base with varying preferences. Through well-designed playtests, developers can gain insights into player reactions and preferences regarding difficulty, enhancing the overall gaming experience and helping avoid negative feedback [11].
In conclusion, the study recognizes that determining the optimal game difficulty level is a complex
endeavor that requires careful consideration of game genre, player psychology, and target audience.
Properly reflecting these factors in promotional activities and conducting meticulous playtests are
essential for achieving the desired player experience and avoiding negative feedback.
involved measuring the learning performance of 100 junior high school students, focusing on concept
attainment, motivation, and cognitive load after engaging with the pop-up question video. The
results indicated positive outcomes, with a 74% concept attainment rate, suggesting a good level
of understanding among students [57]. Additionally, students’ motivation levels were high, with
84% falling into the ”good” category, indicating that the pop-up questions contributed positively
to their engagement with the material [57]. Meanwhile, the cognitive load percentage was low at 38%, indicating that students experienced relatively little mental strain when interacting with the video [57].
Overall, the findings suggest that integrating pop-up questions into educational videos can benefit
students’ motivation and cognitive load. By facilitating engagement and reducing mental strain,
pop-up questions enhance the learning experience, particularly in complex subjects like physics.
Furthermore, the study suggests that integrating pop-up questions into innovative learning models
could further optimize their impact on student learning outcomes.
The study ”Measuring Flow in Educational Games and Gamified Learning Environments” by Shernoff, Hamari, and Rowe delves into the intricate relationships between student engagement, flow,
and learning outcomes in educational gaming contexts. Student engagement is critical in educational
settings, directly influencing learning outcomes and academic performance. By exploring how factors
like interest, enjoyment, and concentration impact engagement levels during gameplay, the study
sheds light on the mechanisms through which students become deeply engaged in learning [49]. The
integration of Structural Equation Modeling (SEM) and psychometric surveys in this study provides
a robust framework for analyzing student engagement in educational games [49]. SEM allows for
developing models that depict the complex interconnections between engagement, flow, and learning
outcomes, offering a deeper understanding of the dynamics at play. Using the Experience Sampling
Method (ESM) and psychometric surveys, the researchers measured student engagement and flow,
capturing students’ interest levels, enjoyment, and concentration during gameplay [49]. The findings
of this study underscore the importance of fostering student engagement to enhance learning expe-
riences in educational games. By aligning the challenge levels presented in the game with students’
skill levels, educators and game designers can create environments that promote optimal engagement
and facilitate the flow state. This balance between challenge and skill enhances student motivation
and interest and contributes to deeper learning and knowledge acquisition. Overall, the research
conducted by Shernoff, Hamari, and Rowe highlights the significance of student engagement in edu-
cational gaming contexts and its impact on learning outcomes. By emphasizing the role of interest,
enjoyment, and concentration in fostering deep engagement, the study provides valuable insights
for educators and game designers seeking to create immersive and effective learning experiences for
students [49].
The study ”Adaptive quizzes to increase motivation, engagement, and learning outcomes in a first-year accounting unit” explored the use of adaptive quizzes to enhance student motivation, engagement,
and learning outcomes in an online first-year accounting unit. Adaptive learning offers personalized
learning opportunities tailored to individual student needs, potentially improving learning outcomes
and increasing student motivation and engagement [46]. The research findings indicated that while
the adaptive quizzes did not directly lead to significant improvements in student scores, students
overwhelmingly enjoyed using the quizzes for their learning. Most surveyed students expressed a pos-
itive attitude towards adaptive quizzes, with a high percentage agreeing that the quizzes were useful
and provided motivation to keep trying. This positive feedback suggests that adaptive quizzes can
enhance student motivation and engagement in the learning process [46]. The study highlighted the
importance of considering student preferences and satisfaction when implementing adaptive learning
technologies to optimize learning outcomes. While challenges exist in developing adaptive quizzes
that effectively increase student motivation and engagement while improving learning outcomes, fur-
ther research is needed to align these factors using adaptive release testing technologies. The study
acknowledged limitations such as the small sample size and the need for additional investigation
into why some students did not use the quizzes and whether quiz usage correlated with improved
performance [46]. Overall, the research contributes valuable insights into the potential benefits of
adaptive quizzes in enhancing student motivation and engagement in a first-year accounting unit.
The positive student perceptions of adaptive quizzes underscore the importance of incorporating
adaptive learning tools in educational settings to support student learning experiences. Further
research in this area can help refine the design and implementation of adaptive quizzes to better
align with student needs and optimize learning outcomes in higher education contexts.
These studies highlight the significant role of student engagement in educational games and
learning activities. By fostering collaborative learning experiences, aligning tasks with students’ pref-
erences, and integrating interactive elements like pop-up questions, educators enhance engagement,
motivation, and learning outcomes. Adaptive learning technologies further personalize learning, in-
creasing student satisfaction and participation. Understanding the interplay between engagement,
flow, and learning outcomes in gaming contexts underscores the importance of creating dynamic
and immersive learning environments. These insights can help educators and developers optimize
engagement strategies for enhanced learning experiences and improved outcomes.
Chapter 3

Technical Framework
This chapter lays the groundwork for the practical implementation and experimentation with Item
Response Theory (IRT) and Neural Network-Based Adaptive Difficulty Algorithms (NN-ADA) in
the context of educational games. This chapter serves as the bridge between theoretical concepts and practical application, describing the relevant algorithms, tools, development techniques, and methodologies essential for the study’s development, simulation, and data collection. By presenting
this technical framework, the research aims to provide transparency in integrating adaptive difficulty
algorithms into the educational game environment, ensuring the reliability and validity of subsequent
findings.
3.1 Terminologies
The following terms are used throughout this study. Because their meanings may vary depending on the context, their default definitions are provided below; unless otherwise specified, these default meanings should be assumed.
3.1.1 Adaptive Difficulty

The term ”adaptive difficulty” within the context of this research refers to the dynamic adjustment of the challenge level in an educational game. ”Adaptive,” as defined by [14], denotes the capacity to
change in response to changing conditions, and ”difficulty,” according to [32], refers to the
quality or state of being challenging, difficult to accomplish, deal with, or understand. In this study,
adaptive difficulty encompasses the application of both Item Response Theory and Neural Network-
Based Adaptive Difficulty Algorithms to tailor the educational game experience based on individual
learner performance.
3.1.2 Item Response Theory (IRT)

Item Response Theory (IRT) is a statistical and mathematical framework used in educational and
psychological evaluation to model how individuals respond to test items. It focuses on estimating
individuals’ latent traits or abilities based on their responses to a set of test items, considering the
difficulty and discrimination of each item [23]. In the context of educational game development and
adaptive algorithms, IRT provides a foundational understanding of how individuals’ abilities can be
assessed and modeled based on their interactions with in-game items or questions. IRT offers a way
to gauge a player’s skill or knowledge level by analyzing their responses within the game, which can
be instrumental in designing and implementing adaptive difficulty algorithms [44].
3.1.3 Neural Network-Based Adaptive Difficulty Algorithms (NN-ADA)

Neural networks, often referred to as artificial neural networks (ANNs) or simulated neural networks
(SNNs), constitute a fundamental component of machine learning and play a central role in deep
learning algorithms. Inspired by the structure and functioning of the human brain, they are com-
posed of layers of nodes that function as artificial neurons interconnected with others, characterized
by specific weights and thresholds. Neural networks rely on training data to enhance their accuracy
progressively, ultimately becoming formidable tools in computer science and artificial intelligence.
They significantly accelerate speech and image recognition tasks, dramatically reducing processing
time compared to manual human identification [25].
Neural Network-Based Adaptive Difficulty Algorithms (NN-ADA) are AI-driven algorithms that
use neural networks to dynamically adjust the difficulty level of a task or activity, such as in educa-
tional games. These algorithms analyze user behavior and performance data to tailor the challenge
level to the individual’s current skill or knowledge level [45]. In the context of educational game
development, NN-ADA represents a sophisticated approach to adaptive algorithms that harness the
power of neural networks to make real-time, data-driven adjustments to the gameplay experience
[45].
3.1.4 Educational Game

An educational game is a type of video game or interactive software designed to facilitate learning
or teach specific skills, knowledge, or concepts. These games integrate educational content with
engaging gameplay elements to enhance the learning experience [2].
3.2 Relevant Algorithms

3.2.1 Item Response Theory (IRT)

Item Response Theory (IRT) is a statistical framework with significant potential in educational
games. Unlike traditional test theory, which assumes fixed item difficulty, IRT considers the difficulty
of the items within the game and the players’ abilities [12]. This approach allows for more accurate
assessments of player abilities, making it a valuable tool for creating adaptive learning experiences
in educational games.
The researchers will employ item response theory (IRT) as a foundational framework to gauge
and adapt the difficulty of educational assessments. IRT is a statistical method widely used in the
field of educational measurement and assessment, allowing researchers to model how individuals
respond to test items based on their underlying abilities. This theory enables them to accurately
estimate a student’s proficiency level and the difficulty of each test item. The Two-Parameter Logistic (2PL) model, shown below, is one of the fundamental IRT models they will utilize.
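In its standard form, the 2PL model gives the probability that a player with ability θ answers item i correctly as

$$P_i(\theta) = \frac{1}{1 + e^{-a_i(\theta - b_i)}}$$

where $a_i$ is the item’s discrimination parameter and $b_i$ its difficulty parameter. (The notation here is the conventional one; the in-game implementation may parameterize it differently.)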
In IRT, each item in an educational game is associated with a characteristic curve that describes
the probability that a player will correctly answer it according to their ability level. The curve
Nabus, Remos, Wood Chapter 3. Technical Framework 21
helps define the item’s difficulty and discrimination properties [12]. The 3PL model holds particular
significance in this research due to its comprehensive representation of item characteristics. It incor-
porates three essential parameters: the item’s discrimination parameter (a), its difficulty parameter
(b), and a guessing parameter (c) [58]. The discrimination parameter measures how effectively an item distinguishes between individuals with differing abilities; the difficulty parameter indicates the ability level at which the response curve sits midway between the guessing floor and certainty; and the guessing parameter accounts for random guessing by participants.
into their adaptive difficulty algorithm, the researchers aim to establish a sophisticated and accurate
system that customizes educational content to suit the unique abilities of each learner. This ap-
proach enhances the precision of assessments and fosters a more individualized and effective learning
experience, addressing a critical need in modern education.
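For reference, the standard 3PL model extends the 2PL form with the guessing parameter as a lower asymptote:

$$P_i(\theta) = c_i + \frac{1 - c_i}{1 + e^{-a_i(\theta - b_i)}}$$

so that even a player of very low ability answers item i correctly with probability at least $c_i$.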
3.2.2 Neural Network-Based Adaptive Difficulty Algorithms (NN-ADA)

The central formula that underlies Neural Network-Based Adaptive Difficulty Algorithms involves
using neural networks to predict each player’s optimal level of challenge. Neural networks excel at
identifying intricate patterns in player data, allowing for precise adjustments to the game’s difficulty.
The essence of these algorithms lies in their ability to continuously learn and evolve based on the
player’s interactions within the educational game [6].
Neural networks within NN-ADA rely on sophisticated data processing and feature extraction
techniques. This involves collecting and analyzing various player metrics, including response times,
the correctness of answers, player interactions, and even physiological data when available. This
adaptability is made possible through collecting and analyzing gameplay data, which is then fed
into a neural network for predictive modeling. As players engage in the educational game, the
neural network refines its understanding of the player’s skill level and makes real-time adjustments
to the game experience [6].
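As a minimal sketch of this pipeline (plain Python with NumPy rather than the game’s Unity implementation; the feature set, network shape, and placeholder weights are illustrative assumptions), a small feed-forward network can map per-question gameplay features to an ability adjustment:

import numpy as np

class AbilityAdjustmentNet:
    """Tiny feed-forward network that predicts how much to raise or lower
    the player's estimated ability after each question. The weights are
    random placeholders; in practice they would be learned from recorded
    gameplay data."""

    def __init__(self, n_features=4, n_hidden=8, seed=0):
        rng = np.random.default_rng(seed)
        self.w1 = rng.normal(0.0, 0.5, (n_features, n_hidden))
        self.b1 = np.zeros(n_hidden)
        self.w2 = rng.normal(0.0, 0.5, (n_hidden, 1))
        self.b2 = np.zeros(1)

    def predict_delta(self, features):
        hidden = np.tanh(features @ self.w1 + self.b1)   # hidden layer
        out = np.tanh(hidden @ self.w2 + self.b2)        # squashed to [-1, 1]
        return float(out[0])

# Features per answered question: response time (s), correctness (0 or 1),
# question difficulty, and the current ability estimate.
net = AbilityAdjustmentNet()
ability = 0.0
features = np.array([6.2, 1.0, 0.5, ability])
ability += net.predict_delta(features)   # real-time update after each answer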
To assess the effectiveness of NN-ADA, the researchers will consider various evaluation metrics,
such as accuracy and precision, progress patterns, and adaptation speed. These metrics help quantify
the impact of the algorithm on learning outcomes and user experience.
Hence, the core concept of neural network-based adaptive difficulty algorithms in educational
games revolves around providing players with an optimal level of challenge. This means that when
players encounter questions that are too easy or too difficult, the algorithms step in to recalibrate the learning
experience [34]. For instance, if a player repeatedly struggles to grasp a specific concept or solve
a particular problem, the algorithm can detect this pattern and adjust the difficulty downward to
ensure a smoother learning progression. In contrast, when a player demonstrates mastery, and the
educational content becomes too predictable, the difficulty can be increased to maintain engagement
and excitement.
3.2.3 Comparison of Item Response Theory (IRT) and Neural Network-Based Adaptive Difficulty Algorithms (NN-ADA)

Algorithm Mechanism
Item Response Theory (IRT) operates as a statistical model, analyzing the characteristics of the item
against the responses of the players to gauge individual abilities [38]. Its strength lies in assessing
skill levels based on question responses, offering a structured method to estimate and adapt difficulty
[21]. This method enables the model to map the relationship between an individual’s ability and
the probability of giving correct responses to various items in a test or assessment.
On the other hand, Neural Network-Based Adaptive Difficulty Algorithms (NN-ADA) employ
neural networks to dynamically adjust difficulty levels [50]. NN-ADA’s agility stems from its real-
time learning ability, adapting swiftly to individual performance. These algorithms utilize neural
networks’ capacity to process vast amounts of data, enabling them to continuously analyze and
respond to a player’s actions or abilities during gameplay [1] [55].
Adaptability
The major comparison between the two algorithms in this research is the effectiveness of their ability to adapt to the player. IRT uses a fixed formula to estimate the likelihood of the player answering correctly, which can then be used to determine the adjustment to their ability. NN-ADA, by contrast, depends on the developer; it can be structured in many ways, but it is trained to learn on its own. In this case, the neural network learns how much a player’s ability estimate should increase or decrease when they answer a question correctly or incorrectly. This adaptability is the main dimension on which the two algorithms are compared.
Chapter 4

Methodology
This chapter will provide a detailed explanation of the development methodology that will be used
throughout this project. This will serve as a crucial foundation for understanding the systematic
approach that researchers will adhere to during the creation of the simulation. In this section, the
researchers outline the step-by-step procedures and the strong protocols established to effectively
tackle any potential challenges that might surface during the intricate development phase.
4.1 Research Design

4.1.1 Development Technique

In this study, the selection of the Agile methodology stems from its inherent adaptability and iterative nature, aligning well with the researchers’ dynamic objectives. Agile’s iterative approach
allows for continual re-assessment and adjustment throughout the study, ensuring prompt responses
to evolving requirements [3]. Its flexibility allows researchers to accommodate unforeseen challenges
or insights during the development and testing phases. In addition, Agile’s emphasis on regular
feedback loops ensures active participation and collaboration among team members, fostering a more
responsive and efficient workflow [7][58]. The incremental delivery of features and functionalities of
this methodology enables researchers to quickly incorporate insights gained during each iteration,
ultimately leading to a more refined and targeted final product.
4.1.2 Experimental Approach

In this study, the researchers employed an experimental research approach to investigate the comparative effectiveness of two distinct algorithms, Item Response Theory (IRT) and Neural Network-Based Adaptive Difficulty Algorithms (NN-ADA), within the context of an educational game. The
experimental approach allows researchers to exert control over a crucial component of the experi-
ment, the adaptive algorithm, and to modify it as necessary [54]. This manipulation enables the
researchers to observe and measure its effects on the primary outcome of interest. The experimental
approach is particularly well suited to the research objectives as it facilitates precise data collection
and analysis. Using the two algorithms as experimental tools, researchers can generate empirical
data essential to draw meaningful conclusions. This approach also allows the isolation of the impact
of each algorithm on the simulated player experiences, shedding light on their respective strengths
and weaknesses.
4.2 Application Development

4.2.1 Planning Phase

During the Planning Phase, project requirements and objectives will be identified, project scope will
be defined, limitations will be determined, feasibility studies will be conducted, and a project plan
will be created.
4.2.2 Analysis Phase

The Analysis Phase entails an examination of existing resources, including an in-depth analysis of prevalent structures within educational games, identifying key game features, assessing available tools and technologies, and critically recognizing essential design and development aspects.
4.2.3 Design Phase

The Design Phase involves the creation of a detailed design for the application, the development of
wireframes and mockups for the user interface, and the creation of technical specifications.
Mechanics
Parameters
Variables
Use-Case Diagram
Figure 4.2: Diagram showing the relationship between users and features of the quiz game
Figure 4.3: The flow of the program and sequence of the processes.
This is a visual representation of the intended user interface. The developers kept the interface deliberately simple, since the game serves the sole purpose of gathering data to test two adaptive algorithms.
In the representation, the circle represents the enemy, while the square represents the player. The
’New text’ area is designated for displaying the points after the quiz ends.
Key Features
• Questions & Multiple Choices: Presenting various questions with multiple answer options.
• Scoring Board: Displaying scores to track the player’s performance at the end of the game.
• Pop-up survey question: An in-app pop-up feature that measures students’ engagement during
gameplay.
• Adaptive Difficulty: Adjusting the difficulty level based on the algorithm and the player’s
performance or progression through the game.
• Simple UI: A clean and user-friendly interface that effectively presents questions and choices.
As part of the game mechanics, a series of math questions were integrated into the gameplay, designed
to span a range of mathematical topics and difficulty levels. These questions were carefully created by a former District Elementary Math Coordinator and polished by a Grade 6 Math teacher of Ateneo
de Naga University, ensuring the accuracy and quality of the questions, as well as the classification
of difficulty, time constraints for answering, and alignment with the grade 6 curriculum. The topics
range from basic arithmetic to more advanced concepts typically encountered throughout the school
year.
The difficulty of these questions is systematically classified, with easier levels focusing on funda-
mental concepts that require minimal computation, gradually progressing to more complex topics
that necessitate higher-order thinking skills and deeper understanding. For instance, the initial
levels may entail basic arithmetic operations such as addition and subtraction, while subsequent
levels explore topics like fractions, order of operations, and decimal operations, demanding greater
cognitive engagement and problem-solving abilities from the players.
The question bank comprises 10 levels of difficulty, each level containing 3 questions. All values
are deliberately kept low to ensure that students can potentially answer the questions mentally,
although the use of pen and paper is permitted if preferred. All questions are presented in a
multiple-choice format, offering 4 options for each.
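A minimal sketch of how such a bank can be represented (the field names and sample item are illustrative, not the authors’ actual schema or questions):

from dataclasses import dataclass

@dataclass
class Question:
    text: str
    choices: list        # exactly four answer options
    correct_index: int   # index of the correct choice
    level: int           # difficulty level, 1 (easiest) to 10 (hardest)

# Ten difficulty levels with three questions each: a 30-item bank.
question_bank = {level: [] for level in range(1, 11)}
question_bank[1].append(Question(
    text="What is 7 + 5?",
    choices=["10", "11", "12", "13"],
    correct_index=2,
    level=1,
))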
Engagement Measurement
In measuring the player’s engagement, researchers considered factors such as interest, enjoyment,
and concentration, which have been used in numerous ESM and SEM studies. The researchers
administered in-game and post-game surveys, each comprising three questions targeting interest,
enjoyment, and concentration. Each question utilized a rating scale ranging from ’1=Not At All’
to ’5=Very Much’ to assess player responses accurately. The in-game survey appeared as a pop-up
feature during gameplay, allowing players to provide immediate feedback. In contrast, the post-game survey was administered through printed questionnaires after the game ended.
4.2.4 Development Phase

Development of the application that utilizes both algorithms started with the Unity software. This is where the two versions of the game were made; both share most of the same UI code.
User Interface
The game’s UI is intentionally plain, as it was designed to prioritize function over aesthetics. The
start of the game shows the title and text fields that request the game user to input their first name
and last name, respectively, while also showing a start button to initiate the beginning of the game.
The game starts by displaying the first question; this question is based on the ability level that
was set by the developer. The answer options are displayed at the bottom of the screen. When
the user answers a question, the algorithm will provide the next question to be answered, and these
events will recur until all the questions are answered, at which point the score will be displayed and the answer buttons will no longer be shown.
A mid-game survey question is also implemented to measure player engagement during gameplay. This prompt will appear every five questions, and players will rate the
displayed survey question on a scale from one to five, with one being the lowest and five being the
highest.
Backend
The algorithms used are the major difference between the two games: the first uses Item Response Theory (IRT), and the second uses the Neural Network-Based Adaptive Difficulty Algorithm. IRT was implemented in the code using its standard formula, but with some additions.
Along with the formula, the absolute difference between the probability of a correct answer and 0.5 is computed; a question whose success probability is closest to 0.5 discriminates optimally between players of higher and lower ability. Once an answer is given, a learning rate is added to or subtracted from the player’s current ability level, depending on whether the answer was correct, to help determine the best question to present next.
The neural network algorithm was not as easy to translate into code, as it required a Unity package for machine learning, namely ML-Agents. This package helps developers build a machine-learning model that uses a neural network and the many layers that go into building one. The model that the package creates follows a convolutional neural network (CNN) design due to its need to learn from different inputs and observations. The core of the package is an agent, which is given specific observations to watch within the environment and then acts on them according to the instructions it was given. The observations the agent watches are the difficulty of the question, the player’s ability, and whether or not the player is correct. The action the agent performs is increasing or decreasing the player’s ability, which it changes depending on the observations mentioned previously. This algorithm uses a simple formula that utilizes the same variables as the IRT formula but in a less specific manner.
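Written out from the description that follows (the notation is a reconstruction reusing the IRT symbols above, not a verbatim excerpt of the implementation), the update takes the form

$$b_{\text{next}} = b + a\theta - c$$

where b is the current question difficulty, a the discrimination, θ the agent-adjusted player ability, and c the guessing rate.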
This equation determines the next difficulty by adding the product of the discrimination and the ability to the current difficulty; these two values interact to determine the increase or decrease in difficulty. The guessing rate is subtracted from the difficulty to account for the possibility that the player guessed the answer. The ability variable is the one the agent is trained to manipulate in order to provide a more accurate output
for what the next difficulty should be. The agent was trained by running the game and having it answer questions under different sets of instructions. The first set has the agent answer the question correctly, with the instructions varying by how long the answer takes; the agent is rewarded according to how quickly it answers. The next set has the agent answer questions incorrectly, again varying the response time, and here the agent is penalized because the answer is wrong. The last instruction is simply to let the timer run out entirely before the question is answered. Training the agent this way exposes it to many scenarios of getting questions right and wrong so that it can adjust the ability level accordingly. It was also meant to teach the agent that outputting a negative adjustment to the ability is not bad but rather appropriate, because its job is to match a question to the player according to their current ability level. This training was run five hundred thousand times to polish its accuracy.
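A schematic of this reward scheme in plain Python (the magnitudes are illustrative assumptions; the actual training was run through Unity’s ML-Agents package):

def reward(answered_correctly, response_time, time_limit):
    """Reward shaping for the training agent: fast correct answers earn the
    most, wrong answers are penalized, and letting the timer expire is
    treated as the worst outcome."""
    if response_time >= time_limit:              # timer ran out
        return -1.0
    if answered_correctly:
        return 1.0 - response_time / time_limit  # faster answer, larger reward
    return -0.5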
Algorithm Implementation
Within the game, IRT is used to dynamically adjust the difficulty of questions based on the player’s
responses. As the player progresses through the game, the algorithm continuously evaluates the
player’s ability level and selects questions that are appropriately challenging. The algorithm cal-
culates the probability of a player answering a question correctly based on their ability and the
question’s difficulty. Then it adjusts the difficulty level for subsequent questions to maintain an
optimal level of challenge.
In the game environment, NN-ADA observes parameters such as the player’s response time, the correctness of answers, the difficulty of the question, and the player’s ability level to predict the appropriate adjustment to the ability level. The algorithm then updates the player’s ability level, which in turn modifies the next question’s difficulty to optimize the learning experience for the player.
4.2.5 Testing and Maintenance Phase

The Testing Phase involves various testing types: unit testing, integration testing, system testing,
and acceptance testing. Concurrently, maintenance activities involve monitoring the application for
bugs and errors, continual updates, and ensuring compatibility with hardware and software tools.
Testing Approach
• Unit Testing: This will be conducted to scrutinize individual components of the simulation and
algorithms in isolation. This examination allows researchers to verify that each unit functions
as intended, free from dependencies on other components.
• Integration Testing: This will assess the seamless interaction between different modules within
the system. This involves testing how the algorithms integrate into the game environment and
ensuring they harmoniously interact with other game components.
• System Testing: This phase involves assessing the entire system as a whole. It verifies the col-
lective performance of the adaptive algorithms within the game environment. System testing
encompasses end-to-end evaluation, ensuring that the algorithms, alongside all game compo-
nents, operate cohesively, meeting the predetermined functional requirements and specifica-
tions.
• Usability Testing: This phase evaluates the game interface’s ease of use and user experience.
It involves participants completing specific tasks while researchers observe their interactions
and collect feedback. The data collected will inform iterative design improvements to enhance
user experience and address usability issues.
4.2.6 Test Cases

• Test Case 1: Ensure that the game starts without errors and presents the initial question.
• Test Case 2: Confirm that answers can be selected, and the game progresses accordingly.
• Test Case 3: Verify that the scoring mechanism accurately calculates the player’s score based
on their responses.
• Test Case 4: Verify that the document with the players’ results and data is in the correct
location.
• Test Case 5: Assess how well the Item Response Theory (IRT) algorithm adapts to the difficulty
of questions based on player responses.
• Test Case 6: Evaluate the Neural Network-Based Adaptive Difficulty Algorithm (NN-ADA)
in adjusting question difficulty to optimize the learning experience.
4.2.7 Deployment Phase

In the Deployment Phase, the finalized system, comprising the educational game integrated with the Item Response Theory (IRT) and Neural Network-Based Adaptive Difficulty Algorithms (NN-ADA), was introduced to the public for its intended implementation and use.
4.3 Data Collection

4.3.1 Participants Profile and Selection

The study involved Grade 6 participants from Ateneo de Naga University. Grade 6 students, typically
aged 11 to 12 in the Philippines, represent a critical stage in cognitive development, demonstrat-
ing enhanced problem-solving skills and critical thinking abilities. Their sensitivity to motivation
and engagement factors, particularly in the transition to adolescence, makes them ideal candidates
for evaluating the effectiveness of educational games and adaptive difficulty algorithms. Previous
successful studies with similar demographics further validate the suitability of Grade 6 students for
engaging in educational game settings. Additionally, the data disclosed in this study were restricted
to generalized forms, as per the confidentiality agreement with the grade school.
Participants Selection
The researchers aimed to engage 119 sixth-grade participants, representing the entire Grade 6 population. These students were divided into three categories based on their average grades in the Mathematics subject from the previous quarters:
Nineteen of these participants, drawn from across the three categories (Class A, Class B, Class C), were selected for the preliminary data gathering. The remaining 100 students participated in the actual data-gathering
phase. Within each category, students were further divided into two groups: one group participated
in gameplay utilizing Item Response Theory (IRT), while the other engaged with gameplay utilizing
Neural Network Adaptive Difficulty Algorithm (NN-ADA). Therefore, the target was 50 student participants for each algorithm.
The actual data-gathering phase of the study could not begin until some missing data were obtained, and those data could only come from a preliminary gathering phase. Nineteen students were chosen from Class A, Class B, and Class C to supply the missing data, namely the question discrimination and guessing values. These values can only be obtained by testing the students themselves. Specifically, the selected students answered all 30 questions from the question bank so that the missing values could be determined. Notably, the students wrote the letter “g” next to the questions they guessed on; this allowed the guessing value for each question to be determined. The students were monitored during the testing to preserve its integrity and prevent any form of cheating.
4.3.2 Session Implementation

During the actual data gathering phase, the targeted 100 student participants engaged in separate
sessions tailored for evaluating the IRT and NN algorithms independently. Each group took turns
using the computer laboratory during their scheduled times to avoid disrupting their regular class
schedules.
Two separate groups of students (drawn from Class A, Class B, and Class C, divided into two)
partook in these sessions: one group engaged with the IRT-based game, while the other interacted
with the NN-based counterpart. Despite the different groups, the sets of questions presented to each
group remained consistent. This approach ensures that despite involving distinct sets of students,
the comparison between the algorithms remains uniform and directly comparable.
4.3.3 Actual Game Implementation

Preparation
Before the game session began, the researchers prepared all computers and downloaded the game
application. Researchers supervised the gameplay sessions, while the attending teacher/s guided se-
lected student participants to the computer laboratory and assisted in maintaining order throughout
the session.
Gameplay Session
Clear instructions were provided to participants, guiding them through the gameplay experience.
Participants were not allowed to use a calculator; however, pen and paper were permitted for their calculations. Each student answered a total of 15 questions, with the specific questions chosen by the algorithm.
Three in-game question surveys appeared after the 4th, 8th, and 12th questions. Once students
finished, the program collected and recorded the necessary data.
Post-Game
After completing the game, students received a brief post-game survey questionnaire. Once they
finished the survey, they were free to leave the computer laboratory. The researchers then saved the
collected data from each computer to an external hard drive and returned the computers to their original state.
4.3.4 Key Data Points

During the gameplay sessions, various types of data were collected from the students and used to compare the two algorithms.
Player Responses:
• Accuracy Rate: Detailed records were collected regarding the accuracy of players’ responses
to the questions presented within the game. This included tracking the percentage of correct
answers given by each player.
• Response Time Across Varying Difficulty Levels: The time players took to respond to questions of different difficulty levels was recorded. This data enabled the evaluation of how swiftly players reacted to varying levels of challenge.
Gameplay Metrics:
• Question Response Time: Analysis of the time players took to respond to individual questions
was conducted, offering insights into their cognitive processing speed and efficiency.
• Player Engagement: To measure player engagement, a pop-up survey appeared during the
game. Additionally, a post-game survey was administered through a printed questionnaire.
These feedback mechanisms offered valuable insights into the overall levels of engagement
experienced by players during gameplay sessions.
Descriptive statistics were employed to summarize and describe the collected data. This included calculating measures such as the mean, median, mode, standard deviation, and range for player responses, accuracy rates, response times, and gameplay metrics across different difficulty levels.
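As a minimal sketch of these computations using Python's standard library (the values shown are hypothetical placeholders, not study data):

import statistics

# Hypothetical per-player accuracy rates (%) for one difficulty level.
accuracy = [73.1, 65.3, 85.7, 75.9, 44.7, 62.3]

print("mean   :", round(statistics.mean(accuracy), 2))
print("median :", round(statistics.median(accuracy), 2))
print("mode   :", statistics.mode(accuracy))   # first most-common value
print("stdev  :", round(statistics.stdev(accuracy), 2))
print("range  :", round(max(accuracy) - min(accuracy), 2))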
Researchers analyzed the accuracy rates of players’ responses under both IRT and NN-ADA con-
ditions. By comparing the percentage of correct answers given by players across different difficulty
levels, researchers can assess the algorithms’ ability to dynamically adjust question difficulty while
maintaining an optimal level of challenge. Any significant disparities in accuracy rates between
the two algorithms may indicate variations in their effectiveness in calibrating difficulty levels to
individual player abilities.
• Measurement Criteria: Calculate the percentage of correct answers each player gave out of the
total questions attempted.
• Formula (a computational sketch follows this list):

\[ \text{Accuracy Rate} = \frac{\text{Number of Correct Answers}}{\text{Total Questions Attempted}} \times 100\% \]
– Number of Correct Answers represents the total number of questions answered correctly
by players.
– Total Questions Attempted represents the total number of questions attempted by players.
• Interpretations:
– If the accuracy rate is consistently higher for one algorithm across all difficulty levels,
it suggests that the algorithm is better at adjusting question difficulty to match player
abilities.
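As referenced above, a minimal computational sketch of this calculation (the tallies are hypothetical; Python is used purely for illustration):

def accuracy_rate(num_correct: int, num_attempted: int) -> float:
    """Number of Correct Answers / Total Questions Attempted x 100%."""
    return 0.0 if num_attempted == 0 else num_correct / num_attempted * 100.0

# Hypothetical (correct, attempted) tallies per difficulty level for one group.
per_level = {1: (19, 26), 2: (17, 26), 3: (24, 28)}
for level, (correct, attempted) in sorted(per_level.items()):
    print(f"Level {level}: {accuracy_rate(correct, attempted):.2f}%")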
The analysis of players’ response times to questions of varying difficulty levels is a proxy for assess-
ing cognitive processing speed and efficiency within the game. Researchers calculated the average
response times per difficulty level and compared them between IRT and NN-ADA conditions. Sig-
nificant variations in response times may indicate differences in players’ cognitive engagement and
problem-solving strategies under different adaptive algorithms.
• Measurement Criteria: Record the time taken by players to respond to questions of different
difficulty levels (e.g., level 1, level 2, . . . ). Calculate the average response time for each difficulty
level to evaluate how swiftly players react to varying levels of challenge.
• Formula (a computational sketch follows this list):

\[ \text{Average Response Time per Difficulty Level} = \frac{\sum_{i=1}^{n} \text{ResponseTime}_i}{n} \]

– \(\text{ResponseTime}_i\) represents the response time for each individual question \(i\).
– \(n\) represents the total number of questions answered for a particular difficulty level.
• Interpretation:
– Longer response times for one algorithm may suggest that it is presenting overly chal-
lenging questions, leading to increased cognitive load.
– Consistently shorter response times under one algorithm may indicate that it effectively
matches question difficulty to player abilities, facilitating faster decision-making.
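As referenced above, a minimal sketch of this per-level averaging (the log entries are hypothetical, not study data):

from collections import defaultdict

# Hypothetical response log: (difficulty_level, response_time_seconds).
log = [(1, 22.5), (1, 19.0), (2, 27.3), (2, 31.1), (3, 35.8), (3, 40.2)]

by_level = defaultdict(list)
for level, seconds in log:
    by_level[level].append(seconds)

# Average Response Time per Difficulty Level = sum of times / n.
for level in sorted(by_level):
    avg = sum(by_level[level]) / len(by_level[level])
    print(f"Level {level}: average response time = {avg:.1f} s")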
The ratings provided by players in response to in-game and post-game survey questions measuring
interest, enjoyment, and concentration will be analyzed. These ratings will be compared between
IRT and NN-ADA conditions to identify significant differences in player experiences under each
algorithm.
• Measurement Criteria: Calculate the average rating for each survey question (e.g., interest,
enjoyment, concentration) separately under IRT and NN-ADA conditions. Collect ratings
provided by players in the in-app survey and post-game survey, ranging from ‘1=Not At All’,
‘2=A Little’, ‘3=Somewhat’, ‘4=Pretty Much’, to ‘5=Very Much’.
• Formula (a computational sketch follows this list):

\[ \text{Average Rating} = \frac{\sum_{i=1}^{n} \text{Rating}_i}{n} \]

– \(\text{Rating}_i\) represents the rating provided by each player for a specific survey question.
– \(n\) represents the total number of survey responses collected for that question.
• Interpretation
– Higher average ratings for interest, enjoyment, and concentration under one algorithm
suggest that it may offer a more engaging and immersive gameplay experience.
– Significant variations in average ratings between algorithms indicate areas where one
algorithm may excel in enhancing player motivation and focus.
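As referenced above, a minimal sketch of how the survey's Likert labels could be mapped to numeric ratings and averaged (the responses shown are hypothetical):

# Mapping from the survey's Likert labels to numeric ratings.
LIKERT = {"Not At All": 1, "A Little": 2, "Somewhat": 3,
          "Pretty Much": 4, "Very Much": 5}

# Hypothetical responses to one survey question (e.g., enjoyment).
responses = ["Pretty Much", "Somewhat", "Very Much", "Pretty Much"]

ratings = [LIKERT[r] for r in responses]
print(f"Average rating: {sum(ratings) / len(ratings):.2f}")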
Correlation Analysis
Researchers will conduct correlation analysis to explore potential relationships between different
performance metrics, such as accuracy rates, response times, and engagement ratings. Understanding
these correlations can help identify factors contributing to effective adaptive learning experiences
and inform strategies for optimizing player engagement and learning outcomes.
The formula below divides the covariance of the two variables by the product of their standard deviations; a computational sketch follows the interpretation bullets below. A correlation coefficient close to 1 indicates a strong positive correlation, a coefficient close to -1 indicates a strong negative correlation, and a coefficient close to 0 indicates no linear correlation.
\[ r = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum (X_i - \bar{X})^2 \sum (Y_i - \bar{Y})^2}} \]
• r represents the correlation coefficient, which measures the strength and direction of the linear
relationship between two variables.
• Xi and Yi represent individual data points of the two variables being analyzed.
Interpretation:
• Positive correlations between accuracy rates and engagement ratings suggest that players who
perform well are more likely to be engaged with the game.
• Negative correlations between response times and accuracy rates may indicate that faster
responses are associated with lower accuracy, potentially due to guessing or lack of careful
consideration.
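As referenced above, a minimal sketch of this correlation computation (the paired values are hypothetical, not study data):

import math

def pearson_r(xs, ys):
    """Sample Pearson correlation coefficient for paired observations."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sy = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0

# Hypothetical paired per-player values: accuracy (%) vs. engagement rating.
acc = [60.0, 73.3, 46.7, 86.7, 66.7]
eng = [3.3, 4.0, 2.7, 4.3, 3.7]
print(f"r = {pearson_r(acc, eng):.3f}")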
4.4.4 Visualization
To enhance the clarity and comprehensibility of the findings, the researchers will employ data visu-
alization techniques, such as graphs and charts, to illustrate the performance metrics and differences
between the algorithms.
Chapter 5
Results and Discussion
This chapter provides a detailed examination of the outcomes stemming from applying adaptive
difficulty algorithms in educational games. Through a systematic analysis of empirical data, the
researchers reveal how these algorithms impact various aspects of learning. By synthesizing findings
from both quantitative analysis and qualitative insights, this chapter offers valuable insights into
the practical implications of integrating adaptive difficulty algorithms into educational contexts,
informing future research directions and instructional practices.
Level      1      2      3      4      5      6      7      8      9      10
IRT      73.07  65.27  85.71  75.86  44.68  62.26  69.56  24.13  72.72  63.01
NN-ADA   86.95  83.01  91.52  72.41  43.95  79.68  59.72  15.58  50.94  40.42

Table 5.1: Comparison of Accuracy Rates (%) for IRT & NN-ADA across Difficulty Levels
Figure 5.1: Comparison of Accuracy Rates for IRT & NN-ADA (Line Graph)
The figure above is a line graph that depicts the comparison between Item Response Theory
and Neural Network-Based Adaptive Difficulty Algorithms in terms of accuracy rate across different
difficulty levels. The x-axis represents each of the difficulty levels out of 10. The y-axis represents the
average accuracy percentage of players. The table above shows the final calculations for each level under both algorithms; each point is computed using the accuracy rate formula stated in Chapter 4.
Based on the results, it is evident that both IRT and NN-ADA exhibit varying degrees of ef-
fectiveness in adjusting question difficulty to align with player abilities. With 43 players engaged
in the IRT group and 46 in the NN-ADA group, each participant answered a total of 15 questions.
Notably, NN-ADA better calibrates questions to match player proficiency in the lower levels 1 to
3. However, as the difficulty increases, the accuracy rates for both algorithms exhibit fluctuations,
with IRT occasionally surpassing NN-ADA in accuracy. Moreover, at the highest difficulty levels 7 to 10, NN-ADA shows markedly lower accuracy rates compared to IRT, indicating that IRT selects questions better matched to player ability.
NN-ADA excels in calibrating questions at lower difficulty levels, while IRT demonstrates superior
performance as difficulty increases. This disparity can be attributed to differences in algorithm de-
sign, question calibration methods, and cognitive load. NN-ADA’s agility in adjusting difficulty levels
may lead to rapid but occasionally inaccurate adjustments, affecting accuracy rates. In contrast,
IRT’s stability and consistency in adapting to question difficulty contribute to enhanced accuracy
across varying difficulty levels, ensuring an optimal balance between challenge and manageability.
The figure above is a bar graph showing the overall accuracy rate for IRT and NN-ADA. Across
the ten difficulty levels, IRT demonstrates higher accuracy rates in six out of ten levels compared to
NN-ADA. Nonetheless, as noted previously, NN-ADA holds the advantage of higher accuracy rates over IRT in the lower stages, while IRT adjusts better than NN-ADA as the difficulty level increases.
The figure above is a line graph that shows the relationship between difficulty level and response times when students answer the questions at each level for both IRT and NN-ADA. The x-axis represents the difficulty level; the y-axis represents the average response time of students per difficulty level for IRT and NN-ADA. This graph also indicates how well each algorithm selects questions whose estimated difficulty matches the students' ability to answer them.
Results indicate that, overall, response times tend to increase as the difficulty level progresses
for both algorithms, aligning with the idea that more challenging questions require additional time
for processing and deliberation. Based on the results, students took longer to answer the questions
in the lower levels 1 to 3 for IRT. This suggests that NN-ADA may present questions perceived as
more manageable by players during the lower stages of gameplay. Interestingly, IRT demonstrates
relatively stable response times across most difficulty levels, with minor fluctuations observed. This
consistency indicates that IRT effectively adjusts question difficulty to maintain a consistent level
of cognitive demand on players. In contrast, NN-ADA exhibits slightly more variability in response
times, particularly evident in the higher difficulty levels. This variability may suggest occasional mis-
matches between question difficulty and player abilities, resulting in fluctuations in response times.
While NN-ADA appears to offer quicker decision-making in the lower stages, IRT demonstrates
stability and consistency in adapting to question difficulty across all levels.
The graph for Response Time not only indicates how long students take to answer questions at
different difficulty levels but also offers insights into how challenging they find each question. From difficulty levels 1 to 4, the two groups show a clear difference in response times, yet both exhibit a similar rate of increase: as the difficulty level rises, so does the response time. A similar pattern holds from levels 5 to 9, although response times dip between levels 5 and 6 before resuming a steady climb; both groups may have found the level 6 questions easier than those at level 5. Moreover, the two curves intersect at level 4, where both algorithms show an average response time of 36 seconds, suggesting that students under either algorithm found the questions at that level about equally difficult. Lastly, the response times at level 10 diverge sharply: IRT's average response time decreased to 31 seconds while NN-ADA's increased to 43 seconds. Based on this result, it may be deduced that students participating in IRT
may have found the questions a little easier compared to students from NN-ADA who may have
found the questions a little more difficult. Hence, IRT more effectively adjusted the difficulty level
based on student performance compared to NN-ADA, which struggled to match question difficulty to
students’ abilities. The result for Response Time reveals that as question difficulty increases, so does
the response time, with notable patterns and variations suggesting differences in how challenging
students found each level, particularly at higher difficulties where IRT participants appeared to find
the questions easier than those using NN-ADA.
The observed increase in response times as difficulty levels progress underscores the relationship
between question complexity and processing time. NN-ADA’s ability to present questions perceived
as more manageable at lower stages may contribute to shorter response times than IRT. However,
IRT’s stability in adapting to question difficulty ensures relatively consistent response times, high-
lighting its effectiveness in maintaining cognitive demand on players.
The figure above is a bar graph that compares IRT and NN-ADA in terms of engagement. The x-axis shows the factors comprising engagement (interest, enjoyment, and concentration), while the y-axis represents the average rating for each criterion under both algorithms. This graph indicates which algorithm students found more engaging.
Based on the results, it is evident that students tend to exhibit slightly higher levels of interest, enjoyment, and concentration when engaging with the IRT-based educational game. The marginal differences between the two algorithms suggest that while both are effective in sustaining student engagement, IRT may have a slight edge in eliciting more positive experiences across these dimensions. This also suggests that IRT presented questions whose level of challenge matched the students' ability to answer them.
The graph for the engagement rating highlights notable differences between the two algorithms,
providing insights into the students’ mental states while answering quiz game questions across cat-
egories such as interest, enjoyment, and concentration. Firstly, in the category of interest, students
using IRT showed higher levels of interest in answering questions than those using NN-ADA. This
heightened interest may stem from the adaptive nature of IRT, which tailors questions more closely
to the students’ abilities and keeps them engaged. Secondly, IRT students also scored higher in
enjoyment. This increased enjoyment likely results from IRT’s ability to deliver questions that
strike a balance between being challenging yet achievable, thus providing a sense of accomplishment
and satisfaction. Lastly, in the concentration category, IRT students again outperformed NN-ADA
students. This can be attributed to the tailored difficulty of IRT questions, which align with the stu-
dents’ existing knowledge and skills, allowing them to focus better and systematically work through
the questions. The ability to engage with appropriately challenging material likely helped maintain
their concentration and enhance their overall learning experience. Thus, IRT’s adaptive approach
fosters greater interest, enjoyment, and concentration among students, leading to a more engaging
and effective educational experience.
Figure 5.5: Overall Comparison of Engagement Rating between IRT & NN-ADA
The figure above is a bar graph that shows the comparison between IRT and NN-ADA in terms of overall engagement rating. The three factors' ratings (interest, enjoyment, and concentration) have been combined and averaged to obtain the overall engagement score. With an overall
engagement rating of 3.75 for IRT and 3.46 for NN-ADA, it is evident that students generally
perceive the IRT-based game to be slightly more engaging than its NN-ADA counterpart. While
both algorithms demonstrate efficacy in sustaining student engagement, the higher average rating
for IRT suggests that it may offer a more immersive and captivating gameplay experience overall.
This may also suggest that IRT adapted questions more appropriately, allowing students to answer questions at a level of challenge suited to their knowledge and ability. Furthermore,
this can be attributed to IRT’s tailored difficulty levels, which align closely with students’ abilities
and preferences, fostering a sense of accomplishment and satisfaction. In contrast, NN-ADA’s rapid
adjustments may occasionally lead to mismatches between question difficulty and student proficiency,
affecting overall engagement levels.
The correlation between accuracy rates and overall engagement scores, between accuracy rates and response times, and between response times and overall engagement scores is 0 for both the IRT and NN-ADA groups, indicating no linear relationship between any of these pairs of variables.
Strengths and weaknesses emerge for each algorithm; IRT demonstrates stability and consistency
but may have longer response times at lower levels, while NN-ADA offers faster decisions but exhibits
more variability in response times. These differences could be attributed to variances in algorithm
design, question calibration methods, and their impact on player engagement.
In summary, while NN-ADA may excel in providing suitable questions at lower difficulty lev-
els, IRT demonstrates superiority as difficulty increases, offering more appropriate questions with
shorter response times, and fostering higher levels of student engagement. These findings suggest the
potential benefits of employing IRT over NN-ADA in educational settings, particularly in scenarios
where the difficulty of the question varies.
Several factors may explain the low accuracy observed at the higher difficulty levels: students may not have been fully engaged or motivated during the lesson, there was insufficient practice or reinforcement of the concepts, the
instructional materials used were inadequate or not aligned with students’ learning styles, or there
may have been external factors such as stress or distractions affecting students’ ability to learn
and retain the information. The differences in accuracy rates between IRT and NN-ADA highlight
potential gaps in students’ knowledge or practice with high-difficulty questions, underscoring the
need for further investigation into factors such as instructional quality, engagement, and learning
environment to enhance understanding and retention in mathematics.
Chapter 6
Conclusion and Recommendations
In this concluding chapter, the study comes full circle as key findings are synthesized and implications
are drawn. The chapter reflects on the effectiveness of adaptive difficulty algorithms in educational
games, highlighting their potential to enhance learning outcomes. Additionally, practical recom-
mendations are offered for educators, game developers, and researchers to leverage these algorithms
effectively. By identifying areas for further exploration and suggesting actionable steps, this chapter
aims to guide future endeavors in the realm of adaptive technologies in education, ultimately striving
for continuous improvement in instructional practices and learning experiences.
6.1 Conclusion
In this study, researchers compared the effectiveness of Item Response Theory (IRT) and Neural
Network-Based Adaptive Difficulty Algorithms (NN-ADA) within the realm of educational gaming,
specifically focusing on math-based question-and-answer gameplay. The aim was to discern which
of these adaptive difficulty methodologies better enhances students’ learning experience within an
educational game environment.
Through the development of a math-based question-and-answer game incorporating both IRT
and NN-ADA approaches, comprehensive gameplay sessions were conducted with Grade Six students
at Ateneo de Naga University. The data collected and analyzed provided valuable insights into the
performance and adaptability of these algorithms.
The findings suggest that both IRT and NN-ADA contribute to a tailored learning experience by
dynamically adjusting question difficulty levels based on player performance. However, nuances exist
in their effectiveness and adaptability across different difficulty levels. While NN-ADA may excel in
calibrating questions for easier stages, IRT demonstrates superior adjustment to player abilities as
the difficulty increases. Moreover, analysis of response times indicated that IRT tended to provide
questions that matched students’ abilities more closely, resulting in shorter response times than NN-
ADA in higher difficulty levels. This suggests that IRT may offer a more finely tailored challenge
level, promoting engagement and efficiency in learning.
Furthermore, considerations of player engagement reveal differences between IRT and NN-ADA.
While both algorithms sustain engagement to varying degrees, IRT evokes slightly higher levels
of interest, enjoyment, and concentration among players. This aspect highlights the pivotal role
of player experience in educational game design and underscores the potential impact of adaptive
difficulty algorithms on user engagement.
The strengths and weaknesses of each algorithm offer valuable insights for educational game
designers and developers. IRT demonstrates stability and consistency in adapting to question dif-
ficulty, albeit with longer response times in lower levels. On the other hand, NN-ADA may offer
quicker decision-making in easier stages but exhibits more variability in response times, along with
lower accuracy rates at higher difficulty levels.
While this study contributes valuable insights into the comparative effectiveness of IRT and NN-
ADA, it is not without limitations. Challenges in data gathering, including constraints on accessing
student populations and scheduling issues, highlight the complexities inherent in empirical research
within educational settings. Despite these limitations, this study offers meaningful implications for
designing and implementing educational games. By leveraging adaptive difficulty algorithms such
as IRT, developers can tailor learning experiences to individual student needs, thereby optimizing
engagement and learning outcomes.
The study’s findings align with each algorithm’s strengths and characteristics. IRT (Item Re-
sponse Theory) is a traditional psychometric model that estimates the ability of a player based
on their responses to questions. It tends to perform well when dealing with higher difficulty ques-
tions because it’s designed to accurately measure proficiency, especially in situations with a clear
progression of difficulty levels. On the other hand, NN-ADA (Neural Network Adaptive Difficulty
Algorithm) likely utilizes machine learning techniques to adapt to the player’s performance in real-
time, which can be advantageous at lower difficulty levels where there might be more variability
in player responses and where quick adaptation is crucial for engagement. So, the results align
with the strengths and characteristics of each algorithm, reflecting their performance under different
conditions within the game.
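For reference, and assuming the three-parameter logistic (3PL) form implied by the discrimination and guessing values gathered during the preliminary phase (the exact parameterization used in the game is not restated here), the probability that a player of ability \(\theta\) answers question \(i\) correctly can be written as

\[ P_i(\theta) = c_i + \frac{1 - c_i}{1 + e^{-a_i(\theta - b_i)}} \]

where \(a_i\) is the discrimination, \(b_i\) the difficulty, and \(c_i\) the guessing parameter of question \(i\). A higher \(a_i\) makes an item more sharply diagnostic of ability, which is consistent with IRT's observed stability at the higher difficulty levels.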
In conclusion, this research advances the understanding of adaptive difficulty algorithms within
educational gaming contexts and underscores the potential of IRT to enhance the learning experi-
ence for students. As technology continues to play an increasingly integral role in education, fur-
ther exploration of adaptive methodologies holds promise for advancing technology-assisted learning
methodologies.
6.2 Recommendations
To further enhance the validity and applicability of the findings from this study, several recommen-
dations are proposed, encompassing areas such as sampling size, game features, demographic testing,
algorithm training processes, and expanding the subject focus.
Firstly, expanding the sampling size of participants would significantly bolster the robustness of
research outcomes. Although this study involved Grade Six students from Ateneo de Naga University,
involving a larger and more diverse pool of participants from multiple educational institutions would
offer greater generalizability and enable a more comprehensive analysis of the effectiveness of adaptive
difficulty algorithms. Increasing the sample size can mitigate potential biases and better capture
the variability inherent in student learning experiences. Furthermore, future research may not be
limited to Grade Six students, but may also include participants from the entire range of elementary
grade levels, high schools, and colleges. This broader approach will allow for a more thorough
investigation of the algorithms’ efficacy across different educational stages and provide insights into
their adaptability and impact on a wider student population.
Secondly, enhancing the features of the quiz game itself represents a key avenue for refinement.
Incorporating additional gameplay elements, such as interactive tutorials, feedback mechanisms, and
personalized learning pathways, can enrich the educational gaming experience and further optimize
learning outcomes.
Furthermore, extending the test to different demographic groups and subject areas holds immense
potential to advance the understanding of adaptive difficulty algorithms in education. By conducting
similar studies across various grade levels and subjects, we can elucidate the transferability and
efficacy of these methodologies in diverse learning contexts. Exploring the applicability of adaptive
difficulty algorithms in subjects beyond mathematics, such as science and history,
can provide valuable insights into their versatility and effectiveness in different knowledge domains.
Moreover, incorporating actual students into algorithm training processes holds immense promise.
By directly utilizing data from students’ interactions with educational content, researchers can fine-
tune algorithms to better meet the needs and preferences of learners, ultimately leading to more
personalized and effective educational experiences.
Furthermore, investing in longer training times for adaptive algorithms, especially the Neural Network-Based Adaptive Difficulty Algorithm (NN-ADA), is recommended. A stronger model typically requires a more extended training process to achieve optimal performance; a longer training period allows the algorithm to improve its learning accuracy and refine its predictive capabilities.
Additionally, future researchers may significantly increase the number of questions in the ques-
tion bank. A more extensive question bank can provide a broader range of challenges and better
accommodate students’ varying skill levels, ensuring a more tailored and effective learning experi-
ence.
Lastly, implementing a feature in which the game automatically collects all relevant data from
players on different computers is important. This feature would ensure that all data from a player,
regardless of the device used, is automatically collected, sorted, and computed. It would track
metrics such as the level of questions answered, whether the answers were correct or incorrect, the
time taken to answer each question, the total score, and other pertinent data. Automated data
collection will simplify the analysis process and ensure comprehensive data availability to refine
algorithms. In this study, the researchers manually entered all game data into a spreadsheet for each student, which
was time-consuming and labor-intensive. Adding this automated feature would significantly ease
and expedite the data collection process. Unfortunately, this feature was not included due to time
constraints.
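Since no such feature was implemented, a minimal sketch of what an in-game session logger could look like is given below; the class, methods, and file layout are all hypothetical, not part of the developed game.

import csv
import time


class SessionLogger:
    """Minimal sketch of an in-game data logger; all names are hypothetical."""

    def __init__(self, player_id: str):
        self.player_id = player_id
        self.rows = []
        self._shown_at = 0.0

    def question_shown(self) -> None:
        # Called when a question appears on screen.
        self._shown_at = time.monotonic()

    def answer_given(self, level: int, correct: bool) -> None:
        # Called when the player submits an answer; records elapsed seconds.
        elapsed = round(time.monotonic() - self._shown_at, 2)
        self.rows.append([self.player_id, level, int(correct), elapsed])

    def save(self, path: str) -> None:
        # Appends one row per answered question to a shared CSV file.
        with open(path, "a", newline="") as f:
            csv.writer(f).writerows(self.rows)

The game would call question_shown() when a question appears and answer_given() when the player responds, then save() once the 15-question session ends, producing one CSV file per machine that could be merged for analysis.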
In conclusion, by implementing these recommendations, we can advance the field of educational
gaming and contribute to developing more effective and engaging learning technologies. By ex-
panding our sampling size, improving quiz game features, testing across different demographics and
subjects, incorporating actual students into algorithm training, allowing longer training times for adaptive algorithms, and increasing the number of questions in the question bank, we can enhance the validity,
applicability, and impact of our research findings. As we continue to leverage adaptive difficulty al-
gorithms to tailor educational experiences to individual learner needs, our efforts hold the potential
to transform education and empower students to achieve their full potential.
Appendix A
Question Bank
• Level 1: Basics
∗ Choices: 0, 1, 2, undefined
∗ Correct Answer: 1
∗ Choices: 0, 1, 2, undefined
∗ Correct Answer: 0
– What is the result when you subtract any number from itself?
∗ Choices: 0, 1, 2, undefined
∗ Correct Answer: 0
∗ Choices: 0, 1, 2, undefined
∗ Correct Answer: 1
– Which is equivalent to 1/2?
∗ Choices: 2/6, 6/2, 4/8, 6/9
∗ Correct Answer: 4/8
∗ Choices: 9, 33, 63, 13
∗ Correct Answer: 13
∗ Choices: -4, 4, 9, -9
∗ Correct Answer: 9
∗ Correct Answer: 7
∗ Choices: 3, 9, 6, 1
∗ Correct Answer: 9
∗ Choices: 2, 4, 8, 12
∗ Correct Answer: 12
– Correct Answer: 7 cm
• If a rectangle has a length-to-width ratio of 5:2, and the width is 8 meters, what is the length
of the rectangle?
– Correct Answer: 20 m
• If the scale factor between two similar triangles is 1:3, and the smaller triangle has a side length of 6 cm, what is the corresponding side length of the larger triangle?
– Correct Answer: 18 cm
• If there are 12 boys and 8 girls in a class, what is the ratio of boys to girls?
– Correct Answer: 3:2
• A sequence starts with 2 and follows the rule: each term is triple the previous term. Determine
the 3rd term.
– 24, 18, 6, 12
– Correct Answer: 18
• Starting with 32, a sequence follows the rule: each term is half of the previous term. Determine
the 4th term.
– Choices: 2, 4, 8, 12
– Correct Answer: 4
• A sequence starts with 3 and follows the rule: each term is double the previous term. Find the 5th term.
– Correct Answer: 48
Appendix B
Survey Questions
• How did the game provide content that focused your attention? (Concentration)
Appendix C
Tabulation of the Collected Data
Appendix E
Appendix F
Screenshots of Emails when Data Collection is Delayed
Kenrick John Harvell B. Nabus is a BS Computer Science student of the Department of Computer
Science at the Ateneo de Naga University.
Matthew Ethan Wood is a BS Computer Science student of the Department of Computer Science
at the Ateneo de Naga University.