
Comparative Analysis of Item Response Theory
and Neural Network-Based Adaptive Difficulty
Algorithms in Math-Based Educational Game
Kenrick John Harvell B. Nabus
Bachelor of Science in Computer Science

Mary Angelette M. Remos
Bachelor of Science in Computer Science

Matthew Ethan Wood
Bachelor of Science in Computer Science

Senior Thesis submitted to the faculty of the
Department of Computer Science
College of Computer Studies, Ateneo de Naga University
in partial fulfillment of the requirements for their respective
Bachelor of Science degrees

Project Advisor: Lowie Vincent S. Bisana, MIT

Adrian Leo T. Pajarillo, MS
Michelle Elija B. Santos, MIM
Raphael Henry M. Garay

May 2024
Naga City, Philippines

Keywords: IRT, Neural Network, Adaptive Difficulty, Educational Games

Copyright 2024, Kenrick John Harvell B. Nabus, Mary Angelette M. Remos, and Matthew Ethan Wood
The Senior Project entitled

Comparative Analysis of Item Response Theory and Neural
Network-Based Adaptive Difficulty Algorithms in Math-Based
Educational Game

developed by

Kenrick John Harvell B. Nabus
Bachelor of Science in Computer Science

Mary Angelette M. Remos
Bachelor of Science in Computer Science

Matthew Ethan Wood
Bachelor of Science in Computer Science

and submitted in partial fulfillment of the requirements for their respective Bachelor of Science degrees
has been rigorously examined and recommended for approval and acceptance.

Adrian Leo T. Pajarillo, MS
Panel Member
Date signed:

Michelle Elija B. Santos, MIM
Panel Member
Date signed:

Raphael Henry M. Garay
Panel Member
Date signed:

Lowie Vincent S. Bisana, MIT
Project Advisor
Date signed:
The Senior Project entitled

Comparative Analysis of Item Response Theory and Neural
Network-Based Adaptive Difficulty Algorithms in Math-Based
Educational Game

developed by

Kenrick John Harvell B. Nabus
Bachelor of Science in Computer Science

Mary Angelette M. Remos
Bachelor of Science in Computer Science

Matthew Ethan Wood
Bachelor of Science in Computer Science

and submitted in partial fulfillment of the requirements for their respective Bachelor of Science degrees
is hereby approved and accepted by the Department of Computer Science, College of Computer
Studies, Ateneo de Naga University.

Adrian Leo T. Pajarillo, MS
Chair, Department of Computer Science
Date signed:

Joshua C. Martinez, MIT
Dean, College of Computer Studies
Date signed:
Declaration of Original Work
We declare that the Senior Project entitled

Comparative Analysis of Item Response Theory and Neural
Network-Based Adaptive Difficulty Algorithms in Math-Based
Educational Game

which we submitted to the faculty of the

Department of Computer Science, Ateneo de Naga University


is our own work. To the best of our knowledge, it does not contain materials published or written
by another person, except where due citation and acknowledgement are made in our senior project
documentation. The contributions of other people whom we worked with to complete this senior
project are explicitly cited and acknowledged in our senior project documentation.
We also declare that the intellectual content of this senior project is the product of our own work.
We conceptualized, designed, encoded, and debugged the source code of the core programs in our
senior project. The source code of third-party APIs and library functions used in our program is explicitly cited and acknowledged in our senior project documentation. Also duly acknowledged is the assistance of others in minor details of editing and reproduction of the documentation.
On our honor, we declare that we did not pass off as our own the work done by another person.
We are the only persons who encoded the source code of our software. We understand that we may
get a failing mark if the source code of our program is in fact the work of another person.

Kenrick John Harvell B. Nabus
3 - Bachelor of Science in Computer Science

Mary Angelette M. Remos
4 - Bachelor of Science in Computer Science

Matthew Ethan Wood
3 - Bachelor of Science in Computer Science

This declaration is witnessed by:

Lowie Vincent S. Bisana, MIT
Project Advisor
Comparative Analysis of Item Response Theory and Neural
Network-Based Adaptive Difficulty Algorithms in
Math-Based Educational Game
by
Kenrick John Harvell B. Nabus, Mary Angelette M. Remos, and Matthew Ethan Wood
Project Advisor: Lowie Vincent S. Bisana, MIT
Department of Computer Science

ABSTRACT

Educational games have been widely used to enhance mathematics education in K-12 settings.
However, the effectiveness of adaptive difficulty algorithms in educational games has not been well
explored. This study aims to conduct a comprehensive comparative analysis of Item Response
Theory (IRT) and Neural Network-Based Adaptive Difficulty Algorithms (NN-ADA) within the
context of educational games. The research focuses on math-based question-and-response gameplay,
which is an important aspect of educational game design. The findings of this study shed light
on the comparative effectiveness of IRT and NN-ADA in terms of student accuracy, response time,
and engagement across different difficulty levels. The results suggest that IRT is more effective in
enhancing the learning experience for students, especially in the classroom setting. This research
provides valuable insights for educators and game developers who are interested in using learning
games as an effective tool for improving math education.
We dedicate this research work to all of humanity.

ACKNOWLEDGEMENTS

The researchers thank everyone who helped them finish this thesis.

TABLE OF CONTENTS

1 Introduction 1
1.1 Project Context . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 Purpose and Description . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.3 Research Questions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.4 Objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.5 Scope and Limitations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4

2 Review of Related Literature 5


2.1 The Effectiveness of Adaptive Difficulty . . . . . . . . . . . . . . . . . . . . . . . . . 5
2.2 Influences on Adaptive Difficulty Performance . . . . . . . . . . . . . . . . . . . . . . 6
2.3 Applications of IRT in Educational Games . . . . . . . . . . . . . . . . . . . . . . . . 8
2.4 Benefits and Limitations of IRT in Educational Games . . . . . . . . . . . . . . . . . 9
2.5 Results of Using Item Response Theory in Educational Games . . . . . . . . . . . . 9
2.6 Results of Using Neural Network-Based Adaptive Difficulty Algorithms in Educational
Games . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
2.7 The Importance of Difficulty . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
2.8 The Importance of Student Engagement . . . . . . . . . . . . . . . . . . . . . . . . . 13
2.9 The Efficacy of Learning Games in Enhancing Mathematics Education . . . . . . . . 16
2.10 Rationale for Selecting Grade 6 Students as Participants . . . . . . . . . . . . . . . . 16

3 Technical Framework 18
3.1 Terminologies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
3.1.1 Adaptive Difficulty . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18

3.1.2 Item Response Theory (IRT) . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
3.1.3 Neural Network-Based Adaptive Difficulty Algorithms (NN-ADA) . . . . . . 19
3.1.4 Educational Game . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
3.2 Relevant Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
3.2.1 Item Response Theory (IRT) . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
3.2.2 Neural Network-Based Adaptive Difficulty Algorithms (NN-ADA) . . . . . . 22
3.2.3 Comparison of Item Response Theory (IRT) and Neural Network-Based Adap-
tive Difficulty Algorithms (NN-ADA) . . . . . . . . . . . . . . . . . . . . . . 22
3.3 Software and Hardware Development Tools . . . . . . . . . . . . . . . . . . . . . . . 23

4 Methodology 24
4.1 Research Design . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
4.1.1 Development Technique . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
4.1.2 Experimental Approach . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
4.2 Application Development . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
4.2.1 Planning Phase . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
4.2.2 Analysis Phase . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
4.2.3 Design Phase . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
4.2.4 Development Phase . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
4.2.5 Testing and Maintenance Phase . . . . . . . . . . . . . . . . . . . . . . . . . . 37
4.2.6 Test Cases . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
4.2.7 Deployment Phase . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
4.3 Data Collection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
4.3.1 Participants Profile and Selection . . . . . . . . . . . . . . . . . . . . . . . . . 39
4.3.2 Session Implementation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
4.3.3 Actual Game Implementation . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
4.3.4 Key Data Points . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
4.4 Data Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
4.4.1 Descriptive Statistics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
4.4.2 Comparative Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
4.4.3 Statistical Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
4.4.4 Visualization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46

4.5 Research Planning and Schedule . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46

5 Results and Discussion 48


5.1 Usability Testing Findings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
5.2 Accuracy Rate . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
5.3 Response Time . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
5.4 Player Engagement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
5.5 Performance Metrics Relationship . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
5.5.1 Correlations between Accuracy Rates and Overall Engagement Ratings . . . 57
5.5.2 Correlations between Accuracy Rates and Response Times . . . . . . . . . . 58
5.5.3 Correlations between Response Times and Overall Engagement Ratings . . . 59
5.6 Discussion of Findings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
5.7 Insights about Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
5.8 Limitations of the Study . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62

6 Conclusion and Recommendations 63


6.1 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
6.2 Recommendations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65

A Question Bank 68

B Survey Questions 73
B.1 In-Game Survey Questions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
B.2 Post-Game Survey Questions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73

C Tabulation of the Collected Data 74

D Post-Survey Form Screenshot 89

E Data Gathering Documentation 90

F Screenshots of Emails when Data Collection is Delayed 91

Chapter 1

Introduction

1.1 Project Context


In recent years, the integration of artificial intelligence (AI) into educational technologies has trans-
formed the way students interact with learning materials [56]. One promising application of AI in education is the development of educational games that adapt their difficulty
levels based on individual student performance. These AI-driven adaptive difficulty systems hold the
potential to enhance learning experiences by providing tailored challenges to each student, thereby
promoting active learning and sustained engagement [56].
The primary objective of this research is to conduct a comprehensive comparative analysis of
two prevalent adaptive difficulty algorithms in adaptive game development: Item Response
Theory (IRT) and Neural Network-Based Adaptive Difficulty Algorithms (NN-ADA) [41] [64].
The core of this research involves a multi-faceted examination of AI-driven adaptive difficulty in
math-based educational games.
The first phase involved an in-depth review of existing literature on adaptive learning, AI in
education, and gamification, establishing a conceptual framework for the study. Subsequently, a
question-and-response game was developed, focused on mathematics questions to ensure objective
and quantifiable outcomes. This game served as a test ground for the IRT and NN-ADA approaches. To evaluate the effectiveness of these two adaptive difficulty algorithms, selected participants played the game as active users. The participants were diverse, representing different
learning styles, ensuring a broader perspective on the two algorithms' efficacy. The program dynamically adjusted the game's difficulty level based on each player's performance, providing a controlled environment for data collection and analysis. Metrics such as response time, accuracy, and individualized learning progression were systematically recorded and analyzed.
In an era where personalized and engaging learning experiences are increasingly valued, AI-
driven adaptive difficulty stands as a promising approach to enhancing educational games. This thesis
project aspires to shed light on the multifaceted implications of this technology on education. By
amalgamating educational theory, technological innovation, and cognitive psychology principles, this
research aims to provide educators and game developers with valuable insights into designing and
deploying AI-driven adaptive educational games.
Ultimately, the anticipated outcome of this study is to contribute to the ongoing discourse sur-
rounding the integration of AI and education, thereby paving the way for more effective and enjoy-
able learning journeys. Through a rigorous comparative analysis of IRT and NN-ADA, this research
seeks to inform educational leaders and relevant groups about the potential benefits and challenges
associated with these adaptive difficulty algorithms, ultimately advancing the field of AI-enhanced
education.

1.2 Purpose and Description


The purpose of this research is to conduct a comprehensive comparative analysis between Item Re-
sponse Theory (IRT) and Neural Network-Based Adaptive Difficulty Algorithms (NN-ADA) within
the context of educational games. This investigation aims to discern which of these two methodolo-
gies is more effective in enhancing the learning experience within an educational game environment.
The study will involve the creation of a simple question-and-answer math-based game. This
program will incorporate both the IRT and the NN-ADA approaches to dynamically adjust the
game’s difficulty level based on the player’s performance. By collecting and analyzing data generated
during gameplay, the research aims to provide valuable insights into which of these adaptive difficulty
algorithms is more effective in optimizing learning outcomes within the educational gaming context.
The findings of this study have the potential to inform the design
and implementation of more effective educational games, contributing to the ongoing advancement
of technology-assisted learning methodologies.

1.3 Research Questions


• What is the comparative effectiveness of Item Response Theory (IRT) and Neural Network-
Based Adaptive Difficulty Algorithms (NN-ADA) in enhancing the learning experience within
the context of educational games, specifically focusing on math-based question-and-answer
gameplay?

• To what extent do Item Response Theory (IRT) and Neural Network-Based Adaptive Difficulty
Algorithms (NN-ADA) demonstrate variations in adaptability to performance levels within the
educational game?

• What significant differences exist in the results of the game sessions between Item Response Theory (IRT) and Neural Network-Based Adaptive Difficulty Algorithms (NN-ADA), and how might these differences influence the choice between the two methods?

1.4 Objectives
The main objective of this research is to perform a comparative analysis of Item Response Theory
(IRT) and Neural Network-Based Adaptive Difficulty Algorithms (NN-ADA) to determine which
of these two adaptive difficulty algorithms proves more effective in enhancing the learning experi-
ence within an educational game environment, particularly focusing on math-based question-and-answer gameplay. In order to achieve the main objective, the following specific objectives must be
accomplished:

• To design and develop a math-based educational game that will serve as the controlled envi-
ronment for testing IRT and NN-ADA;

• To gather and analyze quantitative data on player interactions within the game, including
accuracy rates, response times, and player engagement;

• To evaluate user engagement and experience with both algorithms;

• To measure and compare the impact of IRT and NN-ADA on learning outcomes within the
gameplay sessions; and

• To derive insights into the strengths and weaknesses of both IRT and NN-ADA in the context
of educational game design.

1.5 Scope and Limitations


The scope of this research encompasses a comprehensive examination of the effectiveness of two
adaptive difficulty algorithms, Item Response Theory (IRT) and Neural Network-Based Adaptive
Difficulty Algorithms (NN-ADA), within the context of educational games.
The researchers will develop a game incorporating both algorithms. The game will focus on math
questions to offer objective and quantifiable answers, making it easier to measure the effectiveness
of the two adaptive difficulty algorithms. The questions will align with the curriculum of the re-
searchers’ target participants. Consequently, the study aims to involve 100 Grade Six students at
Ateneo de Naga University for actual game sessions to facilitate the collection of relevant data to
assess the two algorithms.
Only one computer lab is available, accommodating up to 40 students at a time. The game is distributed as an application that can be downloaded on laptops or personal computers and is compatible exclusively with Windows operating systems. The hardware must therefore meet sufficient quality standards to ensure smooth game execution. Importantly, users do not need internet access to play the game. Unity and C# will be used for the game's development.
Chapter 2

Review of Related Literature

This chapter explores the essential body of knowledge that underpins the comparative
analysis of Item Response Theory (IRT) and Neural Network-Based Adaptive Difficulty Algorithms
(NN-ADA) in the context of educational games. This chapter serves as a critical bridge between
existing theories, methodologies, and the current study, offering insights into the evolution and
application of adaptive difficulty algorithms in educational gaming. Through this comprehensive
review, the research seeks to identify key trends, challenges, and gaps in the literature, guiding the
investigation and interpretation of empirical findings in subsequent chapters.

2.1 The Effectiveness of Adaptive Difficulty


Adaptive difficulty adjustment in educational settings involves dynamically tuning the difficulty level within learning activities to align with each student's individual needs and abilities. This
approach has shown effectiveness on multiple dimensions of education, covering motivation, learning,
and engagement [47] [8]. One of the key advantages of employing adaptive difficulty adjustments is
facilitating better and faster progress. Students can proceed at their own pace, eliminating the need
to wait for peers or grapple with challenging concepts, thereby promoting a more efficient learning
experience [8].
Furthermore, adaptive learning contributes to improved comprehension and mastery of subjects.
Tailoring the difficulty level to the individual student’s knowledge ensures that basic concepts are
solidified before advancing to more complex material [47] [8]. This personalized approach not only
enhances learning outcomes but also fosters increased student engagement. Adaptive learning sys-
tems create a more captivating and motivating educational environment by providing customized
learning experiences that align with individual needs and interests [8] [48]. An additional benefit lies
in the ability of adaptive learning to identify knowledge gaps and provide real-time feedback [8]. This
targeted focus on student understanding allows educators to address specific challenges promptly
and offer timely guidance. Moreover, adaptive learning transcends language barriers, making it
inclusive and suitable for diverse groups of learners [8]. This adaptability ensures that appropriate
learning content is provided to all students, irrespective of their background, improving accessibility
and inclusivity in education.
Digital games have been increasingly used to train executive function skills in learners of different
ages. One approach to enhancing the effectiveness of digital games is through adaptive difficulty
adjustment, which customizes gameplay based on the player’s performance. A study investigated
the effectiveness of adaptive difficulty adjustment in a digital game designed to develop executive
function skills in learners of different ages [43]. The results showed that both adaptive and non-
adaptive versions of the game resulted in increased shifting skills for all learners, with adolescents
scoring higher than pre-adolescents and early adolescents. However, a trend suggested that adaptive
treatment may be more effective for adolescents. This study highlights the potential of digital games
to train executive function skills and the importance of adaptive difficulty adjustment in customizing
gameplay for different learners [43].
Despite these merits, implementing adaptive difficulty adjustments still faces certain challenges.
Convincing educators of the advantages of adaptive learning, addressing technical and organizational hurdles, and ensuring adequate support for remote learners are among the obstacles [8] [33] [39].
Nevertheless, research indicates that adaptive learning remains an effective strategy for enhancing
student motivation and improving learning outcomes across various educational settings [47]. As
these challenges are addressed, the potential benefits of adaptive difficulty adjustments in education
continue to make a compelling case for their integration into contemporary learning environments.

2.2 Influences on Adaptive Difficulty Performance


Adaptive difficulty performance in education can be influenced by various factors that impact stu-
dent motivation, engagement, and learning outcomes. Research has shown that adaptive difficulty
adjustments in educational computer games can lead to higher motivation and learning levels than
incremental difficulty adjustments [47]. Furthermore, the strength and direction of difficulty adap-
tation can affect situational interest in game-based learning [28]. Such adaptation can help students learn and master concepts they grasp only weakly.
Cognitive-behavioral motivation, including adaptive cognition and behavior, has been found
to have a statistically significant positive correlation with student engagement [51]. This means
that students with higher levels of adaptive cognition and behavior tend to be more engaged in
their educational activities. Furthermore, the effectiveness of adaptive difficulty adjustment on
the development of executive function skills for learners of different ages has been studied, with
results showing increases in the change of skills for all learners between the pre-test and post-test
measures [43]. This suggests that when educational challenges are appropriately adjusted to match
the learner’s abilities, it facilitates more effective learning and skill development.
Adaptive learning systems, such as adaptive quizzes, have been shown to improve student learn-
ing outcomes and increase student motivation and engagement [46]. By providing personalized
feedback and adjusting difficulty levels in real-time, adaptive learning systems ensure that students
are neither bored with overly simple tasks nor overwhelmed by overly difficult ones. This tailored
approach keeps students within their optimal learning zone, promoting continuous engagement and
improvement. Moreover, adaptive difficulty influences neural plasticity and training transfer, with
adaptive difficulty mediating the behavioral and neural effects of cognitive training [18]. Research suggests that by adjusting the difficulty of tasks to match the learner's current capabilities, adaptive
learning systems can enhance the brain’s ability to reorganize and form new neural connections,
a phenomenon known as neural plasticity [18]. This process is crucial for the effective transfer
of training, where skills and knowledge gained in one context are applied to different, real-world
situations.
Competitive agents and adaptive difficulty within educational video games have been studied,
with findings suggesting that adaptive approaches could optimize learning outcomes by addressing
individual differences and optimizing learning challenges [36]. This personalization ensures that each
learner has challenges that are neither easy nor difficult, thereby maintaining an optimal learning
environment. The presence of competitive agents, which simulate real opponents, further enhances
engagement by adding a dynamic and interactive component to the learning experience. Further-
more, the frequency of difficulty adaptations in adaptive training systems has been examined, show-
ing that adapting difficulty based on performance can manage the intrinsic load of learners and affect
performance gains [31]. When difficulty is adapted frequently and appropriately, it prevents cogni-
tive overload, which can hinder learning, and instead maintains a balance that promotes sustained
engagement and continuous improvement.
In summary, adaptive difficulty performance in education is influenced by various factors, includ-
ing cognitive-behavioral motivation, situational interest, and the effectiveness of adaptive learning
systems. These influences are crucial in shaping students’ motivation, engagement, and learning
outcomes in educational settings.

2.3 Applications of IRT in Educational Games


IRT holds considerable potential for use in educational video games. IRT can adapt the gameplay to
each player’s skills by modeling the relationship between item features and player responses [12] [47].
This flexibility improves player involvement and academic results by ensuring the game’s challenges
appropriately match each player's proficiency [29]. A player with advanced skills may encounter tougher challenges, while a player struggling with specific concepts may receive questions pitched at an appropriately attainable level, thus maintaining engagement and promoting learning.
IRT can also shed light on the success of instructional games by assessing players’ skill levels and
development. It enables game designers and educators to evaluate the game’s influence on unique
learning trajectories and pinpoint the areas in which players thrive or need more assistance [12].
Educators can offer targeted interventions by pinpointing these areas, and game designers can refine
game elements to address learner needs better.
In summary, IRT’s integration into educational video games offers a flexible and personalized
approach to learning. By adapting gameplay to match player skills and providing detailed assess-
ments of player development, IRT not only enhances engagement and academic performance but also
supports the continuous improvement of educational games. This ensures that each player receives
an optimal learning experience tailored to their unique needs and abilities.

2.4 Benefits and Limitations of IRT in Educational Games


There are many advantages to using IRT in instructional games. It enables personalized learning
experiences, which raises player motivation and engagement. IRT also offers precise player skill
assessments, empowering educators to make data-driven decisions regarding instructional tactics
and game-based interventions [12].
There are, however, some limitations to take into account. Developers may find it difficult to implement IRT in educational games because it demands a thorough understanding of psychometrics. Additionally, gathering adequate data to precisely estimate model parameters can be difficult, especially for smaller-scale games [12]. Despite these difficulties, IRT is a useful tool in
the field of game-based learning since it can enhance learning outcomes and improve educational
game design.
In conclusion, Item Response Theory is promising for creating flexible and successful instructional games. It is a valuable tool in educational game design, since it allows gaming experiences to be tailored to individual player abilities and provides insight into learning progress. Despite the challenges, the advantages of employing IRT here are clear: opportunities for improved player engagement and better instructional outcomes.

2.5 Results of Using Item Response Theory in Educational Games
Item Response Theory (IRT) has been used in educational games to analyze students’ learning
strategies, assess individual levels of strategy acquisition, and adjust instruction according to indi-
vidual levels. For example, a study from the University of Tokyo and Yokohama National University
used IRT to analyze students’ learning strategies and specify their levels of strategy acquisition [59].
The paper by Uesaka, Suzuki, and Ichikawa explores students’ learning strategies using Item Re-
sponse Theory (IRT) to assess the acquisition of effective learning strategies and provide guidance
for self-regulated learning. The study utilized a questionnaire with 70 items related to learning
strategies across various subjects and surveyed 1,500 Japanese university students. IRT analysis
revealed varying levels of strategy acquisition among students, with some strategies proving more
effective. The research identified a positive correlation between learning strategy acquisition scores
and academic performance, proposing a framework for personalized student advising based on strat-
egy levels. The study contributes valuable insights for educators and researchers, emphasizing the
significance of tailored interventions for self-regulated learning to enhance academic outcomes. The
results highlighted the efficacy of IRT in elucidating nuanced aspects of student learning approaches.
Another study from Carnegie Mellon University integrated IRT with an educational game to
evaluate students’ skills and behaviors, showing that IRT can conduct an accurate evaluation by
considering the proportional difficulty of each question in the evaluated group [5] [24]. The study
introduces the Psychometric Profile Generator (PPG), a computational model utilizing Item Re-
sponse Theory (IRT) to create user profiles based on skill and behavior levels. Integrated with
an educational game using Brazilian Exam questions, the PPG was evaluated with 113 students
(average age: 11) in a Brazilian school. Results demonstrate the PPG’s accuracy, considering the
number of correct answers and question difficulty. The IRT model effectively analyzed data, proving
reliable in assessing students’ skills. The study concludes that IRT in educational game evaluation
provides an entertaining and dependable method for skill assessment, suggesting broader applica-
tions in computer engineering courses and organizational training with a focus on mobile learning.
These applications highlight the effectiveness of IRT in assessing and improving learning outcomes in educational games [5] [24].
In conclusion, Item Response Theory (IRT) proves to be a highly effective tool in the realm of
educational games, offering profound insights into students’ learning strategies and skill levels. These
findings underscore IRT’s potential to create personalized learning experiences, providing precise
assessments that inform tailored educational interventions. Despite implementation challenges, the
benefits of employing IRT in educational contexts are clear, offering significant enhancements in
student engagement and learning outcomes.

2.6 Results of Using Neural Network-Based Adaptive Difficulty Algorithms in Educational Games
Previous studies on implementing neural network-based adaptive difficulty algorithms in educational
games have showcased promising outcomes in enhancing learning experiences. For instance, a study
implemented a neural network-based adaptive difficulty algorithm in a mathematics learning game
for middle school students [53]. The algorithm analyzed the students’ performance data in real
time and adjusted the complexity of math problems accordingly. The results indicated a significant
improvement in students’ learning outcomes, with higher retention of mathematical concepts and
increased engagement observed compared to traditional static difficulty levels.
Similarly, another study focused on a language learning application employing adaptive difficulty
through neural network algorithms [10]. The system dynamically modified the complexity of vocabu-
lary and grammar exercises based on individual learner proficiency. The outcomes revealed enhanced
learning efficiency, as learners exhibited better mastery of language skills, improved retention, and
increased motivation to continue learning.
Furthermore, a meta-analysis reviewed multiple studies across various educational domains im-
plementing neural network-based adaptive difficulty algorithms [27]. The analysis indicated a con-
sistent trend towards improved learning outcomes, including higher knowledge retention, increased
engagement, and better overall performance among learners. This meta-analysis underscores such
algorithms’ broad applicability and effectiveness in optimizing learning experiences across diverse
educational contexts.
In conclusion, there is a great deal of potential for improving learning outcomes by using neural network-based adaptive difficulty algorithms in educational games. These algorithms help ensure that instructional content aligns with each learner's needs and skill level by providing a tailored and dynamic approach to education. The collective results of research in mathematics, language learning, and other educational fields show how these algorithms can improve education by promoting more efficient and engaging learning opportunities.

2.7 The Importance of Difficulty


Some studies investigated the intricate relationship between game difficulty levels and player psy-
chology, shedding light on the profound effects of varying difficulty levels on players’ experiences
and psychological states. This work emphasizes the importance of finding the right balance in difficulty, as
excessive challenge can lead to temporary decreases in self-esteem and player abandonment, while
well-balanced difficulty can trigger addiction-like behavior due to temporary boosts in self-esteem
and self-efficacy, in addition to an effective learning experience [47]. The research highlights the unique impact of success and failure on self-esteem, especially in multiplayer modes, and explores how individual personality traits, such as extraversion, emotional stability, and openness to experience, influence players' preferences for higher difficulty levels [11]. This underscores the need for game designers to consider player psychology and tailor difficulty levels accordingly.
The literature also provides valuable guidelines for game designers when setting difficulty levels, emphasizing the importance of maintaining a consistent level of challenge, gradually increasing
difficulty, introducing new mechanics effectively, and preventing players from feeling stuck. It un-
derscores the need to tailor the game’s difficulty to its target audience and communicate it clearly in
advertising. Additionally, the text discusses various game genres, from challenging "soulslike" games to story-driven "walking simulators" and casual mobile games, highlighting the different expectations
and needs of players in each category [11]. Furthermore, the text stresses the significance of precise playtesting with representative players to fine-tune difficulty levels, whether the target audience comprises casual gamers, hardcore fans of challenging games, or a broader player base with varying preferences. Through well-designed playtests, developers can gain insights into player reactions and preferences regarding difficulty, enhancing the overall gaming experience and avoiding negative feedback [11].
In conclusion, the study recognizes that determining the optimal game difficulty level is a complex
endeavor that requires careful consideration of game genre, player psychology, and target audience.
Properly reflecting these factors in promotional activities and conducting meticulous playtests are
essential for achieving the desired player experience and avoiding negative feedback.

2.8 The Importance of Student Engagement


Student engagement in educational games is of great importance as it can enhance learning, motiva-
tion, and overall student performance. Several studies have shown that the gamification of education
can lead to increased levels of engagement, similar to what games can achieve, and can improve stu-
dents’ particular skills and optimize their learning [52]. Game-based learning can increase students’
engagement in the learning process by focusing on collaborative aspects between students, tailored
project-based activities, and assigning authentic, relevant, and meaningful out-of-classroom work
[2]. Additionally, academic games have been found to increase engagement and participation among
students when designed correctly [15]. Therefore, incorporating educational games into the learning
process can be a valuable strategy to promote student engagement and improve learning outcomes.
In flipped classrooms, where students prepare lesson material at home through educational videos,
engaging students to watch these videos attentively is a significant challenge. However, some studies
supported the use of pop-up questions within educational activities, such as a physics topic and a
topic on religious characters of the human digestive system to maintain and enhance student en-
gagement [37] [57]. Another study focused on using pop-up questions within long educational videos
to address this challenge. These questions aimed to enhance student engagement and understand-
ing before in-class activities. The study, conducted in a molecular biology course, compared the
learning performance of students who watched videos with pop-up questions to those without them
[20]. The results indicated that the group with pop-up questions performed significantly better
on tests than those without [20]. Answering pop-up questions did not necessarily lead to better
performance on items testing those specific concepts, suggesting that the mere presence of pop-up
questions enhanced overall learning [20]. Further data from interviews, surveys, and learning analyt-
ics supported the idea that pop-up questions influenced viewing behavior by promoting engagement.
The study concluded that pop-up questions stimulate learning when studying videos outside class
through an indirect testing effect. Overall, the study highlights the positive effects of integrating
pop-up questions into educational videos, emphasizing their role in enhancing student engagement,
understanding, and ultimately, learning outcomes in flipped classroom settings.
A study focusing on the use of pop-up questions in a physics video designed for junior high school
students aimed to analyze the impact of these questions on students’ learning performance. The
pop-up question video was developed specifically to aid students in understanding the concept of
oscillation, aligning with previous research recommendations to reduce cognitive load. The study
involved measuring the learning performance of 100 junior high school students, focusing on concept
attainment, motivation, and cognitive load after engaging with the pop-up question video. The
results indicated positive outcomes, with a 74% concept attainment rate, suggesting a good level
of understanding among students [57]. Additionally, students’ motivation levels were high, with
84% falling into the ”good” category, indicating that the pop-up questions contributed positively
to their engagement with the material [57]. However, the cognitive load percentage was lower at
38%, indicating that students experienced less mental strain when interacting with the video [57].
Overall, the findings suggest that integrating pop-up questions into educational videos can benefit
students’ motivation and cognitive load. By facilitating engagement and reducing mental strain,
pop-up questions enhance the learning experience, particularly in complex subjects like physics.
Furthermore, the study suggests that integrating pop-up questions into innovative learning models
could further optimize their impact on student learning outcomes.
A study, "Measuring Flow in Educational Games and Gamified Learning Environments" by Shernoff, Hamari, and Rowe, delves into the intricate relationships between student engagement, flow,
and learning outcomes in educational gaming contexts. Student engagement is critical in educational
settings, directly influencing learning outcomes and academic performance. By exploring how factors
like interest, enjoyment, and concentration impact engagement levels during gameplay, the study
sheds light on the mechanisms through which students become deeply engaged in learning [49]. The
integration of Structural Equation Modeling (SEM) and psychometric surveys in this study provides
a robust framework for analyzing student engagement in educational games [49]. SEM allows for
developing models that depict the complex interconnections between engagement, flow, and learning
outcomes, offering a deeper understanding of the dynamics at play. Using the Experience Sampling
Method (ESM) and psychometric surveys, the researchers measured student engagement and flow,
capturing students’ interest levels, enjoyment, and concentration during gameplay [49]. The findings
of this study underscore the importance of fostering student engagement to enhance learning expe-
riences in educational games. By aligning the challenge levels presented in the game with students’
skill levels, educators and game designers can create environments that promote optimal engagement
and facilitate the flow state. This balance between challenge and skill enhances student motivation
and interest and contributes to deeper learning and knowledge acquisition. Overall, the research
conducted by Shernoff, Hamari, and Rowe highlights the significance of student engagement in edu-
cational gaming contexts and its impact on learning outcomes. By emphasizing the role of interest,
enjoyment, and concentration in fostering deep engagement, the study provides valuable insights
for educators and game designers seeking to create immersive and effective learning experiences for
students [49].
A study, "Adaptive quizzes to increase motivation, engagement, and learning outcomes in a first-year accounting unit," explored using adaptive quizzes to enhance student motivation, engagement,
and learning outcomes in an online first-year accounting unit. Adaptive learning offers personalized
learning opportunities tailored to individual student needs, potentially improving learning outcomes
and increasing student motivation and engagement [46]. The research findings indicated that while
the adaptive quizzes did not directly lead to significant improvements in student scores, students
overwhelmingly enjoyed using the quizzes for their learning. Most surveyed students expressed a pos-
itive attitude towards adaptive quizzes, with a high percentage agreeing that the quizzes were useful
and provided motivation to keep trying. This positive feedback suggests that adaptive quizzes can
enhance student motivation and engagement in the learning process [46]. The study highlighted the
importance of considering student preferences and satisfaction when implementing adaptive learning
technologies to optimize learning outcomes. While challenges exist in developing adaptive quizzes
that effectively increase student motivation and engagement while improving learning outcomes, fur-
ther research is needed to align these factors using adaptive release testing technologies. The study
acknowledged limitations such as the small sample size and the need for additional investigation
into why some students did not use the quizzes and whether quiz usage correlated with improved
performance [46]. Overall, the research contributes valuable insights into the potential benefits of
adaptive quizzes in enhancing student motivation and engagement in a first-year accounting unit.
The positive student perceptions of adaptive quizzes underscore the importance of incorporating
adaptive learning tools in educational settings to support student learning experiences. Further
research in this area can help refine the design and implementation of adaptive quizzes to better
align with student needs and optimize learning outcomes in higher education contexts.
These studies highlight the significant role of student engagement in educational games and
learning activities. By fostering collaborative learning experiences, aligning tasks with students’ pref-
erences, and integrating interactive elements like pop-up questions, educators enhance engagement,
motivation, and learning outcomes. Adaptive learning technologies further personalize learning, in-
creasing student satisfaction and participation. Understanding the interplay between engagement,
flow, and learning outcomes in gaming contexts underscores the importance of creating dynamic
and immersive learning environments. These insights can help educators and developers optimize
engagement strategies for enhanced learning experiences and improved outcomes.

2.9 The Efficacy of Learning Games in Enhancing Mathematics Education
A systematic review provides valuable information on the effectiveness of learning games in fostering mathematics education in K-12 settings. The review analyzed 43 articles that met the inclusion
criteria and evaluated the quality of the studies using five-dimensional criteria [40]. The review found
that using learning games in K-12 settings is feasible and can effectively enhance math education.
The study also suggests that game developers should ensure that the game is designed as a peda-
gogical instrument or a planned application tool. Game activities should be aligned with students’
preferred modes of gameplay, their prior knowledge, and the learning tasks. In-game prompts and
learning scaffolds can effectively promote math learning during gameplay [40]. The review concludes
that more research is needed to recommend using learning games to teach different math topics
and to identify best practices for their design and implementation. This study provides valuable
information for educators and game developers interested in using learning games to enhance math
education in K-12 settings.
Overall, this systematic review offers valuable insights for educators and game developers in-
terested in leveraging learning games to enrich math education in K-12 settings. By synthesizing
existing research and identifying key considerations for game design and implementation, it serves
as a resource for guiding future efforts in this domain, ultimately contributing to enhanced student
learning experiences and outcomes.

2.10 Rationale for Selecting Grade 6 Students as Participants
In the Philippines, Grade 6 students typically fall within the age range of 11 to 12 [4] [61]. In contrast,
internationally, in most countries including the United States, Grade 6 students are generally aged
11 to 13 [42] [48]. At this stage, they are commonly referred to as pre-adolescents or early adolescents,
bridging the developmental gap between childhood and adolescence.


Grade 6 represents a critical stage in cognitive development, where adolescents exhibit enhanced
problem-solving and critical-thinking skills [30]. The educational relevance of this grade level, par-
ticularly in mathematics, aligns with the current emphasis on interactive and game-based learning
strategies [60]. Moreover, research in educational psychology supports the suitability of adaptive dif-
ficulty algorithms for learners in late childhood and early adolescence, contributing to personalized
and optimized learning experiences [43]. Recognizing Grade 6 students’ sensitivity to motivation
and engagement factors, particularly in the transition to adolescence, the literature highlights the
positive impact of educational games and adaptive difficulty algorithms on sustained engagement
and positive attitudes toward learning [43]. Previous successful studies with similar demographics
further validate the feasibility and effectiveness of engaging Grade 6 students in educational game
settings [19]. This establishes the solid rationale for choosing Grade 6 students as the study’s target
participants.
Chapter 3

Technical Framework

This chapter lays the groundwork for the practical implementation and experimentation with Item
Response Theory (IRT) and Neural Network-Based Adaptive Difficulty Algorithms (NN-ADA) in
the context of educational games. It serves as the bridge between theoretical concepts and practical application, describing the relevant algorithms, tools, development techniques, and
methodologies essential for the study’s development, simulation, and data collection. By presenting
this technical framework, the research aims to provide transparency in integrating adaptive difficulty
algorithms into the educational game environment, ensuring the reliability and validity of subsequent
findings.

3.1 Terminologies
The following terms are used throughout this study. Because their meanings can vary with context, their default definitions are provided here. Unless otherwise specified, these definitions should be assumed throughout this document.

3.1.1 Adaptive Difficulty

The term "adaptive difficulty" within the context of this research refers to the dynamic adjustment of the challenge level in an educational game. "Adaptive," as defined by [14], denotes the capacity to change in response to changing conditions, and "difficulty," according to [32], refers to the quality or state of being challenging, difficult to accomplish, deal with, or understand. In this study,
adaptive difficulty encompasses the application of both Item Response Theory and Neural Network-
Based Adaptive Difficulty Algorithms to tailor the educational game experience based on individual
learner performance.

3.1.2 Item Response Theory (IRT)

Item Response Theory (IRT) is a statistical and mathematical framework used in educational and
psychological evaluation to model how individuals respond to test items. It focuses on estimating
individuals’ latent traits or abilities based on their responses to a set of test items, considering the
difficulty and discrimination of each item [23]. In the context of educational game development and
adaptive algorithms, IRT provides a foundational understanding of how individuals’ abilities can be
assessed and modeled based on their interactions with in-game items or questions. IRT offers a way
to gauge a player’s skill or knowledge level by analyzing their responses within the game, which can
be instrumental in designing and implementing adaptive difficulty algorithms [44].

3.1.3 Neural Network-Based Adaptive Difficulty Algorithms (NN-ADA)

Neural networks, often referred to as artificial neural networks (ANNs) or simulated neural networks
(SNNs), constitute a fundamental component of machine learning and play a central role in deep
learning algorithms. Inspired by the structure and functioning of the human brain, they are com-
posed of layers of nodes that function as artificial neurons interconnected with others, characterized
by specific weights and thresholds. Neural networks rely on training data to enhance their accuracy
progressively, ultimately becoming formidable tools in computer science and artificial intelligence.
They significantly accelerate speech and image recognition tasks, dramatically reducing processing
time compared to manual human identification [25].
Neural Network-Based Adaptive Difficulty Algorithms (NN-ADA) are AI-driven algorithms that
use neural networks to dynamically adjust the difficulty level of a task or activity, such as in educa-
tional games. These algorithms analyze user behavior and performance data to tailor the challenge
level to the individual’s current skill or knowledge level [45]. In the context of educational game
development, NN-ADA represents a sophisticated approach to adaptive algorithms, one that harnesses
the power of neural networks to make real-time, data-driven adjustments to the gameplay experience
[45].

3.1.4 Educational Game

An educational game is a type of video game or interactive software designed to facilitate learning
or teach specific skills, knowledge, or concepts. These games integrate educational content with
engaging gameplay elements to enhance the learning experience [2].

3.2 Relevant Algorithms

3.2.1 Item Response Theory (IRT)

Item Response Theory (IRT) is a statistical framework with significant potential in educational
games. Unlike traditional test theory, which assumes fixed item difficulty, IRT considers the difficulty
of the items within the game and the players’ abilities [12]. This approach allows for more accurate
assessments of player abilities, making it a valuable tool for creating adaptive learning experiences
in educational games.
The researchers will employ item response theory (IRT) as a foundational framework to gauge
and adapt the difficulty of educational assessments. IRT is a statistical method widely used in the
field of educational measurement and assessment, allowing researchers to model how individuals
respond to test items based on their underlying abilities. This theory enables them to accurately
estimate a student's proficiency level and the difficulty of each test item. As seen in the formula
below, the Two-Parameter Logistic (2PL) model is one of the fundamental IRT models, and the
Three-Parameter Logistic (3PL) extension described next is the one the researchers will utilize.

Figure 3.1: IRT Formula
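Figure 3.1 appears as an image in the original manuscript. Assuming it depicts the logistic model
discussed in the surrounding text, the standard 3PL item characteristic function, with discrimination
a, difficulty b, guessing parameter c, and latent ability θ (all defined in the next paragraph), can
be written as:

P(θ) = c + (1 − c) ∗ (1/(1 + e^(−a(θ−b))))

The 2PL model mentioned above is the special case with c = 0.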

In IRT, each item in an educational game is associated with a characteristic curve that describes
the probability that a player will correctly answer it according to their ability level. The curve
helps define the item’s difficulty and discrimination properties [12]. The 3PL model holds particular
significance in this research due to its comprehensive representation of item characteristics. It incor-
porates three essential parameters: the item’s discrimination parameter (a), its difficulty parameter
(b), and a guessing parameter (c) [58]. The discrimination parameter measures how effectively an
item distinguishes between individuals with differing abilities, the difficulty parameter indicates the
level of ability at which a respondent has a 50% chance of answering the item correctly, and the
guessing parameter accounts for random guessing by participants. By integrating the 3PL model
into their adaptive difficulty algorithm, the researchers aim to establish a sophisticated and accurate
system that customizes educational content to suit the unique abilities of each learner. This ap-
proach enhances the precision of assessments and fosters a more individualized and effective learning
experience, addressing a critical need in modern education.

Figure 3.2: IRT Graph



3.2.2 Neural Network-Based Adaptive Difficulty Algorithms (NN-ADA)

The central formula that underlies Neural Network-Based Adaptive Difficulty Algorithms involves
using neural networks to predict each player’s optimal level of challenge. Neural networks excel at
identifying intricate patterns in player data, allowing for precise adjustments to the game’s difficulty.
The essence of these algorithms lies in their ability to continuously learn and evolve based on the
player’s interactions within the educational game [6].
Neural networks within NN-ADA rely on sophisticated data processing and feature extraction
techniques. This involves collecting and analyzing various player metrics, including response times,
the correctness of answers, player interactions, and even physiological data when available. This
adaptability is made possible through collecting and analyzing gameplay data, which is then fed
into a neural network for predictive modeling. As players engage in the educational game, the
neural network refines its understanding of the player’s skill level and makes real-time adjustments
to the game experience [6].
To assess the effectiveness of NN-ADA, the researchers will consider various evaluation metrics,
such as accuracy and precision, progress patterns, and adaptation speed. These metrics help quantify
the impact of the algorithm on learning outcomes and user experience.
Hence, the core concept of neural network-based adaptive difficulty algorithms in educational
games revolves around providing players with an optimal level of challenge. This means that when
players encounter questions that are too easy or too difficult, the algorithms step in to recalibrate
the learning experience [34]. For instance, if a player repeatedly struggles to grasp a specific concept or solve
a particular problem, the algorithm can detect this pattern and adjust the difficulty downward to
ensure a smoother learning progression. In contrast, when a player demonstrates mastery, and the
educational content becomes too predictable, the difficulty can be increased to maintain engagement
and excitement.
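As a deliberately simplified, non-neural C# sketch of this recalibration policy (the streak
thresholds and step size are illustrative assumptions, not the study's trained model):

// Illustrative streak-based recalibration; thresholds and step size are
// hypothetical and not taken from the study's model.
public class DifficultyRecalibrator
{
    private int correctStreak = 0;   // consecutive correct answers
    private int incorrectStreak = 0; // consecutive incorrect answers

    // Returns the adjusted difficulty after each answered question.
    public float Adjust(float currentDifficulty, bool answeredCorrectly)
    {
        if (answeredCorrectly) { correctStreak++; incorrectStreak = 0; }
        else { incorrectStreak++; correctStreak = 0; }

        // Mastery: raise the challenge to keep content from becoming predictable.
        if (correctStreak >= 3) { correctStreak = 0; return currentDifficulty + 1f; }

        // Repeated struggle: ease off for a smoother learning progression.
        if (incorrectStreak >= 2) { incorrectStreak = 0; return currentDifficulty - 1f; }

        return currentDifficulty;
    }
}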

3.2.3 Comparison of Item Response Theory (IRT) and Neural Network-Based Adaptive Difficulty Algorithms (NN-ADA)

Algorithm Mechanism

Item Response Theory (IRT) operates as a statistical model, analyzing the characteristics of the item
against the responses of the players to gauge individual abilities [38]. Its strength lies in assessing
skill levels based on question responses, offering a structured method to estimate and adapt difficulty
[21]. This method enables the model to map the relationship between an individual’s ability and
the probability of giving correct responses to various items in a test or assessment.
On the other hand, Neural Network-Based Adaptive Difficulty Algorithms (NN-ADA) employ
neural networks to dynamically adjust difficulty levels [50]. NN-ADA’s agility stems from its real-
time learning ability, adapting swiftly to individual performance. These algorithms utilize neural
networks’ capacity to process vast amounts of data, enabling them to continuously analyze and
respond to a player's actions or abilities during gameplay [1][55].

Adaptability

The major point of comparison between the two algorithms in this research is the effectiveness of
their ability to adapt to the player. IRT uses a fixed formula to estimate the likelihood that the
player answers correctly, which is then used to determine the change in their estimated ability.
NN-ADA, by contrast, is shaped largely by the developer: it can be structured in many ways, but it
is trained to learn on its own. In this case, the neural network learns how much a player's ability
estimate should increase or decrease after a right or wrong answer. This is the main axis along
which the two algorithms' adaptability will be measured.

3.3 Software and Hardware Development Tools


The coding language used for this project is C#, as it is one of the common languages used in game
development involving adaptive difficulty [26] and is the scripting language of the Unity engine used
to build the game. In addition, ChatGPT was used solely to correct code developed during the research
period [62]. Visual Studio Code (VS Code) will be the IDE used for the programming parts of the
project because of the many tools and quality-of-life options it provides [9].
Chapter 4

Methodology

This chapter will provide a detailed explanation of the development methodology that will be used
throughout this project. This will serve as a crucial foundation for understanding the systematic
approach that researchers will adhere to during the creation of the simulation. In this section, the
researchers outline the step-by-step procedures and the strong protocols established to effectively
tackle any potential challenges that might surface during the intricate development phase.


4.1 Research Design

4.1.1 Development Technique

Figure 4.1: Agile Methodology

In this study, the selection of the Agile methodology stems from its inherent adaptability and iterative
nature, aligning with the study's dynamic research objectives. Agile's iterative approach
allows for continual re-assessment and adjustment throughout the study, ensuring prompt responses
to evolving requirements [3]. Its flexibility allows researchers to accommodate unforeseen challenges
or insights during the development and testing phases. In addition, Agile’s emphasis on regular
feedback loops ensures active participation and collaboration among team members, fostering a more
responsive and efficient workflow [7][58]. The incremental delivery of features and functionalities of
this methodology enables researchers to quickly incorporate insights gained during each iteration,
ultimately leading to a more refined and targeted final product.

4.1.2 Experimental Approach

In this study, the researchers employed an experimental research approach to investigate the com-
parative effectiveness of two distinct algorithms, Item Response Theory (IRT) and Neural Network-
Based Adaptive Difficulty Algorithms (NN-ADA) within the context of an educational game. The
experimental approach allows researchers to exert control over a crucial component of the experi-
ment, the adaptive algorithm, and to modify it as necessary [54]. This manipulation enables the
researchers to observe and measure its effects on the primary outcome of interest. The experimental
approach is particularly well suited to the research objectives as it facilitates precise data collection
and analysis. Using the two algorithms as experimental tools, researchers can generate empirical
data essential to draw meaningful conclusions. This approach also allows the isolation of the impact
of each algorithm on the simulated player experiences, shedding light on their respective strengths
and weaknesses.

4.2 Application Development

4.2.1 Planning Phase

During the Planning Phase, project requirements and objectives will be identified, project scope will
be defined, limitations will be determined, feasibility studies will be conducted, and a project plan
will be created.

4.2.2 Analysis Phase

The analysis phase entails an examination of existing resources, including an in-depth analysis of
prevalent structures within educational games, identifying key game features, assessing available
tools and technologies, and critically recognizing essential design and development aspects.

4.2.3 Design Phase

The Design Phase involves the creation of a detailed design for the application, the development of
wireframes and mockups for the user interface, and the creation of technical specifications.

Game Design and Mechanics

Mechanics

• Select Answer with option buttons.



• Next question displays after clicking an option.

• The pop-up survey question displays after every 4 questions.

• Score displays after the game ends.

• Document created to show players’ results.

• Post-game survey after game sessions.

Parameters

• Difficulty Levels: Variations in difficulty adjusted by IRT and NN-ADA.

• Gameplay Metrics: Includes response time and player engagement.

• Question Bank: Diverse math problems based on participants’ curriculum.

Variables

• Adaptive Algorithms: IRT and NN-ADA.

• Player Performance: accuracy rate, level of questions answered, response time.



Use-Case Diagram

Figure 4.2: Diagram showing relationship between users and features of the quiz game

IRT & NN-ADA Flowchart

Figure 4.3: The flow of the program and sequence of the processes.

User Interface Mock-up

Figure 4.4: Initial User Interface Mock-up Design

This is a visual representation of the intended user interface. The developers designed the interface
very simply since the game serves the sole purpose of gathering data to test two adaptive algorithms.
In the representation, the circle represents the enemy, while the square represents the player. The
’New text’ area is designated for displaying the points after the quiz ends.

Key Features

• Questions & Multiple Choices: Presenting various questions with multiple answer options.

• Scoring Board: Displaying scores to track the player’s performance at the end of the game.

• Pop-up survey question: An in-app pop-up feature that measures students’ engagement during
gameplay.

• Adaptive Difficulty: Adjusting the difficulty level based on the algorithm and the player’s
performance or progression through the game.

• Correct/Incorrect Feedback: Providing immediate feedback on whether the player's answer is
correct or incorrect.

• Simple UI: A clean and user-friendly interface that effectively presents questions and choices.

Math Questions Integration

As part of the game mechanics, a series of math questions were integrated into the gameplay, designed
to span a range of mathematical topics and difficulty levels. These questions were carefully created by
a former District Elementary Math Coordinator and polished by a Grade 6 Math teacher of Ateneo
de Naga University, ensuring the accuracy and quality of the questions, as well as the classification
of difficulty, time constraints for answering, and alignment with the Grade 6 curriculum. The topics
range from basic arithmetic to more advanced concepts typically encountered throughout the school
year.
The difficulty of these questions is systematically classified, with easier levels focusing on funda-
mental concepts that require minimal computation, gradually progressing to more complex topics
that necessitate higher-order thinking skills and deeper understanding. For instance, the initial
levels may entail basic arithmetic operations such as addition and subtraction, while subsequent
levels explore topics like fractions, order of operations, and decimal operations, demanding greater
cognitive engagement and problem-solving abilities from the players.
The question bank comprises 10 levels of difficulty, each level containing 3 questions. All values
are deliberately kept low to ensure that students can potentially answer the questions mentally,
although the use of pen and paper is permitted if preferred. All questions are presented in a
multiple-choice format, offering 4 options for each.

Engagement Measurement

In measuring the player’s engagement, researchers considered factors such as interest, enjoyment,
and concentration, which have been used in numerous ESM and SEM studies. The researchers
administered in-game and post-game surveys, each comprising three questions targeting interest,
enjoyment, and concentration. Each question utilized a rating scale ranging from ’1=Not At All’
to '5=Very Much' to assess player responses accurately. The in-game survey appeared as a pop-up
feature during gameplay, allowing players to provide immediate feedback, while the post-game
survey was administered through printed questionnaires after the game ended.

4.2.4 Development Phase

Creation of the application, which utilizes both algorithms, started with the Unity software. This is
where the two versions of the game were made; they share most of the same UI code.

User Interface

Figure 4.5: User Profile Setup Screen

The game's UI is deliberately simple, made to focus less on aesthetics. The start of the game shows
the title and text fields that ask the user to input their first name and last name, respectively,
along with a start button to begin the game.

Figure 4.6: Gameplay Screen

The game starts by displaying the first question; this question is based on the ability level that
was set by the developer. The answer options are displayed at the bottom of the screen. When
the user answers a question, the algorithm will provide the next question to be answered, and these
events recur until all the questions are answered, at which point the score is displayed and the
answer buttons are no longer shown.

Figure 4.7: Survey Pop-up Screen

A mid-game survey question is also implemented to measure player engagement during gameplay.
This prompt appears after every four questions (the 4th, 8th, and 12th), and players rate the
displayed survey question on a scale from one to five, with one being the lowest and five the
highest.

Backend

The algorithms used are the major difference between the two games: the first uses Item Response
Theory (IRT), and the second the Neural Network-Based Adaptive Difficulty Algorithm. IRT was
implemented in the code with its standard formula, along with some additions.

guessing + (1 − guessing) ∗ (1/(1 + Mathf.Exp(−discrimination ∗ (ability − difficulty))))

Along with the formula, the absolute difference between the probability of being correct and 0.5 is
computed; a question whose probability is closest to 0.5 discriminates optimally between players
with higher and lower ability levels. Once the answer is given, the learning rate is added to or
subtracted from the player's current ability level, depending on whether the answer was correct, to
help determine the best question to give next.

The neural network algorithm was not as easy to adapt into code, as it required a Unity machine-
learning package, ML-Agents. This package is built to help developers create a machine-learning
model that uses a neural network and the many layers that go into building one. The model that
the package creates follows a convolutional neural network (CNN) design due to its need to learn
from different inputs and observations. The main part of the package utilizes what is called an
agent, which is given specific observations to watch within the environment and then acts on those
observations according to specific instructions it was given. The observations the agent watches are
the difficulty of the question, the player's ability, and whether or not the player is correct. The
actions the agent performs are increasing or decreasing the player's ability, changing it depending
on the observations mentioned previously. This algorithm uses a simple formula that utilizes the
same variables as the IRT formula but in a less specific manner.

previousDifficulty + (discrimination ∗ ability) − (1 − guessing)

This equation determines the next difficulty by adding the product of the discrimination and ability,
since these two values interact with each other and determine the increase or decrease in difficulty,
while (1 − guessing) is subtracted from the difficulty to account for the possibility that the player
guessed the answer. The ability variable is the one the agent is trained to manipulate in order to
provide a more accurate output for the next difficulty. The training for the agent was done by
running the game and having the agent answer questions under different sets of instructions. The
first set is to answer the question correctly, varying how long the answer takes; the agent is
rewarded according to how fast it answers. The next set is to answer questions incorrectly, also
varying the time taken, for which the agent is penalized because the answer is wrong. The last
instruction is simply to let the timer run out entirely before the question is answered. The point
of training the agent this way is to expose it to multiple scenarios of getting questions right and
wrong so that it can adjust the ability level accordingly. This training was also meant to teach the
agent that outputting a negative adjustment to the ability is not bad but rather appropriate, because
its job is to match a question to the player's current ability level. This training was run five
hundred thousand times to polish its accuracy.

Algorithm Implementation

Within the game, IRT is used to dynamically adjust the difficulty of questions based on the player’s
responses. As the player progresses through the game, the algorithm continuously evaluates the
player’s ability level and selects questions that are appropriately challenging. The algorithm cal-
culates the probability of a player answering a question correctly based on their ability and the
question’s difficulty. Then it adjusts the difficulty level for subsequent questions to maintain an
optimal level of challenge.
In the game environment, NN-ADA observes various parameters, such as the player's response
time, the correctness of answers, the difficulty of the question, and the player's ability level, to
predict the appropriate adjustment to the ability level. The algorithm then adjusts the player's
ability level, which in turn modifies the next question's difficulty to optimize the learning experience
for the player.

Item Response Theory Code

Figure 4.8: IRT Code Snippet
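The snippet in Figure 4.8 is reproduced as an image in the original manuscript. The following is a
minimal C# sketch of the IRT logic described above; names such as Question, IrtModel, and
learningRate are illustrative assumptions rather than the study's exact identifiers:

using UnityEngine;

// Illustrative item parameters; field names are assumptions.
public class Question
{
    public float difficulty;      // b: ability level with a 50% chance of success
    public float discrimination;  // a: how sharply the item separates abilities
    public float guessing;        // c: probability of a correct random guess
}

public class IrtModel
{
    public float ability;             // current estimate of the player's ability
    public float learningRate = 0.1f; // assumed step size for ability updates

    // 3PL probability that a player of the current ability answers correctly.
    public float ProbabilityCorrect(Question q)
    {
        return q.guessing + (1f - q.guessing) *
               (1f / (1f + Mathf.Exp(-q.discrimination * (ability - q.difficulty))));
    }

    // Nudge the ability estimate up or down after each answer.
    public void UpdateAbility(bool correct)
    {
        ability += correct ? learningRate : -learningRate;
    }

    // Pick the question whose predicted probability is closest to 0.5,
    // i.e., the item that discriminates optimally at this ability level.
    public Question SelectNext(Question[] bank)
    {
        Question best = bank[0];
        float bestGap = Mathf.Abs(ProbabilityCorrect(best) - 0.5f);
        foreach (Question q in bank)
        {
            float gap = Mathf.Abs(ProbabilityCorrect(q) - 0.5f);
            if (gap < bestGap) { best = q; bestGap = gap; }
        }
        return best;
    }
}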



Neural Network-Based Adaptive Difficulty Algorithm Code

Figure 4.9: Neural Networks Code Snippet
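Figure 4.9 is likewise an image in the original manuscript. Below is a minimal ML-Agents-style
sketch of the agent described above; the class name, observation order, and reward values are
illustrative assumptions rather than the study's exact code:

using Unity.MLAgents;
using Unity.MLAgents.Sensors;
using Unity.MLAgents.Actuators;

public class DifficultyAgent : Agent
{
    public float questionDifficulty;  // difficulty of the current question
    public float playerAbility;       // current player-ability estimate
    public bool answeredCorrectly;    // whether the last answer was correct
    public float guessing;            // guessing parameter of the question
    public float discrimination;      // discrimination parameter of the question

    public override void CollectObservations(VectorSensor sensor)
    {
        // The three observations described in the text.
        sensor.AddObservation(questionDifficulty);
        sensor.AddObservation(playerAbility);
        sensor.AddObservation(answeredCorrectly ? 1f : 0f);
    }

    public override void OnActionReceived(ActionBuffers actions)
    {
        // A single continuous action: how much to adjust the ability
        // (negative adjustments are valid, per the training description).
        float abilityDelta = actions.ContinuousActions[0];
        playerAbility += abilityDelta;

        // Next difficulty per the formula given above.
        questionDifficulty = questionDifficulty
                             + (discrimination * playerAbility)
                             - (1f - guessing);

        // Assumed reward shaping: the study rewarded correct answers (scaled by
        // answer speed during training) and penalized incorrect ones.
        AddReward(answeredCorrectly ? 1f : -1f);
    }
}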

4.2.5 Testing and Maintenance Phase

The Testing Phase involves various testing types: unit testing, integration testing, system testing,
and usability testing. Concurrently, maintenance activities involve monitoring the application for
bugs and errors, continual updates, and ensuring compatibility with hardware and software tools.

Testing Approach

• Unit Testing: This will be conducted to scrutinize individual components of the simulation and
algorithms in isolation. This examination allows researchers to verify that each unit functions
as intended, free from dependencies on other components.

• Integration Testing: This will assess the seamless interaction between different modules within
the system. This involves testing how the algorithms integrate into the game environment and
ensuring they harmoniously interact with other game components.

• System Testing: This phase involves assessing the entire system as a whole. It verifies the col-
lective performance of the adaptive algorithms within the game environment. System testing
encompasses end-to-end evaluation, ensuring that the algorithms, alongside all game compo-
nents, operate cohesively, meeting the predetermined functional requirements and specifica-
tions.

• Usability Testing: This phase evaluates the game interface’s ease of use and user experience.
It involves participants completing specific tasks while researchers observe their interactions
and collect feedback. The data collected will inform iterative design improvements to enhance
user experience and address usability issues.

4.2.6 Test Cases

Basic Functionality Test

• Test Case 1: Ensure that the game starts without errors and presents the initial question.

• Test Case 2: Confirm that answers can be selected, and the game progresses accordingly.

• Test Case 3: Verify that the scoring mechanism accurately calculates the player’s score based
on their responses.

• Test Case 4: Verify that the document with the players’ results and data is in the correct
location.

Algorithm Performance Test

• Test Case 5: Assess how well the Item Response Theory (IRT) algorithm adapts to the difficulty
of questions based on player responses.

• Test Case 6: Evaluate the Neural Network-Based Adaptive Difficulty Algorithm (NN-ADA)
in adjusting question difficulty to optimize the learning experience.

4.2.7 Deployment Phase

In the Deployment Phase, the finalized system, comprising the educational game integrated with the
Item Response Theory (IRT) and Neural Network-Based Adaptive Difficulty Algorithms (NN-ADA),
was introduced to the public for its intended implementation and use.

4.3 Data Collection


The data collection process was primarily driven by participant engagement with the implemented
game employing the IRT and NN-ADA algorithms. Participants actively interacted with the educa-
tional game, generating real-time data through their responses and other relevant player metrics. To
ensure a robust dataset, many participants engaged with the game, contributing to the accumulation
of diverse and extensive player interactions.

4.3.1 Participants Profile and Selection

Participants Background Information

The study involved Grade 6 participants from Ateneo de Naga University. Grade 6 students, typically
aged 11 to 12 in the Philippines, represent a critical stage in cognitive development, demonstrat-
ing enhanced problem-solving skills and critical thinking abilities. Their sensitivity to motivation
and engagement factors, particularly in the transition to adolescence, makes them ideal candidates
for evaluating the effectiveness of educational games and adaptive difficulty algorithms. Previous
successful studies with similar demographics further validate the suitability of Grade 6 students for
engaging in educational game settings. Additionally, the data disclosed in this study were restricted
to generalized forms, as per the confidentiality agreement with the grade school.

Participants Selection

Researchers targeted to engage 119 sixth-grade participants, representing the entire Grade 6 pop-
ulation. These students were divided into three categories based on their average grades from the
previous quarters in Mathematics subject:

• Class A (75 - 83)

• Class B (84 - 92)

• Class C (93 - 100)

Nineteen of these participants, drawn from across the three categories (Class A, Class B, Class C),
were selected for the preliminary data gathering. The remaining 100 students participated in the actual data-gathering
phase. Within each category, students were further divided into two groups: one group participated
in gameplay utilizing Item Response Theory (IRT), while the other engaged with gameplay utilizing
the Neural Network-Based Adaptive Difficulty Algorithm (NN-ADA). Therefore, there was a target
of 50 student participants for each algorithm, drawn from across the categories.

4.3.2 Session Implementation

Preliminary Data Gathering Phase

The actual data gathering phase of the study could not begin until certain missing data were
obtained, and those data required a preliminary gathering phase. Nineteen students were chosen
from Classes A, B, and C to provide the missing data, namely the question discrimination and
guessing values. These values can only be obtained by testing the students themselves. Specifically,
the selected students answered all 30 questions from the question bank so that the missing values
could be determined. Another important note about this testing is that the students wrote the letter
"g" next to the questions they guessed on, which allowed the guessing value for each question to be
determined. The students were monitored during the testing to preserve its integrity and prevent
any form of cheating.

Data Gathering Phase

During the actual data gathering phase, the targeted 100 student participants engaged in separate
sessions tailored for evaluating the IRT and NN algorithms independently. Each group took turns
using the computer laboratory during their scheduled times to avoid disrupting their regular class
schedules.
Two separate groups of students (drawn from Class A, Class B, and Class C, each divided into two)
partook in these sessions: one group engaged with the IRT-based game, while the other interacted
with the NN-based counterpart. Despite the different groups, the sets of questions presented to each
group remained consistent. This approach ensures that despite involving distinct sets of students,
the comparison between the algorithms remains uniform and directly comparable.

4.3.3 Actual Game Implementation

Preparation

Before the game session began, the researchers prepared all computers and downloaded the game
application. Researchers supervised the gameplay sessions, while the attending teacher/s guided se-
lected student participants to the computer laboratory and assisted in maintaining order throughout
the session.

Gameplay Session

Clear instructions were provided to participants, guiding them through the gameplay experience.
Participants were not allowed to use a calculator; however, pen and paper were permitted for their
calculations. Each student answered a total of 15 questions, with the specific questions depending on the algorithm's selection.
Three in-game question surveys appeared after the 4th, 8th, and 12th questions. Once students
finished, the program collected and recorded the necessary data.

Post-Game

After completing the game, students received a brief post-game survey questionnaire. Once they
finished the survey, they were free to leave the computer laboratory. The researchers then saved the
collected data from each computer to an external hard drive and returned the computers to their original state.

4.3.4 Key Data Points

During the gameplay sessions, various types of data used to compare the two algorithms were
collected from students.

Player Responses:

• Accuracy Rate: Detailed records were collected regarding the accuracy of players’ responses
to the questions presented within the game. This included tracking the percentage of correct
answers given by each player.

• Response Time Across Varying Difficulty Levels: The time players take to respond to questions
of different difficulty levels was recorded. This data enabled the evaluation of how swiftly
players react to varying levels of challenge within the game environment.

Gameplay Metrics:

• Questions Response Time: Analysis of the time players take to respond to individual questions
was conducted, offering insights into their cognitive processing speed and efficiency.

• Player Engagement: To measure player engagement, a pop-up survey appeared during the
game. Additionally, a post-game survey was administered through a printed questionnaire.
These feedback mechanisms offered valuable insights into the overall levels of engagement
experienced by players during gameplay sessions.

4.4 Data Analysis

4.4.1 Descriptive Statistics

Descriptive statistics were employed to summarize and describe the collected data. This includes cal-
culating measures such as mean, median, mode, standard deviation, and range for player responses,
accuracy rates, response times, and gameplay metrics across different difficulty levels.
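As a minimal C# sketch of these summary computations (class and method names are assumptions,
and mode is omitted for brevity):

using System;
using System.Linq;

public static class Descriptives
{
    public static double Mean(double[] v) => v.Average();

    public static double Median(double[] v)
    {
        double[] s = v.OrderBy(x => x).ToArray();
        int mid = s.Length / 2;
        return s.Length % 2 == 0 ? (s[mid - 1] + s[mid]) / 2.0 : s[mid];
    }

    // Population standard deviation of the recorded values.
    public static double StdDev(double[] v)
    {
        double m = Mean(v);
        return Math.Sqrt(v.Select(x => (x - m) * (x - m)).Average());
    }

    public static double Range(double[] v) => v.Max() - v.Min();
}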

4.4.2 Comparative Analysis

Accuracy Rate Analysis:

Researchers analyzed the accuracy rates of players’ responses under both IRT and NN-ADA con-
ditions. By comparing the percentage of correct answers given by players across different difficulty
levels, researchers can assess the algorithms’ ability to dynamically adjust question difficulty while
maintaining an optimal level of challenge. Any significant disparities in accuracy rates between
the two algorithms may indicate variations in their effectiveness in calibrating difficulty levels to
individual player abilities.

• Metric: Number of Correct Answers

• Measurement Criteria: Calculate the percentage of correct answers each player gave out of the
total questions attempted.

• Formula (a brief worked example follows the interpretation notes below):

Accuracy Rate = (Number of Correct Answers / Total Questions Attempted) × 100%

– Number of Correct Answers represents the total number of questions answered correctly
by players.

– Total Questions Attempted represents the total number of questions attempted by players.

• Interpretations:

– If the accuracy rate is consistently higher for one algorithm across all difficulty levels,
it suggests that the algorithm is better at adjusting question difficulty to match player
abilities.

– Significant differences in accuracy rates between algorithms at specific difficulty levels


indicate areas where one algorithm may outperform the other.
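As a brief worked example with assumed numbers: a player who answered 12 of 15 attempted questions
correctly would have an accuracy rate of (12 / 15) × 100% = 80%.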

Response Time Analysis:

The analysis of players’ response times to questions of varying difficulty levels is a proxy for assess-
ing cognitive processing speed and efficiency within the game. Researchers calculated the average
response times per difficulty level and compared them between IRT and NN-ADA conditions. Sig-
nificant variations in response times may indicate differences in players’ cognitive engagement and
problem-solving strategies under different adaptive algorithms.

• Metric: Average Response Time per Difficulty Level

• Measurement Criteria: Record the time taken by players to respond to questions of different
difficulty levels (e.g., level 1, level 2, . . . ). Calculate the average response time for each difficulty
level to evaluate how swiftly players react to varying levels of challenge.

• Formula:

Average Response Time per Difficulty Level = (Σ_{i=1}^{n} ResponseTime_i) / n

– ResponseTime_i represents the response time for each individual question i.

– n represents the total number of questions answered for a particular difficulty level.

• Interpretation:

– Longer response times for one algorithm may suggest that it is presenting overly chal-
lenging questions, leading to increased cognitive load.

– Consistently shorter response times under one algorithm may indicate that it effectively
matches question difficulty to player abilities, facilitating faster decision-making.

Engagement Rating Analysis

The ratings provided by players in response to in-game and post-game survey questions measuring
interest, enjoyment, and concentration will be analyzed. These ratings will be compared between
IRT and NN-ADA conditions to identify significant differences in player experiences under each
algorithm.

• Metric: Average Rating for Interest, Enjoyment, and Concentration

• Measurement Criteria: Calculate the average rating for each survey question (e.g., interest,
enjoyment, concentration) separately under IRT and NN-ADA conditions. Collect ratings
provided by players in the in-app survey and post-game survey, ranging from ‘1=Not At All’,
‘2=A Little’, ‘3=Somewhat’, ‘4=Pretty Much’, to ‘5=Very Much’.

• Formula:

Average Rating = (Σ_{i=1}^{n} Rating_i) / n

– Rating_i represents the rating provided by each player for a specific survey question.

– n represents the total number of survey responses collected for that question.

• Interpretation

– Higher average ratings for interest, enjoyment, and concentration under one algorithm
suggest that it may offer a more engaging and immersive gameplay experience.

– Significant variations in average ratings between algorithms indicate areas where one
algorithm may excel in enhancing player motivation and focus.

4.4.3 Statistical Analysis

Correlation Analysis

Researchers will conduct correlation analysis to explore potential relationships between different
performance metrics, such as accuracy rates, response times, and engagement ratings. Understanding
these correlations can help identify factors contributing to effective adaptive learning experiences
and inform strategies for optimizing player engagement and learning outcomes.
The formula below calculates the covariance of the two variables divided by the product of their
standard deviations. A correlation coefficient close to 1 indicates a strong positive correlation, close
to -1 indicates a strong negative correlation, and close to 0 indicates no linear correlation. A minimal
code sketch of this computation follows the interpretation notes below.

r = Σ(X_i − X̄)(Y_i − Ȳ) / √(Σ(X_i − X̄)² ∗ Σ(Y_i − Ȳ)²)

• r represents the correlation coefficient, which measures the strength and direction of the linear
relationship between two variables.

• X_i and Y_i represent individual data points of the two variables being analyzed.

• X̄ and Ȳ represent the mean values of the two variables.

Interpretation:

• Positive correlations between accuracy rates and engagement ratings suggest that players who
perform well are more likely to be engaged with the game.

• Negative correlations between response times and accuracy rates may indicate that faster
responses are associated with lower accuracy, potentially due to guessing or lack of careful
consideration.
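As a minimal C# sketch of the correlation computation referenced above (the class and method
names are assumptions):

using System;

public static class Statistics
{
    // Pearson correlation coefficient between two equal-length samples.
    public static double PearsonR(double[] x, double[] y)
    {
        int n = x.Length;
        double meanX = 0, meanY = 0;
        for (int i = 0; i < n; i++) { meanX += x[i]; meanY += y[i]; }
        meanX /= n; meanY /= n;

        double cov = 0, varX = 0, varY = 0;
        for (int i = 0; i < n; i++)
        {
            double dx = x[i] - meanX, dy = y[i] - meanY;
            cov += dx * dy;   // numerator: sum of co-deviations
            varX += dx * dx;  // sum of squared deviations of X
            varY += dy * dy;  // sum of squared deviations of Y
        }
        return cov / Math.Sqrt(varX * varY);
    }
}

For example, PearsonR could be applied to the per-player accuracy rates and overall engagement
ratings to obtain the correlations reported in Chapter 5.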

4.4.4 Visualization

To enhance the clarity and comprehensibility of the findings, the researchers will employ data visu-
alization techniques, such as graphs and charts, to illustrate the performance metrics and differences
between the algorithms.

4.5 Research Planning and Schedule


The Planning phase was scheduled from August to September. Following this, the Analysis phase
spanned mid-September to October, slightly overlapping with the concluding stages of the Planning
phase. The Design phase extended from mid to late October, progressing into November. From
November to January, Development took place, coinciding with the tail end of the Design phase.
System testing and maintenance were set for February, followed by IRT and NN-ADA adaptive
testing for the whole period of March. Data gathering was planned for mid-April. Lastly, Data
Analysis was planned from mid to late April.
Data gathering, or the gameplay sessions, were initially planned to be conducted right after the
preliminary defense, following the revisions and recommendations from that defense. This defense
was scheduled for mid-March, but due to holidays and school events, it was conducted on April 4.
Researchers set a week for code and manuscript revisions based on the panelists’ recommendations.
After these revisions, data gathering was scheduled to commence a week later. However, due to
a sudden shift to asynchronous classes and the rescheduling of target gameplay participants, the
one-week data gathering was squeezed into a single day on April 30. This adjustment was necessary
to allow the researchers to finish two more chapters, with the tentative final submission due within
the same week.

Figure 4.10: Gantt Chart


Chapter 5

Results and Discussion

This chapter provides a detailed examination of the outcomes stemming from applying adaptive
difficulty algorithms in educational games. Through a systematic analysis of empirical data, the
researchers reveal how these algorithms impact various aspects of learning. By synthesizing findings
from both quantitative analysis and qualitative insights, this chapter offers valuable insights into
the practical implications of integrating adaptive difficulty algorithms into educational contexts,
informing future research directions and instructional practices.

5.1 Usability Testing Findings


The usability test for this program took approximately two days to complete the entire process, from
preparation to post-test analysis. The students involved were the same ones who participated in the
preliminary data collection, except for one who was absent, meaning a total of 18 students took part
in this portion of the test. The testing showed that the majority of students found the current
version of the program optimal and enjoyable, albeit with minor adjustments such as small design
changes. The main design change incorporated into the final product was a progress bar at the top of
the screen, which indicates how many more questions the student has to answer before the quiz
finishes.


5.2 Accuracy Rate


Accuracy-rate analysis serves as a primary metric for evaluating how well IRT and NN-ADA adjust
the difficulty of the questions to align with student abilities. Researchers calculate the percentage of
correct answers students give across different difficulty levels under both algorithms. By comparing
these accuracy rates, researchers can assess the algorithms’ ability to dynamically adjust question
difficulty while maintaining an optimal level of challenge. Any significant disparities in accuracy rates
between the two algorithms may indicate variations in their effectiveness in calibrating difficulty
levels to individual student abilities. This analysis provides insights into which algorithm better
matches the difficulty of the question with the student’s proficiency at different stages of the game.

Level     1      2      3      4      5      6      7      8      9      10
IRT      73.07  65.27  85.71  75.86  44.68  62.26  69.56  24.13  72.72  63.01
NN-ADA   86.95  83.01  91.52  72.41  43.95  79.68  59.72  15.58  50.94  40.42

Table 5.1: Comparison of Accuracy Rates (%) for IRT & NN-ADA across Difficulty Levels

Figure 5.1: Comparison of Accuracy Rate for IRT & NN-ADA using Line Graph

The figure above is a line graph that depicts the comparison between Item Response Theory
and Neural Network-Based Adaptive Difficulty Algorithms in terms of accuracy rate across different
difficulty levels. The x-axis represents each of the ten difficulty levels, while the y-axis represents
the average accuracy percentage of players. The table above shows the final calculations for each
level for both algorithms. Each point is computed with the accuracy rate formula stated in Chapter 4.
Based on the results, it is evident that both IRT and NN-ADA exhibit varying degrees of ef-
fectiveness in adjusting question difficulty to align with player abilities. With 43 players engaged
in the IRT group and 46 in the NN-ADA group, each participant answered a total of 15 questions.
Notably, NN-ADA better calibrates questions to match player proficiency in the lower levels 1 to
3. However, as the difficulty increases, the accuracy rates for both algorithms exhibit fluctuations,
with IRT occasionally surpassing NN-ADA in accuracy. Moreover, at the highest difficulty levels, 7 to
10, NN-ADA shows markedly lower accuracy rates compared to IRT, indicating that IRT selects
questions better matched to player ability.

NN-ADA excels in calibrating questions at lower difficulty levels, while IRT demonstrates superior
performance as difficulty increases. This disparity can be attributed to differences in algorithm de-
sign, question calibration methods, and cognitive load. NN-ADA’s agility in adjusting difficulty levels
may lead to rapid but occasionally inaccurate adjustments, affecting accuracy rates. In contrast,
IRT’s stability and consistency in adapting to question difficulty contribute to enhanced accuracy
across varying difficulty levels, ensuring an optimal balance between challenge and manageability.

Figure 5.2: Overall Comparison of Accuracy Between IRT & NN-ADA

The figure above is a bar graph showing the overall accuracy rate for IRT and NN-ADA. Across
the ten difficulty levels, IRT demonstrates higher accuracy rates in six out of ten levels compared to
NN-ADA. Nonetheless, as stated previously, NN-ADA would have the advantage of having higher
accuracy rates over IRT in the lower stages, while IRT would adjust better than NN-ADA as the
difficulty level increases.

5.3 Response Time


Response time analysis offers additional insights into how students engage with questions of varying
difficulty levels under IRT and NN-ADA conditions. Researchers record the time students take
to respond to questions at each difficulty level and calculate the average response time for each
algorithm. Longer response times may suggest that questions are perceived as more challenging,
leading to increased cognitive load. Conversely, shorter response times may indicate that questions
appropriately match students’ abilities, facilitating faster decision-making. By examining response
times across difficulty levels, researchers can assess the algorithms’ effectiveness in maintaining an
optimal balance between challenge and manageability for students.

Figure 5.3: Comparison of Response Time between IRT & NN-ADA

The figure above is a line graph that shows the relationship between the difficulty levels and
response times when students answer the questions in each level for both IRT and NN-ADA. The
x-axis represents the difficulty level, and the y-axis represents the average response time of students
per difficulty level for IRT and NN-ADA. This graph can also indicate how well each algorithm gives
questions at an estimated difficulty answerable by the students.
Results indicate that, overall, response times tend to increase as the difficulty level progresses
for both algorithms, aligning with the idea that more challenging questions require additional time
for processing and deliberation. Based on the results, students took longer to answer the questions
in the lower levels 1 to 3 for IRT. This suggests that NN-ADA may present questions perceived as
more manageable by players during the lower stages of gameplay. Interestingly, IRT demonstrates
relatively stable response times across most difficulty levels, with minor fluctuations observed. This
consistency indicates that IRT effectively adjusts question difficulty to maintain a consistent level
of cognitive demand on players. In contrast, NN-ADA exhibits slightly more variability in response
times, particularly evident in the higher difficulty levels. This variability may suggest occasional mis-
matches between question difficulty and player abilities, resulting in fluctuations in response times.
While NN-ADA appears to offer quicker decision-making in the lower stages, IRT demonstrates
stability and consistency in adapting to question difficulty across all levels.
The graph for response time not only indicates how long students take to answer questions at
different difficulty levels but also offers insight into how challenging they find each question. From
difficulty levels 1 to 4, students in each group, although showing a clear difference in response
times, exhibit a similar increasing rate of response times under both algorithms. From this, it can
be said that as the difficulty level increases, the response time also increases. A similar case can
be observed from levels 5 to 9, with response times dipping from level 5 to level 6 while the rest
steadily increase; both groups of students may have found the level 6 questions easier than those at
level 5. Moreover, an intersection between the two algorithms is seen on the graph at level 4, where
both have an average response time of 36 seconds. It can be surmised that students under both
algorithms found the difficulty of those questions to be about the same. Lastly, the response times
at level 10 show a clear difference, as the two algorithms' trends shift away from each other: IRT's
average response time decreased to 31 seconds, while NN-ADA's increased to 43 seconds. Based on
this result, it may be deduced that students participating in IRT found the questions somewhat easier
than students in NN-ADA, who may have found them more difficult. Hence, IRT more effectively
adjusted the difficulty level based on student performance compared to NN-ADA, which struggled to
match question difficulty to students' abilities. The results for response time reveal that as question
difficulty increases, so does response time, with notable patterns and variations suggesting
differences in how challenging
students found each level, particularly at higher difficulties where IRT participants appeared to find
the questions easier than those using NN-ADA.
The observed increase in response times as difficulty levels progress underscores the relationship
between question complexity and processing time. NN-ADA’s ability to present questions perceived
as more manageable at lower stages may contribute to shorter response times than IRT. However,
IRT’s stability in adapting to question difficulty ensures relatively consistent response times, high-
lighting its effectiveness in maintaining cognitive demand on players.

5.4 Player Engagement


Engagement rating analysis provides qualitative data on students’ experiences and perceptions while
engaging with educational games under IRT and NN-ADA conditions. Researchers collect student
ratings on factors such as interest, enjoyment, and concentration through in-game and post-game
surveys. These ratings offer insights into how each algorithm influences students’ motivation, sat-
isfaction, and focus during gameplay. Higher average ratings for interest, enjoyment, and concen-
tration under one algorithm suggest that it may offer a more engaging and immersive gameplay
experience, indicating its effectiveness in adapting question difficulty to match student abilities and
preferences.

Figure 5.4: Comparison of Engagement Rating between IRT & NN-ADA

The figure above is a bar graph that shows the comparison between IRT and NN-ADA in terms of
engagement. Each pair on the x-axis shows one of the factors comprising engagement (interest,
enjoyment, and concentration), while the y-axis represents the average rating for each criterion
under both algorithms. This graph may show which algorithm students were more engaged with.
Based on the results, it is evident that students tend to exhibit slightly higher levels of interest,
enjoyment, and concentration when engaging with the IRT-based educational game. The marginal
differences between the two algorithms suggest that while both are effective in sustaining student en-
gagement, IRT may have a slight edge in eliciting more positive experiences across these dimensions.
This may also suggest that IRT was able to give questions that suited the challenge the students
had and their ability to answer the questions.
The graph for the engagement rating highlights notable differences between the two algorithms,
providing insights into the students’ mental states while answering quiz game questions across cat-
egories such as interest, enjoyment, and concentration. Firstly, in the category of interest, students
using IRT showed higher levels of interest in answering questions than those using NN-ADA. This
heightened interest may stem from the adaptive nature of IRT, which tailors questions more closely
to the students’ abilities and keeps them engaged. Secondly, IRT students also scored higher in
enjoyment. This increased enjoyment likely results from IRT’s ability to deliver questions that
strike a balance between being challenging yet achievable, thus providing a sense of accomplishment
and satisfaction. Lastly, in the concentration category, IRT students again outperformed NN-ADA
students. This can be attributed to the tailored difficulty of IRT questions, which align with the stu-
dents’ existing knowledge and skills, allowing them to focus better and systematically work through
the questions. The ability to engage with appropriately challenging material likely helped maintain
their concentration and enhance their overall learning experience. Thus, IRT’s adaptive approach
fosters greater interest, enjoyment, and concentration among students, leading to a more engaging
and effective educational experience.

Figure 5.5: Overall Comparison of Engagement Rating between IRT & NN-ADA

The figure above is a bar graph that shows the comparison between IRT and NN-ADA in terms
of overall engagement rating. The three factors' ratings (interest, enjoyment, and concentration)
have been combined and averaged to obtain the overall engagement score. With an overall
engagement rating of 3.75 for IRT and 3.46 for NN-ADA, it is evident that students generally
perceive the IRT-based game to be slightly more engaging than its NN-ADA counterpart. While
both algorithms demonstrate efficacy in sustaining student engagement, the higher average rating
for IRT suggests that it may offer a more immersive and captivating gameplay experience overall.
This may also suggest that IRT adapted questions for students more appropriately, allowing them to
answer questions with a level of challenge suited to their knowledge and ability. Furthermore,
this can be attributed to IRT’s tailored difficulty levels, which align closely with students’ abilities
and preferences, fostering a sense of accomplishment and satisfaction. In contrast, NN-ADA’s rapid
adjustments may occasionally lead to mismatches between question difficulty and student proficiency,
affecting overall engagement levels.

5.5 Performance Metrics Relationship

5.5.1 Correlations between Accuracy Rates and Overall Engagement Ratings

Figure 5.6: Computation for Accuracy Rates



Figure 5.7: Computation for Engagement Rating

The correlation between accuracy rates and overall engagement scores for both the IRT and NN-ADA
groups is 0, indicating that there is no linear relationship between these two variables.

5.5.2 Correlations between Accuracy Rates and Response Times

Figure 5.8: Computation for Accuracy Rate



Figure 5.9: Computation for Response Time

The correlation between Accuracy Rates and Response Times for both the IRT and NN-ADA groups
is 0, indicating no linear relationship between these two variables.

5.5.3 Correlations between Response Times and Overall Engagement Ratings

Figure 5.10: Computation for Response Time



Figure 5.11: Computation for Engagement Rating

The correlation between response times and overall engagement scores for both the IRT and NN-
ADA groups is 0, indicating that there is no linear relationship between these two variables.

5.6 Discussion of Findings


The findings of this study shed light on the comparative effectiveness of Item Response Theory (IRT)
and the Neural Network-Based Adaptive Difficulty Algorithm (NN-ADA) in terms of student accuracy,
response time, and engagement across different difficulty levels. Across the research questions
and objectives, it becomes evident that both algorithms contribute to tailored learning experiences,
albeit with varying degrees of effectiveness and adaptability.
While NN-ADA may excel in calibrating questions for easier levels and offering quicker decision-
making, particularly in the lower stages of gameplay, IRT demonstrates stability and consistency in
adapting question difficulty, particularly in higher stages. However, players may take more time to
answer questions at lower difficulty levels when playing the IRT game.
Furthermore, while both algorithms contribute to player engagement, IRT appears to elicit
slightly higher levels of interest, enjoyment, and concentration than NN-ADA. Combining these
factors into an overall engagement, IRT emerges as the more engaging option, with a higher propor-
tion of students being engaged than NN-ADA. This suggests that IRT may offer a more engaging and
immersive gameplay experience overall, possibly by offering tailored questions to students’ needs.

Strengths and weaknesses emerge for each algorithm; IRT demonstrates stability and consistency
but may have longer response times at lower levels, while NN-ADA offers faster decisions but exhibits
more variability in response times. These differences could be attributed to variances in algorithm
design, question calibration methods, and their impact on player engagement.
In summary, while NN-ADA may excel in providing suitable questions at lower difficulty lev-
els, IRT demonstrates superiority as difficulty increases, offering more appropriate questions with
shorter response times, and fostering higher levels of student engagement. These findings suggest the
potential benefits of employing IRT over NN-ADA in educational settings, particularly in scenarios
where the difficulty of the question varies.

5.7 Insights about Results


The results show a visible difference between the two algorithms, with IRT proving more effective at adapting difficulty. Even so, NN-ADA was close in many of the recorded metrics; given more time and data to train with, it is plausible that the NN-ADA version of the game could outperform the IRT version. Beyond the per-criterion comparison, the results also yield insights that, while not directly tied to the end result, may shed light on student abilities and preferences regarding their knowledge of mathematics. For example, the Accuracy Rate graph shows a notable point for both algorithms at which students scored lowest across all question difficulties: Level 8 (L8), where IRT has an accuracy rate of 24.13% and NN-ADA 15.58%. Although both algorithms assigned questions of this difficulty based on students’ previous answers, and these questions may therefore have been appropriate, several students failed to answer them correctly. From this, it may be inferred that students in both the IRT and NN-ADA groups lack sufficient knowledge or practice to solve problems at difficulty level 8.
This result may stem from many factors, including but not limited to the following: students may have forgotten the lesson on the topic; they may not have understood it well enough; the topic may have been too difficult for them; the teacher may not have explained the lesson in a way that was easy for students to understand; the learning environment may not have been conducive to understanding the material; students may not have been engaged or motivated during the lesson; there may have been insufficient practice or reinforcement of the concepts; the instructional materials may have been inadequate or misaligned with students’ learning styles; or external factors such as stress or distractions may have affected students’ ability to learn and retain the information. The differences in accuracy rates between IRT and NN-ADA highlight
potential gaps in students’ knowledge or practice with high-difficulty questions, underscoring the
need for further investigation into factors such as instructional quality, engagement, and learning
environment to enhance understanding and retention in mathematics.

5.8 Limitations of the Study


The study has shown great potential for informing how teachers can approach teaching and providing learning opportunities to students. There were, however, certain complications in obtaining the data necessary for this study, ranging from the programming side to the actual data gathering.
The first concerns the programs themselves: while it would be ideal for the IRT and NN-ADA algorithms to be trained on actual students, this was not feasible for this particular study. Training the programs to respond accurately to Grade 6 students specifically would have required participants from multiple educational institutions.
This leads to the next limitation: time. The main problem was the lack of time needed to gather a reasonable sample size. Scheduling issues during the data-gathering process were plentiful, ranging from the school’s sudden shifts to asynchronous learning to the students’ examination schedules, and these repeatedly stalled the research, since data was needed to make any significant progress. Furthermore, the researchers could not meet their target number of participants, which comprised the entire ADNU Grade 6 population. This shortfall was primarily due to unavoidable circumstances such as student absences and time constraints; consequently, the team could only collect data from approximately 90% of the intended population. This incomplete data set posed challenges to achieving the desired robustness and generalizability of the study findings.
Chapter 6

Conclusion and Recommendations

In this concluding chapter, the study comes full circle as key findings are synthesized and implications
are drawn. The chapter reflects on the effectiveness of adaptive difficulty algorithms in educational
games, highlighting their potential to enhance learning outcomes. Additionally, practical recom-
mendations are offered for educators, game developers, and researchers to leverage these algorithms
effectively. By identifying areas for further exploration and suggesting actionable steps, this chapter
aims to guide future endeavors in the realm of adaptive technologies in education, ultimately striving
for continuous improvement in instructional practices and learning experiences.

6.1 Conclusion
In this study, the researchers compared the effectiveness of Item Response Theory (IRT) and Neural
Network-Based Adaptive Difficulty Algorithms (NN-ADA) within the realm of educational gaming,
specifically focusing on math-based question-and-answer gameplay. The aim was to discern which
of these adaptive difficulty methodologies better enhances students’ learning experience within an
educational game environment.
Through the development of a math-based question-and-answer game incorporating both IRT
and NN-ADA approaches, comprehensive gameplay sessions were conducted with Grade Six students
at Ateneo de Naga University. The data collected and analyzed provided valuable insights into the
performance and adaptability of these algorithms.
The findings suggest that both IRT and NN-ADA contribute to a tailored learning experience by


dynamically adjusting question difficulty levels based on player performance. However, nuances exist
in their effectiveness and adaptability across different difficulty levels. While NN-ADA may excel in
calibrating questions for easier stages, IRT demonstrates superior adjustment to player abilities as
the difficulty increases. Moreover, analysis of response times indicated that IRT tended to provide
questions that matched students’ abilities more closely, resulting in shorter response times than NN-
ADA in higher difficulty levels. This suggests that IRT may offer a more finely tailored challenge
level, promoting engagement and efficiency in learning.
Furthermore, considerations of player engagement reveal differences between IRT and NN-ADA.
While both algorithms sustain engagement to varying degrees, IRT evokes slightly higher levels
of interest, enjoyment, and concentration among players. This aspect highlights the pivotal role
of player experience in educational game design and underscores the potential impact of adaptive
difficulty algorithms on user engagement.
The strengths and weaknesses of each algorithm offer valuable insights for educational game
designers and developers. IRT demonstrates stability and consistency in adapting to question dif-
ficulty, albeit with longer response times in lower levels. On the other hand, NN-ADA may offer
quicker decision-making in easier stages but exhibits more variability in response times, along with
lower accuracy rates at higher difficulty levels.
While this study contributes valuable insights into the comparative effectiveness of IRT and NN-
ADA, it is not without limitations. Challenges in data gathering, including constraints on accessing
student populations and scheduling issues, highlight the complexities inherent in empirical research
within educational settings. Despite these limitations, this study offers meaningful implications for
designing and implementing educational games. By leveraging adaptive difficulty algorithms such
as IRT, developers can tailor learning experiences to individual student needs, thereby optimizing
engagement and learning outcomes.
The study’s findings align with each algorithm’s strengths and characteristics. IRT (Item Re-
sponse Theory) is a traditional psychometric model that estimates the ability of a player based
on their responses to questions. It tends to perform well when dealing with higher-difficulty questions because it is designed to accurately measure proficiency, especially in situations with a clear
progression of difficulty levels. On the other hand, NN-ADA (Neural Network Adaptive Difficulty
Algorithm) likely utilizes machine learning techniques to adapt to the player’s performance in real-
time, which can be advantageous at lower difficulty levels where there might be more variability

in player responses and where quick adaptation is crucial for engagement. The results thus align with the strengths and characteristics of each algorithm, reflecting their performance under different conditions within the game.
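To make this contrast concrete, the sketch below shows the standard three-parameter logistic (3PL) form of the IRT item response function, which relates an ability estimate to the probability of a correct answer; the parameter values are illustrative assumptions, not the calibrated values used in the study’s game.

import math

def p_correct(theta: float, a: float, b: float, c: float) -> float:
    # 3PL IRT model: probability that a player of ability theta answers
    # an item with discrimination a, difficulty b, and guessing floor c.
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

# Illustrative values: an average-ability player (theta = 0) facing a
# moderately hard four-choice item (guessing floor c = 0.25).
print(round(p_correct(theta=0.0, a=1.2, b=0.5, c=0.25), 2))  # about 0.52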
In conclusion, this research advances the understanding of adaptive difficulty algorithms within
educational gaming contexts and underscores the potential of IRT to enhance the learning experi-
ence for students. As technology continues to play an increasingly integral role in education, fur-
ther exploration of adaptive methodologies holds promise for advancing technology-assisted learning
methodologies.

6.2 Recommendations
To further enhance the validity and applicability of the findings from this study, several recommen-
dations are proposed, encompassing areas such as sampling size, game features, demographic testing,
algorithm training processes, and expanding the subject focus.
Firstly, expanding the sampling size of participants would significantly bolster the robustness of
research outcomes. Although this study involved grade six students from Ateneo de Naga University,
involving a larger and more diverse pool of participants from multiple educational institutions would
offer greater generalizability and enable a more comprehensive analysis of the effectiveness of adaptive
difficulty algorithms. Increasing the sample size can mitigate potential biases and better capture
the variability inherent in student learning experiences. Furthermore, future research may not be
limited to Grade Six students, but may also include participants from the entire range of elementary
grade levels, high schools, and colleges. This broader approach will allow for a more thorough
investigation of the algorithms’ efficacy across different educational stages and provide insights into
their adaptability and impact on a wider student population.
Secondly, enhancing the features of the quiz game itself represents a key avenue for refinement.
Incorporating additional gameplay elements, such as interactive tutorials, feedback mechanisms, and
personalized learning pathways, can enrich the educational gaming experience and further optimize
learning outcomes.
Furthermore, extending the test to different demographic groups and subject areas holds immense
potential to advance the understanding of adaptive difficulty algorithms in education. By conducting
similar studies across various grade levels and subjects, we can elucidate the transferability and

efficacy of these methodologies in diverse learning contexts. Exploring the applicability of adaptive
difficulty algorithms in subjects beyond mathematics, such as science, history, and other subjects,
can provide valuable insights into their versatility and effectiveness in different knowledge domains.
Moreover, incorporating actual students into algorithm training processes holds considerable promise.
By directly utilizing data from students’ interactions with educational content, researchers can fine-
tune algorithms to better meet the needs and preferences of learners, ultimately leading to more
personalized and effective educational experiences.
Furthermore, longer training times are recommended for adaptive algorithms, especially the Neural Network-Based Adaptive Difficulty Algorithm (NN-ADA). A stronger model typically requires a more extended training process to reach optimal performance; this longer training period allows the algorithm to improve its learning accuracy and refine its predictive capabilities.
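As a rough sketch of what a longer training budget means in practice (the framework, input features, and hyperparameters below are assumptions, since the thesis does not prescribe them), raising the iteration limit gives the network more opportunity to converge on accurate difficulty predictions:

# Hypothetical sketch: training an NN difficulty model with a larger iteration budget.
# Real training data would come from logged (player state, suitable difficulty) pairs.
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
X = rng.random((500, 3))        # e.g., accuracy so far, mean response time, answer streak
y = (X[:, 0] * 10).astype(int)  # stand-in difficulty labels 0..9

# A longer training run (higher max_iter, more data) lets the network refine
# its difficulty predictions instead of stopping early.
model = MLPClassifier(hidden_layer_sizes=(32, 16), max_iter=2000, random_state=0)
model.fit(X, y)
print(model.predict(X[:1]))     # predicted difficulty level for one player state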
Additionally, future researchers may significantly increase the number of questions in the ques-
tion bank. A more extensive question bank can provide a broader range of challenges and better
accommodate students’ varying skill levels, ensuring a more tailored and effective learning experi-
ence.
Lastly, implementing a feature in which the game automatically collects all relevant data from
players on different computers is important. This feature would ensure that all data from a player,
regardless of the device used, is automatically collected, sorted, and computed. It would track
metrics such as the level of questions answered, whether the answers were correct or incorrect, the
time taken to answer each question, the total score, and other pertinent data. Automated data
collection will simplify the analysis process and ensure comprehensive data availability to refine
algorithms. In this study, all game data was entered manually into a spreadsheet for each student, which was time-consuming and labor-intensive. Adding this automated feature would significantly ease
and expedite the data collection process. Unfortunately, this feature was not included due to time
constraints.
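A minimal sketch of such a logging feature, assuming a CSV output file and hypothetical field names (the thesis does not specify a format), is shown below; merging the per-machine files afterward would then replace the manual spreadsheet entry.

import csv
from pathlib import Path

LOG_PATH = Path("gameplay_log.csv")  # hypothetical output file
FIELDS = ["student_id", "question_level", "correct", "response_time_s", "total_score"]

def log_answer(student_id: str, level: int, correct: bool,
               response_time_s: float, total_score: int) -> None:
    # Append one answered-question record; write a header on first use so
    # logs from different machines can be merged later.
    new_file = not LOG_PATH.exists()
    with LOG_PATH.open("a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        if new_file:
            writer.writeheader()
        writer.writerow({
            "student_id": student_id,
            "question_level": level,
            "correct": int(correct),
            "response_time_s": round(response_time_s, 2),
            "total_score": total_score,
        })

# Example call after a player answers a question:
log_answer("S001", level=8, correct=False, response_time_s=21.4, total_score=350)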
In conclusion, by implementing these recommendations, we can advance the field of educational
gaming and contribute to developing more effective and engaging learning technologies. By ex-
panding our sampling size, improving quiz game features, testing across different demographics and
subjects, incorporating actual students for algorithm training, extending training time for adaptive algorithms, and increasing the number of questions in the question bank, we can enhance the validity,
applicability, and impact of our research findings. As we continue to leverage adaptive difficulty al-
gorithms to tailor educational experiences to individual learner needs, our efforts hold the potential
to transform education and empower students to achieve their full potential.
Appendix A

Question Bank

• Level 1: Basics

– What is the result when you divide any number by itself?

∗ Choices: 0, 1, 2, undefined
∗ Correct Answer: 1

– What is the result when you multiply any number by 0?

∗ Choices: 0, 1, 2, undefined
∗ Correct Answer: 0

– What is the result when you subtract any number from itself?

∗ Choices: 0, 1, 2, undefined
∗ Correct Answer: 0

• Level 2: Fraction Basics

– When you multiply a fraction by its reciprocal, what is the outcome?

∗ Choices: 0, 1, 2, undefined

∗ Correct Answer: 1

– When dividing fractions, what operation do you change it to?

∗ Choices: subtraction, addition, multiplication, division


∗ Correct Answer: multiplication

– Which is equivalent to 1/2?

∗ Choices: 2/6, 6/2, 4/8, 6/9
∗ Correct Answer: 4/8

• Level 3: Order of Operations

– Solve the expression: 3 × (4 + 2) - 5 =

∗ Choices: 9, 33, 63, 13
∗ Correct Answer: 13

– Solve the expression: 8 - (9 / 3) + 4 =

∗ Choices: -4, 4, 9, -9
∗ Correct Answer: 9

– Solve the expression: 5 × (3 - 2) + 4 =

∗ Choices: 9, 13, 17, 21


∗ Correct Answer: 9

• Level 4: Decimal Addition and Subtraction

– Subtract 9.2 from 26.1

∗ Choices: 16.9, 17.1, 17.9, 18.2


∗ Correct Answer: 16.9

– Add 17.7 to 7.6

∗ Choices: 25.3, 24.3, 24.4, 25.5


∗ Correct Answer: 25.3

– Subtract 13.4 from 27.7

∗ Choices: 24.3, 16.4, 14.3, 27.6


∗ Correct Answer: 14.3

• Level 5: Decimal Multiplication and Division

– What is the product of 0.022 and 10?



∗ Choices: 2.20, 0.22, 0.0022, 0.2

∗ Correct Answer: 0.22

– What is the quotient of 10.5 divided by 1.5?

∗ Choices: 18, 15, 6.5, 7

∗ Correct Answer: 7

– What is the quotient of 44 divided by 0.1?

∗ Choices: 4.4, 440, 44, 4400


∗ Correct Answer: 440

• Level 6: Converting Fractions

– Convert 1 3/4 to an improper fraction.

∗ Choices: 4/3, 3/4, 4/7, 7/4
∗ Correct Answer: 7/4

– Convert 2 1/3 to an improper fraction.

∗ Choices: 1/3, 7/3, 1/2, 2/3
∗ Correct Answer: 7/3

– Convert 5 2/5 to an improper fraction.

∗ Choices: 27/5, 5/27, 5/2, 2/5
∗ Correct Answer: 27/5

• Level 7: GCF and LCD

– Calculate the GCF of 18 and 27.

∗ Choices: 3, 9, 6, 1
∗ Correct Answer: 9

– Determine the LCD of 10 and 12.

∗ Choices: 70, 30, 40, 60


∗ Correct Answer: 60

– Find the GCF of 24 and 36.



∗ Choices: 2, 4, 8, 12

∗ Correct Answer: 12

• Level 8: Area and perimeter of rectangles and squares

– If a rectangle has a length of 8 cm and a width of 5 cm, what is its area?

∗ Choices: 10 cm², 20 cm², 40 cm², 30 cm²
∗ Correct Answer: 40 cm²

– If a square has an area of 49 cm², what is the length of one side?

∗ Choices: 7 cm, 9 cm, 12 cm, 16 cm
∗ Correct Answer: 7 cm

– What is the perimeter of a rectangle with a length of 8 cm and a width of 5 cm?

∗ Choices: 18 cm, 26 cm, 40 cm, 13 cm
∗ Correct Answer: 26 cm

• Level 9: Ratios and Proportions

– If a rectangle has a length-to-width ratio of 5:2, and the width is 8 meters, what is the length of the rectangle?

∗ Choices: 20 m, 16 m, 12 m, 8 m
∗ Correct Answer: 20 m

– If the scale factor between two similar triangles is 1:3, and the smaller triangle has a side length of 6 cm, what is the corresponding side length of the larger triangle?

∗ Choices: 18 cm, 24 cm, 12 cm, 6 cm
∗ Correct Answer: 18 cm

– If there are 12 boys and 8 girls in a class, what is the ratio of boys to girls?

∗ Choices: 3:2, 2:3, 6:5, 5:6
∗ Correct Answer: 3:2

• Level 10: Sequences and Patterns

– A sequence starts with 2 and follows the rule: each term is triple the previous term. Determine the 3rd term.

∗ Choices: 24, 18, 6, 12
∗ Correct Answer: 18

– Starting with 32, a sequence follows the rule: each term is half of the previous term. Determine the 4th term.

∗ Choices: 2, 4, 8, 12
∗ Correct Answer: 4

– A sequence starts with 3 and follows the rule: each term is double the previous term. Find the 5th term.

∗ Choices: 24, 96, 48, 192
∗ Correct Answer: 48
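As an illustration only (the field names and structure are assumptions, not the game’s actual data format), a question bank like the one above can be encoded as records keyed by difficulty level, from which either adaptive algorithm can draw:

# Hypothetical encoding of question-bank entries from this appendix.
QUESTION_BANK = {
    1: [  # Level 1: Basics
        {
            "prompt": "What is the result when you divide any number by itself?",
            "choices": ["0", "1", "2", "undefined"],
            "answer": "1",
        },
    ],
    10: [  # Level 10: Sequences and Patterns
        {
            "prompt": "A sequence starts with 3 and follows the rule: each term is "
                      "double the previous term. Find the 5th term.",
            "choices": ["24", "96", "48", "192"],
            "answer": "48",
        },
    ],
}

def questions_at(level: int) -> list:
    # Return all questions calibrated to the given difficulty level.
    return QUESTION_BANK.get(level, [])

print(questions_at(10)[0]["answer"])  # -> 48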
Appendix B

Survey Questions

B.1 In-Game Survey Questions


• Do you feel bored right now? (Interest)

• Do you find the game enjoyable? (Enjoyment)

• How hard are you concentrating? (Concentration)

B.2 Post-Game Survey Questions


• How interesting was the game? (Interest)

• How fun was the game? (Enjoyment)

• How did the game provide content that focused your attention? (Concentration)

Appendix C

Tabulation of the Collected Data

Figure C.1: Tabulated Data for IRT 1


Figure C.2: Tabulated Data for IRT 2

Figure C.3: Tabulated Data for IRT 3



Figure C.4: Tabulated Data for IRT 4

Figure C.5: Tabulated Data for IRT 5



Figure C.6: Tabulated Data for IRT 6

Figure C.7: Tabulated Data for IRT 7



Figure C.8: Tabulated Data for IRT 8

Figure C.9: Tabulated Data for IRT 9



Figure C.10: Tabulated Data for IRT 10

Figure C.11: Tabulated Data for IRT 11



Figure C.12: Tabulated Data for IRT 12

Figure C.13: Tabulated Data for IRT 13



Figure C.14: Tabulated Data for IRT 14

Figure C.15: Tabulated Data for NN 1



Figure C.16: Tabulated Data for NN 2

Figure C.17: Tabulated Data for NN 3



Figure C.18: Tabulated Data for NN 4

Figure C.19: Tabulated Data for NN 5



Figure C.20: Tabulated Data for NN 6

Figure C.21: Tabulated Data for NN 7



Figure C.22: Tabulated Data for NN 8

Figure C.23: Tabulated Data for NN 9



Figure C.24: Tabulated Data for NN 10

Figure C.25: Tabulated Data for NN 11



Figure C.26: Tabulated Data for NN 12

Figure C.27: Tabulated Data for NN 13



Figure C.28: Tabulated Data for NN 14

Figure C.29: Tabulated Data for NN 15


Appendix D

Post-Survey Form Screenshot

Figure D.1: Post-Survey Form

Appendix E

Data Gathering Documentation

Figure E.1: Participants Answering the Quiz Game

Appendix F

Screenshots of Emails when Data Collection is Delayed

Figure F.1: First email regarding schedule changes with date


Figure F.2: Figure F.1 - Zoomed in to see content

Figure F.3: Second email regarding schedule changes with date

Figure F.4: Figure F.3 - Zoomed in to see content


REFERENCES

[1] M. Abbasi, A. Shahraki, and A. Taherkordi, Deep learning for network traffic monitoring
and analysis (ntma): A survey, Computer Communications, 170 (2021).

[2] S. Adipat, K. Laksana, K. Busayanon, A. Ausawasowan, and B. Adipat, Engaging students in the learning process with game-based learning: The fundamental concepts, International Journal of Technology in Education, 4 (2021), pp. 542–552.
[3] AltexSoft, Iterative Process in Agile: Optimizing Software Development, AltexSoft Blog,
(2023).

[4] AngloInfo, Schooling and education in the philippines, AngloInfo Philippines, (2023).
[5] M. H. Batista, J. Barbosa, J. Tavares, and J. Hackenhaar, Using the item response
theory irt for educational evaluation through games, International Journal of Information and
Communication Technology Education, 9 (2013), pp. 27–41.

[6] D. Ben Or, M. Kolomenkin, and G. Shabat, Dl-dda – deep learning based dynamic diffi-
culty adjustment with ux and gameplay constraints, (2021).
[7] T. Borer, Feedback loops in agile: A powerful way to deliver value, Agile Rant, (2023).
[8] P. Borkar, What is adaptive learning? benefits and challenges of adaptive learning, Master-
Soft, (2022).
[9] D. Cangemi, The reasons why you must use visual studio code — by denis cangemi
— dev genius. https://blog.devgenius.io/the-reasons-why-you-must-use-visual-studio-code-
b522f946a849, July 2020. (Accessed on 09/22/2023).
[10] L. Chen, Q. Wang, and S. Liu, Adaptive language learning application using neural network
algorithms, ACM Transactions on Computer-Human Interaction, 25 (2020), pp. 312–325.
[11] K. Cieślak and D. Zalewski, Game difficulty level – how to choose it properly?, 10 2021.
[12] Collimator, What is item response theory? https://www.collimator.ai/reference-guides/what-is-item-response-theory, July 2023.

[13] S. Deland, The beautiful intersection of simulation and ai — venturebeat. https://venturebeat.com/ai/the-beautiful-intersection-of-simulation-and-ai/, December 2022. (Accessed on 09/22/2023).


[14] C. E. Dictionary, adaptive, No year. Website title: ADAPTIVE — definition in the Cam-
bridge English Dictionary, Date accessed: September 7, 2023.
[15] N. Dorfner and R. Zakerzadeh, Teaching tips academic games as a form of increasing
student engagement in remote teaching, (2021).
[16] F. J. Durán, R. Molina-Carmona, and F. Llorens, Measuring the difficulty of activities
for adaptive learning, Universal Access in the Information Society, 17 (2018), pp. 1–14.

[17] Fintelics, Ai in game difficulty adjustment: Adapting challenges to player skill levels — by
fintelics — medium. https://fintelics.medium.com/ai-in-game-difficulty-adjustment-adapting-
challenges-to-player-skill-levels-b7f7767c96b, May.
[18] K. Flegal, J. Ragland, and C. Ranganath, Adaptive task difficulty influences neural
plasticity and transfer of training, NeuroImage, 188 (2018).
[19] M. Gök and M. Inan, Sixth-grade students’ experiences of a digital game-based learning envi-
ronment: A didactic analysis, JRAMathEdu (Journal of Research and Advances in Mathematics
Education), 6 (2021), pp. 142–157.
[20] M. Haagsman, K. Scager, J. Boonstra, and M. Koster, Pop-up questions within edu-
cational videos: Effects on students’ learning, Journal of Science Education and Technology, 29
(2020).
[21] Hazel, Understanding item response theory, EdisonOS Blog, (2023).
[22] M. Hendrix, T. Bellamy-Wood, S. McKay, V. Bloom, and I. Dunwell, Implementing
adaptive game difficulty balancing in serious games, IEEE Transactions on Games, PP (2018),
pp. 1–1.
[23] K. Hori, H. Fukuhara, and T. Yamada, Item response theory and its applications in edu-
cational measurement part i: Irt and its implementation in r, Wiley Interdisciplinary Reviews
Computational Statistics, 14 (2020), p. e1531.

[24] Y. Huang, S. Dang, J. Richey, M. Asher, N. Lobczowski, D. Thomas, E. McLaughlin, J. Harackiewicz, V. Aleven, and K. Koedinger, Item response theory-based gaming detection, (2022), pp. 251–262.
[25] IBM, What are neural networks? — ibm. https://www.ibm.com/topics/neural-networks.

[26] R. Johns and B. Semah, 7 best programming languages for game development in 2023,
Hackr.io.
[27] A. Johnson and K. Brown, Meta-analysis of neural network-based adaptive difficulty algo-
rithms in educational settings, ACM Transactions on Learning Technologies, 12 (2019), pp. 421–
437.

[28] A. Koskinen, J. McMullen, M. Hannula-Sormunen, M. Ninaus, and K. Kiili, The strength and direction of the difficulty adaptation affect situational interest in game-based learning, Computers Education, 194 (2022), p. 104694.

[29] Linkedin, How to design adaptive difficulty systems for games. https://www.linkedin.com/advice/0/how-can-you-design-adaptive-difficulty-systems-skills-game-design, August 2023.
[30] F. Malik and R. Marwaha, Cognitive development, StatPearls [Internet], (2023).
[31] M. Marraffino, B. Schroeder, N. Fraulini, W. Buskirk, and C. Johnson, Adapting
training in real time: An empirical test of adaptive difficulty schedules, Military Psychology, 33
(2021), pp. 1–15.
[32] Merriam-Webster, Difficulty definition & meaning, No year. Website title: Merriam-
Webster, Date accessed: September 7, 2023.
[33] V. Mirata, F. Hirt, P. Bergamin, and C. Van der Westhuizen, Challenges and contexts
in establishing adaptive learning in higher education: findings from a delphi study, International
Journal of Educational Technology in Higher Education, 17 (2020).
[34] H.-S. Moon and J. Seo, Dynamic difficulty adjustment via fast user adaptation, (2020).
[35] V. Nagalingam and R. Ibrahim, User experience of educational games: A review of the
elements, Procedia Computer Science, 72 (2015), pp. 423–433.
[36] S. Nebel, M. Beege, S. Schneider, and G. D. Rey, Competitive agents and adaptive
difficulty within educational video games, Frontiers in Education, 5 (2020).
[37] N. Nurhidayah, J. Jumaeri, and E. Susilaningsih, Development of video based on pop
up questions integrated religious character human digestive system materials, Jurnal Penelitian
Pendidikan IPA, 7 (2021), pp. 250–255.
[38] C. U. M. S. of Public Health, Item response theory, Columbia University Mailman School
of Public Health, (2023).
[39] K. Pak, M. Polikoff, L. Desimone, and E. Garcı́a, The adaptive challenges of curriculum
implementation: Insights for educational leaders driving standards-based reform, AERA Open,
6 (2020), p. 233285842093282.
[40] Y. Pan, F. Ke, and X. Xu, A systematic review of the role of learning games in fostering
mathematics education in k-12 settings, Educational Research Review, 36 (2022), p. 100448.
[41] J. Y. Park, S. Joo, F. Cornillie, H. Maas, and W. Van den Noortgate, An explana-
tory item response theory method for alleviating the cold-start problem in adaptive learning
environments, Behavior Research Methods, 51 (2018).
[42] T. R. Payene, What is 6th graders’ age?, Educational Tweeks, (2023).
[43] J. Plass, B. Homer, S. Pawar, C. Brenner, and A. Macnamara, The effect of adaptive
difficulty adjustment on the effectiveness of a game to develop executive function skills for
learners of different ages, Cognitive Development, 49 (2019), pp. 56–67.
[44] K. Pliakos, S. Joo, J. Y. Park, F. Cornillie, C. Vens, and W. Van den Noortgate,
Integrating machine learning into item response theory for addressing the cold start problem in
adaptive learning systems, Computers Education, 137 (2019).

[45] E. Romero-Mendez, P. Santana-Mancilla, M. Garcia-Ruiz, O. Montesinos-López, and L. Anido-Rifón, The use of deep learning to improve player engagement in a video game through a dynamic difficulty adjustment based on skills classification, Applied Sciences, 13 (2023), p. 8249.
[46] B. Ross, A.-M. Chase, D. Robbie, G. Oates, and Y. Absalom, Adaptive quizzes to
increase motivation, engagement and learning outcomes in a first year accounting unit, Inter-
national Journal of Educational Technology in Higher Education, 15 (2018).
[47] S. Sampayo-Vargas, C. Cope, Z. He, and G. Byrne, The effectiveness of adaptive difficulty
adjustments on students’ motivation and learning in an educational computer game, Computers
Education, 69 (2013), pp. 452–462.
[48] G. Sharma, Discussing the benefits and challenges of adaptive learning and education apps,
January 2019.
[49] D. Shernoff, J. Hamari, and E. Rowe, Measuring flow in educational games and gamified
learning environments, 06 2014.
[50] M. Silva, V. Silva, and L. Chaimowicz, Dynamic difficulty adjustment through an adaptive
ai, 11 2015, pp. 173–182.
[51] M. Singh, P. James, H. Paul, and K. Bolar, Impact of cognitive-behavioral motivation on
student engagement, Heliyon, 8 (2022), p. e09843.
[52] R. Smiderle, S. Rigo, L. Marques, J. Coelho, and P. Jaques, The impact of gamifi-
cation on students’ learning, engagement and behavior based on their personality traits, Smart
Learning Environments, 7 (2020).
[53] J. Smith, A. Johnson, and K. Brown, Enhancing mathematics learning through neural
network-based adaptive difficulty algorithms in educational games, Journal of Educational Tech-
nology, 15 (2018), pp. 187–204.

[54] J. Spacey, 7 types of research & development - simplicable. https://simplicable.com/productivity/research-and-development, July 2018. (Accessed on 09/22/2023).
[55] M. Stewart, Neural network optimization. covering optimizers, momentum, adaptive learning
rates, batch normalization, and more., Towards Data Science, (2019).

[56] P. Sunarya, Machine learning and artificial intelligence as educational games, International
Transactions on Artificial Intelligence (ITALIC), 1 (2022), pp. 129–138.
[57] A. Syaiful Adam, Pop-up question on educational physics video: Effect on the learning per-
formance of students, Research and Development in Education, 2 (2022).

[58] N. Thompson, What is the three parameter irt model (3pl)? - assessment systems — online
testing & psychometrics. https://assess.com/three-parameter-irt-3pl-model/, November 2018.
(Accessed on 09/22/2023).

[59] Y. Uesaka, M. Suzuki, and S. Ichikawa, Analyzing students’ learning strategies using item
response theory: Toward assessment and instruction for self-regulated learning, Frontiers in
Education, 7 (2022).
[60] B. Uyen, D. Tong, and N. Lien, The effectiveness of experiential learning in teaching
arithmetic and geometry in sixth grade, Frontiers in Education, 7 (2022), p. 858631.
[61] Wise, The Philippines Education Overview, Wise Blog, (2023).

[62] R. Yılmaz and F. G. Karaoğlan Yılmaz, Augmented intelligence in programming learning: Examining student views on the use of chatgpt for programming learning, Computers in Human Behavior: Artificial Humans, 1 (2023), p. 100005.
[63] Y. Zhang, D. Wang, G. Xuliang, Y. Cai, and D. Tu, Development of a computerized
adaptive testing for internet addiction, Frontiers in Psychology, 10 (2019).
[64] M. Zohaib, Dynamic difficulty adjustment (dda) in computer games: A review, Advances in
Human-Computer Interaction, 2018 (2018), pp. 1–12.
VITA

Kenrick John Harvell B. Nabus is a BS Computer Science student of the Department of Computer
Science at the Ateneo de Naga University.

Mary Angelette M. Remos is a BS Computer Science student of the Department of Computer


Science at the Ateneo de Naga University.

Matthew Ethan Wood is a BS Computer Science student of the Department of Computer Science
at the Ateneo de Naga University.

