
Predicting The Assessment Course Performance of Criminology Students Using Data Mining

This study aims to determine the performance of criminology students in the assessment course. The research covers the analysis of the performance of Criminology students of Legacy College of Compostela in the Assessment Course with six (6) subject areas. The data were taken from the College of Criminal Justice Education (CCJE) students’ evaluation, and Multiple Linear Regression was employed to predict the students’ performance in the assessment course.

ISSN 2278-3091

Volume 12, No.1, January - February 2023


Eugene P. Iglesias et al., International Journal of Advanced Trends in Computer Science and Engineering, 12(1), January – February 2023, 1 - 6
Available Online at http://www.warse.org/IJATCSE/static/pdf/file/ijatcse011212023.pdf
https://doi.org/10.30534/ijatcse/2023/011212023

Predicting the Assessment Course Performance of Criminology Students Using Data Mining

Eugene P. Iglesias1, Rogelio Badiang2
1 Graduate Student, University of the Immaculate Conception, Philippines, [email protected]
2 Graduate School Professor, Philippines, University of the Immaculate Conception, [email protected]

Received Date: December 15, 2022 Accepted Date: January 18, 2023 Published Date: February 06, 2023

ABSTRACT

In an era of constant change, transition is nothing unusual, and learning now often takes place outside of traditional educational settings. This study aims to determine the performance of criminology students in the assessment course. The research covers the analysis of the performance of Criminology students of Legacy College of Compostela in the Assessment Course with six (6) subject areas. The data were taken from the College of Criminal Justice Education (CCJE) students’ evaluation, and Multiple Linear Regression was employed to predict the students’ performance in the assessment course. The data mining results were obtained using IBM SPSS as the modeler to transform the data and extract relevant information, which was then used for the conclusion. Based on the results, it can be concluded that the subjects Crime Detection and Investigation and Law Enforcement Administration significantly influence the outcome of the students' assessments. Therefore, concentrating on reviewing the other areas with students, such as correctional administration, criminalistics, criminal law and procedure, and criminal sociology, might be useful in improving students' performance.

Key words: Data Mining, assessment of course performance, criminology students, Multiple Linear Regression

1. INTRODUCTION

The world is constantly changing, which means that nowadays learning also occurs outside of formal school settings. Students must choose what and when to study, decide whether they have mastered the content sufficiently to stop practicing it, and divide their time among subjects [10]. This possibly provides a basis for quality education and for the capability of an institution to produce professionals. Before a criminology student can graduate, the student needs to pass an assessment course, which others call a pre-review course. This course includes six subject areas, each with different courses under it.

The application of data mining methods in the field of education has attracted great attention in recent years. Data Mining (DM) is the field of discovering new and potentially useful information or meaningful results from big data [12]. It also aims to obtain new trends and new patterns from large datasets by using different classification algorithms [2].

1.1 Purpose of the Study

This section describes the purposes of the study.

1. To determine the performance of criminology students in the assessment course.

2. To assist educators in determining what subject/s could affect the performance of the students in the assessment.

1.2 Scope and Delimitation of the Study

This research covers the analysis of the performance of Criminology students of Legacy College of Compostela in the Assessment Course with six (6) subjects, namely Crime Detection and Investigation, Correctional Administration, Criminalistics, Law Enforcement and Administration, Criminal Jurisprudence and Procedure, and Criminal Sociology, for the academic year 2021-2022.


1.3 Related Literature and Studies

A few studies in educational data mining have sought to discover patterns that can improve students’ performance. The study in [4] applied data mining techniques, using the Apriori algorithm, to a set of students of Istanbul Eyup I.M.K.B. Vocational Commerce High School. The authors took a dataset of 28 students across 74 courses, with a minimum support rate of 9 and a minimum confidence rate of 85%. Their study revealed that if a student failed a particular subject in the 9th class, he/she would fail it the next year as well. It determined the rate of successful students by finding the rate of unsuccessful students, which can help students in choosing the right subject.

There are many approaches to predicting student performance. Data mining techniques are among the most well recognized, and the most widely used of them are classification and regression. Strecht et al. [6] applied both to forecast student performance, predicting students’ grades and their results (pass/fail). To predict the students’ results, a classification model was applied, while a regression model was used to forecast the grades. For classification, decision trees and SVM were used, and for regression analysis, SVM, Random Forest, and AdaBoost.R2 were used. Based on their study, the classification models were shown to be capable of obtaining useful patterns, while the regression methods failed to outperform a simple baseline.

In addition, Jayakumar [8] classified students using educational data of college students. Based on previous student results, they predicted future student results using J48, NB, MLP, and random forest; however, classification accuracy was not very high. Naïve Bayes showed good classification accuracy compared to the other three. The algorithms were trained on the same data of the 2007 batch of 51 students using the software tool Weka. The algorithms were tested on 2 [...]. A table in that study compares the classification accuracy of these algorithms on the training as well as the test data. The classification accuracy of these algorithms is very high for training; however, it is reduced for cross-validation. Among these algorithms, Naïve Bayes gives good classification accuracy.

Furthermore, Sen and Ucar [9] compared the achievements of Computer Engineering Department students at Karabük University according to factors such as age, gender, type of high school graduated from, and whether the students were studying in distance education or regular education, using data mining methods. They used a dataset of 3047 records. In their study they used a neural network (NN) architecture called the multilayer perceptron (MLP), with a back-propagation supervised-learning algorithm, to produce both classification and regression type prediction models, as well as decision trees, to achieve the highest possible prediction accuracy.

Lastly, Shahiria et al. [11] observed that predicting students’ performance is mostly useful to help educators and learners improve their learning and teaching process. They reviewed previous studies on predicting students’ performance with various data mining analytical methods. Most of the researchers used cumulative grade point average (CGPA) and internal assessment as data sets. Prediction techniques are commonly used in the educational data mining area, and Neural Network and Decision Tree are the two methods that the researchers used for predicting students’ performance.

2. METHODOLOGY

This section describes the details of the dataset, pre-processing techniques, and machine learning algorithms employed in this study.

2.1 Dataset

Educational institutions regularly store all available data about students electronically using the Enrollment System. Data are stored in databases for processing. These data can be of many types and volumes, from students’ demographics to their academic performance. In this study, the data were taken from the College of Criminal Justice Education (CCJE) students’ evaluation, where all student records are stored. From these records, the final grades of 330 students who had taken all the professional courses were selected as the dataset.

2.2 Data Identification and Collection

At this phase, it is determined from which source the data will be taken, which features of the data will be used, and whether the collected data is suitable for the purpose. Feature selection involves decreasing the number of variables used to predict a particular outcome. The goals are to facilitate the interpretability of the model, reduce complexity, increase the computational efficiency of algorithms, and avoid overfitting.

The data cover criminology students who were able to take the pre-review exam. They were collected and processed based on the results in the students' pre-review subject areas, which comprise 6 subjects. The data are stored in Excel sheets and cover 33 students. For this reason, the student data were multiplied by 10 in order to reach the targeted number of data. No backlog student data were taken, as the researcher wanted to concentrate on students who had taken the pre-review exam. In this study the researcher mainly concentrated on the different subjects and exams conducted by the college.

2.3 Establishing DM model and implementation of the algorithm

Multiple Linear Regression was employed to predict the relationship between the students’ performance and the assessment course. Regression is a supervised machine learning technique that uses a training dataset to predict outcomes, which is similar to how classification works [5]. The output variable in classification is categorical, whereas the output variable in regression is numerical.

Multiple linear regression computes the regression coefficients that result in the smallest overall model error, in order to determine the best-fit line for each independent variable, together with a t statistic for each coefficient and the associated p value (how likely it is that the t statistic would have occurred by chance if the null hypothesis of no relationship between the independent and dependent variables were true) [3].

The DM process serves two main purposes. The first purpose is to make predictions by analyzing the data in the database (predictive model). The second one is to describe behaviors (descriptive model). In predictive models, a model is created by using data with known results. Then, using this model, the result values are predicted for datasets whose results are unknown. In descriptive models, the patterns in the existing data are defined to make decisions.

3. EXPERIMENTS AND RESULTS

After determining the important variables and gathering the needed data, the researcher ran the experiment. The data mining results were obtained using IBM SPSS as the modeler to transform the data and extract pertinent information, which was then used for the conclusion.

3.1 Descriptive Statistics

The researcher chose the Assessment Result as the dependent variable and six (6) subjects, namely Crime Detection and Investigation, Correctional Administration, Criminalistics, Law Enforcement and Administration, Criminal Jurisprudence and Procedure, and Criminal Sociology, as the independent variables. The mean and standard deviation of each variable are illustrated in Figure 1. Moreover, a total of 330 students are observed in the study.

Figure 1. Descriptive Statistics

3.2 Model Summary

The information about the model's properties is provided in the Model Summary. In Figure 2, the R value was .712, which shows that there is a correlation between the dependent and independent variables.

The proportion of the total variation in the dependent variable that can be explained by the independent variables is given by the R-square value [7]. In this case, it is .507, which means that the model explains about 50.7% of the variation and is effective enough to determine the relationship.

Adjusted R-square was .498, which is considered good and shows the generalization of the result.

Figure 2. Model Summary

3.3 ANOVA

Jain & Chetty [7] also mentioned that the ANOVA table is used to assess whether the model is significant enough to predict the outcome. As shown in Figure 3, the P-value or Sig. value was 0.000, which is less than the conventional significance level of 0.05; the result is therefore statistically significant.

The F-ratio, on the other hand, was 55.357, which represents the improvement in the prediction of the variable by fitting the model after considering the inaccuracy present in the model.
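The quantities reported in Sections 3.2 and 3.3 are tied together by standard formulas, so they can be cross-checked against each other. The following is a minimal Python sketch, not the SPSS procedure used in the study. It assumes n = 330 observations and k = 6 predictors (both stated in the paper) and takes the reported R-square of .507 as input. The function names `model_summary`, `fit_ols`, and `r_squared` are illustrative, and the small dataset at the end is made-up toy data, not the students' grades.

```python
import math


def model_summary(r2, n, k):
    """Derive R, adjusted R-square and the overall F-ratio from R-square.

    r2: coefficient of determination, n: observations, k: predictors.
    """
    r = math.sqrt(r2)
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)
    f_ratio = (r2 / k) / ((1.0 - r2) / (n - k - 1))
    return r, adj_r2, f_ratio


def fit_ols(X, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y.

    X: rows of predictor values (no intercept column); y: responses.
    Returns [intercept, b1, ..., bk].
    """
    rows = [[1.0] + [float(v) for v in row] for row in X]
    p = len(rows[0])
    # Augmented matrix [X'X | X'y]
    A = [
        [sum(r[i] * r[j] for r in rows) for j in range(p)]
        + [sum(r[i] * yi for r, yi in zip(rows, y))]
        for i in range(p)
    ]
    # Gauss-Jordan elimination with partial pivoting
    for c in range(p):
        pivot = max(range(c, p), key=lambda i: abs(A[i][c]))
        A[c], A[pivot] = A[pivot], A[c]
        for i in range(p):
            if i != c:
                factor = A[i][c] / A[c][c]
                A[i] = [a - factor * b for a, b in zip(A[i], A[c])]
    return [A[i][p] / A[i][i] for i in range(p)]


def r_squared(X, y, coef):
    """R-square = 1 - SS_residual / SS_total for fitted coefficients."""
    pred = [coef[0] + sum(b * x for b, x in zip(coef[1:], row)) for row in X]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, pred))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


# Cross-check of the reported model summary (R-square = .507, n = 330, k = 6)
r, adj_r2, f_ratio = model_summary(0.507, 330, 6)
print(round(r, 3), round(adj_r2, 3), round(f_ratio, 2))  # about 0.712 0.498 55.36

# Toy data generated from y = 1 + 2*x1 + 3*x2 (hypothetical, for illustration)
X_toy = [[1, 2], [2, 1], [3, 4], [4, 3], [5, 5]]
y_toy = [9, 8, 19, 18, 26]
coef = fit_ols(X_toy, y_toy)
print([round(c, 6) for c in coef], round(r_squared(X_toy, y_toy, coef), 6))
```

Under these assumptions the derived R and adjusted R-square agree with the reported .712 and .498. The derived F of about 55.36 differs slightly from the reported 55.357, which is expected because the R-square used as input is itself rounded to three decimals.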


Figure 3. ANOVA

3.4 Coefficient Table

The Coefficient Table shows the strength of the relationship, i.e., the significance of each variable in the model and the magnitude with which it impacts the dependent variable [1]. In this experiment, the significance value should be below the tolerable level of significance for the study, which is 0.05 (a 95% confidence level).

Figures 4 and 5 illustrate the coefficients of the dependent variable (Assessment Result) and independent variables (Crime Detection and Investigation, Correctional Administration, Criminalistics, Law Enforcement and Administration, Criminal Jurisprudence and Procedure, and Criminal Sociology).

Figure 5 shows the significance of the relationship of each independent variable to the dependent variable. Correctional Administration, Criminalistics, Criminal Jurisprudence and Procedure, and Criminal Sociology have significance values greater than 0.05, which means that they have no significant relationship to the dependent variable. Only two (2) independent variables, namely Crime Detection and Investigation and Law Enforcement Administration, reached the <0.05 level of significance with the dependent variable. With this result, it can be interpreted that whenever there is an increase in these subject areas, there is also an increase in the Assessment Result. This indicates that only the said variables have a significant relationship with the Assessment Result.

Figure 4. Coefficients (a)

Figure 5. Coefficients (b)

3.5 Normal P-Plot of Regression Standardized Residual

The cumulative distribution function (CDF) of the standardized residuals is compared to the expected CDF of the normal distribution using a probability plot. Note that the researcher is testing the normality of the residuals and not the predictors.

Figure 6 shows the Normal Probability Plot of the Regression Standardized Residual. As can be seen, there are some deviations, meaning that some points fall off the trendline, but generally the points follow the line. It can therefore be assumed that the observed standardized residuals are normally distributed.

Figure 6. Normal P-Plot of Regression Standardized Residual

3.6 Scatter Plot

When using multiple linear regression, the researcher presumes that the relationship between the predictors and the response variable is linear. If this presumption is broken, the linear regression will attempt to fit non-linear data with a straight line. Whether the relationships between the predictors and the outcome are linear can be determined using the bivariate plot of the predicted values against the residuals (Tharu, 2019). Figure 7 displays the plot of the standardized residuals against the standardized predicted value of the assessment result and six subjects in a scatter plot.

In addition, as observed in the graph, some of the points lie close together, while others lie far away; the latter are called outliers. A scatter plot with dots going from lower left to upper right indicates a positive correlation (as variable x goes up, variable y also goes up). A scatterplot of z scores also reveals the strength of the relationship between variables. If the dots in the scatterplot form a narrow band, so that when a straight line is drawn through the band the dots are near the line, there is a strong linear relationship between the variables.

Figure 7. Scatter Plot

4. DISCUSSION AND CONCLUSION

The researcher’s goal is to predict criminology students' assessment performance based on the average scores of their reviews in various subject areas. This is also to assist every educator in determining what subject/s could affect the performance of the students; doing so can improve learning and teaching techniques and give educators an idea of what subjects they need to focus on.

Based on the findings, the subjects Crime Detection and Investigation and Law Enforcement Administration have a significant relationship with the dependent variable, which is the students' assessment result. Therefore, other subject areas such as Correctional Administration, Criminalistics, Criminal Jurisprudence and Procedure, and Criminal Sociology are the subjects that educators should focus on reviewing with their students in order to improve their students' performance in assessments.

REFERENCES

[1] Abu Shanab, E., & Hammouri, Q. (2017). Exploring the factors influencing employees’ satisfaction toward e-tax systems. International Journal of Public Sector Performance Management, 3(2), 169. https://doi.org/10.1504/ijpspm.2017.10005371

[2] Baker, R. S., & Yacef, K. (2009). The state of educational data mining in 2009: A review and future visions. Journal of Educational Data Mining, 1(1), 3–17.

[3] Bevans, R. (2022, November 15). Multiple Linear Regression | A Quick Guide (Examples). Scribbr. https://www.scribbr.com/statistics/multiple-linear-regression/

[4] Buldua, A., & Üçgün, K. (2010). Data mining application on students’ data. Procedia Social and Behavioral Sciences, 2, 5251–5259. Retrieved from: https://doi.org/10.1016/j.sbspro.2010.03.855

[5] El Aissaoui, O., El Alami El Madani, Y., Oughdir, L., Dakkak, A., & El Allioui, Y. (2020). A Multiple Linear Regression-Based Approach to Predict Student Performance. Advances in Intelligent Systems and Computing, 9–23. https://doi.org/10.1007/978-3-030-36653-7_2

[6] Strecht, Pedro, et al. (2015). “A Comparative Study of Classification and Regression Algorithms for Modelling Students’ Academic Performance.” International Educational Data Mining Society. Retrieved from: 2015EDM_FalakmasirYRK_final

[7] Jain, R., & Chetty, P. (2019). How to interpret the results of the linear regression test in SPSS? From Project Guru: https://www.projectguru.in/interpret-results-linear-regression-test-spss/?fbclid=IwAR3SOGHCnCHwqiD1QHyCvdSmXglk_z6d_eNeW-lICitGb992Fx8aoh8ATvA

[8] Jayakumar, N. & Namdeo, N. (2014). Predicting Students' Performance Using Data Mining Technique with Rough Set Theory Concepts. Retrieved from: http://dx.doi.org/10.19044/esj.2021.v17n7p

[9] Sen, B. & Ucar, E. (2012). Evaluating the achievements of computer engineering department of distance education students with data mining methods. Procedia Technology, 1, 262–267. Retrieved from: https://doi.org/10.1016/j.protcy.2012.02.053


[10] Sense, F., Velde, M. V. D., & Rijn, H. V. (2021). Predicting University Students’ Exam Performance Using a Model-Based Adaptive Fact-Learning System (pp. 155-169). Journal of Learning Analytics. https://doi.org/10.18608/jla.2021.6590

[11] Shahiria, A.M, Husaina, W. & Rashida, N.A. (2015). A Review on Predicting Student’s Performance using Data Mining Techniques. Retrieved from: https://doi.org/10.1016/j.procs.2015.12.157

Tharu, R. P. (2019). Multiple regression model fitted for job satisfaction of employees working in saving and cooperative organization. International Journal of Statistics and Applied Mathematics, 4(4), 43–49. https://www.mathsjournal.com/pdf/2019/vol4issue4/PartA/4-2-16-993.pdf

[12] Witten, I. H., Frank, E., & Hall, M. A. (2011). Data mining practical machine learning tools and techniques (3rd ed.). Morgan Kaufmann. Retrieved from: https://doi.org/10.1016/C2009-0-19715-5
