
INTRODUCTION

Chapter 1
Introduction
1.1 Overview
Data mining (DM) is the process in which data is analysed and summarized into useful information.
In short, data mining is the process of finding correlations or patterns in large databases [1]. DM
consists of analysing large quantities of data to extract previously unknown or hidden interesting
patterns, such as groups of similar data records (cluster analysis), anomalies (anomaly detection)
and dependencies (association rule mining). This involves the use of database techniques like
spatial indices. These patterns can then be seen as a kind of summary of the input data, and may be
used in further analysis through machine learning and predictive analysis. As an example, a data
mining step might identify multiple groups in the data, which can then be used by a decision
support system to obtain more accurate prediction results.

DM works to analyze data stored in data warehouses, or particular data that may come from all
parts of a business, from production to management [2]. Weiss and Indurkhya have also defined DM
[3]. According to Technology Forecast [4] and Frawley et al. [5], it is the process of nontrivial
extraction of implicit, previously unknown and potentially useful information such as knowledge
rules, constraints, and regularities from data stored in repositories using pattern recognition
technologies as well as statistical techniques. According to Weiss and Kulikowski, classification
is learning a function that maps (classifies) a data item into one of several predefined classes
[6]. Apte and Hong suggested that the classification methods of data mining are used as part of
knowledge discovery applications, which may include classifying trends in financial markets and
education and identifying objects of interest in large image datasets [7]. Regression is a
predictive technique that maps a data item to a prediction variable. Clustering is a descriptive
task in which a finite set of categories or clusters is identified to describe the data, e.g. to
identify those students who are short of attendance and have shown poor performance in sessionals
[8], [9], [10]. Cheeseman and Stutz [11] proposed that examples of clustering applications in a
knowledge discovery context include discovering similar groups. For DM to be effective, the
systems should allow users to discover information as well as knowledge from their own
perspectives. Query languages or graphical user interfaces are required to express the DM requests
and the discovered information or knowledge, so that results from the DM engine are understandable
and usable for end users.

1.2 Educational System Design


With the rapid development of technologies [12], flexible and efficient learning methods for
learners are being developed. Students usually acquire basic knowledge and core skills in the
classroom. In a traditional classroom, learning goals and processes are always the same for each
student, but students with different backgrounds have different needs [13]. The interactions in
the classroom should therefore be differentiated and responsive enough to accommodate variations
in readiness levels, interests and learning profiles [14]. In a traditional classroom, the teacher
is the main source of information and students are required to stay in the same place and
participate simultaneously in the same set of activities, whereas in ubiquitous learning,
activities can be conducted in a different space and at a different time for each student. In
addition, integrated teaching aids are available to them all the time and are accessible from any
device [12]. The paradigm shift in the educational system from traditional classroom teaching to a
smart learning environment is shown in Figure 1.1 (a) and 1.1 (b).

1.2.1 Smart Learning Environments

Smart learning is a learning system that enables learners to learn in real-world environments.
In a smart learning environment, intelligent technologies such as cloud computing, learning
analytics and big data are used to capture and analyze learning data and to direct it towards
improved learning and teaching and towards the development of personalized and adaptive learning
[15], [16].

1.2.1.1 Research Framework of Smart Education

The essence of smart education is to create intelligent environments by using smart technologies,
so that smart pedagogies can be facilitated to provide personalized learning services and
empower learners [17]. Zhu et al. [18] have proposed a research framework of smart education, as
shown in Figure 1.2. The framework describes three essential elements of smart education: (a)
smart environments, (b) smart pedagogy and (c) smart learners.

(a) Traditional Approach: Teacher and Student, linked by delivering lecture using teaching aids
like projectors, and by asking queries.

(b) Modern Approach: Tutor and Learner, linked by feedback/personalized guidance, assignments,
tutorials and multimedia, and supported by cloud computing, mobile computing, intelligent learning
agents, web enabled learning (MOOCs), YouTube tutorials, mobile apps and visualization tools.

Figure 1.1 (a), (b) Educational System Design

Smarter Education (Ideology): Smart Learning Environments, Smart Pedagogies, Smart Learners

Figure 1.2 Research Framework of Smart Education [18]

The smart pedagogies consist of (a) mass-based generative learning, (b) individual-based
personalized learning, (c) group-based collaborative learning and (d) class-based differentiated
instruction. A 4-tier architecture of smart pedagogies [18] is shown in Figure 1.3.

Collective Intelligence: Mass-Based Generative Learning
Personalized Expertise: Individual Based Personalized Learning
Comprehensive Ability: Group Based Collaborative Learning
Basic Knowledge & Core Skills: Class-Based Differentiated Instruction

Figure 1.3 4-Tier Architecture of Smart Pedagogies [18]

Another smart learning environment framework [19] has been illustrated in Figure 1.4.

Student and User Interface
Modules: Learning Status Detecting Module, Learning Performance Evaluation Module, Adaptive
Learning Task Module, Adaptive Learning Content Module, Personal Learning Support Module,
Inference Engine
Repositories: Learning Portfolios, Test Bank, Learner Sheets and Materials, Learning Tools,
Learner Profiles, Knowledge Base

Figure 1.4 Smart Learning Frameworks [19]


1.2.2 Policies and Practices in Education

Quality teaching involves the use of pedagogical techniques to produce learning outcomes for
students [20]. It involves dimensions like (a) effective design of curriculum, (b) course content,
(c) a variety of learning contexts, i.e. project-based learning, collaborative learning and
experimentation, (d) feedback, (e) effective assessment of learning outcomes, (f) student support
services, (g) student query response systems, (h) intelligent tutoring systems, (i) integration of
degree/diploma programmes with employment/placements and (j) adoption of skill oriented courses.

According to Sahlberg, Finland's national education policies are intended to raise student
achievement and are built upon the ideas of (a) sustainable leadership that places strong emphasis
on teaching and learning, (b) intelligent accountability, (c) encouraging schools to craft optimal
learning environments and (d) implementing educational content that best helps their students
reach the general goals of schooling [21].

A study conducted by Slater et al. has found a relationship between observable teacher
characteristics and student performance [22]. The authors investigated whether observable
characteristics of teachers correlate with measures of teacher effectiveness.

The National Council of Teachers of Mathematics in the US described


their students as being the ability to influence their performance [23]. In a South African context,
Fleisch has discussed the relationship between higher levels of teacher resources and student
performance [24].

The relationship between characteristics of a teacher (both qualification and demographic
characteristics) and performance of a student is important for education policy. It is the
responsibility of policymakers to employ the best suited and most able teachers to enhance student
performance [25].

1.2.2.1 Pointers to Policies and Practices

Alignment of teaching and learning process as well as student assessment to the teaching
and learning framework.

Viewpoint of s should be included in the development of educational framework.

Engagement of academia, researchers, technocrats and policymakers to share their ideas for
best practices in quality teaching.

Conducting of workshops, seminars, conferences and symposiums for students to give
them exposure and to make them the most significant part of the learning process.

Providing support to faculty members involved in fostering quality education.

Promoting research inspired teaching.

Regular improvement of existing and development of latest support systems using new
tools of ICT and e-learning.

Inclusion of smart learning environment frameworks for imparting quality education.

Recognition and reward for innovative and effective teaching.

Awards for teaching excellence and for producing students with good academic record.

Need to develop pedagogical competencies with the objective of improving quality education.

Techniques for effective integration of with smart learning frameworks.

Funding and financial support for carrying out research.

Use of multimedia techniques and providing students with access to library, updated books,
journals, research papers and electronic/digital documents.

1.3 Role of Data Mining in Education

Educational data mining [26] is the field that uses data mining techniques in educational
environments to strengthen learning systems. It plays an important role in educational systems,
where education is a primary factor for society [27]. Educational data mining is receiving great
attention for several reasons, such as (a) the need to increase the quality of education, (b) the
need to find solutions to problems arising from complex educational datasets and (c) the
competitive environment among academic institutions. The main challenge for institutions is to
analyze their performance in depth in terms of student performance, teaching skills and academic
activities.

Several important student-related factors, like performance in sessionals, attendance and lab
work, are used for analyzing and predicting student class results. Widely used data mining
techniques such as decision trees, neural networks, nearest neighbor and naive Bayes are being
applied in educational data mining. Using these techniques, useful knowledge can be discovered
through classification, clustering and association rules, which helps in increasing the quality of
education [28].

Data mining techniques applied to educational data are significant to educational organizations
as well as to students, as they support effective decision making. These techniques help in
enhancing our understanding of learning by finding educational trends to improve student
performance, course selection, in-house trainings and faculty development. Factors correlated to
the academic performance of students can be identified using linear regression analysis [29].
According to Liebowitz, adaptive/personalized learning, educational data mining, data
visualization, visual analytics, knowledge management and blended/e-learning play growing
roles in better informing higher education officials and teachers [30].
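
As a minimal illustration of the kind of linear regression analysis mentioned above, the sketch
below fits a least-squares line relating a single factor to academic performance and reports the
correlation. The attendance and marks values, and the function names fit_line and pearson_r, are
hypothetical and are not taken from [29].

```python
from math import sqrt

# Hypothetical data (an assumption for illustration): attendance in percent
# and final marks for five students.
attendance = [60.0, 70.0, 80.0, 90.0, 100.0]
marks = [52.0, 58.0, 71.0, 79.0, 90.0]


def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def pearson_r(x, y):
    """Pearson correlation coefficient between x and y."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


slope, intercept = fit_line(attendance, marks)
r = pearson_r(attendance, marks)
print(f"marks = {slope:.2f} * attendance + {intercept:.2f}, r = {r:.3f}")
```

A correlation close to 1 on such data would only suggest that the factor moves together with
performance; it does not by itself establish a causal relationship.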

Educational data mining helps in facilitating utilization of resources related to performance of
students, predicting placement results and finding new educational trends. Tools like WEKA,
RapidMiner, SPSS, Matlab, Orange, KEEL and Python are available for performing data mining in the
area of education. Slater et al. have reviewed the tools frequently used for data mining/analytics
in the area of education [31]. The field seeks to develop and improve methods for exploring
educational data in order to discover new insights related to students' learning activities in the
educational system; it helps in improving retention rates of students and increases the
educational improvement ratio.

Data mining, with support from machine learning, statistical and visualization techniques, can
help in finding and extracting knowledge. In order to collect data, questionnaires and feedback
forms are filled in by students. These forms capture the students' approach towards educational
patterns or trends, their interest in technologies and the teaching methodologies to be adopted.
The collected data is then analyzed using techniques like decision trees, neural networks, naive
Bayes, support vector machines and k-means in order to help in the prediction of performance of
students, interest in course, prediction of student retention, prediction of course suitability,
and personalized intervention strategy [32].

1.3.1 Necessity of Educational Data Mining

With a competitive environment prevailing among educational institutions, the main objective
of higher education institutes is to disseminate quality education to their students and to improve
the quality of managerial decisions. Quality of education can be improved by gaining knowledge
from educational data, which helps academic planners in higher education institutes to assist
instructors, to improve teaching and to realize many other benefits; data mining plays an
important role in achieving this. Data mining is necessary in organizations to enhance competitive
advantage and decision making. Through data mining, we can share, develop and apply this knowledge
for organizational growth. With the use of data mining, educational data can be analyzed, which
helps in developing models for improving performance and in predicting the performance of
educational institutions, taking into consideration parameters like

Teaching skills
Soft skills
Course content
Infrastructure requirement
Faculty development programmes
Students' preference for industrial trainings
Academic trends
Social and emotional learning

1.4 Scope of Data Mining

Data mining technology can generate new business opportunities by providing capabilities like
automated prediction of trends and behaviors. It automates the process of finding predictive
information in large databases. Technologies like Big Data and IoT are emerging application areas
for data mining. Big Data consists of data which can be structured, semi-structured or
unstructured, and optimized techniques like real-time queries are used to respond to queries in
less than a second using databases. The data is collected, stored, organized and analyzed to
discover new insights and improve business decisions. Data mining analytics using Big Data
involves different phases for handling the data: data collection, storage, data organization,
data analysis, visualization, and action or result utilization. Big data job opportunities include
roles for specialized programmers, statistical modelers and advanced mathematicians.

The future scope of data mining includes career prospects in marketing, analysis,
statistics, applied mathematics, data visualization and movement, cloud computing, relational
databases, product placement and management etc.

Further, the scope of data mining is in various fields like:

Public sector undertakings

Academics

Big data organizations

Commercial organizations

Corporate Information Technology companies

Marketing departments and business

Risk management organizations

Education sector

1.5 Genesis of the Problem

Data mining techniques are used to extract meaningful information from large volumes of data,
but the educational environment is the least explored as far as these techniques are concerned.
Different classification and clustering techniques are being implemented on datasets belonging to
other fields for enhancing various parameters of quality, but these techniques have been used the
least so far in the educational environment for improving institutional effectiveness and
enhancing the student/teacher learning process. Also, not much effort is visible in the literature
either to gather meaningfully the huge amounts of data being produced by different categories of
educational systems or to use the already gathered data for meaningful mining on a large scale so
as to improve the teaching/learning processes.

Therefore, the authors chose to work on educational data mining, which is an emerging field of
data mining. Techniques like a) statistics, b) decision trees, c) K-means, d) Naïve Bayes, e)
support vector machines, f) K-Nearest Neighbors and g) neural networks have not yet been
implemented on the educational datasets. Moreover, no suitable datasets are available in the
public domain for such experimentation. These techniques, when implemented on educational
datasets for exploring and analyzing the unique types of data that come from the educational
domain, can be purposefully employed to improve learning analytics.

These learning analytics in turn can help in understanding students' learning dynamics, and may
prove helpful for educators and policy makers. Also, for carrying out such explorations, not much
support is available for discovering data patterns through customized or standard generic tools.
Some of the potential application areas where these data mining techniques have not been applied
so far are:

a) Predicting students' performance
b) Guiding students in right course selection
c) Pedagogical support
d) Placement predictions for students
e) Development of computational models for feedback analysis and course preferences
f) Strengthening decision making process for the learners, educators and policy makers.

1.6 Identification of Gap

1. Some of the big challenges our educational system is facing are related to quality and skill
driven education, better placements of students, and lack of support in adopting new educational
patterns as per market requirement.

2. Decision-making process becomes more complex with the increase in horizontal and vertical
educational entities. An educational institution requires more efficient approaches to manage
and support decision-making procedures.

3. There are data mining techniques to extract meaningful knowledge from large datasets, but
their applications in the education sector have largely remained unexplored till now.
These unexplored areas are:

To improve the quality of education.

To gather useful information about requirements in the current educational system and,
based upon its effective analysis, to improve students' learning styles.

There is a need to introduce the latest knowledge related to educational trends into
the systems.

This knowledge, in turn, needs to be extracted from historical and operational
educational datasets using data mining techniques so that these trends can further be
used to fill the existing gaps in:
a) Students' learning style and instructors' teaching style
b) Assessment of students
c) Evaluation of teachers/instructors etc.
d) Predicting behaviors of students
e) Predicting student placements based on certain parameters like attendance in
class, GPA, technical skills, aptitude, communication skills, lab assignments etc.
f) Learning abilities, knowledge and interests
g) Introduction of skill driven courses into syllabi
h) Adoption of learning analytics
i) Adoption into syllabi of those patterns which are helpful in cracking the
competitive exams

1.7 Challenges in the Educational Data Classification

In the educational sector, there is enormous growth in big data, and the data of the educational
field differs from that of other fields in the following respects: (a) learning resources are
increasing, (b) datasets vary between the formal and non-formal sectors, (c) curricula are
updated, (d) students' behavioral attributes vary, (e) criteria for assessing students vary from
institution to institution, (f) demographic factors differ, (g) data fetched from web enabled
sources like MOOCs, Moodle and e-learning differ from those of the formal education sector, (h)
datasets vary between regular and distance learning modes, (i) academic and non-academic skills of
students differ, (j) data varies across multiple streams of engineering like computers, IT,
electronics & communication, electrical, mechanical, textile, civil etc. The major challenges that
exist in educational data classification are:

Non-availability of relevant datasets

No set of agreed common attributes in datasets

Plausibility of attribute values

Checking data completeness

Data redundancy

Collection of new datasets, pre-processing, cleaning and mining

Inability to measure the quality of insights

Regular updates in the educational frameworks

Educational patterns vary from schools to university

Measurement of the quality of output obtained from algorithms

Time span, privacy and security issues

Outlier detection

Input data formats vary in algorithmic tools i.e. data may be accepted in numeric or may
be in string format.
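
Several of the challenges listed above (data completeness, data redundancy, plausibility of
attribute values and outlier detection) can be checked mechanically before mining. The following
is a minimal sketch only: the records, field names, the 0 to 100 valid range and the
median-absolute-deviation rule with a factor of 3 are all assumptions made for illustration, not
part of the thesis datasets.

```python
from statistics import median

# Hypothetical student records (assumed fields: id, attendance %, sessional marks).
records = [
    {"id": 1, "attendance": 82, "sessional": 71},
    {"id": 2, "attendance": 91, "sessional": None},   # incomplete record
    {"id": 3, "attendance": 140, "sessional": 64},    # implausible attendance
    {"id": 1, "attendance": 82, "sessional": 71},     # duplicate of the first record
    {"id": 4, "attendance": 78, "sessional": 68},
    {"id": 5, "attendance": 88, "sessional": 70},
    {"id": 6, "attendance": 75, "sessional": 66},
    {"id": 7, "attendance": 80, "sessional": 12},     # unusually low marks
]


def incomplete_ids(rows):
    """Completeness check: ids of records with any missing value."""
    return [r["id"] for r in rows if any(v is None for v in r.values())]


def implausible_ids(rows, field, low=0, high=100):
    """Plausibility check: ids whose value lies outside the assumed valid range."""
    return [r["id"] for r in rows
            if r[field] is not None and not low <= r[field] <= high]


def duplicate_ids(rows):
    """Redundancy check: ids of records that repeat an earlier record exactly."""
    seen, duplicates = set(), []
    for r in rows:
        key = tuple(sorted(r.items()))
        if key in seen:
            duplicates.append(r["id"])
        else:
            seen.add(key)
    return duplicates


def mad_outliers(values, factor=3.0):
    """Outlier check: values further than factor * MAD from the median.

    If the MAD is zero, every value that differs from the median is flagged.
    """
    med = median(values)
    mad = median([abs(v - med) for v in values])
    return [v for v in values if abs(v - med) > factor * mad]


marks = [r["sessional"] for r in records if r["sessional"] is not None]
print("incomplete:", incomplete_ids(records))
print("implausible attendance:", implausible_ids(records, "attendance"))
print("duplicates:", duplicate_ids(records))
print("outlying marks:", mad_outliers(marks))
```

Flagged records would still need manual review; an unusually low mark, for example, may be a
genuine observation rather than a data entry error.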

1.8 Problem Definition

With the growth in educational technology, there is an exponential increase in digital traces in
the education sector, along with a matching increase in computing power. However, due to the
absence of knowledge discovery approaches for educational datasets, it has become really difficult
to analyze and consolidate the data. Therefore, there is an urgent need to analyze and evaluate
this huge amount of unstructured data using educational data mining techniques for effective
decision making and for predicting academic trends.

In the present scenario, the emergence of new technologies and the inclusion of various modes of
e-learning and other online educational resources in the teacher-taught paradigm, in the formal as
well as informal education sectors, have resulted in the collection of huge volumes of data. For
this structured, semi-structured or unstructured data to make reasonable sense to the stakeholders
of the systems, the emerging trends of data mining need to be explored for processing this data on
distributed systems with parallel computations. With more stress on skilled manpower, quality
education is most important, and for that the overall performance of students is of great concern
to the field of higher education.

Educational data mining techniques are conducive to inspecting educational data. With the
mandatory adoption of mining techniques, the education sector benefits from faster
decision-making based on analyses of data fetched from students.

The data from students and other stakeholders may include a) preferences for courses, course
outcomes, trainings (especially vocational trainings), industry oriented courses as optional
subjects, job profiles, etc., b) choices of the appropriate existing subjects, c) available options
at the national and international levels, d) in-house training needs for the employees and
management, and so on. Subsequently, these techniques help in acquiring useful information
pertaining to students, such as a) prediction of skilled students, b) finding new educational
trends which are as per industry standards, c) fulfillment of the demand for skilled manpower, d)
targeting learners showing unsatisfactory performance in class, e) updates to syllabi and f)
guiding learners to choose the right course in which to undergo training.

1.9 Objectives

1. To propose the prediction of students' preferences for selection of courses using existing
significant clustering techniques.

2. To predict the learning behaviors of students and the probability of their placements in
the educational system currently being followed.

3. To analyze datasets using classification and pattern recognition for finding academic trends
to support academic and job oriented decision making by the students.

1.10 Scope of work

Educational Data Mining is an upcoming field related to several well-established areas of
research, including e-learning, web mining and text mining, but it is still a less explored area.
Data mining techniques are required to analyze educational data and extract useful information
from large amounts of data.

Using the classification and clustering techniques of data mining, we intend to predict the class
result of students based on the attributes taken. These techniques can help in identifying those
students who show poor performance in sessionals in an educational institution. The main
purpose of using these techniques is to produce actionable outcomes from the academic
performance of the students.

The clustering of students based on attributes like their class performance, sessionals
and attendance in class is essential for this purpose. It is intended to enhance the decision
making approach used to monitor and improve the performance of students. We have shown that by
increasing the number of clusters, the accuracy becomes better and a better grouping of the data
can be found. It would also help us to cluster those students who need special attention.

Using data mining classification techniques, we can predict the learning behaviors of students
and the probability of their placements in educational system currently being followed.

The classifiers have been implemented on the educational datasets, and a comparative analysis of
classifier accuracy has been performed using Python machine learning techniques. This
research has led to better selection of classifiers for data analysis in future. Apart from this,
emphasis has been laid on the implementation of Big Data analytics for getting meaningful
information from unstructured data so as to help students in selecting their industrial
trainings.

Within the scope of this work, a tool has been developed using Dot Net technologies for
association rule mining and feedback analysis, and another tool has been developed in PHP to help
with in-house training programmes.

The use of classification and pattern recognition for finding academic trends has been
shown to support academic and job oriented decision making by the students.

1.11 Contributions

The dataset is generated from the feedback obtained from students of the NIELIT (National
Institute of Electronics & IT) Shimla centre (erstwhile DOEACC Society), an Autonomous Scientific
Society under the administrative control of the Ministry of Electronics & Information Technology
(MoE&IT), Government of India. It has been set up to carry out Human Resource Development
and related activities in the area of Information, Electronics & Communications Technology
(IECT). NIELIT is engaged in both the formal and non-formal sectors of education in the area of
IECT, besides the development of industry oriented quality education and training programmes in
state-of-the-art areas. The Feedback Proforma used has been attached as Annexure A-1. The
parameters of feedback have been categorized as a) teaching skills, b) course content and c)
infrastructure facilities. Statistical techniques of data mining, such as regression analysis,
have been implemented on this dataset. The results obtained from the SPSS package in the form of
mean and standard deviation have contributed to the thesis by helping to improve a) institutional
effectiveness, b) student learning, c) teaching skills of instructors and d) infrastructure
quality. The contribution related to the results of these statistical techniques of data mining
has been published in the following paper:

P. Guleria, M. Arora and M. Sood, "Increasing Quality of Education using Educational
Data Mining", 2nd International Conference on Information Management in the Knowledge
Economy, 2013, pp. 118-122, IEEE.
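
The per-category mean and standard deviation summaries described above can be sketched as
follows. This is an illustration in Python rather than the SPSS workflow used in the thesis, and
the ratings (assumed to be on a 1 to 5 scale) are hypothetical; only the three category names
come from the text.

```python
from statistics import mean, stdev

# Hypothetical feedback ratings on an assumed 1-5 scale, grouped by the three
# feedback parameters named in the text.
ratings = {
    "teaching skills": [4, 5, 4, 3, 4],
    "course content": [3, 4, 3, 3, 2],
    "infrastructure facilities": [2, 3, 2, 3, 2],
}

# Mean and sample standard deviation for every category.
summary = {name: (mean(values), stdev(values)) for name, values in ratings.items()}

for name, (avg, sd) in summary.items():
    print(f"{name}: mean = {avg:.2f}, standard deviation = {sd:.2f}")

# The category with the lowest mean rating is a candidate for improvement.
lowest = min(summary, key=lambda name: summary[name][0])
print("lowest rated category:", lowest)
```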

Subsequently, a survey of the relevant literature has been conducted on the knowledge discovery
perspective and the role of data mining in the educational environment. This literature survey
related to educational data mining and its techniques has been published as the following
research paper:

International Journal of Data Mining & Knowledge Management
Process (IJDKP), Vol.4, No.5, 2014. doi: 10.5121/ijdkp.2014.4504.

After this literature survey, data mining techniques have been applied to an educational dataset
synthesized for the purpose of evaluating the entropy of the attributes of the data. A decision
tree classifier has then been implemented on this dataset in order to obtain the following
results: (a) prediction of performance of students in a particular class, (b) identification of
students whose attendance is short and who have performed poorly in sessionals.

It has been shown that the WEKA tool and the implemented decision tree algorithm produce exactly
the same information gain for the root attribute. The following paper based upon this idea has
been published:

P. Guleria, N. Thakur and M. Sood, "Predicting Student Performance using Decision Tree
Classifiers and Information Gain," International Conference on Parallel, Distributed and
Grid Computing, Solan, 2014, pp. 126-129, IEEE. doi: 10.1109/PDGC.2014.7030728.
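
To make the role of entropy and information gain in choosing the root attribute concrete, the
sketch below computes both for a small table. The eight records and the attribute names are
hypothetical and are not the synthesized dataset used in the paper above; the formulas are the
standard ones, Entropy(S) = -sum(p_i * log2(p_i)) and Gain(S, A) = Entropy(S) minus the
size-weighted entropy of the subsets produced by splitting on A.

```python
from collections import Counter
from math import log2

# Hypothetical records: (attendance, sessional performance, class result).
students = [
    ("good", "good", "pass"),
    ("good", "poor", "pass"),
    ("good", "good", "pass"),
    ("short", "good", "pass"),
    ("short", "poor", "fail"),
    ("short", "poor", "fail"),
    ("good", "poor", "fail"),
    ("short", "good", "pass"),
]
ATTRIBUTES = {"attendance": 0, "sessional": 1}  # column index of each attribute


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, column):
    """Entropy of the class label minus the weighted entropy after the split."""
    gain = entropy([row[-1] for row in rows])
    for value in {row[column] for row in rows}:
        subset = [row[-1] for row in rows if row[column] == value]
        gain -= len(subset) / len(rows) * entropy(subset)
    return gain


gains = {name: information_gain(students, col) for name, col in ATTRIBUTES.items()}
root = max(gains, key=gains.get)  # attribute with the highest gain becomes the root
for name, gain in gains.items():
    print(f"gain({name}) = {gain:.4f}")
print("root attribute:", root)
```

On this toy table the sessional attribute separates the classes more cleanly than attendance, so
it would be selected as the root of the tree.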

Further in this direction, the authors have applied the K-means clustering technique and the
K-Nearest Neighbor classification technique to another dataset synthesized for this purpose and
related to the educational environment. The synthesized dataset is based on the performance of
students in a) internal exams, b) attendance, c) lab work, d) assignments and e) overall
performance. Using the K-means clustering technique on this dataset, the following two clusters
have been generated: (a) students who are short of attendance and (b) students who have performed
poorly in sessionals. This work also concludes that, on increasing the number of clusters K, the
accuracy becomes better and a better clustering of the data in the dataset can be found. In the
K-Nearest Neighbor technique, using K values, the nearest class for an upcoming group of fresh
students has been determined, which can help in a) identifying the group of students who have
good practical as well as good overall performance in the class and b) strengthening the decision
making approach of instructors to monitor the capabilities of the group. This work on K-means and
K-Nearest Neighbors has been published in the research papers given below:

-
International Journal of Innovations & Advancement in Computer Science, 2347 8616, Vol.
3, Issue 8, 2014.

-Nearest Neighbor: A
International Journal of Computer Science and
Information Security (IJCSIS), Vol. 12, 2016.

-Means
and K-
ISSN: 0974-5572, Vol. 10, No.40, 2017.
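
The two techniques discussed above can be sketched in a few lines. This is an illustrative
implementation under stated assumptions, not the code used in the cited papers: the nine
(attendance, sessional marks) points are hypothetical, the initial centroids are supplied
explicitly so that the run is repeatable, and the within-cluster sum of squared errors (SSE) is
used here as a simple measure of how tight the clusters are.

```python
from collections import Counter
from math import dist

# Hypothetical (attendance %, sessional marks) pairs; not the thesis dataset.
points = [
    (90, 85), (88, 80), (92, 90),   # regular attendance, good marks
    (55, 82), (50, 78), (58, 85),   # short of attendance
    (85, 35), (90, 30), (88, 40),   # poor performance in sessionals
]


def kmeans(data, initial_centroids, max_iter=100):
    """Lloyd's algorithm starting from caller-supplied centroids."""
    centroids = list(initial_centroids)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(len(centroids)), key=lambda j: dist(p, centroids[j]))
            for p in data
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(len(centroids)):
            members = [p for p, lab in zip(data, labels) if lab == j]
            if members:
                centroids[j] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return labels, centroids


def within_cluster_sse(data, labels, centroids):
    """Sum of squared distances from each point to its assigned centroid."""
    return sum(dist(p, centroids[lab]) ** 2 for p, lab in zip(data, labels))


def knn_predict(train, train_labels, query, k=3):
    """Majority class among the k training points nearest to the query."""
    nearest = sorted(range(len(train)), key=lambda i: dist(train[i], query))[:k]
    return Counter(train_labels[i] for i in nearest).most_common(1)[0][0]


labels2, centroids2 = kmeans(points, [points[0], points[3]])
labels3, centroids3 = kmeans(points, [points[0], points[3], points[6]])
print("k=2 SSE:", round(within_cluster_sse(points, labels2, centroids2), 1))
print("k=3 SSE:", round(within_cluster_sse(points, labels3, centroids3), 1))

# Classify a hypothetical new student, using the k=3 clusters as classes.
print("new student joins cluster", knn_predict(points, labels3, (57, 80)))
```

With three clusters, the short-attendance and poor-sessional groups separate from the rest and the
SSE drops sharply. Since SSE generally keeps decreasing as K grows, a lower value on its own
should be read with care when judging whether a larger K gives a more useful grouping.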

The academic trends related to preferable choices for students to undergo industrial trainings
have been predicted using association rule mining, and its importance in academic counseling
has been highlighted. The knowledge has been extracted from a semi-synthesized dataset
specially created for this purpose for students of engineering background. Using association
rule mining, preferred courses for students to undergo industrial trainings have been extracted
from the dataset. In this work, rules have been discovered using the Apriori algorithm, which help
instructors a) to find the interest of students in specialized/industry oriented courses and b) to
enhance the effectiveness of academic planning/decision-making. The results related to this work
have been published in the research paper given below:

P.
International Journal of Advance Research in Science and
Engineering (IJARSE), ISSN-2319-8354(E), Vol. No.4, special issue (01), 2015.
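
As a rough illustration of how the Apriori algorithm discovers such rules, the sketch below mines frequent itemsets and association rules in pure Python. The course names, the support and confidence thresholds, and the function names are assumptions for illustration only and do not come from the thesis dataset.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {frozenset(itemset): support} for every frequent itemset."""
    n = len(transactions)
    transactions = [frozenset(t) for t in transactions]
    candidates = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    size = 1
    while candidates:
        # Count the support of each candidate and keep the frequent ones.
        level = {}
        for c in candidates:
            support = sum(1 for t in transactions if c <= t) / n
            if support >= min_support:
                level[c] = support
        frequent.update(level)
        # Join step: combine frequent itemsets into candidates one item larger.
        size += 1
        candidates = {a | b for a in level for b in level if len(a | b) == size}
        # Prune step: every (size - 1)-subset of a candidate must be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in level
                             for s in combinations(c, size - 1))}
    return frequent

def association_rules(frequent, min_confidence):
    """Return (antecedent, consequent, confidence) triples."""
    rules = []
    for itemset, support in frequent.items():
        for r in range(1, len(itemset)):
            for antecedent in combinations(sorted(itemset), r):
                a = frozenset(antecedent)
                confidence = support / frequent[a]
                if confidence >= min_confidence:
                    rules.append((a, itemset - a, confidence))
    return rules

# Hypothetical training preferences, one set of courses per student.
preferences = [
    {"Java", "Android"},
    {"Java", "Android", "Python"},
    {"Python", "Big Data"},
    {"Java", "Android"},
    {"Python", "Big Data", "Java"},
]
frequent = apriori(preferences, min_support=0.4)
rules = association_rules(frequent, min_confidence=0.9)
```

A rule such as "students who choose Android also choose Java" is the kind of pattern an instructor could use when planning which courses to offer together.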

Further contributions of this thesis include neural network based clustering and classification
approaches. In this work, experimentation has been performed on the same dataset that was
synthesized for the K-means technique. The first approach is based on the Self-Organizing
Map (SOM), which is a type of Artificial Neural Network (ANN). In this work, students are
clustered into natural classes based on certain attributes so that similar students are grouped
together. In the second approach, pattern recognition has been carried out through a two-layer
feed-forward network to classify inputs into a set of target categories. The SOM learns by itself
by recognizing the major features of the input data presented to it. The neural
network based clustering and pattern recognition perform data mining by training the network
to identify the classified and misclassified data. The findings have shown that a) the network is
trained properly and b) for the input data of students, the desired target classes are classified
accurately. The confusion matrix results represent the percentage of accurately classified classes
of students, which helps in a) clustering of students and b) strengthening decision making to
supervise the appraisal of students. This work has been published as per the following details:

Grenze International Journal of Computer Theory and Engineering, Grenze ID:
01.GIJCTE.1.1.543, 2015.

P. Guleria
International Journal of Control Theory and Applications, ISSN: 0974-5572, Vol. 10, No.40,
2017.
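
The self-organizing behaviour described above can be sketched with a toy one-dimensional SOM. This is only an illustrative assumption: the thesis does not give its network configuration, and the two-node map, the learning-rate and neighbourhood schedules, the seeding of nodes from evenly spaced samples, and the normalized student scores below are all hypothetical.

```python
import math

def euclidean(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def best_matching_unit(weights, x):
    """Index of the node whose weight vector is closest to x."""
    return min(range(len(weights)), key=lambda j: euclidean(weights[j], x))

def train_som(samples, n_nodes=2, epochs=20, lr0=0.5, sigma0=0.5):
    """Train a SOM whose nodes lie on a 1-D line."""
    # Assumption: nodes are seeded from evenly spaced samples (deterministic).
    step = (len(samples) - 1) / (n_nodes - 1) if n_nodes > 1 else 0
    weights = [list(samples[round(i * step)]) for i in range(n_nodes)]
    for epoch in range(epochs):
        decay = 1 - epoch / epochs
        lr = lr0 * decay                     # learning rate shrinks over time
        sigma = max(sigma0 * decay, 1e-3)    # so does the neighbourhood width
        for x in samples:
            bmu = best_matching_unit(weights, x)
            for j, w in enumerate(weights):
                # Gaussian neighbourhood: nodes near the winner move more.
                h = math.exp(-((j - bmu) ** 2) / (2 * sigma ** 2))
                for d in range(len(w)):
                    w[d] += lr * h * (x[d] - w[d])
    return weights

# Hypothetical normalized scores: (attendance, marks).
samples = [(0.9, 0.8), (0.85, 0.9), (0.95, 0.85),
           (0.3, 0.2), (0.2, 0.3), (0.25, 0.25)]
weights = train_som(samples)
clusters = [best_matching_unit(weights, s) for s in samples]
```

After training, students with similar scores share the same winning node, which is the sense in which the map groups them into natural classes.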

The next contribution is based on Bayes' theorem. Using the Naïve Bayes classifier, the
probability of placement of students has been predicted from a dataset synthesized for
experimentation purposes. The results a) help instructors/educators to update the
syllabi/curricula as per industry needs and b) guide students to focus on skills such as
quantitative aptitude, reasoning, communication, and technical skills, apart from regular
studies, to improve their chances of placement. The results have been published as given below:

P. Guleria and M. Sood, "Predicting Student Placements using Bayesian classification",
Third International Conference on Image Information Processing (ICIIP), 2015, pp. 109-112,
IEEE. doi: 10.1109/ICIIP.2015.7414749.
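
To make the probability calculation concrete, the following is a minimal sketch of a categorical Naïve Bayes classifier with Laplace smoothing. The two attributes (aptitude and communication), their values, and the eight student records are assumptions invented for illustration; they are not taken from the thesis dataset.

```python
from collections import Counter, defaultdict

def train_naive_bayes(rows, labels):
    """Count class frequencies and per-class attribute value frequencies."""
    class_counts = Counter(labels)
    feature_counts = Counter()     # key: (attribute index, value, label)
    values = defaultdict(set)      # distinct values seen per attribute
    for row, label in zip(rows, labels):
        for i, v in enumerate(row):
            feature_counts[(i, v, label)] += 1
            values[i].add(v)
    return class_counts, feature_counts, values

def predict_proba(model, row):
    """Posterior probability of each class for one record."""
    class_counts, feature_counts, values = model
    total = sum(class_counts.values())
    scores = {}
    for label, count in class_counts.items():
        score = count / total      # prior P(class)
        for i, v in enumerate(row):
            # Laplace-smoothed likelihood P(value | class).
            score *= (feature_counts[(i, v, label)] + 1) / (count + len(values[i]))
        scores[label] = score
    norm = sum(scores.values())
    return {label: s / norm for label, s in scores.items()}

# Hypothetical records: (aptitude, communication) and the placement outcome.
rows = [("high", "high"), ("high", "low"), ("low", "high"), ("high", "high"),
        ("low", "low"), ("low", "high"), ("high", "low"), ("low", "low")]
labels = ["placed", "placed", "placed", "placed",
          "not placed", "not placed", "not placed", "not placed"]

model = train_naive_bayes(rows, labels)
proba = predict_proba(model, ("high", "high"))
```

The "naïve" part is the assumption that the attributes are independent given the class, which is why the per-attribute likelihoods are simply multiplied.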

A Support Vector Machine (SVM) is a supervised data mining technique that is known for the
accuracy of its results. The SVM technique has been applied to a dataset similar to the one used
with the Naïve Bayes approach to predict the placement results of students. Here, the dataset is
classified into labels, and cross-validation is applied to find the best possible parameter
values. The placement results obtained using the SVM technique classify attributes and give
students better insight into how to keep themselves up to date with the current academic
scenario in order to get placed. This work has been published as per the details given below:

Indian Journal of Science and Technology, Vol. 9(34), 2016.
doi: 10.17485/ijst/2016/v9i34/100206 (SCOPUS INDEXED).
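
The combination of an SVM with cross-validation can be sketched as below. This is a simplified stand-in, not the tool used in the thesis: it trains a linear soft-margin SVM by batch subgradient descent on the hinge loss and uses 3-fold cross-validation to choose the regularization strength. The centred scores, the candidate values, and all names are assumptions for illustration.

```python
def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Linear soft-margin SVM via batch subgradient descent (labels are +1/-1)."""
    n, dim = len(X), len(X[0])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        sum_yx, sum_y = [0.0] * dim, 0
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:                  # point lies inside the margin
                for j in range(dim):
                    sum_yx[j] += yi * xi[j]
                sum_y += yi
        # Subgradient of: lam/2 * |w|^2 + mean hinge loss.
        w = [wj - lr * (lam * wj - s / n) for wj, s in zip(w, sum_yx)]
        b += lr * sum_y / n
    return w, b

def predict(model, x):
    w, b = model
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1

def cross_val_accuracy(X, y, lam, folds=3):
    """Mean accuracy over interleaved folds for one value of lam."""
    correct = 0
    for f in range(folds):
        test_idx = set(range(f, len(X), folds))
        train_X = [x for i, x in enumerate(X) if i not in test_idx]
        train_y = [t for i, t in enumerate(y) if i not in test_idx]
        model = train_linear_svm(train_X, train_y, lam=lam)
        correct += sum(predict(model, X[i]) == y[i] for i in test_idx)
    return correct / len(X)

# Hypothetical centred scores (aptitude, communication); +1 = placed, -1 = not placed.
X = [(2, 2), (3, 3), (2, 3), (-2, -2), (-3, -3), (-2, -3)]
y = [1, 1, 1, -1, -1, -1]

candidates = [0.001, 0.01, 0.1]
scores = {lam: cross_val_accuracy(X, y, lam) for lam in candidates}
best_lam = max(candidates, key=scores.get)
final_model = train_linear_svm(X, y, lam=best_lam)
```

On a set this small and cleanly separable the candidates are hard to tell apart; the point is only the workflow of scoring each parameter value on held-out folds before training the final model.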

After this, the authors have developed web-based educational classification tools in ASP.NET
and PHP. The predictive rules, in the form of preferred courses for students to undergo
industrial training, have been derived using the Apriori algorithm of the association rule mining
technique, with ASP.NET as the front end and SQL Server 2008 as the back end. Another tool has
been developed with PHP as the front end and MySQL as the back end for feedback analytics
covering the following categories: a) teaching skills, b) course content, c) infrastructure of
the institute, and d) other deliverables.

-Based Data Mining Tools: Performing Feedback Analysis


International Journal of Data Mining & Knowledge
Management Process (IJDKP), Vol.5, no.6, 2015.

Another contribution is in the emerging field of big data. For this, the authors first
synthesized an experimental dataset that consists of courses related to the field of ICT and their
attributes. The dataset is processed through the proposed methodology based on the MapReduce
algorithm. The framework uses mapper and reducer functions to run the job in parallel on a
single-node cluster using the Hadoop Distributed File System. The data is converted into
individual tuples in the map phase, and meaningful data is obtained from the reducer function.
The results and their analysis show that MapReduce can provide students with career counseling
support, which strengthens their decision-making in opting for the right course(s) for training
activities as per industry requirements. The proposed methodology is expected to be pivotal in
designing and implementing a support system that will facilitate intelligent decision-making by
parents, teachers and mentors regarding the careers of their children/wards/students and the
strengthening of in-house training programmes.

P. Guleria and M. Sood, "Big Data Analytics: Predicting Academic Course Preference
using Hadoop Inspired MapReduce", Fourth International Conference on Image
Information Processing (ICIIP), 2017, pp. 1-4, IEEE. doi: 10.1109/ICIIP.2017.8313734.
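
The map and reduce phases described above can be imitated in plain Python without a Hadoop installation. The sketch below is an in-memory simulation under stated assumptions: the student records, the course names, and the choice of counting course preferences are hypothetical, and a real Hadoop job would distribute the same three steps across the cluster.

```python
from collections import defaultdict
from itertools import chain

def mapper(record):
    """Map phase: emit one (course, 1) tuple per course a student prefers."""
    student_id, courses = record
    for course in courses:
        yield course, 1

def shuffle(pairs):
    """Shuffle phase: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reducer(key, values):
    """Reduce phase: collapse the values of one key into a single count."""
    return key, sum(values)

# Hypothetical input records: (student id, preferred ICT courses).
records = [
    ("s1", ["Cloud Computing", "Big Data"]),
    ("s2", ["Big Data"]),
    ("s3", ["Big Data", "Networking"]),
    ("s4", ["Cloud Computing"]),
]

mapped = chain.from_iterable(mapper(r) for r in records)
counts = dict(reducer(k, v) for k, v in shuffle(mapped).items())
preferred_course = max(counts, key=counts.get)
```

Because each mapper call depends only on its own record and each reducer call only on its own key, both phases can run in parallel, which is what the framework exploits.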

Finally, the authors have performed educational data classification and a comparative analysis
of classifiers using Python on a dataset synthesized for three different subjects with
similar attributes. In this work, the following has been done: a) classification of the
educational dataset has been performed using different classifiers, b) the precision values of
the classifier models have been compared, c) a validation dataset has been created, and d)
maximum interest among the three different courses has been predicted. The work related to this
has been published in the research paper given below:

In Proceedings of Futuristic Trends in Network and Communication Technologies (FTNCT-2018),
Springer CCIS Series, ISSN No.:1865-0929, 2018.

1.12 Organization of the Report

Chapter 2 presents the literature survey of KDD, the data mining step of the KDD process, and
its methods and techniques.

Chapter 3 covers the different mining techniques implemented for classification and clustering
of data. The working principles and algorithms of the mining techniques have been discussed in
this chapter. The mining techniques discussed in this chapter are as follows: a) Statistical
techniques, b) Decision trees, c) K-Means clustering, d) Association rule mining, e) Neural
networks, f) Support Vector Machines, g) Bayes classification, and h) K-Nearest Neighbors.

In Chapter 4, the research methodology used in this work has been presented. The research
methodology has been divided into five phases, which are as follows: a) study of related
literature, b) study of the functional requirements of education and training followed in an
academic institute, c) synthesis of datasets and predictive analytics of educational trends and
decision making using data mining techniques, d) implementation of classification and clustering
techniques over the dataset and evaluation of the results, and e) development of tools for
association rule mining using the Apriori algorithm in ASP.NET and for feedback analytics
in PHP.

In Chapter 5, the tools used for finding results using data mining techniques have been discussed.
The tools discussed are: a) WEKA, b) MATLAB, c) SPSS, d) RapidMiner, e) Hadoop
MapReduce framework, f) Python.

In Chapter 6, the experiments and results have been discussed in detail. In this chapter,
analysis and classification of educational data is done for effective decision making. The results
have been obtained using the various data mining techniques discussed in Chapter 3. The
techniques applied on the educational dataset are as follows: a) statistical techniques, b)
classification, c) clustering, d) pattern recognition, e) supervised learning, and f)
probabilistic approach.

In Chapter 7, web-based data mining tools have been proposed for association rule mining and
for performing feedback analysis. The tools have been developed in ASP.NET and PHP.

In Chapter 8, educational data classification using the Hadoop-inspired MapReduce framework has
been presented. Here, the data is distributed using Map and Reduce phases for parallel
computation. Using the MapReduce framework, preferred courses have been predicted for
students to undergo industrial training.

In Chapter 9, educational data classification has been performed using the Python language. Using
Python, multiple classifiers have been implemented over the dataset and compared in terms
of their precision values. The classifiers implemented over the dataset are as follows: a) KNN, b)
SVM, c) CART, d) Linear regression, e) Naïve Bayes, and f) Linear discriminant analysis.

The conclusion and future scope are highlighted in Chapter 10, followed by the references.
