Approaches to Evaluation
A test is a method of measuring a person's ability, knowledge, or performance in a given domain
(Brown 2003; Bachman 1995). Language teaching and testing are intertwined. Testing helps students develop a positive attitude, a competitive temperament and mastery of the language. It helps teachers raise morale, obtain reflections on their teaching, diagnose errors, identify priority areas, enhance effectiveness and plan the future course of action (Kubiszyn & Borich 2007, Madsen 1983, Salvia & Ysseldyke 2007). Language testing approaches are axioms or correlative assumptions that provide a method (a set of testing styles), a design (norm and domain) and a procedure (technique and administration) to follow (Richards & Rodgers 2001). The language testing literature identifies six major dichotomies: formative versus summative, direct versus indirect, discrete versus integrative, objective versus subjective, traditional versus alternative, and norm-referenced versus criterion-referenced.
Norm-Referenced Test (NRT)
In testing, the scores or performance of a particular group (the "norm" group), as measured in some way, are known as the "norm". Norms may be used to compare the performance of an individual or group with that of the norm group. Norms may be expressed with reference to such factors as age, grade, region and special-needs group on a test (Brown 1976; Noll, Scannell & Craig, 1979). An NRT is a test that measures how the performance of a particular test taker or group of test takers compares with the performance of other test takers whose scores are given as the norm. Norm-referenced standardized tests can use local, state, or national norms as a base. A test taker's score is, therefore, interpreted with reference to the scores of other test takers or groups of test takers, rather than to an agreed criterion (Richards & Schmidt 2002).
Hence, NRT is an approach to evaluation in which a learner's relative rank is determined by comparison with the other students in the classroom (Brown 1976; Mrunalini 2013; Salvia & Ysseldyke 2007). For example, if a student receives a percentile rank of 34, this means that he or she performed better than 34% of the students in the norm group (Bond 1996). Thus, reporting a student's classroom test performance as "better than 34 percent of the other students" is an instance of evaluation through NRT. NRT tells us where a student stands in relation to other students, and it is useful only for certain types of decisions (Bachman 1995; Kubiszyn & Borich 2007).
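The percentile-rank interpretation described above can be made concrete with a minimal Python sketch. It assumes the simple "percentage of norm-group scores strictly below" definition (some texts also count half of the tied scores), and the norm group of 50 scores is hypothetical. `percentile_rank` is an illustrative name, not a function from any of the cited sources.

```python
def percentile_rank(score, norm_scores):
    """Percentage of norm-group scores that lie strictly below `score`.

    Uses the simple 'strictly below' convention; other conventions
    also add half of the scores tied with `score`.
    """
    if not norm_scores:
        raise ValueError("norm group must not be empty")
    below = sum(1 for s in norm_scores if s < score)
    return 100.0 * below / len(norm_scores)


# Hypothetical norm group of 50 raw scores: 20, 21, ..., 69.
norm_group = list(range(20, 70))
print(percentile_rank(37, norm_group))  # 17 scores (20..36) are below 37 -> 34.0
```

The result is a rank relative to the group, not a statement about how much of the content the student has mastered.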
Examples of NRTs include IQ tests, developmental-screening tests (used to identify learning disabilities in young children or to determine eligibility for special educational services), cognitive ability tests and readiness tests. The SAT (Stanford Achievement Test), CAT (California Achievement Test), MAT (Metropolitan Achievement Test), TOEFL and IELTS are well-known examples of NRTs. More simply, theater auditions, course placement, program eligibility decisions, school admissions and job interviews are norm-referenced because their goal is to identify the best candidate compared with the other candidates, not to determine how many of the candidates meet a
Tafano Ouke(PhD) Page 1
fixed list of standards. Educators use NRTs to evaluate the effectiveness of teaching programs, to help determine students' preparedness for programs, and to diagnose disabilities for eligibility purposes.
Criterion-Referenced Test (CRT)
The word "criterion" in CRT has been used in two ways in the literature. First, the criterion may refer to the material being taught in the course; here, a CRT assesses the particular learning points of a particular course or program. Second, the criterion may be the standard of performance, or cut-point for decision making, that is expected for passing the test or course; here, a CRT is used to assess whether students pass or fail at a certain criterion level or cut-point (Bond 1996). A CRT is a test that measures a test taker's performance according to a particular standard or criterion that has been agreed upon. The test taker must reach this level of performance to pass the test, and a test taker's score is interpreted with reference to the criterion score, rather than to the scores of other test takers (Richards & Schmidt 2002). Hence, CRT is an approach to evaluation in which every learner's performance in the classroom is measured against the same criterion (Brown 1976; Mrunalini 2013; Salvia & Ysseldyke 2007). For instance, reporting a student's classroom test performance as "90 percent" is an instance of evaluation through CRT. CRT results are commonly reported as percentages (Mrunalini 2013; Salvia & Ysseldyke 2007). CRT tells us about a student's level of proficiency in, or mastery of, a set of skills and helps us decide whether the student needs more or less work on those skills; it says nothing about the student's place compared with other students (Bachman 1995; Kubiszyn & Borich 2007). For instance, a test designed to evaluate how well students demonstrate mastery of specified content (e.g. types of tense) is a CRT. Most everyday tests, quizzes and final exams in classroom teaching can be regarded as CRTs. A "Basic Writing" CRT would include questions based on what was supposed to be taught in writing classes; it would not include "speaking" or "advanced writing" questions. Students who took the "Basic Writing" course could pass this test if they were taught well, they studied enough, and the test was well prepared.
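By contrast with the percentile rank, a criterion-referenced interpretation compares a score only with a fixed cut score. The sketch below assumes a hypothetical 40-item "Basic Writing" test and a hypothetical 70% cut score; neither value comes from the text, and `crt_result` is an illustrative name.

```python
def crt_result(correct, total_items, cut_score_percent):
    """Interpret a raw score against a fixed criterion (cut score).

    The outcome depends only on this test taker's own score and the agreed
    criterion; other test takers' scores play no part.
    """
    if total_items <= 0 or not 0 <= correct <= total_items:
        raise ValueError("need total_items > 0 and 0 <= correct <= total_items")
    percent = 100.0 * correct / total_items
    return percent, percent >= cut_score_percent


# Hypothetical 40-item 'Basic Writing' test with a 70% cut score.
percent, passed = crt_result(36, 40, cut_score_percent=70)
print(percent, "pass" if passed else "fail")  # 90.0 pass
```

Every student who reaches the cut score passes, so in principle the whole class can pass, which is not possible under a purely relative ranking.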
COMPARING NRT AND CRT
In an NRT, item difficulty varies from items that no one answers correctly to items that everyone answers correctly, whereas in a CRT the items are of equivalent difficulty (Bond 1996; Kubiszyn & Borich 2007; Linn 2000; Sanders & Horn 1995). An NRT covers many objectives at a time, while a CRT covers the few objectives that were taught (Bond 1996; Kubiszyn & Borich 2007; Linn 2000; Montgomery & Connolly 1987; Sanders & Horn 1995). In an NRT "distracters" are constructed, whereas in a CRT the item responses are "relevant" to one another (Bond 1996; Kubiszyn & Borich 2007; Linn 2000; Sanders & Horn 1995). The purposes of NRT are screening, diagnosis, classification and placement, whereas the purpose of CRT is to assess the achievement of skills or objectives (Bond 1996; Linn 2000; Montgomery & Connolly 1987; Sanders & Horn 1995). In an NRT, items are usually not developed from task analysis, and test items may or may not be related to the objectives of instruction (intervention). In a CRT, items are developed from task analysis, and test items are related to the objectives of instruction (Bond 1996; Linn 2000; Montgomery & Connolly 1987; Sanders & Horn 1995). Scoring of an NRT is based on standards relative to a group; variability of scores (described by means and standard deviations) is desired, ideally following a normal distribution. Scoring of a CRT is based on absolute standards; variability of scores is not sought, because perfect or near-perfect scores are desired (Bond 1996; Linn 2000; Montgomery & Connolly 1987; Sanders & Horn 1995). In an NRT percentile rank is used for relative ranking, whereas in a CRT percentage is used to report performance (Bond 1996; Kubiszyn & Borich 2007; Linn 2000; Montgomery & Connolly 1987; Sanders & Horn 1995). NRTs emphasize breadth rather than depth in content specification, whereas CRTs emphasize depth rather than breadth (Bond 1996; Linn 2000; Sanders & Horn 1995).
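The point about score variability can be illustrated with Python's standard `statistics` module, which computes the mean and (population) standard deviation of two hypothetical sets of percentage scores: one spread out, as an NRT aims for, and one clustered near mastery, as a CRT aims for. The score sets are invented for illustration only.

```python
import statistics


def describe(scores):
    """Return (mean, population standard deviation) of a list of scores."""
    return statistics.mean(scores), statistics.pstdev(scores)


# Hypothetical percentage scores, for illustration only.
nrt_scores = [35, 45, 50, 55, 60, 65, 75]      # spread out: useful for ranking
crt_scores = [90, 95, 95, 100, 100, 100, 100]  # clustered near mastery

for label, scores in [("NRT", nrt_scores), ("CRT", crt_scores)]:
    mean, sd = describe(scores)
    print(f"{label}: mean={mean:.1f}, sd={sd:.1f}")
# NRT: mean=55.0, sd=12.2
# CRT: mean=97.1, sd=3.6
```

A large standard deviation helps an NRT separate test takers; a small one, with a mean close to 100%, is what a successful CRT is expected to show.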
FORMATIVE EVALUATION
Formative evaluation is used to monitor the learning progress of students during the period of
instruction. Its main objective is to provide continuous feedback to both teacher and student
concerning learning successes and failures while instruction is in process. Feedback to students
provides reinforcement of successful learning and identifies the specific learning errors that need
correction. Feedback to teacher provides information for modifying instruction and for
prescribing group and individual remedial work. Formative evaluation depends on tests, quizzes, homework, classwork and oral questions prepared for each segment of instruction. These are usually mastery tests that provide direct measures of all the intended learning outcomes of the segment. The tests used for formative evaluation are mostly teacher-made. Observational techniques are also useful in monitoring student progress and identifying learning errors. Since formative evaluation is used for assessing student learning progress during instruction, the results are not used for assigning course grades.
SUMMATIVE EVALUATION
Summative evaluation is designed to find out the extent to which the instructional objectives have been achieved, usually at the end of a term. It is used primarily for assigning course grades or for certifying student mastery of the intended learning outcomes at the end of a particular course or programme. The techniques used for summative evaluation are determined by the instructional objectives. They include external examinations as well as teacher-made tests, ratings, etc. Although the main purpose of summative evaluation is assigning
grades, it also provides information for judging the appropriateness of the course objectives and
the effectiveness of instruction.
DISTINCTION BETWEEN SUMMATIVE AND FORMATIVE EVALUATION
The terms summative and formative evaluation were first conceptualised by Michael Scriven in his classic 1967 essay on the methodology of evaluation. According to him, summative evaluation refers to the assessment of the worth of an instructional programme that has already been completed, while formative evaluation refers to the assessment of the worth of an instructional programme that is still going on and can still be modified.
A summative evaluator gathers information and judges the merit of the overall instructional sequence in order to decide whether to retain or adapt that sequence. The audience of summative evaluation is the consumer of the instructional programme, whereas the audience of the formative evaluator is the designer and developer of the programme. A formative evaluator is a partisan of the instructional sequence and does everything possible to make teaching and learning better. A summative evaluator is an uncommitted, non-partisan person whose task is to pass judgement on an instructional endeavour.
Bloom, Hastings and Madaus draw a very clear distinction between these concepts. According to them, summative evaluation is judgemental in nature. Its purpose, which distinguishes it from formative evaluation, is to appraise the teaching-learning process. It is an end-of-course activity concerned with assessing the larger instructional objectives of a course or of a substantial part of it. Our public examinations and annual and term tests are all summative tests used for summative evaluation. Summative evaluation measures pupils' achievement, not their day-to-day improvement; it is thus a status evaluation of students. Its major function is grading, promotion or certification of achievement. It may take place at the end of a unit, a term or a course of studies. Its emphasis is generally on measuring cognitive behaviours, sometimes psychomotor and occasionally affective behaviours. Instrumentation is limited to final or summative examinations based on a weighted sample of objectives. The average difficulty level of questions ranges from 35% to 70%. Scoring, though normally norm-referenced, can also be criterion-referenced. Scores are reported by objectives. Summative evaluation is thus a judgemental activity focused on certifying students' achievement.
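The 35% to 70% range can be checked with a short computation, assuming (as in classical item analysis) that an item's difficulty level is expressed as the percentage of test takers who answer it correctly. The response matrix below is hypothetical, and `item_difficulty` is an illustrative helper, not part of any cited method.

```python
def item_difficulty(responses):
    """Percentage of test takers answering each item correctly.

    `responses` has one row per test taker; each row holds 1 (correct)
    or 0 (incorrect) for every item, in the same item order.
    """
    if not responses:
        raise ValueError("need at least one test taker")
    n_takers = len(responses)
    n_items = len(responses[0])
    return [100.0 * sum(row[i] for row in responses) / n_takers
            for i in range(n_items)]


# Hypothetical scored responses: 5 test takers x 4 items.
responses = [
    [1, 1, 0, 1],
    [1, 0, 0, 1],
    [1, 1, 1, 0],
    [0, 1, 0, 1],
    [1, 0, 0, 0],
]
p = item_difficulty(responses)
average = sum(p) / len(p)
print(p, average, 35 <= average <= 70)  # [80.0, 60.0, 20.0, 60.0] 55.0 True
```

Under this convention a higher percentage means an easier item, so individual items may fall outside the range even when the test average lies within it.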
Formative evaluation is developmental, not judgemental, in nature. Its purpose is to improve students' learning and instruction. Its major function is therefore to give feedback to the teacher and students, locating strengths and weaknesses in the teaching-learning process in order to improve it. It operates during instruction and ideally should not be limited to the assessment of cognitive behaviours. All classroom assessments that are not used for grading purposes, whether unit tests, informal tests, questioning during teaching, home assignments or the teacher's classroom observations of pupils' responses, are examples of formative evaluation. For formative testing, specially designed instruments are devised. Judgements or scoring are criterion-referenced, not norm-referenced as in summative evaluation. Decisions relate to the steps to be taken to improve the instructional programme vis-a-vis pupils' learning. Pupils' progress is reported as an individual pattern of pass-fail scores on different tasks in the hierarchy of learning outcomes. Formative evaluation is, therefore, a means of determining what the pupils have mastered and what is still to be mastered, thereby indicating the basis for improving students' learning.
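The individual pass-fail reporting pattern described above might be sketched as follows. The task names and the 80% mastery threshold are hypothetical choices for illustration; the text does not specify a threshold, and `mastery_report` is an illustrative name.

```python
def mastery_report(task_scores, mastery_threshold=80):
    """Split one pupil's tasks into mastered and still-to-be-mastered.

    task_scores maps each learning task (listed in the order of the
    learning hierarchy) to the pupil's percentage score on it.
    mastery_threshold is a hypothetical cut-off, not a value from the text.
    """
    mastered, to_master = [], []
    for task, score in task_scores.items():  # dicts keep insertion order
        if score >= mastery_threshold:
            mastered.append(task)
        else:
            to_master.append(task)
    return mastered, to_master


# Hypothetical formative results for one pupil on a unit about tenses.
pupil = {"identify tenses": 90, "form the past tense": 85,
         "use the perfect tenses": 60}
mastered, to_master = mastery_report(pupil)
print("Mastered:", mastered)
print("Still to be mastered:", to_master)
```

The output is a per-task pattern rather than a single total, which is what makes it useful for deciding what to re-teach.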
Concept of Educational Assessment
An assessment is a process by which information is obtained relative to some known objectives or goals. It is a broad term that includes testing. It is also the process of gathering information to monitor progress and make educational decisions if necessary. Assessment may include tests, observations, interviews, behavior monitoring, etc. (Overton, 2012). To many students, and to instructors as well, assessment may mean simply giving tests and assigning grades. This narrow view not only limits assessment but also limits how other teaching-learning issues are addressed, because it fails to take into account both the utility of assessment and its importance in the teaching-learning process (Overton, 2012). In the most general sense, assessment is the process of judging or measuring the worth of an entity: a person, a process, or a program. Educational assessment involves gathering and evaluating data arising from planned learning activities or programs; this form of assessment is often referred to as evaluation. It also refers to the process of collecting, interpreting, and synthesizing information to aid decision making. For many people, classroom assessment means using paper-and-pencil tests to grade pupils. However, it is more than testing. It includes the gathering of information by instructors on pupils, instruction, and classroom climate, and the interpreting and synthesizing of this information to help instructors understand their pupils, plan and monitor instruction, and establish a conducive classroom atmosphere (Airasian, 1991). Madaus and Kellaghan (1993) add that classroom assessment relies heavily on instructors' observation of students as they go about their normal learning activities. Unlike measurement, it involves more than quantifying pupils' test results: it includes the systematic development of tests and/or examinations and the recording and interpreting of their results, and it uses observational techniques other than testing to collect information on pupils' overall performance. It is thus a broader and more inclusive term. According to Farrant (1980), assessment is the process by which the quality of individuals' work or performance is judged. In educational institutions, assessment of learning is usually carried out by teachers or instructors on the basis of impressions gained as they observe their pupils at work, or by various kinds of tests given periodically.
Purpose of Educational Assessment
Educational assessment is normally conducted by instructors and teachers and is designed to serve several related purposes, including: 1) motivating and directing learning, 2) providing feedback to students on their performance, 3) providing feedback on instruction and/or curriculum, and 4) ensuring standards of progression. It also helps to: improve learning and instruction; identify learning difficulties and give learners the opportunity to show progress toward objectives; help instructors determine the effectiveness of their teaching aids, methods, techniques and learning materials; provide educational administrators with adequate information about instructors' effectiveness and the institution's programs as a whole; acquaint parents with their children's performance; guide teaching; estimate teachers' effectiveness in teaching accurately; provide feedback to administrators to aid counseling and decision making; suggest areas of improvement; and indicate the continuity of pupils' performance.
Concept of Continuous Assessment
When assessment is practiced as an ongoing, day-to-day process, it is called continuous assessment. It is an assessment system aimed at deriving a student's final marks from a number of previous assessments on selected syllabus objectives. It is being used increasingly as an alternative to terminal examinations because it provides more reliable information than examinations: it builds up a picture of a pupil's performance over a prolonged and representative period. Thus, schools and universities are currently turning to continuous assessment, whereby records are kept of the student's performance in nearly everything he or she does during the course. These records build up into a much more complete and reliable assessment of the student than is possible with a single examination (Farrant, 1980). Furthermore, a one-shot assessment is not adequate for obtaining a clear picture of a pupil's academic achievement. According to Yoloye (1984), continuous assessment is a method of evaluating the progress and achievement of students in educational institutions. It aims to get the truest possible picture of each student's ability while helping each student to develop his or her abilities. The process offers a very valuable learning tool.
Through continuous assessment, pupils are encouraged to carry out investigations and projects in which they are involved in their own learning on a very practical basis. Generally, it is both a learning and an assessment system that brings teachers and students together in a cooperative endeavor to accomplish those critical instructional objectives that need detailed study and practice (Botswana, 1994). Continuous assessment is a student evaluation system that operates at the classroom level and is integrated with the instructional process. It includes a variety of measures (e.g., daily assessment of students using observation, oral questions, tests or quizzes) and procedures that a teacher can use to determine whether his or her instruction has been effective and to target those students who have and have not mastered particular skills. It serves as a foundation for improved instruction in the classroom. According to Wasanga (1987), continuous assessment should be objective, systematic, comprehensive, cumulative and guidance-oriented in order to update judgments about pupils' performance. It is best carried out by the class teacher using instruments such as exercises, terminal tests, take-home assignments, project work and field work.
Purpose of Continuous Assessment
The purpose of continuous assessment focuses on learners' overall performance and on the teaching-learning process (ICDR, 1994). Concerning learners, the specific roles of assessment are to: investigate pupils' participation in the learning conditions; analyze pupils' level of knowledge, skill, and ability in the different subjects; examine pupils' improvement in classroom performance over a period of time; accumulate records of progress for the pupils; determine pupils' strengths and weaknesses; gather information about a wide range of pupils' characteristics, habits or attitudes as feedback for making decisions; and identify the extent to which pupils have overcome any social, psychological or learning difficulties. As to the instructional process, continuous assessment helps to: determine the extent to which educational institutions are working towards achieving their set objectives; provide information from which instructors can gain insight into their own teaching effectiveness; find out the degree to which the methods of instruction and materials employed are up-to-date and effective; and give incentives in the learning process.
Areas of Continuous Assessment
Continuous assessment covers three areas. The first is achievement in various subjects, assessed using oral examinations and different written tests. The second is closely school- or university-related behavioral aspects such as participation in the instruction process and in extracurricular activities, fulfillment of assignments, discipline, punctuality and absenteeism, assessed using anecdotal records, rating scales, checklists, interviews, etc. The third is general behavioral aspects (characteristics, interests, beliefs, feelings, attitudes, etc.), assessed using observational techniques and interviews.
Principles and Procedures of Continuous Assessment to be followed by Instructors
According to Yoloye (1984), there are five procedures to be followed by instructors in continuous assessment. 1) Instructors should combine all the scores attained by each student in class assignments, group work, projects, homework, tests, examinations and any other sources used during instruction to obtain an overall score for a given period. 2) The sources of the scores to be added together should be carefully planned in advance. In other words, teachers should plan at the beginning of the year the number of assignments and tests and their timing, and decide how many marks are to be assigned to each. A term plan might be as follows: 2 class tests of 20 marks each = 40 marks; 2 assignments of 10 marks each = 20 marks; and 1 end-of-term examination = 40 marks, giving a total of 100 marks. 3) The score from each test, assignment, etc. should be used in two ways. First, it should be used to identify each student's difficulties and to help him or her learn the things not mastered before the next lessons are due. A student can be helped by giving him or her an appropriate reading assignment or by asking another student who has mastered the topic to help after school hours. If many students have the same difficulty, the teacher should spend some time re-teaching the topic to the entire class. Second, it should help teachers assess their own performance and the effectiveness of their teaching so as to find improved ways of teaching. 4) Teachers should keep a close watch on the personality development of each student. Personality includes character, temperament, interest, attitude, and adjustment. A variety of measuring instruments, especially observational techniques, may be used to measure the personality characteristics of students, and their performance on measures of personality should contribute to their final assessment. 5) Information concerning the students' learning and personality characteristics should be used to understand them better and to help them, through guidance and counseling, to overcome their difficulties and improve their performance.
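The example term plan (two 20-mark class tests, two 10-mark assignments and a 40-mark end-of-term examination) can be turned into a small calculation that combines a pupil's scores into an overall mark out of 100. This sketch assumes each component may be marked on its own scale and is then rescaled to the marks the plan allocates to it; the pupil's scores and the `overall_score` helper are hypothetical.

```python
# Marks allocated to each component in the example term plan (total 100).
TERM_PLAN = {
    "class_test_1": 20,
    "class_test_2": 20,
    "assignment_1": 10,
    "assignment_2": 10,
    "end_of_term_exam": 40,
}


def overall_score(raw, out_of, plan=TERM_PLAN):
    """Combine component scores into one overall mark out of 100.

    raw[c] is the pupil's score on component c, marked out of out_of[c];
    each score is rescaled to the marks the plan allocates to c.
    """
    if sum(plan.values()) != 100:
        raise ValueError("the term plan must total 100 marks")
    total = 0.0
    for component, allocated in plan.items():
        score, maximum = raw[component], out_of[component]
        if not 0 <= score <= maximum:
            raise ValueError(f"{component}: score {score} outside 0..{maximum}")
        # Multiply before dividing to keep the arithmetic exact where possible.
        total += score * allocated / maximum
    return total


# Hypothetical pupil: tests and assignments marked on the plan's scale,
# the examination marked out of 100 and rescaled to 40.
raw = {"class_test_1": 16, "class_test_2": 14, "assignment_1": 8,
       "assignment_2": 9, "end_of_term_exam": 60}
out_of = {"class_test_1": 20, "class_test_2": 20, "assignment_1": 10,
          "assignment_2": 10, "end_of_term_exam": 100}
print(overall_score(raw, out_of))  # 16 + 14 + 8 + 9 + 24 = 71.0
```

Keeping the component scores separate, rather than storing only the total, also supports the third procedure: each score remains available for diagnosing individual difficulties.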
Challenges of Continuous Assessment
According to Madaus and Kellaghan (1981), many universities, particularly young universities, face a number of challenges regarding the effectiveness of continuous assessment. In some cases, there are no functional guidelines. Some instructors claim that the weighting system is not clear, that is, how much of the total 100% should be continuously assessed. Furthermore, leaders, instructors, parents, and students may have negative or indifferent attitudes towards the importance and implementation of continuous assessment, which is a serious obstacle to implementing it properly. Moreover, inadequate facilities and large class sizes are cited as major problems in employing continuous assessment. Also, because some courses have vast content, instructors focus more on covering the syllabus than on continuous assessment. In addition, lack of experience, and the fact that some instructors in some universities specialized in non-teaching areas, contribute to the ineffectiveness of continuous assessment. Lastly, according to Overton (2012), the modular approach and block mode of delivery are also problematic. The effect of increasing class size in tertiary education is not well understood; Bandiera (2010) examined the effect of class size on students' exam performance by comparing each student's performance in courses with small and with large class sizes. Another challenge for the effective implementation of continuous assessment is a failure to understand the purpose of assessment across the cognitive, psychomotor, and affective domains, i.e., why to assess on the part of instructors, and why to be assessed on the part of students (Bandiera, 2010).