TDP 301 Educational Measurements and Evaluation Notes, Sept-Dec 2023
SCHOOL OF EDUCATION
LESSON ONE
INTRODUCTION TO EDUCATIONAL MEASUREMENT & EVALUATION
Introduction
Educational measurement and evaluation is the study of methods, approaches and strategies used to
measure, assess and evaluate in the educational setting. Evaluation has been conceived either as the
assessment of the merit and worth of educational programmes (Guba and Lincoln, 1981; Glatthorn, 1987;
and Scriven, 1991), or as the acquisition and analysis of information on a given educational programme for
the purpose of decision making (Nevo, 1986; Shiundu & Omulando, 1992). The course involves the study
of tests, test construction and item construction as well as statistical procedures used to analyze tests and
test results.
Test: A method to determine a student's ability to complete certain tasks or demonstrate mastery of a skill
or knowledge of content. Examples include multiple-choice tests and weekly spelling tests. While the term is
commonly used interchangeably with assessment, or even evaluation, it can be distinguished by the fact that
a test is one form of an assessment. A test or assessment yields information relative to an objective or goal.
In that sense, we test or assess to determine whether or not an objective or goal has been achieved.
Assessment: Assessment is a process by which information is obtained relative to some known objective or
goal. Assessment is a broad term that includes testing. A test is a special form of assessment. Tests are
assessments made under contrived circumstances especially so that they may be administered. In other
words, all tests are assessments, but not all assessments are tests. An assessment may also include methods
such as observations, interviews, behavior monitoring, etc. We test at the end of a lesson or unit. We assess
progress at the end of a school year through testing, and we assess verbal and quantitative skills through
such instruments as the SAT and GRE. Whether implicit or explicit, assessment is most usefully connected
to some goal or objective for which the assessment is designed.
Assessment is the process of gathering information to monitor progress and make educational decisions
where necessary. Assessment is therefore quite different from measurement, and has uses that suggest very
different purposes.
Assessment of skill attainment is rather straightforward. Either the skill exists at some acceptable level or it
doesn’t. Skills are readily demonstrable. Assessment of understanding is much more difficult and complex.
Skills can be practiced; understandings cannot. We can assess a person’s
knowledge in a variety of ways, but there is always a leap, an inference that we make from what a person
does to what that signifies about what he or she knows.
Evaluation: Procedures used to determine whether the subject (i.e. the student) meets a preset criterion, such as
qualifying for special education services. This uses assessment (remember that
an assessment may be a test) to make a determination of qualification in accordance with predetermined
criteria. Evaluation is perhaps the most complex and least understood of the terms. Inherent in the idea of
evaluation is "value." When we evaluate, what we are doing is engaging in some process that is designed to
provide information that will help us make a judgment about a given situation. Generally, any evaluation
process requires information about the situation in question. When we evaluate, we are saying that the
process will yield information regarding the worthiness, appropriateness, goodness, validity, legality, etc.,
of something for which a reliable measurement or assessment has been made.
We evaluate every day. Teachers, in particular, are constantly evaluating students, and such evaluations are
usually done in the context of comparisons between what was intended (learning, progress, behavior) and
what was obtained. When used in a learning objective, the definition of evaluate is: To classify objects,
situations, people, conditions, etc., according to defined criteria of quality. Indication of quality must be
given in the defined criteria of each class category.
Measurement refers to the process by which the attributes or dimensions of some physical object are
determined. One exception is the use of the word measure in determining a person’s IQ,
attitudes or preferences.
However, when we measure, we generally use some standard instrument to determine how big, tall, heavy,
voluminous, hot, cold, fast, or straight something actually is. Standard instruments refer to instruments such
as rulers, scales, thermometers, pressure gauges, etc. We measure to obtain information about what is. Such
information may or may not be useful, depending on the accuracy of the instruments we use, and our skill at
using them.
To sum up, we measure distance, we assess learning, and we evaluate results in terms of some set of
criteria. These three terms are certainly connected, but it is useful to think of them as separate but connected
ideas and processes.
Assessment:
This is the use of both formal and informal data gathering procedures to establish the extent to which
learners have gained the required knowledge, skills, values and attitudes following instruction. The results
of an assessment are used in decision making.
Types of Assessment
Diagnostic Assessment
This is carried out before instruction to determine whether or not students possess certain entry behavior
and during instruction to help the teacher determine the difficulties students are experiencing.
Formative Assessment
It takes place during instruction to provide feedback to teachers and students on students’ progress towards
attainment of desired objectives and to identify areas that need further attention.
Types of formative assessment
a) Oral questions
b) Written Assignments (Take home assignments)
c) Classwork
d) Question/Answer session
e) In- Class activities
f) Student feedback
Summative Assessment
It is carried out at the end of a unit, chapter, term or year to measure student progress during a given time
span. Summative assessment is used mainly for:
(a) Grading learners
(b) Certifying learners
(c) Judging the effectiveness of the teacher
(d) Comparing the performance of students, schools and districts.
Types of Summative assessment
i) Projects
ii) Term papers
iii) End of course examinations
iv) Portfolios
v) Student evaluation of the course
vi) Instructor’s self-evaluation
Importance of assessments to a teacher
More specifically, assessment is a method the teacher uses to make decisions on learners’ progress. It is an
essential process in teaching and learning as it enables the teacher to evaluate the level and extent of
learners’ achievement of the set objectives.
Assessment enables the teacher to:
i. Determine the level of achievement of set objectives
ii. Determine how much knowledge the learners have grasped
iii. Establish how far the learners have mastered the skills taught and acquired the desired attitudes
iv. Detect the difficulties and challenges learners are encountering, which forms the basis for
remedial teaching
v. Check the effectiveness of the resources and methods of instruction
vi. Provide a basis for learner promotion and reward
vii. Provide information to the school administration, parents and stakeholders for necessary action
viii. Motivate and direct learning
ix. Provide feedback to students on their performance
x. Provide feedback on instruction and/or the curriculum
Nominal Scale
This is the first level of measurement, in which numbers serve only as labels for identifying people or
objects, for example the numbers on players' jerseys. None of these people is greater than the other; the
number is used only for identification.
Ordinal Scale
This is the second level of measurement. In ordinal measurement, numbers denote the rank order of the
objects or the individuals. Ordinal measures reflect which person or object is larger or smaller, heavier or
lighter, harder or softer etc.
Socio-economic status is a good example of ordinal measurement because every member of the upper class
is higher in social prestige than every member of the middle and lower class.
The drawback of ordinal measurement is that ordinal measures are not absolute quantities, nor do they
convey that the distances between the different rank values are equal. Ordinal measurements do not show the
distances between the values and have no absolute zero point.
Interval Scale
This is the third level of measurement and includes all the characteristics of the nominal and ordinal scale of
measurement. Interval scales provide information about the distance between the units and the ordering of
the magnitude of the measure, but which lack an absolute zero point. The zero is arbitrary for measuring
attitude, aptitude, and temperature.
Ratio Scale
This is the highest level of measurement and has all the properties of nominal, ordinal and interval scales
plus an absolute or true zero point. Common examples of ratio scale are the measures of weight, width,
length, loudness etc.
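The practical difference between the four scales can be illustrated with a short sketch in Python (the variable names and example values below are invented purely for illustration):

# Illustrative sketch of the four levels of measurement (all values hypothetical).
jersey_numbers = [7, 10, 23]                      # Nominal: labels only; order is meaningless
ses_rank = {"lower": 1, "middle": 2, "upper": 3}  # Ordinal: order matters, distances do not
temps_celsius = [10, 20, 30]                      # Interval: equal distances, arbitrary zero
weights_kg = [30, 60]                             # Ratio: true zero point

# An ordinal comparison is meaningful:
print(ses_rank["upper"] > ses_rank["middle"])     # True

# A ratio is meaningful only on a ratio scale:
print(weights_kg[1] / weights_kg[0])              # 2.0 -- 60 kg really is twice 30 kg
print(temps_celsius[1] / temps_celsius[0])        # 2.0 -- but 20 C is NOT "twice as hot" as 10 C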
The purpose of measurement and evaluation in education
Placement of the students
Selecting students for courses
Certification
Stimulating learning
Improving teaching
For research purpose
For guidance and counselling
For modification of curriculum purposes
For purpose of selecting students for employment
For modification of teaching methods
For the purpose of promotions to the student
For reporting students’ progress to their parents
For the award of scholarship and merit awards
For the admission of students into educational institutions
For the maintenance of the students
Methods of Assessment
Students are assessed using different methods. The most commonly used methods are:
Written exercises
These are questions set and administered by teachers to students to determine the extent to which students
have acquired specified knowledge and skills.
Homework assignments
Teachers assign students work to do after classes.
Check-lists
Checklists are inventories of learning tasks that have been completed and level of competence that has been
achieved.
Attitude Scales
They consist of statements with which the pupil may express agreement or disagreement; one such scale is
the Likert scale. This scale is divided into five categories: Strongly Agree, Agree, Undecided, Disagree and
Strongly Disagree.
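Before analysis, Likert responses are commonly coded numerically, 5 for Strongly Agree down to 1 for Strongly Disagree (reversed for negatively worded statements). A minimal sketch in Python, with invented responses:

# Conventional numeric coding of a five-point Likert scale (illustrative only).
LIKERT = {"Strongly Agree": 5, "Agree": 4, "Undecided": 3,
          "Disagree": 2, "Strongly Disagree": 1}

responses = ["Agree", "Strongly Agree", "Undecided", "Agree"]  # one pupil, four items

scores = [LIKERT[r] for r in responses]
print(scores)                     # [4, 5, 3, 4]
print(sum(scores) / len(scores))  # 4.0 -- mean attitude score for this pupil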
Direct Observation
Another method of assessment is direct observation of students’ activities.
Oral Exam
Here the student is interviewed and expected to give oral responses, for example in languages such as French.
Summative assessment is used primarily to make decisions for grading or determine readiness for
progression. Typically summative assessment occurs at the end of an educational activity and is designed to
judge the learner’s overall performance. In addition to providing the basis for
grade assignment, summative assessment is used to communicate students’ abilities to external
stakeholders, e.g., administrators and employers.
Formal assessment occurs when students are aware that the task that they are doing is for assessment
purposes, e.g., a written examination. Most formal assessments are also summative in nature and thus tend
to have greater motivational impact and are associated with increased stress. Given their role in decision-
making, formal assessments should be held to higher standards of reliability and validity than informal
assessments.
Final (or terminal) assessment is that which takes place only at the end of a learning activity. It is most
appropriate when learning can only be assessed as a complete whole rather than as constituent parts.
Typically, final assessment is used for summative decision-making. Obviously, due to its timing, final
assessment cannot be used for formative purposes.
Process assessment focuses on the steps or procedures underlying a particular ability or task, i.e., the
cognitive steps in performing a mathematical operation or the procedure involved in analyzing a blood
sample. Because it provides more detailed information, process assessment is most useful when a student is
learning a new skill and for providing formative feedback to assist in improving performance.
Product assessment focuses on evaluating the result or outcome of a process. Using the above examples, we
would focus on the answer to the math computation or the accuracy of the blood test results. Product
assessment is most appropriate for documenting proficiency or competency in a given skill, i.e., for
summative purposes. In general, product assessments are easier to create than process assessments,
requiring only a specification of the attributes of the final product.
Divergent vs. Convergent Assessment
Divergent assessments are those for which a range of answers or solutions might be considered correct.
Examples include essay tests, and solutions to the typical types of indeterminate problems posed in PBL.
Divergent assessments tend to be more authentic and most appropriate in evaluating higher cognitive skills.
However, these types of assessment are often time consuming to evaluate and the resulting judgments often
exhibit poor reliability.
A convergent assessment has only one correct response (per item). Objective test items are the best example
and demonstrate the value of this approach in assessing knowledge. Obviously, convergent assessments are
easier to evaluate or score than divergent assessments.
Unfortunately, this “ease of use” often leads to the widespread application of this approach even when it is
contrary to good assessment practices.
Comparison between Assessment and Evaluation
From: Apple, D.K. & Krumsieg, K. (1998). Process education teaching institute handbook. Pacific Crest.
Roles of Assessment and Evaluation
1. Identify problem areas in student achievement so as to strengthen the weak points in the
learning process.
2. Reward good performance
3. Evaluate student’s progress at school and recommend ways to improve student learning
4. Monitor the knowledge, skills and attitudes acquired by students.
5. Evaluate the progress of schools in achieving curriculum objectives
6. Compare levels of achievement among several schools or education districts.
7. Identify curriculum areas that may need study or revision.
8. Provide information to teachers about the effectiveness of the teaching method.
Review Questions
1. What is a test?
2. Explain what is meant by measurement in education
3. Distinguish between measurement, assessment and evaluation
4. Differentiate between assessment and evaluation
5. Why are assessment and evaluation important?
6. Describe the importance of measurement and evaluation in education
7. Analyze the 5 different approaches used in educational assessment and evaluation
LESSON TWO
INSTRUCTIONAL OBJECTIVES
An objective must specify the Audience, Behaviour, Condition and Degree (the ABCD of objectives). In
addition, objectives must:
must:
a) Describe what the learner will be doing when demonstrating that he/she has reached the
objective, i.e., what should the learner be able to do? (Performance)
b) Describe the important conditions under which the learner will demonstrate his/her
competence, i.e., under what conditions do you want the learner to be able to do it?
(Conditions)
c) Indicate how the learner will be evaluated, or what constitutes acceptable performance, i.e.,
how well must it be done? (Criterion)
Objectives are statements of intent specifying the behavior expected of learners after they have been
subjected to some learning experience. Educational objectives play an important role in educational
assessment. Well written objectives:
Clearly state the behavior that pupils should demonstrate at the end of a period of instruction.
The measurement of what education has achieved may be useful in determining what
education should achieve.
Provide guidance and direction to teaching and testing.
Specify in precise terms the student behavior to be measured; hence they help
teachers to identify what pupils should learn and indicate to the teacher what questions to
ask.
Help the teacher in selecting relevant material and act as a standard for measuring
learning outcomes.
Indicate samples of learning outcomes that the test developer is willing to accept as
evidence that the stated instructional objectives have been achieved.
Give direction to education.
INSTRUCTIONAL OBJECTIVES
Instructional objectives are statements of what is to be achieved at the end of the instructional
process. They are, therefore, the subject of assessment and evaluation. This chapter discusses the
importance of instructional objectives, and their formulation.
Importance of instructional/learning objectives
Instructional/learning objectives communicate to the learners, instructors and other
interested people what the learners should be able to do at the end of the lesson.
Instructional/learning objectives help learners organize their study and avoid getting lost
(if learners are informed of them).
Instructional/learning objectives help the teacher plan learning activities with focus.
Instructional/learning objectives enable the teacher to select the most appropriate teaching
approaches.
Well written instructional/learning objectives help save time in developing the lesson.
Instructional/learning objectives form the basis for the development of the instruction by
limiting the content.
Instructional/learning objectives enable the teacher to identify appropriate teaching and learning
resources.
Instructional/learning objectives form the basis for lesson evaluation.
Specify application criteria: identify any desired levels of speed, accuracy, quality, quantity, etc.
For example: Given a calculator, calculate averages from a list of numbers correctly, all the
time. OR:
Given a spreadsheet package, compute variances from a list of numbers rounded to the second
decimal point.
Review each learning outcome to be sure it is complete, clear and concise
Categories of Objectives
Bloom et al. (1956), Krathwohl, Bloom and Masia (1964) and Harrow (1972) categorize components of
learning into what has now become known as the taxonomy of educational objectives.
The categories are:
(a) The cognitive domain
(b) The affective domain
(c) The psychomotor domain
Their main contribution was to emphasize that all three domains of learning exist and that,
most importantly, the respective domains of learning should be addressed consciously by
educationists in general and by classroom teachers in particular. Beyond identifying the
three domains of the taxonomy, they sub-categorized each domain.
The Cognitive Domain
The cognitive domain of educational objectives deals principally with the development of the
learner’s mental abilities or achievements, which include intellectual aptitudes. The taxonomy of
educational objectives identifies six categories of cognitive objectives: knowledge,
comprehension, application, analysis, synthesis and evaluation. Each category is assumed to
include behaviors at the lower levels. Each of these categories is discussed below.
1. Knowledge:
It is defined as “remembering of previously learned material.” At this level students are
expected simply to recall the information previously presented to them. The following
are statements of objectives at the knowledge level.
At the end of the lesson, the student should be able to:
a Define the term “photosynthesis”
b List four conditions necessary for photosynthesis.
From the two examples it is evident that knowledge objectives deal with behavior that is
mere rote learning.
2. Comprehension
It is defined as “the ability to grasp the meaning of material.” The following are
examples of objectives at the comprehension level.
At the end of the lesson the student should be able to:
a Describe the cause of natural changes in landforms.
b Write a two-paragraph summary of the themes of Imbuga’s “Betrayal in the City”.
c Explain what fresh air is.
3. Application
It is defined as “the ability to use learned material in new and concrete situations.”
Examples of objectives at this level are:
At the end of the lesson the student should be able to:
a Explain how we can control road accidents.
b Explain how the chief’s baraza operates.
c Compute the area of a rectangle measuring 60cm by 45 cm, correctly.
4. Analysis
It is defined as the “ability to break down material into its component parts so that its
organizational structure may be understood.” The following are examples of objectives
at the analysis level.
At the end of the lesson, the student should be able to:
a State the causes of the Mau Mau in Kenya
b Explain the advantages and disadvantages of liberalization of the economies of
Eastern African countries.
c Classify people who need help.
d Draw a time-line showing the development of science and technology in the 20th
century.
5. Synthesis
It is defined as the “ability to put parts together to form a new whole.” The following are
some objectives at the synthesis level.
At the end of the lesson, the student should be able to:
a Conduct a survey of costs of different types of transport.
b Compose a song for use during the national literacy day.
c Briefly describe the factors she would consider in searching for a new water supply
for Nairobi.
6. Evaluation
It is defined as the “ability to judge the value of material for a given purpose.” Evaluation
is the highest level of the cognitive domain. The following are examples of objectives at
the evaluation level.
At the end of the lesson the student should be able to:
a Compare and contrast the activities of non-governmental organizations and local
government authorities in the provision of non-formal education in Kisumu district.
b Justify the effects of traffic police on the reduction of road accidents.
c With specific examples, justify the teaching of mathematics.
d Give reasons for and against the importation of sugar.
The following are action verbs for different levels of the cognitive domain.

1. Knowledge    2. Comprehension    3. Application
define          translate           use
repeat          discuss             illustrate
name            describe            interpret
memorize        identify            dramatize
arrange         locate              employ
recall          review              practice
order           explain             operate
recognize       classify            solve
relate          recognize

4. Analysis     5. Synthesis        6. Evaluation
differentiate   organize            measure
test            prepare             prove
relate          create              revise
analyze         summarize           evaluate
appraise        design              appraise
calculate       arrange             value
criticize       manage              revise
distinguish     collect             score
inspect         formulate           select
experiment      set up              argue
examine         write               assess
                                    choose
The Affective Domain
The affective domain is concerned with the development of, or change of values, attitudes,
interests and appreciations. Most educators feel that it is the responsibility of the school to
develop positive values and attitudes. Hence the need for teachers to state affective objectives.
According to the taxonomy, the affective domain consists of five categories according to the
degree of internalization i.e. receiving, responding, valuing, organization and characterization.
Each of these categories is briefly described below:
Receiving. This is the lowest level of the affective domain. It is defined as sensitivity to the
existence of certain phenomena, that is, willingness to receive or attend to them. The following
are examples of objectives at receiving level.
1. The student develops a tolerance for African music.
2. The student patiently listens to a lecture on the dangers of drug abuse.
Responding. This refers to active attending to the phenomena. At this level, learners act out
behaviors that are consistent with people who hold a particular value. (Lorber and Pierce, 1983).
Student responses at this level indicate more than passive listening/attending; they require active
participation. More complete responding would be indicated by a student’s willingness to
engage in an activity, even when allowed a choice. Examples of objectives at the responding
level are:
1. The student indicates interest in protecting the environment by voluntarily reading
magazines designed for people involved in environmental protection.
2. The student demonstrates a commitment to honesty by not cheating.
Valuing. This refers to the worth an individual attaches to an object, phenomenon, or behavior.
It implies perceiving something as having worth and consequently showing consistency in behavior related
to the object or phenomenon. Below are some examples of objectives stated at this level.
1. The student indicates her commitment to political reforms by writing letters to the press
on the need for political reforms.
2. The student shows commitment to Christianity by becoming a member of one of the
Catholic Organizations in the school.
Organization. Organization is defined as the conceptualization of values and the employment
of these concepts for determining the inter-relationships among values. As ideas are internalized
they become increasingly interrelated and prioritized, i.e. they become organized into a value
system. This requires that the student conceptualizes a value by analyzing interrelationships and
drawing generalizations that reflect the valued idea. In school, students should be offered
learning opportunities that help them organize their values. Some examples of these learning
opportunities are simulation games, project work and case studies. The following are examples
of objectives at the organization level.
1. The student should form judgements as to whether population and family life education
should be taught.
2. The student should balance his/her argument for and against polygamy.
Characterization. This is the fifth level of the affective domain. At this level, the individual
develops a consistent value system, which becomes part and parcel of his or her life style. Such
individuals would never say, “do as I say, not as I do.” Here are two examples of objectives
stated at this level.
1. The learner should develop a consistent philosophy of life.
2. The student should demonstrate the value of honesty by consistently acting honestly in her
dealings with fellow students.
Behavioral Terms for Objectives in the Major Categories of the Affective Domain
Table 3.1 shows the behavioral terms for objectives in the major categories of the Affective
Domain.
Level of Internalization: Receiving
Behavioral terms: takes note of value concepts; ask, choose, give, identify, select, use, point to.
Questionnaire items: Would you be presently interested in joining a club that discusses the
Bible? Would you like to know more about Jomo Kenyatta?

Level of Internalization: Responding
Behavioral terms: reads willingly, follows instructions, volunteers to help, applauds
performances; answers, assists, complies, gives, practices.
Questionnaire item: Is it usually possible for you to go to Church?

Level of Internalization: Valuing
Behavioral terms: helps protect, campaigns actively, supports community organizations;
describe, differentiate, follow, invite, justify, initiate, select.
Questionnaire item: Do you support elderly people?

Level of Internalization: Organization
Behavioral terms: compares codes of conduct, defines limits of behavior; alter, arrange,
combine, defend, prepare.
Questionnaire item: Have any of the books you read markedly influenced your views about a
one-party system of government?

Level of Internalization: Characterization
Behavioral terms: changes behavior in light of value re-organization, consistently demonstrates
humanitarianism as rated by peers; act, display, perform, revise, unify.
Source: Lorber and Pierce, 1983.
The Psychomotor Domain
The psychomotor domain is concerned with the development of motor and manipulative skills.
It is particularly important in subjects such as physical education, home science, typing and
technical subjects. The psychomotor domain is divided into six categories in ascending order of
complexity and sophistication. These are:
1. Reflex movements
2. Basic Fundamental movements
3. Perceptual abilities
4. Physical abilities
5. Skilled movements
6. Non-discursive communication.
Reflex Movements. These are involuntary actions elicited as a response to some stimulus and
are not a concern of educators. These movements are either evident at birth or develop with
maturation.
Basic Fundamental Movements. These skills are developed during the first year of life.
Examples of these skills are crawling, standing up and manipulating objects.
Perceptual Abilities. This is the third level of the psychomotor domain. Perceptual abilities are
not observed but depend on cognitive tasks that students are required to perform. Examples of
objectives at the perceptual ability level are:
1. The student should walk the full distance and back across a balance beam without falling.
2. The student should list the names of four musicians playing jazz from an audio recording
of music.
Physical Abilities. The physical abilities include endurance, strength, flexibility and agility.
Below are examples of objectives stated in this level. The student should be able to:
1. Run a hundred-metre distance in less than ten seconds (endurance).
2. Correctly execute fifty-four push-ups continuously (strength).
Skilled Movements. They are a result of learning, often complex learning. They result in
efficiency in carrying out a complex movement task. These include simple adaptive skills such
as dancing and typing; compound adaptive skills such as playing tennis, hockey and golf; and
complex adaptive skills such as aerial gymnastic stunts. The following are examples of
objectives in this category:
1. The student should type at a rate of fifty words per minute with no more than six errors.
2. The student should execute two twisting dives with 100 per cent accuracy.
Non-discursive Communication. This category involves non-verbal communications that are
used to convey a message to an observer. Examples of such movements are postures and facial
expressions. A typical objective in this category is: the student will exhibit appropriate gestures
and facial expressions.
Behavioral Terms for Objectives in the Major Categories of the Psychomotor Domain
Table 3.2 shows the behavioral terms for objectives in the major categories of the Psychomotor Domain.
Table 3.2: Behavioral Terms for Objectives in the Major Categories of the Psychomotor
Domain.
Category: Perceptual abilities
Behavioral terms: maintains balance, bounces ball, differentiates objects, selects by size, draws
geometric symbols, writes the alphabet, catches a thrown ball, identifies shapes consistently,
repeats poem, plays piano from memory.

Category: Physical abilities
Behavioral terms: runs 3 kilometers, executes push-ups, extends to toes.

Category: Skilled movement
Behavioral terms: plays the guitar, jumps hurdles, plays tennis, dances to music.

Category: Non-discursive communication
Behavioral terms: exhibits appropriate gestures and facial expressions, performs original dance.
Source: Adapted from Lorber and Pierce, 1983.
Note: It is important to note that setting objectives is very crucial in assessment of learning
outcomes.
Activities
1. For the same content area, construct two objectives each at the knowledge, comprehension,
application, analysis, synthesis and evaluation levels of the taxonomy of cognitive objectives.
2. Identify ways we measure aspects of achievement, intelligence and classroom conduct.
3. Make up examples in your subject area that illustrate mismatches between what is being
tested and how it is being tested.
Written Exercises
1. Distinguish between cognitive, affective and psychomotor domains.
2. Identify and explain
(a) Five levels of cognition
(b) Five categories of the affective domain
(c) Six categories of the psychomotor domain
LESSON THREE
TESTS IN EDUCATIONAL ASSESSMENT
Introduction
This unit discusses the importance of tests in educational assessment. It is followed by a
description of the steps in test construction. It further describes types of tests, their strengths and
their weaknesses.
Specific Objectives
After studying this unit, you should be able to:
1. Give reasons for testing in the classroom.
2. Describe the steps in test development.
3. Explain the various types of tests.
4. Construct various test items
TEST CONSTRUCTION
A test is a collection of items developed to measure human educational or psychological
attributes. It can also be used to make predictions.
Bean (1953) defines a test as an organized succession of stimuli designed to measure
quantitatively or to evaluate qualitatively some mental process, trait or characteristic. For
example the reading ability of a child may be measured with the help of a test specially designed
for the purpose. His/her reading ability score may be evaluated with respect to the average
performance of the reading ability of other children of his/her age or class.
Why Teacher-Made Tests?
It is important for teachers to know how to construct their own tests because of the following:
Teacher-made tests can be closely related to a teacher's particular objectives and pupils,
since he/she knows the needs, strengths and weaknesses of his/her students.
The teacher can tailor the test to fit her/his particular objectives, fit a class, or even fit
individual pupils.
Classroom tests may be used by the teacher to help him/her develop more efficient
teaching strategies, e.g. a teacher may develop his/her own tests, administer them to the students as
pre-tests and then:
(a) Re-teach some of the information assumed known by the students.
(b) Omit some of the material planned to be taught because the students already know it.
(c) Provide some of the students with remedial instruction while giving other students some
enriching experience.
They are used for diagnosis where the teacher diagnoses the pupil’s strengths and
weaknesses.
Types of Test
Achievement Tests
Achievement refers to what a person has acquired or achieved after the specific training or
instruction has been imparted. Hence achievement tests are designed to measure the effects of a
specific program of instruction or training i.e. the extent to which students have learned the
intended curriculum. Examples of achievement tests are the Kenya Certificate of Secondary
Education (KCSE) and the Kenya Certificate of Primary Education (KCPE) examinations.
Aptitude Test
Tuckman (1975) defines aptitude as “a combination of abilities and other characteristics, whether
native or acquired, known or believed to be indicative of an individual's ability to acquire skill or
knowledge in a particular area.” On the basis of such abilities, future performance of a child can
be predicted. Aptitude tests are tests or examinations that measure the degree to which students
have the capacity to acquire knowledge, skills and attitudes. The primary purpose of an aptitude
test is to predict what a person can learn; aptitude tests are thus future oriented.
Criterion-referenced Test
Criterion-referenced tests are tests that measure the extent to which prescribed standards have
been met. They examine students’ mastery of educational objectives and are used to determine
whether a student has learned specific knowledge or skills. Criterion-referenced assessment asks
the question: can student X do Z?
Norm Referenced Tests
These are tests which compare a student’s performance on the test with that of other students in
his/her cohort. For example, scores on a test given to Form II students can be compared to the
scores of other students in Form II. Unlike criterion-referenced tests, norm referenced tests are
not concerned with determining how proficient a student is in a particular subject or skill. The
main problem with making normative comparisons is that they do not indicate what students
know or do not know.
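The two kinds of interpretation can be made concrete with a small sketch (Python; the cut-off score, the pupil's mark and the cohort marks below are all invented):

# Criterion-referenced question: has the pupil met the prescribed standard?
cut_off = 80                # hypothetical mastery criterion
pupil_score = 74
print("Mastered" if pupil_score >= cut_off else "Not yet mastered")

# Norm-referenced question: how does the pupil compare with the cohort?
cohort = [55, 60, 62, 68, 70, 74, 78, 81, 85, 90]   # hypothetical Form II scores
below = sum(1 for s in cohort if s < pupil_score)
print(f"Scored above {100 * below / len(cohort):.0f}% of the cohort")  # 50%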
To evaluate your teaching. Students’ performance on the exam will pinpoint
areas where you should spend more time or change your
current approach.
To provide statistics for the course or institution. Institutions often
want information on how students are doing. How many are
passing and failing, and what is the average achievement in class?
Exams can provide this information.
To accredit qualified students. Certain professions demand that
students demonstrate the acquisition of certain skills or knowledge.
An exam can provide such proof – for example, ensuring standards of
progression are met
To find out students’ progress
To diagnose students’ difficulties
To report students’ progress to stakeholders
To motivate students
To compare performance between classes
To select students with a certain aptitude
To predict performance
To measure growth over time
To rank pupils in terms of their achievement of particular instructional
objectives
To evaluate the teacher’s instructional method
To ascertain the effectiveness of the curriculum
To encourage good study habits
The teacher should not hope that because a test can serve many masters it will automatically
serve his/her intended purpose. The teacher must plan for this in advance.
b) What is to be tested?
The next major question the teacher needs to ask himself or herself is what knowledge,
skills and attitudes do I want to measure? Should I test for factual knowledge or should I
test the extent to which my students are able to apply their factual knowledge?
c) Decide on the nature of the content or items to be included
Validity. Make sure your questions address what you want to evaluate.
Realistic expectations. Your exam should contain questions that
match the average student’s ability level. It should also be possible
to respond to all questions in the time allowed. To check the exam,
ask a teaching assistant to take the test; if they can't complete it in
well under the time permitted, then the exam needs to be revised.
Multiple question types. Different students are better at different
types of questions. In order to allow all students to demonstrate their
abilities, exams should include a variety of types of questions.
Offer multiple ways to obtain full marks. Exams can be highly
stressful and artificial ways to demonstrate knowledge. In recognition of
this, you may want to provide questions that allow multiple ways to
obtain full marks. For example, ask students to list five of the seven
benefits of multiple-choice questions.
Free of bias. Your students will differ in many ways including
language proficiency, socio-economic background, physical
disabilities, etc. When constructing an exam, you should keep student
differences in mind to watch for ways that the exams could create
obstacles for some students. For example, the use of colloquial
language could create difficulties for students for whom English is a
second language, and examples easily understood by North American
students may be inaccessible to international students.
Redeemable. An exam should not be the sole opportunity to obtain
marks. There should be other opportunities as well. Assignments and
midterms allow students to practice answering your types of questions
and adapt to your expectations.
Demanding. An exam that is too easy does not test students’
understanding of the material.
Transparent marking criteria. Students should know what is expected
of them. They should be able to identify the characteristics of a
satisfactory answer and understand the relative importance of those
characteristics. This can be achieved in many ways; you can provide
feedback on assignments, describe your expectations in class, or post
model solutions on a course website.
Timely. Spread exams out over the semester. Giving two exams one
week apart doesn’t give students adequate time to receive and respond to
the feedback provided by the first exam. When possible, plan the exams
to fit logically within the flow of the course material. It might be helpful
to place tests at the end of important learning units rather than simply
give a midterm halfway through the semester.
Knowledge - Verbs to use in the statement of instructional objectives at this level
include: define, describe, enumerate, identify, label, list, match, name, read, select,
reproduce and state.
Comprehension- Learners should be able to classify, cite, convert, describe,
discuss, explain, give examples, paraphrase, restate in own words, summarize,
understand, distinguish and rewrite.
Application- apply, change, compute, modify, predict, prepare, relate, solve,
show, use and produce.
Analysis - At this level, learners should be able to break down, correlate,
discriminate, differentiate, distinguish, focus, illustrate, infer, limit, outline, point
out, prioritize, recognize, separate, subdivide, select and compare.
Synthesis - Learners at this level can put parts together to form a whole that has a
new meaning or structure. Key words in use for this level of learning include
categorize, combine, compose, create and design.
Evaluation - At this level, learning includes appraising, comparing and
contrasting, defending, judging, interpreting, justifying, discriminating and
evaluating.
The purpose is to coordinate the assessment questions with the time spent on any
particular content area, the objectives of the unit being taught, and the level of critical
thinking required by the objectives or state standards.
The teacher should know in advance specifically what is being assessed as well as the level of
critical thinking required of the students. Tables of Specifications are created as part of the
preparation for the unit, not as an afterthought the night before the test. Knowing what is
contained in the assessment and that the content matches the standards and benchmarks in level
of critical thinking will guide learning experiences presented to students. Students appreciate
knowing what is being assessed and what level mastery is required.
Any question on an assessment should require students to do three things: first, access
information on the topic of the question. Second, use that knowledge to complete critical
thinking about the information. Third, determine the best answer to the question asked on the
assessment. A Table of Specifications is a two-way chart which describes the topics to be
covered in a test and the number of items or points which will be associated with each topic.
Sometimes the types of items are described as well. The purpose of a Table of Specifications is
to identify the achievement domains being measured and to ensure that a fair and representative
sample of questions appear on the test.
As it is impossible, in a test, to assess every topic from every aspect, a Table of Specifications
allows us to ensure that our test focuses on the most important areas and weights different areas
based on their importance / time spent teaching. A Table of Specifications also gives us the proof
we need to make sure our test has content validity. Tables of Specifications are designed based
on:
course objectives
topics covered in class
amount of time spent on those topics
textbook chapter topics emphasis and
space provided in the text
Steps in designing a table of specifications:
1. Identify the domain (content areas) to be covered
2. Break the domain into levels (e.g. knowledge, comprehension, application …)
3. Construct the table
The more detailed a table of specifications is, the easier it is to construct the test.
A table of specification has two dimensions. The first dimension represents the different abilities
that the teacher wants the pupil to display and the second represents the specific content and
skills to be measured.
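As a rough illustration of how the two dimensions combine, the sketch below (Python) allocates a fixed number of items to topics in proportion to lessons taught, then spreads each topic's items across cognitive levels. It is in the spirit of the 15-item C.R.E test referred to below, but the topic names, lesson counts and level weights are invented, and rounding may leave the total an item or two off, to be adjusted by hand:

# Sketch: allocate test items in proportion to teaching time, then by level.
topics = {"Topic A": 6, "Topic B": 3, "Topic C": 3}   # lessons per topic (hypothetical)
levels = {"Knowledge": 0.4, "Comprehension": 0.4, "Application": 0.2}
total_items = 15

total_lessons = sum(topics.values())
for topic, lessons in topics.items():
    topic_items = round(total_items * lessons / total_lessons)   # items for this topic
    row = {level: round(topic_items * w) for level, w in levels.items()}
    print(topic, row)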
A Table of Specification for a 15 Item C.R.E Test for Form 3
Moderation of the test: This means examining the items set by a group of teachers in the same
subject area. Questions that are ambiguous are identified and feedback is given about the test
prepared. When moderating, attention should be paid to the following:
Ensure all the questions are within the syllabus
Ensure that the test has face validity, i.e. by looking at the questions one can tell
whether the items are clear and not ambiguous.
Ensure items are free of bias
Check that appropriate action verbs are used.
Check whether the questions can be answered in the time given
Check whether the items sample the syllabus (content validity)
Check whether the marks in the question paper and marking scheme tally.
3. Administering of the Test
During administration you need to:
Inform the students in advance about the test and the time that the test will take
place.
Prepare a register of attendance.
Write the start and end time of the test.
Ensure there is enough room and there is enough space in the sitting
arrangement.
Collect question papers in time from the custodian to be able to start the test at
the stipulated time.
Ensure compliance with the stipulated sitting arrangements in the test to
prevent collusion between and among the testees.
Ensure orderly and proper distribution of question papers to the testees
Make any corrections before the test starts
Do not talk unnecessarily before the test. Testees’ time should not be wasted at
the beginning of the test with unnecessary remarks, instructions or threats that
may create test anxiety.
It’s necessary to remind the testees of the need to avoid malpractices before
they start and make it clear that cheating will be penalized
Stick to the instructions regarding the conduct of the test and avoid giving hints
to testees who ask about particular items, but make corrections or clarifications
to the testees wherever necessary.
Keep interruptions during the test to a minimum
6. Interpret the scores; statistical tools such as the mean, mode and standard deviation may be
used, as in the sketch below.
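A minimal sketch of such an interpretation in Python (the scores are invented):

# Computing the mean, mode and standard deviation of a set of test scores.
import statistics

scores = [45, 52, 52, 60, 63, 70, 70, 70, 78, 85]   # hypothetical marks out of 100

print("Mean:", statistics.mean(scores))                            # 64.5
print("Mode:", statistics.mode(scores))                            # 70
print("Standard deviation:", round(statistics.stdev(scores), 1))   # ~12.5 (sample SD)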
Summarize …
Prepare a plan …
Organize …
Describe reasons for selection of …
Argue for and against …
Evaluation
Make an ethical judgement …
Justify …
Critically assess this statement …
N.B.: The table of specification can aid immensely in the preparation of test items, in the
production of a valid and well balanced test, and in the clarification of objectives to both teacher
and students. The table is only a guide; it is not designed to be adhered to strictly.
Relating test items to the instructional objectives: It is important to obtain a match between a
test’s items and the test’s instructional objectives, which is not guaranteed by the table of
specification. The table of specification only indicates the number or proportion of test items to
be allocated to each of the instructional objectives specified.
Example 1
Objective:
The student will be able to differentiate between assessment and measurement.
Test Item
Distinguish between assessment and measurement.
Example 2
Objective
Students can perform mouth-to-mouth resuscitation on a drowning victim.
Test Item:
Describe the correct procedure for administering mouth-to-mouth resuscitation on a drowning
victim.
In the first example there is a match between the learning outcome and the test item. In the second
there is no match, because describing something is not valid evidence that the person can do
it.
Writing down the Items
After preparing a table of specification, you should construct test items. There are two types of
test items: the free response type and the choice type.
Free Response (Essay Questions)
They require students to provide answers. They are of two types
a) Restricted type
b) Extended type
The restricted type is one that requires the examinee to supply the answer in one or two lines and
is concerned with one central concept (Marshall & Hales, 1972).
An extended essay is one where the examinee’s answer comprises several sentences and is
usually concerned with more than one central concept.
Examples
Restricted Type
Describe the meaning of reliability of an educational test.
Extended Type
Describe five characteristics of a good test
Advantages of Restricted Type of Question
They cover a wide content
They are easy to construct
They are easy to mark
Disadvantage
They do not test pupils’ ability to organize or criticize.
Advantages of Extended Type
Easy to set
Improves students’ writing skills
Helps develop the student’s ability to organize and express his/her ideas in a logical and
coherent manner.
Allows for creativity since examinee is required to give a coherent and organized
answer rather than recognize the answer.
Leaves no room for guess work
Tests student’s ability to supply rather than select the correct answer
Disadvantages
Inadequate sampling of the syllabus
It is disadvantageous to pupils with difficulties in expressing themselves
Marking is highly unreliable and varies from scorer to scorer, and sometimes within
the same scorer when asked to evaluate the answer at different time intervals.
Scoring takes a longer time because of the length of the answer.
Open to subjectivity, e.g. the halo effect; what influences the marks may not be what was being
tested, e.g. handwriting or language.
Poor Item: Discuss the struggle for independence in Kenya.
Better Item: Describe five methods used in the struggle for independence in Kenya.
Suggestions for Grading Essay Items
1. Check your marking scheme against actual responses. Before actually beginning to mark
the exam papers, it is recommended that a few papers are selected at random to ascertain
the appropriateness of the marking scheme. This also helps in the updating of the marking
scheme.
2. Be consistent in your grading. Graders are human and may be influenced by the first few
papers they read and thereby grade them either too leniently or too harshly depending on
their initial mind-set (Hales & Tokar, 1975). For this reason it is important that once
grading has started teachers should occasionally refer to the first few papers graded to
satisfy themselves that standards are being applied consistently. This may be especially true for
those papers read near the end of the day when the reader might be physically and
mentally tired.
3. Randomly shuffle the papers before grading them. Research shows that a student’s essay
grade will be influenced by the position of his/her paper especially if the preceding
answers were either very good or very poor.
It is hence recommended that the examiner shuffles the papers prior to grading to reduce the
bias introduced. This is especially significant if the teacher is working with high- and low-
level classes and reads the best papers first or last.
4. Grade only one question at a time for all papers. To reduce the “halo” effect it is
recommended that teachers grade one question at a time rather than one paper (containing
several responses) at a time. This also makes it possible for the examiner to concentrate
and become thoroughly familiar with one set of scoring criteria and not be distracted by
moving from one question to another.
5. Try to grade all responses to a particular question without interruption. One source of
unreliability is that the grader’s standards may vary markedly from one day to the next
and even from morning to afternoon of the same day. If a lengthy break is taken the
reader should re-read some of the first few papers to re-familiarize him/herself with
his/her grading standards so that she/he will not change them mid-stream.
6. The mechanics of expression should be judged separately from the content. For those
teachers who feel that the mechanics of expression are very important, it is recommended
that they assign a proportion of the question’s value to such factors as legibility, spelling,
handwriting, punctuation, and grammar.
The proportion assigned to these factors should be spelt out in the grading criteria and the
students should be informed in advance.
7. Provide comments and correct errors. Although it is time consuming to write comments
and correct errors, it should be done if we are to help the student improve.
This also helps the teacher in explaining his/her method of assigning a particular grade.
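Suggestions 3 to 5 above can be combined into a simple marking routine, sketched below in Python (the scripts and questions are hypothetical placeholders):

# Sketch: shuffle the scripts, then mark one question at a time across all scripts.
import random

scripts = [f"script_{i:02d}" for i in range(1, 11)]   # hypothetical exam papers
questions = ["Q1", "Q2", "Q3"]

for question in questions:        # one question at a time reduces the halo effect
    random.shuffle(scripts)       # a fresh order each time reduces position bias
    for script in scripts:
        pass  # mark this question on this script against the marking scheme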
Oral Questions
The oral question is a variation of the essay question. It is well suited for testing students who are unable to
write because of physical handicaps, and for testing spoken languages.
Advantages
1. They permit the examiner to determine how well the student can synthesize and organize
his ideas and express himself.
2. They require the pupil to know and be able to supply the answer.
3. They allow students to demonstrate their oral competence in mastery of a language.
4. They permit free response by the student.
5. Permits detailed probing by the examiner
6. Students may ask for clarification.
Limitations
1. They provide for a limited sampling of content
2. They have low rater reliability.
3. They are time consuming (only one student can be tested at a time).
4. They do not permit or provide for any record of the examinee’s response to be used for future
action by the teacher and pupil, unless the examination process is recorded.
Choice Items
They require a controlled response from the candidate and are at times referred to as objective-
type items. Examples include:
Multiple choice
True/false
Matching
Completion
Multiple Choice Item
It consists of two parts: (1) the stem, which contains the problem, and
(2) a list of suggested answers (responses or options).
The incorrect responses are often called distracters.
The correct response is called the key.
The stem may be stated as a direct question or an incomplete statement.
There are five variations of the multiple-choice item:
Correct answer type
Best answer
Incomplete statement
Multiple response
Negative variety
a) Correct Answer Type:
This is the simplest type of multiple choice item. The student is told to select the one
correct answer listed among several plausible but incorrect options.
Example
When a test item and the objective it is intended to measure match, the item:
a. Is called an objective item
b. Has content validity
c. Is too easy
d. Should be discarded
b) Best Answer Type:
The directions are similar to those of the single correct answer except the student is told
to select the best answer.
Example
Which of the following is the most important agent of curriculum implementation?
a. Teacher
b. Inspectorate
c. Curriculum development center
d. Parents Teachers Associations
c) Incomplete Statement
The stem is an incomplete statement rather than a question. It is best for lower levels.
For example: The first president of Uganda was ______________________
a. Kabaka Mutesa II
b. Obote
c. Museveni
d. Idi Amin
d) Multiple Response Type
The candidate is required to endorse more than one response.
Which of the following reasons explain why a teacher needs to prepare a lesson plan in
advance?
i. To enable him/her collect the necessary material in good time.
ii. To enable him/her focus on the questions that pupils are likely to ask.
iii. To keep a meaningful record of what has been taught to a given class.
iv. To visualize and organize a complete learning situation in advance.
a) i &ii
b) iii & iv
c) i, iii & iv
d) i, ii & iii
Negative Variety Item
Here all the responses are correct except one. Example:
Which of the following is not a good reason for organizing an educational visit?
a. Correlate several school subjects
b. Broaden students experiences beyond the classroom
c. Break the monotony of the class.
d. Arouse the students’ curiosity and develop an inquiring mind.
How to Construct Effective Multiple Choice Items
1. The stem should contain the central problem so that the student will have some idea as to
what is expected of him/her and some tentative answer in mind before he/she begins to read the
options.
Poor Stem: “A criterion-referenced test.” This example is poor because it does not ask a question
or set a task. It is essential that the intent of the item be stated clearly in the stem.
2. Avoid repetition of words in the options. The stem should be written so that the key words
are incorporated in the stem and will not have to be repeated in each option.
Poor: According to Engel’s law
a. Family expenditures for food increase in accordance with the size of the
family.
b. Family expenditures for food decrease as income increases.
c. Family expenditures for food require a smaller percentage of an increasing
income.
d. Family expenditures for food rise in proportion to income.
Better: According to Engel’s law, family expenditures for food:
a. Increase in accordance with the size of the family
b. Decrease as income increases
c. Require a smaller percentage of an increasing income
d. Rise in proportion to income
3. Avoid making the key consistently longer or shorter than the distracters.
4. Avoid giving irrelevant clues to the correct answer. The length of the answer may be one
clue; others may be of a grammatical nature, such as the use of "a" or "an" at the end of a
statement, or the use of a singular or plural subject and/or verb in the stem with just one or two
singular or plural options.
5. There should be only one key.
6. An item in the test should not reveal the answer to another item.
Example:
Item 1 : The “halo” effect is pronounced in essay tests. The best way to minimize its effects
is to:
a Provide optional questions
b “Aim” the student to the desired response
c Read all responses to one question before reading the responses to the other questions.
d Permit students to write essays at home.
Item 10: In what type of test is the "halo" effect most operative?
a Essay
b Matching
c True-false
d Short-Answer
The student can obtain the correct answer to item 10 from item 1.
7. The distracters should be plausible and homogeneous. The student should be forced to read
and consider all options. No distracter should be automatically eliminated by the student
because it is irrelevant or a stupid answer.
Advantages of Multiple Choice Tests
Easy to mark/score
Objective marking
Covers a wide content area
Can lend itself to machine scoring
Provides a level playing field for students who are strong in language and those who are weak in it
Limitations of Multiple Choice Tests
Susceptible to guess-work
Not fit for measuring arguments, opinions and creativity
Encourages rote learning
Difficult to set and takes a long time to prepare
Do not test ability to communicate or organize ideas
True – False Items
These are items expressed in the form of a declarative statement which is either entirely
true or entirely false.
Advantages
Easy to set and mark
More items can be answered
Samples a wide content area
Fair to both linguistically strong and weak candidates.
A sure way of detecting misconceptions in learners.
Disadvantages
Susceptible to guess work
Tests trivial facts
Does not test higher cognitive abilities i.e. analysis, evaluation etc.
Many statements are not absolutely true or false, but only relatively so
Susceptible to ambiguity and misinterpretation
Suggestions for Writing True-False Items
1 Construct statements that are definitely true or definitely false.
2 Keep true and false statements at approximately the same length and be sure that there
are approximately equal numbers of true and false items.
3 Keys should not fall in a pattern
4 The statement should include only a single issue.
Matching Items
The candidates are asked to match up items in two columns.
The matching exercise is well suited to those situations where one is interested in testing the
knowledge of terms, definitions, dates, events and other matters involving simple relationships.
Example:
For each definition below, select the most appropriate term from the set of terms that follows.
Mark your answer in the blank before each definition.
Definitions:
1. A professional judgement of the adequacy of test scores
2. Determination of the amount of some skill or trait
3. Specification of what a child must do to indicate mastery of a skill
4. A series of tasks or problems
5. Tests used to compare individuals

Terms:
1. Behavioral objective
2. Criterion-referenced test
3. Evaluation
4. Measurement
5. Norm-referenced test
6. Test
Advantages of Matching Exercises
Easy to mark and score
Easy to set
Provide a level playing field for candidates with strong or weak language skills
Disadvantages
Limited to measuring factual information
Limited to only parts of the content
Susceptible to guess work
Completion Items
Also referred to as supply items
The examinee is expected to complete or fill in the blank spaces so that the sentence is
complete.
Example
The longest river in Africa is _____________________
Guidelines for writing completion items:
The item should be clear and unambiguous. Word each item in specific terms with a clear
meaning so that the intended answer is the only one possible.
Note: It is important that you prepare a table of specification when constructing your tests. It
ensures balance and comprehensiveness in a test.
Guidelines When Making Marking Schemes
1. Marking guidelines are developed in the context of relevant syllabus outcomes and
content.
2. Marks are awarded for demonstrating achievement of aspects of the syllabus outcomes
addressed by the question.
3. Marking guidelines reflect the nature and intention of the question and will be expressed
in terms of the knowledge and skills demanded by the task.
4. Marking guidelines indicate the initial criteria that will be used to award marks.
5. Marking guidelines allow for less predictable and less defined responses, for example,
characteristics such as flair, originality and creativity, or the provision of alternative
solutions where appropriate.
6. Marking guidelines for extended responses use language that is consistent with the
outcomes and the band descriptions for the subject.
7. Marking guidelines are to incorporate the generic rubric provided in the examination
paper as well as aspects specifically related to the question.
9. Where a question is designed to test higher-order outcomes, the marking guidelines will
allow for differentiation between responses, with more marks being awarded for the
demonstration of higher-order outcomes.
10. Marking guidelines will indicate the quality of response required to gain a mark or a sub-
range of marks.
11. High achievement will not be defined solely in terms of the quantity of information
provided.
12. Optional questions within a paper will be marked using comparable marking criteria.
13. Marking guidelines for questions that can be answered using a range of contexts and/or
content will have a common marking guideline exemplified using appropriate contexts
and/or content.
Item analysis can be a powerful technique available to instructors for the guidance and
improvement of instruction. For this to be so, the items to be analyzed must be valid measures of
instructional objectives. Further, the items must be diagnostic, that is, knowledge of which
incorrect options students select must be a clue to the nature of the misunderstanding, and thus
prescriptive of appropriate remediation.
In addition, instructors who construct their own examinations may greatly improve the
effectiveness of test items and the validity of test scores if they select and rewrite their items on
the basis of item performance data.
Item Analysis Guidelines
Item analysis is a completely futile process unless the results help instructors improve their
classroom practices and item writers improve their tests. Let us suggest a number of points of
departure in the application of item analysis data.
1. Item analysis gives necessary but not sufficient information concerning the
appropriateness of an item as a measure of intended outcomes of instruction. An item
may perform beautifully with respect to item analysis statistics and yet be quite irrelevant
to the instruction whose results it was intended to measure. A most common error is to
teach for behavioral objectives such as analysis of data or situations, ability to discover
trends, ability to infer meaning, etc., and then to construct an objective test measuring
mainly recognition of facts. Clearly, the objectives of instruction must be kept in mind
when selecting test items.
2. An item must be of appropriate difficulty for the students to whom it is administered. If
possible, items should have indices of difficulty no less than 0.20 and no greater than 0.80. It
is desirable to have most items in the 0.30 to 0.50 range of difficulty. Very hard or very easy
items contribute little to the discriminating power of a test.
3. An item should discriminate between upper and lower groups. These groups are usually
based on total test score but they could be based on some other criterion such as grade-
point average, scores on other tests, etc. Sometimes an item will discriminate negatively,
that is, a larger proportion of the lower group than of the upper group selected the correct
option. This often means that the students in the upper group were misled by an
ambiguity that the students in the lower group, and the item writer, failed to discover.
Such an item should be revised or discarded.
4. All of the incorrect options, or distracters, should actually be distracting. Preferably, each
distracter should be selected by a greater proportion of the lower group than of the upper
group. If, in a five-option multiple-choice item, only one distracter is effective, the item
is, for all practical purposes, a two-option item. Existence of five options does not
automatically guarantee that the item will operate as a five-choice item.
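To make this concrete, here is a minimal illustrative Python sketch (the response data and the key are entirely hypothetical) for tallying which options the upper and lower groups chose on a single item:

```python
from collections import Counter

# Illustrative sketch: distracter analysis for one multiple-choice item.
# 'upper' and 'lower' hold the options chosen by the upper- and
# lower-scoring groups; the data and the key ("b") are hypothetical.
upper = ["b", "b", "a", "b", "b", "c", "b"]
lower = ["a", "c", "b", "d", "a", "c", "b"]
key = "b"

upper_counts = Counter(upper)
lower_counts = Counter(lower)

for option in ["a", "b", "c", "d"]:
    tag = "key" if option == key else "distracter"
    # A healthy distracter draws more lower-group than upper-group choices.
    print(f"{option} ({tag}): upper = {upper_counts[option]}, "
          f"lower = {lower_counts[option]}")
```

A distracter that attracts no one in either group is, for practical purposes, dead weight and should be rewritten.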
How well did my test distinguish among students according to how well they met my
learning goals?
Recall that each item on your test is intended to sample performance on a particular learning
outcome. The test as a whole is meant to estimate performance across the full domain of learning
outcomes targeted.
One way to assess how well your test is functioning for this purpose is to look at how well the
individual items do so. The basic idea is that a good item is one that good students get correct
more often than do poor students. An item analysis gets at the question of whether your test is
working by asking the same question of all individual items—how well does it discriminate? In
short, item analysis gives the teacher a way to exercise additional quality control over their tests.
Well-specified learning objectives and well-constructed items give teachers a head-start in that
process, but item analyses can give you feedback on how successful you actually were. Item
analyses can also help you diagnose why some items did not work especially well, and thus
suggest ways to improve them (for example, if you find distracters that attracted no one, try
developing better ones). The important test for an item’s discriminability is to compare it to the
maximum possible. How well did each item discriminate relative to the maximum possible for
an item of its particular difficulty level?
In addition to these and other qualitative procedures, a thorough item analysis also includes a
number of quantitative procedures. Specifically, three numerical indicators are often derived
during an item analysis: item difficulty, item discrimination, and distractor power statistics.
The item difficulty statistic is an appropriate choice for achievement or aptitude tests when the
items are scored dichotomously (i.e., correct vs. incorrect). Thus, it can be derived for true-false,
multiple-choice, and matching items, and even for essay items, where the instructor can convert
the range of possible point values into the categories “passing” and “failing.”
The item difficulty index, symbolized p, can be computed simply by dividing the number of test
takers who answered the item correctly by the total number of students who answered the item.
As a proportion, p can range between 0.00, obtained when no examinees answered the item
correctly, and 1.00, obtained when all examinees answered the item correctly. Notice that a test
item does not have just one p value. Not only may the p value vary with each class group that takes
the test, an instructor may gain insight by computing the item difficulty level for a number of
different subgroups within a class, such as those who did well on the exam overall and those who
performed more poorly.
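As an illustration of the computation just described, here is a minimal Python sketch; the response matrix is hypothetical, with each answer coded 1 (correct) or 0 (incorrect):

```python
# Illustrative sketch: computing item difficulty p for dichotomously
# scored items. The response data below are hypothetical.
# Each inner list is one student's answers: 1 = correct, 0 = incorrect.
responses = [
    [1, 0, 1, 1],
    [1, 1, 0, 1],
    [0, 0, 1, 1],
    [1, 0, 0, 1],
]

num_items = len(responses[0])
num_students = len(responses)

for item in range(num_items):
    correct = sum(student[item] for student in responses)
    p = correct / num_students  # proportion answering the item correctly
    print(f"Item {item + 1}: p = {p:.2f}")
```

Running the same computation separately for subgroups (for example, the top and bottom halves of the class) gives the per-group difficulty levels used later for the discrimination index.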
Although the computation of the item difficulty index p is quite straightforward, the
interpretation of this statistic is not. To illustrate, consider an item with a difficulty level of 0.20.
We do know that 20% of the examinees answered the item correctly, but we cannot be certain
why they did so. Does this item difficulty level mean that the item was challenging for all but the
best prepared of the examinees? Does it mean that the instructor failed in his or her attempt to
teach the concept assessed by the item? Does it mean that the students failed to learn the
material? Does it mean that the item was poorly written? To answer these questions, we must
rely on other item analysis procedures, both qualitative and quantitative ones.
Item discrimination analysis deals with the fact that often different test takers will answer a test
item in different ways. As such, it addresses questions of considerable interest to most faculty,
such as, “does the test item differentiate those who did well on the exam overall from those who
did not?” or “does the test item differentiate those who know the material from those who do
not?” In a more technical sense then, item discrimination analysis addresses the validity of the
items on a test, that is, the extent to which the items tap the attributes they were intended to
assess. As with item difficulty, item discrimination analysis involves a family of techniques.
Which one to use depends on the type of testing situation and the nature of the items. I’m going
to look at only one of those, the item discrimination index, symbolized D. The index parallels the
difficulty index in that it can be used whenever items can be scored dichotomously, as correct or
incorrect, and hence it is most appropriate for true-false, multiple-choice, and matching items,
and for those essay items which the instructor can score as “pass” or “fail.”
We test because we want to find out if students know the material, but all we learn for certain is
how they did on the exam we gave them. The item discrimination index tests the test in the hope
of keeping the correlation between knowledge and exam performance as close as it can be in an
admittedly imperfect system.
1. Divide the group of test takers into two groups, high scoring and low scoring. Ordinarily,
this is done by dividing the examinees into those scoring above and those scoring below
the median. (Alternatively, one could create groups made up of the top and bottom
quintiles or quartiles or even deciles.)
2. Compute the item difficulty levels separately for the upper (p_upper) and lower (p_lower)
scoring groups.
3. Subtract the two difficulty levels: D = p_upper − p_lower.
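The three steps can be sketched in Python as follows; the total scores and per-item responses are hypothetical:

```python
# Illustrative sketch of the item discrimination index D for one item.
# total_scores and item_correct are hypothetical: each position is one
# student; item_correct[i] is 1 if student i answered this item correctly.
total_scores = [34, 28, 41, 22, 37, 30, 25, 39]
item_correct = [1, 0, 1, 0, 1, 1, 0, 1]

# 1. Split students into high and low scorers at the median total score.
median = sorted(total_scores)[len(total_scores) // 2]
upper = [c for s, c in zip(total_scores, item_correct) if s >= median]
lower = [c for s, c in zip(total_scores, item_correct) if s < median]

# 2. Compute the item difficulty separately for each group.
p_upper = sum(upper) / len(upper)
p_lower = sum(lower) / len(lower)

# 3. D = p_upper - p_lower; values of 0.30 and above are usually
# regarded as good discriminators.
D = p_upper - p_lower
print(f"p_upper = {p_upper:.2f}, p_lower = {p_lower:.2f}, D = {D:.2f}")
```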
How is the item discrimination index interpreted? Unlike the item difficulty level p, the item
discrimination index can take on negative values and can range between -1.00 and 1.00.
Consider the following situation: suppose that overall, half of the examinees answered a
particular item correctly, and that all of the examinees who scored above the median on the exam
answered the item correctly and all of the examinees who scored below the median answered
incorrectly. In such a situation p_upper = 1.00 and p_lower = 0.00. As such, the value of the item
discrimination index D is 1.00 and the item is said to be a perfect positive discriminator. Many
would regard this outcome as ideal. It suggests that those who knew the material and were well-
prepared passed the item while all others failed it.
Though it’s not as unlikely as winning a million-dollar lottery, finding a perfect positive
discriminator on an exam is relatively rare. Most psychometricians would say that items yielding
positive discrimination index values of 0.30 and above are quite good discriminators and worthy
of retention for future exams.
Finally, notice that the difficulty and discrimination are not independent. If all the students in
both the upper and lower levels either pass or fail an item, there’s nothing in the data to indicate
whether the item itself was good or not. Indeed, the value of the item discrimination index will
be maximized when only half of the test takers overall answer an item correctly; that is, when p
= 0.50. Once again, the ideal situation is one in which the half who passed the item were students
who all did well on the exam overall.
Does this mean that it is never appropriate to retain items on an exam that are passed by all
examinees, or by none of the examinees? Not at all. There are many reasons to include at least
some such items. Very easy items can reflect the fact that some relatively straightforward
concepts were taught well and mastered by all students. Similarly, an instructor may choose to
include some very difficult items on an exam to challenge even the best-prepared students.
The median is the number that divides the scores into two equal groups, and the mode is the score
that occurs most frequently.
The shape of a distribution of your test scores can provide useful clues about your test and your
students’ performance. When representing students’ scores on a graph, the scores often will be
positively or negatively skewed. When the distribution is positively skewed, that implies that the
most frequent scores (the mode) and the median are below the mean. If your test is very difficult,
there may be many low scores and few high ones; the distribution of scores would then be
positively skewed, with its tail pointing toward the high scores.
When the tail points to the left, the distribution is negatively skewed. In this distribution there are
many high scores and relatively few low scores. Notice that the mean is pulled in the direction of
the skewed tail.
The mean can be distorted if there are some scores that are extremely different (outliers) from the
mean of the majority of scores for the group. Consequently, when outliers are present, the median
is the more descriptive measure of central tendency.
Indicators of Variability
Variability is the dispersion of the scores within a distribution. Given a test, a group of students
with a similar level of performance on a specific skill tend to have scores close to the mean.
Another group with varying levels of performance will have scores widely spread and further
from the mean. In other words, how varied are the scores? Two common measures of variability
are the range and standard deviation.
Range
The range, R, is the difference between the lowest and the highest scores in a distribution. The
range is easy to compute and interpret, but it only indicates the difference between the two
extreme scores in a set.
If we use the scores from Mr. Walker's class (below), we would calculate the range as: Range
(R) = the highest score − the lowest score in the distribution.
95 91 100 96 92 91 87 84 70 65 96 65 56 86 43 65 22 40 93
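Here the highest score is 100 and the lowest is 22, so R = 100 − 22 = 78.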
Standard Deviation
A more useful statistic than simply knowing the range of scores would be to see how widely
dispersed different scores are from the mean. The most common measure of variability is the
standard deviation (SD). The standard deviation is defined as the numeric index that describes
how far away from the mean the scores in the distribution are located. The formula for the
standard deviation is:
SD = √( Σ(x − x̄)² / N )
where x is each raw score, x̄ is the mean of the scores, and N is the number of scores.
The higher the standard deviation, the wider the distribution of the scores is around the mean.
This indicates a more heterogeneous or dissimilar spread of raw scores on a scale. A lower value
of the standard deviation indicates a narrower distribution (more similar or homogeneous) of the
raw scores around the mean.
Properties of the Standard deviation
The standard deviation is only used to measure spread or dispersion around the mean of a
data set.
Standard deviation is never negative.
Standard deviation is sensitive to outliers. A single outlier can raise the standard
deviation and in turn, distort the picture of spread.
For data with approximately the same mean, the greater the spread, the greater the
standard deviation.
If all values of a data set are the same, the standard deviation is zero (because each value
is equal to the mean).
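As an illustration, here is a short Python sketch that computes the range and the standard deviation for the score list shown above, using the population formula (dividing by N) to match the formula given earlier:

```python
import math

# Illustrative sketch: range and standard deviation for the score list
# above (Mr. Walker's class). Uses the population formula (divide by N).
scores = [95, 91, 100, 96, 92, 91, 87, 84, 70, 65, 96, 65, 56, 86,
          43, 65, 22, 40, 93]

score_range = max(scores) - min(scores)  # R = highest - lowest = 78
mean = sum(scores) / len(scores)
variance = sum((x - mean) ** 2 for x in scores) / len(scores)
sd = math.sqrt(variance)

print(f"Range = {score_range}")
print(f"Mean = {mean:.2f}, SD = {sd:.2f}")
```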
Activities
1. Construct five extended essay questions and five restricted essay questions in your
subject area
2. Prepare a table of specification for a 20-item multiple choice test.
Written Exercises
1. Distinguish between testing and assessment
2. Describe the main steps in the development of tests.
3. Outline the main considerations you would bear in mind in the construction of tests.
LESSON FOUR
This unit describes the procedures involved in the administration of tests and examinations. It is
followed by a discussion of the criteria for the awarding of grades for achievement tests.
Specific Objectives
After studying this unit, you should be able to:
1. Identify the main steps in the administration of tests.
2. Explain how emergency cases should be treated in an exam room
3. Distinguish between grading using the normal curve and grading on a standard scale.
Practical Procedures involved in the Administration of Tests and Examinations
1. The teacher should inform students several days before the test about the purpose of the test,
the areas to be covered and the type of questions contained in the test.
2. He/she should prepare a register of attendance for each test and make sure that students sign
it upon receiving and handing in their answer scripts.
3. The teacher should write on the chalkboard the starting time and the finishing time of the
test.
4. He/she should ensure that there is enough room between seats to reduce the chance of
students copying from one another.
5. The examination room should be well lit and ventilated, and should be quiet and free from
disruptions.
6. Collect question papers in good time from the custodian so that the test can start at the
stipulated time.
7. Ensure compliance with the stipulated sitting arrangements to prevent collusion between
and among the testees.
8. Ensure orderly and proper distribution of question papers to the testees.
9. Make any corrections before the test starts.
10. Do not talk unnecessarily before the test. Testees' time should not be wasted at the beginning
of the test with unnecessary remarks, instructions or threats that may create test anxiety.
11. It is necessary to remind the testees of the need to avoid malpractices before they start and to
make it clear that cheating will be penalized.
12. Stick to the instructions regarding the conduct of the test and avoid giving hints to testees
who ask about particular items, but make corrections or clarifications to the testees wherever
necessary.
13. Keep interruptions during the test to a minimum.
Good Conditions of Examination Room
Uniform conditions refer to the need for all candidates to be accorded similar situations in the
examination room. All candidates must be treated equally with supplies, time, comforts and
instructions.
Lighting is one of the most important needs in an examination room. It should be adequate for
everyone in the room. The supervising teacher must satisfy himself/herself of this before the
examination starts.
The teacher must also be satisfied that the seating arrangement is proper. Candidates should
preferably not sit at their own desks, for some may have coded facts somewhere on them. The
desks should be completely empty. Candidates should not be able to read the scripts of their fellow
candidates. This is minimized by ensuring that a neighboring candidate sits directly in front,
directly to the left and directly to the right. The entire desk arrangement should be in the form of
a square matrix.
Equally important is the issue of positioning examination candidates away from external noise,
and distracting human traffic.
Before candidates arrive, the supervising teacher should supply writing paper by placing one
sheet on each desk. This saves time, particularly where the examination is a laboratory practical
requiring, in addition, the supply of several pieces of equipment.
Procedure at the start of an examination
Examination candidates should be allowed to enter and be seated at their desks at least fifteen
minutes before the starting signal is due. This will enable them to have time to sort out their
basic writing items, and to write their names and numbers on the answer sheets, which should
already be on the candidates’ desks. It also helps in the candidates’ psychological adjustment to
an examination situation.
Before the starting signal is given, the supervising teacher should announce any necessary
instructions. These include the length of the examination, what a candidate should do if he/she
requires additional writing paper, whether the time remaining will be announced, whether a
candidate may leave before the time ends, whether and where there will be a stapler to bind the
answer sheets, and whether any papers may be taken out of the examination room.
A most important announcement concerns any correction, which should be made in the question
papers themselves. It is extremely disorientating to have such corrections announced during the
course of the examination. In any case, some candidates already may have attempted the
particular item. It would disturb them to have to repeat the item in the light of the correction.
When satisfied that all is well, the starting signal should be given. It is immediately after this that
the supervising teacher makes note of absent candidates, if any. This exercise helps in resolving
disputes that may arise later, for instance over lost scripts.
When the finishing signal is given, candidates should stop writing. They should be given a few
minutes to arrange their papers in order before the scripts are handed in.
Collecting examination scripts can present difficulties. A weak student can make sure he/she is
marked present, but fail to hand in his/her script, thereby putting the teacher on the defensive and
causing embarrassment. A fool-proof method of avoiding this problem calls for the supervising
teacher to position him/herself at the exit, and collect scripts as candidates leave the examination
room. Thereafter, the bundle of scripts should be well secured and stored, pending marking.
A second example results from sickness or another emergency which interrupts an
on-going test. If interruptions like this arise, the teacher should consider the likelihood of
consultation among the candidates during the period of interruption.
Absences are not normal emergencies as described in the preceding paragraphs. Here it is the
school policy that should be the guiding factor. For, should another paper be prepared for the
candidate who turns up a day later? In the absence of a large item bank, is it feasible to set a test
of the same depth as the test already sat? Equally important, should the teacher's consideration
for his students be limitless? And shouldn't candidates be expected to be proper managers of
their own affairs as part of normal training? It boils down to letting the candidate face the
consequences of his absence.
Securing Test Scripts
The consequences of lost scripts present a teacher with no mean embarrassment. The students
face the danger of being awarded bogus marks, to say the least. If it is an external annual
examination, the candidate may have to waste a full year of his life. The fact is that test scripts
should be locked safely away immediately. They should be closely bundled in
envelopes during any movement by the teacher or the marker. After marking, the same
precaution should prevail.
Marking Test Scripts
In zonal, regional and national examinations, it is necessary to use teams of markers. In order to
ensure consistency of marking, the markers are trained on a common set of scripts and then mark
scripts as a group so that
unexpected responses can be discussed and added to the marking scheme. Some scripts are re-
marked by the team leader in order to make a check on consistency.
A good test can be spoiled by poor marking. Indeed, marking counts towards test objectivity,
and therefore everything must be done to ensure adherence to the points explained below. To
start with, a marking scheme should be made. In fact, the marking scheme should be made as
the test items are being constructed. Most important, marking schemes are necessary even if
there is only the subject teacher to do the marking.
Test papers should be marked as soon as possible after the test has been administered. This
ensures that issues are still fresh in the teacher's mind. Once marking has started, the teacher
should proceed uninterrupted to the end of a whole set of test items. This suggestion is
particularly required of essay items where a teacher needs to maintain consistency in interpreting
scorable points.
Before the actual marking starts, a sufficiently conducive marking atmosphere needs to be
established. The marking atmosphere should be free of physical distractions such as noise and
human traffic. Equally, the teacher should be emotionally ready.
It is good practice to first read through a sample of scripts. This gives the teacher a general feel
of the task ahead. It has sometimes been found necessary to alter a marking scheme as a result of
such sampling.
Notes should be made of all common mistakes observed in test scripts, but on the actual scripts
the errors should not be corrected. Instead, specific but consistent symbols should be used to
identify points of error; such symbols are used only where a mark has been earned or failed to be
earned. In essays, a candidate's words do not have to match the words of the teacher's marking
scheme. Rather, it is the parity of ideas that the teacher should examine, to see whether they
match his own.
When marking, a teacher's personal image of the candidate should not be allowed to cloud the
marking of the student's script, nor should handwriting or flattering language influence the marks
awarded.
Recording of scores into the respective ledger should come last in the exercise. If possible, this
ledger should be stored separately from the scripts, for reasons of possible loss of one lot or the
other.
Awarding Grades
The awarding of letter grades A, B, C, D, E and F for achievement has long been a socio-cultural
practice, as have the implications of the respective grades, for example A for success and E (or F)
for failure in a given endeavor.
In the past, intuition, experience and tradition have been the basis of awarding the particular
letter grade. Later, it became necessary to assign marks to assist in guiding towards the grade
awards. Attaining a certain number of marks would earn a particular grade, and so on. Later
still, closer scrutiny became necessary, so as to apply more scientific analyses, on an at least
educationally sound basis, to judge which candidate gets which grade.
Common approaches include:
1. Grading on the curve (a norm-referenced evaluation basis).
2. Grading on fixed pass marks (a criterion-referenced evaluation basis).
A third practice is called grading on a standard. It is preferred for certain courses only.
Grading on the curve involves dividing test scores into five groups of different sizes in such a
way that the corresponding letter grades A, B, C, D and E (or F) lead to a frequency distribution
which approximates the shape of the normal curve. In such a distribution, the two smallest
groups, A and E (or F), occur at the extreme ends. The next two small groups, B and D, occur on
either side of the largest group, C. But what will be the percentage of each group?
It will be recalled that stanines 1 to 9 represent, respectively, the percentages 4, 7, 12, 17, 20, 17,
12, 7 and 4 of the total distribution, thus:
Stanine        1   2    3    4    5    6    7   8   9
Equivalent %   4   7   12   17   20   17   12   7   4
By a method of interpolation, these stanine percentages can be combined and broken up to lead
to the suggested percentage weightings for the letter grades, as follows:
Grade          E    D    C    B   A
Equivalent %   6   24   40   24   6
Strict adherence to the curve has its demerits, however, for even if a student performs quite well,
he/she may still fail the test if his/her peers do better. Secondly, the grade E (or F) suggests that
the student acquired no basic skills. Thirdly, not many class groups are normally distributed on
aptitude, or in large enough numbers to justify statistical data that can lead to a near normal
distribution.
Putting these cautionary points before us, we are led by reason, intuition and experience to adopt
a modified system of distributing the letter grades. It is more positive in attitude, and it provides
for flexibility, depending on subject classification and also on the decisions of a school or region.
It may be expressed as shown in the following table.
Grade                  E        D         C         B         A
Equivalent % range   0 – 10   10 – 20   30 – 50   20 – 50   10 – 20
As a practical exercise, let us adopt the percentages 5, 15, 45, 25 and 10 to be the respective
weights of the grades E (or F), D, C, B and A. Let us also consider a table of marks of 35
students from a biology test (the table of marks is not reproduced in these notes). The task is to
identify which marks score the grades A, B, C, D and E (or F), as the case may be.
The first step is to calculate how many of the 35 scores fit into each letter grade. After rounding
the percentage approximations, we have the following table:
Grade            A    B    C    D   E   Total
Equivalent %    10   25   45   15   5     100
No. of Scorers   3    9   16    5   2      35
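To check the arithmetic: grade A takes 10% of 35 = 3.5 ≈ 3 scorers, B takes 25% of 35 = 8.75 ≈ 9, C takes 45% of 35 = 15.75 ≈ 16, D takes 15% of 35 = 5.25 ≈ 5, and E takes 5% of 35 = 1.75 ≈ 2, the roundings being chosen so that the counts total 35.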
The following problems are associated with the grading system based on the normal curve.
1. It assumes that we are dealing with large enough data to lead to a normal distribution.
2. It assumes that the students in the test group are normally distributed on the
characteristic of aptitude.
3. It condemns a percentage of candidates to sure failure. Conversely, it implies that the
passing or failing of candidates does not depend on the strength of their own performance
but is a reflection of their relative position against their peers.
4. Finally, grading on the curve assumes competence of the machinery of item construction,
prior analysis of items in the item bank, syllabus coverage, and the objectivity of the
marking itself.
Grading on Standard Scale
Grading on this criterion is otherwise called the system of grading on fixed pass marks.
Essentially, students are judged on the basis of mastery of the specific course contents. On this
basis, the student’s peers are not an issue. Indeed, a large number of students can legitimately
score high grades and vice versa.
The following table shows an example of the fixed mark levels utilized in many institutions and
regions (the original table is not reproduced in these notes).
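Purely as an illustration of what such a fixed-mark scale commonly looks like (an assumption, since the actual cut-off marks vary by institution and region):
Grade   Fixed mark range
A       70 – 100
B       60 – 69
C       50 – 59
D       40 – 49
E       0 – 39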
The grades A, B, C, D and E are the scales of the standard which has been designed, hence the
description “standard scale” grading.
As should be expected, this system calls for strict adherence to the statement of instructional
objectives in behavioral terms. In turn, standards of achievement should be specific.
Theoretically, no adjustments should be made to the grade distributions if the criteria of
instructional objectives, syllabus coverage, item difficulty and discrimination, test administration
and marking procedures were strictly followed.
Another form of criterion-referenced grading is the one calling for a candidate to pass or
otherwise fail the test. The candidate who passes is judged ready and able to proceed to the next
level of learning. Many courses demand this mode of grading, such as the practical on-the-road
driving test, the surgical technique of removing the appendix, meeting the Olympic qualifying
height in the high-jump event, or the technique of defusing a time-bomb device. The issue of
"average" performance, for instance, does not arise in these critical fields. The standard of
performance must be mastered.
The teacher is put in a demanding situation when he/she has to grade on a standard. He/she is
expected to know the subject matter thoroughly, from the point of view of the facts and the
principles, as well as the inter-relationships between the course at hand and the course that
follows it.
Secondly, the teacher should have a record of past performances of his/her previous student
groups at that level and the performance of these student groups at the next level of study or
occupation. This calls for follow-up inquiry to see whether the standard he/she set was
indeed “up to standard.” Equally important, the teacher should consult other professionals to
assist him/her in confirming the standard he/she is setting. An outside opinion is always healthy.
But the outside opinion must best be assisted by what records the teacher has at hand as his/her
guide.
Letter Grade Numerical Equivalent
A 4
B 3
C 2
D 1
F 0
Grade point refers to the numerical equivalent of the student's grade on a course multiplied by
the number of credit hours assigned to that particular course. For instance, if Statistics 1 is
allocated 3 hours per week per semester and a student gets a "C" (numerical equivalent 2),
his/her grade point for the course will be 2 multiplied by 3, which equals 6.
Grade point average (GPA) refers to the sum of the student's grade points on the various
courses/units divided by the total number of the student's semester hours.
Example: A third year student obtained grades on four courses/units as follows:
Course/Unit                             Grade   Numerical Eq.   Semester Hours   Grade Points
Teaching Practice                       A       4               6                24
Educational Assessment and Evaluation   C       2               3                 6
Sociology of Education                  B       3               3                 9
Comparative Education                   C       2               3                 6
Total                                                           15               45
Computation
GPA = ΣGP / ΣSH = 45 / 15 = 3
This is the student's overall grade point average at the end of his/her program. This is the grade
that determines a student's classification as either First Class, Second Class Upper, Second Class
Lower, or Pass.
The formula for computing the cumulative grade point average (GPA) is:
GPA = ΣGP / ΣSH
where GP = Grade Points and SH = Semester Hours.
Example
Below is a record of a student's performance in a two-year program.
GP   52   48   40   60
SH   18   22   18   20
GPA = (52 + 48 + 40 + 60) / (18 + 22 + 18 + 20) = 200 / 78 ≈ 2.56
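A minimal Python sketch of this cumulative GPA computation, using the record above:

```python
# Minimal sketch: cumulative GPA from the two-year record above.
grade_points = [52, 48, 40, 60]    # GP earned in each semester
semester_hours = [18, 22, 18, 20]  # SH taken in each semester

gpa = sum(grade_points) / sum(semester_hours)
print(f"GPA = {gpa:.2f}")  # 200 / 78 = 2.56
```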
Note: For assessment to be meaningful, it must be reliable. For this reason, teachers
should ensure that tests are properly invigilated and marked.
Activities
1. Compute the grade point average of a university student with the following
scores. The semester credit hours are 3 for all the units.
Unit                            Letter Grade   Numeric Eq.
Ed. Assessment and Evaluation   A              4
History of Education            B              3
Philosophy of Education         A              4
Research Methods                A              4
Written Exercises
1. Identify the main steps in the administration of tests.
2. Explain how emergency cases should be treated in an examination.
3. Describe the procedure of marking scripts.
LESSON FIVE
ASSESSMENT OF ATTITUDES AND VALUES
For example, when assessing students' attitudes towards mathematics, you should find out
whether they like or dislike mathematics; this is the affective component. You should also find
out their knowledge (beliefs regarding the nature of mathematics) and their assumptions and
presumptions about mathematics and its relationship to other school subjects; this is the
cognitive component. Finally, you should evaluate students' preparedness to be involved in the
processes of inquiry and involvement in investigations; this is the behavioral component.
However, it should be noted that the components can only be separated from each other
theoretically.
One problem is that, at the end of a lesson, it is very difficult to find out whether students'
attitudes have really changed.
Another problem is that of finding a sufficiently reliable criterion. Teachers find it difficult
to identify the things they should accept as evidence of the acquisition of desirable attitudes
and values.
The other problem is the teachers’ inability to construct valid and reliable assessment
instruments.
Methods of Assessing Attitudes and Values
Research has shown that it is not correct to infer a person’s attitude towards something from his
knowledge of the object. For example, a student can give correct answers to all questions on
conservation of natural resources, but that does not mean that the student has a positive attitude
towards natural resources.
Before you start teaching attitudes, values and designing assessment techniques, you should
identify the attitudes and values you want your students to learn, e.g. co-operation, honesty and
responsibility. After you have identified the attitudes and values, you should carefully analyze
the concept and clearly specify indicators of the presence of the attitude or value. This will make
it possible for you to measure it. For example, if you want to find out whether a student is
developing the attitude of cooperation, the following are some of the indicators you should look
for:
Willingness to share materials
Liking for group work and
Willingness to help other students solve problems.
5) oral interviews
6) observation
7) group discussion
Attitude and Value Scales
An attitude and value scale consists of a set of statements or questions such as "I think every
student should learn AIDS education". The student, parent or teacher is asked to respond to such
a question or statement in terms of personal preferences or beliefs. Unlike tests, attitude scales do
not have correct or wrong answers. Attitude scales assume that the subjective attitudes of people
can be measured by quantitative techniques, the responses of individuals being assigned
numerical scores. The two most commonly used types of attitude scales, the Likert scale and the
semantic differential, are described below.
Likert Scale:
This type of scale is normally used to assess students’ attitudes. Under this method, statements
which reflect both positive and negative attitudes towards an object are stated. Students are then
asked to indicate their level of agreement with each statement by marking one of the following
categories.
Strongly agree : 5
Agree : 4
Undecided : 3
Disagree : 2
Strongly disagree: 1
For example, a Likert scale to assess students' attitudes towards people with HIV would look
like this:
Directions: Indicate whether you strongly agree (SA), agree (A), are undecided (U), Disagree
(D), or Strongly Disagree (SD).
                                                             SA   A   U   D   SD
1. People with AIDS have themselves to blame.
2. Children with the HIV virus should not be allowed in schools.
3. I would not allow my children to play with HIV positive children.
4. A spouse has a right to leave the infected partner.
Weights of 1, 2, 3, 4 and 5 are assigned. The direction of weighting is determined by the
favourableness or unfavourableness of the item. For unfavourable (negative) items, the scale is
scored by assigning weights to the response alternatives as follows: strongly agree 1, agree 2,
undecided 3, disagree 4, strongly disagree 5. For positive items, the order is reversed.
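A minimal Python sketch of this scoring rule, with hypothetical responses; items flagged as unfavourable are reverse-coded on the 1-5 scale:

```python
# Illustrative sketch: scoring a Likert scale with reverse-coding.
# 'responses' holds one student's answers coded 1-5 (1 = strongly
# disagree, 5 = strongly agree); 'negative' flags unfavourable items,
# whose coding is reversed so that a high total always means a more
# favourable attitude. All data are hypothetical.
responses = [2, 4, 1, 5]              # answers to items 1-4
negative = [True, True, True, False]  # items 1-3 are unfavourable

total = 0
for answer, is_negative in zip(responses, negative):
    # On a 1-5 scale, reversing an answer means mapping it to 6 - answer.
    total += (6 - answer) if is_negative else answer

print(f"Attitude score = {total}")
```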
Semantic Differential
The semantic differential is another method of assessing attitudes toward a target object. The
semantic differential scale measures how an individual judges a particular concept on a set of
semantic scales. The approach uses a six-point scale anchored by adjective opposites. For
example, after you have taught about polygamy and wish to determine the attitude students have
developed towards polygamy, you may want to use the following scale;
Polygamy is:
Desirable     6   5   4   3   2   1   Undesirable
Fashionable   6   5   4   3   2   1   Outdated
Acceptable    6   5   4   3   2   1   Unacceptable
Rating Scales
Rating scales are used to provide the means of obtaining quantified databases on observed
behavior or characteristics of individuals. They consist of a set of descriptive words or
statements. They are useful for behaviors that cannot be quantified by counting procedures. For
example mastery of the content being taught. The evaluator observes the lesson and later
quantifies these attributes using for example a scale of 1-5 where 5 is the highest and vice versa.
Questionnaires
Questionnaires are also used for determining the interests and attitudes of people. Direct
questions are asked and the responses are normally presented in a yes-no format.
Note: Attitudes are very important and teachers should make attempts to develop the right
attitudes in their students.
Activities
1. Construct an attitude scale to assess changes in a selected value or attitude.
2. Construct an instrument showing value towards abortion
Written Exercises
1. What are the main problems of assessing attitudes?
2. Distinguish between a Likert scale and a Semantic Differential
3. Explain the three components of an attitude.
LESSON SIX
ASSESSMENT OF PRACTICAL SKILLS
(1) Describe the main features of a good assessment of practical skills.
(2) Explain the methods of assessing practical skills.
Importance of Assessing Skills
The development of useful skills is one of the main functions of both formal and informal
education. Many education systems place a lot of stress on the development of practical skills.
For example, in Kenya, practical skills have been implemented as follows:
(1) At the Primary School Level
Agriculture, Home Science, Art Education, Craft Education, Music, Business Education and
Science.
(2) At the Secondary School Level
Industrial Education, Agriculture, Home Science, Music, Art and Design, Business Education
and Science.
(3) At the Post School Level
Artisan Courses, Craft Courses, Diploma Courses and Higher Diploma Courses
The following are examples of skills that are developed at the primary school level: making
instruments, playing musical instruments, asking questions, observing, recording, collecting
information, collecting objects, drawing, communicating, writing, reading, reporting, discussing,
classifying.
Evaluation of the primary school curriculum in Kenya has shown that most teachers use paper
and pencil tests to assess practical skills. This is for three main reasons.
1. Learning outcomes in the psychomotor domain are difficult to assess.
2. Since the Kenya National Examinations Council uses multiple choice items to assess
practical subjects, some teachers feel that this is the appropriate method of assessing practical
skills.
3. Lack of equipment and materials for teaching practical skills subjects has left teachers with
no alternative but to teach these subjects theoretically. Consequently, primary school teachers
spend very little time developing practical skills.
What to Consider when assessing skills
In assessing skills, the teacher must consider the following:
1. The objectives of the course.
2. Skill assessment should be continuous throughout the teaching-learning process.
3. It should be economical in materials and time.
4. It should test important skills (be valid).
5. The marks gained by each student should be reliable.
6. Both the process and product should be assessed.
Unfortunately, many teachers assess practical skills only through paper and pencil tests. Although
such tests may provide useful information about the student’s knowledge concerning the skills
being tested, they do not clearly tell us whether the student can exhibit the skills being tested. It
is better to assess skills by assessing the performance of the students. Some ideas are given in
this section.
Methods of Assessment
There are several techniques that can be used to assess skills. Some useful assessment
techniques are:
1. Observation of students.
2. Observation and judging the quality of the product resulting from students’ projects.
3. Performance tests.
Observation of students
One way of obtaining information about the progress students are making in the acquisition of
skills is to observe students’ behavior as it occurs. The main advantage of this method is its
directness. The teacher does not have to ask students questions about the skills they have
acquired; he/she simply watches what they do and say. This method is very useful at the
primary school level because most children have no vocabulary for describing skills and may not
be willing to express themselves verbally.
When using observation methods, three major considerations should be dealt with. The first and
most significant consideration concerns a decision with respect to what should be observed. For
example, an English language teacher may decide to determine students’ mastery of the English
language. Some of the indicators of this are:
i) Correct use of tenses,
ii) Correct pronunciation of words, and
iii) Appropriate use of vocabulary.
The second major consideration concerns the timing and recording of observations. Since it is
not possible to make continuous observations, the teacher must make a decision about when to
observe students. One acceptable approach to this problem is to keep a brief anecdotal record
based on the skills you want children to learn. When a student displays the behavior, you should
make a tally in the appropriate row. However, only one tally should be made in the same row
during one lesson. Below is an example of a record card for teacher’s observation of standard
seven pupils.
School ……………………………………………… Name of Teacher
………………………
Name of Pupil ………………………………………
Age …………………………………………………
Class ………………………………………………
Number of lessons …………………………………
Recording Instructions:
1. Make a tally in the appropriate row each time you observe a new display of the behavior.
Only one tally may be made in the same row during one lesson.
2. After 4 tallies in the same row put the fifth tally across thus, //// //// ////.
Checklists
Checklists are a way of improving other forms of assessment, especially observations. They are
not a method of assessment as such. They are aimed at providing evidence of the presence of
desired amounts of an attitude, or at determining the extent to which students are using certain skills
to solve problems. You should use one record form for each child. At the end of the course, you
will be able to give a fair and quick summary of the skills possessed by the student. Checklists
can be used to give advice to the pupil.
To make the method reliable, the following suggestions are made:
i) The teacher should break the trait or skill to be assessed into smaller, more clearly defined
units, such as: appears to like group work, cooperates with other students, obeys rules and
laws, etc.
ii) Students should be observed on several different occasions.
Example 2: A Checklist for the Topic Clothing & Textiles
SCHOOL …………………………………………..
AGE ………………………………………………..
NAME OF CHILD …………………………………
NAME OF TEACHER …………………………….
CLASS ……………………………………………..
TOPIC ……………………………………………...
SUBJECT …………………………………………..
NOT DONE     DONE INCORRECTLY     DONE CORRECTLY
You can watch the student preparing the dress and put ticks in the right column for each part
done correctly. At the end of the test, you should add the number of ticks in the “done correctly”
column and give the score for the student out of 10. The pass standard will depend on the
difficulty of the test.
Example 3: A Checklist for Assessing Practical Skills in Science
SCHOOL __________________________________
AGE ______________________________________
NAME OF CHILD ___________________________
NAME OF TEACHER ________________________
CLASS _____________________________________
TOPIC _____________________________________
SUBJECT ___________________________________
SPECIFIC BEHAVIOR                            YES   NO
Skillful with hands
Likes practical work
Handles tools and instruments competently
Collects materials
Conducts experiments
In a number of courses, such as home science, agriculture, art and craft, and industrial education
students are asked to work on a project. This may involve the production of a dress, musical
instrument or table. Naturally, the students will be more motivated in their work if the projects
are assessed. However, assessment of projects is a difficult exercise because there are usually no
clear standards to follow. Some guidelines may help you:
b) Demonstrate to the students each step of the project.
c) Your role should be that of tutor, continuous assessor and facilitator.
d) Care must be taken to ensure that work has actually been done by the students, since it is
easy for a student to get help from various sources. For this reason, only work actually done
under the supervision of the teacher should be accepted as a measure of achievement.
e) Both the process and the products should be assessed. In assessing pupils' finished products,
the teacher should consider the following:
(i) Craftsmanship: This includes the accuracy with which a student performs various
tasks. You should use physical measures and your own experience to judge the
quality of the product. Questions that you should ask yourself concerning
craftsmanship include the following: Is the project neat? Is the project accurate?
(ii) Time: Students are normally given a specified time limit within which to produce
a finished product. It is therefore necessary for the teacher to find out the time it
took the student to complete the task.
One technique that has proved useful in assessing pupils’ final products is the use of a rating
scale.
Performance Tests
Performance tests are one of the most widely used forms of assessment of practical skills. They
are used primarily in practical skill subjects, such as art education, craft education, music, home
science and industrial education. Performance tests are administered to students to determine the
extent to which they have mastered various elements of the skill being tested. This method of
assessment requires students to perform skills they have learned under conditions of the trade
concerned before one or more experts. In assessing practical skills using performance tests, the
following aspects are considered.
The quality of work: This is measured in terms of the perfection of the product, work finish or
appearance, and the accuracy and precision with which the student works.
The degree of proficiency: This measures correct use of tools and equipment, time taken to
complete the work, ease and efficiency in handling tools and equipment and regard for safety
practices.
Procedure: This is the extent to which the student follows the detailed steps.
Assessment of Practical Skills in Science
Science subjects, i.e. biology, physics and chemistry, include objectives whose purpose is to
develop practical skills. Teachers are expected to periodically assess pupils' practical work in
science. Assessment should be aimed at supporting the learning and teaching process. Feedback
from assessments should enable teachers to improve their teaching techniques and plan remedial
action for individual pupils.
The following practical skills or abilities are usually assessed in science subjects.
1. Manipulative skills – These include the ability to:
Assemble apparatus.
Handle chemicals and instruments.
Use apparatus.
2. Following instructions during practical work.
Their understanding of procedure.
Their ability to complete an investigation in accordance with laid down procedure.
3. Observation, identification, recording and interpretation.
Their ability to recognize, identify and interpret scientific material.
Their ability to make accurate recordings of data.
4. Analysis and interpretation of data: Pupils’ ability to analyze data using qualitative and
quantitative methods and their ability to interpret the data are assessed.
5. Presentation of the results: pupils are assessed in terms of their ability to write a report on
the basis of the data collected.
6. The design of the investigation: The pupils should show the ability to identify a problem,
formulate hypotheses, work out a design, test hypotheses and make generalizations.
You should prepare a checklist on the basis of the six categories listed above and use it to
assess the work of each pupil or, if pupils are working in groups, each group.
Note: Although written tests are widely used to assess skills, paper-and-pencil tests are not valid
for the measurement of practical skills.
Activities
1. Prepare a checklist for assessing five skills in science.
2. Construct an observation schedule to assess five skills in science.
Written Exercises
1. Identify four things you should consider in assessing skills.
2. Briefly describe two methods of assessing practical skills.
3. Explain six categories of abilities that are assessed in science subjects.
LESSON SEVEN
PREPARATION OF MARKING SCHEME
Introduction
A marking scheme is a set of criteria used in assessing student learning.
Why prepare a marking scheme?
Preparing a marking scheme ahead of time will allow you to review your questions, to verify that
they are really testing the material you want to test, and to think about possible alternative
answers that might come up. This section discusses guidelines for preparing marking schemes as
well as the moderation of tests.
Objectives
At the end of this topic, you should be able to:
• Prepare marking schemes for different kinds of tests
• Explain the meaning and purpose of moderation
• Discuss how to moderate examinations/tests
Learning Activities
Learning Activity 1.1 Reading
Read the provided topic notes on marking preparation for different kinds of tests.
Learning Activity 1.2 Discussion
Take part in the group discussion on the preparation of marking schemes for tests, the
purpose of test moderation, and the importance of preparing a marking scheme in good time.
Learning Activity 1.3 Review
Read and comment on two of the posts in the discussion forum
Assessment
The activity in 1.2 and participation in the discussion in activity 1.3 will be graded.
Topic Resources
Lecturer notes
Topic Seven Notes
Guidelines When Making Marking Schemes
• Look at what others have done. Chances are that you are not the only person who
teaches this course. Look at how others choose to assign grades.
• Make a marking scheme usable by non-experts. Write a model answer and use this as
the basis for a marking scheme usable by non-experts. This ensures that your TAs and
your students can easily understand your marking scheme. It also allows you to have
an external examiner mark the response, if need be.
• Give consequential marks. Generally, marking schemes should not penalize the same
error repeatedly. If an error is made early but carried through the answer, you should
only penalize it once if the rest of the response is sound.
• Review the marking scheme after the exam. Once the exam has been written, read a
few answers and review your key. You may sometimes find that students have
interpreted your question in a way that is different from what you had intended.
Students may come up with excellent answers that may be slightly outside of what
was asked. Consider giving these students partial marks.
• When marking, make notes on exams. These notes should make it clear why you gave
a particular mark. If exams are returned to the students, your notes will help them
understand their mistakes and correct them. They will also help you should students
want to review their exam long after it has been given, or if they appeal their grade.
Sample Marking Scheme for Presentations
This is an example of a marking scheme for a presentation assignment.
• Presentation (40%)
1. Verbal & non-verbal communication (10%) - eye contact, enthusiasm, body language,
volume of voice, clarity of language, avoidance of note reading
2. Visual aids (10%) - minimal text, appealing layout, distraction-free, large font size
3. Structure (10%) - clarity of goals, organization, logical progression, good flow
4. Discussion questions (10%) - ability to guide discussion, ask and answer questions, and
maintain order
• Content (60%)
1. Introduction (10%) - introduce background material
2. Thesis statement (5%) - clear, focused
3. Main body (35%) - depth, synthesis of references, accuracy, summary, figures or tables
4. Discussion questions (10%) - thought-provoking, sufficient quantity
• Scan through the responses and look for major discrepancies in the answers -- this
might indicate that the question was not clear.
• If there are multiple questions, score Question #1 for all students, then Question #2,
etc. Use a scoring rubric that provides specific areas of feedback for the students.
What is Moderation?
Moderation is a set of processes designed and implemented by the assessors/evaluators to:
• Provide system-wide comparability of grades and scores derived from internally based
assessment
• Form the basis for valid and reliable assessment in schools
• Maintain the quality of assessment and the credibility, validity and acceptability of
certificates.
Two forms of moderation are commonly distinguished:
• Qualitative moderation - unit grades from student assessment are moderated by peer
review against system criteria
• Statistical moderation - scores from student assessment within courses are placed on
the same scale.
Moderation is necessary for producing valid, credible and publicly acceptable certificates in
an assessment system, and it provides for comparability of standards across classes and schools.
How to Moderate…
Let’s say our exam has a mean of 70 and a standard deviation of 10. The students have done
fairly well here. If I want to compare the scores in this exam with another exam with a mean of
50 and a standard deviation of 20, it’s possible to scale that in a very simple way. We subtract
the mean from each mark, divide by the standard deviation, multiply by the new standard
deviation, and then add back the new mean.
If the first column has the marks in a school internal exam, and the second in a public exam, we
can scale the internal scores to be in line with the public exam scores for them to be comparable.
The internal exam has a higher average, which means that it was easier, and a lower spread,
which means that most of the students answered similarly. When scaling it to the public exam,
students who performed well in the internal exam would continue to perform well after scaling.
But students with an average performance would have their scores pulled down.
This is because the internal exam is an easy one, and in order to make it comparable, we’re
stretching their marks to the same range. As a result, the good performers would continue getting
a top score. But poor performers who’ve gotten a better score than they would have in a public
exam lose out.
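The scaling just described is a simple linear (z-score) transformation. Below is a minimal sketch in Python; the function name and the sample marks are hypothetical, chosen only to mirror the means and standard deviations used above.

def scale_mark(mark, old_mean, old_sd, new_mean, new_sd):
    # Standardize the mark, then re-express it on the target exam's scale.
    z = (mark - old_mean) / old_sd       # subtract the mean, divide by the SD
    return z * new_sd + new_mean         # multiply by the new SD, add the new mean

# Hypothetical internal marks, scaled from (mean 70, SD 10) to (mean 50, SD 20).
for mark in (90, 70, 55):
    print(mark, "->", scale_mark(mark, 70, 10, 50, 20))
# 90 -> 90.0, 70 -> 50.0, 55 -> 20.0: top performers keep a top score, while
# average and weaker performers are pulled down, as described above.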
TEST ITEM ANALYSIS
Introduction
After you create your objective assessment items and give your test, how can you be sure that the
items are appropriate -- not too difficult and not too easy? How will you know if the test
effectively differentiates between students who do well on the overall test and those who do not?
An item analysis is a valuable, yet relatively easy, procedure that teachers can use to answer both
of these questions.
Objectives
At the end of this topic, you should be able to:
1. Discuss how to determine the Difficulty index of a test
2. Discuss how to determine the Discrimination index of a test
Learning Activities
Learning Activity 1.1 Reading
Read the provided topic notes on how to determine Difficulty Index and Discrimination Index.
Learning Activity 1.2 Discussion
Participate in the discussion on the determination of the difficulty and discrimination index.
Learning Activity 1.3 Review
Read and comment on two of the posts in the discussion forum
Assessment
The activity in 1.2 and participation in the discussion in activity 1.3 will be graded.
Topic Resources
• Lecture notes
• Internet
• Exercises
1. Difficulty Index
To determine the difficulty level of test items, a measure called the Difficulty Index is used. This
measure asks teachers to calculate the proportion of students who answered the test item
accurately. By looking at each alternative (for multiple choice), we can also find out if there are
answer choices that should be replaced. For example, let's say you gave a multiple-choice quiz
and there were four answer choices (A, B, C, and D). The following table illustrates how many
students selected each answer choice for Question #1 and #2.
Question   A     B    C     D
#1         0     3    24*   3
#2         12*   13   3     2
(An asterisk indicates the correct answer.)
Student   Total Score (%)   Question 1   Question 2   Question 3
Asif      90                1            0            1
Sam       90                1            0            1
Jill      80                0            0            1
Charlie   80                1            0            1
Sonya     70                1            0            1
Ruben     60                1            0            0
Clay      60                1            0            1
Kelley    50                1            1            0
Justin    50                1            1            0
Tonya     40                0            1            0
"1" indicates the answer was correct; "0" indicates it was incorrect.
Follow these steps to determine the Difficulty Index and the Discrimination Index.
1. Arrange the students so that those with the highest overall scores are at the top, then count
the number of students in the upper and lower groups who got each item correct. For Question
#1, there were 4 students in the top half who got it correct and 4 students in the bottom half.
2. Determine the Difficulty Index by dividing the number who got the item correct by the total
number of students. For Question #1, this would be 8/10, or p = .80.
3. Determine the Discrimination Index by subtracting the number of students in the lower
group who got the item correct from the number of students in the upper group who got the item
correct, then dividing by the number of students in each group (in this case, five in each group).
For Question #1, that means you would subtract 4 from 4 and divide by 5, which results in a
Discrimination Index of 0.
The answers for Questions 1-3 are provided in Table 2.
Table 2
Item         # Correct (Upper group)   # Correct (Lower group)   Difficulty (p)   Discrimination (D)
Question 1   4                         4                         .80              0
Question 2   0                         3                         .30              -0.6
Question 3   5                         1                         .60              0.8
Now that we have the table filled in, what does it mean? We can see that Question #2 had a
Difficulty index of .30 (meaning it was quite difficult), and it also had a negative discrimination
Index of -0.6 (meaning that the low-performing students were more likely to get this item
correct). This question should be carefully analyzed, and probably deleted or changed. Our
"best" overall question is Question 3, which had a moderate difficulty level (.60), and
discriminated extremely well (0.8).
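As a quick illustration, here is a minimal Python sketch of the two computations, using the ten students' 0/1 responses to Question 1 from the table above (the function name is hypothetical):

def item_indices(responses):
    # responses: 0/1 list ordered from the highest to the lowest overall scorer.
    n = len(responses)
    upper, lower = responses[:n // 2], responses[n // 2:]
    difficulty = sum(responses) / n                          # p: proportion correct
    discrimination = (sum(upper) - sum(lower)) / len(upper)  # D: (upper - lower) / group size
    return difficulty, discrimination

q1 = [1, 1, 0, 1, 1, 1, 1, 1, 1, 0]   # Asif ... Tonya, Question 1
print(item_indices(q1))               # (0.8, 0.0), matching Table 2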
Another consideration for an item analysis is the cognitive level that is being assessed. For
example, you might categorize the questions based on Bloom's taxonomy (perhaps grouping
questions that address Level I and those that address Level II). In this manner, you would be able
to determine if the difficulty index and discrimination index of those groups of questions are
appropriate. For example, you might note that the majority of the questions that demand higher
levels of thinking skills are too difficult or do not discriminate well. You could then concentrate
on improving those questions and focus your instructional strategies on higher-level skills.
LESSON EIGHT
RESULTS ANALYSIS AND PRESENTATION
Introduction to Statistics
Introduction
A statistic is a mathematical value that summarizes a characteristic of a sample; that is, it is a
summarizing measure calculated from sample data.
Statistics is the mathematical science concerned with gathering, processing, and analyzing
numerical data in any field. Statistics can also be defined as the study of methods of handling
quantitative information including techniques for organizing and summarizing as well as for
making generalizations and inferences from data.
In general, statistical methods can be grouped into two broad classes
Descriptive statistics and
Inferential statistics
Objectives
By the end of the topic, you should be able to:
1) Define basic statistical terms
2) Differentiate between descriptive and inferential statistics
3) Discuss Frequency distribution
4) Explain the presentation of raw scores and drawing of distribution curves
Learning Activities
Learning Activity 1.1 Reading
Read the provided topic notes on statistics and types of data; you have also been given access to
the lecturer notes.
Learning Activity 1.2: Discussion
Statistics is divided into two broad categories. Participate in the discussion of the two different types.
Learning Activity 1.3 Review
Read and comment on two of the posts in the discussion forum
Assessment
The journal in activity 1.2 and participation in the discussion in activity 1.3 will be graded.
Topic Resources
Lecturer notes.
Topic Nine Notes
Types of statistics
1. Descriptive Statistics
Descriptive statistics refers to procedures for organizing, summarizing and describing
quantitative information or data.
The aim of descriptive statistics is the reduction of the quantitative data to a form that can be
readily comprehended.
Examples – Descriptive Statistics
Mean
Standard deviation, and
Correlation coefficient
In descriptive statistics, no attempt is made to generalize beyond the data at hand. Descriptive
statistics include measures of central tendency, dispersion, and correlation. Descriptive statistics
deal with describing and examining associations of variables within given data sets.
2. Inferential Statistics
Inferential Statistics is concerned with the methods by which inferences are made to a
larger group on the basis of observation made on a smaller subgroup. In other words, it is
concerned with the process of generalization from the part to the whole (i.e., from the
sample to the population).
In inferential statistics, conditional inferences are made about populations from statistics
of samples by means of logic and probability theory. E.g., Is there a statistically
significant difference between form four boys and girls in mathematics achievement?
Suppose after administering a sound mathematics test to some selected form four boys
and girls, the mean for girls was 52% and the mean for boys was 60%. Is this difference a
result of chance differences or is it of statistical significance? Such a question is answered
by inferential statistics.
Generally, the aim of inferential statistics is that of generalizing beyond the data at hand.
Inferential statistics help us to draw conclusions beyond our immediate samples and data.
For example, inferential statistics could be used to infer, from a relatively small sample of
employees, what the job satisfaction is likely to be for a company’s entire work force.
In other words, inferential statistics help us to draw general conclusions about the
population on the basis of the findings identified in a sample.
The most widely used inferential statistical procedures include the t-test, analysis of
variance (ANOVA), chi-square, and regression.
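To make the boys-versus-girls question above concrete, here is a minimal sketch using SciPy's independent-samples t-test; the mark lists are hypothetical and chosen only so the group means are roughly 52% and 60%:

from scipy import stats

girls = [48, 55, 50, 52, 57, 49, 53]   # hypothetical marks, mean about 52
boys = [62, 58, 61, 57, 63, 60, 59]    # hypothetical marks, mean about 60

# Test whether the observed mean difference is likely a chance fluctuation.
t, p = stats.ttest_ind(girls, boys)
print(f"t = {t:.2f}, p = {p:.4f}")
# A small p-value (conventionally below 0.05) suggests the difference is
# statistically significant rather than due to chance.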
Scales of Measurement
A measurement scale can have three attributes:
Magnitude
Equal Intervals
Absolute zero point
When a scale has magnitude, one instance of the attribute can be judged greater than, less than,
or equal to another instance of the attribute. E.g., If person A is 160cm tall and person B is
162cm tall, this scale of measurement reflects the difference in height between person A and
person B. Person B is taller than person A.
Equal interval denotes that the magnitude of the attribute represented by a unit of measurement
on the scale is equal regardless of where on the scale the unit falls. E.g., The difference in height
between someone measuring 61 inches versus someone measuring 60 inches is the same
magnitude as the difference in height that exists between someone measuring 75 inches versus
someone measuring 74 inches. In short, an inch reflects a certain amount of height regardless of
where that inch falls on the scale.
An absolute zero point is a value that indicates that nothing at all of the attribute being
measured exists. E.g., “0 inches” of height is a scale value that implies no height whatsoever –
absolute zero.
Levels/Scales of Measurement
Nominal Scale
Ordinal Scale
Interval Scale
Ratio Scale
Nominal Scale
A nominal scale refers to the classification of items into discrete groups which do not bear any
magnitude relationships to one another.
A nominal scale is the most limited (or least powerful) type of measurement scale. It
indicates no order or distance relationship.
A nominal scale simply describes differences between things by assigning them to
categories.
Nominal scale is simply a system of assigning number symbols to events in order to label
them. E.g. The assignment of numbers to basketball players is almost purely arbitrary
since one number would do as well as another.
The numbers have no quantitative value. One cannot do much with the numbers
involved. For example, one cannot usefully average the numbers on the back of a group
of football players and come up with a meaningful value.
Nominal data are thus counted data.
Distinguishing Characteristics of Nominal Measurement Scales and Data
• Used only to qualitatively classify or categorize not to quantify.
• No absolute zero point.
• Cannot be ordered in a quantitative sequence.
• Impossible to use to conduct standard mathematical operations.
• Examples include gender, religious and political affiliation, and marital status.
• Purely descriptive and cannot be manipulated mathematically.
Ordinal Scale
An ordinal scale reflects only magnitude and does not possess the attributes of equal
intervals or an absolute zero point.
It is the lowest level of the ordered scale.
The ordinal scale places events in order on some continuum; It can be said that one class
is higher than another.
There is no attempt to make the intervals of the scale equal in terms of some rule.
Rank orders represent ordinal scales.
Ordinal measures have no absolute values, and the real differences between adjacent
ranks may not be equal.
All that can be said is that one person is higher or lower on the scale than another but
more precise comparisons cannot be made. One has to be very careful in making
statements about scores based on ordinal scales. E.g., if student A’s position in her class
is 5 and student B’s position is 20, it cannot be said that A’s position is four times as
good as that of B; the statement would make no sense at all.
Thus, the use of an ordinal scale implies a statement of “greater than” or “less than” (an
equality statement is also acceptable) without our being able to state how much greater or
less.
The real difference between ranks 1 and 2 may be more than or less than the difference
between ranks 5 and 6.
Since the numbers of this scale have a rank meaning, the appropriate measure of central
tendency is the “median.”
Interval Scale
An interval scale possesses the attributes of magnitude and equal intervals but not an
absolute zero point.
Interval scales can have an arbitrary zero, but it is not possible to determine for them
what may be called an absolute zero or the unique origin.
The primary limitation of the interval scale is the lack of a true zero; it does not have the
capacity to measure the complete absence of a trait or characteristic.
The Celsius temperature scale is a common example of an interval scale.
One can say that an increase in temperature from 30° to 40° involves the same increase
in temperature as an increase from 60° to 70°.
But one cannot say that a temperature of 60° is twice as warm as a temperature of 30°,
because both numbers depend on the fact that the zero of the scale is set arbitrarily at the
freezing point of water.
The ratio of the two temperatures, 30° and 60°, means nothing because zero is an
arbitrary point.
Ratio statements cannot be made without an absolute zero point.
Interval scales provide more powerful measurement than ordinal scales, for the interval
scale also incorporates the concept of equality of intervals.
Mean is the appropriate measure of central tendency while standard deviation is the most
widely used measure of dispersion.
Ratio Scale
Any scale of measurement possessing magnitude, equal intervals, and an absolute zero
point is called a ratio scale.
This scale is termed “ratio” because the collection of properties that it possesses allows
ratio statements to be made about the attribute being measured. E.g., If a man is 28 years
old and an adolescent is 14 years old, it is correct to infer that the man is twice as old as
the adolescent.
Such ratio statements may be made only if the scale possesses all three characteristics.
Ratio scales have meaningful, absolute zero points; zero actually means exactly nothing
of the quantity being measured. E.g., The zero point on a centimeter scale indicates the
complete absence of length or height. But an absolute zero of temperature is theoretically
unobtainable; it remains a concept existing only in the scientist’s mind.
The number of incorrect letters in a page of typescript represents a score on a ratio scale.
The scale has an absolute zero.
The ratio involved does have significance and facilitates a kind of comparison which is
not possible in case of an interval scale.
Measures of physical dimensions such as weight, height, distance, etc. are examples of
ratio scale.
Distinguishing Characteristics of Ratio Measurement Scales and Data
• Identical to the interval scale, except that they have an absolute zero point.
• Unlike with interval scale data, all mathematical operations are possible.
• Examples include height, weight, and time.
• Highest level of measurement.
• Allow for the use of sophisticated statistical techniques.
Proceeding from the nominal scale (the least precise type of scale) to the ratio scale (the most
precise), increasingly precise information is obtained. Generally speaking, the levels are
hierarchical.
LESSON NINE
MEASURES OF CENTRAL TENDENCY
Mean
The mean is the most widely used and preferred measure of central tendency. The mean is
defined as the sum of all data scores divided by the number of scores in the distribution. Mean is
arithmetic average of scores.
Mean = (sum of all scores) / N
Example:
Scores = 5, 2, 4, 3, 8, 6.
Mean = (5+2+4+3+8+6)/6 = 4.67
For grouped data the variables are the midpoints (class-marks) of the class intervals. Given
variables x and frequencies f, therefore:

Mean, X̄ = ∑fx / ∑f
The mean is the best known and the most reliable measure of central tendency. Hence it is
often preferred to both the median and the mode. Every score in the distribution is considered in
its computation. This is not done when calculating the mode and the median.
Thus the mean is simply found by adding all the scores in a distribution and dividing by the total
number of scores (N). It is denoted by X̄, pronounced “X bar”.
The formula is:

X̄ = (∑ Xᵢ) / N, where the sum runs over i = 1 to N

Where X̄ = the mean,
Xᵢ is the raw score for each individual, i.e. the ith person’s score,
N is the number of scores, and
∑ is the summation sign, indicating that we sum from the first score to the Nth score, i.e. all
the X-scores in the distribution are added.
Example
For the scores 3, 3, 4, 5, 6, 6, 8, 9 and 10, the sum is 54 and N = 9. Therefore, the mean is
54/9 = 6.
Class interval   Frequency (fi)   Class-mark (xi)   fi·xi
65-69            3                67                201
60-64            4                62                248
55-59            8                57                456
50-54            10               52                520
45-49            9                47                423
40-44            3                42                126
35-39            4                37                148
30-34            1                32                32
                 ∑fi = 42                           ∑fi·xi = 2154

X̄ = ∑fi·xi / ∑fi = 2154/42 = 51.3
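A minimal Python sketch of this grouped-data computation follows; the interval triples simply restate the table above:

# Each tuple is (lower class limit, upper class limit, frequency).
intervals = [(65, 69, 3), (60, 64, 4), (55, 59, 8), (50, 54, 10),
             (45, 49, 9), (40, 44, 3), (35, 39, 4), (30, 34, 1)]

total_fx = sum(f * (lo + hi) / 2 for lo, hi, f in intervals)  # sum of fi * xi
total_f = sum(f for _, _, f in intervals)                     # sum of fi
print(total_fx / total_f)   # 2154 / 42 = 51.28..., i.e. about 51.3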
1. One important property of the mean is that it is the point in a distribution of scores such
that the summed deviations of scores from it (the mean) are equal to zero. What do
we mean by deviation? Deviation is the difference between a score and the mean,
Xᵢ − X̄, and it can be either positive or negative. In any distribution the sum of
deviations about the mean is always equal to zero.
i.e. ∑(Xᵢ − μ) = 0, where μ is the population mean for the X-scores and the population has
N subjects, and

∑(Xᵢ − X̄) = 0, where X̄ is the sample mean and the sample is of size n.
For illustration, let us consider the following scores. Suppose our scores are 3, 3, 4, 5, 6, 6,
8, 9 and 10 (note this can be considered as a population or a sample without any change in
the results). The mean will be 6 and the deviation scores, in general Xᵢ − X̄, will be 3−6,
3−6, 4−6, 5−6, 6−6, 6−6, 8−6, 9−6 and 10−6. These deviations are respectively −3, −3, −2,
−1, 0, 0, 2, 3 and 4 (note their sum is zero). Thus, the mean may be considered as the exact
balance point in a distribution.
2. If we add a constant, say C, to every score in the distribution, the resulting scores
will have a mean equal to the original mean plus the constant C. If we subtract a
constant instead, the resulting scores will have a mean equal to the original mean
minus the constant. Note that subtracting a constant C is the same as adding −C;
hence the first statement is adequate, as it includes the second.

i.e. mean of (X + C) = X̄ + C, and mean of (X − C) = X̄ − C
Let us illustrate this with the data below, whose mean is 5, by adding 3 to every score and
calculating the new mean:

Xi   Xi + 3
3    3+3 = 6
4    4+3 = 7
5    5+3 = 8
8    8+3 = 11

∑Xi = 20 and ∑(Xi + 3) = 32

Thus the original mean is 20/4 = 5, while the new mean is 32/4 = 8, which is the original
mean plus the constant, 5 + 3.
3. If each score in a distribution of scores is multiplied by a constant C, then the mean of
these scores will be the original mean multiplied by the constant, i.e. C·X̄. Let us illustrate
this property with the data that was used above, multiplying every score by 2 (see the
table below).

Xi   2Xi
3    3×2 = 6
4    4×2 = 8
5    5×2 = 10
8    8×2 = 16

∑Xi = 20 and ∑2Xi = 40

Thus the original mean is 20/4 = 5, while the new mean is 40/4 = 10, which is 2 × 5, i.e. C·X̄.

Note that division is the reciprocal of multiplication. Hence, if we divide each score by a
constant C, the mean of the resulting scores is the original mean divided by the constant,
X̄/C (or (1/C) × X̄). A quick numerical check of these properties appears below.
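Here is that check, a small sketch in Python using the same four scores:

scores = [3, 4, 5, 8]

def mean(xs):
    return sum(xs) / len(xs)

print(mean(scores))                   # 5.0
print(mean([x + 3 for x in scores]))  # 8.0: adding 3 adds 3 to the mean
print(mean([2 * x for x in scores]))  # 10.0: multiplying by 2 doubles the mean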
Advantages of mean
It is least affected by fluctuations of sampling.
Disadvantages of mean
Records taken on a highway of speeding motorists were as follows, in kilometers per hour:
96, 96, 97, 99, 100, 101, 102, 104, 155.
Mode = 96 km/hr.
Median = 100 km/hr.
Mean ≈ 105.6 km/hr.
The median is the best representation of these scores. The mode is low, and the mean is higher
than all the scores except one (155). The mean is pulled up towards 155, while the median
ignores it. The mean can be used by management to settle salary disputes, and the mode can be
used by unionists, because the majority of the labour force earns lower salaries. There are other
examples which can be cited of how the mean, median and mode can be used in favour of one
another.
Exercise
Frequency 6 8 7 6 8
It is shown that some of the people were adult guides. Select a statistical method to describe the
measure of central tendency or average age, and explain why.
3. The monthly wages and salaries of a firm are grouped in the following wages and salary
table.
MEASURES OF VARIABILITY
The measures or indexes considered here are range, quartile deviation, mean deviation, variance
and standard deviation. Range is the simplest measure of variability (or dispersion). Standard
deviation is the most reliable measure of variability and standard deviation is the square root
of variance; that is, variance is standard deviation squared.
By the end of this topic, you should be able to:
1. Compute range, quartile deviation, mean deviation, variance and standard deviation for
grouped and ungrouped data using computational and definitional formulae.
2. Give the properties of variance and standard deviation (s.d.) (e.g. when a constant is
added to all the scores of the distribution)
3. Compute variance and s.d. using assumed mean method.
4. Interpret computed s.d.
Measures of variability
A variable can be defined as a trait, which can take on a range of values. When we talk about
variation, we refer to the arrangement or spread of values that the variable takes in the
distribution. Measures of variation give us some information about the difference between
scores. While measures of central tendency give information about typical score in a
distribution, measures of variability provide information about the differences in spread
between scores in the distribution.
For example:
The spread of the variable makes a difference in how you can compare one set to another. Two
sets can have an equal mean and median yet be very different.
For example;
SET A: 59 59 59 60 61 61 61
SET B: 30 40 50 60 70 80 90
The mean = median = 60, yet Set A is different from Set B: Set A is close together while Set B is
more spread out. The dispersion of the two sets is different, hence the difference in variability.
Measures of variability, or the spread of variables, give part of the answer to how two sets of
data differ.
What we are saying is that to describe a distribution, a measure of central tendency is not enough
(sufficient), even if you give the most reliable of them; the mean may be necessary, but it is not
sufficient. To describe any distribution of variables (e.g. scores), three elements are
important considerations:
(i) Measure of central tendency
(ii) Measure of variability, and
(iii) Shape of the distribution.
These three elements are necessary and sufficient to describe any distribution of variables of a
sample or population. Thus, in order to adequately describe a distribution of scores (variables),
we need, in addition to a measure of central tendency, a measure of variability of variables
besides the shape of the distribution. Information concerning variability is as important, if not
more important than information concerning central tendency.
Range
The range is defined as the difference between the highest score and the lowest score.
For example, for the data or scores 3, 3, 4, 5, 6, 6, 8, 9, 10 the range is simply 10−3, which is 7.
Set A: 59, 59, 59, 60, 61, 61, 61; Range is 61-59=2
Set B: 30, 40, 50, 60, 70, 80, 90; Range is 90-30=60
Like the mode, range is not a very stable measure of variability and its main advantage is that it
gives a quick, rough estimate of variability
When dealing with grouped data, an estimate of the range can be computed by subtracting the
midpoint (class-mark) of the lowest interval from the class-mark of the highest interval.
The range is the simplest measure of variability, and because it is based on only two scores it is
not a stable one. Thus, the range is only used as a quick reference to the spread of scores in a
distribution.
Quartile Deviation
The quartile deviation, also called the “semi-interquartile range”, is defined as half of the
difference between the 75th percentile (Q3) and the 25th percentile (Q1). Hence it is one half the
scale distance between the 75th and 25th percentiles in a frequency distribution.
To find the quartile deviation (Q), which is based on the middle 50% of the distribution around
Q2 (the median), we first locate the 75th percentile and the 25th percentile.
The 75th percentile is a point in a distribution such that a quarter of the distribution is above it
and the other three quarters below it. Similarly,
the 25th percentile has a quarter below it and the other three quarters above it,
while the 50th percentile (or median) has half of the distribution above it and half below it.
Thus the formula for the calculation of the quartile deviation, Q, is:

Q = (Q3 − Q1)/2 = (P75 − P25)/2
Example

Class interval   Frequency (fi)   Cumulative frequency
65-69            3                42
60-64            4                39
55-59            8                35
50-54            10               27
45-49            9                17
40-44            3                8
35-39            4                5
30-34            1                1
                 ∑fi = 42

Using P = L + ((kN − Cfb)/fw) × i, where L is the lower real limit of the class containing the
percentile, Cfb the cumulative frequency below that class, fw the frequency within the class, and
i the class width:

Q3 = P75 = 54.5 + ((3 × 42/4 − 27)/8) × 5
= 54.5 + ((31.5 − 27)/8) × 5
= 54.5 + 2.8
= 57.3

Q1 = P25 = 44.5 + ((42/4 − 8)/9) × 5
= 44.5 + ((10.5 − 8)/9) × 5
= 44.5 + 1.4
= 45.9

Hence Quartile deviation (Q) = (Q3 − Q1)/2 = (P75 − P25)/2
= (57.3 − 45.9)/2
= 11.4/2
= 5.7
Note that the same procedure as used for the median earlier has been used here to find the 75th
and 25th percentiles, Q3 and Q1.
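The grouped-percentile interpolation above can be expressed compactly in Python. This is a minimal sketch assuming equal class widths; the function name and data layout are hypothetical:

def grouped_percentile(intervals, k):
    # intervals: (lower class limit, frequency) pairs in ascending order,
    # all classes of equal width; k: desired proportion, e.g. 0.25 for P25.
    width = intervals[1][0] - intervals[0][0]
    n = sum(f for _, f in intervals)
    target = k * n                     # kN, the cumulative count to reach
    cum = 0
    for lower, f in intervals:
        if cum + f >= target:
            # lower - 0.5 is the lower real limit L; cum is Cfb; f is fw.
            return (lower - 0.5) + (target - cum) / f * width
        cum += f

classes = [(30, 1), (35, 4), (40, 3), (45, 9), (50, 10), (55, 8), (60, 4), (65, 3)]
q1 = grouped_percentile(classes, 0.25)   # about 45.9
q3 = grouped_percentile(classes, 0.75)   # about 57.3
print((q3 - q1) / 2)                     # quartile deviation, about 5.7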
You will find that the two measures of dispersion (variability) discussed above, i.e. range and
quartile deviation, do not explicitly take into consideration the values of each and every one of
the raw scores in the distribution. To arrive at a more reliable indicator of the variability (or
spread or dispersion) in a distribution, one should consider the value of each individual score and
determine the amount by which each varies from the most expected value (the mean) of the
distribution. Recall that the mean was identified as the most stable measure of central tendency.
Thus, by deriving an index based on how each score deviates from the mean score, one will have
a considerably stable, or very stable, index of variability in a distribution.
The deviation scores provide a good basis for measuring the spread of scores in a distribution.
However, we cannot use the sum of these deviations in order to get an index of spread because this
sum in any distribution will be zero.
∑(Xᵢ − μ) = 0 or ∑(Xᵢ − X̄) = 0
We know why this sum is zero in every distribution: we are summing negative and positive
values which happen to balance exactly. Consequently, there are ways of rectifying this so that
our index truly measures variability, with the value obtained indicating the magnitude of the
variability rather than being the same (zero) in every case.
Considering all the above, two possible ways of getting an index of variability are:
1. Considering the sum of the absolute deviations and dividing by the total number of cases (n).
This is the absolute mean deviation, or simply the mean deviation:

Mean deviation = ∑|Xᵢ − X̄| / n, where ∑|Xᵢ − X̄| is the sum of the absolute deviations
(disregarding plus and minus signs).

Note that we divide by the size of the sample, n (or N when we consider a population), to make
our index independent of the size of the sample (or population); thus, as long as there is the same
amount of spread, the index is comparable across groups of different sizes.
Computing the mean deviation is very easy. Suppose we have the scores 3, 4, 5, 5, 6, 8, 9 and 10.

Xi    Xi − X̄    |Xi − X̄|
3     -3.25     3.25
4     -2.25     2.25
5     -1.25     1.25
5     -1.25     1.25
6     -0.25     0.25
8     1.75      1.75
9     2.75      2.75
10    3.75      3.75
∑Xi = 50        ∑|Xi − X̄| = 16.5

X̄ = 50/8 = 6.25

Note that ∑(Xi − X̄) = 0.

Mean deviation = ∑|Xi − X̄| / n = 16.5/8 = 2.0625
2. The second way is to square the actual deviations, add them together and then
divide by n (i.e. the total number of cases). The index obtained is known as the variance:

Variance = ∑(Xᵢ − X̄)² / n
The variance is not in the unit of the scores and hence is not a measure of variability [recall
cm is the unit for length while cm² is the unit for area]. In order to put the measure of variability
into the right perspective, that is, to return to our original unit of measurement, we take the
square root of the variance. The square root of the variance is the standard deviation, which is
a measure of variability and the most reliable measure of variability.

The standard deviation (S) is:

S = √( ∑(Xᵢ − X̄)² / n ), i.e. S = √(variance)
Observe that the standard deviation and the mean deviation have the same units. The standard
deviation has better properties than the mean deviation and is hence more reliable or stable.

The computational formula for the variance is:

S²x = ( ∑Xᵢ² − (∑Xᵢ)²/n ) / n

The subscript x on S² simply indicates that we are looking for the variance of the X-variable
(the scores).
The computational formula for the variance has a great advantage over the definitional formula,
∑(Xᵢ − X̄)²/n, since in the computational formula the raw scores are used directly without first
resorting to determination of the mean. The computational formula is also easy to use when
some scores in a distribution are fractional, i.e. not whole numbers, as already indicated.
Computation of the variance (S²x) and standard deviation (Sx)

We shall use the scores 3, 4, 5, 5, 6, 8, 9 and 10 to compute the variance and the standard
deviation using both the definitional and computational formulae. You will notice that the
computational formula is easier to use than the definitional formula.
Xi    Xi − X̄    (Xi − X̄)²
3     -3.25     10.5625
4     -2.25     5.0625
5     -1.25     1.5625
5     -1.25     1.5625
6     -0.25     0.0625
8     1.75      3.0625
9     2.75      7.5625
10    3.75      14.0625
                ∑(Xi − X̄)² = 43.5

Variance, S²x = ∑(Xi − X̄)²/n = 43.5/8 = 5.4375

Standard deviation, Sx = √5.4375 = 2.33
Xi     Xi²
3      9
4      16
5      25
5      25
6      36
8      64
9      81
10     100
Sum    50     356

Thus ∑Xi = 50 and ∑Xi² = 356.

Variance, S²x = ( ∑Xi² − (∑Xi)²/n ) / n
= (356 − (50 × 50)/8) / 8
= (356 − 312.5)/8
= 43.5/8
= 5.4375

Standard deviation, Sx = √5.4375 = 2.33
Note that even for a simple case like the one above, the definitional formula method tended to be
unwieldy, while the computational formula method is rather straightforward.
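As a quick check that the definitional and computational formulae agree, here is a minimal Python sketch using the same eight scores:

from math import sqrt

scores = [3, 4, 5, 5, 6, 8, 9, 10]
n = len(scores)
mean = sum(scores) / n

var_def = sum((x - mean) ** 2 for x in scores) / n                   # definitional
var_comp = (sum(x * x for x in scores) - sum(scores) ** 2 / n) / n   # computational

print(var_def, var_comp, sqrt(var_def))   # 5.4375 5.4375 2.33...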
Effect on the variance, S², when a constant (2) is added to each score

X + C = original score + 2    (X + C)²
5                             25
6                             36
7                             49
7                             49
8                             64
10                            100
11                            121
12                            144
∑(X + C) = 66                 ∑(X + C)² = 588

S² = ( ∑(X + C)² − (∑(X + C))²/n ) / n
= (588 − (66 × 66)/8) / 8
= (588 − 544.5)/8
= 5.4375

Adding 2 to each score did not change the value of the variance, S². In general, adding or
subtracting a constant to each score in a group will not change the variance (nor the standard
deviation) of the scores.
What would happen to S² if each score were multiplied by a constant, say 2? Let us illustrate
this with the same set of scores, i.e. 3, 4, 5, 5, 6, 8, 9 and 10. Multiplying each score by 2 gives
6, 8, 10, 10, 12, 16, 18 and 20, so that ∑2Xi = 100 and ∑(2Xi)² = 1424.

Effect on the variance, S², when each score is multiplied by a constant

S² = (1424 − 100²/8) / 8
= (1424 − 1250)/8
= 21.75

Note that 21.75 is 4 times, or 2², times 5.4375. In general, multiplying each score by a constant
C makes the variance of the resulting scores equal to C²S². However, dividing each score by a
constant C makes the variance of the resulting scores equal to S²/C². Thus we have come up with
two important properties of the variance:

1. Var(X + C) = Var(X); i.e. adding a constant to every score in the
distribution does not change the variance at all, and consequently the standard deviation
is not affected either.
2. Var(CX) = C²·Var(X); i.e. when each and every score in the distribution is multiplied by a
constant, the resulting scores have a variance equal to the constant
squared times the original variance.
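A quick numerical check of these two properties, in the same minimal Python style as before:

scores = [3, 4, 5, 5, 6, 8, 9, 10]

def variance(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

print(variance(scores))                   # 5.4375
print(variance([x + 2 for x in scores]))  # 5.4375: unchanged by adding a constant
print(variance([2 * x for x in scores]))  # 21.75 = 2**2 * 5.4375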
LESSON ELEVEN
MEASURES OF RELATIONSHIP
The relationship or association between two variables is an important concept in research or any
study. If the relationship between two variables is known and is strong enough, one variable can
be predicted from the other.
Measures of Relationship
Quite often, we are interested in finding out how two variables are related to each other. The
kind of question that may come to our mind is: are those who do well in mathematics, the very
ones who do well in science? Or the same question put in different words would be: how is
performance in mathematics related to performance in science?
When two measures are related, the term correlation (or association) is used to describe
this fact. Two senses of correlation can be distinguished: correlation that merely describes the
presence or absence of a relationship, and correlation that shows the degree or magnitude of
the relationship between two measures.
Two ways of detecting the presence or absence of a relationship include logical examination of
the two measures and plotting a scatter diagram.
Logical examination
This can be demonstrated by an example. Suppose one postulates (assumes or guesses) that
mathematics scores are related to science scores. Then suppose that the following data linking
mathematics and science scores are obtained. What would logical examination of pairs of the
data suggest?
Examination of data for case I above, suggests that some logical relationship exists between
mathematics and science scores. It may neither be necessary nor possible to get an index of
degree of relationship using this method, but one can at least be confident that some relationship
is present. However, suppose one was examining the data below in our case II, would one be
justified in concluding that some logical relationship exists between the two sets of scores?
Examination of the data for our case II shows that there is no logic in the
differences in science and mathematics scores. Hence the relationship between
them cannot be defended logically. One can only conclude that there is no
relationship. We next look at the relationship between two variables depicted
graphically to study the kind of relationship between the two variables. This is
done by means of a scatter diagram.
Scatter diagram
A scatter diagram is a graph of data points based on two measures (variables), where one
measure defines the horizontal axis and the other defines the vertical axis. In other words, when
we depict graphically a relationship between two variables, that graph (or presentation) is
referred as a scatter diagram or scattergram, also called scatterplot.
[Scatter diagram for Case I: science scores plotted against maths scores, with the points rising
together in a clear pattern.]
Observe here in case I, one could draw a straight line through the scatter diagram in such a way
that it would approximate the pattern of points. The pattern of points in this case suggests a
highly positive relationship. This means that as mathematics scores increase, there is a
corresponding increase in science scores. Our case II, depicted below is that of, there is no
systematic relationship between the two variables. The points do not show any distinct pattern.
This scatter diagram suggests the absence of a relationship.
[Scatter diagram for Case II: science scores plotted against maths scores, with the points
showing no distinct pattern.]
A scatter diagram does not provide a very precise measure of the relationship. We definitely need
a more precise measure (or index) of relationship.
Three of the methods that provide precise measures of relationship between variables are:
covariance, Pearson product-moment correlation coefficient and Spearman rank correlation
coefficient.
Covariance
Covariance provides some information on the degree of relationship between two variables by
a simple averaging procedure. Let us illustrate this by a case where we want to determine the
covariance between mathematics (X) and science (Y) scores using data generated from n
students. Each of the n students provide two scores i.e. one mathematics score (X) and one
science score (Y).
The first step in the determination of the covariance involves obtaining the product of the
deviations of the two scores (X and Y) from their respective means; in summary, (Xᵢ − X̄)(Yᵢ − Ȳ).
The procedure is repeated for all the n students. The products of the deviation scores are then
summed, and the sum is divided by n (the total number of students). The quantity obtained is the
covariance, a more precise measure of relationship:

Cov(X, Y) = ∑(Xᵢ − X̄)(Yᵢ − Ȳ) / n
Note that if a student’s scores are high on both variables X and Y, the product of the deviations,
(Xᵢ − X̄)(Yᵢ − Ȳ), will be large and positive for him or her.
If the student has low scores on both variables, both deviations will be negative, and since the
product of two negative numbers is positive, the product will again be large and positive.
For example, consider the data X: 7, 8, 9 and Y: 40, 50, 60.
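A minimal Python sketch of the covariance computation for this small data set (the function name is hypothetical):

def covariance(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    # Average the products of paired deviations from the two means.
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n

print(covariance([7, 8, 9], [40, 50, 60]))   # 6.67: positive, so X and Y rise together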
LESSON TWELVE
The process of deviating both the X and Y values around their respective means has made the
quantity Cov(X, Y) independent of the means of the values.
To make the desired measure of relationship independent of the standard deviations of the two
groups’ values as well, one need only divide the covariance, Cov(X, Y), by the standard
deviations of the two variables, Sx and Sy.
The result is the desired measure of relationship between X and Y. It is called the Pearson
product-moment correlation coefficient and is denoted by rxy
i.e. rxy = Cov(X, Y) / (SxSy)
The above formula can be simplified and presented in a slightly different form to get what is
commonly referred to as the definitional formula:

rxy = Cov(X, Y) / (SxSy)
= [ (1/n)∑(Xᵢ − X̄)(Yᵢ − Ȳ) ] / [ √(∑(Xᵢ − X̄)²/n) × √(∑(Yᵢ − Ȳ)²/n) ]
= ∑(Xᵢ − X̄)(Yᵢ − Ȳ) / √( ∑(Xᵢ − X̄)² × ∑(Yᵢ − Ȳ)² )

For the worked example, the sums are ∑(Xᵢ − X̄)(Yᵢ − Ȳ) = 1360, ∑(Xᵢ − X̄)² = 1522 and
∑(Yᵢ − Ȳ)² = 1270, so

rxy = 1360 / √(1522 × 1270) = 1360/1390 = 0.978
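The definitional formula translates directly into Python. This is a minimal sketch; the paired score lists are hypothetical, not the worked-example data:

from math import sqrt

def pearson_r(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))  # co-deviation sum
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

maths = [45, 60, 72, 80, 95]      # hypothetical paired scores
science = [50, 58, 70, 85, 92]
print(round(pearson_r(maths, science), 3))   # about 0.98: a strong positive relationship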
The Pearson product moment correlation coefficient, rxy, varies between –1.00 and 1.00. A value
of 1.00 is a perfect positive relationship. This means that an increase in one variable is
accompanied by a commensurate increase in the other variable. A value of 0.00 (zero) indicates
no relationship. A value of –1.00 is a perfect negative relationship. Values in between the
two extremes (minimum, -1 and maximum, +1) are judged low to high depending on the size.
The scatter diagrams indicating rxy’s of different sizes are shown below. The interpretation of rxy
itself is relative. That is, we cannot say that a correlation of 0.5 is twice as strong as a
correlation of 0.25, but only that it is stronger; only this kind of ordinal thinking is appropriate.
[Three scatter diagrams of Y scores plotted against X scores, illustrating correlation coefficients
of different sizes.]
Adding, subtracting or multiplying every score in the two distributions, X and Y by a constant
has no effect on the size of the correlation coefficient between X and Y. Adding, subtracting or
multiplying every score in a distribution by a constant is an example of a linear transformation.
In a linear transformation, the scores retain their relative positions and hence the correlation
coefficient between two sets of scores, X and Y is not affected by the transformation.
Curvilinearity
Four assumptions need to be met by the data of the two variables, X and Y, for the index rxy to
be meaningful, in the sense that it accurately measures the relationship. The assumptions
are:
1. The relationship between the two variables must be linear
2. The two distributions should have similar shapes
3. The scatter diagram should be homoscedastic (equal variance of X and Y distribution)
4. The data should be based on at least interval scale of measurement.
Concerning the linear assumption, rxy is a measure of linear relationship between X and Y. If X
and Y are perfectly linearly related, the points in the scatter diagram will fall on a single straight
line. If we scatter the points in such a scatter diagram above and below the line in a haphazard
manner and about the same distance in each direction, we obtain various degrees of basically
linear relationship between X and Y. In both cases, the assumption of linearity would be met.
However, if there is some evidence that the relationship between X and Y is curvilinear, then the
assumption of linearity would have been violated, and the use of rxy would underestimate the real
magnitude of association between X and Y.
The third assumption has to do with homoscedasticity, or the equal variance of the X and Y
distributions. In simple terms, the requirement here is that the points on a scattergram showing
the relationship between X and Y should be uniformly distributed: no part of the scattergram (or
scatter diagram) should have markedly more points than others, and the density of points should
be nearly the same throughout. Whenever the scattergram is homoscedastic, the spread of the Y
values is about the same at every value of X.
The fourth assumption has to do with the level or scale of measurement. The Pearson product-
moment correlation can only be used if the data from the two variables being correlated are
based on an interval scale of measurement or on a ratio scale of measurement. If the data are
based on ordinal (ranked) scales of measurement, then the Spearman rank-order correlation
coefficient, ρ (rho), should be used.
There is such an index, one that does not require our data to meet the four assumptions above;
its assumptions are minimal. The index is the Spearman rank-order correlation coefficient. It is
not the only one in this series of what we may refer to as non-parametric statistics. Note that a
price is paid for the minimal assumptions; it seems there are no free things in this world, but we
shall not go into that here.
The Spearman rank correlation uses data that are in the form of ranks (i.e. ordinal data). Thus, if
both of the two variables to be correlated are measured on an ordinal scale (rank-order scale),
the Spearman rank coefficient is the technique generally applied. The formula for obtaining the
Spearman rank-order correlation coefficient, ρ (rho), is:
ρ = 1 − (6∑dᵢ²) / (n(n² − 1))

where ρ (rho) is the Spearman correlation index, dᵢ is the difference between the ranks of the
two variables for individual i, and n is the number of individuals.
Table II
Scores   Position   Rank
39       1          1
38       2          2
36       3          4
36       3          4
36       3          4
35       6          6
30       7          7
The first column provides interval data in the two tables above. Column two provides the
positions of the scores, while their ranks are given in the third column. Rank is similar to
position except where some scores have “ties”. Ties are those scores obtained by more than one
individual. Ranks of scores with ties are the mean of the ranks they would occupy if no tie
existed. Thus, in the case of Table I, the two 36’s would have occupied ranks 3 and 4 if the tie
did not exist; since they tie, the mean of 3 and 4, i.e. (3+4)/2 = 3.5, is assigned to the two scores.
For Table II, the three 36’s have rank (3+4+5)/3 = 4.
To illustrate the calculation of rho, let us again use the data used to illustrate the calculation of
the Pearson product-moment correlation coefficient, with a slight modification. Since the data
are based on interval measurement, they are converted to ordinal data by the assignment of
ranks in the third and fourth columns. The fifth column gives |dᵢ|, the absolute difference
between the ranks of the X and Y variables for the respective subjects. The sixth column gives
the values of the squared differences between ranks, dᵢ². For this data set, n = 5 and ∑dᵢ² = 2.5.
ρ = 1 − (6∑dᵢ²) / (n(n² − 1))
= 1 − (6 × 2.5) / (5(5² − 1))
= 1 − 15/120
= 1 − 0.125 = 0.875
The ρ (rho) is interpreted in the same way as rxy. The value of rho can never be less than −1
nor greater than +1. It equals +1 only if each person has exactly the same rank on both X
and Y. It is −1 if there are no ties and the order is completely reversed for the two variables,
such that the first in one variable is the last in the other, and so forth.
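Here is a minimal Python sketch of rho with mean ranks for ties, following the ranking convention described above; the two score lists are hypothetical:

def ranks(xs):
    # Assign ranks (1 = highest); tied scores share the mean of their ranks.
    order = sorted(range(len(xs)), key=lambda i: -xs[i])
    r = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j < len(order) and xs[order[j]] == xs[order[i]]:
            j += 1
        mean_rank = (i + 1 + j) / 2      # mean of positions i+1 .. j
        for k in order[i:j]:
            r[k] = mean_rank
        i = j
    return r

def spearman_rho(xs, ys):
    rx, ry = ranks(xs), ranks(ys)
    n = len(xs)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))   # sum of squared rank differences
    return 1 - 6 * d2 / (n * (n ** 2 - 1))

print(spearman_rho([39, 38, 36, 36, 35], [20, 18, 19, 15, 14]))   # 0.825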
Note
1. Although the Spearman correlation coefficient formula is simpler and does not look
much like the computational formula we used for Pearson correlation coefficient, it is
algebraically equivalent to the Pearson when it is used with ranked data instead of the
interval data.
2. Tie places are easily handled by assigning the mean value of ranks to each of the tie
holders.
3. If a very large number of ties occurs, however, you would probably be wise to reconsider
the use of the Spearman (rho) coefficient; other non-parametric methods, such as Kendall’s
tau or chi-square, may be more appropriate.
4. Ranking can be done from the smallest score or from the largest, as long as you stick to the
same convention throughout.
5. If there are no ties in the data, Spearman coefficient is merely what one obtains by
replacing the observations by their ranks and then computing Pearson product moment
correlation coefficient of ranks.