
MURANG’A UNIVERSITY OF TECHNOLOGY

SCHOOL OF EDUCATION

TDP 301: EDUCATIONAL MEASUREMENTS AND EVALUATION

LESSON ONE
INTRODUCTION TO EDUCATIONAL MEASUREMENT & EVALUATION
Introduction
Educational measurement and evaluation is the study of methods, approaches and strategies used to
measure, assess and evaluate in the educational setting. Evaluation has been conceived either as the
assessment of the merit and worth of educational programmes (Guba and Lincoln, 1981; Glatthorn, 1987;
and Scriven, 1991), or as the acquisition and analysis of information on a given educational programme for
the purpose of decision making (Nevo, 1986; Shiundu & Omulando, 1992). The course involves the study
of tests, test construction and item construction as well as statistical procedures used to analyze tests and
test results.

Meaning of Terms and Concepts


(Measurement, assessment, evaluation, tests, diagnostic evaluation, formative and summative evaluation)

Test: A method to determine a student's ability to complete certain tasks or demonstrate mastery of a skill
or knowledge of content. Some types would be multiple choice tests, or a weekly spelling test. While it is
commonly used interchangeably with assessment, or even evaluation, it can be distinguished by the fact that
a test is one form of an assessment. A test or assessment yields information relative to an objective or goal.
In that sense, we test or assess to determine whether or not an objective or goal has been obtained.

Assessment: Assessment is a process by which information is obtained relative to some known objective or
goal. Assessment is a broad term that includes testing. A test is a special form of assessment. Tests are
assessments made under contrived circumstances especially so that they may be administered. In other

words, all tests are assessments, but not all assessments are tests. An assessment may also include methods
such as observations, interviews, behavior monitoring, etc. We test at the end of a lesson or unit. We assess
progress at the end of a school year through testing, and we assess verbal and quantitative skills through
such instruments as the SAT and GRE. Whether implicit or explicit, assessment is most usefully connected
to some goal or objective for which the assessment is designed.
Assessment is also the process of gathering information to monitor progress and to make educational decisions where necessary. Assessment is therefore quite different from measurement, and has uses that suggest very different purposes.
Assessment of skill attainment is rather straightforward. Either the skill exists at some acceptable level or it
doesn’t. Skills are readily demonstrable. Assessment of understanding is much more difficult and complex.
Skills can be practiced; understandings cannot. We can assess a person’s
knowledge in a variety of ways, but there is always a leap, an inference that we make about what a person
does in relation to what it signifies about what he knows.

Evaluation: Procedures used to determine whether the subject (i.e. the student) meets preset criteria, such as qualifying for special education services. This uses assessment (remember that an assessment may be a test) to make a determination of qualification in accordance with predetermined
criteria. Evaluation is perhaps the most complex and least understood of the terms. Inherent in the idea of
evaluation is "value." When we evaluate, what we are doing is engaging in some process that is designed to
provide information that will help us make a judgment about a given situation. Generally, any evaluation
process requires information about the situation in question. When we evaluate, we are saying that the
process will yield information regarding the worthiness, appropriateness, goodness, validity, legality, etc.,
of something for which a reliable measurement or assessment has been made.
We evaluate every day. Teachers, in particular, are constantly evaluating students, and such evaluations are
usually done in the context of comparisons between what was intended (learning, progress, behavior) and
what was obtained. When used in a learning objective, the definition of evaluate is: To classify objects,
situations, people, conditions, etc., according to defined criteria of quality. Indication of quality must be
given in the defined criteria of each class category.

Measurement refers to the process by which the attributes or dimensions of some physical object are
determined. One exception seems to be in the use of the word measure in determining the IQ of a person,
attitudes or preferences.
However, when we measure, we generally use some standard instrument to determine how big, tall, heavy,
voluminous, hot, cold, fast, or straight something actually is. Standard instruments refer to instruments such
as rulers, scales, thermometers, pressure gauges, etc. We measure to obtain information about what is. Such
information may or may not be useful, depending on the accuracy of the instruments we use, and our skill at
using them.
To sum up, we measure distance, we assess learning, and we evaluate results in terms of some set of
criteria. These three terms are certainly connected, but it is useful to think of them as separate but connected
ideas and processes.

Assessment:
This is the use of both formal and informal data gathering procedures to establish the extent to which
learners have gained the required knowledge, skills, values and attitudes following instruction. The results
of an assessment are used in decision making.

Types of Assessment
Diagnostic Assessment
This is carried out before instruction to determine whether or not students possess certain entry behavior
and during instruction to help the teacher determine the difficulties students are experiencing.
Formative Assessment
It takes place during instruction to provide feedback to teachers and students on students’ progress towards
attainment of desired objectives and to identify areas that need further attention.
Types of formative assessment
a) Oral questions
b) Written Assignments (Take home assignments)
c) Classwork
d) Question/Answer session
e) In- Class activities
f) Student feedback
Summative Assessment
It is carried out at the end of a unit, chapter, term or year to measure student progress during a given time
span. Summative assessment is used mainly for:
(a) Grading learners
(b) Certifying learners
(c) Judging the effectiveness of the teacher
(d) Comparing the performance of students, schools and districts.
Types of Summative assessment
i) Projects
ii) Term papers
iii) End of course examinations
iv) Portfolios
v) Student evaluation of the course
vi) Instructor's self-evaluation

Importance of assessments to a teacher
More specifically, assessment is a method the teacher uses to make decisions on learners’ progress. It is an
essential process in teaching and learning as it enables the teacher to evaluate the level and extent of
learners’ achievement of the set objectives.
Assessment enables the teacher to:
i. Determine the level of achievement of set objectives
ii. Determine how much knowledge the learners have grasped
iii. Establish how well the learners have mastered the skills taught and acquired the intended attitudes
iv. Detect the difficulties and challenges learners are encountering, which forms the basis for remedial teaching
v. Check the effectiveness of the use of resources and methods of instruction
vi. Provide a basis for learner promotion and reward
vii. Provide information to school administration, parents and stakeholders for necessary action
viii. Motivate and direct learning
ix. Provide feedback to students on their performance
x. Provide feedback on instruction and/or the curriculum

Measurement and Evaluation


Measurement refers to the process of assigning numerals to events, objects, etc., according to certain rules. Measurement is the process of determining the presence or absence and the amount or type of characteristics or behaviors possessed by an individual, group or program, and then assigning a number, score or rating to the entity.
We can measure characteristics by using tests, observations, rating scales or any other device that allows
us to obtain information in a quantitative form.
Evaluation
Evaluation involves the process of making judgment. It is the use of the results of an assessment in making
decisions in relation to student learning. Teachers assess the learners so that they collect evidence to use in
decision making about the quality of their teaching. There is always a standard which informs the decision.
The standard may be absolute or arbitrary. For example, a typist typing 80 words per minute may be described as a 'Grade A' typist, while a child who is 3 ft tall may be described as 'short'. In evaluation one
makes some value judgement based on some standard.
Importance of Measurement and Evaluation in Education
Assessment is important because it drives students' learning (Brissenden and Slater, n.d.). Whether we like it
or not, most students tend to focus their energies on the best or most expeditious way to pass their ‘tests.’
Based on this knowledge, we can use our assessment strategies to manipulate the kinds of learning that
takes place. For example, assessment strategies that focus predominantly on recall of knowledge will likely
promote superficial learning. On the other hand, if we choose assessment strategies that demand critical
thinking or creative problem- solving, we are likely to realize a higher level of student performance or
achievement.
Good assessment can help students become more effective self-directed learners (Angelo and Cross,
1993). Well-designed assessment strategies also play a critical role in educational decision-making and are a vital component of ongoing quality improvement processes at the lesson, course and/or curriculum level.
Characteristics of Measurement
1. All measurements contain errors.
2. All measurements are approximations.
3. The results of repeated measurements do not agree exactly.
Sources of Errors
The following are the main sources of errors in measurement.
1. Personal errors, such as errors in the manipulative skill of the person making the measurement.
2. Errors in measuring instruments
3. Errors in method e.g. using an instrument under conditions for which it was not intended.
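These characteristics and error sources can be illustrated with a short simulation. The Python sketch below is a minimal illustration only; the "true" length and the error range are invented for the example.

import random

# Hypothetical illustration: repeatedly measuring a 25.0 cm object with
# an instrument that is read with a small random error each time.
TRUE_LENGTH_CM = 25.0

def measure():
    # Each reading is the true value plus a random error of up to +/- 0.3 cm
    # (standing in for personal and instrument errors).
    return TRUE_LENGTH_CM + random.uniform(-0.3, 0.3)

readings = [measure() for _ in range(5)]
print("Individual readings:", [round(r, 2) for r in readings])
print("Mean of readings:", round(sum(readings) / len(readings), 2))
# The repeated readings do not agree exactly (all measurements contain
# errors and are approximations), but their mean is usually a closer
# approximation of the true value.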

Measurement Scales or levels of measurement


There are four levels of measurement
(i) Nominal scale
(ii) Ordinal scale
(iii) Interval scale
(iv) Ratio scale
Nominal Scale
This is the lowest level of measurement. In nominal measurement numbers are used to identify, name or
classify objects, events, groups etc.
Nominal scales merely identify different categories. They have no property of magnitude; for example, people may be classified as:
(1) Hindu
(2) Muslim
(3) Sikh
(4) Christian

None of these categories is greater than another; the numbers are used only for identification.
Ordinal Scale
This is the second level of measurement. In ordinal measurement, numbers denote the rank order of the
objects or the individuals. Ordinal measures reflect which person or object is larger or smaller, heavier or
lighter, harder or softer etc.
Socio-economic status is a good example of ordinal measurement because every member of the upper class
is higher in social prestige than every member of the middle and lower class.
The drawback of ordinal measurement is that ordinal measures are not absolute quantities, nor do they convey that the distances between the different rank values are equal. Ordinal measurements do not show the distances between the values and have no absolute zero point.
Interval Scale
This is the third level of measurement and includes all the characteristics of the nominal and ordinal scale of
measurement. Interval scales provide information about the distance between the units and the ordering of
the magnitude of the measure, but lack an absolute zero point. The zero is arbitrary when measuring
attitude, aptitude, and temperature.
Ratio Scale
This is the highest level of measurement and has all the properties of nominal, ordinal and interval scales
plus an absolute or true zero point. Common examples of ratio scale are the measures of weight, width,
length, loudness etc.
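The practical difference between the four scales lies in which comparisons are meaningful. The short Python sketch below illustrates this; the data values are invented for the example.

# Nominal: labels only -- only equality checks are meaningful.
religions = ["Hindu", "Muslim", "Sikh", "Christian"]
print(religions[0] == religions[3])        # False; no category is "greater"

# Ordinal: order is meaningful, but the gaps between ranks are not equal.
ses_rank = {"lower": 1, "middle": 2, "upper": 3}
print(ses_rank["upper"] > ses_rank["lower"])   # True

# Interval: differences are meaningful, ratios are not (arbitrary zero).
temps_c = [10, 20]
print(temps_c[1] - temps_c[0])   # a 10-degree difference is meaningful,
                                 # but 20 C is not "twice as hot" as 10 C

# Ratio: true zero, so both differences and ratios are meaningful.
weights_kg = [30, 60]
print(weights_kg[1] / weights_kg[0])   # 60 kg really is twice 30 kg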
The purpose of measurement and evaluation in education
 Placement of the students
 Selecting students for courses
 Certification
 Stimulating learning
 Improving teaching
 For research purpose
 For guidance and counselling
 For modification of curriculum purposes
 For purpose of selecting students for employment
 For modification of teaching methods
 For the purpose of promotions to the student
 For reporting students’ progress to their parents
 For the award of scholarship and merit awards
 For the admission of students into educational institutions
 For the maintenance of the students

The Nature of Educational Measuring Instruments


Teachers are interested in human attributes such as achievement, attitude, intellectual ability, interest and
motivation. These attributes cannot be measured directly; they are measured through their effects. For this reason, the usual method of measuring these attributes is to present the individual with a series of tasks to which he/she is expected to respond. On the basis of the performance on these tasks, we are able to infer some quantitative estimate of the particular attributes which the tasks were meant to sample. These tasks are
compiled into tests and questionnaires. These are the main measuring instruments in education.
Achievement Tests
These are examinations or tests, which are designed to measure the extent to which students have learned
the intended curriculum.
Aptitude Tests
These tests measure the degree to which students have the capacity to acquire knowledge, skills and
attitudes.
Criterion Referenced Tests
They measure the extent to which prescribed standards have been met. They examine students’ mastery of
educational objectives and are used to determine whether a student has learned specific knowledge and skills.
Norm Referenced Tests
These are tests which compare a student’s performance on the test with that of other students in his/her
cohort. Unlike criterion referenced tests, norm referenced tests are not concerned with determining how
proficient a student is in a particular subject or skill.
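The contrast between the two interpretations can be shown in a short Python sketch; the scores and the mastery cut-off below are invented for the example.

# Hypothetical cohort scores and an arbitrary preset mastery criterion.
cohort_scores = [34, 48, 52, 55, 61, 67, 70, 74, 80, 88]
student_score = 70
mastery_cutoff = 60

# Criterion-referenced interpretation: compare the score to the preset standard.
print("Meets the criterion:", student_score >= mastery_cutoff)

# Norm-referenced interpretation: compare the score to the cohort.
below = sum(1 for s in cohort_scores if s < student_score)
percentile = 100 * below / len(cohort_scores)
print(f"Scored above {percentile:.0f}% of the cohort")
# Note: the percentile says nothing about what the student actually knows --
# the main limitation of normative comparisons noted above.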

Methods of Assessment
Students are assessed using different methods. The most commonly used methods are:

 Written exercises
These are questions set and administered by teachers to students to determine the extent to which students
have acquired specified knowledge and skills.
 Homework assignments
Teachers assign students work to do after classes.
 Check-lists
Checklists are inventories of learning tasks that have been completed and the level of competence that has been achieved.
 Attitude Scales
They consist of statements with which the pupil may express agreement or disagreement; one such scale is the Likert scale. This scale is divided into categories: Strongly Agree, Agree, Undecided, Disagree and Strongly Disagree (a simple scoring sketch follows this list).
 Direct Observation
Another method of assessment is direct observation of students’ activities.
 Oral Exam
Here the student is interviewed and expected to give an oral response, for example in languages such as French.
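The minimal Python sketch below shows one way responses on such a Likert scale might be scored; the statements, the responses and the reverse-scoring convention are invented for the example.

# Conventional 5-point scoring of Likert responses.
SCORE = {"Strongly Agree": 5, "Agree": 4, "Undecided": 3,
         "Disagree": 2, "Strongly Disagree": 1}

responses = {
    "I enjoy mathematics lessons": "Agree",
    "Mathematics is useful in daily life": "Strongly Agree",
    "I avoid mathematics homework": "Disagree",
}
# Negatively worded statements are reverse-scored so that a higher
# total always indicates a more positive attitude.
NEGATIVE_ITEMS = {"I avoid mathematics homework"}

total = 0
for statement, answer in responses.items():
    score = SCORE[answer]
    if statement in NEGATIVE_ITEMS:
        score = 6 - score
    total += score

print("Attitude score:", total, "out of", 5 * len(responses))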

Types and Approaches to Assessment


Numerous terms are used to describe different types and approaches to learner assessment. Although
somewhat arbitrary, it is useful to think of these various terms as representing dichotomous poles (McAlpine, 2002).
Formative <---------------------------------> Summative
Informal <---------------------------------> Formal
Continuous <----------------------------------> Final
Process <---------------------------------> Product
Divergent <---------------------------------> Convergent
Formative vs. Summative Assessment
Formative assessment is designed to assist the learning process by providing feedback to the learner, which can be used to identify strengths and weaknesses and hence improve future performance. Formative
assessment is most appropriate where the results are to be used internally by those involved in the learning
process (students, teachers, curriculum developers).

Summative assessment is used primarily to make decisions for grading or determine readiness for
progression. Typically summative assessment occurs at the end of an educational activity and is designed to
judge the learner’s overall performance. In addition to providing the basis for
grade assignment, summative assessment is used to communicate students’ abilities to external
stakeholders, e.g., administrators and employers.

Informal vs. Formal Assessment


With informal assessment, the judgments are integrated with other tasks, e.g., lecturer feedback on the
answer to a question or preceptor feedback provided while performing a bedside procedure. Informal
assessment is most often used to provide formative feedback. As such, it tends to be less threatening and
thus less stressful to the student. However, informal feedback is prone to high subjectivity or bias.

Formal assessment occurs when students are aware that the task that they are doing is for assessment
purposes, e.g., a written examination. Most formal assessments also are summative in nature and thus tend
to have greater motivational impact and are associated with increased stress. Given their role in decision-
making, formal assessments should be held to higher standards of reliability and validity than informal
assessments.

Continuous vs. Final Assessment


Continuous assessment occurs throughout a learning experience (intermittent is probably a more realistic
term). Continuous assessment is most appropriate when student and/or instructor knowledge of progress or
achievement is needed to determine the subsequent progression or sequence of activities. Continuous
assessment provides both students and teachers with the information needed to improve teaching and
learning in process. Obviously, continuous assessment involves increased effort for both teacher and
student.

Final (or terminal) assessment is that which takes place only at the end of a learning activity. It is most
appropriate when learning can only be assessed as a complete whole rather than as constituent parts.
Typically, final assessment is used for summative decision-making. Obviously, due to its timing, final
assessment cannot be used for formative purposes.

Process vs. Product Assessment

Process assessment focuses on the steps or procedures underlying a particular ability or task, e.g., the
cognitive steps in performing a mathematical operation or the procedure involved in analyzing a blood
sample. Because it provides more detailed information, process assessment is most useful when a student is
learning a new skill and for providing formative feedback to assist in improving performance.
Product assessment focuses on evaluating the result or outcome of a process. Using the above examples, we
would focus on the answer to the math computation or the accuracy of the blood test results. Product
assessment is most appropriate for documenting proficiency or competency in a given skill, i.e., for
summative purposes. In general, product assessments are easier to create than process assessments,
requiring only a specification of the attributes of the final product.
Divergent vs. Convergent Assessment
Divergent assessments are those for which a range of answers or solutions might be considered correct.
Examples include essay tests and solutions to the typical types of indeterminate problems posed in problem-based learning (PBL).
Divergent assessments tend to be more authentic and most appropriate in evaluating higher cognitive skills.
However, these types of assessment are often time consuming to evaluate and the resulting judgments often
exhibit poor reliability.
A convergent assessment has only one correct response (per item). Objective test items are the best example
and demonstrate the value of this approach in assessing knowledge. Obviously, convergent assessments are
easier to evaluate or score than divergent assessments.
Unfortunately, this "ease of use" often leads to widespread application of this approach even when contrary to good assessment practices.

Comparison between Assessment and Evaluation

Dimension | Assessment | Evaluation
Timing | Formative | Summative
Focus of Measurement | Process-Oriented | Product-Oriented
Relationship Between Administrator and Recipient | Reflective | Prescriptive
Findings and Uses | Diagnostic | Judgmental
Modifiability of Criteria, Measures | Flexible | Fixed
Standards of Measurement | Absolute (Individual) | Comparative
Relation Between Objects of A/E | Cooperative | Competitive

From: Apple, D. K., & Krumsieg, K. (1998). Process education teaching institute handbook. Pacific Crest.
Roles of Assessment and Evaluation
1. Identify problem areas in student achievement so as to strengthen the weak points in the
learning process.
2. Reward good performance
3. Evaluate student’s progress at school and recommend ways to improve student learning
4. Monitor the knowledge, skills and attitudes acquired by students.
5. Evaluate the progress of schools in achieving curriculum objectives
6. Compare levels of achievement among several schools or education districts.
7. Identify curriculum areas that may need study or revision.
8. Provide information to teachers about the effectiveness of the teaching method.

Review Questions
1. What is a test?
2. Explain what is meant by measurement in education
3. Distinguish between measurement, assessment and evaluation
4. Differentiate between assessment and evaluation
5. Why are assessment and evaluation important?

6. Describe the importance of measurement and evaluation in education
7. Analyze the 5 different approaches used in educational assessment and evaluation

LESSON TWO

INSTRUCTIONAL / LEARNING OBJECTIVES IN EDUCATIONAL ASSESSMENT


Introduction
This unit discusses the importance of instructional/learning objectives in educational assessment. It then outlines the steps in writing instructional objectives, followed by a description of the taxonomy of educational objectives. It further describes the categories under each of the three domains of educational objectives.

Specific Learning Outcome


After studying this unit you should be able to:
1. Explain the importance of instructional/educational objectives in educational assessment.
2. Describe the various categories of the cognitive, affective and psychomotor domains.
3. Construct objectives under each of the domains.

THE ROLE OF OBJECTIVES IN EDUCATIONAL ASSESSMENT

What is an Instructional Objective?

An objective

 Is an intent communicated by a statement describing a proposed change in a learner


 Is a statement of what the learner is to be like when he/she has successfully completed a
learning experience

An instructional objective describes an intended outcome. A usefully stated objective is expressed in behavioral, or performance, terms that describe what the learner will be doing when demonstrating his/her achievement of the objective. An instructional objective must state the Audience, Behaviour, Condition and Degree (the ABCD of objectives). In addition it must:

a) Describe what the learner will be doing when demonstrating that he/she has reached the objective; i.e., what should the learner be able to do? (Performance)

b) Describe the important conditions under which the learner will demonstrate his/her competence; i.e., under what conditions do you want the learner to be able to do it? (Conditions)

c) Indicate how the learner will be evaluated, or what constitutes acceptable performance; i.e., how well must it be done? (Criterion)

Objectives are statements of intent specifying the behavior expected of learners after they have been subjected to some learning experience. Educational objectives play an important role in educational
assessment. Well written objectives:
 clearly state the behavior that pupils should demonstrate at the end of a period of instruction.
 The measurement of what education has achieved may be useful in determining what education should achieve.
 They provide guidance and direction to teaching and testing.
 Objectives specify in precise terms the student behavior to be measured; hence they help teachers to identify what pupils should learn and indicate to the teacher what questions to ask.
 They also help the teacher in assessing relevant material and act as a standard for measuring learning outcomes.
 They indicate samples of learning outcomes that the test developer is willing to accept as evidence that the stated instructional objectives have been achieved.
 They give direction to education.

INSTRUCTIONAL OBJECTIVES
Instructional objectives are statements of what is to be achieved at the end of the instructional
process. They are, therefore, the subject of assessment and evaluation. This chapter discusses the
importance of instructional objectives, and their formulation.
Importance of instructional/learning objectives
 Instructional/learning objectives communicate to the learners, instructors and other interested people what the learners should be able to do at the end of the lesson.
 Instructional/learning objectives help learners organize their study and avoid getting lost (if learners are informed of them).
 Instructional/learning objectives help the teacher plan learning activities with focus.
 Instructional/learning objectives enable the teacher to select the most appropriate teaching approaches.
 Well written instructional/learning objectives help to save time in developing the lesson.
 Instructional/learning objectives form the basis for the development of the instruction by limiting the content.
 Instructional/learning objectives enable the teacher to identify appropriate teaching and learning resources.
 Instructional/learning objectives form the basis for lesson evaluation.

Steps in Writing Instructional Objectives


Writing instructional objectives (learning outcomes) involves six steps:
1. Decide on the content area. This means defining the limits of what should go into instruction.
2. Use action verbs to identify specific behaviors. The verb should be (a) an observable behavior that produces measurable results and (b) at the highest skill level that the learner would be required to perform.
3. Specify the content area after the verb, for example: calculate averages or compute variances. It is important to specify the content area for clarity. Unspecified content areas would, for example, be: calculate statistical information; compute values needed in economics. These are wide areas and cannot be measured well.
4. Specify applicable conditions. Identify any tools used, information to be supplied or other constraints. For example: given a calculator, calculate the average of a list of numbers; OR without using a calculator, calculate the average of a list of numbers.
5. Specify application criteria. Identify any desired levels of speed, accuracy, quality, quantity etc. For example: given a calculator, calculate averages from a list of numbers correctly, all the time; OR given a spreadsheet package, compute variances from a list of numbers rounded to the second decimal place.
6. Review each learning outcome to be sure it is complete, clear and concise.

Categories of Objectives
Bloom et al. (1956), Krathwohl, Bloom and Masia (1964) and Harrow (1972) categorize the components of learning into what has now become known as the taxonomy of educational objectives.
The categories are:
(a) The cognitive domain
(b) The affective domain
(c) The psychomotor domain
Their main contribution was to emphasize that all three domains of learning exist and, most importantly, that the respective domains of learning should be addressed consciously by educationists in general and by classroom teachers in particular. Beyond identifying the three domains of the taxonomy, they sub-categorized each domain.
The Cognitive Domain
The cognitive domain of educational objectives deals principally with the development of the
learner’s mental abilities or achievements, which include intellectual aptitudes. The taxonomy of
educational objectives identifies six categories of cognitive objectives: knowledge,
comprehension, application, analysis, synthesis and evaluation. Each category is assumed to
include behaviors at the lower levels. Each of these categories is discussed below.
1. Knowledge:
It is defined as “remembering of previously learned material.” At this level students are
expected simply to recall the information previously presented to them. The following
are statements of objectives at the knowledge level.
At the end of the lesson, the student should be able to:
a) Define the term "photosynthesis"
b) List four conditions necessary for photosynthesis.

From the two examples it is evident that knowledge objectives deal with behavior that is
mere rote learning.
2. Comprehension
It is defined as “the ability to grasp the meaning of material.” The following are
examples of objectives at the comprehension level.
At the end of the lesson the student should be able to:
a) Describe the cause of natural changes in landforms.
b) Write a two-paragraph summary of the themes of Imbuga's "Betrayal in the City".
c) Explain what fresh air is.
3. Application
It is defined as “the ability to use learned material in new and concrete situations.”
Examples of objectives at this level are:
At the end of the lesson the student should be able to:
a) Explain how we can control road accidents.
b) Explain how the chief's baraza operates.
c) Compute the area of a rectangle measuring 60 cm by 45 cm, correctly.

4. Analysis
It is defined as the “ability to break down material into its component parts so that its
organizational structure may be understood.” The following are examples of objectives
at the analysis level.
At the end of the lesson, the student should be able to:
a) State the causes of the Mau Mau uprising in Kenya.
b) Explain the advantages and disadvantages of liberalization of the economies of Eastern African countries.
c) Classify people who need help.
d) Draw a time-line showing the development of science and technology in the 20th century.
5. Synthesis
It is defined as the “ability to put parts together to form a new whole.” The following are
some objectives at the synthesis level.

At the end of the lesson, the student should be able to:
a) Conduct a survey of costs of different types of transport.
b) Compose a song for use during the national literacy day.
c) Briefly describe the factors she would consider in searching for a new water supply for Nairobi.
6. Evaluation
It is defined as the “ability to judge the value of material for a given purpose.” Evaluation
is the highest level of the cognitive domain. The following are examples of objectives at
the evaluation level.
At the end of the lesson the student should be able to:
a) Compare and contrast the activities of non-governmental organizations and local government authorities in the provision of non-formal education in Kisumu district.
b) Justify the effects of traffic police on the reduction of road accidents.
c) With specific examples, justify the teaching of mathematics.
d) Give reasons for and against the importation of sugar.

The following are action verbs for different levels of the cognitive domain.
1. Knowledge: define, repeat, name, memorize, arrange, recall, order, recognize, relate
2. Comprehension: translate, discuss, describe, identify, locate, review, explain, classify, recognize
3. Application: use, illustrate, interpret, dramatize, employ, practice, operate, solve
4. Analysis: compare, contrast, differentiate, test, relate, analyze, appraise, calculate, criticize, distinguish, inspect, experiment, examine
5. Synthesis: plan, construct, organize, prepare, create, summarize, design, arrange, manage, collect, formulate, set up, write
6. Evaluation: choose, estimate, measure, prove, revise, evaluate, appraise, value, score, select, argue, assess
The Affective Domain
The affective domain is concerned with the development of, or change in, values, attitudes, interests and appreciations. Most educators feel that it is the responsibility of the school to develop positive values and attitudes, hence the need for teachers to state affective objectives. According to the taxonomy, the affective domain consists of five categories arranged according to the degree of internalization: receiving, responding, valuing, organization and characterization.
Each of these categories is briefly described below:
Receiving. This is the lowest level of the affective domain. It is defined as sensitivity to the
existence of certain phenomena, that is, willingness to receive or attend to them. The following
are examples of objectives at receiving level.
1. The student develops a tolerance for African music.
2. The student patiently listens to a lecture on the dangers of drug abuse.
Responding. This refers to active attending to the phenomena. At this level, learners act out
behaviors that are consistent with people who hold a particular value. (Lorber and Pierce, 1983).
Student responses at this level indicate more than passive listening/attending; they require active
participation. More complete responding would be indicated by a student’s willingness to
engage in an activity, even when allowed a choice. Examples of objectives at the responding
level are:

1. The student indicates interest in protecting the environment by voluntarily reading
magazines designed for people involved in environmental protection.
2. The student demonstrates a commitment to honesty by not cheating.
Valuing. This refers to the worth an individual attaches to an object, phenomenon, or behavior. It implies perceiving something as having worth and consequently showing consistency in behavior related to the object or phenomenon. Below are some examples of objectives stated at this level.
1. The student indicates her commitment to political reforms by writing letters to the press
on the need for political reforms.
2. The student shows commitment to Christianity by becoming a member of one of the
Catholic Organizations in the school.
Organization. Organization is defined as the conceptualization of values and the employment of these concepts for determining the inter-relationships among values. As values are internalized they become increasingly interrelated and prioritized, i.e., they become organized into a value system. This requires that the student conceptualizes a value by analyzing interrelationships and drawing generalizations that reflect the valued idea. In school, students should be offered learning opportunities that help them organize their values. Some examples of these learning opportunities are simulation games, project work and case studies. The following are examples of objectives at the organization level.
1. The student should form judgements as to whether population and family life education
should be taught.
2. The student should balance his/her argument for and against polygamy.
Characterization. This is the fifth level of the affective domain. At this level, the individual
develops a consistent value system, which becomes part and parcel of his or her life style. Such
individuals would never say "do as I say, not as I do". Here are two examples of objectives
stated at this level.
1. The learner should develop a consistent philosophy of life.
2. The student should demonstrate the value of honesty by consistently acting honestly in her
dealings with fellow students.
Behavioral Terms for Objectives in the Major Categories of the Affective Domain
Table 3.1 shows the behavioral terms for objectives in the major categories of the Affective
Domain.

Level of Internalization | Examples of Behavioral Terms | Examples of Questionnaire Items

Receiving | Takes note of value concepts; ask, choose, give, identify, select, use, point to | Would you be presently interested in joining a club that discusses the Bible? Would you like to know more about Jomo Kenyatta?

Responding | Reads willingly, follows instructions, volunteers to help, applauds performances; answers, assists, complies, gives, practices | Is it usually possible for you to go to Church?

Valuing | Helps protect, campaigns actively, supports community organizations; describe, differentiate, follow, invite, justify, initiate, select | Do you support elderly people?

Organization | Compares codes of conduct, defines limits of behavior; alter, arrange, combine, defend, prepare | Have any of the books you read markedly influenced your views about one-party systems of government?

Characterization | Changes behavior in light of value re-organization, consistently demonstrates humanitarianism as rated by peers; act, display, perform, revise, unify | (none)

Source: Lorber and Pierce, 1983.

The Psychomotor Domain


The psychomotor domain deals with manipulative skills and body movements, which are developed by several subjects taught in primary and secondary schools, including music, art and craft, physical education, home science, typing and technical subjects. The psychomotor domain is divided into six categories in ascending order of complexity and sophistication. These are:
1. Reflex movements
2. Basic Fundamental movements
3. Perceptual abilities
4. Physical abilities
5. Skilled movements
6. Non-discursive communication.
Reflex Movements. These are involuntary actions elicited as a response to some stimulus and
are not a concern of educators. These movements are either evident at birth or develop with
maturation.
Basic Fundamental Movements. These skills are developed during the first year of life.
Examples of these skills are crawling, standing up and manipulating objects.
Perceptual Abilities. This is the third level of the psychomotor domain. Perceptual abilities are not observed directly but are inferred from cognitive tasks that students are required to perform. Examples of
objectives at the perceptual ability level are:
1. The student should walk the full distance and back across a balance beam without falling.
2. The student should list the names of four musicians playing jazz from an audio recording
of music.
Physical Abilities. The physical abilities include endurance, strength, flexibility and agility.
Below are examples of objectives stated in this level. The student should be able to:
1. Run a hundred meters in less than ten seconds (endurance).
2. Execute fifty-four push-ups correctly and continuously (strength).
Skilled Movements. They are a result of learning, often complex learning. They result in
efficiency in carrying out a complex movement task. These include simple adaptive skills such
as dancing and typing; compound adaptive skills such as playing tennis, hockey and golf; and
complex adaptive skills such as aerial gymnastic stunts. The following are examples of
objectives in this category:
1. The student should type at a rate of fifty words per minute without making more than six errors.
2. The student should execute two twisting dives with 100 per cent accuracy.

Non-discursive Communication. This category involves non-verbal communications that are
used to convey a message to an observer. Examples of such movements are postures and facial
expressions. A typical objective in this category is: the student will exhibit appropriate gestures
and facial expressions.
Behavioral Terms for Objectives in the Major Categories of the Psychomotor Domain
Table 3.2 shows the behavioral terms for objectives in the major categories of the psychomotor domain.
Table 3.2: Behavioral Terms for Objectives in the Major Categories of the Psychomotor
Domain.
Category | Examples of Behavioral Terms
Perceptual abilities | Maintains balance, bounces ball, differentiates objects, selects by size, draws geometric symbols, writes the alphabet, catches thrown ball, identifies shapes consistently, repeats poem, plays piano from memory
Physical abilities | Runs 3 kilometers, executes push-ups, extends to toes
Skilled movement | Plays the guitar, jumps hurdles, plays tennis, dances to music
Non-discursive communication | Exhibits appropriate gestures and facial expressions, performs original dance

Source: Adapted from Lorber and Pierce, 1983.
Note: It is important to note that setting objectives is very crucial in assessment of learning
outcomes.
Activities
1. For the same content area, construct two objectives each at the knowledge, comprehension,
application, analysis, synthesis and evaluation levels of the taxonomy of cognitive objectives.
2. Identify ways we measure aspects of achievement, intelligence and classroom conduct.
3. Make up examples in your subject area that illustrate mismatches between what is being
tested and how it is being tested.
Written Exercises
1. Distinguish between cognitive, affective and psychomotor domains.
2. Identify and explain
(a) Five levels of cognition
(b) Five categories of the affective domain

(c) Six categories of the psychomotor domain

LESSON THREE
TESTS IN EDUCATIONAL ASSESSMENT
Introduction
This unit discusses the importance of tests in educational assessment. It is followed by a
description of the steps in test construction. It further describes types of tests, their strengths and
their weaknesses.
Specific Objectives
After studying this unit, you should be able to:
1. Give reasons for testing in the classroom.
2. Describe the steps in test development.
3. Explain the various types of tests.
4. Construct various test items

TEST CONSTRUCTION
A test is a collection of items developed to measure human educational or psychological
attributes. It can also be used to make predictions.
Bean (1953) defines a test as an organized succession of stimuli designed to measure quantitatively or to evaluate qualitatively some mental process, trait or characteristic. For example, the reading ability of a child may be measured with the help of a test specially designed for the purpose. His/her reading ability score may be evaluated with respect to the average performance of the reading ability of other children of his/her age or class.
Why Teacher-Made Tests?
It is important for teachers to know how to construct their own tests because of the following:
 Teacher-made tests can be closely related to a teacher's particular objectives and pupils, since he/she knows the needs, strengths and weaknesses of his/her students.
 The teacher can tailor the test to fit her/his particular objectives, fit a class or even fit individual pupils.

 Classroom tests may be used by the teacher to help him/her develop more efficient teaching strategies, e.g., a teacher may develop his/her own tests, administer them to the students as pre-tests and then:
(a) Re-teach some of the information assumed known by the students.
(b) Omit some of the material planned to be taught because the students already know it.
(c) Provide some of the students with remedial instruction while giving other students some enriching experience.
 They are used for diagnosis where the teacher diagnoses the pupil’s strengths and
weaknesses.

Types of Test

Achievement Tests
Achievement refers to what a person has acquired or achieved after the specific training or
instruction has been imparted. Hence achievement tests are designed to measure the effects of a
specific program of instruction or training i.e. the extent to which students have learned the
intended curriculum. Examples of achievement tests are Kenya Certificate of Secondary
Education (KCSE) examination and the Kenya Certificate of Primary Education (KCPE) examination.
Aptitude Test
Tuckman (1975) defines aptitude as "a combination of abilities and other characteristics, whether native or acquired, known or believed to be indicative of an individual's ability to acquire skill or knowledge in a particular area." On the basis of such abilities, the future performance of a child can be predicted. Aptitude tests are tests or examinations that measure the degree to which students have the capacity to acquire knowledge, skills and attitudes. The primary purpose of an aptitude test is to predict what a person can learn; aptitude tests are thus future oriented.
Criterion-referenced Test
Criterion-referenced tests are tests that measure the extent to which prescribed standards have
been met. They examine students’ mastery of educational objectives and are used to determine
whether a student has learned specific knowledge or skills. Criterion-referenced assessment asks
the question: can student X do Z?
Norm Referenced Tests

These are tests which compare a student’s performance on the test with that of other students in
his/her cohort. For example, scores on a test given to Form II students can be compared to the
scores of other students in Form II. Unlike criterion-referenced tests, norm referenced tests are
not concerned with determining how proficient a student is in a particular subject or skill. The
main problem with making normative comparisons is that they do not indicate what students
know or do not know.

Steps in Test Development


For a teacher to develop a good test he/she needs to follow the following steps:
1. Planning
Good tests do not just happen; they require adequate and extensive planning so that the
instructional objectives, the teaching strategy to be employed, the teaching material and the
evaluative procedures are all related in some meaningful fashion.
At this stage the teacher specifies the:
a. Purpose of the Test
To be helpful, classroom tests must be related to the teacher's instructional objectives, which in turn must be related to the methods used, and eventually to the use of the test results.
Purpose of Classroom Tests
Why do we give exams to students? Classroom achievement tests serve a variety of
purposes such as:
 Judging the pupil’s mastery of certain essential skills and knowledge
 To evaluate and grade students. Exams provide a
controlled environment for independent work and so are
often used to verify the state of students’ learning.
 To motivate students to study. Students do tend to open their books more
often when an evaluation is coming up. Exams can be great motivators.
 To add variety to student learning. Exams are a form of learning
activity. They can enable students to see the material from a
different perspective. They also provide feedback that students
can then use to improve their understanding.
 To identify faults and correct them. Exams enable both students and
instructors to identify which areas of the material taught are not
being understood properly. This allows students to seek help, and
instructors to address areas that may need more attention, thus
enabling student progression and improvement.
 To obtain feedback. You can use exams to evaluate your own

Page 26 of 129
teaching. Students’ performance on the exam will pinpoint
areas where you should spend more time or change your
current approach.
 To provide statistics for the course or institution. Institutions often
want information on how students are doing. How many are
passing and failing, and what is the average achievement in class?
Exams can provide this information.
 To accredit qualified students. Certain professions demand that
students demonstrate the acquisition of certain skills or knowledge.
An exam can provide such proof – for example, ensuring standards of
progression are met
 To find out students’ progress
 To diagnose students difficulties
 To report students’ progress to the stake holders
 To motivate students
 To compare performance between class
 To select students with a certain aptitude
 To predict performance
 Measuring growth over time
 Ranking pupils in terms of their achievement of particular instructional
objectives
 Diagnosing pupils difficulties
 Evaluating the teacher’s instructional method
 Ascertaining the effectiveness of the curriculum
 Encouraging good study habits
 Motivating students

The teacher should not hope that because a test can serve many masters it will automatically
serve his/her intended purpose. The teacher must plan for this in advance.
b) What is to be tested?
The next major question the teacher needs to ask himself or herself is: what knowledge, skills and attitudes do I want to measure? Should I test for factual knowledge or should I test the extent to which my students are able to apply their factual knowledge?
c) Decide on the nature of the content or items to be included

What are the qualities / characteristics of a good / fair exam / test?


 Consistency. If you gave the same exam twice to the same students, they should get a similar grade each time (see the correlation sketch after this list).
 Validity. Make sure your questions address what you want to evaluate.
 Realistic expectations. Your exam should contain questions that match the average student's ability level. It should also be possible to respond to all questions in the time allowed. To check the exam, ask a teaching assistant to take the test; if they can't complete it in well under the time permitted then the exam needs to be revised.
 Uses multiple question types. Different students are better at different
types of questions. In order to allow all students to demonstrate their
abilities, exams should include a variety of types of questions.
 Offer multiple ways to obtain full marks. Exams can be highly
stressful and artificial ways to demonstrate knowledge. In recognition of
this, you may want to provide questions that allow multiple ways to
obtain full marks. For example, ask students to list five of the seven
benefits of multiple-choice questions.
 Free of bias. Your students will differ in many ways including language proficiency, socio-economic background, physical disabilities, etc. When constructing an exam, you should keep student differences in mind to watch for ways that the exams could create obstacles for some students. For example, the use of colloquial language could create difficulties for students for whom English is not a first language, and examples easily understood by North American students may be inaccessible to international students.
 Redeemable. An exam should not be the sole opportunity to obtain
marks. There should be other opportunities as well. Assignments and
midterms allow students to practice answering your types of questions
and adapt to your expectations.
 Demanding. An exam that is too easy does not test students’
understanding of the material.
 Transparent marking criteria. Students should know what is expected
of them. They should be able to identify the characteristics of a
satisfactory answer and understand the relative importance of those
characteristics. This can be achieved in many ways; you can provide
feedback on assignments, describe your expectations in class, or post
model solutions on a course website.
 Timely. Spread exams out over the semester. Giving two exams one
week apart doesn’t give students adequate time to receive and respond to
the feedback provided by the first exam. When possible, plan the exams
to fit logically within the flow of the course material. It might be helpful
to place tests at the end of important learning units rather than simply
give a midterm halfway through the semester.
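As a concrete check of consistency, the minimal Python sketch below correlates the scores of the same (hypothetical) students across two sittings of an equivalent exam; the score pairs are invented for the example.

# Requires Python 3.10+ for statistics.correlation (Pearson's r).
from statistics import correlation

first_sitting = [55, 62, 70, 48, 81, 66, 74]
second_sitting = [58, 60, 73, 45, 84, 69, 71]

r = correlation(first_sitting, second_sitting)
print(f"Test-retest correlation: {r:.2f}")
# Values close to 1.0 suggest a consistent (reliable) exam; low values
# suggest that grades depend heavily on the particular sitting.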



2. Prepare the test items: In so doing, consider Bloom's taxonomy of instructional objectives. All the different levels of abilities and skills should be tested.

 Knowledge: verbs to use in the statement of instructional objectives at this level include: define, describe, enumerate, identify, label, list, match, name, read, select, reproduce and state.
 Comprehension: learners should be able to classify, cite, convert, describe, discuss, explain, give examples, paraphrase, restate in own words, summarize, understand, distinguish and rewrite.
 Application: apply, change, compute, modify, predict, prepare, relate, solve, show, use and produce.
 Analysis: at this level, learners should be able to break down, correlate, discriminate, differentiate, distinguish, focus, illustrate, infer, limit, outline, point out, prioritize, recognize, separate, subdivide, select and compare.
 Synthesis: learners at this level can put parts together to form a whole that has a new meaning or structure. Key verbs for this level of learning include: categorize.
 Evaluation: learning at this level includes appraising, comparing and contrasting, defending, judging, interpreting, justifying, discriminating and evaluating.

3. Prepare a Table of Specifications


A table of specifications is a blueprint which defines as clearly as possible the scope and emphasis of the test, relating it to the objectives and the content so as to produce a balanced test. A Table of Specifications is a blueprint for an objective selected-response assessment. The purpose is to coordinate the assessment questions with the time spent on any particular content area, the objectives of the unit being taught, and the level of critical thinking required by the objectives or state standards. The use of a Table of Specifications increases the validity and quality of objective-type assessments.
Importance of table of specification
 Ensures content validity of the test
 Ensures that the measure covers the broad range of areas within the concept under study, which ensures sampling validity
 Ensures that the learners are tested at different levels/domains of learning
 Ensures construct validity and the quality of objective-type assessments
 Provides teachers and their students with a visual approximation of the content that will be tested and the weight it is given on the test
 Coordinates the assessment questions with the time spent on any particular content area, the objectives of the unit being taught, and the level of critical thinking required by the objectives or state standards

Tables of Specifications are created as part of the preparation for the unit, not as an afterthought the night before the test. Knowing what is contained in the assessment, and that the content matches the standards and benchmarks in level of critical thinking, will guide the learning experiences presented to students. Students appreciate knowing what is being assessed and what level of mastery is required.
Any question on an assessment should require students to do three things: first, access
information on the topic of the question. Second, use that knowledge to complete critical
thinking about the information. Third, determine the best answer to the question asked on the
assessment. A Table of Specifications is a two-way chart which describes the topics to be
covered in a test and the number of items or points which will be associated with each topic.
Sometimes the types of items are described as well. The purpose of a Table of Specifications is
to identify the achievement domains being measured and to ensure that a fair and representative
sample of questions appear on the test.
As it is impossible, in a test, to assess every topic from every aspect, a Table of Specifications
allows us to ensure that our test focuses on the most important areas and weights different areas
based on their importance / time spent teaching. A Table of Specifications also gives us the proof
we need to make sure our test has content validity. Tables of Specifications are designed based
on:
 course objectives
 topics covered in class
 amount of time spent on those topics
 emphasis given to topics in the textbook chapters, and
 space provided in the text

A Table of Specification could be designed in 3 simple steps:

1. Identify the domain that is to be assessed

2. Break the domain into levels (e.g. knowledge, comprehension, application …)
3. Construct the table
The more detailed a table of specifications is, the easier it is to construct the test.
A table of specification has two dimensions. The first dimension represents the different abilities
that the teacher wants the pupil to display and the second represents the specific content and
skills to be measured.
A Table of Specification for a 20-Item C.R.E Test for Form 3

Major Content Areas   Knowledge  Comprehension  Application  Analysis  Synthesis  Evaluation  Total
Call of Abraham           2            1             -           1          1          -         5
Passion of Jesus          2            -             1           1          -          1         5
Apostleship               -            2             -           -          1          2         5
Covenant at Sinai         1            -             2           1          -          1         5
Total                     5            3             3           3          2          4        20
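
Since every row and column of the blueprint must add up to the planned number of items, it can help to tabulate the blueprint and verify the totals before item writing begins. The following is a minimal sketch in Python using the C.R.E blueprint above; the data structure is just one convenient choice, not a prescribed format.

# A minimal sketch: store a table of specification and verify its totals.
# Cell values follow the C.R.E example above.
levels = ["Knowledge", "Comprehension", "Application",
          "Analysis", "Synthesis", "Evaluation"]

# blueprint[content_area][cognitive_level] = number of items planned
blueprint = {
    "Call of Abraham":   {"Knowledge": 2, "Comprehension": 1, "Analysis": 1, "Synthesis": 1},
    "Passion of Jesus":  {"Knowledge": 2, "Application": 1, "Analysis": 1, "Evaluation": 1},
    "Apostleship":       {"Comprehension": 2, "Synthesis": 1, "Evaluation": 2},
    "Covenant at Sinai": {"Knowledge": 1, "Application": 2, "Analysis": 1, "Evaluation": 1},
}

# Row totals: items planned per content area.
for area, cells in blueprint.items():
    print(area, "->", sum(cells.values()), "items")

# Column totals: items planned per cognitive level.
for level in levels:
    print(level, "->", sum(cells.get(level, 0) for cells in blueprint.values()), "items")

# The grand total must equal the planned length of the test (20 items here).
grand_total = sum(sum(cells.values()) for cells in blueprint.values())
assert grand_total == 20, "blueprint does not match the planned test length"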

Moderation of the test: This means examining the items set by a group of teachers in the same
subject area. Questions that are ambiguous are identified and feedback is given about the test
prepared. When moderating, attention should be paid to the following:
 Ensure all the questions are within the syllabus
 Ensure that the test has face validity, i.e. by looking at the questions one can tell whether the items are clear and not ambiguous.
 Ensure items are free of bias
 Check that appropriate action verbs are used.
 Check whether the questions can be answered in the time given
 Check whether the items sample the syllabus (content validity)
 Check whether the marks in the question paper and marking scheme tally.

4. Administering the Test
During administration you need to:
 Inform the students in advance about the test and the time that the test will take place.
 Prepare a register of attendance.
 Write the start and end time of the test.
 Ensure there is enough room and there is enough space in the sitting
arrangement.
 Collecting question papers in time from custodian to be able to start the test at
the appropriate time stipulated
 Ensure compliance with the stipulated sitting arrangements in the test to prevent collusion between and among the testees
 Ensure orderly and proper distribution of question papers to the testees
 Make any corrections before the test starts
 Do not talk unnecessarily before the test. Testees' time should not be wasted at the beginning of the test with unnecessary remarks, instructions or threats that may develop test anxiety
 It’s necessary to remind the testees of the need to avoid malpractices before
they start and make it clear that cheating will be penalized
 Stick to the instructions regarding the conduct of the test and avoid giving hints to testees who ask about particular items, but make corrections or clarifications to the testees wherever necessary.
 Keep interruptions during the test to a minimum

5. Marking and Scoring of the Scripts


Guidelines for making marking schemes
 Look at what others have done
 Make a marking scheme usable by non-experts
 Give consequential marks
 Review the marking scheme after the exam
 When marking, make notes on exams

6. Interpret the Scores: statistical tools such as the mean, mode and standard deviation can be used.

DEVELOPING TEST ITEMS


Developing Test Items for Different Cognitive Abilities.
Knowledge
Define the term ………………….
Describe how…………………….
List the ………………………….
State the …………………………
Name the ………………………..
Comprehension
Explain why ………………………….
Illustrate ………………………………..
Give five reasons for ……………………
Classify …………………………………
Compare ……………………………….
Application
What methods should be used to solve a problem …………………………………
Solve ………………………………….
Predict
Using ………………. Demonstrate how
Construct …………………..
Perform mouth to mouth resuscitation
Analysis
Analyze ………………………..
Differentiate …………………..
Criteria for arriving at a conclusion ……………………….
Synthesis

Summarize …………………………
Prepare a plan …………………….
Organize …………………………
Describe reasons for selection of …………………………
Argue for and against …………………………………….
Evaluation
Make an ethical judgement ………………………………
Justify ………………………………..
Critically assess this statement …………………………..

N.B.: The table of specification can aid immensely in the preparation of test items, in the production of a valid and well-balanced test, and in the clarification of objectives to both teacher and students. The table is only a guide; it is not designed to be adhered to strictly.
Relating the test Items to the instructional objective. It is important to obtain a match between a
test’s items and the test’s instructional objectives, which is not guaranteed by the table of
specification. The table of specification only indicates the number or proportion of test items to
be allocated to each of the instructional objectives specified.
Example 1
Objective:
The student will be able to differentiate between assessment and measurement.
Test Item
Distinguish between assessment and measurement.
Example 2
Objective
Students can perform mouth-to-mouth resuscitation on a drowning victim.
Test Item:
Describe the correct procedure for administering mouth-to-mouth resuscitation on a drowning
victim.

In the first example there is a match between the learning outcome and the test item. In the second there is no match, because describing something is not valid evidence that the person can do it.
Writing down the Items
After preparing a table of specification, you should construct test items. There are two types of test items: the free-response type and the choice type.
Free Response (Essay Questions)
They require students to provide their own answers. They are of two types:
a) Restricted type
b) Extended type

The restricted type is one that requires the examinee to supply the answer in one or two lines and is concerned with one central concept (Marshall & Hales, 1972).
An extended essay is one where the examinee’s answer comprises several sentences and is
usually concerned with more than one central concept.
Examples
Restricted Type
Describe the meaning of reliability of an educational test.
Extended Type
Describe five characteristics of a good test
Advantages of Restricted Type of Question
 They cover a wide content
 They are easy to construct
 They are easy to mark
Disadvantage
 They do not test pupils’ ability to organize or criticize.
Advantages of Extended Type
 Easy to set
 Improves students writing skills
 Help develop student ability to organize and express his/her ideas in a logical and
coherent manner.

 Allows for creativity since examinee is required to give a coherent and organized
answer rather than recognize the answer.
 Leaves no room for guess work
 Tests student’s ability to supply rather than select the correct answer
Disadvantages
 Inadequate sampling of the syllabus
 It is disadvantageous to pupils with difficulties in expressing themselves
 Marking is highly unreliable and varies from scorer to scorer, and sometimes within the same scorer when asked to evaluate the answer at different time intervals.
 Scoring takes a longer time because of the length of the answer.
 Open to subjectivity, i.e. the halo effect: what influences the marks may not be what was being tested, e.g. handwriting or language.

Guidelines to Writing Good Essay Items


1. Have clearly in mind what mental processes you want the students to use before starting to write the question. If you want students to judge, analyze or think critically, first establish what mental processes analysis, judgment or critical thinking involve. Once you determine this, use the appropriate verbs in your question.
2. Write the questions in such a way that the task is clearly and unambiguously defined for the
student.
i) In the overall instructions preceding test items.
ii) In the test items themselves.
3. Start essay questions with phrases or words such as compare, contrast, describe, explain, assess, etc. Do not begin with such words as what, who, when and list, since these words generally lead to tasks that require only recall of information.
4. Avoid using optional items i.e. require all students to complete the same items. Allowing
students three of five, four of seven and so forth decreases test validity as well as decreases
your basis for comparison among students.
5. Be sure each question relates to an instructional objective.
6. The learner should be guided on how many points or factors are required by the examiner.
This guards against wide or wild writing. Poor item: Describe the struggle for independence in Kenya. Better item: Describe five methods used in the struggle for independence in Kenya.
Suggestions for Grading Essay Items
1. Check your marking scheme against actual responses. Before actually beginning to mark the exam papers, it is recommended that a few papers are selected at random to ascertain the appropriateness of the marking scheme. It also helps in the updating of the marking scheme.
2. Be consistent in your grading. Graders are human and may be influenced by the first few
papers they read and thereby grade them either too leniently or too harshly depending on
their initial mind set (Hales &Tokar, 1975). For this reason it is important that once
grading has started teachers should occasionally refer to the first few papers graded to
satisfy them that standards are being applied consistently. This may be especially true for
those papers read near the end of the day when the reader might be physically and
mentally tired.
3. Randomly shuffle the papers before grading them. Research shows that a student’s essay
grade will be influenced by the position of his/her paper especially if the preceding
answers were either very good or very poor.
It is hence recommended that the examiner shuffles the papers prior to grading to reduce the bias introduced. This is significant especially if the teacher is working with high- and low-level classes and reads the best papers first or last.
4. Grade only one question at a time for all papers. To reduce the “halo” effect it is
recommended that teachers grade one question at a time rather than one paper (containing
several responses) at a time. This also makes it possible for the examiner to concentrate
and become thoroughly familiar with one set of scoring criteria and not be distracted by
moving from one question to another.
5. Try to grade all responses to a particular question without interruption. One source of
unreliability is that the grader’s standards may vary markedly from one day to the next
and even from morning to afternoon of the same day. If a lengthy break is taken the
reader should re-read some of the first few papers to re-familiarize him/herself with
his/her grading standards so that she/he will not change them mid-stream.

6. The mechanics of expression should be judged separately from the content. For those teachers who feel that the mechanics of expression are very important, it is recommended that they assign a proportion of the question's value to such factors as legibility, spelling, handwriting, punctuation and grammar.
The proportion assigned to these factors should be spelt out in the grading criteria and the
students should be informed in advance.
7. Provide comments and correct errors. Although it is time consuming to write comments
and correct errors, it should be done if we are to help the student to become a better
student.
This also helps the teacher in explaining his/her method of assigning a particular grade.
Oral Questions
The oral question is a variation of the essay. It is well suited for testing students who are unable to write because of physical handicaps, and for testing competence in languages.
Advantages
1. They permit the examiner to determine how well the student can synthesize and organize
his ideas and express himself.
2. They require the pupil to know and be able to supply the answer.
3. They allow students to demonstrate their oral competence in mastery of a language.
4. They permit free response by the student.
5. Permits detailed probing by the examiner
6. Students may ask for clarification.
Limitations
1. They provide for a limited sampling of content
2. They have low rater reliability.
3. They are time consuming (only one student can be tested at a time).
4. Do not permit or provide for any record of the examinee's response to be used for future action by the teacher and pupil, unless the examination process is recorded.
Choice Items
They require controlled response from the candidate and are at times referred to as objective
type. Examples include
 Multiple choice

Page 38 of 129
 True/false
 Matching
 Completion
Multiple Choice Item
It consists of two parts: (1) the stem, which contains the problem, and (2) a list of suggested answers (responses or options).
The incorrect responses are often called distracters.
The correct response is called the Key.
The stem may be stated as a direct question or an incomplete statement.
There are five variations of the multiple-choice item:
 Correct answer type
 Best answer
 Incomplete statement
 Multiple response
 Negative variety
a) Correct Answer Type:
This is the simplest type of multiple choice item. The student is told to select the one
correct answer listed among several plausible but incorrect options.

Example
When a test item and the objective it is intended to measure match the item
a. Is called an objective item
b. Has content validity
c. Is too easy
d. Should be discarded
b) Best Answer Type:
The directions are similar to those of the single correct answer except the student is told
to select the best answer.
Example
Which of the following is the most important agent of curriculum implementation
a. Teacher

b. Inspectorate
c. Curriculum development center
d. Parents Teachers Associations
c) Incomplete Statement
The stem is an incomplete statement rather than a question. It is best for lower levels.
For example the first president of Uganda was ______________________
a. Kabaka Mutesa I
b. Obote
c. Museveni
d. Idi Amin
d) Multiple Response Type
The candidate is required to endorse more than one response.
Which of the following reasons explain why a teacher needs to prepare a lesson plan in
advance.
i. To enable him/her collect the necessary material in good time.
ii. To enable him/her focus on the questions that pupils are likely to ask.
iii. To keep a meaningful record of what has been taught to a given class.
iv. To visualize and organize a complete learning situation in advance.
a) i &ii
b) iii & iv
c) i, iii & iv
d) i, ii & iii
e) Negative Variety Item
Here all the responses are correct except one. Example:
Which of the following is not a good reason for organizing an educational visit?
a. Correlate several school subjects
b. Broaden students experiences beyond the classroom
c. Break the monotony of the class.
d. Arouse the students’ curiosity and develop an inquiring mind.
How to Construct Effective Multiple Choice Items

1. The stem should contain the central problem so that the student will have some idea as to what is expected of him/her and some tentative answer in mind before he/she begins to read the options.
Poor Stem: A criterion reference test. This example is poor because it does not ask a question
or set a task. It is essential that the intent of the item be stated clearly in the stem.
2. Avoid repetition of words in the options. The stem should be written so that the key words are incorporated in the stem and will not have to be repeated in each option.
Poor: According to Engel's law
a. Family expenditures for food increase in accordance with the size of the
family.
b. Family expenditures for food decrease as income increases.
c. Family expenditures for food require a smaller percentage of an increasing
income.
d. Family expenditures for food rise in proportion to income.
Better: According to Engel's law, family expenditures for food:
a. Increase in accordance with the size of the family
b. Decrease as income increases
c. Require a smaller percentage of an increasing income
d. Rise in proportion to income
3. Avoid making the key consistently longer or shorter than the distracters.
4. Avoid giving irrelevant clues to the correct answer. The length of the answer may be one clue; others may be of a grammatical nature, such as the use of "a" or "an" at the end of a statement, or the use of a singular or plural subject and/or verb in the stem with just one or two singular or plural options.

Poor: Roosevelt was an


a. President
b. Man
c. Alcoholic
d. General

5. There should be only one key.
6. An item in the test should not reveal the answer to another item.
Example:
Item 1 : The “halo” effect is pronounced in essay tests. The best way to minimize its effects
is to:
a Provide optional questions
b “Aim” the student to the desired response
c Read all responses to one question before reading the responses to the other questions.
d Permit students to write essays at home.
Item 10: In what type of test is the "halo" effect more operative?
a Essay
b Matching
c True-false
d Short-Answer
The student can obtain the correct answer to item 10 from item 1.
7. The distracters should be plausible and homogeneous. The student should be forced to read
and consider all options. No distracter should be automatically eliminated by the student
because it is irrelevant or a stupid answer.
Advantages of Multiple Choice Tests
 Easy to mark/score
 Objective marking
 Covers a wide content area
 Can lend itself to machine scoring
 Provides a level playing field for students who are good in language and those who are poor in it
Limitations of Multiple Choice Tests
 Susceptible to guess-work
 Not fit for measuring arguments, opinions and creativity
 Encourages rote learning
 Difficult to set and takes a long time to prepare
 Do not test ability to communicate or organize ideas

True – False Items
These are items expressed in the form of a declarative statement which is either entirely true or entirely false.
Advantages
 Easy to set and mark
 More items can be answered
 Samples a wide content area
 Favors both the linguistically gifted and poor candidates.
 A sure way of detecting misconceptions in learners.
Disadvantages
 Susceptible to guess work
 Tests trivial facts
 Does not test higher cognitive abilities i.e. analysis, evaluation etc.
 True or false items are relative
 Susceptible to ambiguity and misinterpretation
Suggestions for Writing True-False Items
1 Construct statements that are definitely true or definitely false.
2 Keep true and false statements at approximately the same length and be sure that there
are approximately equal numbers of true and false items.
3 Keys should not fall in a pattern
4 The statement should include only a single issue.
Matching Items
The candidates are asked to match up items in two columns.
The matching exercise is well suited to those situations where one is interested in testing the
knowledge of terms, definitions, dates, events and other matters involving simple relationships.
Example:
For each definition below, select the most appropriate term from the set of terms at the right.
Mark your answer in the blank before each definition.
Definitions                                                      Terms
1. A professional judgement of the adequacy of test scores      1. Behavioral objective
2. Determination of the amount of some skill or trait           2. Criterion referenced test
3. Specification of what a child must do to indicate            3. Evaluation
   mastery of a skill
4. A series of tasks or problems                                4. Measurement
5. Tests used to compare individuals                            5. Norm referenced test
                                                                6. Test
Advantages of Matching Exercises
 Easy to mark and score
 Easy to set
 Provide level ground for those with good or poor language
Disadvantages
 Limited to measuring factual information
 Limited to only parts of the content
 Susceptible to guess work
Completion Items
Also referred to as supply items. The examinee is expected to complete or fill in the spaces so that the sentence is complete.
Example
The longest river in Africa is _____________________
Guidelines for writing completion items:
 The item should be clear and unambiguous: word each item in specific terms with a clear meaning so that the intended answer is the only one possible.

Note: It is important that you prepare a table of specification when constructing your tests. It
ensures balance and comprehensiveness in a test.

The Marking scheme and its relevance to teachers

Guidelines When Making Marking Schemes
Look at what others have done. Chances are that you are not the only person who teaches
this course. Look at how others choose to assign grades.
Make a marking scheme usable by non-experts. Write a model answer and use this as the
basis for a marking scheme usable by non-experts. This ensures that your TAs and your students
can easily understand your marking scheme. It also allows you to have an external examiner
mark the response, if need be.
Give consequential marks. Generally, marking schemes should not penalize the same error
repeatedly. If an error is made early but carried through the answer, you should only penalize it
once if the rest of the response is sound.
Review the marking scheme after the exam. Once the exam has been written, read a few
answers and review your key. You may sometimes find that students have interpreted
your question in a way that is different from what you had intended. Students may come up with
excellent answers that may be slightly outside of what was asked. Consider giving these students
partial marks.
When marking, make notes on exams. These notes should make it clear why you gave a
particular mark. If exams are returned to the students, your notes will help them understand their
mistakes and correct them. They will also help you should students want to review their exam
long after it has been given, or if they appeal their grade.

Principles for developing marking guidelines in a classroom test

1. Marking guidelines are developed in the context of relevant syllabus outcomes and
content.
2. Marks are awarded for demonstrating achievement of aspects of the syllabus outcomes
addressed by the question.
3. Marking guidelines reflect the nature and intention of the question and will be expressed in terms of the knowledge and skills demanded by the task.

4. Marking guidelines indicate the initial criteria that will be used to award marks.
5. Marking guidelines allow for less predictable and less defined responses, for example,
characteristics such as flair, originality and creativity, or the provision of alternative
solutions where appropriate.
6. Marking guidelines for extended responses use language that is consistent with the outcomes and the band descriptions for the subject.
7. Marking guidelines are to incorporate the generic rubric provided in the examination
paper as well as aspects specifically related to the question.

8. The language of marking guidelines should be clear, unambiguous and accessible, to ensure consistency in marking.

9. Where a question is designed to test higher-order outcomes, the marking guidelines will
allow for differentiation between responses, with more marks being awarded for the
demonstration of higher-order outcomes.
10. Marking guidelines will indicate the quality of response required to gain a mark or a sub-
range of marks.
11. High achievement will not be defined solely in terms of the quantity of information
provided.

12. Optional questions within a paper will be marked using comparable marking criteria.

13. Marking guidelines for questions that can be answered using a range of contexts and/or
content will have a common marking guideline exemplified using appropriate contexts
and/or content.

The importance of preparing a marking scheme in assessment of learning in the classroom


 As part of a teaching and learning strategy, it can aid learners in creating meaning from previously learned content
 It provides an opportunity for students to be a part of the thinking process around
judging performance and deepens their understanding of what is required
 It can also allow for discussion and agreements to be reached about the meanings of
certain words and phrases in the context of the assessment task
 Provides an opportunity to learn how marks are allocated in relation to the task and content, which in turn aids them to prepare for assessment more effectively
 Forms a framework for decision making
Merits of preparing a marking scheme in assessment of learning in the classroom
 Marks available for each part of the question are distributed so it is easy to justify
awarding the marks
 The content of the answer is specified and this guides the scoring
 Clearly indicates how other areas may be scored for instance the grammar and
expression of ideas
 Extra information to help the Examiner make his or her judgment is provided
 Specifies what is acceptable or not worthy of credit and, in discursive answers, gives an overview of the area in which a mark or marks may be awarded
 Serves as a feedback to the teacher in regards to the depth of content coverage
Major Uses of Item Analysis

Item analysis can be a powerful technique available to instructors for the guidance and
improvement of instruction. For this to be so, the items to be analyzed must be valid measures of
instructional objectives. Further, the items must be diagnostic, that is, knowledge of which
incorrect options students select must be a clue to the nature of the misunderstanding, and thus
prescriptive of appropriate remediation.

In addition, instructors who construct their own examinations may greatly improve the
effectiveness of test items and the validity of test scores if they select and rewrite their items on
the basis of item performance data.

Item Analysis Guidelines

Item analysis is a completely futile process unless the results help instructors improve their
classroom practices and item writers improve their tests. Let us suggest a number of points of
departure in the application of item analysis data.

1. Item analysis gives necessary but not sufficient information concerning the
appropriateness of an item as a measure of intended outcomes of instruction. An item
may perform beautifully with respect to item analysis statistics and yet be quite irrelevant
to the instruction whose results it was intended to measure. A most common error is to
teach for behavioral objectives such as analysis of data or situations, ability to discover
trends, ability to infer meaning, etc., and then to construct an objective test measuring
mainly recognition of facts. Clearly, the objectives of instruction must be kept in mind
when selecting test items.
2. An item must be of appropriate difficulty for the students to whom it is administered. If
possible, items should have indices of difficulty no less than 20 and no greater than 80. It is desirable to have most items in the 30 to 50 range of difficulty. Very hard or very easy
items contribute little to the discriminating power of a test.
3. An item should discriminate between upper and lower groups. These groups are usually
based on total test score but they could be based on some other criterion such as grade-
point average, scores on other tests, etc. Sometimes an item will discriminate negatively,
that is, a larger proportion of the lower group than of the upper group selected the correct
option. This often means that the students in the upper group were misled by an
ambiguity that the students in the lower group, and the item writer, failed to discover.
Such an item should be revised or discarded.
4. All of the incorrect options, or distracters, should actually be distracting. Preferably, each
distracter should be selected by a greater proportion of the lower group than of the upper
group. If, in a five-option multiple-choice item, only one distracter is effective, the item
is, for all practical purposes, a two-option item. Existence of five options does not
automatically guarantee that the item will operate as a five-choice item.
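
To make guideline 4 concrete: a quick way to check each distracter is to tally how many students in the upper and lower groups chose each option. The following is a minimal sketch in Python; the response lists, the key and the group sizes are invented for the illustration.

from collections import Counter

# Hypothetical responses of the upper and lower groups to one
# five-option multiple-choice item; "B" is the key.
upper_group = ["B", "B", "A", "B", "B", "C", "B", "B", "B", "A"]
lower_group = ["B", "A", "C", "A", "B", "D", "A", "C", "B", "A"]
key = "B"

upper, lower = Counter(upper_group), Counter(lower_group)

for option in ["A", "B", "C", "D", "E"]:
    u, lo = upper[option], lower[option]
    if option == key:
        verdict = "key"
    elif u == 0 and lo == 0:
        verdict = "dead option: attracted no one, so rewrite it"
    elif lo > u:
        verdict = "working distracter (chosen more by the lower group)"
    else:
        verdict = "suspect: chosen as often or more by the upper group"
    print(option, "upper:", u, "lower:", lo, "->", verdict)

With these invented data, option E attracts no one at all, so the five-option item is effectively operating as a four-option item, which is exactly the situation guideline 4 warns against.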

Aim of Item analysis

 How well did my test distinguish among students according to how well they met my learning goals?

Recall that each item on your test is intended to sample performance on a particular learning
outcome. The test as a whole is meant to estimate performance across the full domain of learning
outcomes targeted.

One way to assess how well your test is functioning for this purpose is to look at how well the
individual items do so. The basic idea is that a good item is one that good students get correct
more often than do poor students. An item analysis gets at the question of whether your test is
working by asking the same question of all individual items—how well does it discriminate? In
short, item analysis gives the teacher a way to exercise additional quality control over their tests.

Well-specified learning objectives and well-constructed items give teachers a head start in that process, but item analyses can give you feedback on how successful you actually were. Item
analyses can also help you diagnose why some items did not work especially well, and thus
suggest ways to improve them (for example, if you find distracters that attracted no one, try
developing better ones). The important test for an item’s discriminability is to compare it to the
maximum possible. How well did each item discriminate relative to the maximum possible for
an item of its particular difficulty level? Here is a rough rule of thumb.

 Discrimination index is near the maximum possible = very discriminating item


 Discrimination index is about half the maximum possible = moderately discriminating
item
 Discrimination index is about a quarter the maximum possible = weak item
 Discrimination index is near zero = non-discriminating item
 Discrimination index is negative = bad item (delete it if worse than -.10)

In addition to these and other qualitative procedures, a thorough item analysis also includes a number of quantitative procedures. Specifically, three numerical indicators are often derived during an item analysis: item difficulty, item discrimination, and distracter power statistics.

Item Difficulty Index (p)

The item difficulty statistic is an appropriate choice for achievement or aptitude tests when the
items are scored dichotomously (i.e., correct vs. incorrect). Thus, it can be derived for true-false,
multiple-choice, and matching items, and even for essay items, where the instructor can convert
the range of possible point values into the categories “passing” and “failing.”

The item difficulty index, symbolized p, can be computed simply by dividing the number of test
takers who answered the item correctly by the total number of students who answered the item.
As a proportion, p can range between 0.00, obtained when no examinees answered the item
correctly, and 1.00, obtained when all examinees answered the item correctly. Notice that no test
item need have only one p value. Not only may the p value vary with each class group that takes
the test, an instructor may gain insight by computing the item difficulty level for a number of
different subgroups within a class, such as those who did well on the exam overall and those who
performed more poorly.
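
Because p is just a proportion, it is trivial to compute once item responses are scored dichotomously. The following is a minimal sketch in Python; the scores below are hypothetical, coded 1 = correct and 0 = incorrect.

def difficulty(item_scores):
    # Item difficulty p: the proportion of examinees answering correctly.
    return sum(item_scores) / len(item_scores)

# Hypothetical dichotomous scores on one item for a class of ten.
item_scores = [1, 0, 1, 1, 0, 1, 1, 1, 0, 1]
print(difficulty(item_scores))   # 0.7, i.e. 70% answered the item correctly

# The same function can be applied to any subgroup of interest,
# e.g. the item scores of those who did well on the exam overall.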

Although the computation of the item difficulty index p is quite straightforward, the
interpretation of this statistic is not. To illustrate, consider an item with a difficulty level of 0.20.
We do know that 20% of the examinees answered the item correctly, but we cannot be certain
why they did so. Does this item difficulty level mean that the item was challenging for all but the
best prepared of the examinees? Does it mean that the instructor failed in his or her attempt to
teach the concept assessed by the item? Does it mean that the students failed to learn the
material? Does it mean that the item was poorly written? To answer these questions, we must
rely on other item analysis procedures, both qualitative and quantitative ones.

Item Discrimination Index (D)

Item discrimination analysis deals with the fact that often different test takers will answer a test
item in different ways. As such, it addresses questions of considerable interest to most faculty,
such as, “does the test item differentiate those who did well on the exam overall from those who
did not?” or “does the test item differentiate those who know the material from those who do
not?” In a more technical sense then, item discrimination analysis addresses the validity of the
items on a test, that is, the extent to which the items tap the attributes they were intended to
assess. As with item difficulty, item discrimination analysis involves a family of techniques.
Which one to use depends on the type of testing situation and the nature of the items. I’m going
to look at only one of those, the item discrimination index, symbolized D. The index parallels the
difficulty index in that it can be used whenever items can be scored dichotomously, as correct or
incorrect, and hence it is most appropriate for true-false, multiple-choice, and matching items,
and for those essay items which the instructor can score as “pass” or “fail.”

We test because we want to find out if students know the material, but all we learn for certain is
how they did on the exam we gave them. The item discrimination index tests the test in the hope
of keeping the correlation between knowledge and exam performance as close as it can be in an
admittedly imperfect system.

The item discrimination index is calculated in the following way:

1. Divide the group of test takers into two groups, high scoring and low scoring. Ordinarily,
this is done by dividing the examinees into those scoring above and those scoring below
the median. (Alternatively, one could create groups made up of the top and bottom
quintiles or quartiles or even deciles.)
2. Compute the item difficulty levels separately for the upper (p_upper) and lower (p_lower) scoring groups.
3. Subtract the two difficulty levels, such that D = p_upper - p_lower.

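These three steps translate directly into code. The following is a minimal sketch in Python; the total scores and item scores are hypothetical, and the split is made at the median total score.

from statistics import median

def discrimination(total_scores, item_scores):
    # D = p_upper - p_lower, using a split at the median total score.
    cut = median(total_scores)
    upper = [i for t, i in zip(total_scores, item_scores) if t > cut]
    lower = [i for t, i in zip(total_scores, item_scores) if t <= cut]
    return sum(upper) / len(upper) - sum(lower) / len(lower)

# Hypothetical data for ten examinees: total test score and the
# dichotomous score (1 = correct) on the item being analyzed.
total_scores = [92, 88, 85, 80, 76, 71, 65, 60, 55, 40]
item_scores  = [ 1,  1,  1,  1,  0,  1,  0,  0,  1,  0]

print(discrimination(total_scores, item_scores))
# p_upper - p_lower = 0.8 - 0.4, so D is about 0.40:
# a good discriminator by the 0.30 rule of thumb.
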
How is the item discrimination index interpreted? Unlike the item difficulty level p , the item
discrimination index can take on negative values and can range between -1.00 and 1.00.
Consider the following situation: suppose that overall, half of the examinees answered a
particular item correctly, and that all of the examinees who scored above the median on the exam
answered the item correctly and all of the examinees who scored below the median answered
incorrectly. In such a situation p_upper = 1.00 and p_lower = 0.00. As such, the value of the item discrimination index D is 1.00 and the item is said to be a perfect positive discriminator. Many
would regard this outcome as ideal. It suggests that those who knew the material and were well-
prepared passed the item while all others failed it.

Though it’s not as unlikely as winning a million-dollar lottery, finding a perfect positive
discriminator on an exam is relatively rare. Most psychometricians would say that items yielding
positive discrimination index values of 0.30 and above are quite good discriminators and worthy
of retention for future exams.

Finally, notice that the difficulty and discrimination are not independent. If all the students in
both the upper and lower levels either pass or fail an item, there’s nothing in the data to indicate
whether the item itself was good or not. Indeed, the value of the item discrimination index will
be maximized when only half of the test takers overall answer an item correctly; that is, when p
= 0.50. Once again, the ideal situation is one in which the half who passed the item were students
who all did well on the exam overall.

Does this mean that it is never appropriate to retain items on an exam that are passed by all
examinees, or by none of the examinees? Not at all. There are many reasons to include at least
some such items. Very easy items can reflect the fact that some relatively straightforward
concepts were taught well and mastered by all students. Similarly, an instructor may choose to
include some very difficult items on an exam to challenge even the best-prepared students.

The difficulty Index


 Difficulty index is a measure of the probability of passing individual test items
 Very useful in assessing whether the students have learned the concept or acquired the cognitive skill that the question calls for
 Serves as a feedback mechanism to the teacher and student
 Can also be used by teachers to improve the quality of test items
 Helps in identifying specific areas of course content which need greater emphasis or
clarity
 Very useful in identifying specific areas of strength and weakness of learners
The discrimination index
 Useful in establishing item effectiveness
 Provides evidence of learning difficulties in relation to cognitive tasks
 A useful technique in decision making
 Creates an opportunity to teachers to build their skills in test development
 Provides evidence of internal consistency
 Very useful in identifying specific areas of strength and weakness of learners
 An effective strategy in analysis of the effectiveness of teaching methods
 Useful in identifying learning misconceptions among learners
 Very useful in assessing transfer of training

Measures of Central tendency

The mean is the arithmetic average of the scores; the median is the number that divides the scores into two equal groups; and the mode is the score that occurs most frequently.

The shape of a distribution of your test scores can provide useful clues about your test and your
students’ performance. When representing students’ scores on a graph, the scores often will be
positively or negatively skewed. When the distribution is positively skewed, the most frequent score (the mode) and the median lie below the mean. If your test is very difficult, there may be many low scores and few high ones; the distribution of scores is then positively skewed, with the tail of the curve pointing to the right.

When the tail points to the left, the distribution is negatively skewed. In this distribution there are many high scores and relatively few low scores. Notice that the mean is influenced by the skewing.

The mean can be distorted if there are some scores that are extremely different (outliers) from the mean of the majority of scores for the group. Consequently, in such cases the median is the more descriptive measure of central tendency.

Indicators of Variability

Variability is the dispersion of the scores within a distribution. Given a test, a group of students
with a similar level of performance on a specific skill tend to have scores close to the mean.
Another group with varying levels of performance will have scores widely spread and further
from the mean. In other words, how varied are the scores? Two common measures of variability
are the range and standard deviation.

Range

The range, R, is the difference between the lowest and the highest scores in a distribution. The
range is easy to compute and interpret, but it only indicates the difference between the two
extreme scores in a set.

If we use the scores from Mr. Walker’s class (above), we would calculate the range as: Range
(R) = the highest score – the lowest score in the distribution.

95 91 100 96 92 91 87 84 70 65 96 65 56 86 43 65 22 40 93

R = 100 - 22 = 78, so the range is 78.

Standard Deviation

A more useful statistic than simply knowing the range of scores would be to see how widely
dispersed different scores are from the mean. The most common measure of variability is the
standard deviation (SD). The standard deviation is defined as the numeric index that describes
how far away from the mean the scores in the distribution are located. The formula for the
standard deviation is:

SD = √( Σ(X - M)² / N ), where X = the test score, M = the mean, and N = the number of scores.

The higher the standard deviation, the wider the distribution of the scores is around the mean.
This indicates a more heterogeneous or dissimilar spread of raw scores on a scale. A lower value
of the standard deviation indicates a narrower distribution (more similar or homogeneous) of the
raw scores around the mean.
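
Both statistics follow directly from their definitions. The following is a minimal sketch in Python using the scores from Mr. Walker's class above; it divides by N, as in the formula above, rather than by N - 1.

from math import sqrt

scores = [95, 91, 100, 96, 92, 91, 87, 84, 70, 65,
          96, 65, 56, 86, 43, 65, 22, 40, 93]

# Range: highest score minus lowest score.
R = max(scores) - min(scores)
print("Range:", R)              # 100 - 22 = 78

# Standard deviation: SD = sqrt( sum((X - M)^2) / N )
M = sum(scores) / len(scores)   # the mean, M
SD = sqrt(sum((x - M) ** 2 for x in scores) / len(scores))
print("Mean:", round(M, 2))
print("SD:", round(SD, 2))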

The characteristics of the mean


 It is an interval statistic which is a superior level of measurement compared to other
levels
 More precise than median or mode
 Takes into account every score in the distribution
 Most stable measure of central tendency
 Best indicator of combined performance in the classroom
The demerits of the mean as an effective measure of central tendency in the classroom tests
 It is an interval statistic and therefore inappropriate for ordinal level of measurement
 Sensitive to extreme scores or even outliers
 Affected by skewed data
 It is inadequate on its own as a measure of central tendency; sometimes the value calculated is not meaningful
 Cannot be computed for qualitative data
 Cannot be determined graphically
The median as a measure of central tendency
 It is the middle score in a distribution
 It is appropriate for the ordinal level of measurement
 Very useful in calculation of the skewness of a distribution
 Ranking of performance can be done using the median score

 Not sensitive to extreme scores
Properties of the Standard deviation
 The standard deviation is only used to measure spread or dispersion around the mean of a
data set.
 Standard deviation is never negative.
 Standard deviation is sensitive to outliers. A single outlier can raise the standard
deviation and in turn, distort the picture of spread.
 For data with approximately the same mean, the greater the spread, the greater the
standard deviation.
 If all values of a data set are the same, the standard deviation is zero (because each value
is equal to the mean).
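
The contrast between the mean's sensitivity to outliers and the median's resistance to them is easy to verify numerically. The following is a minimal sketch using Python's statistics module, with invented scores.

from statistics import mean, median

scores = [60, 62, 63, 65, 66, 68, 70]
with_outlier = scores + [5]     # add one extremely low score

print(mean(scores), median(scores))              # about 64.9, and 65
print(mean(with_outlier), median(with_outlier))  # the mean falls to about 57.4,
                                                 # while the median only moves to 64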

Activities
1. Construct five extended essay questions and five restricted essay questions in your
subject area
2. Prepare a table of specification for a 20 item multiple choice test.

Written Exercises
1. Distinguish between testing and assessment
2. Describe the main steps in the development of tests.
3. Outline the main considerations you would bear in mind in the construction of tests.

LESSON FOUR

ADMINISTRATION OF TESTS AND EXAMINATIONS


Introduction

This unit describes the procedures involved in the administration of tests and examinations. It is
followed by a discussion of the criteria for the awarding of grades for achievement tests.
Specific Objectives
After studying this unit, you should be able to:
1. Identify the main steps in the administration of tests.
2. Explain how emergency cases should be treated in an exam room
3. Distinguish grading using the normal curve and a standard scale.
Practical Procedures involved in the Administration of Tests and Examinations
1. The teacher should inform students several days before the test about the purpose of the test, areas to be covered by the test and the type of questions contained in the test.
2. He/she should prepare a register of attendance for each test and make sure that students sign
it upon receiving and handing in their answer scripts.
3. The teacher should write on the chalkboard the starting time and the finishing time of the test.
4. You should ensure that there is enough room between seats to avoid the chance of students
copying from one another.
5. The examination room should be well lighted and ventilated. It should be quiet and free from disruptions.
6. Collect question papers in time from the custodian to be able to start the test at the appropriate stipulated time.
7. Ensure compliance with the stipulated sitting arrangements in the test to prevent collusion between and among the testees.
8. Ensure orderly and proper distribution of question papers to the testees.
9. Make any corrections before the test starts.
10. Do not talk unnecessarily before the test. Testees' time should not be wasted at the beginning of the test with unnecessary remarks, instructions or threats that may develop test anxiety.
11. It is necessary to remind the testees of the need to avoid malpractices before they start and make it clear that cheating will be penalized.
12. Stick to the instructions regarding the conduct of the test and avoid giving hints to testees who ask about particular items, but make corrections or clarifications to the testees wherever necessary.
13. Keep interruptions during the test to a minimum.

Good Conditions of Examination Room
Uniform conditions refer to the need for all candidates to be accorded similar situations in the
examination room. All candidates must be treated equally with supplies, time, comforts and
instructions.
Lighting is one of the most important needs in an examination room. It should be adequate for all in the room. The supervising teacher must satisfy her/himself of this before the examination starts.
The teacher must also be satisfied that the seating arrangement is proper. Candidates should preferably not sit at their own desks, for others may have coded facts somewhere. The desks should be completely empty. Candidates should not be able to read the scripts of their fellow candidates. This is minimized by ensuring that neighboring candidates sit directly in front, directly to the left and directly to the right. The entire desk arrangement should be in the form of a square matrix.
Equally important is the issue of positioning examination candidates away from external noise,
and distracting human traffic.

Before candidates arrive, the supervising teacher should supply writing paper by placing one sheet on each desk. This saves time, particularly where the examination is a laboratory practical requiring, in addition, the supply of several pieces of equipment.

Instructions to Candidates before the Start of the Examination


The most important instruction is, of course, the examination timetable. It should be given out at least a week before the start of the first examination paper. This has the all-important benefit of assisting candidates in their revision efforts. In the case of laboratory and workshop practicals, timetables also help the laboratory and workshop teacher in charge to keep equipment ready for each practical examination. Examination candidates should also be informed as to what they can and cannot bring to examination rooms. It is necessary to demand that each candidate brings his/her own set of basic requirements, including pencil sharpener, eraser and ruler. Cheating at examinations is enhanced when candidates frequently have to share someone else's basic items. Sharing should, therefore, not be allowed.

Procedure at the start of an examination
Examination candidates should be allowed to enter and be seated at their desks at least fifteen
minutes before the starting signal is due. This will enable them to have time to sort out their basic writing items, and to write their names and numbers on the answer sheets, which should
already be on the candidates’ desks. It also helps in the candidates’ psychological adjustment to
an examination situation.
Before the starting signal is given, the supervising teacher should announce any necessary
instructions. These include, length of the examination, what a candidate should do if he/she
requires additional writing paper, whether time remaining shall be announced, whether a
candidate can leave before the time ends, whether and where there will be a stapler to bind the
answer sheets and whether any papers shall be taken out of the examination room.

A most important announcement concerns any correction, which should be made in the question
papers themselves. It is extremely disorientating to have such corrections announced during the
course of the examination. In any case, some candidates already may have attempted the
particular item. It would disturb them to have to repeat the item in the light of the correction.
When satisfied that all is well, the starting signal should be given. It is immediately after this that the supervising teacher makes note of absent candidates, if any. This exercise helps in resolving
disputes that may arise later over lost scripts, for instance.

Procedure during Examination


During the course of the examination several ‘dos’ and ‘don’ts’ should be observed. A
supervising teacher should not stop to read what a candidate is writing. A supervising teacher
should not leave the examination room. He/she should not allow more than one candidate at a
time to leave the examination room to go answer calls of nature.
A supervising teacher should be prepared to take additional writing paper to where candidates
sit. The supervising teacher should make rounds in the examination room, but in such a way that
candidates are not disturbed by noise. The teacher should announce time left at half-time, 30
minutes and 10 minutes to the end. With about two minutes left, it is advisable to tell candidates
to be sure all answer sheets bear their names and numbers. When eventually the stop signal is
given, candidates should stop writing. They should be given a few minutes to arrange their
papers in order, before the scripts are handed in.

Collecting examination scripts can present difficulties. A weak student can make sure he/she is marked present, but fail to hand in his/her script, thereby putting the teacher on the defensive and causing embarrassment. A foolproof method of avoiding this problem calls for the supervising
teacher to position him/herself at the exit, and collect scripts as candidates leave the examination
room. Thereafter, the bundle of scripts should be well secured and stored, pending marking.

How to Treat Emergency Cases


Emergencies can and do arise, and supervising teachers are called upon to exercise calm decision making. One issue concerns late arrivals. Lateness can be handled by the teacher on the basis of the explanation a candidate gives. If it is trivial or far-fetched, the teacher should reject it and have the candidate do his/her best to catch up with the others. If, however, there had been a situation which had blocked traffic movement, or evidence of one having had to go to see a doctor, surely the teacher should exercise judgement and fair play and act accordingly. If he/she issues test materials, proper notes of the circumstances should be kept. The candidate should then be left behind when the rest of the group files out at the expiry of their time.

A second example of emergency results from sickness or other emergency, which interrupts an
on-going test. If interruptions like this arise, the teacher should consider the likelihood of
consultation among the candidates during the period of interruptions.

Absences are not normal emergencies as described in the preceding paragraphs. It is the school policy that should be the guiding factor. For, should another paper be prepared for the candidate who turns up a day later? In the absence of a large item bank, is it feasible to set a test of the same depth as the test already sat? Equally important, should the teacher's consideration for his students be limitless? And shouldn't candidates be expected to be proper managers of their own affairs as part of normal training? It boils down to letting the candidate face the consequences of his absence.
Securing Test Scripts

The consequences of lost scripts present a teacher with no mean embarrassment. The students face the danger of being awarded bogus marks, to say the least. If it is an external annual examination, the candidate may have to waste a full year of his life. The fact is that test scripts should be locked safely away immediately. They should be closely bundled in envelopes during any movement by the teacher or the marker. After marking, the same precaution should prevail.
Marking Test Scripts
In zonal, regional and national examinations, it is necessary to use teams of markers. In order to ensure consistency of marking, the markers are trained on a common set of scripts and then mark scripts as a group so that unexpected responses can be discussed and added to the marking scheme. Some scripts are re-marked by the team leader in order to make a check on consistency.

A good test can be spoiled by poor marking. Indeed, marking counts towards test objectivity,
and therefore everything must be done to ensure adherence to the points explained below. To
start with, a marking scheme should be made. In fact, the marking scheme should be made as
the test items are being constructed. Most important, marking schemes are necessary even if
there is only the subject teacher to do the marking.

Test papers should be marked as soon as possible after the test has been administered. This
ensures that issues are still fresh in the teacher’s mind and once marking has started, the teacher
should proceed uninterrupted to the end of a whole set of test items. This suggestion is
particularly required of essay items where a teacher needs to maintain consistency in interpreting
scorable points.

Before the actual marking starts, a sufficiently conducive marking atmosphere needs to be established. The marking atmosphere should be free of physical distractions such as noise and human traffic. Equally, the teacher should be emotionally ready.

It is good practice to first read through a sample of scripts. This gives the teacher a general feel
of the task ahead. It has sometimes been found necessary to alter a marking scheme as a result of
such sampling.

Notes should be made of all common mistakes observed in test scripts, but the errors should not be corrected on the actual scripts. Instead, specific but consistent symbols should be used to identify points of error; such symbols are used only where a mark has been earned or failed to be earned. In essays, a candidate's words do not have to match the words of the teacher's marking scheme; rather, it is parity of ideas the teacher should seek, to see whether they match his own.

When marking, a teacher's personal image of the candidate should not be allowed to cloud the marking of the student's scripts, nor should handwriting or flattering language influence the marks awarded.

Recording of scores into a respective ledger should come last in the exercise. If possible, this
ledger should be stored separately from the scripts. This is for reasons of possible loss of one lot
or the other.

Awarding Grades
The awarding of letter grades A, B, C, D, E and F for achievement has long been a socio-cultural practice, as have the implications of the respective grades, for example, A for success and E (or F) for failure in a given endeavor.
In the past, intuition, experience and tradition were the basis of awarding a particular letter grade. Later, it became necessary to assign marks to assist in guiding the grade awards: attaining a certain number of marks would earn a particular grade, and so on. Later still, closer scrutiny became necessary so as to apply more scientific analyses, on an educationally sound basis at least, to judge which candidate gets which grade.

Today, two alternative methods of grade awarding are in use.


1. Grading on the normal curve (a norm-referenced evaluation basis), and
2. Grading on fixed pass marks (a criterion- referenced evaluation basis).
A third practice is called grading on a standard. It is preferred for certain courses only.

We shall examine the three approaches in turn.

Grading on the Normal Curve (Norm-Referenced Judgement)

Grading on the curve involves dividing test scores into five groups of different sizes in such a
way that the corresponding letter grades A, B, C, D and E (or F) lead to a frequency distribution
which approximates the shape of the normal curve. In such a distribution, the two smallest groups, A and E (or F), occur at the extreme ends. The next two small groups, B and D, occur on either side of the largest group, C. But what will be the percentage of each group?

It will be recalled that stanines 1 to 9 represent, respectively, the percentages 4, 7, 12, 17, 20, 17, 12, 7 and 4 of the total distribution, thus:

Stanine       1   2   3    4    5    6    7    8   9
Equivalent %  4   7   12   17   20   17   12   7   4

By a method of interpolation, these percentages can be regrouped to lead to the suggested percentage weightings for the letter grades, as follows:

Grade         E   D    C    B    A
Equivalent %  6   24   40   24   6

Strict adherence to the curve has its demerits, however: even if a student performs quite well, he/she may still fail the test if his/her peers do better. Secondly, the grade E (or F) suggests that the student acquired no basic skills. Thirdly, not many class groups are normally distributed on aptitude, or are large enough to justify statistical treatment leading to a near normal distribution.
Putting these cautionary points before us, we are led by reason, intuition and experience to adopt a modified system of distributing the letter grades. It is more positive in attitude, and it provides for flexibility, depending on subject classification and also on the decisions of a school or region. It may be expressed as shown in the following table.

Grade                E       D        C        B        A
Equivalent % range   0-10    10-20    30-50    20-50    10-20

As a practical exercise, let us adopt the percentages 5, 15, 45, 25 and 10 as the respective weights of the grades E (or F), D, C, B and A. Let us also consider the following table of marks of 35 students from a biology test. The task is to identify which marks earn the grades A, B, C, D and E (or F), as the case may be.

Table 5.3: Biology marks for 35 students


66 66 72 85 66 74
64 68 64 77 63 83
74 55 73 65 49 59
63 56 65 41 53 57
86 68 47 53 45 65
71 75 57 31 61

The first step is to calculate how many of the 35 scores fit into each letter grade. After rounding the percentage allocations to whole numbers of students, we have the following table.

Grade A B C D E Total
Equivalent % 10 25 45 15 5 100
No. of Scorers 3 9 16 5 2 35
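This allocation can be illustrated with a minimal Python sketch (our own, not part of the original exercise). It assumes a largest-remainder rounding rule, which happens to reproduce the 3, 9, 16, 5, 2 split for the biology marks in Table 5.3:

import math

# Biology marks for the 35 students (Table 5.3)
scores = [66, 66, 72, 85, 66, 74, 64, 68, 64, 77, 63, 83,
          74, 55, 73, 65, 49, 59, 63, 56, 65, 41, 53, 57,
          86, 68, 47, 53, 45, 65, 71, 75, 57, 31, 61]
weights = {"A": 10, "B": 25, "C": 45, "D": 15, "E": 5}  # percent per grade

def quota_counts(n, weights):
    # Largest-remainder rounding: floor every quota, then hand the leftover
    # places to the grades with the largest fractional parts.
    raw = {g: n * p / 100 for g, p in weights.items()}
    counts = {g: math.floor(v) for g, v in raw.items()}
    leftover = n - sum(counts.values())
    for g in sorted(raw, key=lambda g: raw[g] - counts[g], reverse=True)[:leftover]:
        counts[g] += 1
    return counts

ranked = sorted(scores, reverse=True)        # best candidate first
counts = quota_counts(len(ranked), weights)  # {'A': 3, 'B': 9, 'C': 16, 'D': 5, 'E': 2}
start = 0
for grade in "ABCDE":
    print(grade, ranked[start:start + counts[grade]])
    start += counts[grade]

Running the sketch shows, for instance, that the three A scores are 86, 85 and 83.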

The following problems are associated with the grading system based on the normal curve.
1. It assumes that we are dealing with large enough data to lead to a normal distribution.
2. It assumes that the students in the test group are normally distributed on the
characteristic of aptitude.
3. It condemns a percentage of candidates to sure failure. Conversely, it implies that the passing or failing of candidates does not depend on the strength of their own performance but is a reflection of their relative position against their peers.
4. Finally, grading on the curve assumes competence of the machinery of item construction, prior analysis of items in the item bank, syllabus coverage, and the objectivity of the marking itself.
Grading on Standard Scale

Grading on this criterion is otherwise called the system of grading on fixed pass marks.
Essentially, students are judged on the basis of mastery of the specific course contents. On this
basis, the student’s peers are not an issue. Indeed, a large number of students can legitimately
score high grades and vice versa.

The following table shows an example of the fixed mark level utilized in many institutions and
regions.

Grade Range of Marks


A 85 – 100 Distinction
B 65 – 84 Credit
C 50 – 64 Satisfactory
D 40 – 49 Pass
E 0 - 39 Fail

The grades A,B,C,D, and E are the scales of the standard which has been designed, hence the
description “standard scale” grading.

As should be expected, this system calls for strict adherence to the statement of instructional
objectives in behavioral terms. In turn, standards of achievement should be specific.
Theoretically, no adjustments should be made to the grade distributions if the criteria of
instructional objectives, syllabus coverage, item difficulty and discrimination, test administration
and marking procedures were strictly followed.

Grading on a Standard (Criterion – referenced Mastery Judgement)

Another form of criterion – referenced grading is the one calling for a candidate to pass or,
otherwise fail the test. The candidate who passes is judged, ready and able to proceed to the next
level of learning. Many courses demand this mode of grading, such as practical on-the-road
driving test, surgical technique of removing the appendix, meeting Olympic qualifying height in
the high-jump event, the technique of defusing a time-bomb devise, and so on. The issue of
“average” performance, for instance, does not arise in these critical issues. The standard of
performance must be mastered.

The teacher is put in a demanding situation when he/she has to grade on a standard. He/she is expected to thoroughly know his/her subject matter from the point of view of the facts, the principles, as well as the inter-relationships between the course at hand and the course that follows it.

Secondly, the teacher should have a record of past performances of his/her previous student
groups at that level and the performance of these student groups at the next level of study or
occupation. This calls for follow-up inquiry to see whether the standard he/she himself set was
indeed “up to standard.” Equally important, the teacher should consult other professionals to
assist him/her in confirming the standard he/she is setting. An outside opinion is always healthy.
But the outside opinion must best be assisted by what records the teacher has at hand as his/her
guide.

Grade Point Average


Some universities and colleges use a system called the grade point average. In the grade point average system, raw scores are assigned letter grades and equated with numerical equivalents as shown below:
Letter Grade Numerical Equivalent
A 4
B 3
C 2
D 1
F 0

The system uses credit/semester hours and grade points.


Credit hour / semester hour refers to one hour of lecture / tutorial / practicals that is allocated to a
particular unit / per course per week per semester or term. A student who is registered for a
course that is allocated a three-hour lecture period per week will obtain three credit hours if
he/she successfully completes the course.

Grade point refers to the student’s numerical grade on a course multiplied by the number of credit hours assigned to that particular course. For instance, if Statistics 1 is allocated 3 hours per week per semester and a student gets a “C” (numerical equivalent 2), his/her grade points for the course will be 2 multiplied by 3, which equals 6.

Grade point average (GPA) refers to the sum of the student’s grade points on the various course units divided by the total number of the student’s semester hours.

Example: A third year student obtained grades on four courses / units as follows:

Unit                                     Letter Grade   Numerical Equivalent   Semester/Credit Hours   Grade Points
Teaching Practice                        A              4                      6                       24
Educational Assessment and Evaluation    C              2                      3                       6
Sociology of Education                   B              3                      3                       9
Comparative Education                    C              2                      3                       6
Total                                                                          15                      45

Computation

Formula for computing grade point average (GPA) is:

GPA = ΣGP / ΣSH

where GP = Grade Points and SH = Semester Hours

GPA = 45 / 15 = 3

Cumulative grade point average

This is the student’s overall grade point average at the end of his/her program. It is the grade that determines a student’s classification as either First Class, Second Class Upper, Second Class Lower, or Pass.

The formula for computing the cumulative grade point average (CGPA) is:

CGPA = ΣGP / ΣSH
Where GP = Grade Points
SH = Semester Hours
Example
Below is a record of a student’s performance in a two year program.

        First Year                      Second Year
        1st Semester   2nd Semester    1st Semester   2nd Semester
GP      52             48              40             60
SH      18             22              18             20

GPA = (52 + 48 + 40 + 60) / (18 + 22 + 18 + 20) = 200 / 78 ≈ 2.56
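Both computations can be checked with a few lines of Python (a minimal sketch; the function name is ours):

def gpa(grade_points, semester_hours):
    # GPA = sum of grade points / sum of semester hours
    return sum(grade_points) / sum(semester_hours)

# Single-semester example from the table above (45 grade points over 15 hours)
print(gpa([24, 6, 9, 6], [6, 3, 3, 3]))         # -> 3.0

# Cumulative GPA over the four semesters of the two-year program
print(gpa([52, 48, 40, 60], [18, 22, 18, 20]))  # -> 2.564..., approximately 2.56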
Note: For assessment to be meaningful, it must be reliable. For this reason, teachers
should ensure that tests are properly invigilated and marked.
Activities
1. Compute the grade point average of a university student with the following scores. The semester credit hour is 3 hours for all the units.

Unit                              Letter Grade   Numerical Equivalent
Ed. Assessment and Evaluation     A              4
History of Education              B              3
Philosophy of Education           A              4
Research Methods                  A              4
Written Exercises
1. Identify the main steps in the administration of tests
2. Explain how emergency cases should be treated in an examination
3. Describe the procedure of marking scripts.
LESSON FIVE

ASSESSMENT OF ATTITUDES AND VALUES


Introduction
This unit deals with the assessment of attitudes and values. It discusses the challenges and
methods of assessing attitudes and values.
Components of an Attitude
An attitude has three components: (i) the affective component, (ii) the cognitive component, and (iii) the active component.
The affective component of an attitude is revealed by the extent to which the student likes or dislikes the attitude target. The cognitive component is connected with the student’s knowledge and assumptions: it is his belief regarding the nature of the attitude target and its relationship to other objects. The active component means the likelihood of a person demonstrating positive or negative feelings towards a target by means of actual, concrete actions. It also means willingness to act for or against an attitude target.

For example when assessing students’ attitudes towards mathematics, you should find out
whether they like or dislike mathematics. This is the affective component. You should also find
out their knowledge (belief regarding the nature of mathematics) and assumptions and
presumptions about mathematics and its relationship to other school subjects.
Finally, you should evaluate students’ preparedness to be involved in the processes of inquiry
and involvement in investigations. However, it should be noted that the components can only be
separated from each other theoretically.

Problems of Assessing Attitudes and Values


 Attitudes and values are not easy to measure. You can set out to teach students the
importance of conserving vegetation and at the end of the lesson you can easily find out
whether the students have understood the topic. On the other hand, you may try to change
students’ attitudes towards the environment by explaining to them that they should conserve
vegetation. But at the end of the lesson, it is very difficult to find out whether students’
attitudes have really changed.
 Another problem is that of finding a sufficiently reliable criterion. Teachers find it difficult
to identify the things they should accept as evidence of the acquisition of desirable attitudes
and values.
 The other problem is the teachers’ inability to construct valid and reliable assessment
instruments.
Methods of Assessing Attitudes and Values
Research has shown that it is not correct to infer a person’s attitude towards something from his
knowledge of the object. For example, a student can give correct answers to all questions on
conservation of natural resources, but that does not mean that the student has a positive attitude
towards natural resources.

Before you start teaching attitudes, values and designing assessment techniques, you should
identify the attitudes and values you want your students to learn, e.g. co-operation, honesty and
responsibility. After you have identified the attitudes and values, you should carefully analyze
the concept and clearly specify indicators of the presence of the attitude or value. This will make
it possible for you to measure it. For example, if you want to find out whether a student is
developing the attitude of cooperation, the following are some of the indicators you should look
for:
 Willingness to share materials
 Liking for group work and
 Willingness to help other students solve problems.

Techniques of measuring attitudes and values


Explain the techniques a teacher can use to measure attitudes and values [10 marks]
Attitudes and values can be measured through the use of the following techniques.
1) attitude and value scales
2) rating scales
3) checklists
4) paper and pencil questionnaire
5) oral interviews
6) observation
7) group discussion
Attitude and Value Scales
An attitude and value scale consists of a set of statements or questions such as “I think every student should learn AIDS education”. The student, parent or teacher is asked to respond to such a question or statement in terms of personal preferences or beliefs. Unlike tests, attitude scales do not have correct or wrong answers. Attitude scales assume that the subjective attitudes of people can be measured using quantitative techniques, with the responses of individuals assigned numerical scores. The two most commonly used types of attitude scales, the Likert Scale and the Semantic Differential, are described below.
Likert Scale:
This type of scale is normally used to assess students’ attitudes. Under this method, statements which reflect both positive and negative attitudes towards an object are stated. Students are then asked to indicate their level of agreement with each statement by marking one of the following categories.
Strongly agree : 5
Agree : 4
Undecided : 3
Disagree : 2
Strongly disagree: 1
For example, a Likert Scale to assess students’ attitudes towards people with HIV would look like this:
Directions: Indicate whether you Strongly Agree (SA), Agree (A), are Undecided (U), Disagree (D), or Strongly Disagree (SD).

                                                                  SA   A   U   D   SD
1. People with AIDS have themselves to blame.
2. Children with the HIV virus should not be allowed in schools.
3. I would not allow my children to play with HIV positive children.
4. A spouse has a right to leave the infected partner.

Weights of 1, 2, 3, 4 and 5 are assigned, and the direction of weighting is determined by the favourableness or unfavourableness of the item. For positively worded items, the response alternatives are weighted as follows: strongly agree 5, agree 4, undecided 3, disagree 2, strongly disagree 1. For negatively worded items, such as the four statements above, the order is reversed.
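A minimal Python sketch of this scoring rule (the function name, the sample responses, and the assumption that all four items above are negatively worded are ours):

def score_response(response, positively_worded):
    # response: 1 = strongly disagree ... 5 = strongly agree.
    # Negatively worded items are reverse-coded so that a higher score
    # always indicates a more favourable attitude.
    return response if positively_worded else 6 - response

# Hypothetical responses to the four (negatively worded) HIV/AIDS items
responses = [2, 1, 2, 3]
total = sum(score_response(r, positively_worded=False) for r in responses)
print(total)  # -> 16 out of a possible 20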
Semantic Differential
The semantic differential is another method of assessing attitudes toward a target object. The
semantic differential scale measures how an individual judges a particular concept on a set of
semantic scales. The approach uses a six point scale anchored by adjective opposites. For
example, after you have taught about polygamy and wish to determine the attitude students have
developed towards polygamy, you may want to use the following scale;

Polygamy is:

Desirable     6   5   4   3   2   1   Undesirable
Fashionable   6   5   4   3   2   1   Outdated
Acceptable    6   5   4   3   2   1   Unacceptable

Guidelines on Developing Attitude Items


1. Use short simple statements
2. Avoid using factual statements.
3. Avoid using double negatives
4. Try to have as many negatively oriented statements as positively oriented ones.
5. Have at least 10 statements in the set
6. Avoid statements that are irrelevant to the object under consideration.
7. Avoid statements that are likely to be endorsed by almost no one.
8. Keep the language of the statements simple, clear and direct.
Rating Scales
Rating scales are used to provide a means of obtaining quantified data on the observed behavior or characteristics of individuals. They consist of a set of descriptive words or statements and are useful for behaviors that cannot be quantified by counting procedures, for example, mastery of the content being taught. The evaluator observes the lesson and later quantifies these attributes using, for example, a scale of 1 to 5, where 5 is the highest and 1 the lowest.

Questionnaires
Questionnaires are also used for determining interest and attitudes of people. Direct questions
are asked and the responses are normally presented in a yes – no format.
Note: Attitudes are very important and teachers should make attempts to develop the right
attitudes in their students.
Activities
1. Construct an attitude scale to assess changes in a selected value or attitude.
2. Construct an instrument showing value towards abortion

Written Exercises
1. What are the main problems of assessing attitudes?
2. Distinguish between a Likert scale and a Semantic Differential
3. Explain the three components of an attitude.

LESSON SIX

ASSESSMENT OF PRACTICAL SKILLS


Introduction
This unit discusses the importance of assessing skills and explains the methods used in assessing
these skills.
Specific Objectives
At the end of this unit you should be able to :
(1) Describe the main features of a good assessment of practical skills.
(2) Explain the methods of assessing practical skills.
Importance of Assessing Skills
The development of useful skills is one of the main functions of both formal and informal
education. Many education systems place a lot of stress on the development of practical skills.
For example, in Kenya, practical skills have been implemented as follows:
(1) At the Primary School Level
Agriculture, Home Science, Art Education, Craft Education, Music, Business Education and
Science.
(2) At the Secondary School Level
Industrial Education, Agriculture, Home Science, Music, Art and Design, Business Education
and Science.
(3) At the Post School Level
Artisan Courses, Craft Courses, Diploma Courses and Higher Diploma Courses

The following are examples of skills that are developed at the primary school level: making
instruments, playing musical instruments, asking questions, observing, recording, collecting
information, collecting objects, drawing, communicating, writing, reading, reporting, discussing,
classifying.
Evaluation of the primary school curriculum in Kenya has shown that most teachers use paper and pencil tests to assess practical skills. This is because of three main reasons.
1. Learning outcomes in the psychomotor domain are difficult to assess.
2. Since the Kenya National Examinations Council uses multiple choice items to assess practical subjects, some teachers feel that this is the way to assess practical skills.
3. Lack of equipment and materials for teaching practical skills subjects has left teachers with no alternative but to teach these subjects theoretically. Consequently, primary school teachers spend very little time developing practical skills.
What to Consider when assessing skills
In assessing skills, the teacher must consider the following:
1. The objectives of the course.
2. Skill assessment should be continuous throughout the teaching-learning process.
3. It should be economical in materials and time.
4. It should test important skills (be valid).
5. The marks gained by each student should be reliable.
6. Both the process and product should be assessed.
Unfortunately, many teachers assess practical skills only through paper and pencil tests. Although such tests may provide useful information about the student’s knowledge concerning the skills being tested, they do not clearly tell us whether the student can exhibit those skills. It is better to assess skills by assessing the performance of the students. Some ideas are given in this section.
Methods of Assessment
There are several techniques that can be used to assess skills. Some useful assessment
techniques are:
1. Observation of students.
2. Observation and judging the quality of the product resulting from students’ projects.
3. Performance tests.
Observation of students
One way of obtaining information about the progress students are making in the acquisition of
skills is to observe students’ behavior as it occurs. The main advantage of this method is its
directness. The teacher does not have to ask students questions about the skills they have
acquired. What he/she does is watch them do and say things. This method is very useful at the
primary school level because most children have no vocabulary for describing skills and may not
be willing to express themselves verbally.

When using observation methods, three major considerations should be dealt with. The first and
most significant consideration concerns a decision with respect to what should be observed. For
example, an English language teacher may decide to determine students’ mastery of the English
language. Some of the indicators of this are:
i) Correct use of tenses,
ii) Correct pronunciation of words, and
iii) Appropriate use of vocabulary.
The second major consideration concerns the timing and recording of observations. Since it is
not possible to make continuous observations, the teacher must make a decision about when to
observe students. One acceptable approach to this problem is to keep a brief anecdotal record
based on the skills you want children to learn. When a student displays the behavior, you should
make a tally in the appropriate row. However, only one tally should be made in the same row
during one lesson. Below is an example of a record card for teacher’s observation of standard
seven pupils.
School ………………………………………………
Name of Teacher ……………………………………
Name of Pupil ………………………………………
Age ……………………………………………………
Class …………………………………………………
Number of lessons …………………………………

Specific Behavior        Tally of separate instances of behavior     Total

Asks questions           ……………………………                                 ………
Conducts interviews      ……………………………                                 ………
Discusses                ……………………………                                 ………
Draws                    ……………………………                                 ………
Models                   ……………………………                                 ………
Classifies materials     ……………………………                                 ………
Interprets pictures      ……………………………                                 ………
Makes measures           ……………………………                                 ………

Recording Instructions:
1. Make a tally in the appropriate row each time you observe a new display of the behavior.
Only one tally may be made in the same row during one lesson.
2. After 4 tallies in the same row put the fifth tally across thus, //// //// ////.
Checklists
Checklists are a way of improving other forms of assessment, especially observations. They are
not a method of assessment as such. They are aimed at providing evidence of the presence of a desired attitude or at determining the extent to which students are using certain skills to solve problems. You should use one record form for each child. At the end of the course, you will be able to give a fair and quick summary of the skills possessed by the student. Checklists can also be used to give advice to the pupil.
To make the method reliable, the following suggestions are made:
i) The teacher should break the trait or skill to be assessed into smaller, more clearly defined units, such as: appears to like group work, cooperates with other students, obeys rules and laws, etc.
ii) Students should be observed on several different occasions.
Example 2: A Checklist for the Topic Clothing & Textiles
SCHOOL …………………………………………..
AGE ………………………………………………..
NAME OF CHILD …………………………………
NAME OF TEACHER …………………………….
CLASS ……………………………………………..
TOPIC ……………………………………………...
SUBJECT …………………………………………..
NOT DONE        DONE INCORRECTLY        DONE CORRECTLY

You can watch the student preparing the dress and put ticks in the right column for each part
done correctly. At the end of the test, you should add the number of ticks in the “done correctly”
column and give the score for the student out of 10. The pass standard will depend on the
difficulty of the test.
Example 3: A Checklist for Assessing Practical Skills in Science
SCHOOL __________________________________
AGE ______________________________________
NAME OF CHILD ___________________________
NAME OF TEACHER ________________________
CLASS _____________________________________
TOPIC _____________________________________
SUBJECT ___________________________________

Specific Behavior                                  YES        NO

Skillful with hands                                ………        ………
Likes practical work                               ………        ………
Handles tools and instruments competently          ………        ………
Collects materials                                 ………        ………
Conducts experiments                               ………        ………

The advantages of the checklists are:


i) It makes the marking fairer. Different teachers watching a student perform clearly
defined tasks are likely to give the same score if they have a checklist.
ii) It is very useful for giving feedback to the student because the evidence is clear.

Observing and judging the Quality of Pupil Projects.


A project is an exercise from which time constraints have been largely removed. It can be
tackled either as an individual task or by a group. It usually involves a significant element of the
work being done out of school or at home.

In a number of courses, such as home science, agriculture, art and craft, and industrial education
students are asked to work on a project. This may involve the production of a dress, musical
instrument or table. Naturally, the students will be more motivated in their work if the projects
are assessed. However, assessment of projects is a difficult exercise because there are usually no
clear standards to follow. Some guidelines may help you:

a) Choose the projects carefully to involve the students in important skills.


b) Explain to the students what you want them to do and what standards they should aim
for.
Demonstrate to the students each step of the project.
c) Your role should be that of tutor, continuous assessor and facilitator.
d) Care must be taken to ensure that the work has actually been done by the students, since it is easy for a student to get help from various sources. For this reason, only work done under the supervision of the teacher should be accepted as a measure of achievement.
e) Both the process and the products should be assessed. In assessing pupils’ finished products, the teacher should consider the following:
(i) Craftsmanship: This includes the accuracy with which a student performs various tasks. You should use physical measures and your own experience to judge the quality of the product. Questions that you should ask yourself about craftsmanship include the following: Is the project neat? Is the project accurate?
(ii) Time: Students are normally given a specified time limit within which to produce
a finished product. It is therefore necessary for the teacher to find out the time it
took the student to complete the task.

One technique that has proved useful in assessing pupils’ final products is the use of a rating
scale.
Performance Tests
Performance tests are one of the most widely used forms of assessment of practical skills. They
are used primarily in practical skill subjects, such as art education, craft education, music, home
science and industrial education. Performance tests are administered to students to determine the extent to which they have mastered various elements of the skill being tested. This method of assessment requires students to perform the skills they have learned, under the conditions of the trade concerned, before one or more experts. In assessing practical skills using performance tests, the following aspects are considered.

The quality of work: This is measured in terms of the perfection of the product, the work finish or appearance, and the accuracy and precision with which the student works.
The degree of proficiency: This measures correct use of tools and equipment, time taken to
complete the work, ease and efficiency in handling tools and equipment and regard for safety
practices.
Procedure: This is the extent to which the student follows the detailed steps.
Assessment of Practical Skills in Science
Science subjects i.e. biology, physics and chemistry include objectives whose purpose is to
develop practical skills. Teachers are expected to periodically assess pupils’ practical work in
science. Assessment should be aimed at supporting the learning and teaching process. Feedback
from assessments should enable teachers to improve their teaching techniques and plan remedial
action for individual pupils.
The following practical skills or abilities are usually assessed in science subjects.
1. Manipulative skills – These include the ability to:
 Assemble apparatus.
 Handle chemicals and instruments.
 Use apparatus.
2. Following instructions during practical work.
 Their understanding of procedure.
 Their ability to complete an investigation in accordance with laid down procedure.
3. Observation, identification, recording and interpretation.
 Their ability to recognize, identify and interpret scientific material.
 Their ability to make accurate recordings of data.
4. Analysis and interpretation of data: Pupils’ ability to analyze data using qualitative and
quantitative methods and their ability to interpret the data are assessed.
5. Presentation of the results: pupils are assessed in terms of their ability to write a report on
the basis of the data collected.
6. The design of the investigation: The pupils should show the ability to identify a problem,
formulate hypotheses, work out a design, test hypotheses and make generalizations.
You should prepare a checklist on the basis of the six categories listed above and use it to assess the work of each pupil or, if pupils are working in groups, each group.
Note: Although written tests are widely used to assess skills, paper and pencil tests are not valid for the measurement of skills.
Activities
1. Prepare a checklist for assessing five skills in Science.
2. Construct an observation schedule to assess five skills in science.
Written Exercises
1. Identify four things you should consider in assessing skills.
2. Briefly describe two methods of assessing practical skills.
3. Explain six categories of abilities that are assessed in science subjects.

LESSON SEVEN
PREPARATION OF MARKING SCHEME
Introduction
A marking scheme is a set of criteria used in assessing student learning.
Why prepare a marking scheme?
Preparing a marking scheme ahead of time will allow you to review your questions, to verify that
they are really testing the material you want to test, and to think about possible alternative
answers that might come up. This section discusses guidelines for preparing marking schemes, as well as the moderation of tests.
Objectives
At the end of this topic, you should be able to:
• Prepare marking schemes for different kinds of tests
• Explain the meaning and purpose of moderation
• Discuss how to moderate examinations/tests
Learning Activities
Learning Activity 1.1 Reading
Read the provided topic notes on marking preparation for different kinds of tests.
Learning Activity 1.2 Discussion
Take part in the group discussion on the preparation of marking schemes for tests, the purpose of test moderation, and the importance of preparing a marking scheme in good time.
Learning Activity 1.3 Review
Read and comment on two of the posts in the discussion forum
Assessment
The activity in 1.2 and participation in the discussion in activity 1.3 will be graded.
Topic Resources
Lecturer notes
Topic Seven Notes
Guidelines When Making Marking Schemes
• Look at what others have done. Chances are that you are not the only person who
teaches this course. Look at how others choose to assign grades.
• Make a marking scheme usable by non-experts. Write a model answer and use this as
the basis for a marking scheme usable by non-experts. This ensures that your TAs and
your students can easily understand your marking scheme. It also allows you to have
an external examiner mark the response, if need be.
• Give consequential marks. Generally, marking schemes should not penalize the same
error repeatedly. If an error is made early but carried through the answer, you should
only penalize it once if the rest of the response is sound.
• Review the marking scheme after the exam. Once the exam has been written, read a
few answers and review your key. You may sometimes find that students have
interpreted your question in a way that is different from what you had intended.
Students may come up with excellent answers that may be slightly outside of what
was asked. Consider giving these students partial marks.
• When marking, make notes on exams. These notes should make it clear why you gave
a particular mark. If exams are returned to the students, your notes will help them
understand their mistakes and correct them. They will also help you should students
want to review their exam long after it has been given, or if they appeal their grade.
Sample Marking Scheme for Presentations
This is an example of a marking scheme for a presentation assignment
• Presentation (40%)
1. verbal & non-verbal communication (10%) - avoid note reading, eye contact, enthusiasm,
body language, volume of voice, clarity of language
2. Visual aids (10%) - minimal text, appealing layout, distraction-free, large font size
3. Structure (10%) - clarity of goals, organization, logical progression, good flow
4. Discussion questions (10%) - ability to guide discussion: ability to ask and answer
questions, maintain order
• Content (60%)
1. Introduction (10%) - introduce background material
2. Thesis statement (5%) - clear, focused
3. Main body (35%) - depth, synthesis of references, accuracy, summary, figures or tables
4. Discussion questions (10%) - thought-provoking, sufficient quantity
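Because every component in this scheme is marked out of its percentage weight, the final mark is simply the sum of the component marks. A minimal sketch with hypothetical component marks (the marks themselves are illustrative, not from the notes):

# Hypothetical marks for one presentation, each out of its weight above
component_marks = {
    "verbal & non-verbal communication": 8,   # out of 10
    "visual aids": 7,                         # out of 10
    "structure": 9,                           # out of 10
    "discussion questions (presentation)": 6, # out of 10
    "introduction": 8,                        # out of 10
    "thesis statement": 4,                    # out of 5
    "main body": 28,                          # out of 35
    "discussion questions (content)": 7,      # out of 10
}
print(sum(component_marks.values()))  # -> 77 (out of 100)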

Scoring Essay Items


Although essay questions are powerful assessment tools, they can be difficult to score. With
essays, there isn't a single, correct answer and it is almost impossible to use an automatic
scantron or computer-based system. In order to minimize the subjectivity and bias that may
occur in the assessment, teachers should prepare a list of criteria prior to scoring the essays.
Consider, for example, the following question and scoring criteria:
Sample Question (history/Social Studies)
Consider the time period during the Vietnam War and the reasons there were riots in cities and at
university campuses. Write an essay explaining three of those reasons. Include information on
the impact (if any) of the riots. The essay should be approximately one page in length. Your
score will depend on the accuracy of your reasons, the organization of your essay, and
brevity. Although spelling, punctuation, and grammar will not be considered in grading, please
do your best to consider them in your writing. (10 points possible)
Scoring Criteria (for the teacher)
• Content Accuracy -- up to 2 points for each accurate reason the riots ensued (6 points
total)
• Organization -- up to 3 points for essay organization (e.g., introduction, well
expressed points, conclusion)
• Brevity -- up to 1 point for appropriate brevity (i.e., no extraneous or "filler" information)
No penalty is applied for spelling, punctuation, or grammatical errors.
By outlining the criteria for assessment, the students know precisely how they will be assessed
and where they should concentrate their efforts. In addition, the instructor can provide feedback
that is less biased and more consistent. Additional techniques for scoring constructed response
items include:
• Do not look at the student's name when you grade the essay.
• Outline an exemplary response before reviewing student responses.
• Scan through the responses and look for major discrepancies in the answers -- this
might indicate that the question was not clear.
• If there are multiple questions, score Question #1 for all students, then Question #2, etc.
• Use a scoring rubric that provides specific areas of feedback for the students.

What is Moderation?
Moderation is a set of processes designed and implemented by the assessors/evaluators to:
• Provide system-wide comparability of grades and scores derived from internal-based assessment
• Form the basis for valid and reliable assessment in schools
• Maintain the quality of assessment and the credibility, validity and acceptability of certificates.
Two forms of moderation are used:
• Qualitative moderation – unit grades from student assessment are moderated by peer review against system criteria.
• Statistical moderation – scores from student assessment within courses are placed on the same scale.
Moderation is necessary for producing valid, credible and publicly acceptable certificates in an assessment system, and it provides for comparability of standards across classes and schools.
How to Moderate…
Let’s say our exam has a mean of 70 and a standard deviation of 10; the students have done fairly well here. If we want to compare the scores in this exam with another exam that has a mean of 50 and a standard deviation of 20, it is possible to scale them in a very simple way: we subtract the old mean from each mark, divide by the old standard deviation, multiply by the new standard deviation, and then add back the new mean.
Suppose one set of marks comes from a school internal exam and the other from a public exam; we can scale the internal scores to be in line with the public exam scores so that they are comparable.
The internal exam has a higher average, which means that it was easier, and a lower spread,
which means that most of the students answered similarly. When scaling it to the public exam,
students who performed well in the internal exam would continue to perform well after scaling.
But students with an average performance would have their scores pulled down.
This is because the internal exam is an easy one, and in order to make it comparable, we’re
stretching their marks to the same range. As a result, the good performers would continue getting
a top score. But poor performers who’ve gotten a better score than they would have in a public
exam lose out.
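In other words, each mark is standardized against the internal exam and re-expressed on the public-exam scale. A minimal sketch, assuming the means and standard deviations quoted above:

def moderate(mark, old_mean, old_sd, new_mean, new_sd):
    # Subtract the old mean, divide by the old SD,
    # multiply by the new SD, and add back the new mean.
    return new_mean + (mark - old_mean) / old_sd * new_sd

# Internal exam: mean 70, SD 10.  Public exam: mean 50, SD 20.
for mark in (90, 70, 60):
    print(mark, "->", moderate(mark, 70, 10, 50, 20))
# 90 -> 90.0  (a top internal scorer keeps a top score)
# 70 -> 50.0  (an average internal mark is pulled down to the public average)
# 60 -> 30.0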
TEST ITEM ANALYSIS
Introduction
After you create your objective assessment items and give your test, how can you be sure that the
items are appropriate -- not too difficult and not too easy? How will you know if the test
effectively differentiates between students who do well on the overall test and those who do not?
An item analysis is a valuable, yet relatively easy, procedure that teachers can use to answer both
of these questions.
Objectives
At the end of this topic, you should be able to:
1. Discuss how to determine the Difficulty index of a test
2. Discuss how to determine the Discrimination index of a test
Learning Activities
Learning Activity 1.1 Reading
Read the provided topic notes on how to determine Difficulty Index and Discrimination Index.
Learning Activity 1.2 Discussion
Participate in the discussion on the determination of the difficulty and discrimination index.
Learning Activity 1.3 Review
Read and comment on two of the posts in the discussion forum
Assessment
The activity in 1.2 and participation in the discussion in activity 1.3 will be graded.
Topic Resources
• Lecture notes
• Internet
• Exercises

Topic Eight Notes


Item Analysis
Involves test discrimination index and test difficulty index
1. Difficulty Index
To determine the difficulty level of test items, a measure called the Difficulty Index is used. This
measure asks teachers to calculate the proportion of students who answered the test item
accurately. By looking at each alternative (for multiple choice), we can also find out if there are
answer choices that should be replaced. For example, let's say you gave a multiple-choice quiz
and there were four answer choices (A, B, C, and D). The following table illustrates how many
students selected each answer choice for Question #1 and #2.
Question A B C D
#1 0 3 24* 3
#2 12* 13 3 2

* Denotes correct answer.


For Question #1, we can see that A was not a very good distractor -- no one selected that answer.
We can also compute the difficulty of the item by dividing the number of students who choose
the correct answer (24) by the number of total students (30). Using this formula, the difficulty of
Question #1 (referred to as p) is equal to 24/30 or .80. A rough "rule-of-thumb" is that if the item
difficulty is more than .75, it is an easy item; if the difficulty is below .25, it is a difficult item.
Given these parameters, this item could be regarded as moderately easy -- lots (80%) of students got it correct. In contrast, Question #2 is much more difficult (12/30 = .40). In fact, on Question
#2, more students selected an incorrect answer (B) than selected the correct answer (A). This
item should be carefully analyzed to ensure that B is an appropriate distractor.
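A minimal sketch of the difficulty calculation, using the response counts from the table above (the function name is ours):

def difficulty(option_counts, correct_option):
    # p = number choosing the correct option / total number of candidates
    return option_counts[correct_option] / sum(option_counts.values())

q1 = {"A": 0, "B": 3, "C": 24, "D": 3}   # correct answer: C
q2 = {"A": 12, "B": 13, "C": 3, "D": 2}  # correct answer: A
print(difficulty(q1, "C"))  # -> 0.8 (moderately easy)
print(difficulty(q2, "A"))  # -> 0.4 (harder; B drew more responses than A)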
2. Discrimination Index
Another measure, the Discrimination Index, refers to how well an assessment differentiates
between high and low scorers. In other words, you should be able to expect that the high-
performing students would select the correct answer for each question more often than the low-
performing students. If this is true, then the assessment is said to have a positive discrimination
index (between 0 and 1) -- indicating that students who received a high total score chose the
correct answer for a specific item more often than the students who had a lower overall score. If,
however, you find that more of the low-performing students got a specific item correct, then the
item has a negative discrimination index (between -1 and 0). Let's look at an example.
Table 2 displays the results of ten students on three questions of a quiz. Note that the students are arranged with the top overall scorers at the top of the table.
Student     Total Score (%)   Q1   Q2   Q3
Asif        90                1    0    1
Sam         90                1    0    1
Jill        80                0    0    1
Charlie     80                1    0    1
Sonya       70                1    0    1
Ruben       60                1    0    0
Clay        60                1    0    1
Kelley      50                1    1    0
Justin      50                1    1    0
Tonya       40                0    1    0

"1" indicates the answer was correct; "0" indicates it was incorrect.
Follow these steps to determine the Difficulty Index and the Discrimination Index.
 After the students are arranged with the highest overall scores at the top, count the
number of students in the upper and lower group who got each item correct. For Question #1,
there were 4 students in the top half who got it correct, and 4 students in the bottom half.
 Determine the Difficulty Index by dividing the number who got it correct by the total
number of students. For Question #1, this would be 8/10 or p=.80.
 Determine the Discrimination Index by subtracting the number of students in the lower
group who got the item correct from the number of students in the upper group who got the item
correct. Then, divide by the number of students in each group (in this case, there are five in each
group). For Question #1, that means you would subtract 4 from 4, and divide by 5, which results
in a Discrimination Index of 0.
 The answers for Questions 1-3 are provided in the table below.
Item         # Correct (Upper group)   # Correct (Lower group)   Difficulty (p)   Discrimination (D)
Question 1   4                         4                         .80              0
Question 2   0                         3                         .30              -0.6
Question 3   5                         1                         .60              0.8
Now that we have the table filled in, what does it mean? We can see that Question #2 had a
Difficulty index of .30 (meaning it was quite difficult), and it also had a negative discrimination
Index of -0.6 (meaning that the low-performing students were more likely to get this item
correct). This question should be carefully analyzed, and probably deleted or changed. Our
"best" overall question is Question 3, which had a moderate difficulty level (.60), and
discriminated extremely well (0.8).
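Both indices for the quiz data above can be verified with a short sketch (upper group = top five scorers, lower group = bottom five):

# 1 = correct, 0 = incorrect; students listed from highest to lowest total score
results = {
    1: [1, 1, 0, 1, 1, 1, 1, 1, 1, 0],
    2: [0, 0, 0, 0, 0, 0, 0, 1, 1, 1],
    3: [1, 1, 1, 1, 1, 0, 1, 0, 0, 0],
}
for item, marks in results.items():
    upper, lower = marks[:5], marks[5:]
    p = sum(marks) / len(marks)                 # difficulty index
    d = (sum(upper) - sum(lower)) / len(upper)  # discrimination index
    print(f"Question {item}: p = {p:.2f}, D = {d:.1f}")
# Question 1: p = 0.80, D = 0.0
# Question 2: p = 0.30, D = -0.6
# Question 3: p = 0.60, D = 0.8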
Another consideration for an item analysis is the cognitive level that is being assessed. For
example, you might categorize the questions based on Bloom's taxonomy (perhaps grouping
questions that address Level I and those that address Level II). In this manner, you would be able
to determine if the difficulty index and discrimination index of those groups of questions are
appropriate. For example, you might note that the majority of the questions that demand higher
levels of thinking skills are too difficult or do not discriminate well. You could then concentrate
on improving those questions and focus your instructional strategies on higher-level skills.
LESSON EIGHT
RESULTS ANALYSIS AND PRESENTATION
Introduction to Statistics
Introduction
A statistic is a mathematical value that summarizes a characteristic of a sample; it is a summarizing measure calculated from sample data.
Statistics is the mathematical science concerned with gathering, processing, and analyzing
numerical data in any field. Statistics can also be defined as the study of methods of handling
quantitative information including techniques for organizing and summarizing as well as for
making generalizations and inferences from data.
In general, statistical methods can be grouped into two broad classes
 Descriptive statistics and
 Inferential statistics
Objectives
By the end of the topic, you should be able to:
1) Define key terms
2) Differentiate between descriptive and inferential statistics
3) Discuss Frequency distribution
4) Explain the presentation of raw scores and drawing of distribution curves
Learning Activities
Learning Activity1.1 Reading
Read the provided topic notes on statistics and types of data; you have also been given access to the lecturer notes.
Learning Activity 1.2: Discussion
Statistics is divided into two broad categories. Participate in the discussion of these two types.
Learning Activity 1.3 Review
Read and comment on two of the posts in the discussion forum
Assessment
The journal in activity 1.2 and participation in the discussion in activity 1.3 will be graded.
Topic Resources
Lecturer notes.
Topic Nine Notes
Types of statistics
1. Descriptive Statistics
Descriptive statistics refers to procedures for organizing, summarizing and describing
quantitative information or data.
The aim of descriptive statistics is the reduction of the quantitative data to a form that can be
readily comprehended.
Examples – Descriptive Statistics
 Mean
 Standard deviation, and
 Correlation coefficient
In descriptive statistics, no attempt is made to generalize beyond the data at hand. Descriptive
statistics include measures of central tendency, dispersion, and correlation. Descriptive statistics
deal with describing and examining associations of variables within given data sets.

2. Inferential Statistics
 Inferential Statistics is concerned with the methods by which inferences are made to a
larger group on the basis of observation made on a smaller subgroup. In other words, it is
concerned with the process of generalization from the part to the whole (i.e., from the
sample to the population).
 In inferential statistics, conditional inferences are made about populations from statistics
of samples by means of logic and probability theory. E.g., Is there a statistically
significant difference between form four boys and girls in mathematics achievement?
 Suppose after administering a sound mathematics test for some selected form four boys
and girls, the mean for girls was 52% and the mean for boys was 60%. Is this difference a
result of chance differences or is it of statistical significance? Such a question is answered
by inferential statistics.
 Generally, the aim of inferential statistics is that of generalizing beyond the data at hand.
 Inferential statistics help us to draw conclusions beyond our immediate samples and data.
For example, inferential statistics could be used to infer, from a relatively small sample of
employees, what the job satisfaction is likely to be for a company’s entire work force.
 In other words, inferential statistics help us to draw general conclusions about the
population on the basis of the findings identified in a sample.
 The most widely used inferential statistical procedures; include the t-test, analysis of
variance (ANOVA), chi-square, and regression.

Scales of Measurement
A measurement scale can have three attributes
 Magnitude
 Equal Intervals
 Absolute zero point
When a scale has magnitude, one instance of the attribute can be judged greater than, less than,
or equal to another instance of the attribute. E.g., If person A is 160cm tall and person B is
162cm tall, this scale of measurement reflects the difference in height between person A and
person B. Person B is taller than person A.
Equal interval denotes that the magnitude of the attribute represented by a unit of measurement
on the scale is equal regardless of where on the scale the unit falls. E.g., The difference in height
between someone measuring 61 inches versus someone measuring 60 inches is the same
magnitude as the difference in height that exists between someone measuring 75 inches versus
someone measuring 74 inches. In short, an inch reflects a certain amount of height regardless of
where that inch falls on the scale.
An absolute zero point is a value that indicates that nothing at all of the attribute being measured exists. E.g., “0 inches” of height is a scale value that implies no height whatsoever – absolute zero.
Levels/Scales of Measurement
 Nominal Scale
 Ordinal Scale
 Interval Scale
 Ratio Scale
Nominal Scale
A nominal scale refers to the classification of items into discrete groups which do not bear any
magnitude relationships to one another.
 A nominal scale is the most limited (or least powerful) types of measurement scale. It
indicates no order or distance relationship.
 A nominal scale simply describes differences between things by assigning them to
categories.
 Nominal scale is simply a system of assigning number symbols to events in order to label
them. E.g. The assignment of numbers to basketball players is almost purely arbitrary
since one number would do as well as another.
 The numbers have no quantitative value. One cannot do much with the numbers
involved. For example, one cannot usefully average the numbers on the back of a group
of football players and come up with a meaningful value.
 Nominal data are thus, counted data.
Distinguishing Characteristics of Nominal Measurement Scales and Data
• Used only to qualitatively classify or categorize not to quantify.
• No absolute zero point.
• Cannot be ordered in a quantitative sequence.
• Impossible to use to conduct standard mathematical operations.
• Examples include gender, religious and political affiliation, and marital status.
• Purely descriptive and cannot be manipulated mathematically.
Ordinal Scale
 An ordinal scale reflects only magnitude and does not possess the attributes of equal
intervals or an absolute zero point.
 It is the lowest level of the ordered scale.
 The ordinal scale places events in order on some continuum; It can be said that one class
is higher than another.
 There is no attempt to make the intervals of the scale equal in terms of some rule.
 Rank orders represent ordinal scales.
 Ordinal measures have no absolute values, and the real differences between adjacent
ranks may not be equal.
 All that can be said is that one person is higher or lower on the scale than another but
more precise comparisons cannot be made. One has to be very careful in making
statements about scores based on ordinal scales. E.g., if student A’s position in her class is 5 and student B’s position is 20, it cannot be said that A’s position is four times as good as that of B. The statement would make no sense at all.
 Thus, the use of an ordinal scale implies a statement of “greater than” or “less than” (an equality statement is also acceptable) without our being able to state how much greater or less.
 The real difference between ranks 1 and 2 may be more than or less than the difference
between ranks 5 and 6.
 Since the numbers of this scale have a rank meaning, the appropriate measure of central
tendency is the “median.”

Distinguishing Characteristics of Ordinal Measurement Scales and Data


• Build on nominal measurement.
• Categorize a variable and its relative magnitude in relation to other variables.
• Represent an ordering of variables with some number representing more than another.
• Information about relative position but not the interval between the ranks or categories.
• Qualitative in nature.
• Example would be finishing position of runners in a race.
• Lack the mathematical properties necessary for sophisticated statistical analyses.
Interval Scale
 An interval scale possesses the attributes of magnitude and equal intervals but not an
absolute zero point.
 Interval scales can have an arbitrary zero, but it is not possible to determine for them
what may be called an absolute zero or the unique origin.
 The primary limitation of the interval scale is the lack of a true zero; it does not have the
capacity to measure the complete absence of a trait or characteristic.
 The Fahrenheit scale is the most common example of an interval scale.
 One can say that an increase in temperature from 30° to 40° involves the same increase in temperature as an increase from 60° to 70°.
 But one cannot say that a temperature of 60° is twice as warm as a temperature of 30°, because both numbers depend on the fact that the zero on the scale is set arbitrarily at the temperature of the freezing point of water.
 The ratio of the two temperatures, 30° and 60°, means nothing because zero is an arbitrary point.
 Ratio statements cannot be made without an absolute zero point.
 Interval scales provide more powerful measurement than ordinal scales for interval scale
also incorporates the concept of equality of intervals.
 Mean is the appropriate measure of central tendency while standard deviation is the most
widely used measure of dispersion.

Distinguishing Characteristics of Interval Measurement Scales and Data


• Quantitative in nature.
• Build on ordinal measurement.
• Provide information about both order and distance between values of variables.
• Numbers scaled at equal distances.
• No absolute zero point; zero point is arbitrary.
• Addition and subtraction are possible.
• Examples include temperature measured in Fahrenheit and Celsius.
• Lack of an absolute zero point makes division and multiplication impossible.

Ratio Scale
 Any scale of measurement possessing magnitude, equal intervals, and an absolute zero
point is called a ratio scale.
 This scale is termed “ratio” because the collection of properties that it possesses allows
ratio statements to be made about the attribute being measured. E.g., If a man is 28 years
old and an adolescent is 14 years old, it is correct to infer that the man is twice as old as
the adolescent.
 Such ratio statements may be made only if the scale possesses all three characteristics.
 Ratio scales have meaningful, absolute zero points; zero actually means exactly nothing of the quantity being measured. E.g., the zero point on a centimeter scale indicates the complete absence of length or height. By contrast, an absolute zero of temperature is theoretically unobtainable; it remains a concept existing only in the scientist’s mind.
 The number of incorrect letters in a page of type script represents a score on a ratio scale.
The scale has an absolute zero.
 The ratio involved does have significance and facilitates a kind of comparison which is
not possible in case of an interval scale.
 Measures of physical dimensions such as weight, height, distance, etc. are examples of
ratio scale.
Distinguishing Characteristics of Ratio Measurement Scales and Data
• Identical to the interval scale, except that they have an absolute zero point.
• Unlike with interval scale data, all mathematical operations are possible.
• Examples include height, weight, and time.
• Highest level of measurement.
• Allow for the use of sophisticated statistical techniques.
Proceeding from the nominal scale (the least precise type of scale) to the ratio scale (the most precise), increasingly precise information is obtained. Generally speaking, the levels are hierarchical.

LESSON NINE
MEASURES OF CENTRAL TENDENCY
Mean

The mean is the most widely used and preferred measure of central tendency. The mean is
defined as the sum of all data scores divided by the number of scores in the distribution. Mean is
arithmetic average of scores.

Mean, X̄ = (X1 + X2 + X3 + … + XN) / N

Where X1, X2, X3 ……… are variables

Example:

 Scores = 5, 2, 4, 3, 8, 6.
 Mean = (5+2+4+3+8+6)/6 = 4.67

For grouped data the variables are the midpoints of the class intervals. Given variables x and frequencies f, therefore:

Mean, x̄ = Σfx / Σf

The mean is the best known and the most reliable measure of central tendency. Hence it is
often preferred to both the median and the mode. Every score in the distribution is considered in
its computation. This is not done when calculating the mode and the median.
Thus mean is simply found by adding all the scores in a distribution and dividing by the total
number of scores (N). It is denoted by X̄ pronounced X bar.
The formula is:

X̄ = ΣXi / N   (summing from i = 1 to N)

Where X̄ = mean,
Xi is the raw score for each individual, i.e. the ith person's score,
N is the number of scores, and
Σ is the summation sign, indicating that all the X-scores in the distribution are added, from the first score to the Nth score.

Example

Find the mean of 3, 3, 4, 5, 6, 6, 8, 9 and10. The sum of these is 3 + 3 + 4 + 5 + 6 + 6 + 8


+ 9 + 10 = 54
N=9
Therefore, the mean is 54/9 = 6.
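As an illustration (a sketch, not part of the original notes), the same mean can be computed in a few lines of Python:

# Sketch: mean of ungrouped scores, reproducing the example above.
scores = [3, 3, 4, 5, 6, 6, 8, 9, 10]
mean = sum(scores) / len(scores)   # sum of all scores divided by N
print(mean)                        # 6.0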

For a grouped frequency distribution, the mean is obtained as follows:

Each class-mark or mid-point (xi) is multiplied by its corresponding frequency (fi). The products are then summed and divided by the total frequency to give the mean. This can be summarized as follows:

X̄ = Σfi xi / Σfi = Σfi xi / N

Where xi refers to the class-marks or mid-points and fi to the corresponding frequencies. The calculation of the mean of a grouped frequency distribution is illustrated below:

Class interval    Frequency (fi)    Class-mark (xi)    fi xi
65-69             3                 67                 201
60-64             4                 62                 248
55-59             8                 57                 456
50-54             10                52                 520
45-49             9                 47                 423
40-44             3                 42                 126
35-39             4                 37                 148
30-34             1                 32                 32
Totals            Σfi = 42                             Σfi xi = 2154
X̄ = Σfi xi / Σfi = 2154 / 42 = 51.3
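A minimal Python sketch of the same grouped-data calculation may help; the interval and frequency lists are taken from the table above:

# Sketch: mean of a grouped frequency distribution via class mid-points.
intervals = [(65, 69), (60, 64), (55, 59), (50, 54),
             (45, 49), (40, 44), (35, 39), (30, 34)]
freqs = [3, 4, 8, 10, 9, 3, 4, 1]
midpoints = [(lo + hi) / 2 for lo, hi in intervals]    # class-marks xi
sum_fx = sum(f * x for f, x in zip(freqs, midpoints))  # sum of fi * xi = 2154
print(round(sum_fx / sum(freqs), 1))                   # 2154 / 42 = 51.3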

Properties of the Mean

1. One important property of the mean is that it is the point in a distribution of scores such that the summed deviations of scores from it (the mean) are equal to zero. What do we mean by deviation? Deviation is the difference between a score and the mean, Xi − X̄, and it can be either positive or negative. In any distribution the sum of deviations about the mean is always equal to zero.
i.e. Σ(Xi − μ) = 0, where μ is the population mean for X-scores and the population has N subjects, and
Σ(Xi − X̄) = 0, where X̄ is the sample mean and the sample is of size n.
For illustration, let us consider the following scores. Suppose our scores are 3, 3, 4, 5, 6, 6,
8, 9 and 10 (note this can be considered as a population or a sample without any change of
the results). The mean will be 6 and the deviation scores will be 3-6, 3-6, 4-6, 5-6, 6-6, 6-6,
8-6, 9-6 and 10-6, in general Xi − X̄. These deviations will be respectively -3, -3, -2, -1, 0,
0, 2, 3 and 4 (note their sum is zero). Thus, the mean may be considered as the exact
balance point in a distribution.

2. If we add a constant, say C, to every score in the distribution, the resulting scores will have a mean X̄(X+C) equal to the original mean X̄(X) plus the constant C. If we subtract a constant instead, the resulting scores will have a mean equal to the original mean minus the constant. Note that subtracting a constant C is the same as adding −C; hence the first formula is adequate, for it includes even the second formula.
i.e. X̄(X+C) = X̄(X) + C
X̄(X−C) = X̄(X) − C

Let us illustrate this with the data below, of which the mean is 5, but let us add 3 to every score and calculate the new mean:

Xi          Xi + C
3           3+3 = 6
4           4+3 = 7
5           5+3 = 8
8           8+3 = 11
ΣXi = 20    Σ(Xi + 3) = 32

X̄(X) = 20/4 = 5    X̄(X+C) = 32/4 = 8
Thus X̄(X+C) = X̄(X) + 3, i.e. X̄(X) + C.
3. If each score in a distribution of scores is multiplied by a constant C, then the mean of these scores will be the original mean multiplied by the constant, i.e. C·X̄(X). Let us illustrate this property with the data that was used above. Let all the scores be multiplied by 2 (see the table below).

Xi          C·Xi
3           3×2 = 6
4           4×2 = 8
5           5×2 = 10
8           8×2 = 16
ΣXi = 20    ΣCXi = 40

X̄(X) = 20/4 = 5    X̄(CX) = 40/4 = 10
Thus X̄(CX) = 2·X̄(X), i.e. C·X̄(X).

Note that division is the reciprocal of multiplication. Hence if we divide each score by a constant C, the mean of the resulting scores is the original mean X̄(X) divided by the constant C (or (1/C) × X̄(X)).
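These two properties are easy to verify numerically. Below is an illustrative Python sketch (not from the notes), using four of the scores above; the helper mean is defined locally:

# Sketch: verifying the add-a-constant and multiply-by-a-constant properties.
def mean(xs):
    return sum(xs) / len(xs)

scores = [3, 4, 5, 8]
print(mean(scores))                   # 5.0  (original mean)
print(mean([x + 3 for x in scores]))  # 8.0  = original mean + 3
print(mean([x * 2 for x in scores]))  # 10.0 = original mean * 2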

Advantages of mean

The mean is a popular statistical measure because:

 It is simply the arithmetic average.
 It reflects the inclusion of every item in the data set, distribution or observation.
 It is easily used with other statistical measurements and in further analysis of data.
 It is easy to calculate and simple to follow.
 It is finite and not indefinite.
 It is least affected by fluctuations of sampling.
Disadvantages of mean

 The arithmetic mean is highly affected by extreme values.
 It is not an appropriate average for skewed distributions.
 It cannot be computed accurately if any item or entry is missing.
Differences in measures of central tendency

Records taken of speeding motorists on a highway were as follows, in kilometers per hour:
96, 96, 97, 99, 100, 101, 102, 104, 155.

From these scores the measures of central tendency are:

Mode = 96km/hr.

Median = 100km/hr.

Mean = 105.6km/hr.

The median is the best representation of these scores. The mode is low, and the mean is higher than all the scores except one (155). The mean was pulled up towards 155, while the median ignored it. The mean can be used by management to settle salary disputes, while the mode can be used by unionists, because the majority of the labour force earns lower salaries. There are other examples which can be cited of how the mean, median and mode can be used in favour of one another.
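For the speeding data above, the three measures can be computed with Python's standard-library statistics module, as a quick check (a sketch, not part of the notes):

# Sketch: mode, median and mean of the recorded speeds.
import statistics

speeds = [96, 96, 97, 99, 100, 101, 102, 104, 155]
print(statistics.mode(speeds))    # 96
print(statistics.median(speeds))  # 100
print(statistics.mean(speeds))    # 105.55..., pulled up towards 155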

Exercise

1. Given the following scores: 3, 4, 4, 5, 6, 6, 6, 8, 8, 9, which measure of central tendency is appropriate for the set of scores, and why?
2. The ages of people on a holiday tour are given by the following frequency distribution table.

Age in years    12    13    14    15    over 16
Frequency       6     8     7     6     8

It is shown that some of the people were adult guides. Select a statistical method to describe the measure of central tendency or average age. Explain why.

3. The monthly wages and salaries of a firm are grouped as follows.

Wages and salary          Frequency
6000 – under 8000         50
8000 – under 10,000       35
10,000 – under 20,000     10
20,000 – under 30,000     4
80,000 – under 100,000    1

a) Find an estimate of the mean.
b) Calculate the median of the distribution.
Summary

The following statements summarize the major points in this unit:


1. The mean, median and mode are measures of central tendency. They give an idea of the
average or typical score in a distribution.
2. The mode is the most frequent score value in a distribution. It is the score with highest
frequency in a distribution.
3. The median is a point in a distribution of scores such that 50 percent of the scores are located above it and the other 50 percent below it.
4. The mean is found by adding all the scores in a distribution and dividing by the total
number of scores.
5. One important property of the mean is that it is the point in a distribution of scores such
that the summed deviations of the scores from it (the mean) are zero.
6. The sum of squared deviations from the mean is less than the sum of squared deviations
from any other point.
7. The mean is generally preferred by statisticians as the measure of central tendency, but the median is quite often easier to compute, and is therefore preferred by classroom teachers in a considerable number of cases.
8. For distributions that are fairly normal, it matters little which measure of central tendency is used.
LESSON TEN

MEASURES OF VARIABILITY

The measures or indexes considered here are range, quartile deviation, mean deviation, variance
and standard deviation. Range is the simplest measure of variability (or dispersion). Standard
deviation is the most reliable measure of variability and standard deviation is the square root
of variance; that is, variance is standard deviation squared.

Expected learning outcome

At the end of the lesson, you should be able to:

1. Compute range, quartile deviation, mean deviation, variance and standard deviation for
grouped and ungrouped data using computational and definitional formulae.
2. Give the properties of variance and standard deviation (s.d.) (e.g. when a constant is
added to all the scores of the distribution)
3. Compute variance and s.d. using assumed mean method.
4. Interpret computed s.d.

Measures of variability
A variable can be defined as a trait, which can take on a range of values. When we talk about
variation, we refer to the arrangement or spread of values that the variable takes in the
distribution. Measures of variation give us some information about the difference between
scores. While measures of central tendency give information about typical score in a
distribution, measures of variability provide information about the differences in spread
between scores in the distribution.
When we try to describe a distribution, giving the measure of central tendency is not enough.
We need to know something about variability or spread of the scores in a distribution. In fact,
distributions may have the same mean, yet differ in the extent of variation of the scores around
that measure of central tendency.

For example:
The spread of a variable makes a difference in how you can compare one set to another. Two sets can have an equal mean and median yet be very different.

SET A: 59 59 59 60 61 61 61

SET B: 30 40 50 60 70 80 90

The mean = median = 60, yet Set A is different from Set B: Set A is close together, while Set B is more spread out. The dispersion of the two sets is different, hence the difference in variability. The measures of variability or spread give part of the solution on how the sets of data differ.

What we are saying is that to describe a distribution, a measure of central tendency is not enough (sufficient), even if you give the most reliable of them. The mean may be necessary, but it is not sufficient. To describe any distribution of variables (e.g. scores), three elements are important considerations:
(i) Measure of central tendency
(ii) Measure of variability, and
(iii) Shape of the distribution.
These three elements are necessary and sufficient to describe any distribution of variables of a
sample or population. Thus, in order to adequately describe a distribution of scores (variables),
we need, in addition to a measure of central tendency, a measure of variability of variables
besides the shape of the distribution. Information concerning variability is as important, if not
more important than information concerning central tendency.

Types of Measures of Variability
The measures of variability (spread or dispersion) provide a needed index of the extent of variation among variables (scores) in a distribution. The indexes used are:
1. Range
2. Quartile deviation
3. Mean deviation
4. Variance and standard deviation.
These measures of variability, along with measures of central tendency, make up the two types of descriptive statistics, which are indispensable in describing distributions of a variable, though not sufficient, for we need the shape of the distribution to make the description sufficient.

Range

The range is defined as the difference between the highest score and the lowest score.
For example for the data or scores 3, 3, 4, 5, 6, 6, 8, 9, 10 the range is simply 10-3, which is 7.

Range is the difference between the lowest score and highest score.

Set A: 59, 59, 59, 60, 61, 61, 61; Range is 61-59=2

Set B: 30, 40, 50, 60, 70, 80, 90; Range is 90-30=60

Like the mode, range is not a very stable measure of variability and its main advantage is that it
gives a quick, rough estimate of variability

When dealing with grouped data, an estimate of the range can be computed by subtracting the
midpoint (class-mark) of the lowest interval from the class-mark of the highest interval.
The range is the simplest measure of variability, but it is not a stable one. Thus, the range is only used as a quick reference to the spread of scores in a distribution.

Quartile Deviation
The quartile deviation, also called the “semi-interquartile range”, is defined as half of the difference between the 75th percentile (Q3) and the 25th percentile (Q1). Hence it is one half the scale distance between the 75th and 25th percentiles in a frequency distribution.
To find the quartile deviation (Q), whose range covers the middle 50 percent of the distribution (including Q2, the median), we first locate the 75th percentile and the 25th percentile.
 The 75th percentile is a point in a distribution such that a quarter of the distribution is above it and the other three quarters below it. Similarly,
 the 25th percentile has a quarter below it while the other three quarters are above it,
 while the 50th percentile or median has half of the distribution above it and half below it.
Thus the formula for the calculation of the quartile deviation, Q, is:

Quartile deviation (Q) = (Q3 − Q1) / 2 or (P75 − P25) / 2
Example

Class interval    Frequency (fi)    Cumulative frequency
65-69             3                 42
60-64             4                 39
55-59             8                 35
50-54             10                27
45-49             9                 17
40-44             3                 8
35-39             4                 5
30-34             1                 1
                  Σfi = 42
Q3 = P75 = L + ((3N/4 − Cfb) / fw) × i

where L is the lower class boundary of the interval containing the percentile, Cfb is the cumulative frequency below that interval, fw is the frequency within it, and i is the interval width.

   = 54.5 + ((3×42/4 − 27) / 8) × 5
   = 54.5 + 2.8
   = 57.3

Q1 = P25 = L + ((N/4 − Cfb) / fw) × i

   = 44.5 + ((42/4 − 8) / 9) × 5
   = 44.5 + 1.4
   = 45.9

Hence Quartile deviation (Q) = (Q3 − Q1) / 2 or (P75 − P25) / 2
   = (57.3 − 45.9) / 2
   = 11.4 / 2
   = 5.7

Note, the same procedure as was used for the median earlier has been used here to find the 75th and 25th percentiles, i.e. Q3 and Q1.
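The interpolation above can also be automated. The sketch below is illustrative only; the helper grouped_percentile is not from the notes, and it assumes equal class widths of 5 with classes listed from the lowest upward:

# Sketch: grouped-data percentile, P = L + ((kN - Cfb) / fw) * i.
def grouped_percentile(intervals, freqs, k, width=5):
    n = sum(freqs)
    target = k * n                   # kN, e.g. 3N/4 for Q3
    cum = 0                          # cumulative frequency below (Cfb)
    for (lo, _), f in zip(intervals, freqs):
        if cum + f >= target:
            # L is the exact lower class boundary, half a unit below lo
            return (lo - 0.5) + ((target - cum) / f) * width
        cum += f

intervals = [(30, 34), (35, 39), (40, 44), (45, 49),
             (50, 54), (55, 59), (60, 64), (65, 69)]
freqs = [1, 4, 3, 9, 10, 8, 4, 3]
q3 = grouped_percentile(intervals, freqs, 0.75)  # about 57.3
q1 = grouped_percentile(intervals, freqs, 0.25)  # about 45.9
print(round((q3 - q1) / 2, 1))                   # quartile deviation, about 5.7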
Mean Deviation, Variance and Standard Deviation

You will find that the two measures of dispersion (variability) discussed above, i.e. the range and the quartile deviation, do not explicitly take into consideration the values of each and every one of the raw scores in the distribution. To arrive at a more reliable indicator of the variability (or spread or dispersion) in a distribution, one should consider the value of each individual score and determine the amount by which each varies from the most expected value (the mean) of the distribution. Recall that the mean was identified as the most stable measure of central tendency. Thus, by deriving an index based on how each score deviates from the mean score, one will have a considerably stable, or very stable, index of variability in a distribution.
The deviation scores provide a good basis for measuring the spread of scores in a distribution.
However, we cannot use sum of these deviations in order to get an index of spread because this
sum in any distribution will be zero.
Σ(Xi − μ) = 0 or Σ(Xi − X̄) = 0

We know why this sum comes to zero in every distribution: we are summing negative and positive values which happen to balance exactly. Consequently, there are ways of rectifying this so that our index indeed indicates the magnitude of the variability, rather than coming to the same value (zero) in all cases, as happens here. Considering all the above, two ways of getting an index of variability are possible:
1. Considering the sum of the absolute deviations and dividing by the total number of cases (n). This is the absolute mean deviation, or simply the mean deviation.

Mean deviation = Σ|Xi − X̄| / n, where Σ|Xi − X̄| is the sum of the absolute deviations (disregarding plus and minus signs).
Note that we divide by the size of the sample, n (or N when we consider a population), to make our index independent of the size of the sample (or population). Thus, as long as there is the same variability, the index is the same irrespective of how many cases there are in our sample (or our population).

Computing mean deviation is very easy. Suppose we have scores: 3, 4, 5, 5, 6, 8, 9, and 10.

The mean deviation may be obtained as follows:

Xi          Xi − X̄    |Xi − X̄|
3           -3.25     3.25
4           -2.25     2.25
5           -1.25     1.25
5           -1.25     1.25
6           -0.25     0.25
8           1.75      1.75
9           2.75      2.75
10          3.75      3.75
ΣXi = 50              Σ|Xi − X̄| = 16.5

X̄ = 50/8 = 6.25

Note that Σ(Xi − X̄) = 0,

but the mean deviation = Σ|Xi − X̄| / n
                       = 16.5 / 8
                       = 2.0625
A larger value of the mean deviation indicates a greater spread in the values of the
distribution.
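An illustrative Python sketch (not from the notes) of the same mean deviation calculation:

# Sketch: mean (absolute) deviation of the scores used above.
scores = [3, 4, 5, 5, 6, 8, 9, 10]
mean = sum(scores) / len(scores)            # 6.25
abs_devs = [abs(x - mean) for x in scores]  # |Xi - mean| for each score
print(sum(abs_devs) / len(scores))          # 16.5 / 8 = 2.0625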

2. The second way is to square the actual deviations, add them together, and then divide by n (i.e. the total number of cases). We obtain an index known as the variance.

Variance = Σ(Xi − X̄)² / n

The variance is not in the unit of the scores and hence is not itself a measure of variability (recall cm is the unit for length while cm² is the unit for area). In order to put the measure of variability into the right perspective, that is, to return to our original unit of measurement, we take the square root of the variance. The square root of the variance is the standard deviation, which is a measure of variability and the most reliable measure of variability.
The standard deviation (S) is:

S = √( Σ(Xi − X̄)² / n ),  i.e. S = √(variance)
Observe that the standard deviation and the mean deviation have the same units. The standard deviation has better properties than the mean deviation, hence it is more reliable or stable.

The population variance is symbolized by σ²; consequently the population standard deviation is σ. The sample variance is symbolized as S², and the sample standard deviation, obviously, as S.

σ² = Σ(Xi − μ)² / N
Computation Formula for Variance and Standard Deviation
Using the above formula, referred to as the “definitional formula”, to compute the variance is easy when each score and the mean are whole numbers. When they are not, however, the computation is tedious and unwieldy, and in many cases inaccuracy results from rounding error. In such cases the computational formula, or raw score formula, is preferred.

The formula is:

S²x = ( ΣXi² − (ΣXi)²/n ) / n

The subscript x on S² is just emphasizing or indicating that we are looking for the variance of the X-variable (scores).
the variance for X-variable (scores).

The computational formula for variance has a great advantage over the definitional formula, Σ(Xi − X̄)² / n, since in the computational formula the raw scores are used directly, without first resorting to determination of the mean. The computational formula is also easier to use when some scores in a distribution are fractional, i.e. not whole numbers, as already indicated.

Computation of variance (S²x) and standard deviation (Sx)

We shall use the scores 3, 4, 5, 5, 6, 8, 9 and 10 to compute the variance and the standard deviation using both the definitional and computational formulae. You will notice that the computational formula is easier to use than the definitional formula.

Computation using the definitional formula:

Xi          Xi − X̄    (Xi − X̄)²
3           -3.25     10.5625
4           -2.25     5.0625
5           -1.25     1.5625
5           -1.25     1.5625
6           -0.25     0.0625
8           1.75      3.0625
9           2.75      7.5625
10          3.75      14.0625
                      Σ(Xi − X̄)² = 43.5000

Variance, S²x = Σ(Xi − X̄)² / n
             = 43.5 / 8
             = 5.4375

Standard deviation, Sx = √5.4375
                       = 2.33

Computation using the computational formula or raw scores formula:

Xi       Xi²
3        9
4        16
5        25
5        25
6        36
8        64
9        81
10       100
Sum      ΣXi = 50,  ΣXi² = 356

Variance, S²x = ( ΣXi² − (ΣXi)²/n ) / n
             = (356 − (50×50)/8) / 8
             = (356 − 312.5) / 8
             = 43.5 / 8
             = 5.4375

Standard deviation, Sx = √5.4375
                       = 2.33

Note that even for a simple case like the one above, the definitional formula method tended to be unwieldy, while the computational formula method is rather straightforward.
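Both routes can be checked with a few lines of Python (an illustrative sketch, not part of the original notes):

# Sketch: variance via the definitional and the computational formulae.
import math

scores = [3, 4, 5, 5, 6, 8, 9, 10]
n = len(scores)
mean = sum(scores) / n

var_def = sum((x - mean) ** 2 for x in scores) / n                  # definitional
var_comp = (sum(x * x for x in scores) - sum(scores) ** 2 / n) / n  # raw-score
print(var_def, var_comp)     # 5.4375 5.4375 (they agree)
print(math.sqrt(var_def))    # standard deviation, about 2.33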

Some Properties of Variance, S², and Standard Deviation, S


Suppose we added a constant number to every score in a set of scores. How would the variance
of the scores be affected? In illustrating the calculation of variance, we found that the scores 3,
4, 5, 5, 6, 8, 9 and 10 have the variance of 5.4375. Let us add 2 to each score and then calculate
S².

Effect on variance, S², when a constant (2) is added to each score

X(X+C) = original score + C    [X(X+C)]²
5                              25
6                              36
7                              49
7                              49
8                              64
10                             100
11                             121
12                             144
ΣX(X+C) = 66                   ΣX²(X+C) = 588

S²x = ( ΣX²(X+C) − (ΣX(X+C))²/n ) / n
    = (588 − 66²/8) / 8
    = (588 − 544.5) / 8
    = 5.4375

Adding 2 to each score did not change the value of the variance, S². In general, adding or subtracting a constant to each score in a group will not change the variance (nor the standard deviation) of the scores.

What would happen to S² if each score were multiplied by a constant, say 2? Let us illustrate this with the same set of scores, i.e. 3, 4, 5, 5, 6, 8, 9 and 10. Let us multiply each score by 2 and then calculate S².

Effect on variance, S², when each score is multiplied by a constant

X(CX) = original score × C    [X(CX)]²
6                             36
8                             64
10                            100
10                            100
12                            144
16                            256
18                            324
20                            400
ΣX(CX) = 100                  ΣX²(CX) = 1424

S² = (1424 − 100²/8) / 8
   = (1424 − 1250) / 8
   = 21.75

Note that 21.75 is 4 times, or 2² times, 5.4375. In general, multiplying each score by a constant C makes the variance of the resulting scores equal to C²S². However, dividing each score by a constant makes the variance of the resulting scores equal to S²/C². Thus we have come up with two important properties of the variance:
1. Var(X + C) = Var(X), or S²(X+C) = S²x, i.e. adding a constant to every score in the distribution does not change the variance at all, and consequently the standard deviation is not affected either.
2. Var(CX) = C²·Var(X), or S²(CX) = C²S²x, i.e. on multiplying each and every score in the distribution by a constant, the resulting scores have a variance equal to the constant squared times the original variance.


These properties can be used to ease computation of the variance quite considerably, and the method is encouraged whenever we are computing the variance, more so when we have grouped data.
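The two properties are easy to confirm numerically. Here is an illustrative Python sketch using the same scores (the variance helper is defined locally, not from the notes):

# Sketch: variance unchanged by adding a constant, scaled by C^2 on multiplying.
def variance(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

scores = [3, 4, 5, 5, 6, 8, 9, 10]
print(variance(scores))                   # 5.4375
print(variance([x + 2 for x in scores]))  # 5.4375 (unchanged)
print(variance([x * 2 for x in scores]))  # 21.75 = 2**2 * 5.4375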

Summary

The following statements summarize the major points of this unit:


1. The range, variance and standard deviation are measures of variability.
They give an indication of the spread of scores in a distribution.
2. The range is defined as the difference between the highest and the lowest scores in a
distribution.
3. Variance is obtained by dividing the sum of squared deviations by the total number of
observations in the distribution.
4. The standard deviation is the square root of variance.
5. The bigger the standard deviation, the bigger the spread of scores and the more
heterogeneous (varied / diverse /unrelated) the group is on which the scores are based.
6. The smaller the standard deviation, the smaller the spread of scores and the more
homogeneous (same / similar / consistent) the group is on which the scores are based.
7. Adding or subtracting a constant to every score in the distribution has no effect on variance
or standard deviation.
8. When every score in a distribution is multiplied by a constant, the new variance is the
original variance times the constant squared.
9. When every score in a distribution is divided by a constant, the new variance is the original
variance divided by the constant squared

LESSON ELEVEN

MEASURES OF RELATIONSHIP
Introduction

The relationship or association between two variables is an important concept in research and other studies. It can help in prediction: given one variable and not the other, the second can be predicted if their relationship is known and is strong enough to allow prediction.

Expected learning outcome

At the end of this lesson, you should be able to:


1. Explain two methods of studying relationships, one with stringent requirements (assumptions) and the other with less stringent requirements.
2. Compute, given two sets of data for a group, the two indexes (measures) of relationship
i.e. Pearson product moment correlation coefficient and Spearman rank order correlation
coefficient.
3. Interpret the computed value of the relationship
4. Draw a scatter diagram (also called a scatter-plot or scatter-gram) and describe the relationship it portrays in simple terms.
5. Give the properties of the indices e.g. what happens to the relationship index when the
scores are linearly transformed.

Measures of Relationship

Quite often, we are interested in finding out how two variables are related to each other. The
kind of question that may come to our mind is: are those who do well in mathematics, the very
ones who do well in science? Or the same question put in different words would be: how is
performance in mathematics related to performance in science?
When two measures are related, the term correlation (or association) is used to describe
this fact. Correlation has two distinctions; that is, correlation which merely describes
presence or absence of relation, and correlation, which shows the degree or magnitude of
relationship between two measures.

Detecting presence or absence of relationship

Two ways of detecting the presence or absence of a relationship include logical examination of
the two measures and plotting a scatter diagram.

Logical examination

This can be demonstrated by an example. Suppose one postulates (assumes / guesses) that
mathematics scores are related to science scores. Then suppose that the following data linking
mathematics and science scores are obtained. What would logical examination of pairs of the
data suggest?

Logical examination of maths and science scores for case I
Maths scores (X) 42 54 66 78 100 120
Science scores (Y) 81 88 93 99 109 125

Examination of data for case I above, suggests that some logical relationship exists between
mathematics and science scores. It may neither be necessary nor possible to get an index of
degree of relationship using this method, but one can at least be confident that some relationship
is present. However, suppose one was examining the data below in our case II, would one be
justified in concluding that some logical relationship exists between the two sets of scores?

Relationship between maths and science scores for case II

Maths scores (X) 42 54 66 78 100 120


Science scores (Y) 81 45 55 42 91 77

Examination of the data for our case II shows that there is no logic in the
differences in science and mathematics scores. Hence the relationship between
them cannot be defended logically. One can only conclude that there is no
relationship. We next look at the relationship between two variables depicted
graphically to study the kind of relationship between the two variables. This is
done by means of a scatter diagram.

Scatter diagram

A scatter diagram is a graph of data points based on two measures (variables), where one
measure defines the horizontal axis and the other defines the vertical axis. In other words, when
we depict graphically a relationship between two variables, that graph (or presentation) is
referred to as a scatter diagram or scattergram, also called a scatterplot.
Scatter diagram showing relationship between maths and science scores for case I

[Scatter plot: Maths scores (0–140) on the horizontal axis against Science scores on the vertical axis; the points rise steadily from lower left to upper right.]

Observe here in case I, one could draw a straight line through the scatter diagram in such a way
that it would approximate the pattern of points. The pattern of points in this case suggests a
highly positive relationship. This means that as mathematics scores increase, there is a
corresponding increase in science scores. Our case II, depicted below is that of, there is no
systematic relationship between the two variables. The points do not show any distinct pattern.
This scatter diagram suggests the absence of a relationship.
Scatter diagram showing relationship between maths and science scores for case II

[Scatter plot: Maths scores on the horizontal axis against Science scores on the vertical axis; the points show no distinct pattern.]

A scatter diagram, however, does not provide a very precise measure of the relationship. We definitely need a more precise measure (or index) of relationship.
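In practice such diagrams are rarely drawn by hand. As a sketch (assuming the third-party matplotlib library is installed), the case I data could be plotted as follows:

# Sketch: scatter diagram of the case I maths and science scores.
import matplotlib.pyplot as plt

maths = [42, 54, 66, 78, 100, 120]
science = [81, 88, 93, 99, 109, 125]

plt.scatter(maths, science)        # one point per student
plt.xlabel("Maths scores")
plt.ylabel("Science scores")
plt.title("Scatter diagram for case I")
plt.show()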

Methods that provide more precise indices of relationship

Three of the methods that provide precise measures of relationship between variables are:
covariance, Pearson product-moment correlation coefficient and Spearman rank correlation
coefficient.

Covariance

Covariance provides some information on the degree of relationship between two variables by
a simple averaging procedure. Let us illustrate this by a case where we want to determine the
covariance between mathematics (X) and science (Y) scores using data generated from n
students. Each of the n students provides two scores, i.e. one mathematics score (X) and one science score (Y).
The first step in the determination of covariance involves obtaining the product of the deviations of the two scores (X and Y) from their respective means, in summary (Xi − X̄)(Yi − Ȳ). The procedure is repeated for all the n students. The products of the deviation scores are then summed. The sum of the products of the deviation scores is divided by n (the total number of students). The quantity obtained is the covariance, which is a more precise measure of relationship than, possibly, a scatter diagram. We shall denote the covariance between X and Y by Cov(X, Y).
Thus:
Cov(X, Y) = Σ(Xi − X̄)(Yi − Ȳ) / n

Note that if a student's scores are high on both variables, X and Y, the product of the deviations, (Xi − X̄)(Yi − Ȳ), will be high and positive for him or her. If the student has low scores on both variables, both deviations will be high and negative; since the product of two negative numbers is positive, the product will again be high and positive.

Compute the covariance of the following scores

X: 7, 8, 9
Y: 40, 50, 60

Solution is Cov. (X, Y) = 20/3 = 6.667

Thus the covariance of X and Y,

Cov(X, Y) = Σ(Xi − X̄)(Yi − Ȳ) / n,

is a measure of the relationship between X and Y.

Notice that the covariance of X with itself is simply the variance of X:

Cov(X, X) = Σ(Xi − X̄)(Xi − X̄) / n = Σ(Xi − X̄)² / n = Var(X)
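An illustrative Python sketch (the covariance helper is defined locally, not from the notes) reproduces the small example above and the Cov(X, X) = Var(X) identity:

# Sketch: covariance as the mean product of deviations.
def covariance(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n

print(covariance([7, 8, 9], [40, 50, 60]))  # 6.666... = 20/3
print(covariance([7, 8, 9], [7, 8, 9]))     # Cov(X, X) = Var(X) = 0.666...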

LESSON TWELVE
MEASURES OF RELATIONSHIP

The Pearson product-moment correlation coefficient (rxy)

The process of deviating both the X and Y values around their respective means has made the quantity Cov(X, Y) independent of the means of the values.

To make the desired measure of relationship independent of the standard deviations of the two groups' values, one need only divide the covariance, Cov(X, Y), by the standard deviations of the two variables, i.e. Sx and Sy.

The result is the desired measure of relationship between X and Y. It is called the Pearson product-moment correlation coefficient and is denoted by rxy:

rxy = Cov(X, Y) / √( Var(X) × Var(Y) ) = Cov(X, Y) / (Sx Sy)
The above formula can be simplified and presented in a slightly different form to get what is commonly referred to as the definitional formula:

rxy = Cov(X, Y) / (Sx Sy)

    = [ Σ(Xi − X̄)(Yi − Ȳ) / n ] / [ √( Σ(Xi − X̄)²/n ) × √( Σ(Yi − Ȳ)²/n ) ]

    = Σ(Xi − X̄)(Yi − Ȳ) / √( Σ(Xi − X̄)² × Σ(Yi − Ȳ)² )

Where Xi is the score of a person on one variable,
Yi is the score of a person on the other variable,
X̄ is the mean of the X-score distribution,
Ȳ is the mean of the Y-score distribution,
Sx is the standard deviation of the X-scores,
Sy is the standard deviation of the Y-scores,
n is the number of scores within each distribution, and
Σ is the summation sign, implying that we sum over all n cases' scores.

Table showing the calculation of rxy using the definitional formula

Xi    Yi    Xi − X̄    (Xi − X̄)²    Yi − Ȳ    (Yi − Ȳ)²    (Xi − X̄)(Yi − Ȳ)
47    42    20        400          16        256          320
46    47    19        361          21        441          399
27    22    0         0            -4        16           0
8     7     -19       361          -19       361          361
7     12    -20       400          -14       196          280
Sums
135   130             1522                   1270         1360

rxy = Σ(Xi − X̄)(Yi − Ȳ) / √( Σ(Xi − X̄)² × Σ(Yi − Ȳ)² )

    = 1360 / √(1522 × 1270)

    = 1360 / 1390 = 0.978
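The same result can be reproduced with a short Python sketch (illustrative only, using the definitional formula):

# Sketch: Pearson product-moment correlation for the table above.
import math

X = [47, 46, 27, 8, 7]
Y = [42, 47, 22, 7, 12]
mx, my = sum(X) / len(X), sum(Y) / len(Y)    # 27 and 26

num = sum((x - mx) * (y - my) for x, y in zip(X, Y))  # 1360
den = math.sqrt(sum((x - mx) ** 2 for x in X)
                * sum((y - my) ** 2 for y in Y))      # sqrt(1522 * 1270)
print(round(num / den, 3))                            # 0.978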

Ranges for values of rxy:

The Pearson product moment correlation coefficient, rxy, varies between –1.00 and 1.00. A value
of 1.00 is a perfect positive relationship. This means that an increase in one variable is
accompanied by a commensurate increase in the other variable. A value of 0.00 (zero) indicates
no relationship. A value of –1.00 is a perfect negative relationship. Values in between the
two extremes (minimum, -1 and maximum, +1) are judged low to high depending on the size.
The scatter diagrams indicating rxy's of different sizes are shown below. The interpretation of rxy itself is relative. That is, we cannot say that a correlation of 0.5 is twice as strong as a correlation of 0.25, but only that it is stronger. This kind of ordinal thinking is the only meaningful way of comparing the sizes of different rxy's. While there is a significance test for deciding what constitutes a correlation coefficient different from zero, tradition tells us that in social science rxy coefficients ranging from 0.6 to 0.8 indicate quite a strong relationship. The size of the group being considered affects the coefficient rxy; you may find that a large group (i.e. large n) has a relatively small coefficient which is nevertheless significantly different from zero. We now look at the three types of scatter diagrams depicting the two extreme cases and the no-correlation case.

Scatter diagram showing a perfect positive relationship
[Scatter plot: X scores against Y scores; all points lie exactly on a straight line rising from lower left to upper right.]
A scatter diagram showing a perfect negative relationship
[Scatter plot: X scores against Y scores; all points lie exactly on a straight line falling from upper left to lower right.]

A scatter diagram showing no relationship
[Scatter plot: X scores against Y scores; the points are scattered haphazardly with no discernible pattern.]

The effect on rxy of applying a constant to scores

Adding, subtracting or multiplying every score in the two distributions, X and Y by a constant
has no effect on the size of the correlation coefficient between X and Y. Adding, subtracting or
multiplying every score in a distribution by a constant is an example of a linear transformation.
In a linear transformation, the scores retain their relative positions and hence the correlation
coefficient between two sets of scores, X and Y is not affected by the transformation.

Causation and correlation
The presence of correlation between two variables does not necessarily mean there exists a
causal link between them. Correlation measures the strength of linear relationship. It does
not mean that one variable is causing the other. The only notions available to human sense, of course, are the ideas of time and space. If one variable comes before the other in time, we may be in a better position to assert that it causes the other. Assertions of this kind cannot be based on the correlation between variables alone.
Even though correlation between events can be useful in identifying causal relationships when coupled with other methodological approaches, on its own it is a dangerous and misleading test. First, even when one can presume that a causal relationship does exist between the two variables being correlated, rxy can tell nothing by itself about whether X causes Y or Y causes X. Secondly, variables other than the two under consideration may be responsible for the relationship. However, while correlation does not directly establish a causal relationship, it may furnish clues to causes. When feasible (possible / reasonable), these clues can be formulated or postulated as hypotheses that can be tested in experiments, in which influences other than those whose inter-relationships are being studied can be controlled.

Curvilinearity

The Pearson correlation coefficient is a measure of the linear relationship between two variables. It is quite possible to have a relationship, like that between poverty and age, where both younger and older people are poorer than those in between. This kind of relationship is best described by a curve and is called a curvilinear relationship. When the association between two variables is curvilinear (nonlinear), it is likely that the estimate of rxy will be low. It is always wise, therefore, to draw a scatter diagram before computing the correlation coefficient. It saves time and indicates the approximate form of the relationship.

Underlying assumptions of Pearson product moment correlation coefficient, rxy

Four assumptions need to be met by the data of the two variables, X and Y, for the index rxy to be meaningful, in the sense that it is accurate in measuring relationship. The assumptions are:
1. The relationship between the two variables must be linear
2. The two distributions should have similar shapes
3. The scatter diagram should be homoscedastic (equal variance of X and Y distribution)
4. The data should be based on at least interval scale of measurement.
Concerning the linear assumption, rxy is a measure of linear relationship between X and Y. If X
and Y are perfectly linearly related, the points in the scatter diagram will fall on a single straight
line. If we scatter the points in such a scatter diagram above and below the line in a haphazard
manner and about the same distance in each direction, we obtain various degrees of basically
linear relationship between X and Y. In both cases, the assumption of linearity would be met.
However, if there is some evidence that the relationship between X and Y is curvilinear, then the
assumption of linearity would have been violated, and the use of rxy would underestimate the real
magnitude of association between X and Y.
The second assumption is that both X and Y have similar distribution shapes. The values of rxy
cannot attain the maximum values of +1 and –1, unless X and Y distributions have identical
shapes. For instance, if X is highly positively skewed and Y is negatively skewed, then these
maximum values cannot be obtained, whatsoever.

The third assumption has to do with homoscedasticity or equal variance of the X and Y
distributions. In simple terms, the requirement here is that the points on a scattergram showing
relationship between X and Y should be uniformly distributed. No places on the scattergram (or
scatter diagram) should have more points than others. The density of points on the scatter
diagram should be nearly the same. Whenever the scattergram is homoscedastic, the variance
of X variable is the same as the variance of Y variable.

The fourth assumption has to do with the level or scale of measurement. The Pearson product
moment correlation can only be used if the data from the two variables being correlated is based
on an interval scale of measurement or on a ratio scale of measurement. If the data is based on ordinal (ranked) scales of measurement, then the Spearman rank-order correlation coefficient, ρ (rho), should be used.

The Spearman rank-order correlation coefficient, ρ, (rho)


If one is not very happy with the assumptions underlying the Pearson coefficient, do we have another index which does not require our data to satisfy all those stringent (rigid / strict) assumptions (or requirements)? Put in a different way, if our data has, say, a curvilinear trend (i.e. the relationship between X and Y is curvilinear, not linear), does it mean we cannot find a meaningful measure of relationship for such data? Is there no meaningful index which will tell us how our variables are related, despite their curvilinear trend, or their distributions not being similar, or the scattergram not being homoscedastic, or our data being on an ordinal or nominal scale?

We have such an index, one which does not require our data to meet the four or more assumptions. The assumptions for the index are minimal. The index is the Spearman rank-order correlation coefficient. It is not the only one in this series of what we may refer to as non-parametric statistics. Note, a price is paid for the minimal assumptions. It seems there are no free things in this world; a price has to be paid, which we shall not go into.

The Spearman rank correlation uses data that is in the form of ranks (i.e. ordinal data). Thus, if both of the two variables to be correlated are measured on an ordinal scale (rank-order scale), the Spearman rank coefficient is the technique generally applied. The formula for obtaining the Spearman rank-order correlation coefficient, ρ (rho), is:

ρ = 1 − ( 6Σdi² ) / ( n(n² − 1) )

where ρ (rho) is the Spearman correlation index,
di² is the squared difference in a subject's ranks on the two measures (variables), and
n is the number of scores within each distribution.
Although the Spearman coefficient is designed for use with ranked data, it can be used with interval data that have been expressed as ranks (we shall see below how to convert interval data into ranked data):

Converting interval data to ordinal data


Table I
Scores Position Rank
39 1 1
38 2 2
36 3 3.5
36 3 3.5
35 5 5
30 6 6

Table II
Scores Position Rank
39 1 1
38 2 2
36 3 4
36 3 4
36 3 4
35 6 6
30 7 7

The first column provides interval data in the two tables above. Column two provides the positions of the scores, while their ranks are given in the third column. Rank is similar to position except where some scores have 'ties'. Ties are scores obtained by more than one individual. The ranks of tied scores are the mean of the ranks they would occupy if no tie existed. Thus, in our case in Table I, the two 36's would have occupied the ranks of 3 and 4 if the tie did not exist. Since they tie, the mean of 3 and 4, i.e. (3+4)/2 = 3.5, is assigned to the two scores. For Table II, the rank of the three 36's is (3+4+5)/3 = 4.
To illustrate the calculation of rho, let us again use the data used to illustrate the calculation of the Pearson product moment correlation coefficient, with a slight modification. Since the data is based on interval measurement, it is converted to ordinal data by the assignment of ranks in the third and fourth columns. The fifth column gives |di|, the absolute difference between the ranks of the X and Y variables for the respective subjects. The sixth column gives the values of the squared differences between ranks, di².

Xi    Yi    Rank (Xi)    Rank (Yi)    |di|    di²
47    42    1            2            1       1
46    47    2            1            1       1
27    22    3            3            0       0
8     7     4.5          5            0.5     0.25
8     12    4.5          4            0.5     0.25

n = 5,  Σdi² = 2.5

ρ = 1 − ( 6Σdi² ) / ( n(n² − 1) )
  = 1 − (6 × 2.5) / (5(5² − 1))
  = 1 − (6 × 2.5) / (5 × 24)
  = 1 − 0.125 = 0.875
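A Python sketch of the whole procedure follows (illustrative only; the ranks helper, which assigns mean ranks to ties, is not from the notes):

# Sketch: Spearman's rho with mean ranks for tied scores.
def ranks(xs):
    order = sorted(xs, reverse=True)   # rank 1 = largest score
    # mean of the first and last 1-based positions each value occupies
    return [(2 * order.index(x) + 1 + order.count(x)) / 2 for x in xs]

X = [47, 46, 27, 8, 8]
Y = [42, 47, 22, 7, 12]
rx, ry = ranks(X), ranks(Y)                     # [1, 2, 3, 4.5, 4.5], [2, 1, 3, 5, 4]
d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))  # sum of squared rank differences = 2.5
n = len(X)
print(1 - 6 * d2 / (n * (n * n - 1)))           # 1 - 15/120 = 0.875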

Interpreting correlation coefficient, ρ, (rho)

The ρ (rho) is interpreted in the same way as rxy. The value of rho can never be less than −1 nor greater than +1. It equals +1 only if each person has exactly the same ranks on both X and Y. It is −1 if there are no ties and the order is completely reversed for the two variables, such that the first on one variable is the last on the other, and so forth.

Note

1. Although the Spearman correlation coefficient formula is simpler and does not look
much like the computational formula we used for Pearson correlation coefficient, it is
algebraically equivalent to the Pearson when it is used with ranked data instead of the
interval data.
2. Tie places are easily handled by assigning the mean value of ranks to each of the tie
holders.
3. If a very large number of ties occur, however, you would probably be wise to reconsider the use of the Spearman (rho) coefficient; other non-parametric methods such as Kendall's tau or chi-square may be more appropriate.
4. Ranking can be done from the smallest or largest and so forth as long as you stick to the
convention you use to the end.
5. If there are no ties in the data, Spearman coefficient is merely what one obtains by
replacing the observations by their ranks and then computing Pearson product moment
correlation coefficient of ranks.
Summary

The following statements summarize the major points of this unit:


1. When two measures are related, the term correlation is used to describe this fact.
2. Correlation has two distinctions: correlation that merely describes the presence or absence of a relationship, and correlation that shows the degree or magnitude of the relationship.
3. A study of correlation to determine presence or absence of relation can be done through
logical examination of data and examination of scatter diagrams. Methods used to
provide indices of the magnitude of relationship include covariance, Pearson product-
moment correlation coefficient and Spearman rank-order correlation coefficient.
4. The measure of correlation assumes only values between –1 and +1.
5. If the larger values (scores) of X tend to be paired with larger values (scores) of Y, and
hence the smaller values (scores) of X and Y tend to be paired together, then the measure
of correlation should be positive and close to +1. If the tendency is strong, then we
would speak of a positive correlation between X and Y.
6. If the large values of X tend to be paired with the smaller values of Y, and vice versa,
then the measure of correlation should be negative and close to –1. If the tendency is
strong, then we say that X and Y are negatively correlated.
7. If the values of X seem to be randomly paired with the values of Y, the measure of
correlation should be fairly close to zero. We then say that X and Y are uncorrelated, or
have no correlation or have correlation zero or are independent.
8. The Pearson product moment correlation, rxy, is obtained by dividing the covariance
between two variables by a product of respective standard deviations.
9. Adding or multiplying every score in two distributions with a constant has no effect on
the size of the correlation.
10. Correlation does not mean causation.
11. In order to use rxy, the relationship between the two variables should be linear, the two
distributions must be similar, the variance of the two distributions should be identical
(homoscedastic) and data should be based on interval scale of measurement.
12. When measure is based on ordinal data, the Spearman rank order correlation coefficient,
ρ (rho), should be used. The Spearman rank order correlation coefficient can be
interpreted in the same way as rxy.
13. The coefficient of determination, r²xy, which indicates the proportion of the variance in Y that can be accounted for or explained by X, is a more efficient index for expressing the relationship between two variables.