Wolkite University
College of Education and Behavioral Studies
Department of Psychology
Course: Educational Assessment and Evaluation
Instructor: Demelash Yibeltal (MA, MBA, MSW, PhD Candidate)
October, 2023
Syllabus
Educational Assessment and Evaluation
(Psyc2034)
Course Description
• This course is designed to acquaint students with the basic
principles of assessment and evaluation.
• As the course progresses, emphasis will be given to the nature
of measurement and evaluation, the statement of instructional
objectives, and the construction of various types of teacher-made tests.
• Test statistics and test score analyses are also given due attention
in order to equip students with the necessary skills for applying test
results to various educational decisions.
Course Objectives
• At the end of the course you will be able to:
– Know basic concepts about assessment and evaluation
– Understand the roles of assessment and evaluation in
education.
– Formulate instructional objectives.
– Construct test items to measure attainment of
objectives.
– Evaluate strengths and weaknesses of test items.
– Determine statistical qualities of test items
– Understand concepts of validity and reliability
Chapter One
• Introduction
– Definition of terms
– The role of evaluation in teaching process
– Evaluation procedures
– Types of Test and Assessment
• General principles of evaluation
Chapter Two
• Approaches to Measurement and Evaluation
– Maximum performance and typical performance
evaluation
– Formative, summative, diagnostic and placement
evaluation
– Criterion and norm referenced evaluation
– Classroom and standardized tests
Chapter Three
• The Role of Objectives in Education
– Definition
– Stating objectives
• Classification of objectives
Chapter Four
• Classroom Tests
– Planning stage
– Preparing a test blueprint (Table of Specifications)
– Selecting Appropriate Test format
– Factors to consider when selecting an item format
• Preparing the test items
Chapter Five
• Writing Objective Test Items
– Writing short answer items
– Writing True-False (alternative-response) items
– Writing matching exercises
– Writing multiple choice items
Chapter Six
• The Essay Test
– Classification
– Advantages and limitations
– General considerations in constructing essay tests
• Grading essays
Chapter Seven
• Assembling, Administering, Scoring &
Analysing Test
– Assembling the test
– Administering the test
– Scoring the test
– Item analysis
Chapter Eight
• Test Statistics
– Measures of central tendency
– Measures of variability
– Measures of relationship
Chapter Nine
• Concepts of Validity and Reliability
– Validity
– Reliability
Chapter One
Definition of Basic Terms
Assessment:
– Assessment is a general term that includes all the
different ways teachers gather information in
their classroom.
– It includes observations, oral questions, paper-and-pencil
tests, homework, lab work, research papers, and the like.
– It is a process of collecting, synthesizing and
interpreting information to aid in decision making
(Nitko, 1996; Airasian, 1996).
Cont
– Classroom assessment can be defined as the
collection, evaluation, and use of information to
help teachers make better decisions (McMillan,
2001).
– Assessment is a process used by teachers and
students during instruction that provides feedback
to adjust ongoing teaching and learning to
improve students’ achievement of intended
instructional outcomes (Popham, 1998).
Cont
– Assessment is the systematic process of
determining educational objectives,
gathering, using, and analyzing information
about student learning outcomes to make
decisions about programs, individual
student progress, or accountability
(Gronlund, 1993).
Cont
– Assessment is the process of gathering and
discussing information from multiple and
diverse sources in order to develop a deep
understanding of what students know,
understand, and can do with their
knowledge as a result of their educational
experiences; the process culminates when
assessment results are used to improve
subsequent learning (Huba & Freed, 2000).
Cont
• Measurement
– Measurement is concerned with systematic
collection, quantification, and ordering of
information (Payne, 1997).
– It is a process of quantifying or assigning a
number to performance.
– Measurement can take many forms, ranging from
the application of very elaborate and complex
electronic devices, to paper-and-pencil exams,
to rating scales or checklists (Nitko, 1996).
Cont
– According to Nitko (1983), measurement is a
procedure for assigning numbers to specified
attributes or characteristics of persons in a
manner that maintains the real-world
relationships among persons with regard to what
is being measured.
– According to Gronlund (1985), measurement is the
process of obtaining a numerical description of
the degree to which an individual possesses a
particular characteristic (“How much?”).
Cont
• Test:
– A test is a particular form of measurement. It is a
formal, systematic, usually paper-and-pencil
procedure for gathering information about pupils’
behavior or performance (Airasian, 1996).
– Nitko (1996) also defines a test as a process of presenting a
series of questions that students must answer.
– A test is a specific tool, procedure, or technique
used to obtain responses from students in order to
gain information that provides the basis for making
judgments or evaluations regarding some characteristic
such as fitness, skill, knowledge, or values.
Cont
• Testing:
– is the process of administering the test, scoring
it, and interpreting the results.
Cont
• Evaluation:
– It is the process of making judgments about pupils’
performance, instruction, or classroom climate.
– It occurs after assessment information has been
collected, synthesized, and thought about, because
this is when the teacher is in a position to make
informed judgments (Airasian, 1996).
– According to Scriven (1967), evaluation is “judging
the worth or merit” of a case (e.g., a student),
program, policy, process, event, or activity.
Cont
– Evaluation is arrived at once the necessary
measurement and assessment have taken place.
– In order to evaluate whether a student will be
retained or promoted to the next level, different
aspects of the student’s performance, such as
grades and conduct, must be carefully assessed
and measured.
• To evaluate whether a remedial program in math is
effective, the students’ improvement in math, the
teachers’ teaching performance, and whether students’
attitudes towards math have changed should all be
carefully assessed.
Cont
– Evaluation includes both quantitative and
qualitative description of pupil behavior,
plus a value judgment concerning the
desirability of that behavior.
– Evaluation = quantitative description of
pupils’ behavior (measurement) and/or
qualitative description of pupils’ behavior
(non-measurement) + value judgments.
THE ROLE OF EVALUATION IN TEACHING
• Evaluation provides information that serves as a
basis for a variety of educational decisions, such as:
– instructional management decisions,
– selection decisions,
– placement decisions,
– counseling and guidance decisions,
– classification decision and
– credentialing and certification decisions.
Cont
• Instructional management decisions
– Instructional management decisions include
planning instructional activities, placing
students into learning sequences, monitoring
students’ progress, diagnosing students’
learning difficulties, providing students and
parents with feedback about achievements,
evaluating teaching effectiveness, and
assigning grades to students.
Cont
• Selection Decisions.
– An institution or organization decides that some
persons are acceptable while others are not;
those not acceptable are rejected and no longer
are the concern of the institution, or organization.
– For example, college admissions are often
selection decisions: some candidates are admitted
and others are not; those not admitted are no
longer the college’s concern.
Cont
• Placement decision
– Placement decisions differ from selection decisions in that,
in selection decisions, rejection is possible and the institution
is not concerned about what happens to those rejected, whereas
in placement decisions persons are assigned to different
levels of the same general type of instruction, education, or
work, and no one is rejected (Cronbach, 1990; Cronbach and
Gleser, 1965, cited in Nitko, 1996).
– Those students not enrolled in honors sections, for example,
must be placed at some other educational level. Students with low
first-grade reading readiness test scores, for example, cannot
be sent home.
Cont
• Counseling and Guidance Decisions.
– A single assessment result is not used for making
guidance and counseling decisions.
– Rather, a series of assessments is administered, including
an interview, an interest inventory, various aptitude
tests, a personality questionnaire, and an achievement
battery. Information from these assessments, along with
additional background information, is discussed with the
student during a series of counseling sessions.
– This facilitates a student’s decision making processes and
is an entree to the exploration of different careers.
Cont
• Classification decision.
– Sometimes a teacher must make a decision that results in a
person being assigned to one of several different but
unordered categories, jobs, or programs.
– These types of decisions are called classification decisions
(Cronbach and Gleser, 1965).
– For example, legislation in the area of educating persons
with disabilities has given a legal status to many labels for
classifying children with disabilities into one (or more) of a
few designated categories.
– These categories are unordered (that is, blindness is not
higher or lower than deafness).
Cont
• Credentialing and certification Decisions.
– Credentialing and certification decisions are
concerned with assuring that a student has
attained certain standards of learning.
– Student certification decisions may focus on
whether a student has attained minimum
competence or whether a student has attained
a high standard (e.g., COC examinations).
Cont
• Specifically, the purposes of measurement and
evaluation are:
• Placement of students, which involves placing
students appropriately in the learning
sequence and the classification or streaming of
students according to ability or subjects.
• Selecting students for courses – general,
professional, technical, commercial, etc.
Cont
• Certification: This helps to certify that a
student has achieved a particular level of
performance.
• Stimulating learning: this can mean motivating
the student or teacher, providing feedback,
suggesting suitable practice, etc.
• Improving teaching: by helping to review the
effectiveness of teaching arrangements.
Cont
• For research purposes.
• For guidance and counseling services.
• For modification of the curriculum.
• For the purpose of selecting students for
employment
• For modification of teaching methods.
Cont
• For the promotion of students.
• For reporting students’ progress to their
parents.
• For the award of scholarships and merit
awards.
• For the admission of students into educational
institutions.
• For the maintenance of students.
Evaluation Procedures
• Evaluation procedures refer to the systematic processes and
methods used to assess student learning and instructional
effectiveness.
• Common evaluation procedures include tests, quizzes,
assignments, projects, presentations, observations, and
portfolios.
• These procedures can be formative (ongoing and diagnostic) or
summative (final and evaluative).
• Evaluation procedures should be aligned with the learning goals
and objectives of the curriculum.
• They should be fair, reliable, valid, and transparent to ensure
accurate and meaningful assessment of student performance.
Cont
• Evaluation procedures for students'
performance typically involve the following
steps:
• Setting Clear Criteria:
– Establish clear performance criteria or standards
that define what successful performance looks like.
– These criteria should be communicated to students
beforehand, ensuring they understand the
expectations.
Cont
• Collecting Data:
– Gather data on students' performance using a
variety of assessment methods such as tests,
projects, presentations, observations, portfolios,
or performances.
– The choice of assessment methods depends on
the nature of the subject and the learning
outcomes being assessed.
Cont
• Analyzing Data:
– Analyze the collected data to determine the
extent to which students have met the established
performance criteria.
– This involves reviewing and interpreting the
assessment results to identify strengths and
weaknesses in students' performance.
Cont
• Providing Feedback:
– Provide constructive feedback to students that
highlights their strengths, areas for improvement,
and specific recommendations for enhancing their
performance.
– Feedback should be timely, specific, and
actionable, focusing on the learning objectives
and how students can further develop their skills
and knowledge.
Cont
• Grading or Scoring:
– Assign grades or scores to students' performance
based on the established criteria and the analysis
of the assessment data.
– This step involves translating the qualitative
assessment information into a quantitative form
that represents students' achievement levels.
Cont
• Reporting and Communicating:
– Prepare reports or communicate the evaluation
results to students, parents, or other stakeholders
as necessary.
– This step involves sharing the assessment
outcomes, including grades, qualitative feedback,
and recommendations for future growth.
Cont
• Reflecting and Revising:
– Reflect on the evaluation process to identify areas
for improvement and make any necessary
revisions to the assessment methods, criteria, or
feedback strategies.
– Continuous reflection and improvement ensure
that the evaluation procedures remain effective
and aligned with the learning goals.
Types of Evaluation
• Preliminary Evaluation
– Preliminary evaluations occur during the first days
of school and provide a baseline for expectations
throughout the school year.
– They are obtained through a teacher’s
spontaneous, informal observations and oral
questions and are concerned with students’ skills,
attitudes, and physical characteristics.
Cont
• Formative Evaluation:
– Formative evaluation occurs during instruction.
– It establishes whether or not students have
achieved sufficient mastery of skills and whether
further instruction on these skills is appropriate.
– Formative evaluations are also concerned with
students’ attitudes.
– The purpose is to determine what adjustments to
instruction should be made.
Cont
• Summative Evaluation:
– Summative evaluations occur at the conclusion of
instruction, such as at the end of a unit or the end
of the course.
– They are used:
• To certify student achievement and assign end-of-term
grades
• For promoting and sometimes grouping students
• To determine whether teaching procedures should be
changed before the next school year.
Cont
• Diagnostic Evaluation:
– Diagnostic evaluations occur before or, more
typically, during instruction.
– Diagnostic evaluations are concerned with skills
and other characteristics that are prerequisite to
the current instruction or that enable the
achievement of instructional objectives.
– During instruction, diagnostic evaluations are
used to establish underlying causes for a student
failing to learn a skill.
Types of Academic Tests
• We can classify achievement tests as:
• teacher-made and
• standardized achievement tests.
• There are also a variety of ways in which
teacher-made tests or standardized tests can
be classified.
• We can classify them based upon:
Cont
• Classification by Item Format
– Tests are divided into essay and objective formats.
– The short-answer form is placed under
objective rather than essay primarily
because short-answer items can be scored
more objectively than essay questions.
Cont
• Classification by Stimulus Material (verbal versus
nonverbal)
– We generally think of tests in terms of a series of
verbal problems that require some sort of verbal
response.
– But there are many instances where the stimulus
material used to present the problem to the student
need not be verbal.
• In an art course, the stimulus materials can be pictorial.
• In a music course, it could be a recording.
• In a wood working course, the stimulus material might
be the tools.
Cont
• Classification by Purpose:
– Criterion-referenced versus
– Norm-referenced interpretation
• Generally, there are four frames of reference
for interpretation; in the classroom the most
useful are the norm-referenced and
criterion-referenced interpretations.
Cont
• Frames of reference for interpreting performance
– There are four references that teachers and
others often use for interpreting student
performance.
– They are called ability-referenced, growth-
referenced, norm-referenced, and
criterion-referenced.
• Ability-referenced, in which a student's performance
is interpreted in light of that student's maximum
performance.
Cont
• Growth-referenced, in which performance is
compared to the student's prior performance.
• Norm-referenced, in which interpretation is provided
by comparing the student’s performance to the
performance of others.
• Criterion-referenced, in which meaning is provided
by describing what the student can and cannot do.
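To make the contrast concrete, one raw score can be read under all four frames at once. The sketch below is purely illustrative: the function name, the sample scores, and the mastery cutoff are hypothetical, not drawn from any published assessment procedure.

```python
# Illustrative sketch: interpreting one student's test score under the
# four frames of reference. All names and numbers are hypothetical.

def interpret_score(score, max_ability, prior_score, class_scores, criterion):
    """Return a short interpretation of `score` under each frame of reference."""
    # Norm-referenced: position relative to the performance of others.
    percentile = 100 * sum(s <= score for s in class_scores) / len(class_scores)
    return {
        # Ability-referenced: compared with the student's maximum performance.
        "ability": f"{score}/{max_ability} of estimated maximum performance",
        # Growth-referenced: compared with the student's prior performance.
        "growth": f"changed by {score - prior_score} points since last test",
        "norm": f"at the {percentile:.0f}th percentile of the class",
        # Criterion-referenced: what the student can and cannot (yet) do.
        "criterion": "mastered" if score >= criterion else "not yet mastered",
    }

result = interpret_score(score=42, max_ability=50, prior_score=35,
                         class_scores=[30, 35, 40, 42, 45, 48], criterion=40)
print(result)
```

Note that the same score of 42 yields four different statements, which is exactly why a teacher must choose the frame of reference before interpreting results.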
General Principles of Evaluation
• The evaluation process has the following general
principles:
• Determining and clarifying what is to be
evaluated.
– No evaluation device should be selected or
developed until the purposes of evaluation
have been carefully defined.
– This is what the teacher should do at the
beginning of the evaluation process.
Cont
• Selecting evaluation techniques in terms of
the purpose to be served.
– While selecting an evaluation technique you
should answer this question:
– Is this evaluation technique the most
effective method for determining what I
want to know about the pupil?
Cont
• Using a variety of evaluation techniques
– Comprehensive evaluation requires a
variety of evaluation techniques.
– No single evaluation technique is adequate
for appraising pupil progress toward all of
the important outcomes of instruction.
Cont
• Being aware of the limitations of the evaluation
technique used.
– Proper use of evaluation techniques requires
awareness of their limitations as well as their strengths.
– Evaluation techniques vary from fairly well-developed
measuring instruments (e.g., scholastic aptitude tests) to
rather crude observational methods. All are subject to
one or more types of error.
– Sampling error: since we can only measure a small
sample of a pupil’s behavior at any one time, there is
always a question of the adequacy of the sample.
Cont
– Another source of error is found in the evaluation
instrument itself or in the process of using the
instrument.
– For example, scores on objective tests are influenced
by chance factors such as guessing; scores on essay
tests are modified by the subjective judgment of the
person doing the scoring; the results of self-report
techniques are distorted by the individual’s desire to
present himself in a favorable light; and observations
of behavior are subject to all of the biases of human
judgment.
Cont
– A major source of error arises from improper
interpretation of evaluation results. In general,
a healthy awareness of the limitations of evaluation
instruments makes it possible to use them most
effectively.
• Regarding evaluation as a means to an end,
not an end in itself
– It is a basis for making instructional, guidance, and
administrative decisions (Mehrens and Lehman,
1984).
Chapter Two
Approaches to Measurement and Evaluation
• Maximum Performance Evaluation:
– Maximum performance evaluation aims to assess
an individual's highest level of achievement or
ability under ideal or optimal conditions.
– It helps identify an individual's full potential,
exceptional abilities, or talents in specific
domains.
– Maximum performance evaluation focuses on
pushing individuals to their limits and measuring
their peak performance.
Cont
– Assessment methods for maximum performance
evaluation may include challenging tasks,
competitions, or specialized tests designed to
elicit exceptional performance.
– Maximum performance evaluation is often used in
fields such as sports, arts, or gifted education to
identify individuals with extraordinary talents or
abilities.
Cont
• Typical Performance Evaluation:
– Typical performance evaluation aims to assess an
individual's average or normal level of achievement or
ability in everyday situations.
– It helps evaluate an individual's performance in
realistic contexts, reflecting their typical day-to-day
abilities.
– Typical performance evaluation focuses on measuring
performance under ordinary conditions, considering
factors that individuals encounter in their regular
lives.
Cont
– Assessment methods for typical performance
evaluation may include assignments, projects,
observations, or simulations that reflect real-life
situations.
– Typical performance evaluation is commonly used
in educational settings, workforce assessments, or
performance appraisals to gauge individuals'
competence and suitability for specific tasks or
roles.
Distinctions
• Definition: Maximum performance evaluation assesses an
individual's highest level of achievement, while typical
performance evaluation assesses their average or normal
level of achievement.
• Conditions: Maximum performance evaluation occurs under
ideal or optimal conditions, while typical performance
evaluation occurs in everyday, realistic conditions.
• Focus: Maximum performance evaluation focuses on
exceptional or peak performance, while typical performance
evaluation focuses on regular performance in typical
situations.
Cont
• Purpose: Maximum performance evaluation identifies
exceptional abilities or talents, while typical performance
evaluation assesses competence and suitability for everyday
tasks or roles.
• Methods: Maximum performance evaluation may use
specialized tests or competitions, while typical performance
evaluation may use assignments, observations, or
simulations.
• Application: Maximum performance evaluation is used in
fields like sports or gifted education, while typical
performance evaluation is used in education, workforce
assessments, or performance evaluations.
Cont
• Assessment Methods for Maximum Performance
Measurement:
• Specialized Testing: Specialized tests designed to
challenge individuals' abilities and assess their peak
performance in specific domains (e.g., IQ tests for
cognitive abilities, performance assessments for artistic
skills).
• Competitive Assessments: Competitions or contests that
allow individuals to showcase their exceptional abilities
and compare their performance against others in the
field.
Cont
• Performance-Based Tasks: Complex tasks or
simulations that require individuals to
demonstrate their highest level of skill or
expertise.
• Portfolio Assessment: Compilation of an
individual's best work or accomplishments that
highlight their exceptional abilities or talents.
• Expert Evaluation: Evaluation by professionals or
experts in the field who can assess individuals'
performance based on their expertise and criteria.
Cont
• Assessment Methods for Typical Performance
Measurement:
• Assignments and Projects: Tasks or projects that reflect
real-life situations and require individuals to apply their
knowledge and skills in practical contexts.
• Observations: Direct observations of individuals'
performance in everyday situations or tasks to assess
their typical behavior, skills, or abilities.
• Self-Report Measures: Surveys or questionnaires that
individuals complete to provide self-assessments of
their typical performance or behavior.
Cont
• Work Samples: Examples of individuals' work
or performance in real-world settings, such as
job-related tasks or projects.
• Standardized Tests: Tests designed to measure
individuals' general knowledge, skills, or
abilities in a standardized manner, providing a
comparison to a normative group.
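The comparison to a normative group mentioned in the last point is often expressed as a standard score. Below is a minimal sketch, assuming a small hypothetical norm group of raw scores; the numbers are invented for illustration only.

```python
# Hypothetical sketch: expressing a raw score relative to a norm group
# using a z-score and a T-score (mean 50, SD 10).
import statistics

norm_group = [55, 60, 62, 65, 68, 70, 72, 75, 78, 85]  # hypothetical raw scores
raw = 72

mean = statistics.mean(norm_group)   # norm-group mean
sd = statistics.pstdev(norm_group)   # norm-group standard deviation

z = (raw - mean) / sd    # how many SDs above/below the norm-group mean
t = 50 + 10 * z          # the same information on a mean-50, SD-10 scale
print(round(z, 2), round(t, 1))
```

A positive z-score means the student scored above the norm-group average; the T-score simply rescales the same comparison to avoid negative numbers.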
Types of Evaluation
• The different types of evaluation are:
– Placement,
– Formative (assessment for learning),
– Diagnostic, and
– Summative (assessment of learning).
Placement Evaluation
– This type of evaluation is carried out in order to place
students in the appropriate group or class.
– It takes the form of a pretest or aptitude test.
– It finds the right person for the right course.
Cont
Formative Evaluation
• This is a type of evaluation designed to help both the student
and teacher to pinpoint areas where the student has failed to
learn so that this failure may be corrected.
• It provides feedback to the teacher and the student and
thus estimates teaching success,
e.g. weekly tests, terminal examinations, etc.
Diagnostic Evaluation
• Diagnostic evaluation is applied during instruction
to find out the underlying causes of students’
persistent learning difficulties.
Cont
Summative evaluation:
• This is the type of evaluation carried out at the end of
the course of instruction to determine the extent to
which the instructional objectives have been
achieved.
• It is called a summarizing evaluation because it looks
at the entire course of instruction or programme and
can pass judgment on both the teacher and students,
the curriculum and the entire system.
• It is used for certification
Criterion-referenced and Norm-referenced Assessments
• Criterion-referenced Assessment: This type of
assessment allows us to quantify the extent to which
students have achieved the goals of a unit of study and a course.
• It is carried out against previously specified criteria and
performance standards.
• Criterion referenced classrooms are mastery-oriented,
informing all students of the expected standard and
teaching them to succeed on related outcome
measures.
• Criterion referenced assessments help to eliminate
competition and may improve cooperation.
Cont
• Norm-referenced Assessments: This type of assessment has
as its end point the determination of student performance
based on a position within a cohort of students – the norm
group.
• This type of assessment is most appropriate when one wishes
to make comparisons across large numbers of students or
important decisions regarding student placement and
advancement.
• The criterion-referenced assessment emphasizes description
of student’s performance, and the norm-referenced
assessment emphasizes discrimination among individual
students in terms of relative level of learning.
Classroom Tests vs. Standardized Tests
• The classroom test is otherwise called a teacher-made test.
• It is a major technique used for the assessment of students’
learning outcomes.
• Classroom tests can be achievement or performance tests
and/or any other type of test, such as a practical test, prepared by
the teacher for a specific class and purpose based on what he
has taught.
• Most tests prepared by teachers lack clarity in their wording:
the questions are often ambiguous, imprecise, carelessly worded,
or overly general, and such tests frequently fail item analysis.
Cont
• Standardized Test: A standardized test is any form of test
that (1) requires all test takers to answer the same
questions, or a selection of questions from common bank of
questions, in the same way, and that (2) is scored in a
standard or consistent manner, which makes it possible to
compare the relative performance of individual students or
groups of students.
• A standardized test is a test that is given to students in a
very consistent manner; meaning that the questions on the
test are all the same, the time given to each student is the
same, and the way in which the test is scored is the same
for all students.
Chapter Three
The Role of Objectives in Education
• Objectives generally indicate the end points of a
journey.
• They specify where you want to be or what you intend
to achieve at the end of a process.
• An educational objective is the achievement that a
specific educational instruction is expected to
accomplish.
• It is the goal & outcome of any educational instruction.
• It is the purpose for which any particular educational
undertaking is carried out.
Characteristics/ Attributes of the Objectives
• Good objectives have three essential
characteristics:
– Behavior :
• Firstly, an objective must specify the
competency to be learned, that is, the intended
change in the behavior of the learners.
• For this purpose it is necessary to use
a verb in the statement of the
objective that identifies an observable
behavior of the learner.
Cont
– Criterion:
• Secondly, an objective must clarify the
intended degree of performance.
• In other words, an objective should not only
indicate the change in the behavior of
the students but also the level or degree
of that change.
• For this purpose the statement of the
objective must indicate a degree of
accuracy, a quantity or proportion of
correct responses or the like.
Cont
– Conditions:
• Thirdly, an objective should describe the
conditions under which the learning will occur.
• In other words, under what circumstances will the
learner develop the competency? What
will the learner be given, or already be expected
to know, to accomplish the learning?
• For example, a condition could be stated as:
told a case study, shown a diagram, given a
map, after listening to a lecture or observing a
demonstration, after thorough reading, etc.
Various Levels of Educational Objectives
• Educational objectives can be specified at various
levels.
• These levels include the national level, the institutional
level and the instructional level.
• At the National Level
– At this level of educational objectives, we have merely policy
statements of what education should achieve for the nation.
– They are in broad outlines reflecting national interests,
values, aspirations and goals.
– The objectives are general and somewhat vague, and
at this level they are open to interpretation. They can
take the form of the National Policy on Education.
At the Institutional Level
– This is the intermediate objectives level.
– The aims are logically derived and related to both
the ones at the national level and those at the
instructional level.
– What are the objectives of your university (WKU)?
– By the time you have looked at the educational objectives of
three or four institutions, you will have noticed
how educational objectives at the institutional level
are established.
– They are narrowed to address local needs, such as the
kinds of certificates to be awarded by the institutions.
At the Instructional Level
• These can be realized in pieces at the
instructional level.
• Here, educational objectives are stated in the
form in which they are to operate in the
classroom.
Methods of Stating Instructional Objectives
Lists of objectives for a subject or unit of study should
be detailed enough to clearly communicate the intent
of the instruction and to serve as an effective overall
guide in planning for teaching and evaluation:
1. Stating general instructional objectives as intended
learning outcomes.
2. Listing, under each general objective, a sample of the
specific types of performance that students are
expected to demonstrate when they have achieved
the objective.
Instructional Objectives
• Stated:
– Educational goal: as a broad, long-term outcome to work towards.
– General objective: not measurable; encompasses a set of specific learning outcomes.
– Specific objective: in terms of definite, measurable, and observable student performance.
• Serve:
– Educational goal: primarily in policy making and general program planning.
– General objective: in developing the syllabus, annual plan, and resources for specific learning outcomes.
– Specific objective: as performance indicators that students have to demonstrate when they have achieved that objective.
• Example:
– Educational goal: develop proficiency in the basic skills of reading, writing, and arithmetic.
– General objective: comprehend the literal meaning of written material.
– Specific objective: identify details that are explicitly stated in a passage.
Cont
• The instructional objectives state what teaching is
expected to achieve, what the learner is expected
to learn from the instruction, how the learner is
expected to behave after being subjected to the
instruction and what s/he has to do in order to
demonstrate that s/he has learnt what is expected
from the instruction.
• These instructional objectives are therefore stated
in behavioral terms, using action verbs to specify
the desirable behavior which the learner will
exhibit in order to show that s/he has learnt.
Example
• At the end of this lesson or unit you should be able to
– Define the mean
– Calculate the mean in a given distribution of scores
– Explain the median of a set of scores.
• Now look at the action verbs: define, explain,
calculate, specify, construct, design, state, mention,
list, draw, choose, find, prepare, paint, apply, analyze,
deduce, differentiate, etc.
– These can be easily evaluated. These are the type of
verbs you will be using when you specify your
objectives.
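The sample objectives above ("define the mean", "calculate the mean", "explain the median") can be made concrete with a short, runnable sketch; the score list below is invented purely for illustration.

```python
# Illustrative sketch (scores are hypothetical): computing the mean
# and median of a distribution of test scores.
def mean(scores):
    """Arithmetic mean: the sum of the scores divided by their count."""
    return sum(scores) / len(scores)

def median(scores):
    """Median: the middle score of the ordered distribution
    (or the average of the two middle scores for an even count)."""
    ordered = sorted(scores)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

scores = [70, 85, 60, 90, 75]
print(mean(scores))    # 76.0
print(median(scores))  # 75
```

Python's standard library offers the same computations in the `statistics` module (`statistics.mean`, `statistics.median`); the explicit versions above simply mirror the definitions a student would be asked to state.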
Classwork
• Write 5 instructional objectives for any course
unit of your choice using action verbs.
Bloom’s Taxonomy of Educational
Objectives
• It is a classification of thinking organized by levels of
complexity.
• Benjamin Bloom put forward a taxonomy of
educational objectives, which provides a practical
framework within which educational objectives can
be organized and measured.
• In this taxonomy Bloom et al (1956) divided
educational objectives into three domains.
• Cognitive domain,
• Affective domain and
• Psychomotor domain.
Cognitive Domain
• The cognitive domain involves those objectives that deal
with the development of intellectual abilities and skills.
• These have to do with the mental abilities of the brain.
• The cognitive domain is categorized into seven
hierarchical levels:
– knowledge/memory,
– comprehension,
– application,
– analysis,
– synthesis,
– evaluation, and
– more recently, creating.
Cont
• These levels are hierarchical and of increasing
operational difficulty, such that achievement of a
higher-level skill assumes the achievement of
the previous levels.
• This implies that a higher level of skill could be
achieved only if a certain amount of ability called
for by the previous level has been achieved.
• For instance, you cannot apply what you do not
know or comprehend
Hierarchical Levels of Bloom’s Cognitive
Taxonomy
• From lowest to highest:
1. Knowledge/Memory
2. Comprehension
3. Application
4. Analysis
5. Synthesis
6. Evaluation
Knowledge (or Memory)
• If you have studied the above figure, you
would have noticed that knowledge or
memory is the first, the lowest and the
foundation for the development of higher
order cognitive skills.
• It involves the recognition or recall of previously
learned information.
• There is no demand for understanding or
internalization of information.
• This cognitive level emphasizes the
psychological process of remembering.
Cont
• Action verbs which can serve as appropriate
signals, cues and clues that can effectively bring
out stored information in the mind include:
– to define, to describe, to arrange, to underline,
to state, to list, to tell, to label, to recall, to
identify, to remember, who, which, where,
when, what, recognize, how much, how many
etc.
• E.g.: The students will list the six levels of the cognitive
domain according to their complexity without
error by tomorrow.
Comprehension
• Comprehension is all about internalization of
knowledge.
• It involves making meaning out of what is
stored in memory.
• On this basis, what is stored in the brain
can be understood and translated,
interpreted or extrapolated.
• It is only when you have known something that
you can understand it.
• Again it is only when you know and understand
that you can re-order or re-arrange.
• Action verbs here include:
– explain, compare, discuss, discriminate,
choose, represent, demonstrate, restate,
convert, interpret, re-arrange, re-order,
translate, rephrase, transform etc.
• E.g: By the end of the unit the student will
summarize the main concepts of the lesson in
grammatically correct English.
Application
• In the previous section we noted that you
cannot understand what you have not known.
• It also means that you cannot apply what you
do not understand.
• The use of abstractions in a concrete situation
is called application. These abstractions can
be in the form of general ideas, rules, or
procedures or generalized methods, technical
terms, principles, ideas and theories which
must be remembered, understood and
applied.
Cont
• Some action verbs here include: apply, build,
dramatize, illustrate, explain, operate,
calculate, classify, solve, specify, state,
transfer, demonstrate, determine, design,
employ, predict, present, use,
restructure, relate, organize, etc.
• E.g.: At the end of the lesson the student will
be able to solve quadratic equations not
covered in class.
Analysis
• This is the breaking down of communication
into its constituent parts or elements in order
to establish the relationship or make the
relations between ideas expressed to be clear
or explicit.
• It means breaking a learnt material into parts,
ideas and devices for clearer understanding.
Cont
• It goes beyond application and involves such
action verbs as analyze, detect, determine,
establish, compare, debate, why,
discriminate, distinguish, check consistency,
categorize, establish evidence, etc.
• E.g.: Given a presidential speech, the student
will be able to point to the passages that
attack an individual rather than that
individual’s program.
Synthesis
• In synthesis you build up or put together
elements, parts, pieces and components in order
to form a unique whole or to constitute a new
form, plan, pattern or structure.
• In other words, synthesis is concerned with the
ability to put parts of knowledge together to
form a new knowledge.
• It requires fluency of novel ideas and a flexible
mind.
• It allows students great freedom at looking for
solutions, using many possible approaches to
problem solving.
Cont
• Action verbs include: plan, develop, devise,
write, tell, make, assemble, express, illustrate,
produce, propose, specify, suggest, document,
formulate, modify, organize, derive, design,
create, combine, construct, put together,
constitute, etc.
• E.g.: Given a short story, the student will
formulate a different but plausible ending.
Evaluation
• It involves making a quantitative or qualitative
judgment about a piece of communication, a
procedure, a method, a proposal, a plan etc.
• Judgment is made against certain internal or
external criteria; where alternatives abound, the
choice depends on judgments we make, consciously
or unconsciously, based on the values we hold.
• Every day, we make judgments such as good
or bad, right or wrong, agree or disagree, fast
or slow etc.
Cont
• Action verbs include: agree, assess, compare,
appraise, choose, evaluate, why, validate,
judge, select, conclude, consider, decide,
contrast, etc.
• Evaluation can be subdivided into:
– judgment in terms of internal criteria and
– judgment in terms of external criteria.
• E.g.: Given a previously unread short story, the
student will criticize the content and form of
the story
Creating
• It is the highest level in the revised version of
Bloom’s taxonomy.
• It involves making something new: producing new
or original work.
• Action verbs: change, combine, compare, compose,
construct, create, design, devise, formulate, generate,
hypothesize, imagine, improve, invent, etc.
• We can assess it with questions such as:
– Formulate a theory for…?
– Predict the outcome of…?
– How would you test…?
Affective Domain
• In the past, people were not very happy about
emotionalism in education.
• They argue that intellectualism had little or nothing
to do with the learner’s interests, emotions or
impulses.
• This is why a group of theorists led by Tanner and
Tanner (1975) insist that the primary goals of
learning are affective.
• They are of the opinion that learners should not
learn what is selected for them by others, because
this amounts to imposing other people’s values and
purposes on the learners.
Cont
• As a matter of fact, what we have in our school
systems is the discipline-centered curriculum
projects which focus on the cognitive learning
to the neglect of affective processes.
• Whether you, as a learner, have internalized
and appreciated what you have been taught
or what you have learnt is demonstrated in
attitudes, likes and dislikes, etc.
• The affective domain is generally covert in
behavior.
Cont
• The educational objectives here vary from simple
attention to complex and internally consistent
qualities of character and conscience.
• Examples of learning outcomes in the affective
domain are:
• The learner will be able to show awareness of the
rules and regulations in a technical workshop to
prevent accidents.
• The learner should be able to show his or her liking
for neatness and accuracy in the use of
measurement instruments, etc.
Cont
• The affective domain has five hierarchical categories.
• These levels are: receiving, responding, valuing,
organization and characterization.
• Receiving
– This is the lowest level of the learning
outcomes in the affective domain. It means
attending.
– It is the learner’s willingness to attend to a
particular stimulus or his being sensitive to the
existence of a given problem, event, condition
or situation.
Cont
• Action verbs for stating specific learning
outcomes:
– Ask, choose, describe, follow, hold,
identify, locate, name, point to, select,
give, perceive, favor, listen, attend, accept.
• E.g: The student will be able to listen to all of a
Mozart concerto without leaving his or her
seat.
Cont
• Responding:
– In this case the learner responds to the
event by participating.
– He does not only attend, he also reacts by
doing something.
Cont
• Action verbs for stating specific learning
outcomes:
– Answer, assist, confirm, discuss, perform,
receive, read, write, list, state, record,
report, talk.
• E.g.: The student will follow the directions given
in the book without argument when asked to do
so.
Cont
• Valuing :
– This is concerned with the worth, value
or benefit which a learner attaches to a
particular object, behavior or situation.
– This ranges in degree from mere
acceptance of value or a desire to
improve group skills to a more complex
level of commitment or an assumption of
responsibility for the effective functioning
of the group.
Cont
• Action verbs for stating specific learning
outcomes:
– Accept, complete, study, initiate, join,
follow, explain, propose, report,
differentiate.
• E.g.: The student will express an opinion
about nuclear disarmament whenever
national events raise the issue.
Cont
• Organization:
– In this level the learner starts to bring
together different values as an organized
system.
– He/she determines the interrelationships
and establishes the order of priority by
comparing, relating and synthesizing the
values.
– He then builds a consistent value system by
resolving any possible conflicts between
them.
Cont
• Action verbs for stating specific learning
outcomes:
– Adhere, alter, arrange, combine, defend,
generalize, integrate, modify, organize, relate.
• E.g.: The student will be able to formulate the
reasons why she or he supports civil rights
legislation that does not support her or his
beliefs.
Cont
• Characterization by a Value or a Value Complex
– At this stage the value system is so
internalized by individuals that they act
consistently in accordance with the
values, beliefs or ideals that comprise
their total philosophy or view of life.
– A life-style which reflects these beliefs and
this philosophy is developed.
– The behavior of such individuals or groups can
be said to be controlled by the value system.
Cont
• Action verbs for stating specific learning
outcomes:
– Act, discriminate, display, influence, listen,
perform, practice, propose, qualify,
question, revise, serve, solve, use, verify.
• E.g: By the end of the semester, students will
demonstrate empathy and compassion
towards individuals from diverse backgrounds.
Psychomotor Domain
• The psychomotor domain deals with abilities and skills
which are physical in nature but activated by internal
mental processes.
• It is mostly concerned with a variety of learning activities
like handwriting, speech, physical education, laboratory
science, industrial arts, and vocational and technical education.
It means therefore that the instructional objectives here will
make performance skills more prominent.
• The psychomotor domain has to do with muscular
activities. It deals with such activities which involve the use
of the limbs (hand) or the whole of the body.
• These tasks are inherent in human beings and normally
should develop naturally.
Cont
• The psychomotor domain is subdivided into six
hierarchical levels.
– (i) Reflex movements
– (ii) Basic Fundamental movements
– (iii) Perceptual abilities
– (iv) Physical abilities
– (v) Skilled movements and
– (vi) Non-discursive communication.
Cont
• Reflex Movements
– At the lowest level of the psychomotor domain are
the reflex movements, which every normal human
being should be able to make.
– The movements are all natural, except where the case is
abnormal, in which case it may demand therapy
programs.
– This level involves involuntary, automatic responses to
stimuli.
• Action Verbs:
– Respond, react, blink, withdraw, run.
• Example : Students will demonstrate the ability to quickly
withdraw their hand from a hot surface
Cont
• Basic Fundamental Movements
– Like the case of reflex movements, these are basic
movements which are natural.
– Educators have little or nothing to do with them,
except in abnormal cases, where special
educators step in to assist.
• There are three sub-categories at this stage.
Cont
• These are:
– Locomotor movement: which involves
movements of the body from place to place
such as crawling, walking, leaping, jumping etc.
– Non-locomotor movements: which involves
body movements that do not involve moving
from one place to another.
– They are movements that are performed while
staying in one place.
– These include muscular movements, wriggling
of the trunk, head and any other part of the
body.
Cont
– Manipulative movements: which involve the use
of the hands or limbs to move or control
things, etc.
• Basic fundamental movements: examples are,
jumping, leaping, crawling, walking, muscular
movements, wriggling of the trunk, head,
turning, twisting, etc.
• Action verbs:
– Walk, run, move, skip, jump, hold, and catch.
Cont
• Perceptual abilities
– This has to do with the senses and their
developments.
– Perceptual abilities are concerned with the
ability of the individuals to perceive and
distinguish things using the senses.
– Such individuals recognize and compare
things by physically tasting, smelling,
seeing, hearing and touching.
Cont
• Behavioral activities:
– Following verbal instructions, dodging a
moving ball, balancing the body, jumping
rope, punting, catching.
• Action verbs:
– Discriminate, find the differences, select,
separate group.
Cont
• Physical abilities
– These abilities fall in the area of health
and physical education.
– You know that in athletics and games or
sports in general, you need physical
abilities, and that these abilities can be
developed to varying degrees of
perfection with the help of practice.
Cont
• Behavioral Activities:
– Distance running, distance swimming
weight lifting, wrestling, touching toes,
backbend, and ballet exercises.
• Action verbs:
– Endure, hold, adapt, continue, carry.
Cont
• Skilled Movements
– This is a higher ability than the physical abilities.
Once you have acquired the physical abilities,
you now apply various types of these physical
abilities in making or creating things. You can
combine manipulative skill, endurance and
flexibility in writing and drawing.
– For skills like drumming, typing or playing the
organ or the keyboard in music, you will need a
combination of manipulative movements and
some perceptive abilities and flexibility.
Cont
• There are three sub-levels of the skilled movements.
• These are
• simple adaptive skills,
• compound adaptive skills and
• complex adaptive skills.
• Behavioral Activities:
– Sawing, waltzing, hockey, golf, tennis, gymnastic
twisting dives.
• Action verbs:
– Manipulate, draw, dance, sing, play, and speak.
Cont
• Non-discursive Communication
– This is the highest level which demands a combination
of all the lower levels to reach a high degree of
expertise.
– Everybody that is normal can move his or her arms and legs.
– But you must have some level of training, practice and
the ability to combine a variety of movements and
some perceptive abilities in order to do diving,
swimming, typing, driving, cycling etc.
• Action verbs:
– Gesture, use non-verbal cues, understand non-verbal
messages.
Chapter Four
Classroom (Teacher-Made) Tests
• The main goal in classroom testing is to obtain valid,
reliable and useful information concerning pupil
achievement.
• This requires a series of steps to be followed.
• Planning a Good Classroom Test:
– Some preliminary considerations:
• The frequency of testing
• The effect of unannounced tests
• Mode of item presentation
• Looking for instructional objectives
• The test blue print
Cont
– The frequency of testing
• The frequency of testing is dependent on
the purpose it serves.
– Single testing may be sufficient for
selection, placement and summative
evaluation decisions.
– For diagnosis and formative evaluation
more frequent testing may be necessary.
Cont
• The effect of unannounced tests
– Surprise tests are generally not recommended
– Students perform slightly higher when they are
informed
– Unannounced tests may create unnecessary
anxiety among students
– Give students adequate preparation time
(they may be studying as many as 10 subjects)
– Surprise tests do not promote either efficient
learning or higher achievement
Cont
• Modes of item presentation
– The mode of item presentation must be considered:
oral, paper and pencil, video, PowerPoint, …
• If items must be presented orally, true-false or
completion tests should be used, because
multiple choice and matching items contain too
many alternatives to be presented orally.
• Studies show that arranging items from easy
to hard yields higher scores than arranging
items in another sequence, such as hard to
easy.
Guidelines in Planning a Classroom Test
Determine the purpose of the test;
Describe the instructional objectives and
content to be measured;
Determine the relative emphasis to be
given to each learning outcome;
Select the most appropriate item formats
(essay or objective);
Cont
Develop the test blue print to guide the test
construction;
Prepare test items that are relevant to the learning
outcomes specified in the test plan;
Decide on the pattern of scoring and the
interpretation of result;
Decide on the length and duration of the test; and
Assemble the items into a test, prepare direction
and administer the test.
Preparing Table Of Specifications (TOS)
• It is a tool used to ensure that a test or
assessment measures the content and
thinking skills that the test intends to
measure.
• It is the teacher’s blue print.
• It determines the content validity of the
tests.
• It is a two-way table that relates the
instructional objectives to the course content.
Cont
• It makes use of Bloom’s Taxonomy in determining
the levels of cognitive domain:
1. Preparing a list of learning outcomes, i.e. the type
of performance students are expected to
demonstrate
2. Outlining the contents of instruction, i.e. the area
in which each type of performance is to be shown,
and
3. Preparing the two way chart that relates the
learning outcomes to the instructional content.
TOS Matrix
• The matrix is a two-way chart. Each row is a topic; the
columns are Time spent, the Levels of Cognitive Abilities
(K, C, A, HA), No. of Test Items, and %. A Total row closes
the table at 100%.
• Steps for completing the matrix:
1. Identify the topics to be tested from the syllabus.
2. Determine the time spent in hours for each topic.
3. Find the total time spent.
4. Find the % of time spent per topic.
5. Determine the total number of test items.
6. Determine the number of test items per topic.
7. Allocate % marks for the different levels.
8. Compute the number of items per level.
9. Compute the number of items per topic per level.
10. Determine the test item placement and indicate it in
the cell per topic per level.
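The allocation arithmetic behind the TOS steps can be sketched in a few lines. The topics, hours, level weights and total item count below are invented for illustration, and simple rounding may leave cell counts that need manual adjustment to sum exactly.

```python
# Hypothetical TOS allocation sketch (steps 2-9): hours per topic and
# percentage weights per cognitive level (K, C, A, HA) are made up.
topics = {"Topic 1": 6, "Topic 2": 4, "Topic 3": 10}   # step 2: hours taught
level_weights = {"K": 0.30, "C": 0.30, "A": 0.25, "HA": 0.15}  # step 7
total_items = 40                                        # step 5

total_hours = sum(topics.values())                      # step 3
for topic, hours in topics.items():
    share = hours / total_hours                         # step 4: % of time
    items_for_topic = round(total_items * share)        # step 6
    cells = {lvl: round(items_for_topic * w)            # steps 8-9
             for lvl, w in level_weights.items()}
    print(topic, f"{share:.0%}", items_for_topic, cells)
```

The key design idea the matrix encodes is proportionality: a topic that consumed 30% of instructional time receives roughly 30% of the items, and within each topic the items are spread across cognitive levels in the chosen weights.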
Selection of Appropriate Test format
• There is no single rule for choosing the
appropriate test format.
• Useful guidelines for choosing an appropriate test
format include:
– considering the developmental level of the
students,
– assessing the intended outcome, and
– embedding the assessment into instructional
tasks.
Cont
• What makes a test appropriate?
– The test will accurately evaluate the
learners' knowledge and progress ; it is easy
to administer, check and interpret the
results; it can be used several times; it can
affect the learning process positively and
keep students motivated throughout the
course.
Cont
• Appropriate tools or combinations of tools must
be selected and used if the assessment process
is to successfully provide information relevant to
stated educational outcomes.
• There are two general categories of test items:
• (1) objective items, which require students to
select the correct response from several
alternatives or to supply a word or short phrase
to answer a question or complete a statement;
Cont
• (2) subjective or essay items, which permit the
student to organize and present an original
answer.
• Objective items include multiple-choice, true-
false, matching and completion, while
• subjective items include short-answer essay,
extended-response essay, problem solving and
performance test items.
Factors to Consider When Selecting an
Item Format
• Choosing the item format for assessing
students’ achievement is considered one of the
most important elements in test construction.
• The Purpose of the test
• The most important factor to be considered is
what you want the test to measure.
Cont
• To measure expression of ideas, you would
use the essay; for spoken self-expression, the
oral test.
• To measure the extent of the pupil’s factual
knowledge, his understanding of principles or
ability to interpret, the objective test is preferred
because it is more economical.
Cont
• The time available to prepare and score the
test
• It will take less time to prepare five extended-
response essay questions for a two-hour
twelfth-grade test than it would to prepare
75 multiple-choice items for the same test.
• But the time saved in preparing the essay test
may be used up in reading and grading the
responses.
Cont
• Number of examinees
• If there are only a few pupils to be tested and
if the test is not to be reused, then the essay
or oral test is practical.
• However, if a large number of pupils are to be
tested and/or if the test is to be reused at a
later time with another group, the objective
test is appropriate.
Cont
• Physical facilities
• If stenographic and reproduction facilities are
limited, the teacher is forced to use either the essay
test, with the questions written on the board, or
the oral test; or he/she can use true/false or
short-answer items by reading the questions aloud.
• However, multiple-choice items must be mimeographed
or reproduced mechanically, because they
are complex and cover a large amount of
material.
Cont
• Age of Students
• While constructing test items consider the age
of your students.
• Using a variety of formats in a test creates
more confusion for young children than for older
ones.
• Teacher skill
• Constructing objective tests requires more skill
than constructing essay tests.
Preparing the Test Items
• Write test items according to the rules of construction
for the type(s) chosen
• Select the items to be included in the test according to
the TOS.
• Review and edit items according to guidelines
• Arrange items: decide on:
– Groups of items
– Sequence of items within the group
– Sequence of grouping
• Prepare directions for the test; if necessary prepare
directions for individual items.
• Decide on method of scoring
Cont
• Preparation of test items is the most important task
in the preparation step. Therefore care must be
taken in preparing a test item.
• The following principles help in preparing relevant
test items.
• Test items must be appropriate for the learning
outcome to be measured:
– Each test item should be designed to
measure the performance described in the
specific learning outcome; that is, the item
must be in accordance with the performance
described in that outcome.
Cont
• Test items should measure all types of
instructional objectives and the whole
content area:
– The items in the test should be prepared so that
they cover all the instructional objectives—
knowledge, understanding, thinking skills—and
match the specific learning outcomes and subject
matter content being measured.
– When the items are constructed on the basis of the
table of specifications, the items become relevant.
Cont
• The test items should be free from ambiguity:
– The item should be clear. Inappropriate
vocabulary and awkward sentence structure
should be avoided.
– The items should be so worded that all
pupils understand the task.
Cont
• The test items should be of appropriate
difficulty level:
– The test items should be of the proper difficulty level,
so that they can discriminate properly. If an item is
meant for a criterion-referenced test, its difficulty
level should match the difficulty indicated
by the statement of the specific learning outcome.
– Therefore, if the learning task is easy the test item
must be easy, and if the learning task is difficult
then the test item must be difficult.
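Difficulty and discrimination are usually quantified with the standard classical-test-theory indices; these formulas are not given in the slides, and the response data below is hypothetical.

```python
# Illustrative sketch: classical item difficulty and discrimination.
def difficulty_index(responses):
    """p = proportion of examinees who answered the item correctly.
    responses: list of booleans (True = correct).
    p near 1.0 means an easy item; p near 0.0 a hard one."""
    return sum(responses) / len(responses)

def discrimination_index(upper, lower):
    """D = p_upper - p_lower: item difficulty among the top-scoring
    group minus difficulty among the bottom-scoring group. A positive
    D means the item separates strong from weak students."""
    return difficulty_index(upper) - difficulty_index(lower)

upper = [True, True, True, False]    # hypothetical top group
lower = [True, False, False, False]  # hypothetical bottom group
print(difficulty_index(upper + lower))     # 0.5
print(discrimination_index(upper, lower))  # 0.5
```

An item that everyone answers correctly (p = 1.0) or nobody answers correctly (p = 0.0) cannot discriminate at all, which is why the slide ties proper difficulty to proper discrimination.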
Cont
• The test item must be free from technical
errors and irrelevant clues:
– Sometimes there are unintentional clues in
the statement of the item which help the pupil to
answer correctly.
– Examples are grammatical inconsistencies, verbal
associations, extreme words (ever, seldom,
always), and mechanical features (the correct
statement being longer than the incorrect one).
– Therefore, while constructing a test item, care
must be taken to avoid such clues.
Cont
• Test items should be free from racial, ethnic
and gender bias:
– The items should be universal in nature. Care
must be taken to make each item culture-fair.
– When portraying roles, all groups in
society should be given equal importance.
– The terms used in the test item should have
a universal meaning for all members of the
group.
Cont
• What does it take to be a good item writer?
• To be a good item writer one should be
proficient in the following six areas:
• Know the subject matter thoroughly
– The greater the item writer’s knowledge of the
subject matter, the greater the likelihood that
she/he will know and understand facts and
principles as well as some of the popular
misconceptions.
Cont
• Know and understand the pupils being tested.
– The kinds of pupils the teacher deals with
will determine in part the kinds of item
format, vocabulary level and level of
difficulty of the teacher made test.
– For example, primary school teachers seldom
use multiple choice items, because young
children are better able to respond to the short
answer type.
Cont
• Be skilled in verbal expression
– It is essential that the item writer clearly
convey to the examinee the intent of the
question.
– In an oral examination, the pupil may have
the opportunity to ask for and receive
clarification of the question.
– But in paper-pencil test this is less possible.
Cont
• Be thoroughly familiar with various item format
– The item writer must be knowledgeable about the
various item formats – their strengths and
weaknesses, and the errors commonly made in
each type of item – and the guidelines that
can assist him or her in preparing better test items.
• Be creative.
– Item writing needs creativity. The teachers’
ability of writing items in a novel way is very
crucial (Meherens & Lehmann,1984).
Chapter Five
Writing Objective Items
• Objective tests are those test items that are set in
such a way that one and only one correct answer
is available to a given item.
• In this case every scorer would arrive at the same
score for each item for each examination even on
repeated scoring occasions.
• Items of this type sometimes call on examinees
to recall and write down or to supply a word or
phrase as an answer (the free-response type).
Writing Short Answer Items
• The short-answer and completion test items are
essentially the same: both can be answered by a word,
phrase, number or formula.
• They differ in the way the problem is presented. The
short-answer type uses a direct question, whereas the
completion test item consists of an incomplete
statement that the student must complete.
• The short-answer test item is one of the easiest to
construct.
• Because the student must supply the answer, the
possibility of obtaining the correct answer by guessing
is reduced.
Disadvantages
• One is that they are unsuitable for assessing
complex learning outcomes.
• The other is the difficulty of scoring; this is
especially true where the item is not clearly
phrased to require a definitely correct answer,
or where the student’s spelling obscures the answer.
• The following suggestions will help to make
short-answer type test items to function as
intended.
Cont
• Word the item so that the required answer is
both brief and specific
• Do not take statements directly from textbooks
to use as a basis for short-answer items.
• A direct question is generally more desirable
than an incomplete statement.
• If the answer is to be expressed in numerical
units, indicate the type of answer wanted
Writing True-False (alternative – response)
items
• The chief advantage of true/false items is that they
do not require much time from the student to answer.
• This allows a teacher to cover a wide range of
content by using a large number of such items.
• True/false test items can be scored quickly, reliably,
and objectively by anybody using an answer key.
• If carefully constructed, true/false test items have
also the advantage of measuring higher mental
processes of understanding, application and
interpretation.
Disadvantages
• is that when they are used exclusively, they tend to promote
memorization of factual information: names, dates, definitions,
and so on.
• Some argue that another weakness of true/false items is that
they encourage students to guess.
In addition true/false items:
• Can often lead a teacher to write ambiguous statements due to
the difficulty of writing statements which are clearly true or false
• Do not discriminate between students of varying ability as
well as other item types do
• Can often include more irrelevant clues than do other item types
• Can often lead a teacher to favour testing of trivial knowledge
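One classical remedy for the guessing weakness noted above is the correction-for-guessing formula, score = R − W/(k−1), where R is the number right, W the number wrong, and k the number of options per item (k = 2 for true/false). The formula is a standard measurement technique, not one prescribed in these slides, and the score figures below are hypothetical.

```python
# Illustrative sketch: classical correction-for-guessing formula.
def corrected_score(right, wrong, options=2):
    """Corrected score R - W/(k-1). For true/false items (k = 2),
    every wrong answer cancels one right answer, so a student who
    guesses blindly has an expected corrected score of zero."""
    return right - wrong / (options - 1)

# Blind guessing on 40 T/F items: roughly 20 right, 20 wrong.
print(corrected_score(20, 20))  # 0.0
# A knowledgeable student: 35 right, 5 wrong.
print(corrected_score(35, 5))   # 30.0
```

The same formula generalizes to multiple-choice items: with four options per item, each wrong answer subtracts only a third of a point, reflecting the lower chance of guessing correctly.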
Suggestion to Teachers to Construct Good
Quality T/F Test Items
• Avoid negative statements, and never use double
negatives. In Right-Wrong or True-False items,
negatively phrased statements make it needlessly
difficult for students to decide whether that
statement is accurate or inaccurate.
• Restrict single-item statements to single concepts
• Use an approximately equal number of items,
reflecting the two categories tested.
• Make statements representing both categories equal
in length.
Writing matching exercises
• A matching item consists of two lists of words
or phrases to be matched according to a particular
kind of association indicated in the item’s directions.
• Matching items sometimes can work well if
you want your students to cross-reference and
integrate their knowledge regarding the listed
premises and responses.
• Matching items can cover a good deal of
content in an efficient fashion.
Advantage and disadvantages
The major advantage of matching items is their compact form,
which makes it possible to measure a large amount of related
factual material in a relatively short time.
Another advantage is its ease of construction.
• The main limitation of matching test items is that they are
restricted to the measurement of factual information based
on rote learning.
• Another limitation is the difficulty of finding homogenous
material that is significant from the perspective of the learning
outcomes.
• As a result, test constructors tend to include in their matching
items material which is less significant
Guidelines
• Use fairly brief lists, placing the shorter entries on the
right
• Employ homogeneous lists
• Include more responses than premises
• List responses in a logical order
• Describe the basis for matching and the number of times
a response can be used(“Each response in the list at the
right may be used once, more than once, or not at all.”)
• Try to place all premises and responses for any matching
item on a single page.
Writing multiple choice items
• Multiple-choice items can effectively measure many of the simple
learning outcomes; in addition, they can measure a variety of
complex cognitive learning outcomes.
• A multiple-choice item consists of a problem (the item stem)
and a list of suggested solutions (alternatives, choices or
options).
• There are two important variants in a multiple-choice item:
• (1) whether the stem consists of a direct question or an
incomplete statement, and
• (2) whether the student’s choice of alternatives is supposed
to be a correct answer or a best answer.
Recognition Type
1. Stem-and-options variety: the stem serves as the
problem
2. Setting-and-options variety: the optional responses are
dependent upon a setting or foundation of some sort, e.g. a
graphical representation
3. Group-term variety: consists of a group of words or terms in
which one does not belong to the group
4. Structured-response variety: makes use of structured
responses, which are commonly used in classroom testing
for natural science subjects
5. Contained-option variety: designed to identify errors in a
word, phrase, sentence or paragraph
Advantages
• The chief advantage is the item type's widespread applicability
to the assessment of cognitive skills and knowledge, as well as
to the measurement of students’ affect.
• Another advantage is that it’s possible to make them
quite varied in the levels of difficulty they possess.
• Cleverly constructed multiple-choice items can present
very high-level cognitive challenges to students.
• And, of course, as with all selected-response items,
multiple-choice items are fairly easy to score
Weaknesses
• One weakness is that when students review a set of alternatives for
an item, they may be able to recognize a correct answer that they
would never have been able to generate on their own.
• In that sense, multiple-choice items can present an exaggerated
picture of a student’s understanding or competence, which
might lead teachers to invalid inferences.
• Another serious weakness, one shared by all selected-response
items, is that multiple-choice items can never measure a
student’s ability to creatively synthesize content of any sort.
• Finally, in an effort to come up with the necessary number of
plausible alternatives, beginner item-writers sometimes toss in
some alternatives that are obviously incorrect.
Useful Rules
• The question or problem in the stem must be self-
contained.
• Avoid negatively stated stems. If you have to use this kind
of item, emphasize the fact by underlining the negative
part, putting it in capital letters or using italics.
• Each alternative must be grammatically consistent with
the item’s stem
• Make all alternatives plausible, but be sure that one of
them is indisputably the correct or best answer.
• Randomly use all answer positions in approximately
equal numbers.
Cont
• Never use “all of the above” as an answer
choice. “None of the above” is not generally
recommended either, but it can be used to
make items more demanding.
• Use 4 or 5 alternatives in each item.
• Make all incorrect alternatives (i.e., distractors)
plausible and attractive
• Avoid terms such as "always" or "never," as
they generally signal incorrect choices.
Chapter Six
Essay Test
• According to Linn & Gronlund (2000:237) essay
tests allow for freedom of response. Students are
free to select, relate and present ideas in their own
words.
• But the freedom is a matter of degree. In some
instances that freedom is delimited to specific size.
• In other cases no restriction is put. So, based on the
extent of freedom essay tests can be classified into
restricted essay tests and extended response essay
tests.
Cont
Restricted Response Essay Questions
• These questions usually limit both the content and the
form of the response.
• The content is usually restricted by the scope of the
topic to be discussed.
• Limitations on the form of the response are generally
indicated in the question.
• e.g. a) Why is multiple choice item considered the
most versatile type? Answer in a brief paragraph.
• b) Describe two situations that demonstrate the
application of the Newton's third law of motion. Do
not use those examples discussed in class.
Cont
• Although delimiting students’ responses to essay
questions makes it possible to measure more
specific learning outcomes, these same
restrictions make them less valuable as a measure
of those learning outcomes emphasizing
integration, organization, and originality.
• This is because for higher order learning
outcomes, greater freedom of response is needed
(Linn & Gronlund, 2000).
Cont
Extended Response Essay Questions
• In this type of test no restrictions on either form or
content are placed. Students can provide answers
by organizing their ideas the way they like.
• Moreover, it is the student himself/herself who
determines the size of the answer. However,
although this freedom allows for the
measurement of higher-order skills, scoring
difficulties come into play.
Cont
• The following are examples of extended
response essay tests.
• eg. a) Describe the influence of textbooks in
sex stereotyping.
• b) Write your own evaluation of the value of
the new pre-service Teacher Education
Program (TESO) in preparation of qualified or
well trained secondary school teachers.
Advantages and Disadvantages
Advantages of essay tests:
1. They are much easier to prepare
2. They cost less to type
3. They give students the chance to express themselves
and reveal their talents in language,
organization, interpretation, etc.
4. They are very useful in small classes (10-15
students only)
Cont
Disadvantages of essay tests:
1. They are very difficult to correct; the scorer
loses concentration
2. Scoring is never stable from paper to paper
3. Scores can be affected by the halo effect
4. They take much time and effort to correct
5. They are low in validity
6. They are low in objectivity
Suggestions for improving the phrasing of
essay tests
1. Use essay questions only to evaluate
achievement that cannot be measured by short-
answer tests
2. Phrase the questions so that they require, as
precisely as possible, the specific mental
processes operating in the specific subject
3. You can use the essay test as a “projective
technique” and give the student no information
about what basis to use in answering it.
This can improve the student’s
insight into his abilities, difficulties, and ways of
thinking, so you can guide his learning
Cont
4. Phrase the questions so as to give hints
concerning the organization of the student’s
answer
5. Should essay tests have choices? It depends.
6. Plan the questions so that the pupil can
actually give adequate answers within the
allotted time if he has the required
achievement, e.g. questions should be arranged
in order of increasing difficulty to promote a
better distribution of working time
Scoring/grading essay items
• Prepare an outline of the expected answer in
advance.
• The outline you prepare should contain the major
points to be included, the characteristic of the
answer (e.g. organization) to be evaluated and
the amount of credit to be allotted to each.
• Use the scoring method that is most appropriate.
There are two common methods of scoring essay
questions:
– analytical method and holistic method.
Cont
• Decide how to handle factors that are irrelevant to
the learning outcomes being measured.
• Several factors influence our evaluation of answers
that are not directly pertinent to the purpose of
the measurement. Prominent among these are :
– legibility of handwriting
– spelling
– sentence structure
– punctuation
• We should make an effort to keep such factors
from influencing our judgment when evaluating
the content of the answers.
Cont
• Evaluate all the responses to one question before
going to the next one. One factor that contributes to
unreliable scoring of essay questions is a shifting of
standards from one paper to the next.
• A paper with average answers may appear to be of
much higher quality when it follows a failing paper
than when it follows a near perfect one.
• One way to minimize this is to score all answers to the
first question, reorder the papers and score all
answers to the second question, and so on until all the
questions have been scored.
Cont
• When possible evaluate the answers without
looking at the students’ names.
• The general impression we form about each
student during our teaching is also a source of
bias in evaluating essay questions.
• It is not uncommon for a teacher to give a high
score to a poorly written answer by
rationalizing that “the student is really capable,
even though he/she didn’t express it clearly”.
Chapter Seven
Assembling, Administering, Scoring & Analysing Test
• Assembling Test Items
• Assembling involves
– recording the test items,
– reviewing them,
– arranging the items and the formats,
– writing directions and
– reproducing the test.
Cont
• Recording Test Items
– Each test item should be recorded on a
separate page.
• This will help maintain organization and
clarity.
– In addition to the item itself, please record
the content source from which the item has
been extracted.
• This information will assist in
understanding the context of the item.
Cont
– It is also important to record the specific
learning outcome that the item is measuring.
• This will allow for easy cross-checking with
the table of specifications.
– Make sure to leave sufficient space to record
item analysis data.
• This data will provide valuable insights into
the performance and effectiveness of the
test items.
Cont
• Reviewing Test Items
– Before tests are made ready for
reproduction and administration they
should be carefully reviewed.
– This is because there are many chances
that errors might be committed when we
construct tests.
– Linn & Gronlund (2000) explain that as we
concentrate closely on some aspect of
item construction, we may overlook others.
Cont
• This results in an accumulation of unwanted
errors that may distort the function of the
item.
• However such problems can be detected and
minimized by:
– reviewing the items after they have been set
aside for a few days.
– asking a fellow teacher to review and
criticize them.
Cont
• Arranging Items in the Test
– First, test items should be arranged in sections by
item type, i.e. all true/false, all short-answer, all
multiple-choice items, etc. should be grouped
together.
– This has the following advantages
• we will have a single set of directions for each
type
• students can maintain the same mental set
through out each section
• scoring will be easier
Cont
• Linn and Gronlund (2000) suggest the following
arrangement order of items by format:
– True False
– Matching
– Short Answer/completion
– Multiple choice
– Essay
Cont
• Preparing Directions for the Test
– Sometimes due to problems with clarity of
directions students get confused with
regard to how they are supposed to
respond to the questions.
– This problem can be reduced if certain
guidelines are followed in writing test
directions.
Cont
• Linn & Gronlund (2000) suggest that test
directions generally should include:
– Purpose of the test
– Time allowed for completing the test,
– Directions for responding
– How to record the answers,
– What to do about guessing for selection type
items, and
– Basis for scoring open ended or extended
response tests.
Cont
• Administrating Tests
– The guiding principle of test administration is that
there should be a fair chance for all students to
demonstrate their achievement of the learning
outcomes being measured (Linn & Gronlund,
2000).
– In addition to this, we have to consider physical
and psychological conditions of testees as they
may help or hamper students from demonstrating
their full performances or achievements.
Cont
• The physical environment should be as conducive as
possible.
• A conducive physical environment includes a quiet
testing room with adequate light, ventilation,
adequate work space, comfortable seats, etc.
• The psychological conditions influence students’ scores
more seriously than the physical ones.
• The psychological conditions include mental
preparedness of testees to take and pass exams.
• Any condition that may result in tension should be
eliminated.
Cont
• According to Linn & Gronlund (2000), the following
are some sources of anxiety among students when
taking tests.
– threatening students with tests if they do not
behave in a required way
– warning students to do their best because the
test is important
– telling students to work fast in order to
complete on time.
– threatening students on consequences if they
fail
Cont
• Apart from working for the conduciveness of
physical and psychological factors there are some
practices we need to avoid when we administer
our tests. Linn & Gronlund (2000) list them as:
– Don't talk unnecessarily before letting
students start working.
– Keep interruptions to a minimum
– Avoid giving hints to students about individual
items.
– Discourage cheating.
Cont
• So in order to get valid results on students
achievement or performance we have to
discourage cheating.
• The best way to avoid cheating is careful
proctoring of testees.
• In a condition in which there is a large number
of testees, it is advisable to have another
person assist you.
• Another and complementary way is being
careful about seating arrangements.
Cont
• Scoring the Answers
– There are basically three types of scoring.
They are
• hand scoring,
• machine scoring and
• self scoring.
– Which scoring method to use depends on the
availability of scoring equipment and the
speed with which the test results are
needed.
Cont
• Hand Scoring
– For completion items the teacher can prepare
a scoring key by writing out the answers on a
test paper, or
– he/she may make a strip key that corresponds
to the column of blanks provided to the
student. With either of these methods, the
teacher can place the scoring key next to the
pupil’s responses and score the papers quickly.
Cont
• Machine Scoring
– As the name implies, in this type of
scoring a scoring machine is used.
– This is quite useful with large numbers of
testees, for example students taking the
EGSCE all over Ethiopia
Item Analysis
Item analysis is the process of “testing the
item” to determine specifically whether the
item is functioning properly in measuring what
the entire test is measuring.
Item analysis begins after the test has been
administered and scored.
It involves detailed and systematic
examination of the testees’ responses to each
item to determine the difficulty level and
discriminating power of the item.
Cont
This also includes determining the
effectiveness of each option.
For an item to effectively measure what the
entire test is measuring and provide valid and
useful information, it should not be too easy
or too difficult.
General Purpose of Item Analysis
To select the best available items for future use and
keep them in the item bank
To find out structural or content defects in the items
To detect learning difficulties of the class as a whole
To identify, for individual students, areas of weakness
in need of remediation
The Process of Item Analysis
• The item analysis procedures ( Example of
40 test takers)
Arrange the 40 test papers by ranking them in
order from the highest to the lowest score.
Select the best 10 papers (upper 25% of 40
testees) with the highest total scores and the
least 10 papers (lower 25% of 40 testees) with
the lowest total scores.
Cont
Drop the middle 20 papers (the remaining 50%
of the 40 testees) because they will no longer be
needed in the analysis.
Draw a table as shown in table 3.1 in readiness
for the tallying of responses for item analysis.
For each of the 10 test items, tabulate the
number of testees in the upper and lower groups
who got the answer right or who selected each
alternative (for multiple choice items).
Cont
Compute the difficulty of each item (percentage of
testees who got the item right).
Compute the discriminating power of each item
(difference between the number of testees in the upper
and lower groups who got the item right).
Evaluate the effectiveness of the distracters in each
item (attractiveness of the incorrect alternatives) for
multiple choice test items.
Computing Item Difficulty
The difficulty of an item may be defined as the
proportion of the examinees that marked the item
correctly.
The difficulty index may range from 0 to 100
percent.
An item answered correctly by 60 percent of the
students is said to have a difficulty index of 60
percent (0.60).
Obviously any item of 0 or 100 percent difficulty
would not differentiate between good and poor
students and, therefore, has no functional value in
a test.
Cont
A general rule of thumb is to consider as
worthless for measurement any item whose
difficulty index is lower than 10 percent or
higher than 90 percent.
A difficulty index of 50 percent is the ideal level.
Cont
The difficulty index P for each item is obtained by using the
formula:

P = (T / N) × 100

where T = number of testees who got the item right and
N = total number of testees responding to the item.

Thus for item 1 in table 3.1:

P = (14 / 20) × 100 = 0.7 × 100 = 70%

That is, 70% of the testees in the two groups used for the
analysis got the item right.
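The difficulty formula can be sketched as a small Python function; the figures below are the item 1 counts quoted above (14 of the 20 analysed testees answered correctly):

```python
def item_difficulty(num_correct, num_testees):
    """Difficulty index P = (T / N) x 100: the percentage of
    testees who answered the item correctly."""
    return num_correct / num_testees * 100

# Item 1 from table 3.1: 14 of the 20 analysed testees got it right.
print(item_difficulty(14, 20))  # 70.0
```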
Computing Item Discriminating Power (D)
• Item discrimination power is an index which
indicates how well an item is able to
distinguish between the high achievers and
low achievers given what the test is measuring.
• A test item is regarded as having positive
discriminating power if the examinees who
rank higher in ability answer it correctly more
often than those who rank lower in ability.
Cont
• An item may be considered as having no
discriminating power when good and poor
students answer it correctly about equally
often.
• An item may be said to have negative
discriminating power when poor students
answer it correctly more often than the good
students.
• The discrimination index ranges from -1.00 to
+1.00.
Cont
Items with discrimination indexes above 0.20 are ordinarily regarded as
having sufficient discriminating power for use in most tests of academic
achievement.
The index is obtained from this formula:

D = (H − L) / n

where H = number of high scorers who got the item right,
L = number of low scorers who got the item right, and
n = total number of testees in the upper group.

Hence for item 1 in table 3.1, the item discriminating power D is obtained thus:

D = (H − L) / n = (10 − 4) / 10 = 6 / 10 = 0.60
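The same check in Python, again using the item 1 counts from table 3.1 (10 upper-group and 4 lower-group testees correct, upper-group size 10):

```python
def discrimination_power(upper_correct, lower_correct, group_size):
    """D = (H - L) / n, where n is the size of the upper group."""
    return (upper_correct - lower_correct) / group_size

print(discrimination_power(10, 4, 10))  # 0.6
```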
Evaluating the Effectiveness of Distracters
• The distraction power of a distractor is its ability to differentiate
between those who do not know and those who know what the item
is measuring.
• That is, a good distracter attracts more testees from the lower group
than the upper group.
• Formula:

Do = (L − H) / n

where L = number of lower-group testees who marked the option,
H = number of upper-group testees who marked the option, and
n = total number of testees in the upper group.
Cont
For item 1 of table 3.1 the effectiveness of the distracters is:
Option A: Do = (L − H) / n = (2 − 0) / 10 = 0.20
Option B: the correct option, starred (*)
Option C: Do = (1 − 0) / 10 = 0.10
Option D: Do = (3 − 0) / 10 = 0.30
Option E: Do = (0 − 0) / 10 = 0.00
Cont
Incorrect options with positive distraction
power are good distractors; those with
negative distraction power must be changed or
revised, and those with zero power should be
improved, because they failed to distract the
low achievers.
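The distractor evaluation can be sketched the same way, using the option counts from the item 1 example above (option B is the key and is skipped):

```python
def distractor_power(lower_marked, upper_marked, group_size):
    """Do = (L - H) / n for one incorrect option."""
    return (lower_marked - upper_marked) / group_size

# (option, lower-group testees who chose it, upper-group testees who chose it)
for option, low, high in [("A", 2, 0), ("C", 1, 0), ("D", 3, 0), ("E", 0, 0)]:
    print(option, distractor_power(low, high, 10))
```

A positive value means the option pulled more low achievers than high achievers, i.e. it is doing its job.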
Item-analysis procedure for Essays
• Difficulty Level (DL) = [(sum of the upper-group scores + sum of the lower-group scores)
÷ (maximum score point for the item × total number in both groups (U + L))] × 100
• Discrimination Power (DP) = (sum of the upper-group scores − sum of the lower-group scores)
÷ (maximum score point for the item × ½ the total number in both groups (U + L))
Cont
• For example: let us assume 20 students took
an essay-type item worth 3 points. After
the teacher scored all the students’ work on the
specific item, he or she arranged the students’
marks from the highest to the lowest.
• Then, as in multiple-choice item analysis,
he/she may take the upper and lower
groups proportionally.
Cont
• From the upper group, 6 students scored the
highest point (3) and 4 of them scored 2 points;
from the lower group, 5 students scored 2
points and the remaining 5 scored 1 point
Cont
• Now let’s calculate the item difficulty level (DL) and discrimination power
(DP) of the item as follows:
• The sum of the upper-group scores = (6 × 3) + (4 × 2) = 26 points
• The sum of the lower-group scores = (5 × 2) + (5 × 1) = 15 points
• DL of the item = [(26 + 15) / (3 × 20)] × 100 = (41 / 60) × 100 ≈ 68%
• DP of the item = (26 − 15) / (3 × ½ × 20) = 11 / 30 ≈ 0.37
Cont
• Based on the above information, the item is
reasonably good, since the difficulty level is
68% and the discrimination index is about
0.37, which fulfils the characteristics of a good item.
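Both essay formulas can be checked against the worked example above with a short sketch:

```python
def essay_difficulty(upper_sum, lower_sum, max_point, total_group):
    """DL = (upper + lower score sums) / (max point x group size) x 100."""
    return (upper_sum + lower_sum) / (max_point * total_group) * 100

def essay_discrimination(upper_sum, lower_sum, max_point, total_group):
    """DP = (upper - lower score sums) / (max point x half the group size)."""
    return (upper_sum - lower_sum) / (max_point * total_group / 2)

upper = 6 * 3 + 4 * 2   # 26 points
lower = 5 * 2 + 5 * 1   # 15 points
print(round(essay_difficulty(upper, lower, 3, 20)))       # 68
print(round(essay_discrimination(upper, lower, 3, 20), 2))  # 0.37
```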
Chapter Eight
Test Statistics
• Measure of Central Tendency
– Mean, Median, Mode
• Measures of Variability
– Range, Quartile Deviation, Standard
Deviation
• Point Measures
– Quartiles, Deciles, Percentiles
Measure of Central Tendency
Mode: is the crude or inspectional average
measure. It is the most frequently occurring score. It
is the poorest measure of central tendency.
Advantage:
– The mode is always an actual obtained score,
since it is the value that occurs most often.
– It is simple to approximate by observation for
small cases.
– It does not necessitate arrangement of values.
Disadvantage:
– It is not rigidly defined and is inapplicable to
irregular distributions.
Cont
• What is the mode of these scores?
– 75, 60, 78, 75, 76, 75, 88, 75, 81, 75
Cont
Median: is the score that divides the distribution
into halves. It is sometimes called the counting
average.
Advantage:
– It is the best measure when the distribution
is irregular or skewed.
– It can be located in an open-ended
distribution or when the data are incomplete
(e.g. only 80% of the cases are reported)
Disadvantage:
– It necessitates arranging the items according
to size before it can be computed.
Cont
• What is the median?
– 75, 60, 78, 75, 76, 75, 88, 75, 81, 75
Cont
Mean: is the most widely used and familiar
average. The most reliable and the most stable
of all measures of central tendency.
Advantage:
– It is the best measure for regular
distribution.
Disadvantage:
– It is affected by extreme values
• What is the mean?
75, 60, 78, 75, 76, 75, 88, 75, 81, 75
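The three averages for the score list above can be checked with Python's statistics module:

```python
import statistics

scores = [75, 60, 78, 75, 76, 75, 88, 75, 81, 75]

print(statistics.mode(scores))    # 75   (most frequent score)
print(statistics.median(scores))  # 75.0 (middle of the ordered scores)
print(statistics.mean(scores))    # 75.8
```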
Quartiles
• Point measures where the distribution is
divided into four equal parts.
– Q1 : N/4 or the 25% of distribution
– Q2 : N/2 or the 50% of distribution
( this is the same as the median of
the distribution)
– Q3 : 3N/4 or the 75% of distribution
Deciles
• Point measures where the distribution is
divided into 10 equal groups.
– D1 : N/10 or the 10% point of the distribution
– D2 : 2N/10 or the 20% point of the distribution
– D3 : 3N/10 or the 30% point of the distribution
– D4 : 4N/10 or the 40% point of the distribution
– D5 : 5N/10 or the 50% point of the distribution
– D…
– D9 : 9N/10 or the 90% point of the distribution
Percentiles
• Point measures where the distribution is
divided into 100 equal groups
– P1 : N/100 or the 1% point of the distribution
– P10 : 10N/100 or the 10% point of the distribution
– P25 : 25N/100 or the 25% point of the distribution
– P50 : 50N/100 or the 50% point of the distribution
– P75 : 75N/100 or the 75% point of the distribution
– P90 : 90N/100 or the 90% point of the distribution
– P99 : 99N/100 or the 99% point of the distribution
Measure of Variability
• Range
– R = highest score – lowest score
• Quartile Deviation
– QD = ½ (Q3 – Q1)
– It is known as semi inter quartile range
– It is often paired with median
Cont
Standard Deviation
– It is the most important and best measure
of variability of test scores.
– A small standard deviation means that the
group has small variability, i.e. it is relatively
homogeneous.
– It is used with the mean.
Cont
• Compute the mean.
• Subtract the mean from each individual’s score to get a deviation.
• Square each of these deviations.
• Find the sum of the squared deviations, Σ(X − M)².
• Divide the sum obtained in step 4 by N, the number
of students, to get the variance.
• Find the square root of the result of step 5. This
number is the standard deviation (SD) of the scores.
• Thus the formula for the standard deviation (SD) is:
SD = √[ Σ(X − M)² / N ]
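The six steps can be sketched in Python, here applied to the ten scores from the central-tendency examples:

```python
import math

def standard_deviation(scores):
    """Population SD, following the steps above."""
    mean = sum(scores) / len(scores)                   # step 1
    squared_devs = [(x - mean) ** 2 for x in scores]   # steps 2-3
    variance = sum(squared_devs) / len(scores)         # steps 4-5
    return math.sqrt(variance)                         # step 6

scores = [75, 60, 78, 75, 76, 75, 88, 75, 81, 75]
print(round(standard_deviation(scores), 2))  # 6.58
```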
TABLE 1
Class limits   Midpoint (M)   Frequency (f)   f·M    Cum f<
45 – 47        46             2               92     30
42 – 44        43             3               129    28
39 – 41        40             1               40     25
36 – 38        37             2               74     24
33 – 35        34             4               136    22
30 – 32        31             4               124    18
27 – 29        28             1               28     14
24 – 26        25             3               75     13
21 – 23        22             2               44     10
18 – 20        19             3               57     8
15 – 17        16             4               64     5
12 – 14        13             1               13     1
TOTAL                         30              876
Cont
Mean = Σf·M / Σf
Σf·M – total of the products of the frequency (f)
and midpoint (M)
Σf – total of the frequencies
Cont
• Median = L + c (N/2 − cum f<) / fc
L – lower real limit of the median class
cum f< – cumulative frequency ‘less than’ up to but
below the median class
fc – frequency of the median class
c – class interval
N – number of cases
Cont
Mode = LMo + c (f0 − f2) / (2f0 − f1 − f2)
LMo – lower real limit of the modal class
c – class interval
f1 – frequency of the class after the modal class
f2 – frequency of the class before the modal class
f0 – frequency of the modal class
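A quick check of the grouped-data mean and median formulas against TABLE 1 (class width c = 3, N = 30); this is a sketch with the table hard-coded:

```python
# Each tuple: (lower class limit, midpoint M, frequency f), top class first.
classes = [
    (45, 46, 2), (42, 43, 3), (39, 40, 1), (36, 37, 2),
    (33, 34, 4), (30, 31, 4), (27, 28, 1), (24, 25, 3),
    (21, 22, 2), (18, 19, 3), (15, 16, 4), (12, 13, 1),
]

n = sum(f for _, _, f in classes)                 # 30
mean = sum(f * m for _, m, f in classes) / n      # sum(f.M) / sum(f)
print(mean)  # 29.2

# Median: accumulate frequencies from the lowest class upward until the
# cumulative frequency reaches N/2, then interpolate within that class.
c = 3
cum = 0
for lower, _, f in reversed(classes):
    if cum + f >= n / 2:
        median = (lower - 0.5) + c * (n / 2 - cum) / f
        break
    cum += f
print(median)  # 30.25
```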
Chapter Nine
Concepts of Validity and Reliability
• The evaluation of psychological tests centers
on the test's reliability and validity.
• Reliability – is the consistency of the instrument.
Reliability is the ability of a test to provide a
consistent score on repeated measurement.
A reliable test is one that yields consistent scores
when a person takes two alternate forms of the test
or when an individual takes the same test on two or
more different occasions.
Cont
It is synonymous with dependability,
consistency, reproducibility or replicability
over time.
Reliability can be considered as the degree to
which test scores are free from errors of
measurement.
Cont
Reliability always refers to the results obtained with an
evaluation instrument and not to the instrument itself
Reliability is a necessary but not sufficient condition
for validity
• Methods of estimating reliability
• More common methods
Measures of stability (test retest method)
Measures of equivalence (equivalent method)
Measures of internal consistency
– split-half
Cont
Test-retest method
The same test is administered twice to the same
group of pupils with a given time interval between
the two administrations of the test.
The resulting test scores can be correlated, and the
correlation coefficient provides a measure of
stability (index of stability or stability coefficient).
It indicates how stable the test results are over the
given period of time.
The test-retest method can be affected by the time
interval.
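The stability coefficient is just a Pearson correlation between the two administrations. A minimal sketch, with invented score lists for five pupils:

```python
import statistics

def pearson(x, y):
    """Pearson correlation coefficient between two score lists."""
    mx, my = statistics.mean(x), statistics.mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

first = [55, 60, 70, 75, 80]   # first administration (hypothetical)
second = [58, 62, 68, 77, 82]  # same pupils, two weeks later (hypothetical)
print(round(pearson(first, second), 2))  # 0.98 -- a high stability coefficient
```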
Cont
Equivalent forms method
Involves the use of two different but equivalent
forms of tests (also called parallel or alternate
forms).
The two forms of tests are administered to the
same group of students in close succession and
the resulting scores are correlated
The correlation coefficient provides a measure of
equivalence, it indicates the degree to which both
forms of test are measuring the same aspect of
behavior.
Cont
Split-half method
Estimated from the administration of a single form
of a test.
To split the test into halves, the usual procedure is
to score the even-numbered and odd-numbered
items separately.
This provides two scores for each student which,
when correlated, provide a measure of internal
consistency:
the degree to which the two halves are equivalent
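A split-half sketch with hypothetical 0/1 item responses: score the odd- and even-numbered items separately, correlate the two half scores, then step the half-test correlation up to full test length with the Spearman-Brown prophecy formula (the stepping-up formula is standard but is not stated on the slide):

```python
import statistics

def pearson(x, y):
    """Pearson correlation coefficient between two score lists."""
    mx, my = statistics.mean(x), statistics.mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def spearman_brown(r_half):
    """Estimated full-length reliability from the half-test correlation."""
    return 2 * r_half / (1 + r_half)

# One row per student; 1 = item correct, 0 = incorrect (6 items, invented).
responses = [
    [1, 1, 1, 1, 1, 1],
    [1, 1, 1, 0, 1, 0],
    [1, 0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0, 0],
]
odd = [sum(r[0::2]) for r in responses]   # items 1, 3, 5
even = [sum(r[1::2]) for r in responses]  # items 2, 4, 6
r_half = pearson(odd, even)
r_full = spearman_brown(r_half)
print(round(r_full, 2))  # 0.82
```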
Cont
• Factors that Affect Reliability:
Error of measurement
Lack of consistency in scoring, directions
and use of the answer sheet affects reliability
if the same test is administered twice.
The overall test should be structured to
minimize errors of measurement.
Cont
Test length
– The longer the test, the greater the reliability, assuming other
factors are constant.
Item difficulty
– Items of moderate difficulty enhance reliability
General guideline for interpreting reliability coefficients
Reliability coefficient value – interpretation
• .90 and up – excellent
• .80 – .89 – good
• .70 – .79 – adequate
• Below .70 – may have limited applicability
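The guideline bands above can be expressed as a small lookup, sketched here with the slide's own labels:

```python
def interpret_reliability(r):
    """Map a reliability coefficient to the guideline bands above."""
    if r >= 0.90:
        return "excellent"
    if r >= 0.80:
        return "good"
    if r >= 0.70:
        return "adequate"
    return "may have limited applicability"

print(interpret_reliability(0.85))  # good
```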
Cont
• Validity - has to do with the ability of a test to measure
what it is supposed to measure and the extent to which it
predicts outcomes.
• Types of validity
Content validity
• How well the test samples measure the learning target
and not other extraneous attributes.
• Content validity shows the degree to which a measure
covers the range of meanings included within a
concept.
• Does the test emphasize what you have taught?
Cont
Criterion-related validity - the test score is related
to some criterion
- predictive - future performance, e.g. job or academic
success
- concurrent - current performance, where the results of
one test are compared with those of another test
of the same attribute.
Cont
• Construct validity is the extent to which test
performance can be interpreted in terms of certain
psychological constructs.
• It refers to whether the operational definition of
a variable actually reflects the theoretical
meanings of a concept.
• In other words, construct validity shows the
degree to which inferences are legitimately made
from the operationalizations in one’s study to the
theoretical constructs on which those
operationalizations are based.
Cont
• Face validity refers to researchers’ subjective
assessments of the presentation and
relevance of the measuring instrument as to
whether the items in the instrument appear to
be:
• relevant,
• reasonable,
• unambiguous and
• clear.
Factors Influencing Validity
Unclear directions
Vocabulary and sentence structure that are too difficult
Inappropriate level of difficulty of the test items
Poorly constructed test items
Ambiguity
Too short a test
Identifiable patterns of answers
Factors in test administration and scoring
Factors in test administration and scoring
Kinds of Scores/ Levels of Measurement
Data differ in terms of what properties of the real
number series (order, distance, or origin) we can
attribute to the scores.
• A nominal scale involves the assignment of different
numerals to categories that are qualitatively different.
• For example, we may assign the numeral 1 for males
and 2 for females.
• These symbols do not have any of the three
characteristics (order, distance, or origin) we attribute
to the real number series.
• The 2 does not indicate more of something than the
1.
Cont
• An ordinal scale has the order property of a
real number series and gives an indication of
rank order.
For example, ranking students based on their
performance on a certain athletic event would
involve an ordinal scale. We know who is best,
second best, third best, etc.
But the ranks do not tell us anything about
the differences between the scores.
Cont
• With interval data we can interpret the
distances between scores.
• If, on a test with interval data, Almaz has a
score of 60, Abebe a score of 50, and Beshatu
a score of 30, we could say that the distance
between Abebe’s and Beshatu’s scores (50 to
30) is twice the distance between Almaz’s and
Abebe’s scores (60 to 50).
Cont
• If one measures with a ratio scale, the ratio of
the scores has meaning.
• Thus, a person whose height is 2 meters is twice
as tall as a person whose height is 1 meter.
• We can make this statement because a
measurement of 0 actually indicates no height.
• That is, there is a meaningful zero point.
However, if a student scored 0 on a spelling
test, we would not interpret the score to mean
that the student had no spelling ability.
The End