THE NITTY-GRITTY OF LANGUAGE TESTING:
IMPLICATIONS FOR TEST CONSTRUCTORS
MS. MARIA ZAHEER
PRINCE SULTAN UNIVERSITY
KSAALT Mini Conference 2015
Objectives
• What are the principles of language testing?
• How can we define them?
• What factors can influence them?
• How can we measure them?
What is a test?
‘A method of measuring a person’s ability, knowledge, or performance in a given domain’ (Brown, 2004:3).
Key terms: method, measure, ability, performance/competence.
[Diagram: tests as part of assessment, assessment as part of teaching (adapted from Brown, 2004:5)]
• Assessment is one component of teaching.
• Assessment helps teachers to gain information about every aspect of their students, especially their achievement.
• Tests play a crucial role in assessment.
A good test is constructed by considering the principles of language testing:
• Validity
• Reliability
• Practicality
• Authenticity
• Washback
Validity is the extent to which a test measures what it is supposed to measure (Hughes, 2003:26).
Five types of validity:
• Construct Validity
• Content Validity
• Consequential Validity
• Criterion Validity
• Face Validity
Content Validity
• What is meant to be measured has to be crystal clear.
• There should be a clear correlation between the contents of the test and the language skills and structures it covers.
• The test items should really represent the course objectives.
Source: http://www.slideshare.net/Samcruz5/validity-reliability-practicality?next_slideshow=1
Criterion Validity
• The relationship between the test score and the outcome.
• The test score should really represent the criterion that the test is intended to measure.
Criterion validity can be established in two ways.
Concurrent Validity
A test is said to have concurrent validity if its result is supported by other concurrent performance beyond the assessment itself (Brown, 2004:24).
Predictive Validity
Predictive validity concerns how well a test predicts a student’s possible future success (Alderson et al., 1995:180-183).
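As an added illustration (a sketch, not part of the original slides): both forms of criterion validity are commonly quantified as the correlation between test scores and scores on the criterion measure (for example, later course grades for predictive validity). Writing $x_i$ for the test scores and $y_i$ for the criterion scores of $n$ students, the validity coefficient is the Pearson correlation

$$r_{xy} = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i-\bar{x})^2}\,\sqrt{\sum_{i=1}^{n}(y_i-\bar{y})^2}}$$

where $\bar{x}$ and $\bar{y}$ are the mean scores. A coefficient close to 1 suggests that the test scores closely track the criterion; a coefficient near 0 suggests they do not.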
Construct Validity
Construct validity refers to the concepts or theories underlying the use of a certain ability, including language ability.
Construct validity shows that the result of the test really represents the construct, i.e. the students’ ability that is being measured (Djiwandono, 1996:96).
Consequential Validity
Consequential validity refers to the social consequences of using a particular test for a particular purpose.
The use of a test is said to have consequential validity to the extent that society benefits from that use of the test.
Face Validity
A test is said to have face validity if it looks to other testers, teachers, moderators, and students as if it measures what it is supposed to measure (Heaton, 1990:159).
A test can be judged to have face validity simply by looking at its items.
Face validity can affect how students perform on the test (Brown, 2004:27; Heaton, 1988:160).
To address this, the test constructor has to consider the following:
• Students will be more confident if they face a well-constructed, expected format with familiar tasks.
• Students will be less anxious if the test is clearly doable within the allotted time limit.
• Students will be optimistic if the items are clear and uncomplicated.
• Students will find it easier to do the test if the directions are very clear.
• Students will be less worried if the tasks are related to their course work (content validity).
• Students will be at ease if the difficulty level presents a reasonable challenge.
Reliability
Reliability refers to the consistency of the scores obtained (Gronlund, 1977:138).
Reliability does not really deal with the test itself; it deals with the results of the test.
The test results should be consistent.
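As an added sketch (not on the original slides): one common way to quantify this consistency is test-retest reliability, the correlation between scores from two administrations of the same test to the same students. Writing $x_1$ and $x_2$ for the two sets of scores, the reliability coefficient is

$$r_{x_1 x_2} = \frac{\operatorname{Cov}(x_1, x_2)}{\sigma_{x_1}\,\sigma_{x_2}}$$

Values close to 1 indicate highly consistent results; values near 0 indicate that the scores cannot be depended on.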
Reliability falls into four kinds (Brown, 2004:21-22):
• Student-Related Reliability
• Test Administration Reliability
• Test Reliability
• Rater Reliability
Test Administration Reliability
This refers to the conditions and the situation in which the test is administered.
Student-Related Reliability
This kind of reliability concerns temporary illness, fatigue, a bad day, anxiety, and other physical or psychological factors affecting the students.
Thus, the score a student obtains may not be his/her actual score.
Test Reliability
The test should fit into the time constraints.
The items of the test should be crystal clear so that they do not end in ambiguity.
Rater Reliability
This kind of reliability falls into two categories:
1. Inter-rater reliability
Problems occur when two or more scorers yield inconsistent scores for the same test, possibly because of a lack of attention to scoring criteria, inexperience, inattention, or even bias.
2. Intra-rater reliability
Problems here are a common occurrence for classroom teachers, arising from unclear scoring criteria, fatigue, bias toward particular “good” or “bad” students, or simple carelessness.
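As an added illustration (a sketch, not from the original slides): when two raters assign categorical scores (e.g., pass/fail or band levels), inter-rater agreement is often summarized with Cohen’s kappa,

$$\kappa = \frac{p_o - p_e}{1 - p_e}$$

where $p_o$ is the observed proportion of agreement between the raters and $p_e$ is the agreement expected by chance; $\kappa = 1$ means perfect agreement, while values near 0 mean agreement is no better than chance. For continuous marks, a simple correlation between the two raters’ scores can serve the same purpose.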
Practicality
Practicality is the relationship between the resources available for the test (human resources, material resources, time, etc.) and the resources required in the design, development, and use of the test (Bachman & Palmer, 1996:35-36).
Brown (2004:19) defines practicality in terms of:
1) Cost - The test should not be too expensive to conduct.
2) Time - The test should stay within appropriate time constraints.
3) Administration - The test should not be too complicated or complex to conduct.
4) Scoring/Evaluation - The scoring/evaluation process should fit into the time allocated.
Authenticity
Authenticity is the degree of correspondence of the characteristics of a given language test task to the features of a target language task (Brown, 2004:28).
Brown (2004:28) also proposes considerations that might be helpful for presenting authenticity in a test:
• The language in the test is as natural as possible.
• Items are contextualized rather than isolated.
• Topics are meaningful (relevant, interesting) to the learners.
• Some thematic organization of items is provided, such as through a story or episode.
• Tasks represent, or closely approximate, real-world tasks.
Washback/Backwash
The term washback is commonly used in applied linguistics, yet it is rarely found in dictionaries. The word backwash, however, can be found in certain dictionaries, where it is defined as “an effect that is not the direct result of something” (Cambridge Advanced Learner’s Dictionary).
In dealing with the principles of language assessment, the two terms are used interchangeably: washback (Brown, 2004) or backwash (Heaton, 1990).
Washback is the influence of testing on teaching and learning. The influence itself can be positive or negative (Cheng et al. (Eds.), 2008:7-11).
Positive Washback
Teachers and students have a positive attitude toward the examination or test, and work willingly and collaboratively towards its objective (Cheng & Curtis, 2008).
Negative Washback
Negative washback does not give any beneficial influence on teaching and learning (Cheng & Curtis, 2008:9).
The quality of washback might be independent of the quality of the test (Fulcher & Davidson, 2007:225).
Teachers, as test constructors, need to consider the probable washback of the tests they construct and its future impact on teaching and learning.
Teaching and learning will be affected in many different ways depending upon the variables at play in specific contexts.
What these variables are, how they are to be weighted, and whether we can discover patterns of interaction that hold steady across contexts is a matter for ongoing research (Fulcher & Davidson, 2007:229).
Conclusion
• A test is good if it has practicality, good validity, high reliability, authenticity, and positive washback.
• These five principles provide guidelines for both constructing and evaluating tests.
• Teachers should apply these five principles when constructing or evaluating the tests used in their assessment activities.

Editor's Notes

• On validity: Brown (2004:22-27) proposes five ways to establish validity.
• On criterion validity: For example, the validity of a high score on the final examination of a foreign language course would be verified against the student’s actual proficiency in the language. Tests such as TOEFL® or IELTS are intended to indicate how well somebody will be able to use his/her English in the future.
• On face validity: In a speaking test, for instance, face validity can be shown by making speaking activities the main activities in the test. The test should focus on the students’ speaking, not anything else.
• On reliability: If the test is administered to the same students on different occasions (with no language practice work taking place between these occasions), then it should produce (almost) the same results.
• On test administration and rater reliability: To increase test administration reliability, teachers as administrators should consider everything related to how the test is administered. For instance, for a listening test we should provide a room with a comfortable listening environment: noise from outside should not enter the room, the audio system should be clear to all students, and even the lighting and the condition of the desks and chairs should be considered. Rater reliability deals with the scoring process; factors that can affect it include human error, subjectivity, and bias in scoring.
• On authenticity: Authenticity deals with the “real world”. Teachers should construct tests whose items are likely to be used or applied in the real contexts of daily life.
• On washback: Positive washback has a beneficial influence on teaching and learning, while tests with negative washback are considered to have a negative influence on teaching and learning.