Language Test Design Essentials
Crafting a language test requires careful planning, meticulous execution, and a deep understanding of both the material (language) and the purpose it serves (assessment). This chapter delves into the art of language test construction, guiding you through the process of transforming theoretical knowledge into practical assessment tools.
Designing a language test requires planning, systematic design, and quality control. This section presents the key steps involved in developing an effective language test, from defining its purpose and objectives to administering it in a standardized manner. Following these steps helps ensure that a language test is valid, reliable, and fair to test-takers.
• Deciding a test’s purpose: Clearly establish the reason for the test (e.g., placement,
proficiency, achievement). Identify the target test-takers, including their age,
background, and language learning experience.
• Deciding the test’s objectives: Determine the specific language skills and
knowledge that the test should assess.
• Designing the test’s specification: Create a detailed blueprint of the test, outlining
the content, format, and scoring criteria. Specify the language skills to be tested,
the types of test items to be used, and the weighting of different sections.
• Writing the test items: Create test items that align with the test specifications and
accurately measure the intended language skills. Ensure that the items are clear,
unambiguous, and appropriate for the target population.
• Reviewing and moderating test items: Have experts review the test items for
clarity, accuracy, and fairness. Revise or discard items that are problematic.
• Administering the test in a standardized and controlled environment: Ensure that all test-takers have an equal opportunity to demonstrate their language skills and competence.
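To make the idea of a test blueprint more concrete, here is a minimal sketch of how a specification might be recorded as structured data, using Python. The skill labels, item counts, and weightings are invented for illustration, not taken from any official specification.

# An illustrative test specification ("blueprint"); all labels and numbers are hypothetical.
test_specification = {
    "purpose": "achievement",                      # e.g., placement, proficiency, achievement
    "target_test_takers": "Grade 10 EFL students",
    "sections": [
        {"skill": "reading",   "item_type": "multiple-choice", "items": 10, "weight": 0.30},
        {"skill": "listening", "item_type": "multiple-choice", "items": 10, "weight": 0.30},
        {"skill": "grammar",   "item_type": "gap-fill",        "items": 10, "weight": 0.20},
        {"skill": "writing",   "item_type": "essay",           "items": 1,  "weight": 0.20},
    ],
}

# Quick consistency check: section weights should sum to 100%.
total_weight = sum(section["weight"] for section in test_specification["sections"])
assert abs(total_weight - 1.0) < 1e-9, "Section weights should add up to 1.0"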
Objective test items, with their clear-cut right or wrong answers, offer efficiency and objectivity in language assessment (Hughes, 2020). This section examines these structured assessment tools, exploring their diverse forms and applications. From familiar multiple-choice questions to true-false, matching, and gap-fill items, the following sections explore the characteristics that make them a staple in language testing.
Multiple-choice questions are a common and versatile format used in various language
tests, from classroom quizzes to high-stakes standardized examinations. They offer
several advantages in terms of objectivity, efficiency, and ease of scoring, making them a
popular choice for assessing knowledge and skills in a wide range of language areas,
including grammar, vocabulary, reading comprehension, and listening comprehension
(Currie & Chiramanee, 2010).
• Stem: The initial part of the question, which presents the problem or situation to
be addressed.
• Options: A set of possible answers, usually consisting of three or four choices.
• Key: The correct answer among the options.
• Distractors: The incorrect options, designed to appear plausible to test-takers who lack the targeted knowledge.
The following diagram illustrates the components of a multiple-choice test item.
Below are examples of multiple-choice items from the English textbook Ilearn Smart World 9.
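As a simple illustration of these components, the following Python sketch represents one multiple-choice item as a small data structure and checks a response against the key. The item itself is invented and is not drawn from the textbook.

mc_item = {
    "stem": "She ____ to school every day.",
    "options": {"A": "go", "B": "goes", "C": "going", "D": "gone"},
    "key": "B",   # the correct option; the remaining options serve as distractors
}

def score_mc_response(item, chosen_option):
    """Return 1 if the chosen option matches the key, otherwise 0."""
    return 1 if chosen_option == item["key"] else 0

print(score_mc_response(mc_item, "B"))   # 1
print(score_mc_response(mc_item, "C"))   # 0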
Faulty MC items
When designing MC test items, it is essential to avoid common faults. Below are some examples of faulty MC items:
Ambiguous question
Question: "What is the meaning of the word 'bank' in this sentence: 'I went to the
bank to deposit my paycheck.'"
(A) A financial institution
(B) The side of a river
(C) A row of seats
Fault: The word "bank" has multiple meanings, and the sentence does not provide
enough context to determine the intended meaning. This makes the question
ambiguous and potentially confusing for students.
More than one correct answer
Question: Which of the following are topics covered in Unit 3 of the English 10 textbook
in Vietnam?
(A) Environmental pollution
(B) Natural disasters
(C) Protecting endangered species
Fault: More than one option could plausibly be correct, since environmental pollution, natural disasters, and protecting endangered species are interconnected topics that could all be covered in the unit.
Unclear instructions
Question: Read the following passage and...
(A) Identify the main idea.
(B) Find the supporting details.
(C) Determine the author's purpose.
Fault: The instructions are too broad. Students need more specific guidance on what they
are supposed to do with the passage.
Misleading distractors
Question: What is the correct spelling of the word?
(A) Accommodate
(B) Acommodate
(C) Accomodate
Fault: The distractors are too similar to the correct answer, making it difficult to identify
the correct spelling. This tests students' ability to spot minor differences rather than their
actual knowledge of spelling rules.
To write effective MC items, test developers should consider the following:
• Clear and concise stem: The stem should be clear, concise, and unambiguous. It should present the problem or situation to be addressed in a straightforward manner.
• Plausible distractors: Distractors should be plausible but incorrect. They should be
grammatically correct and relevant to the stem, but not the correct answer.
• Grammatically correct options: All options, including the key and the distractors,
should be grammatically correct.
• Avoid ambiguity: The stem and options should be free from ambiguity and double
negatives.
• Test a single skill: Each item should focus on a single skill or concept.
In general, multiple-choice items are a valuable tool for language assessment, offering
several advantages in terms of objectivity, efficiency, and ease of scoring. However, it is
crucial to recognize their limitations and use them appropriately in conjunction with other
assessment methods that more comprehensively assess language proficiency. When
designing multiple-choice items, careful attention should be paid to clarity, plausibility,
and the overall quality of the distractors to ensure that the items are effective and reliable.
True/False/Not Given (T/F/NG) items ask test-takers to decide whether statements agree with, contradict, or are not addressed in a reading text. They assess the ability to:
• Comprehend explicitly stated information: Identify key details and understand the literal meaning of the text.
• Draw inferences: Make logical deductions based on the information provided.
• Identify the scope of the text: Recognize what information is included and,
crucially, what is not.
To maximize the effectiveness and reliability of T/F/NG items, test developers should
consider the following:
• Clear and concise statements: Avoid complex language and ensure each statement
focuses on a single idea.
• Unambiguous wording: Statements should have only one possible interpretation
based on the text.
• Variety of difficulty levels: Include a range of items that assess both explicit and
implicit understanding.
• Avoid verbatim copying: Rephrase information from the text to prevent simple
matching exercises.
• Thorough review and piloting: Ensure items are accurate, clear, and free of bias
before including them in a test.
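Some of these guidelines can be supported by simple automated checks over a draft item set. The Python sketch below, using invented statements, flags statements copied verbatim from the source text and confirms that all three answer categories (True, False, Not Given) are represented.

# Illustrative draft items and source sentences; all content is invented for the example.
passage_sentences = [
    "Gen Zers grew up online and never knew the world before digital and social media.",
    "Many Gen Zers are also interested in starting their own businesses and companies.",
]
draft_items = [
    {"statement": "Generation Z grew up surrounded by digital technology.", "key": "TRUE"},
    {"statement": "Gen Zers prefer to work for large, established companies.", "key": "FALSE"},
    {"statement": "Most Gen Zers speak more than two languages.", "key": "NOT GIVEN"},
]

# Check 1: no statement should be copied word for word from the passage.
for item in draft_items:
    if item["statement"] in passage_sentences:
        print("Verbatim copy, please rephrase:", item["statement"])

# Check 2: the set should include all three answer categories.
missing = {"TRUE", "FALSE", "NOT GIVEN"} - {item["key"] for item in draft_items}
if missing:
    print("Missing answer categories:", missing)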
Faulty T/F/NG items
Fault: This statement is too general and relies on a potentially inaccurate cultural stereotype. Musical preferences vary greatly among individuals, and there is no definitive answer without specific data or context.
Fault: This detail might be mentioned in the text, but it's not crucial to the main
storyline or character development. Focusing on such trivial information does not
effectively assess comprehension of the key themes or messages.
Statement: "It is not impossible to learn English fluently with dedicated practice."
(T/F/NG)
Fault: The double negative ("not impossible") makes this statement unnecessarily
convoluted. It's better to rephrase it in a more straightforward way (e.g., "It is
possible to learn English fluently with dedicated practice.")
Overall, T/F/NG items are a valuable tool in language testing, providing an efficient and
objective means of assessing reading comprehension. By understanding their
characteristics, applications, and potential limitations, test developers can effectively
utilize this item type to create reliable and valid language assessments. However, careful
attention must be paid to item construction and piloting to ensure clarity, avoid
ambiguity, and promote accurate measurement of language proficiency.
Cloze tests are a versatile and widely used tool in language education. They involve
presenting learners with a text where certain words have been systematically removed
(typically every fifth, sixth, or seventh word) and replaced with blanks. Learners are then
tasked with filling in these blanks with appropriate words based on their understanding of
the context and their language knowledge. This section explores the nature of cloze tests, their applications in language learning, their advantages and limitations, and best practices for constructing and implementing them effectively.
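To illustrate the fixed-ratio deletion described above, here is a minimal Python sketch that blanks out every nth word of a passage and records the answer key. It is a simplified illustration only: it treats punctuation as part of the word and does not leave an intact lead-in sentence, as published cloze tests usually do.

def make_fixed_ratio_cloze(text, n=7):
    """Replace every nth word with a numbered blank; return the cloze text and key."""
    words = text.split()
    answers = []
    for i in range(n - 1, len(words), n):      # the nth, 2nth, 3nth word (1-based)
        answers.append(words[i])
        words[i] = f"({len(answers)}) ______"
    return " ".join(words), answers

sample = ("Cities around the world are becoming smarter, and you can do many "
          "things that seemed impossible in the past.")
cloze_text, answer_key = make_fixed_ratio_cloze(sample, n=7)
print(cloze_text)
print(answer_key)   # ['smarter,', 'that', 'past.']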
By requiring learners to actively engage with the text and make informed choices about
missing words, cloze tests encourage deep processing of language and promote both
receptive and productive skills.
Cloze tests can be used for a variety of purposes in language teaching and assessment, and they can be constructed in several ways:
• Fixed-ratio deletion: Words are removed at regular intervals (e.g., every fifth
word). This is the most common type of cloze test.
• Variable-ratio deletion: Words are removed based on specific criteria, such as
targeting particular grammatical structures or vocabulary items.
• Rational deletion: Words are removed based on their importance for understanding
the text, creating a more challenging and nuanced assessment.
• C-test: The second half of every second word is deleted, requiring learners to complete the words based on the remaining letters (a small generation sketch appears later in this section).
However, cloze tests have some limitations:
• Potential for ambiguity: Some blanks may have multiple possible answers, making scoring subjective.
• Limited scope: May not fully capture complex aspects of language use, such as
pragmatic understanding or communicative competence.
• Artificiality: The artificial nature of deleting words can sometimes disrupt the
natural flow of the text.
To construct and implement cloze tests effectively, consider the following:
• Choose appropriate texts: Select texts that are relevant to learners' interests and proficiency levels.
• Determine deletion rate: Consider the difficulty level desired and the specific
skills being assessed.
• Provide clear instructions: Ensure learners understand the task and how to
respond.
• Pilot test the cloze: Administer the test to a small group of learners to identify any
ambiguities or issues.
• Provide feedback: Use the cloze test as a learning opportunity by providing
feedback on learners' responses and discussing the rationale behind correct
answers.
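The C-test variant mentioned in the list of cloze formats can also be generated automatically. The Python sketch below deletes the second half of every second word; it is a simplified illustration that ignores conventions such as leaving the first and last sentences of the text intact.

def make_c_test(text):
    """Delete the second half of every second word (simplified C-test sketch)."""
    words = text.split()
    damaged = []
    for i, word in enumerate(words):
        if i % 2 == 1 and len(word) > 1:       # every second word
            keep = (len(word) + 1) // 2        # keep the first half, rounded up
            damaged.append(word[:keep] + "_" * (len(word) - keep))
        else:
            damaged.append(word)
    return " ".join(damaged)

print(make_c_test("Learners complete the damaged words using the remaining letters."))
# Learners comp____ the dama___ words usi__ the remai____ letters.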
Below are examples of faulty cloze items that should be avoided.
Fault: This sentence lacks sufficient context to determine the missing word. It could
be "museum," "zoo," "park," "historical site," or any other place that students might
visit on a field trip. This makes the item ambiguous and potentially frustrating for
test-takers.
Sentence: "Although it was raining ____, we decided to go for a walk in the park."
Fault: Placing the gap in the middle of the adverbial clause disrupts the natural flow
of the sentence. It makes it harder for students to understand the sentence structure
and choose the correct word (e.g., "heavily," "outside," etc.).
Fault: This item assumes knowledge of Vietnamese culture that might not be
relevant to all students, especially in an EFL context. It's better to choose words or
concepts that are more universally known or covered within the textbook's scope.
Fault: Including grammatically incorrect options can confuse students and hinder
their ability to identify the correct form of the verb (in this case, "does").
In short, cloze tests are a valuable tool in language education, offering a flexible and
efficient means of assessing and developing various language skills. By carefully
considering the principles of cloze test construction and implementation, educators can
harness their potential to enhance language learning and promote learner engagement and
autonomy.
Matching items are a common and effective question type found in English language
tests for high school students in Vietnam. They require students to connect related pieces
of information from two separate columns, testing their ability to recognize relationships,
analyze information, and make accurate connections. This section explores the characteristics, applications, advantages, and limitations of matching items in English tests, along with guidelines for constructing effective matching activities.
Understanding matching items
Students are tasked with identifying the correct relationships between the items in the two columns and indicating the matches. This process assesses their ability to recognize relationships, analyze information, and make accurate connections.
Matching items are versatile and can be used to assess various aspects of English language proficiency.
Advantages of matching items
• Efficiency: Can assess a wide range of knowledge and skills in a concise format.
• Objectivity: Scoring is straightforward and less prone to subjective interpretation.
• Clarity: The format is generally easy for students to understand and follow.
• Versatility: Can be adapted to assess different language skills and levels of
difficulty.
Limitations of matching items
• Limited cognitive demand: May primarily test recognition and recall rather than
higher-order thinking skills.
• Potential for guessing: Students may resort to guessing if they are unsure of the
correct answers, especially if the number of items in each column is equal.
• Difficulty in constructing effective items: Creating meaningful and unambiguous
matches can be challenging.
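One common way to reduce guessing, as noted above, is to give Column B more options than Column A so that the remaining answers cannot be found by elimination alone. The Python sketch below assembles and shuffles such a matching task; the word-definition pairs and extra distractors are invented examples.

import random

pairs = {
    "generous": "willing to give and share with others",
    "curious": "eager to learn or know something",
    "reliable": "able to be trusted or depended on",
}
extra_distractors = ["very tired", "easily frightened"]   # unmatched options in Column B

column_a = list(pairs.keys())
column_b = list(pairs.values()) + extra_distractors
random.shuffle(column_b)

print("Column A:", column_a)
print("Column B:", column_b)

# Answer key: the position in Column B of each Column A word's definition.
answer_key = {word: column_b.index(definition) + 1 for word, definition in pairs.items()}
print("Key:", answer_key)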
To maximize the effectiveness and validity of matching items, test developers should ensure that each item in Column A has only one clearly correct match in Column B, that all items relate to a single theme or knowledge area, and that items are reviewed and piloted before use. Below are examples of faulty matching items that should be avoided:
Column A Column B
a) happy 1. big
b) large 2. joyful
c) sad 3. unhappy
d) angry 4. furious
e) small 5. tiny
Fault: Some words in Column A could have multiple synonyms in Column B. For
example, "happy" could be matched with both "joyful" and "glad" (if it were an
option), while "large" could be matched with both "big" and "huge" (if it were an
option). This ambiguity makes it difficult for students to determine the single best
match.
Column A Column B
a) photosynthesis 1. the process of making food in plants using sunlight
b) gravity 2. a type of animal that lives in the ocean
c) whale 3. a force that pulls objects towards each other
Fault: The item "whale" and its definition "a type of animal that lives in the ocean"
are not relevant to the other vocabulary words and definitions, which are related to
science and social studies. This creates a mismatch and can distract students.
Instructions: Match the sentences in Column A with the correct question tags in
Column B.
Column A Column B
a) She is a doctor, 1. isn't she?
b) They are playing football, 2. aren't they?
c) He has finished his work, 3. hasn't he?
d) We will go to the party, 4. won't we?
Overall, matching items are a valuable tool in English language tests for high school
students in Vietnam. They provide an efficient and objective way to assess various
language skills and knowledge areas. By adhering to the principles of effective item
construction and considering the potential limitations, educators can utilize matching
activities to create reliable and valid assessments that contribute to meaningful language
learning.
5.2.5 Designing gap-fill items
Gap-fill items, also known as fill-in-the-blank questions, are a staple in English language
tests for high school students in Vietnam. These questions require students to complete a
sentence or passage by filling in missing words or phrases, demonstrating their
understanding of grammar, vocabulary, and overall language structure. This section
delves into the nature of gap-fill items, their applications in English tests, their
advantages and limitations, and guidelines for effective construction and implementation.
Gap-fill items present students with a text where specific words or phrases have been
removed and replaced with blanks. Students must analyze the context and utilize their
language knowledge to determine the missing elements and complete the text coherently
and accurately. This process assesses their ability to:
• Apply grammatical knowledge: Identify the correct tense, form, and agreement
of verbs, nouns, pronouns, adjectives, and adverbs.
• Utilize vocabulary: Select appropriate words that fit the context and convey the
intended meaning.
• Understand sentence structure: Recognize the syntactic roles of words and phrases
within a sentence.
• Comprehend discourse: Maintain coherence and cohesion within the text by
using appropriate linking words and phrases.
Gap-fill items can be used to assess a wide range of language skills and knowledge areas, and they can take several forms:
• Open-ended: Students are free to choose any word or phrase that fits the context.
• Closed-ended: Students are provided with a word bank or a limited set of options
to choose from.
• Targeted: Gaps are strategically placed to assess specific grammar points or
vocabulary items.
• Sentence completion: Students complete a sentence by filling in a missing word
or phrase.
• Passage completion: Students fill in multiple gaps within a longer text.
However, gap-fill items have some limitations:
• Potential for ambiguity: Some gaps may have multiple possible answers, making scoring subjective (a scoring sketch addressing this appears after the best-practice list below).
• Limited scope: May not fully capture complex aspects of language use, such as
pragmatic understanding or communicative competence.
• Difficulty in constructing effective items: Creating gaps that have a single
correct answer and effectively assess the intended skill can be challenging.
To construct effective gap-fill items, test developers should consider the following:
• Choose appropriate texts: Select texts that are relevant to learners' interests and proficiency levels.
• Determine the type and number of gaps: Consider the difficulty level desired
and the specific skills being assessed.
• Provide clear instructions: Ensure learners understand the task and how to
respond.
• Focus on key language points: Target specific grammar structures or vocabulary
items for assessment.
• Avoid excessive gaps: Too many gaps can disrupt the flow of the text and make
the task overwhelming.
• Thorough review and piloting: Ensure items are accurate, clear, and free of bias
before including them in a test.
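Because some gaps legitimately accept more than one answer (the ambiguity limitation noted earlier), scorers often agree on a list of acceptable answers for each gap before marking. The Python sketch below illustrates this approach with invented answers and responses.

# Acceptable answers agreed on for each gap (illustrative).
acceptable_answers = {
    1: {"quickly", "fast"},
    2: {"because"},
}
student_responses = {1: "Fast", 2: "becuase"}   # one misspelled response for illustration

score = 0
for gap, response in student_responses.items():
    # Normalize case and surrounding spaces before comparing with the accepted set.
    if response.strip().lower() in acceptable_answers[gap]:
        score += 1
print(f"Score: {score} / {len(acceptable_answers)}")   # Score: 1 / 2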
Faulty gap-fill items
Fault: This sentence is too open-ended. Students could fill the gap with various
words like "store," "supermarket," "shop," "market," etc., making it difficult to
determine a single correct answer.
Fault: This item requires knowledge of Vietnamese culture that might not be
covered in the English textbook or familiar to all students.
Fault: This sentence provides no context or clues for the missing words. Students
are left to guess randomly, which doesn't effectively assess their language skills.
Fault: Using a highly specific or obscure word like "emblazoned" might not be
appropriate for a high school level gap-fill test, especially if it's not a word
commonly encountered in the textbook.
In general, gap-fill items are a valuable component of English language tests for high
school students in Vietnam. They provide a flexible and focused way to assess various
language skills, particularly grammar, vocabulary, and sentence construction. By
adhering to the principles of effective item construction and considering the potential
limitations, educators can utilize gap-fill activities to create meaningful assessments that
contribute to effective language learning.
Subjective test items, unlike their objective counterparts, require students to construct
their own responses rather than selecting from pre-defined options. These items, such as
essays, short answer questions, and oral interviews, offer a valuable means of assessing
deeper levels of language proficiency and cognitive skills in high school English
language learners in Vietnam. This section explores the characteristics, applications, advantages, and limitations of subjective test items, along with guidelines for their effective design and implementation.
Subjective test items typically present students with open-ended prompts or questions
that require them to generate unique responses using their own language and ideas. These items assess a broader range of skills and knowledge, including deeper levels of language proficiency and higher-order cognitive skills.
Subjective test items are valuable for assessing various aspects of English language proficiency. To design and score them effectively, test developers should consider the following:
• Clear and concise prompts: Provide specific and unambiguous instructions that
clearly define the task and expectations.
• Relevant to curriculum: Align prompts with the learning objectives and content
covered in the curriculum.
• Appropriate difficulty level: Ensure tasks are challenging yet attainable for the
students' proficiency level.
• Defined evaluation criteria: Develop clear scoring rubrics that outline the criteria
for evaluating responses and awarding marks.
• Rater training and standardization: Provide training to ensure consistent and
reliable scoring across different raters.
In the realm of English language assessment for high school students in Vietnam, the
choice of scoring methods plays a crucial role in determining the accuracy, fairness, and
effectiveness of evaluations. This section explores the key distinctions between objective
and subjective scoring methods, highlighting their respective advantages, limitations, and
applications in various assessment contexts.
Objective scoring methods are characterized by their clear-cut criteria and standardized
procedures, leaving little room for individual interpretation or bias. These methods are
typically employed for assessing responses to objective test items, such as multiple-
choice, true-false, and matching questions. Objective scoring offers several advantages:
• Efficiency: Objective scoring is generally quick and efficient, allowing for rapid
evaluation of large numbers of test papers.
• Reliability: With standardized procedures and answer keys, objective scoring
produces consistent results, minimizing variations between different raters.
• Objectivity: The scoring process is free from personal biases or interpretations,
ensuring fairness and equal treatment for all test-takers.
• Ease of analysis: Objective scores are easily quantifiable, facilitating statistical
analysis and reporting of results.
Subjective scoring methods, by contrast, rely on raters' judgment to evaluate open-ended responses such as essays, short answers, and oral performances. They have several limitations:
• Subjectivity and bias: Raters' personal biases and interpretations can influence scoring, potentially leading to inconsistencies and unfairness.
• Time-consuming: Evaluating open-ended responses requires careful reading and
analysis, making subjective scoring more time-consuming than objective scoring.
• Rater reliability: Ensuring consistency and agreement between different raters
requires thorough training and standardization procedures.
• Difficulty in providing specific feedback: Providing detailed and specific
feedback on subjective responses can be challenging.
Whatever scoring method is used, the following principles support fair and accurate evaluation:
• Clarity of criteria: Establish clear and specific scoring criteria or rubrics that outline the expectations for each assessment task and the characteristics of different performance levels.
• Transparency: Communicate the scoring criteria to students beforehand to ensure
they understand the expectations and can focus their efforts accordingly.
• Consistency: Apply the scoring criteria consistently across all student work to
ensure fairness and equity in the evaluation process.
• Objectivity: Minimize personal biases and subjective interpretations when
evaluating student work, particularly for subjective assessments.
• Feedback: Provide constructive feedback to students that highlights their strengths
and areas for improvement, guiding their future learning.
For objective test items, the following practices are recommended (a brief scoring and item-analysis sketch follows this list):
• Accurate answer keys: Develop accurate and unambiguous answer keys for objective test items, such as multiple-choice, true-false, and matching questions.
• Automated scoring: Utilize technology, where feasible, to automate the scoring
of objective tests, ensuring efficiency and accuracy.
• Item analysis: Analyze student responses to identify problematic items that may
need revision or removal from future assessments.
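To illustrate automated scoring and simple item analysis, the Python sketch below scores a small set of multiple-choice responses against an answer key and computes each item's facility value (the proportion of test-takers answering it correctly). The data and the flagging thresholds are invented for illustration.

# Illustrative answer key and student responses for a four-item multiple-choice test.
answer_key = ["B", "D", "A", "C"]
responses = {
    "S1": ["B", "D", "A", "C"],
    "S2": ["B", "A", "A", "C"],
    "S3": ["C", "D", "A", "B"],
}

# Automated scoring: one point for each response that matches the key.
scores = {s: sum(r == k for r, k in zip(ans, answer_key)) for s, ans in responses.items()}
print(scores)   # {'S1': 4, 'S2': 3, 'S3': 2}

# Basic item analysis: flag items answered correctly by almost everyone or almost no one.
for i, key in enumerate(answer_key):
    facility = sum(ans[i] == key for ans in responses.values()) / len(responses)
    flag = "review" if facility < 0.3 or facility > 0.9 else "ok"
    print(f"Item {i + 1}: facility = {facility:.2f} ({flag})")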
For subjective test items, the following practices help improve scoring reliability:
• Detailed rubrics: Develop detailed scoring rubrics that outline the specific criteria for evaluating different aspects of performance, such as content, organization, language use, and mechanics.
• Rater training: Provide thorough training to raters on the scoring rubrics to
ensure inter-rater reliability.
• Multiple raters: When possible, use multiple raters to evaluate subjective
responses, such as essays or oral presentations, and average their scores to
minimize bias.
• Anonymity: Mask student identities during the scoring process to prevent
unconscious biases from influencing evaluations.
• Holistic scoring: Consider the overall quality of the response, rather than focusing
solely on individual errors, to provide a more comprehensive assessment.
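As a small illustration of combining multiple raters' scores on an analytic rubric, the sketch below averages two raters' marks for each criterion and overall. The criteria, scale, and scores are invented examples.

# Two raters score one essay on a 0-5 scale for each rubric criterion (illustrative data).
rubric_scores = {
    "content":      {"rater_1": 4, "rater_2": 5},
    "organization": {"rater_1": 3, "rater_2": 4},
    "language_use": {"rater_1": 4, "rater_2": 4},
}

criterion_averages = {}
for criterion, ratings in rubric_scores.items():
    criterion_averages[criterion] = sum(ratings.values()) / len(ratings)
    print(f"{criterion}: {criterion_averages[criterion]:.1f}")

overall = sum(criterion_averages.values()) / len(criterion_averages)
print(f"Overall average: {overall:.2f}")   # Overall average: 4.00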
When providing feedback on scored work, consider the following:
• Specificity: Provide specific and detailed feedback that pinpoints areas of strength and weakness in the student's work.
• Actionable advice: Offer actionable advice on how the student can improve their
performance in the future.
• Timeliness: Provide feedback promptly while the assessment is still fresh in the
student's mind.
• Encouragement: Balance constructive criticism with positive reinforcement to
motivate students and foster a growth mindset.
Implementing best scoring practices is essential for ensuring accurate, fair, and
meaningful assessments of English language proficiency in Vietnam. By adhering to the
principles of clarity, consistency, objectivity, and constructive feedback, educators can
provide valuable evaluations that support student learning and contribute to the overall
effectiveness of the educational system.
In summary, the choice between objective and subjective scoring methods depends on
the specific learning objectives, the type of assessment tasks, and the desired level of
detail in the evaluation. By understanding the strengths and limitations of each approach,
educators can make informed decisions about scoring procedures, ensuring fair, reliable,
and meaningful assessments of English language proficiency for high school students in
Vietnam. Table 5.1 compares objective vs. subjective scoring:
Instruction: Carefully read each statement in the table below. For each statement, rate
your understanding using the following scale:
1: I do not understand this at all.
2: I understand this a little, but I need more help.
3: I understand this fairly well, but I have some questions.
4: I understand this very well and can explain it to others.
Objectives:
• To develop students' understanding of multiple-choice question design principles.
• To enhance students' critical thinking and analytical skills by analyzing a given
text and formulating appropriate test items.
Procedure: Students in groups or pairs read the following passage from Unit 2 of the English textbook Global Success for Grade 11. They first read the passage carefully and identify the main ideas, supporting details, any challenging vocabulary or grammatical structures, and the overall tone and purpose of the passage. They then write multiple-choice items based on the passage. The items should include a variety of question types:
o Factual questions: Test for recall of specific information (e.g., "What is the
main cause of...?").
o Inferential questions: Test for understanding of implied meanings and
making deductions (e.g., "What can be inferred about the author's attitude
towards...?").
o Vocabulary questions: Test for understanding of vocabulary in context.
o Main idea questions: Test for comprehension of the overall theme or central
message.
“Over the past two centuries, different generations were born and given different
names. Each generation comes with its characteristics, which are largely influenced
by the historical, economic, and social conditions of the country they live in.
However, in many countries the following three generations have common
characteristics.
Generation X refers to the generation born between 1965 and 1980. When Gen Xers
grew up, they experienced many social changes and developments in history. As a
result, they are always ready for changes and prepared to work through changes.
Gen Xers are also known as critical thinkers because they achieved higher levels of
education than previous generations.
Generation Y, also known as Millennials, refers to those born between the early
1980s and late 1990s. They are curious and ready to accept changes. If there is a
faster, better way of doing something, Millennials want to try it out. They also value
teamwork. When working in a team, Millennials welcome different points of view
and ideas from others.
Generation Z includes people born between the late 1990s and early 2010s, a time
of great technological developments and changes. That is why Gen Zers are also
called digital natives. They grew up online and never knew the world before digital
and social media. They are very creative and able to experiment with platforms to
suit their needs. Many Gen Zers are also interested in starting their own businesses
and companies. They saw so many people lose their jobs, so they think it is safer to
be your own boss than relying on someone else to hire you.
Soon a new generation, labelled Gen Alpha, will be on the scene. Let's wait and see
if we will notice the generation gap.” (From Unit 2, Global Success, Grade 11,
Hoang et al., 2018)
Procedure: Students in groups or pairs read the following passage and then write True/False/Not Given statements based on it.
Cities around the world are becoming smarter, and you can do many things that seemed impossible in the past.
In Singapore, the mobile app [Link] allows you to locate a nearby car park
easily, book a parking space, and make a payment. You can also extend your
booking or receive a refund if you leave early.
New York City (US) has one of the largest bike-sharing systems called Citi Bike.
Using a mobile app, you can unlock bikes from one station and return them to any
other station in the system, making them ideal for one-way trips.
In Copenhagen (Denmark), you can use a mobile app to guide you through the city
streets and tell how fast you need to pedal to make the next green light. The app can
also give you route recommendations and work out the calories you burn.
In London (UK), you don’t have to buy public transport tickets. You can just touch
your bank card on the card reader when you get on and off the bus or the
underground to pay for your trip.
In Toronto (Canada), you can book an appointment and see a doctor online from
your own home. You can also receive prescriptions and any other documents you
need, all online.
When reading the passage, students need to highlight key information and identify main ideas. They review the characteristics of each type of statement (True, False, and Not Given).
Each pair/group should create at least two statements for each category (True, False, Not Given), resulting in a minimum of six statements. Students should vary the difficulty level of their statements and avoid simply copying sentences directly from the text.
Procedure: Students in small groups search for a set of subjective test items from English tests for high school students in Vietnam (e.g., essay prompts, short answer questions, or oral interview questions from previous English tests or textbooks in use). They discuss the following questions:
Procedure: Students in groups begin by reviewing the key learning objectives and
content areas covered in a unit or several units of an English textbook for high school
students in Vietnam, then identify the knowledge and skills that should be assessed in an
achievement test. They then discuss the different types of subjective items commonly
used in achievement tests (essays, short answer questions, etc.). During group discussion,
students decide which item types best match the identified objectives and draft sample items.
Objective: To enable students to analyze and compare two test matrices (one for Grade 6
and one for Grade 8 in Vietnam as in the images below) and identify the key differences
in terms of skills, knowledge, and complexity.
Procedure: Students in groups spend time reviewing the content and structure of each
matrix. They use highlighters or colored pens to mark any noticeable differences between
the two matrices and then report the findings to the whole class. They should focus on
aspects like:
§ Skills: Are there any skills assessed in one matrix that are not
present in the other? Are the same skills assessed at different
levels of complexity?
§ Knowledge: Are there differences in the types or depth of
knowledge expected at each grade level?
§ Weighting: Are certain skills or knowledge areas given more
emphasis in one matrix compared to the other?