
Language Test Design Essentials

Methodology of teaching

Uploaded by

Huong Giang

Chapter 5: Language Test Design and Development

Crafting a language test requires careful planning, meticulous execution, and a deep
understanding of both the material being assessed (language) and the purpose of the
assessment. This chapter delves into the intricate art of language test construction,
guiding you through the process of transforming theoretical knowledge into practical
assessment tools.

5.1 Designing a language test

Designing a language test requires planning, systematic design, and quality control. This
section presents the key steps involved in developing an effective language test, from
defining its purpose and objectives to administering it in a standardized manner.
Following these steps helps ensure that a language test is valid, reliable, and fair to
test-takers.

Figure 5.1 Steps for designing a language test

Deciding a test's purpose → Deciding the test's objectives → Developing the test's
specification → Writing the test items → Reviewing the test items → Administering the test

• Deciding a test’s purpose: Clearly establish the reason for the test (e.g., placement,
proficiency, achievement). Identify the target test-takers, including their age,
background, and language learning experience.
• Deciding the test’s objectives: Determine the specific language skills and
knowledge that the test should assess.
• Developing the test’s specification: Create a detailed blueprint of the test, outlining
the content, format, and scoring criteria. Specify the language skills to be tested,
the types of test items to be used, and the weighting of different sections.
• Writing the test items: Create test items that align with the test specifications and
accurately measure the intended language skills. Ensure that the items are clear,
unambiguous, and appropriate for the target population.
• Reviewing the test items: Have experts review and moderate the test items for
clarity, accuracy, and fairness. Revise or discard items that are problematic.
• Administering the test: Administer the test in a standardized and controlled
environment so that all test-takers have an equal opportunity to demonstrate their
language competence.
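
As an illustration only, the specification step can be sketched as a small data
structure. The Python example below is a minimal sketch under stated assumptions: the
section names, item types, item counts, and weightings are hypothetical and do not come
from any real test, and the only check it performs is that the section weightings add up
to 100 percent.

```python
from dataclasses import dataclass


@dataclass
class Section:
    skill: str        # language skill the section targets
    item_type: str    # e.g. multiple-choice, matching, short answer
    n_items: int      # number of items in the section
    weight: float     # share of the total score, in percent


def check_weights(sections):
    """Return True if the section weightings sum to 100 percent."""
    total = sum(s.weight for s in sections)
    return abs(total - 100.0) < 1e-9


# Hypothetical blueprint for a classroom achievement test.
blueprint = [
    Section("reading", "true-false-not given", 10, 30.0),
    Section("grammar", "multiple-choice", 20, 40.0),
    Section("vocabulary", "matching", 10, 20.0),
    Section("writing", "short answer", 2, 10.0),
]

print(check_weights(blueprint))
```

A blueprint written down in this way can be shared with item writers and reviewers so
that everyone works from the same decisions about content, format, and weighting.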

5.2 Designing objective test items

Objective test items have clear-cut right or wrong answers, which makes them efficient to
administer and objective to score (Hughes, 2020). This section explores their main forms
and applications, from the familiar multiple-choice question to true-false, matching, and
gap-fill items, and examines the characteristics that make them a staple in language
testing.

5.2.1 Designing multiple-choice test items

Multiple-choice questions are a common and versatile format used in various language
tests, from classroom quizzes to high-stakes standardized examinations. They offer
several advantages in terms of objectivity, efficiency, and ease of scoring, making them a
popular choice for assessing knowledge and skills in a wide range of language areas,
including grammar, vocabulary, reading comprehension, and listening comprehension
(Currie & Chiramanee, 2010).

Characteristics of multiple-choice items:

• Stem: The initial part of the question, which presents the problem or situation to
be addressed.
• Options: A set of possible answers, usually consisting of three or four choices.
• Key: The correct answer among the options.
• Distractors: The incorrect answer options, designed to be plausible but incorrect.

The following diagram illustrates the components of a multiple-choice test item.

Figure 5.2 Key components of a multiple-choice question

1. Stem: presents the problem
2. Keyed response: correct/best answer
3. Distractors: appear to be reasonable answers to the examinee who does not know
the content
4. Options: include the distractors and the keyed response
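
The relationship between these components can also be expressed in a few lines of
Python. This is a minimal sketch rather than a standard format: the class name MCItem,
its methods, and the sample item are hypothetical. It treats the distractors as every
option other than the key and checks that the key appears exactly once among the options.

```python
from dataclasses import dataclass


@dataclass
class MCItem:
    stem: str       # presents the problem
    options: list   # all answer choices shown to the test-taker
    key: str        # the single correct option

    def distractors(self):
        """Options other than the key."""
        return [o for o in self.options if o != self.key]

    def is_well_formed(self):
        """The key appears exactly once and there are at least two distractors."""
        return self.options.count(self.key) == 1 and len(self.distractors()) >= 2


# Hypothetical vocabulary item.
item = MCItem(
    stem="Which word is closest in meaning to 'enormous'?",
    options=["huge", "tiny", "quick", "quiet"],
    key="huge",
)

print(item.distractors())
print(item.is_well_formed())
```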

Below are examples of multiple-choice items from the English textbook Tieng Anh 9
i-Learn Smart World.

Source: Vo et al. (2024), Tieng Anh 9 i-Learn Smart World

Faulty MC items

When designing multiple-choice (MC) items, it is essential to avoid common faults. Below
are some examples of faulty MC items:

Ambiguous question
Question: "What is the meaning of the word 'bank' in this sentence: 'I will meet you
at the bank.'"
(A) A financial institution
(B) The side of a river
(C) A row of seats
Fault: The word "bank" has multiple meanings, and the sentence does not provide
enough context to determine the intended meaning. This makes the question
ambiguous and potentially confusing for students.

More than one correct answer
Question: Which of the following are topics covered in Unit 3 of the English 10 textbook
in Vietnam?
(A) Environmental pollution
(B) Natural disasters
(C) Protecting endangered species
Fault: The stem asks which options "are" topics, so it invites more than one answer.
Environmental pollution, natural disasters, and protecting endangered species are
closely interconnected and could all be addressed in the same unit, which leaves no
single defensible key.

Unclear instructions
Question: Read the following passage and...
(A) Identify the main idea.
(B) Find the supporting details.
(C) Determine the author's purpose.
Fault: The instructions are too broad. Students need more specific guidance on what they
are supposed to do with the passage.

Misleading distractors
Question: What is the correct spelling of the word?
(A) Accommodate
(B) Acommodate
(C) Accomodate
Fault: The distractors are too similar to the correct answer, making it difficult to identify
the correct spelling. This tests students' ability to spot minor differences rather than their
actual knowledge of spelling rules.

Culturally biased question


Question: Which of these is a popular American holiday?
(A) Thanksgiving
(B) Diwali
(C) Ramadan
Fault: This question assumes knowledge of American culture, which may not be relevant
or fair to all students, especially in an EFL context.

Grammatically incorrect stem

Question: Which sentence is using the correct tense?
(A) I am go to the store.
(B) He have went to the library.
(C) She will going to the park.
Fault: The stem is awkwardly worded ("is using" rather than "uses"), and all three
options are ungrammatical, so the item has no correct answer. Errors like these confuse
students and make it difficult to focus on the actual content of the question.

Overall, when creating multiple-choice questions, it is important to be clear and
concise and to avoid ambiguity. The goal is to assess students' language skills
accurately, not to trick them.

Advantages of multiple-choice items:

• Objectivity: Scoring is typically objective and straightforward, minimizing rater
bias.
• Efficiency: Can be administered and scored quickly and efficiently, especially
with the use of automated scoring systems.
• Versatility: Can be used to assess a wide range of language skills and knowledge
areas.
• Reliability: Multiple-choice tests generally exhibit high levels of reliability,
meaning they tend to produce consistent results across different administrations
(Thanyapa & Currie, 2014).

Designing effective multiple-choice items:

• Clear and concise stem: The stem should be clear, concise, and unambiguous. It
should present the problem or situation to be addressed in a straightforward
manner.
• Plausible distractors: Distractors should be plausible but incorrect. They should be
grammatically correct and relevant to the stem, but not the correct answer.
• Grammatically correct options: All options, including the key and the distractors,
should be grammatically correct.
• Avoid ambiguity: The stem and options should be free from ambiguity and double
negatives.
• Test a single skill: Each item should focus on a single skill or concept.

Limitations of multiple-choice items:

• Limited assessment of higher-order thinking skills: Multiple-choice questions may
not effectively assess higher-order thinking skills such as critical thinking,
problem-solving, and creative expression.
• Guessing: Students may be able to guess the correct answer, which can impact test
scores.
• Limited ability to assess writing and speaking skills: Multiple-choice formats are
less suitable for assessing complex skills such as writing and speaking.
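
The effect of guessing can be estimated with simple arithmetic. The sketch below assumes
that every item has the same number of options and that a test-taker who does not know
an answer guesses blindly; both are simplifying assumptions. The second function applies
the classical correction-for-guessing formula (right answers minus wrong answers divided
by one less than the number of options), which is one common approach and is not
prescribed by this chapter. The function names and the sample figures are hypothetical.

```python
def expected_chance_score(n_items, n_options):
    """Average number of items answered correctly by blind guessing."""
    return n_items / n_options


def corrected_score(right, wrong, n_options):
    """Classical correction for guessing: R - W / (k - 1).

    Omitted items count as neither right nor wrong.
    """
    return right - wrong / (n_options - 1)


# Hypothetical 40-item test with four options per item.
print(expected_chance_score(40, 4))   # 10.0
print(corrected_score(28, 12, 4))     # 24.0
```

On these figures, a test-taker who guesses every item would still score about 10 out of
40 on average, which is one reason to interpret low scores on multiple-choice tests with
care.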

In general, multiple-choice items are a valuable tool for language assessment, offering
several advantages in terms of objectivity, efficiency, and ease of scoring. However, it is
crucial to recognize their limitations and use them appropriately in conjunction with other
assessment methods that more comprehensively assess language proficiency. When
designing multiple-choice items, careful attention should be paid to clarity, plausibility,
and the overall quality of the distractors to ensure that the items are effective and reliable.

5.2.2 Designing true-false-not given test items

True-False-Not Given (T/F/NG) questions, also known as True-False-Cannot Say, are a
common item type in language proficiency tests, particularly in reading comprehension
assessments. They require test-takers to evaluate the veracity of a statement based on a
provided text, determining whether the information is explicitly stated (True),
contradicted (False), or not mentioned at all (Not Given). This section delves into the
characteristics, applications, advantages, and limitations of T/F/NG items in language
testing.

Understanding T/F/NG items

T/F/NG items assess a test-taker's ability to:

• Comprehend explicitly stated information: Identify key details and understand the
literal meaning of the text.
• Draw inferences: Make logical deductions based on the information provided.
• Identify the scope of the text: Recognize what information is included and,
crucially, what is not.

These items typically consist of a reading passage followed by a series of statements.
Test-takers must carefully analyze each statement and compare it to the information
presented in the text to determine the correct response.

Applications in language testing

T/F/NG items are widely used in various language tests, including:

• Academic proficiency tests: Such as IELTS and some Cambridge exams, where they
assess reading comprehension skills crucial for academic success.
• General proficiency tests: Used to evaluate comprehension abilities relevant to
everyday situations and workplace communication.
• Placement tests: Help determine a learner's language level and assign them to
appropriate classes.

Advantages of T/F/NG items

• Ease of construction and scoring: Relatively straightforward to develop and can be
scored objectively.
• Efficient assessment: Allow for the assessment of a wide range of information
within a single passage.
• Focus on detailed comprehension: Encourage close reading and attention to
specific information.

Limitations of T/F/NG items

• Ambiguity: Poorly written statements can be open to interpretation, leading to
confusion and unreliable results.
• Limited scope: Primarily assess comprehension of factual information and may
not adequately evaluate inferential or critical thinking skills.
• Potential for test-wiseness: Test-takers can sometimes use strategies to guess the
correct answer even without fully understanding the text.

Constructing effective T/F/NG items

To maximize the effectiveness and reliability of T/F/NG items, test developers should
consider the following:

• Clear and concise statements: Avoid complex language and ensure each statement
focuses on a single idea.
• Unambiguous wording: Statements should have only one possible interpretation
based on the text.
• Variety of difficulty levels: Include a range of items that assess both explicit and
implicit understanding.
• Avoid verbatim copying: Rephrase information from the text to prevent simple
matching exercises.
• Thorough review and piloting: Ensure items are accurate, clear, and free of bias
before including them in a test.

Faulty true-false-not given items

Below are some examples of faulty true-false-not given items.

Vague statement with cultural bias

Statement: "Most Vietnamese people prefer traditional music to modern music." (T/F/NG)

Fault: This statement is too general and relies on a potentially inaccurate cultural
stereotype. Musical preferences vary greatly among individuals, and there is no
definitive answer without specific data or context.

Statement with trivial detail


Text: "The story describes a young girl named Lan who lives in a small village in
the Mekong Delta. She loves to help her parents with their rice farm and dreams of
becoming a doctor."

Statement: "Lan's favorite animal is a water buffalo." (T/F/NG)

Fault: This detail might be mentioned in the text, but it's not crucial to the main
storyline or character development. Focusing on such trivial information does not
effectively assess comprehension of the key themes or messages.

Ambiguous statement with negatives

Statement: "It is not impossible to learn English fluently with dedicated practice."
(T/F/NG)

Fault: The double negative ("not impossible") makes this statement unnecessarily
convoluted. It's better to rephrase it in a more straightforward way (e.g., "It is
possible to learn English fluently with dedicated practice.")

Overall, T/F/NG items are a valuable tool in language testing, providing an efficient and
objective means of assessing reading comprehension. By understanding their
characteristics, applications, and potential limitations, test developers can effectively
utilize this item type to create reliable and valid language assessments. However, careful
attention must be paid to item construction and piloting to ensure clarity, avoid
ambiguity, and promote accurate measurement of language proficiency.

5.2.3 Designing cloze test items

Cloze tests are a versatile and widely used tool in language education. They involve
presenting learners with a text where certain words have been systematically removed
(typically every fifth, sixth, or seventh word) and replaced with blanks. Learners are then
tasked with filling in these blanks with appropriate words based on their understanding of
the context and their language knowledge. This section explores the nature of cloze tests,
their applications in language learning, their advantages and limitations, and best
practices for constructing and implementing them effectively.

Understanding cloze tests

Cloze tests assess various aspects of language proficiency, including:

• Reading comprehension: Understanding the overall meaning and flow of a text.
• Vocabulary knowledge: Selecting appropriate words that fit the context.
• Grammatical awareness: Choosing words that are grammatically correct within the
sentence structure.
• Discourse competence: Recognizing cohesive devices and understanding how
sentences connect to form a coherent text.

By requiring learners to actively engage with the text and make informed choices about
missing words, cloze tests encourage deep processing of language and promote both
receptive and productive skills.

Applications in language education

Cloze tests can be used for a variety of purposes in language teaching and assessment:

• Assessing language proficiency: Measuring overall language ability or specific
skills like vocabulary or grammar.
• Diagnosing learner needs: Identifying areas of strength and weakness in language
comprehension and production.
• Developing language skills: Enhancing vocabulary acquisition, grammatical
awareness, and reading comprehension through practice and feedback.
• Promoting learner autonomy: Encouraging learners to take responsibility for
their learning by actively engaging with texts and making choices.

Types of cloze tests

• Fixed-ratio deletion: Words are removed at regular intervals (e.g., every fifth
word). This is the most common type of cloze test.
• Variable-ratio deletion: Words are removed based on specific criteria, such as
targeting particular grammatical structures or vocabulary items.
• Rational deletion: Words are removed based on their importance for understanding
the text, creating a more challenging and nuanced assessment.
• C-test: The second half of every second word is deleted, requiring learners to
complete the words based on the remaining letters.
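
Fixed-ratio deletion is mechanical enough to sketch in code. The Python function below
is a minimal illustration, not a standard tool: the name make_cloze and the sample text
are hypothetical, words are split on whitespace only, and punctuation attached to a
deleted word is left in place. It blanks out every n-th word and returns the gapped text
together with the answer key.

```python
import string


def make_cloze(text, n=5):
    """Blank out every n-th word; return the gapped text and the answer key."""
    words = text.split()
    answers = []
    for i in range(n - 1, len(words), n):
        core = words[i].strip(string.punctuation)
        if not core:
            continue  # token is punctuation only, so leave it untouched
        answers.append(core)
        # Replace the word itself but keep any attached punctuation.
        words[i] = words[i].replace(core, "____", 1)
    return " ".join(words), answers


sample = ("Lan lives in a small village in the Mekong Delta. "
          "She helps her parents on their rice farm.")
gapped, key = make_cloze(sample, n=5)
print(gapped)
print(key)
```

A sketch like this deletes words blindly, so the resulting gaps still need to be reviewed
by a person. A test designer may also prefer to leave the opening sentence intact to
give learners some context, which this sketch does not do.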

Advantages of cloze tests

• Versatility: Can be adapted to assess different language skills and proficiency
levels.
• Objectivity: Scoring can be relatively straightforward, especially with fixed-ratio
deletion.
• Efficiency: Can assess a wide range of language knowledge in a relatively short
time.
• Authenticity: Can be based on authentic texts, increasing engagement and
relevance for learners.

Limitations of cloze tests

• Potential for ambiguity: Some blanks may have multiple possible answers,
making scoring subjective.
• Limited scope: May not fully capture complex aspects of language use, such as
pragmatic understanding or communicative competence.
• Artificiality: The artificial nature of deleting words can sometimes disrupt the
natural flow of the text.
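
One way to handle blanks with several possible answers is to decide in advance which
alternatives will be accepted. The sketch below contrasts exact-word scoring (only the
original word counts) with acceptable-word scoring (listed alternatives also count). It
is an illustration under stated assumptions: the function name score_cloze, the
dictionary of acceptable alternatives, and the sample responses are all hypothetical,
and answers are compared without regard to capitalization.

```python
def score_cloze(responses, key, acceptable=None):
    """Count correct responses.

    key: the original word for each gap (exact-word scoring).
    acceptable: optional dict mapping a gap index to extra answers
    that the test designer has agreed to accept.
    """
    acceptable = acceptable or {}
    score = 0
    for i, (response, exact) in enumerate(zip(responses, key)):
        allowed = {exact.lower()} | {a.lower() for a in acceptable.get(i, [])}
        if response.strip().lower() in allowed:
            score += 1
    return score


key = ["museum", "bought"]
responses = ["zoo", "bought"]

print(score_cloze(responses, key))                          # 1
print(score_cloze(responses, key, {0: ["zoo", "park"]}))    # 2
```

Agreeing on the list of acceptable answers before marking begins keeps the scoring
consistent across markers.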

Constructing effective cloze tests

• Choose appropriate texts: Select texts that are relevant to learners' interests and
proficiency levels.
• Determine deletion rate: Consider the difficulty level desired and the specific
skills being assessed.
• Provide clear instructions: Ensure learners understand the task and how to
respond.
• Pilot test the cloze: Administer the test to a small group of learners to identify any
ambiguities or issues.
• Provide feedback: Use the cloze test as a learning opportunity by providing
feedback on learners' responses and discussing the rationale behind correct
answers.

Faulty items in a cloze test

Below are sample faulty cloze items that should be avoided.

Missing word with multiple possibilities

Sentence: "The students were excited to go on a field trip to the ______."

Fault: This sentence lacks sufficient context to determine the missing word. It could
be "museum," "zoo," "park," "historical site," or any other place that students might
visit on a field trip. This makes the item ambiguous and potentially frustrating for
test-takers.

Gaps too close together

Sentence: "My friend and I went to the ____ to ____ a movie."


Fault: Having two gaps so close together makes it difficult for students to focus on
each individual word and its grammatical function. It increases the cognitive load
and might lead to guessing rather than demonstrating actual language proficiency.

Gaps that disrupt sentence flow

Sentence: "Although it was raining ____, we decided to go for a walk in the park."

Fault: Placing the gap at the end of the adverbial clause disrupts the natural flow
of the sentence. It makes it harder for students to understand the sentence structure
and choose the correct word, since several words (e.g., "heavily," "outside") fit.

Gaps requiring specialized knowledge

Sentence: "The ______ is a traditional Vietnamese musical instrument."

Fault: This item assumes knowledge of Vietnamese culture that might not be
relevant to all students, especially in an EFL context. It's better to choose words or
concepts that are more universally known or covered within the textbook's scope.

Gaps with grammatically incorrect options

Sentence: "She ______ her homework every day after school."

Options: (A) do, (B) does, (C) doing, (D) done

Fault: Including grammatically incorrect options can confuse students and hinder
their ability to identify the correct form of the verb (in this case, "does").

In short, cloze tests are a valuable tool in language education, offering a flexible and
efficient means of assessing and developing various language skills. By carefully
considering the principles of cloze test construction and implementation, educators can
harness their potential to enhance language learning and promote learner engagement and
autonomy.

5.2.4 Designing matching items

Matching items are a common and effective question type found in English language
tests for high school students in Vietnam. They require students to connect related pieces
of information from two separate columns, testing their ability to recognize relationships,
analyze information, and make accurate connections. This section explores the
characteristics, applications, advantages, and limitations of matching items in English
tests, along with guidelines for constructing effective matching activities.

Understanding matching items

Matching items typically consist of two columns:

• Premise Column: Contains a list of items, such as definitions, descriptions,
questions, or sentence halves.
• Response Column: Contains a corresponding list of items that need to be matched
to the items in the premise column, such as vocabulary words, phrases, answers, or
sentence completions.

Students are tasked with identifying the correct relationships between the items in the two
columns and indicating the matches. This process assesses their ability to:

• Comprehend and analyze information: Understand the meaning and context of
items in both columns.
• Identify relationships: Recognize connections, patterns, and associations
between items.
• Apply knowledge: Utilize their vocabulary, grammar, and reading comprehension
skills to make accurate matches.

Applications in English tests

Matching items are versatile and can be used to assess various aspects of English
language proficiency:

• Vocabulary: Matching words to definitions, synonyms, antonyms, or pictures.
• Grammar: Matching sentence halves, verb forms to tenses, or pronouns to
antecedents.
• Reading comprehension: Matching headings to paragraphs, characters to
descriptions, or events to timelines.
• Functional language: Matching phrases to situations, requests to responses, or
questions to answers.

Advantages of matching items

• Efficiency: Can assess a wide range of knowledge and skills in a concise format.
• Objectivity: Scoring is straightforward and less prone to subjective interpretation.
• Clarity: The format is generally easy for students to understand and follow.
• Versatility: Can be adapted to assess different language skills and levels of
difficulty.

Limitations of matching items

• Limited cognitive demand: May primarily test recognition and recall rather than
higher-order thinking skills.
• Potential for guessing: Students may resort to guessing if they are unsure of the
correct answers, especially if the number of items in each column is equal.
• Difficulty in constructing effective items: Creating meaningful and unambiguous
matches can be challenging.

Constructing effective matching items

To maximize the effectiveness and validity of matching items, test developers should
consider the following:

• Clear and concise instructions: Provide specific directions on how to complete
the matching task.
• Homogeneous content: Ensure all items within a matching set are related to a
common theme or topic.
• Unequal number of items: Include more options in the response column to
reduce the chance of guessing.
• Plausible distractors: Include response options that are similar to the correct
answers to increase the difficulty level.
• Logical arrangement: Arrange items in a clear and organized manner, such as
alphabetically or chronologically.
• Thorough review and piloting: Ensure items are accurate, clear, and free of bias
before including them in a test.
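
The advice to include more responses than premises can be illustrated with a short
calculation. The sketch below assumes that each response may be used at most once and
that the test-taker guesses blindly; the function names and the figures are hypothetical.
It uses math.perm, which requires Python 3.8 or later.

```python
from math import perm


def chance_all_correct(n_premises, n_responses):
    """Probability of matching every premise correctly by blind guessing."""
    return 1 / perm(n_responses, n_premises)


def choices_left_for_last_match(n_premises, n_responses):
    """Responses still available once all other premises are matched."""
    return n_responses - n_premises + 1


# Five premises with five responses, then with two extra distractors.
print(chance_all_correct(5, 5))            # 1 in 120
print(chance_all_correct(5, 7))            # 1 in 2520
print(choices_left_for_last_match(5, 5))   # 1: the last match is given away
print(choices_left_for_last_match(5, 7))   # 3: the last match must still be chosen
```

With equal columns, a student who knows four of five matches gets the fifth by
elimination. Adding distractors to the response column removes that free point.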

Faulty matching items

Ambiguous matching with multiple possibilities

Instructions: Match the words in Column A with their synonyms in Column B.

Column A      Column B
a) happy      1. big
b) large      2. joyful
c) sad        3. unhappy
d) angry      4. furious
e) small      5. tiny

Fault: Synonym sets easily allow more than one defensible match. If Column B also
contained "glad" or "huge," for example, "happy" could be matched with both "joyful"
and "glad," and "large" with both "big" and "huge." This ambiguity makes it difficult
for students to determine the single best match.

Irrelevant or mismatched items

Instructions: Match the vocabulary words with their definitions.

Column A             Column B
a) photosynthesis    1. the process of making food in plants using sunlight
b) gravity           2. a type of animal that lives in the ocean
c) whale             3. a force that pulls objects towards each other

Fault: The item "whale" and its definition "a type of animal that lives in the ocean"
do not belong with the other vocabulary words and definitions, which name scientific
processes and forces. This mismatch means the set is not homogeneous, and it can
distract students.

Grammatically incorrect items

Instructions: Match the sentences in Column A with the correct question tags in
Column B.

Column A                          Column B
a) She is a doctor,               1. isn't she?
b) They are playing football,     2. aren't they?
c) He has finished his work,      3. hasn't he?
d) We will go to the party,       4. won't we?

Fault: A matching set becomes faulty if any question tag in Column B is grammatically
incorrect. For example, the question tag for "She is a doctor" should be "isn't she?",
not "doesn't she?" An incorrect tag tests students' ability to identify grammatical
errors rather than their ability to match question tags.

Overall, matching items are a valuable tool in English language tests for high school
students in Vietnam. They provide an efficient and objective way to assess various
language skills and knowledge areas. By adhering to the principles of effective item
construction and considering the potential limitations, educators can utilize matching
activities to create reliable and valid assessments that contribute to meaningful language
learning.

5.2.5 Designing gap-fill items

Gap-fill items, also known as fill-in-the-blank questions, are a staple in English language
tests for high school students in Vietnam. These questions require students to complete a
sentence or passage by filling in missing words or phrases, demonstrating their
understanding of grammar, vocabulary, and overall language structure. This section
delves into the nature of gap-fill items, their applications in English tests, their
advantages and limitations, and guidelines for effective construction and implementation.

Understanding gap-fill items

Gap-fill items present students with a text where specific words or phrases have been
removed and replaced with blanks. Students must analyze the context and utilize their
language knowledge to determine the missing elements and complete the text coherently
and accurately. This process assesses their ability to:

• Apply grammatical knowledge: Identify the correct tense, form, and agreement
of verbs, nouns, pronouns, adjectives, and adverbs.
• Utilize vocabulary: Select appropriate words that fit the context and convey the
intended meaning.
• Understand sentence structure: Recognize the syntactic roles of words and phrases
within a sentence.
• Comprehend discourse: Maintain coherence and cohesion within the text by
using appropriate linking words and phrases.

Applications in English tests

Gap-fill items can be used to assess a wide range of language skills and knowledge areas:

• Grammar and syntax: Testing knowledge of verb tenses, prepositions, articles,
conjunctions, and sentence structure.
• Vocabulary: Assessing understanding of word meanings, collocations, and
idiomatic expressions.
• Reading comprehension: Evaluating ability to understand the overall meaning
and flow of a text.
• Writing skills: Measuring ability to produce grammatically correct and
meaningful sentences.

Types of gap-fill items

• Open-ended: Students are free to choose any word or phrase that fits the context.
• Closed-ended: Students are provided with a word bank or a limited set of options
to choose from.
• Targeted: Gaps are strategically placed to assess specific grammar points or
vocabulary items.
• Sentence completion: Students complete a sentence by filling in a missing word
or phrase.
• Passage completion: Students fill in multiple gaps within a longer text.
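
A targeted, closed-ended gap-fill can be sketched in code as follows. The example is a
minimal illustration under stated assumptions: the function name make_gap_fill and the
sample sentence are hypothetical, the target words are expected to be listed in the order
in which they appear in the text, and only the first occurrence of each target is
removed. The word bank is sorted alphabetically so that its order gives no clue to the
answers.

```python
import re


def make_gap_fill(text, targets):
    """Replace each target word with a numbered gap and build a word bank."""
    gapped = text
    for number, word in enumerate(targets, start=1):
        pattern = r"\b" + re.escape(word) + r"\b"
        gapped, found = re.subn(pattern, f"({number}) ______", gapped, count=1)
        if found == 0:
            raise ValueError(f"target word not found in text: {word}")
    word_bank = sorted(targets)
    return gapped, word_bank


sentence = "She goes to school by bus because it is cheap."
gapped, bank = make_gap_fill(sentence, ["by", "because"])
print(gapped)
print(bank)
```

Choosing the targets by hand, as here, is what distinguishes a targeted gap-fill from a
fixed-ratio cloze, where words are removed at regular intervals.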

Advantages of gap-fill items

• Versatility: Can be adapted to assess various language skills and levels of
difficulty.
• Objectivity: Scoring can be relatively straightforward, especially with closed-
ended or targeted items.
• Focus on production: Encourage active language use and demonstrate ability to
construct grammatically correct sentences.
• Authenticity: Can be based on authentic texts, increasing engagement and
relevance for learners.

Limitations of gap-fill items

• Potential for ambiguity: Some gaps may have multiple possible answers, making
scoring subjective.
• Limited scope: May not fully capture complex aspects of language use, such as
pragmatic understanding or communicative competence.
• Difficulty in constructing effective items: Creating gaps that have a single
correct answer and effectively assess the intended skill can be challenging.

Constructing effective gap-fill items

• Choose appropriate texts: Select texts that are relevant to learners' interests and
proficiency levels.
• Determine the type and number of gaps: Consider the difficulty level desired
and the specific skills being assessed.
• Provide clear instructions: Ensure learners understand the task and how to
respond.
• Focus on key language points: Target specific grammar structures or vocabulary
items for assessment.
• Avoid excessive gaps: Too many gaps can disrupt the flow of the text and make
the task overwhelming.
• Thorough review and piloting: Ensure items are accurate, clear, and free of bias
before including them in a test.

Faulty gap-fill items

Gaps with too many possible answers

Sentence: "I went to the ______ to buy some milk."

Fault: This sentence is too open-ended. Students could fill the gap with various
words like "store," "supermarket," "shop," "market," etc., making it difficult to
determine a single correct answer.

Gaps requiring specialized knowledge

Sentence: "The ______ is a traditional Vietnamese musical instrument made of
bamboo."

Fault: This item requires knowledge of Vietnamese culture that might not be
covered in the English textbook or familiar to all students.

Gaps with no clear context

Sentence: "The ______ was very ______."

Fault: This sentence provides no context or clues for the missing words. Students
are left to guess randomly, which doesn't effectively assess their language skills.

Gaps testing obscure vocabulary

Sentence: "The ancient artifact was ______ with intricate carvings."

Fault: Using a highly specific or obscure word like "emblazoned" might not be
appropriate for a high school level gap-fill test, especially if it's not a word
commonly encountered in the textbook.

In general, gap-fill items are a valuable component of English language tests for high
school students in Vietnam. They provide a flexible and focused way to assess various
language skills, particularly grammar, vocabulary, and sentence construction. By
adhering to the principles of effective item construction and considering the potential
limitations, educators can utilize gap-fill activities to create meaningful assessments that
contribute to effective language learning.

5.3 Designing subjective test items

Subjective test items, unlike their objective counterparts, require students to construct
their own responses rather than selecting from pre-defined options. These items, such as
essays, short answer questions, and oral interviews, offer a valuable means of assessing
deeper levels of language proficiency and cognitive skills in high school English
language learners in Vietnam. This section explores the characteristics, applications,
advantages, and limitations of subjective test items, along with guidelines for their
effective design and implementation.

Understanding subjective test items

Subjective test items typically present students with open-ended prompts or questions
that require them to generate unique responses using their own language and ideas. These
items assess a broader range of skills and knowledge, including:

• Writing Proficiency: Ability to express ideas clearly and coherently in written
form, demonstrating grammatical accuracy, vocabulary range, and organizational
skills.
• Speaking Proficiency: Ability to communicate effectively in spoken English,
showcasing fluency, pronunciation, grammar, and vocabulary.
• Critical Thinking: Ability to analyze information, form arguments, express
opinions, and provide justifications.
• Creativity: Ability to generate original ideas and express them in a unique and
engaging manner.
• Problem-Solving: Ability to apply knowledge and skills to resolve problems or
respond to complex scenarios (Ross & Okabe, 2006).

Applications in English tests

Subjective test items are valuable for assessing various aspects of English language
proficiency:

• Essay Writing: Evaluating ability to compose well-structured essays on a given
topic, expressing arguments, opinions, or analysis.
• Short Answer Questions: Assessing understanding of specific concepts or details
by requiring concise written responses.
• Oral Interviews: Evaluating spoken communication skills through interactive
conversations and responses to prompts.
• Presentations: Assessing ability to organize and deliver oral presentations on a
chosen topic.
• Creative Writing: Encouraging expression of imagination and storytelling skills
through original compositions.

Advantages of subjective test items

• Depth of assessment: Allow for a more in-depth evaluation of complex language
skills and cognitive abilities.
• Authenticity: Can simulate real-life communication tasks, increasing relevance
and engagement for learners.
• Reduced guessing: Minimize the chance of students obtaining correct answers
through guessing.
• Promotion of higher-order thinking: Encourage critical thinking, analysis, and
problem-solving skills.
• Individualized expression: Allow students to showcase their unique
understanding and perspectives.

Limitations of subjective test items

• Subjectivity in scoring: Evaluating open-ended responses can be subjective and
requires clear scoring rubrics and rater training.
• Time-consuming: Marking subjective items can be more time-consuming than
objective items.
• Limited sampling: May not cover as much content as objective tests due to the
time required for each item.
• Potential for bias: Raters' personal biases can influence scoring, requiring careful
training and standardization.

Constructing effective subjective test items

• Clear and concise prompts: Provide specific and unambiguous instructions that
clearly define the task and expectations.
• Relevant to curriculum: Align prompts with the learning objectives and content
covered in the curriculum.
• Appropriate difficulty level: Ensure tasks are challenging yet attainable for the
students' proficiency level.
• Defined evaluation criteria: Develop clear scoring rubrics that outline the criteria
for evaluating responses and awarding marks.
• Rater training and standardization: Provide training to ensure consistent and
reliable scoring across different raters.

In general, subjective test items are an essential component of comprehensive English
language assessment for high school students in Vietnam. They provide valuable insights
into students' higher-order thinking skills, communicative abilities, and creative
expression. By carefully designing prompts, establishing clear scoring criteria, and
ensuring rater reliability, educators can effectively utilize subjective items to gain a more
holistic understanding of students' English language proficiency and promote meaningful
language development.

5.4 Scoring methods: objective vs. subjective

In the realm of English language assessment for high school students in Vietnam, the
choice of scoring methods plays a crucial role in determining the accuracy, fairness, and
effectiveness of evaluations. This section explores the key distinctions between objective
and subjective scoring methods, highlighting their respective advantages, limitations, and
applications in various assessment contexts.

5.4.1 Objective scoring: precision and efficiency

Objective scoring methods are characterized by their clear-cut criteria and standardized
procedures, leaving little room for individual interpretation or bias. These methods are
typically employed for assessing responses to objective test items, such as multiple-
choice, true-false, and matching questions.

Advantages of objective scoring

• Efficiency: Objective scoring is generally quick and efficient, allowing for rapid
evaluation of large numbers of test papers.
• Reliability: With standardized procedures and answer keys, objective scoring
produces consistent results, minimizing variations between different raters.
• Objectivity: The scoring process is free from personal biases or interpretations,
ensuring fairness and equal treatment for all test-takers.
• Ease of analysis: Objective scores are easily quantifiable, facilitating statistical
analysis and reporting of results.

Limitations of objective scoring

• Limited scope: Objective scoring is primarily suited for assessing lower-order
thinking skills, such as knowledge recall and recognition.
• Potential for guessing: Test-takers may obtain correct answers through guessing,
potentially inflating scores and not accurately reflecting their true abilities.
• Inability to assess complex skills: Objective scoring cannot effectively evaluate
complex language skills like writing, speaking, or critical thinking, which require
more nuanced and holistic assessment.
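
To make the guessing limitation concrete, the short Python sketch below estimates how many items a test-taker could expect to get right by blind guessing alone. The function name and the 40-item test are illustrative assumptions, not figures from this chapter, and the calculation assumes one correct option per item with every option equally likely to be chosen.

```python
from fractions import Fraction


def expected_chance_score(num_items: int, options_per_item: int) -> Fraction:
    """Expected number of correct answers from blind guessing.

    Assumes each item has exactly one correct option and that a
    guessing test-taker picks every option with equal probability.
    """
    if num_items < 0 or options_per_item < 1:
        raise ValueError("num_items must be >= 0 and options_per_item >= 1")
    return Fraction(num_items, options_per_item)


# Hypothetical 40-item test in two formats.
multiple_choice = expected_chance_score(40, 4)  # four options per item
true_false = expected_chance_score(40, 2)       # two options per item

print(f"Multiple choice: {float(multiple_choice):.1f} of 40 expected by chance")
print(f"True-false:      {float(true_false):.1f} of 40 expected by chance")
```

Under these assumptions, the fewer options an item has, the larger the share of the total score that guessing alone can account for, which is one reason to interpret low scores on short true-false tests with caution.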

5.4.2 Subjective scoring: depth and nuance

Subjective scoring methods involve human judgment and interpretation to evaluate
responses to open-ended or performance-based tasks, such as essays, oral presentations,
and creative writing. These methods allow for a more in-depth assessment of complex
language skills and higher-order thinking abilities (Bachman & Adrian, 2022).

Advantages of subjective scoring

• Comprehensive assessment: Subjective scoring can evaluate a wider range of
language skills, including writing proficiency, speaking fluency, critical thinking,
and creativity.
• Authenticity: By assessing real-life communication tasks, subjective scoring
provides a more authentic measure of language proficiency.
• Individualized evaluation: Subjective scoring allows for personalized feedback
and recognition of individual strengths and weaknesses.
• Encouragement of higher-order thinking: Open-ended tasks promote critical
thinking, problem-solving, and expression of original ideas.

Limitations of subjective scoring

• Subjectivity and bias: Raters' personal biases and interpretations can influence
scoring, potentially leading to inconsistencies and unfairness.
• Time-consuming: Evaluating open-ended responses requires careful reading and
analysis, making subjective scoring more time-consuming than objective scoring.
• Rater reliability: Ensuring consistency and agreement between different raters
requires thorough training and standardization procedures.
• Difficulty in providing specific feedback: Providing detailed and specific
feedback on subjective responses can be challenging.
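
One way to check the rater reliability concern listed above is to compare the scores two raters gave to the same set of responses. The Python sketch below computes simple percent agreement and Cohen's kappa, a widely used agreement statistic that corrects for agreement expected by chance. The chapter does not prescribe this statistic; it is offered here only as an illustration, and the band scores are invented sample data.

```python
from collections import Counter


def percent_agreement(rater_a, rater_b):
    """Share of responses on which both raters gave the same score."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must score the same non-empty set")
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return matches / len(rater_a)


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: (observed - expected) / (1 - expected)."""
    n = len(rater_a)
    observed = percent_agreement(rater_a, rater_b)
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    # Chance agreement: probability both raters pick the same band
    # if each rated independently at their own observed rates.
    expected = sum(counts_a[band] * counts_b[band] for band in counts_a) / (n * n)
    if expected == 1:
        return 1.0  # both raters used a single identical band throughout
    return (observed - expected) / (1 - expected)


# Invented band scores (1-4) given by two raters to the same eight essays.
rater_a = [3, 4, 2, 4, 3, 1, 2, 4]
rater_b = [3, 4, 2, 3, 3, 1, 2, 4]

print(f"Percent agreement: {percent_agreement(rater_a, rater_b):.3f}")
print(f"Cohen's kappa:     {cohens_kappa(rater_a, rater_b):.3f}")
```

A low value on either measure would suggest that the rubric or the rater training needs attention before the scores are reported.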

5.5 Best scoring practices in English language assessment

Scoring student work is a critical component of English language assessment in Vietnam.
Accurate and fair scoring provides valuable feedback to students, informs instructional
decisions, and contributes to the overall effectiveness of the educational process. This
section outlines best practices for scoring student work in English, encompassing both
objective and subjective assessment methods, to ensure reliable, valid, and meaningful
evaluations.

General principles for effective scoring

• Clarity of Criteria: Establish clear and specific scoring criteria or rubrics that
outline the expectations for each assessment task and the characteristics of
different performance levels.
• Transparency: Communicate the scoring criteria to students beforehand to ensure
they understand the expectations and can focus their efforts accordingly.
• Consistency: Apply the scoring criteria consistently across all student work to
ensure fairness and equity in the evaluation process.
• Objectivity: Minimize personal biases and subjective interpretations when
evaluating student work, particularly for subjective assessments.
• Feedback: Provide constructive feedback to students that highlights their strengths
and areas for improvement, guiding their future learning.

Best practices for objective scoring

• Accurate answer keys: Develop accurate and unambiguous answer keys for
objective test items, such as multiple-choice, true-false, and matching questions.
• Automated scoring: Utilize technology, where feasible, to automate the scoring
of objective tests, ensuring efficiency and accuracy.
• Item analysis: Analyze student responses to identify problematic items that may
need revision or removal from future assessments.
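
The three practices above can be sketched together in a few lines of Python: responses are marked against an answer key, and a basic item analysis then reports the proportion of students who answered each item correctly (often called item facility). The answer key, the student responses, and the 0.3 to 0.9 review band are hypothetical values chosen for illustration, not thresholds recommended by this chapter.

```python
def score_responses(answer_key, responses):
    """Mark one student's responses: 1 for a correct item, 0 otherwise."""
    if len(responses) != len(answer_key):
        raise ValueError("response sheet and answer key differ in length")
    return [int(given == correct) for given, correct in zip(responses, answer_key)]


def item_facility(score_matrix):
    """Proportion of students who answered each item correctly."""
    n_students = len(score_matrix)
    n_items = len(score_matrix[0])
    return [sum(row[i] for row in score_matrix) / n_students for i in range(n_items)]


# Hypothetical four-item multiple-choice test taken by five students.
answer_key = ["B", "A", "D", "C"]
all_responses = [
    ["B", "A", "D", "C"],
    ["B", "A", "D", "A"],
    ["B", "C", "D", "B"],
    ["A", "A", "D", "D"],
    ["B", "B", "D", "A"],
]

score_matrix = [score_responses(answer_key, r) for r in all_responses]
totals = [sum(row) for row in score_matrix]
facility = item_facility(score_matrix)

# Assumed review band: items nearly everyone gets right or wrong are flagged.
flagged = [i + 1 for i, f in enumerate(facility) if f < 0.3 or f > 0.9]

print("Total scores:", totals)
print("Item facility:", facility)
print("Items to review:", flagged)
```

A flagged item is not necessarily faulty; the flag only indicates that the item deserves a second look before it is reused.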

Best practices for subjective scoring

• Detailed rubrics: Develop detailed scoring rubrics that outline the specific criteria
for evaluating different aspects of performance, such as content, organization,
language use, and mechanics.
• Rater training: Provide thorough training to raters on the scoring rubrics to
ensure inter-rater reliability.
• Multiple raters: When possible, use multiple raters to evaluate subjective
responses, such as essays or oral presentations, and average their scores to
minimize bias.
• Anonymity: Mask student identities during the scoring process to prevent
unconscious biases from influencing evaluations.
• Holistic scoring: Consider the overall quality of the response, rather than focusing
solely on individual errors, to provide a more comprehensive assessment.
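
As a minimal sketch of the rubric and multiple-rater practices above, the Python example below sums each rater's scores on the four criteria named in the rubric bullet and averages the raters' totals. The 0 to 5 band scale, the sample ratings, and the rule that flags a script for review when totals differ by more than 3 points are all assumptions made for illustration.

```python
from statistics import mean

CRITERIA = ("content", "organization", "language_use", "mechanics")
MAX_BAND = 5  # assumed band scale of 0-5 per criterion


def total_score(rating):
    """Sum one rater's bands; a missing criterion raises KeyError."""
    for criterion in CRITERIA:
        if not 0 <= rating[criterion] <= MAX_BAND:
            raise ValueError(f"{criterion} band out of range")
    return sum(rating[criterion] for criterion in CRITERIA)


def combine_ratings(ratings, tolerance=3):
    """Average the raters' totals and flag large disagreements."""
    totals = [total_score(r) for r in ratings]
    return {
        "totals": totals,
        "average": mean(totals),
        # Assumed rule: a wide gap between raters triggers a review.
        "needs_review": max(totals) - min(totals) > tolerance,
    }


# Invented ratings of one essay by two raters.
rater_1 = {"content": 4, "organization": 3, "language_use": 4, "mechanics": 3}
rater_2 = {"content": 3, "organization": 3, "language_use": 4, "mechanics": 2}

result = combine_ratings([rater_1, rater_2])
print(result)
```

Keeping each rater's total alongside the average makes it easy to see when an averaged score hides a substantial disagreement.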

Providing effective feedback

• Specificity: Provide specific and detailed feedback that pinpoints areas of strength
and weakness in the student's work.
• Actionable advice: Offer actionable advice on how the student can improve their
performance in the future.
• Timeliness: Provide feedback promptly while the assessment is still fresh in the
student's mind.
• Encouragement: Balance constructive criticism with positive reinforcement to
motivate students and foster a growth mindset.

Ethical considerations in scoring

• Fairness: Ensure all students are evaluated fairly and equitably, regardless of their
background or personal characteristics.
• Confidentiality: Protect the confidentiality of student work and scores.
• Professionalism: Maintain a professional and unbiased approach to scoring,
adhering to ethical guidelines and standards.

Implementing best scoring practices is essential for ensuring accurate, fair, and
meaningful assessments of English language proficiency in Vietnam. By adhering to the
principles of clarity, consistency, objectivity, and constructive feedback, educators can
provide valuable evaluations that support student learning and contribute to the overall
effectiveness of the educational system.

In summary, the choice between objective and subjective scoring methods depends on
the specific learning objectives, the type of assessment tasks, and the desired level of
detail in the evaluation. By understanding the strengths and limitations of each approach,
educators can make informed decisions about scoring procedures, ensuring fair, reliable,
and meaningful assessments of English language proficiency for high school students in
Vietnam. Table 5.1 compares objective vs. subjective scoring:

Table 5.1 Objective vs. subjective scoring

Feature | Objective scoring | Subjective scoring
--- | --- | ---
Nature of Items | Closed-ended, selected-response | Open-ended, constructed-response
Scoring Process | Standardized, based on answer keys | Human judgment, interpretation
Efficiency | High | Lower
Reliability | High | Can be lower, requires rater training
Objectivity | High | Can be influenced by rater bias
Types of Skills Assessed | Primarily lower-order thinking (recall, recognition) | Higher-order thinking (analysis, synthesis, evaluation), complex skills (writing, speaking)
Examples | Multiple-choice, true-false, matching | Essays, oral presentations, projects
Feedback | Limited to correct/incorrect | Can be more detailed and individualized

5.6 Student self-assessment

Instruction: Carefully read each statement in the table below. For each statement, rate
your understanding using the following scale:
1: I do not understand this at all.
2: I understand this a little, but I need more help.
3: I understand this fairly well, but I have some questions.
4: I understand this very well and can explain it to others.

Learning Objective/Concept | Self-Assessment Rating (1-4) | Evidence/Notes (Explain your rating) | Action Plan (What will you do to improve?)
--- | --- | --- | ---
Steps in test design: I can identify and explain the key steps involved in designing a language test (defining purpose, objectives, specifications, writing items, reviewing, administering). | | |
Developing specifications: I can explain how to create a detailed test specification, including content, format, scoring criteria, and weighting of different sections. | | |
Writing test items: I can discuss the principles of writing effective test items, including clarity, unambiguousness, and appropriateness for the target population. | | |
Practical application: I can apply the steps of test design to develop a sample language test for a specific purpose and context. | | |
Overall understanding: I feel confident in my understanding of the process of designing and developing effective language tests. | | |

5.7 Consolidation activities

Activity 1: Crafting multiple-choice questions: A test design exercise

Objectives:
• To develop students' understanding of multiple-choice question design principles.
• To enhance students' critical thinking and analytical skills by analyzing a given
text and formulating appropriate test items.

Procedure: Students in groups/pairs read the following passage from Unit 2 of the English
textbook Global Success for Grade 11. The students first read the passage carefully and
identify the main ideas, supporting details, any challenging vocabulary or grammatical
structures, and the overall tone and purpose of the passage. They then write multiple-choice
items based on the passage. The items should include a variety of question types:

o Factual questions: Test for recall of specific information (e.g., "What is the
main cause of...?").
o Inferential questions: Test for understanding of implied meanings and
making deductions (e.g., "What can be inferred about the author's attitude
towards...?").
o Vocabulary questions: Test for understanding of vocabulary in context.
o Main idea questions: Test for comprehension of the overall theme or central
message.

“Over the past two centuries, different generations were born and given different
names. Each generation comes with its characteristics, which are largely influenced
by the historical, economic, and social conditions of the country they live in.
However, in many countries the following three generations have common
characteristics.

Generation X refers to the generation born between 1965 and 1980. When Gen Xers
grew up, they experienced many social changes and developments in history. As a
result, they are always ready for changes and prepared to work through changes.
Gen Xers are also known as critical thinkers because they achieved higher levels of
education than previous generations.

Generation Y, also known as Millennials, refers to those born between the early
1980s and late 1990s. They are curious and ready to accept changes. If there is a
faster, better way of doing something, Millennials want to try it out. They also value
teamwork. When working in a team, Millennials welcome different points of view
and ideas from others.

Generation Z includes people born between the late 1990s and early 2010s, a time
of great technological developments and changes. That is why Gen Zers are also
called digital natives. They grew up online and never knew the world before digital
and social media. They are very creative and able to experiment with platforms to
suit their needs. Many Gen Zers are also interested in starting their own businesses
and companies. They saw so many people lose their jobs, so they think it is safer to
be your own boss than relying on someone else to hire you.

Soon a new generation, labelled Gen Alpha, will be on the scene. Let's wait and see
if we will notice the generation gap.” (From Unit 2, Global Success, Grade 11,
Hoang et al., 2018)

Activity 2: Crafting T/F/NG questions: A test design exercise

Objective: To encourage critical thinking and enhance comprehension skills by having
students design their own True-False-Not Given statements for a given text.

Procedure: Students in pairs/groups read a passage from the Communication and
Culture section of Tiếng Anh 11, Unit 3:

Smart Cities Around The World

Cities around the world are becoming smarter, and you can do many things that
seemed impossible in the past.

In Singapore, the mobile app [Link] allows you to locate a nearby car park
easily, book a parking space, and make a payment. You can also extend your
booking or receive a refund if you leave early.

New York City (US) has one of the largest bike-sharing systems called Citi Bike.
Using a mobile app, you can unlock bikes from one station and return them to any
other station in the system, making them ideal for one-way trips.

In Copenhagen (Denmark), you can use a mobile app to guide you through the city
streets and tell how fast you need to pedal to make the next green light. The app can
also give you route recommendations and work out the calories you burn.

In London (UK), you don’t have to buy public transport tickets. You can just touch
your bank card on the card reader when you get on and off the bus or the
underground to pay for your trip.

In Toronto (Canada), you can book an appointment and see a doctor online from
your own home. You can also receive prescriptions and any other documents you
need, all online.

When reading the passage, students need to highlight key information and identify
main ideas. They then review the characteristics of each type of statement:

o True: Accurately reflects information explicitly stated in the text.
o False: Contradicts information presented in the text.
o Not Given: Refers to information not mentioned or addressed in the text.

Each pair/group should create at least 2 statements for each category (True, False, Not
Given), resulting in a minimum of 6 statements. Students should vary the difficulty level
of their statements and avoid simply copying sentences directly from the text.

Activity 3: Subjective item explorers

Objective: To deepen students' understanding of subjective test items by actively
engaging in the process of analyzing, evaluating, and constructing these items.

Procedure: Students in small groups search for a set of subjective test items in the
English tests for high school students in Vietnam (e.g. essay prompts, short answer
questions, oral interview questions from previous English tests or textbooks in use). They
discuss the following questions:

o What skills and knowledge are being assessed? (e.g., writing, speaking, critical
thinking, creativity)
o What are the key features of a good response? (refer to the scoring rubrics)
o What are some common mistakes or challenges students might face?
o How can students best prepare for these types of items?

Activity 4: Achievement test architects

Objective: To enable students to understand the construction and purpose of subjective
test items in English achievement tests by having them design their own assessment
questions.

Procedure: Students in groups begin by reviewing the key learning objectives and
content areas covered in one or several units of an English textbook for high school
students in Vietnam, then identify the knowledge and skills that should be assessed in an
achievement test. They then discuss the different types of subjective items commonly
used in achievement tests (essays, short answer questions, etc.). During group discussion,
students do the following for each item they design:

o Specify the learning objective being assessed.
o Formulate a clear and concise prompt or question.
o Define the expected response format and length.
o Outline the criteria for evaluating the response.
o Consider potential challenges or difficulties for test-takers.

Activity 5: Spot the differences: Comparing test matrices

Objective: To enable students to analyze and compare two test matrices (one for Grade 6
and one for Grade 8 in Vietnam as in the images below) and identify the key differences
in terms of skills, knowledge, and complexity.

Procedure: Students in groups spend time reviewing the content and structure of each
matrix. They use highlighters or colored pens to mark any noticeable differences between
the two matrices and then report the findings to the whole class. They should focus on
aspects like:

o Skills: Are there any skills assessed in one matrix that are not present in the
other? Are the same skills assessed at different levels of complexity?
o Knowledge: Are there differences in the types or depth of knowledge expected at
each grade level?
o Weighting: Are certain skills or knowledge areas given more emphasis in one
matrix compared to the other?
