INTRODUCTION TO ITEM ANALYSIS: EVALUATING AND IMPROVING MULTIPLE CHOICE QUESTIONS
WHAT IS ITEM ANALYSIS?
WHAT IS ITEM ANALYSIS
‣Consider the following ...
‣We use the overall score on the exam to assess the student’s aptitude/ability
‣We want to know who understands the material and who doesn’t
‣We want to make sure that the student’s score is stable
‣Question: How do we know we are assessing the student’s ability as well as we can?
‣Item analysis helps you ensure that your items are effectively evaluating student ability
‣Examples:
‣Item Difficulty
‣Item Discrimination
‣Internal Consistency
‣Differential Item Functioning
ITEM ANALYSIS PROCESS
BENEFITS
PURPOSE OF ITEM ANALYSIS
‣How well do the questions distinguish the students who knew the material from those who did not? (The more such questions, the more precisely your exam can measure ability)
APPLICATIONS
APPLICATIONS OF ITEM ANALYSIS
‣Item Analysis can answer many questions (“Am I able to tell who has ability and who does not?”, “Am I able to get a fine-grained view of ability?”, “How consistent are scores?”)
‣Many fields are concerned with knowing the answers to these questions:
‣Employee selection: The purpose is to select the employees with the highest ability in the workplace
PRELIMINARY PREPARATION
BEFORE BEGINNING ITEM ANALYSIS
‣Cells = Whether the examinee got the question right (1) or wrong (0)
[Example score matrix: one row per examinee, e.g. Chelsea, with one 1/0 cell per question]
2) The activity sheet contains five students’ responses to five different multiple choice questions
c) Each cell indicates whether the student answered the question correctly (1) or incorrectly (0)
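As a minimal sketch of this row-per-examinee, column-per-question layout in Python (the names and responses here are hypothetical, not the actual activity data):

```python
# Hypothetical score matrix: one row per examinee, one column per question,
# with 1 = answered correctly and 0 = answered incorrectly.
responses = {
    "Chelsea": [1, 1, 1, 1, 0],
    "Jordan":  [1, 0, 1, 1, 1],
    "Priya":   [0, 0, 1, 0, 1],
    "Marcus":  [1, 1, 0, 1, 0],
    "Aiko":    [0, 1, 1, 0, 0],
}

# Each examinee's total score is just their row sum.
for name, row in responses.items():
    print(f"{name}: {sum(row)} / {len(row)} correct")
```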
ITEM DIFFICULTY
ITEM DIFFICULTY: WHAT IS IT?
‣Examples
‣If everyone got the question right, the item is considered “easy”
‣If half the people got the question right, then the item is somewhere
between easy and hard.
ITEM DIFFICULTY: WHY DOES IT MATTER?
‣If you don’t pay attention to Item Difficulty, you don’t get a precise measure
of ability
‣If true ability has a bell-shaped distribution, then your estimated ability
should have a bell-shaped distribution
ITEM DIFFICULTY: WHY DOES IT MATTER?
‣If a question is too easy -> everyone gets the question right -> you don’t know who is on the lower end of ability
[Example: Bob 100%, Cindy 100% (too easy - cannot tell who has more/less ability); Cindy 0% (too hard - cannot tell who has more/less ability)]
‣Therefore, it is important to pay attention to item difficulty to have it be just right
ITEM DIFFICULTY: WHY DOES IT MATTER?
‣If item difficulty is too high or too low, then scores will be truncated (preventing a symmetric distribution)

ITEM DIFFICULTY: EXAMPLE
‣Imagine a test with THREE questions and FIVE examinees

Examinee Q1 Q2 Q3
Alice    1  0  1
Bob      1  0  1
Cindy    1  0  1
Dan      1  0  1
Erin     1  0  0
Total    5  0  4

‣We can total up how many people got each question correct by taking the sum for each question
‣Question 1 = 5
‣Question 2 = 0
‣Question 3 = 4
‣Divide the total number of people who got the question correct by the total number of people who took the test, and you have the item’s difficulty

Question | Total correct / Total test takers | Item Difficulty
Q1       | 5 / 5                             | 1.00
Q2       | 0 / 5                             | 0.00
Q3       | 4 / 5                             | 0.80
‣Item Difficulty = P / N
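A minimal sketch of the P / N computation, reusing the three-question, five-examinee data from the example above:

```python
# Item difficulty = P / N: the proportion of test takers answering correctly.
# Data reproduces the three-question, five-examinee example above.
scores = {
    "Alice": [1, 0, 1],
    "Bob":   [1, 0, 1],
    "Cindy": [1, 0, 1],
    "Dan":   [1, 0, 1],
    "Erin":  [1, 0, 0],
}

n = len(scores)                        # N = 5 test takers
columns = list(zip(*scores.values()))  # one tuple of 0/1 scores per question

for q, col in enumerate(columns, start=1):
    print(f"Q{q}: difficulty = {sum(col) / n:.2f}")  # P / N
# Prints 1.00 (too easy), 0.00 (too hard), and 0.80.
```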
ITEM DIFFICULTY: HOW TO INTERPRET IT
‣How to Interpret Item Difficulty
‣Think of it as “Item Easiness”
‣If the value is at an ideal “sweet spot”, then your test can better separate high ability people from low ability people (discussed in the next section)
‣If Item Difficulty is too low / the item is too hard (< .25 or .30):
‣The item may be too challenging relative to the overall level of ability of the class
‣Find where people are being confused and clarify the question
‣Not meaningful if:
‣The test had a short time limit (“speed test”) - later items seem difficult regardless of content
‣The recommended values (.60-.75) assume you want to assess people’s ability relative to others
‣If you are concerned with content mastery instead, you want all items answered correctly
ACTIVITY
‣Remember: Item Difficulty = percentage of people answering the question correctly
‣Larger values = Easier
‣Smaller values = Harder
3) Think of a question you can ask your fellow attendees that would probably have a difficulty value close to your group’s assigned value
Example: If you are assigned a difficulty value of .25, what is a question you could ask that only 25% of the attendees would know?
4) Think of 4 multiple choice options to go with your question, one of which is right
5) When you are ready, have one member of the group go up to the presenter and share the group’s question and answers
6) WHEN TOLD THE SURVEY IS READY BY THE PRESENTER: Complete the combined survey online (link will be provided) - Skip your question
7) Access the spreadsheet link provided, and compute the difficulty for each question
8) How close was the actual difficulty to the difficulty you were assigned?
ITEM DISCRIMINATION
ITEM DISCRIMINATION: WHAT IS IT?
‣People who studied get the question right, people who didn’t study get the question
wrong
ITEM DISCRIMINATION
‣Imagine a test with THREE questions and FIVE students

Examinee Q1 Q2 Q3
Alice    1  1  1
Bob      1  1  1
Cindy    0  0  1
Dan      0  1  0
Erin     0  0  1
ITEM DISCRIMINATION

Examinee Q1 Q2 Q3 Total
Alice    1  1  1  100%
Bob      1  1  1  100%
Cindy    0  0  1  33%
Dan      0  1  0  33%
Erin     0  0  1  33%

‣Alice and Bob did the best (perfect scores)
‣Cindy, Erin, and Dan did the worst (33%)
‣Which question’s performance best predicts who will score high on the exam (Alice and Bob) and who will score low (Cindy, Dan, and Erin)?
ITEM DISCRIMINATION
‣Make one column that has whether people got the question right (1) or wrong (0) = Question Scores
‣Make another column that has people’s total score on the exam (0 to 100%) = Total Scores
‣Correlate the Question Scores with the Total Scores; this correlation (the point-biserial) is the item’s discrimination
‣Often “corrected” by removing the item’s score from the total score (see the sketch below)
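A minimal sketch of the corrected item-total (point-biserial) correlation, reusing the five-student data above; the plain Pearson helper is written out so the snippet stays self-contained:

```python
# Corrected item-total correlation: correlate each question's 0/1 column
# with the total scores computed WITHOUT that question.

def pearson(x, y):
    """Plain Pearson correlation; assumes both sequences vary."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

scores = {
    "Alice": [1, 1, 1],
    "Bob":   [1, 1, 1],
    "Cindy": [0, 0, 1],
    "Dan":   [0, 1, 0],
    "Erin":  [0, 0, 1],
}

rows = list(scores.values())
for q in range(len(rows[0])):
    item = [r[q] for r in rows]
    rest = [sum(r) - r[q] for r in rows]  # total with the item removed
    print(f"Q{q + 1}: discrimination = {pearson(item, rest):+.2f}")
# Q1 (+1.00) perfectly separates the top scorers from the rest;
# Q3 (+0.00) does not discriminate at all.
```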
ITEM DISCRIMINATION: HOW TO CALCULATE IT
‣If above .20, item is useful for describing people’s overall ability
ITEM DISCRIMINATION: DIAGNOSTICS
‣Low item discrimination is problematic: it suggests that people who know the concepts really well overall were not any more likely to understand the specific concept in the question
‣The item may be too easy or too difficult (everyone got the question right or wrong)
ITEM DISCRIMINATION: IMPROVEMENTS
‣Ensure that difficulty is at the ideal level, given the number of response options
ITEM DISCRIMINATION: PRECAUTIONS
‣Not meaningful if
‣Partial credit for answers (some answers are less wrong than others)
ACTIVITY: ITEM DISCRIMINATION
1) Think to yourself about a multiple choice exam you might give in your respective field for a specific topic
3) Which of those questions, if answered correctly, would indicate that this person understands the topic well as a whole?
a) This question has good discrimination
b) It can tell you who likely has high knowledge and who has lower knowledge
RELIABILITY
TEST RELIABILITY: WHAT IS IT?
‣What is Test Reliability?
‣True ability = what you actually know across the entire topic
‣If a test is 100% reliable, then the score a person receives is their true score, and they get the same score each time they retake the exam
‣If our test is reliable, then a student’s ability is reflected in the score received
‣If a test is not 100% reliable, then the score a person receives may be either higher or lower than their actual true score, and their next score might be different
‣If our test is unreliable, then a student’s ability is not reflected in the score received
TEST RELIABILITY: WHAT IS IT?
‣Parallel forms reliability: Consistency between one exam form and another
‣Internal consistency: How consistent each item is with the other items
‣If you know an exam’s internal consistency, then you know a lower bound (worst case) for its reliability
‣Good internal consistency makes it more likely the students’ scores are stable
INTERNAL CONSISTENCY: HOW TO INTERPRET IT
‣How to Interpret Internal Consistency
‣< .60 = Poor
INTERNAL CONSISTENCY: HOW TO CALCULATE IT
‣How to Calculate Internal Consistency
‣For each question, make a column that has whether people got the question right (1) or wrong (0)
‣Calculate the correlation between each right/wrong column and every other right/wrong column
‣If K questions, then K * (K - 1) / 2 comparisons
‣If 20 questions, then 20 * 19 / 2 = 190 comparisons
‣If 40 questions, then 40 * 39 / 2 = 780 comparisons
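A minimal sketch of one common internal-consistency statistic, standardized Cronbach’s alpha, built from exactly those K * (K - 1) / 2 pairwise correlations (this is one formula among several; D2L and SPSS report the closely related Cronbach’s alpha). Requires Python 3.10+ for statistics.correlation:

```python
from itertools import combinations
from statistics import correlation

def standardized_alpha(rows):
    """rows: one list of 0/1 question scores per examinee.
    Assumes every question's scores vary (no all-right/all-wrong items)."""
    items = list(zip(*rows))                 # one column of scores per question
    k = len(items)
    pairs = list(combinations(range(k), 2))  # K * (K - 1) / 2 comparisons
    r_bar = sum(correlation(items[i], items[j]) for i, j in pairs) / len(pairs)
    # Alpha rises with both test length K and average correlation r_bar.
    return (k * r_bar) / (1 + (k - 1) * r_bar)
```

With real data you would pass one row of 0/1 scores per student, exactly the matrix described in the preparation section.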
‣Test length: The more questions on the test, the more reliable the test will be
‣Average inter-item correlation: The more the questions address a single common domain, the more reliable the test will be
- All questions pertain to the same topic area = higher average correlation between question scores
- All questions pertain to disparate topic areas = lower average correlation between question scores
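‣These two factors combine in the standardized-alpha (Spearman-Brown) form computed in the sketch above: alpha = (K * r) / (1 + (K - 1) * r), where K is the number of questions and r is the average inter-item correlation; increasing either one increases reliability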
EXERCISE
2) What additional question could we ask that would probably INCREASE the average inter-item correlation?
‣Internal consistency suffers when items represent too many distinct dimensions (too many concepts being asked)
IMPLEMENTING ITEM ANALYSIS
IMPLEMENTING ITEM ANALYSIS
‣D2L
‣Excel
‣SPSS
IMPLEMENTING ITEM ANALYSIS: D2L
1) Go to “Quizzes”
2) Click “Statistics”
3) Click on “Question Stats” to view Item Analysis
4) Each item has its own analysis statistics
IMPLEMENTING ITEM ANALYSIS: D2L - THE OUTPUT
Point Biserial = Item Discrimination
Average Grade = Item Difficulty
IMPLEMENTING ITEM ANALYSIS: EXCEL
Cronbach’s alpha = Internal Consistency
IMPLEMENTING ITEM ANALYSIS: SPSS
‣You can calculate item analysis with SPSS (or R, Minitab, Stata)
‣Access SPSS for free via DePaul Virtual Labs
‣At the top menu, go to Analyze -> Scale -> Reliability Analysis
‣Click “Statistics” and ask for “item,” “scale,” and “scale if item deleted” statistics
‣Click “OK”
IMPLEMENTING ITEM ANALYSIS: SPSS - STEP 1 - CHOOSE ANALYSIS
IMPLEMENTING ITEM ANALYSIS: SPSS - STEP 2 - SELECT VARIABLES
IMPLEMENTING ITEM ANALYSIS: SPSS - STEP 3 - INTERPRETATION
Cronbach’s alpha = Internal consistency
SUMMARY
‣Items can perform poorly due to wording ambiguity, lack of ability in that domain, miscoding, lack of conceptual relevance, or instructional issues
‣Item Response Theory: estimates people’s ability while taking into account the difficulty and discrimination of the items they answered correctly/incorrectly
Q&A