Exploring Reliability in Academic Assessment

Uploaded by

Nasir khan Khattak1

EXPLORING RELIABILITY IN ACADEMIC ASSESSMENT

Written by Colin Phelan and Julie Wren, Graduate Assistants, UNI Office
of Academic Assessment (2005-06)

Reliability is the degree to which an assessment tool produces stable and
consistent results.

Types of Reliability

1. Test-retest reliability is a measure of reliability obtained by administering the
same test twice over a period of time to a group of individuals. The scores from
Time 1 and Time 2 can then be correlated in order to evaluate the test for
stability over time.

Example: A test designed to assess student learning in psychology could
be given to a group of students twice, with the second administration
perhaps coming a week after the first. The obtained correlation coefficient
would indicate the stability of the scores.
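
The correlation step described above can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed procedure: the scores are invented, and the helper name `pearson_r` is an assumption introduced here. It uses the Pearson correlation coefficient, which is one common choice for correlating two sets of scores.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

# Hypothetical scores for the same six students, tested one week apart.
time1 = [78, 85, 62, 90, 71, 88]
time2 = [80, 83, 65, 92, 70, 85]

test_retest_r = pearson_r(time1, time2)
print(f"Test-retest correlation: {test_retest_r:.2f}")
```

A coefficient close to 1 would suggest that students kept roughly the same relative standing across the two administrations.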

2. Parallel forms reliability is a measure of reliability obtained by administering
different versions of an assessment tool (both versions must contain items that
probe the same construct, skill, knowledge base, etc.) to the same group of
individuals. The scores from the two versions can then be correlated in order to
evaluate the consistency of results across alternate versions.

Example: If you wanted to evaluate the reliability of a critical thinking
assessment, you might create a large set of items that all pertain to critical
thinking and then randomly split the questions up into two sets, which
would represent the parallel forms.
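
The random split in this example might look like the following sketch. The item identifiers, the function name `split_into_parallel_forms`, and the fixed seed are all assumptions made for illustration. Once both forms have been administered to the same group, the two sets of scores would be correlated as described above.

```python
import random

def split_into_parallel_forms(item_ids, seed=0):
    """Randomly divide an item pool into two forms of (nearly) equal size."""
    rng = random.Random(seed)  # fixed seed only so the split is reproducible
    shuffled = list(item_ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return sorted(shuffled[:half]), sorted(shuffled[half:])

# Hypothetical pool of 20 critical thinking items.
item_pool = [f"CT{n:02d}" for n in range(1, 21)]
form_a, form_b = split_into_parallel_forms(item_pool)

print("Form A:", form_a)
print("Form B:", form_b)
```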

3. Inter-rater reliability is a measure of reliability used to assess the degree to
which different judges or raters agree in their assessment decisions. Inter-rater
reliability is useful because human observers will not necessarily interpret
answers the same way; raters may disagree as to how well certain responses or
material demonstrate knowledge of the construct or skill being assessed.

Example: Inter-rater reliability might be employed when different judges
are evaluating the degree to which art portfolios meet certain standards.
Inter-rater reliability is especially useful when judgments can be considered
relatively subjective. Thus, the use of this type of reliability would probably
be more likely when evaluating artwork as opposed to math problems.
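
The text does not name a specific agreement statistic, so the sketch below uses two common ones as an assumption: simple percent agreement and Cohen's kappa, which adjusts agreement for what two raters would be expected to match on by chance. The ratings and category labels are invented.

```python
from collections import Counter

def percent_agreement(rater_a, rater_b):
    """Share of cases on which the two raters gave the same rating."""
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return matches / len(rater_a)

def cohens_kappa(rater_a, rater_b):
    """Agreement corrected for chance; undefined if chance agreement is 1."""
    n = len(rater_a)
    observed = percent_agreement(rater_a, rater_b)
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    return (observed - expected) / (1 - expected)

# Hypothetical ratings of eight art portfolios by two judges.
judge_1 = ["meets", "exceeds", "below", "meets", "meets", "exceeds", "below", "meets"]
judge_2 = ["meets", "meets", "below", "meets", "exceeds", "exceeds", "below", "meets"]

agreement = percent_agreement(judge_1, judge_2)
kappa = cohens_kappa(judge_1, judge_2)
print(f"Percent agreement: {agreement:.2f}, Cohen's kappa: {kappa:.2f}")
```

Kappa is lower than raw agreement here because some matches would be expected even if the judges rated at random.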

4. Internal consistency reliability is a measure of reliability used to evaluate the
degree to which different test items that probe the same construct produce
similar results.

A. Average inter-item correlation is a subtype of internal consistency
reliability. It is obtained by taking all of the items on a test that probe the
same construct (e.g., reading comprehension), determining the correlation
coefficient for each pair of items, and finally taking the average of all of
these correlation coefficients. This final step yields the average inter-item
correlation.
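
The three steps above (take the items, correlate every pair, average the coefficients) can be sketched as follows. The item scores are invented, and the function names are assumptions introduced for this illustration.

```python
from itertools import combinations
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

def average_inter_item_correlation(item_scores):
    """item_scores: one list of scores per item, students in the same order."""
    pairs = list(combinations(item_scores, 2))  # every distinct pair of items
    return sum(pearson_r(a, b) for a, b in pairs) / len(pairs)

# Hypothetical scores of six students on three reading comprehension items.
items = [
    [4, 3, 5, 2, 4, 1],
    [5, 3, 4, 2, 5, 2],
    [4, 2, 5, 1, 3, 1],
]

avg_r = average_inter_item_correlation(items)
print(f"Average inter-item correlation: {avg_r:.2f}")
```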

B. Split-half reliability is another subtype of internal consistency reliability.
The process of obtaining split-half reliability is begun by “splitting in half”
all items of a test that are intended to probe the same area of knowledge
(e.g., World War II) in order to form two “sets” of items. The entire test is
administered to a group of individuals, the total score for each “set” is
computed, and finally the split-half reliability is obtained by determining the
correlation between the two total “set” scores.
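
A minimal sketch of this procedure follows. The text does not say how the items should be divided, so the odd/even split used here is an assumption, as are the response data and function names.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

def split_half_reliability(responses):
    """responses: one row per student, one column per item (1 = correct)."""
    set_a = [sum(row[0::2]) for row in responses]  # items 1, 3, 5, ...
    set_b = [sum(row[1::2]) for row in responses]  # items 2, 4, 6, ...
    return pearson_r(set_a, set_b)

# Hypothetical responses of five students to six World War II items.
responses = [
    [1, 1, 1, 1, 1, 0],
    [1, 0, 1, 1, 0, 0],
    [0, 0, 1, 0, 0, 0],
    [1, 1, 1, 1, 1, 1],
    [0, 1, 0, 0, 1, 0],
]

split_half_r = split_half_reliability(responses)
print(f"Split-half correlation: {split_half_r:.2f}")
```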

Validity refers to how well a test measures what it is purported to measure.

Why is it necessary?

While reliability is necessary, it alone is not sufficient. For a test to be
useful, it also needs to be valid. For example, if your scale is off by 5 lbs, it
reads your weight every day with an excess of 5 lbs. The scale is reliable
because it consistently reports the same weight every day, but it is not a valid
measure of your weight because it adds 5 lbs to your true weight.
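
The scale example can be put in numbers. In the sketch below the true weight and the daily readings are invented. The spread of the readings stands in for consistency (reliability) and the average error stands in for accuracy (validity), which is a simplification adopted only for this illustration.

```python
# Hypothetical true weight, in lbs.
true_weight = 150.0

# The scale reads 5 lbs heavy on each of five days.
readings = [155.0, 155.0, 155.0, 155.0, 155.0]

# A spread of 0 means the scale is perfectly consistent from day to day.
spread = max(readings) - min(readings)

# A nonzero average error means the scale is systematically off.
bias = sum(readings) / len(readings) - true_weight

print(f"Spread of readings: {spread} lbs")
print(f"Average error: {bias} lbs")
```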

Types of Validity

1. Face Validity ascertains that the measure appears to be assessing the intended
construct under study. Stakeholders can easily assess face validity. Although this is
not a very “scientific” type of validity, it may be an essential component in enlisting
the motivation of stakeholders. If the stakeholders do not believe the measure is an
accurate assessment of the ability, they may become disengaged from the task.

Example: If a measure of art appreciation is created, all of the items should be
related to the different components and types of art. If the questions concern
historical time periods, with no reference to any artistic movement,
stakeholders may not be motivated to give their best effort or invest in this
measure because they do not believe it is a true assessment of art
appreciation.

2. Construct Validity is used to ensure that the measure actually measures
what it is intended to measure (i.e., the construct), and not other variables.
Using a panel of “experts” familiar with the construct is one way in which this
type of validity can be assessed. The experts can examine the items and
decide what each item is intended to measure. Students can be
involved in this process to obtain their feedback.

Example: A women’s studies program may design a cumulative assessment
of learning throughout the major. If the questions are written with complicated
wording and phrasing, the test can inadvertently become a test
of reading comprehension, rather than a test of women’s studies. It is
important that the measure is actually assessing the intended construct, rather
than an extraneous factor.

3. Criterion-Related Validity is used to predict future or current performance;
it correlates test results with another criterion of interest.

Example: If a physics program designed a measure to assess cumulative
student learning throughout the major, the new measure could be correlated
with a standardized measure of ability in this discipline, such as an ETS field
test or the GRE subject test. The higher the correlation between the
established measure and new measure, the more faith stakeholders can have
in the new assessment tool.

4. Formative Validity, when applied to outcomes assessment, is used to assess how
well a measure is able to provide information to help improve the program under study.

Example: When designing a rubric for history, one could assess students’
knowledge across the discipline. If the measure can show that
students are lacking knowledge in a certain area, for instance the Civil Rights
Movement, then that assessment tool is providing meaningful information that
can be used to improve the course or program requirements.

5. Sampling Validity (similar to content validity) ensures that the measure
covers the broad range of areas within the concept under study. Not
everything can be covered, so items need to be sampled from all of the
domains. This may need to be completed using a panel of “experts” to ensure
that the content area is adequately sampled. Additionally, a panel can help
limit “expert” bias (i.e., a test reflecting what an individual personally feels are
the most important or relevant areas).

Example: When designing an assessment of learning in the theatre
department, it would not be sufficient to only cover issues related to acting.
Other areas of theatre, such as lighting, sound, and the functions of stage
managers, should all be included. The assessment should reflect the content
area in its entirety.
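
One simple way to check coverage is to tag each draft item with the domain it samples and count items per domain. The sketch below assumes such tags exist. The domain list, the item tags, and the function name `coverage_report` are hypothetical, and a count alone does not replace the judgment of an expert panel.

```python
from collections import Counter

def coverage_report(item_domains, required_domains):
    """Number of draft items tagged with each required domain."""
    counts = Counter(item_domains)
    return {domain: counts.get(domain, 0) for domain in required_domains}

# Hypothetical domains the theatre assessment is meant to sample.
required = ["acting", "lighting", "sound", "stage management"]

# Hypothetical domain tag for each item in a draft assessment.
draft_items = ["acting", "acting", "acting", "lighting", "acting", "sound"]

report = coverage_report(draft_items, required)
missing = [domain for domain, n in report.items() if n == 0]

print("Items per domain:", report)
print("Domains with no items:", missing)
```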

What are some ways to improve validity?

1. Make sure your goals and objectives are clearly defined and operationalized.
Expectations of students should be written down.
2. Match your assessment measure to your goals and objectives. Additionally, have
the test reviewed by faculty at other schools to obtain feedback from an outside
party who is less invested in the instrument.
3. Get students involved; have the students look over the assessment for
troublesome wording, or other difficulties.
4. If possible, compare your measure with other measures, or data that may be
available.

