8602 Assignment 2
COURSE CODE - 8602
STUDENT ID - 0000758015
1. Validity and Validation of the Test
Tests can take the form of written answers to a series of questions (paper-and-pencil tests), professional assessments of behavior in the classroom or school, or assessments of work performance. The form of test results also varies, from pass/fail, to holistic judgments, to a complex series of numbers designed to capture minute differences in behavior.
Regardless of the form of the test, its most important aspect is how the results are used
and how those results affect the individual and society as a whole. Tests used for
admission to schools or programs or for educational diagnosis not only affect the
individual but also attribute value to the content tested. A test that is perfectly appropriate
and useful in one situation may be inappropriate or inadequate in another. For example,
a test that may be adequate for use in pedagogical diagnostics may be completely
inappropriate for use in determining matriculation. Test validity, or test validation, explicitly
means validating the use of a test in a specific context, such as college admissions or
course placement. Therefore, when determining the validity of a test, it is important to
study the results of the tests in the environment in which they are used. In the previous
example, in order to use the same test for educational diagnosis as for matriculation, each
use would have to be validated separately, even though the same test is used for both
purposes.
2. The Purpose of Measuring Validity
Most, but not all, tests are designed to measure skills, abilities, or characteristics that are not directly observable. For example, Scholastic Aptitude Test (SAT) scores are intended to reflect a test taker's developed skills in critical reading, writing, and mathematics. The score a test taker receives is not a direct measure of critical reading ability, any more than degrees Celsius are a direct measure of the heat of an object. An examinee's level of critical reading ability must be inferred
from his or her SAT Critical Reading score. The process of using test scores as a sample
of behavior to draw inferences about a larger domain of behavior is characteristic of most
educational and psychological tests. Responsible test developers and publishers must be
able to demonstrate that it is possible to use the sample of behavior measured by the test
to make valid inferences about the examinee's ability to perform tasks that represent the
broader domain of interest.
Understanding the Nature of Validity
At its core, validity is the degree to which a test measures what it claims to measure. It is the strength of our inferences, conclusions and
suggestions about a student's knowledge and skills based on their assessment results.
Think of validity as the "truth" of a test score. For educators and students alike, the
implications of validity are far-reaching, affecting everything from curriculum design to
student progress.
Conclusion
Validity in educational assessments is a topic rich in depth and significance. It’s the linchpin that
guarantees assessments do more than just generate scores; they provide meaningful insights into
student learning. As we’ve explored, the various facets of validity—content, criterion-related,
construct, and factorial—each play a vital role in creating a holistic picture of a student’s
knowledge and abilities.
--------------------------------------------
QUESTION NO. 2
--------------------------------------------
Discuss general considerations in constructing essay-type test items with suitable examples.
--------------------------------------------
ANSWER
--------------------------------------------
General Considerations in Constructing Essay-Type Test Items
Robert L. Ebel and David A. Frisbie (1991) observe that teachers often need to measure not only the knowledge their students possess but also their students' ability to think with and use that knowledge. In such cases, tests that give students some degree of freedom in their answers are necessary, and essay tests are tailored for this purpose. The student writes an answer that may range from a few paragraphs to several pages. Essays can be used for higher-level learning outcomes, such as synthesis and evaluation, as well as for lower-level outcomes. They are supply-type items: rather than choosing an appropriate answer from given options, students compose a response of one or more sentences. Essay tests allow students to demonstrate their ability to recall, organize, synthesize, connect, analyze, and evaluate ideas.
Types of Essay Tests
Essay tests can be divided into many types. W.S. Monree and R.I. Cater (1993) classify essay items into categories such as:
• selective recall, with the basis for evaluation given
• comparison of two things on a single designated basis
• comparison of two things in general
• decision, for or against
• cause and effect
• explanation of the use or exact meaning of a word or phrase in a statement
• summary of a unit of a textbook or article
• analysis
• statement of relationships
• illustration or example
• classification
• application of rules, laws, or principles to new situations
• discussion
• statement of the author's intent in the selection or organization of material
• criticism of the adequacy, correctness, or relevance of a printed statement or of a classmate's answer to a class question
• reorganization of facts
• formulation of new questions - problems raised, new methods of procedure, etc.
Types of Constructed Response Items
Essay items can vary from very long, open-ended term papers or take-home tests with flexible page limits (e.g., 10-12 pages, or no more than 30 pages) to limited-response essays of one page or less. Thus, essay-type items are of two types:
• Limited Response Essay Items
• Extended Response Essay Items
--------------------------------------------
QUESTION NO. 3
--------------------------------------------
Write a note on the uses of measurement scales for students' learning assessment.
--------------------------------------------
ANSWER
--------------------------------------------
Introduction of Measurement Scales
All types of research data, test results, survey data, etc. are called raw data and are
collected using four basic scales. Nominal, ordinal, interval, and ratio are the four basic
scales for data collection. Ratio is more sophisticated than interval, interval is more
sophisticated than ordinal, and ordinal is more sophisticated than nominal. A variable
measured on a "nominal" scale is a variable that does not actually have any rating
resolution. One value really isn't greater than the other. A good example of a nominal
variable is gender. For nominal variables, there is a qualitative difference between values,
not a quantitative one. Something measured on an "ordinal" scale has an evaluative connotation: one value is greater than or better than another. With ordinal scales, we only know that one value is better than another, for example that 10 is better than 9; we do not know by how much. A variable
measured on an interval or ratio scale has maximum evaluative resolution. After data
collection, there are three basic ways to compare and interpret the results obtained from
the responses. Student performance can be compared and interpreted against an
absolute standard, a criterion-referenced standard, or a norm-referenced standard. Some
examples from everyday life and educational contexts can make this clear:
Sr. No.  Standard   Characteristics                      Daily life example   Educational context
1        Absolute   simply states the observed outcome   He is 6' 2" tall     He spelled correctly 45 out of 50 English words
All three types of score interpretation are useful, depending on the purpose for which the comparison is made. An absolute score merely describes a level of performance or success without comparing it to any set or specified standard; such scores aren't particularly
useful without some kind of comparison. Criterion scores compare test performance
against a specific standard; such a comparison allows the test interpreter to decide
whether the score is satisfactory according to established standards. Norm-referenced
tests compare test performance with the performance of others who have been measured
using the same procedure. Teachers are usually more interested in how children compare to a useful standard than in how they compare to other children; however, norm-referenced comparisons can also provide useful insights.
1. Nominal Scale
Nominal scales are the lowest level of measurement. A nominal scale, as the name suggests, simply places data into categories, without any order or structure. You can only examine whether a reading on a nominal scale equals some particular value, or count the number of occurrences of each value, for example, categorizing the blood types of classmates as A, B, AB, or O. The only mathematical operation we can perform with nominal data is counting. Variables assessed on a nominal
scale are called categorical variables; categorical data are measured on nominal scales
that merely assign labels to distinguish categories. For example, gender is a nominal
scale variable. Classification of persons by gender is a common application of the nominal
scale.
Nominal Data
• categories only, without order or ranking (e.g., gender, blood type)
2. Ordinal Scale
Something measured on an "ordinal" scale has an evaluative connotation. You can also examine whether one ordinal value is less than or greater than another. An example is rating job satisfaction on a scale of 1 to 10, with 10 being complete satisfaction. With ordinal scales, we only know that 2 is better than 1 or that 10 is better than 9; we don't know by how much, and the difference may vary from one step to the next. So you can "rank" ordinal data, but you can't "quantify" the differences between two ordinal values. The properties of the nominal scale are included in the ordinal scale.
Ordinal Data
• ordered, but the differences between the values are not important; the differences between adjacent values may or may not be equal
3. Interval Scale
When the differences between values become quantifiable, an ordinal scale becomes an interval scale. You can quantify the difference between two interval-scale values, but there is no natural zero. A variable measured on an interval scale provides as much or more information than an ordinal scale, and interval variables have the same distance between adjacent values: the distance between 1 and 2 equals the distance between 9 and 10. For example, temperature in degrees Celsius is interval data: 25 °C is warmer than 20 °C, and a difference of 5 °C has a physical meaning. Note that 0 °C is arbitrary, so it does not make sense to say that 20 °C is twice the temperature of 10 °C, but the difference between 100 °C and 90 °C is exactly the same as the difference between 42 °C and 32 °C. Student results are usually treated as being measured on an interval scale.
Interval Data
• differences make sense, but ratios do not (e.g., 30° − 20° = 20° − 10°, but 20° is not twice as hot as 10°)
4. Ratio Scale
Something measured on a ratio scale has the same properties as an interval scale, except that a ratio scale has an absolute zero. An example is temperature measured in kelvin: no value below 0 K is possible, because it is absolute zero. Physical measurements of height, weight, and length are typically ratio variables. Weight is another example: 0 lbs represents a complete absence of weight. Ratios remain meaningful regardless of the units in which the object is measured (e.g., meters or yards), because there is a natural zero.
Ratio Data
• ordered, constant scale, natural zero (e.g., height, weight, age, length)
Nominal, ordinal, interval, and ratio can be thought of as ordered in relation to each other.
Ratio is more sophisticated than interval, interval is more sophisticated than ordinal, and
ordinal is more sophisticated than nominal.
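To make the hierarchy concrete, the following Python sketch (illustrative only; the mapping of scales to operations and the example data are assumptions for demonstration, not part of any standard library) checks which comparisons are meaningful at each level of measurement:

# Illustrative sketch: which operations are meaningful at each scale level.
SCALE_OPERATIONS = {
    "nominal":  {"count"},                                  # e.g., blood type
    "ordinal":  {"count", "order"},                         # e.g., satisfaction 1-10
    "interval": {"count", "order", "difference"},           # e.g., temperature in Celsius
    "ratio":    {"count", "order", "difference", "ratio"},  # e.g., weight in kg
}

def allowed(scale: str, operation: str) -> bool:
    """Return True if the operation is meaningful on data of the given scale."""
    return operation in SCALE_OPERATIONS[scale]

print(allowed("interval", "difference"))  # True: 25 C - 20 C is meaningful
print(allowed("interval", "ratio"))       # False: 20 C is not "twice" 10 C
print(allowed("ratio", "ratio"))          # True: 10 kg is twice 5 kg

Each scale inherits the operations of the scales below it, which is exactly the sense in which ratio is "more sophisticated" than interval, interval than ordinal, and ordinal than nominal.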
--------------------------------------------
QUESTION NO. 4
--------------------------------------------
Explain measures of variability with suitable examples?
--------------------------------------------
ANSWER
--------------------------------------------
Measures of Variability
Variability refers to the extent to which scores in a distribution differ from each other. An
equivalent definition (which is mathematically easier to work with) says that variability
refers to the extent to which scores in a distribution differ from their mean. If a distribution
lacks variability, we can say it is homogeneous (note that the opposite would be
heterogeneous).
We now discuss four measures of variability: range, mean (or average) deviation, variance, and standard deviation.
1. Range
The range is probably the easiest measure of the variability of a sample: it is the difference between the largest (maximum) and smallest (minimum) observation.
Range = Highest value − Lowest value, i.e. R = XH − XL
Example:
The range of Saleem's four test scores (3, 5, 5, 7) is:
XH = 7 and XL = 3
Therefore R = XH − XL = 7 − 3 = 4
Example
Consider another example, in which the results of two different classes are:
Class 1: 80%, 80%, 80%, 80%, 80%
Class 2: 60%, 70%, 80%, 90%, 100%
The range of measurements in Class 1 is 0, and the range in Class 2 is 40%. Simply knowing that fact gives a much better understanding of the data obtained from the two classes: in Class 1 the mean was 80% and the range was 0, but in Class 2 the mean was 80% and the range was 40%. The relationship between range and variability can be shown graphically:
[Figure: two distributions with the same range; Distribution A has more variability than Distribution B.]
Co-efficient of Range
It is a relative measure of dispersion based on the value of the range, and is also called the range co-efficient of dispersion. It is defined as:
Co-efficient of Range = (XH − XL) / (XH + XL)
Let us take two sets of observations. Set A contains the marks of five students in Mathematics out of 25 marks, and Set B contains the marks of the same students in English out of 100 marks.
Set A: 10, 15, 18, 20, 20
Set B: 30, 35, 40, 45, 50
The values of the range and co-efficient of range are calculated as:
Set A (Mathematics): Range = 20 − 10 = 10; Co-efficient of Range = (20 − 10)/(20 + 10) = 10/30 = 0.33
Set B (English): Range = 50 − 30 = 20; Co-efficient of Range = (50 − 30)/(50 + 30) = 20/80 = 0.25
Set A has a range of 10 and Set B has a range of 20, so Set B appears to have more variability. But that is not true: the range of 20 in Set B is for large observations, while the range of 10 in Set A is for small observations, so 20 and 10 cannot be compared directly. Their bases are not the same: the Mathematics marks are out of 25 and the English marks are out of 100, so it does not make sense to compare 10 to 20. When we convert these two values into co-efficients of range, we see that the co-efficient for Set A is larger than that for Set B. Therefore, Set A has the greater relative variation: the students' English marks are more stable than their Mathematics marks.
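A short Python sketch reproduces these calculations (illustrative only; the helper name is my own, not a library function):

def coefficient_of_range(scores):
    """Relative dispersion: (XH - XL) / (XH + XL)."""
    xh, xl = max(scores), min(scores)
    return (xh - xl) / (xh + xl)

set_a = [10, 15, 18, 20, 20]  # Mathematics marks, out of 25
set_b = [30, 35, 40, 45, 50]  # English marks, out of 100

print(max(set_a) - min(set_a), round(coefficient_of_range(set_a), 2))  # 10 0.33
print(max(set_b) - min(set_b), round(coefficient_of_range(set_b), 2))  # 20 0.25

Dividing by XH + XL removes the effect of the different maximum marks, which is why the two co-efficients can be compared directly while the raw ranges cannot.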
2. Mean Deviation
A deviation is the difference of a score from its mean, and variability is the extent to which scores differ from their mean, so adding all the deviations and dividing by their number should give a measure of variability. The problem is that the sum of the deviations about the mean is always zero. However, taking the absolute value of each deviation before summing eliminates this problem. Thus, for sample data with mean X̄, the mean deviation (M.D.) is given by:
M.D. = Σ|X − X̄| / n
For a frequency distribution, the mean deviation is given by:
M.D. = Σf|X − X̄| / Σf
Example:
Calculate the mean deviation from the arithmetic mean for the marks obtained by the nine students given below, and show that the mean deviation from the median is smaller.
Marks (out of 25): 7, 4, 10, 9, 15, 12, 7, 9, 7
Solution:
After arranging the observations in ascending order, we get:
Marks: 4, 7, 7, 7, 9, 9, 10, 12, 15
The mean is X̄ = 80/9 = 8.89.

Marks X   |X − X̄|
4          4.89
7          1.89
7          1.89
7          1.89
9          0.11
9          0.11
10         1.11
12         3.11
15         6.11
Total     21.11

M.D. = Σ|X − X̄| / n = 21.11/9 = 2.35
The median of the ordered marks is 9, and Σ|X − median| = 5 + 2 + 2 + 2 + 0 + 0 + 1 + 3 + 6 = 21, so the mean deviation from the median is 21/9 = 2.33, which is smaller than 2.35, as required.
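The same computation in Python (a minimal sketch; the function name is my own):

from statistics import mean, median

def mean_deviation(data, center):
    """Average absolute deviation of the data about a chosen center."""
    return sum(abs(x - center) for x in data) / len(data)

marks = [7, 4, 10, 9, 15, 12, 7, 9, 7]
print(round(mean_deviation(marks, mean(marks)), 2))    # 2.35 (about the mean)
print(round(mean_deviation(marks, median(marks)), 2))  # 2.33 (about the median, smaller)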
3. Variance
Variance is another absolute measure of dispersion. It is defined as the average of the squared differences between each observation in a set of data and the mean. For sample data the variance is denoted by S², and the population variance is denoted by σ² (sigma squared). That is:
S² = Σ(X − X̄)² / n
Thus another name for the variance is the mean of the squared deviations about the mean (or, more simply, the mean of squares, MS). The problem with the MS is that its units are squared, and it therefore represents an area rather than a distance on the X axis like the other measures of variability.
Example: Calculate the variance for the following sample data: 2, 4, 8, 6, 10, and 12.
Solution:
X̄ = ΣX / n = 42/6 = 7

X      (X − X̄)²
2      (2 − 7)² = 25
4      (4 − 7)² = 9
8      (8 − 7)² = 1
6      (6 − 7)² = 1
10     (10 − 7)² = 9
12     (12 − 7)² = 25
ΣX = 42    Σ(X − X̄)² = 70

S² = Σ(X − X̄)² / n = 70/6 = 11.67
Variance = S² = 11.67
4. Standard Deviation
The standard deviation is defined as the positive square root of the mean of the squared deviations taken from the arithmetic mean of the data. A simple solution to the problem of the MS representing an area is to compute its square root. That is:
S = √( Σ(X − X̄)² / n )
Since the standard deviation can be very small, it is usually reported to two or three more decimal places than are available in the original data. The standard deviation is in the same units as the original observations: if the original observations are in grams, the standard deviation will also be in grams. The standard deviation plays a dominant role in the study of variation in data and is a very widely used measure of dispersion; it stands like a tower among the measures of dispersion. Among the important statistical tools, the first is the mean X̄ and the next is the standard deviation. It is based on all observations, is amenable to further mathematical treatment, and is of great importance for data analysis and for various statistical inferences. For the example above:
Standard Deviation = S = √11.67 = 3.42
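A quick check in Python (an illustrative sketch; note that it uses the n denominator to match the example above, rather than the n − 1 denominator discussed in the Estimation section below):

data = [2, 4, 8, 6, 10, 12]
n = len(data)
mean = sum(data) / n                                # 7.0
variance = sum((x - mean) ** 2 for x in data) / n   # 70/6 = 11.67
std_dev = variance ** 0.5                           # sqrt(11.67) = 3.42
print(round(variance, 2), round(std_dev, 2))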
5. Estimation
Estimation is the goal of inferential statistics: we use sample values to estimate population values. The symbols are as follows:

Statistic            Sample   Population
Mean                 X̄        µ
Variance             s²       σ²
Standard Deviation   s        σ

It is important that the sample values (estimators) be unbiased. An unbiased estimator of a parameter is one whose average over all possible random samples of a given size equals the value of the parameter.
Overall Example
Consider the following two distributions, A and B:

        A      B
      150    150
      145    110
      100    100
      100    100
       55     90
       50     50
Sum   600    600
N       6      6
X̄     100    100

Note that the central tendency and range of the two distributions are the same. That is, the mean, median, and mode are all 100 for both distributions, and the range (150 − 50 = 100) is the same for both. However, while distributions A and B have the same measures of central tendency and the same range, they differ in their variability: Distribution A has more. Let us prove this by calculating the standard deviation in each case. First, for Distribution A:
A      X − X̄    (X − X̄)²
150      50      2500
145      45      2025
100       0         0
100       0         0
 55     −45      2025
 50     −50      2500
Sum 600   0      9050
N = 6, X̄ = 100

s² = Σ(X − X̄)² / (n − 1) = 9050/5 = 1810
s = √1810 = 42.54
Note that calculating the variance and standard deviation in this manner requires computing the mean and subtracting it from each score. Since this is not very efficient, and can be less accurate as a result of rounding error, a computational formula is typically used. It is given as follows:
s² = [ΣX² − (ΣX)²/n] / (n − 1)
Redoing the computations for Distribution A in this manner gives:

A      X²
150    22500
145    21025
100    10000
100    10000
 55     3025
 50     2500
Sum 600  69050
N = 6

Then, plugging the appropriate values into the computational formula gives:
s² = (69050 − 600²/6) / 5 = (69050 − 60000) / 5 = 9050/5 = 1810
Note that the defining and computational formulas give the same result, but the computational formula is easier to work with (and potentially more accurate due to less rounding error).
Similarly, for Distribution B:

B      X²
150    22500
110    12100
100    10000
100    10000
 90     8100
 50     2500
Sum 600  65200
N = 6

Then, plugging the appropriate values into the computational formula gives:
s² = (65200 − 600²/6) / 5 = (65200 − 60000) / 5 = 5200/5 = 1040
s = √1040 = 32.25
As expected, Distribution A (s = 42.54) is more variable than Distribution B (s = 32.25), even though the two distributions have the same mean, median, mode, and range.
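The whole example can be verified with a short Python sketch (illustrative; the function names are my own, and it uses the unbiased n − 1 denominator, consistent with the Estimation section above):

def var_defining(data):
    """Defining formula: mean of squared deviations about the mean, with n - 1."""
    m = sum(data) / len(data)
    return sum((x - m) ** 2 for x in data) / (len(data) - 1)

def var_computational(data):
    """Computational formula: (sum(X^2) - (sum(X))^2 / n) / (n - 1)."""
    n = len(data)
    return (sum(x * x for x in data) - sum(data) ** 2 / n) / (n - 1)

dist_a = [150, 145, 100, 100, 55, 50]
dist_b = [150, 110, 100, 100, 90, 50]

print(var_defining(dist_a), var_computational(dist_a))  # 1810.0 1810.0
print(var_defining(dist_b), var_computational(dist_b))  # 1040.0 1040.0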
--------------------------------------------
QUESTION NO. 5
--------------------------------------------
Discuss functions of test scores and progress reports in detail.
--------------------------------------------
ANSWER
--------------------------------------------
Functions of Test Scores and Progress Reports
The task of marking and reporting on student progress cannot be separated from the
procedures adopted in assessing student learning. When learning objectives are well
defined in terms of behavior or performance, and appropriate tests and other assessment
procedures are properly used, marking and reporting becomes a matter of summarizing
the results and presenting them in an understandable form. Reporting student progress
is difficult, especially when the data is represented by a single letter or numerical value
system (Linn & Gronlund, 2000). Assessments and referrals are decisions that require
information about individual students. In contrast, curricular and instructional decisions
require information about groups of students, often entire classes or schools (Linn &
Gronlund, 2000). There are three main purposes of student assessment. First, stamps
are the primary currency of exchange for the many opportunities and rewards our
company offers. Grades can be exchanged for various entities such as adult approval,
public recognition, admission to colleges and universities, etc. To deprive students of
grades is to deprive them of rewards and opportunities. Second, teachers become
accustomed to assessing their students' learning, and if teachers do not assess, students
may not be well aware of their learning progress. Third, students are motivated by
grading. Grades can serve as incentives, and for many students, incentives serve a
motivational function. Below are the various features of rating and reporting systems:
1. Use in Teaching
The focus of assessment and reporting should be on the student's improvement in learning. This is most likely to occur when the report:
a) clarifies the learning objectives;
b) indicates the student's strengths and weaknesses in learning;
c) provides information about the pupil's personal and social development; and
d) contributes to student motivation.
Improving student learning is probably best achieved through daily assessment of
learning and feedback from tests and other assessment practices. A portfolio of work
produced during the academic year can be reviewed to regularly highlight a student's
strengths and weaknesses. Regular progress reports can help motivate students by providing short-term goals and knowledge of results, both of which are essential features of effective learning. Well-designed progress reports can also assist in the evaluation of teaching
practices by identifying areas for revision. When the majority of student reports indicate
poor progress, this may indicate a need to adjust the learning objectives.
2. Feedback for Students
Grading and communicating test results to students is a permanent practice in educational institutions all over the world. The mechanism or strategy may vary from country to country or institution to institution, but every institution follows this procedure
in some way. Reporting test scores to students has a number of benefits for them. As
students progress to higher grades, the usefulness of test scores for personal academic
planning and self-evaluation increases. For most students, scores provide feedback on
how much they know and how effective their learning has been. They may know their
strengths and areas that require special attention. Such feedback is necessary if students
are expected to be partners in managing their own instructional time and effort. These
results help them make the right decisions for their future professional development.
Teachers use a variety of strategies to help students become independent learners who
are able to take increasing responsibility for their own academic progress. Self-assessment is an important aspect of independent learning, and reporting test results can be an integral part of the practices teachers use to support it. Test scores
help students identify areas for improvement, areas where significant progress has been
made, and areas where sustained high effort will help maintain a high level of
achievement. Test scores can be used with information from teacher evaluations to help
students set their own learning goals, decide how to allocate their time, and prioritize
improving skills such as reading, writing, speaking, and problem solving. When students
receive their own test results, they can learn about self-assessment while doing real self-
assessment. (Iowa Testing Programs, 2011). Assessment and reporting also provide
opportunities for students to develop an awareness of how they are growing in various
skill areas. Self-evaluation begins with self-monitoring, a skill that most children have
already begun to develop before kindergarten.
3. Administrative and Advisory Use
Assessments and progress reports serve a number of administrative functions. For example, they are used to determine promotion and graduation, award honors, determine athletic eligibility, and report to other institutions and employers. A single letter grade is usually required for most administrative purposes, although a single letter by itself cannot, of course, adequately convey what a student has achieved. Counselors use grades and student achievement reports, along with other information, to help students create realistic educational and career plans. Reports that include assessments of personal and social characteristics are also helpful in assisting students with adjustment problems.
4. Informing Parents about Their Children's Performance
Parents are often overwhelmed by the grades and test reports they receive from school staff. In order to create a true partnership between parents and teachers, it is essential that information about student progress is communicated clearly, respectfully and accurately. Test results should be reported to parents using: a) simple, clear language free of educational and testing jargon, and b) an explanation of the purpose of the tests used (Canter, 1998). Most of the
time, parents are either ignored or only minimally involved in keeping track of their children's progress. To strengthen the connection between home and school, parents need to receive comprehensive information about their children's achievements. If parents do not understand the tests their children take, the scores, and how the results are used to make decisions about their children, they are poorly placed to help their children learn and make decisions. According to Kearney (1983), the lack of information provided to consumers about test data has far-reaching and negative consequences: individual student needs are not met, parents are not fully informed about student progress, curricular needs are not identified and addressed, and results are not reported to the various audiences who need this information and need to know what is being done about it. In some countries there are
prescribed policies for marking and reporting test results to parents. For example, the Michigan Educational Assessment Program (MEAP) is regularly revised with parent suggestions and feedback in mind. The MEAP consists of criterion-referenced tests, primarily in math and reading, administered annually to all students in grades four, seven, and ten. MEAP recommends that policymakers at the state and local levels develop strong linkages to establish, implement, and monitor effective reporting practices
(Barber, Paris, Evans, & Gadsden, 1992). Without a doubt, it is more effective to talk to parents about their children's scores than to send a results report home for them to interpret on their own. For a variety of reasons, parent-teacher conferences or parent-student-teacher conferences offer an excellent opportunity for teachers to provide and interpret these results for parents:
1. Teachers tend to be more informed than parents about tests and the types of scores
interpreted.
2. Teachers can make numerous observations of their students' work and then document
the results. Discrepancies between test scores and classroom performance can be noted
and discussed.
3. Teachers have samples of work on hand that can be used to illustrate the kind of class work the student has been doing. Portfolios can be used to illustrate strengths and explain where improvement is needed.
4. Teachers may be aware of special circumstances that may affect scores, either
positively or negatively, and skew the level of student achievement.
5. Parents have the opportunity to ask questions about anything they have misunderstood or about how they can help. Students and teachers should communicate test results to parents at school, addressing apparent weaknesses and building on strengths wherever possible (Iowa Testing Programs, 2011). Under the 1998 Act, schools are required to assess students regularly and to inform parents of the assessment results; more specifically, the NCCA guidelines recommend that schools report to parents twice a year: once at the end of Term 1 or at the start of Term 2, and again at the end of the school year.
Under existing data protection legislation, parents have a legal right to obtain the scores
their children have obtained on standardized tests. The NCCA developed a set of report
card templates for schools to use when communicating with parents, which were adopted
in conjunction with Circular 0138 issued by the Department of Education in 2006. A case
study conducted in the US context (www.uscharterschools.org) found that "a school should be a resource for parents, not dictate to parents what their role should be". In other words, the school should respect all parents and value the experiences and
individual strengths they offer their children.