
ASSIGNMENT No. 2
COURSE CODE - 8602
COURSE NAME - EDUCATIONAL ASSESSMENT AND EVALUATION
NAME - MUHAMMAD AFZAL
STUDENT ID - 0000758015
ADMISSION - SPRING 2024


--------------------------------------------
QUESTION NO. 1
--------------------------------------------
Explain the importance of validity for meaningful assessment.
--------------------------------------------
ANSWER
--------------------------------------------
Nature of Validity
The validity of an assessment instrument is the extent to which it measures what it is
designed to measure. For example, if a test is designed to measure three-digit addition
skill in mathematics, but the problems are presented in difficult language that does not
match the students' ability level, it may not measure three-digit addition ability and, as a
result, the test will not be valid. Many measurement experts have defined this term, some
definitions are given below.
According to the Business Dictionary, "validity is the degree to which an instrument,
sampling process, statistical technique, or test measures what it is intended to measure."
Cook and Campbell (1979) define validity as the appropriateness or correctness of
inferences, decisions, or descriptions made about individuals, groups, or institutions from
test results.
According to APA (American Psychological Association) standards, validity is the most
important factor in test evaluation. This concept refers to the appropriateness,
meaningfulness, and usefulness of specific inferences from test scores. Test validation is
the process of gathering evidence to support such inferences. Validity is, however, a
unitary concept: although evidence can be collected in many ways, validity always refers
to the extent to which that evidence supports the conclusions drawn from the scores.
It is the specific use of the test that is validated, not the test itself.
Howell (1992) expresses a similar view of test validity: a valid test must measure
specifically what it purports to measure.
According to Messick, validity is a matter of degree; a test is not absolutely valid or
absolutely invalid. Validity evidence accumulates over time, either reinforcing or
contradicting previous findings. Overall, in terms of assessment, validity refers to the
extent to which the test content is representative of the skills actually learned and whether
the test can provide accurate inferences about achievement. Thus, validity is the extent
to which a test measures what it claims to measure. It is important that the test is valid so
that the results can be accurately applied and interpreted.
Let's look at the following examples.
Examples:
1. Let's say you are charged with monitoring the effect of strict attendance policies on
class participation. After two or three weeks of observation, you reported that classroom
participation increased after the policy was implemented.
2. Let's say you are tasked with measuring intelligence, and if math and vocabulary
represent intelligence, then you could say that a test of math and vocabulary has high
validity when used as a measure of intelligence.
A test has validity evidence if we can show that it measures what it says it measures.
For example, if it is meant to be a test of fifth-grade arithmetic ability, it should measure
fifth-grade arithmetic ability and not reading ability.

1. Validity and validation of the test
Tests can take the form of written answers to a series of questions, such as paper-and-pencil
tests, of professional assessments of behavior in the classroom or school, or of an
assessment of work performance. The form of test results also varies, from pass/fail to
holistic judgments to a complex series of numbers designed to capture minute differences
in behavior.
Regardless of the form of the test, its most important aspect is how the results are used
and how those results affect the individual and society as a whole. Tests used for
admission to schools or programs or for educational diagnosis not only affect the
individual but also attribute value to the content tested. A test that is perfectly appropriate
and useful in one situation may be inappropriate or inadequate in another. For example,
a test that may be adequate for use in pedagogical diagnostics may be completely
inappropriate for use in determining matriculation. Test validity, or test validation, explicitly
means validating the use of a test in a specific context, such as college admissions or
course placement. Therefore, when determining the validity of a test, it is important to
study the results of the tests in the environment in which they are used. In the previous
example, in order to use the same test for educational diagnosis as for matriculation, each
use would have to be validated separately, even though the same test is used for both
purposes.

2. The purpose of measuring validity
Most, but not all, tests are designed to measure skills, abilities, or characteristics that are
not directly observable. For example, Scholastic Aptitude Test (SAT) scores are intended to
reflect developed critical reading, writing, and math skills. The SAT score a test taker
receives is not a direct measure of
critical reading ability, any more than degrees Celsius are a direct measure of the heat of
an object. An examinee's level of critical reading ability development must be inferred
from his or her SAT Critical Reading score. The process of using test scores as a sample
of behavior to draw inferences about a larger domain of behavior is characteristic of most
educational and psychological tests. Responsible test developers and publishers must be
able to demonstrate that it is possible to use the sample of behavior measured by the test
to make valid inferences about the examinee's ability to perform tasks that represent the
broader domain of interest.

Understanding the nature of validity
At its core, validity is the degree to which a test
measures what it claims to measure. It is the strength of our inferences, conclusions and
suggestions about a student's knowledge and skills based on their assessment results.
Think of validity as the "truth" of a test score. For educators and students alike, the
implications of validity are far-reaching, affecting everything from curriculum design to
student progress.

Content validity: Ensuring alignment with learning objectives


Content validity is fundamental to the development of assessments. It looks at whether the content
of an assessment aligns with the specific instructional objectives it is supposed to measure. For
example, a math test on algebra should cover a representative range of algebraic concepts and not
stray into unrelated areas like geometry. Here’s how content validity impacts the reliability of
assessments:
• Curriculum representation: The content of the assessment should mirror the curriculum or
the subject matter being taught.
• Relevance of items: Each item on the assessment must be relevant to the learning objectives
and contribute meaningfully to measuring the student’s knowledge.
• Breadth and depth: A valid assessment covers the necessary breadth (range of topics) and
depth (level of difficulty) appropriate to the learners’ level.
Criterion-related validity: Predictive and concurrent measurement
How well does an assessment forecast a student’s future performance or correlate with their current
status? That’s what criterion-related validity is all about. It’s split into two types:
• Predictive validity: This aspect of validity assesses the effectiveness of an evaluation tool
in predicting a student’s future performance. For instance, the SAT exam’s predictive validity is
gauged by how well it forecasts a student’s success in college.
• Concurrent validity: Concurrent validity measures how well a new test compares to a well-
established assessment. A new English language proficiency test, for example, would be measured
against a recognized standard to establish its concurrent validity.
Construct validity: The theoretical foundation
Construct validity is perhaps the most complex form of validity. It’s concerned with how well a
test aligns with the underlying theories and constructs that it’s supposed to measure. A construct
can be anything from intelligence to motivation or creativity—intangible qualities that are not
directly observable but are inferred from behavior or responses.
• Operationalization: This involves defining the constructs in measurable terms. How we
define ‘creativity’ will shape the kind of assessment items we choose to measure it.
• Multi-faceted evidence: Often, construct validity requires multiple lines of evidence,
including correlational studies, factor analysis, or experimental manipulation, to establish the link
between the test scores and the construct.

Factorial validity: Dissecting the structural composition


Factorial validity digs into the structure of the test itself, examining whether the test’s components
reflect the diversity within the construct. It’s closely tied to factor analysis, a statistical method
that explores patterns among test items to see if they group together as expected.
• Dimensionality: Tests are often designed to measure multiple dimensions of a construct.
Factorial validity assesses whether these dimensions emerge clearly from the test items.
• Construct differentiation: A valid test should not only measure the intended construct but
also differentiate it from other constructs. For example, a math test should measure mathematical
reasoning, not reading ability.

The cumulative impact of validity on education


Validity is not just an isolated statistic or a box to check off during test development. It’s a dynamic
quality that permeates every aspect of educational assessment. Consider its broader
implications:
• Instructional improvement: Valid assessments provide teachers with accurate data to tailor
their instruction and support student learning more effectively.
• Policy and decision-making: High-stakes decisions about student promotion, graduation, or
admissions rely heavily on the validity of assessments.
• Equity and fairness: Valid assessments help ensure that all students are evaluated based on
relevant knowledge and skills, promoting fairness in educational opportunities.

Striving for validity: A continuous process


Establishing validity is not a one-time task; it’s an ongoing process that requires diligent attention
from educators, test developers, and researchers. New curricula, evolving educational standards,
and changing student demographics all necessitate regular reviews and updates of assessment
tools.

Conclusion
Validity in educational assessments is a topic rich in depth and significance. It’s the linchpin that
guarantees assessments do more than just generate scores; they provide meaningful insights into
student learning. As we’ve explored, the various facets of validity—content, criterion-related,
construct, and factorial—each play a vital role in creating a holistic picture of a student’s
knowledge and abilities.
--------------------------------------------
QUESTION NO. 2
--------------------------------------------
Discuss general considerations in constructing essay-type test items with suitable examples.
--------------------------------------------
ANSWER
--------------------------------------------
General Considerations in Constructing Essay-type Test Items
Robert L. Ebel and David A. Frisbie (1991) write in their book that teachers often wish to
measure students' ability to think and to use knowledge, not merely the knowledge their
students possess. In these cases, tests that give students some degree of freedom in their
answers are necessary, and essay tests are tailored for this purpose. The student writes an
answer to the question in one or more paragraphs, up to the range of several pages.
Essays can be used for higher-level learning outcomes such as synthesis or evaluation, as
well as for lower-level outcomes. They require students to supply an answer rather than
choose one; usually students compose the answer in one or more sentences. Essay tests
allow students to demonstrate their ability to recall, organize, synthesize, connect,
analyze, and evaluate ideas.
Types of Essay Tests
Essay tests can be divided into many types. W.S. Monree and R.I. Cater (1993) divide
essay tests into many categories, such as:
• selective recall, with the basis of evaluation given
• comparison of two things on a single designated basis
• comparison of two things in general
• decision, for or against
• cause and effect
• explanation of the use or exact meaning of a word, phrase, or statement
• summary of a unit of a textbook or an article
• analysis
• statement of relationships
• illustration or example
• classification
• application of rules, laws, or principles to new situations
• discussion
• statement of the author's intent in the selection or organization of material
• criticism of the adequacy, correctness, or relevance of a printed statement or of a
classmate's answer to a class question
• reorganization of facts
• formulation of new questions, problems raised, and new methods of procedure, etc.
Types of Constructed Response Items
Essay items can vary from very long, open-ended term papers or take-home tests that
have flexible page limits (e.g., 10-12 pages, no more than 30 pages, etc.) to limited-
response essays of one page or less. Thus, essay-type items are of two types:
• Limited Response Essay Items
• Extended Response Essay Items

I. Limited Response Essay Items


An essay item that presents a specific problem for which the student must recall relevant
information, organize it in an appropriate way, draw a defensible conclusion, and express
it within the limits of the problem posed or within a page or time limit is called a limited
response essay item. The problem statement specifies the response constraints that guide
the student in answering and provides the evaluation criteria for scoring.
Example 1:
List the main similarities and differences in the lives of people living in Islamabad and
Faisalabad.
Example 2:
Compare the advantages and disadvantages of the lecture teaching method and the
demonstration teaching method.
When should limited response essay items be used?
Limited response essay items are usually used to:
• Compare and contrast the positions
• State the necessary assumptions
• Identify appropriate conclusions
• Explain the cause and effect relationship
• Organize data to support display
• Evaluate the quality and value of an item or action
• Integrate data from multiple sources
II. Extended Response Essay Items
An essay-type item that allows the student to determine the length and complexity of the
answer is called an extended response essay item. This type of essay is most useful at the
synthesis and evaluation levels of the cognitive domain, when we are interested in
determining whether students can organize, integrate, express, and evaluate information
and ideas.
Example:
Identify as many different ways of generating electricity in Pakistan as possible, and give
the advantages and disadvantages of each. Your answer will be graded according to its
accuracy, understanding, and practical skill. Your answer should be 8-10 pages long and
will be evaluated according to the RUBRIC (scoring criteria) already provided.
Suggestions for Writing Essay Type Items
I. Ask or set tasks that will require the student to demonstrate mastery of basic
knowledge. This means that students should not be asked to simply reproduce
material they have heard in a lecture or read in a textbook. "Demonstrate command"
requires that the question be somewhat new or novel. The gist of the question
should be basic knowledge rather than trivia which might make a good board game
question.
II. Ask questions that are specific in the sense that experts (peers in the field) can
agree that one answer is better than another. Questions that include phrases like
"What do you think..." or "What is your opinion about..." are vague. They can be
used as a medium for assessing writing skills, but because they do not have a clear
right or wrong answer, they are useless for measuring other aspects of
achievement.
III. Define the examinee's task as completely and specifically as possible without
interfering with the measurement process itself. An essay item can be worded so
precisely that there is only one very short answer to it. Imposing such strict limits
on response is more restrictive than helpful. However, examinees need a guide to
judge how extensive their answer needs to be to be considered complete and
accurate.
IV. In general, prefer specific questions that can be answered concisely. The more
questions that are used, the better the test designer can sample the knowledge
area covered by the test. And the more responses available for evaluation, the more
accurate the overall test results are likely to be. In addition, short answers can be
scored more quickly and accurately than long extended answers, even when there
are fewer of the latter type.
V. Use enough items to adequately sample the relevant content domain, but not so
many that students do not have enough time to plan, develop, and revise their
responses. Some instructors use essay tests rather than one of the objective types
because they want to encourage and provide practice in writing. However, when
the time pressure becomes great, the essay test is one of the most unrealistic and
negative writing experiences that students can be exposed to. There is often no
time to edit, re-read, or spell-check. There is little time for planning, let alone time for
careful writing and revision. There are few, if any, real writing
assignments that require such conditions. And few writing experiences discourage
the use of proper writing habits as much as essay testing.
VI. Do not give examinees a choice between optional questions unless special
circumstances require it. Using optional items destroys strict comparability between
student scores because not all students actually take the same test. Student A
could answer points 1-3 and Student B could answer points 3-5. Under these
circumstances, score variability is likely to be relatively small because students
were able to respond to items they knew more about and ignore items they did not.
This reduced variability contributes to reduced test score reliability. This means that
we are less able to identify individual differences in outcomes when test scores form
a very homogeneous distribution. In summary, optional items limit the comparability
of scores across students and contribute to low score reliability due to reduced test
score variability.
VII. Test the question by writing the ideal answer. Ultimately, an ideal response is
needed to evaluate the responses. If prepared in time, it allows for review of the
wording of the item, the level of completeness required for an ideal response, and
the amount of time required to provide an appropriate response. It even allows the
author of the item to determine if there is any "correct" answer to the question.
VIII. Specify a time allowance for each item and/or specify a maximum number of points
that can be earned for the "best" answer to the question. Both pieces of information
give the examinee a clue as to the depth of response expected by the item writer.
They also represent legitimate information that the student can use to decide which
of several items should be skipped when time runs out. The number of points
attached to an item often reflects the number of parts necessary for an ideal
response. Of course, if a certain number of essential parts can be determined, that
number should be given as part of the question.
IX. Break the question into separate parts if there are obvious multiple questions or
parts of intended answers. The use of sections helps investigators organize and
thus streamlines the process. It also facilitates the assessment process as it
encourages the organization of answers. Finally, if multiple questions are not
specified, some examinees may inadvertently skip sections, especially when time
is tight.

--------------------------------------------
QUESTION NO. 3
--------------------------------------------
Write a note on the uses of measurement scales for students' learning assessment.
--------------------------------------------
ANSWER
--------------------------------------------
Introduction of Measurement Scales
All types of research data, test results, survey data, etc. are called raw data and are
collected using four basic scales. Nominal, ordinal, interval, and ratio are the four basic
scales for data collection. Ratio is more sophisticated than interval, interval is more
sophisticated than ordinal, and ordinal is more sophisticated than nominal. A variable
measured on a "nominal" scale is a variable that does not actually have any rating
resolution. One value really isn't greater than the other. A good example of a nominal
variable is gender. For nominal variables, there is a qualitative difference between values,
not a quantitative one. Something measured on an "ordinal" scale has an evaluative
connotation: one value is greater or better than another. With ordinal scales, we only
know that one value is better than another, for example that 10 is better than 9. A variable
measured on an interval or ratio scale has maximum evaluative resolution. After data
collection, there are three basic ways to compare and interpret the results obtained from
the responses. Student performance can be compared and interpreted against an
absolute standard, a criterion-referenced standard, or a norm-referenced standard. Some
examples from everyday life and educational contexts can make this clear:
Sr. No. 1 - Absolute standard
Characteristics: simply states the observed outcome.
Daily life: He is 6 feet 2 inches tall.
Educational context: He spelled correctly 45 out of 50 English words.

Sr. No. 2 - Criterion-referenced standard
Characteristics: compares the person's performance with a standard, or criterion.
Daily life: He is tall enough to catch the branch of this tree.
Educational context: His score of 40 out of 50 is greater than the minimum cutoff point of
33, so he must be promoted to the next class.

Sr. No. 3 - Norm-referenced standard
Characteristics: compares a person's performance with that of other people in the same
context.
Daily life: He is the third fastest bowler in the Pakistani squad of 15.
Educational context: His score of 37 out of 50 was not very good; 65% of his class fellows
did better.

All three types of score interpretation are useful depending on the purpose for which the
comparisons were made. An absolute score merely describes a measure of performance
or success without comparing it to any set or specified standard. Scores aren't particularly
useful without some kind of comparison. Criterion scores compare test performance
against a specific standard; such a comparison allows the test interpreter to decide
whether the score is satisfactory according to established standards. Norm-referenced
tests compare test performance with the performance of others who have been measured
using the same procedure. Teachers are usually more interested in how children compare
to a useful standard than how they compare to other children; however, a comparison
with reference to norms can also provide useful insights.

Using Measurement Scales

Measurement is assigning numbers to objects or events in a systematic way.

Measurement scales are critical because they determine the types of statistics you can use
to analyze data. An easy way to get a paper rejected is to use either the wrong
combination of measures and statistics, or to use a low-powered statistic on a high-
powered data set. The following four scale levels are commonly distinguished so that the
proper analysis can be applied to the data.

1. Nominal scale.

Nominal scales are the lowest level of measurement. A nominal scale, as the name
suggests, simply places data into categories, without any order or structure. You can only
examine whether a reading on the nominal scale equals some particular value, or count
the number of occurrences of each value; for example, categorizing the blood types of
classmates as A, B, AB, or O. The only mathematical operation we can perform with
nominal data is counting. Variables assessed on a nominal scale are called categorical
variables; categorical data are measured on nominal scales that merely assign labels to
distinguish categories. For example, gender is a nominal scale variable, and classification
of persons by gender is a common application of the nominal scale.

Nominal data

• Classification or sorting of data, e.g. male or female
• No ordering, e.g. it makes no sense to say that male is "greater than" female (M > F), etc.
• Arbitrary labels, e.g. pass = 1 and fail = 2, etc.

2. Ordinal scale.

Something measured on an "ordinal" scale has an evaluative connotation. You can also
examine whether one ordinal-scale value is less than or greater than another value. For
example, job satisfaction may be rated on a scale of 1 to 10, with 10 being complete
satisfaction. With ordinal scales, we only know that 2 is better than 1 or 10 is better than
9; we don't know by how much, and the difference may vary. So you can "rank" ordinal
data, but you can't "quantify" the differences between two ordinal values. The properties
of the nominal scale are included in the ordinal scale.

Ordinal data
• Ordered, but the differences between the values are not important; the differences
between the values may or may not be equal.

• e.g. political parties in the spectrum from left to right marked 0, 1, 2

• e.g. Likert scales, rank your level of satisfaction on a scale from 1 to 5

• e.g. rating of restaurants

3. Interval Scale

When an ordinal scale also has quantifiable differences between values, it becomes an
interval scale. You can quantify the difference between two interval-scale values, but
there is no natural zero. A variable measured on an interval scale provides as much or
more information than an ordinal scale, and interval variables have the same distance
between adjacent values: the distance between 1 and 2 is equal to the distance between
9 and 10. For example, temperature scales are interval data, with 25 °C warmer than
20 °C, and a difference of 5 °C has a physical meaning. Note that 0 °C is arbitrary, so it
does not make sense to say that 20 °C is twice as hot as 10 °C, but there is exactly the
same difference between 100 °C and 90 °C as there is between 42 °C and 32 °C. Student
results are measured on an interval scale.

Interval Data

• Ordered, constant scale, but no natural zero
• Differences make sense, but ratios do not (e.g. 30° − 20° = 20° − 10°, but 20° is not
twice as hot as 10°)
• e.g. temperature (°C, °F), calendar dates

4. Ratio Scale

Something measured on a ratio scale has the same properties as an interval scale, except
that on a ratio scale there is an absolute zero. An example is temperature measured in
kelvin: no value below 0 kelvin is possible; it is absolute zero. Physical measurements of
height, weight, and length are typically ratio variables. Weight is another example: 0 lb
represents a meaningful absence of weight. Ratios remain meaningful regardless of the
unit in which the object is measured (e.g. meters or yards), because there is a natural zero.

Ratio data
• Ordered, constant scale, natural zero
• e.g. height, weight, age, length
Nominal, ordinal, interval, and ratio can be thought of as ordered in relation to each other.
Ratio is more sophisticated than interval, interval is more sophisticated than ordinal, and
ordinal is more sophisticated than nominal.
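As an illustrative sketch of how the scale level limits the statistics that make sense, the
following Python snippet (the example data are invented purely for illustration) applies an
appropriate summary at each level:

from statistics import mode, median, mean

# Nominal: labels only, so counting (and the mode) is the only meaningful summary.
blood_types = ["A", "B", "AB", "O", "O", "A", "O"]
print("Most common blood type:", mode(blood_types))

# Ordinal: order matters, so the median is meaningful, but differences are not.
satisfaction = [1, 3, 4, 4, 5, 2, 3]          # ratings on a 1-5 Likert scale
print("Median satisfaction:", median(satisfaction))

# Interval: equal distances, so means and differences make sense (no true zero).
temperatures_c = [20, 25, 22, 18]
print("Mean temperature (deg C):", mean(temperatures_c))

# Ratio: a true zero exists, so ratios are also meaningful.
weights_kg = [50, 60, 75]
print("Heaviest / lightest:", max(weights_kg) / min(weights_kg))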

--------------------------------------------

QUESTION NO. 4
--------------------------------------------
Explain measures of variability with suitable examples.

--------------------------------------------
ANSWER
--------------------------------------------
Measures of Variability

Variability refers to the extent to which scores in a distribution differ from each other. An
equivalent definition (which is mathematically easier to work with) says that variability
refers to the extent to which scores in a distribution differ from their mean. If a distribution
lacks variability, we can say it is homogeneous (note that the opposite would be
heterogeneous).
We now discuss four measures of variability: range, mean (or average) deviation,
variance, and standard deviation.

1. Range

The range is probably the easiest measure of the variability of a sample: it is the
difference between the largest (maximum/highest) and smallest (minimum/lowest)
observation.
Range = Highest value − Lowest value, i.e. R = XH − XL

Example:
The range of Saleem's four test scores (3, 5, 5, 7) is:
XH = 7 and XL = 3
Therefore R = XH − XL = 7 − 3 = 4

Example
Consider an example in which the results of two different classes are:
Class 1: 80%, 80%, 80%, 80%, 80%
Class 2: 60%, 70%, 80%, 90%, 100%
The range of measurements in Class 1 is 0, and the range in Class 2 is 40%. Simply
knowing that fact gives a much better understanding of the data obtained from the two
classes. In Class 1 the mean was 80% and the range was 0, but in Class 2 the mean was
80% and the range was 40%. The relationship between range and variability can be
shown graphically: in such a figure, Distribution A has a larger range (and more
variability) than Distribution B.


Because only the two extreme scores are used in computing the range, however, it is a
crude measure. For example, in a figure comparing two such distributions, the range of
Distributions A and B may be the same even though Distribution A has more variability.
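As a minimal sketch (in Python, using the scores quoted above), the range is simply the
difference between the highest and lowest observations:

def score_range(scores):
    # R = XH - XL: the difference between the highest and lowest observation
    return max(scores) - min(scores)

saleem = [3, 5, 5, 7]
class_1 = [80, 80, 80, 80, 80]
class_2 = [60, 70, 80, 90, 100]

print(score_range(saleem))   # 4
print(score_range(class_1))  # 0
print(score_range(class_2))  # 40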

Co-efficient of Range
It is a relative measure of dispersion based on the value of the range. It is also called the
range co-efficient of dispersion. It is defined as:
Co-efficient of Range = (XH − XL) / (XH + XL)
Let us take two sets of observations. Set A contains the marks of five students in
Mathematics out of 25 marks, and set B contains the marks of the same students in
English out of 100 marks.
Set A: 10, 15, 18, 20, 20
Set B: 30, 35, 40, 45, 50
The values of range and co-efficient of range are calculated as:

Set A (Mathematics): Range = 20 − 10 = 10; Co-efficient of Range = 10 / 30 = 0.33
Set B (English): Range = 50 − 30 = 20; Co-efficient of Range = 20 / 80 = 0.25

Set A has a range of 10 and set B has a range of 20. Set B appears to have more variance.
But that's not true. Range 20 in set B is for large observations and range 10 in set A is for
small observations. So 20 and 10 cannot be directly compared. Their base is not the
same. The math marks are out of 25 and the English marks are out of 100. So it doesn't
make sense to compare 10 to 20. When we convert these two values into a range
coefficient, we see that the co-efficient of range for set A is larger than that of set B.
Therefore, set A shows greater relative variation: the students' English marks are more
consistent than their Mathematics marks.
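The same comparison can be sketched in Python (the marks are those of sets A and B
above); because the co-efficient of range is unit-free, the two sets become directly
comparable:

def coefficient_of_range(scores):
    # (XH - XL) / (XH + XL): a relative, unit-free measure of dispersion
    xh, xl = max(scores), min(scores)
    return (xh - xl) / (xh + xl)

set_a = [10, 15, 18, 20, 20]   # Mathematics marks out of 25
set_b = [30, 35, 40, 45, 50]   # English marks out of 100

print(round(coefficient_of_range(set_a), 2))  # 0.33
print(round(coefficient_of_range(set_b), 2))  # 0.25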
2. Mean Deviation

The deviation of a score is its difference from the mean, and variability is the extent to
which scores differ from their mean, so adding all the deviations and dividing by their
number should give a measure of variability. The problem is that the sum of the
deviations about the mean is always zero. However, taking the absolute value of each
deviation before summing eliminates this problem. Thus the mean deviation (M.D.) is
given as follows:

M.D. = Σ|X − X̄| / N

For sample data in which the suitable average is the sample mean X̄, the mean deviation
is given by the relation:

M.D. = Σ|X − X̄| / n

For a frequency distribution, the mean deviation is given by:

M.D. = Σ f|X − X̄| / Σ f
Example:
Calculate the mean deviation from the arithmetic mean for the marks obtained by the nine
students given below, and show that the mean deviation from the median is a minimum.
Marks (out of 25): 7, 4, 10, 9, 15, 12, 7, 9, 7

Solution:
After arranging the observations in ascending order, we get:
Marks: 4, 7, 7, 7, 9, 9, 10, 12, 15

The arithmetic mean is X̄ = ΣX / n = 80 / 9 = 8.89.

Marks (X)    |X − X̄|
4            4.89
7            1.89
7            1.89
7            1.89
9            0.11
9            0.11
10           1.11
12           3.11
15           6.11
Total        21.11

M.D. = Σ|X − X̄| / n = 21.11 / 9 = 2.35
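A short Python sketch of the same calculation (using the nine marks above) is:

marks = [4, 7, 7, 7, 9, 9, 10, 12, 15]

mean_marks = sum(marks) / len(marks)               # 80 / 9 = 8.89
abs_deviations = [abs(x - mean_marks) for x in marks]
mean_deviation = sum(abs_deviations) / len(marks)  # 21.11 / 9 = 2.35

print(round(mean_marks, 2), round(mean_deviation, 2))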

3. Variance
Variance is another absolute measure of dispersion. It is defined as the average of the
squared difference between each of the observations in a set of data and the mean.
For sample data the variance is denoted by S² and the population variance is denoted by
σ² (sigma squared).

That is:

S² = Σ(X − X̄)² / n

Thus another name for the variance is the Mean of the Squared Deviations About the
Mean (or more simply, the Mean of Squares, MS). The problem with the MS is that its
units are squared, so it represents an area (squared units) rather than a distance on the
X axis like the other measures of variability.

Example: Calculate the variance for the following sample data: 2, 4, 8, 6, 10, and 12.

Solution:

X      (X − X̄)²
2      (2 − 7)² = 25
4      (4 − 7)² = 9
8      (8 − 7)² = 1
6      (6 − 7)² = 1
10     (10 − 7)² = 9
12     (12 − 7)² = 25
ΣX = 42    Σ(X − X̄)² = 70

X̄ = ΣX / n = 42 / 6 = 7

S² = Σ(X − X̄)² / n = 70 / 6 = 11.67

Variance = S² = 11.67
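The same computation can be sketched in Python (dividing by n, as in the worked example
above):

data = [2, 4, 8, 6, 10, 12]

mean_x = sum(data) / len(data)                       # 42 / 6 = 7
squared_deviations = [(x - mean_x) ** 2 for x in data]
variance = sum(squared_deviations) / len(data)       # 70 / 6 = 11.67

print(round(variance, 2))  # 11.67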

4. Standard Deviation

The standard deviation is defined as the positive square root of the mean of the squared
deviations taken from the arithmetic mean of the data.

A simple solution to the problem of the MS representing an area is to compute its square
root. That is:

S = √( Σ(X − X̄)² / n )

Since the standard deviation can be very small, it is usually reported to two or three more
decimal places than the original data. The standard deviation is in the same units as the
original observations: if the original observations are in grams, the value of the standard
deviation will also be in grams. The standard deviation plays a dominant role in the study
of variation in data. It is a very widely used measure of variability and stands like a tower
among the measures of dispersion. Among the most important statistical tools, the first is
the mean and the second is the standard deviation. It is based on all observations, is
amenable to further mathematical treatment, and is of great importance for data analysis
and for various statistical inferences.

Properties of the Variance & Standard Deviation:


1. Are always positive (or zero).
2. Equal zero when all scores are identical (i.e., there is no variability).
3. Like the mean, they are sensitive to all scores.
Example: from the previous example,

Variance = S² = 11.67

Therefore SD = S = √S² = √11.67 ≈ 3.42
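Continuing the sketch, the standard deviation is just the square root of the variance
computed above:

import math

data = [2, 4, 8, 6, 10, 12]
mean_x = sum(data) / len(data)
variance = sum((x - mean_x) ** 2 for x in data) / len(data)
std_dev = math.sqrt(variance)          # back in the original units of the data

print(round(variance, 2), round(std_dev, 2))  # 11.67 3.42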

5. Estimation
Estimation is the goal of inferential statistics. We use sample values to estimate population
values. The symbols are as follows:

Measure                Sample    Population
Mean                   X̄         µ
Variance               S²        σ²
Standard Deviation     S         σ

It is important that the sample values (estimators) be unbiased. An unbiased estimator of a
parameter is one whose average over all possible random samples of a given size equals the
value of the parameter.

While X̄ is an unbiased estimator of µ, S² (with N in the denominator) is not an unbiased
estimator of σ².

In order to make it an unbiased estimator, we use N − 1 in the denominator of the formula
rather than just N. Thus:

s² = Σ(X − X̄)² / (N − 1)
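The effect of the N − 1 correction is easy to see in a small Python sketch: dividing by N
describes this particular sample, while dividing by N − 1 gives the unbiased estimate of the
population variance.

data = [2, 4, 8, 6, 10, 12]
n = len(data)
mean_x = sum(data) / n
ss = sum((x - mean_x) ** 2 for x in data)   # sum of squared deviations = 70

biased_variance = ss / n            # 70 / 6 = 11.67 (describes the sample itself)
unbiased_variance = ss / (n - 1)    # 70 / 5 = 14.00 (estimates the population variance)

print(round(biased_variance, 2), round(unbiased_variance, 2))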

Overall Example

Consider a possibility for the scores that go with these distributions:


Distribution      A        B
Data              150      150
                  145      110
                  100      100
                  100      100
                  55       90
                  50       50
ΣX                600      600
N                 6        6
Mean (X̄)          100      100
Range             150 − 50 + 1 = 101    150 − 50 + 1 = 101

Note that the central tendency and range of the two distributions are the same. That
is, the mean, median, and mode are all 100 for both distributions, and the range is
101 for both distributions. However, while distributions A and B have the same
measures of central tendency and the same range, they differ in their variability.
Distribution A has more. Let's prove this by calculating the standard deviation in
each case. First, for Distribution A:

A        X̄        X − X̄     (X − X̄)²
150      100      50        2500
145      100      45        2025
100      100      0         0
100      100      0         0
55       100      −45       2025
50       100      −50       2500
ΣX = 600          Σ(X − X̄) = 0    Σ(X − X̄)² = 9050
N = 6

Plugging the appropriate values into the defining formula gives:

s²A = Σ(X − X̄)² / (N − 1) = 9050 / 5 = 1810
sA = √1810 ≈ 42.54

Note that calculating the variance and standard deviation in this manner requires computing
the mean and subtracting it from each score. Since this is not very efficient and can be less
accurate as a result of rounding error, a computational formula is typically used. It is given
as follows:

s² = ( ΣX² − (ΣX)² / N ) / (N − 1)

Redoing the computations for Distribution A in this manner gives:

A        X²
150      22500
145      21025
100      10000
100      10000
55       3025
50       2500
ΣX = 600    ΣX² = 69050
N = 6

Then, plugging the appropriate values into the computational formula gives:

s²A = ( 69050 − (600)² / 6 ) / (6 − 1) = ( 69050 − 60000 ) / 5 = 9050 / 5 = 1810
sA = √1810 ≈ 42.54

Note that the defining and computational formulas give the same result, but the
computational formula is easier to work with (and potentially more accurate due to less
rounding error).

Doing the same calculations for Distribution B yields:

B        X²
150      22500
110      12100
100      10000
100      10000
90       8100
50       2500
ΣX = 600    ΣX² = 65200
N = 6

Then, plugging the appropriate values into the computational formula gives:

s²B = ( 65200 − (600)² / 6 ) / (6 − 1) = ( 65200 − 60000 ) / 5 = 5200 / 5 = 1040
sB = √1040 ≈ 32.25

Since sA ≈ 42.54 is larger than sB ≈ 32.25, Distribution A has more variability, even though
the two distributions have the same mean and the same range.
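The whole comparison can be reproduced with a short Python sketch that applies both the
defining formula and the computational formula (with N − 1 in the denominator, as in the
worked example above):

import math

dist_a = [150, 145, 100, 100, 55, 50]
dist_b = [150, 110, 100, 100, 90, 50]

def variance_defining(data):
    # s^2 = sum((X - mean)^2) / (N - 1)
    mean_x = sum(data) / len(data)
    return sum((x - mean_x) ** 2 for x in data) / (len(data) - 1)

def variance_computational(data):
    # s^2 = (sum(X^2) - (sum(X))^2 / N) / (N - 1)
    n = len(data)
    return (sum(x ** 2 for x in data) - sum(data) ** 2 / n) / (n - 1)

for name, data in [("A", dist_a), ("B", dist_b)]:
    v_def = variance_defining(data)
    v_comp = variance_computational(data)
    print(name, v_def, v_comp, round(math.sqrt(v_def), 2))
# A: 1810.0 1810.0 42.54    B: 1040.0 1040.0 32.25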

--------------------------------------------
QUESTION NO. 5
--------------------------------------------
Discuss functions of test scores and progress reports in detail.
--------------------------------------------
ANSWER
--------------------------------------------
Functions of Test Scores and Progress Reports

The task of marking and reporting on student progress cannot be separated from the
procedures adopted in assessing student learning. When learning objectives are well
defined in terms of behavior or performance, and appropriate tests and other assessment
procedures are properly used, marking and reporting becomes a matter of summarizing
the results and presenting them in an understandable form. Reporting student progress
is difficult, especially when the data is represented by a single letter or numerical value
system (Linn & Gronlund, 2000). Assessments and referrals are decisions that require
information about individual students. In contrast, curricular and instructional decisions
require information about groups of students, often entire classes or schools (Linn &
Gronlund, 2000). There are three main purposes of student assessment. First, grades
are the primary currency of exchange for the many opportunities and rewards our
society offers. Grades can be exchanged for various benefits such as adult approval,
public recognition, admission to colleges and universities, etc. To deprive students of
grades is to deprive them of rewards and opportunities. Second, teachers become
accustomed to assessing their students' learning, and if teachers do not assess, students
may not be well aware of their learning progress. Third, students are motivated by
grading: grades can serve as incentives, and for many students, incentives serve a
motivational function. Below are the various functions of grading and reporting systems:
1. Use in teaching
The focus of assessment and reporting should be on the student's improvement in
learning. This is most likely to occur when the report:
a) clarifies the learning objectives;
b) indicates the student's strengths and weaknesses in learning;
c) provides information about the pupil's personal and social development; and
d) contributes to student motivation.
Improving student learning is probably best achieved through daily assessment of
learning and feedback from tests and other assessment practices. A portfolio of work
produced during the academic year can be reviewed to regularly highlight a student's
strengths and weaknesses. Regular progress reports can help motivate students by
providing short-term goals and knowledge of results. Both are essential features of basic
learning. Well-designed progress reports can also assist in the evaluation of teaching
practices by identifying areas for revision. When the majority of student reports indicate
poor progress, this may indicate a need to adjust the learning objectives.
2. Feedback for students
Grading and communicating test results to students is a long-standing practice in
educational institutions all over the world. The mechanism or strategy
may vary from country to country or institution, but every institution follows this procedure
in some way. Reporting test scores to students has a number of benefits for them. As
students progress to higher grades, the usefulness of test scores for personal academic
planning and self-evaluation increases. For most students, scores provide feedback on
how much they know and how effective their learning has been. They may know their
strengths and areas that require special attention. Such feedback is necessary if students
are expected to be partners in managing their own instructional time and effort. These
results help them make the right decisions for their future professional development.
Teachers use a variety of strategies to help students become independent learners who
are able to take increasing responsibility for their own academic progress. Self-
assessment is an important aspect of such independent learning, and reporting test
results can be an integral part of the practices teachers use to support it. Test scores
help students identify areas for improvement, areas where significant progress has been
made, and areas where sustained high effort will help maintain a high level of
achievement. Test scores can be used with information from teacher evaluations to help
students set their own learning goals, decide how to allocate their time, and prioritize
improving skills such as reading, writing, speaking, and problem solving. When students
receive their own test results, they can learn about self-assessment while doing real self-
assessment. (Iowa Testing Programs, 2011). Assessment and reporting also provide
opportunities for students to develop an awareness of how they are growing in various
skill areas. Self-evaluation begins with self-monitoring, a skill that most children have
already begun to develop before kindergarten.
3. Administrative and Advisory Use
Assessments and progress reports serve a number of administrative functions. For
example, they are used to determine progress and graduation, award honors, determine
athletic eligibility, and report to other institutions and employers. A single letter grade is
usually required for most administrative purposes, although, of course, a single letter
cannot fully convey a student's achievement.
Counselors use grades and student achievement reports, along with other
include assessments of personal and social characteristics are also helpful in helping
students with adjustment problems.
4. Informing parents about their children's performance
Parents are often overwhelmed
by the grades and test reports they receive from school staff. In order to create a true
partnership between parents and teachers, it is essential that information about student
progress is communicated clearly, respectfully and accurately. Test results should be
provided to parents using: a) simple, clear language without educational and testing
jargon, and b) an explanation of the purpose of the tests used (Canter, 1998). Most of the
time, however, parents are ignored or only minimally involved in keeping up with their
children's progress.
To strengthen the connection between home and school, parents need to receive
comprehensive information about their children's achievements. If parents do not
understand the tests their children take, the scores they receive, and how the results are
used to make decisions about their children, they are effectively excluded from helping
their children learn and make decisions. According to Kearney (1983), the lack of
information provided to consumers about test data has far-reaching and negative
consequences. He states that individual student needs are not being met, parents are not
being fully informed about student progress, curricular needs are not being identified and
addressed, and results are not being reported to the various audiences who need to
receive this information and need to know what is being done about it. In some countries there are
prescribed policies for marking and reporting test results to parents. For example, the
Michigan Educational Assessment Policy (MEAP) is regularly revised with parent
suggestions and feedback in mind. The MEAP consists of criterion-referenced tests,
primarily in math and reading, administered annually to all students in grades four, seven,
and ten. MEAP recommends that policymakers at the state and local levels develop
strong linkages to establish, implement, and monitor effective reporting practices.
(Barber, Paris, Evans, & Gadsden, 1992). Without a doubt, it is more effective to talk to
parents about their children's scores than to send them a results report home for them to
interpret for themselves. For a variety of reasons, parent-teacher conferences or parent-
student-teacher conferences offer an excellent opportunity for teachers to provide and
interpret these results to parents.
1. Teachers tend to be more informed than parents about tests and the types of scores
interpreted.
2. Teachers can make numerous observations of their students' work and then document
the results. Discrepancies between test scores and classroom performance can be noted
and discussed.
3. Teachers are provided with samples of work that can be used to illustrate the type of
class work the student has been doing. Portfolios can be used to illustrate strengths and
explain where improvement is needed.
4. Teachers may be aware of special circumstances that may affect scores, either
positively or negatively, and skew the level of student achievement.
5. Parents have the opportunity to ask questions about points of misunderstanding or
about how they can help. Students and teachers should communicate test results to parents at school
while addressing apparent weaknesses and building on strengths wherever possible.
(Iowa Testing Program, 2011). Under the 1998 Act, schools are required to regularly
assess students and regularly inform parents of assessment results; more specifically, the
NCCA guidelines recommend that schools report to parents twice a year - once at the
end of Term 1 or at the start of Term 2, and again at the end of the school year.
Under existing data protection legislation, parents have a legal right to obtain the scores
their children have obtained on standardized tests. The NCCA developed a set of report
card templates for schools to use when communicating with parents, which were adopted
in conjunction with Circular 0138 issued by the Department of Education in 2006. A case
study conducted in the US context (www.uscharterschools.org) found that “a school
should be a resource for parents, not dictate to parents what their role should
be." In other words, the school should respect all parents and value the experiences and
individual strengths they offer their children.
