Module 3: Principles of Psychological Testing: Central Luzon State University
Overview
I. Objectives
At the end of this lesson, you should be able to:
1. Describe and identify the different levels of measurement.
2. Summarize test scores using frequency distributions.
3. Describe the characteristics of the normal curve as well as skewed, peaked, and
bimodal distributions.
4. Describe the purpose and calculate measures of central tendency, measures of
variability, and measures of relationship.
5. Convert raw test scores into more meaningful units.
6. Describe norm-based interpretation and the different types of norms.
Note: If the variable has only two values, it is referred to as a dichotomous variable.
2. Ordinal
3. Interval
• Measurements are not only classified and ordered, but the distances between
adjacent points on the scale are equal
• Zero is arbitrary
Because zero is arbitrary, a temperature of 50 degrees Celsius is not 5 times hotter than
10 degrees Celsius
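The point about the arbitrary zero can be checked with a little arithmetic. The sketch below is only an illustration: it brings in the Kelvin scale (not mentioned in this module), which has a true zero, to show that the apparent "5 times" ratio disappears once the zero point is meaningful. The function name is hypothetical.

```python
def celsius_to_kelvin(celsius):
    """Convert a Celsius temperature to Kelvin, a scale with a true zero."""
    return celsius + 273.15


# Differences are meaningful on an interval scale.
difference = 50 - 10

# The ratio of Celsius values is not meaningful because 0 degrees Celsius is arbitrary.
naive_ratio = 50 / 10

# On a scale with a true zero, the same two temperatures have a very different ratio.
kelvin_ratio = celsius_to_kelvin(50) / celsius_to_kelvin(10)

print(difference, naive_ratio, round(kelvin_ratio, 2))  # 40 5.0 1.14
```

The difference of 40 degrees is the same on both scales, but only the Kelvin ratio says anything about "how many times hotter".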
4. Ratio
Examples: Height
Weight
Number of students
Exam Marks
Interval/Ratio
Example:
Age
Income
Note: We can compare values in terms of which is larger or smaller and in terms of how
much larger or smaller one is compared with another.
Note: In statistical practice, ratio variables are subjected to operations that treat them
as interval and ignore their ratio properties
Discrete vs Continuous
A discrete variable is one that cannot take on all values within the limits of the
variable. For example, responses to a five-point rating scale can only take on the values
1, 2, 3, 4, and 5. The variable cannot have the value 1.7. A variable such as a person's
height can take on any value. Variables that can take on any value and therefore are not
discrete are called continuous variables.
Why Is the Scale of Measurement Important?
1. It determines the nature of the information it provides about the test takers.
It influences our ability to use the test to compare people, to assess individual
differences. As we move from nominal to interval and ratio scales, we increase the
precision of the measurement process. Nominal scales indicate which test takers fall into
each category. Even if the categories represent relative amounts of a property such as
“below average”, “average”, and “above average” IQ scores, we can only compare people
in broad, general ways.
2. The scale of measurement used affects our ability to apply statistical techniques to the
study of test scores.
Many common statistical procedures can only be used with scores at an interval or
ratio level of measurement.
Ordinal scales allow for more specific comparisons of where people fall on different
variables, but the lack of an equal-unit scale makes it difficult to compare people at
different points on the scale. Clearly, interval and ratio scales with their equal units are
most appropriate for comparing people, for the study of individual differences.
3. It appears that test scores based on higher scales of measurement are more useful for
assessing individual differences.
For example, the statistical procedures commonly used to determine a test’s
reliability and validity require either interval or ratio scales. Likewise, the statistical
analyses conducted when tests are used in research projects frequently assume scores are
at the interval or ratio level. Although there are statistics for analyzing nominal and
ordinal data, they are not as powerful and therefore not as useful as higher-level analyses.
1. They help us describe things. Numbers provide convenient summaries and allow us
to evaluate some observations relative to others.
For example: If you get a score of 54 in an examination, you probably
want to know what the 54 means. Is it lower than the average score, or is it
the same? Knowing the answer can make your score more meaningful.
2. We can use statistics to make inferences, which are logical deductions about
things that cannot be observed directly.
For example: we do not know how many people watched a particular movie
unless we ask everyone. However, if we use scientific sample surveys, we can
make an inference about the percentage of the people who saw the film.
Raw Score- scores obtained directly from test performance whether the test is of maximal
or typical performance.
a. On maximal performance tests, the raw score usually is the number of items
answered correctly. It may also be the number of errors, the sum of points
on various items, the time taken to complete the test, or a rating.
b. On typical performance tests such as personality tests, the raw score is obtained
in a slightly different manner. Items are generally keyed to represent the
dominant response of some defined group. A person’s raw score will be the
number of items answered in the keyed direction.
Transformed Scores or Derived Scores- scores resulting from the transformation of raw
scores into other scales in order to facilitate analysis and interpretation.
Example: The most common transformation adds or subtracts a constant and/or
multiplies or divides by a constant: X’ = a + bX, where X’ is the transformed score, X is
the raw score, and a and b are constants.
This is a linear transformation because the raw scores and the transformed scores
are related in a linear manner.
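A minimal sketch of the linear transformation described above. The raw scores and the constants a = 50 and b = 2 are hypothetical values chosen only for illustration.

```python
def linear_transform(raw_scores, a, b):
    """Apply X' = a + b*X to every raw score."""
    return [a + b * x for x in raw_scores]


raw_scores = [10, 15, 20]  # hypothetical raw scores
transformed = linear_transform(raw_scores, a=50, b=2)

print(transformed)  # [70, 80, 90]
```

Because b is positive, the transformed scores keep the same order and the same relative spacing as the raw scores; only the units change.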
A single test score will mean more if we relate it to other test scores. A distribution
of scores summarizes the scores for a group of individuals.
Frequency Distribution- a simple way of displaying data from tests. It is a technique for
systematically displaying or representing scores to show how frequently each value was
obtained. We define all the possible scores and determine how many people obtain each
of these scores. This distribution can also be shown graphically by plotting the frequencies
for each score. This can be done either as a frequency polygon or a histogram.
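As a rough illustration of the procedure just described, the sketch below counts how many people obtained each possible score and prints a simple text histogram. The list of scores is hypothetical.

```python
from collections import Counter

scores = [3, 5, 4, 5, 2, 4, 5, 3, 4, 5]  # hypothetical test scores

# Count how many test takers obtained each score.
frequencies = Counter(scores)

# List every possible score from lowest to highest with its frequency.
for value in range(min(scores), max(scores) + 1):
    count = frequencies[value]
    print(f"{value}: {'*' * count} ({count})")
```

Each row of asterisks plays the role of one bar in a histogram.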
Although distributions of test scores often approximate a normal curve
(symmetrical, bell shaped), other types of distributions may be encountered.
1. Central Tendency
Refers to a value or measure near the center of the distribution which represents
the average score of the group.
Three commonly used measures of central tendency are:
a. Mean
b. Median
c. Mode
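The three measures listed above can be computed with Python's standard `statistics` module. The scores below are hypothetical and were chosen so that the three measures differ.

```python
import statistics

scores = [2, 3, 3, 3, 4, 5, 6, 7, 12]  # hypothetical test scores

mean_score = statistics.mean(scores)      # arithmetic average
median_score = statistics.median(scores)  # middle score when sorted
mode_score = statistics.mode(scores)      # most frequent score

print(mean_score, median_score, mode_score)  # 5 4 3
```

The single high score of 12 pulls the mean above the median, which is one reason more than one measure is usually reported.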
2. Variation
Refers to the extent to which scores cluster about a central value. If all scores are close
to the central value, their variation will be less than if they tend to depart more markedly
from the central value.
Example: Two score distributions may have the same mean, but in one the scores may
be closely clustered around the mean while in the other the scores may vary widely
and be dispersed. Variability indicates the dispersion of scores around a given point.
a. Range
b. Standard Deviation
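The example of two distributions with the same mean but different variability can be sketched as follows. Both score lists are hypothetical, and the population standard deviation (`statistics.pstdev`) is used here as an assumption; a sample formula (`statistics.stdev`) would give slightly larger values.

```python
import statistics

clustered = [48, 49, 50, 51, 52]  # hypothetical scores close to the mean
dispersed = [30, 40, 50, 60, 70]  # hypothetical scores spread widely


def score_range(scores):
    """Range: highest score minus lowest score."""
    return max(scores) - min(scores)


for name, scores in [("clustered", clustered), ("dispersed", dispersed)]:
    print(
        name,
        statistics.mean(scores),
        score_range(scores),
        round(statistics.pstdev(scores), 2),
    )
```

Both groups have a mean of 50, but the range and standard deviation of the dispersed group are ten times larger.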
3. Skewness
a. Positively Skewed - The larger frequencies are concentrated toward the low
end of the variable and the smaller frequencies toward the high end.
Example: If a test is difficult, scores would cluster at the low end of the scale and
tail off toward the high end.
b. Negatively skewed - The larger frequencies are concentrated toward the high end
of the scale and the smaller frequencies toward the low end.
Example: If a test is easy, the scores would cluster at the high end of the scale and
tail off toward the low end.
c. Normal Curve - The distribution is symmetrical and bell shaped, and the larger
frequencies are clustered around the average. The mean, median, and mode
coincide.
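One way to put a number on the direction of skew is a moment-based skewness coefficient (the average of the cubed standardized deviations). This formula is not given in the module and is offered only as a sketch; the score lists are hypothetical.

```python
import statistics


def skewness(scores):
    """Moment-based skewness: mean of cubed standardized deviations."""
    mean = statistics.mean(scores)
    sd = statistics.pstdev(scores)
    return sum(((x - mean) / sd) ** 3 for x in scores) / len(scores)


difficult_test = [1, 1, 2, 2, 2, 3, 3, 4, 6, 9]  # scores pile up at the low end
easy_test = [10 - x for x in difficult_test]     # mirror image: pile-up at the high end
symmetric = [1, 2, 3, 4, 5]

print(skewness(difficult_test) > 0)  # True: positively skewed
print(skewness(easy_test) < 0)       # True: negatively skewed
print(abs(skewness(symmetric)) < 1e-9)  # True: no skew
```

A positive value corresponds to a tail toward the high end, a negative value to a tail toward the low end, and a value near zero to a symmetrical distribution.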
4. Kurtosis
Norms
Kinds of Norms:
A. Developmental Norms- developmental level attained, or how far along the normal
developmental path the individual has progressed.
1. Age norms- relate level of test performance to the age of people who have taken
the test.
2. Mental Age- a child’s score on the test corresponds to the age group whose average
performance it matches.
3. Grade Norms or Grade equivalents- scores on educational achievement tests are
often interpreted in terms of grade equivalents. Grade norms are found by
computing the mean raw score obtained by children in each grade. Thus if the
average number of problems solved correctly on an arithmetic test by the fourth
graders in the standardization sample is 23, then a raw score of 23 corresponds to
a grade equivalent of 4.
4. Ordinal Scales- ordinal scales are designed to identify the stage reached by the
child in the development of specific behavior functions. The observations or scores
are expressed in order of magnitude. Numerical ranks express a “greater than”
relationship, but with no implication about how much greater.
Example: first, second, third, fourth
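The grade norm procedure described in item 3 above (computing the mean raw score for each grade) can be sketched as follows. The standardization sample is hypothetical, and assigning a raw score to the grade with the nearest mean is a simplifying assumption; published tests typically interpolate between grades.

```python
import statistics

# Hypothetical standardization sample: raw arithmetic scores by grade.
sample_by_grade = {
    3: [15, 17, 19],
    4: [21, 23, 25],
    5: [27, 29, 31],
}

# Grade norms: the mean raw score obtained by children in each grade.
grade_norms = {
    grade: statistics.mean(scores) for grade, scores in sample_by_grade.items()
}


def grade_equivalent(raw_score):
    """Return the grade whose mean raw score is closest to raw_score."""
    return min(grade_norms, key=lambda grade: abs(grade_norms[grade] - raw_score))


print(grade_norms)           # {3: 17, 4: 23, 5: 29}
print(grade_equivalent(23))  # 4
```

As in the text, a raw score of 23 matches the fourth-grade mean and so corresponds to a grade equivalent of 4.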
B. Within Group Norms- relative position within a specified group. The individual’s
performance is evaluated in terms of the performance of the most nearly comparable
standardization group, as when comparing a child’s raw score with that of children of
the same chronological age or in the same school grade.
1. Percentile- percentile scores are expressed in terms of the percentage of persons
in the standardization sample who fall below a given raw score. A percentile
indicates the individual’s relative position in the standardization sample. Percentiles can also
be regarded as ranks.
For example, if 28% of the subjects answered fewer than 15 problems
correctly on an arithmetic reasoning test, then a raw score of 15 corresponds
to the 28th percentile.
*** The chief drawback of percentile scores arises from the marked inequality
of their units, especially at the extremes of the distribution. Percentiles should
not be confused with percentage scores, which are raw scores expressed in
terms of the percentage of correct items. Percentiles are derived scores
expressed in terms of the percentage of persons.
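The difference between a percentile and a percentage score can be made concrete with a short sketch. The standardization sample and the 20-item test length are hypothetical; the sample was built so that 28% of scores fall below 15, matching the example above.

```python
def percentile_rank(raw_score, standardization_sample):
    """Percentage of persons in the sample who scored below raw_score."""
    below = sum(1 for score in standardization_sample if score < raw_score)
    return 100 * below / len(standardization_sample)


def percentage_score(items_correct, total_items):
    """Percentage of items answered correctly."""
    return 100 * items_correct / total_items


# Hypothetical sample of 25 people: 7 scored 10, 5 scored 15, 13 scored 20.
sample = [10] * 7 + [15] * 5 + [20] * 13

print(percentile_rank(15, sample))  # 28.0 (percentage of persons)
print(percentage_score(15, 20))     # 75.0 (percentage of items)
```

The same raw score of 15 is at the 28th percentile but is a percentage score of 75, which is why the two should not be confused.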
A. Norm-Referenced Test
• When an individual’s score is compared with the scores of other individuals who have taken the test,
often called the standardization sample or normative group.
B. Criterion-Referenced Test
• When an individual’s score is compared to an established standard or criterion.
• Also called domain-referenced or objective-referenced
• The interest is not how the individual’s performance compares to others, but rather
how the individual performs with respect to some standard or criterion. Therefore,
in order to interpret a client’s criterion-referenced results, a counselor needs to
understand the domain being measured, such as multiplication of two-digit
numbers, third grade spelling words, or knowledge of counseling theories.
• The testing often pertains to whether the person has reached a certain standard of
performance within that domain. For example, do they get 70% of a sample of second
grade arithmetic problems correct, or do they spell 90% of fifth grade spelling words
correctly? Many of the tests you have taken in your academic career have been
criterion-referenced tests, where, for example, you need to score 90% or better
for an A, 80% or better for a B, 70% or better for a C, and so forth.
• With a criterion-referenced test there is a mastery component. In these cases, a
predetermined cut-off score indicates whether the person has attained an
established level of mastery.
• Professional licensing examinations for counselors and psychologists are examples
of criterion-referenced tests that include a mastery component.
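A criterion-referenced interpretation compares a score with fixed standards rather than with other people. The sketch below uses the 90/80/70 cut-offs from the grading example above; the mastery cut-off of 70 and the function names are hypothetical.

```python
def letter_grade(percent_correct):
    """Map a percentage-correct score to a letter grade using fixed cut-offs."""
    if percent_correct >= 90:
        return "A"
    if percent_correct >= 80:
        return "B"
    if percent_correct >= 70:
        return "C"
    return "below C"  # the text does not specify lower cut-offs


def has_mastery(percent_correct, cutoff=70):
    """Return True if the score meets the (hypothetical) mastery cut-off."""
    return percent_correct >= cutoff


print(letter_grade(85), has_mastery(85))  # B True
print(letter_grade(65), has_mastery(65))  # below C False
```

Note that neither function needs to know how anyone else performed; only the predetermined standard matters.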
Variance
Standard Scores
• address the limitations of the unequal units of percentiles and provide a “shorthand”
method for understanding test results.
• express a person’s performance in terms of its deviation from the mean in standard
deviation units. It becomes easy to know a client’s relative position on an instrument
because standard scores describe how many standard deviations a client’s score is from the
mean.
• can be used with all types of instruments such as intelligence, personality and career
assessments.
Z Scores
• All standard scores are based on z-scores.
• We convert an individual raw score into a z score by subtracting the mean of the
instrument from the client’s raw score and dividing by the standard deviation of the
instrument. The formula for computing a z score is:
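z = (X - M) / s
where X is the client’s raw score, M is the mean of the instrument, and s is the
standard deviation of the instrument.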
• If the instrument we are using has a normal distribution, then z scores are called
normalized and can provide additional information.
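The z score computation can be sketched in a few lines. The instrument mean of 50 and standard deviation of 10 are hypothetical values used only for illustration.

```python
def z_score(raw_score, mean, sd):
    """Number of standard deviations a raw score lies from the mean."""
    return (raw_score - mean) / sd


# Hypothetical instrument with a mean of 50 and a standard deviation of 10.
print(z_score(65, mean=50, sd=10))  # 1.5  (1.5 SDs above the mean)
print(z_score(40, mean=50, sd=10))  # -1.0 (1 SD below the mean)
```

A positive z score means the client scored above the mean, and a negative z score means below it.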
T Scores
• Another standard score, with a fixed mean of 50 and a standard deviation of 10.
• A z score can be converted to a T score by multiplying the z score by 10 and adding
the result to 50: T = 50 + 10z.
• The z score is considered the base of the standard score, since it is used for conversion
to another type of standard score. Some test developers prefer T scores because
they eliminate the decimals and positive and negative signs of z scores.
• T scores are often normalized standard scores, but it is important for counselors to
check the manual of an instrument to ensure that this assumption is correct.
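A short sketch of the z-to-T conversion, using the fixed mean of 50 and standard deviation of 10 stated above. The z values are hypothetical.

```python
def t_score(z):
    """Convert a z score to a T score (mean 50, standard deviation 10)."""
    return 50 + 10 * z


print(t_score(1.5))   # 65.0
print(t_score(-1.0))  # 40.0
print(t_score(0))     # 50
```

The negative z score of -1.0 becomes a T score of 40, which illustrates how T scores avoid negative signs for most scores.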
Stanines
• A contraction of standard and nine.
• Range from 1 to 9, with a mean of 5 and a standard deviation of approximately 2. Each
stanine spans half a standard deviation, except for the stanines of 1 and 9.
• Stanines differ from other standard scores because each one represents a range of
percentile scores.
• Raw scores are converted to stanines by having the lowest 4 percent of the
individuals receive a stanine score of 1, the next 7 percent receive a stanine of 2,
the next 12 percent receive a stanine of 3, and so on through the group (the next
17, 20, 17, 12, 7, and 4 percent receive stanines of 4 through 9).
• Stanines have the advantage of being a single-digit number that many people find
easy to use.
• The disadvantage is that the stanines represent a range of scores, and sometimes
people do not understand that one number represents various raw scores.
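The percentage bands used for stanines (4, 7, 12, 17, 20, 17, 12, 7, and 4 percent) can be turned into a lookup from percentile rank to stanine. This is a sketch under one assumption about boundaries: a person whose percentile rank falls exactly on a boundary is placed in the higher stanine. The function name is hypothetical.

```python
from bisect import bisect_right

# Cumulative percentages at the upper boundary of stanines 1 through 8,
# obtained by adding up the bands 4, 7, 12, 17, 20, 17, 12, 7 (and a final 4).
STANINE_UPPER_BOUNDS = [4, 11, 23, 40, 60, 77, 89, 96]


def stanine_from_percentile(percentile_rank):
    """Return the stanine (1 to 9) for a percentile rank between 0 and 100."""
    return bisect_right(STANINE_UPPER_BOUNDS, percentile_rank) + 1


print(stanine_from_percentile(2))   # 1
print(stanine_from_percentile(50))  # 5
print(stanine_from_percentile(97))  # 9
```

Percentile ranks of 41 and 59 both map to a stanine of 5, which shows how one stanine covers a range of scores.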
Deviation IQs
• an extension of the ratio IQ (intelligence quotient) used in early intelligence tests.
They are now preferred over the ratio IQ.
• standard scores where the deviations from the mean are converted into standard
scores, which typically have a mean of 100 and a standard deviation of 15.
• Counselors need to be cautious because some intelligence tests use a standard deviation of
16.
• used in Scholastic Assessment Test (SAT) and Graduate Record Examination (GRE)
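A deviation IQ can be sketched as another rescaling of the z score, using the mean of 100 and standard deviation of 15 given above. The function name and the z values are hypothetical; the `sd` parameter shows why the test manual matters when an instrument uses 16 instead.

```python
def deviation_iq(z, mean=100, sd=15):
    """Convert a z score to a deviation IQ on the given scale."""
    return mean + sd * z


print(deviation_iq(1.0))         # 115.0
print(deviation_iq(1.0, sd=16))  # 116.0 (same z score, different test scale)
print(deviation_iq(-2.0))        # 70.0
```

The same relative position (one standard deviation above the mean) yields 115 on one test and 116 on another, so scores from different instruments should not be compared without checking their scales.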
References: