
Psychometrics

MEGHANAA KARANAM
Psychological Testing Vs. Assessment
Factors Affecting Psychological Testing

1. Examiner Variables

🔹 Examiner’s Competence – Training, experience, and adherence to standardization.
🔹 Examiner Bias – Personal biases, stereotypes affecting interpretation.
🔹 Rapport with Test-Taker – Comfort level, trust, and non-verbal cues.
🔹 Communication Skills – Clarity in instructions and feedback.
Factors Affecting Psychological Testing
2. Situational Variables

🔹 Testing Environment – Lighting, noise, temperature, distractions.
🔹 Test Administration Mode – Paper-based vs. computer-based vs. adaptive testing.
🔹 Time of Testing – Morning vs. evening, fatigue, or alertness levels.
🔹 Presence of Others – Social desirability effect, anxiety in group settings.
Factors Affecting Psychological Testing
3. Test-Taker’s Perspective

🔹 Motivation & Effort – Interest level, engagement, test-taking anxiety.
🔹 Emotional State – Stress, mood fluctuations, mental health conditions.
🔹 Cultural & Language Barriers – Influence of linguistic and cultural background.
🔹 Prior Experience – Test familiarity, previous exposure to similar tests.
Types of Psychometric Tests
Based on Administration: Individual
vs. Group Testing
Individual Tests
Administered to one person at a time, allowing for detailed observation and interaction.
Advantages:
More control over testing conditions.
Can clarify doubts and observe non-verbal cues.
Useful for clinical and neuropsychological assessments.

Examples:
Wechsler Adult Intelligence Scale (WAIS-IV) – Measures intelligence and cognitive
abilities.
Stanford-Binet Intelligence Scales – Used for giftedness and intellectual disability
assessment.
Thematic Apperception Test (TAT) – Measures unconscious motives and emotions.
Based on Administration: Individual
vs. Group Testing
Group Tests
Administered to multiple people simultaneously, often in educational and
organizational settings.
Advantages:
Time-efficient and cost-effective.
Standardized administration and scoring.
Useful for screening large populations.
Examples:
Raven’s Progressive Matrices (RPM) – Non-verbal intelligence test.
Scholastic Aptitude Test (SAT) – Measures reasoning and critical
thinking skills.
Tests for Special Populations
Designed for individuals with specific needs such as disabilities,
neurodivergent conditions, or cultural backgrounds.

🔹 Tests for Intellectual Disabilities & Cognitive Impairment


Vineland Adaptive Behavior Scales (VABS) – Measures adaptive
behavior in individuals with developmental disabilities.
Binet-Kamat Intelligence Scale – Measures intelligence with respect to the child's mental age.
Leiter International Performance Scale – Non-verbal test suitable for
individuals with speech or hearing difficulties.
Tests for Special Populations
🔹 Tests for Children & Adolescents
Child Behavior Checklist (CBCL) – Assesses emotional and behavioral
issues in children.

🔹 Tests for Clinical Populations (Psychiatric & Neuropsychological Assessment)
Beck Depression Inventory (BDI) – Measures severity of depression.
Wisconsin Card Sorting Test (WCST) – Assesses executive functioning
in patients with brain damage.
Brief Psychiatric Rating Scale (BPRS) – Evaluates psychiatric symptoms
in schizophrenia and other disorders.
Based on Response Format
Self-Report Tests
Individuals provide direct responses about their thoughts, feelings, and
behaviors.
Advantages: Easy to administer and score; suitable for large populations.
Limitations: Subject to social desirability bias and response distortions.
Examples:
MMPI-2 – Measures personality traits and psychopathology.
NEO Personality Inventory (NEO-PI-3) – Assesses Big Five personality
traits.
State-Trait Anxiety Inventory (STAI) – Differentiates between temporary
anxiety and chronic anxiety.
Based on Response Format
Projective Tests
Unstructured stimuli are presented, and responses reveal unconscious thoughts and
emotions.
Advantages: Reduces conscious manipulation; useful for psychodynamic
assessment.
Limitations: Subjective interpretation and lack of standard scoring.
Examples:
Rorschach Inkblot Test – Analyzes personality through perception of ambiguous
inkblots.
TAT (Thematic Apperception Test) – Assesses motivation and emotions through
storytelling.
Rotter’s Sentence Completion Test – Identifies underlying thoughts and conflicts.
Based on Response Format
Behavioral Measures
Observations of real-life behaviors in controlled or naturalistic settings.
Advantages: Objective and minimizes self-report bias.
Limitations: Time-consuming and requires trained observers.
Examples:
Functional Behavioral Assessment (FBA) – Identifies triggers and
consequences of behaviors.
Observation of Classroom Behavior – Used for ADHD diagnosis.
Behavioral Avoidance Test (BAT) – Measures phobic responses in anxiety
disorders.
Based on Time Constraints: Speed vs.
Power Tests
Speed Tests
Designed to measure how quickly an individual can complete simple tasks
within a time limit.
Characteristics:
Large number of easy questions.
Performance is influenced by processing speed.
Examples:
Digit Symbol Substitution Test (DSST) – Measures processing speed and
cognitive flexibility.
Based on Time Constraints: Speed vs.
Power Tests
Power Tests
Designed to assess problem-solving ability, reasoning, and intelligence
without time constraints.
Characteristics:
Questions increase in difficulty.
Focus on accuracy rather than speed.
Examples:
Wechsler Adult Intelligence Scale (WAIS-IV) – Measures cognitive ability.
Graduate Record Examination (GRE) – Assesses verbal, quantitative,
and analytical skills.
Based on Modality

Verbal Tests

Definition: Require language comprehension and verbal reasoning.


Examples:
Wechsler Verbal IQ Tests – Vocabulary, comprehension,
analogies.
Verbal Reasoning Section of GRE – Measures reading and
critical thinking skills.
Based on Modality

Non-Verbal Tests

Definition: Assess intelligence or ability without relying on language.


Examples:
Raven’s Progressive Matrices (RPM) – Assesses abstract
reasoning.
Cattell Culture Fair Intelligence Test (CFIT) – Reduces linguistic
bias.
Based on Modality

Performance Tests

Definition: Involve manipulation of objects or physical tasks.


Examples:
Wechsler Performance subtests – Block Design, Picture Arrangement, Object Assembly.
Kohs Block Design Test – Measures spatial and problem-solving skills.
Based on Cultural Considerations

Culture-Fair Tests

Definition: Designed to minimize cultural and linguistic biases.


Examples:
Raven’s Progressive Matrices (RPM) – Uses abstract patterns
instead of words.
Cattell Culture Fair Intelligence Test (CFIT) – Focuses on non-verbal
reasoning.
Based on Cultural Considerations
Culture-Free Tests

Definition: Hypothetically free from cultural influences, though true cultural neutrality is debatable.
Examples:
Goodenough-Harris Draw-A-Man Test – Measures cognitive ability
based on drawing rather than verbal responses.
Leiter International Performance Scale – Non-verbal intelligence
test designed for diverse populations.
Test Construction
Test construction is a systematic process of developing, validating, and
standardizing a psychological test to ensure accuracy, reliability, and validity in
measuring a specific trait, ability, or behavior.

Key Points for Constructing an Effective Questionnaire


1. Define the Problem Clearly 🔍
Clearly understand the research problem before creating the
questionnaire.
Identify all key aspects that may arise during the study.
2. Frame Questions Carefully ✍️
Choose open-ended or close-ended questions based on the type of
data needed.
Keep questions simple, clear, and relevant to the research objective.
Ensure questions align with the analysis plan.
Test Construction
3. Plan the Question Order 📋
Create a draft questionnaire with a logical flow.
Refer to existing questionnaires for guidance.
4. Review and Revise 🔄
Recheck the draft for errors or inconsistencies.
Fix technical issues and improve clarity.
5. Pilot Testing🧪
Conduct a small test run (pilot study) to identify problems.
Modify questions based on feedback.
6. Make Instructions Clear ✅
Provide easy-to-follow directions for respondents.
Ensure clarity to avoid confusion while answering.
Item Writing
1. Based on Response Format
Item Writing
2. Based on Mode of Response
Response Sets

Response sets refer to patterns or tendencies in how individuals answer test items, regardless of their actual feelings or abilities. These response biases can distort test results and threaten validity.
Types of Response Sets and Their Implications
How Test Constructors Can Handle Response Sets
1. Balancing Item Wording
✅ Mix Positive & Negative Items
Instead of always framing questions positively, include reverse-worded
items to detect acquiescence bias.
Example:
Positively Worded: "I feel confident in my abilities."
Negatively Worded: "I often doubt my abilities."
✅ Avoid Leading Questions
Keep wording neutral to prevent social desirability bias.
Example:
❌ "Do you agree that hardworking employees always succeed?"
✅ "Employees succeed if they work hard."
How Test Constructors Can Handle Response Sets
2. Using Forced-Choice Formats
✅ Eliminate Central Tendency & Social Desirability Bias
Instead of Likert scales, use forced-choice formats where respondents must
pick one of two equally desirable/undesirable options.
Example:
Instead of: "Rate your leadership skills (1-5)."
Use: "Which describes you better?
(A) I prefer leading a team.
(B) I prefer following instructions."
✅ Pair Matched Statements
Use Ipsative Scaling, where two equally positive or negative statements are
presented, forcing a choice.
Example:
"Which statement do you relate to more?"
(A) "I enjoy helping others solve problems."
(B) "I am good at analyzing problems logically."
How Test Constructors Can Handle Response Sets
3. Including Validity Scales & Attention Checks
✅ Response Inconsistency Checks
Use similar questions with different phrasing to test consistency.
Example:
Q5: "I prefer working in a team."
Q18: "I enjoy working alone."
Contradictory responses suggest careless answering.

4. Offering a Range of Response Options


✅ Minimize Extreme & Central Tendency Bias
Instead of a 5-point scale (Strongly Agree - Strongly Disagree), use 6-point
scales to remove the neutral option.
Example:
❌ (Odd Scale) → 1-2-3-4-5
✅ (Even Scale) → 1-2-3-4-5-6 (Forcing a choice)
How Test Constructors Can Handle Response Sets
5. Controlling Social Desirability
✅ Use Indirect Questioning
Ask about general behaviors instead of direct personal questions.
Example:
Instead of: "Do you always follow rules?"
Use: "How often do people break minor rules?"
✅ Anonymity & Confidentiality
Assure respondents that their answers will be anonymous to reduce
socially desirable responses.
Item Analysis

Item analysis is a statistical procedure used to evaluate test items, ensuring they are valid and suitable for measuring what they intend to assess. It helps identify the quality of individual test items and determines which items should be retained, revised, or eliminated.
Item Analysis
Key Objectives of Item Analysis:
Determining Item Difficulty – Identifies which items are too easy,
too difficult, or moderately difficult.
Assessing Item Discrimination – Determines whether an item
effectively differentiates between high and low performers.
Evaluating the Effectiveness of Distractors – In multiple-choice
questions, it ensures that incorrect options (distractors) work
properly.
Improving Test Quality – Helps modify ineffective test items to
enhance their reliability and validity.
Item Analysis
Item Difficulty
Definition: Item difficulty refers to the proportion of test-takers who
answer a question correctly. It is calculated using the formula:

P = R / N

Where:
P is the difficulty index, R is the number of test-takers who answered the item correctly, and N is the total number of test-takers.
The value of P ranges from 0 to 1.
A higher value (closer to 1) indicates an easier item.
A lower value (closer to 0) indicates a more difficult item.
Item Analysis
Example Calculation:
Suppose 100 students take a test.
80 students answer a specific question correctly.
The difficulty index (P) is:
P=80/100=0.80
Since 80% of students got the item correct, it is considered easy.
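
A minimal sketch of this calculation in Python (illustrative only, not from the original slides; assumes items scored 1 for correct and 0 for incorrect):

# Minimal sketch: item difficulty index P = R / N from 0/1 item scores.
# Function and variable names are illustrative.
def item_difficulty(item_scores):
    """Proportion of test-takers answering the item correctly (0 <= P <= 1)."""
    return sum(item_scores) / len(item_scores)

# Example matching the slide: 80 of 100 students answer the item correctly.
scores = [1] * 80 + [0] * 20
print(item_difficulty(scores))  # 0.8 -> an easy item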
Item Analysis
Item Discrimination
Definition: Item discrimination refers to the ability of a test item to
differentiate between high-performing and low-performing examinees.
It is calculated using the discrimination index (D), which measures the
difference in performance between high and low scorers.

Steps to Calculate Item Discrimination:


1. Divide the test-takers into two groups:
Top 27% (high scorers)
Bottom 27% (low scorers)
2. Calculate the percentage of students in each group who answered
the item correctly.
3. Use the formula: D = P_high − P_low
Item Analysis
Where:
P_high = Proportion of high scorers who answered the item correctly.
P_low = Proportion of low scorers who answered the item correctly.
D ranges from -1 to +1.
Example Calculation:
High-performing group (top 27%): 90% answered correctly (P_high = 0.90).
Low-performing group (bottom 27%): 40% answered correctly (P_low = 0.40).
Discrimination Index:
D=0.90−0.40=0.50
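
A minimal sketch of the extreme-groups calculation in Python (illustrative names and hypothetical data; assumes 0/1 item scoring and the 27% split described above):

# Minimal sketch: discrimination index D = P_high - P_low.
def discrimination_index(total_scores, item_scores, fraction=0.27):
    """D for one item, using extreme groups defined by total test score."""
    n = len(total_scores)
    k = max(1, round(fraction * n))
    # Rank test-takers by total score, highest first.
    order = sorted(range(n), key=lambda i: total_scores[i], reverse=True)
    high, low = order[:k], order[-k:]
    p_high = sum(item_scores[i] for i in high) / k
    p_low = sum(item_scores[i] for i in low) / k
    return p_high - p_low

# Toy data: 10 examinees; the item is answered correctly mainly by high scorers.
totals = [95, 90, 88, 75, 70, 65, 50, 45, 40, 35]
item   = [1, 1, 1, 1, 0, 1, 0, 0, 1, 0]
print(round(discrimination_index(totals, item), 2))  # positive D -> good discrimination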
Item Analysis
Evaluating Distractors (Multiple-Choice Questions)

For multiple-choice items, the distractors (incorrect answers) should be analyzed to ensure they function effectively. A good distractor should:
Be selected by some examinees but not by high performers.
Not be too obvious as an incorrect option.
Have an even distribution of incorrect choices.
Item Analysis
Example of a Poor Distractor:

Question: What is the capital of France?


(A) Paris ✅
(B) London ❌
(C) Mars ❌(not a realistic distractor)
(D) Rome ❌
If no one selects option C, it is a poor distractor and should be
revised.
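
A minimal sketch of a distractor tally in Python (hypothetical response data; option and score values are illustrative):

# Minimal sketch: count how often each option is chosen, overall and by
# high scorers, and flag distractors that nobody selects.
from collections import Counter

responses = ["A", "A", "B", "A", "D", "A", "B", "A", "D", "A"]  # chosen options
totals    = [95, 90, 60, 88, 55, 80, 50, 85, 45, 92]            # total test scores
key = "A"                                                       # correct answer

overall = Counter(responses)
cutoff = sorted(totals, reverse=True)[len(totals) // 4]         # rough top quarter
high_scorers = Counter(r for r, t in zip(responses, totals) if t >= cutoff)

for option in ["A", "B", "C", "D"]:
    note = ""
    if option != key and overall[option] == 0:
        note = "  <- never chosen: weak distractor, revise"
    print(option, "overall:", overall[option], "high scorers:", high_scorers[option], note)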
Test Standardization
Test standardization is the process of ensuring that a
psychological test has uniform administration, scoring, and
interpretation procedures. The goal is to minimize variability
in test results caused by external factors such as
differences in test conditions or examiner bias.

To ensure that test scores are meaningful and interpretable, tests must be standardized, reliable, and valid. Additionally, the development of norms helps in comparing individual test scores to a reference group.
Reliability of Psychological Tests
Reliability refers to the consistency and stability of test scores
across repeated administrations or different conditions. A test is
considered reliable if it produces similar results under consistent
conditions.

The inherent aspects and synonyms of reliability are:


• dependability
• stability
• consistency
• predictability
• accuracy
• equivalence
Stability and Equivalence Aspects of Reliability

Stability and equivalence deserve special attention among the different aspects of reliability.

• The STABILITY aspect is concerned with securing consistent results with repeated measurements by the same researcher using the same instrument. We usually determine the degree of stability by comparing the results of repeated measurements.

• The EQUIVALENCE aspect considers how much error may get introduced
by different investigators or different samples of the items being studied. A
good way to test for the equivalence of measurements by two investigators is
to compare their observations of the same events.
Methods for assessing reliability

1. Test-Retest Method

The test-retest method estimates reliability by administering the same test to the same group of individuals after a certain period (typically two weeks to a month). The correlation between the two sets of scores determines the test-retest reliability coefficient. A perfect correlation (1.00) suggests that the test is highly reliable, though practical measurements often show lower reliability due to variations in psychological states like anxiety, motivation, or interest.

Advantages:
Can be applied when only one version of the test is available.
Provides a simple and intuitive measure of reliability.
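
A minimal sketch of this estimate in Python (hypothetical scores; the correlation is computed directly rather than with a statistics library):

# Minimal sketch: test-retest reliability as the Pearson correlation
# between two administrations of the same test.
def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

time1 = [22, 30, 28, 35, 26, 31, 24, 29]  # first administration
time2 = [24, 29, 27, 36, 25, 33, 23, 30]  # retest two weeks later
print(round(pearson_r(time1, time2), 2))  # close to 1.0 -> stable scores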
Methods for assessing reliability
1. Test-Retest Method

Limitations:
Conducting multiple test sessions can be costly and time-consuming.
Participants may remember their previous responses, leading to
artificially high reliability.
Psychological factors (e.g., anxiety, motivation) may change over time,
affecting scores.
A low correlation does not necessarily indicate poor reliability but could
suggest changes in the underlying construct.
The longer the interval between tests, the higher the chance of true
changes in the measured construct.
Reactivity effects may cause participants to change their attitudes
between test and retest.
Methods for assessing reliability
2. Alternative Form Method (Equivalent/Parallel Forms Method)
This method addresses some of the limitations of the test-retest method by using
two different but equivalent versions of a test instead of repeating the same test.
These alternate forms contain questions of equal difficulty but differ in content to
prevent memory bias. The reliability coefficient is calculated by correlating the
scores from both forms, typically administered about two weeks apart.

Advantages:
Reduces memory-related biases that may inflate reliability.
Provides a more rigorous assessment of measurement precision.
Limitations:
Developing equivalent test forms is challenging and time-intensive.
Requires participants to take two different tests, which may be burdensome.
Administering two separate tests increases the demand on resources.
Methods for assessing reliability
3. Split-Half Method
This method assesses the internal consistency of a test by dividing it into two
equal halves and comparing the scores from each half. A common way to split the
test is by grouping odd-numbered and even-numbered items separately (Odd-
Even reliability). The correlation between the two halves is calculated using
Pearson's correlation coefficient, which is then adjusted using the Spearman-
Brown formula to estimate full-test reliability.
Spearman-Brown Formula:

Full-test reliability = 2r / (1 + r)

where r is the correlation between the two halves of the test.
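
A minimal sketch of the odd-even split with the Spearman-Brown correction in Python (hypothetical 0/1 item matrix; uses statistics.correlation, available in Python 3.10+):

# Minimal sketch: split-half reliability adjusted by the Spearman-Brown formula.
from statistics import correlation

# Rows = test-takers, columns = 0/1 scores on an 8-item test.
items = [
    [1, 1, 0, 1, 1, 0, 1, 1],
    [0, 1, 1, 0, 0, 1, 0, 0],
    [1, 1, 1, 1, 1, 1, 0, 1],
    [0, 0, 1, 0, 1, 0, 0, 1],
    [1, 1, 1, 1, 0, 1, 1, 1],
]

odd_half  = [sum(row[0::2]) for row in items]  # items 1, 3, 5, 7
even_half = [sum(row[1::2]) for row in items]  # items 2, 4, 6, 8

r_half = correlation(odd_half, even_half)      # correlation between the two halves
r_full = (2 * r_half) / (1 + r_half)           # Spearman-Brown estimate for the full test
print(round(r_half, 2), round(r_full, 2))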


Methods for assessing reliability
3. Split-Half Method

Advantages:
Requires only one test administration, unlike the test-retest and parallel forms
methods.
Suitable when time or resources do not allow for multiple testing sessions.

Limitations:
The reliability estimate varies based on how the test is split (e.g., first vs.
second half vs. odd-even).
Different methods of splitting items may lead to different reliability coefficients.
Validity in Measurement
Validity refers to the extent to which a measuring instrument accurately
measures what it is intended to measure. In social science and development
research, establishing validity is crucial, especially for complex variables like
malnutrition or intellectual development, where direct measurement is difficult.

Validity of Measuring Instrument vs. Interpretation of Data

Rather than validating an instrument itself, validation applies to the interpretation of data derived from it. A measuring tool may be valid for one purpose but invalid for another, making context crucial.
Approaches to Validation
There are four primary approaches to validating a measuring instrument:
1. Logical Validity (Face Validity)
Based on common sense or theoretical analysis.
Example: Measuring reading speed by assessing the amount read in a
given time.
Limitation: It is subjective and lacks empirical proof.
2. Jury Opinion
Experts in the field review and confirm the validity of an instrument.
Example: A scale measuring mental retardation validated by
psychologists and pediatricians.
Limitation: Experts' opinions are subjective, making this only slightly
better than logical validity.
Approaches to Validation
3. Known-Group Validation
Compares responses from groups with known behaviors or characteristics.
Example: A scale measuring religious attitudes tested on churchgoers and
non-churchgoers.
Limitation: Other factors (e.g., socioeconomic status) may influence the
results.
4. Independent Criteria Validation
Uses an external criterion to validate the instrument.
Criteria include:
a) Relevance – Scores must align with the concept being measured.
b) Freedom from Bias – External factors should not influence results.
c) Reliability – Consistency in measurements over time.
d) Availability – Practicality and feasibility of using the criterion.
Limitation: Finding a suitable external criterion is often challenging.
Types of Validity in Measurement

The classification of validity types is based on the framework provided by the American Psychological Association (APA), the American Educational Research Association (AERA), and the National Council on Measurement in Education (NCME).
The three primary types of validity are:
1. Content Validity
2. Criterion Validity (Predictive Validity & Concurrent
Validity)
3. Construct Validity
Types of Validity in Measurement

1. Content Validity
Content validity assesses whether a test adequately
covers the domain of interest.
It ensures that the test represents all aspects of the
construct it aims to measure.
Example: A language proficiency test should include
grammar, vocabulary, comprehension, and writing skills. If
the test aligns well with instructional objectives, it is
content valid.
Types of Validity in Measurement
2. Criterion Validity
This type of validity examines how well a test correlates with an independent
criterion (i.e., an external measure).
a) Predictive Validity
Measures how well a test predicts future performance on a relevant criterion.
Example: Entrance exams (e.g., GRE, SAT) are validated by correlating test scores
with students’ academic performance in the future.
b) Concurrent Validity
Assesses how well a test correlates with an external measure taken at the same
time.
Example: A diagnostic test that differentiates students who need extra coaching
from those who do not has concurrent validity.
Key Difference: Predictive validity concerns future outcomes, while concurrent validity
evaluates present characteristics.
Types of Validity in Measurement
3. Construct Validity
Construct validity examines how well a test measures a
theoretical construct rather than a directly observable trait.
It is used when there is no universally accepted criterion or
content framework for measurement.
Construct validation requires:
a) Defining theoretical relationships between concepts.
b) Empirically testing these relationships.
c) Interpreting findings to confirm or refine the theory.
Example: Intelligence tests should correlate with other cognitive
ability measures to demonstrate construct validity.
Reliability vs. Validity: Which is More Important?

Key Differences:

Reliability measures the consistency of an instrument – whether it produces the same results under the same conditions.
Validity assesses whether the instrument measures what it is
supposed to measure.
While reliability is a necessary condition for validity, it is not
sufficient—an instrument can be reliable without being valid.
Importance of Validity Over Reliability

A test can be reliable but not valid – If an instrument consistently measures the wrong construct, it is reliable but invalid.
Validity is crucial for meaningful interpretation – If a test
does not accurately measure what it claims to, its
consistency (reliability) becomes irrelevant.
For a measurement to be useful, it must be both reliable
and valid – However, priority should be given to ensuring
validity first.
Illustration: Reliability vs. Validity in Academic Testing
Scenario:
Imagine a university develops a new Mathematics Aptitude Test to assess
students' ability to succeed in engineering programs.

Case 1: High Reliability, High Validity (Ideal Case)


The test consistently gives similar scores when taken by the same students
under the same conditions.
The test accurately predicts engineering students' future performance.
✅ Useful for selection purposes.
Case 2: Low Reliability, High Validity
The test includes relevant engineering-related math problems (valid).
However, the difficulty level fluctuates, and different test versions yield
inconsistent scores (low reliability).
❌ Not useful since inconsistent results make predictions unreliable.
Illustration: Reliability vs. Validity in Academic Testing

Case 3: High Reliability, Low Validity


The test consistently produces the same scores (high reliability).
However, it only tests basic arithmetic instead of engineering-
related math (low validity).
❌ Not useful since it doesn’t measure what it claims to.

Case 4: Low Reliability, Low Validity


The test asks random, unrelated questions (low validity).
Students score unpredictably on different test attempts (low
reliability).
❌ Completely useless for measuring aptitude.
Norms

Psychological tests are meaningful only when test scores are compared to a reference group. This reference group provides norms, which help in interpreting an individual's score in relation to others.
Norms
1. Development of Norms
Norms are statistical benchmarks that allow comparison between an
individual’s test performance and a representative sample. Developing
norms involves several key steps:
Step 1: Selecting a Norm Group
A large and representative sample of the population is chosen.
The sample should reflect relevant characteristics such as age,
gender, education level, cultural background, and other demographic
variables.
Step 2: Administering the Test
The selected sample takes the test under standardized conditions to
ensure consistency.
Norms

Step 3: Data Collection and Analysis


The test scores are analyzed to establish central tendencies (mean,
median, mode) and variability (standard deviation, range).
The distribution of scores helps in determining percentiles, z-scores,
and standard scores.
Step 4: Establishing Norm Tables
Scores are compiled into tables that indicate how an individual's
performance compares to the norm group.
This allows test users to interpret raw scores meaningfully.
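
A minimal sketch of Step 4 in Python (hypothetical norm-group scores; the percentile-rank definition matches the one given in the next section):

# Minimal sketch: build a raw-score-to-percentile-rank lookup from norm-group data.
def percentile_rank(score, norm_scores):
    """Percentage of the norm group scoring below the given raw score."""
    below = sum(1 for s in norm_scores if s < score)
    return 100 * below / len(norm_scores)

norm_group = [12, 15, 18, 20, 20, 22, 25, 27, 30, 33]  # standardization sample
norm_table = {raw: round(percentile_rank(raw, norm_group)) for raw in range(10, 36)}

print(norm_table[25])  # a raw score of 25 falls at the 60th percentile of this sample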
Types of Norms

2.1 Age Norms


Also called developmental norms, these compare an individual’s
score with others of the same age.
Commonly used in IQ tests (e.g., Wechsler Intelligence Scale for
Children - WISC) and child development assessments.
Example: A 6-year-old scoring the same as the average 8-year-
old on a reading test may be considered advanced.
2.2 Grade Norms
Used in educational testing to compare a student’s score to the
expected performance for their grade level.
Example: A 7th grader's score is compared with the expected performance of other students at the same grade level.
Types of Norms
2.3 Percentile Ranks
Indicate the percentage of individuals in the norm group who scored below a
given score.
Example: A 90th percentile score means the individual scored higher than 90%
of the norm group.
2.4 Standard Scores
Z-scores, T-scores, and IQ scores convert raw scores into a standardized format
for easier interpretation.
Z-Scores: Indicate how many standard deviations a score is from the mean.
Example: A z-score of +2.0 means the individual scored 2 standard
deviations above the mean.
T-Scores: Have a mean of 50 and a standard deviation of 10.
IQ Scores: Intelligence tests use a mean of 100 and a standard deviation of
15.
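
A minimal sketch of these conversions in Python (hypothetical norm-group data; uses the conventions listed above):

# Minimal sketch: convert a raw score into z-, T-, and deviation-IQ-style scores.
from statistics import mean, pstdev

norm_scores = [40, 45, 50, 52, 55, 58, 60, 62, 65, 70]  # norm-group raw scores
raw = 62

z = (raw - mean(norm_scores)) / pstdev(norm_scores)     # SD units from the mean
t_score = 50 + 10 * z                                   # T-score: mean 50, SD 10
iq_score = 100 + 15 * z                                 # IQ metric: mean 100, SD 15

print(round(z, 2), round(t_score, 1), round(iq_score, 1))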
Types of Norms
2.5 Stanines (Standard Nine)
A nine-point scale (1 to 9) used in educational and psychological assessments.
Scores are divided into low (1-3), average (4-6), and high (7-9).
2.6 Local vs. National Norms
Local Norms: Based on a small, specific group (e.g., students in a particular
school).
National Norms: Based on a broad, representative sample from an entire
country.
2.7 Criterion-Referenced Norms vs. Norm-Referenced Norms
Norm-Referenced Norms: Compare an individual’s performance to others in the
norm group (e.g., SAT, GRE).
Criterion-Referenced Norms: Compare performance to a fixed standard rather
than a group (e.g., a driver’s license test requiring 80% to pass).
Ethical Issues in Psychological Testing and Assessment
Psychological testing must adhere to ethical principles to protect individuals' rights and
prevent harm. The key ethical issues include:
1.1 User Qualifications and Professional Competence
Only qualified professionals should administer, interpret, and report psychological
tests.
Professionals must possess proper training, certification, and knowledge of
psychometric principles and ethical guidelines.
Misuse or misinterpretation of tests by unqualified individuals can lead to incorrect
diagnoses and inappropriate interventions.
APA and other organizations categorize tests into different levels (A, B, and C),
requiring increasing levels of expertise:
Level A: Can be administered with minimal training (e.g., basic surveys).
Level B: Requires knowledge of psychological concepts (e.g., personality tests).
Level C: Needs advanced training (e.g., intelligence and clinical tests).
Ethical Issues in Psychological Testing and Assessment
1.2 Protection of Privacy
Individuals have the right to control access to their personal psychological data.
Testing should be voluntary, and participants must be informed about the purpose,
risks, and how the results will be used.
Test administrators must obtain informed consent, especially when testing involves
vulnerable populations (e.g., children, patients with cognitive impairments).
1.3 Confidentiality
Psychological test results are sensitive personal information and must be kept
confidential.
Test results should only be shared with:
The individual tested (if appropriate).
Authorized professionals (e.g., clinicians, school psychologists).
Legal authorities (only in cases where disclosure is legally required, such as risk
of harm).
Digital and paper records must be securely stored to prevent unauthorized access.
Ethical Issues in Psychological Testing and Assessment
1.4 Testing Diverse Populations
Psychological tests should be culturally fair and free from biases that may
disadvantage certain groups.
Standardized tests developed in one cultural or linguistic context may not be valid
for other populations.
Ethical considerations include:
Language barriers: Using translated tests or interpreters when necessary.
Socioeconomic status: Recognizing the impact of education and resources on
test performance.
Disability accommodations: Providing modifications for individuals with physical,
sensory, or cognitive impairments.
APA’s Guidelines for Psychological Practice with Diverse Populations emphasize the
need for inclusive and equitable assessment practices.
Computer-Based Psychological Testing

Advancements in technology have led to the increased use of computer-based psychological assessments (CBTAs). While these offer benefits, they also present ethical and practical challenges.

2.1 Advantages of Computer-Based Testing


✅ Efficiency: Faster test administration, scoring, and interpretation.
✅ Standardization: Reduces human error and bias in administration.
✅ Accessibility: Can be administered remotely, benefiting individuals
in rural or underserved areas.
✅ Adaptivity: Some computer-based tests adjust difficulty based on
the test-taker’s responses (e.g., GRE, CAT).
Computer-Based Psychological Testing
2.2 Ethical Challenges in Computer-Based Testing
Data Security & Privacy:
Digital test results must be stored securely to prevent hacking or unauthorized access.
Organizations must comply with data protection laws (e.g., GDPR, HIPAA) when storing
and sharing psychological test data.
Validity & Reliability Concerns:
Some traditional tests may lose psychometric validity when converted to digital formats.
The test environment (e.g., distractions in an online setting) can influence performance.
Access & Digital Divide:
Not all individuals have equal access to technology, creating potential disparities in
testing.
People with limited computer skills may struggle with digital tests, affecting their scores.
Ethical Use of Artificial Intelligence (AI) in Testing:
AI-driven psychological assessments must be transparent and fair.
AI-based tests should not reinforce biases or lead to discriminatory outcomes.
