Screening Tests: Series vs. Parallel Analysis

The seminar presented by Dr. Shalini Pattanayak discusses the validity and reliability of screening tests, emphasizing their importance in disease prevention. Key concepts include the definitions of validity, sensitivity, specificity, and the use of ROC curves to determine effective cut-off points. The seminar concludes that a good screening test should be both highly valid and reliable, with agreement between observers assessed through percent agreement or Kappa statistics.

SEMINAR- II

TOPIC: VALIDITY AND RELIABILITY OF A SCREENING TEST

Presented by: Dr. Shalini Pattanayak

Post Graduate Trainee (JR-2), Dept. of Community Medicine

IPGME&R and SSKM Hospital, Kolkata

TABLE OF CONTENTS:

1. Validity Of a Screening Test

2. Positive Predictive Value (PPV)

3. Negative Predictive Value (NPV)

4. Tests In Series and Parallel

5. Bayes’ Theorem

6. Likelihood Ratio

7. Problem of The Borderline

8. Determining the Cut-off Point

9. Receiver Operating Characteristic Curve (ROC Curve)

10. Reliability of A Screening Test

11. Relationship Between Validity and Reliability

12. Summary

13. References

CONCEPT: The active search for disease among apparently healthy people is a fundamental aspect of
prevention. This is embodied in “screening”.

DEFINITION: The search for unrecognized disease or defect by means of rapidly applied tests,
examinations, or other procedures in apparently healthy individuals.

• Some examples of screening tests:

✓ Serological testing for HIV (ELISA, RAPID)

✓ Neonatal screening for hypothyroidism

✓ Screening for cervical carcinoma (Pap smear, VIA) and breast carcinoma (mammography).

✓ Screening for developmental anomalies in the fetus (AFP: alpha-fetoprotein).

• The disease to be screened should satisfy the following criteria:

✓ there is a test that can detect the disease prior to the onset of signs and symptoms

✓ facilities should be available for confirmation of diagnosis

✓ there is an effective treatment

✓ good evidence is present that early detection and treatment reduces mortality and morbidity

✓ there should be an agreed-on policy concerning whom to treat as patients.

• The screening test must satisfy the following criteria:

✓ acceptability

✓ repeatability (reliability/precision/reproducibility/consistency)

✓ validity/accuracy

1. VALIDITY: The term "validity" refers to the extent to which a screening test accurately measures what it purports, or is supposed, to measure.
It expresses the ability of a test to separate or distinguish those who have the disease from those who do not. E.g. glycosuria is a useful screening test for diabetes, but the glucose tolerance test is a more valid one.

• Validity refers to how well the assessment tool/ instrument actually measures what it is intended
to measure i.e. underlying outcome of interest.

• The assessment tool/instrument must be valid for the result to be accurately applied and interpreted. E.g. a wrongly calibrated weighing machine may give consistent readings, but those readings are not valid.

• It determines how an individual performs in different situations.

• VALIDITY has the following components:

❖ Sensitivity

❖ Specificity

❖ Predictive Accuracy

• The above-mentioned measures are expressed as percentages.

• Sensitivity and specificity are usually determined by applying the test to one group of persons
having the disease, and a reference group not having the same disease.

• SENSITIVITY: Ability of a Test to identify correctly all those who have the disease (True
Positive)

• It is a statistical index of diagnostic accuracy, first described by Yerushalmy in the 1940s.

• SPECIFICITY: Ability of a Test to identify correctly those who do not have the disease (True
Negative).

• FALSE POSITIVES: The term "false positive" means that an individual who does not have the disease under study is falsely labelled as "diseased" and is subjected to further diagnostic tests, at the cost of inconvenience, discomfort, anxiety and expense, until he is finally diagnosed free of the disease. False-positive tests are a burden on health expenditure and on the community.

• FALSE NEGATIVES: The term "false negative" means that an individual who has the disease under study is told that he does not have the disease; in other words, he is given a 'false assurance'. A participant with a false-negative result may therefore ignore the signs and symptoms of the disease and delay treatment further. This may be detrimental if the disease is a serious one and no repeat screening is planned in the coming days or weeks.

2. POSITIVE PREDICTIVE VALUE (PPV): Ability of a screening test to identify correctly all those who have the disease, out of all those who test positive on the screening test.

• Also called post-test probability.

3. NEGATIVE PREDICTIVE VALUE (NPV): Ability of a screening test to identify correctly all those who do not have the disease, out of all those who test negative on the screening test.
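All four validity measures can be computed directly from a 2×2 screening table. As a minimal sketch (the counts below are hypothetical, for illustration only):

```python
def screen_metrics(tp, fp, fn, tn):
    """Validity measures from a 2x2 screening table.

    tp/fn: diseased individuals who test positive/negative
    fp/tn: non-diseased individuals who test positive/negative
    """
    sensitivity = tp / (tp + fn)  # true positives among the diseased
    specificity = tn / (tn + fp)  # true negatives among the non-diseased
    ppv = tp / (tp + fp)          # diseased among those who test positive
    npv = tn / (tn + fn)          # non-diseased among those who test negative
    return sensitivity, specificity, ppv, npv

# Hypothetical counts: 100 diseased and 100 non-diseased individuals
sn, sp, ppv, npv = screen_metrics(tp=80, fp=10, fn=20, tn=90)
```

Note that sensitivity and specificity are computed column-wise (within the diseased and non-diseased groups), while PPV and NPV are computed row-wise (within the test-positive and test-negative groups).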

4. TESTS IN SERIES AND PARALLEL:

• SERIES: One test after another; the 2nd test is applied only to those who are positive on the 1st test.

• PARALLEL: Both tests are applied together.

• IN SERIES:
• Combined Sensitivity of 2 Tests A & B in series = Sn(A) x Sn(B)

• Combined Specificity of 2 Tests A & B in series = [ Sp(A) + Sp(B)] – [Sp (A) x Sp(B)]

• IN PARALLEL:

• Combined Sensitivity of 2 tests A & B in Parallel = [ Sn(A) + Sn(B)] – [Sn (A) x Sn(B)]

• Combined Specificity of 2 Tests A & B in Parallel = Sp(A) x Sp(B)
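The four formulas above can be sketched as two small functions (the example values 0.90 and 0.80 are hypothetical):

```python
def series(sn_a, sp_a, sn_b, sp_b):
    """Net sensitivity/specificity when test B is applied only to A's positives."""
    return sn_a * sn_b, sp_a + sp_b - sp_a * sp_b

def parallel(sn_a, sp_a, sn_b, sp_b):
    """Net sensitivity/specificity when both tests are applied to everyone."""
    return sn_a + sn_b - sn_a * sn_b, sp_a * sp_b

# Series testing trades sensitivity for specificity; parallel does the opposite.
sn_ser, sp_ser = series(0.90, 0.90, 0.80, 0.80)    # ≈ (0.72, 0.98)
sn_par, sp_par = parallel(0.90, 0.90, 0.80, 0.80)  # ≈ (0.98, 0.72)
```

The symmetry is worth noting: series testing multiplies sensitivities (both tests must be positive), while parallel testing multiplies specificities (both tests must be negative).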

5. BAYES' THEOREM:

• If the test result is positive, what is the probability that the patient has the disease?

• If the test result is negative, what is the probability that the person does not have the disease?

• Bayes' theorem provides the answer.

• It was first described by the English clergyman Thomas Bayes in the 18th century.
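A sketch of Bayes' theorem applied to screening: the post-test probabilities follow from sensitivity, specificity and prevalence (the values below are hypothetical):

```python
def post_test_probabilities(sn, sp, prevalence):
    """Bayes' theorem applied to a screening test.

    P(D|+)  = Sn*P / [Sn*P + (1-Sp)*(1-P)]        (disease, given a positive test)
    P(~D|-) = Sp*(1-P) / [Sp*(1-P) + (1-Sn)*P]    (no disease, given a negative test)
    """
    p = prevalence
    p_disease_given_pos = sn * p / (sn * p + (1 - sp) * (1 - p))
    p_no_disease_given_neg = sp * (1 - p) / (sp * (1 - p) + (1 - sn) * p)
    return p_disease_given_pos, p_no_disease_given_neg

# Even with 90% sensitivity and 90% specificity, P(disease | positive)
# is low when prevalence is only 1% -- most positives are false positives.
p_pos, p_neg = post_test_probabilities(sn=0.90, sp=0.90, prevalence=0.01)
```

This illustrates why predictive values, unlike sensitivity and specificity, depend strongly on disease prevalence.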

6. LIKELIHOOD RATIO:

• LIKELIHOOD RATIO POSITIVE (LR+): It is the ratio of the sensitivity of a test to its false-positive error rate.

• LIKELIHOOD RATIO NEGATIVE (LR-): It is the ratio of the false-negative error rate to the specificity of the test.

• If the LR+ of a test is large and the LR- is small, it is probably a good test.

• Experts in test analysis sometimes calculate the ratio of LR+ to LR- to obtain a measure of
separation between the positive and the negative test.

• Likelihood ratios are not influenced by prevalence of the disease.
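The two ratios can be written out as follows (the test characteristics below are hypothetical):

```python
def likelihood_ratios(sn, sp):
    """LR+ = sensitivity / false-positive rate; LR- = false-negative rate / specificity."""
    lr_pos = sn / (1 - sp)  # how much a positive result raises the odds of disease
    lr_neg = (1 - sn) / sp  # how much a negative result lowers the odds of disease
    return lr_pos, lr_neg

# Hypothetical test: 90% sensitive, 80% specific.
lr_pos, lr_neg = likelihood_ratios(0.90, 0.80)  # ≈ 4.5 and ≈ 0.125
```

A large LR+ together with a small LR- (here the ratio LR+/LR- is 36) indicates good separation between positive and negative results, consistent with the point above.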

8. DETERMINING THE CUT-OFF POINT: The factors to be considered are:

✓ Disease prevalence: when the disease prevalence is high in a community, the cut-off point is
set at a lower level

✓ The disease: when the disease under study is lethal and early intervention markedly improves the prognosis, the cut-off point is set at a lower level.

9. RECEIVER OPERATING CHARACTERISTIC (ROC) CURVE: The ROC curve is used to decide on a good cut-off point for continuous variables in clinical tests, e.g. serum calcium, blood glucose, blood pressure, etc.

✓ Origin: World War II, in evaluating how well radar receiver operators could distinguish a true signal (true positive) from noise read as a signal (false positive) and from a signal that was missed (false negative).

• If a group of investigators wanted to determine the best cut-off for a blood pressure screening
program, they might begin by taking a single initial blood pressure measurement in a large
population and then performing a complete workup for persistent hypertension in all of the
individuals. Each person would have data on a single screening blood pressure and an ultimate
diagnosis concerning the presence or absence of hypertension. Based on this information, an
ROC curve could be constructed.

• ROC plot shows the relationship between Sensitivity and Specificity

• An increase in Sensitivity leads to decrease in Specificity and vice-versa

• Sensitivity = 1 - false-negative rate, i.e. false-negative rate = 1 - sensitivity

• Specificity = 1 - false-positive rate, i.e. false-positive rate = 1 - specificity

• The ROC curve of an ideal test would rise almost vertically from the lower left corner and then move horizontally along the top of the plot.

• One method of comparing different tests is to determine the area under the ROC curve for each
test and to use a statistical test of significance to decide if the area under one curve differs
significantly from the area under the other curve. The greater the area under the curve, the better
the test is.
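The area under an ROC curve traced by a handful of operating points can be approximated with the trapezoidal rule. A minimal sketch, using hypothetical (false-positive rate, sensitivity) points:

```python
def roc_auc(points):
    """Area under an ROC curve by the trapezoidal rule.

    points: (false-positive rate, sensitivity) pairs, including (0, 0) and (1, 1).
    """
    pts = sorted(points)
    area = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        area += (x1 - x0) * (y0 + y1) / 2  # trapezoid between adjacent points
    return area

# Hypothetical operating points for one test; an area near 1.0 suggests a good test,
# while 0.5 (the diagonal) is no better than chance.
auc = roc_auc([(0.0, 0.0), (0.1, 0.7), (0.3, 0.9), (1.0, 1.0)])
```

In practice each cut-off of the continuous variable contributes one (1 - specificity, sensitivity) point, and the curve is traced by sweeping the cut-off across its range.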

10. RELIABILITY: Reliability of a screening test, sometimes also known as reproducibility, precision or consistency, is the ability of a measurement to give the same or a similar result on repeated measurements of the same factor. It means that all values obtained from the same test will be consistent every time, in the same setting.

• The factors that contribute to the variation between the test results are:

• Observer variations

• Biological variations

• Errors related to technical methods

• Observer variations are of the following two types:

• Intraobserver variation: Variation may occur between two or more readings of the same test result made by the same observer. Tests and examinations differ in the degree to which subjective factors enter into the observer's conclusions, and the greater the subjective element in the reading, the greater the intraobserver variation is likely to be.

• Interobserver variation: Variation between observers may occur, where two examiners often do not give the same result. The extent to which the observers agree or disagree is an important issue, and therefore we need to express the extent of agreement in quantitative terms, e.g. as 'percent agreement.'

• Biological or subject variation: The values obtained in measuring many human characteristics
often vary over time, for a short period or a longer period, such as seasonal variation. The
conditions under which certain tests are conducted, e.g. shortly after eating, post-exercise, etc.
clearly can lead to different results in the same individual.

• Errors related to technical methods:

• defective instruments

• erroneous calibrations

• faulty reagents

• the test itself may be inappropriate or not reliable

Overall Percent Agreement: If a test uses dichotomous variables, i.e., two different results
(positive or negative), the results may be arranged into a 2x2 table, and the observer agreement
can be calculated.

• A common way to measure agreement is to calculate the overall percent agreement.
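Overall percent agreement from such a 2×2 table is simply the concordant cells over the total. A minimal sketch, with hypothetical counts:

```python
def percent_agreement(a, b, c, d):
    """Overall percent agreement between two observers reading the same tests.

    a = both read positive, d = both read negative,
    b, c = the two kinds of disagreement.
    """
    return 100 * (a + d) / (a + b + c + d)

# Hypothetical readings of 100 tests by two observers.
agreement = percent_agreement(a=30, b=3, c=7, d=60)  # 90.0
```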


Drawbacks of Percent Agreement:

• it does not give an idea about the prevalence of disease in the participants studied.

• it does not show how disagreements occurred: whether the positive and negative test results were evenly distributed between the two observers, or one observer consistently found more positive outcomes than the other.

• it does not define the extent to which agreement between the observers improves on chance.

KAPPA STATISTICS: The Kappa test is performed to determine the extent to which the agreement between two observers improves on chance agreement alone. Even if two observers only guessed about the presence or absence of a disease or health condition, they would sometimes agree by chance.

Let us consider an example: two clinicians have examined the same 100 patients in 1 hour and
recorded the presence or absence of murmur in each patient. For 7 patients, the first clinician reports
absence of murmur and the second reports presence of a murmur, and for 3 patients the second
clinician reports the absence and first clinician reports the presence of a murmur. For 30 patients
the clinicians agree on the presence and for 60 patients they agree on the absence of a murmur.

• The observed agreement (Ao) is the actual number of observations in cells a and d.

• The maximum possible agreement is the total number of observations (N)

• The agreement expected by chance (Ac) is the sum of expected number of observations in cells
a and d.

• Therefore, kappa = (Ao-Ac) / (N-Ac)

❖ KAPPA STATISTICS (contd.): from the previously mentioned values, the following can be calculated:

• Observed agreement (Ao): 30+60= 90

• Maximum possible agreement (N): 30+7+3+60 = 100

• Cell a agreement expected by chance: [(30+7) (30+3)]/100 = 12.2

• Cell d agreement expected by chance: [(3+60) (7+60)]/100 = 42.2

• So, total agreement expected by chance (Ac): 12.2+42.2 = 54.4

• So, kappa: (Ao-Ac) / (N-Ac) = (90-54.4) / (100-54.4) = 0.78 = 78%
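The computation above can be checked with a short function; the cell labels follow the usual 2×2 layout (a = both observers positive, d = both negative):

```python
def kappa(a, b, c, d):
    """Cohen's kappa from a 2x2 agreement table.

    a = both observers positive, d = both observers negative,
    b = observer 1 positive only, c = observer 2 positive only.
    """
    n = a + b + c + d
    ao = a + d  # observed agreement
    # chance-expected agreement: expected counts in cells a and d
    ac = ((a + b) * (a + c) + (c + d) * (b + d)) / n
    return (ao - ac) / (n - ac)

# The murmur example: 30 agree on presence, 60 on absence, 10 disagree.
k = kappa(a=30, b=3, c=7, d=60)  # ≈ 0.78
```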

❖ KAPPA STATISTICS (contd.): kappa ratio can take values from -1 to +1.

• -1 = perfect disagreement

• 0 = agreement expected by chance

• +1 = perfect agreement.

Interpretation:

• <20% = negligible improvement over chance

• 20-40% = minimal improvement over chance

• 40-60% = fair improvement over chance

• 60-80% = good improvement over chance

• >80% = excellent improvement over chance

❖ WEIGHTED KAPPA:

• Kappa test provides valuable data on observer agreement for diagnosis recorded as ‘present’ or
‘absent’.

• For diagnoses/studies involving 3 or more outcome categories, such as negative, suspicious or probable, we use the weighted kappa test.

• The weighted kappa test gives partial credit for agreement that is close but not perfect.

SUMMARY:

• VALIDITY is the ability of a screening test to accurately measure what it purports to measure.

• RELIABILITY is that property of a screening test where repeated measurements of the same
variable done on the same subject or material at the same time will yield consistent results.

• Validity has two components: sensitivity and specificity, which can be determined when a test is applied to a group of diseased individuals and to a reference group of non-diseased individuals. These two components, along with 'predictive accuracy', are the inherent properties of a screening test.

• A good screening test should be highly valid and highly reliable at the same time.

• Agreement among observations between two different observers can be determined by percent
agreement or Kappa statistics.

• A good cut off point for continuous variables obtained from a clinical test can be determined
by ROC curve, wherefrom sensitivity and false positive error rate can also be determined.

• A screening test, which is used to rule out a diagnosis, must have high degree of sensitivity.

• A confirmatory test, which is used to rule in a disease, must have high degree of specificity.

REFERENCES:

• Park K. Park's Textbook of Preventive and Social Medicine. 26th ed. Jabalpur: Banarsidas Bhanot Publishers; 2021. p. 152-6.

• Celentano DD, Szklo M. Gordis Epidemiology. 6th ed. Philadelphia: Elsevier; 2019. p. 94-120.

• Katz DL, Elmore JG, Wild DMG, Lucan SC. Jekel's Epidemiology, Biostatistics, Preventive Medicine, and Public Health. 4th ed. Philadelphia: Elsevier; 2014. p. 81-96.
