
UDEC1203 - Topic 6 Analysis of Experimental Data


Topic 6:

Analysis of Experimental Data


Distributions of Measurements
Population (the collection of all measurements of interest to an
experiment; in principle an infinite number of measurements) vs. sample
(a subset of measurements selected from the population; a finite
number of measurements)

Population mean (μ) vs. sample mean ($\bar{X}$)

Limitation of population: hard to define and to observe

Samples are easier to contact; less time consuming; less costly.

Precision = closeness of data to other data that have been
obtained in a similar manner, usually expressed by the standard
deviation
Distributions of Measurements
Population std. dev. (σ)

$$\sigma = \sqrt{\frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}}$$

Sample std. dev. (s)

$$s = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n-1}}$$

Use the sample std. dev. (s) with data sets of 30 points or fewer
• A lower value of s indicates better precision
• The scatter of the sample mean about the “true” value decreases as n is increased
• What is n − 1? It is the number of degrees of freedom of the sample.
Each quantity estimated from the data (here, the sample mean) uses up one
degree of freedom, so n − 1 is the number of deviations that remain independent.
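The difference between the two formulas can be illustrated with a minimal Python sketch. The five pipet volumes below are hypothetical values made up for illustration; `statistics.pstdev` divides by N and `statistics.stdev` divides by n − 1.

```python
import math
import statistics

# Hypothetical replicate volumes (mL) from a 10-mL pipet; not from the slides.
volumes = [9.98, 10.02, 10.01, 9.97, 10.03]
n = len(volumes)

sigma_like = statistics.pstdev(volumes)  # divides by N (population formula)
s = statistics.stdev(volumes)            # divides by n - 1 (sample formula)

# The two results differ only by the factor sqrt(n / (n - 1)).
ratio = s / sigma_like

print(f"N formula: {sigma_like:.4f} mL, n-1 formula: {s:.4f} mL")
```

For small n the sample formula gives a noticeably larger value; the two converge as n grows.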
Distributions of Measurements

The binomial distribution describes a population whose
members have only certain, discrete values.

A molecule of cholesterol, for example, cannot have
2.5 atoms of 13C.

Other populations are considered continuous.

The most commonly encountered continuous
distribution is the normal or Gaussian distribution.
All measurements have random error
(can only be minimized not eliminated)
Consider measuring the volume
dispensed by a 10-mL volumetric pipet

As N exceeds 30, the histogram of results starts to form a bell-shaped curve


Normal Distributions

The shape of a normal distribution is determined by
two parameters, which are the population’s mean,
or true mean, μ, and the population’s variance, σ².

The shape of the normal distribution curve can be
described by the following equation:

$$y = \frac{e^{-(x-\mu)^2/(2\sigma^2)}}{\sigma\sqrt{2\pi}}$$
Normal Distributions
Several features of normal distribution:
o Contains a single maximum corresponding to μ
and the distribution is symmetrical about this
value.
o Increasing the population’s variance increases
the spread of distribution while decreasing its
height.
o Since the normal distribution depends solely on μ
and σ², the area, or probability of occurrence,
between any two limits defined in terms of
these parameters is the same for all normal
distribution curves.
Normal Distributions

z-variable: the deviation from the mean relative to the
standard deviation; it describes all normally distributed populations of data
regardless of their standard deviation

$$z = \frac{x - \mu}{\sigma}$$

μ ± 1σ = 68.3%
μ ± 2σ = 95.5%
μ ± 3σ = 99.7%
Calculating Normal Probabilities…
Example: The time required to install a computer
is normally distributed with a mean of 50 minutes
and a standard deviation of 10 minutes:

What is the probability that a computer is
installed in a time between 45 and 60 minutes?
Algebraically speaking, what is P(45 < X < 60)?
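Calculating Normal Probabilities…

$$z = \frac{x - \mu}{\sigma}$$

Converting X to the standard normal variable z (μ = 0, σ = 1):

$$P(45 < X < 60) = P\left(\frac{45-50}{10} < z < \frac{60-50}{10}\right) = P(-0.5 < z < 1) = 0.5328$$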
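The table lookup above can be checked numerically. The following is a minimal sketch using Python's standard-library `statistics.NormalDist`; the helper name `normal_prob_between` is an illustrative choice, not part of the slides.

```python
from statistics import NormalDist

def normal_prob_between(lo, hi, mu, sigma):
    """P(lo < X < hi) for X ~ Normal(mu, sigma), via the z transformation."""
    z_lo = (lo - mu) / sigma
    z_hi = (hi - mu) / sigma
    std_normal = NormalDist()  # mean 0, standard deviation 1
    return std_normal.cdf(z_hi) - std_normal.cdf(z_lo)

# Installation-time example: mean 50 min, standard deviation 10 min.
p = normal_prob_between(45, 60, mu=50, sigma=10)
print(f"P(45 < X < 60) = {p:.4f}")
```

The same helper can be used for the z-only probabilities in the class assignment by passing mu=0 and sigma=1.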


Class assignment

• P(–0.5 < Z < 1)

• P(Z > 1.6)

• P(Z < -2.23)

• P(Z < 1.52)

• P(0.9 < Z < 1.9)

• P(X < 0)? If μ = 10 %, s = 5 %


Confidence Interval for Population
• For example, 68.26 % of the members in a normally
distributed population have values within the range
μ ± 1σ, and 95.44 % of the population’s members have
values within the range μ ± 2σ, regardless of the actual
values of μ and σ.

In general, we can write

$$X_i = \mu \pm z\sigma \quad (1)$$

where the factor z accounts for
the desired level of confidence.

Values reported in this fashion
are called confidence intervals.
Confidence Interval for Population

Alternatively, a confidence interval can be
expressed in terms of the population’s standard
deviation and the value of a single member drawn
from the population.

Thus, Equation (1) can be rewritten as a
confidence interval for the population mean:

$$\mu = X_i \pm z\sigma$$
Class assignment
The population standard deviation for the amount of
aspirin in a batch of analgesic tablets is known to be
7 mg of aspirin. A single tablet is randomly selected,
analysed, and found to contain 245 mg of aspirin.
What is the 95 % confidence interval for the
population mean?
Estimating µ and σ2
The sample’s mean, $\bar{x}$, and variance, s², are appropriate
estimators of the population’s mean, μ, and variance, σ².
If we could analyze every possible sample of equal size
for a given population (e.g. every possible sample of 5
coins) and calculate their respective means and
variances, the average of the means and the average of the
variances would equal μ and σ².
$\bar{x}$ and s² are therefore said to be unbiased estimators of μ and σ²,
although $\bar{x}$ and s² for any single sample will probably
not be the same as μ and σ².
Central Limit Theorem

States that the means of random samples of size n drawn
from any distribution with mean µ and variance σ²
will have an approximately normal distribution
with a mean equal to µ and a variance equal to
σ²/n, provided the sample size is large.

$$\mu_{\bar{x}} = \mu \quad \text{and} \quad \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$$

The sample size is usually considered to be large if
n ≥ 30.
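The theorem can be illustrated with a small simulation. This is a sketch under stated assumptions: the parent population is taken to be uniform on [0, 1) (mean 0.5, variance 1/12), which is clearly not normal, and the seed, sample size and number of trials are arbitrary illustrative choices.

```python
import random
import statistics

def sample_means(n, trials, seed=0):
    """Draw `trials` samples of size n from Uniform[0, 1) and return their means."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    return [statistics.fmean([rng.random() for _ in range(n)])
            for _ in range(trials)]

n = 30
means = sample_means(n, trials=5000)

mu = 0.5                       # mean of Uniform[0, 1)
sigma = (1 / 12) ** 0.5        # standard deviation of Uniform[0, 1)
predicted_sd = sigma / n ** 0.5

observed_mean = statistics.fmean(means)
observed_sd = statistics.stdev(means)

print(f"mean of sample means: {observed_mean:.4f} (theory {mu})")
print(f"sd of sample means:   {observed_sd:.4f} (theory {predicted_sd:.4f})")
```

The observed spread of the sample means should sit close to σ/√n even though the individual measurements are uniformly, not normally, distributed.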
Sampling from a normally distributed
population

When the population from which samples are
drawn is normally distributed with mean µ and
standard deviation σ, the sample mean, $\bar{x}$,
also has a normal distribution for any sample size n, with
mean $\mu_{\bar{x}}$ equal to the population mean, µ, and
standard deviation $\sigma_{\bar{x}}$ equal to $\sigma/\sqrt{n}$
(that is, a variance of σ²/n).
Confidence Interval for Population
Confidence intervals also can be reported using
the mean for a sample of size n, drawn from a
population of known σ.

The standard deviation of the mean (also known as the
standard error of the mean) of n measurements is

$$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}$$

The confidence interval (CI) for the population’s
mean, therefore, is

$$\mu = \bar{X} \pm \frac{z\sigma}{\sqrt{n}}$$
Confidence level for various values of z

Confidence level, % z
50.00 0.67
68.26 1.00
86.64 1.50
90.00 1.64
95.00 1.96
95.44 2.00
99.00 2.58
99.70 3.00
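The z values in this table can be reproduced from the inverse of the standard normal cumulative distribution. A minimal sketch, assuming a two-sided interval so that half of the excluded probability lies in each tail; the function name is an illustrative choice.

```python
from statistics import NormalDist

def z_for_confidence(level_percent):
    """Two-sided z value enclosing `level_percent` % of a normal population."""
    tail = (1 - level_percent / 100) / 2   # probability left in each tail
    return NormalDist().inv_cdf(1 - tail)

for level in (50.00, 68.26, 90.00, 95.00, 99.00):
    print(f"{level:6.2f} %  z = {z_for_confidence(level):.2f}")
```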
Class assignment
Determine the 95 % confidence interval for the
analgesic tablets described in previous example,
if an analysis of 5 tablets yields a mean of 245
mg of aspirin?
Class Assignment
Determine the 80% and 95% confidence intervals for:
(a) A data entry of 1108 mg/L glucose
(b) A mean value for 1 week data of 1100.3 mg/L (1 data
is recorded per day).
Assume that in each part, s = 19 is a good estimator of σ.
Confidence Interval for Population
The width of a confidence interval depends on:
1. The value of z, which depends on the confidence level
2. The sample size, n
The value of z increases as the confidence level increases.
For example, the value of z is approximately 1.64 for a 90
% confidence level, 1.96 for a 95 % confidence level.
Hence, the higher the confidence level the larger the
width of the confidence interval.
Thus, if we want to decrease the width of a confidence
interval, we have two choices:
a) Lower the confidence level
b) Increase the sample size
Class Assignment

How many tablets in an analysis are needed to
decrease the 95 % confidence interval to 245
mg ± 2 mg of aspirin?
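One way to approach this kind of question is to set the half-width zσ/√n equal to the target and solve for n. The sketch below assumes σ = 7 mg from the earlier aspirin example and z = 1.96 for 95 %; the function name is hypothetical.

```python
import math

def tablets_needed(sigma, half_width, z=1.96):
    """Smallest n for which z * sigma / sqrt(n) does not exceed half_width."""
    return math.ceil((z * sigma / half_width) ** 2)

n = tablets_needed(sigma=7, half_width=2)
print(f"tablets needed: {n}")
```

The result is rounded up because n must be a whole number and rounding down would leave the interval slightly wider than requested.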
Student’s t / t statistic
But… s is not always a good estimator of σ.
In that case, use the t statistic (often called Student’s t), which
depends on the number of measurements (typically n < 30).
To account for the uncertainty in estimating σ², the
term z is replaced with the variable t.
For a single measurement with result X_i,

$$\text{Confidence interval for } \mu = X_i \pm ts$$

Confidence intervals also can be reported using the
mean of a sample of size n,

$$\text{Confidence interval for } \mu = \bar{X} \pm \frac{ts}{\sqrt{n}}$$
Student’s t / t statistic

Student’s t depends on
- the desired confidence level
- the number of degrees of freedom
Student’s t / t statistic
Class Assignment
A clinical chemist obtained the following data for
the alcohol content of a sample of blood: %
C2H5OH: 0.084, 0.089, and 0.079.
Calculate the 95% confidence interval for the mean
assuming that
a. The three results obtained are the only
indication of the precision of the method
b. From previous experience on hundreds of
samples, we know that the standard deviation of
the method, s = 0.005% C2H5OH, is a good
estimate of σ.
Solution:
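A sketch of how the two cases could be computed is given below. It assumes the standard tabulated value t = 4.303 for 2 degrees of freedom at the 95 % level (two-sided) and z = 1.96; neither critical value is calculated by the code.

```python
import math
import statistics

data = [0.084, 0.089, 0.079]      # % C2H5OH, from the assignment
n = len(data)
mean = statistics.mean(data)
s = statistics.stdev(data)        # sample standard deviation, n - 1 in the denominator

T_95_2DOF = 4.303                 # assumed table value, 95 %, 2 degrees of freedom
Z_95 = 1.96                       # 95 % value from the z table
SIGMA_KNOWN = 0.005               # case (b): s is a good estimate of sigma

half_width_t = T_95_2DOF * s / math.sqrt(n)       # case (a)
half_width_z = Z_95 * SIGMA_KNOWN / math.sqrt(n)  # case (b)

print(f"(a) {mean:.3f} ± {half_width_t:.3f} % C2H5OH")
print(f"(b) {mean:.3f} ± {half_width_z:.3f} % C2H5OH")
```

The interval in case (a) is roughly twice as wide, which reflects the extra uncertainty of estimating σ from only three results.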
Hypothesis Testing
Why do we need to perform a hypothesis testing?

We take a sample of 100 cans of the soft drink under
investigation.

We find that the mean amount of soda in these 100 cans is
11.89 oz.

Can we state that, on average, all the cans contain < 12 oz of
soda? We cannot: another sample of 100 cans may give us a mean of 12.05 oz.
Hypothesis Testing
Therefore, we need to perform a test of hypothesis to find out
how large the difference between 12 oz and 11.89 oz is and to
investigate whether or not this difference has occurred as a
result of chance alone.

If 11.89 oz is the mean for all cans and not for just 100 cans,
then we do not need to make a test of hypothesis. We can
immediately state that the mean amount of soda in all such
cans is < 12 oz.

We perform a test of hypothesis only when we are making a
decision about a population parameter based on the value of
a sample statistic.
Hypothesis Testing
Hypothesis testing is the basis for many decisions made in
science and engineering.

The hypothesis tests that we describe are used to
determine if the results from these experiments support
the model.

If agreement is found, the hypothetical model serves as
the basis for further experiments.

When the hypothesis is supported by sufficient
experimental data, it becomes recognized as a useful
theory until such time as data are obtained that refute it.
General requirements for constructing a
hypothesis test

The hypothesis

- Is your initial guess concerning the results of
the statistical test

- The hypothesis can be either that the results
will fit the model (known as the null hypothesis,
H0) or that they won’t fit the model (the
alternative hypothesis, HA)
Hypothesis Testing
A null hypothesis postulates that two or more
observed quantities are the same.
Specific examples of hypothesis tests that scientists
often use include the comparison of
(1) The mean of an experimental data set with what
is believed to be the true value,
(2) The mean to a predicted or cutoff (threshold)
value,
(3) The means or the standard deviations from two
or more sets of data.
Hypothesis Testing
Comparing an experimental mean with a known value:
A statistical hypothesis test to draw conclusions about
the population mean (μ) and its nearness to the known
value (μ0).
There are two contradictory outcomes that we consider
in any hypothesis test:
(1) The null hypothesis, H0, states that μ = μ0.
(2) The alternative hypothesis, HA, can be stated in several ways.

We might reject the null hypothesis in favor of HA if μ is
different from μ0 (μ ≠ μ0).
Other alternative hypotheses are μ > μ0 or μ < μ0.
Hypothesis Testing

Suppose we are interested in determining
whether the concentration of lead in an industrial
wastewater discharge exceeds the maximum
permissible amount of 0.05 ppm.
Our hypothesis test would be summarized:
H0: μ = 0.05 ppm
HA: μ > 0.05 ppm
Hypothesis Testing
Large Sample z test: (usually n > 30)
If a large number of results are available so that s
is a good estimate of σ, the z test is appropriate.
1. State the null hypothesis: H0: μ = μ0
2. Form the test statistic:

$$z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{N}}$$

3. State the alternative hypothesis HA and
determine the rejection region.

For HA: μ ≠ μ0, reject H0 if z ≥ zcrit or if z ≤ -zcrit (two-tailed test)
For HA: μ > μ0, reject H0 if z ≥ zcrit (one-tailed test)
For HA: μ < μ0, reject H0 if z ≤ -zcrit (one-tailed test)
Hypothesis Testing
Two-tailed test at the 95% confidence level (HA: μ ≠ μ0):
There is only a 5% probability that random error
will lead to a value of z ≥ zcrit or z ≤ -zcrit.
The overall significance level is α = 0.05.
From the figure below, the critical value of z is 1.96.

[Figure: rejection regions for the 95% confidence level, two-tailed test for HA: μ ≠ μ0]
Hypothesis Testing
One-tailed test at the 95% confidence level:
The probability that z exceeds zcrit is 5%, which
corresponds to a total probability of 10% in both tails.
The critical value is therefore the one tabulated for an
overall significance level of α = 0.10.
The critical value from Table 1 is 1.64.

[Figure: rejection regions for the 95% confidence level, one-tailed test for HA: μ > μ0]

[Figure: rejection regions for the 95% confidence level, one-tailed test for HA: μ < μ0]
Class assignment

A class of 30 students determined the activation
energy of a chemical reaction to be 116 kJ/mol
(mean value) with a standard deviation of 22 kJ/mol.
Are the data in agreement with the literature value
of 129 kJ/mol at
(a) The 95% confidence level
(b) The 99% confidence level

Estimate the probability of obtaining a mean equal
to the student value.
Solution:
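The z test for this assignment can be sketched as follows. It assumes, as the large-sample z test does, that s = 22 kJ/mol is a good estimate of σ, and it uses the two-tailed critical values 1.96 and 2.58 from the z table. The two-tailed p-value is computed as one possible reading of "the probability of obtaining a mean equal to the student value".

```python
import math
from statistics import NormalDist

def z_statistic(sample_mean, mu0, sigma, n):
    """z = (x_bar - mu0) / (sigma / sqrt(n))."""
    return (sample_mean - mu0) / (sigma / math.sqrt(n))

z = z_statistic(sample_mean=116, mu0=129, sigma=22, n=30)

reject_95 = abs(z) >= 1.96   # two-tailed, 95 % confidence level
reject_99 = abs(z) >= 2.58   # two-tailed, 99 % confidence level

# Probability of a deviation at least this large, in either direction, if H0 is true.
p_two_tailed = 2 * NormalDist().cdf(-abs(z))

print(f"z = {z:.2f}, reject at 95 %: {reject_95}, reject at 99 %: {reject_99}")
print(f"two-tailed p = {p_two_tailed:.4f}")
```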
Hypothesis Testing
For a small number of results (n < 30), we use a similar
procedure to the z test except that the test statistic is
the t statistic.
The null hypothesis is H0: μ = μ0, where μ0 is a specific
value of μ such as an accepted value, a theoretical
value or a threshold value.
1. State the null hypothesis: H0: μ = μ0
2. Form the test statistic:

$$t = \frac{\bar{x} - \mu_0}{s/\sqrt{N}}$$

3. State the alternative hypothesis HA and
determine the rejection region.
Hypothesis Testing

For HA: μ ≠ μ0, reject H0 if t ≥ tcrit or if t ≤ -tcrit
(two-tailed test)
For HA: μ > μ0, reject H0 if t ≥ tcrit
(one-tailed test)
For HA: μ < μ0, reject H0 if t ≤ -tcrit
(one-tailed test)
Class assignment

A food chemist wishes to validate a new method that will
be used to measure the vitamin C content of food. A
reference orange sample is obtained that has a known
vitamin C content of 0.0532 % (w/w). Several replicate
measurements of this sample by the new method give
estimated vitamin C contents of 0.0482 %, 0.0471 %,
0.0510 %, 0.0468 %, and 0.0495 %. Is the mean result of
the new method the same as the known content of the
reference sample if these values are compared at the 90
% confidence level?
Guide to Answer:
* Calculate the mean and the standard deviation before
proceeding to the hypothesis test.
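The guide above can be followed with a short script. This is a sketch that assumes a two-tailed test (HA: μ ≠ μ0) and the standard tabulated value t = 2.132 for 4 degrees of freedom at the 90 % confidence level; the critical value is entered by hand, not computed.

```python
import math
import statistics

known = 0.0532                                       # % (w/w), reference value
results = [0.0482, 0.0471, 0.0510, 0.0468, 0.0495]   # % (w/w), new method
n = len(results)

mean = statistics.mean(results)
s = statistics.stdev(results)                        # n - 1 in the denominator

t = (mean - known) / (s / math.sqrt(n))

T_CRIT_90_4DOF = 2.132                               # assumed table value, two-tailed
reject_h0 = abs(t) >= T_CRIT_90_4DOF

print(f"mean = {mean:.5f} %, s = {s:.5f} %, t = {t:.2f}")
print("reject H0" if reject_h0 else "fail to reject H0")
```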
Comparison of two experimental means
Hypothesis testing for comparing two mean values is divided
into two categories depending on the source of the data.
Frequently scientists must judge whether a difference in the
means of two sets of data is real or the result of random error.
Data are said to be unpaired when the analysis is of a series of
samples drawn from different sources. For instance, the
samples are collected from two different populations or from
randomly selected individuals from the same population at
different times.
Paired data are encountered when analyzing a series of
samples drawn from the same source. For instance, same
subject measured before & after a treatment, or same subject
measured at different times.
Comparison of two experimental means
Paired data
In paired samples, the difference between the two data values for each
element is denoted by d. This value of d is called the paired difference.
The average of these differences ($\bar{d}$) is

$$\bar{d} = \frac{\sum d_i}{n}$$

The standard deviation of these differences ($S_d$) is

$$S_d = \sqrt{\frac{\sum d_i^2 - \frac{(\sum d_i)^2}{n}}{n-1}}$$

The standard deviation of the mean difference ($S_{\bar{d}}$) is

$$S_{\bar{d}} = \frac{S_d}{\sqrt{n}}$$

The test statistic is

$$t = \frac{\bar{d} - \mu_d}{S_{\bar{d}}}$$

where $\mu_d$ is the mean of the paired differences for the population.
Class Assignment
A wastewater treatment plant is required to monitor its discharges
into the rivers. Eight samples were first sent to Commercial Laboratory
A for the response suspended solid analysis. After one week, the same
eight samples were sent to another laboratory called Commercial
Laboratory B to repeat the same analysis using the same method. The
results for response suspended solids are listed below. Are the results
obtained by the two different laboratories the same at the 95 % confidence
level?

Sample   Commercial Laboratory A   Commercial Laboratory B
1        11.23                     11.35
2        19.42                     19.32
3        14.35                     14.43
4        8.36                      8.21
5        9.17                      9.28
6        23.41                     23.57
7        12.62                     12.48
8        21.26                     21.45
Solution:
- Calculate the difference in results, di

Sample   Commercial Laboratory A   Commercial Laboratory B   Difference in results, di = (B − A)
1        11.23                     11.35                     0.12
2        19.42                     19.32                     -0.10
3        14.35                     14.43                     0.08
4        8.36                      8.21                      -0.15
5        9.17                      9.28                      0.11
6        23.41                     23.57                     0.16
7        12.62                     12.48                     -0.14
8        21.26                     21.45                     0.19
Solution: (Continued)
Calculate the average difference ($\bar{D}$ or $\bar{d}$), the standard deviation (sd) and the t
value
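These quantities can be computed as in the sketch below, which applies the paired-data formulas to the laboratory results with H0: μd = 0. The critical value t = 2.365 (7 degrees of freedom, 95 %, two-tailed) is an assumed standard table value entered by hand.

```python
import math

lab_a = [11.23, 19.42, 14.35, 8.36, 9.17, 23.41, 12.62, 21.26]
lab_b = [11.35, 19.32, 14.43, 8.21, 9.28, 23.57, 12.48, 21.45]

d = [b - a for a, b in zip(lab_a, lab_b)]   # paired differences, B - A
n = len(d)

d_bar = sum(d) / n
# Shortcut formula: S_d = sqrt((sum(d^2) - (sum d)^2 / n) / (n - 1))
s_d = math.sqrt((sum(di ** 2 for di in d) - sum(d) ** 2 / n) / (n - 1))
s_d_bar = s_d / math.sqrt(n)

mu_d = 0.0                                  # H0: no difference between laboratories
t = (d_bar - mu_d) / s_d_bar

T_CRIT_95_7DOF = 2.365                      # assumed table value, two-tailed
reject_h0 = abs(t) >= T_CRIT_95_7DOF

print(f"d_bar = {d_bar:.5f}, S_d = {s_d:.4f}, t = {t:.3f}")
print("reject H0" if reject_h0 else "fail to reject H0")
```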
Comparison of variances (Precision)
At times, there is a need to compare the variances
(or standard deviation) of two data sets.
The normal t-test requires that the standard
deviations of the data sets being compared are
equal.
F-test:
• A simple statistical test that can be used to
compare the precision of two results or
methods, provided that the
populations follow the normal (Gaussian)
distribution.
Comparison of variances (Precision)
The F-test is based on the null hypothesis that the two
population variances under consideration are equal.

$$H_0: \sigma_1^2 = \sigma_2^2$$

The test statistic F is defined as the ratio of the
two sample variances,

$$F = \frac{s_1^2}{s_2^2} \quad (\text{where } s_1 > s_2)$$

It is calculated and compared with the critical
value of F at the desired confidence level.
The null hypothesis is rejected if the test statistic
differs too much from unity.
Comparison of variances (Precision)

Taken from Miller and Miller, Statistic and Chemometrics for Analytical Chemistry, 6th Edition
Class Assignment

A standard method for the determination of the carbon
monoxide (CO) level in gaseous mixtures is known from
many hundreds of measurements to have a standard
deviation (s) of 0.21 ppm CO. A modification of the
method yields a value of s of 0.15 ppm CO with 13
replicate measurements. A second modification, also
based on 13 replicate measurements, has a standard
deviation of 0.12 ppm CO. Is either modification
significantly more precise than the standard method at α
= 0.05?
Solution:
Null hypothesis:

The alternative
hypothesis:

Because an improvement is claimed, the variances of the
modifications are placed in the denominator.

For 1st modification:

For 2nd modification:


Solution:
For the standard procedure, sstd is a good estimate of σ, and the number
of degrees of freedom for the numerator can be taken as infinite.
Fcrit = 2.30
F1 < 2.30
→ We fail to reject the null hypothesis and conclude that there is
no significant improvement, and hence the precision is similar to that of the
original method.
F2 > 2.30
→ We reject the null hypothesis and conclude that the
second modification differs significantly, suggesting that it has a greater precision
than the original method at the 95% confidence level.
Then, we can ask the question: is Mod2 significantly more precise than Mod1?
The answer can be obtained by comparing the variances of the two
methods, placing the larger variance in the numerator, and comparing the
resulting F with Fcrit for 12 degrees of freedom in both numerator and denominator.
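The three F ratios in this solution can be computed as in the sketch below. Only Fcrit = 2.30, which is stated above, is used for a decision; the critical value for comparing the two modifications is left to the F table.

```python
def f_ratio(s_numerator, s_denominator):
    """F = s_numerator^2 / s_denominator^2 (larger variance in the numerator)."""
    return (s_numerator / s_denominator) ** 2

S_STD, S_MOD1, S_MOD2 = 0.21, 0.15, 0.12   # ppm CO, from the assignment
F_CRIT = 2.30                              # stated in the solution above

f1 = f_ratio(S_STD, S_MOD1)    # standard method vs. first modification
f2 = f_ratio(S_STD, S_MOD2)    # standard method vs. second modification
f21 = f_ratio(S_MOD1, S_MOD2)  # first vs. second modification

print(f"F1 = {f1:.2f} -> {'reject' if f1 > F_CRIT else 'fail to reject'} H0")
print(f"F2 = {f2:.2f} -> {'reject' if f2 > F_CRIT else 'fail to reject'} H0")
print(f"F(Mod1/Mod2) = {f21:.2f}, to be compared with Fcrit for 12 and 12 degrees of freedom")
```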
Detection of Gross Errors / Outliers

There are times when a set of data appears to
be skewed by the presence of one or more data points
that are not consistent with the remaining data
points.

Such values are called outliers.

The most commonly used hypothesis test for
identifying outliers is Dixon’s Q-test.
Detection of Gross Errors / Outliers
To perform this test, data are ranked from smallest to
largest value so that the suspected outlier is either the first
or the last data point.
The test statistic:

$$Q_{cal} = \frac{|x_0 - x_n|}{x_{largest} - x_{smallest}}$$

where $x_0$ is the suspected outlier
and $x_n$ is the suspected outlier’s nearest neighbor.

Next, the value of Qcal is compared with a critical value, Qc. If
Qcal > Qc, the suspected data point can be called an outlier
and considered for rejection. If Qcal < Qc, the suspected
outlier must be retained.
Detection of Gross Errors / Outliers

It is important to note that Dixon’s Q-test should be
used with caution, as it is possible that a value identified as an outlier by
the test is NOT an outlier. It is also possible
that an outlier might be wrongfully retained.

It is recommended to report the median when
Dixon’s Q-test suggests retention.
Detection of Gross Errors / Outliers
Critical values for Dixon’s Q-Test
Values for Qc at Various Confidence Levels

Number of         Confidence Level
Observations      90%      95%      99%
3                 0.941    0.970    0.994
4                 0.765    0.829    0.926
5                 0.642    0.710    0.821
6                 0.560    0.625    0.740
7                 0.507    0.568    0.680
8                 0.468    0.526    0.634
9                 0.437    0.493    0.598
10                0.412    0.466    0.568
11                0.392    0.444    0.542
12                0.376    0.426    0.522
13                0.361    0.410    0.503
14                0.349    0.396    0.488
15                0.338    0.384    0.475
Class assignment
A urine sample containing a known amount of
markers for marijuana is sent to several drug testing
laboratories. These laboratories report the following
concentrations:

Lab No.                 1      2      3      4      5
Concentration, μg/L     55.3   57.8   54.0   68.1   58.7

Use the Q-test to determine whether any of these
results can be considered an outlier at the 95 %
confidence level.
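The Q-test procedure can be sketched as follows. The 95 % critical values are copied from the table above; the function name `dixon_q` and the choice to test whichever end of the ranked data has the larger gap are illustrative assumptions.

```python
# 95 % critical values Qc, copied from the table above (n = 3 to 10).
Q_CRIT_95 = {3: 0.970, 4: 0.829, 5: 0.710, 6: 0.625,
             7: 0.568, 8: 0.526, 9: 0.493, 10: 0.466}

def dixon_q(values):
    """Return (suspected outlier, Qcal) for the more extreme end of the data."""
    data = sorted(values)                      # rank from smallest to largest
    spread = data[-1] - data[0]
    q_low = (data[1] - data[0]) / spread       # gap at the low end
    q_high = (data[-1] - data[-2]) / spread    # gap at the high end
    if q_high >= q_low:
        return data[-1], q_high
    return data[0], q_low

concentrations = [55.3, 57.8, 54.0, 68.1, 58.7]    # μg/L, labs 1 to 5
suspect, q_cal = dixon_q(concentrations)
is_outlier = q_cal > Q_CRIT_95[len(concentrations)]

print(f"suspect = {suspect} μg/L, Qcal = {q_cal:.3f}, "
      f"Qc = {Q_CRIT_95[len(concentrations)]}, outlier: {is_outlier}")
```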
Fitting Experimental Results

This procedure is often required when we are
preparing a calibration curve or are comparing
experimental results to a predicted response.

There are many types of equations used in chemical
analysis, but the most common is the one for a
straight line.

The best-fit line for a set of data can be determined
by using a process known as linear regression.
Linear Regression
Fitting x (“independent variable”) and y (“dependent
variable”) values into the following equation:

$$y_{i,cal} = m x_i + b$$

where m is the slope (representing the change in y vs. x),
b is the line’s intercept on the y-axis,
xi is a given x value in the data set, and
yi,cal is the response predicted at xi by the best-fit line.
Linear Regression
It is possible to obtain the best estimates for m and b by
using the method of least-squares analysis.
This method gives a series of equations that allow the
slope and intercept for the best-fit line to be calculated
for a particular data set.
Correlation Coefficient, r
How well does the best-fit line describe the data?
When |r| = 1, the fit is perfect.
The closer |r| is to 1, the greater the confidence in the fit.
Coefficient of determination, r² (has a value between 0
and 1)
Formula for determining the best-fit
parameters for a straight line
Linear Regression
Class assignment
Standards that contain the drug oxymorphone are
analyzed and give a calibration curve that appears to
follow a straight line. The peak heights measured by
liquid chromatography for standards with
oxymorphone concentrations of 100, 200, 300, 400
and 500 ng/mL have relative values of 161, 342, 543,
765, and 899, respectively.

a) Determine the best-fit slope and intercept for this
line.
b) What is the correlation coefficient for the best-fit
line?
Solution:

Thus, the best-fit line to our data set is y = 1.90 x + (-28)
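The slope, intercept and correlation coefficient can be checked with a short least-squares sketch. The formulas used are the standard ones for a straight line, m = Sxy/Sxx, b = ȳ − m·x̄ and r = Sxy/√(Sxx·Syy); since the formula slide itself is not reproduced here, treat this as one common formulation, not necessarily the exact form shown in the lecture.

```python
import math

def linear_fit(x, y):
    """Least-squares slope m, intercept b and correlation coefficient r."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    m = sxy / sxx
    b = y_bar - m * x_bar
    r = sxy / math.sqrt(sxx * syy)
    return m, b, r

conc = [100, 200, 300, 400, 500]       # ng/mL oxymorphone
height = [161, 342, 543, 765, 899]     # relative peak height

m, b, r = linear_fit(conc, height)
print(f"y = {m:.2f} x + ({b:.0f}), r = {r:.4f}, r^2 = {r * r:.4f}")
```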


Solution: (Continued)
