
Exploratory Data Analysis Ch2(1)

Chapter 2 focuses on resistant statistics, stem and leaf plots, and box and whisker plots, emphasizing the importance of using resistant summary statistics like the median and midspread instead of the mean and standard deviation, which can be influenced by extreme scores. The stem and leaf plot is introduced as a more informative alternative to histograms, allowing for a clearer visualization of data distribution. Instructions for creating a stem and leaf plot in SPSS are also provided.

Uploaded by Denisse Moreno
Copyright © All Rights Reserved

Exploratory Data Analysis Ch. 2 Mini-Lecture

The Chapter 2 mini-lecture focuses on three areas: resistant statistics, the stem and leaf plot, and the box and whisker plot (covered in a separate document). The last two are graphing techniques that help illustrate the shape of a distribution of scores, while resistant statistics offer a different way of computing summary statistics.

When we talk about a summary statistic, we mean a single number that represents a large number of data points. The mean of a set of scores tells us something about a distribution (its central tendency, or location), and so does the standard deviation (its spread). However, the mean and standard deviation are not resistant statistics: they can be overly influenced by one or a very few extreme scores. I live in a middle-class neighborhood. If Mark Zuckerberg decided to buy one of the houses in our neighborhood for his cat, then all of a sudden the average income in our neighborhood would be over $1,000,000,000. Does that mean that everyone in the neighborhood suddenly became rich? Certainly not. The problem is that the mean is not resistant: it would be overly influenced by the income of one extremely wealthy cat. The same is true of the standard deviation.
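To see this numerically, here is a small Python sketch using only the standard statistics module. The incomes are made-up, hypothetical figures (not real data), and the single extreme value simply stands in for the wealthy cat.

```python
from statistics import mean, median, stdev

# Hypothetical annual household incomes (in dollars) for a small neighborhood.
incomes = [48_000, 52_000, 55_000, 61_000, 63_000, 67_000, 70_000, 74_000, 80_000]

# Add one extreme, made-up value to play the role of the very wealthy newcomer.
with_outlier = incomes + [2_000_000_000]

# Compare each summary statistic before and after adding the extreme score.
print(f"mean:   {mean(incomes):>15,.0f} -> {mean(with_outlier):>15,.0f}")
print(f"median: {median(incomes):>15,.0f} -> {median(with_outlier):>15,.0f}")
print(f"stdev:  {stdev(incomes):>15,.0f} -> {stdev(with_outlier):>15,.0f}")
```

With these numbers the median moves only from 63,000 to 65,000, while the mean jumps from about 63,000 to about 200 million and the standard deviation balloons in the same way.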

We could instead use resistant summary statistics. A statistic is resistant if it is not overly influenced by extreme scores. Instead of the mean, we can use the median. The median is obtained by placing all the scores in order from lowest to highest and then picking out the middle score (when there is an even number of scores, the median is the average of the two middle scores). Notice that if we change the highest score in a distribution of, say, 100 scores, the median does not change. Can you see why? Instead of the standard deviation, we can use its resistant alternative, the midspread. To obtain the midspread, order your scores from lowest to highest and then divide the ordered scores into an upper half and a lower half. The median of the upper half is called the upper hinge, and the median of the lower half is called the lower hinge. The midspread is the upper hinge minus the lower hinge.
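The median, hinges, and midspread can be computed with a short Python sketch. The function names hinges and midspread are hypothetical helpers written for this example. One assumption needs flagging: the lecture does not say how to split an odd number of scores into halves, so this sketch follows Tukey's convention of counting the middle score in both halves; other textbooks and programs may split the data differently and report slightly different hinges. The scores are the 19 used in the stem and leaf example of this lecture.

```python
from statistics import median


def hinges(scores):
    """Return (lower_hinge, upper_hinge) for a non-empty list of scores.

    Assumption: with an odd number of scores, the middle score is counted
    in both halves (Tukey's convention).
    """
    ordered = sorted(scores)       # lowest to highest
    n = len(ordered)
    half = (n + 1) // 2            # size of each half (includes the median when n is odd)
    lower_half = ordered[:half]
    upper_half = ordered[n - half:]
    return median(lower_half), median(upper_half)


def midspread(scores):
    """Upper hinge minus lower hinge."""
    lower, upper = hinges(scores)
    return upper - lower


scores = [99, 97, 96, 95, 94, 93, 91, 89, 88, 86, 84, 83, 81, 77, 73, 71, 66, 52, 50]
print(median(scores), hinges(scores), midspread(scores))

# Make the highest score absurdly extreme; the resistant summaries should not move.
extreme = [9_999 if s == 99 else s for s in scores]
print(median(extreme), hinges(extreme), midspread(extreme))
```

For these scores the sketch gives a median of 86, hinges of 75 and 93.5, and a midspread of 18.5. Replacing the top score of 99 with 9,999 leaves all three unchanged, which is exactly what resistance means.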

The stem and leaf plot is a more informative version of the histogram. The bars of a histogram show the shape of a distribution, but instead of solidly colored bars, the stem and leaf plot fills in the bars with the scores from the original data set. Suppose we have a data set consisting of the following 19 scores: 99, 97, 96, 95, 94, 93, 91, 89, 88, 86, 84, 83, 81, 77, 73, 71, 66, 52, and 50. We form a stem and leaf plot by first separating the stem of each number from its leaf. The stem is the “tens” part of a number, while the leaf is the “ones” part. For the number 52, the ‘5’ is the tens part (stem) and the ‘2’ is the ones part (leaf). We draw a horizontal line to separate the stems from the leaves, place the stems below the line, and stack a leaf for each individual score above its corresponding stem. Forming a stem and leaf plot for the data above (can you see how it is created?) gives the plot below. The shape of this distribution is not symmetric, so it is skewed: most scores sit in the 80s and 90s, with a long tail toward the low scores, which makes the distribution negatively (left) skewed.

9
7  9
6  8
5  6
4  4  7
3  3  3     2
1  1  1  6  0
-------------
9  8  7  6  5
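The construction described above can be sketched in Python. This is a minimal sketch, assuming a non-empty list of non-negative integer scores with one stem per tens value; the function name stem_and_leaf is hypothetical. It prints the plot turned on its side, with stems in a column on the left and leaves to their right, which is the orientation statistical packages typically print.

```python
from collections import defaultdict


def stem_and_leaf(scores):
    """Return stem-and-leaf rows, highest stem first.

    Assumes a non-empty list of non-negative integers; the stem is the tens
    part (score // 10) and the leaf is the ones digit (score % 10).
    """
    leaves = defaultdict(list)
    for score in scores:
        stem, leaf = divmod(score, 10)  # e.g. 52 -> stem 5, leaf 2
        leaves[stem].append(leaf)

    rows = []
    # Walk every stem from highest to lowest so empty stems (gaps) still appear.
    for stem in range(max(leaves), min(leaves) - 1, -1):
        row_leaves = "".join(f" {leaf}" for leaf in sorted(leaves.get(stem, [])))
        rows.append(f"{stem} |{row_leaves}")
    return rows


scores = [99, 97, 96, 95, 94, 93, 91, 89, 88, 86, 84, 83, 81, 77, 73, 71, 66, 52, 50]
for row in stem_and_leaf(scores):
    print(row)
```

Each row reads as stem | leaves, so the top row, 9 | 1 3 4 5 6 7 9, stands for 91, 93, 94, 95, 96, 97, and 99. Walking every stem between the highest and the lowest means an empty stem still prints as a row, so gaps in the data stay visible, just as they would in a histogram.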

We can create a stem and leaf plot in SPSS by going to Analyze > Descriptive Statistics > Explore (in the Explore dialog, Stem-and-leaf appears under Plots).
