HISTOGRAMS

Uploaded by

Claudia Ferraz
Copyright
© © All Rights Reserved

1.2 Histograms

Histograms are density estimates. A density estimate gives a good impression of the
distribution of the data. In contrast to boxplots, density estimates show possible
multimodality of the data. The idea is to locally represent the data density by
counting the number of observations in a sequence of consecutive intervals (bins)
with origin $x_0$. Let $B_j(x_0, h)$ denote the bin of length $h$ which is the element of a
bin grid starting at $x_0$:

$$B_j(x_0, h) = [x_0 + (j - 1)h,\; x_0 + jh), \quad j \in \mathbb{Z},$$

where $[\cdot, \cdot)$ denotes a left closed and right open interval. If $\{x_i\}_{i=1}^{n}$ is an i.i.d. sample
with density $f$, the histogram is defined as follows:

$$\hat{f}_h(x) = n^{-1} h^{-1} \sum_{j \in \mathbb{Z}} \sum_{i=1}^{n} I\{x_i \in B_j(x_0, h)\}\, I\{x \in B_j(x_0, h)\}. \tag{1.7}$$

In sum (1.7) the first indicator function $I\{x_i \in B_j(x_0, h)\}$ (see Symbols and
Notation in Chap. 21) counts the number of observations falling into bin $B_j(x_0, h)$.
The second indicator function is responsible for “localising” the counts around x.
The parameter h is a smoothing or localising parameter and controls the width of
the histogram bins. An h that is too large leads to very big blocks and thus to a
very unstructured histogram. On the other hand, an h that is too small gives a very
variable estimate with many unimportant peaks.
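
The estimator (1.7) can be sketched in a few lines of Python. This is a minimal illustration assuming NumPy is available, not the book's own implementation. The function name `histogram_density` and the toy sample are hypothetical; the sample is not the bank note data. Each point is assigned the index $j$ of the bin $B_j(x_0, h)$ that contains it. The count of observations sharing the bin of $x$ is then scaled by $n^{-1} h^{-1}$.

```python
import numpy as np

def histogram_density(x, sample, x0, h):
    """Evaluate the histogram estimate (1.7) at the points x."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    sample = np.asarray(sample, dtype=float)
    # Index j of the bin B_j = [x0 + (j-1)h, x0 + jh) containing each value.
    j_x = np.floor((x - x0) / h).astype(int) + 1
    j_sample = np.floor((sample - x0) / h).astype(int) + 1
    # For every x, count the observations that fall into the same bin ...
    counts = (j_x[:, None] == j_sample[None, :]).sum(axis=1)
    # ... and scale by 1 / (n h) so that the estimate integrates to one.
    return counts / (sample.size * h)

# Hypothetical toy sample (an assumption for illustration only).
sample = np.array([0.05, 0.15, 0.25, 0.35, 0.45, 1.2])
print(histogram_density([0.1, 0.3, 1.1, 2.0], sample, x0=0.0, h=0.5))
```

With $x_0 = 0$ and $h = 0.5$, the first two evaluation points share a bin with five of the six observations, so the estimate there is $5/(6 \cdot 0.5) \approx 1.67$. The point 1.1 shares a bin with one observation ($1/3$), and the point 2.0 lies in an empty bin.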
The effect of $h$ is given in detail in Fig. 1.6. It contains the histogram (upper
left) for the diagonal of the counterfeit bank notes for $x_0 = 137.8$ (the minimum
of these observations) and $h = 0.1$. Increasing $h$ to $h = 0.2$ and using the same
origin, $x_0 = 137.8$, results in the histogram shown in the lower left of the figure.
This density histogram is somewhat smoother due to the larger $h$. The binwidth is
next set to $h = 0.3$ (upper right). From this histogram, one has the impression that
the distribution of the diagonal is bimodal with peaks at about 138.5 and 139.9.

Fig. 1.6 Diagonal of counterfeit bank notes. Histograms with $x_0 = 137.8$ and $h = 0.1$ (upper
left), $h = 0.2$ (lower left), $h = 0.3$ (upper right), $h = 0.4$ (lower right). MVAhisbank1

The detection of modes requires fine tuning of the binwidth. Using methods from
smoothing methodology (Härdle, Müller, Sperlich, & Werwatz, 2004) one can find
an “optimal” binwidth $h$ for $n$ observations:

$$h_{\mathrm{opt}} = \left( \frac{24 \sqrt{\pi}}{n} \right)^{1/3}.$$
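
As a quick numerical check, the rule above can be evaluated directly. The sketch below uses only the Python standard library. The function name `optimal_binwidth` is hypothetical. The formula is taken exactly as printed, with no scale factor; a common variant, the normal reference rule, multiplies the result by the sample standard deviation, but that is an addition not made in the text.

```python
import math

def optimal_binwidth(n):
    """Binwidth h_opt = (24 * sqrt(pi) / n) ** (1/3) for n observations."""
    if n <= 0:
        raise ValueError("n must be a positive number of observations")
    return (24.0 * math.sqrt(math.pi) / n) ** (1.0 / 3.0)

# The binwidth shrinks slowly, at the rate n ** (-1/3).
for n in (100, 1000):
    print(n, round(optimal_binwidth(n), 3))
```

For $n = 100$ this gives $h_{\mathrm{opt}} \approx 0.75$, and for $n = 1000$ about $0.35$: a tenfold increase in the sample size roughly halves the binwidth.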

Unfortunately, the binwidth $h$ is not the only parameter determining the shape of $\hat{f}$.
In Fig. 1.7, we show histograms with $x_0 = 137.65$ (upper left), $x_0 = 137.75$
(lower left), $x_0 = 137.85$ (upper right), and $x_0 = 137.95$ (lower right). All
the graphs have been scaled equally on the y-axis to allow comparison. One sees
that, despite the fixed binwidth $h$, the interpretation is not facilitated. The shift
of the origin $x_0$ (to four different locations) created four different histograms. This
property of histograms strongly contradicts the goal of presenting data features.

Fig. 1.7 Diagonal of counterfeit bank notes. Histograms with $h = 0.4$ and origins $x_0 = 137.65$
(upper left), $x_0 = 137.75$ (lower left), $x_0 = 137.85$ (upper right), $x_0 = 137.95$ (lower right).
MVAhisbank2
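
The dependence on the origin can be reproduced on a small scale. The following sketch assumes NumPy. The helper `histogram_counts` and the nine sample values are hypothetical stand-ins chosen for illustration, not the bank note measurements. It bins the same data with $h = 0.4$ on two grids whose origins differ by 0.2.

```python
import numpy as np

def histogram_counts(sample, x0, h):
    """Bin counts on the grid x0, x0 + h, x0 + 2h, ... covering the sample."""
    sample = np.asarray(sample, dtype=float)
    if sample.min() < x0:
        raise ValueError("origin x0 must not exceed the sample minimum")
    # Number of bins needed so that the largest observation is covered.
    n_bins = int(np.floor((sample.max() - x0) / h)) + 1
    edges = x0 + h * np.arange(n_bins + 1)
    counts, _ = np.histogram(sample, bins=edges)
    return edges, counts

# Hypothetical values in the range of the diagonal measurements.
sample = [137.9, 138.3, 138.5, 138.6, 139.0, 139.7, 139.9, 140.0, 140.4]

for x0 in (137.65, 137.85):
    edges, counts = histogram_counts(sample, x0, h=0.4)
    print(x0, counts)
```

With origin 137.65 the counts are 1, 1, 2, 1, 0, 3, 1, so the highest bar sits near 139.85. With origin 137.85 they become 1, 3, 1, 0, 1, 2, 1, and the highest bar moves to about 138.45. The data and the binwidth are unchanged; only the grid has shifted.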


Obviously, the same data are represented quite differently by the four histograms. A
remedy has been proposed by Scott (1985): “Average the shifted histograms!”. The
result is presented in Fig. 1.8.
Here all bank note observations (genuine and counterfeit) have been used. The
(so-called) averaged shifted histogram is no longer dependent on the origin and
shows a clear bimodality of the diagonals of the Swiss bank notes.

Fig. 1.8 Averaged shifted histograms (diagonal) based on all (counterfeit and genuine) Swiss bank notes:
there are 2 shifts (upper left), 4 shifts (lower left), 8 shifts (upper right) and 16 shifts (lower right).
MVAashbank
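
The averaging remedy can be sketched as follows. This is one plausible reading of "average the shifted histograms", assuming NumPy: $M$ histograms with common binwidth $h$ are computed on grids with origins $x_0 + l h / M$, $l = 0, \dots, M - 1$, and their values are averaged pointwise. The function names, the parameter `n_shifts` and the sample values are hypothetical and are not taken from the quantlet MVAashbank.

```python
import numpy as np

def histogram_density(x, sample, x0, h):
    """Histogram estimate with origin x0 and binwidth h, evaluated at x."""
    j_x = np.floor((x - x0) / h)
    j_sample = np.floor((sample - x0) / h)
    counts = (j_x[:, None] == j_sample[None, :]).sum(axis=1)
    return counts / (sample.size * h)

def averaged_shifted_histogram(x, sample, x0, h, n_shifts):
    """Pointwise average of n_shifts histograms whose origins differ by h / n_shifts."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    sample = np.asarray(sample, dtype=float)
    origins = x0 + h * np.arange(n_shifts) / n_shifts
    estimates = [histogram_density(x, sample, origin, h) for origin in origins]
    return np.mean(estimates, axis=0)

# Hypothetical values in the range of the diagonal measurements.
sample = np.array([137.9, 138.3, 138.5, 138.6, 139.0, 139.7, 139.9, 140.0, 140.4])
grid = 137.1 + 0.1 * np.arange(50)  # evaluation points from 137.1 to 142.0

for n_shifts in (1, 4):
    estimate = averaged_shifted_histogram(grid, sample, 137.05, 0.4, n_shifts)
    print(n_shifts, np.round(estimate, 2))
```

With `n_shifts=1` the function reduces to the ordinary histogram. With more shifts the estimate is piecewise constant on cells of width $h / M$, so the steps become finer while the estimate still integrates to one.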

Summary
↪ Modes of the density are detected with a histogram.
↪ Modes correspond to strong peaks in the histogram.
↪ Histograms with the same $h$ need not be identical. They also
  depend on the origin $x_0$ of the grid.
↪ The influence of the origin $x_0$ is drastic. Changing $x_0$ creates
  different looking histograms.
↪ The consequence of an $h$ that is too large is an unstructured
  histogram that is too flat.
↪ A binwidth $h$ that is too small results in an unstable histogram.
↪ There is an “optimal” $h = (24 \sqrt{\pi} / n)^{1/3}$.
↪ It is recommended to use averaged histograms. They are kernel
  densities.
