So here are the kernels, for the US women's heights. So there's lots of US women in this sample.
So, I could work with quite small bins. And it looks fine. It's interesting. We can say something about, maybe, the mode of the distribution is about 160 centimeters. But, after a while, the little bumps here become somewhat annoying. They are clearly not representing what's there in reality. So we might want something smoother. And one way to get something smoother is, instead of having a histogram, to represent the data with what's called a "kernel." So let me show you these data. Let me show you the same data. I've plotted now the histogram, but without color, so we can see the kernel. And the kernel is this function. So the kernel density estimation is, in a sense, a smoothed histogram. So how does this smoothing happen? Remember that we said, in order to do the histogram, I take a particular interval, and I basically stack up a little vertical block for each observation, so the height is proportional to the number of observations that I have. For the kernel density estimation, we do the same thing, except that, instead of stacking little rectangular bars, we sum up the result from a kernel function. So let me first show it to you graphically, and then I'll put down the formula. So suppose that this is my sample. I have a sample which has only 10 observations. And these are my observations. These are the values of the observations. How do I do a kernel density estimation? Around each observation, I'm going to draw a curve that we are going to call a "kernel," which is where the name "kernel density estimation" comes from-- this kernel. What's a kernel? What do you think of these blue curves? What do they look like? A normal distribution? It could be a normal, or it could be-- it doesn't even have to be a normal. What's relevant-- Oh, Gaussian. Gaussian and normal are friends. That's the same thing. But what's-- yeah. They're symmetrical and centered on the point. Exactly. What they need to be is symmetrical and centered on the point. So a Gaussian is fine. A normal is fine.
This one is a normal, as it tells you on the top. But actually, a little inverted U-shape on top of the point would work just fine. So any distribution that is symmetrical and centered around the point will do. And it has to integrate to 1. So we draw all these curves. We draw all these bells, normal-looking bells, or Epanechnikov. Epanechnikov is kind of more of a rounder bell, like that. It doesn't really matter. The size of the bell, the shape of the bell, that's a choice, but it's not a choice that turns out to be deeply important. We do that for each of the points. OK? And then we take a bin. In the case of kernel density estimation, we are going to call that a "bandwidth"-- like the width of the band. And then, suppose I'm interested in estimating the kernel density function at this guy, this point. I'm drawing my band. Here, in this case, we know that the bandwidth is 0.678. So I'm drawing a little band of 0.678-- so, if this is 1, that's about that. I'm drawing it around the point where I'm interested in estimating. I'm drawing it around my x. And then I'm summing up the heights of all the curves for the points that fall within this band. So, for example, here, when I draw this, I'm getting this one, this one, this one, this one, this one-- roughly these ones. So I'm going to sum up, at the point x, the height of each of these curves. So it's very similar to doing a histogram, except that, in a histogram, at a point I stack rectangles of the same height, and here I'm going to stack little bars of different heights, giving them smaller height if they are far from my point and larger height if they are close to my point. Does that make sense? So, if you look at the very edge of it, for this point here there is almost only the contribution of the first kernel, so the estimate is very close to that first curve.
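The stacking-of-kernels idea can be sketched in a few lines of Python. The sample values and the bandwidth below are invented for illustration; they are not the lecture's data:

```python
import math

def gaussian_kernel(u):
    # Standard normal density: symmetric, centered at zero, integrates to 1.
    return math.exp(-0.5 * u * u) / math.sqrt(2 * math.pi)

def kde_at(x, sample, h):
    # At the point x, stack one little bar per observation: the height of
    # that observation's kernel at x. Observations far from x contribute
    # almost nothing; observations close to x contribute a lot.
    return sum(gaussian_kernel((x - xi) / h) for xi in sample) / (len(sample) * h)

sample = [150, 152, 155, 158, 160, 161, 163, 165, 170, 175]  # 10 heights, in cm
h = 4.0  # hypothetical bandwidth

# The estimate is high where observations cluster, low in the tails.
print(kde_at(160, sample, h), kde_at(190, sample, h))
```

Swapping `gaussian_kernel` for an Epanechnikov kernel (`0.75 * (1 - u * u)` for `|u| < 1`, zero otherwise) changes the result very little, which is the point made above: the shape of the bell is not a choice that matters much.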
And then I'm kind of moving up from it, because, for all of these points that are here, I'm adding a lot of kernels, so the vertical height is higher. At this point here, which corresponds to a point above here on the histogram, you can see that there are a lot of kernels to add. Make sense? Yep. So, for a certain bandwidth, when you sum up all the heights of the little curves in that bandwidth, then that final height value, does it get plotted at the beginning of that bandwidth? In the middle. Oh. In the middle. So, basically, you draw the bandwidth. If you want to plot this particular point, you draw the bandwidth centered around that point, you sum up all of the kernels that show up in that interval, and that gives you the value at that point. OK? Now, concretely, you don't actually do that by hand-- R does that. So that's what this function tells us. I think it's useful to go from the graphical representation to what this function tells us. Basically it tells us: if you have a sample x1, x2, up to xn, an independent and identically distributed sample drawn from some distribution with an unknown PDF that you are trying to get some sense of, then the kernel density estimator, at any point x, is the sum over the observations of the kernel function evaluated at (x minus xi) divided by the bandwidth. So basically it's this weighted sum of all of the kernel functions. So it gives us something which is quite similar to a histogram, but within each bin it gives more weight in our counts to the observations that are closer to the center of the interval. And we divide by n, and by the size of the bandwidth. Yep? So how-- I guess, how accurate is it to assume that it's identically distributed? And, like, what can you do about a sample that's not identically distributed? So, in this particular case, this is what it is.
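Written out, the estimator being described verbally is the standard kernel density estimator: for an iid sample $x_1, \dots, x_n$, a bandwidth $h$, and a kernel $K$,

```latex
\hat{f}_h(x) = \frac{1}{nh} \sum_{i=1}^{n} K\!\left(\frac{x - x_i}{h}\right)
```

Dividing by $n$ averages the kernel contributions, and dividing by $h$ rescales them so that $\hat{f}_h$ still integrates to 1, since each kernel $K$ does.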
That's the assumption we are making.
Whether it's a good assumption or a bad assumption is going to depend on the data set. But it's the assumption we are making for it to even make sense to start drawing a kernel density estimate. So, typically, for a sample of observations representing heights, it's pretty reasonable to think that it's an iid sample. It might be an iid sample from a funky distribution. For example, if I have men and women, there is a distribution that represents the heights of men and women together, but I could instead say, well, this distribution is really the combination of two distributions: one distribution for the men, one distribution for the women. But there is still a single distribution that this sample is coming from. Can you weight kernels differently, in a way, to distribute your-- like, find a probability [INAUDIBLE]? No, you cannot. Because, remember, you have no idea, at this point, what the distribution is. What you're trying to do here is to say: this is my sample. I'm assuming that it's an iid sample drawn from some distribution, and I want to look at the shape of this distribution. I make no assumption-- that's the value of a kernel-- I make no assumption about what the distribution looks like. So you can see that, here, there is actually a bump here. This definitely doesn't look, for example, like a normal distribution. I make zero assumptions about what the distribution might be. The kernel is going to tell me what it might be.
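The men-and-women point can be illustrated with a small simulation. The group means and spreads below are invented for the sketch; the idea is that the kernel estimate shows two bumps without ever being told there are two groups:

```python
import math
import random

def kde_at(x, sample, h):
    # Gaussian-kernel density estimate at x, as before.
    return sum(
        math.exp(-0.5 * ((x - xi) / h) ** 2) / math.sqrt(2 * math.pi)
        for xi in sample
    ) / (len(sample) * h)

random.seed(0)
# Hypothetical mixture: 500 "women" around 162 cm, 500 "men" around 176 cm.
sample = ([random.gauss(162, 4) for _ in range(500)] +
          [random.gauss(176, 5) for _ in range(500)])

h = 2.0  # hypothetical bandwidth
# The estimate has a bump near each group mean and a dip in between,
# even though we assumed nothing about the shape of the distribution.
for x in (162, 169, 176):
    print(x, round(kde_at(x, sample, h), 4))
```

A parametric fit of a single normal would smooth that dip away; the kernel estimate, making no shape assumption, lets the two subpopulations show up on their own.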