Week 4: Further Nonparametric and Semiparametric Estimation

Dongwoo Kim
Simon Fraser University
Jan 26, 2023
Introduction

We have learned basic kernel-based nonparametric estimators.

Today we investigate further nonparametric estimators:

• Local linear estimator
• Series (sieve) estimator
• Endogeneity and shape restrictions
• Empirical implementation of nonparametric estimators

We will also consider several useful semiparametric techniques to overcome the curse of dimensionality:

• Single index model
• Partially linear model
• Maximum rank correlation estimator

Local Linear Estimator

The Nadaraya-Watson estimator minimizes

$$S_{0,n}(m) = \frac{1}{nh}\sum_{i=1}^{n} (Y_i - m)^2\, K\!\left(\frac{X_i - x}{h}\right)$$

so that m approximates the conditional mean of Y given X = x. This is called the local constant estimator.

The local linear estimator minimizes

$$S_{1,n}(m, \beta) = \frac{1}{nh}\sum_{i=1}^{n} \left(Y_i - m - (X_i - x)\beta\right)^2 K\!\left(\frac{X_i - x}{h}\right)$$

and the resulting estimator is of the form

$$\hat m(x) = \frac{\sum_{i=1}^{n} \left(Y_i - (X_i - x)\hat\beta(x)\right) K\!\left(\frac{X_i - x}{h}\right)}{\sum_{i=1}^{n} K\!\left(\frac{X_i - x}{h}\right)}.$$
Local Linear Estimator

Now we need an estimate $\hat\beta(x)$ in order to compute $\hat m(x)$.

$\hat m(x)$ can be written as a weighted average of the $Y_i$,

$$\hat m(x) = \sum_{i=1}^{n} w_i Y_i.$$

Solving for the weights $w_i$ gives us the desired estimator.

Local Linear Estimator: Implementation

Define $K_i = K\!\left(\frac{X_i - x}{h}\right)$ and $Z_i = (1, X_i - x)$ for notational simplicity. Now suppose one writes the least squares problem as follows:

$$Y_i \sqrt{K_i} = Z_i \sqrt{K_i}\begin{bmatrix} m(x) \\ \beta(x) \end{bmatrix} + U_i$$

Then $S_{1,n}(m, \beta)$ is the objective function of this weighted least squares regression:

$$U_i^2 = \left[Y_i\sqrt{K_i} - Z_i\sqrt{K_i}\begin{bmatrix} m(x) \\ \beta(x)\end{bmatrix}\right]^2 = \left[Y_i - m(x) - (X_i - x)\beta(x)\right]^2 K_i$$

Therefore, by defining $\tilde Y_i = Y_i\sqrt{K_i}$ and $\tilde Z_i = Z_i\sqrt{K_i}$,

$$\tilde Y_i = \tilde Z_i \begin{bmatrix} m(x) \\ \beta(x)\end{bmatrix} + U_i.$$

Local Linear Estimator: Implementation

Therefore, OLS estimation of $\tilde Y_i$ on $\tilde Z_i$ solves this minimization problem:

$$\begin{bmatrix}\hat m(x) \\ \hat\beta(x)\end{bmatrix} = \left(\sum_{i=1}^{n} \tilde Z_i' \tilde Z_i\right)^{-1} \sum_{i=1}^{n} \tilde Z_i' \tilde Y_i = \left[\sum_{i=1}^{n} Z_i' K_i Z_i\right]^{-1} \sum_{i=1}^{n} Z_i' K_i Y_i$$

$$\therefore\quad \hat m(x) = (1\ \ 0)\left[\sum_{i=1}^{n} Z_i' K_i Z_i\right]^{-1} \sum_{i=1}^{n} Z_i' K_i Y_i = \sum_{i=1}^{n} W_i Y_i$$

where $W_i = (1\ \ 0)\left[\sum_{i=1}^{n} Z_i' K_i Z_i\right]^{-1} Z_i' K_i$.

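As an illustration, here is a minimal NumPy sketch of the local linear estimator written directly from the weighted least squares formula above. The Gaussian kernel, the bandwidth h = 0.1, and the simulated data are illustrative choices, not part of the slides.

```python
import numpy as np

def gaussian_kernel(u):
    """Standard Gaussian kernel."""
    return np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)

def local_linear(x0, X, Y, h):
    """Local linear estimate of E[Y | X = x0] with bandwidth h.

    Solves min_{m, b} sum_i (Y_i - m - (X_i - x0) b)^2 K((X_i - x0) / h)
    by weighted least squares, exactly as in the formula above.
    """
    K = gaussian_kernel((X - x0) / h)                # kernel weights K_i
    Z = np.column_stack([np.ones_like(X), X - x0])   # Z_i = (1, X_i - x0)
    ZKZ = Z.T @ (K[:, None] * Z)                     # sum_i Z_i' K_i Z_i
    ZKY = Z.T @ (K * Y)                              # sum_i Z_i' K_i Y_i
    coef = np.linalg.solve(ZKZ, ZKY)
    return coef[0]                                   # m_hat(x0); coef[1] is beta_hat(x0)

# Example usage on simulated data
rng = np.random.default_rng(0)
X = rng.uniform(0, 1, 500)
Y = np.sin(2 * np.pi * X) + 0.3 * rng.standard_normal(500)
grid = np.linspace(0.05, 0.95, 10)
m_hat = np.array([local_linear(x0, X, Y, h=0.1) for x0 in grid])
```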
Summary

• Bias and variance derivations are similar to the NW estimator.
• The variance is the same as that of the NW estimator, but the bias tends to be smaller.
• Monte Carlo evidence in the literature suggests that the local linear estimator has smaller MSE than the NW estimator.
• It does not rely on symmetry of the kernel function, hence better performance near the boundary of the support of X.
• In RDD estimation, the local linear estimator is the norm.
• If the true model is linear, there is no bias.
• It extends naturally to local polynomial estimation.

Series (Sieve) Estimation

Series (sieve) estimation is an alternative nonparametric estimation technique to the kernel approach.

This method relies on function approximation.

We are still interested in estimating the regression

$$Y = m(X) + U, \qquad E[U|X] = 0$$

nonparametrically.

Series (Sieve) Estimation

Why is it sometimes called a sieve?

Series (Sieve) Estimation

The idea is to approximate the unknown function m with some basis functions
such as

• polynomials
• piecewise polynomials (splines)
• Fourier series

These methods work well because of the following theorem.

Weierstrass Theorem
Suppose m is a continuous real-valued function defined on an interval [a, b]. Then for every ε > 0 there exists a polynomial p(x) such that for each x ∈ [a, b],

$$|m(x) - p(x)| \le \varepsilon.$$

So for any continuous function, there exists a sequence of polynomials that converges to it uniformly.
Series (Sieve) Estimation

Now define $\{Z_k(X)\}_{k=1}^{M}$ as a series of basis functions.

The main idea is to approximate $m(X)$ by $\sum_{k=1}^{M} Z_k(X)\beta_k$.

For example, if you use a cubic polynomial,

$$Z_1(X) = 1,\quad Z_2(X) = X,\quad Z_3(X) = X^2,\quad Z_4(X) = X^3.$$

Then we approximate

$$m(X) \approx \beta_1 + X\beta_2 + X^2\beta_3 + X^3\beta_4.$$

Series (Sieve) Estimation

Let $Z(X) = (Z_1(X), \cdots, Z_M(X))'$ and $\beta = (\beta_1, \cdots, \beta_M)'$, both $M \times 1$ column vectors. Then we estimate

$$Y_i = \sum_{k=1}^{M} Z_k(X_i)\beta_k + U_i = Z(X_i)'\beta + U_i.$$

As the specification is linear in the basis functions, $\beta$ is estimated by OLS. Defining the $n \times M$ matrix $Z$ whose $i$-th row is $(Z_1(X_i), \cdots, Z_M(X_i))$,

$$\hat\beta = (Z'Z)^{-1} Z'Y.$$

Note that, like the bandwidth $h$ in kernel-based estimators, we have to choose $M$, the number of series terms. The more series terms, the greater the flexibility.

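A minimal sketch of the series estimator with a power basis follows; the simulated data and the choice M = 4 are only illustrative.

```python
import numpy as np

def series_fit(X, Y, M):
    """Polynomial series (sieve) estimator of m(x) = E[Y | X = x].

    Uses the power basis Z_k(X) = X^(k-1), k = 1, ..., M, and plain OLS.
    Returns a function that evaluates m_hat at new points.
    """
    Z = np.vander(X, M, increasing=True)              # n x M design: 1, X, X^2, ...
    beta_hat = np.linalg.lstsq(Z, Y, rcond=None)[0]   # (Z'Z)^{-1} Z'Y
    return lambda x: np.vander(np.atleast_1d(x), M, increasing=True) @ beta_hat

# Example usage on simulated data
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, 400)
Y = np.exp(X) + 0.2 * rng.standard_normal(400)
m_hat = series_fit(X, Y, M=4)                         # cubic approximation
print(m_hat(np.array([0.0, 0.5])))
```

The same code works with any other basis: replace the np.vander design matrix with, for example, a spline or Fourier basis.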
Series (Sieve) Estimation

Assumptions

1. $U_i$ is iid and homoskedastic with mean zero.
2. $\{X_i\}_{i=1}^{n}$ are fixed.
3. For some $r \in (0, 1)$ and $C_1 \in (0, \infty)$, $M = C_1 n^r$.
4. The elements of $Z(X_i)$, the approximating function $Z(X_i)'\beta$, and the function $m$ are uniformly bounded by a constant $C_4$.
5. $\lambda_{\min}(Z'Z)/M \to \infty$ as $n \to \infty$, where $\lambda_{\min}(Z'Z)$ is the minimum eigenvalue of $Z'Z$.
6. There exists $\beta(n)$ such that $M^{\frac{1}{2r}} \sup_x |Z(x)'\beta(n) - m(x)| \to 0$ as $n \to \infty$.
7. $v = \sigma^2\, Z(x)'(Z'Z)^{-1}Z(x) \ge C_3/n$ for all large $n$.

Series (Sieve) Estimation

Under Assumptions 1-7, the series estimator m̂(x) is consistent and asymptotically normal.

The asymptotic variance is v.

The bias disappears as n → ∞ by Assumption 6.

Series (Sieve) Estimation

In practice, given a finite sample, we cannot take a very large M that gives full flexibility.

Using a single polynomial for the entire domain of m is not a very good idea.

Piecewise approximation works better.

• Divide the support of X into J intervals, $[a_1, b_1], [a_2, b_2], \cdots, [a_J, b_J]$.
• Each point that divides two intervals is called a "knot" ($b_j = a_{j+1}$).
• We need J − 1 knots.
• Within each interval, estimate the function m(X) with a polynomial.

Note that the number of knots and the number of series terms M need to be chosen. MSE-based cross-validation gives the optimal choice; a sketch is given below.

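A minimal sketch of choosing M by K-fold cross-validation, building on the polynomial series estimator sketched earlier; the fold count, the candidate range for M, and the simulated data are illustrative assumptions.

```python
import numpy as np

def cv_mse(X, Y, M, n_folds=5, seed=0):
    """K-fold cross-validated MSE for a polynomial series estimator with M terms."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    errs = []
    for test in np.array_split(idx, n_folds):
        train = np.setdiff1d(idx, test)
        Z_train = np.vander(X[train], M, increasing=True)
        beta = np.linalg.lstsq(Z_train, Y[train], rcond=None)[0]
        pred = np.vander(X[test], M, increasing=True) @ beta
        errs.append(np.mean((Y[test] - pred) ** 2))
    return np.mean(errs)

# Pick the number of series terms with the smallest cross-validated MSE
rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, 400)
Y = np.exp(X) + 0.2 * rng.standard_normal(400)
best_M = min(range(2, 10), key=lambda M: cv_mse(X, Y, M))
print(best_M)
```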
Summary

• Series estimation is easy to implement.
• Several tuning parameters need to be chosen: the type of basis functions, the number of terms, and the number of knots.
• Results are very sensitive to the choice of tuning parameters.
• It is much easier to impose shape restrictions than with the kernel-based approach.
• Asymptotic behaviour is straightforward.
• More detailed discussion: Chen (2007, Handbook of Econometrics).

Endogeneity in Nonparametric Regression

Consider a regression where X is potentially endogenous and a valid instrument Z exists:

$$Y = m(X) + U, \qquad E[U|Z] = 0$$

Given the model, identification of the nonparametric function m(X) is obtained from

$$E[Y|Z = z] = E[m(X)|Z = z].$$

Therefore, in order to solve for m(x) at a specific value X = x, we have to invert the conditional expectation operator.

Endogeneity in Nonparametric Regression

By rewriting the identifying equation,

$$E[Y|Z = z] = \int m(x)\, f_{X|Z}(x|z)\, dx = \int m(x)\, \frac{f_{XZ}(x, z)}{f_Z(z)}\, dx.$$

This is the well-known "ill-posed" inverse problem.

Note that we have to estimate $f_{XZ}$, $f_Z$, and $E[Y|Z]$, all of which have nonparametric convergence rates.

Overall, this problem results in a much slower convergence rate than standard nonparametric regression under exogeneity.

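For concreteness, one common way to implement NPIV in practice is sieve two-stage least squares: approximate m by a basis in X and use a basis in Z as instruments. The sketch below follows that idea; the power bases and the choices of M and L are illustrative assumptions, and it imposes no regularization or shape restrictions, so it is not the estimator analysed in the references on the next slide.

```python
import numpy as np

def sieve_npiv(Y, X, Z, M=4, L=5):
    """Sieve NPIV sketch: approximate m(x) by a power basis psi(x)'beta and use a
    power basis b(z) of the instrument as instruments (two-stage least squares).

    Solves beta from the empirical analogue of E[(Y - psi(X)'beta) b(Z)] = 0.
    M: number of basis terms in X; L (>= M): number of basis terms in Z.
    """
    Psi = np.vander(X, M, increasing=True)        # basis in the endogenous regressor
    B = np.vander(Z, L, increasing=True)          # basis in the instrument
    P = B @ np.linalg.solve(B.T @ B, B.T)         # n x n projection onto the Z-basis
    beta = np.linalg.solve(Psi.T @ P @ Psi, Psi.T @ P @ Y)   # 2SLS coefficients
    return lambda x: np.vander(np.atleast_1d(x), M, increasing=True) @ beta
```

Choosing L ≥ M keeps the implied linear system overidentified; in practice both are tuning parameters, much like the bandwidth in kernel methods.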
Endogeneity in Nonparametric Regression

The finite-sample performance of the nonparametric IV estimator can be very poor.

Potential solution: imposing shape restrictions!

• Economic theory often provides the shape of the structural relationship between economic variables.
• The demand curve is decreasing and the supply curve increasing in price.
• A production function is increasing in productivity, labour, and capital inputs.

Chetverikov and Wilhelm (2017, ECTA) show that proper shape restrictions can significantly improve the finite-sample performance of the NPIV estimator.

A Stata implementation is provided by Chetverikov, Kim and Wilhelm (2018).

Semiparametric Estimation

Semiparametric models consist of nonparametric and parametric parts.

Examples:

• Single index model: $Y = m(X'\beta) + U$
• Partially linear model: $Y = m(X) + W'\beta + U$
• Maximum rank correlation estimator: $Y = m(X'\beta + U)$

Semiparametric Estimation: Single index

Suppose a linear index of X affects the outcome:

$$Y = m(X'\beta) + U$$

This specification significantly reduces the dimensionality of the nonparametric problem since $X'\beta$ is a scalar.

Proposition
Identification of m and β requires:

1. X should not contain a constant, X must contain at least one continuous variable, and ||β|| = 1. In addition, the elements of X must be linearly independent.
2. m is differentiable and is not a constant function on the support of X.
3. Varying values of the discrete components of X do not divide the support of X into disjoint subsets.

Semiparametric Estimation: Single index

Intuitively, after obtaining β̂, the nonparametric function m can be estimated using any nonparametric estimator.

Ichimura (1993) proposes an estimator for β that minimizes

$$\min_\beta \sum_{i=1}^{n} \left(Y_i - \hat m_{-i}(X_i'\beta)\right)^2 w(X_i)\, 1[X_i \in A_n]$$

where

$$\hat m_{-i}(X_i'\beta) = \frac{\sum_{j\neq i} Y_j K\!\left(\frac{X_j'\beta - X_i'\beta}{h}\right)}{\sum_{j\neq i} K\!\left(\frac{X_j'\beta - X_i'\beta}{h}\right)},$$

$w(X_i)$ is a non-negative weight function, and $1[X_i \in A_n]$ is a trimming indicator.

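A minimal sketch of Ichimura's estimator for a two-dimensional X, dropping the weight and trimming functions for simplicity and imposing ||β|| = 1 by parameterizing β = (cos t, sin t) on a half-circle (which also pins down the sign). The Gaussian kernel, the bandwidth, and the simulated data are illustrative.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def gaussian_kernel(u):
    return np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)

def ichimura_objective(beta, X, Y, h):
    """Leave-one-out objective sum_i (Y_i - m_hat_{-i}(X_i'beta))^2
    (weight and trimming functions omitted for simplicity)."""
    v = X @ beta                                          # index X_i'beta
    K = gaussian_kernel((v[None, :] - v[:, None]) / h)    # K((X_j'beta - X_i'beta)/h)
    np.fill_diagonal(K, 0.0)                              # leave one out
    m_loo = (K @ Y) / K.sum(axis=1)
    return np.sum((Y - m_loo) ** 2)

# Example with two-dimensional X; ||beta|| = 1 imposed via beta = (cos t, sin t),
# with t restricted to (0, pi) so that the sign of beta is pinned down as well.
rng = np.random.default_rng(0)
X = rng.standard_normal((300, 2))
beta0 = np.array([0.6, 0.8])                              # true index coefficients
Y = np.sin(X @ beta0) + 0.2 * rng.standard_normal(300)
res = minimize_scalar(
    lambda t: ichimura_objective(np.array([np.cos(t), np.sin(t)]), X, Y, h=0.3),
    bounds=(0.0, np.pi), method="bounded")
beta_hat = np.array([np.cos(res.x), np.sin(res.x)])
```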
Semiparametric Estimation: Single index

• The estimator is a weighted least squares estimator with trimming.
• Weighting is used to correct for heteroskedasticity.
• Trimming is used to avoid dividing by numbers near zero. Only observations with $f_Z(Z_i) > b$ are included, where $Z_i = X_i'\beta$ and $b > 0$.
• The conditional mean uses the leave-one-out estimator.
• The kernel is one-dimensional (no curse of dimensionality).

Semiparametric Estimation: Single index

Asymptotics and consistency

• The estimator converges to a normal distribution under regularity conditions:

$$\sqrt{n}\,(\hat\beta - \beta) \to_d N(0, \Omega)$$

• The formula for the variance is on page 255 of Li and Racine (2007).
• The variance depends on the weight function, the variance of the error, the derivative of m, and the covariance matrix of X.
• β̂ is √n-consistent and the nonparametric function m(·) is one-dimensional.
• The estimator is conceptually straightforward but hard to compute.
• The proof of asymptotic normality is very technical in Ichimura's original paper. There are great heuristic proofs in Li and Racine (2007) and Horowitz (2012).

Partially Linear Model

One of the simplest semiparametric models is the partially linear model proposed by Robinson (1988, ECTA).

• The model consists of a nonparametric function m and a linear parametric component:

$$Y_i = m(Z_i) + X_i'\beta + U_i, \qquad E[U_i|X_i, Z_i] = 0$$

• The goal is to estimate β at the parametric rate and then estimate the function m nonparametrically.
• Only a few regressors $Z_i$ are supposed to have an unknown nonlinear relationship with $Y_i$.
• The dimensionality of the nonparametric problem is reduced to a manageable level.
• Widely used in many fields due to its simplicity.

Partially Linear Model: Main Idea

Identification requires several conditions.

• $X_i$ cannot contain a constant term.
• X is not deterministic given Z, so that

$$\Phi = E\left[(X - E(X|Z))(X - E(X|Z))'\right]$$

is positive definite.

Under these conditions, taking the conditional expectation given $Z_i$ gives

$$E[Y_i|Z_i] = m(Z_i) + E[X_i|Z_i]'\beta + \underbrace{E[U_i|Z_i]}_{=0}.$$

Therefore, by subtracting $E[Y_i|Z_i]$ from $Y_i$ (the Robinson transformation),

$$\underbrace{Y_i - E[Y_i|Z_i]}_{=\tilde Y_i} = \underbrace{\left(X_i - E[X_i|Z_i]\right)}_{=\tilde X_i}{}'\,\beta + U_i.$$

The nonparametric component cancels out, and OLS of $\tilde Y_i$ on $\tilde X_i$ consistently estimates β!
Partially Linear Model: Main Idea

Given β, nonparametric estimation of m is straightforward since

$$Y_i - X_i'\beta = m(Z_i) + U_i.$$

Nonparametric regression of $Y_i - X_i'\beta$ on $Z_i$ gives a consistent estimate of m.

Unfortunately, the conditional expectations of $Y_i$ and $X_i$ given $Z_i$ are unknown.

Solution: replace the unknown conditional mean functions with estimates!

$$\hat E[Y_i|Z_i] = \frac{\sum_{j\neq i} Y_j K\!\left(\frac{Z_j - Z_i}{h}\right)}{\sum_{j\neq i} K\!\left(\frac{Z_j - Z_i}{h}\right)}, \qquad \hat E[X_i|Z_i] = \frac{\sum_{j\neq i} X_j K\!\left(\frac{Z_j - Z_i}{h}\right)}{\sum_{j\neq i} K\!\left(\frac{Z_j - Z_i}{h}\right)}$$

Partially Linear Model: Trimming

Note that the estimators have the density of $Z_i$ in their denominators.

• If $f_Z(Z_i)$ is close to 0, the estimators work very poorly.
• Trimming: drop observations where $\hat f_Z$ is small.
• The estimator of β is

$$\hat\beta = \left(\sum_{i=1}^{n} \tilde X_i \tilde X_i'\right)^{-1} \sum_{i=1}^{n} \tilde X_i \tilde Y_i\, 1[\hat f_Z(Z_i) > b]$$

where $\tilde Y_i = Y_i - \hat E[Y_i|Z_i]$ and $\tilde X_i = X_i - \hat E[X_i|Z_i]$.

• Robinson showed that this estimator is √n-consistent and asymptotically normal,

$$\sqrt{n}\,(\hat\beta - \beta) \to_d N(0, \Phi^{-1}\Psi\Phi^{-1})$$

where $\Psi = E\left[\sigma^2(X_i, Z_i)\,\tilde X_i \tilde X_i'\right]$.

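A minimal sketch of Robinson's two-step estimator: leave-one-out kernel estimates of E[Y|Z] and E[X|Z], a crude density-based trimming rule, then OLS on the transformed variables. The kernel, bandwidth, trimming threshold, and simulated data are illustrative choices.

```python
import numpy as np

def gaussian_kernel(u):
    return np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)

def robinson_plm(Y, X, Z, h, b=1e-3):
    """Robinson's partially linear estimator of beta in Y = m(Z) + X'beta + U.

    Step 1: leave-one-out kernel estimates of E[Y|Z] and E[X|Z].
    Step 2: OLS of (Y - E[Y|Z]) on (X - E[X|Z]), trimming low-density observations.
    """
    K = gaussian_kernel((Z[None, :] - Z[:, None]) / h)
    np.fill_diagonal(K, 0.0)                     # leave one out
    denom = K.sum(axis=1)
    EY = (K @ Y) / denom                         # E_hat[Y_i | Z_i]
    EX = (K @ X) / denom[:, None]                # E_hat[X_i | Z_i], column by column
    keep = denom / (len(Z) * h) > b              # crude trimming on f_hat_Z(Z_i)
    Yt, Xt = (Y - EY)[keep], (X - EX)[keep]
    return np.linalg.lstsq(Xt, Yt, rcond=None)[0]

# Example usage on simulated data: m(Z) = sin(2*pi*Z), beta = (1, -0.5)
rng = np.random.default_rng(0)
n = 500
Z = rng.uniform(0, 1, n)
X = np.column_stack([Z + rng.standard_normal(n), rng.standard_normal(n)])
Y = np.sin(2 * np.pi * Z) + X @ np.array([1.0, -0.5]) + 0.3 * rng.standard_normal(n)
print(robinson_plm(Y, X, Z, h=0.1))
```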
Maximum Rank Correlation Estimator

Consider the model

$$Y = m(X'\beta + U), \qquad U \perp\!\!\!\perp X$$

where m is an unknown function that is strictly increasing in its argument.

• Now U is not fully additively separable.
• Can we uniquely identify β?

Maximum Rank Correlation Estimator

Under the given assumptions, β cannot be uniquely identified. Why?

Suppose $\tilde\beta = c\beta$ for some $c \in \mathbb{R}_+$, $c \neq 0$. Then

$$m(X'\beta + U) = \tilde m(X'\tilde\beta + \tilde U)$$

where $\tilde m(z) := m(z/c)$ and $\tilde U = cU$. Therefore, $(\beta, m(\cdot))$ is observationally equivalent to $(\tilde\beta, \tilde m(\cdot))$.

However, we can still identify β up to scale with some normalization (this means you fix c). In practice, we restrict the parameter space such that

$$\mathcal{B} := \{\beta \in \mathbb{R}^K : ||\beta|| = 1\}.$$

How can we estimate β consistently?

Maximum Rank Correlation Estimator

See Aaron Han (1987, JoE) for the details.

Consider

$$S(\beta) = E\left[1(Y_i > Y_j)1(X_i'\beta > X_j'\beta) + 1(Y_i < Y_j)1(X_i'\beta < X_j'\beta)\right] = E\left[1\left[(Y_i - Y_j)(X_i'\beta - X_j'\beta) > 0\right]\right]$$

Suppose that $\beta_0$ is the true parameter. Then $\beta^* = \beta_0 / ||\beta_0||$ uniquely maximises $S(\beta)$ under the following assumptions.

• A1. m(·) is strictly increasing in its argument.
• A2. X ⊥⊥ U.
• A3. The support of the cdf of X is not contained in any proper linear subspace of $\mathbb{R}^K$ (rank condition).
• A4. $\beta^* = (\beta_1^*, \cdots, \beta_K^*)$; there exists some k such that $\beta_k^* \neq 0$ and the distribution of $x_k$ has everywhere positive Lebesgue density conditional on $x_{-k}$.

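A minimal sketch of the maximum rank correlation estimator: compute the sample analogue of S(β) and maximize it over the unit circle by grid search (the objective is a step function, so derivative-based optimizers are not appropriate). The simulated design and the grid resolution are illustrative.

```python
import numpy as np

def rank_correlation(beta, X, Y):
    """Sample analogue of Han's objective:
    S_n(beta) = (1 / (n(n-1))) * sum_{i != j} 1[(Y_i - Y_j)(X_i'beta - X_j'beta) > 0]."""
    v = X @ beta
    dy = Y[:, None] - Y[None, :]
    dv = v[:, None] - v[None, :]
    n = len(Y)
    return np.sum((dy * dv) > 0) / (n * (n - 1))

# Example with two-dimensional X; ||beta|| = 1 imposed via beta = (cos t, sin t).
rng = np.random.default_rng(0)
X = rng.standard_normal((300, 2))
beta0 = np.array([0.6, 0.8])
Y = np.exp(X @ beta0 + 0.3 * rng.standard_normal(300))   # m strictly increasing
grid = np.linspace(0.0, np.pi, 721)
vals = [rank_correlation(np.array([np.cos(t), np.sin(t)]), X, Y) for t in grid]
t_hat = grid[int(np.argmax(vals))]
beta_hat = np.array([np.cos(t_hat), np.sin(t_hat)])
```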
Maximum Rank Correlation Estimator

• The estimator works for the more general form of model

$$Y = m(X'\beta, U)$$

where U is nonseparable.
• Consistency is proved by showing uniform convergence of the sample objective function to S(β).
• Note that the estimator is an extremum estimator.
• Sherman (1993) showed that this estimator is √n-consistent and asymptotically normal. The proof of asymptotic normality is derived not by the usual extremum estimation argument but by empirical process theory.

Summary

• Semiparametric models are in-between parametric and nonparametric models.
• No curse of dimensionality, at the cost of stronger regularity conditions.
• Parametric rate for the parametric components.
• The nonparametric component is estimated after obtaining β̂. The convergence rate of β̂ is much faster, so plugging β̂ into the nonparametric regression does not affect the nonparametric rate.
• The theory is straightforward. Implementation is somewhat complex but quite feasible.

