Assignment 2 2024

Uploaded by

weiranzhu
10/3/24, 2:08 AM Assignment 2 2024

Assignment 2
Exercise 1: What factors increase the demand for weather insurance?
This exercise is an adapted case from the following paper (but you do not need to
read the paper to complete the assignment):
Cai, Jing, Alain De Janvry, and Elisabeth Sadoulet. "Subsidy policies and insurance
demand." American Economic Review 110.8 (2020): 2422-2453.
Households face different types of weather risks that can generate large fluctuations
in income and consumption. To shield individuals from risks, many governments
devote great effort to developing and marketing formal insurance products.
However, in both developing and developed countries, the value placed by individuals
on insurance is usually surprisingly low, and initiatives to provide information,
subsidies, and to increase trust have had limited success (Cole et al. 2013, Banerjee
et al. 2019). Many countries have given up on trying to sell insurance and moved to
make insurance mandatory.
In this assignment we will study the demand for a weather insurance product for rice
producers in China. Rice is the most important food crop in China, with nearly 50
percent of the country’s farmers engaged in its production. In order to maintain food
security and shield farmers from negative weather shocks, in 2009 the Chinese
government asked the People’s Insurance Company of China (PICC) to design and
offer the first rice production insurance policy to rural households in 31 pilot
counties. The program was expanded to 62 counties in 2010 and to 99 in 2011. The
experiment we are studying in this assignment was conducted in 2010 and 2011 in
randomly selected villages included in the 2010 expansion in Jiangxi province, one of
China’s major rice-producing areas.
The product in our study is an area-yield index weather insurance that covers natural
disasters, including heavy rains, floods, windstorms, extremely high or low
temperatures, and droughts. If any of these disasters occurs and leads to a 30
percent or more average loss in yield in a given area, farmers in that area are eligible
to receive payouts from the insurance company. These areas are typically defined as
fields that include the plots of 5 to 10 farmers.

Data Description
The data for this exercise comes from households in 134 villages in the Jiangxi
province, which is considered a representative sample of rice producers in Jiangxi.
Households were surveyed and each observation in the dataset, named
data_cai_sadoulet_dejanvry.dta, corresponds to a household in that sample.
file:///Users/jvj/Downloads/Assignment 2 2024.html 1/11
The variables in data_CSD.dta that are required for this question are:
• takeup_2011 : dummy equal to 1 if the household decided to take up the
insurance product in 2011
• area : area of rice production in mou (the mou is a Chinese unit of land
measurement that varies with location but is commonly 806.65 square yards, i.e.
0.165 acre or 666.5 square metres)
• age : age of household head
• agpop : household size (number of people living in the same household)
• male : Gender of household head (1 if male, 0 if female)
• literacy : dummy equal to 1 if the household head is literate, 0 otherwise

Question 1 : Descriptive Statistics


Load the dataset data_CSD.dta . Notice that this is a .dta file so you will need to
use the haven package. Use the head() function to have a look at the dataset.
In [249… library(haven)
CSD <- read_dta('data_CSD.dta')
head(CSD)

A tibble: 6 × 11
takeup_2011  area   payout_2010  age    agpop  rice_inc  male   price_final  literac
<dbl>        <dbl>  <dbl>        <dbl>  <dbl>  <dbl>     <dbl>  <dbl>        <db
1            4.5    1            56     6      50        1      2.7
1            14.0   1            56     2      80        1      2.7
1            16.0   1            59     9      100       1      4.5
1            8.0    1            72     7      10        1      4.5
1            7.5    1            49     6      30        1      2.7
1            9.0    1            32     5      20        1      4.5

a) Descriptive statistics
i) Demographic characteristics
How many households are in your data set? How many respondents are male and
female? What is the mean age among the household heads in the sample? What is
the mean number of members in the households in your sample? Note there are
some missing values in the household size variable. What argument do you have to
add to mean() to get around this?

Hint: check the mean() syntax on this website:
https://www.rdocumentation.org/packages/base/versions/3.6.2/topics/mean .
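In [250… nrow(CSD)

library(tidyverse)
# Count male- and female-headed households
CSD_male <- CSD %>%
  filter(male == 1)
nrow(CSD_male)
CSD_female <- CSD %>%
  filter(male == 0)
nrow(CSD_female)

# Drop rows with missing age, then average
# (equivalent to mean(CSD$age, na.rm = TRUE))
CSD_1 <- subset(CSD, !is.na(age))
mean(CSD_1$age)

# Drop rows with missing household size, then average
CSD_2 <- subset(CSD, !is.na(agpop))
mean(CSD_2$agpop)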

3474
3367
107
53.0743301642178
5.23048112935753
There are 3474 households in this data set, 3367 with a male household head and 107
with a female household head. The mean age among the household heads in the sample
is about 53.07, and the mean number of members in the households is about 5.23.
Adding the argument na.rm = TRUE to mean() skips the missing values.
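The missing-value question above can also be handled without subsetting the data first. The following is a minimal sketch on a made-up vector; `agpop_example` and its values are illustrative assumptions, not taken from data_CSD.dta.

```r
# Toy household-size vector with one missing value (illustrative values only)
agpop_example <- c(6, 2, 9, NA, 5)

# Without na.rm, a single NA makes the mean NA
mean(agpop_example)

# na.rm = TRUE drops missing values before averaging
mean_no_na <- mean(agpop_example, na.rm = TRUE)
mean_no_na
```

On the real data the same call would be applied to the column itself, for example to CSD$agpop.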
ii) Measures of dispersion
Compute the standard deviation and standard errors of the age of household head
and the household size variables.
You can use canned functions for the standard deviations.
In [251… CSD_1 %>%
summarize(sd_age = sd(age),
se_age = sd_age/sqrt(nrow(CSD_1)))

CSD_2 %>%
summarize(sd_agpop = sd(agpop),
se_agpop = sd_agpop/sqrt(nrow(CSD_2)))

A tibble: 1 × 2
sd_age se_age
<dbl> <dbl>
11.78974 0.2001137

A tibble: 1 × 2
sd_agpop se_agpop
<dbl> <dbl>
2.394976 0.04065125
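The standard error of a mean is the standard deviation divided by the square root of the number of non-missing observations. A small sketch of that formula follows; `std_error` is a hypothetical helper name (not a base R function) and the ages are illustrative values only.

```r
# Standard error of the mean, ignoring missing values.
# `std_error` is a hypothetical helper, not part of base R.
std_error <- function(x) {
  x <- x[!is.na(x)]          # keep only observed values
  sd(x) / sqrt(length(x))    # sd divided by sqrt of the sample size
}

# Illustrative ages with one missing value
age_example <- c(56, 59, 72, NA, 49, 32)
std_error(age_example)
```

Dropping the missing values inside the helper keeps the sample size in the denominator consistent with the observations used for the standard deviation.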

b) Histogram for Area of rice production in acres


Construct a variable area_ac equal to area of rice production in acres.
Plot a histogram (Hint: use the hist() command) of this constructed variable, with
100 bins.
In [252… CSD_3 <- subset(CSD, !is.na(area))  # keep households with a recorded area

# Convert mou to acres (1 mou is about 0.165 acre)
CSD_3 <- CSD_3 %>%
  mutate(area_ac = area * 0.165)

hist(CSD_3$area_ac,
     breaks = 100,
     ylim = c(0, 550))

c) Comparisons for area for rice production by gender of household head
i) Means
Calculate the mean of area for rice production (in acres) for female- and male-
headed households and their standard errors. Compare the two means: do they seem
substantially different?
In [253… CSD_4 <- subset(CSD_3, !is.na(area_ac))

# Split the sample by gender of the household head
CSD_male <- CSD_4 %>%
  filter(male == 1)
CSD_female <- CSD_4 %>%
  filter(male == 0)

# Group size, mean area (acres), and standard error of the mean
CSD_male_stat <- CSD_male %>%
  summarize(n = n(),
            area_mean = mean(area_ac),
            se_area_mean = sd(area_ac) / sqrt(n))

CSD_female_stat <- CSD_female %>%
  summarize(n = n(),
            area_mean = mean(area_ac),
            se_area_mean = sd(area_ac) / sqrt(n))

CSD_male_stat
CSD_female_stat

A tibble: 1 × 3
n area_mean se_area_mean
<int> <dbl> <dbl>
3346 1.94646 0.03336083
A tibble: 1 × 3
n area_mean se_area_mean
<int> <dbl> <dbl>
104 1.822457 0.2636056
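No. The means are similar: about 1.95 acres for male-headed households and 1.82
acres for female-headed households. The gap of about 0.12 acres is small relative
to the standard error of the female-headed mean (0.26).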
ii) Test Statistics
Create a test statistic for the difference between male- and female-headed
households in the area for rice production (in acres). Use a two-tailed test. Is the
difference statistically significant at the 0.95 confidence level?
In [254… reg1 <- lm(area_ac ~ male, data = CSD_4)
summary(reg1)
reg1out <- summary(reg1)
reg1out$coefficients[2,1]
reg1out$coefficients[2,1]-1.96*reg1out$coefficients[2,2]
reg1out$coefficients[2,1]+1.96*reg1out$coefficients[2,2]

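# Two-sample t statistic: the standard error of a difference between two
# independent means is the square root of the sum of the squared standard errors
t_stat <- (CSD_male_stat$area_mean - CSD_female_stat$area_mean) /
  sqrt(CSD_male_stat$se_area_mean^2 + CSD_female_stat$se_area_mean^2)
t_stat

Call:
lm(formula = area_ac ~ male, data = CSD_4)

Residuals:
Min 1Q Median 3Q Max
-1.9135 -1.1215 -0.2965 0.5285 24.4535

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.8225 0.1919 9.499 <2e-16 ***
male 0.1240 0.1948 0.636 0.525
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.957 on 3448 degrees of freedom
Multiple R-squared: 0.0001175, Adjusted R-squared: -0.0001725
F-statistic: 0.4051 on 1 and 3448 DF, p-value: 0.5245
0.124002802114328
-0.257856418402737
0.505862022631392
0.4667 (rounded; recomputed from the means and standard errors in part (i))
The difference is not statistically significant at the 0.95 confidence level. The
two-sample t statistic is about 0.47 and the regression t statistic is 0.636
(p-value 0.525); both are well below the two-tailed critical value of 1.96, and the
95% confidence interval [-0.258, 0.506] contains zero.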
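For a difference between two independent group means, the usual (unequal-variance) test statistic divides the difference by the square root of the sum of the squared standard errors, not by the sum of the standard errors. A minimal sketch using the group means and standard errors printed in part (i) follows; `welch_t_from_summary` is a hypothetical helper name, and the p-value uses a normal approximation, which is an assumption that seems reasonable at these sample sizes.

```r
# Two-sample t statistic from group means and their standard errors.
# `welch_t_from_summary` is a hypothetical helper name.
welch_t_from_summary <- function(mean1, se1, mean2, se2) {
  (mean1 - mean2) / sqrt(se1^2 + se2^2)
}

# Inputs copied from the summary tables in part (i): male first, then female
t_welch <- welch_t_from_summary(1.94646, 0.03336083, 1.822457, 0.2636056)

# Two-sided p-value using the normal approximation (large samples assumed)
p_welch <- 2 * pnorm(-abs(t_welch))
c(t = t_welch, p = p_welch)
```

The statistic comes out at roughly 0.47, well below 1.96, which points to the same conclusion as the regression of area_ac on male.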

Question 2: Effect of literacy on the demand for weather insurance products
In its first year, the experiment allocated different subsidies to households in 134
randomly selected villages of the province. A subsidy of 70% was offered to all
households in the villages. Then, 2 days after this initial sale, households from 62
randomly selected villages were offered the insurance product for free.
In this part of the exercise, we will focus on year 1 of the experiment and estimate the
effect of literacy on the takeup of the insurance product. Consider the two following
models (with area measured in acres):
Model (1): TakeUp = β0 + β1 Literacy + β2 Area + u
Model (2): TakeUp = β0 + β1 Literacy + β2 Area + β3 HH size + u
a) Estimation
Estimate equations (1) and (2) with lm() .
In [255… model1 <- lm(takeup_2011 ~ literacy + area_ac, data = CSD_4)
summary(model1)
model2 <- lm(takeup_2011 ~ literacy + area_ac + agpop, data = CSD_4)
summary(model2)

Call:
lm(formula = takeup_2011 ~ literacy + area_ac, data = CSD_4)

Residuals:
Min 1Q Median 3Q Max
-0.7325 -0.5289 0.4379 0.4696 0.5130

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.485964 0.017465 27.825 <2e-16 ***
literacy 0.035367 0.019017 1.860 0.0630 .
area_ac 0.009143 0.004368 2.093 0.0364 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.4987 on 3444 degrees of freedom


(3 observations deleted due to missingness)
Multiple R-squared: 0.002556, Adjusted R-squared: 0.001977
F-statistic: 4.413 on 2 and 3444 DF, p-value: 0.01219
Call:
lm(formula = takeup_2011 ~ literacy + area_ac + agpop, data = CSD_4)

Residuals:
Min 1Q Median 3Q Max
-0.7434 -0.5221 0.4016 0.4716 0.5724

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.412607 0.025880 15.943 < 2e-16 ***
literacy 0.039821 0.019025 2.093 0.03641 *
area_ac 0.008754 0.004362 2.007 0.04482 *
agpop 0.013582 0.003545 3.831 0.00013 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.4977 on 3440 degrees of freedom


(6 observations deleted due to missingness)
Multiple R-squared: 0.006828, Adjusted R-squared: 0.005962
F-statistic: 7.883 on 3 and 3440 DF, p-value: 3.069e-05

b) Interpretation
Interpret each of the estimated parameters of equation (2) - remember to include
significance.
1. intercept: The intercept indicates that a household with an illiterate head, zero
acres of rice and zero members has a predicted probability of 0.4126 (41.26%) of
taking up the insurance product. This is an extrapolation, since no household
has zero members. The p-value is < 2e-16, which is less than α = .05, so the
intercept is statistically different from zero.
2. literacy: Holding area and household size constant, households with a literate
head (literacy = 1 rather than 0) have a 0.039821 higher probability of taking
up the insurance product, i.e. about 3.98 percentage points. The p-value is
0.03641, which is less than α = .05, so the literacy variable has a statistically
significant relationship with take-up in model 2.
3. area_ac: Holding literacy and household size constant, one additional acre of
rice production is associated with a 0.008754 increase (about 0.88 percentage
points) in the household's probability of taking up the insurance product. The
p-value is 0.04482, which is less than α = .05, so the area_ac variable has a
statistically significant relationship with take-up in model 2.
4. agpop: Holding literacy and area constant, one additional household member is
associated with a 0.013582 increase (about 1.36 percentage points) in the
household's probability of taking up the insurance product. The p-value is
0.00013, which is less than α = .05, so the agpop variable has a statistically
significant relationship with take-up in model 2.

c) Omitted Variable Bias


How did your estimate of β̂1 change between equation (1) and equation (2)? Without
performing any calculations, what information does this give you about the
correlation between the literacy of household heads and household size? (Explain
your reasoning in no more than 4 sentences.)
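The estimate of β̂1 increases from 0.0354 in Model 1 to 0.0398 in Model 2 once
household size is added. Because the coefficient on household size is positive
(0.0136), the lower estimate in Model 1 means that omitting household size biased
β̂1 downward. A downward bias combined with a positive effect of the omitted
variable implies a negative correlation between the literacy of household heads and
household size.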
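The sign argument behind omitted variable bias can be checked on simulated data. The sketch below is not based on data_CSD.dta: every number in it is a made-up assumption, chosen so that literate heads have smaller households and household size raises take-up. It shows that the short-regression coefficient equals the long-regression coefficient plus the omitted variable's coefficient times the slope from regressing the omitted variable on literacy.

```r
set.seed(42)
n <- 5000

# Simulated data (all values hypothetical): literate heads have smaller households
literacy <- rbinom(n, 1, 0.8)
hh_size <- 6 - 1 * literacy + rnorm(n)
takeup <- 0.4 + 0.04 * literacy + 0.1 * hh_size + rnorm(n, sd = 0.5)

short_fit <- lm(takeup ~ literacy)            # omits household size
long_fit <- lm(takeup ~ literacy + hh_size)   # includes household size
aux_fit <- lm(hh_size ~ literacy)             # auxiliary regression

b_short <- unname(coef(short_fit)["literacy"])
b_long <- unname(coef(long_fit)["literacy"])
b_size <- unname(coef(long_fit)["hh_size"])
delta <- unname(coef(aux_fit)["literacy"])

# Omitted variable bias identity: short = long + (effect of omitted) * delta
c(short = b_short, long = b_long, implied_short = b_long + b_size * delta)
```

With a positive household-size coefficient and a negative auxiliary slope, the short-regression estimate falls below the long-regression estimate, which is the same pattern seen when moving from Model 2 back to Model 1.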
d) Prediction
Predict the expected probability of a household taking up the insurance product if its
household head is literate, it produces rice on 5 acres and has 3 members using your
estimates from equation (2).
In [256… predict_value <- data.frame(literacy = 1, area_ac = 5, agpop = 3)
predict(model2, newdata = predict_value)

1: 0.536942950910693
Using the estimates from equation (2), the expected probability of taking up the
insurance product for a household with a literate head, 5 acres of rice and 3
members is about 0.53694.
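The prediction can be cross-checked by hand from the printed coefficients of equation (2). This sketch uses the rounded estimates from the regression output, so it should only agree with predict() up to rounding; the names `b`, `x` and `p_hat` are illustrative.

```r
# Coefficients copied from the model (2) output (rounded as printed)
b <- c(intercept = 0.412607, literacy = 0.039821,
       area_ac = 0.008754, agpop = 0.013582)

# Household: constant term, literate head, 5 acres, 3 members
x <- c(1, 1, 5, 3)

# Linear prediction: sum of coefficient times regressor value
p_hat <- sum(b * x)
p_hat
```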

Question 3: Price and Payouts


We will now investigate how prices of the insurance product affect its take-up in the
second year of the experiment. In that year, subsidies ranging from 40% to 90% of
the market price of the insurance products were randomly assigned to households in
the village.
The two new variables you will need for this question are:
• price_final : the price in RMB/mou offered to the household for the
insurance product in year 2.
• payout_2010 : a dummy variable equal to 1 if the household received an
insurance payout in year 1.
(a) Define estimating Equation
Write an equation you could estimate that would account for price and payouts in
addition to the variables whose effects we were estimating in Question 2.
Importantly, you want to understand the impact of a 1% increase in prices rather than
a 1 unit increase in Chinese Yuan.
You want to test two hypotheses:
1. The price of the insurance product does not affect the demand for the insurance
product.
2. A 50% reduction in the price of the product has the same effect on the likelihood
of purchasing the product as receiving a payout the year before.
TakeUp = β0 + β1 Literacy + β2 Area + β3 HH size + β4 log(PriceFinal) + β5 Payout2010 + u
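Because price enters the equation in logs, the coefficient on log price translates a percentage change in price into a change in take-up probability. A short sketch of that conversion follows; `price_effect` is a hypothetical helper and `beta4_example` is an illustrative value, not an estimate.

```r
# Change in predicted take-up probability when price changes by `pct` percent,
# in a model that is linear in log(price). `price_effect` is a hypothetical helper.
price_effect <- function(beta_log_price, pct) {
  beta_log_price * log(1 + pct / 100)
}

beta4_example <- -0.15  # illustrative value, not an estimate

price_effect(beta4_example, 1)    # close to beta4 / 100 for a 1% increase
price_effect(beta4_example, -50)  # equals beta4 * log(0.5) for a 50% cut
```

Under this specification the first hypothesis is β4 = 0, and the second can be written as β4 log(0.5) = β5, since the "divide by 100" shortcut is only accurate for small price changes.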


(b) Summary stats
Look at the summary statistics of your price and payout variables (the table() or
summary() functions could come in handy). What percentage of households receive a
payout?
In [257… summary(CSD$price_final)
summary(CSD$payout_2010)

Min. 1st Qu. Median Mean 3rd Qu. Max.


1.20 2.70 3.60 4.08 5.40 7.20
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.0000 0.0000 0.0000 0.4082 1.0000 1.0000

Based on the summary output, the mean of the payout_2010 dummy is 0.4082, so 40.82% of households received a payout.
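For a 0/1 dummy, the mean is the share of ones, and table() gives the same information as counts. A minimal sketch on made-up values follows; `payout_example` is an illustrative vector, not the payout_2010 column.

```r
# Illustrative 0/1 dummy with one missing value
payout_example <- c(1, 0, 0, 1, 0, NA)

# Share in each category; table() drops missing values by default
shares <- prop.table(table(payout_example))
shares

# For a 0/1 dummy, the mean of the non-missing values is the share of ones
share_ones <- mean(payout_example, na.rm = TRUE)
share_ones
```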


(c) Hypothesis 1
Estimate the equation in part (a). What can you conclude about your first hypothesis?
Note that you might transform your price variable prior to estimating the model.
In [258… model3 <- lm(takeup_2011 ~ literacy + area_ac + agpop + log(price_final) +
               payout_2010, data = CSD_4)
summary(model3)

Call:
lm(formula = takeup_2011 ~ literacy + area_ac + agpop + log(price_final) +
payout_2010, data = CSD_4)

Residuals:
Min 1Q Median 3Q Max
-0.9422 -0.4164 0.1833 0.4131 0.7552

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.530904 0.031603 16.799 < 2e-16 ***
literacy 0.036750 0.018070 2.034 0.042054 *
area_ac 0.002622 0.004163 0.630 0.528757
agpop 0.011217 0.003370 3.329 0.000882 ***
log(price_final) -0.157193 0.015293 -10.279 < 2e-16 ***
payout_2010 0.268953 0.016496 16.304 < 2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.4726 on 3438 degrees of freedom


(6 observations deleted due to missingness)
Multiple R-squared: 0.1051, Adjusted R-squared: 0.1038
F-statistic: 80.74 on 5 and 3438 DF, p-value: < 2.2e-16

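The coefficient on log(price_final) is -0.157 with a p-value below 2e-16, so we
reject the first hypothesis that the price of the insurance product does not affect
the demand for the insurance product. A 1% increase in price is associated with a
decrease of about 0.0016 (0.16 percentage points) in the probability of take-up.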
(d) Hypothesis 2
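(i) 50% decrease in the price
What is the change in likelihood to purchase the product associated with a 50%
decrease in the price of the product?
In [259… model3out <- summary(model3)
# Price enters in logs, so halving the price changes log(price) by log(0.5)
model3out$coefficients[5,1] * log(0.5)

0.10896 (rounded)
Halving the price is associated with an increase of about 0.109 (10.9 percentage
points) in the probability of purchasing the product, holding the other variables
constant. Dividing the coefficient by 2 would give the wrong sign and magnitude,
because the "coefficient/100 per 1%" rule is only accurate for small price changes.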
(ii) Receiving a payout in year 1
And what is the increase in the likelihood to purchase the product associated with
receiving the payout in year 1?
In [260… model3out$coefficients[6,1]

0.268953308501464
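Receiving a payout in year 1 is associated with an increase of about 0.269 (26.9
percentage points) in the probability of purchasing the product in year 2, holding
the other variables constant.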
(iii) Compare effects
Compare effects found in part (d.i) and (d.ii) with a statistical test.
In [ ]:

consult leila on friday.
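One way to compare the two effects is a t test on a linear combination of coefficients: under the second hypothesis, β4 log(0.5) - β5 = 0. The sketch below is one possible approach under stated assumptions, not the assignment's official solution. `linear_combo_test` is a hypothetical helper, and the data are simulated with made-up parameters so the block runs on its own.

```r
# t test of H0: sum(w * beta) = 0 for a fitted lm.
# `linear_combo_test` is a hypothetical helper name.
linear_combo_test <- function(fit, w) {
  est <- sum(w * coef(fit))                              # estimated combination
  se <- sqrt(as.numeric(t(w) %*% vcov(fit) %*% w))       # its standard error
  t_stat <- est / se
  p_value <- 2 * pt(-abs(t_stat), df = df.residual(fit)) # two-sided p-value
  c(estimate = est, se = se, t = t_stat, p = p_value)
}

# Simulated stand-in data (all parameters are made up for illustration)
set.seed(1)
n <- 2000
price_final <- runif(n, 1.2, 7.2)
payout_2010 <- rbinom(n, 1, 0.4)
p_true <- 0.5 - 0.15 * log(price_final) + 0.27 * payout_2010
takeup <- rbinom(n, 1, p_true)

fit <- lm(takeup ~ log(price_final) + payout_2010)

# Weights follow the coefficient order: intercept, log price, payout
res <- linear_combo_test(fit, c(0, log(0.5), -1))
res
```

For model3 in this assignment the weight vector would have six entries, with log(0.5) in the position of log(price_final) and -1 in the position of payout_2010. The standard error needs the covariance between the two estimates from vcov(model3), so it cannot be read off the printed summary alone.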
