| 1 | +--- |
| 2 | +title : Homework 2 for Stat Inference |
| 3 | +subtitle : Extra problems for Stat Inference |
| 4 | +author : Brian Caffo |
| 5 | +job : Johns Hopkins Bloomberg School of Public Health |
| 6 | +framework : io2012 |
| 7 | +highlighter : highlight.js |
| 8 | +hitheme : tomorrow |
| 9 | +#url: |
| 10 | +# lib: ../../librariesNew #Remove new if using old slidify |
| 11 | +# assets: ../../assets |
| 12 | +widgets : [mathjax, quiz, bootstrap] |
| 13 | +mode : selfcontained # {standalone, draft} |
| 14 | +--- |
| 15 | +```{r setup, cache = F, echo = F, message = F, warning = F, tidy = F, results='hide'} |
| 16 | +# make this an external chunk that can be included in any file |
| 17 | +library(knitr) |
| 18 | +options(width = 100) |
| 19 | +opts_chunk$set(message = F, error = F, warning = F, comment = NA, fig.align = 'center', dpi = 100, tidy = F, cache.path = '.cache/', fig.path = 'fig/') |
| 20 | + |
| 21 | +options(xtable.type = 'html') |
| 22 | +knit_hooks$set(inline = function(x) { |
| 23 | +  if(is.numeric(x)) { |
| 24 | +    round(x, getOption('digits')) |
| 25 | +  } else { |
| 26 | +    paste(as.character(x), collapse = ', ') |
| 27 | +  } |
| 28 | +}) |
| 29 | +knit_hooks$set(plot = knitr:::hook_plot_html) |
| 30 | +runif(1) |
| 31 | +``` |
| 32 | + |
| 33 | +## About these slides |
| 34 | +- These are some practice problems for Statistical Inference Quiz 1 |
| 35 | +- They were created using slidify interactive, which you will learn in |
| 36 | +Creating Data Products |
| 37 | +- Please help improve them with pull requests here |
| 38 | +(https://github.com/bcaffo/courses) |
| 40 | + |
| 41 | +--- &radio |
| 42 | +The probability that a manuscript gets accepted to a journal is 12% (say). However, |
| 43 | +given that a revision is asked for, the probability that it gets accepted |
| 44 | +is 90%. Is it possible that the probability that a manuscript has a revision |
| 45 | +asked for is 20%? |
| 46 | + |
| 47 | +1. Yeah, that's totally possible. |
| 48 | +2. _No, it's not possible._ |
| 49 | +3. It's not possible to answer this question. |
| 50 | + |
| 51 | +*** .hint |
| 52 | +$A = accepted$, $B = revision$. $P(A) = .12$, $P(A | B) = .90$. $P(B) = .20$ |
| 53 | + |
| 54 | +*** .explanation |
| 55 | +$P(A \cap B) = P(A | B) \times P(B) = .9 \times .2 = .18$. This is larger than |
| 56 | +$P(A) = .12$, which is not possible since $A \cap B \subset A$. |
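|  | + |
|  | +As a sketch of the same argument from the other direction, the largest $P(B)$ consistent with the stated numbers follows from $P(A | B) P(B) \leq P(A)$: |
|  | +$$ |
|  | +P(B) \leq \frac{P(A)}{P(A | B)} = \frac{.12}{.90} \approx .133 < .20 |
|  | +$$ |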
| 57 | + |
| 58 | + |
| 59 | +--- &radio |
| 60 | +Suppose that the number of web hits to a particular site is approximately normally |
| 61 | +distributed with a mean of 100 hits per day and a standard deviation of 10 hits per day. What's the probability that a given day has fewer than 93 hits, |
| 62 | +expressed as a percentage to the nearest percentage point? |
| 63 | + |
| 64 | +1. 76% |
| 65 | +2. _24%_ |
| 66 | +3. 47% |
| 67 | +4. 94% |
| 68 | + |
| 69 | +*** .hint |
| 70 | +Let $X$ be the number of hits per day. We want $P(X \leq 93)$ given that |
| 71 | +$X$ is $N(100, 10^2)$. |
| 72 | + |
| 73 | +*** .explanation |
| 74 | +```{r} |
| 75 | +round(pnorm(93, mean = 100, sd = 10) * 100) |
| 76 | +``` |
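|  | + |
|  | +The same probability can be sketched by hand by standardizing; the value of $\Phi(-0.7)$ below is a rounded standard normal table value: |
|  | +$$ |
|  | +P(X \leq 93) = P\left(Z \leq \frac{93 - 100}{10}\right) = \Phi(-0.7) \approx .242 |
|  | +$$ |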
| 77 | + |
| 78 | + |
| 79 | +--- &radio |
| 80 | +Suppose 5% of housing projects have issues with asbestos. The sensitivity of a test |
| 81 | +for asbestos is 93% and the specificity is 88%. What is the probability that a |
| 82 | +housing project has no asbestos given a negative test expressed as a percentage |
| 83 | +to the nearest percentage point? |
| 84 | + |
| 85 | +1. 0% |
| 86 | +2. 5% |
| 87 | +3. 10% |
| 88 | +4. 20% |
| 89 | +5. 50% |
| 90 | +6. _100%_ |
| 91 | + |
| 92 | +*** .hint |
| 93 | +$A = asbestos$, $T_+ = tests positive$, $T_- = tests negative$. |
| 94 | +$P(T_+ | A) = .93$, $P(T_- | A^c) = .88$, $P(A) = .05$. |
| 95 | + |
| 96 | +*** .explanation |
| 97 | +We want |
| 98 | +$$ |
| 99 | +P(A^c | T_-) = \frac{P(T_- | A^c) P(A^c)}{P(T_- | A^c) P(A^c) + P(T_- | A) P(A)} |
| 100 | +$$ |
| 101 | +```{r} |
| 102 | +(.88 * .95) / (.88 * .95 + .07 * .05) |
| 103 | +``` |
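|  | + |
|  | +Spelling out the arithmetic behind that line, with $P(T_- | A) = 1 - .93 = .07$ and $P(A^c) = 1 - .05 = .95$: |
|  | +$$ |
|  | +P(A^c | T_-) = \frac{.88 \times .95}{.88 \times .95 + .07 \times .05} = \frac{.836}{.8395} \approx .996 |
|  | +$$ |
|  | +which rounds to 100% at the nearest percentage point. |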
| 104 | + |
| 105 | + |
| 106 | + |
| 107 | +--- &multitext |
| 108 | +Suppose that the number of web hits to a particular site is approximately normally |
| 109 | +distributed with a mean of 100 hits per day and a standard deviation of 10 hits per day. |
| 110 | + |
| 111 | +1. What number of web hits per day is the cutoff such that only |
| 112 | +5% of days have more hits? Express your answer to 3 decimal places. |
| 113 | + |
| 114 | + |
| 115 | + |
| 116 | +*** .hint |
| 117 | +Let $X$ be the number of hits per day. We want the point $x$ such that $P(X > x) = .05$ given that |
| 118 | +$X$ is $N(100, 10^2)$. |
| 119 | + |
| 120 | +*** .explanation |
| 121 | +<span class="answer">`r round(qnorm(.95, mean = 100, sd = 10), 3)`</span> |
| 122 | +```{r} |
| 123 | +round(qnorm(.95, mean = 100, sd = 10), 3) |
| 124 | +round(qnorm(.05, mean = 100, sd = 10, lower.tail = FALSE), 3) |
| 125 | +``` |
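|  | + |
|  | +As a rough hand check, the 95th percentile is the mean plus $z_{.95}$ standard deviations. The value $z_{.95} \approx 1.645$ is a rounded table value, so this agrees with the `qnorm` output only to about two decimal places: |
|  | +$$ |
|  | +x = \mu + z_{.95} \, \sigma \approx 100 + 1.645 \times 10 = 116.45 |
|  | +$$ |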
| 126 | + |
| 127 | + |
| 128 | +--- &multitext |
| 129 | +Suppose that the number of web hits to a particular site is approximately normally |
| 130 | +distributed with a mean of 100 hits per day and a standard deviation of 10 hits per day. Imagine taking a random sample of 50 days. |
| 131 | + |
| 132 | +1. What number of web hits is |
| 133 | +the cutoff such that only 5% of averages of 50 days of web traffic have more hits? |
| 134 | +Express your answer to 3 decimal places. |
| 135 | + |
| 136 | +*** .hint |
| 137 | +Let $\bar X$ be the average number of hits per day for 50 randomly sampled days. |
| 138 | +$\bar X$ is $N(100, 10^2 / 50)$. |
| 139 | + |
| 140 | +*** .explanation |
| 141 | +<span class="answer">`r round(qnorm(.95, mean = 100, sd = 10 / sqrt(50) ), 3)`</span> |
| 142 | + |
| 143 | +```{r} |
| 144 | +round(qnorm(.95, mean = 100, sd = 10 / sqrt(50) ), 3) |
| 145 | +round(qnorm(.05, mean = 100, sd = 10 / sqrt(50), lower.tail = FALSE), 3) |
| 146 | +``` |
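|  | + |
|  | +The same hand check applies with the standard error $\sigma / \sqrt{n}$ in place of $\sigma$. Again $z_{.95} \approx 1.645$ is a rounded table value, so the result is only approximate: |
|  | +$$ |
|  | +\bar x = \mu + z_{.95} \frac{\sigma}{\sqrt{n}} \approx 100 + 1.645 \times \frac{10}{\sqrt{50}} \approx 102.33 |
|  | +$$ |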
| 147 | + |
| 148 | +--- &multitext |
| 149 | + |
| 150 | +You don't believe that your friend can discern good wine from cheap. Assuming |
| 151 | +that you're right, in a blind test where you randomize 6 paired varieties (Merlot, |
| 152 | +Chianti, ...) of cheap and expensive wines |
| 153 | + |
| 154 | +1. what is the chance that she gets 5 or 6 right expressed as a percentage |
| 155 | +to one decimal place? |
| 156 | + |
| 157 | +*** .hint |
| 158 | +Let $X$, the number she gets right, be binomial with $n = 6$ and $p = .5$; we want $P(X \geq 5)$ |
| 159 | + |
| 160 | +*** .explanation |
| 161 | + |
| 162 | +<span class="answer">`r round(pbinom(4, prob = .5, size = 6, lower.tail = FALSE) * 100, 1)`</span> |
| 163 | + |
| 164 | +```{r} |
| 165 | +round(pbinom(4, prob = .5, size = 6, lower.tail = FALSE) * 100, 1) |
| 166 | +``` |
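|  | + |
|  | +For reference, the upper tail $P(X \geq 5)$ can also be worked out directly from the binomial mass function with $n = 6$ and $p = .5$: |
|  | +$$ |
|  | +P(X \geq 5) = \left[ {6 \choose 5} + {6 \choose 6} \right] \left(\frac{1}{2}\right)^6 = \frac{6 + 1}{64} = \frac{7}{64} \approx .109 |
|  | +$$ |
|  | +that is, about 10.9%. |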
| 167 | + |
| 168 | +--- &multitext |
| 169 | + |
| 170 | +Consider a uniform distribution. If we were to sample 100 draws from |
| 171 | +a uniform distribution (which has mean 0.5, and variance 1/12) and take their |
| 172 | +mean, $\bar X$, |
| 173 | + |
| 174 | +1. what is the approximate probability of getting a value of 0.51 or larger, expressed to 3 decimal places? |
| 175 | + |
| 176 | +*** .hint |
| 177 | +Use the central limit theorem, which says that $\bar X$ is approximately $N(\mu, \sigma^2/n)$ |
| 178 | + |
| 179 | +*** .explanation |
| 180 | + |
| 181 | +<span class="answer"> `r round(pnorm(.51, mean = 0.5, sd = sqrt(1 / 12 / 100), lower.tail = FALSE), 3)`</span> |
| 182 | + |
| 183 | +```{r} |
| 184 | +round(pnorm(.51, mean = 0.5, sd = sqrt(1 / 12 / 100), lower.tail = FALSE), 3) |
| 185 | +``` |
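|  | + |
|  | +A sketch of the standardization behind that call; the final value is a rounded standard normal tail probability: |
|  | +$$ |
|  | +P(\bar X \geq .51) \approx P\left(Z \geq \frac{.51 - .5}{\sqrt{1 / 12 / 100}}\right) = P(Z \geq \sqrt{.12}) \approx P(Z \geq .346) \approx .365 |
|  | +$$ |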
| 186 | + |
| 187 | + |
| 188 | +--- &multitext |
| 189 | + |
| 190 | +If you roll ten standard dice, take their average, then repeat this process over and over and construct a histogram, |
| 191 | + |
| 192 | +1. what would it be centered at? |
| 193 | + |
| 194 | + |
| 195 | +*** .hint |
| 196 | +$E[X_i] = E[\bar X]$ where $\bar X = \frac{1}{n}\sum_{i=1}^n X_i$ |
| 197 | + |
| 198 | +*** .explanation |
| 199 | + |
| 200 | + |
| 201 | +The answer will be <span class="answer">3.5</span> since the mean of the |
| 202 | +sampling distribution of iid draws will be the population mean that the |
| 203 | +individual draws were taken from. |
| 204 | + |
| 205 | +--- &multitext |
| 206 | + |
| 207 | +If you roll ten standard dice, take their average, then repeat this process over and over and construct a histogram, |
| 208 | + |
| 209 | +1. what would be its variance expressed to 3 decimal places? |
| 210 | + |
| 211 | +*** .hint |
| 212 | +$$Var(\bar X) = \sigma^2 /n$$ |
| 213 | + |
| 214 | +*** .explanation |
| 215 | +The answer will be <span class="answer">`r round(mean((1 : 6 - 3.5)^2) / 10, 3)`</span> |
| 216 | +since the variance of the sampling distribution of the mean is $\sigma^2/10$ |
| 217 | +and the variance of a die roll is |
| 218 | + |
| 219 | +```{r} |
| 220 | +mean((1 : 6 - 3.5)^2) |
| 221 | +``` |
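|  | + |
|  | +Written out by hand, with $n = 10$ dice per average: |
|  | +$$ |
|  | +\sigma^2 = \frac{1}{6} \sum_{i=1}^6 (i - 3.5)^2 = \frac{17.5}{6} = \frac{35}{12} \approx 2.917, |
|  | +\qquad Var(\bar X) = \frac{\sigma^2}{n} = \frac{35}{120} \approx .292 |
|  | +$$ |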
| 222 | + |
| 223 | +--- &multitext |
| 224 | +The number of web hits to a site is Poisson with mean 16.5 per day. |
| 225 | + |
| 226 | +1. What is the probability of getting 20 or fewer in 2 days expressed |
| 227 | +as a percentage to one decimal place? |
| 228 | + |
| 229 | +*** .hint |
| 230 | +Let $X$ be the number of hits in 2 days; then $X \sim Poisson(2\lambda)$ with $\lambda = 16.5$ |
| 231 | + |
| 232 | +*** .explanation |
| 233 | +<span class="answer">`r round(ppois(20, lambda = 16.5 * 2) * 100, 1)`</span> |
| 234 | + |
| 235 | +```{r} |
| 236 | +round(ppois(20, lambda = 16.5 * 2) * 100, 1) |
| 237 | +``` |
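|  | + |
|  | +For reference, the quantity that `ppois` evaluates here is the Poisson distribution function at 20 with rate $2\lambda = 2 \times 16.5 = 33$: |
|  | +$$ |
|  | +P(X \leq 20) = \sum_{k=0}^{20} \frac{e^{-33} \, 33^k}{k!} |
|  | +$$ |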
| 238 | + |
| 239 | + |
| 240 | + |