Stat405: Simulation

Hadley Wickham
Thursday, 23 September 2010
1. Homework comments
2. Mathematical approach
3. More randomness
4. Random number generators




Homework
I graded only your organisation and code, and focused my comments there.
Biggest overall tip: use floating figures (the LaTeX figure environment, \begin{figure} ... \end{figure}) with captions. Give each figure a \label{} and use \ref{} to refer to it in the text.
Captions should start with a brief description of the plot (including the bin width, if applicable) and finish with a brief description of what the plot reveals.
I will grade captions more strictly in future.


Code
The code gives the explicit technical details.
Your comments should remind you why you did what you did.
Most readers will not look at the code, but it’s very important to include it, because it lets others check your work.



Mathematical approach

Why are we doing this simulation? We could work out the expected value and variance of the payoff mathematically instead. So let’s do it!
Simplifying assumption: the three windows (slots) are independent and identically distributed (iid).
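Under the iid assumption, the probability of any triple of symbols is the product of the per-window probabilities, so the mean and variance of the payoff reduce to finite sums over every combination. A sketch of those sums, where the symbols are introduced here for illustration only: $W_1, W_2, W_3$ are the three windows, $p(s)$ is the probability of symbol $s$ (what `dist` estimates), $g$ is the payoff rule (what `calculate_prize` computes) and $X = g(W_1, W_2, W_3)$ is the prize. Each sum runs over all symbols $a$, $b$, $c$.

```latex
\begin{aligned}
P(W_1 = a,\, W_2 = b,\, W_3 = c) &= p(a)\,p(b)\,p(c) \\
E[X] &= \sum_{a,b,c} p(a)\,p(b)\,p(c)\; g(a, b, c) \\
\operatorname{Var}[X] &= \sum_{a,b,c} p(a)\,p(b)\,p(c)\; \bigl(g(a, b, c) - E[X]\bigr)^2
\end{aligned}
```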




calculate_prize <- function(windows) {
  # Prize when all three windows show the same symbol
  payoffs <- c("DD" = 800, "7" = 80, "BBB" = 40,
    "BB" = 25, "B" = 10, "C" = 10, "0" = 0)

  same <- length(unique(windows)) == 1
  allbars <- all(windows %in% c("B", "BB", "BBB"))

  if (same) {
    prize <- payoffs[windows[1]]
  } else if (allbars) {
    # A mix of different bars pays 5
    prize <- 5
  } else {
    # Cherries set the base prize (0, 2 or 5 for 0, 1 or 2 cherries);
    # each diamond then doubles it
    cherries <- sum(windows == "C")
    diamonds <- sum(windows == "DD")

    prize <- c(0, 2, 5)[cherries + 1] *
      c(1, 2, 4)[diamonds + 1]
  }
  prize
}

slots <- read.csv("slots.csv", stringsAsFactors = FALSE)

# Calculate empirical distribution of symbols, pooling all three windows
dist <- table(c(slots$w1, slots$w2, slots$w3))
dist <- dist / sum(dist)

# Store the symbol names separately, so that slots remains the
# data frame of plays (it is reused later)
symbols <- names(dist)

# Enumerate every combination of three symbols and compute its prize
poss <- expand.grid(
  w1 = symbols, w2 = symbols, w3 = symbols,
  stringsAsFactors = FALSE
)

poss$prize <- NA
for (i in seq_len(nrow(poss))) {
  window <- as.character(poss[i, 1:3])
  poss$prize[i] <- calculate_prize(window)
}




Your turn
How can you calculate the probability of each combination?
(Hint: think about subsetting. Another hint: think about the table and character subsetting. Final hint: you can do this in one line of code.)
Then work out the expected value of the payoff.



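# Under iid, P(w1, w2, w3) = p(w1) * p(w2) * p(w3);
# dist[w] looks up each probability by symbol name
poss$prob <- with(poss,
  dist[w1] * dist[w2] * dist[w3])

(poss_mean <- with(poss, sum(prob * prize)))

# How do we determine the variance of this
# estimator?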
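One way to start on the variance question: if we treat `dist` as the true distribution, the per-play variance of the payoff uses the same probabilities as the mean. A minimal sketch, using a hypothetical helper `payoff_moments()` and made-up toy probabilities rather than the slot data. It does not account for the extra uncertainty from estimating `dist` itself.

```r
# Exact mean and variance of a discrete payoff, given each outcome's
# probability and prize (for example poss$prob and poss$prize).
payoff_moments <- function(prob, prize) {
  prob <- as.numeric(prob)  # drop the "table" class that dist[...] carries
  stopifnot(length(prob) == length(prize), isTRUE(all.equal(sum(prob), 1)))
  mu <- sum(prob * prize)                 # E[X]
  sigma2 <- sum(prob * (prize - mu)^2)    # Var[X] = E[(X - E[X])^2]
  c(mean = mu, var = sigma2)
}

# Hypothetical toy game (not the slot machine): win 10 with probability 0.1,
# win 2 with probability 0.3, otherwise win nothing.
toy <- payoff_moments(prob = c(0.1, 0.3, 0.6), prize = c(10, 2, 0))
toy  # mean 1.6, var 8.64

# The average payoff over n independent plays has variance var / n
n <- 100
toy[["var"]] / n
```

With the slot data loaded, `payoff_moments(poss$prob, poss$prize)` should return both moments at once, since the probabilities of all combinations sum to 1.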




More randomness

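Sample

sample() is very useful for selecting from a discrete set (vector) of possibilities.
Four arguments: x (the values to choose from), size (how many to choose), replace (whether to sample with replacement), prob (optional weights)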




How can you:
Choose 1 from a vector
Choose n from a vector, with replacement
Choose n from a vector, without replacement
Perform a weighted sample
Put a vector in random order
Put a data frame in random order


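# Choose 1 from vector
sample(letters, 1)

# Choose n from vector, without replacement
sample(letters, 10)
sample(letters, 40)  # error: cannot take 40 from 26 without replacement

# Choose n from vector, with replacement
sample(letters, 40, replace = TRUE)

# Perform a weighted sample. Without size, this puts every symbol
# in a weighted random order; add size and replace = TRUE for draws
sample(names(dist), prob = dist)
sample(names(dist), 10, replace = TRUE, prob = dist)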


# Put a vector in random order
sample(letters)

# Put a data frame in random order
slots[sample(1:nrow(slots)), ]




Your turn
The source of randomness in random_prize is sample(). Other options are:
runif, rbinom, rnbinom, rpois, rnorm, rt, rcauchy
What sort of random variables do they generate, and what are their parameters?
Practise generating numbers from them.


Function   Distribution        Parameters
runif      Uniform             min, max
rbinom     Binomial            size, prob
rnbinom    Negative binomial   size, prob
rpois      Poisson             lambda
rnorm      Normal              mean, sd
rt         t                   df
rcauchy    Cauchy              location, scale
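One way to practise: draw many values from each generator and compare a summary with its textbook value. A sketch, where the parameter values and the seed are arbitrary choices. The Cauchy row uses the median because the Cauchy distribution has no mean, and rnbinom counts failures before the size-th success, so its mean is size * (1 - prob) / prob.

```r
set.seed(405)
n <- 1e5

checks <- data.frame(
  call = c("runif(min = 1, max = 3)", "rbinom(size = 10, prob = 0.3)",
           "rnbinom(size = 5, prob = 0.5)", "rpois(lambda = 4)",
           "rnorm(mean = 2, sd = 0.5)", "rt(df = 5)",
           "rcauchy(location = 1, scale = 1)"),
  # Sample mean of n draws (sample median for the Cauchy)
  simulated = c(
    mean(runif(n, min = 1, max = 3)),
    mean(rbinom(n, size = 10, prob = 0.3)),
    mean(rnbinom(n, size = 5, prob = 0.5)),
    mean(rpois(n, lambda = 4)),
    mean(rnorm(n, mean = 2, sd = 0.5)),
    mean(rt(n, df = 5)),
    median(rcauchy(n, location = 1, scale = 1))
  ),
  # Theoretical mean (median for the Cauchy)
  theory = c((1 + 3) / 2, 10 * 0.3, 5 * (1 - 0.5) / 0.5, 4, 2, 0, 1)
)
checks
```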

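Distributions
Each distribution has a family of functions, named by prefix:
 •         r to generate random numbers
 •         d to compute the density f(x) (the probability mass for discrete distributions)
 •         p to compute the cumulative distribution function F(x)
 •         q to compute the inverse distribution function (quantile) F⁻¹(p)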
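The four prefixes fit together as shown below for the normal and binomial distributions. A short sketch; the particular arguments are arbitrary examples.

```r
# The four prefixes for the normal distribution
f0 <- dnorm(0)         # density at 0, equal to 1 / sqrt(2 * pi)
F196 <- pnorm(1.96)    # F(1.96), about 0.975
q975 <- qnorm(0.975)   # F^-1(0.975), about 1.96
set.seed(1)
draws <- rnorm(3)      # three random draws

# For a discrete distribution, d gives P(X = x) and p adds these up
p_sum <- sum(dbinom(0:3, size = 10, prob = 0.3))  # P(X = 0) + ... + P(X = 3)
p_le_3 <- pbinom(3, size = 10, prob = 0.3)        # P(X <= 3) directly

# q undoes p for a continuous distribution
roundtrip <- qnorm(pnorm(1.3))                    # back to 1.3
```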



# Easy to combine random variables
library(ggplot2)  # provides qplot()

n <- rpois(10000, lambda = 10)
x <- rbinom(10000, size = n, prob = 0.3)
qplot(x, binwidth = 1)

p <- runif(10000)
x <- rbinom(10000, size = 10, prob = p)
qplot(x, binwidth = 0.1)

# cf.
qplot(runif(10000), binwidth = 0.1)
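Both combinations above have known closed forms, which makes them good checks on a simulation: thinning a Poisson(10) count with probability 0.3 gives a Poisson(3) count, and a binomial with a Uniform(0, 1) success probability is equally likely to take any value from 0 to 10. A sketch that checks both numerically; the seed and sample size are arbitrary.

```r
set.seed(2010)
n_sims <- 1e5

# N ~ Poisson(10), X | N ~ Binomial(N, 0.3)  =>  X ~ Poisson(3),
# so the mean and variance of X should both be close to 3
n <- rpois(n_sims, lambda = 10)
x_thin <- rbinom(n_sims, size = n, prob = 0.3)
c(mean = mean(x_thin), var = var(x_thin))

# p ~ Uniform(0, 1), X | p ~ Binomial(10, p)  =>  X uniform on 0, ..., 10
p <- runif(n_sims)
x_mix <- rbinom(n_sims, size = 10, prob = p)
props <- prop.table(table(factor(x_mix, levels = 0:10)))
round(props, 3)  # each proportion should be near 1 / 11
```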


# Simulation is a powerful tool for exploring
# distributions. Easy to do computationally; hard
# to do analytically

qplot(1 / rpois(10000, lambda = 20))
qplot(1 / runif(10000, min = 0.5, max = 2))

qplot(rnorm(10000) ^ 2)
qplot(rnorm(10000) / rnorm(10000))

# http://www.johndcook.com/distribution_chart.html
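Two of these transformations have standard closed forms: the square of a standard normal is chi-squared with 1 degree of freedom, and the ratio of two independent standard normals is a standard Cauchy. A sketch that compares simulated quartiles with the exact ones; the seed and sample size are arbitrary.

```r
set.seed(23)
n <- 1e5
probs <- c(0.25, 0.5, 0.75)

z_sq <- rnorm(n)^2              # should follow chi-squared(1)
ratio <- rnorm(n) / rnorm(n)    # should follow Cauchy(0, 1)

# Rows: simulated quartiles next to the exact quartiles
sim_vs_exact <- rbind(
  chisq_sim    = quantile(z_sq, probs),
  chisq_exact  = qchisq(probs, df = 1),
  cauchy_sim   = quantile(ratio, probs),
  cauchy_exact = qcauchy(probs)
)
round(sim_vs_exact, 3)
```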



Your turn




RNG
Computers are deterministic, so how do they produce randomness?




How do computers generate random numbers?

They don’t! They actually produce pseudo-random sequences.
Common approach, the linear congruential generator: $X_{n+1} = (a X_n + c) \bmod m$
(http://en.wikipedia.org/wiki/Linear_congruential_generator)




# One step of a linear congruential generator
next_val <- function(x, a, c, m) {
  (a * x + c) %% m
}

x <- 1001
(x <- next_val(x, 1664525, 1013904223, 2^32))

# http://en.wikipedia.org/wiki/List_of_pseudorandom_number_generators

# R uses the Mersenne twister by default:
# http://en.wikipedia.org/wiki/Mersenne_twister
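Iterating the step above gives a whole pseudo-random sequence, and dividing by m turns it into numbers in [0, 1). A sketch with a hypothetical helper `lcg()`. The small generator with a = 5, c = 3, m = 16 is a toy choice that makes the periodicity visible: it visits every value from 0 to 15 exactly once before repeating.

```r
# Iterate X[n+1] = (a * X[n] + c) mod m and return X[1], ..., X[n].
# Doubles store integers exactly up to 2^53, so this plain-R version stays
# exact as long as a * m + c is below 2^53 (true for the slide's constants).
lcg <- function(seed, n, a, c, m) {
  x <- numeric(n)
  current <- seed
  for (i in seq_len(n)) {
    current <- (a * current + c) %% m
    x[i] <- current
  }
  x
}

# The constants from the slide, scaled to [0, 1)
u <- lcg(seed = 1001, n = 5, a = 1664525, c = 1013904223, m = 2^32) / 2^32
u

# Toy generator: the 17th value repeats the 1st
small <- lcg(seed = 0, n = 17, a = 5, c = 3, m = 16)
small  # 3 2 13 4 7 6 1 8 11 10 5 12 15 14 9 0 3
```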


# Random numbers are reproducible!

set.seed(1)
runif(10)

set.seed(1)
runif(10)

# Very useful when you need to make a reproducible
# example that involves randomness
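The same idea scales up to a whole simulation: if the seed is set at the start, every run with that seed returns exactly the same result. A sketch using a hypothetical function `simulate_mean()`.

```r
# Set the seed, then run the (tiny) simulation
simulate_mean <- function(n, seed) {
  set.seed(seed)
  mean(runif(n))
}

first <- simulate_mean(100, seed = 1)
again <- simulate_mean(100, seed = 1)
other <- simulate_mean(100, seed = 2)
identical(first, again)  # TRUE
identical(first, other)  # FALSE

RNGkind()[1]             # "Mersenne-Twister" unless it has been changed
```

Note the design trade-off: calling set.seed() inside a function resets the global random stream as a side effect, so any code run afterwards continues from that seed too.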




True randomness
Atmospheric radio noise: http://www.random.org. Use it from R with the random package.
Not really important unless you’re running a lottery. (With a pseudo-random generator, someone who observes a long enough sequence can predict the next value.)

