Robert P. Dobrow

Probability


Скачать книгу

the random experiment using random numbers on the computer. One iteration of the experiment is called a “trial.”

      2 Determine success: Based on the outcome of the trial, determine whether or not the event occurs. If yes, call that a “success.”

      3 Replication: Repeat the aforementioned two steps many times. The proportion of successful trials is the simulated estimate of

      Monte Carlo simulation is intuitive and matches up with our sense of how probabilities “should” behave. We give a theoretical justification for the method in Chapter 10, where we study limit theorems and the law of large numbers.

       Example 1.32 Consider simulating the probability that an ideal fair coin comes up heads. One could do a physical simulation by just flipping a coin many times and taking the proportion of heads to estimate .Using a computer, choose the number of trials (the larger the better) and type the R command> sample(0:1, n, replace = T)The command samples with replacement from times such that outcomes are equally likely. Let 0 represent tails and 1 represent heads. The output is a sequence of ones and zeros corresponding to heads and tails. The average, or mean, of the list is precisely the proportion of ones. To simulate , type> mean(sample(0:1, n, replace = T))Repeat the command several times (use the up arrow key). These give repeated Monte Carlo estimates of the desired probability. Observe the accuracy in the estimate with one million trials:> mean(sample(0:1, 1000000, replace = T)) [1] 0.500376 > mean(sample(0:1, 1000000, replace = T)) [1] 0.499869 > mean(sample(0:1, 1000000, replace = T)) [1] 0.498946 > mean(sample(0:1, 1000000, replace = T)) [1] 0.500115

      The R script CoinFlip.R simulates a familiar probability—the probability of getting three heads in three coin tosses.

      R: SIMULATING THE PROBABILITY OF THREE HEADS IN THREE COIN TOSSES

      # CoinFlip.R # Trial > trial <- sample(0:1, 3, replace = TRUE) # Success > if (sum(trial) == 3) 1 else 0 # Replication > n <- 10000 # Number of repetitions > simlist <- numeric(n) # Initialize vector > for (i in 1:n) { trial <- sample(0:1, 3, replace = TRUE) success <- if (sum(trial) == 3) 1 else 0 simlist[i] <- success } > mean(simlist) # Proportion of trials with 3 heads [1] 0.1293

      To simulate three coin flips, use the sample command. Again letting 1 represent heads and 0 represent tails, the command

      > trial <- sample(0:1, 3, replace = TRUE)

      chooses a head or tails three times. The three results are stored as a three-element list (called a vector in R) in the variable trial .

      After flipping three coins, the routine must decide whether or not they are all heads. This is done by summing the outcomes. The sum will equal three if and only if all flips are heads. This is checked with the command

      > if (sum(trial) == 3) 1 else 0

      which returns a 1 for success, and 0, otherwise.

      For the actual simulation, the commands are repeated n times in a loop. The output from each trial is stored in the vector simlist . This vector will consist of n ones and zeros corresponding to success or failure for each trial, where success is flipping three heads.

      Finally, after repeating n trials, we find the proportion of successes in all the trials, which is the proportion of ones in simlist . Given a list of zeros and ones, the average, or mean, of the list is precisely the proportion of ones in the list. The command mean(simlist) finds this average giving the simulated probability of getting three heads.

      Run the script via the script file or R supplement to see that the resulting estimate is fairly close to the exact solution 1 slash 8 equals 0.125 period Increase n to 100,000 or even a million to get more precise estimates. image

      > set.seed(360)

      where any number can be used as the input. We will use a seed of 360 unless otherwise specified and will not show this command in the text (though it is shown in the supplements and scripts), but it is used before every script where random numbers are generated. Have fun and set seeds using numbers you enjoy!

      The reason that setting seeds is important is that this makes the work reproducible. Reproducibility means that others can take the code and obtain the same results, lending credibility to the work. While writing simulations, you should aim to make sure all your code is reproducible.

       Example 1.33 The script Divisible356.R simulates the divisibility problem in Example 1.30 that a random integer from is divisible by 3, 5, or 6. The problem, and the resulting code, is more complex.The function simdivis() simulates one trial. Inside the function, the expression num%%x==0 checks whether num is divisible by . The if statement checks whether num is divisible by 3, 5, or 6, returning 1 if it is, and 0, otherwise.After defining the function, typing simdivis() will simulate one trial. By repeatedly typing simdivis() on your computer, you get a feel for how this random experiment behaves over repeated trials.In this script, instead of writing a loop, we use the replicate command. This powerful R command is an alternative to writing loops for simple expressions. The syntax is replicate(n,expr) . The expression expr is repeated times creating an -element vector. Thus, the result of typing> simlist <- replicate(1000, simdivis())is a vector of 1000 ones and zeros stored in the variable simlist corresponding to success or failure in the divisibility experiment. The average mean(simlist) gives the simulated probability.Play with this script. Based on 1000 trials, you might guess that the true probability is between 0.45 and 0.49. Increase the number of trials to 10,000 and the estimates are roughly between 0.46 and 0.48. At 100,000, the estimates become even more precise between 0.465 and 0.468.We can actually quantify this increase in precision in Monte Carlo simulation as gets large. But that is a topic that will have to wait until Chapter 11.

      R: SIMULATING THE DIVISIBILITY PROBABILITY

      # Divisible356.R # simdivis() simulates one trial > simdivis <- function() { num <- sample(1:1000,1) if (num%%3==0 || num%%5==0 || num%%6==0) 1 else 0 } > simlist <- replicate(10000, simdivis()) mean(simlist) [1] 0.4707