Statistical Programming with R

Packages and functions that we use

library(dplyr)    # Data manipulation
library(magrittr) # Pipes
library(stringi)  # For counting substrings
library(ggplot2)  # Plotting suite

What is it?

With Monte Carlo methods we can estimate the value of an unknown quantity through repeated random sampling and statistical theory (what we would expect based on chance).

It builds upon the principles of inferential statistics and needs:

  • A large set of numbers (e.g. an infinite population) or a theoretical distribution
  • A random sample from that large set

It works by repeatedly drawing random samples from a population while keeping the procedure itself fixed: the input varies, but the method stays the same.
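
A minimal sketch of the idea, estimating a probability we already know (the coin, the number of flips, and the seed are arbitrary choices):

# estimate P(heads) for a fair coin by repeated random sampling
set.seed(123)
flips <- sample(c("heads", "tails"), 1e5, replace = TRUE)
mean(flips == "heads") # should land very close to the true value of .5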

Why do it?

  • evaluate how a statistical method performs in different situations
  • calculate the power of a particular test based on some (violation of) assumptions
  • estimate the probability of a complex event by simulating that event
  • learn about the sampling distribution and bootstrapping
  • “feel like a god at your own computer”

Some probability theory

An example

  • John travels to and from work by train every day:
    • In the past 10 years his train has been delayed 12 times
    • \(P(\text{delay}) = \frac{12}{2\times3650} \approx .0016\)
    • John feels very confident about trains and expects them to run on time
  • Bill travels to and from work by train every week:
    • In the past year his train has been delayed 50 times
    • \(P(\text{delay}) = \frac{50}{2\times52} \approx .481\)
    • Bill feels very confident in his estimate, but realizes that trains often do not run on time
  • Claire travels by train very occasionally:
    • Out of the last 3 trips, two trips were delayed
    • \(P(\text{delay}) = \frac{2}{3}\)
    • Claire does not feel confident about trains and does not expect them to run on time
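
A quick check of the arithmetic above in R:

# observed delay proportions
12 / (2 * 3650) # John: two trips a day for ten years
50 / (2 * 52)   # Bill: two trips a week for a year
2 / 3           # Claire: three recent trips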

Confidence

  • Is John’s confidence misplaced?
  • Is Bill’s expectation misplaced?
  • Is Claire’s lack of confidence misplaced?

Law of large numbers

Bernoulli’s law (1713):

In repeated independent experiments, each with the same true probability \(p\) of a particular outcome, the observed proportion of that outcome converges to the true probability \(p\) as the number of repetitions grows.

So if we could replicate the same procedure an infinite number of times, the difference between our estimate and the true value would shrink to zero.

The experiments must be independent: the outcome of one trial does not change the probability of the outcome in any other trial.

Gambler’s fallacy & independence

The tendency to believe that previous outcomes alter the probability of a future event is referred to as the gambler’s fallacy:

for independent throws of a fair die, the probability of a six is one in six, even if the previous 100 throws have all been sixes.

If a fair (European) roulette wheel has landed on black 50 times in a row, the probability of landing on red is no higher on the next spin: it remains \(\frac{18}{37}\).
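
A minimal sketch of this independence (the seed and the number of spins are arbitrary choices): on a simulated fair European wheel, the proportion of red directly after a black is no different from the proportion of red overall; both should be close to \(\frac{18}{37} \approx .486\).

set.seed(123)
# a fair European wheel: 18 red, 18 black, 1 green pocket
spins <- sample(c(rep("red", 18), rep("black", 18), "green"),
                size = 1e6, replace = TRUE)

mean(spins == "red")                                        # overall
mean(spins[which(head(spins, -1) == "black") + 1] == "red") # after a black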

Regression to the mean

Following an extreme random event, the next random event is likely to be less extreme. In the long run, observed outcomes move towards the expected value.

RttM vs LLN vs gambler’s fallacy

  • RTTM: When one run of throwing a die 100 times yields 100 sixes (a rare event), the next run will likely contain fewer than 100 sixes (see the sketch after this list).
  • LLN: In the long run the 100-sixes event will average out and the observed proportion of sixes will move towards \(P = \frac{1}{6}\).
  • GF: Believing the die is now due a run of 500 non-sixes to even out the 100 sixes.
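
A minimal sketch of RttM under independence (the number of runs and the 1% cut-off are arbitrary choices): runs that follow an extreme run are, on average, back near the expected count of \(100/6 \approx 16.7\).

set.seed(123)
# number of sixes in each of 10,000 runs of 100 throws
runs <- replicate(10000, sum(sample(1:6, 100, replace = TRUE) == 6))

# runs in the top 1%, and the runs that directly follow them
extreme <- which(head(runs, -1) > quantile(runs, .99))
mean(runs[extreme])     # far above 100/6
mean(runs[extreme + 1]) # back near 100/6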

Law of large numbers

set.seed(123)
# roll a fair die 10 times and tabulate the proportions
x <- sample(1:6, 10, prob = rep(1/6, 6), replace = TRUE)
prop.table(table(x))
## x
##   1   2   3   4   5   6 
## 0.3 0.1 0.1 0.2 0.2 0.1
# roll a fair die 100 times
x <- sample(1:6, 100, prob = rep(1/6, 6), replace = TRUE)
prop.table(table(x))
## x
##    1    2    3    4    5    6 
## 0.15 0.17 0.16 0.21 0.14 0.17

More proof

# roll a fair die 10,000 times
x <- sample(1:6, 10000, prob = rep(1/6, 6), replace = TRUE)
prop.table(table(x))
## x
##      1      2      3      4      5      6 
## 0.1616 0.1683 0.1663 0.1713 0.1663 0.1662
# roll a fair die one million times
x <- sample(1:6, 1000000, prob = rep(1/6, 6), replace = TRUE)
prop.table(table(x))
## x
##        1        2        3        4        5        6 
## 0.166610 0.167284 0.166763 0.166686 0.166458 0.166199
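
The convergence is easier to see in a plot. A sketch of the running proportion of sixes, using ggplot2 as loaded at the start (the 10,000 rolls are an arbitrary choice):

set.seed(123)
rolls   <- sample(1:6, 10000, replace = TRUE)
running <- cumsum(rolls == 6) / seq_along(rolls)

# running proportion of sixes with the true probability as a reference
ggplot(data.frame(n = seq_along(running), prop = running),
       aes(x = n, y = prop)) +
  geom_line() +
  geom_hline(yintercept = 1/6, linetype = "dashed") +
  labs(x = "number of rolls", y = "proportion of sixes")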

Estimating the probability of an event

What is the probability of rolling the sequence 1, 2, 3 in three consecutive throws?

# collapse the one million throws from above into a single string
charx <- paste(x, collapse = "")

# approximate proportion of starting positions for "123"
# (dividing by 1e6 instead of the 999,998 possible positions is negligible)
occurrences <- stri_count_fixed(charx, "123") / 1e6
trueprob    <- (1/6)^3

cat(occurrences, "\n", trueprob, sep = "")
## 0.004635
## 0.00462963

Monte Carlo simulation

Throwing 100,000 dice 100 times

# Initialise results object
result <- list() 

# repeat K times
for (i in 1:100) { 
  
  # generate a dataset of interest
  dataset <- sample(1:6, 100000, prob = rep(1/6, 6), replace = TRUE)
  
  # optional: perform your method on this dataset
  
  # Save the values you are interested in 
  result[[i]] <- dataset
  
} 

# Aggregate your results
probs <- sapply(result, function(x) prop.table(table(x)))

# Display output
rowMeans(probs)
##         1         2         3         4         5         6 
## 0.1665194 0.1666168 0.1668015 0.1666164 0.1666844 0.1667615

With fewer trials but more iterations

result <- list()
for (i in 1:100000) { # repeat 100,000 times
  result[[i]] <- sample(1:6, 100, prob = rep(1/6, 6), replace = TRUE)
} # 'only' 100 dice
probs <- sapply(result, function(x) prop.table(table(x)))
rowMeans(probs)
##         1         2         3         4         5         6 
## 0.1665916 0.1666016 0.1667691 0.1666381 0.1668459 0.1665537

Unfair dice

result <- list()
for (i in 1:1000) { 
  result[[i]] <- sample(1:6, 
                        size = 1000, 
                        prob = c(.01, .04, .1, .15, .2, .5),
                        replace = TRUE)
} # 1000 trials
probs <- sapply(result, function(x) prop.table(table(x)))
rowMeans(probs)
##        1        2        3        4        5        6 
## 0.010178 0.039981 0.100071 0.149081 0.199827 0.500862

Power calculation

The goal

Power: the probability of detecting an effect if there is in fact a true effect.

  • Larger effect size means larger power
  • Larger sample size means larger power (less uncertainty, smaller standard errors)
  • Some methods are more powerful than others

Research question

What happens to the power of an independent-samples t-test if we measure on a 5-point scale instead of a continuous outcome?

Example situation

# cut points that divide the continuous scale into 5 categories
cuts <- c(-Inf, -1.2, -0.4, 0.4, 1.2, Inf)

# standard normal density with the cut points overlaid
curve(dnorm, -3, 3)
abline(v = cuts)

Example situation

set.seed(123)

# continuous outcomes: the treatment group has a true mean difference of .7
treatment <- rnorm(24, .7)
control   <- rnorm(24)

# the same data, discretised onto a 5-point scale
treatCut <- as.numeric(cut(treatment, cuts))
contrCut <- as.numeric(cut(control,  cuts))

t.test(treatment, control)$p.value
## [1] 0.02382654
t.test(treatCut, contrCut)$p.value
## [1] 0.09583241

How confident are we in this result?

Monte Carlo simulation

# Initialise results object
result <- matrix(FALSE, nrow = 100000, ncol = 2) 

for (i in 1:100000) { 
  # generate a dataset of interest
  treatment <- rnorm(24, .7)
  control   <- rnorm(24)
  treatCut <- as.numeric(cut(treatment, cuts))
  contrCut <- as.numeric(cut(control,  cuts))
  
  # perform your method on this dataset
  pCont <- t.test(treatment, control)$p.value  # continuous outcome
  pCut  <- t.test(treatCut,  contrCut)$p.value # 5-point scale
  
  # Save whether each test rejected at the .05 level
  result[i, 1] <- pCont < 0.05
  result[i, 2] <- pCut  < 0.05
} 

colMeans(result)
## [1] 0.66311 0.62295
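
As a sanity check on the first column, the analytic power for the continuous case can be obtained with power.t.test() (a sketch; it assumes equal variances, whereas t.test() defaults to the Welch test, so expect a small discrepancy):

# analytic power: two-sample t-test, n = 24 per group, true difference .7 SD
power.t.test(n = 24, delta = .7, sd = 1, sig.level = .05)$power

This lands close to the simulated .66 for the continuous outcome.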

Conclusions

  • Monte Carlo simulations can estimate difficult-to-calculate probabilities
  • more iterations yield less simulation error (law of large numbers)
  • Monte Carlo simulations can be used to test the statistical properties of methodologies
    • At what sample size does my method fail? (see the sketch below)
    • How much of a violation of sphericity is allowed in an ANOVA?
    • What is the power of a nonparametric vs. a parametric test?
    • How biased is my estimation procedure?
    • What is the effect on power if I do this sub-optimal thing?
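
For instance, the first question can be sketched by reusing the setup above and varying the group size (the grid of sizes and the 2,000 replications per size are arbitrary choices):

# simulated power of the continuous t-test for several group sizes
sapply(c(8, 16, 24, 48), function(n) {
  mean(replicate(2000, t.test(rnorm(n, .7), rnorm(n))$p.value < 0.05))
})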

Last practical!