POLS 1600

Probability:
Distributions and Limit Theorems

Updated May 31, 2024

Overview

Class Plan

  • Announcements (5 min)
  • Feedback (5 min)
  • Class plan
    • Probability Distributions (20 min)
    • Law of Large Numbers (20 min)
    • Central Limit Theorem (20 min)
    • Standard Errors (10 min)

Goals

  • Probability distributions allow us to describe different data generating processes and quantify uncertainty about estimates

  • The Law of Large Numbers tells us that the mean of a sample converges to the mean of the population as the size of the sample grows larger.

  • The Central Limit Theorem tells us that the distribution of sample means of a given sample size converges in distribution to a Normal probability distribution

  • Standard Errors describe the width of a sampling distribution and allow us to assess the statistical significance of regression estimates

Announcements: Assignment 2

  • Feedback on Assignment 2 before your labs on Thursday

  • Proposal:

    • Substitute Lab 11 with in-class workshops on the Final Project
    • This will count as both your grade on Assignment 3 and the Lab for that week

Setup: Packages for today

## Packages for today
the_packages <- c(
  ## R Markdown
  "kableExtra","DT","texreg","htmltools",
  ## Tidyverse
  "tidyverse", "lubridate", "forcats", "haven", "labelled",
  ## Extensions for ggplot
  "ggmap","ggrepel", "ggridges", "ggthemes", "ggpubr", 
  "patchwork",
  "GGally", "scales", "dagitty", "ggdag", "ggforce",
  # Data 
  "COVID19","maps","mapdata","qss","tidycensus", "dataverse", 
  # Analysis
  "DeclareDesign", "easystats", "zoo"
)

## Define a function to load (and if needed install) packages

ipak <- function(pkg){
    new.pkg <- pkg[!(pkg %in% installed.packages()[, "Package"])]
    if (length(new.pkg)) 
        install.packages(new.pkg, dependencies = TRUE)
    sapply(pkg, require, character.only = TRUE)
}

## Install (if needed) and load libraries in the_packages
ipak(the_packages)
   kableExtra            DT        texreg     htmltools     tidyverse 
         TRUE          TRUE          TRUE          TRUE          TRUE 
    lubridate       forcats         haven      labelled         ggmap 
         TRUE          TRUE          TRUE          TRUE          TRUE 
      ggrepel      ggridges      ggthemes        ggpubr     patchwork 
         TRUE          TRUE          TRUE          TRUE          TRUE 
       GGally        scales       dagitty         ggdag       ggforce 
         TRUE          TRUE          TRUE          TRUE          TRUE 
      COVID19          maps       mapdata           qss    tidycensus 
         TRUE          TRUE          TRUE          TRUE          TRUE 
    dataverse DeclareDesign     easystats           zoo 
         TRUE          TRUE          TRUE          TRUE 

Feedback

What did we like

What did we dislike

How do we want to be remembered

Probability Distributions

Probability

  • Probability describes the likelihood of an event happening.

  • Statistics uses probability to quantify uncertainty about estimates and hypotheses.

  • Three rules of probability (Kolmogorov axioms)

    • Positivity: \[Pr(A) \geq 0 \]
    • Certainty: \[Pr(\Omega) = 1 \]
    • Additivity: \[Pr(A \text{ or } B) = Pr(A) + Pr(B)\] if A and B are mutually exclusive

Probability

  • Two ways of interpreting probabilities (Frequentist and Bayesian)

  • Conditional Probability and Bayes Rule:

\[Pr(A|B) = \frac{Pr(B|A)Pr(A)}{Pr(B)} = \frac{Pr(B|A)Pr(A)}{Pr(B|A)Pr(A)+Pr(B|A^\complement)Pr(A^\complement)}\]
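As a quick numerical sketch of how the pieces fit together (the probabilities below are purely hypothetical), we can compute \(Pr(A|B)\) directly in R:

## Hypothetical inputs: Pr(A), Pr(B|A), Pr(B|A^c)
p_A      <- 0.10
p_B_A    <- 0.80
p_B_notA <- 0.20

## Denominator: Pr(B), by the law of total probability
p_B <- p_B_A * p_A + p_B_notA * (1 - p_A)

## Bayes Rule: Pr(A|B)
p_B_A * p_A / p_B   # 0.08 / 0.26, about 0.31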

Random Variables

  • Random variables assign numeric values to each event in an experiment.

    • The values are mutually exclusive and exhaustive: together they cover the entire sample space.
  • Discrete random variables take on a finite or countably infinite number of distinct values.

  • Continuous random variables can take on an uncountably infinite number of values.

Probability Distributions

Broadly, probability distributions provide mathematical descriptions of random variables in terms of the probabilities of events.

\[\text{distribution} = \text{list of possible} \textbf{ values} + \text{associated} \textbf{ probabilities}\]

Useful for:

  • Describing the data generating process

  • Quantifying uncertainty about our estimates

Probability Distributions

They can be represented in terms of:

  • Probability Mass/Density Functions

    • Discrete variables have probability mass functions (PMF)

    • Continuous variables have probability density functions (PDF)

  • Cumulative Distribution Functions (CDF)

    • Discrete: Summation of discrete probabilities

    • Continuous: Integration over a range of values

Common Probability Distributions


Common Discrete Distributions

  • Bernoulli: Coin flips with probability of heads, \(p\)

  • Uniform: Coin flip with more than two outcomes

  • Binomial: Adding up coin flips

  • Poisson: Counting the number of events that occur at some average rate

  • Geometric: Counting until a specific event occurs

Common Continuous Distributions

  • Exponential: The waiting time until a specific event occurs in continuous time

  • Normal: Describes outcomes that are sums of many random variables (with finite means and variances)

    • The limit of a Binomial distribution as \(n\to \infty\)
    • The maximum entropy distribution when we only know the mean and variance
  • t: A finite-sample approximation of the Normal

  • \(\chi^2\): Distribution of sums of squared variables from a Normal distribution

Bernoulli Distribution

A Bernoulli random variable describes a “coin flip” with parameter \(p\), the probability of success (e.g. “Heads”)

\[Pr(X=x)=f(x) = \left\{ \begin{array}{cc} p & \mathrm{if\ } x=1 \\ 1-p & \mathrm{if\ } x=0 \\ \end{array} \right.\]

\[ F(x) = \left\{ \begin{array}{cc} 0 & \mathrm{if\ } x<0 \\ 1-p & \mathrm{if\ } 0\leq x<1 \\ 1& \mathrm{if\ } x\geq1 \\ \end{array} \right. \]

\[ E[X] = p \]

\[ Var[X] = p(1-p) \]
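A minimal sketch in R (a Bernoulli draw is a Binomial with size = 1; the choice of \(p = 0.6\) is arbitrary):

## Simulate Bernoulli(p = 0.6) draws and compare to the theoretical moments
set.seed(123)
p <- 0.6
x <- rbinom(n = 10000, size = 1, prob = p)

dbinom(1, size = 1, prob = p)  # Pr(X = 1) = p
mean(x)                        # close to E[X] = p = 0.6
var(x)                         # close to Var[X] = p(1-p) = 0.24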

Binomial Distribution

A Binomial random variable is the sum of successes from a series of \(n\) independent Bernoulli trials, each with probability of success \(p\)

\[ Pr(X=x) = f(x)=\binom{n}{x}p^x (1-p)^{n-x} \ \text{for } x = 0,1,2,\dots, n \]

\[ E[X] = np \]

\[ Var[X] = np(1-p) \]

Tip

Binomial distributions are useful for modeling counts of binary (yes/no) outcomes, like the number of people who turn out to vote
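A minimal sketch in R (the choices of \(n = 10\) trials and \(p = 0.3\) are arbitrary):

## Binomial(n = 10, p = 0.3): PMF, mean, and variance
n <- 10
p <- 0.3

dbinom(3, size = n, prob = p)  # Pr(X = 3) = choose(10, 3) * 0.3^3 * 0.7^7
n * p                          # E[X] = np = 3
n * p * (1 - p)                # Var[X] = np(1-p) = 2.1

set.seed(123)
mean(rbinom(10000, size = n, prob = p))  # simulated mean, close to 3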

Poisson Distribution

A Poisson random variable describes the probability of observing a discrete number of events in a fixed period of time, given that they occur at a fixed average rate of \(\lambda\)

\[ Pr(X=x) = f(x)=\frac{\lambda^x}{x!}e^{-\lambda} \]

\[ E[X] = \lambda \]

\[ Var[X] = \lambda \]

Tip

Poisson distributions are useful for modeling counts, like the total number of acts of political participation
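A minimal sketch in R, using \(\lambda = 5\) (the same rate used in the CLT simulation later):

## Poisson(lambda = 5): PMF, mean, and variance
lambda <- 5

dpois(3, lambda = lambda)  # Pr(X = 3) = lambda^3 / 3! * exp(-lambda)

set.seed(123)
x <- rpois(10000, lambda = lambda)
mean(x)  # close to E[X] = lambda = 5
var(x)   # close to Var[X] = lambda = 5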

Normal Distribution

A Normal distribution is a continuous distribution defined by two parameters: a location parameter \(\mu\) that determines the center of the distribution and a scale parameter \(\sigma\) that determines the spread of the distribution

\[ f(x)=\frac{1}{\sqrt{2\pi\sigma^2}}\exp \left[ -\frac{1}{2\sigma^2}(x-\mu)^2 \right] \]

\[ E[X] = \mu \]

\[ Var[X] = \sigma^2 \]

Tip

As we will see shortly, distributions that involve summing random variables (say, the distribution of \(E[Y|X]\)) will tend toward Normal distributions
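A minimal sketch in R, using the standard Normal (\(\mu = 0\), \(\sigma = 1\)):

## Standard Normal: density, tail probability, and simulated moments
dnorm(0, mean = 0, sd = 1)  # density at the center, 1/sqrt(2*pi), about 0.40
pnorm(1.96)                 # Pr(X <= 1.96), about 0.975

set.seed(123)
x <- rnorm(10000, mean = 0, sd = 1)
mean(x)  # close to mu = 0
var(x)   # close to sigma^2 = 1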

The Law of Large Numbers

The Law of Large Numbers (Intuitive)

Suppose we wanted to know the average height of our class.

Pick 1 person at random and use this as our estimate of the average

It would be a pretty bad estimate (it would vary a lot from person to person), but it would be an unbiased estimate

How would we improve our estimate?

The Law of Large Numbers (Intuitive)

Suppose we increased our sample size from N=1 to N = 5.

Now our estimate reflects the average of 5 people’s heights as opposed to just 1. Both are unbiased estimates of the truth, but the N=5 sample has a lower variance.

Now suppose we took a sample of size N − 1, that is, we measured everyone except one person. Our estimate will be quite close to the truth, varying slightly based on the height of the person left out.

The Law of Large Numbers (Intuitive)

Finally, suppose we took a sample of size N = 32 (i.e. the entire class). Since our sample is the population, our estimate will be exactly equal to the population mean.

Each sample will give us the same “true” value. That is, it will not vary at all.

The idea that as the sample size increases, the distance of a sample mean from the population mean \(\mu\) goes to 0 is called the Law of Large Numbers

The (Weak) Law of Large Numbers (Formally)

Let \(X_1, X_2, \dots\) be independent and identically distributed (i.i.d.) random variables with mean \(\mu\) and variance \(\sigma^2\).

Then for every \(\epsilon>0\), as the sample size increases (1), the distance of a sample mean from the population mean \(\mu\) (2) goes to 0 (3).

\[\overbrace{Pr(\left|\frac{X_1+\dots+X_n}{n}-\mu\right| > \epsilon)}^{\text{2. The distance of the sample mean from the truth}} \overbrace{\to 0}^{\text{3. Goes to 0}} \underbrace{\text{ as }n \to \infty}_{\text{1. As the sample size increases}}\]

Equivalently:

\[\lim_{n \to \infty} Pr(|\bar{X}_n - \mu| < \epsilon) = 1\]

Simulating the LLN

The expected value of rolling a die is 3.5.

\[ E[X] = \sum_i x_i \Pr(X=x_i) = \frac{1}{6}(1+2+3+4+5+6) = 3.5\]

Let our sample size \(N\) be the number of times we roll the die.

If \(N=1\), we could get a 1, 2, 3, 4, 5, or 6, any of which could be pretty far from our expected value of 3.5

Simulating the LLN

If we rolled the die two times and took the average, we could still get an average of 1 or 6, but values closer to our expected value of 3.5 happen more often

# Calculate the average of every possible pair of rolls
table(rowMeans(expand.grid(1:6, 1:6)))

  1 1.5   2 2.5   3 3.5   4 4.5   5 5.5   6 
  1   2   3   4   5   6   5   4   3   2   1 

Simulating the LLN

As we increase our sample size (roll the die more times), the LLN says the chance that our sample average is far from the truth \((p(\left|\frac{X_1+\dots+X_n}{n}-\mu\right| > \epsilon))\), gets vanishingly small.

Let’s write some code to simulate this process

# Create a 6-sided die
die <- 1:6

# Create function to simulate rolling a die N times

roll_fn <- function(n) {
  rolls <- data.frame(rolls = sample(die, size = n, replace = TRUE))
  # summarize rolls 
  df <- rolls %>%
    summarise(
    # number of rolls
      n_rolls = n(),
    # number of times 1 was rolled
      ones = sum(rolls == 1),
    # number of times 2 was rolled, etc..
      twos = sum(rolls == 2),
      threes = sum(rolls == 3),
      fours = sum(rolls == 4),
      fives = sum(rolls == 5),
      sixes = sum(rolls == 6),
      # Average of all our rolls
      average =  mean(rolls),
      # Absolute difference between the average and the expected value (3.5)
      abs_error = abs(3.5-average)
    )
  # Return summary df
  df
}


# Holder for simulation results

sim_df <- NULL

# Set seed
set.seed(123)

for(i in 1:1000){
  sim_df <- rbind(sim_df,
                  roll_fn(i)
  )
}

fig_lln <- sim_df %>% 
  pivot_longer(
    cols = c("average", "abs_error"),
    names_to = "Measure",
    values_to = "Estimate"
  ) %>% 
  mutate(
    Measure = ifelse(Measure == "average","Average","Absolute Error") %>% 
      factor(., levels = c("Average","Absolute Error"))
  ) %>% 
ggplot(aes(n_rolls, Estimate))+
  geom_line()+
  geom_smooth()+
  facet_wrap(~Measure,scales = "free_y")+
  theme_minimal()

Proving the Weak LLN

A proof of the LLN is as follows:

First define \(U\) as the sample mean of a sample of size \(n\)

\[U=\frac{X_1+\dots +X_n}{n}\]

Proving the Weak LLN

Then show that the sample mean, \(U\), is an unbiased estimator of the population mean \(\mu\)

\[\begin{align*} E[U]&=E[\frac{X_1+\dots +X_n}{n}]=\frac{1}{n}E[X_1+\dots +X_n]\\ &=\frac{n\mu}{n}=\mu \end{align*}\]

With a variance

\[\begin{align*} Var[U]&=Var\left[\frac{X_1+\dots +X_n}{n}\right]\\ &=Var\left[\frac{X_1}{n}\right]+\dots +Var\left[\frac{X_n}{n}\right]\\ &=\frac{\sigma^2}{n^2}+\dots +\frac{\sigma^2}{n^2}\\ &=\frac{n \sigma^2}{n^2}\\ &=\frac{\sigma^2}{n} \end{align*}\]

which decreases with \(n\) (the second line uses the independence of the \(X_i\)).

Proving the Weak LLN

By Chebyshev’s inequality, which bounds the probability that a random variable falls more than some distance from its mean:

\[Pr(\left|U-\mu\right| > \epsilon) \leq \frac{\sigma^2}{n\epsilon^2}\]

Which \(\to 0\) as \(n \to \infty\)
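To make the bound concrete, here is a small sketch applying it to the die-rolling example (the choices of \(\epsilon\) and \(n\) below are arbitrary):

## Chebyshev bound on Pr(|sample mean - 3.5| > epsilon) for a fair die
sigma2  <- mean((1:6 - 3.5)^2)  # variance of a single roll, 35/12 (about 2.92)
epsilon <- 0.5

chebyshev_bound <- function(n) sigma2 / (n * epsilon^2)
chebyshev_bound(c(10, 100, 1000))  # shrinks toward 0 as n grows (for small n it can exceed 1 and is uninformative)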

The Strong Law of Large Numbers

As you may have inferred, there is a weak law of large numbers and a strong law of large numbers.

The weak law of large numbers states that as the sample size increases, the sample mean converges in probability to the population value \(\mu\)

\[\lim_{n \to \infty} Pr(|\bar{X}_n - \mu| < \epsilon) = 1\]

The strong law of large numbers states that as the sample size increases, the sample mean converges almost surely to the population value \(\mu\)

\[Pr\left(\lim_{n \to \infty} \bar{X}_n = \mu\right) = 1\]

The differences between these types of convergence won’t matter much for us in this course

The Central Limit Theorem

The Central Limit Theorem

So the LLN tells us that as our sample size grows, an unbiased estimator like the sample average will get increasingly close to the “true” value of the population mean.

If we took a bunch of samples of the same size and calculated the mean of each sample:

  • the distribution of those sample means (the sampling distribution) would be centered around the truth (because the estimator is unbiased).

  • the width of the distribution (its variance) would decrease as the sample size increased

  • The Central Limit Theorem tells us about the shape of that sampling distribution.

Z-scores and Standardization

Let \(X\) be a random variable with mean \(\mu\) and standard deviation \(\sigma\).

Define a new R.V. \(Z\) as the standardization of \(X\):

\[Z=\frac{X-\mu}{\sigma}\]

Where Z has \(\mu=0\) and \(\sigma=1\).
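In R, standardizing is just subtracting the mean and dividing by the standard deviation; scale() does the same thing with the sample mean and sd (the simulated values of \(\mu = 10\) and \(\sigma = 3\) below are arbitrary):

## Standardize a variable by hand and with scale()
set.seed(123)
x <- rnorm(1000, mean = 10, sd = 3)

z <- (x - mean(x)) / sd(x)  # by hand, using the sample mean and sd
mean(z)  # approximately 0
sd(z)    # exactly 1

all.equal(z, as.numeric(scale(x)))  # scale() gives the same result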

Notation for the CLT

Let \(X_1,X_2,\dots,X_n\) be independent and identically distributed RVs with mean \(\mu\) and standard deviation \(\sigma\).

Define \(S_n\) and \(\bar{X}_n\) as follows:

\[S_n= X_1+X_2+\dots+X_n= \sum_{i=1}^n X_i\]

\[\bar{X}_n=\frac{X_1+X_2+\dots+X_n}{n}= \frac{S_n}{n}\]

Additional facts for the CLT

We can show that:

\[\begin{alignat*}{3} E[S_n]&=n\mu \hspace{2em}Var[S_n]&=n\sigma^2 \hspace{2em} \sigma_S&=\sqrt{n}\sigma\\ E[\bar{X}_n]&=\mu \hspace{2em}Var[\bar{X}_n]&=\frac{\sigma^2}{n} \hspace{2em}\sigma_{\bar{X}}&=\frac{\sigma}{\sqrt{n}}\\ \end{alignat*}\]

Basically: the expected value and variance of the sum are just \(n\) times the population parameters (the true values for the distribution).

Since the mean is just the sum divided by the sample size, the expected value of the mean is equal to the population value and the variance and standard deviations of the mean are decreasing in \(n\).

Finally, we can define \(Z_n\) in terms of either \(S_n\) or \(\bar{X}_n\)

\[Z_n=\frac{S_n-n\mu}{\sqrt{n}\sigma}=\frac{\bar{X}_n-\mu}{\sigma/\sqrt{n}}\]

Central Limit Theorem

For a sufficiently large \(n\)

\[\begin{align*} \bar{X}_n&\approx N(\mu,\sigma^2/n) \\ S_n &\approx N(n\mu,n\sigma^2) \\ Z_n&\approx N(0,1) \end{align*}\]

  • The distribution of sample means \((\bar{X}_n)\) from almost any distribution \(X\) is approximately Normal (it converges in distribution), with a variance \((\sigma^2/n)\) that shrinks as \(n\) grows

  • Proof: There are several ways, but they require a little more math than this course assumes

CLT: Why it matters

  • Why is this result so important?

  • Lots of our questions come in the form: how does a typical value of Y vary with X?

  • We may not know the true underlying distribution of Y

  • But the CLT says we can often approximate the distribution of a typical value of Y conditional on X \((E[Y|X])\) using a Normal (or related) distribution.

  • Knowing these distributions, in turn, allows us to conduct statistical inference

Simulating the CLT

The following code simulates the process of:

  • taking repeated (\(N_{sim} = 2000\)) samples of varying sizes (\(N_{samp}= 10, 100, 1000\))
  • from two decidedly non-Normal populations (Poisson(\(\lambda = 5\)) and a weird mixture of distributions)
  • Calculating the means from each sample
  • Plotting the sampling distributions of sample means
  • Approximating the distribution of sample means with Normal distributions

Tip

Even if a random variable’s distribution is not at all Normal, the distribution of its sample means can often be reasonably approximated by a Normal distribution

# Define Population

N <- 10000
set.seed(123)
pop_df <- tibble(
  Poisson = rpois(N, 5),
  # Binomial = rbinom(size=20, n=N, prob = .25),
  type = sample(0:2,N,replace =T,prob=c(.4,.2,.4)),
  Weird = case_when(
    type == 0 ~ rbeta(N,5,2)*2,
    type == 1 ~ (rexp(N,4)-6.5)*-1,
    type == 2 ~ rnorm(N,8,2)
    
  )
  ) %>% select(Poisson, Weird)

fig_pop_dist <- pop_df %>% 
  pivot_longer(
    col = everything(),
    names_to = "Distribution"
  ) %>% 
  ggplot(aes(value,fill=Distribution,group=Distribution))+
  geom_histogram()+
  xlim(0,16)+
  facet_grid(~Distribution,scales = "free_x")+
  stat_summary(aes(x=0, y=value),fun.data =\(x) data.frame(xintercept = mean(x)), geom="vline")

sample_sizes <- c(10,100,1000)

calculate_sample_mean <- function(n,pop){
  df <- tibble(
    size = n,
    `Sample Mean` = mean(sample(pop,n,replace = F))
  )
  return(df)
}

simulate_clt_fn <- function(nsims = 100, the_pop,the_n, ...){
  sim <- 1:nsims %>% purrr::map_df(\(x)calculate_sample_mean(pop=the_pop, n=the_n))
  return(sim)
}



# binomial_clt <- sample_sizes %>% 
#   purrr::map_df( \(x)  simulate_clt_fn(nsims= 2000,the_pop = pop_df$Binomial, the_n = x)) %>% 
#   mutate(
#     id = 1:n(),
#     Distribution = "Binomial"
#   )

poisson_clt <- sample_sizes %>% 
  purrr::map_df( \(x)  simulate_clt_fn(nsims= 2000,the_pop = pop_df$Poisson, the_n = x)) %>% 
  mutate(
    id = 1:n(),
    Distribution = "Poisson"
  )

weird_clt <- sample_sizes %>% 
  purrr::map_df( \(x)  simulate_clt_fn(nsims= 2000,the_pop = pop_df$Weird, the_n = x)) %>% 
  mutate(
    id = 1:n(),
    Distribution = "Weird"
  )

sample_df <- poisson_clt %>% bind_rows(weird_clt) %>% 
  mutate(
    `Sample Size` = factor(size)
  )


fig_samp_dist <- sample_df %>% 
  ggplot(aes(`Sample Mean`,col=`Sample Size`))+
  geom_density()+
  geom_rug()+
  # theme( strip.background.y = element_blank(),
  #     strip.text.y = element_blank())+
  xlim(0,16)+
  facet_grid(`Sample Size`~Distribution,scales = "free_y")

  
fig_clt <- ggarrange(fig_pop_dist,fig_samp_dist,ncol=1)


p10_weird <- sample_df %>% 
  filter(Distribution == "Weird") %>% 
  filter(size == 10) %>% 
  ggplot(aes(`Sample Mean`))+
  geom_density(aes(col="Sample Size =10"))+
  geom_rug(aes(col="Sample Size =10"))+
  stat_function(
    fun=dnorm, args = list(mean=mean(pop_df$Weird),  sd=sd(pop_df$Weird)/sqrt(10)),
    col="black",linetype = "dashed"
    )+
  xlim(0,10)+
  theme_minimal()+
  guides(col="none")+
  labs(
    title = "Normal Approximation to Sampling Distribution",
    subtitle = "Weird Distribution, N = 10"
  )

p1000_weird <- sample_df %>% 
  filter(Distribution == "Weird") %>% 
  filter(size == 1000) %>% 
  ggplot(aes(`Sample Mean`))+
  geom_density(aes(col="Sample Size =1000"))+
  geom_rug(aes(col="Sample Size =1000"))+
  stat_function(
    fun=dnorm, args = list(mean=mean(pop_df$Weird),  sd=sd(pop_df$Weird)/sqrt(1000)),
    col="black",linetype = "dashed"
    )+
  xlim(4,6)+
  theme_minimal()+
  guides(col="none")+
  labs(
    title = "Normal Approximation to Sampling Distribution",
    subtitle = "Weird Distribution, N = 1000"
  )

p10_poisson <- sample_df %>% 
  filter(Distribution == "Poisson") %>% 
  filter(size == 10) %>% 
  ggplot(aes(`Sample Mean`))+
  geom_density(aes(col="Sample Size =10"))+
  geom_rug(aes(col="Sample Size =10"))+
  stat_function(
    fun=dnorm, args = list(mean=5,  sd=sd(pop_df$Poisson)/sqrt(10)),
    col="black",linetype = "dashed"
    )+
  xlim(0,10)+
  theme_minimal()+
  guides(col="none")+
  labs(
    title = "Normal Approximation to Sampling Distribution",
    subtitle = "Poisson(Lambda = 5), N = 10"
  )

p1000_poisson <- sample_df %>% 
  filter(Distribution == "Poisson") %>% 
  filter(size == 1000) %>% 
  ggplot(aes(`Sample Mean`))+
  geom_density(aes(col="Sample Size =1000"))+
  geom_rug(aes(col="Sample Size =1000"))+
  stat_function(
    fun=dnorm, args = list(mean=5,  sd=sd(pop_df$Poisson)/sqrt(1000)),
    col="black",linetype = "dashed"
    )+
  xlim(4,6)+
  theme_minimal()+
  guides(col="none")+
  labs(
    title = "Normal Approximation to Sampling Distribution",
    subtitle = "Poisson(Lambda = 5), N = 1000"
  )

fig_clt_approx <- ggarrange(p10_weird, p1000_weird, p10_poisson,p1000_poisson)

Summary

  • So we see that our sampling distributions are centered on the truth, and as the sample size increases, the width of the distribution decreases (Law of Large Numbers)

  • The shapes of distributions of sample means can be approximated by a Normal Distribution \(\bar{X} \sim N(\mu, \sigma^2/n)\)

Lab 8 and Standard Errors

Lab 8

  • Lab 8 got into the weeds on standard errors, asking you to use lm_robust() to calculate robust clustered standard errors

  • A standard error is simply the standard deviation of a theoretical sampling distribution

  • A sampling distribution describes the range of estimates we could have seen

  • Standard errors are key to quantifying uncertainty and making claims about statistical significance

Errors and Residuals

Errors (\(\epsilon\)) represent the difference between the outcome and the true mean:

\[ \begin{aligned} y &= X\beta + \epsilon\\ \epsilon &= y -X\beta \end{aligned} \]

Residuals (\(\hat{\epsilon}\)) represent the difference between the outcome and our estimate:

\[ \begin{aligned} y &= X\hat{\beta} + \hat{\epsilon}\\ \hat{\epsilon} &= y -X\hat{\beta} \end{aligned} \]

Variance of Regression Coefficients depends on the errors

\[ \begin{aligned} \hat{\beta} &= (X'X)^{-1}X'y \\ &= (X'X)^{-1}X'(X\beta + \epsilon) \\ &= \beta + (X'X)^{-1}X'\epsilon \\ \end{aligned} \]

Variance of Regression Coefficients depends on the errors

Recall that

\[ \begin{aligned} E[\text{c}] &= \text{c} \\ Var[\text{c}] &= 0\\ Var[X] &= E[X^2] - E[X]^2 \end{aligned} \]

\[ \begin{aligned} Var[\hat{\beta}] &= Var[\beta] + Var[(X'X)^{-1}X'\epsilon] \\ &= 0 + E[(X'X)^{-1}X'\epsilon \epsilon'X(X'X)^{-1}] - E[(X'X)^{-1}X'\epsilon]\,E[(X'X)^{-1}X'\epsilon]' \\ &= E[(X'X)^{-1}X'\epsilon \epsilon'X(X'X)^{-1}] - 0 \\ & = (X'X)^{-1}X'E[\epsilon \epsilon']X(X'X)^{-1} \\ & = (X'X)^{-1}X'\Sigma X(X'X)^{-1} \\ \end{aligned} \]

Constant Error Variance

Some motivations for OLS regression assume that the errors are independent and identically distributed

\[ \begin{aligned} Var(\epsilon|X) = E[\epsilon\epsilon'] = \Sigma &= \begin{bmatrix} \sigma^2 & 0 & 0 & \cdots & 0 \\ 0 &\sigma^2 & 0 & \cdots &0 \\ 0 & 0 &\sigma^2 & \cdots &0 \\ \vdots & \vdots & \vdots &\ddots & \vdots\\ 0 & 0 & 0 & \cdots & \sigma^2 \\ \end{bmatrix} = \sigma^2 \begin{bmatrix} 1 & 0 & 0 & \cdots & 0 \\ 0 &1 & 0 & \cdots &0 \\ 0 & 0 &1 & \cdots &0 \\ \vdots & \vdots & \vdots &\ddots & \vdots\\ 0 & 0 & 0 & \cdots & 1 \\ \end{bmatrix} = \sigma^2\text{I} \end{aligned} \]

Constant Error Variance

In which case, \(Var[\hat{\beta}]\) reduces to:

\[ Var[\hat{\beta}]= (X'X)^{-1}X'\Sigma X(X'X)^{-1} = \sigma^2(X'X)^{-1} \]

And we can estimate \(\sigma^2\) with the residuals from the model:

\[ \hat{\sigma}^2 = \frac{\hat{\epsilon}'\hat{\epsilon}}{n-k} \]
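As a sketch of how this maps to code, we can compute \(\hat{\sigma}^2\) and the classical standard errors by hand from a simulated regression and compare them to lm() (the simulated data below are purely illustrative):

## Classical (constant-variance) standard errors by hand
set.seed(123)
n <- 500
x <- rnorm(n)
y <- 1 + 2 * x + rnorm(n)  # homoskedastic errors

X    <- cbind(1, x)        # design matrix with an intercept
fit  <- lm(y ~ x)
ehat <- resid(fit)
k    <- ncol(X)

sigma2_hat <- sum(ehat^2) / (n - k)           # sigma-hat^2 = e'e / (n - k)
vcov_hat   <- sigma2_hat * solve(t(X) %*% X)  # sigma-hat^2 (X'X)^{-1}

sqrt(diag(vcov_hat))               # by-hand standard errors
summary(fit)$coef[, "Std. Error"]  # matches lm()'s classical standard errors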

Non-Constant Error Variance

Constant error variance (or homoskedasticity) is often an unrealistic assumption

If there is non-constant error variance (heteroskedasticity) then:

\[ \begin{aligned} Var(\epsilon|X) = E[\epsilon\epsilon'] = \Sigma &= \begin{bmatrix} \sigma_1^2 & 0 & 0 & \cdots & 0 \\ 0 &\sigma_2^2 & 0 & \cdots &0 \\ 0 & 0 &\sigma_3^2 & \cdots &0 \\ \vdots & \vdots & \vdots &\ddots & \vdots\\ 0 & 0 & 0 & \cdots & \sigma_n^2 \\ \end{bmatrix} \end{aligned} \]

Consequences of Non-Constant Error Variance

  • \(\hat{\sigma}^2(X'X)^{-1}\) is no longer an unbiased estimator for \(Var[\hat{\beta}]\)

  • Our statistical tests using \(\hat{\sigma}^2(X'X)^{-1}\) to calculate standard errors will not live up to their promised error rates and coverage probabilities (more to come)

  • \(\hat{\beta}\), however, is still an unbiased estimate of \(\beta\)


Robust Standard Errors

Robust standard errors estimate each \(\sigma_i^2\) using the squared residuals from the model, \(\hat{\epsilon}_i^2\), along with additional adjustments, yielding standard errors that are consistent even when there is heteroskedasticity.

\[ Var[\hat{\beta}]= (X'X)^{-1}X' \begin{bmatrix} \hat{\epsilon}_1^2 & 0 & 0 & \cdots & 0 \\ 0 &\hat{\epsilon}_2^2 & 0 & \cdots &0 \\ 0 & 0 &\hat{\epsilon}_3^2 & \cdots &0 \\ \vdots & \vdots & \vdots &\ddots & \vdots\\ 0 & 0 & 0 & \cdots & \hat{\epsilon}_n^2 \\ \end{bmatrix} X(X'X)^{-1} \]

Clustered standard errors go a step further, summing up the residuals within clusters (groups) in the data.
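A minimal sketch of the comparison using lm_robust() from the estimatr package (attached via DeclareDesign above); the heteroskedastic, clustered data below are purely illustrative:

## Compare classical, robust, and clustered standard errors
library(estimatr)

set.seed(123)
n       <- 1000
cluster <- rep(1:50, each = 20)  # 50 clusters of 20 observations
x       <- rnorm(n)
## errors whose variance grows with |x|, plus a cluster-level shock
y       <- 1 + 2 * x + rnorm(50)[cluster] + rnorm(n, sd = 1 + abs(x))
df      <- data.frame(y, x, cluster)

m_ols    <- lm(y ~ x, data = df)
m_robust <- lm_robust(y ~ x, data = df)                      # heteroskedasticity-robust SEs
m_clust  <- lm_robust(y ~ x, data = df, clusters = cluster)  # cluster-robust SEs

## Coefficients are essentially identical; the standard errors differ
summary(m_ols)$coef[, "Std. Error"]
m_robust$std.error
m_clust$std.error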
