Probability is defined by three rules or assumptions called the Kolmogorov Axioms
Positivity: The probability of any event \(A\) is nonnegative
\[Pr(A) \geq 0 \]
Certainty: The probability that one of the outcomes in the sample space occurs is 1
\[Pr(\Omega) = 1 \]
Additivity: If events \(A\) and \(B\) are mutually exclusive, then:
\[Pr(A \text{ or } B) = Pr(A) + Pr(B)\]
The Addition Rule
For two events \(A\) and \(B\), the addition rule says we can find the probability of either \(A\) or \(B\) occurring:
\[Pr(A \cup B) = Pr(A \text{ or } B) = Pr(A) + Pr(B) - \underbrace{Pr(A \text{ and } B)}_{\text{aka } Pr(A \cap B)}\]
In words: the probability of either A or B occurring is the probability that A occurs, plus the probability that B occurs, minus the probability that both occur (so that we’re not double counting…)
For any two events, \(A\) and \(B\), the probability of \(A\), \(Pr(A)\), can be decomposed into the sum of the probabilities of two mutually exclusive events:
\[Pr(A) = Pr(A \text{ and } B) + Pr(A \text{ and } B^{\complement})\]
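A minimal R sketch of this decomposition, using a single fair die with \(A\) = "the roll is even" and \(B\) = "the roll is greater than 3" (events chosen purely for illustration):

```r
# Law of total probability with one fair die
p <- rep(1/6, 6)                               # probability of each face
A <- c(FALSE, TRUE, FALSE, TRUE, FALSE, TRUE)  # A: roll is even (2, 4, 6)
B <- c(FALSE, FALSE, FALSE, TRUE, TRUE, TRUE)  # B: roll is greater than 3 (4, 5, 6)

sum(p[A])                       # Pr(A) = 1/2
sum(p[A & B]) + sum(p[A & !B])  # Pr(A and B) + Pr(A and B-complement) = 1/2
```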
There are two broad ways of interpreting what probabilities mean:
Frequentist
Bayesian
Frequentist interpretations of probability
Probability describes how likely it is that some event happens.
Flip a fair coin: the probability of heads is Pr(Heads) = 0.5
Frequentists view this probability as the limit of the relative frequency of an event over repeated trials:
\[Pr(E) = \lim_{n \to \infty} \frac{n_{E}}{n} \approx \frac{ \text{# of Times E happened}}{\text{Total # of Trials}}\]
Thinking about probability as a relative frequency requires us to know how to count the number of times an event occurred.
Frequentist interpretations of probability
Probabilities from a Frequentist perspective are defined by fixed and unknown parameters
The goal of statistics for a frequentist is to learn about these parameters from data.
Frequentist statistics often ask questions like “What is the probability of observing some data \(Y\), given a hypothesis about the true value of the parameter(s), \(\theta\), that generated it?”
Frequentist interpretations of probability
For example, suppose we wanted to test whether a coin is “fair” \((p = Pr(Heads) = .5; q = Pr(Tails) = 1-p = .5).\) We could:
Flip a fair coin 10 times. Our estimate of \(Pr(H)\) is the number of heads divided by 10. It could be 0.5, but also 0 or 1, or some number in between.
Flip a coin 100 times and our estimate will be closer to the true parameter.
Flip a coin an infinite number of times and the relative frequency will converge to the true parameter \((Pr(H) = \lim_{n \to \infty} \frac{n_{H}}{n} = p = 0.5 \text{ for a fair coin})\)
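A minimal simulation sketch of this convergence (the seed and number of flips are arbitrary choices):

```r
# Relative frequency of heads converging toward p = 0.5
set.seed(42)
flips <- sample(c("H", "T"), size = 10000, replace = TRUE)
running_freq <- cumsum(flips == "H") / seq_along(flips)
running_freq[c(10, 100, 10000)]  # estimates after 10, 100, and 10,000 flips
```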
Bayesian interpretations of probability
Frequentist interpretations make sense for describing processes that we could easily repeat (e.g. Coin flips, Surveys, Experiments)
But they feel more convoluted when trying to describe events like “the probability that Biden wins reelection.”
Bayesian interpretations of probability view probabilities as subjective beliefs.
The task for Bayesian statistics is to update these prior beliefs, based on a model of the likelihood of observing some data, to form new beliefs after observing the data (called posterior beliefs).
Bayesian Updating
Bayesians update their beliefs according to Bayes Rule, which says:
\[\text{posterior} \propto \text{likelihood} \times \text{prior}\] More formally:
\[Pr(\theta|Y) = \frac{Pr(Y|\theta)Pr(\theta)}{Pr(Y)}\]
Our two main tools for doing statistical inference in this course
Hypothesis Testing
Interval Estimation
Follow largely from frequentist interpretations of probability
Bayesian vs Frequentists
The differences between Bayesian and Frequentist frameworks are both philosophical and technical in nature
Is probability a relative frequency or a subjective belief? How do we form and use prior beliefs?
Bayesian statistics relies heavily on algorithms for Markov chain Monte Carlo (MCMC) simulation, made possible by advances in computing.
For most of the questions in this course, these two frameworks will yield similar (even identical) conclusions.
Sometimes it’s helpful to think like a Bayesian; at other times, like a frequentist
Summary: Probability
Probability is a measure of uncertainty telling us how likely an event (or set of events) is to occur
Probabilities are:
Non-negative
Unitary
Additive
Two different interpretations of probability:
Frequentists: Probability is a long run relative frequency
Bayesians: Probabilities reflect subjective beliefs, which we update upon observing data
Conditional Probability
Conditional Probability: Definition
The conditional probability that event A occurred, given that event B occurred is written as \(Pr(A|B)\) (“The probability of A given B”) and defined as:
\[Pr(A|B) = \frac{Pr(A \cap B)}{Pr(B)} = \frac{\text{Probability of Both A and B}}{\text{Probability of B}}\]
\(Pr(A \cap B)\), which is the same as \(Pr(A \text{ and } B)\), is the joint probability of both events occurring
\(Pr(B)\) is the marginal probability of B occurring
Conditional Probability: Multiplication Rule
Joint probabilities are symmetrical. \(Pr(A \cap B) = Pr(B \cap A)\).
By rearranging terms:
\[Pr(A|B) = \frac{Pr(A \cap B)}{Pr(B)}\] We get the multiplication rule:
\[Pr(A \cap B) = Pr(A|B)Pr(B) = Pr(B|A)Pr(A)\]
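A small simulation sketch that checks the multiplication rule with two dice (the particular events are chosen just for illustration):

```r
# Check Pr(A and B) = Pr(A|B) * Pr(B) by simulating two dice
set.seed(1)
n  <- 1e5
d1 <- sample(1:6, n, replace = TRUE)
d2 <- sample(1:6, n, replace = TRUE)
A  <- d1 == 6        # event A: the first die shows a 6
B  <- d1 + d2 >= 10  # event B: the sum is at least 10

mean(A & B)           # Pr(A and B)
mean(A[B]) * mean(B)  # Pr(A|B) * Pr(B): approximately equal
```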
The Law of Total Probability (Part 2)
We can use the multiplication rule to derive an alternative form of the law of total probability:
\[Pr(A) = Pr(A|B)Pr(B) + Pr(A|B^{\complement})Pr(B^{\complement})\]
Conceptually, if \(A\) and \(B\) are independent, knowing whether \(B\) occurred tells us nothing about \(A\), and so the conditional probability of \(A\) given \(B\), \(Pr(A|B)\), is equal to the unconditional, or marginal, probability \(Pr(A)\)
Independence
Formally, two events are statistically independent if and only if the joint probability is equal to the product of the marginal probabilities
\[Pr(A\text{ and }B) = Pr(A)Pr(B)\]
Conditional Independence
We can extend the concept of independence to situations with more than two events:
If events \(A\), \(B\), and \(C\) are jointly independent, then:
\[Pr(A \cap B \cap C) = Pr(A)Pr(B)Pr(C)\]
Joint independence implies pairwise independence and conditional independence:
\[Pr(A \cap B | C) = Pr(A|C)Pr(B|C)\] But not the reverse.
Bayes Rule
Bayes rule is a theorem for how we should update our beliefs about \(A\) given that \(B\) occurred:
\[Pr(A|B) = \frac{Pr(B|A)Pr(A)}{Pr(B)} = \frac{Pr(B|A)Pr(A)}{Pr(B|A)Pr(A)+Pr(B|A^\complement)Pr(A^\complement)}\] Where
\(Pr(A)\) is called the prior probability of A (our initial belief)
\(Pr(A|B)\) is called the posterior probability of A given B (our updated belief after observing B)
What’s the probability you have Covid-19 given a positive test?
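The slide’s numbers aren’t shown here, so this sketch uses hypothetical values for the prior, sensitivity, and specificity purely for illustration:

```r
# Hypothetical inputs (not real test characteristics)
prior <- 0.05  # Pr(Covid): assumed baseline prevalence
sens  <- 0.90  # Pr(+ | Covid): assumed test sensitivity
spec  <- 0.95  # Pr(- | no Covid): assumed test specificity

# Bayes rule: Pr(Covid | positive test)
(sens * prior) / (sens * prior + (1 - spec) * (1 - prior))
```

With these made-up inputs, the posterior is roughly 0.49, up from the 0.05 prior.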
Now we’re much more confident that we have Covid-19
Random Variables and Probability Distributions
Random Variables
Random variables assign numeric values to each event in an experiment.
These values are mutually exclusive and exhaustive: together they cover the entire sample space.
Discrete random variables take on finite, or countably infinite distinct values.
Continuous variables can take on an uncountably infinite number of values.
Example: Toss Two Coins
\(S=\{TT,TH,HT,HH\}\)
Let \(X\) be the number of heads
\(X(TT)=0\)
\(X(TH)=1\)
\(X(HT)=1\)
\(X(HH)=2\)
Probability Distributions
Broadly, probability distributions provide mathematical descriptions of random variables in terms of the probabilities of events.
They can be represented in terms of:
Probability Mass/Density Functions
Discrete variables have probability mass functions (PMF)
Continuous variables have probability density functions (PDF)
Cumulative Distribution Functions
Discrete: Summation of discrete probabilities
Continuous: Integration over a range of values
Discrete distributions
Probability Mass Function (pmf): \(f(x)=p(X=x)\)
Assigns probabilities to each unique event such that Kolmogorov Axioms (Positivity, Certainty, and Additivity) still apply
Cumulative Distribution Function (cdf): \(F(x_j)=p(X\leq x_j)=\sum_{i=1}^{j}p(x_i)\)
Sum of the probability mass for events less than or equal to \(x_j\)
Example: Toss Two coins
\(S=\{TT,TH,HT,HH\}\)
Let \(X\) be the number of heads
\(X(TT)=0\)
\(X(TH)=1\)
\(X(HT)=1\)
\(X(HH)=2\)
\(f(X=0)=p(X=0)=1/4\)
\(f(X=1)=p(X=1)=1/2\)
\(F(X\leq 1) = p(X \leq 1)= 3/4\)
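Since the number of heads in two fair tosses follows a Binomial(2, 0.5) distribution, we can verify these values in R:

```r
# pmf and cdf for the number of heads in two fair coin tosses
dbinom(0:2, size = 2, prob = 0.5)  # pmf: 0.25 0.50 0.25
pbinom(1, size = 2, prob = 0.5)    # cdf at 1: Pr(X <= 1) = 0.75
```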
Rolling a die
Each side has an equal probability of occurring (1/6). The probability that you roll a 2 or less is \(P(X \leq 2) = 1/6 + 1/6 = 1/3\)
Continuous distributions
Probability Density Functions (PDF): \(f(x)\)
Assigns probabilities to events in the sample space such that Kolmogorov Axioms still apply
But… since there are an infinite number of values a continuous variable could take, \(p(X=x)=0\); that is, the probability that \(X\) takes any one specific value is 0.
Cumulative Distribution Function (CDF): \(F(x)=p(X\leq x)=\int_{-\infty}^{x}f(t)\,dt\)
Instead of summing up to a specific value (discrete), we integrate over all possible values up to \(x\)
The probability of having a value less than or equal to \(x\)
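For example, with a standard normal distribution, the CDF can be computed either by integrating the pdf or with R’s built-in function:

```r
# CDF of a standard normal at x = 1.96, two ways
integrate(dnorm, lower = -Inf, upper = 1.96)$value  # ~0.975, by integration
pnorm(1.96)                                         # built-in CDF, same value
```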
Integrals
What’s the area of the rectangle?
\(base\times height\)
Integrals
How would we find the area under a curve?
Integrals
Well, suppose we added up the areas of a bunch of rectangles whose heights roughly approximated the height of the curve?
Can we do any better?
Integrals
Let’s make the rectangles smaller
What happens as the width of the rectangles gets even smaller, approaching 0? Our approximation gets even better
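A minimal sketch of this idea, approximating \(\int_0^1 x^2\,dx = 1/3\) with midpoint rectangles (the function and interval are arbitrary choices):

```r
# Midpoint Riemann sum: area under f between a and b using n rectangles
riemann <- function(f, a, b, n) {
  width <- (b - a) / n
  mids  <- seq(a + width / 2, b - width / 2, length.out = n)
  sum(f(mids)) * width
}

riemann(function(x) x^2, 0, 1, 10)    # rough approximation
riemann(function(x) x^2, 0, 1, 1000)  # much closer to 1/3 as the width shrinks
```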
\(E[XY]=E[X]E[Y]\) if \(X\) and \(Y\) are independent
How many times would you have to roll a fair die to get all six sides?
We can think of this as the sum of the expected values for a series of geometric distributions with varying probabilities of success, \(p\). The expected value of a geometric random variable is:
\[E[X] = \frac{1}{p}\]
For this question, we need to calculate the probability of success, \(p\), of getting each new side we need.
The probability of getting a side you need on your first roll is 1. The probability of getting a new side on your second roll is 5/6, so its expected number of rolls is 6/5. Continuing this logic, the expected number of rolls to get all six is:
```r
ev <- c()
for (i in 6:1) {
  ev[i] <- 6 / i
}
# Expected rolls for each 1st through 6th new side
rev(ev)
```

```
[1] 1.0 1.2 1.5 2.0 3.0 6.0
```

```r
# Total
sum(ev)
```

```
[1] 14.7
```
Variance
If \(X\) has a finite mean \(E[X]=\mu\), then \(E[(X-\mu)^2]\) is finite and called the variance of \(X\) which we write as \(\sigma^2\) or \(Var[X]\).
“The variance of X is equal to the expected value of X-squared, minus the square of X’s expected value.”
\(\sigma^2=E[X^2]-E[X]^2\) is a useful identity in proofs and derivations
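We can check this identity numerically for a single fair die roll:

```r
# Verify Var[X] = E[X^2] - (E[X])^2 for one fair die roll
x <- 1:6
p <- rep(1/6, 6)
EX  <- sum(x * p)    # E[X]   = 3.5
EX2 <- sum(x^2 * p)  # E[X^2] = 91/6
EX2 - EX^2           # variance = 35/12, about 2.92
```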
Standard Deviations
A standard deviation is just the square root of the variance
\[\sigma=\sqrt{Var[X]}\]
Standard deviations are useful for describing:
A typical deviation from the mean/Expected value
The width or spread of a distribution
Covariance and correlation
Covariance measures the degree to which two random variables vary together.
\(Cov[X,Y] > 0\): \(X\) tends to be larger than its mean when \(Y\) is larger than its mean
\[Cov[X,Y]=E[(X-E[X])(Y-E[Y])]=E[XY]-E[X]E[Y]\]
The correlation between \(X\) and \(Y\) is simply the covariance of \(X\) and \(Y\) divided by the product of their standard deviations.
\[\rho=\frac{Cov[X,Y]}{\sigma_X\sigma_Y}\]
This normalizes covariance to a scale that runs between \(-1\) and \(1\)
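A quick sketch in R, with simulated data chosen purely for illustration:

```r
# Covariance vs. correlation for two positively related variables
set.seed(7)
x <- rnorm(1000)
y <- 2 * x + rnorm(1000)  # y co-varies positively with x

cov(x, y)                    # on an unbounded scale
cov(x, y) / (sd(x) * sd(y))  # correlation computed by hand
cor(x, y)                    # built-in; same value, bounded in [-1, 1]
```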
Properties of Variance and Covariance
\(Cov[X,Y]=E[XY]-E[X]E[Y]\)
\(Var[X]=E[X^2]-(E[X])^2\)
\(Var[X|Y]=E[X^2|Y]-(E[X|Y])^2\)
\(Cov[X,Y]=Cov[X,E[Y|X]]\)
\(Var[X+Y]=Var[X]+Var[Y]+2Cov[X,Y]\)
\(Var[Y]=Var[E[Y|X]]+E[Var[Y|X]]\)
What you need to know (WYNK)
Honestly, for this class, you won’t need to know these properties.
They’ll show up in proofs and theorems and become important when you’re trying to evaluate the properties of an estimator (is it unbiased? is it “efficient”? is it consistent? does it have minimum variance?), but that’s for another day/course.
Summary: Random Variables and Probability Distributions
Random variables assign numeric values to each event in an experiment.
Probability distributions assign probabilities to the values that a random variable can take.
Discrete distributions are described by their pmf and cdf
Continuous distributions by their pdf and cdf
Summary: Random Variables and Probability Distributions
Probability distributions let us describe the data generating process and encode information about the world into our models
There are lots of distributions
Don’t worry about memorizing formulas
Do develop intuitions about the nature of your data generating process (Is my outcome continuous or discrete, binary or count, etc.)
Two key features of probability distributions are their:
Expected values, which are probability-weighted averages
Variances, which quantify variation around expected values
Standard Errors for Regression
Interpreting regressions
Regression coefficients \((\beta)\) are crucial for substantive interpretations (sign and size)
The standard errors of these coefficients \((SE(\beta))\) are the key to evaluating the statistical significance of these coefficients
What’s a standard error?
The standard error of an estimate is the standard deviation of the theoretical sampling distribution
A sampling distribution is a distribution of the estimates we would observe in repeated sampling
Example: if we re-ran the 1978 CPS, we would get different respondents, and thus different estimates.
Standard errors describe the width of the sampling distribution
That is, how much our estimates might vary from the true (population) value from sample to sample.
Standard errors can be used to construct intervals and conduct tests that quantify our uncertainty about our estimate
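A simulation sketch of this idea (the population and sample size are made up for illustration):

```r
# Standard error as the SD of estimates across repeated samples
set.seed(3)
pop <- rnorm(1e6, mean = 50, sd = 10)  # a hypothetical population
estimates <- replicate(1000, mean(sample(pop, 100)))
sd(estimates)  # empirical SE, close to the theoretical 10 / sqrt(100) = 1
```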
Standard errors of regression coefficients
For a linear regression written in matrix notation:
\[
y = X\beta + \epsilon
\]
OLS yields estimates of \(\beta\), written \(\hat{\beta}\), by minimizing the sum of squared residuals
The standard error of the \(k\)th coefficient, \(\hat{\beta}_k\), is simply the square root of the \(k\)th diagonal element of the variance-covariance matrix
\[
\text{SE}(\hat{\beta}_k) = \sqrt{Var(\hat{\beta}_k)}
\]
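In R, this corresponds to taking square roots of the diagonal of the estimated variance-covariance matrix; a sketch using the built-in mtcars data (chosen only for illustration):

```r
# Coefficient SEs from the variance-covariance matrix of an OLS fit
fit <- lm(mpg ~ wt + hp, data = mtcars)
sqrt(diag(vcov(fit)))                      # square roots of the diagonal
summary(fit)$coefficients[, "Std. Error"]  # matches the summary output
```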
Robust Standard Errors
\(\sigma^2(X'X)^{-1}\) is a good estimate of the variance of \(\hat{\beta}\) if the errors in a regression are independent and identically distributed (iid).
These turn out to be strong assumptions that are violated when there is:
Heteroskedasticity
The variance among the treated units tends to be higher than the variance among control units
Autocorrelation
We observe the same unit over multiple periods (Say RI in 2016, 2018, 2020)
Clustering
Respondents in RI are more similar to each other than respondents in MA
Robust Standard Errors
Robust standard errors are ways of calculating standard errors for regressions when we think the assumption of IID errors is unrealistic
The assumption of IID is almost always unrealistic…
We call these estimators robust because they provide consistent estimates of the SE even when errors are not independent and identically distributed.
Robust standard errors in R
lm_robust()
In this week’s lab we will get practice using the lm_robust() function from the estimatr package.
As you will see, lm_robust() provides a convenient way to do each of the following (see the sketch after this list):
calculate a variety of robust standard errors (e.g., using the se_type = "stata" argument to get the SEs Stata uses)
include fixed effects using the fixed_effects = ~ st + year argument
cluster standard errors by some grouping ID variable using clusters = st
generate estimates quickly using the Cholesky Decomposition
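Putting these options together, a sketch of a typical call (the data frame df and the variables turnout, sdr, st, and year are placeholders, not the actual lab data):

```r
library(estimatr)

# Hypothetical: df has columns turnout, sdr, st, and year
fit <- lm_robust(
  turnout ~ sdr,                 # outcome and policy indicator (names assumed)
  data          = df,
  fixed_effects = ~ st + year,   # state and year fixed effects
  clusters      = st,            # cluster SEs by state
  se_type       = "stata"        # clustered SEs matching Stata's default
)
summary(fit)
```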
Previewing Lab 8
Overview
The goals of this week’s lab are to:
Help develop your intuition behind the Two-way Fixed Effects Estimator
Learn how to estimate models with fixed effects and robust clustered standard errors using lm_robust()
Interpret the marginal effects of interaction models
As you can see, there is considerable variation in average turnout across states.
Q3.2 will ask you to describe similar variation across years.
Q3.3 will then ask you to look at variation across SDR policy within a single state.
The goal of these questions is to help illustrate the motivation for including fixed effects as a way of generalizing the logic of a difference-in-differences design
References
Grumbach, Jacob M., and Charlotte Hill. 2022. “Rock the Registration: Same Day Registration Increases Turnout of Young Voters.” The Journal of Politics 84 (1): 405–17.