Causal Inference in
Experimental Designs
Updated May 31, 2024
The resume data from QSS

Every time you work in R:
Save your file to your course or project folder
Set your working directory
Load, and if needed, install packages
Maybe change some global options in your .Rmd file
This is really just a reminder to anyone else using my code that they need to have their working directories set up correctly
RStudio sets the working directory automatically when you knit the file
When I work on a file, I set the working directory manually
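For example, a minimal setup chunk might look like this sketch (the folder path is hypothetical; point it at your own course folder):

```r
## A minimal setup sketch: working directory plus some global chunk options
setwd("~/pols1600")            # hypothetical path to your course folder
knitr::opts_chunk$set(
  echo = TRUE,                 # show code in the knitted document
  message = FALSE,             # hide package start-up messages
  warning = FALSE              # hide warnings
)
```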
## Packages for today
the_packages <- c(
## R Markdown
"kableExtra","DT",
## Tidyverse
"tidyverse", "lubridate", "forcats",
"haven", "labelled",
## Extensions for ggplot
"ggmap","ggrepel", "ggridges",
"ggthemes", "ggpubr", "GGally",
"scales", "dagitty", "ggdag", #<<
# Data
"COVID19","maps","mapdata",
"qss" #<<
)
## Define a function to load (and if needed install) packages
#| label = "ipak"
ipak <- function(pkg){
new.pkg <- pkg[!(pkg %in% installed.packages()[, "Package"])]
if (length(new.pkg))
install.packages(new.pkg, dependencies = TRUE)
sapply(pkg, require, character.only = TRUE)
}
## Install (if needed) and load libraries in the_packages
ipak(the_packages)
kableExtra DT tidyverse lubridate forcats haven labelled
TRUE TRUE TRUE TRUE TRUE TRUE TRUE
ggmap ggrepel ggridges ggthemes ggpubr GGally scales
TRUE TRUE TRUE TRUE TRUE TRUE TRUE
dagitty ggdag COVID19 maps mapdata qss
TRUE TRUE TRUE TRUE TRUE TRUE
You want to | You could use |
---|---|
Load some data | the read_* functions |
Combine multiple functions | %>%, the “pipe” operator |
Look at your data | glimpse(), head(), filter(), select(), arrange() |
Recode your data | mutate(), case_when(), ifelse() |
Transform your data | summarize(), group_by() |

Putting several of these together looks like the sketch below.
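A sketch of this workflow using the resume data from the qss package (loaded above); the recode and grouping choices are illustrative, and the column names (race, sex, call) follow the QSS codebook:

```r
## Load, look at, recode, and summarize the resume data
data("resume")
resume %>%
  glimpse()                                    # look at the data
resume %>%
  mutate(callback = ifelse(call == 1,          # recode 0/1 into labels
                           "Callback", "No callback")) %>%
  group_by(race, callback) %>%                 # group ...
  summarize(n = n(), .groups = "drop") %>%     # ... and summarize
  arrange(desc(n))                             # sort by frequency
```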
The grammar of graphics
At minimum you need:
data
aesthetic mappings
geometries
Hey Jude, take a sad plot and make it better (see the sketch below) with:
labels
themes
statistics
coordinates
facets
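A minimal sketch of the grammar in action, using R’s built-in mtcars data:

```r
## data + aesthetic mappings + a geometry, then labels and a theme
ggplot(data = mtcars,
       aes(x = wt, y = mpg)) +      # aesthetic mappings
  geom_point() +                    # geometry
  labs(                             # labels
    x = "Weight (1,000 lbs)",
    y = "Miles per gallon",
    title = "Heavier cars get fewer miles per gallon"
  ) +
  theme_minimal()                   # theme
```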
Causal claims imply claims about counterfactuals
What would have happened if we were to change some aspect of the world?
Foreign aid increases development
Wikileaks cost Hillary Clinton the 2016 election
Democracies don’t fight wars with other democracies
Universal Pre-K improves child development
What are some questions that interest you?
What are the counterfactual comparisons they imply?
In this course, we will use two forms of notation to describe our causal claims.
Directed Acyclic Graphs (DAGs, next lecture)
Potential Outcomes Notation
\(Y\): our outcome of interest
\(D\): an indicator of treatment status
\(D=1 \to\) treated
\(D=0 \to\) not treated (control)
\(Z\): an indicator of assignment status
\(Z=1 \to\) assigned to treatment
\(Z=0 \to\) assigned to control
\(X\): a covariate or predictor we can measure/observe
\(U\): unmeasured covariates
\(E[Y]\) reads as “the expected value of Y”
\(E[Y]\) is defined as a probability-weighted average based on the unconditional probability density of Y, \(f(y)\):
\[\operatorname{E}[Y] = \int_{-\infty}^\infty y f(y)\, dy\]
\(E[Y|X=x]\) reads as “the expected value of Y conditional on X taking the value x”
\(E[Y|X=x]\) is defined as a probability-weighted average of Y based on the conditional density of Y given X, \(f_{Y|X}(y|x)\):
\[\operatorname{E}[Y \vert X=x] = \int_{-\infty}^\infty y f (y\vert x) \, dy\]
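A toy sketch of both definitions with made-up values (for a discrete Y, the integrals become sums):

```r
## E[Y]: a probability-weighted average (discrete toy example)
y <- c(0, 1, 2)
p <- c(0.2, 0.5, 0.3)          # f(y)
sum(y * p)                     # E[Y] = 1.1

## E[Y | X = x]: group-wise means in a small data frame
df <- data.frame(x = c("a", "a", "b", "b"),
                 y = c(1, 3, 2, 6))
tapply(df$y, df$x, mean)       # E[Y|X = a] = 2, E[Y|X = b] = 4
```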
Estimand: the thing we want to know
Estimator: a rule or method for calculating an estimate of our estimand
Estimate: a value produced by our estimator for some data (e.g., 5'10'' as an estimate of someone’s height)
We’ll talk about lots of types of bias throughout this course.
Formally, we’ll say an estimator, \(\hat{\theta}\) (“theta hat”), is an unbiased estimator of a parameter, \(\theta\) (“theta”), if:
\[ E[\hat{\theta}] = \theta \]
The error, \(\epsilon\), is the difference between our estimate and the truth:
\[ \epsilon = \hat{\theta} -\theta \]
An estimator is unbiased if, on average, its errors equal 0:
\[ E[\epsilon] = E[\hat{\theta} -\theta] = 0 \]
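A quick simulation sketch of this idea: the sample mean is an unbiased estimator of a population mean, so its average error across many samples is close to zero (the seed and parameter values are arbitrary):

```r
set.seed(1600)                            # arbitrary seed, for reproducibility
theta     <- 5                            # the truth
estimates <- replicate(10000,
                       mean(rnorm(30, mean = theta)))
mean(estimates - theta)                   # average error is ~ 0
```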
The treatment received determines which potential outcome we actually observe:
\[ Y_i = (1 - D_i)*Y_i(0) + D_i*Y_i(1) \]
Potential outcomes are fixed, but we only observe one (of many) potential outcomes \(\to\) Fundamental Problem of Causal Inference
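In code, the switching equation simply picks out one potential outcome per unit (toy values):

```r
Y1 <- c(7, 8, 5)              # Y_i(1)
Y0 <- c(3, 6, 4)              # Y_i(0)
D  <- c(1, 0, 1)              # treatment received
Y  <- (1 - D) * Y0 + D * Y1   # observed outcomes: 7 6 5
Y                             # we never see the other potential outcome
```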
The individual causal effect (ICE), \(\tau_i\), is defined as
\[ \tau_i \equiv Y_i(1) - Y_i(0) \]
The fundamental problem of causal inference is that we only ever see one potential outcome for an individual, and so it’s impossible to know the causal effect of some intervention for that individual
The ICE is unidentified
Identification refers to what we can learn from the data available
A quantity of interest is identified if, with infinite data, it can only take one value
Mathematically, we’ll sometimes say a coefficient in an equation is unidentified if
We have more predictors than observations, or
Some of the predictors are linear combinations of other predictors.
Causal identification refers to “the assumptions needed for statistical estimates to be given a causal interpretation” (Keele 2015)
What’s your causal identification strategy? What are the assumptions that make your research design credible?
Identification > Estimation
Experimental designs are studies in which a causal variable of interest, the treatment, is manipulated by the researcher to examine its causal effects on some outcome of interest
Observational designs are studies in which a causal variable of interest is determined by someone/thing other than the researcher (nature, governments, people, etc.)
Recall that an individual causal effect \(\tau_i\), is defined as:
\[ \tau_i \equiv Y_i(1) - Y_i(0) \]
The problem is that for any one individual, we only observe \(Y_i(1)\) or \(Y_i(0)\), but never both.
Rather than focus on individual causal effects:
\[ \tau_i \equiv Y_i(1) - Y_i(0) \]
We focus on average causal effects (Average Treatment Effects [ATEs]):
\[ E[\tau_i] = \overbrace{E[Y_i(1) - Y_i(0)]}^{\text{Average of a difference}} = \overbrace{E[Y_i(1)] - E[Y_i(0)]}^{\text{Difference of Averages}} \]
When does the difference of averages provide us with a good estimate of the average difference?
Let’s consider a simple example
\(Y_i\): happiness measured on a 0-10 scale
\(D_i\): whether a person ate chocolate \((D_i=1)\) or fruit \((D_i = 0)\)
\(Y_i(1)\): a person’s happiness after eating chocolate
\(Y_i(0)\): a person’s happiness after eating fruit
\(X_i\): a person’s self-reported preference \((X_i \in \{\text{chocolate}, \text{fruit}\})\)
\(Y_i(1)\) | \(Y_i(0)\) | \(\tau_i\) |
---|---|---|
7 | 3 | 4 |
8 | 6 | 2 |
5 | 4 | 1 |
4 | 3 | 1 |
6 | 10 | -4 |
8 | 9 | -1 |
5 | 4 | 1 |
7 | 8 | -1 |
4 | 3 | 1 |
6 | 0 | 6 |
\(E[Y_i(1)]\) | \(E[Y_i(0)]\) | \(E[\tau_i]\) |
---|---|---|
6 | 5 | 1 |
If we could observe everyone’s potential outcomes, we could calculate each person’s ICE and average them to get the ATE
On average, eating chocolate increases happiness by 1 point on our 10-point scale (ATE = 1)
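Since this table is a god’s-eye view we never actually get, here is a short sketch that reproduces its arithmetic, showing the average of differences equals the difference of averages:

```r
## Potential outcomes copied from the table above
Y1 <- c(7, 8, 5, 4, 6, 8, 5, 7, 4, 6)
Y0 <- c(3, 6, 4, 3, 10, 9, 4, 8, 3, 0)
mean(Y1 - Y0)                 # average of differences = 1
mean(Y1) - mean(Y0)           # difference of averages = 1
```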
Suppose we conducted a study and let folks select what they wanted to eat.
\(Y_i(1)\) | \(Y_i(0)\) | \(\tau_i\) |
---|---|---|
7 | 3 | 4 |
8 | 6 | 2 |
5 | 4 | 1 |
4 | 3 | 1 |
6 | 10 | -4 |
8 | 9 | -1 |
5 | 4 | 1 |
7 | 8 | -1 |
4 | 3 | 1 |
6 | 0 | 6 |
\(E[Y_i(1)]\) | \(E[Y_i(0)]\) | \(ATE\) |
---|---|---|
6 | 5 | 1 |
\(x_i\) | \(d_i\) | \(y_i\) |
---|---|---|
chocolate | 1 | 7 |
chocolate | 1 | 8 |
chocolate | 1 | 5 |
chocolate | 1 | 4 |
fruit | 0 | 10 |
fruit | 0 | 9 |
chocolate | 1 | 5 |
fruit | 0 | 8 |
chocolate | 1 | 4 |
chocolate | 1 | 6 |
\(\bar{y}_{d=1}\) | \(\bar{y}_{d=0}\) | \(\hat{ATE}\) |
---|---|---|
5.57 | 9 | -3.43 |
Our estimate of the ATE is biased by the fact that folks who prefer fruit seem to be happier than folks who prefer chocolate in this example
In general, selection bias occurs when folks who receive the treatment differ systematically from folks who don’t
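Reproducing the naive estimate from the table above shows just how far self-selection can pull us from the truth (the true ATE is 1):

```r
## Observed data under self-selection (from the table above)
y <- c(7, 8, 5, 4, 10, 9, 5, 8, 4, 6)
d <- c(1, 1, 1, 1, 0, 0, 1, 0, 1, 1)    # 1 = chose chocolate
mean(y[d == 1]) - mean(y[d == 0])       # -3.43, badly biased
```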
What if, instead of letting people pick and choose, we randomly assigned half our respondents to receive chocolate and half to receive fruit?
\(Y_i(1)\) | \(Y_i(0)\) | \(\tau_i\) |
---|---|---|
7 | 3 | 4 |
8 | 6 | 2 |
5 | 4 | 1 |
4 | 3 | 1 |
6 | 10 | -4 |
8 | 9 | -1 |
5 | 4 | 1 |
7 | 8 | -1 |
4 | 3 | 1 |
6 | 0 | 6 |
\(E[Y_i(1)]\) | \(E[Y_i(0)]\) | \(ATE\) |
---|---|---|
6 | 5 | 1 |
\(x_i\) | \(d_i\) | \(y_i\) |
---|---|---|
chocolate | 1 | 7 |
chocolate | 1 | 8 |
chocolate | 0 | 4 |
chocolate | 1 | 4 |
fruit | 0 | 10 |
fruit | 1 | 8 |
chocolate | 0 | 4 |
fruit | 0 | 8 |
chocolate | 1 | 4 |
chocolate | 0 | 0 |
\(\bar{y}_{d=1}\) | \(\bar{y}_{d=0}\) | \(\hat{ATE}\) |
---|---|---|
6.2 | 5.2 | 1 |
When treatment has been randomly assigned, a difference in sample means provides an unbiased estimate of the ATE
The fact that our \(\hat{ATE} = ATE\) in this example is pure coincidence.
If we randomly assigned treatment a different way, we’d get a different estimate.
In general, unbiased estimators will tend to be neither too high nor too low (i.e., \(E[\hat{\theta} - \theta] = 0\))
If treatment has been randomly assigned, we can estimate the ATE by taking the difference of means between treatment and control:
\[ \begin{align*} E \left[ \frac{\sum_1^m Y_i}{m}-\frac{\sum_{m+1}^N Y_i}{N-m}\right]&=\overbrace{E \left[ \frac{\sum_1^m Y_i}{m}\right]}^{\substack{\text{Average outcome}\\ \text{among treated}\\ \text{units}}} -\overbrace{E \left[\frac{\sum_{m+1}^N Y_i}{N-m}\right]}^{\substack{\text{Average outcome}\\ \text{among control}\\ \text{units}}}\\ &= E [Y_i(1)|D_i=1] -E[Y_i(0)|D_i=0]\\ &= E[Y_i(1)] - E[Y_i(0)] = ATE \quad \text{(by random assignment)} \end{align*} \]
That is, the ATE is causally identified by the difference of means estimator in an experimental design
One possible randomization:

\(x_i\) | \(d_i\) | \(y_i\) |
---|---|---|
chocolate | 1 | 7 |
chocolate | 1 | 8 |
chocolate | 0 | 4 |
chocolate | 1 | 4 |
fruit | 0 | 10 |
fruit | 1 | 8 |
chocolate | 0 | 4 |
fruit | 0 | 8 |
chocolate | 1 | 4 |
chocolate | 0 | 0 |
\(\bar{y}_{d=1}\) | \(\bar{y}_{d=0}\) | \(\hat{ATE}\) |
---|---|---|
6.2 | 5.2 | 1 |
Another randomization:

\(x_i\) | \(d_i\) | \(y_i\) |
---|---|---|
chocolate | 0 | 3 |
chocolate | 1 | 8 |
chocolate | 0 | 4 |
chocolate | 1 | 4 |
fruit | 1 | 6 |
fruit | 1 | 8 |
chocolate | 0 | 4 |
fruit | 1 | 7 |
chocolate | 0 | 3 |
chocolate | 0 | 0 |
\(\bar{y}_{d=1}\) | \(\bar{y}_{d=0}\) | \(\hat{ATE}\) |
---|---|---|
6.6 | 2.8 | 3.8 |
And yet another:

\(x_i\) | \(d_i\) | \(y_i\) |
---|---|---|
chocolate | 1 | 7 |
chocolate | 0 | 6 |
chocolate | 1 | 5 |
chocolate | 1 | 4 |
fruit | 0 | 10 |
fruit | 0 | 9 |
chocolate | 0 | 4 |
fruit | 1 | 7 |
chocolate | 1 | 4 |
chocolate | 0 | 0 |
\(\bar{y}_{d=1}\) | \(\bar{y}_{d=0}\) | \(\hat{ATE}\) |
---|---|---|
5.4 | 5.8 | -0.4 |
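We can see this “unbiased on average” property directly by simulating the randomization distribution, re-randomizing the same ten people many times (a sketch; the seed is arbitrary):

```r
set.seed(1600)
Y1 <- c(7, 8, 5, 4, 6, 8, 5, 7, 4, 6)    # potential outcomes from the table
Y0 <- c(3, 6, 4, 3, 10, 9, 4, 8, 3, 0)
sims <- replicate(10000, {
  D <- sample(rep(c(1, 0), each = 5))    # randomly assign 5 of 10 to treatment
  Y <- D * Y1 + (1 - D) * Y0             # switching equation
  mean(Y[D == 1]) - mean(Y[D == 0])      # difference-of-means estimate
})
mean(sims)                               # ~ 1 = the true ATE
```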
Formally, randomly assigning treatments creates statistical independence \((\unicode{x2AEB})\) between treatment ( \(D\) ) and potential outcomes ( \(Y(1),Y(0)\) ) as well as any observed ( \(X\) ) or unobserved confounders ( \(U\) ):
\[Y_i(1),Y_i(0),\mathbf{X_i},\mathbf{U_i} \unicode{x2AEB} D_i\]
Practically, what this means is that what we can observe (differences in conditional means for treated and control units) provides good (unbiased) estimates of what we’re trying to learn about (Average Treatment Effects)
Causal identification for experimental designs requires very few assumptions:
Independence (satisfied by randomization)
SUTVA: the Stable Unit Treatment Value Assumption (depends on features of the design)
If treatment has been randomly assigned, we would expect treatment and control groups to look similar in terms of pre-treatment covariates
If the treatment had an effect, then we can credibly claim that the effect was due to the presence or absence of the treatment, and not some alternative explanation.
This type of clean apples-to-apples counterfactual comparison is what people mean when they talk about an experimental ideal
The resume data
Let’s take a look at the resume experiment from your textbook and compare some of Imai’s code to its tidyverse equivalent
## Imai's base R approach: build a type variable from race and sex
resume$type <- NA
resume$type[resume$race == "black" & resume$sex == "female"] <- "BlackFemale"
resume$type[resume$race == "black" & resume$sex == "male"] <- "BlackMale"
resume$type[resume$race == "white" & resume$sex == "female"] <- "WhiteFemale"
resume$type[resume$race == "white" & resume$sex == "male"] <- "WhiteMale"
Let’s load the data from the original study
completed_baseline: whether someone completed the baseline survey (“Survey”) or not (“No Survey”)
treatment_assigned: what intervention someone who completed the baseline survey was assigned to (treatment = “Trans-Equality”, placebo = “Recycling”)
answered_door: whether someone answered the door (“Yes”) or not (“No”) when a canvasser came to their door
treatment_group: the treatment assignments of those who answered the door (treatment = “Trans-Equality”, placebo = “Recycling”)
vf_age: the age of the person in years
vf_female: the respondent’s sex (female = 1, male = 0)
vf_democrat: whether the person was a registered Democrat (Democrat = 1, 0 otherwise)
vf_white: whether the person was white (White = 1, 0 otherwise)
vf_vg_12: whether the person voted in the 2012 general election (voted = 1, 0 otherwise)

Rows: 68,378
Columns: 14
$ completed_baseline <chr> "No Survey", "No Survey", "No Survey", "No Survey",…
$ treatment_assigned <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ answered_door <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ treatment_group <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ vf_age <dbl> 23.00000, 38.00000, 48.00000, 49.20192, 49.20192, 4…
$ vf_female <dbl> 0, 1, 0, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, …
$ vf_democrat <dbl> 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1, 1, 0, 0, 1, …
$ vf_white <dbl> 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, …
$ vf_vg_12 <dbl> 0, 0, 1, 0, 0, 1, 0, 1, 1, 0, 1, 0, 1, 1, 0, 0, 1, …
$ therm_trans_t0 <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ therm_trans_t1 <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ therm_trans_t2 <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ therm_trans_t3 <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ therm_trans_t4 <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
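The balance table below starts from a summary data frame called pretreatment_balance. Its construction isn’t shown here, but a hedged sketch of one way to build it (trans_data is a hypothetical name for the data loaded above) is:

```r
## Group means of pre-treatment covariates by assigned condition (sketch)
pretreatment_balance <- trans_data %>%        # hypothetical object name
  filter(!is.na(treatment_assigned)) %>%      # keep baseline-survey completers
  group_by(treatment_assigned) %>%
  summarize(across(c(vf_age, vf_female, vf_democrat, vf_white, vf_vg_12),
                   ~ mean(.x, na.rm = TRUE)))
```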
# Rearrange data
pretreatment_balance %>%
  # Pivot all columns except treatment_assigned into long format
  pivot_longer(names_to = "covariate", values_to = "value", -treatment_assigned) %>%
  # Pivot rows to two columns, one per condition
  pivot_wider(names_from = treatment_assigned) %>%
  # Calculate covariate balance
  mutate(
    Difference = `Trans-Equality` - Recycling
  )
# A tibble: 5 × 4
covariate Recycling `Trans-Equality` Difference
<chr> <dbl> <dbl> <dbl>
1 vf_age 46.3 47.7 1.40
2 vf_female 0.593 0.582 -0.0103
3 vf_democrat 0.463 0.488 0.0246
4 vf_white 0.209 0.217 0.00790
5 vf_vg_12 0.757 0.719 -0.0375
Causal Claims involve counterfactual comparisons
The fundamental problem of causal inference is that, for any individual, we only observe one of many potential outcomes
Causal identification refers to the assumptions necessary to generate credible causal estimates
Identification for experimental designs follows from the random assignment of treatment, which allows us to produce unbiased estimates of the Average Treatment Effect
POLS 1600