Causal Inference in
Observational Designs &
Simple Linear Regression
Updated May 31, 2024
Introduce the concept of Directed Acyclic Graphs to describe causal relationships and illustrate potential bias from confounders and colliders
Discuss three approaches to covariate adjustment
Begin discussing three research designs to make causal claims with observational data
Sit with your groups (for now)
Updated timeline for final projects next week
Data wrangling
Descriptive Statistics
Levels of understanding
Data visualization
Skill | Common Commands |
---|---|
Setup R | library(), ipak() |
Load data | read_csv(), load() |
Get HLO of data | df$x, glimpse(), table(), summary() |
Transform data | <-, mutate(), ifelse(), case_when() |
Reshape data | pivot_longer(), left_join() |
Summarize data numerically | mean(), median(), summarise(), group_by() |
Summarize data graphically | ggplot(), aes(), geom_ |
Takes time and practice
Don’t be afraid to FAAFO
Don’t worry about memorizing everything.
Statistical programming is necessary to actually do empirical research
Learning to code will help us understand statistical concepts.
Learning to think programmatically and algorithmically will help us tackle complex problems
Descriptive statistics help us describe what’s typical of our data
What’s a typical value in our data?
How much do our data vary?
As one variable changes how does another change?
Descriptive statistics are:
Conceptual
Practical
Definitional
Theoretical
Let’s illustrate these different levels of understanding about our old friend the mean
A mean is:
A common and important measure of central tendency (what’s typical)
It’s the arithmetic average you learned in school
We can think of it as the balancing point of a distribution
A conditional mean is the average of one variable, \(X\), when some other variable, \(Z\), takes a particular value \(z\)
There are lots of ways to calculate means in R
The simplest is to use the mean() function
If your data contain missing values, add na.rm = TRUE to remove them
For conditional means, index the variable (e.g., df$x[df$z == 1]), or use mean() in combination with group_by() and summarise()
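A minimal sketch, assuming a hypothetical data frame df with a numeric variable x and a grouping variable z:
# Simple mean, removing missing values
mean(df$x, na.rm = TRUE)
# Conditional mean: the average of x when z equals 1
mean(df$x[df$z == 1], na.rm = TRUE)
# Tidyverse route: conditional means of x for every level of z
df %>%
  group_by(z) %>%
  summarise(mean_x = mean(x, na.rm = TRUE))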
Formally, we define the arithmetic mean of \(x\) as \(\bar{x}\):
\[ \bar{x} = \frac{1}{n}\left (\sum_{i=1}^n{x_i}\right ) = \frac{x_1+x_2+\cdots +x_n}{n} \]
In words, this formula says, to calculate the average of x, we sum up all the values of \(x_i\) from observation \(i=1\) to \(i=n\) and then divide by the total number of observations \(n\)
In this class, I don’t put a lot of weight on memorizing definitions (that’s what Google’s for).
But being comfortable with “the math” is important and useful
Definitional knowledge is a prerequisite for understanding more theoretical claims.
Suppose I asked you to show that the sum of deviations from a mean equals 0.
\[ \text{Claim:} \sum_{i=1}^n (x_i -\bar{x}) = 0 \]
Knowing the definition of an arithmetic mean, we could write:
\[ \begin{aligned} \sum_{i=1}^n (x_i -\bar{x}) &= \sum_{i=1}^n x_i - \sum_{i=1}^n\bar{x} & \text{Distribute Summation}\\ &= \sum_{i=1}^n x_i - n\bar{x} & \text{Summing a constant, } \bar{x}\\ &= \sum_{i=1}^n x_i - n\times \left ( \frac{1}{n} \sum_{i=1}^n{x_i}\right ) & \text{Definition of } \bar{x}\\ &= \sum_{i=1}^n x_i - \sum_{i=1}^n{x_i} & n \times \frac{1}{n}=1\\ &= 0 \end{aligned} \]
Why do we care?
Showing the deviations sum to 0 is another way of saying the mean is a balancing point.
This turns out to be a useful property of means that will reappear throughout the course
If I asked you to make a prediction, \(\hat{x}\), of a random person’s height in this class, the mean would have the lowest mean squared error, MSE \(=\frac{1}{n}\sum_{i=1}^n (x_i - \hat{x})^2\)
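A quick numeric illustration of both claims, using a made-up vector of heights:
# Hypothetical heights (inches)
x <- c(62, 65, 67, 70, 74)
# Deviations from the mean sum to (essentially) zero
sum(x - mean(x))
# The mean has a lower mean squared error than any other single guess
mse <- function(guess) mean((x - guess)^2)
mse(mean(x))    # prediction = the mean
mse(median(x))  # prediction = the median; MSE is larger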
Occasionally, you’ll read or hear me say things like:
The sample mean is an unbiased estimator of the population mean
In a statistics class, we would take time to prove this.
Claim:
Let \(x_1, x_2, \dots, x_n\) be a random sample from a population with mean \(\mu\) and variance \(\sigma^2\)
Then:
\[ \bar{x} = \frac{1}{n}\left (\sum_{i=1}^n x_i\right ) \]
is an unbiased estimator of \(\mu\)
\[ E[\bar{x}] = \mu \]
Proof:
\[ \begin{aligned} E\left [\bar{x} \right] &= E\left [\frac{1}{n}\left (\sum_{i=1}^n x_i \right) \right] & \text{Definition of } \bar{x} \\ &= \frac{1}{n} \sum_{i=1}^nE\left [ x_i \right] & \text{Linearity of Expectations} \\ &= \frac{1}{n} \sum_{i=1}^n \mu & E[x_i] = \mu \\ &= \frac{n}{n} \mu & \sum_{i=1}^n \mu = n\mu \\ &= \mu & \blacksquare \\ \end{aligned} \]
In this course, we tend to emphasize the
Conceptual
Practical
Over
Definitional
Theoretical
In an intro statistics class, the ordering might be reversed.
Trade-offs:
The grammar of graphics
At minimum you need:
data
aesthetic mappings
geometries
Take a sad plot and make it better by:
labels
themes
statistics
coordinates
facets
df %>%
mutate(
# Turn numeric values into factor labels
Reincarnation = forcats::as_factor(reincarnation),
# Order factor in decreasing frequency of levels
Reincarnation = forcats::fct_infreq(Reincarnation),
# Reverse order so levels are increasing in frequency
Reincarnation = forcats::fct_rev(Reincarnation),
# Rename explanations
Why = reincarnation_why
) -> df
df %>% # Data
# Aesthetics
ggplot(aes(x = Reincarnation,
fill = Reincarnation))+
# Geometry
geom_bar(stat = "count")+ # Statistic
## Include levels of Reincarnation w/ no values
scale_x_discrete(drop=FALSE)+
# Don't include a legend
scale_fill_discrete(drop=FALSE, guide="none")+
# Flip x and y
coord_flip()+
# Remove lines
theme_classic() -> fig1
plot_df %>%
ggplot(aes(x = Reincarnation,
y = Count,
fill = Reincarnation,
label=Why))+
geom_bar(stat = "identity")+ #<<
## Include levels of Reincarnation w/ no values
scale_x_discrete(drop=FALSE)+
# Don't include a legend
scale_fill_discrete(drop=FALSE, guide="none")+
coord_flip()+
labs(x = "",y="",title="You're about to be reincarnated.\nWhat do you want to come back as?")+
theme_classic()+
ggrepel::geom_label_repel(
fill="white",
nudge_y = 1,
hjust = "left",
size=3,
arrow = arrow(length = unit(0.015, "npc"))
)+
scale_y_continuous(
breaks = c(0,2,4,6,8,10,12),
expand = expansion(add =c(0,6))
) -> fig1
Data visualization is an iterative process
Good data viz requires lots of data transformations
Start with a minimum working example and build from there
Don’t let the perfect be the enemy of the good enough.
This week’s lab, we’ll be using the dataverse package to download data on presidential elections
Next week’s lab, we’ll be using the tidycensus package to download census data
We’ll also need to set up a Census API key to get the data
Here’s a detailed guide to what we’ll do in class right now
These packages are easier to install live
Please follow these steps so you can download data directly from the U.S. Census with the tidycensus package:
## Packages for today
the_packages <- c(
## R Markdown
"kableExtra","DT","texreg",
## Tidyverse
"tidyverse", "lubridate", "forcats", "haven", "labelled",
## Extensions for ggplot
"ggmap","ggrepel", "ggridges", "ggthemes", "ggpubr",
"GGally", "scales", "dagitty", "ggdag", "ggforce",
# Data
"COVID19","maps","mapdata","qss","tidycensus", "dataverse",
# Analysis
"DeclareDesign", "easystats", "zoo"
)
## Define a function to load (and if needed install) packages
#| label = "ipak"
ipak <- function(pkg){
new.pkg <- pkg[!(pkg %in% installed.packages()[, "Package"])]
if (length(new.pkg))
install.packages(new.pkg, dependencies = TRUE)
sapply(pkg, require, character.only = TRUE)
}
## Install (if needed) and load libraries in the_packages
ipak(the_packages)
kableExtra DT texreg tidyverse lubridate
TRUE TRUE TRUE TRUE TRUE
forcats haven labelled ggmap ggrepel
TRUE TRUE TRUE TRUE TRUE
ggridges ggthemes ggpubr GGally scales
TRUE TRUE TRUE TRUE TRUE
dagitty ggdag ggforce COVID19 maps
TRUE TRUE TRUE TRUE TRUE
mapdata qss tidycensus dataverse DeclareDesign
TRUE TRUE TRUE TRUE TRUE
easystats zoo
TRUE TRUE
Red Covid New York Times, 27 September, 2021
Red Covid, an Update New York Times, 18 February, 2022
Please download Thursday’s lab here
Conceptually, this lab is designed to help reinforce the relationship between linear models like \(y=\beta_0 + \beta_1x\) and the conditional expectation function \(E[Y|X]\).
Substantively, we will explore David Leonhardt’s claims about “Red Covid”: the political polarization of vaccines and its consequences
Questions 1-5 are designed to reinforce your data wrangling skills. In particular, you will get practice using:
the mutate() function
the rollmean() function from the zoo package
the pivot_wider() function
the left_join() function
In question 6, you will see how calculating conditional means provides a simple test of the “Red Covid” claim.
In question 7, you will see how a linear model returns the same information as these conditional means (in a slightly different format)
In question 8, you will get practice interpreting linear models with continuous predictors (i.e. predictors that take on a range of values)
In question 9, you will get practice visualizing these models and using the figures to help interpret your results substantively.
Question 10 asks you to play the role of a skeptic and consider what other factors might explain the relationships we found in Questions 6-9. We will explore these factors in next week’s lab.
The following slides provide detailed explanations of all the code you’ll need for each question.
Please run this code before class on Thursday
We will review this material together at the start of class, but you will spend most of your time on Questions 6-10
Q1 asks you to set up your workspace
This means loading and, if needed, installing the packages you will use.
## Packages for today
the_packages <- c(
## R Markdown
"kableExtra","DT","texreg",
## Tidyverse
"tidyverse", "lubridate", "forcats", "haven", "labelled",
## Extensions for ggplot
"ggmap","ggrepel", "ggridges", "ggthemes", "ggpubr",
"GGally", "scales", "dagitty", "ggdag", "ggforce",
# Data
"COVID19","maps","mapdata","qss","tidycensus", "dataverse",
# Analysis
"DeclareDesign", "easystats", "zoo"
)
## Define a function to load (and if needed install) packages
#| label = "ipak"
ipak <- function(pkg){
new.pkg <- pkg[!(pkg %in% installed.packages()[, "Package"])]
if (length(new.pkg))
install.packages(new.pkg, dependencies = TRUE)
sapply(pkg, require, character.only = TRUE)
}
## Install (if needed) and load libraries in the_packages
ipak(the_packages)
To explore Leonhardt’s claims about Red Covid, we’ll need data on:
Covid-19 cases, deaths, and vaccinations by state
the 2020 presidential election results by state
To load data on Covid-19, just run this:
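The exact code isn’t reproduced here, but a minimal sketch, assuming we want state-level (level = 2) data for the US from the COVID19 package’s covid19() function, might look like:
# Download state-level US data from the COVID-19 Data Hub
covid <- COVID19::covid19(country = "United States", level = 2)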
Q2.2 asks you to write code that will download data on presidential elections from 1976 to 2020 from the MIT Election Lab’s dataverse
With the dataverse package loaded, you should be able to do this:
# Try this code first
Sys.setenv("DATAVERSE_SERVER" = "dataverse.harvard.edu")
pres_df <- dataverse::get_dataframe_by_name(
"1976-2020-president.tab",
"doi:10.7910/DVN/42MVDX"
)
# If the code above fails, comment out and uncomment the code below:
# load(url("https://pols1600.paultesta.org/files/data/pres_df.rda"))
Question 3 asks you to describe the structure of each dataset (covid and pres_df) and the unit of analysis in each:
In the covid dataset, the unit of analysis is a state-date
Here’s some possible code you could use to get a quick HLO of each dataset:
# check names in `covid`
names(covid)
# take a quick look values of each variable
glimpse(covid)
# Look at first few observations for:
# date, administrative_area_level_2,
covid %>%
select(date, administrative_area_level_2) %>%
head()
# Summarize data to get a better sense of the unit of observation
covid %>%
group_by(administrative_area_level_2) %>%
summarise(
n = n(), # Number of observations for each state
start_date = min(date, na.rm = T),
end_date = max(date, na.rm=T)
) -> hlo_covid_df
hlo_covid_df
# How many unique values of date and state are there?
n_dates <- length(unique(covid$date))
n_states <- length(unique(covid$administrative_area_level_2))
n_dates
n_states
# If we had observations for every state on every date then the number of rows
# in the data
dim(covid)[1]
# Should equal
dim(covid)[1] == n_dates * n_states
# This is what economists would call an unbalanced panel
# check names in `pres_df`
names(pres_df)
# take a quick look values of each variable
glimpse(pres_df)
# Unit of analysis is a year-state-candidate
pres_df %>%
select(year, state_po, candidate) %>%
head()
# How many states?
length(unique(pres_df$state_po))
# How many candidates and parties on the ballot in a given election year
pres_df %>%
group_by(year) %>%
summarise(
n_candidates = length(unique(candidate)),
# Look at both party_detailed and party_simplified
n_parties_detailed = length(unique(party_detailed)),
n_parties_simplified = length(unique(party_simplified))
) -> hlo_pres_df
hlo_pres_df
# Look at 2020
# pres_df$candidate[pres_df$year == "2020"]
Using our understanding of the structure of the data, Q4 asks you to:
This is the same code we’ve used before to create covid_us from covid, with the addition of code to calculate a rolling mean or moving average of the number of new cases
# Create a vector containing of US territories
territories <- c(
"American Samoa",
"Guam",
"Northern Mariana Islands",
"Puerto Rico",
"Virgin Islands"
)
# Filter out Territories and create state variable
covid_us <- covid %>%
filter(!administrative_area_level_2 %in% territories)%>%
mutate(
state = administrative_area_level_2
)
# Calculate new cases, new cases per capita, and 7-day average
covid_us %>%
dplyr::group_by(state) %>%
mutate(
new_cases = confirmed - lag(confirmed),
new_cases_pc = new_cases / population *100000,
new_cases_pc_7day = zoo::rollmean(new_cases_pc,
k = 7,
align = "right",
fill=NA )
) -> covid_us
# Recode facemask policy
covid_us %>%
mutate(
# Recode facial_coverings to create face_masks
face_masks = case_when(
facial_coverings == 0 ~ "No policy",
abs(facial_coverings) == 1 ~ "Recommended",
abs(facial_coverings) == 2 ~ "Some requirements",
abs(facial_coverings) == 3 ~ "Required shared places",
abs(facial_coverings) == 4 ~ "Required all times",
),
# Turn face_masks into a factor with ordered policy levels
face_masks = factor(face_masks,
levels = c("No policy","Recommended",
"Some requirements",
"Required shared places",
"Required all times")
)
) -> covid_us
# Create year-month and percent vaccinated variables
covid_us %>%
mutate(
year = year(date),
month = month(date),
year_month = paste(year,
str_pad(month, width = 2, pad=0),
sep = "-"),
percent_vaccinated = people_fully_vaccinated/population*100
) -> covid_us
Q4.2 asks you to create new measures of the 7-day and 14-day averages of new deaths from Covid-19 per 100,000 residents
It encourages you to use the code that created new_cases_pc_7day as a template
To build your coding skills, try writing this yourself, then comparing it to the code in the next tab:
covid_us %>%
dplyr::group_by(state) %>%
mutate(
new_deaths = deaths - lag(deaths),
new_deaths_pc = new_deaths / population *100000,
new_deaths_pc_7day = zoo::rollmean(new_deaths_pc,
k = 7,
align = "right",
fill=NA ),
new_deaths_pc_14day = zoo::rollmean(new_deaths_pc,
k = 14,
align = "right",
fill=NA )
) -> covid_us
The next slides aren’t necessary for the lab but are designed to illustrate how zoo::rollmean() computes a rolling average
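As a toy example (separate from the Covid data shown below), here is what a right-aligned rolling mean looks like on a simple vector:
library(zoo)
x <- 1:10
# 3-period moving average: the first two entries are NA (fill = NA),
# the third is mean(1:3), the fourth is mean(2:4), and so on
zoo::rollmean(x, k = 3, align = "right", fill = NA)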
# A tibble: 52,580 × 4
# Groups: state [51]
state date new_cases_pc new_cases_pc_7day
<chr> <date> <dbl> <dbl>
1 Minnesota 2020-03-06 NA NA
2 Minnesota 2020-03-07 0 NA
3 Minnesota 2020-03-08 0.0177 NA
4 Minnesota 2020-03-09 0 NA
5 Minnesota 2020-03-10 0.0177 NA
6 Minnesota 2020-03-11 0.0355 NA
7 Minnesota 2020-03-12 0.0709 NA
8 Minnesota 2020-03-13 0.0887 0.0329
9 Minnesota 2020-03-14 0.124 0.0507
10 Minnesota 2020-03-15 0.248 0.0836
# ℹ 52,570 more rows
The following code illustrates how the 7-day rolling mean (new_cases_pc_7day) smooths over the noisiness of the daily measure
covid_us %>%
filter(date > "2020-03-05",
state == "Minnesota") %>%
select(date,
new_cases_pc,
new_cases_pc_7day)%>%
ggplot(aes(date,new_cases_pc ))+
geom_line(aes(col="Daily"))+
# set y aesthetic for second line of rolling average
geom_line(aes(y = new_cases_pc_7day,
col = "7-day average")
) +
theme(legend.position="bottom")+
labs( col = "Measure",
y = "New Cases Per 100k", x = "",
title = "Minnesota"
) -> fig_covid_mn
Q4.3 gives you a long list of steps to recode, reshape, and filter pres_df to produce pres2020_df
Most of this is review but it can seem like a lot.
Walk through the provided code and see if you can map each conceptual step in Q4.3 to its implementation in the code
pres_df %>%
mutate(
year_election = year,
state = str_to_title(state),
# Fix DC
state = ifelse(state == "District Of Columbia", "District of Columbia", state)
) %>%
filter(party_simplified %in% c("DEMOCRAT","REPUBLICAN"))%>%
filter(year == 2020) %>%
select(state, state_po, year_election, party_simplified, candidatevotes, totalvotes
) %>%
pivot_wider(names_from = party_simplified,
values_from = candidatevotes) %>%
mutate(
dem_voteshare = DEMOCRAT/totalvotes *100,
rep_voteshare = REPUBLICAN/totalvotes*100,
winner = forcats::fct_rev(factor(ifelse(rep_voteshare > dem_voteshare,"Trump","Biden")))
) -> pres2020_df
# Check Output:
glimpse(pres2020_df)
Rows: 51
Columns: 9
$ state <chr> "Alabama", "Alaska", "Arizona", "Arkansas", "California"…
$ state_po <chr> "AL", "AK", "AZ", "AR", "CA", "CO", "CT", "DE", "DC", "F…
$ year_election <dbl> 2020, 2020, 2020, 2020, 2020, 2020, 2020, 2020, 2020, 20…
$ totalvotes <dbl> 2323282, 359530, 3387326, 1219069, 17500881, 3279980, 18…
$ DEMOCRAT <dbl> 849624, 153778, 1672143, 423932, 11110250, 1804352, 1080…
$ REPUBLICAN <dbl> 1441170, 189951, 1661686, 760647, 6006429, 1364607, 7147…
$ dem_voteshare <dbl> 36.56999, 42.77195, 49.36469, 34.77506, 63.48395, 55.011…
$ rep_voteshare <dbl> 62.031643, 52.833143, 49.055981, 62.395730, 34.320724, 4…
$ winner <fct> Trump, Trump, Biden, Trump, Biden, Biden, Biden, Biden, …
Q5 asks you to merge the 2020 election data from pres2020_df into covid_us, using the common state variable in each dataset and the function left_join()
When merging datasets:
Check that the values of the key variable (state) are coded the same in each dataset before using left_join()
[1] 0
[1] "ALABAMA" "ALABAMA" "ALABAMA" "ALABAMA" "ALABAMA"
[1] "Minnesota" "Minnesota" "Minnesota" "Minnesota" "Minnesota"
# Matching is case sensitive
# make pres_df$state title case
## Base R:
pres_df$state <- str_to_title(pres_df$state )
## Tidy R:
pres_df %>%
mutate(
state = str_to_title(state )
) -> pres_df
# Should be 51
sum(unique(pres_df$state) %in% covid_us$state)
[1] 50
[1] "District Of Columbia"
# Two equivalent ways to fix this mismatch
## Base R: Quick fix to change spelling of DC
pres_df$state[pres_df$state == "District Of Columbia"] <- "District of Columbia"
## Tidy R: Quick fix to change spelling of DC
pres_df %>%
mutate(
state = ifelse(test = state == "District Of Columbia",
yes = "District of Columbia",
no = state
)
) -> pres_df
# Problem Solved
sum(unique(pres2020_df$state) %in% covid_us$state)
[1] 51
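With the state names aligned, the merge itself is a single left_join(); a minimal sketch using the data frames built above:
# Keep every row of covid_us and add that state's 2020 election results
covid_us <- covid_us %>%
  left_join(pres2020_df, by = "state")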
Causal inference is about counterfactual comparisons
Causal identification refers to “the assumptions needed for statistical estimates to be given a causal interpretation” (Keele 2015)
Experimental Designs rely on randomization of treatment to justify their causal claims
Observational Designs require additional assumptions and knowledge to make causal claims
Experimental designs are studies in which a causal variable of interest, the treatment, is manipulated by the researcher to examine its causal effects on some outcome of interest
Random assignment is the key to causal identification in experiments because it creates statistical independence between treatment and potential outcomes, as well as any potential confounding factors
\[
Y_i(1),Y_i(0),\mathbf{X_i},\mathbf{U_i} \unicode{x2AEB} D_i
\]
If treatment has been randomly assigned, then:
\[ \begin{aligned} E \left[ \frac{\sum_1^m Y_i}{m}-\frac{\sum_{m+1}^N Y_i}{N-m}\right]&=\overbrace{E \left[ \frac{\sum_1^m Y_i}{m}\right]}^{\substack{\text{Average outcome}\\ \text{among treated}\\ \text{units}}} -\overbrace{E \left[\frac{\sum_{m+1}^N Y_i}{N-m}\right]}^{\substack{\text{Average outcome}\\ \text{among control}\\ \text{units}}}\\ &= E [Y_i(1)|D_i=1] -E[Y_i(0)|D_i=0] \end{aligned} \]
Observational designs are studies in which a causal variable of interest is determined by someone/thing other than the researcher (nature, governments, people, etc.)
Since treatment has not been randomly assigned, observational studies typically require stronger assumptions to make causal claims.
Generally speaking, these assumptions amount to a claim about conditional independence
\[ Y_i(1),Y_i(0),\mathbf{X_i},\mathbf{U_i} \unicode{x2AEB} D_i | K_i \]
To understand how to make causal claims in observational studies we will:
Introduce the concept of Directed Acyclic Graphs to describe causal relationships
Discuss three approaches to covariate adjustment
Begin discussing three research designs for observational data
In this course, we will use two forms of notation to describe our causal claims.
Potential Outcomes Notation (last lecture)
Directed Acyclic Graphs (DAGs)
Directed Acyclic Graphs provide a way of encoding assumptions about causal relationships
Directed Arrows \(\to\) describe a direct causal effect
Arrow from \(D\to Y\) means \(Y_i(d) \neq Y_i(d^\prime)\): “The outcome ( \(Y\) ) for person \(i\) when \(D\) happens ( \(Y_i(d)\) ) is different from the outcome when \(D\) doesn’t happen ( \(Y_i(d^\prime)\) )”
No arrow = no effect ( \(Y_i(d) = Y_i(d^\prime)\) )
Acyclic: No cycles. A variable can’t cause itself
Blair, Coppock, and Humphreys (2023) (Chap. 6.2)
Causal Explanations Involve:
Y: our outcome
D: a possible cause of Y
M: a mediator or mechanism through which D affects Y
Z: an instrument that can help us isolate the effects of D on Y
X2: a covariate that may moderate the effect of D on Y
Threats to causal claims/Sources of bias:
X1: an observed confounder that is a common cause of both D and Y
U: an unobserved confounder that is a common cause of both D and Y
K: a collider that is a common consequence of both D and Y
Confounder bias: Failing to control for a common cause of D and Y (aka Omitted Variable Bias)
Collider bias: Controlling for a common consequence of D and Y (aka Selection Bias)
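A minimal sketch of how you might encode and draw this kind of DAG with the ggdag package loaded earlier; the variable names follow the lists above and the edges are purely illustrative:
library(ggdag)
# D -> Y, with X1 a confounder (common cause) and K a collider (common consequence)
dag <- dagify(
  Y ~ D + X1,
  D ~ X1,
  K ~ D + Y,
  exposure = "D",
  outcome = "Y"
)
ggdag(dag) + theme_dag()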
Drinking coffee doesn’t cause lung cancer, but we might find a correlation between them because they share a common cause: smoking.
Smoking is a confounding variable that, if omitted, will bias our results, producing a spurious relationship
Adjusting for confounders removes this source of bias
Why are attractive people such jerks?
Suppose dating is a function of looks and personality
Dating is a common consequence of looks and personality
Basing our claim off of who we date is an example of selection bias created by controlling for a collider
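A quick simulation sketch of this collider logic, with hypothetical looks and personality variables:
set.seed(1600)
n <- 10000
looks <- rnorm(n)
personality <- rnorm(n)                   # independent of looks by construction
dating <- (looks + personality) > 1       # you only date people above some overall bar
cor(looks, personality)                   # roughly 0 in the full population
cor(looks[dating], personality[dating])   # negative among the people you date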
Covariate adjustment refers to a broad class of procedures that try to make a comparison more credible or meaningful by adjusting for some other potentially confounding factor.
When you hear people talk about “controlling for,” “adjusting for,” or “holding constant” some factor
They are typically talking about some sort of covariate adjustment.
\[\textrm{Find }\hat{\beta_0},\,\hat{\beta_1} \text{ arg min}_{\beta_0, \beta_1} \sum (y_i-(\beta_0+\beta_1x_i))^2\]
Regression is a tool for describing relationships.
How does some outcome we’re interested in tend to change as some predictor of that outcome changes?
How does economic development vary with democracy?
How does economic development vary with democracy, adjusting for natural resources like oil and gas
More formally:
\[ y_i = f(x_i) + \epsilon \]
Y is a function of X plus some error, \(\epsilon\)
Linear regression assumes that the relationship between an outcome and a predictor can be described by a linear function
\[ y_i = \beta_0 + \beta_1 x_i + \epsilon \]
\[ y_i = \beta_0 + \beta_1 x_i + \epsilon \]
To accomplish this, we need some sort of criterion.
For linear regression, that criterion is minimizing the error between what our model predicts \(\hat{y_i} = \beta_0 + \beta_1 x_i\) and what we actually observed \((y_i)\)
More on this to come. But first…
\(y_i\) an outcome variable or thing we’re trying to explain
\(x_i\) a predictor variable, or the thing we think explains variation in our outcome
AKA: The independent variable, covariates, the right hand side of the model.
Cap or No Cap: I’ll use \(X\) (should be \(\mathbf{X}\)) to denote a set (matrix) of predictor variables. \(y\) vs \(Y\) can also have technical distinctions (Sample vs Population, observed value vs Random Variable, …)
\(\beta\) a set of unknown parameters that describe the relationship between our outcome \(y_i\) and our predictors \(x_i\)
\(\epsilon\) the error term representing variation in \(y_i\) not explained by our model.
\[ y_i = \beta_0 + \beta_1 x_i + \epsilon \]
We call this a linear regression, because \(y_i = \beta_0 + \beta_1 x_i\) is the equation for a line, where:
\(\beta_0\) corresponds to the \(y\) intercept, or the model’s prediction when \(x = 0\).
\(\beta_1\) corresponds to the slope, or how \(y\) is predicted to change as \(x\) changes.
The generic linear model
\[y_i = \beta_0 + \beta_1 x_i + \epsilon\]
For example:
\[\text{Transgender Feeling Thermometer}_i = \beta_0 + \beta_1\text{Age}_i + \epsilon_i\]
We can estimate this model in R using the lm() function
lm() requires two arguments:
a formula argument of the general form y ~ x, read as “Y modeled by X” or, below, “Transgender Feeling Thermometer (y) modeled by (~) Age (x)”
a data argument telling R where to find the variables in the formula
The coefficients from lm() are saved in an object called m1
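Putting the pieces together, using the formula and data shown in the output below:
# Model feelings toward transgender people as a function of age
m1 <- lm(therm_trans_t0 ~ vf_age, data = df)
m1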
Call:
lm(formula = therm_trans_t0 ~ vf_age, data = df)
Coefficients:
(Intercept) vf_age
62.8196 -0.2031
m1 actually contains a lot of information
We can extract the intercept and slope from this simple bivariate model using the coef() function
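For example, continuing with m1:
coef(m1)      # both coefficients
coef(m1)[1]   # the intercept
coef(m1)[2]   # the slope on age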
The two coefficients from m1 define a line of best fit, summarizing how feelings toward transgender individuals change with age
\[y_i = \beta_0 + \beta_1 x_i + \epsilon\]
\[\text{Transgender Feeling Thermometer}_i = \beta_0 + \beta_1\text{Age}_i + \epsilon_i\]
\[\text{Transgender Feeling Thermometer}_i = 62.82 + -0.2 \text{Age}_i + \epsilon_i\]
Often it’s useful for interpretation to obtain predicted values from a regression.
To obtain predicted values \((\hat{y})\), we simply plug in a value for \(x\) (in this case, \(Age\)) and evaluate our equation.
For example, how might we expect attitudes to differ between an 18-year-old college student and their 68-year-old grandparent?
\[\hat{FT}_{x=18} = 62.82 + -0.2 \times 18 = 59.16\] \[\hat{FT}_{x=68} = 62.82 + -0.2 \times 68 = 49.01\]
We could do this by hand
More often we will:
create a prediction data frame (pred_df below) with the values of interest
use the predict() function with our linear model (m1) and pred_df
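A minimal sketch of that workflow, using the two ages from the example above:
# Prediction data frame with the values of interest
pred_df <- data.frame(vf_age = c(18, 68))
# Predicted feeling thermometer scores at ages 18 and 68
predict(m1, newdata = pred_df)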
We can visualize simple regression by:
plotting a scatter plot of the outcome (y-axis) and predictors (x-axis)
overlaying the line defined by lm()
df %>%
ggplot(aes(vf_age, therm_trans_t0))+
geom_point(size=.5, alpha=.5)+
geom_abline(intercept = coef(m1)[1],
slope = coef(m1)[2],
col = "blue"
)+
geom_vline(xintercept = 0, linetype = 2)+
xlim(0,100)+
annotate("point",
x = 0, y = coef(m1)[1],
col = "red"
)+
annotate("text",
label = expression(paste(beta[0],"= 62.81" )),
x = 1, y = coef(m1)[1]+5,
hjust = "left"
)+
labs(
x = "Age",
y = "Feeling Thermometer toward\nTransgender People"
)+
theme_classic() -> fig_lm
Q: How does lm() choose \(\beta_0\) and \(\beta_1\)?
P: By minimizing the sum of squared errors, in a procedure called Ordinary Least Squares (OLS) regression
Q: Ok, that’s not really that helpful…
P: Good question!
An error, \(\epsilon_i\) is simply the difference between the observed value of \(y_i\) and what our model would predict, \(\hat{y_i}\) given some value of \(x_i\). So for a model:
\[y_i=\beta_0+\beta_1 x_{i} + \epsilon_i\]
We simply subtract our model’s prediction \(\beta_0+\beta_1 x_{i}\) from the observed value, \(y_i\)
\[\hat{\epsilon_i}=y_i-\hat{y_i}=(Y_i-(\beta_0+\beta_1 x_{i}))\]
To get \(\epsilon_i\)
There are more mathy reasons for this, but at an intuitive level, the Sum of Squared Residuals (SSR) is a sensible criterion:
Squaring \(\epsilon\) treats positive and negative residuals equally.
Summing produces a single value summarizing our model’s overall performance.
There are other criteria we could use (e.g. minimizing the sum of absolute errors), but SSR has some nice properties
OLS chooses \(\beta_0\) and \(\beta_1\) to minimize \(\sum \epsilon^2\), the Sum of Squared Residuals (SSR)
\[\textrm{Find }\hat{\beta_0},\,\hat{\beta_1} \text{ arg min}_{\beta_0, \beta_1} \sum (y_i-(\beta_0+\beta_1x_i))^2\]
How does lm() choose \(\beta_0\) and \(\beta_1\)?
In an intro stats course, we would walk through the process of finding
\[\textrm{Find }\hat{\beta_0},\,\hat{\beta_1} \text{ arg min}_{\beta_0, \beta_1} \sum (y_i-(\beta_0+\beta_1x_i))^2\] Which involves a little bit of calculus. The big payoff is that
\[\beta_0 = \bar{y} - \beta_1 \bar{x}\] And
\[ \beta_1 = \frac{Cov(x,y)}{Var(x)}\] Which is never quite the epiphany I think we think it is…
The following slides walk you through the mechanics of this exercise. We’re gonna skip through them in class, but they’re there for your reference
To understand what’s going on under the hood, you need a broad understanding of some basic calculus.
The next few slides provide a brief review of derivatives and differential calculus.
The derivative of \(f\) at \(x\) is its rate of change at \(x\)
You’ll see two notations for derivatives:
\[ \frac{df}{dx}(x)=\lim_{h\to0}\frac{f(x+h)-f(x)}{(x+h)-x} \]
Derivative of a constant
\[ f^{\prime}(c)=0 \]
Derivative of a line f(x)=2x
\[ f^{\prime}(2x)=2 \]
Derivative of \(f(x)=x^2\)
\[ f^{\prime}(x^2)=2x \]
Chain rule: y= f(g(x)). The derivative of y with respect to x is
\[ \frac{d}{dx}(f(g(x)))=f^{\prime}(g(x))g^{\prime}(x) \]
The derivative of the “outside” times the derivative of the “inside,” remembering that the derivative of the outside function is evaluated at the value of the inside function.
Local minimum:
\[ f^{\prime}(x)=0 \text{ and } f^{\prime\prime}(x)>0 \]
Let \(f\) be a function of the variables \((X_1, \dots, X_n)\). The partial derivative of \(f\) with respect to \(X_i\) is
\[\begin{align*} \frac{\partial f(X_1, \dots, X_n)}{\partial X_i}=\lim_{h\to0}\frac{f(X_1, \dots, X_i+h, \dots, X_n)-f(X_1, \dots, X_i, \dots, X_n)}{h} \end{align*}\]
Our model
\[y_i =\beta_0+\beta_1x_{i}+\epsilon_i\]
Finds coefficients \(\beta_0\) and \(\beta_1\) to minimize the sum of squared residuals, \(\hat{\epsilon}_i\):
\[\begin{aligned} \sum \hat{\epsilon_i}^2 &= \sum (y_i-\beta_0-\beta_1 x_{i})^2 \end{aligned}\]
We solve for \(\beta_0\) and \(\beta_1\), by taking the partial derivatives with respect to \(\beta_0\) and \(\beta_1\), and setting them equal to zero
\[\begin{aligned} \frac{\partial \sum \hat{\epsilon_i}^2}{\partial \beta_0} &= -2\sum (y_i-\beta_0-\beta_1 x_{i})=0 & f'(-x^2) = -2x\\ \frac{\partial \sum \hat{\epsilon_i}^2}{\partial\beta_1} &= -2\sum (y_i-\beta_0-\beta_1 x_{i})x_{i}=0 & \text{chain rule} \end{aligned}\]
First, we’ll solve for \(\beta_0\), by multiplying both sides by -1/2 and distributing the \(\sum\):
\[\begin{aligned} 0 &= -2\sum (y_i-\beta_0-\beta_1 x_{i})\\ \sum \beta_0 &= \sum y_i - \sum \beta_1 x_{i}\\ N \beta_0 &= \sum y_i -\sum \beta_1 x_{i}\\ \beta_0 &= \frac{\sum y_i}{N} - \frac{\beta_1 \sum x_{i}}{N}\\ \beta_0 &= \bar{y} - \beta_1 \bar{x} \end{aligned}\]
Now, we can solve for \(\beta_1\) plugging in \(\beta_0\).
\[\begin{aligned} 0 &= -2\sum [(y_i-\beta_0-\beta_1 x_{i})x_{i}]\\ 0 &= \sum [y_ix_i-(\bar{y} - \beta_1 \bar{x})x_{i}-\beta_1 x_{i}^2]\\ 0 &= \sum [y_ix_i-\bar{y}x_{i} + \beta_1 \bar{x}x_{i}-\beta_1 x_{i}^2] \end{aligned}\]
Now we’ll rearrange some terms and pull out an \(x_{i}\) to get
\[\begin{aligned} 0 &= \sum [(y_i -\bar{y} + \beta_1 \bar{x}-\beta_1 x_{i})x_{i}] \end{aligned}\]
Distributing the \(x_{i}\) and splitting the summation, we can isolate \(\beta_1\)
\[\begin{aligned} \beta_1 \sum (x_{i}-\bar{x})x_{i} &= \sum (y_i -\bar{y})x_{i} \end{aligned}\]
Dividing by \(\sum (x_{i}-\bar{x})x_{i}\) we get
\[\begin{aligned} \beta_1 &= \frac{\sum (y_i -\bar{y})x_{i}}{\sum (x_{i}-\bar{x})x_{i}} \end{aligned}\]
Finally, using the facts that \(\sum (y_i -\bar{y})x_{i} = \sum (y_i -\bar{y})(x_{i}-\bar{x})\) and \(\sum (x_{i}-\bar{x})x_{i} = \sum (x_{i}-\bar{x})^2\) (because deviations from a mean sum to zero), we get
\[\begin{aligned} \beta_1 &= \frac{\sum (y_i -\bar{y})(x_{i}-\bar{x})}{\sum (x_{i}-\bar{x})^2} \end{aligned}\]
Which has a nice interpretation:
\[\begin{aligned} \beta_1 &= \frac{Cov(x,y)}{Var(x)} \end{aligned}\]
So the coefficient in a simple linear regression of \(Y\) on \(X\) is simply the ratio of the covariance between \(X\) and \(Y\) over the variance of \(X\). Neat!
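You can check this numerically with the model from earlier (assuming the same df, and dropping missing values first):
d <- na.omit(df[, c("vf_age", "therm_trans_t0")])
cov(d$vf_age, d$therm_trans_t0) / var(d$vf_age)   # slope by the formula
coef(lm(therm_trans_t0 ~ vf_age, data = d))[2]    # slope from lm(); identical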
Timothy Lin provides a great overview of the various interpretations/motivations for linear regression.
A linear projection of \(y\) on the subspace spanned by \(X\beta\)
A linear approximation of the conditional expectation function
Of all the functions we could choose to describe the relationship between \(Y\) and \(X\),
\[ Y_i = f(X_i) + \epsilon_i \]
the conditional expectation of \(Y\) given \(X\) \((E[Y|X])\), has some appealing properties
\[ Y_i = E[Y_i|X_i] + \epsilon \]
The error, by definition, is uncorrelated with X and \(E[\epsilon|X]=0\)
\[ E[\epsilon|X] = E[Y - E[Y|X]|X]= E[Y|X] - E[Y|X] = 0 \]
Of all the possible functions \(g(X)\), we can show that \(E[Y_i|X_i]\) is the best predictor in terms of minimizing mean squared error
\[ E[(Y - g(X))^2] \geq E[(Y - E[Y|X])^2] \]
\[\textrm{Find }\hat{\beta_0},\,\hat{\beta_1} \text{ arg min}_{\beta_0, \beta_1} \sum (y_i-(\beta_0+\beta_1x_i))^2\]
In the 1800s, cholera was thought to be transmitted through the air.
John Snow (the physician, not the snack) set out to explore the origins of the disease, eventually concluding that cholera was transmitted through living organisms in the water.
He leveraged a natural experiment in which one water company in London moved its intake pipes further upstream (reducing contamination for Lambeth), while other companies kept the pumps serving Southwark and Vauxhall in the same location.
Let’s adopt a little notation to help us think about the logic of Snow’s design:
\(D\): treatment indicator, 1 for treated neighborhoods (Lambeth), 0 for control neighborhoods (Southwark and Vauxhall)
\(T\): period indicator, 1 if post treatment (1854), 0 if pre-treatment (1849).
\(Y_{di}(t)\) the potential outcome of unit \(i\)
\(Y_{1i}(t)\) the potential outcome of unit \(i\) when treated between the two periods
\(Y_{0i}(t)\) the potential outcome of unit \(i\) when control between the two periods
The individual causal effect for unit i at time t is:
\[\tau_{it} = Y_{1i}(t) − Y_{0i}(t)\]
What we observe is
\[Y_i(t) = Y_{0i}(t)\cdot(1 − D_i(t)) + Y_{1i}(t)\cdot D_i(t)\]
\(D\) only equals 1 when \(T\) equals 1, so we never observe \(Y_{0i}(1)\) for the treated units.
In words, we don’t know what Lambeth’s outcome would have been in the second period, had they not been treated.
Our goal is to estimate the average effect of treatment on treated (ATT):
\[\tau_{ATT} = E[Y_{1i}(1) - Y_{0i}(1)|D=1]\]
That is, what would have happened in Lambeth, had their water company not moved their pipes
Our goal is to estimate the average effect of treatment on treated (ATT):
What we can observe is:
Group | Pre-Period (T=0) | Post-Period (T=1) |
---|---|---|
Treated \(D_{i}=1\) | \(E[Y_{0i}(0)\vert D_i = 1]\) | \(E[Y_{1i}(1)\vert D_i = 1]\) |
Control \(D_i=0\) | \(E[Y_{0i}(0)\vert D_i = 0]\) | \(E[Y_{0i}(1)\vert D_i = 0]\) |
Because potential outcomes notation is abstract, let’s consider a modified description of Snow’s cholera death data from Scott Cunningham:
Company | 1849 (T=0) | 1854 (T=1) |
---|---|---|
Lambeth (D=1) | 85 | 19 |
Southwark and Vauxhall (D=0) | 135 | 147 |
Recall, our goal is to estimate the effect of the treatment on the treated:
\[\tau_{ATT} = E[Y_{1i}(1) - Y_{0i}(1)|D=1]\]
Let’s consider some strategies Snow could take to estimate this quantity:
Snow could have compared Lambeth in 1854 \((E[Y_i(1)|D_i = 1] = 19)\) to Lambeth in 1849 \((E[Y_i(0)|D_i = 1]=85)\), and claimed that moving the pumps upstream led to 66 fewer cholera deaths.
Assumes Lambeth’s pre-treatment outcomes in 1849 are a good proxy for what its outcomes would have been in 1854 if the pumps hadn’t moved \((E[Y_{0i}(1)|D_i = 1])\).
A skeptic might argue that Lambeth in 1849 \(\neq\) Lambeth in 1854
Company | 1849 (T=0) | 1854 (T=1) |
---|---|---|
Lambeth (D=1) | 85 | 19 |
Southwark and Vauxhall (D=0) | 135 | 147 |
Snow could have compared outcomes between Lambeth and S&V in 1854 (\(E[Y_i(1)|D_i = 1] - E[Y_i(1)|D_i = 0]\)), concluding that the change in pump locations led to 128 fewer deaths.
Here the assumption is that the outcomes in S&V in 1854 provide a good proxy for what would have happened in Lambeth in 1854 had the pumps not been moved \((E[Y_{0i}(1)|D_i = 1])\)
Again, our skeptic could argue Lambeth \(\neq\) S&V
Company | 1849 (T=0) | 1854 (T=1) |
---|---|---|
Lambeth (D=1) | 85 | 19 |
Southwark and Vauxhall (D=0) | 135 | 147 |
To address these concerns, Snow employed what we now call a difference-in-differences design.
There are two equivalent ways to view this design.
\[\underbrace{\{E[Y_{i}(1)|D_{i} = 1] - E[Y_{i}(1)|D_{i} = 0]\}}_{\text{1. Treated-Control | Post}} - \overbrace{\{E[Y_{i}(0)|D_{i} = 1] - E[Y_{i}(0)|D_{i}=0]\}}^{\text{2. Treated-Control | Pre}}\]
Difference 1: Average change between Treated and Control in Post Period
Difference 2: Average change between Treated and Control in Pre Period
\[\underbrace{\{E[Y_{i}(1)|D_{i} = 1] - E[Y_{i}(1)|D_{i} = 0]\}}_{\text{Treated-Control | Post}} - \overbrace{\{E[Y_{i}(0)|D_{i} = 1] - E[Y_{i}(0)|D_{i}=0]\}}^{\text{Treated-Control | Pre}}\] Is equivalent to:
\[\underbrace{\{E[Y_{i}(1)|D_{i} = 1] - E[Y_{i}(0)|D_{i} = 1]\}}_{\text{Post-Pre | Treated}} - \overbrace{\{E[Y_{i}(1)|D_{i} = 0] - E[Y_{i}(0)|D_{i}=0]\}}^{\text{Post-Pre | Control}}\]
You’ll see the DiD design represented both ways, but they produce the same result:
\[ \tau_{ATT} = (19-147) - (85-135) = -78 \]
\[ \tau_{ATT} = (19-85) - (147-135) = -78 \]
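In R, the arithmetic from the table is just:
# (Treated - Control in the post period) - (Treated - Control in the pre period)
(19 - 147) - (85 - 135)
# Equivalently: (Post - Pre for treated) - (Post - Pre for control)
(19 - 85) - (147 - 135)
# Both return -78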
The key assumption in this design is what’s known as the parallel trends assumption: \(E[Y_{0i}(1) − Y_{0i}(0)|D_i = 1] = E[Y_{0i}(1) − Y_{0i}(0)|D_i = 0]\)
A Difference-in-Differences (DiD, or diff-in-diff) design combines a pre-post comparison with a treated-control comparison
Taking the pre-post difference removes any fixed differences between the units
Then taking the difference between treated and control differences removes any common differences over time
The key identifying assumption of a DiD design is the “assumption of parallel trends”
Card and Krueger (1994): What effect did raising the minimum wage in NJ have on employment?
Abadie, Diamond, & Hainmueller (2014): What effect did German Unification have on economic development in West Germany?
Malesky, Nguyen and Tran (2014) How does decentralization influence public services?
POLS 1600