POLS 1600

Interpreting and Evaluating
Linear Models

Updated Apr 22, 2025

Overview

Goals

Regression models partition variation in an outcome into variation explained by the model and not explained by the model
Individual regression coefficients reflect the variation explained by that predictor, and only that predictor
Predicted values for regression models aid in substantive interpretation
Measures of model fit like $R^{2}$ can be useful for comparing different regression models
Difference-in-differences designs combine pre-post and treatment-control comparisons to make stronger causal claims.

What does it mean to “control for X”?

Statistical models
	Model 1	Model 2	Model 3
(Intercept)	0.57^***	0.57^***	0.57^***
	(0.05)	(0.05)	(0.04)
rep_voteshare_std	0.23^***	0.23^***	0.07
	(0.05)	(0.05)	(0.07)
med_age_std		0.03	-0.02
		(0.05)	(0.05)
med_income_std			-0.22^**
			(0.07)
R²	0.31	0.31	0.44
Adj. R²	0.29	0.28	0.40
Num. obs.	51	51	51
^***p < 0.001; ^**p < 0.01; ^*p < 0.05

Statistical models
	DV: Death
(Intercept)	0.57^***
	(0.05)
rep_voteshare_std	0.23^***
	(0.05)
R²	0.31
Adj. R²	0.29
Num. obs.	51
^***p < 0.001; ^**p < 0.01; ^*p < 0.05

Statistical models
	DV: Death	DV: Death	DV: Death	DV: Vote Share	DV: Res. Deaths
	Baseline	Multiple	Age	Vote Share	Deaths
(Intercept)	0.57^***	0.57^***	0.57^***	-0.00	-0.00
	(0.05)	(0.05)	(0.06)	(0.14)	(0.05)
rep_voteshare_std	0.23^***	0.23^***
	(0.05)	(0.05)
med_age_std		0.03	0.00	-0.12
		(0.05)	(0.06)	(0.14)
res_repvs_no_age					0.23^***
					(0.05)
R²	0.31	0.31	0.00	0.02	0.31
Adj. R²	0.29	0.28	-0.02	-0.00	0.30
Num. obs.	51	51	51	51	51
^***p < 0.001; ^**p < 0.01; ^*p < 0.05

Statistical models
	DV: Death	DV: Death	DV: Death	DV: Vote Share	DV: Res. Deaths
	Baseline	Full	No Rep	Vote Share	Deaths
(Intercept)	0.57^***	0.57^***	0.57^***	-0.00	0.00
	(0.05)	(0.04)	(0.04)	(0.10)	(0.04)
rep_voteshare_std	0.23^***	0.07
	(0.05)	(0.07)
med_age_std		-0.02	-0.03	-0.22^*
		(0.05)	(0.05)	(0.10)
med_income_std		-0.22^**	-0.27^***	-0.74^***
		(0.07)	(0.05)	(0.10)
res_repvs_no_age_income					0.07
					(0.07)
R²	0.31	0.44	0.43	0.55	0.02
Adj. R²	0.29	0.40	0.40	0.53	0.00
Num. obs.	51	51	51	51	51
^***p < 0.001; ^**p < 0.01; ^*p < 0.05
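
The last column of each table above reproduces a multiple regression coefficient by regressing residualized deaths on residualized vote share. A minimal sketch of that logic follows, using simulated data rather than `covid_lab`; the variable names and the data-generating values are assumptions made only for illustration.

```r
# Simulated stand-ins for the standardized predictors (hypothetical values)
set.seed(1600)
n      <- 51
age    <- rnorm(n)
income <- rnorm(n)
rep_vs <- -0.7 * income - 0.2 * age + rnorm(n, sd = 0.6)
deaths <- 0.57 + 0.07 * rep_vs - 0.02 * age - 0.22 * income + rnorm(n, sd = 0.3)

# Full model: the coefficient on rep_vs "controls for" age and income
m_full <- lm(deaths ~ rep_vs + age + income)

# Residualized regression: remove what age and income explain from both
# the outcome and the predictor, then regress residual on residual
res_y <- resid(lm(deaths ~ age + income))
res_x <- resid(lm(rep_vs ~ age + income))
m_res <- lm(res_y ~ res_x)

b_full <- unname(coef(m_full)["rep_vs"])
b_res  <- unname(coef(m_res)["res_x"])

# The two slopes agree up to floating point error
all.equal(b_full, b_res)
```

The equality holds for any data set, which is why "controlling for X" can be read as comparing the parts of the outcome and the predictor that X does not explain.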

Using regression to produce predicted values

Coefficients in a regression define a formula which produces a predicted value of the outcome $y$ when the predictors $X$ take particular values.

$\begin{aligned} y & = \overbrace{β_{0} + β_{1} x_{1} + β_{2} x_{2} + \dots + β_{j} x_{j}}^{\text{Predictors}} + \underbrace{ϵ}_{\text{Residuals}} \\ y & = β_{0} + β_{1} x_{rvs} + β_{2} x_{age} + β_{3} x_{inc} + ϵ & \text{m3} \\ y & = 0.57 + 0.07 x_{rvs} - 0.02 x_{age} - 0.22 x_{inc} + \hat{ϵ} & \text{estimated m3} \\ y & = 0.57 + 0.07 (-0.87) - 0.02 (0.62) - 0.22 (0.38) + \hat{ϵ} & \text{prediction for RI} \\ \overbrace{0.22}^{\text{Observed}} & = \underbrace{0.41}_{\text{Predicted}} + \overbrace{(-0.19)}^{\text{Residual}} \end{aligned}$
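
The Rhode Island calculation can be reproduced by hand. The sketch below types in the unrounded estimates that `summary(m3)` reports for this model, together with the standardized Rhode Island values used in the equation; the object names are illustrative, and in practice `predict()` on the fitted model would do the same job.

```r
# Estimates for m3 as printed by summary(m3): intercept, vote share, age, income
b <- c(0.56561, 0.07140, -0.01692, -0.21669)

# Rhode Island's values: 1 for the intercept, then the standardized predictors
x_ri <- c(1, -0.87, 0.62, 0.38)

# Predicted value: the sum of each coefficient times its predictor value
y_hat_ri <- sum(b * x_ri)
round(y_hat_ri, 2)      # 0.41

# Residual: observed minus predicted
resid_ri <- 0.22 - y_hat_ri
round(resid_ri, 2)      # -0.19
```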

Statistical models
	Model 1
(Intercept)	6.194^**
	(2.186)
percent_vaccinated	-0.169^*
	(0.077)
percent_vaccinated^2	0.001
	(0.001)
rep_voteshare_std	-0.062
	(0.081)
med_age_std	0.053
	(0.053)
med_income_std	-0.114
	(0.068)
R²	0.561
Adj. R²	0.512
Num. obs.	51
^***p < 0.001; ^**p < 0.01; ^*p < 0.05
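
Because this model includes both `percent_vaccinated` and its square, the estimated association between vaccination and deaths is not a single number but depends on the vaccination level. A short derivation, holding the other predictors fixed and plugging in the rounded estimates from the table (so the numeric version is only approximate; the subscript $vax$ is shorthand introduced here):

```latex
\begin{aligned}
\hat{y} &= \hat{\beta}_0 + \hat{\beta}_1 x_{vax} + \hat{\beta}_2 x_{vax}^2 + \dots \\
\frac{\partial \hat{y}}{\partial x_{vax}} &= \hat{\beta}_1 + 2 \hat{\beta}_2 x_{vax}
  \approx -0.169 + 2 (0.001)\, x_{vax}
\end{aligned}
```

The squared term's coefficient is reported to only one significant digit, so quantities that divide by it, such as the vaccination level where the slope reaches zero, should be computed from the unrounded estimates.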

Calculating $R^{2}$ in R

We could do it by hand, finding that our model explains about 44 percent of the observed variation in deaths.

# ESS / TSS
var(m3$fitted.values)/var(m3$model$new_deaths_pc_14day)

[1] 0.4393655

# 1 - RSS/TSS
1 - var(m3$residuals)/var(m3$model$new_deaths_pc_14day)

[1] 0.4393655

But generally we let the summary() function do it for us:

summary(m3)


Call:
lm(formula = new_deaths_pc_14day ~ rep_voteshare_std + med_age_std + 
    med_income_std, data = covid_lab)

Residuals:
     Min       1Q   Median       3Q      Max 
-0.50751 -0.19703 -0.06278  0.20024  0.92320 

Coefficients:
                  Estimate Std. Error t value Pr(>|t|)    
(Intercept)        0.56561    0.04425  12.782  < 2e-16 ***
rep_voteshare_std  0.07140    0.06654   1.073  0.28869    
med_age_std       -0.01692    0.04744  -0.357  0.72296    
med_income_std    -0.21669    0.06660  -3.254  0.00211 ** 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.316 on 47 degrees of freedom
Multiple R-squared:  0.4394,    Adjusted R-squared:  0.4036 
F-statistic: 12.28 on 3 and 47 DF,  p-value: 4.689e-06
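
Both hand calculations rest on the same identity: the total sum of squares splits into an explained part and a residual part. Since `covid_lab` is not bundled with R, the sketch below checks the identity on the built-in `mtcars` data instead; the model itself is arbitrary and chosen only for illustration.

```r
# Any linear model with an intercept will do; mtcars ships with R
m <- lm(mpg ~ wt + hp, data = mtcars)
y <- mtcars$mpg

tss <- sum((y - mean(y))^2)           # total variation in the outcome
ess <- sum((fitted(m) - mean(y))^2)   # variation explained by the model
rss <- sum(resid(m)^2)                # variation left in the residuals

# The partition: TSS = ESS + RSS
all.equal(tss, ess + rss)

# Two equivalent routes to R^2, both matching summary()
r2_ess     <- ess / tss
r2_rss     <- 1 - rss / tss
r2_summary <- summary(m)$r.squared
c(r2_ess, r2_rss, r2_summary)
```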

Adjusted $R^{2}$

One can show that a model’s $R^{2}$ never decreases as we add predictors, even when they’re unrelated to the outcome
The adjusted $R^{2}$ corrects for this by penalizing the $R^{2}$ of a model for the number of coefficients it estimates

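$\text{adj. } R^{2} = 1 - \frac{RSS / (n - k)}{TSS / (n - 1)}$

where $n$ is the number of observations and $k$ is the number of estimated coefficients, including the intercept.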
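
As a quick check, the formula reproduces the adjusted $R^{2}$ that `summary(m3)` reports. The sketch assumes $k$ counts every estimated coefficient including the intercept (so $k = 4$ for m3, matching its 47 residual degrees of freedom) and uses the fact that $RSS/TSS = 1 - R^{2}$.

```r
r2 <- 0.4393655   # R^2 for m3, from the by-hand calculation
n  <- 51          # observations
k  <- 4           # intercept + three predictors

# 1 - [RSS/(n - k)] / [TSS/(n - 1)], rewritten using RSS/TSS = 1 - R^2
adj_r2 <- 1 - (1 - r2) * (n - 1) / (n - k)
round(adj_r2, 4)  # 0.4036
```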

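# Packages this example relies on
library(dplyr)    # %>% and bind_cols()
library(purrr)    # map() and map_df()
library(broom)    # glance()
library(ggplot2)  # plotting

set.seed(1600)    # make the simulated noise reproducible

# An outcome y and 100 predictors, all independent random noise
ex_df <- data.frame(
  y = rnorm(100) 
  ) %>%
    bind_cols(
      data.frame(matrix(rnorm(10000), ncol=100))
    ) %>% janitor::clean_names()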


the_formulas <- list()
for(i in 2:51){
  vars <- names(ex_df)[2:i]
  the_formulas[[i-1]] <- paste("y~",paste(vars,collapse = "+"))
}

the_formulas %>% 
  purrr::map(as.formula) %>% 
  purrr::map(lm, data=ex_df) %>% 
  purrr::map(summary) %>% 
  purrr::map_df(glance) -> r2_df

r2_df %>% 
  ggplot(aes(df, r.squared))+
  geom_point(aes(col = "R^2"))+
  geom_line()+
  geom_point(aes(y=adj.r.squared,col = "Adjusted R^2"))+
  geom_line(aes(y=adj.r.squared))+
  labs(
    x = "Number of predictors",
    y = "Proportion of Variance Explained",
    title = "Adding unrelated predictors increases a model's R^2\nwhile the Adjusted R^2 provides a better indicator of poor fit ",
    col ="Model fit"
  ) -> fig_r2

Using $R^{2}$ to compare models

When models are nested (the larger model contains all the predictors of the smaller model), we can ask: do the additional predictors in the larger model explain more variation in the outcome than we would expect from adding the same number of random, unrelated variables?

Formally we call this process an Analysis of Variance (ANOVA)

Let’s assess the added predictive power of I(percent_vaccinated^2) by estimating a model without it and comparing models using ANOVA

# Estimate model without polynomial
m5 <- lm(
  new_deaths_pc_14day ~ percent_vaccinated + rep_voteshare_std + med_age_std + med_income_std,
  data = covid_lab
)

Statistical models
	Model 1	Model 2
(Intercept)	6.194^**	2.532^***
	(2.186)	(0.657)
percent_vaccinated	-0.169^*	-0.035^**
	(0.077)	(0.012)
percent_vaccinated^2	0.001
	(0.001)
rep_voteshare_std	-0.062	-0.089
	(0.081)	(0.082)
med_age_std	0.053	0.071
	(0.053)	(0.053)
med_income_std	-0.114	-0.119
	(0.068)	(0.070)
R²	0.561	0.531
Adj. R²	0.512	0.490
Num. obs.	51	51
^***p < 0.001; ^**p < 0.01; ^*p < 0.05

The ANOVA suggests that including the squared term provides only a marginal improvement in fit (F = 3.07, p = 0.086): significant at the 0.10 level but not at the conventional 0.05 level

anova(m5, m4)

Analysis of Variance Table

Model 1: new_deaths_pc_14day ~ percent_vaccinated + rep_voteshare_std + 
    med_age_std + med_income_std
Model 2: new_deaths_pc_14day ~ percent_vaccinated + I(percent_vaccinated^2) + 
    rep_voteshare_std + med_age_std + med_income_std
  Res.Df    RSS Df Sum of Sq      F  Pr(>F)  
1     46 3.9268                              
2     45 3.6758  1   0.25098 3.0725 0.08644 .
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
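
The F statistic in this table can be rebuilt from its own entries: the drop in the residual sum of squares per added predictor, divided by the larger model's residual variance. A minimal sketch, typing in the printed values (object names are illustrative):

```r
ss_added <- 0.25098   # "Sum of Sq": reduction in RSS from adding the squared term
rss_full <- 3.6758    # RSS of the larger model (Model 2)
q        <- 1         # number of added predictors ("Df")
df_full  <- 45        # residual degrees of freedom of the larger model

# F = (reduction in RSS per added predictor) / (residual variance of larger model)
f_stat <- (ss_added / q) / (rss_full / df_full)

# Upper-tail probability from the F distribution with (q, df_full) df
p_val <- pf(f_stat, df1 = q, df2 = df_full, lower.tail = FALSE)

round(f_stat, 2)   # about 3.07, as in the anova() table
round(p_val, 3)    # about 0.086
```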

Standardized vs Non Standardized Predictors

Why are we using standardized predictors?

Standardizing variables is a common transformation:

$z_{i} = \frac{x_{i} - μ_{x}}{σ_{x}}$

When variables are measured on very different scales or units (e.g. age in years, income in dollars), standardized (or normalized) versions rescale them to unitless measures that all have:

a mean of zero
a standard deviation of 1
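
A minimal sketch of standardizing by hand, using a simulated income variable (the values and names are hypothetical, not taken from `covid_lab`). It also shows that standardizing a predictor rescales its coefficient by the predictor's standard deviation without changing the fit.

```r
set.seed(1600)
n          <- 51
med_income <- rnorm(n, mean = 60000, sd = 10000)           # dollars (simulated)
y          <- 1 - 1e-05 * med_income + rnorm(n, sd = 0.3)  # simulated outcome

# z-score: subtract the mean, divide by the standard deviation
med_income_std <- (med_income - mean(med_income)) / sd(med_income)

# Base R's scale() does the same thing
all.equal(med_income_std, as.numeric(scale(med_income)))

# Same model, two scalings of the predictor
m_raw <- lm(y ~ med_income)
m_std <- lm(y ~ med_income_std)
b_raw <- unname(coef(m_raw)[2])
b_std <- unname(coef(m_std)[2])

# Raw slope times sd(x) equals the standardized slope; R^2 is unchanged
all.equal(b_raw * sd(med_income), b_std)
all.equal(summary(m_raw)$r.squared, summary(m_std)$r.squared)
```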

# Estimate model  with unstandardized predictors
m6 <- lm(
  new_deaths_pc_14day ~ percent_vaccinated + rep_voteshare + med_age + med_income,
  data = covid_lab
)

Statistical models
	Model 1	Model 2
(Intercept)	2.532^***	2.470^*
	(0.657)	(1.063)
percent_vaccinated	-0.035^**	-0.035^**
	(0.012)	(0.012)
rep_voteshare_std	-0.089
	(0.082)
med_age_std	0.071
	(0.053)
med_income_std	-0.119
	(0.070)
rep_voteshare		-0.007
		(0.007)
med_age		0.029
		(0.022)
med_income		-0.000
		(0.000)
R²	0.531	0.531
Adj. R²	0.490	0.490
Num. obs.	51	51
^***p < 0.001; ^**p < 0.01; ^*p < 0.05

When should you standardize variables? It depends on what you’re trying to do.

It’s the same information, just rescaled:

coef(m5)[5] # Standardized coef

med_income_std 
    -0.1188812

coef(m6)[5] # Unstandardized

  med_income 
-1.10947e-05

coef(m6)[5]*sd(covid_lab$med_income) # Same as standardized

med_income 
-0.1188812

Can facilitate comparison and estimation
Might make interpretation easier (but the onus is on you to describe your models well)
- Don’t standardize binary predictors

Notation

Let’s adopt a little notation to help us think about the logic of Snow’s design:

$D$ : treatment indicator, 1 for treated neighborhoods (Lambeth), 0 for control neighborhoods (Southwark and Vauxhall)
$T$ : period indicator, 1 if post treatment (1854), 0 if pre-treatment (1849).
$Y_{d i} (t)$ : the potential outcome of unit $i$ in period $t$ under treatment status $d$
- $Y_{1 i} (t)$ the potential outcome of unit $i$ in period $t$ if treated between the two periods
- $Y_{0 i} (t)$ the potential outcome of unit $i$ in period $t$ if left untreated (control) between the two periods

	Pre-Period (T=0)	Post-Period (T=1)
Treated $D_{i} = 1$	$E [Y_{0 i} (0) | D_{i} = 1]$	$E [Y_{1 i} (1) | D_{i} = 1]$
Control $D_{i} = 0$	$E [Y_{0 i} (0) | D_{i} = 0]$	$E [Y_{0 i} (1) | D_{i} = 0]$

Before vs after comparisons:

Snow could have compared Lambeth in 1854 $(E [Y_{i} (1) | D_{i} = 1] = 19)$ to Lambeth in 1849 $(E [Y_{i} (0) | D_{i} = 1] = 85)$ , and claimed that moving the pumps upstream led to 66 fewer cholera deaths.
This assumes Lambeth’s pre-treatment outcomes in 1849 are a good proxy for what its outcomes would have been in 1854 if the pumps hadn’t moved $(E [Y_{0 i} (1) | D_{i} = 1])$ .
A skeptic might argue that Lambeth in 1849 $\neq$ Lambeth in 1854

Company	1849 (T=0)	1854 (T=1)
Lambeth (D=1)	85	19
Southwark and Vauxhall (D=0)	135	147

Treatment-Control comparisons in the Post Period.

Snow could have compared outcomes between Lambeth and S&V in 1854 ( $E [Y_{i} (1) | D_{i} = 1] - E [Y_{i} (1) | D_{i} = 0]$ ), concluding that the change in pump locations led to 128 fewer deaths.
Here the assumption is that the outcomes in S&V in 1854 provide a good proxy for what would have happened in Lambeth in 1854 had the pumps not been moved $(E [Y_{0 i} (1) | D_{i} = 1])$
Again, our skeptic could argue Lambeth $\neq$ S&V

Company	1849 (T=0)	1854 (T=1)
Lambeth (D=1)	85	19
Southwark and Vauxhall (D=0)	135	147

Difference in Differences

To address these concerns, Snow employed what we now call a difference-in-differences design.

There are two equivalent ways to view this design.

$\underbrace{\left(E [Y_{i} (1) | D_{i} = 1] - E [Y_{i} (1) | D_{i} = 0]\right)}_{\text{1. Treated-Control | Post}} - \overbrace{\left(E [Y_{i} (0) | D_{i} = 1] - E [Y_{i} (0) | D_{i} = 0]\right)}^{\text{2. Treated-Control | Pre}}$

Difference 1: Average difference between Treated and Control in the Post Period
Difference 2: Average difference between Treated and Control in the Pre Period
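
Both views of the design, along with the two simpler comparisons discussed earlier, can be checked directly from the four death counts in the table above. A minimal sketch (object names are illustrative):

```r
# Deaths by company (rows) and year (columns), from the table above
deaths <- matrix(
  c( 85,  19,
    135, 147),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("Lambeth", "S&V"), c("1849", "1854"))
)

# Naive comparisons
before_after       <- deaths["Lambeth", "1854"] - deaths["Lambeth", "1849"]  # -66
treat_control_post <- deaths["Lambeth", "1854"] - deaths["S&V", "1854"]      # -128

# View 1: (Treated - Control in Post) - (Treated - Control in Pre)
treat_control_pre <- deaths["Lambeth", "1849"] - deaths["S&V", "1849"]       # -50
did_1 <- treat_control_post - treat_control_pre                              # -78

# View 2: (Post - Pre for Treated) - (Post - Pre for Control)
control_change <- deaths["S&V", "1854"] - deaths["S&V", "1849"]              # 12
did_2 <- before_after - control_change                                       # -78

c(did_1 = did_1, did_2 = did_2)
```

Either way of differencing gives the same estimate, which is smaller in magnitude than the post-period treated-control gap and larger than the simple before-after change.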

Using linear regression to estimate a Difference in Difference

Recall that linear regression provides a linear estimate of the conditional expectation function.
In the canonical pre-post, treated and control DiD, $β_{3}$ from the following linear regression will give us the ATT:

$y = β_{0} + β_{1} \text{Post} + β_{2} \text{Treated} + \underbrace{β_{3} \text{Post} \times \text{Treated}}_{τ_{ATT}}$

library(tibble) # provides tibble()

# One row per company-period cell of Snow's table
cholera_df <- tibble(
  Period = factor(c("Pre","Pre","Post","Post"),
                  levels = c("Pre","Post")),
  Year = c(1849,1849, 1854,1854),
  Treated = factor(c("Control","Treated","Control","Treated")),
  Company = c("S&V","Lambeth","S&V","Lambeth"),
  Deaths = c(135,85,147,19)
)

# The Period:Treated interaction is the difference-in-differences estimate
m_did <- lm(Deaths~Period + Treated + Period:Treated, cholera_df)

m_did


Call:
lm(formula = Deaths ~ Period + Treated + Period:Treated, data = cholera_df)

Coefficients:
              (Intercept)                 PeriodPost  
                      135                         12  
           TreatedTreated  PeriodPost:TreatedTreated  
                      -50                        -78

Statistical models
	Model 1
(Intercept)	135.00
Post (1854)	12.00
Treated (Lambeth)	-50.00
Post X Treated (DID)	-78.00
R²	1.00
Adj. R²
Num. obs.	4
^***p < 0.001; ^**p < 0.01; ^*p < 0.05
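
With four observations and four coefficients the model is saturated, so the coefficients simply re-express the four cell counts (hence the $R^{2}$ of 1). A minimal base R sketch with 0/1 indicators instead of factors; the variable names are illustrative:

```r
# Same four cells, coded with numeric indicators
cholera <- data.frame(
  post    = c(0, 0, 1, 1),
  treated = c(0, 1, 0, 1),
  deaths  = c(135, 85, 147, 19)
)

# post * treated expands to post + treated + post:treated
m <- lm(deaths ~ post * treated, data = cholera)
b <- coef(m)   # order: (Intercept), post, treated, post:treated

# Each cell is a sum of the relevant coefficients
cell_means <- c(
  control_pre  = b[[1]],
  control_post = b[[1]] + b[[2]],
  treated_pre  = b[[1]] + b[[3]],
  treated_post = b[[1]] + b[[2]] + b[[3]] + b[[4]]
)
cell_means     # 135, 147, 85, 19

did <- b[["post:treated"]]
did            # -78
```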

Generalizing Diff-in-Diff with Linear Regression

Linear regression allows us to generalize Diff-in-Diff to multiple periods and treatment interventions, using fixed effects

$y_{it} = \overbrace{α_{i}}^{\text{Unit FE}} + \underbrace{γ_{t}}_{\text{Period FE}} + \overbrace{τ d_{it}}^{\text{Treatment}} + \underbrace{X β}_{\text{Covariates}} + ϵ_{it}$

Unit fixed effects $(α_{i})$ control for time-invariant differences across units
Period fixed effects $(γ_{t})$ control for unit-invariant differences across periods
$τ$ corresponds to the difference-in-differences estimate in a two-way fixed effects regression
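
A minimal sketch of a two-way fixed effects regression, using a simulated panel in which half the units are treated from period 3 onward. Everything here is an assumption made for illustration: the panel dimensions, the true effect of 2, and the use of `lm()` with `factor()` dummies (dedicated fixed effects packages are common in applied work). There are no covariates, and all treated units start treatment at the same time.

```r
set.seed(1600)
n_units   <- 20
n_periods <- 5
tau       <- 2    # true treatment effect (chosen for the simulation)

# One row per unit-period
panel <- expand.grid(unit = 1:n_units, period = 1:n_periods)

# Units 1-10 are treated from period 3 onward; d is the treatment indicator d_it
panel$treated_group <- as.integer(panel$unit <= 10)
panel$d <- panel$treated_group * as.integer(panel$period >= 3)

# Outcome = unit effect + period effect + treatment effect + noise
alpha <- rnorm(n_units)
gamma <- rnorm(n_periods)
panel$y <- alpha[panel$unit] + gamma[panel$period] + tau * panel$d +
  rnorm(nrow(panel), sd = 0.1)

# Two-way fixed effects: a dummy for every unit and every period
m_twfe  <- lm(y ~ d + factor(unit) + factor(period), data = panel)
tau_hat <- unname(coef(m_twfe)["d"])
tau_hat   # close to the true effect of 2
```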
