Comments for Lab 10 - Exploring Questions from the 2024 National Elections Pilot Study

Author

Your Group Members Names Here

Published

April 17, 2025

Overview

In this lab, we’ll continue exploring data from the National Election Studies 2024 Pilot Study.

From last week’s lab you should have identified some variables of interest in the data, recoded them as needed, and begun to formulate some research questions.

In this week’s lab, we’ll pick up where we left off:

Set up your work space (5 minutes)
Load the data from last week (5 minutes)
Quickly summarize the data (15)
Revisit and revise your research question and expectations (15 minutes)
Estimate models to explore your question (10 minutes)
Interpret the results of your model (30 minutes)

One of these 6 tasks will be randomly selected as the graded question for the lab.

set.seed(11042024)
graded_question <- sample(1:6,size = 1)
paste("Question",graded_question,"is the graded question for this week")

[1] "Question 6 is the graded question for this week"

Show entries

Search:

	Group	1	2	3	4
1	Group 1	Gurpartap Singh	Niyah Whitten	Marisa Coghlin	Aidan Choi
2	Group 2	Sylvie Watts	Amani Diallo	Sinclair Harris	Peter Zucker
3	Group 3	Joshua Okwaning	Eiffel Sunga	Jenna Hercher	Ella Hendelman
4	Group 4	Jacqueline Zhang	Charlie Jeffers	Meleah Neely	Nene Mokonchu
5	Group 5	Talib Reddick	Zoe Kaufman	Jenna Lowry	Harry Laferriere
6	Group 6	Tess Naquet-Radiguet	Anisha Kumar	Nash Riebe	Sophia Wotman
7	Group 7	Ben Buka	Ianthe Ince	Riya Srinivasan	Kaelah Kimura

Showing 1 to 7 of 7 entries

Previous1Next

Goals

This week’s lab will give you practice:

Loading libraries needed to do your analysis (Q1)
Loading data you’ve already recoded (Q2)
Summarizing data (Q3)
Translating research questions into empirical expectations
Estimating linear models (Q5)
Interpreting the substantive and statistical significance of these models (Q6)

Workflow

Please render this .qmd file

As with every lab, you should:

Download the file
Save it in your course folder
Update the author: section of the YAML header to include the names of your group members in attendance.
Render the document
Open the html file in your browser (Easier to read)
Write your code in the chunks provided under each section
Comment out or delete any test code you do not need
Render the document again after completing a section or chunk (Error checking)
Upload the final lab to Canvas.

1 Set up your workspace

Please load the libraries you will use in your analysis.

1.1 Load Packages

# Libraries
library(tidyverse)
library(haven)
library(labelled)
library(kableExtra)
library(texreg)
library(ggpubr)
library(patchwork)
library(DeclareDesign)

1.2 Set your working directory in R Studio

In R studio set your working directory to the folder where this lab is saved by clicking > Session > Set Working Directory > To Source File Location

After doing so uncomment getwd() Should print out something like

“~/Desktop/pols1600/labs/”

Depending on where your lab is saved

# In the Top Panel of RStudio Click
# Session > Set Working Directory > Source File Location

# Uncomment to Check Where Your File is Saved 
# getwd()

Note

If getwd() says something like ‘~/Downloads/’ click: “File > Save As” and save this lab in your course folder. Then close the version 09-lab.qmd that was opened from your Downloads folder and open the version of 09-lab.qmd that now exists in your course folder.

2 Load the data from last week.

Please uncomment and if needed, update, the code below to load the data from last week

# load("nes24.rda")

Note

You’ll need to have saved the data from last week’s lab to the same folder where this lab has been saved. If you didn’t save the lab data from last week, you can load a version of the recoded data here:

# load(url("https://pols1600.paultesta.org/files/data/nes24.rda"))

3 Summarize the data

Use the code from last weeks lab to produce a table of summary statistics.

Please update the code to include

Your specific outcome and predictor variables
a column for the standard deviation of each variable between the Max and N Missing columns

3.1 Caclulate summary statistics for recoded variables

# Vector of numeric variables to summarize
the_vars <- c(
  "dv_vote_trump2024", 
  "dv_participation",
  "age","education","income",
  "is_white","is_black","is_hispanic","is_asian","is_other",
  "partyid","is_dem","is_rep","is_ind"
)

# Vector of nicely formatted labels for variables
the_labels <- c(
  "Vote for Trump in '24",
  "Acts of Participation in `20",
  "Age","Education", "Income",
  "White", "Black","Hispanic","Asian","Other",
  "Party ID", "Democrat","Republican","Independent"
)

df_summary <- df %>% 
  select(all_of(the_vars)) %>%
  rename_with(~the_labels) %>% 
  pivot_longer(
    cols = everything(),
    names_to = "Variable"
  ) %>% 
  mutate(
    Variable = factor(Variable, levels = the_labels)
  ) %>% 
  group_by(Variable) %>% 
  summarise(
    Min = min(value,na.rm = T),
    p25 = quantile(value, prob = .25,na.rm = T),
    Median = quantile(value, prob = .5,na.rm = T),
    Mean = mean(value, na.rm = T),
    p75 = quantile(value, prob = .75,na.rm = T),
    Max = max(value,na.rm = T),
    SD = sd(value, na.rm=T),
    `N missing` = sum(is.na(value))
  )

3.2 Display the results as a formated table

kable(df_summary,
      digits = 2) %>% 
  kable_styling() %>% 
  pack_rows("Outcomes", start_row = 1, end_row = 2) %>% 
  pack_rows("Demographic Predictors", 3, 10) %>% 
  pack_rows("Political Predictors", 11, 14)

Variable	Min	p25	Median	Mean	p75	Max	SD	N missing
Outcomes
Vote for Trump in '24	0	0	1	0.50	1.00	1	0.50	174
Acts of Participation in `20	0	0	0	0.93	1.00	5	1.31	0
Demographic Predictors
Age	18	33	51	49.37	63.75	94	17.74	31
Education	1	2	3	3.51	5.00	6	1.51	0
Income	1	3	6	6.43	9.00	16	3.55	222
White	0	0	1	0.67	1.00	1	0.47	0
Black	0	0	0	0.13	0.00	1	0.33	0
Hispanic	0	0	0	0.13	0.00	1	0.33	0
Asian	0	0	0	0.02	0.00	1	0.15	0
Other	0	0	0	0.01	0.00	1	0.09	0
Political Predictors
Party ID	1	2	4	3.88	6.00	7	2.15	58
Democrat	0	0	0	0.43	1.00	1	0.49	58
Republican	0	0	0	0.38	1.00	1	0.48	58
Independent	0	0	0	0.20	0.00	1	0.40	58

3.3 Interpret the results

Please use these tables to describe a typical respondent to the 2024 NES Pilot Study:

The National Election Study’s 2024 Pilot Study contains responses from 1909 individuals. The typical respondent in the data was just under 50 years old, with some college, with an income in the range of $50k-$59k. Approximately two-thirds of the sample identified as white, with 13 percent of respondents identifying as Black, 13 percent as Hispanic, 2 percent as Asian. Forty-three percent of respondents identified as Democrats, 38 percent as Republicans, and 20 percent as Independents. The respondents were evenly split in who they would vote for 2024, with 50 percent saying they would Vote for Donald Trump, and 50 percent saying Joe Biden. In the 2020 campaign, the average respondent reported engaging in about 1 act of political participation.

4 Formulate a research question and expectations

Please take a moment, revisit and revise your research questions developed in last weeks lab that contains the following:

A clear research question
A set of expectations and potential alternative explanations
The formula for at least two linear models to test these expectations

4.1 Research question

ME: How does support for Trump in the 2024 election vary with age and race?

YOU:

4.2 Expectations

ME: On average, I expect that older voters will be more supportive of Trump, but suspect that this trend varies by race. I expect it will be particularly true among White voters, but less so among people of color. Since, partisanship is likely to be strong predictor of vote choice, I will explore whether these specific relationships hold, once we control for variations in partisan identification which we know varies both by age and race.

YOU:

4.3 Linear Model

ME:

I will estimate the following models:

$y = β_{0} + β_{1} a g e + \sum β_{k} r a c e_{k} + ϵ$ If my expectations hold, I expect the coefficient on age in this model to be positive, indicating older voters are more likely to vote for Trump. White respondents are the reference category in this model¹ and so the coefficients on the racial indicators ( $β_{k} r a c e$ ) correspond to how members of each racial group differ from white respondents in their propensity to vote for Trump. I expect all of these coefficients to be negative.

To explore whether the relationship between age and vote choice varies by race, I will fit an interaction model:

$y = β_{0} + β_{1} a g e + \sum β_{k} r a c e_{k} + ϵ + \sum β_{j k} a g e \times r a c e_{k} + ϵ$ This model allows the relationship between age and vote choice to vary by race. For white respondents, the relationship is described by $β_{1}$ . Again I expect it to be positive, suggesting older white voters are more likely to vote for trump. For racial minorities, the marginal effect of age for the racial or ethnic group $k$ is described by $β_{1} + β_{j k}$ . In general, I expect that the coefficients on $β_{j k}$ to be negative such that older racial minorities are less likely to support Trump than older white respondents.

Finally, I will estimate a model that controls for partisanship. If the relationships between age, and race and vote choice are simply a reflection of differences in partisan identification, then coefficients on these predictors should no longer be statistically significant, while the coefficient on partisanship should positive and statistically significant.

$y = β_{0} + β_{1} p i d + β_{2} a g e + \sum β_{k} r a c e_{k} + ϵ + \sum β_{j k} a g e \times r a c e_{k} ϵ$

You:

We will estimate the following models:

$y = β_{0} + β_{1} . . .$

5 Estimate your models

Below I estimate the three models described above

# Fit models
m1 <- lm_robust(dv_vote_trump2024 ~ age + race_5cat, df)
m2 <- lm_robust(dv_vote_trump2024 ~ age*race_5cat, df)
m3 <- lm_robust(dv_vote_trump2024 ~ age*race_5cat + partyid, df)

6 Interpret your results

Please produce

A regression table presenting of your model
At least one figure showing predicted values from one model

Interpret your results using both confidence intervals and hypothesis tests to assess the statistical significance of your claims and predicted values to help interpret the substantive significance of your results

6.1 Regression Table

htmlreg(list(m1, m2, m3),
        custom.model.names = c(
          "Baseline", "Interaction", "Alternative"
        ),
        caption = "Support for Trump in 2024",
        caption.above = T,
        custom.coef.names = c(
          "(Intercept)",
          "Age",
          "Black",
          "Hispanic",
          "Asian",
          "Other",
          "Age:Black",
          "Age:Hispanic",
          "Age:Asian",
          "Age:Other",
          "Party ID"
        ) ,
        include.ci = F,
        digits = 3)

Support for Trump in 2024
	Baseline	Interaction	Alternative
(Intercept)	0.449^***	0.361^***	-0.160^***
	(0.038)	(0.045)	(0.032)
Age	0.002^**	0.004^***	-0.000
	(0.001)	(0.001)	(0.001)
Black	-0.280^***	0.213^*	-0.038
	(0.034)	(0.100)	(0.083)
Hispanic	-0.089^*	0.045	-0.052
	(0.038)	(0.101)	(0.073)
Asian	-0.238^**	0.007	-0.233
	(0.076)	(0.236)	(0.136)
Other	-0.031	-0.035	0.125
	(0.052)	(0.152)	(0.108)
Age:Black		-0.011^***	-0.000
		(0.002)	(0.002)
Age:Hispanic		-0.003	0.001
		(0.002)	(0.001)
Age:Asian		-0.005	0.003
		(0.005)	(0.003)
Age:Other		0.000	-0.002
		(0.003)	(0.002)
Party ID			0.173^***
			(0.003)
R²	0.047	0.061	0.569
Adj. R²	0.044	0.056	0.566
Num. obs.	1735	1735	1691
RMSE	0.489	0.486	0.329
^*p < 0.001; ^p < 0.01; ^*p < 0.05

6.2 Visualize predicted values

# Predictors for m2
pred_dfm2 <- expand_grid(
  age = seq(min(df$age, na.rm =T), max(df$age, na.rm = T)),
  race_5cat = sort(unique(df$race_5cat))
)

# Predictions with confidence intervals
pred_dfm2 <- cbind(
  pred_dfm2,
  predict(m2, newdata = pred_dfm2, interval = "confidence")$fit
)

# Plot predictions from m2

fig_m2 <- pred_dfm2 %>% 
  ggplot(aes(age, fit, col = race_5cat))+
  geom_ribbon(aes(ymin = lwr, ymax = upr, 
    fill = race_5cat
    ),
    alpha = .5)+
  geom_line()+
  facet_wrap( ~ race_5cat)+
  theme_minimal()+
  guides(
    col = "none",
    fill = "none"
  )+
  labs(
    y = "Predicted Vote Choice",
    title = "Support for Trump by Age and Race"
  )

# Display figure
fig_m2

# Predictors for m3
pred_dfm3 <- expand_grid(
  age = seq(min(df$age, na.rm =T), max(df$age, na.rm = T)),
  race_5cat = sort(unique(df$race_5cat)),
  partyid = mean(df$partyid, na.rm=T)
)

# Predictions with confidence intervals
pred_dfm3 <- cbind(
  pred_dfm3,
  predict(m3, newdata = pred_dfm3, interval = "confidence")$fit
)

# Plot predictions from m2

fig_m3 <- pred_dfm3 %>% 
  ggplot(aes(age, fit, col = race_5cat))+
  geom_ribbon(aes(ymin = lwr, ymax = upr, 
    fill = race_5cat
    ),
    alpha = .5)+
  geom_line()+
  facet_wrap( ~ race_5cat)+
  theme_minimal()+
  guides(
    col = "none",
    fill = "none"
  )+
  labs(
    y = "Predicted Vote Choice",
    title = "Support for Trump by Age and Race Controlling for Partisanship"
  )

fig_m3

6.3 Interperet results

ME: Our baseline model confirms our initial expectations. Controlling for race, older respondents have higher predicted levels of support for Trump by 0.002 percentage points. Controlling for race, the model predicts that a 60 year old respondent is about 6.3 percentage points more likely to vote for Trump than a 30 year-old respondent. The test statistic for this coefficient is 3.10 corresponding to a p-value < 0.05, suggesting that if there were no relationship between age and vote choice in this model, it would be very unlikely that we observed a test statistic of this magnitude. Similarly the confidence interval for this estimate has suggests that coefficients as small as 0.0008 and as large as 0.0034 are plausible values for the relationship between age and vote choice in these data. Similarly, controlling for age, Black, Hispanic, and Asian respondents report significantly lower levels of support for Trump than white respondents (p < 0.05).

Turning to the interaction model, we see that the magnitude of the coefficient on age (which corresponds to the marginal effect of age for white respondents) increases, while the coefficients on the interactions between age and racial indicators are generally negative, suggesting that the relationship between age and support for Trump is less strong for these racial and ethnic groups. Figure 1 helps clarify these marginal effects, as we see that slope for age is clearly positive for white respondents, clearly negative for Black respondents. The confidence intervals for the predicted values of the other racial and ethnic groups are generally wide, and consisent with positive, negative, or no relationship between age and vote choice.

Finally, looking at the model controlling for partisanship, we see that none of the relationships between age, race, and vote choice, remain statistically significant once we account for the relationships between partisanship and vote choice, and the relationships between partisanship and age and race. In sum, apparent demographic differences in support for Trump appear driven by the differences in partisan identification across racial groups and within age groups.

YOU:

Take the Class Survey

Please take a few moments to complete the class survey for this week.){target=“_blank”} for this week.

Footnotes

Because white is first level the factor variable race↩︎