set.seed(11042024)
<- sample(1:6,size = 1)
graded_question paste("Question",graded_question,"is the graded question for this week")
[1] "Question 6 is the graded question for this week"
Your Group Members Names Here
April 10, 2024
In this lab, we’ll continue exploring data from the National Election Studies 2024 Pilot Study.
From last week’s lab you should have identified some variables of interest in the data, recoded them as needed, and begun to formulate some research questions.
In this week’s lab, we’ll pick up where we left off:
Set up your work space (5 minutes)
Load the data from last week (5 minutes)
Quickly summarize the data (15)
Revisit and revise your research question and expectations (15 minutes)
Estimate models to explore your question (10 minutes)
Interpret the results of your model (30 minutes)
One of these 6 tasks will be randomly selected as the graded question for the lab.
set.seed(11042024)
graded_question <- sample(1:6,size = 1)
paste("Question",graded_question,"is the graded question for this week")
[1] "Question 6 is the graded question for this week"
This week’s lab will give you practice:
Loading libraries needed to do your analysis (Q1)
Loading data you’ve already recoded (Q2)
Summarizing data (Q3)
Translating research questions into empirical expectations
Estimating linear models (Q5)
Interpreting the substantive and statistical significance of these models (Q6)
As with every lab, you should:
author:
section of the YAML header to include the names of your group members in attendance.Please load the libraries you will use in your analysis.
In R studio set your working directory to the folder where this lab is saved by clicking > Session > Set Working Directory > To Source File Location
After doing so uncomment getwd()
Should print out something like
“~/Desktop/pols1600/labs/”
Depending on where your lab is saved
If getwd()
says something like ‘~/Downloads/’ click: “File > Save As” and save this lab in your course folder. Then close the version 09-lab.qmd
that was opened from your Downloads folder and open the version of 09-lab.qmd
that now exists in your course folder.
Please uncomment and if needed, update, the code below to load the data from last week
Use the code from last weeks lab to produce a table of summary statistics.
Please update the code to include
Max
and N Missing
columns# Vector of numeric variables to summarize
the_vars <- c(
"dv_vote_trump2024",
"dv_participation",
"age","education","income",
"is_white","is_black","is_hispanic","is_asian","is_other",
"partyid","is_dem","is_rep","is_ind"
)
# Vector of nicely formatted labels for variables
the_labels <- c(
"Vote for Trump in '24",
"Acts of Participation in `20",
"Age","Education", "Income",
"White", "Black","Hispanic","Asian","Other",
"Party ID", "Democrat","Republican","Independent"
)
df_summary <- df %>%
select(all_of(the_vars)) %>%
rename_with(~the_labels) %>%
pivot_longer(
cols = everything(),
names_to = "Variable"
) %>%
mutate(
Variable = factor(Variable, levels = the_labels)
) %>%
group_by(Variable) %>%
summarise(
Min = min(value,na.rm = T),
p25 = quantile(value, prob = .25,na.rm = T),
Median = quantile(value, prob = .5,na.rm = T),
Mean = mean(value, na.rm = T),
p75 = quantile(value, prob = .75,na.rm = T),
Max = max(value,na.rm = T),
SD = sd(value, na.rm=T),
`N missing` = sum(is.na(value))
)
kable(df_summary,
digits = 2) %>%
kable_styling() %>%
pack_rows("Outcomes", start_row = 1, end_row = 2) %>%
pack_rows("Demographic Predictors", 3, 10) %>%
pack_rows("Political Predictors", 11, 14)
Variable | Min | p25 | Median | Mean | p75 | Max | SD | N missing |
---|---|---|---|---|---|---|---|---|
Outcomes | ||||||||
Vote for Trump in '24 | 0 | 0 | 1 | 0.50 | 1.00 | 1 | 0.50 | 174 |
Acts of Participation in `20 | 0 | 0 | 0 | 0.93 | 1.00 | 5 | 1.31 | 0 |
Demographic Predictors | ||||||||
Age | 18 | 33 | 51 | 49.37 | 63.75 | 94 | 17.74 | 31 |
Education | 1 | 2 | 3 | 3.51 | 5.00 | 6 | 1.51 | 0 |
Income | 1 | 3 | 6 | 6.43 | 9.00 | 16 | 3.55 | 222 |
White | 0 | 0 | 1 | 0.67 | 1.00 | 1 | 0.47 | 0 |
Black | 0 | 0 | 0 | 0.13 | 0.00 | 1 | 0.33 | 0 |
Hispanic | 0 | 0 | 0 | 0.13 | 0.00 | 1 | 0.33 | 0 |
Asian | 0 | 0 | 0 | 0.02 | 0.00 | 1 | 0.15 | 0 |
Other | 0 | 0 | 0 | 0.01 | 0.00 | 1 | 0.09 | 0 |
Political Predictors | ||||||||
Party ID | 1 | 2 | 4 | 3.88 | 6.00 | 7 | 2.15 | 58 |
Democrat | 0 | 0 | 0 | 0.43 | 1.00 | 1 | 0.49 | 58 |
Republican | 0 | 0 | 0 | 0.38 | 1.00 | 1 | 0.48 | 58 |
Independent | 0 | 0 | 0 | 0.20 | 0.00 | 1 | 0.40 | 58 |
Please use these tables to describe a typical respondent to the 2024 NES Pilot Study:
The National Election Study’s 2024 Pilot Study contains responses from 1909 individuals. The typical respondent in the data was just under 50 years old, with some college, with an income in the range of $50k-$59k. Approximately two-thirds of the sample identified as white, with 13 percent of respondents identifying as Black, 13 percent as Hispanic, 2 percent as Asian. Forty-three percent of respondents identified as Democrats, 38 percent as Republicans, and 20 percent as Independents. The respondents were evenly split in who they would vote for 2024, with 50 percent saying they would Vote for Donald Trump, and 50 percent saying Joe Biden. In the 2020 campaign, the average respondent reported engaging in about 1 act of political participation.
Please take a moment, revisit and revise your research questions developed in last weeks lab that contains the following:
ME: How does support for Trump in the 2024 election vary with age and race?
YOU:
ME: On average, I expect that older voters will be more supportive of Trump, but suspect that this trend varies by race. I expect it will be particularly true among White voters, but less so among people of color. Since, partisanship is likely to be strong predictor of vote choice, I will explore whether these specific relationships hold, once we control for variations in partisan identification which we know varies both by age and race.
YOU:
ME:
I will estimate the following models:
\[
y = \beta_0 + \beta_1 age + \sum\beta_k race_k + \epsilon
\] If my expectations hold, I expect the coefficient on age
in this model to be positive, indicating older voters are more likely to vote for Trump. White respondents are the reference category in this model1 and so the coefficients on the racial indicators (\(\beta_k race\)) correspond to how members of each racial group differ from white respondents in their propensity to vote for Trump. I expect all of these coefficients to be negative.
To explore whether the relationship between age and vote choice varies by race, I will fit an interaction model:
\[ y = \beta_0 + \beta_1 age + \sum\beta_k race_k + \epsilon + \sum\beta_{jk} age \times race_k + \epsilon \] This model allows the relationship between age and vote choice to vary by race. For white respondents, the relationship is described by \(\beta_1\). Again I expect it to be positive, suggesting older white voters are more likely to vote for trump. For racial minorities, the marginal effect of age for the racial or ethnic group \(k\) is described by \(\beta_1 + \beta_{jk}\). In general, I expect that the coefficients on \(\beta_{jk}\) to be negative such that older racial minorities are less likely to support Trump than older white respondents.
Finally, I will estimate a model that controls for partisanship. If the relationships between age, and race and vote choice are simply a reflection of differences in partisan identification, then coefficients on these predictors should no longer be statistically significant, while the coefficient on partisanship should positive and statistically significant.
\[ y = \beta_0 + \beta_1 pid + \beta_2 age + \sum\beta_k race_k + \epsilon + \sum\beta_{jk} age \times race_k \epsilon \]
You:
We will estimate the following models:
\[ y = \beta_0 + \beta_1 ... \]
\[ y = \beta_0 + \beta_1 ... \]
Below I estimate the three models described above
Please produce
Interpret your results using both confidence intervals and hypothesis tests to assess the statistical significance of your claims and predicted values to help interpret the substantive significance of your results
htmlreg(list(m1, m2, m3),
custom.model.names = c(
"Baseline", "Interaction", "Alternative"
),
caption = "Support for Trump in 2024",
caption.above = T,
custom.coef.names = c(
"(Intercept)",
"Age",
"Black",
"Hispanic",
"Asian",
"Other",
"Age:Black",
"Age:Hispanic",
"Age:Asian",
"Age:Other",
"Party ID"
) ,
include.ci = F,
digits = 3)
Baseline | Interaction | Alternative | |
---|---|---|---|
(Intercept) | 0.449*** | 0.361*** | -0.160*** |
(0.038) | (0.045) | (0.032) | |
Age | 0.002** | 0.004*** | -0.000 |
(0.001) | (0.001) | (0.001) | |
Black | -0.280*** | 0.213* | -0.038 |
(0.034) | (0.100) | (0.083) | |
Hispanic | -0.089* | 0.045 | -0.052 |
(0.038) | (0.101) | (0.073) | |
Asian | -0.238** | 0.007 | -0.233 |
(0.076) | (0.236) | (0.136) | |
Other | -0.031 | -0.035 | 0.125 |
(0.052) | (0.152) | (0.108) | |
Age:Black | -0.011*** | -0.000 | |
(0.002) | (0.002) | ||
Age:Hispanic | -0.003 | 0.001 | |
(0.002) | (0.001) | ||
Age:Asian | -0.005 | 0.003 | |
(0.005) | (0.003) | ||
Age:Other | 0.000 | -0.002 | |
(0.003) | (0.002) | ||
Party ID | 0.173*** | ||
(0.003) | |||
R2 | 0.047 | 0.061 | 0.569 |
Adj. R2 | 0.044 | 0.056 | 0.566 |
Num. obs. | 1735 | 1735 | 1691 |
RMSE | 0.489 | 0.486 | 0.329 |
***p < 0.001; **p < 0.01; *p < 0.05 |
# Predictors for m2
pred_dfm2 <- expand_grid(
age = seq(min(df$age, na.rm =T), max(df$age, na.rm = T)),
race_5cat = sort(unique(df$race_5cat))
)
# Predictions with confidence intervals
pred_dfm2 <- cbind(
pred_dfm2,
predict(m2, newdata = pred_dfm2, interval = "confidence")$fit
)
# Plot predictions from m2
fig_m2 <- pred_dfm2 %>%
ggplot(aes(age, fit, col = race_5cat))+
geom_ribbon(aes(ymin = lwr, ymax = upr,
fill = race_5cat
),
alpha = .5)+
geom_line()+
facet_wrap( ~ race_5cat)+
theme_minimal()+
guides(
col = "none",
fill = "none"
)+
labs(
y = "Predicted Vote Choice",
title = "Support for Trump by Age and Race"
)
# Display figure
fig_m2
# Predictors for m3
pred_dfm3 <- expand_grid(
age = seq(min(df$age, na.rm =T), max(df$age, na.rm = T)),
race_5cat = sort(unique(df$race_5cat)),
partyid = mean(df$partyid, na.rm=T)
)
# Predictions with confidence intervals
pred_dfm3 <- cbind(
pred_dfm3,
predict(m3, newdata = pred_dfm3, interval = "confidence")$fit
)
# Plot predictions from m2
fig_m3 <- pred_dfm3 %>%
ggplot(aes(age, fit, col = race_5cat))+
geom_ribbon(aes(ymin = lwr, ymax = upr,
fill = race_5cat
),
alpha = .5)+
geom_line()+
facet_wrap( ~ race_5cat)+
theme_minimal()+
guides(
col = "none",
fill = "none"
)+
labs(
y = "Predicted Vote Choice",
title = "Support for Trump by Age and Race Controlling for Partisanship"
)
fig_m3
ME: Our baseline model confirms our initial expectations. Controlling for race, older respondents have higher predicted levels of support for Trump by 0.002 percentage points. Controlling for race, the model predicts that a 60 year old respondent is about 6.3 percentage points more likely to vote for Trump than a 30 year-old respondent. The test statistic for this coefficient is 3.10 corresponding to a p-value < 0.05, suggesting that if there were no relationship between age and vote choice in this model, it would be very unlikely that we observed a test statistic of this magnitude. Similarly the confidence interval for this estimate has suggests that coefficients as small as 0.0008 and as large as 0.0034 are plausible values for the relationship between age and vote choice in these data. Similarly, controlling for age, Black, Hispanic, and Asian respondents report significantly lower levels of support for Trump than white respondents (p < 0.05).
Turning to the interaction model, we see that the magnitude of the coefficient on age (which corresponds to the marginal effect of age for white respondents) increases, while the coefficients on the interactions between age and racial indicators are generally negative, suggesting that the relationship between age and support for Trump is less strong for these racial and ethnic groups. Figure 1 helps clarify these marginal effects, as we see that slope for age is clearly positive for white respondents, clearly negative for Black respondents. The confidence intervals for the predicted values of the other racial and ethnic groups are generally wide, and consisent with positive, negative, or no relationship between age and vote choice.
Finally, looking at the model controlling for partisanship, we see that none of the relationships between age, race, and vote choice, remain statistically significant once we account for the relationships between partisanship and vote choice, and the relationships between partisanship and age and race. In sum, apparent demographic differences in support for Trump appear driven by the differences in partisan identification across racial groups and within age groups.
YOU:
Please take a few moments to complete the class survey for this week.){target=“_blank”} for this week.
Because white
is first level the factor variable race
↩︎