A4: Project Drafts

You final will paper consists of seven sections. There is no minimum (or maximum) length for the final paper, but I’ve included rough estimates in parentheses of how long I think each section should be along with the relative contribution of each section to your final grade.

Introduction (5 percent, ~ 4 paragraphs)
Theory and Expectations (10 percent, ~4+ paragraphs)
Data (20 percent ~ 4+ paragraphs)
Design (25 percent ~ 5+ paragraphs)
Results (25 percent ~ 5+ paragraphs)
Conclusion (5 percent ~ 3+ paragraphs)
Appendix (10 percent ~ Variable codebook and all the R code for your project)

Specific guidance, expectations, and grading criteria for each section are provided below.

You need not have every section completely finished in this draft, but the more you have done the more feedback you will get. Remember, the data, design, and results section make up 70 percent of your grade, so while a good introduction, compelling theory, and thoughtful conclusion are important, you should budget your time accordingly.

Your actual grade for the draft will reflect a general assessment of the status of your project

100: On track for A
90: On track for A if provided specific tasks and changes are made
80: Lots of work to do, but if done, headed in the right direction
70: Major revisions needed. We should probably talk in office hours

Additionally, I will assign points to each section as if it were you final paper to give you a sense of where to focus your efforts. This is not your actual grade for this assignment, but can be though of as a rough estimate of the grade you would receive on a section if you were to submit it, as is, for your final paper. Your grade for that section on the final paper may increase or decrease depending on the changes you make to your draft. Additionally, I will assign points to each section as if it were you final paper to give you a sense of where to focus your efforts. This is not your actual grade for this assignment, but can be though of as a rough estimate of the grade you would receive on a section if you were to submit it, as is, for your final paper. Your grade for that section on the final paper may increase or decrease depending on the changes you make to your draft.

I have also posted a template for the final paper and an example of a final paper from a previous version of POLS 1600.

The template contains some sample code to that may be usefully adapted for your projects, but should not appear in your final paper (i.e. keep use the structure provided by the template, but delete the code I’ve included.)
The final paper assignment has changed over time. There are things this paper does, which I no longer ask you to do (e.g. explain the difference between simulation and asymptotic approaches to statistical inference). There are things I ask you to do (e.g. write a separate theory section that motivates your expectations) which this paper was not required to do.

Introduction

This section provides an introduction and overview of your project.

Plan to write at least four paragraphs that

Clearly articulates your group’s research question
Lays out the theoretical framework that motivates your inquiry
Describes your empirical strategy
Provides an outline of the rest of the paper and previews your results.

Grading Criteria (5 points)

5 points
- Articulates clear and compelling research question
- Offers a strong motivation for pursuing this question tied to existing social science theory
- Describes a convincing empirical strategy with appropriate data and research design
- Previews interesting results with substantive interpretation
- Makes reader excited to read the rest of the paper
- Writing throughout is clear, precise, and engaging, with correct spelling and grammar and consistent citations
4 points
- Articulates research question
- Link between research question and social science theory could be more developed
- Describes empirical strategy with data and research design
- Preview of results with little to no interpretation
- Leaves reader moderately interested in the rest of the paper.
- Writing throughout is clear and precise, with correct spelling and grammar and consistent citations
< 4 points
- Research question unclear or absent
- Little to no discussion of why this research question is of interest to social scientists and citizens
- Little to no discussion of the data and research design used to pursue this question.
- No discussion of empirical results
- Does little to pique reader’s interest in the rest of the paper.
- Writing throughout is muddled and vague, with uncorrekt spellung and grammur and consistent citations

Theory and Expectations

In this section you will provide the necessary context for your research question and develop your theoretical framework to derive testable empirical implications.

This section should begin by explaining the following:

What is the phenomena you are studying? (Research Question/Outcome of Interest)
Why should we care about this phenomena? (Context/Motivation/Importance)

Then it should turn to addressing:

What are the factors scholars think explain this phenomena? (Literature Review/Key predictors/Alternative explanations)

Which should lead to a discussion of

What are the empirical implications of these claims? (Testable hypotheses)

You have freedom in how you address these questions, but your final paper must cite at least three relevant academic sources (books and journal articles), using either footnotes or parenthetical citations.

Sometimes it is helpful to formalize your expectations into a set of numbered hypotheses. Other times this can feel a little to formalistic, and it is sufficient to simply describe the expected patterns of relationship (“We expect that as X increases, Y should …”) . It is up to you to decide what works best. If you take the route of listing and numbering your hypotheses, use this ordering to structure the presentation and discussion of your empirical models (e.g. “the coefficient on $β_{1}$ in model 1 provides a test of H1”)

Grading Criteria (10 points)

9-10 points:
- Careful discussion of relevant theoretical, empirical, and/or substantive contexts that motivates research question.
- Cites at least three relevant pieces of academic literature on research topic. Uses this literature to articulate theoretical framework for thinking about research question.
- Theoretical framework clearly defines:
- the outcome (or outcomes) of interest
- the factors expected to explain variation in that outcome
- some potential alternative explanations/omitted variables, that might explain variation in both the outcome and predictors
- Develops a clear set of expectations informed by theoretical framework for how these explanatory factors should relate to the outcome (and possibly each other)
8 points:
- Brief discussion of theoretical, empirical, and/or substantive contexts that motivates research question.
- Cites less than three relevant pieces of academic literature on research topic. Uses this literature to articulate theoretical framework for thinking about research question.
- Theoretical framework broadly defines:
- the outcome (or outcomes) of interest
- the factors expected to explain variation in that outcome
- some potential alternative explanations/omitted variables, that might explain variation in both the outcome and predictors
- Develops a set of expectations for how these explanatory factors should relate to the outcome (and possibly each other, but connection to larger theoretical framework is unclear
<8 points:
- Little to no discussion of theoretical, empirical, and/or substantive contexts that motivates research question.
- No citation of relevant pieces of academic literature on research topic. Broader theoretical framework absent.
- Fails to clearly define:
- the outcome (or outcomes) of interest
- the factors expected to explain variation in that outcome
- some potential alternative explanations/omitted variables, that might explain variation in both the outcome and predictors
- Fails to articulate a clear set of expectations for how these explanatory factors should relate to the outcome

Data

Describe the data you will use to explore your question. In doing so you should:

Identify the source or sources of your data
- If using multiple datasets, briefly describe how your final dataset was constructed.
Define the unit of analysis (substantively speaking, what is a row in your dataset) and number of observations in your data
Explain how your key concepts (outcome variable, key predictors, covariates/alternative explanations) are operationalized and measured
Describe the distributions of these data using a table and/or figure(s).
Interpret these distributions for your reader. What does a typical observation in the data look like. Are some data skewed. Are their big outliers

Your broader goals for this section are to convince your reader:

That these data are appropriate for your research question
Demonstrate your deep knowledge of these data through careful and thoughtful discussion of the measurement and distributions of your variables
Set up your empirical results section through initial descriptive explorations of the data.

Grading Criteria (20 points)

18-20 points:
- Data sources are clear and appropriate
- Unit of analysis is clear and appropriate
- Outcomes, key predictors, and covariates are clearly defined and carefully described
- Descriptive tables and figures are thoroughly discussed in text with correct interpretations
- Data appear to be clean and tidy
16-17 points:
- Data sources are clear and appropriate
- Unit of analysis is unclear
- Outcomes, key predictors, and covariates are briefly defined and described
- Descriptive tables and figures are briefly discussed in text with mostly correct interpretations
- Data appear to be clean and tidy
< 16 points:
- Data sources are not identified
- Unit of analysis is is not discussed or incorrect
- Outcomes, key predictors, and covariates are not defined and described
- Descriptive tables and figures absent, not discussed or interpreted incorrectly
- Clear errors in coding (e.g. failing to recode values as NA)

Design

Next, you will describe the empirical research design that you will use to test your expectations. You have two main tasks or goals for this section:

First you need to explain how you will to test your theoretical expectations in terms of the empirical estimates of your model (e.g. the regression coefficients of your linear models).

Write down the equations of the models you are estimating. Based on the theoretical framework you’ve developed, what should we expect in terms of the sign, size, and significance of the predictors in your model?

At a minimum, you will likely want to specify at least two models:

A baseline (bivariate) model

$O u t c o m e = β_{0} + β_{1} \times Key Predictor + ϵ$ And tell the reader what your theory implies the in terms of the sign, size, and statistical significance of the coefficient on your key predictor.

A multiple regression model controlling for alternative explanations (things that predict both your outcome and key predictor)

$O u t c o m e = β_{0} + β_{1} \times Key Predictor + X β + ϵ$ Where $X β$ is a matrix containing one or more additional predictors representing alternative explanations. Again tell the reader how to interpret this model in relation to your baseline. If your theory is true, what should happen to sign, size, and significance of the coefficient(s) on your key predictor, once you’ve controlled for your alternative explanations?

Most of you will likely estimate more than two models. Perhaps your theory implies the relationship between your key predictor and the outcome is non-linear, or the depends on the value of other moderating variable. Again, specify this model, and explain to the reader how we should interpret

Second, use this section to demonstrate your mastery of some of the key statistical concepts from the course. Specifically, you should do some of the following here:

What are the identifying assumptions we would have to make for your research design to have a causal interpretation ? (e.g. Selection on observables?) What might violate these assumptions? How can you address these violations?
If you’re using linear regression, just what exactly is linear regression and why are you using it?
How should we interpret the coefficients from a linear regression model?
What does it mean to “control” for a variable in a regression?
How can we quantify the uncertainty in our estimates? (What’s a confidence interval? A sampling distribution? A standard error? What’s a hypothesis test? A test statistic? A p-value?)

You will also get credit for demonstrating your understanding of these concepts through your interpretation of your results.

Grading Criteria (30 points)

23-25 points:
- Correctly specifies multiple models to test empirical expectations
- Expresses empirical expectations in terms of coefficients from linear models
- Explains motivation and interpretation of specified models
- Conveys a clear understanding of
  - Identifying assumptions of empirical design
  - What linear regression is
  - What it means to control for variables
  - How to interpret coefficients from regression
  - How to quantify uncertainty around regression estimates, using both confidence intervals and hypothesis tests
20-22 points:
- Correctly specifies two models to test empirical expectations
- Expresses empirical expectations in terms of coefficients from linear models
- Explains motivation and interpretation of specified models
- Conveys some understanding of
  - Identifying assumptions of empirical design
  - What linear regression is
  - What it means to control for variables
  - How to interpret coefficients from regression
  - How to quantify uncertainty around regression estimates, using both confidence intervals and hypothesis tests
< 20 points:
- Incorrectly specifies one models to test empirical expectations
- Does not express empirical expectations in terms of coefficients from linear models
- Little to no discussion of the motivation and interpretation of specified models
- Conveys little to no understanding of
  - Identifying assumptions of empirical design
  - What linear regression is
  - What it means to control for variables
  - How to interpret coefficients from regression
  - How to quantify uncertainty around regression estimates, using both confidence intervals and hypothesis tests

Results

Having described your data and mapped your theoretical expectations onto your empirical models, you will present your results.

This section should include the following:

A regression table (or tables) presenting the coefficients from estimated models
Interpretation of the the coefficients from these models in terms of your empirical expectations.
- Are the results consistent with your expectations?
- How and why do estimates vary from one model to the next?
- How confident should we be in a given estimate?
- Specifically, how should we interpret the confidence intervals and p-values associated with a given estimate?
A figure that helps illustrate one of your key findings:
- This will likely be a predicted values from one (or more) of your models, particularly if you are estimating a model with an interaction term, a polynomial term, or a generalized linear model.

In general, I would encourage you to start from the simplest, likely bivariate model. Use this as a chance to explain the mechanics of linear regression, the interpretation of linear regression as linear estimated of the Conditional Expectation Function, and demonstrate your mastery of the concepts of confidence intervals and hypothesis testing.

Then turn results from more complicated models. Again provide a substantive interpretation the coefficients and possible changes sign, size, and/or significance of coefficients from one model to the next. Use this discussion to further demonstrate your understanding of what it means to control for variables – both from a technical and substantive perspective.

Finally use a figure (or figures) to help illustrate the substantive interpretation of your results. (e.g. How much is your outcome predicted to change if some key predictor moved from one standard deviation below its mean to one standard deviation above)

Grading Criteria (25 points)

23-25 points:
- Empirical results are
  - clearly presented in tables and figures appropriate variable names and labels
  - correctly interpreted in terms of substantive and statistical significance
  - used to evaluate theoretical expectations
  - interpreted to demonstrate mastery of the concepts of
    - bivariate and multiple regression
    - confidence intervals
    - hypothesis testing
  - tell a coherent and compelling story that provides insights into research question
20-22 points:
- Empirical results are
  - roughly presented in tables and figures with awkward variable names and labels
  - interpreted mostly correctly in terms of substantive and statistical significance
  - used to evaluate theoretical expectations
  - interpreted to demonstrate general understanding of the concepts of
    - bivariate and multiple regression
    - confidence intervals
    - hypothesis testing
  - tell a coherent story that provides partial insights into research question
< 20 points:
- Empirical results are
  - poorly presented in tables and figures with awkward variable names and labels
  - interpreted mostly incorrectly in terms of substantive and statistical significance
  - are not used to evaluate theoretical expectations
  - interpreted to demonstrate little to no understanding of the concepts of
    - bivariate and multiple regression
    - confidence intervals
    - hypothesis testing
  - fail to tell a coherent story that provides little to no insights into research question

Conclusion

Finally, conclude by summarizing the results of your project.

What did you set out to learn?
What did you find?
What are some the strengths and weaknesses of your approach?
What should future research on this topic look like?
A list of correctly formatted references

Grading Criteria (25 points)

5 points
- Clear summary of the project’s findings
- Discussion of the strengths and weaknesses of the project
- Interesting suggestions for further research that seem likely to add additional insights and/or address limitations of the present study.
- Correctly formatted references/works cited
4 points
- Cursory summary of the project’s findings
- Discussion of the strengths and weaknesses of the project
- Some suggestions for further research informed by the present project
- Correctly formatted references/works cited
< 4 points
- Little to no summary of the project’s findings
- Little to no discussion of the strengths and weaknesses of the project
- No suggestions for further research informed by the present project
- No references/works cited

Appendix

Finally, at the end of your document, you should provide a:

Codebook
Code Appendix.

Your code book should describe be organized conceptually by variable type:

Outcome
Key Predictors
Covariates

For each variable, provide:

variable_name_used_in_code | Conceptual name
- Description of variable/Survey Question Wording
- Original name in raw data.
- Summary of any recoding, transformations (e.g. Collapsing categories, or reverse coding endpoints of scales)
- Summary of range of values (e.g. 1= Strong Democrat … 7 = Strong Republican)

After you codebook, at the very end of you document, you will include a single code chunk that contains all the R code, used in your analysis. At a minimum, this code chunk should contain code that:

Sets up the work space
Loads your data
Recodes and transforms the data as needed
Produces tables of summary statistics and figures
Estimate your models
Presents the results of your models in tables and figures.

In the accompanying POLS1600_paper_template.Rmd I’ve demonstrated how to write your code inline with each section (e.g. loading data in the Data Section, estimating models in the Results section), display only the tables and figures you want in the main text, and then display the full code at the end of the document.

The keys to a nicely formatted document are:

Setting global options for code chunks in the for knitr
Labeling individual code chunks with meaningful names
Setting results: asis for individual chunks you want to display in text.
Setting echo: true, eval: false in the final appendix code chunk.

Grading Criteria (25 points)

10 points
- Codebook is
  - easy to read and informative
  - describes all the variables used in the analysis.
- Code appendix is
  - nicely formatted
  - well-commented
  - able to to reproduces all the analysis in text without errors
- No R code, raw output, error messages appears in the main text of the paper.
8-9 points
- Codebook is
  - describes all of the variables used in the analysis.
- Code appendix:
  - Could be more cleanly formatted and needs more comments
  - able to reproduces all the analysis in text without errors or only minor errors
- No R code, raw output, error messages appears in the main text of the paper.
< 8 points
- Codebook is absent, incomplete, and/or incorrect
- Code appendix is:
  - Poorly formatted
  - Contains no explanatory comments
  - unable to reproduce most of the analysis in the text.
  - Contains clear errors
- R code, raw R output, error messages appear throughout the main text of the paper.