POLS 1600

Overview

R, R Studio and Quarto
Getting set up to work in R
Basic Programming in R

R, R Studio and Quarto

R is an open source statistical programming language (cheatsheet)
R Studio is an integrated development environment (IDE) that makes working in R much easier (cheatsheet)
Quarto is a publishing system that allows us to write and present code in different formats (cheatsheet)

General Tuesday Workflow

Go to https://pols1600.paultesta.org
Go to class content for current week
Open slides in browser
Open R Studio
Create .qmd file titled wk01-notes.qmd and save in course folder
Get set up to work
Take notes and follow along

Let’s create a .qmd file

Three components of a .qmd

Control output with YAML header

  ---
  title: "Title here"
  author: "Your name"
  format:
    html:
      toc: true
  ---

Write code Blocks/Chunks

```{r}
#| echo: true
2+2
```

Describe code using Markdown
- See Help > Markdown quick reference

The Basics of R

R is an interpreter (>)
- Everything that exists in R is an object
- Everything that happens in R is the result of a function
Data come in different types, shapes, and sizes
Packages extend what R can do
- install.packages("package_name") once to download a package
- load packages every session using library("package_name")

R is an interpreter (>)

Enter commands line-by-line in the console

The > means R is ready for a command
The + means your last command isn’t complete
- If you get stuck with a + use your escape key!
Send code from .qmd file to the console:
- Ctrl + Enter (PC) | Cmd + Return (Mac) -> run current line
- Ctrl + Shift + Enter (PC) | Cmd + Shift + Return (Mac) -> run all code in current chunk

R is a Calculator

Operator	Description	Usage
+	addition	x + y
-	subtraction	x - y
*	multiplication	x * y
/	division	x / y
^	raised to the power of	x ^ y
abs	absolute value	abs(x)
%/%	integer division	x %/% y
%%	remainder after division	x %% y
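
A quick sketch of the operators in the table above. The values of x and y are hypothetical, chosen only to make the results easy to check by hand.

```r
# Hypothetical values, for illustration only
x <- 7
y <- 2

x + y     # addition: 9
x - y     # subtraction: 5
x * y     # multiplication: 14
x / y     # division: 3.5
x ^ y     # raised to the power of: 49
abs(-x)   # absolute value: 7
x %/% y   # integer division: 3 (how many whole times y goes into x)
x %% y    # remainder after division: 1
```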

R is logical

Operator	Description	Usage
&	and	x & y
|	or	x | y
xor	exactly x or y	xor(x, y)
!	not	!x

R is logical

x <- T; y <- F

x == T

[1] TRUE

x == T & y == T

[1] FALSE

x == T | y == T

[1] TRUE

!x

[1] FALSE

R can make comparisons

Operator	Description	Usage
<	less than	x < y
<=	less than or equal to	x <= y
>	greater than	x > y
>=	greater than or equal to	x >= y
==	exactly equal to	x == y
!=	not equal to	x != y
%in%	group membership	x %in% y
is.na	is missing	is.na(x)
!is.na	is not missing	!is.na(x)
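
A small sketch of the comparison operators applied to a vector with a missing value. The vector x is made up for illustration. Note that comparing NA to anything returns NA, which is why is.na() is the safer way to check for missing data.

```r
# Hypothetical vector with one missing value
x <- c(1, NA, 3)

x > 2        # FALSE    NA  TRUE  (comparison with NA gives NA)
x != 1       # FALSE    NA  TRUE
3 %in% x     # TRUE:  3 is one of the elements of x
2 %in% x     # FALSE: 2 is not
is.na(x)     # FALSE  TRUE FALSE
!is.na(x)    #  TRUE FALSE  TRUE
```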

Everything that exists in R is an object

The number 5 is an object in R

5

[1] 5

We can assign the object 5 the name x using the assignment operator <-

x <- 5 # Read this as "x gets 5"

Now if we tell R to show us x, we’ll get

x

[1] 5

print(x)

[1] 5

Data come in different types

# Create some data

# Numeric
x <- 2 # Double
y <- 6L # Integer

# Logical
only_two_types_of_people <- TRUE 

# Character
me <- "Paul"

# Factor
grades <- factor(c("A","B","C"))

# What type are they?
class(x)

[1] "numeric"

class(y)

[1] "integer"

class(only_two_types_of_people)

[1] "logical"

class(me)

[1] "character"

class(grades)

[1] "factor"

Tip

One common portal of discovery is that a function in R expects data of one type (e.g. numeric) but actually gets data of a different type (e.g. character).

The class() function is a useful base R function for troubleshooting such errors.
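
A minimal sketch of this kind of troubleshooting. The variable age is hypothetical: it holds numbers that were stored as text, which can happen when reading in messy data.

```r
# Hypothetical example: numbers stored as text
age <- c("34", "27", "51")
class(age)                  # "character"

# mean(age) would return NA with a warning, because mean() expects numbers

# Convert to numeric, then check the class again
age_num <- as.numeric(age)
class(age_num)              # "numeric"
mean(age_num)               # now works: (34 + 27 + 51) / 3
```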

Data come in different “shapes” and “sizes”

Source: Gaurav Tiwari

Name	“Size”	Type of Data	R code
scalar	1	numeric, character, factor, logical	`x <- 5`
vector	N elements: `length(x)`	all the same	`v <- c(1, 2, T, "false")`
matrix	N rows by K columns: `dim(x)`	all the same	`m <- matrix(y,2,2)`
array	N rows by K columns by J layers: `dim(x)`	all the same	`a <- array(m,c(2,2,3))`
data frames	N rows by K columns	can be different	`d <- data.frame(x=x, y=y)`
tibbles	N rows by K columns	can be different	`d <- tibble(x=x, y=y)`
lists	can vary	can be different	`l <-list(x,y,m,a,d)`
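
A short sketch for checking the "shape" and "size" of each structure in the table above. The object names and values are illustrative. Tibbles are left out so the code runs with base R alone.

```r
# Vector: all elements must share one type, so R converts them
v <- c(1, 2, TRUE, "false")
class(v)     # "character": the numbers and TRUE were turned into text
length(v)    # 4

# Matrix: 2 rows by 2 columns
m <- matrix(1:4, nrow = 2, ncol = 2)
dim(m)       # 2 2

# Array: the 2 x 2 matrix repeated across 3 layers
a <- array(m, dim = c(2, 2, 3))
dim(a)       # 2 2 3

# Data frame: columns can hold different types
d <- data.frame(id = 1:2, name = c("a", "b"))
dim(d)       # 2 2

# List: can hold objects of any type and shape
l <- list(v, m, a, d)
length(l)    # 4
```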

Everything that happens in R is the result of a function

You’ve already seen and used some R functions
- the <- is the assignment operator that assigns a value to a name
- c() is the combine function that combines elements together
- install.packages() installs packages
- library() loads packages you’ve installed so you can use functions and data that are part of that package

Three sources of functions

base R
- <-; mean(x); library("package_name")
packages
- install.packages("packageName)"
- remotes::install_github("user/repository")
You
- my_function <- function(x){x^2}

Tip

Can you spot the portal of discovery in the code above?

Functions are like recipes

They have:

names
ingredients (inputs)
steps that tell you what to do with the ingredients (statements/code)
tasty results from applying those steps to given ingredients (outputs)

Can I kick it?

can_x_kick_it <- function(x){
  # Determine if x can kick it
  # If x in A Tribe Called Quest
  if(x %in% c("Q-Tip","Phife Dawg",
              "Ali Shaheed Muhammad", 
              "Jarobi White")){
    return("Yes you can")
  }else{
    return("Before this, did you really know what live was?")
  }

}
can_x_kick_it("Q-Tip")

[1] "Yes you can"

can_x_kick_it("Paul")

[1] "Before this, did you really know what live was?"
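
The function above checks one name at a time, because if() expects a single TRUE or FALSE. One possible way to check several names at once is to apply the function to each element with vapply(). This is a sketch, not the only approach. The function is repeated here so the block runs on its own, and the vector people is made up for illustration.

```r
# Same function as above, repeated so this block is self-contained
can_x_kick_it <- function(x){
  if(x %in% c("Q-Tip","Phife Dawg",
              "Ali Shaheed Muhammad",
              "Jarobi White")){
    return("Yes you can")
  }else{
    return("Before this, did you really know what live was?")
  }
}

# Hypothetical vector of names to check
people <- c("Q-Tip", "Paul")

# Apply the function to each name; character(1) says each result is one string
answers <- vapply(people, can_x_kick_it, character(1))
answers
```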

Getting set up to work in R

Each time you start a project in R, you will want to:

Set your working directory in R Studio
Load (and if needed, install) the R packages you will use
Set any “global” options you want
Load the data you’ll be using

Set your working directory
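
A minimal sketch of checking and changing the working directory from code. The course folder path in the comment is hypothetical, so swap in wherever you saved your course folder. The example moves to a temporary folder only so that it runs on any computer.

```r
# Where is R currently looking for files?
old_wd <- getwd()
old_wd

# Hypothetical course folder; replace with your own path
# setwd("~/Documents/pols1600")

# For illustration only: move to a temporary folder, then look around
setwd(tempdir())
getwd()
list.files()

# Go back to where we started
setwd(old_wd)
```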

Load (and if needed, install) the R packages you will use

Install packages once with install.packages("package_name")
Load packages every session with library("package_name")
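
One possible way to combine the two steps is a small helper that installs a package only if it is missing and then loads it. The helper name load_or_install is hypothetical and not part of base R. The example uses the stats package, which ships with R, so nothing is actually downloaded.

```r
# Hypothetical helper: install a package only if needed, then load it
load_or_install <- function(pkg) {
  # requireNamespace() returns FALSE if the package is not installed
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)
  }
  # character.only = TRUE lets us pass the package name as a string
  library(pkg, character.only = TRUE)
}

# "stats" comes with R, so this only loads it
load_or_install("stats")
```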

Install packages for the lab

Let’s install the tidyverse and COVID19.

Create a new code chunk
Label it libraries
Copy and paste the following into your console

install.packages("tidyverse")
install.packages("COVID19")

Once you’ve installed these packages, comment out the code (Why?)

# install.packages("tidyverse")
# install.packages("COVID19")

Keyboard Shortcuts to toggle # comments

macOS: CMD + SHIFT + C

PC: CTRL + SHIFT + C

Loading the `tidyverse` and `COVID19` packages

Type the following into your code chunk:

library("tidyverse")
library(COVID19)

Load the data you’ll be using

There are three ways to load data.

Load a pre-existing dataset
- data("dataset") will load the dataset named “dataset”
  - data() will list all datasets
Load a .Rdata/.rda file using load("dataset.rda")
Read data of a different format (.csv, .dta, .sav) into R using specific functions from packages like haven and readr

# EXAMPLE CODE (Won't actually run)
# Read a .csv file saved locally on my Desktop
df_csv <- readr::read_csv("~/Desktop/data.csv")
# Read a .dta file saved on the web
df_dta <- haven::read_dta("https://paultesta.org/data.dta")
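
A runnable sketch of the three approaches using only base R. It assumes the built-in mtcars dataset and temporary files in place of real course data, and it uses base R's read.csv() as a stand-in for readr::read_csv().

```r
# 1. Load a pre-existing dataset that ships with R
data("mtcars")
head(mtcars, 3)

# 2. Save an object to an .rda file, remove it, then load it back
rda_path <- tempfile(fileext = ".rda")
save(mtcars, file = rda_path)
rm(mtcars)
load(rda_path)      # mtcars is back in the environment

# 3. Write a .csv file and read it back in
csv_path <- tempfile(fileext = ".csv")
write.csv(mtcars, csv_path, row.names = FALSE)
df_csv <- read.csv(csv_path)
dim(df_csv)         # 32 rows, 11 columns
```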

Description	Usage
sum	sum(x)
minimum	min(x)
maximum	max(x)
range	range(x)
mean	mean(x)
median	median(x)
percentile	quantile(x)
variance	var(x)
standard deviation	sd(x)
rank	rank(x)
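
A quick sketch of the functions in the table above, applied to a small made-up vector so the results can be checked by hand.

```r
# Hypothetical data, for illustration only
x <- c(2, 4, 4, 5, 10)

sum(x)        # 25
min(x)        # 2
max(x)        # 10
range(x)      # 2 10
mean(x)       # 5
median(x)     # 4
quantile(x)   # 0%: 2, 25%: 4, 50%: 4, 75%: 5, 100%: 10
var(x)        # 9
sd(x)         # 3
rank(x)       # 1.0 2.5 2.5 4.0 5.0 (tied values share the average rank)
```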

Face Mask Policy	Average No. of New Cases
No policy	10.26
Recommended	16.61
Some requirements	36.18
Required shared places	29.38
Required all times	32.18