Getting Started in R

an introduction to data analysis and visualisation

Basics

Réka Solymosi & Sam Langton

2 July 2019

R

What is R?

  • R is a programming language for exploring, visualising and analysing data.

  • Created by Ross Ihaka and Robert Gentleman (University of Auckland)

  • Released in 1995

  • Implements the S programming language created at Bell Labs

  • Companies like Google, Facebook and the Financial Times use it. The BBC have also recently made a big push to use it for their visualisations.

Why use R?

  • Cutting edge analytics: over 10,000 user-contributed packages available on finance, genomics, animal tracking, crime analysis, and much more

  • Powerful graphics and data visualisations: used by the New York Times and FiveThirtyEight

  • Open source: free and customisable

  • Reproducibility: code can be shared and the results repeated

  • Transparency: explicitly documents all the steps of your analyses

  • Automation: analyses can be run and re-run with new and existing datasets

  • Support network: worldwide community of developers and users

Why wouldn't you use it?

  • Anxiety about 'hard' skills
  • "I am just not good at that kind of thing"
  • "I have to be good at maths"
  • Experts putting you off

Are there any disadvantages?

R has a steep learning curve, and the transition from a graphical user interface like Excel or SPSS to a command-driven one can be unsettling.


However, you’ll soon find that working with a command line is much more efficient than pointing and clicking. After all, you can replicate, automate and share your R scripts.

Example outputs using R

Charts

library(ggplot2)

# Scatter plot of engine size against highway fuel economy, with a smoothed trend line
p <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point() +
  geom_smooth(method = "loess", se = FALSE) +
  labs(x = "Engine size", y = "Miles per gallon",
       title = "Fuel efficiency decreases with engine size") +
  theme_minimal()
p

Tables

library(DT)

# Interactive table of the first 10 rows of USArrests, showing 5 rows per page
datatable(
  head(USArrests, 10),
  fillContainer = FALSE,
  options = list(pageLength = 5)
)

Maps

library(leaflet)

# Interactive map with a marker at the University of Manchester
leaflet() %>%
  addTiles() %>%
  addMarkers(lng = -2.236518, lat = 53.466478,
             popup = "University of Manchester")

R and RStudio

You can use R on its own, but there are several reasons why you may prefer to code within the RStudio environment:

  • syntax highlighting
  • code completion
  • rmarkdown integration
  • four-pane workspace for managing R windows

RStudio's panes

Setting up

The working directory

The working directory is the folder where R looks for files you read in and saves files you write out, unless you specify a full path. The function getwd() returns the filepath to the current working directory.

getwd()

To change the working directory you have two options:

  1. Use the dropdown menus: Session -> Set Working Directory -> Choose Directory
  2. Use the command setwd() in the console

For example, to change the working directory on a computer running macOS you might enter:

setwd("/Users/your_name/Documents/GSinR")

On a computer running Windows it would look more like:

setwd("C:/Users/your_name/Documents/GSinR")
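A minimal sketch of changing the working directory inside a script and then switching back. It assumes a throwaway folder from tempdir() as a stand-in for your own project folder, so that it runs on any machine; the names old_wd and project_dir are made up for illustration.

```r
# Remember where we started so we can switch back afterwards
old_wd <- getwd()

# tempdir() stands in for your own project folder (illustrative only)
project_dir <- tempdir()
setwd(project_dir)

# A file written with a relative path lands in the working directory
writeLines("hello", "example.txt")
file_was_created <- file.exists(file.path(project_dir, "example.txt"))
file_was_created

# Tidy up and restore the original working directory
unlink("example.txt")
setwd(old_wd)
```

file.path() builds the path with the right separator for your operating system, which avoids typing slashes by hand.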

Installing and loading packages

Packages are collections of R functions and data. There are over 10,000 user-contributed packages available to install from CRAN.

Just type install.packages() in the console with the name of the package in inverted commas:

install.packages("tidyverse")

Once installed you can load the package using the library() function like this:

library(tidyverse)
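Packages only need to be installed once, but they must be loaded in every new session. One possible pattern, sketched here, installs a package only when it is missing. The package name "stats" is an assumption chosen because it ships with R; swap in the package you actually need.

```r
# "stats" ships with R; it stands in for any package name (illustrative only)
pkg <- "stats"

# Install only if the package is not already available
if (!requireNamespace(pkg, quietly = TRUE)) {
  install.packages(pkg)
}

# character.only = TRUE lets library() accept a name stored in a variable
library(pkg, character.only = TRUE)

# Confirm the package is now attached to the session
pkg_attached <- paste0("package:", pkg) %in% search()
pkg_attached
```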

Finding help

You can find help within R using the ? operator: place a ? before the name of a function to open its help page.

For example, if you wanted to read up on the getwd() function you'd enter:

?getwd

You can also use the help() function:

help(getwd)
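A few other base R helpers can be useful when you do not need the full help page or cannot remember a function's exact name. This is a small sketch rather than a complete list; the variable name wd_matches is made up for illustration.

```r
# Show a function's arguments without opening the full help page
args(mean)

# List objects whose names contain "wd" (this includes getwd and setwd)
wd_matches <- apropos("wd")
wd_matches

# Search the installed documentation by keyword; ?? is shorthand for help.search()
# (left commented out because it opens a results window or pager)
# ??"working directory"

# Run the examples shown at the bottom of a help page
example(mean)
```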

R basics

Arithmetic

R provides arithmetic operators such as:

2 + 2 # addition
## [1] 4
9 - 7 # subtraction
## [1] 2
8 * 12 # multiplication
## [1] 96
9 / 3 # division
## [1] 3
10 ^ 2 # exponentiation
## [1] 100
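
Beyond the operators above, base R also has modulus and integer division, and it follows the usual order of operations. A short sketch, with the result of each line given in its comment:

```r
7 %% 3        # modulus (remainder after division): 1
7 %/% 3       # integer division: 2
2 + 3 * 4     # multiplication is evaluated before addition: 14
(2 + 3) * 4   # brackets change the order of evaluation: 20
-2 ^ 2        # exponentiation is evaluated before the minus sign: -4
```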

Objects and functions

Objects are information (e.g. values, model coefficients, plots etc) stored in R.

x <- c(11, 19, 13, 16, 12, 12, 18, 14, 20, NA)
x
## [1] 11 19 13 16 12 12 18 14 20 NA

Functions are collections of R commands that do a specific task. Many functions have arguments which provide extra information to the function.

mean(x, na.rm = TRUE)
## [1] 15
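
To see why the na.rm argument matters, the sketch below repeats the calculation with and without it on the same vector. The names mean_with_na, n_missing, n_values and median_x are made up for illustration.

```r
x <- c(11, 19, 13, 16, 12, 12, 18, 14, 20, NA)

# Without na.rm = TRUE the missing value propagates to the result
mean_with_na <- mean(x)               # NA

# is.na() flags missing values; summing the flags counts them
n_missing <- sum(is.na(x))            # 1

# length() counts every element, including the NA
n_values <- length(x)                 # 10

# Most summary functions accept the same na.rm argument
median_x <- median(x, na.rm = TRUE)   # 14
```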

Creating variables

The assignment operator (<-) assigns the value of an expression to a variable: the object on the left is 'given' the value on the right.

For example, the object 'pop' is created by using the c() (combine, or concatenate) function to combine values.

pop <- c(9896000, 2488200, 2434100, 980800, 845000)
pop
## [1] 9896000 2488200 2434100  980800  845000
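
Once values are stored in a variable you can reuse them in further calculations. A brief sketch using the same figures; the names pop_millions, total_pop, largest and first_two are made up for illustration.

```r
pop <- c(9896000, 2488200, 2434100, 980800, 845000)

# Arithmetic on a vector is applied to every element
pop_millions <- pop / 1e6
pop_millions

# Functions can summarise the whole vector
total_pop <- sum(pop)   # 16644100
largest <- max(pop)     # 9896000

# Square brackets pick out elements by position (R counts from 1)
first_two <- pop[1:2]   # 9896000 2488200
```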

Data Types

The object 'pop' is a vector: a one-dimensional sequence of elements that all share the same type. Each element in this vector is a numeric value.

class(pop)
## [1] "numeric"

R supports other data types: integer, character, logical, factor, and complex.

Each element in the vector 'city' is a string of one or more characters,

# character
city <- c("London", "Birmingham", "Manchester", "Glasgow", "Newcastle")
city
## [1] "London"     "Birmingham" "Manchester" "Glasgow"    "Newcastle"

whereas the values in 'million_plus' are logical.

# logical
million_plus <- c(TRUE, TRUE, TRUE, FALSE, FALSE)
million_plus
## [1]  TRUE  TRUE  TRUE FALSE FALSE
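
The remaining types can be checked with class() in the same way. The sketch below also shows what happens when types are mixed in one vector; the objects size and mixed are made up for illustration.

```r
class(5L)        # "integer": the L suffix marks a whole number
class(2 + 3i)    # "complex"

# Factors store categorical data as a fixed set of levels
size <- factor(c("large", "small", "large"))
class(size)      # "factor"
levels(size)     # "large" "small"

# A vector holds one type only, so mixed values are coerced to a common type
mixed <- c(1, "a", TRUE)
class(mixed)     # "character"

# Logical values count as 1 (TRUE) and 0 (FALSE) in arithmetic
million_plus <- c(TRUE, TRUE, TRUE, FALSE, FALSE)
sum(million_plus)   # 3
```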

Creating data frames

Like vectors, data frames are a data structure used in R. Similar to a spreadsheet, a data frame stores variables in columns and observations in rows. Unlike matrices, data frames can store variables of different types (e.g. numeric, character).

city_pop <- data.frame(city, population = pop, million_plus)
city_pop
##         city population million_plus
## 1     London    9896000         TRUE
## 2 Birmingham    2488200         TRUE
## 3 Manchester    2434100         TRUE
## 4    Glasgow     980800        FALSE
## 5  Newcastle     845000        FALSE
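
A short sketch of some common ways to inspect and subset a data frame, rebuilding city_pop from the vectors above so the block runs on its own. The names big_cities and mean_pop are made up for illustration.

```r
city <- c("London", "Birmingham", "Manchester", "Glasgow", "Newcastle")
pop <- c(9896000, 2488200, 2434100, 980800, 845000)
million_plus <- c(TRUE, TRUE, TRUE, FALSE, FALSE)
city_pop <- data.frame(city, population = pop, million_plus)

# Compact summary of the columns and their types
str(city_pop)

# Number of rows and columns
dim(city_pop)   # 5 3

# The $ operator extracts one column as a vector
mean_pop <- mean(city_pop$population)   # 3328820

# A logical vector inside [rows, columns] keeps only the matching rows
big_cities <- city_pop[city_pop$million_plus, ]
nrow(big_cities)   # 3
```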



Data source: http://www.centreforcities.org/data-tool/
