an introduction to data analysis and visualisation
R is a programming language for exploring, visualising and analysing data.
Created by Ross Ihaka and Robert Gentleman (University of Auckland)
Released in 1995
Implements the S programming language created at Bell Labs
Companies like Google, Facebook and the Financial Times use it. The BBC have also recently made a big push to use it for their visualisations.
Cutting-edge analytics: over 10,000 user-contributed packages covering finance, genomics, animal tracking, crime analysis, and much more
Powerful graphics and data visualisations: used by the New York Times and FiveThirtyEight
Open source: free and customisable
Reproducibility: code can be shared and the results repeated
Transparency: explicitly documents all the steps of your analyses
Automation: analyses can be run and re-run with new and existing datasets
Support network: worldwide community of developers and users
R has a steep learning curve, and the transition from a graphical user interface like Excel or SPSS to a command-driven one can be unsettling.
library(ggplot2)
p <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point() +
  geom_smooth(method = "loess", se = FALSE) +
  labs(x = "Engine size", y = "Miles per gallon", title = "Fuel efficiency decreases with engine size") +
  theme_minimal()
p
library(leaflet)
leaflet() %>%
  addTiles() %>%
  addMarkers(-2.236518, 53.466478, popup = "University of Manchester")
The working directory is the folder where R looks for files to read and, by default, saves the files you create in your session. The function getwd() returns the file path to the current working directory.
getwd()
One way to change the working directory is to call setwd() in the console.
For example, to change the working directory on a computer running macOS you might enter:
setwd("/Users/your_name/Documents/GSinR")
On a computer running Windows it would look more like:
setwd("C:/Users/your_name/Documents/GSinR")
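As a rough sketch of how getwd() and setwd() work together, the snippet below switches to a folder and then switches back. The folder is a hypothetical one created inside tempdir() purely so that the example runs on any machine; in practice you would use your own project path.

```r
# Hypothetical folder inside tempdir(), so the sketch runs on any machine
project_dir <- file.path(tempdir(), "GSinR")
dir.create(project_dir, showWarnings = FALSE)

# setwd() invisibly returns the previous working directory
old_wd <- setwd(project_dir)

# Check where we are now (just the folder name, not the full path)
new_wd_name <- basename(getwd())
new_wd_name
## [1] "GSinR"

# Go back to where we started
setwd(old_wd)
```

Note that file.path() builds the path with forward slashes, which R accepts on Windows as well as macOS.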
Packages are collections of R functions and data. There are over 10,000 user-contributed packages available to install from CRAN.
Just type install.packages() in the console with the name of the package in inverted commas:
install.packages("tidyverse")
Once installed you can load the package using the library() function like this:
library(tidyverse)
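Because a package only needs to be installed once but loaded in every session, one possible pattern is to install it only when it is missing. The sketch below uses the "stats" package, which ships with R, as a stand-in so that nothing is downloaded; the variable name pkg is an assumption for illustration.

```r
# Assumption: "stats" is a stand-in; swap in the package you need, e.g. "tidyverse"
pkg <- "stats"

# requireNamespace() returns FALSE (rather than stopping) if the package is missing
if (!requireNamespace(pkg, quietly = TRUE)) {
  install.packages(pkg)
}

# character.only = TRUE tells library() that pkg holds the package name as a string
library(pkg, character.only = TRUE)
```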
You can find help within R using the ? operator: to learn more about a function, just place a ? before the function name.
For example, if you wanted to read up on the getwd() function you'd enter:
?getwd
You can also use the help() function:
help(getwd)
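If you are not sure of a function's exact name, base R has a couple of other helpers that may be useful alongside ? and help(). The sketch below is only illustrative, and the object name wd_functions is made up for the example.

```r
# apropos() returns the names of objects that match a pattern
wd_functions <- apropos("^(get|set)wd$")
wd_functions

# args() shows the arguments a function accepts
args(setwd)
```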
R provides arithmetic operators such as:
2 + 2 # addition
## [1] 4
9 - 7 # subtraction
## [1] 2
8 * 12 # multiplication
## [1] 96
9 / 3 # division
## [1] 3
10 ^ 2 # exponentiation
## [1] 100
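A few more operators are worth a quick sketch: integer division, the remainder (modulo), and the way brackets change the order in which an expression is evaluated.

```r
17 %/% 5      # integer division
## [1] 3
17 %% 5       # remainder (modulo)
## [1] 2
2 + 3 * 4     # multiplication is evaluated before addition
## [1] 14
(2 + 3) * 4   # brackets are evaluated first
## [1] 20
```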
Objects are pieces of information (e.g. values, model coefficients, plots, etc.) stored in R.
x <- c(11, 19, 13, 16, 12, 12, 18, 14, 20, NA)
x
## [1] 11 19 13 16 12 12 18 14 20 NA
Functions are collections of R commands that do a specific task. Many functions have arguments which provide extra information to the function.
mean(x, na.rm = TRUE)
## [1] 15
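To see why the na.rm argument matters here, the sketch below repeats the calculation without it and then passes the same argument to another summary function.

```r
x <- c(11, 19, 13, 16, 12, 12, 18, 14, 20, NA)

mean(x)                   # without na.rm the missing value propagates
## [1] NA
sum(is.na(x))             # count the missing values
## [1] 1
median(x, na.rm = TRUE)   # median() accepts the same argument
## [1] 14
```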
The assignment operator (<-) is used to assign the value of an expression to an object. The object on the left is 'given' whatever value is on the right.
For example, the object 'pop' is created by using the c() or concatenate function to combine values.
pop <- c(9896000, 2488200, 2434100, 980800, 845000)
pop
## [1] 9896000 2488200 2434100 980800 845000
The object 'pop' is a vector: a one-dimensional sequence of values, written with a comma between each element. Each element in the vector is a numeric value.
class(pop)
## [1] "numeric"
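Because 'pop' is a vector, you can pick out elements by position or by condition, and arithmetic is applied to every element at once. A minimal sketch, in which the name pop_millions is made up for illustration:

```r
pop <- c(9896000, 2488200, 2434100, 980800, 845000)

length(pop)               # number of elements
## [1] 5
pop[1]                    # first element
## [1] 9896000
pop[pop > 1000000]        # elements greater than one million
## [1] 9896000 2488200 2434100

pop_millions <- pop / 1e6 # division is applied to each element
round(pop_millions, 1)
## [1] 9.9 2.5 2.4 1.0 0.8
```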
R supports other data types: integer, character, logical, factor, and complex.
Each element in the vector 'city' is a string of one or more characters,
# character
city <- c("London", "Birmingham", "Manchester", "Glasgow", "Newcastle")
city
## [1] "London" "Birmingham" "Manchester" "Glasgow" "Newcastle"
whereas the values in 'million_plus' are logical.
# logical
million_plus <- c(TRUE, TRUE, TRUE, FALSE, FALSE)
million_plus
## [1] TRUE TRUE TRUE FALSE FALSE
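As a quick check, class() can be used on these vectors just as it was on 'pop'. The sketch below also shows one handy property of logical values: TRUE counts as 1 and FALSE as 0 when summed.

```r
city <- c("London", "Birmingham", "Manchester", "Glasgow", "Newcastle")
million_plus <- c(TRUE, TRUE, TRUE, FALSE, FALSE)

class(city)
## [1] "character"
class(million_plus)
## [1] "logical"
sum(million_plus)   # number of TRUE values
## [1] 3
```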
Alongside vectors, data frames are another data structure used in R. Like a spreadsheet, a data frame stores variables in columns and observations in rows. Unlike matrices, data frames can store variables of different types (e.g. numeric, character).
city_pop <- data.frame(city, population = pop, million_plus)
city_pop
##         city population million_plus
## 1     London    9896000         TRUE
## 2 Birmingham    2488200         TRUE
## 3 Manchester    2434100         TRUE
## 4    Glasgow     980800        FALSE
## 5  Newcastle     845000        FALSE
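Once the data are in a data frame, individual columns can be pulled out with $ and rows can be filtered with a logical vector. The sketch below rebuilds the same data frame so that it runs on its own; stringsAsFactors = FALSE is added as an assumption so that 'city' stays a character column on versions of R older than 4.0.0.

```r
city_pop <- data.frame(
  city = c("London", "Birmingham", "Manchester", "Glasgow", "Newcastle"),
  population = c(9896000, 2488200, 2434100, 980800, 845000),
  million_plus = c(TRUE, TRUE, TRUE, FALSE, FALSE),
  stringsAsFactors = FALSE  # the default from R 4.0.0 onwards
)

nrow(city_pop)                          # number of observations
## [1] 5
names(city_pop)                         # variable names
## [1] "city"         "population"   "million_plus"
city_pop$city[city_pop$million_plus]    # cities where million_plus is TRUE
## [1] "London"     "Birmingham" "Manchester"
```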
Data source: http://www.centreforcities.org/data-tool/