class: center, middle, inverse, title-slide

# Getting Started in R
an introduction to data analysis and visualisation
## Basics
### Réka Solymosi & Sam Langton
### 2 July 2019

---
class: inverse, center, middle

# R

---
### What is R?

--

- [R]( is a programming language for exploring, visualising and analysing data.

--

- Created by Ross Ihaka and Robert Gentleman (University of Auckland)

--

- Released in 1995

--

- Implements the S programming language created at Bell Labs

--

- Companies like Google, Facebook and the Financial Times use it. The [BBC]( have also recently made a big push to use it for their visualisations.

---
### Why use R?

--

- *Cutting-edge analytics*: over 10,000 user-contributed packages available for finance, genomics, animal tracking, crime analysis and much more

--

- *Powerful graphics and data visualisations*: used by the New York Times and FiveThirtyEight

--

- *Open source*: free and customisable

--

- *Reproducibility*: code can be shared and the results repeated

--

- *Transparency*: explicitly documents all the steps of your analyses

--

- *Automation*: analyses can be run and re-run with new and existing datasets

--

- *Support network*: worldwide community of developers and users

---
### Why wouldn't you use it?

- Anxiety about 'hard' skills
  - "I am just not good at that kind of thing"
  - "I have to be good at maths"
- Experts putting you off

<div style= "float:right;position: relative; top: -150px;">
<img src="img/not-r.png" width="400px" />
</div>

---
### Are there any disadvantages?

--

R can have a steep learning curve, and the transition from a graphical user interface like Excel or SPSS to a command-driven one can be unsettling.

<div align="center">
<img src="img/learning_curve.jpg" height=250>
</div>

--

<br>
However, you’ll soon find that working with a command line is much more efficient than pointing and clicking, because you can replicate, automate and share your R scripts.
---
class: inverse, center, middle

# Example outputs using R

---
### Charts

```r
library(ggplot2)

# mpg: fuel economy data bundled with ggplot2
p <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point() +                                  # one point per car
  geom_smooth(method = "loess", se = FALSE) +     # smoothed trend, no confidence band
  labs(x = "Engine size", y = "Miles per gallon",
       title = "Fuel efficiency decreases with engine size") +
  theme_minimal()
p
```

<img src="slides_files/figure-html/warning-F-1.png" width="700px" style="display: block; margin: auto;" />

---
### Tables

```r
library(DT)

# Interactive table of the first 10 rows of USArrests, 5 rows per page
DT::datatable(
  head(USArrests, 10),
  fillContainer = FALSE,
  options = list(pageLength = 5)
)
```
---
### Maps

```r
library(leaflet)

leaflet() %>%
  addTiles() %>%   # default OpenStreetMap base tiles
  addMarkers(lng = -2.236518, lat = 53.466478,
             popup = "University of Manchester")
```
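The `%>%` operator in the map code is the pipe from the magrittr package, which leaflet re-exports. It passes the result on its left as the first argument of the function on its right, so `leaflet() %>% addTiles()` does the same as `addTiles(leaflet())`. A minimal sketch using base functions instead of a map, assuming magrittr is installed (it is installed alongside leaflet); `values`, `nested` and `piped` are illustrative names:

```r
library(magrittr)  # provides %>% (leaflet re-exports the same operator)

values <- c(1, 4, 9)

# Nested form: read from the inside out
nested <- sum(sqrt(values))

# Piped form: read left to right; each result becomes
# the first argument of the next function
piped <- values %>% sqrt() %>% sum()

piped
identical(nested, piped)
```

Piping keeps long chains, such as building a map layer by layer, readable from top to bottom.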
---
class: inverse, center, middle

# R and RStudio

---
### R and RStudio

It is possible to use [R]( on its own, but there are several reasons why you will prefer to code within the [RStudio]( environment:

- syntax highlighting
- code completion
- [rmarkdown]( integration
- a four-pane workspace for managing R windows

---
### RStudio's panes

<img src="img/panes.png" width="1200px" />

---
class: inverse, center, middle

# Setting up

---
### The working directory

The working directory is the default folder R reads files from and saves files to during your session. The function `getwd()` returns the file path of the current working directory.

```r
getwd()
```

To change the working directory you have two options:

1. Use the drop-down menus: *Session -> Set Working Directory -> Choose Directory*
2. Use the command `setwd()` in the console

For example, to change the working directory on a computer running macOS you might enter:

```r
setwd("/Users/your_name/Documents/GSinR")
```

On a computer running Windows it would look more like this (R accepts forward slashes in Windows paths; single backslashes would need to be escaped as `\\`):

```r
setwd("C:/Users/your_name/Documents/GSinR")
```

---
### Installing and loading packages

Packages are collections of R functions and data. There are [over 10,000 user-contributed packages]( available to install from CRAN. Just type `install.packages()` in the console with the name of the package in quotation marks:

```r
install.packages("tidyverse")
```

Once installed, you can load the package using the `library()` function like this:

```r
library(tidyverse)
```

---
### Finding help

You can find help within R using the `?` operator: to learn more about a function, place a `?` before the function name.
For example, to read up on the `getwd()` function you'd enter:

```r
?getwd
```

You can also use the `help()` function:

```r
help(getwd)
```

---
class: inverse, center, middle

# R basics

---
### Arithmetic

R provides the usual arithmetic operators:

```r
2 + 2    # addition
```

```
## [1] 4
```

```r
9 - 7    # subtraction
```

```
## [1] 2
```

```r
8 * 12   # multiplication
```

```
## [1] 96
```

```r
9 / 3    # division
```

```
## [1] 3
```

```r
10 ^ 2   # exponentiation
```

```
## [1] 100
```

---
### Objects and functions

*Objects* are pieces of information (e.g. values, model coefficients, plots) stored in R.

```r
x <- c(11, 19, 13, 16, 12, 12, 18, 14, 20, NA)
x
```

```
##  [1] 11 19 13 16 12 12 18 14 20 NA
```

*Functions* are collections of R commands that do a specific task. Many functions take *arguments* that control how they work; here `na.rm = TRUE` tells `mean()` to remove missing (`NA`) values before calculating.

```r
mean(x, na.rm = TRUE)
```

```
## [1] 15
```

---
### Creating variables

The assignment operator (`<-`) assigns the value of the expression on its right to the object named on its left. For example, the object `pop` (city populations) is created by using the `c()`, or concatenate, function to combine values.

```r
pop <- c(9896000, 2488200, 2434100, 980800, 845000)
pop
```

```
## [1] 9896000 2488200 2434100  980800  845000
```

---
### Data types

The object `pop` is a vector: a one-dimensional sequence of elements, here separated by commas inside `c()`. Each element in the vector is a numeric value.

```r
class(pop)
```

```
## [1] "numeric"
```

R supports other data types too, including integer, character, logical, factor and complex. Each element in the vector `city` is a string of one or more characters,

```r
# character
city <- c("London", "Birmingham", "Manchester", "Glasgow", "Newcastle")
city
```

```
## [1] "London"     "Birmingham" "Manchester" "Glasgow"    "Newcastle" 
```

whereas the values in `million_plus` are logical.
```r
# logical
million_plus <- c(TRUE, TRUE, TRUE, FALSE, FALSE)
million_plus
```

```
## [1]  TRUE  TRUE  TRUE FALSE FALSE
```

---
### Creating data frames

Data frames are another data structure used in R. Like a spreadsheet, a data frame stores variables in columns and observations in rows. Unlike matrices, data frames can store variables of different types (e.g. numeric, character) in different columns.

```r
city_pop <- data.frame(city, population = pop, million_plus)
city_pop
```

```
##         city population million_plus
## 1     London    9896000         TRUE
## 2 Birmingham    2488200         TRUE
## 3 Manchester    2434100         TRUE
## 4    Glasgow     980800        FALSE
## 5  Newcastle     845000        FALSE
```

<br>
<br>

Data source: <>
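---
### Exploring a data frame

Once a data frame exists, a few base R functions help you inspect and subset it. This sketch rebuilds `city_pop` from the earlier vectors so it runs on its own, then uses `str()` to show each column's type, `$` to pull out one column, logical indexing to keep only the rows where `million_plus` is `TRUE`, and `sum()` to count those rows (R treats `TRUE` as 1 and `FALSE` as 0). The name `big_cities` is just an illustrative choice.

```r
# Rebuild the example vectors and data frame from the previous slides
city <- c("London", "Birmingham", "Manchester", "Glasgow", "Newcastle")
pop <- c(9896000, 2488200, 2434100, 980800, 845000)
million_plus <- c(TRUE, TRUE, TRUE, FALSE, FALSE)
city_pop <- data.frame(city, population = pop, million_plus)

# Structure: number of rows and columns, and the type of each column
str(city_pop)

# Extract one column as a plain vector with $
city_pop$population

# Keep only the rows where million_plus is TRUE (blank after the comma keeps all columns)
big_cities <- city_pop[city_pop$million_plus, ]
big_cities

# TRUE counts as 1 and FALSE as 0, so sum() counts the TRUE values
sum(city_pop$million_plus)
```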