class: center, middle, inverse, title-slide

# Getting Started in R
## an introduction to data analysis and visualisation
## Tidy
### Réka Solymosi & Sam Langton
### 3 July 2019

---
class: inverse, center, middle

![tidyr](img/tidyr.png)

---
### Tidy data

Tidy data are structured for use in R and satisfy three rules:

1. Each variable must have its own column.
2. Each observation must have its own row.
3. Each value must have its own cell.

(Grolemund and Wickham 2017:149)

The same data can be laid out in two common formats: wide and long.

---
### Tidy data

![](http://garrettgman.github.io/images/tidy-1.png)

[Source: Data Science with R, Garrett Grolemund](http://garrettgman.github.io/tidying/)

---
### Why tidy data?

“Tidy datasets are all alike, but every messy dataset is messy in its own way.” – Hadley Wickham

---
### Data

```r
# Daily crime counts in wide format: one column per crime category
wide <- data.frame(Date = c("01-01-2017", "02-01-2017", "03-01-2017"),
                   Burglary = c(7, 5, 13),
                   Drugs = c(1, 3, 9),
                   Robbery = c(9, 0, 9),
                   Shoplifting = c(4, 5, 1))
wide
```

```
##         Date Burglary Drugs Robbery Shoplifting
## 1 01-01-2017        7     1       9           4
## 2 02-01-2017        5     3       0           5
## 3 03-01-2017       13     9       9           1
```

---
### `gather()`

Use when your column names are values, not variables (wide to long).

```r
library(tidyr)  # provides gather(), spread(), separate() and unite()
long <- gather(wide, category, count, -Date)  # stack every column except Date
long
```

```
##          Date    category count
## 1  01-01-2017    Burglary     7
## 2  02-01-2017    Burglary     5
## 3  03-01-2017    Burglary    13
## 4  01-01-2017       Drugs     1
## 5  02-01-2017       Drugs     3
## 6  03-01-2017       Drugs     9
## 7  01-01-2017     Robbery     9
## 8  02-01-2017     Robbery     0
## 9  03-01-2017     Robbery     9
## 10 01-01-2017 Shoplifting     4
## 11 02-01-2017 Shoplifting     5
## 12 03-01-2017 Shoplifting     1
```

---
### `spread()`

Use when a single observation is spread across multiple rows (long to wide).

```r
wide <- spread(long, category, count)  # one column per category, filled with count
wide
```

```
##         Date Burglary Drugs Robbery Shoplifting
## 1 01-01-2017        7     1       9           4
## 2 02-01-2017        5     3       0           5
## 3 03-01-2017       13     9       9           1
```

---
### `separate()`

Split one column into several.

```r
split <- separate(wide, Date, c("Day", "Month", "Year"), sep = "-")  # split Date at each "-"
split
```

```
##   Day Month Year Burglary Drugs Robbery Shoplifting
## 1  01    01 2017        7     1       9           4
## 2  02    01 2017        5     3       0           5
## 3  03    01 2017       13     9       9           1
```

---
### `unite()`

Combine several columns into one.

```r
united <- unite(split, Date, c(Day, Month, Year), sep = "-")  # paste the parts back into Date
united
```

```
##         Date Burglary Drugs Robbery Shoplifting
## 1 01-01-2017        7     1       9           4
## 2 02-01-2017        5     3       0           5
## 3 03-01-2017       13     9       9           1
```
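---
### Round trip: `gather()` then `spread()`

Because `spread()` undoes `gather()`, going from wide to long and back should return the original columns unchanged. The sketch below rebuilds the crime data with `stringsAsFactors = FALSE` so that `Date` stays as text. The names `back` and `same` are illustrative only.

```r
library(tidyr)

# Rebuild the example data; stringsAsFactors = FALSE keeps Date as character
wide <- data.frame(Date = c("01-01-2017", "02-01-2017", "03-01-2017"),
                   Burglary = c(7, 5, 13),
                   Drugs = c(1, 3, 9),
                   Robbery = c(9, 0, 9),
                   Shoplifting = c(4, 5, 1),
                   stringsAsFactors = FALSE)

long <- gather(wide, category, count, -Date)  # wide to long: 3 dates × 4 categories = 12 rows
back <- spread(long, category, count)         # long back to wide

# Compare the result with the original, column by column
same <- mapply(identical, back[names(wide)], wide)
same
```

Checking each column with `identical()` sidesteps differences in row-name storage that can make whole-data-frame comparisons fail for reasons that have nothing to do with the values.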
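---
### `separate()` and leading zeros

By default (`convert = FALSE`), `separate()` keeps the new columns as character, so `"01"` keeps its leading zero and `unite()` rebuilds the original date exactly. The sketch below shows that setting `convert = TRUE` makes `separate()` guess column types, which turns `"01"` into the integer `1`. The data is trimmed to one crime column, and the names `split_num` and `united_num` are illustrative only.

```r
library(tidyr)

wide <- data.frame(Date = c("01-01-2017", "02-01-2017", "03-01-2017"),
                   Burglary = c(7, 5, 13),
                   stringsAsFactors = FALSE)

# Default: the pieces stay character, so the round trip is exact
split <- separate(wide, Date, c("Day", "Month", "Year"), sep = "-")
united <- unite(split, Date, c(Day, Month, Year), sep = "-")
identical(united$Date, wide$Date)

# convert = TRUE applies type.convert() to the pieces: "01" becomes 1L
split_num <- separate(wide, Date, c("Day", "Month", "Year"), sep = "-", convert = TRUE)
united_num <- unite(split_num, Date, c(Day, Month, Year), sep = "-")
united_num$Date  # the leading zeros are gone
```

Numeric parts are handy for arithmetic or sorting. Keep the default if you need to rebuild the original string.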
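---
### `spread()` needs one row per observation

This ties back to rule 2 of tidy data. `spread()` can only build a wide table if each `Date`/`category` pair appears once. The sketch below uses a hypothetical `dupes` data frame with two burglary counts for the same day. It finds the repeat with base R's `anyDuplicated()` and catches the error that `spread()` raises. The replacement message is our own text, not tidyr's.

```r
library(tidyr)

# Hypothetical data: two Burglary counts recorded for 01-01-2017
dupes <- data.frame(Date = c("01-01-2017", "01-01-2017", "02-01-2017"),
                    category = c("Burglary", "Burglary", "Burglary"),
                    count = c(7, 2, 5),
                    stringsAsFactors = FALSE)

# Row index of the first repeated Date/category pair (0 would mean no repeats)
first_repeat <- anyDuplicated(dupes[c("Date", "category")])
first_repeat

# spread() stops with an error rather than guessing which count to keep
result <- tryCatch(spread(dupes, category, count),
                   error = function(e) "spread() refused: keys are not unique")
result
```

Typical fixes are to aggregate the duplicates first (for example, sum the counts per day) or to add a column that tells the repeated rows apart.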
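---
### Newer alternatives: `pivot_longer()` and `pivot_wider()`

In tidyr 1.0.0 and later, `gather()` and `spread()` are superseded by `pivot_longer()` and `pivot_wider()`. The older functions still work. The sketch below does the same reshape as the `gather()` and `spread()` slides and assumes tidyr 1.0.0 or newer is installed. Unlike `gather()`, `pivot_longer()` returns a tibble and orders rows date by date rather than category by category.

```r
library(tidyr)  # pivot_longer() and pivot_wider() need tidyr >= 1.0.0

wide <- data.frame(Date = c("01-01-2017", "02-01-2017", "03-01-2017"),
                   Burglary = c(7, 5, 13),
                   Drugs = c(1, 3, 9),
                   Robbery = c(9, 0, 9),
                   Shoplifting = c(4, 5, 1),
                   stringsAsFactors = FALSE)

# Wide to long: every column except Date becomes a category/count pair
long <- pivot_longer(wide, cols = -Date, names_to = "category", values_to = "count")
long

# Long to wide: one column per category, filled with count
wide_again <- pivot_wider(long, names_from = category, values_from = count)
wide_again
```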