class: center, middle, inverse, title-slide

# Getting Started in R
## an introduction to data analysis and visualisation
## Transform
### Réka Solymosi & Sam Langton
### 3 July 2019

---
class: inverse, center, middle

![dplyr](img/dplyr.png)

---
class: inverse, center, middle

# Subsetting, sorting and summarising data

---
### Data

```r
starwars
```

```
## # A tibble: 87 x 13
##    name  height  mass hair_color skin_color eye_color birth_year gender
##    <chr>  <int> <dbl> <chr>      <chr>      <chr>          <dbl> <chr>
##  1 Luke~    172    77 blond      fair       blue            19   male
##  2 C-3PO    167    75 <NA>       gold       yellow         112   <NA>
##  3 R2-D2     96    32 <NA>       white, bl~ red             33   <NA>
##  4 Dart~    202   136 none       white      yellow          41.9 male
##  5 Leia~    150    49 brown      light      brown           19   female
##  6 Owen~    178   120 brown, gr~ light      blue            52   male
##  7 Beru~    165    75 brown      light      blue            47   female
##  8 R5-D4     97    32 <NA>       white, red red             NA   <NA>
##  9 Bigg~    183    84 black      light      brown           24   male
## 10 Obi-~    182    77 auburn, w~ fair       blue-gray       57   male
## # ... with 77 more rows, and 5 more variables: homeworld <chr>,
## #   species <chr>, films <list>, vehicles <list>, starships <list>
```

---
### `select()`

```r
# Pick columns
select(starwars, name, species)
```

```
## # A tibble: 87 x 2
##    name               species
##    <chr>              <chr>
##  1 Luke Skywalker     Human
##  2 C-3PO              Droid
##  3 R2-D2              Droid
##  4 Darth Vader        Human
##  5 Leia Organa        Human
##  6 Owen Lars          Human
##  7 Beru Whitesun lars Human
##  8 R5-D4              Droid
##  9 Biggs Darklighter  Human
## 10 Obi-Wan Kenobi     Human
## # ... with 77 more rows
```

```r
# Pick a single column and return it as a vector
pull(starwars, name)
```

---
### `slice()`

```r
# Pick rows by position
slice(starwars, c(1, 3, 5))
```

```
## # A tibble: 3 x 13
##   name  height  mass hair_color skin_color eye_color birth_year gender
##   <chr>  <int> <dbl> <chr>      <chr>      <chr>          <dbl> <chr>
## 1 Luke~    172    77 blond      fair       blue              19 male
## 2 R2-D2     96    32 <NA>       white, bl~ red               33 <NA>
## 3 Leia~    150    49 brown      light      brown             19 female
## # ... with 5 more variables: homeworld <chr>, species <chr>, films <list>,
## #   vehicles <list>, starships <list>
```

---
### `filter()`

```r
# Keep only the rows that meet a condition
filter(starwars, species == "Droid")
```

```
## # A tibble: 5 x 13
##   name  height  mass hair_color skin_color eye_color birth_year gender
##   <chr>  <int> <dbl> <chr>      <chr>      <chr>          <dbl> <chr>
## 1 C-3PO    167    75 <NA>       gold       yellow           112 <NA>
## 2 R2-D2     96    32 <NA>       white, bl~ red               33 <NA>
## 3 R5-D4     97    32 <NA>       white, red red               NA <NA>
## 4 IG-88    200   140 none       metal      red               15 none
## 5 BB8       NA    NA none       none       black             NA none
## # ... with 5 more variables: homeworld <chr>, species <chr>, films <list>,
## #   vehicles <list>, starships <list>
```

---
### `arrange()`

```r
# Reorder rows, here from heaviest to lightest
arrange(starwars, desc(mass))
```

```
## # A tibble: 87 x 13
##    name  height  mass hair_color skin_color eye_color birth_year gender
##    <chr>  <int> <dbl> <chr>      <chr>      <chr>          <dbl> <chr>
##  1 Jabb~    175  1358 <NA>       green-tan~ orange         600   herma~
##  2 Grie~    216   159 none       brown, wh~ green, y~       NA   male
##  3 IG-88    200   140 none       metal      red             15   none
##  4 Dart~    202   136 none       white      yellow          41.9 male
##  5 Tarf~    234   136 brown      brown      blue            NA   male
##  6 Owen~    178   120 brown, gr~ light      blue            52   male
##  7 Bossk    190   113 none       green      red             53   male
##  8 Chew~    228   112 brown      unknown    blue           200   male
##  9 Jek ~    180   110 brown      fair       blue            NA   male
## 10 Dext~    198   102 none       brown      yellow          NA   male
## # ... with 77 more rows, and 5 more variables: homeworld <chr>,
## #   species <chr>, films <list>, vehicles <list>, starships <list>
```

---
class: inverse, center, middle

![magrittr](img/magrittr.png)

---
### `%>%` or the piping operator

- chains multiple operations together
- passes the result on its left into the function call on its right as that function's first argument, so `x %>% f(y)` is the same as `f(x, y)`
- makes R code easier to read: the steps run top to bottom instead of being nested inside one another

---
### Inserting the `%>%` operator

Keyboard shortcut in RStudio: Ctrl+Shift+M (Windows) or Cmd+Shift+M (macOS)

<br>

```r
# select()
starwars %>% select(name, species)
```

```r
# pull()
starwars %>% pull(name)
```

```r
# slice()
starwars %>% slice(c(1, 3, 5))
```

```r
# filter()
starwars %>% filter(species == "Droid")
```

```r
# arrange()
starwars %>% arrange(desc(mass))
```

---
### `mutate()`

```r
# Add a column (body mass index, with height converted from cm to m)
starwars %>%
  mutate(bmi = mass / ((height / 100) ^ 2)) %>%
  select(name:mass, bmi)
```

```
## # A tibble: 87 x 4
##    name               height  mass   bmi
##    <chr>               <int> <dbl> <dbl>
##  1 Luke Skywalker        172    77  26.0
##  2 C-3PO                 167    75  26.9
##  3 R2-D2                  96    32  34.7
##  4 Darth Vader           202   136  33.3
##  5 Leia Organa           150    49  21.8
##  6 Owen Lars             178   120  37.9
##  7 Beru Whitesun lars    165    75  27.5
##  8 R5-D4                  97    32  34.0
##  9 Biggs Darklighter     183    84  25.1
## 10 Obi-Wan Kenobi        182    77  23.2
## # ... with 77 more rows
```

---
### `group_by()` and `count()`

```r
# Group rows that share the same species and count the rows in each group
starwars %>%
  group_by(species) %>%
  count(species, sort = TRUE)
```

```
## # A tibble: 38 x 2
## # Groups:   species [38]
##    species      n
##    <chr>    <int>
##  1 Human       35
##  2 <NA>         5
##  3 Droid        5
##  4 Gungan       3
##  5 Kaminoan     2
##  6 Mirialan     2
##  7 Twi'lek      2
##  8 Wookiee      2
##  9 Zabrak       2
## 10 Aleena       1
## # ... with 28 more rows
```

---
### `summarise()`

```r
# Return summary statistics per species (row count and mean mass),
# then keep only species with more than one character
starwars %>%
  group_by(species) %>%
  summarise(n = n(), mass = mean(mass, na.rm = TRUE)) %>%
  filter(n > 1)
```

```
## # A tibble: 9 x 3
##   species      n  mass
##   <chr>    <int> <dbl>
## 1 <NA>         5  48
## 2 Droid        5  69.8
## 3 Gungan       3  74
## 4 Human       35  82.8
## 5 Kaminoan     2  88
## 6 Mirialan     2  53.1
## 7 Twi'lek      2  55
## 8 Wookiee      2 124
## 9 Zabrak       2  80
```

---
class: inverse, center, middle

# Merging data

---
### Data

```r
band_members
```

```
## # A tibble: 3 x 2
##   name  band
##   <chr> <chr>
## 1 Mick  Stones
## 2 John  Beatles
## 3 Paul  Beatles
```

```r
band_instruments
```

```
## # A tibble: 3 x 2
##   name  plays
##   <chr> <chr>
## 1 John  guitar
## 2 Paul  bass
## 3 Keith guitar
```

---
### `left_join()`

```r
# Keep every row of band_members and add the matching plays value
band_members %>%
  left_join(band_instruments, by = "name")
```

```
## # A tibble: 3 x 3
##   name  band    plays
##   <chr> <chr>   <chr>
## 1 Mick  Stones  <NA>
## 2 John  Beatles guitar
## 3 Paul  Beatles bass
```
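
---
### Other joins

`left_join()` is one of several join verbs in dplyr. As a sketch on the same two tables (assuming dplyr is loaded; the object names `both`, `everyone` and `no_instrument` are only illustrative), the code below contrasts three other joins that match on `name`:

```r
library(dplyr)

# inner_join(): keep only the names found in both tables
both <- band_members %>%
  inner_join(band_instruments, by = "name")

# full_join(): keep every name from either table, filling gaps with NA
everyone <- band_members %>%
  full_join(band_instruments, by = "name")

# anti_join(): keep the rows of band_members with no match in band_instruments
no_instrument <- band_members %>%
  anti_join(band_instruments, by = "name")

both
everyone
no_instrument
```

Here `inner_join()` keeps John and Paul, `full_join()` keeps all four musicians (Mick has no `plays` value and Keith has no `band` value), and `anti_join()` returns only Mick.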
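
---
### Putting it together

A sketch of how the verbs from this session combine in one pipeline: drop rows with a missing species, add BMI, summarise per species, keep species with more than one character, and sort by mean BMI. The names `mean_bmi`, `piped` and `nested` are illustrative. The second version writes the same steps without `%>%`, to show what the pipe saves.

```r
library(dplyr)

# Piped version: read top to bottom as a sequence of steps
piped <- starwars %>%
  filter(!is.na(species)) %>%
  mutate(bmi = mass / ((height / 100) ^ 2)) %>%
  group_by(species) %>%
  summarise(n = n(), mean_bmi = mean(bmi, na.rm = TRUE)) %>%
  filter(n > 1) %>%
  arrange(desc(mean_bmi))

# The same steps without the pipe: each call is nested inside the next,
# so the first step (filter) ends up in the middle of the expression
nested <- arrange(
  filter(
    summarise(
      group_by(
        mutate(
          filter(starwars, !is.na(species)),
          bmi = mass / ((height / 100) ^ 2)
        ),
        species
      ),
      n = n(),
      mean_bmi = mean(bmi, na.rm = TRUE)
    ),
    n > 1
  ),
  desc(mean_bmi)
)

identical(piped, nested)
```

`identical(piped, nested)` should return `TRUE`: the pipe changes how the code is written, not what it computes.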