class: center, middle, inverse, title-slide

# Getting Started in R
an introduction to data analysis and visualisation
## babynames
### Réka Solymosi & Sam Langton
### 1 July 2019

---

### `babynames`

Names of boys and girls born each year in the U.S. since 1880

```r
library(babynames)
babynames
```

```
## # A tibble: 1,924,665 x 5
##     year sex   name          n   prop
##    <dbl> <chr> <chr>     <int>  <dbl>
##  1  1880 F     Mary       7065 0.0724
##  2  1880 F     Anna       2604 0.0267
##  3  1880 F     Emma       2003 0.0205
##  4  1880 F     Elizabeth  1939 0.0199
##  5  1880 F     Minnie     1746 0.0179
##  6  1880 F     Margaret   1578 0.0162
##  7  1880 F     Ida        1472 0.0151
##  8  1880 F     Alice      1414 0.0145
##  9  1880 F     Bertha     1320 0.0135
## 10  1880 F     Sarah      1288 0.0132
## # ... with 1,924,655 more rows
```

---
class: center, middle

<img src="img/baby_plot.png" width="500px" />

---

## The R code

```r
library(tidyverse) ; library(babynames)

babynames %>%
  filter(name == "Mabel") %>%
  ggplot(data = ., aes(x = year, y = n)) +
  geom_line(aes(color = sex), size = 2) +
  scale_color_manual(values = c("#66c2a5", "#ff7f00"),
                     labels = c("Female", "Male")) +
  labs(title = "Babies named Mabel between 1880 and 2015",
       x = "Year",
       y = "Frequency",
       caption = "Source: Social Security Administration",
       color = "") +
  theme_minimal() +
  theme(legend.position = "top")

ggsave("baby_plot.png", scale=1, dpi=300)
```

---
class: center, middle, inverse

# Break it down

---

### Load the R packages
<br>

.pull-left[
```r
*library(tidyverse) ; library(babynames)

babynames %>%
  filter(name == "Mabel") %>%
  ggplot(data = ., aes(x = year, y = n)) +
  geom_line(aes(color = sex), size = 2) +
  scale_color_manual(values = c("#66c2a5", "#ff7f00"),
                     labels = c("Female", "Male")) +
  labs(title = "Babies named Mabel between 1880 and 2015",
       x = "Year",
       y = "Frequency",
       caption = "Source: Social Security Administration",
       color = "") +
  theme_minimal() +
  theme(legend.position = "top")

ggsave("baby_plot.png", scale=1, dpi=300)
```
]

```
## Registered S3 methods overwritten by 'ggplot2':
##   method         from 
##   [.quosures     rlang
##   c.quosures     rlang
##   print.quosures rlang
```

```
## -- Attaching packages 
----------------------------------------------------- tidyverse 1.2.1 --
```

```
## v ggplot2 3.1.1     v purrr   0.3.2
## v tibble  2.1.1     v dplyr   0.8.1
## v tidyr   0.8.3     v stringr 1.4.0
## v readr   1.3.1     v forcats 0.4.0
```

```
## -- Conflicts -------------------------------------------------------- tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
```

---

### Access the `babynames` data
<br>

.pull-left[
```r
library(tidyverse) ; library(babynames)

*babynames %>%
  filter(name == "Mabel") %>%
  ggplot(data = ., aes(x = year, y = n)) +
  geom_line(aes(color = sex), size = 2) +
  scale_color_manual(values = c("#66c2a5", "#ff7f00"),
                     labels = c("Female", "Male")) +
  labs(title = "Babies named Mabel between 1880 and 2015",
       x = "Year",
       y = "Frequency",
       caption = "Source: Social Security Administration",
       color = "") +
  theme_minimal() +
  theme(legend.position = "top")

ggsave("baby_plot.png", scale=1, dpi=300)
```
]

.pull-right[
```
## # A tibble: 1,924,665 x 5
##     year sex   name          n   prop
##    <dbl> <chr> <chr>     <int>  <dbl>
##  1  1880 F     Mary       7065 0.0724
##  2  1880 F     Anna       2604 0.0267
##  3  1880 F     Emma       2003 0.0205
##  4  1880 F     Elizabeth  1939 0.0199
##  5  1880 F     Minnie     1746 0.0179
##  6  1880 F     Margaret   1578 0.0162
##  7  1880 F     Ida        1472 0.0151
##  8  1880 F     Alice      1414 0.0145
##  9  1880 F     Bertha     1320 0.0135
## 10  1880 F     Sarah      1288 0.0132
## # ... 
with 1,924,655 more rows
```
]

---

### Subset the data
<br>

.pull-left[
```r
library(tidyverse) ; library(babynames)

babynames %>%
* filter(name == "Mabel") %>%
  ggplot(data = ., aes(x = year, y = n)) +
  geom_line(aes(color = sex), size = 2) +
  scale_color_manual(values = c("#66c2a5", "#ff7f00"),
                     labels = c("Female", "Male")) +
  labs(title = "Babies named Mabel between 1880 and 2015",
       x = "Year",
       y = "Frequency",
       caption = "Source: Social Security Administration",
       color = "") +
  theme_minimal() +
  theme(legend.position = "top")

ggsave("baby_plot.png", scale=1, dpi=300)
```
]

.pull-right[
```
## # A tibble: 187 x 5
##     year sex   name      n      prop
##    <dbl> <chr> <chr> <int>     <dbl>
##  1  1880 F     Mabel   808 0.00828
##  2  1881 F     Mabel   893 0.00903
##  3  1882 F     Mabel   997 0.00862
##  4  1883 F     Mabel  1086 0.00905
##  5  1883 M     Mabel     6 0.0000533
##  6  1884 F     Mabel  1270 0.00923
##  7  1885 F     Mabel  1349 0.00950
##  8  1885 M     Mabel     7 0.0000604
##  9  1886 F     Mabel  1422 0.00925
## 10  1886 M     Mabel     6 0.0000504
## # ... with 177 more rows
```
]

---

### Initialise a `ggplot()` object, inherit the filtered data, and map variables to the x and y axes

.pull-left[
```r
library(tidyverse) ; library(babynames)

babynames %>%
  filter(name == "Mabel") %>%
* ggplot(data = ., aes(x = year, y = n)) +
  geom_line(aes(color = sex), size = 2) +
  scale_color_manual(values = c("#66c2a5", "#ff7f00"),
                     labels = c("Female", "Male")) +
  labs(title = "Babies named Mabel between 1880 and 2015",
       x = "Year",
       y = "Frequency",
       caption = "Source: Social Security Administration",
       color = "") +
  theme_minimal() +
  theme(legend.position = "top")

ggsave("baby_plot.png", scale=1, dpi=300)
```
]

.pull-right[
![](slides_files/figure-html/unnamed-chunk-6-1.png)<!-- -->
]

---

### Use `geom_line()` to connect observations, map the `sex` variable to colour, and set line thickness

.pull-left[
```r
library(tidyverse) ; library(babynames)

babynames %>%
  filter(name == "Mabel") %>%
  ggplot(data = ., aes(x = year, y = n)) +
* geom_line(aes(color = sex), size = 2) +
  scale_color_manual(values = c("#66c2a5", "#ff7f00"),
                     labels = c("Female", "Male")) +
  labs(title = "Babies named Mabel between 1880 and 2015",
       x = "Year",
       y = "Frequency",
       caption = "Source: Social Security Administration",
       color = "") +
  theme_minimal() +
  theme(legend.position = "top")

ggsave("baby_plot.png", scale=1, dpi=300)
```
]

.pull-right[
![](slides_files/figure-html/unnamed-chunk-7-1.png)<!-- -->
]

---

### Specify which colours the `sex` variable is mapped to
<!-- <br> -->

.pull-left[
```r
library(tidyverse) ; library(babynames)

babynames %>%
  filter(name == "Mabel") %>%
  ggplot(data = ., aes(x = year, y = n)) +
  geom_line(aes(color = sex), size = 2) +
* scale_color_manual(values = c("#66c2a5", "#ff7f00"),
*                    labels = c("Female", "Male")) +
  labs(title = "Babies named Mabel between 1880 and 2015",
       x = "Year",
       y = "Frequency",
       caption = "Source: Social Security Administration",
       color = "") +
  theme_minimal() +
  theme(legend.position = "top")

ggsave("baby_plot.png", scale=1, dpi=300)
```
]

.pull-right[
![](slides_files/figure-html/unnamed-chunk-8-1.png)<!-- -->
]

---

### Add a title, label the axes, include a caption, and drop the legend title
<!-- <br> -->

.pull-left[
```r
library(tidyverse) ; library(babynames)

babynames %>%
  filter(name == "Mabel") %>%
  ggplot(data = ., aes(x = year, y = n)) +
  geom_line(aes(color = sex), size = 2) +
  scale_color_manual(values = c("#66c2a5", "#ff7f00"),
                     labels = c("Female", "Male")) +
* labs(title = "Babies named Mabel between 1880 and 2015",
*      x = "Year",
*      y = "Frequency",
*      caption = "Source: Social Security Administration",
*      color = "") +
  theme_minimal() +
  theme(legend.position = "top")

ggsave("baby_plot.png", scale=1, dpi=300)
```
]

.pull-right[
![](slides_files/figure-html/unnamed-chunk-9-1.png)<!-- -->
]

---

### Use a `ggplot2` theme to format the chart
<br>

.pull-left[
```r
library(tidyverse) ; library(babynames)

babynames %>%
  filter(name == "Mabel") %>%
  ggplot(data = ., aes(x = year, y = n)) +
  geom_line(aes(color = sex), 
            size = 2) +
  scale_color_manual(values = c("#66c2a5", "#ff7f00"),
                     labels = c("Female", "Male")) +
  labs(title = "Babies named Mabel between 1880 and 2015",
       x = "Year",
       y = "Frequency",
       caption = "Source: Social Security Administration",
       color = "") +
* theme_minimal() +
  theme(legend.position = "top")

ggsave("baby_plot.png", scale=1, dpi=300)
```
]

.pull-right[
![](slides_files/figure-html/unnamed-chunk-10-1.png)<!-- -->
]

---

### Move the legend to the top of the plot
<br>

.pull-left[
```r
library(tidyverse) ; library(babynames)

babynames %>%
  filter(name == "Mabel") %>%
  ggplot(data = ., aes(x = year, y = n)) +
  geom_line(aes(color = sex), size = 2) +
  scale_color_manual(values = c("#66c2a5", "#ff7f00"),
                     labels = c("Female", "Male")) +
  labs(title = "Babies named Mabel between 1880 and 2015",
       x = "Year",
       y = "Frequency",
       caption = "Source: Social Security Administration",
       color = "") +
  theme_minimal() +
* theme(legend.position = "top")

ggsave("baby_plot.png", scale=1, dpi=300)
```
]

.pull-right[
![](slides_files/figure-html/unnamed-chunk-11-1.png)<!-- -->
]

---

### Save the plot as a png at a high resolution
<br>

.pull-left[
```r
library(tidyverse) ; library(babynames)

babynames %>%
  filter(name == "Mabel") %>%
  ggplot(data = ., aes(x = year, y = n)) +
  geom_line(aes(color = sex), size = 2) +
  scale_color_manual(values = c("#66c2a5", "#ff7f00"),
                     labels = c("Female", "Male")) +
  labs(title = "Babies named Mabel between 1880 and 2015",
       x = "Year",
       y = "Frequency",
       caption = "Source: Social Security Administration",
       color = "") +
  theme_minimal() +
  theme(legend.position = "top")

*ggsave("baby_plot.png", scale=1, dpi=300)
```
]

.pull-right[
![](slides_files/figure-html/unnamed-chunk-12-1.png)<!-- -->
]

---
class: center, middle, inverse

# Your turn
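
---

### One possible starting point

The steps broken down in these slides can be wrapped in a small function so that the same chart can be drawn for any name. This is only a sketch: `plot_name()` and its `baby_name` argument are hypothetical names made up for illustration, not part of `babynames` or the tidyverse. It recodes `sex` before plotting and names the colours, on the assumption that this keeps the legend correct for names recorded for only one sex.

```r
library(tidyverse) ; library(babynames)

# plot_name() is a hypothetical helper for illustration only
plot_name <- function(baby_name) {
  babynames %>%
    # keep only the rows for the requested name
    filter(name == baby_name) %>%
    # spell out the sex codes so they can be used directly in the legend
    mutate(sex = if_else(sex == "F", "Female", "Male")) %>%
    ggplot(data = ., aes(x = year, y = n)) +
    # size sets line thickness as in the slides;
    # newer ggplot2 versions prefer the linewidth argument
    geom_line(aes(color = sex), size = 2) +
    # named values tie each colour to a category
    scale_color_manual(values = c("Female" = "#66c2a5", "Male" = "#ff7f00")) +
    labs(title = paste("Babies named", baby_name),
         x = "Year",
         y = "Frequency",
         caption = "Source: Social Security Administration",
         color = "") +
    theme_minimal() +
    theme(legend.position = "top")
}

# returns a ggplot object, which prints as a chart
plot_name("Mabel")
```

Because the function returns the plot rather than saving it, the result can be passed on to `ggsave()` exactly as in the final step above.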