class: inverse, center, bottom

<img src="https://d33wubrfki0l68.cloudfront.net/b88ef926a004b0fce72b2526b0b5c4413666a4cb/24a30/cover.png" width="150" />

# Chapter 21: Iteration
## RStudio Instructor Training Study Session
### Silvia Canelón, PhD
### September 17th, 2020

---
class: left, top

# Introduction

It is important to reduce duplication in your code:

- it is easier to see the intent of the code -- you can focus on what's different
- as needs change, you only need to make changes in one place
- it reduces the bugs that come from the same line of code living in multiple places

### **Iteration** is a great companion to **functions**

It helps us repeat the same operation on different columns or on different datasets.

### Rule of thumb

⭐ Never copy and paste more than 2x

---

# For loops

## Three components

```r
output <- vector("double", ncol(df))  # 1. output
for (i in seq_along(df)) {            # 2. sequence
  output[[i]] <- median(df[[i]])      # 3. body
}
```

1. **output:** `output <- vector("double", ncol(df))` -- whenever possible, allocate sufficient space for the output up front; growing it inside the loop makes the loop very slow
2. **sequence:** `i in seq_along(df)` -- what will this loop over?<br/>(preferred over `1:length(df)`, which goes wrong for a zero-length vector)
3. **body:** `output[[i]] <- median(df[[i]])` -- the code that does the work

---

# For loop variations

The three components, **output, sequence, and body**, may look different<br/>depending on what's needed in the loop.

There are four variations on the basic theme of the **for loop:**

1. **Modifying an existing object**, instead of creating a new object<br/>(e.g. rescaling every column in a data frame)
2. **Looping over names or values**, instead of indices<br/>(e.g. looping over the elements with `for (x in xs)`, or over the names with `for (nm in names(xs))`)
3. **Handling outputs of unknown length**<br/>(e.g. rather than progressively growing your output vector, save your results in a list and combine them into a single vector once the loop is done)
4. 
   **Handling sequences of unknown length**<br/>(e.g. if you don't know how long the input sequence should run for, use a **while loop**)

---

# For loops vs. functionals

You can wrap up for loops in a function and call that function instead of using the for loop directly.

⭐ Remember the rule of thumb: never copy and paste more than 2x

For example, this:

```r
f1 <- function(x) abs(x - mean(x)) ^ 1  # original
f2 <- function(x) abs(x - mean(x)) ^ 2  # copy-and-paste #1
f3 <- function(x) abs(x - mean(x)) ^ 3  # copy-and-paste #2
```

Can be generalized and written as:

```r
f <- function(x, i) abs(x - mean(x)) ^ i
```

---

# The map functions

.pull-left[
To take things a step further, functions from the `purrr` 📦 can help us eliminate the need for many common for loops.

> These are recommended over base R functions `apply()`, `lapply()`, and `tapply()`
]

.pull-right[
**Purrr family of functions to loop over a vector, do something to each element, and save the results:**

- `map()` makes a list.
- `map_lgl()` makes a logical vector.
- `map_int()` makes an integer vector.
- `map_dbl()` makes a double vector.
- `map_chr()` makes a character vector.
]

--

Map functions are cool, but .em[never feel bad about writing a for loop] instead of using one of them -- what's important is that you solve the problem you're working on! 🤗

---

# The map functions: example

This function:

```r
col_mean <- function(df) {
  output <- vector("double", length(df))  # 1. output
  for (i in seq_along(df)) {              # 2. sequence
    output[i] <- mean(df[[i]])            # 3. body
  }
  output
}
```

Can be simplified using a map function and written as:

```r
map_dbl(df, mean)
```

---

# Dealing with failure

🚨 Your chances of seeing an operation fail are higher when you're using map functions to repeat many operations.

🎉 For that, we have `safely()`:

> Recommended over the base R function `try()`, <br/>which can be inconsistent and more difficult to work with

It takes a function and returns a modified version that never throws an error. Instead, the modified function always returns a list with two elements:

1. 
   `result`, which is the original result if the function **succeeds** and `NULL` if there was an **error**
2. `error`, which is `NULL` if the function **succeeds** and an error object if it doesn't

---

# Dealing with failure: example

```r
safe_log <- safely(log)
str(safe_log(10))
## List of 2
##  $ result: num 2.3
##  $ error : NULL
str(safe_log("a"))
## List of 2
##  $ result: NULL
##  $ error :List of 2
##   ..$ message: chr "non-numeric argument to mathematical function"
##   ..$ call   : language .Primitive("log")(x, base)
##   ..- attr(*, "class")= chr [1:3] "simpleError" "error" "condition"
```

---

# Dealing with failure: other options

.pull-left[
`possibly()` is simpler than `safely()`: you give it a default value to return when there is an error.

```r
x <- list(1, 10, "a")
x %>% map_dbl(possibly(log, NA_real_))
## [1] 0.000000 2.302585       NA
```
]

.pull-right[
`quietly()` is similar to `safely()` but captures printed output, messages, and warnings, instead of errors.

```r
x <- list(1, -1)
x %>% map(quietly(log)) %>% str()
# Produces one list per item;
# only the result for the
# second item is shown
#> $ :List of 4
#>  ..$ result  : num NaN
#>  ..$ output  : chr ""
#>  ..$ warnings: chr "NaNs produced"
#>  ..$ messages: chr(0)
```
]

---

# Mapping over multiple arguments

`map_*()` functions let you iterate over a **single input**, but sometimes we have **multiple inputs** that we need to iterate over in parallel.

🎉 For that, we have:

- `map2()`, which lets us iterate over two vectors in parallel
- `pmap()`, which takes a list holding any number of arguments and iterates over them in parallel
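
As a minimal sketch of the difference between the two (the vectors `base`, `power`, and `offset` and the anonymous functions are made up for illustration, not taken from the chapter), the example below uses the `_dbl` variants so the results are plain numeric vectors that are easy to check by hand:

```r
library(purrr)

# Hypothetical inputs, chosen so the results can be verified by hand
base   <- c(2, 3, 4)
power  <- c(3, 2, 1)
offset <- c(10, 20, 30)

# map2_dbl(): the i-th call receives base[i] and power[i]
powers <- map2_dbl(base, power, function(b, p) b^p)

# pmap_dbl(): the list names are matched to the function's argument names,
# so the i-th call is f(b = base[i], p = power[i], k = offset[i])
shifted <- pmap_dbl(
  list(b = base, p = power, k = offset),
  function(b, p, k) b^p + k
)

print(powers)   # 8 9 4
print(shifted)  # 18 29 34
```

Naming the list elements passed to `pmap_dbl()` is optional, but it makes the calls less fragile than relying on positional matching.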
---

# Mapping over multiple arguments

Using `map2()` to simulate random normals with varying means and standard deviations (2 varying arguments)

.pull-left[
```r
mu <- list(5, 10, -3)    # some means
sigma <- list(1, 5, 10)  # some standard deviations
map2(mu, sigma, rnorm, n = 5) %>% str()
## List of 3
##  $ : num [1:5] 4.11 5.57 6.57 4.92 6.08
##  $ : num [1:5] 13.92 5.15 9.4 6.51 21.62
##  $ : num [1:5] 0.0651 -8.1038 -2.2228 1.7309 -16.0956
```
]

.pull-right[
![](https://d33wubrfki0l68.cloudfront.net/68a21c4a103426c3b311c9dcfad8fe379d4892f1/55c9d/diagrams/lists-map2.png)

⭐ Arguments that vary for each call come _before_ the function.<br/>
⭐ Arguments that are the same for every call come _after_.
]

---

# Mapping over multiple arguments

Using `pmap()` to simulate random normals with varying means, standard deviations, and numbers of samples (3 varying arguments)

.pull-left[
```r
mu <- list(5, 10, -3)    # some means
sigma <- list(1, 5, 10)  # some standard deviations
n <- list(1, 3, 5)       # some numbers of samples
args1 <- list(mean = mu, sd = sigma, n = n)
pmap(args1, rnorm) %>% str()
## List of 3
##  $ : num 5.28
##  $ : num [1:3] 14.3 15.7 15.4
##  $ : num [1:5] 8.44 9.98 -4.35 -9.26 -11.41
```
]

.pull-right[
![](https://d33wubrfki0l68.cloudfront.net/6da05576a8c55e4ee1ecb2e2c5c9a35e710abacd/b9ea6/diagrams/lists-pmap-named.png)
]

---

# Mapping over multiple arguments

That's all good and fine, but what if you want to vary the **arguments** _and_ the **function**?

We can use `invoke_map()` for this (note that `invoke_map()` is deprecated as of purrr 1.0.0).
First we define the functions and their parameters:

```r
f <- c("runif", "rnorm", "rpois")
param <- list(
  list(min = -1, max = 1),
  list(sd = 5),
  list(lambda = 10)
)
```

---

# Mapping over multiple arguments

...then we use `invoke_map()`

.pull-left[
```r
# recall f <- c("runif", "rnorm", "rpois")
invoke_map(f, param, n = 5) %>% str()
## List of 3
##  $ : num [1:5] -0.397 -0.773 0.665 -0.151 0.489
##  $ : num [1:5] -7.62 -3.45 7.22 3.93 -2.23
##  $ : int [1:5] 11 13 8 8 16
```
]

.pull-right[
![](https://d33wubrfki0l68.cloudfront.net/46ce0bbefff56809de8d5276120031a21e1bbbf1/753f2/diagrams/lists-invoke.png)
]

⭐ The first argument is a list of functions or a character vector of function names.<br/>
⭐ The second argument is a list of lists giving the arguments that vary for each function.<br/>
⭐ The arguments that follow are passed on to every function.

---

# Walk

Use the walk functions when you want to call a function for its _side effects_ rather than for its return value.

Different options: `walk()`, `walk2()`, and `pwalk()`

Here, `pwalk()` saves each plot to its own file in a temporary directory:

```r
library(ggplot2)
library(purrr)  # map(), pwalk(), and %>%
plots <- mtcars %>%
  split(.$cyl) %>%
  map(~ggplot(., aes(mpg, wt)) + geom_point())  # list of plots
paths <- stringr::str_c(names(plots), ".pdf")   # vector of file names

pwalk(list(paths, plots), ggsave, path = tempdir())
```

---

# Other patterns of for loops

### These aren't used as often, but there are additional purrr functions for other types of for loops

.pull-left[
- Functions that work with **predicate functions** (functions that return either a single `TRUE` or `FALSE`)
  - `keep()`
  - `discard()`
  - `some()`
  - `every()`
  - `detect()`
  - `head_while()`
  - `tail_while()`
]

.pull-right[
These functions are useful to reduce a complex list to a simple list:

- `reduce()`
- `accumulate()`
]

---
class: inverse, center, middle

# The End