
class: inverse, center, bottom

<img src="https://d33wubrfki0l68.cloudfront.net/b88ef926a004b0fce72b2526b0b5c4413666a4cb/24a30/cover.png" width="150" />
# Chapter 21: Iteration

## RStudio Instructor Training Study Session

### Silvia Canelón, PhD

### September 17th, 2020

---
class: left, top

# Introduction
Reducing duplication in your code is important because it:
- makes the intent of your code easier to see -- you can focus on what's different
- means that, as needs change, you only need to make changes in one place
- tends to leave fewer bugs, because each line of code is used in more places

### **Iteration** is a great companion to **functions**
Helps us repeat the same operation on different columns or on different datasets

### Rule of thumb
⭐ Never copy and paste more than twice (i.e., never keep more than three copies of the same code)

---
# For loops
## Three components

```r
df <- data.frame(a = rnorm(10), b = rnorm(10), c = rnorm(10)) # example data
output <- vector("double", ncol(df)) # 1. output
for (i in seq_along(df)) {           # 2. sequence
  output[[i]] <- median(df[[i]])     # 3. body
}
output
```

1. **output:** `output <- vector("double", ncol(df))` -- allocate enough space for the output before the loop starts; if you grow the output inside the loop instead (e.g., with `c()`), your loop will be very slow
2. **sequence:** `i in seq_along(df)` -- what will this loop over? `seq_along()` is preferred over `1:length(x)` because it does the right thing when `x` has length zero
3. **body:** `output[[i]] <- median(df[[i]])` -- the code that does the work
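A quick way to see why `seq_along()` is safer: for a zero-length input, `1:length(x)` counts *down* from 1 to 0, so a loop over it would run twice on empty data. The empty vector `y` is just an illustration.

```r
y <- vector("double", 0)  # an empty input

seq_along(y)   # integer(0): a loop over this never runs
1:length(y)    # 1 0: a loop over this would run twice

for (i in seq_along(y)) {
  print("never reached")  # body is skipped for empty input
}
```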

---
# For loop variations

The three components, **output, sequence, and body**, may look different depending on what's needed in the loop.

There are four variations on the basic theme of the **for loop:**
1. **Modifying an existing object** instead of creating a new one (e.g., rescaling every column in a data frame)
2. **Looping over names or values** instead of indices (e.g., looping over the elements with `for (x in xs)` or over the names with `for (nm in names(xs))`)
3. **Handling outputs of unknown length** (e.g., rather than progressively growing your output vector, save your results in a list and combine them into a single vector once the loop is done)
4. **Handling sequences of unknown length** (e.g., if you don't know in advance how many iterations you need, use a **while loop**)
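A minimal sketch of all four variations, one after another. The helper `rescale01()`, the small data frame, and the coin-flip loop are illustrative assumptions in the spirit of the R4DS examples, not code from these slides.

```r
# Hypothetical helper: rescale a numeric vector to the range [0, 1]
rescale01 <- function(x) {
  rng <- range(x, na.rm = TRUE)
  (x - rng[1]) / (rng[2] - rng[1])
}

# 1. Modify an existing object: overwrite each column in place
df <- data.frame(a = c(1, 5, 10), b = c(2, 4, 6))
for (i in seq_along(df)) {
  df[[i]] <- rescale01(df[[i]])
}

# 2. Loop over names instead of indices
for (nm in names(df)) {
  cat(nm, "has maximum", max(df[[nm]]), "\n")
}

# 3. Output of unknown length: collect pieces in a list, combine once at the end
means <- c(0, 1, 2)
pieces <- vector("list", length(means))
for (i in seq_along(means)) {
  n <- sample(1:5, 1)                 # each piece gets a random length
  pieces[[i]] <- rnorm(n, means[[i]])
}
combined <- unlist(pieces)            # one vector, built in a single step

# 4. Sequence of unknown length: flip a coin until three heads in a row
flips <- 0
nheads <- 0
while (nheads < 3) {
  if (sample(c("T", "H"), 1) == "H") {
    nheads <- nheads + 1
  } else {
    nheads <- 0
  }
  flips <- flips + 1
}
flips
```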

---
# For loops vs. functionals

You can wrap up for loops in a function and call that function instead of using the for loop directly.

⭐ Remember the rule of thumb: never copy and paste more than 2x

For example, this:

```r
f1 <- function(x) abs(x - mean(x)) ^ 1 # original
f2 <- function(x) abs(x - mean(x)) ^ 2 # copy-and-paste #1
f3 <- function(x) abs(x - mean(x)) ^ 3 # copy-and-paste #2
```

Can be generalized and written as:

```r
f <- function(x, i) abs(x - mean(x)) ^ i
```
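Going one step further in the same spirit (R4DS develops this idea as `col_summary()`), a function can take *another function* as an argument; that is what makes it a functional. The data frame below is made up for illustration.

```r
# Sketch of a functional: the summary function is passed in as `fun`
col_summary <- function(df, fun) {
  out <- vector("double", length(df))  # 1. output
  for (i in seq_along(df)) {           # 2. sequence
    out[i] <- fun(df[[i]])             # 3. body
  }
  out
}

df <- data.frame(a = c(1, 2, 3), b = c(10, 20, 60))  # made-up example data
col_summary(df, median)  # 2 20
col_summary(df, mean)    # 2 30
```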

---
# The map functions
.pull-left[
To take things a step further, functions from the `purrr` 📦 can help us eliminate the need for many common for loops.
> These are recommended over the base R functions `apply()`, `lapply()`, and `tapply()`
]
.pull-right[
**`purrr` family of functions that loop over a vector, do something to each element, and save the results:**
- `map()` makes a list.
- `map_lgl()` makes a logical vector.
- `map_int()` makes an integer vector.
- `map_dbl()` makes a double vector.
- `map_chr()` makes a character vector.
]

Map functions are cool, but .em[never feel bad about writing a for loop] instead of using one of them -- what's important is that you solve the problem you're working on! 🤗
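Each typed variant checks that every call returns a single value of the promised type, which keeps the output predictable. The list `x` is made up for illustration.

```r
library(purrr)

x <- list(a = 1:3, b = c(2.5, 3.5), c = 10L)  # made-up example list

map(x, length)          # a list, one element per input
map_int(x, length)      # named integer vector: a = 3, b = 2, c = 1
map_dbl(x, mean)        # named double vector: a = 2, b = 3, c = 10
map_chr(x, typeof)      # "integer", "double", "integer"
map_lgl(x, is.integer)  # TRUE, FALSE, TRUE
```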

---
# The map functions: example

This function:

```r
col_mean <- function(df) {
  output <- vector("double", length(df)) # 1. output
  for (i in seq_along(df)) {             # 2. sequence
    output[i] <- mean(df[[i]])           # 3. body
  }
  output
}
```

Can be simplified using a map function and written as:

```r
library(purrr)
map_dbl(df, mean)
```
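Arguments placed after the function name are passed along to every call through `...`, so options such as `na.rm = TRUE` don't need a wrapper function. The data frame is made up for illustration.

```r
library(purrr)

df <- data.frame(a = c(1, 2, NA), b = c(4, 6, 8))  # made-up data with a missing value

map_dbl(df, mean)                # a is NA because of the missing value
map_dbl(df, mean, na.rm = TRUE)  # a = 1.5, b = 6
```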

---
# Dealing with failure

🚨 When you use map functions to repeat many operations, the chances that one of those operations fails are higher.

🎉 For that, we have `safely()`:
> Recommended over the base R function `try()`, which can be inconsistent and more difficult to work with

`safely()` takes a function and returns a modified version of it. The modified function never throws an error; instead, it returns a list with two elements:
1. `result`: the original result if the function **succeeds**, and `NULL` if there was an **error**
2. `error`: `NULL` if the function **succeeds**, and an error object if it doesn't

---
# Dealing with failure: example

```r
library(purrr)
safe_log <- safely(log)
str(safe_log(10))
## List of 2
## $ result: num 2.3
## $ error : NULL
str(safe_log("a"))
## List of 2
## $ result: NULL
## $ error :List of 2
## ..$ message: chr "non-numeric argument to mathematical function"
## ..$ call : language .Primitive("log")(x, base)
## ..- attr(*, "class")= chr [1:3] "simpleError" "error" "condition"
```
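In practice `safely()` is most useful inside a map: every input gets a `result`/`error` pair, and you can then separate the successes from the failures. This sketch reuses the `list(1, 10, "a")` input from the `possibly()` example; pulling elements out with `map(y, "result")` is one way to do it (R4DS uses `transpose()` instead).

```r
library(purrr)

x <- list(1, 10, "a")
y <- map(x, safely(log))                  # one list(result, error) per input

is_ok <- map_lgl(y, ~ is.null(.x$error))  # TRUE where log() succeeded
x[!is_ok]                                 # the inputs that failed: "a"
unlist(map(y[is_ok], "result"))           # 0.000000 2.302585
```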

---
# Dealing with failure: other options

.pull-left[
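`possibly()` is simpler than `safely()`: instead of returning an error object, it returns a default value (here `NA_real_`) whenever there's an error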

```r
library(purrr)
x <- list(1, 10, "a")
x %>% map_dbl(possibly(log, NA_real_))
## [1] 0.000000 2.302585 NA
```
]
.pull-right[
`quietly()` is similar to `safely()` but captures printed output, messages, and warnings, instead of errors.

```r
library(purrr)
x <- list(1, -1)
x %>% map(quietly(log)) %>% str()
# str() prints one list per input;
# only the second input's list is shown here
## ...
##  $ :List of 4
##   ..$ result  : num NaN
##   ..$ output  : chr ""
##   ..$ warnings: chr "NaNs produced"
##   ..$ messages: chr(0)
```
]

---
# Mapping over multiple arguments

The `map_*()` functions let you iterate over a **single input**, but sometimes we have **multiple inputs** that we need to iterate over in parallel.

🎉 For that, we have:

- `map2()`, which iterates over two vectors in parallel
- `pmap()`, which iterates over any number of vectors supplied as a list

---
# Mapping over multiple arguments

Using `map2()` to simulate random normal samples with varying means _and_ standard deviations (two varying arguments)
.pull-left[

```r
mu <- list(5, 10, -3) # some means
sigma <- list(1, 5, 10) # some standard deviations

map2(mu, sigma, 
     rnorm, n = 5) %>% str()
## List of 3
##  $ : num [1:5] 4.11 5.57 6.57 4.92 6.08
##  $ : num [1:5] 13.92 5.15 9.4 6.51 21.62
##  $ : num [1:5] 0.0651 -8.1038 -2.2228 1.7309 -16.0956
```
]
.pull-right[
![](https://d33wubrfki0l68.cloudfront.net/68a21c4a103426c3b311c9dcfad8fe379d4892f1/55c9d/diagrams/lists-map2.png)

⭐ Arguments that vary for each call come _before_ the function.

⭐ Arguments that are the same for every call come _after_.
]

---
# Mapping over multiple arguments

Using `pmap()` to simulate random normal samples with varying means, standard deviations, _and_ sample sizes (three varying arguments)

.pull-left[

```r
mu <- list(5, 10, -3) # some means
sigma <- list(1, 5, 10) # some standard deviations 
n <- list(1, 3, 5) # some sample sizes

args1 <- list(mean = mu, 
              sd = sigma, 
              n = n)

pmap(args1, rnorm) %>% str()
## List of 3
##  $ : num 5.28
##  $ : num [1:3] 14.3 15.7 15.4
##  $ : num [1:5] 8.44 9.98 -4.35 -9.26 -11.41
```
]
.pull-right[
![](https://d33wubrfki0l68.cloudfront.net/6da05576a8c55e4ee1ecb2e2c5c9a35e710abacd/b9ea6/diagrams/lists-pmap-named.png)
]
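Because all the arguments have the same length, they can also be stored as columns of a data frame, and `pmap()` then makes one call per row. The `params` data frame is an illustrative assumption mirroring the lists above; the random values change on every run, so only the sample sizes are predictable.

```r
library(purrr)

# One row per call; column names match the arguments of rnorm()
params <- data.frame(
  mean = c(5, 10, -3),
  sd   = c(1, 5, 10),
  n    = c(1, 3, 5)
)

samples <- pmap(params, rnorm)
lengths(samples)  # sample sizes: 1, 3, 5
```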

---
# Mapping over multiple arguments
That's all well and good, but what if you want to vary the **arguments** _and_ the **function**?

We can use `invoke_map()` for this. (Note that newer versions of `purrr` have retired `invoke_map()` in favour of `exec()`.)

First, we define the function names and the arguments for each function:

```r
f <- c("runif", "rnorm", "rpois")
param <- list(
  list(min = -1, max = 1), 
  list(sd = 5), 
  list(lambda = 10)
)
```

---
# Mapping over multiple arguments

...then we use `invoke_map()`
.pull-left[

```r
# recall f <- c("runif", "rnorm", "rpois")

invoke_map(f, param, n = 5) %>% str()
## List of 3
##  $ : num [1:5] -0.397 -0.773 0.665 -0.151 0.489
##  $ : num [1:5] -7.62 -3.45 7.22 3.93 -2.23
##  $ : int [1:5] 11 13 8 8 16
```
]
.pull-right[
![](https://d33wubrfki0l68.cloudfront.net/46ce0bbefff56809de8d5276120031a21e1bbbf1/753f2/diagrams/lists-invoke.png)
]

⭐ The first argument is a list of functions or a character vector of function names.

⭐ The second argument is a list of lists giving the arguments that vary for each function.

⭐ The arguments that follow are passed on to every function.
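Since `invoke_map()` has been retired in newer versions of purrr, a roughly equivalent sketch pairs `map2()` with `exec()`, splicing each argument list into the call with `!!!`. It redefines the `f` and `param` objects from above so the block runs on its own.

```r
library(purrr)

f <- c("runif", "rnorm", "rpois")
param <- list(
  list(min = -1, max = 1),
  list(sd = 5),
  list(lambda = 10)
)

# For each (function name, argument list) pair, call the function with its
# own arguments spliced in, plus n = 5, which is shared by every call
res <- map2(f, param, function(fn, args) exec(fn, !!!args, n = 5))
str(res)
```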

---
# Walk

`walk()` is an alternative to `map()` that you use when you want to call a function for its _side effects_ rather than for its return value.

Different options: `walk()`, `walk2()`, and `pwalk()`

Here, `pwalk()` saves each plot to its own PDF file in a temporary directory:

```r
library(ggplot2)
library(purrr)
plots <- mtcars %>% 
  split(.$cyl) %>% 
  map(~ggplot(., aes(mpg, wt)) + geom_point()) # list of plots, one per cyl value
paths <- stringr::str_c(names(plots), ".pdf") # vector of file names

pwalk(list(paths, plots), ggsave, path = tempdir())
```
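A minimal illustration of how `walk()` differs from `map()`: it runs the function only for its side effect (here, printing) and then invisibly returns its input unchanged, so it can sit in the middle of a pipeline. The list `x` is made up.

```r
library(purrr)

x <- list(1, "a", TRUE)
result <- walk(x, print)  # prints each element in turn
identical(result, x)      # TRUE: walk() returns its input invisibly
```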

---
# Other patterns of for loops

### These aren't used as often, but there are additional purrr functions for other types of for loops

.pull-left[
- Functions that work with **predicate functions** (functions that return a single `TRUE` or `FALSE`)
  - `keep()`
  - `discard()`
  - `some()`
  - `every()`
  - `detect()`
  - `head_while()`
  - `tail_while()`
]
.pull-right[ 
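These functions reduce a complex list to a simple one by repeatedly applying a function that combines a pair of elements into one (`accumulate()` also keeps the intermediate results):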
- `reduce()`
- `accumulate()`
]
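A quick tour of these functions on small made-up inputs: `is.numeric()`, `is.character()`, and the `~ .x < 5` style formulas are the predicate functions, and `+` is the function that combines two elements into one.

```r
library(purrr)

x <- list(1, "a", 3, TRUE)   # made-up mixed list
keep(x, is.numeric)          # list(1, 3)
discard(x, is.numeric)       # list("a", TRUE)
some(x, is.character)        # TRUE: at least one element is a string
every(x, is.numeric)         # FALSE: not every element is numeric
detect(x, is.character)      # "a": the first match

nums <- c(1, 2, 8, 3, 9)
head_while(nums, ~ .x < 5)   # 1 2: stops at the first failure (8)
tail_while(nums, ~ .x > 5)   # 9: scans from the end, stops at 3

reduce(1:4, `+`)             # 10: ((1 + 2) + 3) + 4
accumulate(1:4, `+`)         # 1 3 6 10: every intermediate sum
```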
---
class: inverse, center, middle

# The End
