Advanced-R-Solutions/2-11-Function_operators.Rmd at master · anouel/Advanced-R-Solutions · GitHub

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
```{r, include=FALSE}
source("common.R")
```

# Function operators

```{r}
library(purrr)
```


## Existing function operators

1. __<span style="color:red">Q</span>__: Base R provides a function operator in the form of `Vectorize()`. What does it do? When might you use it?

   __<span style="color:green">A</span>__: In R a lot of the common operations are "vectorized". This often means, that a computation is applied to the whole object (say an atomic vector) by default and because these operations are often implemented in a compiled language such as C they are also very fast.

   Despilte what the Functions name and the documentation (`?Vectorize`) tell us, this function operator is not able to speed up the provided function. It rather changes the input format of the arguments, so that it can be iterated over. Because of this it works best for scalar functions.

   In essence, `Vectorize` is mostly a wrapper for `mapply`. Let's take a look at an example from the documentation:

    ```{r}
    vrep <- Vectorize(rep.int)
    vrep

    # application
    vrep(1:2, 3:4)
    vrep(times = 1:2, x = 3:4)  # naming arguments still works
    ```

   `Vectorize()` provides a convenient and concise notation to iterate over multipe arguments. If you combine it with parallelization, it may also speed up your code consitently. If you already use the tidyverse, you may also consider the following type stable alternative:

    ```{r}
    map2(1:2, 3:4, rep.int)
    ```

2. __<span style="color:red">Q</span>__: Read the source code for `possibly()`. How does it work?

   __<span style="color:green">A</span>__: When we look at the source code, that `possibly()` is mostly a wrapper around `tryCatch()`:

    ```{r}
    possibly
    ```

   The function accepts a function (`.f`) as input and passing it through `as_mapper()` enables purrr's convenient as function interface. Afterwords the evaluation of the provided value for the "default return value" (in case of an error) (`otherwise`) is forced.

   Now a function wrapping `tryCatch()` is created and returned by the function operator. This function will evaluate the provided function and return the its output. In case of an error, the value of `otherwise` will be returned instead.

   (The desired behaviour following an user-interrupt is also specified as stopping and printing an informative error message. There is an option to opt-out of a quiet/silent mode, which will print any occurring error message to the console.)

3. __<span style="color:red">Q</span>__: Read the source code for `safely()`. How does it work?

   __<span style="color:green">A</span>__: `safely()` returns a function created by `capture_error()`, which we have to inspect in order to understand what this function operator does:

    ```{r}
    safely
    ```

    ```{r}
    purrr:::capture_error
    ```

   Here `tryCatch()` evaluates the `code`, which is passed to it, within a list with the two elements "results" and "error". If the code evaluates without an error, the returned value be assigned to the "results" element of this list.

   In cas of an error, the list will have the same structure, but "results" will be `otherwise` (`NULL` by default) and the "error" element will contain the error message. Take a look at the textbook or the documentation of `safely()` to see how you can take advantage of this behaviour, for example when fitting many models with R.

## Case study: Creating your own function operators

1. __<span style="color:red">Q</span>__: Weigh the pros and cons of `download.file %>% dot_every(10) %>% delay_by(0.1)` vs `download.file %>% delay_by(0.1) %>% dot_every(10)`.

   __<span style="color:green">A</span>__:

2. __<span style="color:red">Q</span>__: Should you memoise `file.download()`? Why/why not?

   __<span style="color:green">A</span>__: Before you consider to memoise `file.download()`, ensure that the files you want to download under a specific URL don't change. Otherwise, it makes sense to memoise `file.download()` in scenarios where objects are downloaded repeatedly and downloads might take a little longer. However, the downside when caching results in memory is that the regarding amount of memory will not be available for further tasks during the R session. Therefore, it doesn't make sense to memoise `file.download()` when it is needed to download very large files, lots of different files or time is not an issue. As the meaning of these terms varies regarding differing situations, it always depends on the use case and one has to weigh up these trade offs carefully.

3. __<span style="color:red">Q</span>__: Create a function operator that reports whenever a file is created or deleted in the working directory, using `dir()` and `setdiff()`. What other global function effects might you want to track?

   __<span style="color:green">A</span>__: We start with a short version to show the idea:

    ```{r, eval = FALSE}
    track_dir <- function(f){
      force(f)
      function(...){
        dir_old <- dir()
        on.exit(if(!setequal(dir(), dir_old)){
          message("files in your working directory were deleted or added by this function")})
        f(...)
      }
    }

    # the following test will create the file "delete_me" in your working directory
    td <- track_dir(dir.create)
    td("delete_me")
    ```

    Of course we can provide more information on the type of changes:

    ```{r, eval = FALSE}
    track_dir <- function(f){
      force(f)
      function(...){
        dir_old <- dir()

        on.exit(if(!setequal(dir(), dir_old)){
          message("Files in your working directory were deleted or added by this
                  function.")}, add = TRUE)
        on.exit(if(length(setdiff(dir_old, dir()) != 0)){
          message(paste0("The following files were deleted: ",
                         paste(setdiff(dir_old, dir()), collapse = ", ")
                         ))}, add = TRUE)
        on.exit(if(length(setdiff(dir(), dir_old) != 0)){
          message(paste0("The following files were added: ",
                         paste(setdiff(dir(), dir_old), collapse = ", ")
                         ))}, add = TRUE)

        f(...)
      }
    }

    # the following test will again create two files in your working directory
    td <- track_dir(sapply)
    td(c("delete_me", "me_too"), dir.create)
    ```

   Other global effects that might be worth tracking include changes regarding:

   * the search path and/or introduced `conflicts()`
   * `options()` and `par()` which modify global settings
   * the path of the working directory
   * environment variables
   * the locale.


4. __<span style="color:red">Q</span>__: Write a function operator that logs a timestamp and message to a file every time a function is run.

   __<span style="color:green">A</span>__: Note that the example will create a file file in your current working directory:

    ```{r, eval = FALSE}
    logger <- function(f, filename){
      force(f)
      filename_tmp <- paste(filename, basename(tempfile()), sep = "_")
      write(paste("created at:", Sys.time()), filename_tmp, append = TRUE)
      function(..., message = "you can add a message at each call") {
        write(paste0("used at: ", Sys.time(), ", ", message), filename_tmp, append = TRUE)
        f(...)
      }
    }

    # the following line creates a file, which name starts with "mean_log_"
    mean2 <- logger(mean, "mean_log")
    mean2(1:4, message = "first time")
    mean2(1:4, message = "second_time")
    ```

5. __<span style="color:red">Q</span>__: Modify `delay_by()` so that instead of delaying by a fixed amount of time, it ensures that a certain amount of time has elapsed since the function was last called. That is, if you called `g <- delay_by(1, f); g(); Sys.sleep(2); g()` there shouldn't be an extra delay.

   __<span style="color:green">A</span>__: We can do this with three little tricks (and the help
    of 42):

    ```{r, eval = FALSE}
    delay_by_v2 <- function(delay, f) {
      force(f)
      # we initialise the timestamp for the last run. We set a specific default value,
      # to ensure that the first run of the returned function will never be delayed
      last_runtime <- Sys.time() - (delay + 42)
      function(...) {
        # we continually check if enough time passed with an (empty) while statement.
        while (Sys.time() < last_runtime + delay) {}
        # we override the start for the next waiting interval.
        # Note that this is done on exit (after the function is evaluated)
        on.exit(last_runtime <<- Sys.time())
        return(f(...))
      }
    }
    ```

   Alternatively to the empty while statement we could have used `Sys.sleep()`. I would not recommend this solution, since `?Sys.sleep` indicates that `Sys.sleep()` might have some overhead and seems not to be as exact as we need.