Advanced R programming

# Advanced R programming
### Katrien Antonio and Jonas Crevecoeur
### KU Leuven and UvA
### 2019-06-05

---

Course material including

- R scripts, data, lecture sheets

- a collection of **cheat sheets**

are available from

<center>

<a href="https://github.com/katrienantonio/workshop-R"target="_blank">https://github.com/katrienantonio/workshop-R</a>

</center>

---

Today you will work on:

- Data structures 
 - Data sets (dplyr)
 - Factor variables 
 - Date management (lubridate)

- Functional programming (purrr)

- Efficient programming
  - Debugging
  - Benchmarking
  - RProject

You will cover examples of code and work on **R challenges**.

---
.title[
Follow-up R basics workshop
]

Today we provide an answer to the questions raised during the **R basics** workshop.

* What about garbage collection in R?

* Why do some variables already exist when I start a new session in RStudio?

* Can RStudio automatically load packages at startup?

* How can I debug my code?

* What is the difference between `%Y` and `%y` in formatting date objects?

---

]

Don't forget to load this package.

```r
#install.packages("tidyverse")
require(tidyverse)
```
]

---

The tidyverse is a collection of R packages sharing the same design philosphy.

`require(tidyverse)` loads the 8 core packages:

&bull; ggplot2 
&bull; dplyr 
&bull; tidyr
</div>

<div class="center-column3">
&bull; readr 
&bull; purrr 
&bull; tibble

</div>

<div class="right-column3">
&bull; stringr 
&bull; forcats
</div>

`install.package(tidyverse)` installs many other packages, including:

&bull; lubridate
</div>

&bull; readxl

</div>

Today you will use 6 packages from the tidyverse!

---

The tidyverse structures the full workflow of a data analyst.

---
class: inverse, center, middle

# Programming style

---

Deciding on a programming style provides consistency to your code and assists in reading and writing code.

The choice of style guide is unimportant, but it is important to choose a style!

This workshop follows a set of rules roughly based on the [tidyverse style guide](https://style.tidyverse.org/).

---

Variable names contain only **lower case** letters. If the name consists of multiple words, then these words are separated by **underscores**.

```r
number_of_simulations <- 100
```

User defined functions follow the same convention as variable names, but start with a capital letter.

```r
Multiply_by_2 <- function(x) {
 return(x * 2)
} 
```

Functions from external packages usually start with a lowercase letter.

```r
zero_list <- rep(0, 100)
```

---

# Data sets

---
.title[
The policy data set
]

- `PolicyData.csv` available in the course material.

- Policy covariates from a motor insurance portfolio.

- Data stored in a `.csv` file.

- Individual records separated by a `semicolon`.

Extract the directory of the current active file in Rstudio. Requires installation of the package `rstudioapi`.

```r
dir <- dirname(rstudioapi::getActiveDocumentContext()$path)
setwd(dir)
```

Read and store the data

```r
policy_data <- read.csv(file = '../data/PolicyData.csv', sep = ';')
```

---

Use the skills you obtained in the **R basics** workshop.

1. Inspect the top rows of the data set.

1. How many observations does the data set contain?

3. Calculate the total exposure (`exposition`) in each region (`type_territoire`).

---

```r
head(policy_data)
```

```
##   numeropol  debut_pol    fin_pol freq_paiement langue
## 1         3 14/09/1995 24/04/1996       mensuel      F
## 2         3 25/04/1996 23/12/1996       mensuel      F
## 3         6  1/03/1995 27/02/1996        annuel      A
## 4         6  1/03/1996 14/01/1997        annuel      A
## 5         6 15/01/1997 31/01/1997        annuel      A
## 6         6  1/02/1997 28/02/1997        annuel      A
```

```r
nrow(policy_data)
```

```
## [1] 39075
```

```r
policy_data %>% 
  group_by(type_territoire) %>%
  summarise(exposure = sum(exposition))
```

```
## # A tibble: 3 x 2
## type_territoire exposure
## <fct> <dbl>
## 1 Rural 684.
## 2 Semi-urbain 16944.
## 3 Urbain 11050.
```
---

]

- Data available in the package `gapminder`.

- Describes the evolution of a number of population characteristics (GDP, life expectancy, ...) over time.

```r
#install.packages("gapminder")
require(gapminder)
```
]

---

Use the skills obtained in the **R basics workshop** to:

1. Inspect the top rows of the data.

2. Select the data for countries in Asia.

3. Which type of variable is `country`?

---

]

```r
head(gapminder)
```

```
## # A tibble: 6 x 6
## country continent year lifeExp pop gdpPercap
## <fct> <fct> <int> <dbl> <int> <dbl>
## 1 Afghanistan Asia 1952 28.8 8425333 779.
## 2 Afghanistan Asia 1957 30.3 9240934 821.
## 3 Afghanistan Asia 1962 32.0 10267083 853.
## 4 Afghanistan Asia 1967 34.0 11537966 836.
## 5 Afghanistan Asia 1972 36.1 13079460 740.
## 6 Afghanistan Asia 1977 38.4 14880372 786.
```

```r
asia <- filter(gapminder, continent == "Asia")
```

```r
class(gapminder$country)
```

```
## [1] "factor"
```

---
class: inverse

.titlecenter[
 Data structures
]
---
background-image: url("Images/Title/factor.jpg")
background-size: cover

---

]

+ Representation for categorical data.

+ Predefined list of outcomes (levels).

+ Protecting data quality.

---

]

```r
sex <- factor(c('m', 'f', 'm', 'f'),
* levels = c('m', 'f'))
```
The `factor` command creates a new factor variable.

The first input is the categorical variable.
]

---

]

```r
*sex <- factor(c('m', 'f', 'm', 'f'),
 levels = c('m', 'f')) 
```
`levels` specifies the possible outcomes of the variable.
]

---

Assigning an unrecognized level to a factor variable results in a warning.

```r
sex[1] <- 'male'
```

```
## Warning in `[<-.factor`(`*tmp*`, 1, value = "male"): invalid factor level,
## NA generated
```

This protects the quality of the data!

```r
sex
```

```
## [1] <NA> f m f 
## Levels: m f
```

The value `NA` is assigned to the invalid observation.
---

]

```r
  levels(sex)
```

```
## [1] "m" "f"
```
  `levels` prints the allowed outcomes for a factor variable.
]

---

]

```r
 levels(sex) <- c('male', 'female')
 sex
```

```
## [1] male   female male   female
## Levels: male female
```
  Assigning a vector to `levels()` renames the allowed outcomes. 
]
---

The variable `country` in the `gapminder` data set is a factor variable.

1. What are the possible levels for `country` in the subset `asia`.

2. Is this the result you expected?
---

]

```r
levels(asia$country)
```

```
##  [1] "Afghanistan"            "Albania"               
##  [3] "Algeria"                "Angola"                
##  [5] "Argentina"              "Australia"             
##  [7] "Austria"                "Bahrain"               
##  [9] "Bangladesh"             "Belgium"               
## [11] "Benin"                  "Bolivia"               
## [13] "Bosnia and Herzegovina" "Botswana"              
## [15] "Brazil"                 "Bulgaria"              
## [17] "Burkina Faso"           "Burundi"               
## [19] "Cambodia"               "Cameroon"              
##  [ reached getOption("max.print") -- omitted 122 entries ]
```

`asia$country` allows the same outcomes as `gapminder$country`.

This includes many countries outside of Asia.

---

]

```r
asia$country <- droplevels(asia$country)
```
 `droplevels` removes all outcomes which do not appear in the factor variable.
]

---

]

```r
levels(asia$country)
```

```
##  [1] "Afghanistan"        "Bahrain"            "Bangladesh"        
##  [4] "Cambodia"           "China"              "Hong Kong, China"  
##  [7] "India"              "Indonesia"          "Iran"              
## [10] "Iraq"               "Israel"             "Japan"             
## [13] "Jordan"             "Korea, Dem. Rep."   "Korea, Rep."       
## [16] "Kuwait"             "Lebanon"            "Malaysia"          
## [19] "Mongolia"           "Myanmar"            "Nepal"             
## [22] "Oman"               "Pakistan"           "Philippines"       
## [25] "Saudi Arabia"       "Singapore"          "Sri Lanka"         
## [28] "Syria"              "Taiwan"             "Thailand"          
## [31] "Vietnam"            "West Bank and Gaza" "Yemen, Rep."
```

---
.title[
  Add level
]

]

```r
levels(sex) <- c(levels(sex), 'x')
```
Adds `x` as a new allowed outcome for the variable `sex`. 
]

---
.title[
  cut()
]

]

```r
cut(gapminder$pop,
*   breaks = c(0, 10^7, 5*10^7, 10^8, Inf))
```
`cut()` bins a numeric variable into a factor variable.

We bin the number of inhabitans in a country (`gapminder$pop`).
]

---
.title[
  cut()
]

]

```r
*cut(gapminder$pop,
    breaks = c(0, 10^7, 5*10^7, 10^8, Inf)) 
```
`breaks` specifies the cutoff values.
]

---

Bin the life expectancy in 2007 in a factor variable.

1. Select the observations for year 2007.

2. Bin the life expectancy in four bins of roughly equal size (hint: `quantile`).

3. How many observations are there in each bin?
---

]

```r
gapminder2007 <- filter(gapminder, year == 2007)
breaks <- c(0, quantile(gapminder2007$lifeExp, c(0.25, 0.5, 0.75)), Inf)
breaks 
```

```
##               25%      50%      75%          
##  0.00000 57.16025 71.93550 76.41325      Inf
```

```r
gapminder2007 <- gapminder2007 %>% 
 mutate(life_expectancy_binned = cut(gapminder2007$lifeExp, breaks))

gapminder2007 %>%
  group_by(life_expectancy_binned) %>%
  summarise(frequency = n())
```

```
## # A tibble: 4 x 2
## life_expectancy_binned frequency
## <fct> <int>
## 1 (0,57.2] 36
## 2 (57.2,71.9] 35
## 3 (71.9,76.4] 35
## 4 (76.4,Inf] 36
```

---

```r
*ggplot(gapminder2007) +
  geom_bar(aes(life_expectancy_binned))
```

`geom_bar` takes a factor variable and creates a bar plot.

---

```r
*ggplot(gapminder2007) +
* geom_bar(aes(life_expectancy_binned,
               fill = continent))
```

`fill = continent` selects a different fill color for each continent.

---

```r
*ggplot(gapminder2007) +
* geom_bar(aes(life_expectancy_binned,
*              fill = continent),
           position = position_dodge())
```

`position = position_dodge()` shows the bars side-by-side instead of stacked.

---

```r
*ggplot(gapminder2007) +
* geom_bar(aes(life_expectancy_binned,
*              fill = continent,
               y = ..prop.., group = continent), 
*          position = position_dodge())
```

`y = ..prop..` and `group = continent` plot the proportion within each group instead of the absolute count.

---

---

]

You will learn to:

* store dates in the `Date` format in R

* convert text and numerical variables into a `Date` object

* perform basic calculations with dates

* start with base R and continue with lubridate.

---

]

```r
as.Date('2019-06-05', 
*       format = '%Y-%m-%d)
```
`as.Date` converts text into an R `Date` object.

First input is a vector of dates in text format.
]

---

```r
*as.Date('2019-06-05',
 format = '%Y-%m-%d) 
```
The `format` describes the structure of the input. 
* `%Y`: Year, 4 digit notation
* `%m`: Month number
* `%d`: Day of the month.
]

For a full list of formating options, see

```r
?strptime
```

---

]

```r
as.Date(21705, origin = '1960-01-01')
```
Dates are often stored as integers.

Convert integers to dates by speciying the origin (Day 0).

For example: SAS stores dates at the number of days elapsed since 1 Jan 1960. 
]

---

Work with the `policy_data` data set.

1. Convert the start date (`debut_pol`) and end date (`fin_pol`) into R `Date` objects.
---

]

```r
policy_data$start <- as.Date(policy_data$debut_pol, '%d/%m/%Y')
policy_data$end <- as.Date(policy_data$fin_pol, '%d/%m/%Y')
```

```r
head(policy_data %>% select(c('debut_pol', 'start')))
```

```
##    debut_pol      start
## 1 14/09/1995 1995-09-14
## 2 25/04/1996 1996-04-25
## 3  1/03/1995 1995-03-01
## 4  1/03/1996 1996-03-01
## 5 15/01/1997 1997-01-15
## 6  1/02/1997 1997-02-01
```

```r
class(policy_data$start)
```

```
## [1] "Date"
```

---

]

```r
*today <- as.Date('2019-06-05',
* format = '%Y-%m-%d')
format(today, '%A %d %B %Y')
```

```
## [1] "woensdag 05 juni 2019"
```
`format` converts a date into text 
* `%A`: full weekday name 
* `%B`: full month name
]

---
.title[
  Adding and subtracting dates
]

Calculate the duration of a contract.
.code[

```r
policy_duration = 
  policy_data$end - policy_data$start
```
Subtracting dates calculates the number of days elapsed between these dates.

```r
tomorrow = today + 1
print(tomorrow)
```

```
## [1] "2019-06-06"
```
You can add and subtract integers from dates.
]

---
.title[
  Lubridate
]

For more advanced `Date` manipulations use the `lubridate` package.
.code[

```r
# install.packages("lubridate")
require(lubridate)
```
]

---

]

```r
year(today)
```

```
## [1] 2019
```
`year()` selects the calendar year component from the date.

Other components are: `month()`, `day()`, `quarter()`, ...

---

]

```r
today + months(3)
```

```
## [1] "2019-09-05"
```
`+ months(3)` adds three months to the Date object.

Other periods are: `years()` and `days()`.
]

---

]

```r
floor_date(today, unit = "month")
```

```
## [1] "2019-06-01"
```
`floor_date` rounds down to the nearest unit.

In the example convert daily into monhtly data.
]

---

]

```r
seq(from = as.Date('2019-01-01'), 
    to = as.Date('2019-12-31'), 
    by = '1 month')
```

```
##  [1] "2019-01-01" "2019-02-01" "2019-03-01" "2019-04-01" "2019-05-01"
##  [6] "2019-06-01" "2019-07-01" "2019-08-01" "2019-09-01" "2019-10-01"
## [11] "2019-11-01" "2019-12-01"
```
Generate a sequence of dates, useful in loops.
]

---

Visualize the exposure contribution by start month of the contract in the `policy_data` data set.

1. Add a covariate `start_month` to the data set.

2. Group the data by `start_month`.

3. Calculate the exposure within each group.

4. Plot the data.

---

```r
exposure_by_month <- policy_data %>%
 mutate(start_month = floor_date(policy_data$start, unit = 'month')) %>%
 group_by(start_month) %>% 
 summarize(exposure = sum(exposition))
 
ggplot(exposure_by_month) +
 geom_point(aes(start_month, exposure))
```

---
class: inverse

---

A functional is a function which takes a function as input.

Example: the integral operator

$$ \int_0^{1}: C([0, 1]) \to \mathbb{R}, f \mapsto \int_0^{1} f(x) \, dx,$$
where `$C([0, 1])$` is the set of continuous functions on `$[0, 1]$`.

```r
f <- function(x){
 x^2
}
integrate(f, lower = 0, upper = 1)
```

```
## 0.3333333 with absolute error < 3.7e-15
```

---

This approach offers:

* an intuitive alternative for loops.
  
* code that is easy to read and interpret

**Example**:

Is there a policyholder who is a minor in the data set becomes:

```r
some(policy_data$age, Is_minor)
```

* Easily modifiable and reusable code.

* No need to copy/paste the same code many times.

* If you use something twice, put it in a function.

---

]

```r
# install.packages(purrr)
require(purrr)
```

]

---

![](Images/purrr/map.png)
.caption[
Illustration from the [purrr cheat sheet](https://github.com/rstudio/cheatsheets/blob/master/purrr.pdf)
]

```r
map(policy_data, class)
```

Apply the function `class` to each column of the data set `policy_data`.
]

---

```r
map(policy_data, class)
```

```
## $numeropol
## [1] "integer"
## 
## $debut_pol
## [1] "factor"
## 
## $fin_pol
## [1] "factor"
## 
## $freq_paiement
## [1] "factor"
## 
##  [ reached getOption("max.print") -- omitted 20 entries ]
```

The output of `map` is stored in a `list`.

---

```r
map_chr(policy_data, class)
```

```
## numeropol debut_pol fin_pol 
## "integer" "factor" "factor" 
## [ reached getOption("max.print") -- omitted 21 entries ]
```
If the output should be of class `character`, use `map_chr`. 
* Additional type check when applying the function.

* Returns a vector instead of a list.

Other `map` output types: 
* `map_int` for integers 
* `map_dbl` for doubles
]

---

]

```r
factor_data <- keep(policy_data, 
 is.factor)
```
Keeps the columns of `policy_data` for which the function `is.factor` returns `TRUE`. 
]

---

]

```r
non_factor_data <- discard(policy_data, 
 is.factor)
```

Discards the columns of `policy_data` for which the function `is.factor` returns `TRUE`.  
]

---

]

```r
every(policy_data$cout1, is.na)
```

```
## [1] FALSE
```

Returns `TRUE` if and only if `is.na` returns TRUE for all observations in `policy_data$cout1`.  
]

---

]

```r
some(policy_data$cout1, is.na)
```

```
## [1] TRUE
```

Returns `TRUE` if `policy_data$cout1` contains an observation for which `is.na` returns TRUE.
]

---

]

```r
Is_minor <- function(age) {
 return(age < 18)
}
some(policy_data$age, Is_minor)
```

```
## [1] FALSE
```

Unlock the true potential of the purrr functionals by applying them to user-defined functions.
]

---

- Discard all columns containing `NA` observations from the `policy_data` data set.

---

]

```r
Some_na <- function(x) {
 some(x, is.na)
}

policy_data_cleaned <- policy_data %>%
 discard(Some_na)
```

You can abbreviate this code as

```r
policy_data_cleaned <- policy_data %>%
 discard(~ some(.x, is.na))
```

Here `~[code]` is an abbreviation for `function(.x){[code]}`. 
You can use this for simple functions with a single input.

---

]

```r
*policy_data %>%
* group_by(utilisation) %>%
  nest()
```

`nest()` splits the data set into smaller sub sets based on the `group_by` category
]

---
.title[
nest()
]

```r
nested_data <- policy_data %>% 
 group_by(utilisation) %>% 
 nest()

print(nested_data)
```

```
## # A tibble: 3 x 2
## utilisation data 
## <fct> <list> 
## 1 Travail-quotidien <tibble [10,867 x 23]>
## 2 Travail-occasionnel <tibble [25,966 x 23]>
## 3 Loisir <tibble [2,242 x 23]>
```

`nested_data` contains three sub data sets. One for each type of vehicle use.

---

We investigate the variable `utilisation` (vehicle use) in the policy data set.

1. Remove observations without claim cost (`cout` = 0)

2. Group the data by `utilisation`.

3. Calculate the number of observations and average severity for each group.

---

]

```r
policy_data %>% 
  filter(cout > 0) %>%
  group_by(utilisation) %>%
  summarise(avg = mean(cout), count = n())
```

```
## # A tibble: 3 x 3
## utilisation avg count
## <fct> <dbl> <int>
## 1 Loisir 13258. 408
## 2 Travail-occasionnel 8533. 3938
## 3 Travail-quotidien 9537. 1343
```

`summarise` is useful for computing basic statistics for individual variables from the data set.

---
.title[
R challenge
]

We want to investigate the claim cost of young drivers (age < 30). Specifically we want to see whether the cost depends on the use of the vehicle.

We will verify this by:
* Fitting a linear regression model 
`$$\text{cout} \sim  \text{young_driver}$$`
on subsets of the data split by vehicle use.

* Comparing the fitted parameter for the young driver effect.

---
.title[
R challenge
]

Add a variable for young drivers and split the data by `utilisation`.

```r
nested_data <- policy_data %>% 
 mutate(young_driver = age < 30) %>%
 filter(cout > 0) %>%
 group_by(utilisation) %>%
 nest()
```

Create a function for fitting our linear model.

```r
Fit_model <- function(df) {
 lm(cout ~ young_driver, data = df)
}
```

---

Fit the linear model on each subset of the data using `map`.

```r
result <- nested_data %>%
 mutate(model = map(data, Fit_model))
```

Select the young driver effect from the model.

```r
Select_coef <- function(fit){
 coefficients(fit)["young_driverTRUE"]
}

result <- result %>% 
 mutate(effect_young_driver = map_dbl(model, Select_coef))
```

---

```r
result %>% select(utilisation, effect_young_driver)
```

```
## # A tibble: 3 x 2
## utilisation effect_young_driver
## <fct> <dbl>
## 1 Travail-occasionnel 165.
## 2 Travail-quotidien 4693.
## 3 Loisir -3650.
```

The advantages of having used functionals:
* easily modifiable code: what do you have to change in order to explore the interaction between young_drivers and terrain?

* no loops

* readable, intuitive code.

---

---

R provides four methods for giving feedback:

* print

* message

* warning

* stop

---

Here is a function to calculate `$n! = n \cdot (n-1) \cdot \ldots 1$`

```r
Factorial <- function(n) { 
 result <- 1 
 for(i in 1:n) { 
 result <- result * i 
 } 
 return(result) 
*}

fac5 <- Factorial(5)
```

Let's add some messages to this function!

---

```r
*Factorial <- function(n) {
* result <- 1
* for(i in 1:n) {
* result <- result * i
* }
* return(result)
*}

*fac5 <- Factorial(5)
print(fac5)
```

```
## [1] 120
```

Use `print()` when you want to display the value of a variable in the console.

---

```r
Factorial <- function(n) { 
 result <- 1 
 for(i in 1:n) { 
 result <- result * i
 message("step ", i, " from ", n, " -- result: ", result)
 } 
 return(result) 
}

fac3 <- Factorial(3) 
```

```
## step 1 from 3 -- result: 1
```

```
## step 2 from 3 -- result: 2
```

```
## step 3 from 3 -- result: 6
```

Use `message()` for diagnostic information, such as runtime and intermediary results.

---

What could go wrong when calling `Factorial(n)`?

* `n` might be non-integer, for example: 1.2

A function for checking whether n is an integer:

```r
Is_int <- function(n) {
 return(n == round(n))
}
```

* `n` can be negative, for example: -5

* `n` can be non-numeric, for example: "5", `c(3, 5)`.

---

```r
Factorial <- function(n) {
 if(!Is_int(n)){
 warning("non-integer value (n = ", n, ") in Factorial(). Calculating gamma(n+1).")
 return(gamma(n+1))
 }
 result <- 1 
 for(i in 1:n) { 
 result <- result * i 
 } 
 return(result) 
}

fac3.5 <- Factorial(3.5) 
```

```
## Warning in Factorial(3.5): non-integer value (n = 3.5) in Factorial().
## Calculating gamma(n+1).
```

Use `warning()` for suspicious, but non critical events.

---

```r
Factorial <- function(n) {
 if(!is.numeric(n) || n < 0){
 stop('Argument "n" in Factorial() must be a positive number.')
 }
 result <- 1 
 for(i in 1:n) { 
 result <- result * i 
 } 
 return(result) 
}

fac5 <- Factorial("5") 
```

```
## Error in Factorial("5") : 
##   Argument "n" in Factorial() must be a positive number.
```

Use `stop()` for critical events that demand an immediate interruption of the code.

---

]

* You fit a regression model `$y \sim x_1 + x_2$`, but the input covariates `$x_1$`, `$x_2$` are colinear.

```r
warning("Colinearity detected in the regression variables.")
```

* You use an iterative algorithm to fit parameters of a model. You want to display the optimal parameters after each iteration.

```r
message("iteration ", i, ": ", param)
```

---

* You have a function with inputs `$\mu$` and `$\sigma > 0$`. 
The input for `$\sigma$` is negative.

```r
stop("Positive value required for sigma parameter.")
```

* You fitted a model and want to inspect the fitted coeficients

```r
print(coefficients(fit))
```
or

```r
coefficients(fit)
```

---

Finding an error is often harder than solving the error.

The function below computes the largest element in a vector `x`.

```r
Largest <- function(x) {
 result <- -Inf
 for(i in 1:length(x)) {
 if(x[i] > result) {
 result <- x[i]
 }
 }
 return(result)
}
```

---

We generate a set of test inputs

```r
set.seed(31415) # For reproducibility

candidates <- rpois(100, lambda = 2) %>%
 map(runif)
head(candidates, 2)
```

```
## [[1]]
## [1] 0.9841379 0.1578220 0.1479970 0.4958537 0.3058589
## 
## [[2]]
## [1] 0.8507827
```

We have an error? Why?

```r
map(candidates, Largest)
```

```
## Error in if (x[i] > result) { : missing value where TRUE/FALSE needed
```

---

**In RStudio**: Debug > On Error > Break in Code.

![](Images/debug/break.png)
---

The error inspector opens when an error `stop()` occurs:

---

When `length(x)` is zero, the for loop iterates over `c(1, 0)`.

```r
Largest <- function(x) {
 result <- -Inf
 for(i in 1:length(x)) {
 if(x[i] > result) {
 result <- x[i]
 }
 }
 return(result)
}
```

Advice: replace `1:length(x)` by `seq_along(x)` in for loops.

```r
print(seq_along(c()))
```

```
## integer(0)
```

---

]
.code[

```r
f <- function() {
 ...
 browser()
 ...
}
```
The error inspector opens when `browser()` is reached. 
No error message required!
]

---

---
.title[
Sys.time()
]

]

```r
start_time <- Sys.time()
x <- runif(10^7)
end_time <- Sys.time()
end_time - start_time
```

```
## Time difference of 0.459038 secs
```
Use `Sys.time()` to measure the elapsed time between two steps of the program and detect efficiency bottlenecks.
]

---
.title[
microbenchmark
]

]

* Determine the most efficient implementation, when runtime is important.

```r
#install.packages("microbenchmark")
require(microbenchmark)
```
]

---
.title[
  R challenge
]

We implement a function called `Column_product`, which multiplies the `$i$`-th column of a matrix `m` by the `$i$`-th element in a `v`.

`$$\texttt{Column_product}(m = \begin{bmatrix}1 & 2\\3 & 4\end{bmatrix}, v = \begin{bmatrix}2\\4\end{bmatrix}) = \begin{bmatrix}2 & 8\\6 & 16\end{bmatrix}$$`

]

Create your own implementation of `Column_product()`.

---

Intuitive solution (for-loop):

```r
Column_product <- function(m, v){
 for(j in 1:ncol(m)) {
 m[, j] <- m[, j] * v[j]
 }
 return(m)
}
```