Understanding Tidy Evaluation in R

A metaphor for understanding Tidy Evaluation
Published

December 19, 2017

Introduction

Have you ever run into something that, no matter how many times someone explained it, you still had no idea what it was for? For me, that was Non-Standard Evaluation (NSE) in R, and its newer cousin Tidy Evaluation, or tidyeval. I had a real learning block about it. I wanted to understand it, but for some reason I just wasn’t getting the general concepts.

What is evaluation, really? For the longest time, I was extremely confused about it. When you provide an expression to R such as:

library(tidyverse)
library(rlang)
this_variable <- 2
this_variable * 6
[1] 12

You notice that there is an output to this_variable * 6, which is 12. Evaluation is really about looking up variable names in an environment and then acting on the results. What is going on here is that R looks for an object that is named this_variable in our global environment, and then returns the value, 2, which it then substitutes in the expression. So our original expression:

this_variable * 6
[1] 12

Becomes this expression:

2 * 6
[1] 12

Which R knows how to calculate, the output of which is 12. But sometimes you want to pass an expression or a variable, as is, without evaluating it first. The best example of this is passing a variable into a function. We can do this by wrapping it up in a quosure or an enquosure.

Enter quosures

A quosure and an enquosure can be thought of as envelopes around an object. They obscure certain properties of the object until they can be delivered into a function. The envelopes are basically a way to sneak variables and expressions into a function’s environment. When the envelope is in the function, we can open it up and evaluate what’s in the envelope. The trick to NSE and tidyeval is that we can control when the function evaluates the expression, by controlling when we open this envelope. We do this with the !! operator or its function form, UQ(). (Newer versions of rlang deprecate UQ() in favor of !!.)

In other words, quosures and enquosures are ways to prevent R from looking up a variable’s value in our current environment (usually the global environment), and delay this lookup until we get them into the environment of interest. This might be one level down (in our function of interest), or several levels down (in a function called by our function).

The point is, R won’t open the envelope with our variable in it until we tell it to.
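
Here is a minimal sketch of the envelope idea, assuming only rlang’s quo() to seal the envelope and eval_tidy() to open it by hand (the names envelope, result, and first_cyl are just for illustration). It shows that the lookup happens when the envelope is opened, not when it is sealed, and that the envelope can be opened inside a data.frame.

```r
library(rlang)

this_variable <- 2

# Seal the expression in an envelope; nothing is computed yet
envelope <- quo(this_variable * 6)

# Change the variable after the envelope was sealed
this_variable <- 10

# Open the envelope: the lookup happens now, so the new value is used
result <- eval_tidy(envelope)
result
# [1] 60

# Open another envelope inside a data.frame: cyl is found in mtcars
first_cyl <- head(eval_tidy(quo(cyl), data = mtcars), 3)
first_cyl
# [1] 6 6 4
```

Functions like select() and pull() do the eval_tidy() step for us, which is why we rarely call it directly.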

Why Should I Care????

The short answer: if you want to write functions that directly work with the tidyverse, you need to understand tidyeval.

The best way to understand why you need to do this is to write a function that takes a data.frame and a reference to a column within that data.frame. You might notice that we can directly refer to a column in a data.frame for select, for example:

mtcars %>% select(cyl) %>% head(10)
                  cyl
Mazda RX4           6
Mazda RX4 Wag       6
Datsun 710          4
Hornet 4 Drive      6
Hornet Sportabout   8
Valiant             6
Duster 360          8
Merc 240D           4
Merc 230            4
Merc 280            6

Why does that work? This is the power of NSE and tidy evaluation. Basically, by wrapping up cyl in an envelope, we prevent R from evaluating it right away. We can then pass the envelope into other functions, or environments, and then tell R to remove the envelope and then evaluate it.

Let’s try and mimic this. We’ll write a function grab_col(x, colname) which returns the values in the column whose name we pass in as a bare object rather than as a string. Here is a first attempt, without tidyeval:

grab_col <- function(x, colname){
  x %>%
    pull(colname)
}

Try running grab_col(mtcars, colname=cyl). You’ll get an error that cyl does not exist as an object. Augh! This is harder than we thought.

How can we fix this? We can wrap colname up in an enquosure using the enquo() function. Once it’s into pull(), we use UQ() to open the envelope and R knows that it should look in the data.frame’s environment for our colname:

library(rlang)

grab_col <- function(x, colname){
  ##wrap up colname in an enquosure
  cc <- rlang::enquo(colname)

  ##use UQ to evaluate it within the pull function
  x %>%
    pull(
      ## unquote and evaluate (open the envelope!)
      UQ(cc)
      )
}

grab_col(mtcars, colname=cyl)
 [1] 6 6 4 6 8 6 8 4 4 6 6 8 8 8 8 8 8 4 4 4 4 8 8 8 8 4 4 4 8 6 8 4

Now grab_col(mtcars, colname=cyl) returns the values in the cyl column. Nifty, huh?
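
As an aside, later versions of rlang (0.4.0 and up, which postdate this post) added a shorthand that seals and opens the envelope in one step: the embrace operator, {{ }}. The sketch below assumes such a version is installed. The function name grab_col_embrace is hypothetical, chosen so it doesn’t clash with grab_col above.

```r
library(dplyr)

# Hypothetical variant of grab_col() using the embrace operator,
# which assumes rlang >= 0.4.0
grab_col_embrace <- function(x, colname){
  x %>%
    # {{ colname }} captures the argument and unquotes it in one step
    pull({{ colname }})
}

cyl_values <- grab_col_embrace(mtcars, colname = cyl)
head(cyl_values)
# [1] 6 6 4 6 8 6
```

The explicit enquo() plus unquote version is still worth knowing, since it is what you need when you want to inspect or modify the envelope before opening it.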

With quosures, values can come along for the ride

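Why would we use quosures (made with quo()) at all, instead of enquosures (made with enquo())? Because quo() wraps up an expression we write ourselves together with the environment we wrote it in, so we can actually bring some needed values along for the ride.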
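
A small sketch of values coming along for the ride, assuming rlang’s quo() and eval_tidy(). The helper make_envelope() and the variable multiplier are made up for this example. The quosure is created inside a function, yet it can still find the function’s local variable after the function has returned.

```r
library(rlang)

# Hypothetical helper: builds a quosure that refers to a local variable
make_envelope <- function(){
  multiplier <- 3
  quo(multiplier * 4)
}

envelope <- make_envelope()

# multiplier lives in the function's environment, which the quosure
# remembers, so the lookup still succeeds out here
answer <- eval_tidy(envelope)
answer
# [1] 12
```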

What about lots of arguments?

That’s what quos() is for. Ever notice that you can specify a number of unnamed arguments by specifying a ... in your function definition? And did you ever notice that select() can take lots of arguments such as select(mpg, cyl, wt)? That is the power of ... combined with quos()!

quos() captures each of the arguments passed through ... and returns a list with one quosure per argument.
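
Here is a sketch of how that might look, assuming dplyr’s select() and rlang’s quos() and !!! (which unquotes a whole list at once). The function name grab_cols is hypothetical.

```r
library(dplyr)
library(rlang)

# Hypothetical helper: select any number of columns passed as bare names
grab_cols <- function(x, ...){
  # capture each argument in ... as its own quosure
  cols <- quos(...)

  x %>%
    # !!! unquotes and splices the whole list into select()
    select(!!! cols)
}

picked <- grab_cols(mtcars, mpg, cyl, wt)
names(picked)
# [1] "mpg" "cyl" "wt"
```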

What about expressions?

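Say we wanted to pass an expression such as cyl > 2 into our function. We can wrap it up with enexpr(), which captures just the expression. (enquo() also works here, and additionally remembers the environment the expression came from.)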

filter_on_column <- function(x, col_expr){
  c_e <- rlang::enexpr(col_expr)

  x %>%
    ## The !! (called a bangbang) is just another way to use UQ()
    ## I don't really like it, I'd rather use UQ()
    filter(!! c_e)
}

#pass in a simple expression
mtcars %>% filter_on_column(cyl > 2) %>% head(5)
                   mpg cyl disp  hp drat    wt  qsec vs am gear carb
Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
#pass in a compound expression
mtcars %>% filter_on_column(cyl > 2 & qsec > 18) %>% head(5)
                mpg cyl  disp  hp drat    wt  qsec vs am gear carb
Datsun 710     22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
Hornet 4 Drive 21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
Valiant        18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1
Merc 240D      24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2
Merc 230       22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2
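
The difference between enexpr() and enquo() shows up when the expression mentions a variable that only exists where the function was called. The sketch below assumes dplyr and rlang. The names filter_with_expr, filter_with_quo, count_big_engines, and cutoff are all made up for this comparison.

```r
library(dplyr)
library(rlang)

# Captures only the expression
filter_with_expr <- function(x, col_expr){
  c_e <- enexpr(col_expr)
  x %>% filter(!! c_e)
}

# Captures the expression plus the environment it came from
filter_with_quo <- function(x, col_expr){
  c_q <- enquo(col_expr)
  x %>% filter(!! c_q)
}

# Hypothetical caller whose expression mentions a local variable
count_big_engines <- function(filter_fun){
  cutoff <- 6
  nrow(filter_fun(mtcars, cyl > cutoff))
}

# The quosure remembers where cutoff lives
quo_count <- count_big_engines(filter_with_quo)
quo_count
# [1] 14

# The bare expression does not, so cutoff can't be found
# (assuming no object called cutoff exists in your global environment)
expr_result <- tryCatch(
  count_big_engines(filter_with_expr),
  error = function(e) "error"
)
expr_result
# [1] "error"
```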

Be really careful with !!

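In the above example, we used !!, called a bangbang, to unquote and evaluate our expression. Be really careful with what you put after the !!. In early versions of rlang, !! had the same low precedence as the ordinary ! operator, so everything after it was unquoted: !! x + 10 was read as !! (x + 10). Later rlang releases made !! bind more tightly, but if you have elements after the expression that you don’t want to unquote, wrapping the !! up in a set of parentheses makes the intent explicit. Also remember that !! only means "unquote" inside a quoting function such as quo() or expr(), or a tidyverse verb: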

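bang <- function(val2){
  x <- enquo(val2)
  ## !! only unquotes inside a quoting function such as quo(),
  ## so we build the expression first and then evaluate it
  return(eval_tidy(quo((!! x) + 10)))
}

## returns 15
bang(5)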

Other applications

One of the coolest applications of NSE is to write code that writes code. You have to be very careful with this, but it’s potentially really useful. On my list of things to do for my flowDashboard package is to write code that generates a standalone app given the data objects you supply it.
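
As a tiny illustration of code that writes code (not the flowDashboard generator, just a sketch assuming rlang’s sym(), expr(), expr_text(), and eval_tidy()), the hypothetical helper make_mean_call() below builds a call without running it. We can then print the generated code or evaluate it later.

```r
library(rlang)

# Hypothetical helper: build (but do not run) a call that averages a column
make_mean_call <- function(colname){
  # turn the string into a symbol
  col <- sym(colname)
  # write the code mean(<column>) without evaluating it
  expr(mean(!! col))
}

generated <- make_mean_call("cyl")

# Look at the code that was written
expr_text(generated)
# [1] "mean(cyl)"

# Run the generated code, with mtcars supplying the column
eval_tidy(generated, data = mtcars)
# [1] 6.1875
```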

For more information

Hopefully this was helpful in understanding NSE and tidyeval. I find that sometimes I have to write things up so that I understand them more clearly. So, if nothing else, writing this was useful for clarifying my thinking.

I’m indebted to Edwin Thoen’s code examples that helped me finally understand what’s going on with tidyeval: https://edwinth.github.io/blog/dplyr-recipes/

I didn’t really talk about Base-R’s NSE, but I would say that this should at least give you enough background to understand what’s going on there.

Citation

BibTeX citation:
@online{laderas2017,
  author = {Laderas, Ted},
  title = {Understanding {Tidy} {Evaluation} in {R}},
  date = {2017-12-19},
  url = {https://laderast.github.io/posts/2017-12-19-understanding-tidyeval/},
  langid = {en}
}
For attribution, please cite this work as:
Laderas, Ted. 2017. “Understanding Tidy Evaluation in R.” December 19, 2017. https://laderast.github.io/posts/2017-12-19-understanding-tidyeval/.