Underrated Tidyverse Functions

Learn about our assignment to teach the tidyverse to each other.

December 1, 2020

The Assignment

I’m teaching an R Programming course next term. Jessica Minnier and I are developing the Ready for R Materials into a longer and more involved course.

I think one of the most important things to teach is how to learn on your own. Because learning to program is a lifelong activity, it’s critically important to give students these meta-learning skills. That’s the motivation behind the Tidyverse Function of the Week assignment.

I asked on Twitter:

Some of my favorite suggestions

Here are some of the highlights from the thread.

I loved all of these. Danielle Quinn wins the MVP award for naming so many useful functions:

fill() was a popular suggestion:
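As a minimal sketch of what fill() does (the data here is made up for illustration), it carries the last non-missing value down into the NA cells below it. This is handy for spreadsheet-style data where a group label is only written once per block.

```r
library(tibble)
library(tidyr)

# Hypothetical toy data: the region label appears only on the first row of each block
sales <- tibble(
  region = c("North", NA, NA, "South", NA),
  amount = c(10, 20, 30, 40, 50)
)

# fill() replaces each NA in `region` with the most recent non-missing value above it
filled <- fill(sales, region, .direction = "down")
filled
```

`.direction = "down"` is the default; `"up"`, `"downup"`, and `"updown"` are also available.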

Many people suggested the window functions, including lead() and lag() and the cumulative functions:
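Here is a small sketch of the window functions on a made-up table of daily counts. lead() and lag() shift a column by one row, so comparisons with the next or previous row become simple vectorized arithmetic. cumsum() and cummax() are base R; dplyr adds cummean(), cumany(), and cumall().

```r
library(dplyr)

# Hypothetical daily counts, used only for illustration
visits <- tibble(day = 1:5, count = c(3, 5, 2, 8, 4))

visits <- visits %>%
  mutate(
    prev_count  = lag(count),          # previous row's value (NA for the first row)
    next_count  = lead(count),         # next row's value (NA for the last row)
    change      = count - lag(count),  # day-over-day difference
    running     = cumsum(count),       # running total
    best_so_far = cummax(count)        # largest value seen so far
  )
visits
```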

Alison Hill suggested problems(), which helps you diagnose why your data isn’t loading:
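A minimal sketch of problems() in action, assuming readr 2.0 or later (which accepts literal CSV text wrapped in I()). One value in the made-up `score` column can’t be parsed as an integer. It silently becomes NA, and problems() reports where it happened.

```r
library(readr)

# Hypothetical literal CSV text; "twelve" is not a valid integer
raw <- I("id,score\n1,10\n2,twelve\n3,30")

scores <- read_csv(raw, col_types = cols(id = col_integer(), score = col_integer()))

# problems() returns one row per cell that failed to parse,
# including the row, column, expected type, and actual value
issues <- problems(scores)
issues
```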

I think that deframe() and enframe() are really exciting, since I convert between named vectors and two-column data frames all the time:
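To make the round trip concrete, here is a short sketch with illustrative numbers (not the actual tallies from the thread). enframe() turns a named vector into a two-column tibble, and deframe() turns it back, using the first column as names.

```r
library(tibble)

# Hypothetical named vector: counts keyed by function name
counts <- c(case_when = 10, fill = 4, coalesce = 3)

# enframe(): named vector -> tibble with columns `name` and `value`
counts_df <- enframe(counts)

# deframe(): two-column tibble -> named vector (first column becomes the names)
counts_again <- deframe(counts_df)
counts_again
```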

unite(), separate() and separate_rows() also had their own contingent:
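As a quick sketch on made-up dates, separate() splits one column into several on a delimiter, and unite() pastes them back together. Note that in tidyr 1.3 and later, separate() is superseded by separate_wider_delim(), although it still works.

```r
library(tibble)
library(tidyr)

# Hypothetical data with an ISO-style date stored as text
posts <- tibble(id = 1:2, date = c("2020-12-01", "2020-11-30"))

# separate(): split `date` into three character columns on "-"
parts <- separate(posts, date, into = c("year", "month", "day"), sep = "-")

# unite(): the inverse, joining the columns back with "-" into a new `date` column
rejoined <- unite(parts, "date", year, month, day, sep = "-")
rejoined
```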

Wow! Let’s Grab All the Tweets and Replies

I was bowled over by all of the replies. This turned out to be an unexpectedly fun thread, with lots of recommendations from others.

I thought I would try to summarize everyone’s suggestions and compile a list of recommended functions. I used this script, with some modifications, to pull all the replies to my tweet. In particular, I had to request extended tweet mode, and I extracted a few more fields from the returned JSON.

This wrote the tweet information into a CSV file.

Then I started parsing the data. I wrote a couple of functions: remove_users_from_text(), which removes user mentions (words that begin with @) from a tweet, and get_funcs(), which uses a relatively simple regular expression to extract function names (it looks for a word followed by paired parentheses () or a word containing an underscore _). It actually works pretty well and grabs most of the functions.

Then I use separate_rows() to split tweets that mention several functions into one row per function. This makes it easier to tally all the functions.
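Here is a minimal sketch of that step on hypothetical rows shaped like the parsed tweets, where `funcs` holds a comma-separated string. separate_rows() gives each function its own row, and count() can then tally them.

```r
library(dplyr)
library(tidyr)

# Hypothetical rows: one tweet mentions two functions, another mentions one
toy <- tibble(
  user  = c("a", "b"),
  funcs = c("fill(), coalesce()", "case_when()")
)

# One row per function; the other columns are repeated as needed
long <- separate_rows(toy, funcs, sep = ", ")

# Tally, most frequent first
tally <- count(long, funcs, sort = TRUE)
tally
```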

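library(tidyverse)

# Remove @user mentions (an @ followed by word characters) from tweet text
remove_users_from_text <- function(col){
  str_replace_all(col, "\\@\\w*", "")
}

# Extract anything that looks like a function (name() or a snake_case word),
# returned as a single comma-separated string
get_funcs <- function(col){
  out <- str_extract_all(col, "\\w*\\(\\)|\\w*_\\w*")
  paste(out[[1]], collapse=", ")
}

# assumes `tweets` is the data frame of replies from the CSV written above
parsed_tweets <- tweets %>%
  rowwise() %>%
  mutate(text = remove_users_from_text(text)) %>%
  mutate(funcs = get_funcs(text)) %>%
  ungroup() %>%
  # one row per extracted function
  separate_rows(funcs, sep=", ") %>%
  select(date, user, funcs, text, reply, parent_thread)

write_csv(parsed_tweets, file = "cleaned_tweets_incomplete.csv")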
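To see what the two helpers return, here is a self-contained sketch that repeats them and runs them on a single made-up tweet (the text and the @example_user handle are hypothetical). Mentions are stripped first, so a handle containing an underscore isn’t mistaken for a function.

```r
library(stringr)

# Same helpers as above, repeated so this block runs on its own
remove_users_from_text <- function(col){
  str_replace_all(col, "\\@\\w*", "")
}
get_funcs <- function(col){
  out <- str_extract_all(col, "\\w*\\(\\)|\\w*_\\w*")
  paste(out[[1]], collapse = ", ")
}

# Hypothetical reply text
tweet <- "@example_user fill() and case_when are my favorites!"

clean <- remove_users_from_text(tweet)  # mention removed
found <- get_funcs(clean)               # "fill(), case_when"
found
```

The underscore rule will also pick up any snake_case word that isn’t a function, and it misses functions written without parentheses or underscores (like `fill` on its own). Those gaps are why the leftover cases were annotated by hand.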


At this point, I realized that I just needed to hand-annotate the remaining tweets rather than spend more time trying to parse the leftover cases. So I pulled everything into Excel and annotated the tweets that my functions couldn’t extract anything from.

Functions by frequency

Here are the function suggestions by frequency. Unsurprisingly, case_when() (which I cover in the main course) has the most suggestions, because it’s so useful. tidyr::pivot_wider() and tidyr::pivot_longer() are also covered in the course.
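For readers who haven’t met case_when(), here is a minimal sketch on made-up scores. Conditions are checked in order, and each row gets the value paired with the first condition that is TRUE. A final `TRUE ~` acts as the catch-all. Newer dplyr versions also offer a `.default` argument for this.

```r
library(dplyr)

# Hypothetical scores, used only for illustration
scores <- tibble(student = c("A", "B", "C"), score = c(92, 78, 55))

graded <- scores %>%
  mutate(level = case_when(
    score >= 90 ~ "high",
    score >= 70 ~ "medium",
    TRUE        ~ "low"     # catch-all for everything else
  ))
graded
```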

Some others, such as coalesce() and fill(), were new to me and a bit of a surprise.
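A short sketch of coalesce() on hypothetical contact data. It works across columns: for each row, it returns the first non-missing value among its arguments. By contrast, fill() works down a single column.

```r
library(dplyr)

# Hypothetical contact numbers with gaps
contacts <- tibble(
  mobile = c("555-0101", NA, NA),
  home   = c(NA, "555-0202", NA),
  work   = c("555-0303", "555-0404", "555-0505")
)

# Prefer mobile, then home, then work, row by row
best <- contacts %>%
  mutate(best_phone = coalesce(mobile, home, work))
best
```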

cleaned_tweets <- read_csv("cleaned_tweets.csv") %>%
  select(-parent_thread) %>%
  # turn each user name into a markdown link to their reply
  mutate(user = paste0("[",user,"](",reply,")"))
Rows: 266 Columns: 6
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (6): date, user, funcs, text, reply, parent_thread

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
# count how many times each function was suggested, dropping unannotated replies
functions_by_freq <- cleaned_tweets %>%
  janitor::tabyl(funcs) %>%
  filter(!is.na(funcs))

write_csv(functions_by_freq, "functions_by_frequency.csv")

functions_by_freq
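If janitor isn’t installed, a similar tally can be sketched with dplyr alone. This swaps count() in for tabyl() and runs on a hypothetical stand-in for `cleaned_tweets`. Because the NA rows are dropped before counting, the `percent` column here corresponds to tabyl’s `valid_percent` rather than its `percent`.

```r
library(dplyr)

# Hypothetical stand-in for cleaned_tweets: one row per suggested function
suggestions <- tibble(
  funcs = c("case_when()", "fill()", "case_when()", NA, "coalesce()", "case_when()")
)

functions_by_freq <- suggestions %>%
  filter(!is.na(funcs)) %>%        # drop replies with no annotated function
  count(funcs, sort = TRUE) %>%    # tally, most suggested first
  mutate(percent = n / sum(n))     # share of all non-missing suggestions
functions_by_freq
```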

Cleaned Tweets and Threads

Here are all of the tweets from this thread (naysayers included). They are roughly in order, with longer threads grouped together.

Here’s a link to the cleaned CSV file


Source Code and Data

Feel free to use and modify.

Thank You

This post is my thank you to everyone who contributed to this thread. Thank you!


BibTeX citation:
  author = {Ted Laderas},
  title = {Underrated {Tidyverse} {Functions}},
  date = {2020-12-01},
  url = {https://laderast.github.io//articles/tidyverse_functions},
  langid = {en}
For attribution, please cite this work as:
Ted Laderas. 2020. “Underrated Tidyverse Functions.” December 1, 2020. https://laderast.github.io//articles/tidyverse_functions.