diamonds %>% select(color, cut) %>%
head() %>%
knitr::kable()| color | cut |
|---|---|
| E | Ideal |
| E | Premium |
| E | Good |
| I | Premium |
| J | Good |
| J | Very Good |
Learn some more about the many uses of group_by()/summarize().
May 24, 2018
Even though I’ve been using the tidyverse for a couple of years, there’s always a couple new applications of tidyverse verbs.
This one, in retrospect, is pretty simple. I had a one to many table that I wanted to collapse, tidy-style. Let’s look at the diamonds dataset:
| color | cut |
|---|---|
| E | Ideal |
| E | Premium |
| E | Good |
| I | Premium |
| J | Good |
| J | Very Good |
What if we wanted to collapse all the entries for each color into a single line? There’s 7 different colors, so we can use a combination of group_by on color and use the paste() function within summarize() to get what we want, which I’ve called all_colors here. By specifying the collapse argument, we can specify the delimiter within that column:
diamonds %>% select(color, cut) %>%
group_by(color) %>%
summarize(all_colors=
paste(cut, collapse=";"))# A tibble: 7 × 2
color all_colors
<ord> <chr>
1 D Very Good;Very Good;Very Good;Good;Good;Premium;Premium;Ideal;Ideal;Ver…
2 E Ideal;Premium;Good;Fair;Premium;Premium;Very Good;Very Good;Very Good;G…
3 F Premium;Very Good;Very Good;Very Good;Good;Premium;Very Good;Very Good;…
4 G Very Good;Ideal;Ideal;Very Good;Premium;Premium;Ideal;Very Good;Ideal;P…
5 H Very Good;Very Good;Very Good;Good;Good;Very Good;Good;Very Good;Very G…
6 I Premium;Very Good;Ideal;Good;Premium;Ideal;Ideal;Ideal;Ideal;Very Good;…
7 J Good;Very Good;Good;Ideal;Ideal;Good;Good;Very Good;Very Good;Very Good…
Thanks to Ken Butler, who pointed out that the tidyverse way (via stringr) is to use str_c instead:
diamonds %>% select(color, cut) %>%
group_by(color) %>%
summarize(all_colors=
stringr::str_c(cut, collapse=";")) # A tibble: 7 × 2
color all_colors
<ord> <chr>
1 D Very Good;Very Good;Very Good;Good;Good;Premium;Premium;Ideal;Ideal;Ver…
2 E Ideal;Premium;Good;Fair;Premium;Premium;Very Good;Very Good;Very Good;G…
3 F Premium;Very Good;Very Good;Very Good;Good;Premium;Very Good;Very Good;…
4 G Very Good;Ideal;Ideal;Very Good;Premium;Premium;Ideal;Very Good;Ideal;P…
5 H Very Good;Very Good;Very Good;Good;Good;Very Good;Good;Very Good;Very G…
6 I Premium;Very Good;Ideal;Good;Premium;Ideal;Ideal;Ideal;Ideal;Very Good;…
7 J Good;Very Good;Good;Ideal;Ideal;Good;Good;Very Good;Very Good;Very Good…
Finally, if we wanted to just get the unique values of the cuts in a single line, we can use unique:
diamonds %>% select(color, cut) %>%
group_by(color) %>%
summarize(all_colors=
paste(unique(cut), collapse=";")) # A tibble: 7 × 2
color all_colors
<ord> <chr>
1 D Very Good;Good;Premium;Ideal;Fair
2 E Ideal;Premium;Good;Fair;Very Good
3 F Premium;Very Good;Good;Fair;Ideal
4 G Very Good;Ideal;Premium;Good;Fair
5 H Very Good;Good;Premium;Fair;Ideal
6 I Premium;Very Good;Ideal;Good;Fair
7 J Good;Very Good;Ideal;Premium;Fair
@online{laderas2018,
author = {Laderas, Ted},
title = {Group\_by/Summarize: {Not} Just for Numeric Values!},
date = {2018-05-24},
url = {https://laderast.github.io/articles/2018-05-24-group-by-summarize-not-just-for-numeric-values/2018-05-24-group-by-summarize-not-just-for-numeric-values.html},
langid = {en}
}