%>% select(color, cut) %>%
diamonds head() %>%
::kable() knitr
color | cut |
---|---|
E | Ideal |
E | Premium |
E | Good |
I | Premium |
J | Good |
J | Very Good |
Learn some more about the many uses of group_by()
/summarize()
.
May 24, 2018
Even though I’ve been using the tidyverse
for a couple of years, there’s always a couple new applications of tidyverse
verbs.
This one, in retrospect, is pretty simple. I had a one to many table that I wanted to collapse, tidy-style. Let’s look at the diamonds
dataset:
color | cut |
---|---|
E | Ideal |
E | Premium |
E | Good |
I | Premium |
J | Good |
J | Very Good |
What if we wanted to collapse all the entries for each color into a single line? There’s 7 different colors, so we can use a combination of group_by
on color
and use the paste()
function within summarize()
to get what we want, which I’ve called all_colors
here. By specifying the collapse
argument, we can specify the delimiter within that column:
diamonds %>% select(color, cut) %>%
group_by(color) %>%
summarize(all_colors=
paste(cut, collapse=";"))
# A tibble: 7 × 2
color all_colors
<ord> <chr>
1 D Very Good;Very Good;Very Good;Good;Good;Premium;Premium;Ideal;Ideal;Ver…
2 E Ideal;Premium;Good;Fair;Premium;Premium;Very Good;Very Good;Very Good;G…
3 F Premium;Very Good;Very Good;Very Good;Good;Premium;Very Good;Very Good;…
4 G Very Good;Ideal;Ideal;Very Good;Premium;Premium;Ideal;Very Good;Ideal;P…
5 H Very Good;Very Good;Very Good;Good;Good;Very Good;Good;Very Good;Very G…
6 I Premium;Very Good;Ideal;Good;Premium;Ideal;Ideal;Ideal;Ideal;Very Good;…
7 J Good;Very Good;Good;Ideal;Ideal;Good;Good;Very Good;Very Good;Very Good…
Thanks to Ken Butler, who pointed out that the tidyverse
way (via stringr
) is to use str_c
instead:
diamonds %>% select(color, cut) %>%
group_by(color) %>%
summarize(all_colors=
stringr::str_c(cut, collapse=";"))
# A tibble: 7 × 2
color all_colors
<ord> <chr>
1 D Very Good;Very Good;Very Good;Good;Good;Premium;Premium;Ideal;Ideal;Ver…
2 E Ideal;Premium;Good;Fair;Premium;Premium;Very Good;Very Good;Very Good;G…
3 F Premium;Very Good;Very Good;Very Good;Good;Premium;Very Good;Very Good;…
4 G Very Good;Ideal;Ideal;Very Good;Premium;Premium;Ideal;Very Good;Ideal;P…
5 H Very Good;Very Good;Very Good;Good;Good;Very Good;Good;Very Good;Very G…
6 I Premium;Very Good;Ideal;Good;Premium;Ideal;Ideal;Ideal;Ideal;Very Good;…
7 J Good;Very Good;Good;Ideal;Ideal;Good;Good;Very Good;Very Good;Very Good…
Finally, if we wanted to just get the unique values of the cuts in a single line, we can use unique
:
diamonds %>% select(color, cut) %>%
group_by(color) %>%
summarize(all_colors=
paste(unique(cut), collapse=";"))
# A tibble: 7 × 2
color all_colors
<ord> <chr>
1 D Very Good;Good;Premium;Ideal;Fair
2 E Ideal;Premium;Good;Fair;Very Good
3 F Premium;Very Good;Good;Fair;Ideal
4 G Very Good;Ideal;Premium;Good;Fair
5 H Very Good;Good;Premium;Fair;Ideal
6 I Premium;Very Good;Ideal;Good;Fair
7 J Good;Very Good;Ideal;Premium;Fair
@online{laderas2018,
author = {Laderas, Ted},
title = {Group\_by/Summarize: {Not} Just for Numeric Values!},
date = {2018-05-24},
url = {https://laderast.github.io/articles/2018-05-24-group-by-summarize-not-just-for-numeric-values/2018-05-24-group-by-summarize-not-just-for-numeric-values.html},
langid = {en}
}