Tidy Tuesday: Seafood Production and Consumption

Understanding global cephalopod production.
tidytuesday
Author

Ted Laderas

Published

October 12, 2021

Understanding Seafood Production

For this week's Tidy Tuesday, I decided to tackle a relatively easy question about seafood production over the years. Since I am a big octopus fan, I wanted to know: which countries were the top producers of cephalopods over the years?

Loading the Data

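# dplyr (for %>%, group_by(), summarize(), ...) and ggplot2 are used unqualified later on
library(dplyr)
library(ggplot2)

farmed <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2021/2021-10-12/aquaculture-farmed-fish-production.csv')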
Rows: 11657 Columns: 4
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (2): Entity, Code
dbl (2): Year, Aquaculture production (metric tons)

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
consumption <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2021/2021-10-12/fish-and-seafood-consumption-per-capita.csv')
Rows: 11028 Columns: 4
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (2): Entity, Code
dbl (2): Year, Fish, Seafood- Food supply quantity (kg/capita/yr) (FAO, 2020)

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
production <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2021/2021-10-12/seafood-and-fish-production-thousand-tonnes.csv')
Rows: 10326 Columns: 10
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (2): Entity, Code
dbl (8): Year, Commodity Balances - Livestock and Fish Primary Equivalent - ...

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
production <- janitor::clean_names(production)
skimr::skim(production)
Data summary
Name production
Number of rows 10326
Number of columns 10
_______________________
Column type frequency:
character 2
numeric 8
________________________
Group variables None

Variable type: character

skim_variable n_missing complete_rate min max empty n_unique whitespace
entity 0 1.00 4 39 0 215 0
code 1734 0.83 3 8 0 181 0

Variable type: numeric

skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist
year 0 1.00 1987.70 15.35 1961 1974.0 1988.0 2001.00 2013 ▇▆▇▇▇
commodity_balances_livestock_and_fish_primary_equivalent_pelagic_fish_2763_production_5510_tonnes 1586 0.85 877685.06 3244678.56 0 1300.0 35667.0 330158.00 43756110 ▇▁▁▁▁
commodity_balances_livestock_and_fish_primary_equivalent_crustaceans_2765_production_5510_tonnes 1146 0.89 128038.08 680253.04 0 97.0 3076.0 29500.00 12607540 ▇▁▁▁▁
commodity_balances_livestock_and_fish_primary_equivalent_cephalopods_2766_production_5510_tonnes 2836 0.73 67800.81 291850.11 0 0.0 546.0 12795.00 4285298 ▇▁▁▁▁
commodity_balances_livestock_and_fish_primary_equivalent_demersal_fish_2762_production_5510_tonnes 1822 0.82 498916.69 1831355.82 0 799.5 19591.5 191905.50 22261372 ▇▁▁▁▁
commodity_balances_livestock_and_fish_primary_equivalent_freshwater_fish_2761_production_5510_tonnes 527 0.95 465803.05 2569244.83 0 681.5 10600.0 78768.50 52335573 ▇▁▁▁▁
commodity_balances_livestock_and_fish_primary_equivalent_molluscs_other_2767_production_5510_tonnes 2860 0.72 238475.03 1343302.70 0 5.0 1394.0 37754.25 17952945 ▇▁▁▁▁
commodity_balances_livestock_and_fish_primary_equivalent_marine_fish_other_2764_production_5510_tonnes 1670 0.84 214545.61 935412.65 0 939.0 5700.0 45368.50 10865669 ▇▁▁▁▁
colnames(production) <- stringr::str_replace(colnames(production), "commodity_balances_livestock_and_fish_primary_equivalent_",replacement = "")

colnames(production) <- stringr::str_replace(colnames(production), "_production_5510_tonnes", replacement="")

skimr::skim(production)
Data summary
Name                production
Number of rows      10326
Number of columns   10
_______________________
Column type frequency:
  character         2
  numeric           8
________________________
Group variables     None

Variable type: character

skim_variable  n_missing  complete_rate  min  max  empty  n_unique  whitespace
entity                 0           1.00    4   39      0       215           0
code                1734           0.83    3    8      0       181           0

Variable type: numeric

skim_variable           n_missing  complete_rate       mean          sd    p0     p25      p50        p75      p100  hist
year                            0           1.00    1987.70       15.35  1961  1974.0   1988.0    2001.00      2013  ▇▆▇▇▇
pelagic_fish_2763            1586           0.85  877685.06  3244678.56     0  1300.0  35667.0  330158.00  43756110  ▇▁▁▁▁
crustaceans_2765             1146           0.89  128038.08   680253.04     0    97.0   3076.0   29500.00  12607540  ▇▁▁▁▁
cephalopods_2766             2836           0.73   67800.81   291850.11     0     0.0    546.0   12795.00   4285298  ▇▁▁▁▁
demersal_fish_2762           1822           0.82  498916.69  1831355.82     0   799.5  19591.5  191905.50  22261372  ▇▁▁▁▁
freshwater_fish_2761          527           0.95  465803.05  2569244.83     0   681.5  10600.0   78768.50  52335573  ▇▁▁▁▁
molluscs_other_2767          2860           0.72  238475.03  1343302.70     0     5.0   1394.0   37754.25  17952945  ▇▁▁▁▁
marine_fish_other_2764       1670           0.84  214545.61   935412.65     0   939.0   5700.0   45368.50  10865669  ▇▁▁▁▁
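
The two `str_replace()` calls strip a fixed prefix and a fixed suffix from every column name. As a minimal sketch of the same idea, the block below does it in one step with `dplyr::rename_with()` and `stringr::str_remove()`. The `toy` data frame, its values, and the `prefix`/`suffix` variables are made up for illustration; this is one possible alternative, not the approach used in this post.

```r
library(dplyr)
library(stringr)

# Hypothetical toy data that mimics the long FAO-style column names
toy <- tibble(
  entity = "Atlantis",
  year = 2000,
  commodity_balances_livestock_and_fish_primary_equivalent_cephalopods_2766_production_5510_tonnes = 10,
  commodity_balances_livestock_and_fish_primary_equivalent_crustaceans_2765_production_5510_tonnes = 20
)

# Anchor the patterns so only a leading prefix / trailing suffix is removed
prefix <- "^commodity_balances_livestock_and_fish_primary_equivalent_"
suffix <- "_production_5510_tonnes$"

# rename_with() applies the function to every column name by default
toy_short <- toy %>%
  rename_with(~ str_remove(str_remove(.x, prefix), suffix))

colnames(toy_short)
```

Columns that match neither pattern (`entity`, `year`) pass through unchanged, so there is no need to select the columns to rename.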

I’ll pivot the production data frame to a longer one using pivot_longer().

production_long <- production %>% tidyr::pivot_longer(cols=contains("_"), names_to = "seafood_type",values_to = "production")

head(production_long)
# A tibble: 6 × 5
  entity      code   year seafood_type         production
  <chr>       <chr> <dbl> <chr>                     <dbl>
1 Afghanistan AFG    1961 pelagic_fish_2763            NA
2 Afghanistan AFG    1961 crustaceans_2765             NA
3 Afghanistan AFG    1961 cephalopods_2766             NA
4 Afghanistan AFG    1961 demersal_fish_2762           NA
5 Afghanistan AFG    1961 freshwater_fish_2761        300
6 Afghanistan AFG    1961 molluscs_other_2767          NA
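
To make the reshaping step concrete, here is a small sketch of `pivot_longer()` on made-up data. The selector `contains("_")` picks every column whose name contains an underscore, which is why `entity`, `code`, and `year` stay behind as identifier columns in the real data. The `wide` data frame and its values are hypothetical.

```r
library(dplyr)
library(tidyr)

# Hypothetical wide data: one row per entity-year, one column per seafood type
wide <- tibble(
  entity = c("A", "B"),
  year = c(2000, 2000),
  cephalopods_2766 = c(5, NA),
  crustaceans_2765 = c(1, 2)
)

# Every column with an underscore in its name becomes a (seafood_type, production) pair
long <- wide %>%
  pivot_longer(
    cols = contains("_"),
    names_to = "seafood_type",
    values_to = "production"
  )

long
```

Each input row turns into one output row per seafood column, so two rows and two seafood columns give four rows. Missing values are kept as `NA` rows by default; `pivot_longer()` also accepts `values_drop_na = TRUE` if those rows are not wanted.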

Now that we have the long-form data frame, we can ask some interesting questions over time and compare across categories. Plotting the mean production per entity for each year and seafood type shows that seafood production has risen steadily over the years.

production_long %>%
  group_by(year, seafood_type) %>%
  summarize(production = mean(production, na.rm=TRUE)) %>%
  ggplot() + 
  aes(x=year, y=production, fill=seafood_type) +
  geom_area() + 
  viridis::scale_fill_viridis(discrete=TRUE) +
  hrbrthemes::theme_ipsum() + 
  ggtitle("Production of Seafood has Risen Steadily over the Years")
`summarise()` has grouped output by 'year'. You can override using the `.groups`
argument.
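
The summary step in this plot takes the mean across entities within each year and seafood type, which is not the same thing as a yearly total. A minimal sketch on made-up data shows the difference, and also how `.groups = "drop"` silences the grouping message. The `toy_long` data frame and the output column names are hypothetical.

```r
library(dplyr)

# Hypothetical long data: two entities, two years, one missing value
toy_long <- tibble(
  entity = c("A", "B", "A", "B"),
  year = c(2000, 2000, 2001, 2001),
  seafood_type = "cephalopods_2766",
  production = c(10, 30, 20, NA)
)

by_year <- toy_long %>%
  group_by(year, seafood_type) %>%
  summarize(
    mean_production = mean(production, na.rm = TRUE),   # average per reporting entity
    total_production = sum(production, na.rm = TRUE),   # sum over reporting entities
    .groups = "drop"                                    # return an ungrouped tibble, no message
  )

by_year
```

With `na.rm = TRUE`, the mean is taken over only the entities that reported a value that year, so the mean and the total can move differently when the number of reporting entities changes.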

Drilling into the cephalopods, I’m interested in each country’s percent share of total production among the top 10 producing countries.

It’s interesting that Japan’s share of production has decreased steadily, and that China has become a leading producer lately.

production_long <- production %>% 
  tidyr::pivot_longer(cols=contains("_"), 
  names_to = "seafood_type",values_to = "production")

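# Despite the names, these objects hold production (not consumption) figures.
# Note: sum() is called without na.rm = TRUE, so any country with a missing
# year gets a total of NA; arrange() sorts NA last, so such countries
# cannot reach the top 10.
top_squid_eaters <- production_long %>%
  filter(seafood_type == "cephalopods_2766") %>%
  filter(code != "OWID_WRL") %>%
  group_by(code) %>%
  summarize(total_eating = sum(production)) %>%
  arrange(desc(total_eating)) %>%
  slice(1:10) %>%
  pull(code)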

total_production <- production_long %>%
  filter(seafood_type == "cephalopods_2766") %>%
  filter(code != "OWID_WRL") %>%
  filter(code %in% top_squid_eaters) %>%
  group_by(year) %>%
  summarize(total_eating = sum(production, na.rm=TRUE)) 

total_ceph <- production_long %>%
  filter(seafood_type == "cephalopods_2766") %>%
  filter(code %in% top_squid_eaters) %>%
  group_by(year, code) %>%
  # entity is carried through summarize() so it can be used as the fill;
  # this relies on each (year, code) group holding exactly one row
  summarize(production = mean(production, na.rm=TRUE), entity) %>%
  # attach the yearly total of the top 10 and convert to a percent share
  left_join(y=total_production, by="year") %>%
  mutate(percent = production/total_eating * 100) %>%
  ggplot() + 
  aes(x=year, y=percent, fill=entity) +
  geom_area() + 
  viridis::scale_fill_viridis(discrete=TRUE, option="plasma") +
  hrbrthemes::theme_ipsum() 
`summarise()` has grouped output by 'year'. You can override using the `.groups`
argument.
total_ceph
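
The ranking and share calculation can be sketched compactly on made-up data. This version differs from the code in this post in two labeled ways: it passes `na.rm = TRUE` when ranking, so a country with a missing year is not dropped from the ranking, and it computes the share with a grouped `mutate()` instead of a separate totals table and `left_join()`. The `toy` data frame, the codes, and the object names are hypothetical.

```r
library(dplyr)

# Hypothetical production data: three codes, two years, one missing value
toy <- tibble(
  code = c("AAA", "BBB", "CCC", "AAA", "BBB", "CCC"),
  year = c(2000, 2000, 2000, 2001, 2001, 2001),
  production = c(60, 30, 10, 50, NA, 5)
)

# Rank codes by total production; without na.rm = TRUE, BBB's total would be NA
top_codes <- toy %>%
  group_by(code) %>%
  summarize(total = sum(production, na.rm = TRUE), .groups = "drop") %>%
  slice_max(total, n = 2) %>%
  pull(code)

# Percent share of each top code within its year
shares <- toy %>%
  filter(code %in% top_codes) %>%
  group_by(year) %>%
  mutate(percent = production / sum(production, na.rm = TRUE) * 100) %>%
  ungroup()

shares
```

Because the denominator only sums over the selected codes, the shares within a year add up to 100 among the top producers rather than among all countries, which matches how the stacked area plot is built.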

total_ceph +
  annotate(geom="text", x=1969, y=60, label="Japan", colour="lightgrey", size=4) +
  annotate(geom="text", x=2005, y=80, label="China", colour="lightgrey", size=4) +
  annotate(geom="text", x=1991, y=44, label="South Korea", colour="lightgrey", size=4) +
  annotate(geom="text", x=2008, y=40, label="Peru", colour="lightgrey", size=4) +
  labs(title="Top 10 cephalopod producers",
       subtitle="Japan, South Korea, Peru, and China compete for top market share") +
  scale_x_continuous(breaks = c(1960, 1970, 1980, 1990, 2000, 2010))

ggsave("top_mollusk_production.jpg")
Saving 7 x 5 in image

Citation

BibTeX citation:
@online{laderas2021,
  author = {Ted Laderas},
  title = {Tidy {Tuesday:} {Seafood} {Production} and {Consumption}},
  date = {2021-10-12},
  url = {https://laderast.github.io//articles/2021-10-12-seafood/2021-10-12-seafood.html},
  langid = {en}
}
For attribution, please cite this work as:
Ted Laderas. 2021. “Tidy Tuesday: Seafood Production and Consumption.” October 12, 2021. https://laderast.github.io//articles/2021-10-12-seafood/2021-10-12-seafood.html.