Tidy Tuesday: Seafood Production and Consumption

tidytuesday

Understanding global cephalopod production.

Ted Laderas
10/12/2021

Understanding Seafood Production

For this week's Tidy Tuesday, I decided to tackle a relatively easy question about seafood production over the years. Since I am a big octopus fan, I wanted to know: which countries were responsible for the top production of cephalopods over the years?

Loading the Data

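# The pipelines below call %>%, group_by(), summarize(), and ggplot() without a namespace prefix
library(dplyr)
library(ggplot2)

farmed <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2021/2021-10-12/aquaculture-farmed-fish-production.csv')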
consumption <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2021/2021-10-12/fish-and-seafood-consumption-per-capita.csv')
production <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2021/2021-10-12/seafood-and-fish-production-thousand-tonnes.csv')
production <- janitor::clean_names(production)
skimr::skim(production)
Table 1: Data summary
Name production
Number of rows 10326
Number of columns 10
_______________________
Column type frequency:
character 2
numeric 8
________________________
Group variables None

Variable type: character

skim_variable n_missing complete_rate min max empty n_unique whitespace
entity 0 1.00 4 39 0 215 0
code 1734 0.83 3 8 0 181 0

Variable type: numeric

skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist
year 0 1.00 1987.70 15.35 1961 1974.0 1988.0 2001.00 2013 ▇▆▇▇▇
commodity_balances_livestock_and_fish_primary_equivalent_pelagic_fish_2763_production_5510_tonnes 1586 0.85 877685.06 3244678.56 0 1300.0 35667.0 330158.00 43756110 ▇▁▁▁▁
commodity_balances_livestock_and_fish_primary_equivalent_crustaceans_2765_production_5510_tonnes 1146 0.89 128038.08 680253.04 0 97.0 3076.0 29500.00 12607540 ▇▁▁▁▁
commodity_balances_livestock_and_fish_primary_equivalent_cephalopods_2766_production_5510_tonnes 2836 0.73 67800.81 291850.11 0 0.0 546.0 12795.00 4285298 ▇▁▁▁▁
commodity_balances_livestock_and_fish_primary_equivalent_demersal_fish_2762_production_5510_tonnes 1822 0.82 498916.69 1831355.82 0 799.5 19591.5 191905.50 22261372 ▇▁▁▁▁
commodity_balances_livestock_and_fish_primary_equivalent_freshwater_fish_2761_production_5510_tonnes 527 0.95 465803.05 2569244.83 0 681.5 10600.0 78768.50 52335573 ▇▁▁▁▁
commodity_balances_livestock_and_fish_primary_equivalent_molluscs_other_2767_production_5510_tonnes 2860 0.72 238475.03 1343302.70 0 5.0 1394.0 37754.25 17952945 ▇▁▁▁▁
commodity_balances_livestock_and_fish_primary_equivalent_marine_fish_other_2764_production_5510_tonnes 1670 0.84 214545.61 935412.65 0 939.0 5700.0 45368.50 10865669 ▇▁▁▁▁
colnames(production) <- stringr::str_replace(colnames(production), "commodity_balances_livestock_and_fish_primary_equivalent_",replacement = "")

colnames(production) <- stringr::str_replace(colnames(production), "_production_5510_tonnes", replacement="")

skimr::skim(production)
Table 1: Data summary
Name production
Number of rows 10326
Number of columns 10
_______________________
Column type frequency:
character 2
numeric 8
________________________
Group variables None

Variable type: character

skim_variable n_missing complete_rate min max empty n_unique whitespace
entity 0 1.00 4 39 0 215 0
code 1734 0.83 3 8 0 181 0

Variable type: numeric

skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist
year 0 1.00 1987.70 15.35 1961 1974.0 1988.0 2001.00 2013 ▇▆▇▇▇
pelagic_fish_2763 1586 0.85 877685.06 3244678.56 0 1300.0 35667.0 330158.00 43756110 ▇▁▁▁▁
crustaceans_2765 1146 0.89 128038.08 680253.04 0 97.0 3076.0 29500.00 12607540 ▇▁▁▁▁
cephalopods_2766 2836 0.73 67800.81 291850.11 0 0.0 546.0 12795.00 4285298 ▇▁▁▁▁
demersal_fish_2762 1822 0.82 498916.69 1831355.82 0 799.5 19591.5 191905.50 22261372 ▇▁▁▁▁
freshwater_fish_2761 527 0.95 465803.05 2569244.83 0 681.5 10600.0 78768.50 52335573 ▇▁▁▁▁
molluscs_other_2767 2860 0.72 238475.03 1343302.70 0 5.0 1394.0 37754.25 17952945 ▇▁▁▁▁
marine_fish_other_2764 1670 0.84 214545.61 935412.65 0 939.0 5700.0 45368.50 10865669 ▇▁▁▁▁

I’ll pivot the production data frame to a longer one using pivot_longer().

production_long <- production %>% tidyr::pivot_longer(cols=contains("_"), names_to = "seafood_type",values_to = "production")

head(production_long)
# A tibble: 6 × 5
  entity      code   year seafood_type         production
  <chr>       <chr> <dbl> <chr>                     <dbl>
1 Afghanistan AFG    1961 pelagic_fish_2763            NA
2 Afghanistan AFG    1961 crustaceans_2765             NA
3 Afghanistan AFG    1961 cephalopods_2766             NA
4 Afghanistan AFG    1961 demersal_fish_2762           NA
5 Afghanistan AFG    1961 freshwater_fish_2761        300
6 Afghanistan AFG    1961 molluscs_other_2767          NA
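
To make the reshaping step concrete, here is a minimal sketch of the same pivot_longer() call on a tiny, made-up data frame. The `toy_wide` table and its values are hypothetical and only mimic the shape of `production` after the column names were shortened.

```r
library(dplyr)
library(tidyr)

# Hypothetical two-row table shaped like `production` after renaming
toy_wide <- tibble::tibble(
  entity = c("A", "B"),
  year = c(2000, 2000),
  cephalopods_2766 = c(5, NA),
  crustaceans_2765 = c(1, 2)
)

# Every column whose name contains "_" becomes a (seafood_type, production) pair
toy_long <- toy_wide %>%
  pivot_longer(
    cols = contains("_"),
    names_to = "seafood_type",
    values_to = "production"
  )

toy_long
```

Each input row yields one output row per seafood column, so two rows and two seafood columns give four rows. Missing values are kept as NA rows, as in the Afghanistan rows above; pivot_longer() also has a `values_drop_na` argument if dropping them is preferred.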

Now that we have the long-form data frame, we can ask some interesting questions about trends over time and compare across categories. As you can see below, seafood production (here, the mean across entities for each year and seafood type) has risen steadily over the years.

production_long %>%
  group_by(year, seafood_type) %>%
  summarize(production = mean(production, na.rm=TRUE)) %>%
  ggplot() + 
  aes(x=year, y=production, fill=seafood_type) +
  geom_area() + 
  viridis::scale_fill_viridis(discrete=TRUE) +
  hrbrthemes::theme_ipsum() + 
  ggtitle("Production of Seafood has Risen Steadily over the Years")
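
The area chart above plots the mean of production across entities, not a sum. As a rough sketch of how the two summaries differ, the snippet below computes both on a made-up long table; `toy_long` and its numbers are hypothetical and are not drawn from the Tidy Tuesday data.

```r
library(dplyr)

# Hypothetical long-format data: two entities, two years, one seafood type
toy_long <- tibble::tibble(
  entity = c("A", "B", "A", "B"),
  year = c(2000, 2000, 2001, 2001),
  seafood_type = "cephalopods_2766",
  production = c(10, NA, 20, 40)
)

# Mean and total per year and seafood type, ignoring missing values
yearly <- toy_long %>%
  group_by(year, seafood_type) %>%
  summarize(
    mean_production = mean(production, na.rm = TRUE),
    total_production = sum(production, na.rm = TRUE),
    .groups = "drop"
  )

yearly
```

With `na.rm = TRUE`, the mean is taken only over entities that reported a value in that year, so it can move when coverage changes even if no individual producer does.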

Drilling into the cephalopods, I’m interested in each country's percent share of total production among the top 10 producing countries.

It is interesting that Japan’s share of production has decreased steadily, and that China has been a leading producer lately.

production_long <- production %>% 
  tidyr::pivot_longer(cols=contains("_"), 
  names_to = "seafood_type",values_to = "production")

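# Rank countries by cephalopod production and keep the top 10 codes.
# Despite the variable names, these are production (not consumption) figures.
# Note: sum() without na.rm = TRUE yields NA for any country with a missing
# year, and arrange() sorts NA last, so such countries fall to the bottom
# of the ranking.
top_squid_eaters <- production_long %>%
  filter(seafood_type == "cephalopods_2766") %>%
  filter(code != "OWID_WRL") %>%
  group_by(code) %>%
  summarize(total_eating = sum(production)) %>%
  arrange(desc(total_eating)) %>%
  slice(1:10) %>%
  pull(code)

# Yearly cephalopod production summed over the top 10 countries
total_production <- production_long %>%
  filter(seafood_type == "cephalopods_2766") %>%
  filter(code != "OWID_WRL") %>%
  filter(code %in% top_squid_eaters) %>%
  group_by(year) %>%
  summarize(total_eating = sum(production, na.rm=TRUE)) 

# Each country's percent share of the top-10 total, as a stacked area chart
total_ceph <- production_long %>%
  filter(seafood_type == "cephalopods_2766") %>%
  filter(code %in% top_squid_eaters) %>%
  group_by(year, code, entity) %>%
  summarize(production = mean(production, na.rm=TRUE), .groups = "drop") %>%
  left_join(y=total_production, by="year") %>%
  mutate(percent = production/total_eating * 100) %>%
  ggplot() + 
  aes(x=year, y=percent, fill=entity) +
  geom_area() + 
  viridis::scale_fill_viridis(discrete=TRUE, option="plasma") +
  hrbrthemes::theme_ipsum() 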

total_ceph

total_ceph +
  annotate(geom="text", x=1969, y=60, label="Japan", colour="lightgrey", size=4) +
  annotate(geom="text", x=2005, y=80, label="China", colour="lightgrey", size=4) +
  annotate(geom="text", x=1991, y=44, label="South Korea", colour="lightgrey", size=4) +
  annotate(geom="text", x=2008, y=40, label="Peru", colour="lightgrey", size=4) +
  labs(title="Top 10 cephalopod producers", subtitle = "Japan, South Korea, Peru, and China compete for top market share") +
  scale_x_continuous(breaks = c(1960, 1970, 1980, 1990, 2000, 2010))
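
The percent-share step can also be sketched on toy data. This version uses a grouped mutate() rather than the left_join() against yearly totals used above; it is an alternative way to get the same kind of number, not the pipeline from this post. The `toy` data frame and its values are hypothetical.

```r
library(dplyr)

# Hypothetical production figures for two entities over two years
toy <- tibble::tibble(
  year = c(2000, 2000, 2001, 2001),
  entity = c("A", "B", "A", "B"),
  production = c(30, 10, 20, 60)
)

# Share of each year's total: divide every row by the sum within its year
shares <- toy %>%
  group_by(year) %>%
  mutate(percent = production / sum(production, na.rm = TRUE) * 100) %>%
  ungroup()

shares
```

Because the denominator is computed within each year, the shares in a given year add up to 100, which is why the stacked area chart fills the full height of the plot.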