Ted Laderas, PhD - Tidy Tuesday: Seafood Production and Consumption

Understanding Seafood Production

For this week's Tidy Tuesday, I decided to tackle a relatively easy question: how has seafood production changed over the years? And since I am a big octopus fan, which countries were responsible for the top production of cephalopods over that time?

Loading the Data

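library(dplyr)    # %>%, filter(), group_by(), summarize()
library(ggplot2)  # ggplot(), aes(), geom_area(), annotate(), ggsave()

farmed <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2021/2021-10-12/aquaculture-farmed-fish-production.csv')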

Rows: 11657 Columns: 4
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (2): Entity, Code
dbl (2): Year, Aquaculture production (metric tons)

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

consumption <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2021/2021-10-12/fish-and-seafood-consumption-per-capita.csv')

Rows: 11028 Columns: 4
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (2): Entity, Code
dbl (2): Year, Fish, Seafood- Food supply quantity (kg/capita/yr) (FAO, 2020)

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

production <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2021/2021-10-12/seafood-and-fish-production-thousand-tonnes.csv')

Rows: 10326 Columns: 10
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (2): Entity, Code
dbl (8): Year, Commodity Balances - Livestock and Fish Primary Equivalent - ...

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

production <- janitor::clean_names(production)
skimr::skim(production)

Data summary
Name	production
Number of rows	10326
Number of columns	10
_______________________
Column type frequency:
character	2
numeric	8
________________________
Group variables	None

Variable type: character

skim_variable	n_missing	complete_rate	min	max	empty	n_unique	whitespace
entity	0	1.00	4	39	0	215	0
code	1734	0.83	3	8	0	181	0

Variable type: numeric

skim_variable	n_missing	complete_rate	mean	sd	p0	p25	p50	p75	p100	hist
year	0	1.00	1987.70	15.35	1961	1974.0	1988.0	2001.00	2013	▇▆▇▇▇
commodity_balances_livestock_and_fish_primary_equivalent_pelagic_fish_2763_production_5510_tonnes	1586	0.85	877685.06	3244678.56	0	1300.0	35667.0	330158.00	43756110	▇▁▁▁▁
commodity_balances_livestock_and_fish_primary_equivalent_crustaceans_2765_production_5510_tonnes	1146	0.89	128038.08	680253.04	0	97.0	3076.0	29500.00	12607540	▇▁▁▁▁
commodity_balances_livestock_and_fish_primary_equivalent_cephalopods_2766_production_5510_tonnes	2836	0.73	67800.81	291850.11	0	0.0	546.0	12795.00	4285298	▇▁▁▁▁
commodity_balances_livestock_and_fish_primary_equivalent_demersal_fish_2762_production_5510_tonnes	1822	0.82	498916.69	1831355.82	0	799.5	19591.5	191905.50	22261372	▇▁▁▁▁
commodity_balances_livestock_and_fish_primary_equivalent_freshwater_fish_2761_production_5510_tonnes	527	0.95	465803.05	2569244.83	0	681.5	10600.0	78768.50	52335573	▇▁▁▁▁
commodity_balances_livestock_and_fish_primary_equivalent_molluscs_other_2767_production_5510_tonnes	2860	0.72	238475.03	1343302.70	0	5.0	1394.0	37754.25	17952945	▇▁▁▁▁
commodity_balances_livestock_and_fish_primary_equivalent_marine_fish_other_2764_production_5510_tonnes	1670	0.84	214545.61	935412.65	0	939.0	5700.0	45368.50	10865669	▇▁▁▁▁

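# Strip the long shared prefix so each column name starts with its seafood type
colnames(production) <- stringr::str_replace(colnames(production), "commodity_balances_livestock_and_fish_primary_equivalent_", replacement = "")

# Strip the shared suffix as well, leaving names such as cephalopods_2766
colnames(production) <- stringr::str_replace(colnames(production), "_production_5510_tonnes", replacement = "")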

skimr::skim(production)

Data summary
Name	production
Number of rows	10326
Number of columns	10
_______________________
Column type frequency:
character	2
numeric	8
________________________
Group variables	None

Variable type: character

skim_variable	n_missing	complete_rate	min	max	empty	n_unique	whitespace
entity	0	1.00	4	39	0	215	0
code	1734	0.83	3	8	0	181	0

Variable type: numeric

skim_variable	n_missing	complete_rate	mean	sd	p0	p25	p50	p75	p100	hist
year	0	1.00	1987.70	15.35	1961	1974.0	1988.0	2001.00	2013	▇▆▇▇▇
pelagic_fish_2763	1586	0.85	877685.06	3244678.56	0	1300.0	35667.0	330158.00	43756110	▇▁▁▁▁
crustaceans_2765	1146	0.89	128038.08	680253.04	0	97.0	3076.0	29500.00	12607540	▇▁▁▁▁
cephalopods_2766	2836	0.73	67800.81	291850.11	0	0.0	546.0	12795.00	4285298	▇▁▁▁▁
demersal_fish_2762	1822	0.82	498916.69	1831355.82	0	799.5	19591.5	191905.50	22261372	▇▁▁▁▁
freshwater_fish_2761	527	0.95	465803.05	2569244.83	0	681.5	10600.0	78768.50	52335573	▇▁▁▁▁
molluscs_other_2767	2860	0.72	238475.03	1343302.70	0	5.0	1394.0	37754.25	17952945	▇▁▁▁▁
marine_fish_other_2764	1670	0.84	214545.61	935412.65	0	939.0	5700.0	45368.50	10865669	▇▁▁▁▁
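
As a side note, the two str_replace() calls above could probably be folded into a single step. The following is only a sketch on a hypothetical one-row tibble (the `toy` data and its value are invented), assuming dplyr's rename_with() and stringr's str_remove_all() are available:

```r
library(dplyr)
library(stringr)
library(tibble)

# Hypothetical one-row table with one of the long column names (value invented)
toy <- tibble(
  entity = "A",
  commodity_balances_livestock_and_fish_primary_equivalent_cephalopods_2766_production_5510_tonnes = 1
)

# Remove the shared prefix and the shared suffix with one regular expression
toy_short <- toy %>%
  rename_with(
    ~ str_remove_all(
      .x,
      "^commodity_balances_livestock_and_fish_primary_equivalent_|_production_5510_tonnes$"
    )
  )

names(toy_short)
```

Columns that match neither pattern, such as entity, keep their names unchanged.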

I’ll pivot the production data frame to a longer one using pivot_longer().

production_long <- production %>% tidyr::pivot_longer(cols=contains("_"), names_to = "seafood_type",values_to = "production")

head(production_long)

# A tibble: 6 × 5
  entity      code   year seafood_type         production
  <chr>       <chr> <dbl> <chr>                     <dbl>
1 Afghanistan AFG    1961 pelagic_fish_2763            NA
2 Afghanistan AFG    1961 crustaceans_2765             NA
3 Afghanistan AFG    1961 cephalopods_2766             NA
4 Afghanistan AFG    1961 demersal_fish_2762           NA
5 Afghanistan AFG    1961 freshwater_fish_2761        300
6 Afghanistan AFG    1961 molluscs_other_2767          NA
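
To make the reshaping step easier to follow, here is a minimal sketch of the same pivot_longer() call on a tiny, hypothetical table (the `toy_wide` entities and values are invented for illustration):

```r
library(tidyr)
library(tibble)

# Hypothetical wide data: one row per entity and year, one column per seafood type
toy_wide <- tibble(
  entity = c("A", "B"),
  year = c(2000, 2000),
  pelagic_fish_2763 = c(10, NA),
  cephalopods_2766 = c(3, 7)
)

# Every column whose name contains "_" is stacked into name/value pairs
toy_long <- pivot_longer(
  toy_wide,
  cols = contains("_"),
  names_to = "seafood_type",
  values_to = "production"
)

# Same reshaping, but rows with a missing value are dropped
toy_long_complete <- pivot_longer(
  toy_wide,
  cols = contains("_"),
  names_to = "seafood_type",
  values_to = "production",
  values_drop_na = TRUE
)

toy_long
```

Selecting with contains("_") works here because entity, code, and year have no underscore in their names, so only the seafood columns are pivoted.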

Now that we have the long-form data frame, we can ask some interesting questions about change over time and compare across categories. As you can see below, seafood production, averaged across entities for each seafood type, has risen steadily over the years.

production_long %>%
  group_by(year, seafood_type) %>%
  summarize(production = mean(production, na.rm=TRUE)) %>%
  ggplot() + 
  aes(x=year, y=production, fill=seafood_type) +
  geom_area() + 
  viridis::scale_fill_viridis(discrete=TRUE) +
  hrbrthemes::theme_ipsum() + 
  ggtitle("Production of Seafood has Risen Steadily over the Years")

`summarise()` has grouped output by 'year'. You can override using the
`.groups` argument.
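
The plot above is built from the mean across entities. As a rough sketch of how the mean and the sum differ, and of how the `.groups` argument mentioned in the message can be set, here is the same kind of summary on hypothetical data (the `toy_long` values are invented):

```r
library(dplyr)
library(tibble)

# Hypothetical long-format data; entity B has no value for 2001
toy_long <- tibble(
  entity = c("A", "B", "A", "B"),
  year = c(2000, 2000, 2001, 2001),
  seafood_type = "cephalopods_2766",
  production = c(10, 30, 20, NA)
)

yearly <- toy_long %>%
  group_by(year, seafood_type) %>%
  summarize(
    mean_production = mean(production, na.rm = TRUE),   # average over reporting entities
    total_production = sum(production, na.rm = TRUE),   # sum over reporting entities
    n_reporting = sum(!is.na(production)),               # how many entities had a value
    .groups = "drop"                                     # ungrouped result, no message
  )

yearly
```

When the number of reporting entities changes from year to year, the mean and the sum can move in different directions, so it is worth checking which one a title is describing.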

Drilling into the cephalopods, I’m interested in each country's percentage of total production among the top 10 producing countries.

It is interesting that Japan’s share of production has decreased steadily, and that China has become a leading producer in recent years.

production_long <- production %>% 
  tidyr::pivot_longer(cols=contains("_"), 
  names_to = "seafood_type",values_to = "production")

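# Top 10 cephalopod-producing countries by total production over 1961-2013.
# filter(code != "OWID_WRL") drops the world aggregate and also any entity
# with a missing code, because NA != "OWID_WRL" evaluates to NA.
top_ceph_producers <- production_long %>%
  filter(seafood_type == "cephalopods_2766") %>%
  filter(code != "OWID_WRL") %>%
  group_by(code) %>%
  # No na.rm here: a country with any missing year gets an NA total,
  # and arrange() sorts NA last, so it falls to the bottom of the ranking.
  summarize(total = sum(production)) %>%
  arrange(desc(total)) %>%
  slice(1:10) %>%
  pull(code)

# Yearly cephalopod production summed over those 10 countries
total_production <- production_long %>%
  filter(seafood_type == "cephalopods_2766") %>%
  filter(code != "OWID_WRL") %>%
  filter(code %in% top_ceph_producers) %>%
  group_by(year) %>%
  summarize(top10_total = sum(production, na.rm=TRUE)) 

# Each country's share of the top-10 total, drawn as a stacked area chart
total_ceph <- production_long %>%
  filter(seafood_type == "cephalopods_2766") %>%
  filter(code %in% top_ceph_producers) %>%
  group_by(year, code) %>%
  # Each year/country pair should have a single row, so mean() returns that value
  summarize(production = mean(production, na.rm=TRUE), entity) %>%
  left_join(y=total_production, by="year") %>%
  mutate(percent = production/top10_total * 100) %>%
  ggplot() + 
  aes(x=year, y=percent, fill=entity) +
  geom_area() + 
  viridis::scale_fill_viridis(discrete=TRUE, option="plasma") +
  hrbrthemes::theme_ipsum()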

`summarise()` has grouped output by 'year'. You can override using the
`.groups` argument.
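
The share calculation can also be written without a join. This is only a sketch on hypothetical data (the `toy` entities and values are invented), assuming dplyr 1.0 or later for slice_max(). It picks the top producers and then computes each one's percentage within a year using a grouped mutate():

```r
library(dplyr)
library(tibble)

# Hypothetical production data for three entities over two years
toy <- tibble(
  year = c(2000, 2000, 2000, 2001, 2001, 2001),
  entity = c("A", "B", "C", "A", "B", "C"),
  production = c(30, 10, 5, 20, 20, NA)
)

# Top 2 entities by total production, ignoring missing years
top_entities <- toy %>%
  group_by(entity) %>%
  summarize(total = sum(production, na.rm = TRUE)) %>%
  slice_max(total, n = 2) %>%
  pull(entity)

# Share of each top entity within its year; sum() is evaluated per group
shares <- toy %>%
  filter(entity %in% top_entities) %>%
  group_by(year) %>%
  mutate(percent = production / sum(production, na.rm = TRUE) * 100) %>%
  ungroup()

shares
```

Within each year the percentages add up to 100, which is what lets a stacked area chart fill the whole vertical axis.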

total_ceph

total_ceph +
  annotate(geom="text", x=1969, y=60, label="Japan", colour="lightgrey", size=4) +
  annotate(geom="text", x=2005, y=80, label="China", colour="lightgrey", size=4) +
  annotate(geom="text", x=1991, y=44, label="South Korea", colour="lightgrey", size=4) +
  annotate(geom="text", x=2008, y=40, label="Peru", colour="lightgrey", size=4) +
  labs(title="Top 10 cephalopod producers",
       subtitle="Japan, South Korea, Peru, and China compete for top market share") +
  scale_x_continuous(breaks = c(1960, 1970, 1980, 1990, 2000, 2010))

ggsave("top_mollusk_production.jpg")

Saving 7 x 5 in image

Citation

BibTeX citation:

@online{laderas2021,
  author = {Laderas, Ted},
  title = {Tidy {Tuesday:} {Seafood} {Production} and {Consumption}},
  date = {2021-10-12},
  url = {https://laderast.github.io//articles/2021-10-12-seafood/2021-10-12-seafood.html},
  langid = {en}
}

For attribution, please cite this work as:

Laderas, Ted. 2021. “Tidy Tuesday: Seafood Production and Consumption.” October 12, 2021. https://laderast.github.io//articles/2021-10-12-seafood/2021-10-12-seafood.html.