Tidy Tuesday: Crop Production

tidytuesday

Understanding crop production across the world.

Ted Laderas
9/3/2020

Look at the available datasets

library(tidytuesdayR)
#This will open up in the help window
tidytuesdayR::tt_available()

What was your dataset?

Load your dataset in with the function below. The input is the date the dataset was issued. You should be able to get this from the tt_available() function.

#incoming data comes in as a list
datasets <- tidytuesdayR::tt_load("2020-09-01")

    Downloading file 1 of 5: `arable_land_pin.csv`
    Downloading file 2 of 5: `cereal_crop_yield_vs_fertilizer_application.csv`
    Downloading file 3 of 5: `cereal_yields_vs_tractor_inputs_in_agriculture.csv`
    Downloading file 4 of 5: `key_crop_yields.csv`
    Downloading file 5 of 5: `land_use_vs_yield_change_in_cereal_production.csv`
#show the names of the individual datasets
names(datasets)
[1] "arable_land_pin"                               
[2] "cereal_crop_yield_vs_fertilizer_application"   
[3] "cereal_yields_vs_tractor_inputs_in_agriculture"
[4] "key_crop_yields"                               
[5] "land_use_vs_yield_change_in_cereal_production" 

Key Crop Yields

key_crop_yields <- datasets$key_crop_yields

Visdat

visdat::vis_dat(key_crop_yields)

Skimr

skimr::skim(key_crop_yields)
Table 1: Data summary
Name key_crop_yields
Number of rows 13075
Number of columns 14
_______________________
Column type frequency:
character 2
numeric 12
________________________
Group variables None

Variable type: character

skim_variable n_missing complete_rate min max empty n_unique whitespace
Entity 0 1.00 4 39 0 249 0
Code 1919 0.85 3 8 0 214 0

Variable type: numeric

skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist
Year 0 1.00 1990.37 16.73 1961.00 1976.00 1991.00 2005.00 2018.00 ▇▆▇▇▇
Wheat (tonnes per hectare) 4974 0.62 2.43 1.69 0.00 1.23 1.99 3.12 10.67 ▇▅▂▁▁
Rice (tonnes per hectare) 4604 0.65 3.16 1.85 0.20 1.77 2.74 4.16 10.68 ▇▇▃▁▁
Maize (tonnes per hectare) 2301 0.82 3.02 3.13 0.03 1.14 1.83 3.92 36.76 ▇▁▁▁▁
Soybeans (tonnes per hectare) 7114 0.46 1.45 0.75 0.00 0.86 1.33 1.90 5.95 ▇▇▂▁▁
Potatoes (tonnes per hectare) 3059 0.77 15.40 9.29 0.84 8.64 13.41 20.05 75.30 ▇▅▁▁▁
Beans (tonnes per hectare) 5066 0.61 1.09 0.82 0.03 0.59 0.83 1.35 9.18 ▇▁▁▁▁
Peas (tonnes per hectare) 6840 0.48 1.48 1.01 0.04 0.72 1.15 1.99 7.16 ▇▃▁▁▁
Cassava (tonnes per hectare) 5887 0.55 9.34 5.11 1.00 5.55 8.67 11.99 38.58 ▇▇▁▁▁
Barley (tonnes per hectare) 6342 0.51 2.23 1.50 0.09 1.05 1.88 3.02 9.15 ▇▆▂▁▁
Cocoa beans (tonnes per hectare) 8466 0.35 0.39 0.28 0.00 0.24 0.36 0.49 3.43 ▇▁▁▁▁
Bananas (tonnes per hectare) 4166 0.68 15.20 12.08 0.66 5.94 11.78 20.79 77.59 ▇▃▁▁▁

What was your question?

Given your inital exploration of the data, what was the question you wanted to answer?

How have key crop yields changed over time?

What were your findings?

Put your findings and your visualization code here.

key_crop_yields %>%
  tidyr::pivot_longer(cols = contains("(tonnes"), names_to="crop",
                      values_to="Yield") %>%
  ggplot() + aes(x=Year, y= Yield, group=crop, color=crop) + geom_line() + facet_wrap(~Entity)