Registered Nurses in the United States and Territories

Understanding wages for Registered Nurses.
tidytuesday
Author

Ted Laderas

Published

October 5, 2021

Research Question(s)

  1. Which states have the highest overall wages for registered nurses? When did this happen?
  2. Have wages increased overall for registered nurses across all states?

Loading Data

We’ll use the Tidy Tuesday code to directly load the data from the GitHub repository. We’ll also pass it into janitor::clean_names() to standardize the column names. (Life is too short to have to worry about whitespace and capitalization.)

library(tidyverse)  # provides the %>% pipe, dplyr verbs, and ggplot2 used below
nurses <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2021/2021-10-05/nurses.csv') %>% janitor::clean_names()
Rows: 1242 Columns: 22
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr  (1): State
dbl (21): Year, Total Employed RN, Employed Standard Error (%), Hourly Wage ...

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

Initial EDA

We can see there are 22 columns overall. 21 of these are numeric.

skimr::skim(nurses)
Data summary
Name nurses
Number of rows 1242
Number of columns 22
_______________________
Column type frequency:
character 1
numeric 21
________________________
Group variables None

Variable type: character

skim_variable n_missing complete_rate min max empty n_unique whitespace
state 0 1 4 20 0 54 0

Variable type: numeric

skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist
year 0 1.00 2009.00 6.64 1998.00 2.00300e+03 2009.00 2015.00 2020.00 ▇▆▇▆▇
total_employed_rn 5 1.00 47703.88 50241.05 240.00 1.22100e+04 31160.00 60230.00 307060.00 ▇▂▁▁▁
employed_standard_error_percent 5 1.00 4.36 3.04 0.70 2.50000e+00 3.50 5.10 26.10 ▇▂▁▁▁
hourly_wage_avg 6 1.00 28.48 6.65 9.23 2.37000e+01 28.25 32.39 57.96 ▁▇▆▁▁
hourly_wage_median 6 1.00 27.86 6.72 8.64 2.30800e+01 27.58 31.72 56.93 ▁▇▇▁▁
annual_salary_avg 6 1.00 59248.30 13829.14 19190.00 4.93000e+04 58750.00 67377.50 120560.00 ▁▇▆▁▁
annual_salary_median 6 1.00 57957.92 13978.95 17970.00 4.79950e+04 57375.00 65987.50 118410.00 ▁▇▇▁▁
wage_salary_standard_error_percent 6 1.00 1.27 0.70 0.40 9.00000e-01 1.10 1.42 7.50 ▇▁▁▁▁
hourly_10th_percentile 6 1.00 20.23 4.66 6.38 1.68100e+01 20.04 23.54 36.62 ▁▆▇▃▁
hourly_25th_percentile 6 1.00 23.54 5.51 7.33 1.94700e+01 23.24 27.01 45.18 ▁▇▇▂▁
hourly_75th_percentile 6 1.00 32.92 8.07 10.04 2.72100e+01 32.61 37.33 71.07 ▁▇▅▁▁
hourly_90th_percentile 6 1.00 38.16 9.23 12.33 3.25100e+01 37.50 43.41 83.35 ▁▇▅▁▁
annual_10th_percentile 6 1.00 42087.70 9694.20 13260.00 3.49575e+04 41670.00 48955.00 76180.00 ▁▆▇▃▁
annual_25th_percentile 6 1.00 48968.81 11469.49 15260.00 4.04875e+04 48335.00 56195.00 93970.00 ▁▇▇▂▁
annual_75th_percentile 6 1.00 68464.53 16777.63 20890.00 5.65975e+04 67835.00 77637.50 147830.00 ▁▇▅▁▁
annual_90th_percentile 6 1.00 79367.01 19201.21 25650.00 6.76200e+04 78015.00 90290.00 173370.00 ▁▇▅▁▁
location_quotient 649 0.48 1.01 0.19 0.32 9.00000e-01 1.01 1.13 1.50 ▁▁▇▇▁
total_employed_national_aggregate 4 1.00 134075563.81 6133532.52 124143490.00 1.29059e+08 131713800.00 138885360.00 147838700.00 ▅▇▅▃▃
total_employed_healthcare_national_aggregate 4 1.00 7268640.12 943177.74 5854360.00 6.22654e+06 7250140.00 8076300.00 8727310.00 ▇▃▅▅▆
total_employed_healthcare_state_aggregate 2 1.00 134743.23 143540.40 110.00 3.34475e+04 87435.00 175292.50 844930.00 ▇▂▁▁▁
yearly_total_employed_state_aggregate 0 1.00 2387208.60 2774288.09 110.00 5.96520e+05 1557110.00 2888682.50 17382400.00 ▇▂▁▁▁
head(nurses)
# A tibble: 6 × 22
  state       year total_employed_rn employed_standard_error_p…¹ hourly_wage_avg
  <chr>      <dbl>             <dbl>                       <dbl>           <dbl>
1 Alabama     2020             48850                         2.9            29.0
2 Alaska      2020              6240                        13              45.8
3 Arizona     2020             55520                         3.7            38.6
4 Arkansas    2020             25300                         4.2            30.6
5 California  2020            307060                         2              58.0
6 Colorado    2020             52330                         2.8            37.4
# ℹ abbreviated name: ¹​employed_standard_error_percent
# ℹ 17 more variables: hourly_wage_median <dbl>, annual_salary_avg <dbl>,
#   annual_salary_median <dbl>, wage_salary_standard_error_percent <dbl>,
#   hourly_10th_percentile <dbl>, hourly_25th_percentile <dbl>,
#   hourly_75th_percentile <dbl>, hourly_90th_percentile <dbl>,
#   annual_10th_percentile <dbl>, annual_25th_percentile <dbl>,
#   annual_75th_percentile <dbl>, annual_90th_percentile <dbl>, …

Let’s look at how the rows are distributed across years.

nurses %>%
  count(year)
# A tibble: 23 × 2
    year     n
   <dbl> <int>
 1  1998    54
 2  1999    54
 3  2000    54
 4  2001    54
 5  2002    54
 6  2003    54
 7  2004    54
 8  2005    54
 9  2006    54
10  2007    54
# ℹ 13 more rows

Hmmm. There are 54 entries per year: the 50 states plus D.C., the Virgin Islands, Puerto Rico, and Guam.

nurses %>%
  count(state)
# A tibble: 54 × 2
   state                    n
   <chr>                <int>
 1 Alabama                 23
 2 Alaska                  23
 3 Arizona                 23
 4 Arkansas                23
 5 California              23
 6 Colorado                23
 7 Connecticut             23
 8 Delaware                23
 9 District of Columbia    23
10 Florida                 23
# ℹ 44 more rows
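
To pull out the non-state entries directly rather than scrolling through the counts, one option is to compare the names against base R's built-in state.name vector. This is a small sketch on a rebuilt vector of names; the exact territory labels are an assumption based on the names used in the text, so check them against the real state column.

```r
# Entries in the `state` column, rebuilt here for illustration.
# ASSUMPTION: the territory labels match the names used in the text above.
all_entries <- c(
  state.name,  # the 50 states, built into base R
  "District of Columbia", "Guam", "Puerto Rico", "Virgin Islands"
)

# Anything that is not in state.name is a district or territory
non_states <- setdiff(all_entries, state.name)

length(all_entries)  # 54
non_states
```

With the real data, the same idea would be setdiff(unique(nurses$state), state.name).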

The mean number of employed registered nurses across all states shows an upward trend, except for a blip in 2012-2013.

nurses %>%
  group_by(year) %>%
  summarize(mean_employed_rn = mean(total_employed_rn, na.rm=TRUE)) %>%
  ggplot() +
  aes(x=year, y=mean_employed_rn) +
  geom_line()

Let’s visualize whether hourly wages are increasing or decreasing across the dataset by making a heatmap. On the x-axis, we will visualize year, and we will visualize by state on our y-axis. We’re going to map the fill value to hourly_wage_median:

nurses %>%
  mutate(state=forcats::fct_rev(state)) %>%
  ggplot() +
  aes(x=year, y=state, fill=hourly_wage_median) +
  geom_tile()

Scaling the data by state

To look for trends in the nurses data, let’s scale wages within each state so that increases or decreases stand out. We’re only looking for trends here, and whether the slope of those trends is similar from state to state.

Note that by scaling within a state (transforming each value to a z-score), we lose the absolute wage levels, so states can no longer be compared with each other in dollars. What we gain is a clear view of whether wages are steadily increasing within each of the states/territories.
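
To make the z-score transformation concrete, here is a minimal sketch on a single state's values. The five numbers are Alabama's median hourly wages for 1998 to 2002, copied from the wide matrix printed further down in this post. It shows that scale() is just "subtract the mean, divide by the standard deviation".

```r
# Alabama median hourly wages, 1998-2002 (from the matrix printed later on)
wages <- c(17.63, 18.09, 19.60, 19.99, 20.60)

# Manual z-score: subtract the mean, divide by the standard deviation
z_manual <- (wages - mean(wages)) / sd(wages)

# scale() does the same thing but returns a one-column matrix,
# so as.numeric() drops it back to a plain vector
z_scale <- as.numeric(scale(wages))

all.equal(z_manual, z_scale)  # TRUE
z_scale
```

After scaling, every state has mean 0 and standard deviation 1, which is why the fill colors are comparable as trends but not as dollar amounts.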

In general, with some exceptions (Guam and the Virgin Islands), median hourly wages for registered nurses increased from 1998 to 2020.

nurses %>%
  mutate(state=forcats::fct_rev(state)) %>%
  group_by(state) %>%
  mutate(scaled_income = scale(hourly_wage_median)) %>%
  ggplot() +
  aes(x=year, y=state, fill=scaled_income) +
  geom_tile(color="grey10") +
  scale_fill_distiller() +
  #theme_ipsum_ps()
  bplots::theme_avenir()

So far we have looked at median hourly wages. The next question is whether the trends look the same at the 10th and 90th percentiles of registered nurses.

10th Percentile

nurses %>%
  mutate(state=forcats::fct_rev(state)) %>%
  group_by(state) %>%
  mutate(scaled_income = scale(hourly_10th_percentile)) %>%
  ggplot() +
  aes(x=year, y=state, fill=scaled_income) +
  geom_tile(color="grey10") +
  scale_fill_distiller() +
  bplots::theme_avenir() +
  theme(axis.text.x=element_text(angle=90))

90th Percentile

For the most part, hourly wages at the 90th percentile level off after about 2008. From then on, 90th percentile income looks fairly static.

nurses %>%
  mutate(state=forcats::fct_rev(state)) %>%
  group_by(state) %>%
  mutate(scaled_income = scale(hourly_90th_percentile)) %>%
  ggplot() +
  aes(x=year, y=state, fill=scaled_income) +
  geom_tile(color="grey10") +
  scale_fill_distiller() +
  bplots::theme_avenir() +
  ggtitle("90th percentile RNs have slower increases in income than the 10th percentile")

Making heatmaps with dendrograms

Pivoting the data to be wider

One question we might ask is whether states group together in terms of their wage increases.

We can do this by pivoting the data into a wide matrix, which is the input format that heatmaply::heatmaply() from the {heatmaply} package expects.

Here, we take hourly_wage_median and use it in the values of our matrix. Our rows correspond to state and our columns correspond to year.

nurse_median_frame <- nurses %>%
  select(state, year, hourly_wage_median) %>%
  arrange(year) %>%
  tidyr::pivot_wider(names_from = year, values_from = hourly_wage_median) 

# Convert to a matrix first, then set row names
# (setting row names on a tibble is deprecated and triggers a warning)
nurse_median_matrix <- as.matrix(nurse_median_frame[,-1])
rownames(nurse_median_matrix) <- nurse_median_frame[["state"]]

head(nurse_median_matrix)
            1998  1999  2000  2001  2002  2003  2004  2005  2006  2007  2008
Alabama    17.63 18.09 19.60 19.99 20.60 20.81 21.23 22.43 23.52 24.92 25.80
Alaska     22.37 23.02 24.90 26.13 26.45 26.47 28.69 28.54 30.41 33.48 34.42
Arizona    19.37 20.26 21.97 22.23 23.35 23.88 25.12 26.90 28.06 29.17 30.59
Arkansas   16.66 17.18 18.02 18.44 19.20 19.98 21.17 22.63 23.62 24.17 24.78
California 23.95 25.12 26.50 27.36 28.38 29.47 31.61 33.15 35.23 36.77 38.93
Colorado   19.79 20.47 21.77 22.56 23.17 23.88 25.60 26.91 28.15 29.69 30.76
            2009  2010  2011  2012  2013  2014  2015  2016  2017  2018  2019
Alabama    26.48 26.44 26.41 26.02 26.20 26.39 26.70 26.68 27.20 27.85 28.27
Alaska     35.33 37.39 38.67 38.73 40.08 41.12 42.37 41.01 41.45 42.14 43.54
Arizona    31.78 33.11 34.42 34.24 34.14 34.00 34.38 34.94 35.70 36.43 36.93
Arkansas   25.10 25.28 25.90 26.16 26.56 26.72 26.76 27.26 27.68 28.68 29.01
California 39.86 41.03 42.51 43.88 45.34 46.38 48.27 48.30 48.43 50.20 53.18
Colorado   31.74 31.81 32.35 32.22 32.73 32.83 32.95 33.05 34.27 35.03 36.10
            2020
Alabama    28.19
Alaska     45.23
Arizona    37.98
Arkansas   29.97
California 56.93
Colorado   36.78

Heatmap with No scaling

We can now ask questions about the actual wage values. We ask heatmaply to compute a dendrogram only for the rows (states), so that we can look for clustering patterns among them.

Note that we have to set the scale argument to "none" here.

heatmaply::heatmaply(nurse_median_matrix, dendrogram = "row", 
          Colv = c(1:23), scale="none",
          main = "Oregon, California, and Hawaii have the highest median wage from 2017-2020")
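
To get a feel for what a row dendrogram is doing, here is a rough sketch of hierarchical clustering on a small slice of the matrix: the 1998 and 2020 columns for the six states printed by head() above. It uses base R's dist() and hclust() with Euclidean distance and complete linkage. That choice is an assumption made for illustration; heatmaply's own settings and leaf ordering may differ, so the groups here won't necessarily match the interactive plot.

```r
# Median hourly wage in 1998 and 2020 for the six states shown by
# head(nurse_median_matrix) above
wage_subset <- matrix(
  c(17.63, 28.19,
    22.37, 45.23,
    19.37, 37.98,
    16.66, 29.97,
    23.95, 56.93,
    19.79, 36.78),
  ncol = 2, byrow = TRUE,
  dimnames = list(
    c("Alabama", "Alaska", "Arizona", "Arkansas", "California", "Colorado"),
    c("1998", "2020")
  )
)

# Euclidean distance between rows, then complete-linkage clustering
row_tree <- hclust(dist(wage_subset), method = "complete")

# Cut the tree into three groups and list the members of each
groups <- cutree(row_tree, k = 3)
split(names(groups), groups)
```

On these six rows, California ends up in a group of its own, Alabama and Arkansas pair up, and Alaska, Arizona, and Colorado form the third group. Running the same steps on the full nurse_median_matrix would give groupings for all 54 states and territories.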

Scaling by state

If we are interested in relative (scaled) values, the dendrogram is a little less interesting. Overall you can see that all states showed an increase in hourly median wage over the years.

heatmaply::heatmaply(nurse_median_matrix, dendrogram = "row", 
          Colv = c(1:23), scale="row", 
          main="Upward trends overall in terms of hourly median wage")
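
The upward trend can also be checked numerically by comparing the first and last years. This is a small sketch using only the six states printed by head(nurse_median_matrix) above, so it illustrates the approach rather than confirming the claim for every state and territory.

```r
# Median hourly wage in 1998 and 2020, copied from head(nurse_median_matrix)
wage_1998 <- c(Alabama = 17.63, Alaska = 22.37, Arizona = 19.37,
               Arkansas = 16.66, California = 23.95, Colorado = 19.79)
wage_2020 <- c(Alabama = 28.19, Alaska = 45.23, Arizona = 37.98,
               Arkansas = 29.97, California = 56.93, Colorado = 36.78)

# Percent change over the whole period, largest first
pct_change <- 100 * (wage_2020 - wage_1998) / wage_1998
sort(round(pct_change, 1), decreasing = TRUE)

# TRUE if every one of these states ended higher than it started
all(pct_change > 0)
```

For the full dataset, the same comparison would be nurse_median_matrix[, "2020"] against nurse_median_matrix[, "1998"], keeping in mind that a few rows contain missing values.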

Conclusions

This was a nice dataset to get back into Tidy Tuesday.

  • Median wages have increased across all states for Registered Nurses.
  • Hawaii, Oregon, and California have the highest overall wages for Registered Nurses.

Citation

BibTeX citation:
@online{laderas2021,
  author = {Laderas, Ted},
  title = {Registered {Nurses} in the {United} {States} and
    {Territories}},
  date = {2021-10-05},
  url = {https://laderast.github.io//articles/2021-10-05-registered-nurses/2021-10-05-registered-nurses.html},
  langid = {en}
}
For attribution, please cite this work as:
Laderas, Ted. 2021. “Registered Nurses in the United States and Territories.” October 5, 2021. https://laderast.github.io//articles/2021-10-05-registered-nurses/2021-10-05-registered-nurses.html.