Data Exploration of NHANES

Ted Laderas, Jessica Minnier, Thomas Frohwein



Please Note

This workshop adheres to the BioData Club Code of Conduct.

This is done to maintain a psychologically safe and inclusive environment for everyone. Please email me or text me (503-481-8470) if you see any potential violations.

If you violate the CoC, you may be asked to leave.

These Slides are Here

Our Overall Goal

Why are we here?

What is NHANES and why are we looking at it?

Take a Look at the Data as a Sheet

NHANES Extract in Google Sheet Form

NHANES is a valuable dataset in many ways

We can understand an outcome and look at its association with measured variables in the data.

Let’s look at three outcomes today:

Before we Start

Get into groups by your chosen outcome. Introduce yourselves, then pair off within your group.

Come up with one question about your outcome you’re curious about.

What do you expect is the case?

See if you can answer it!

What is Exploratory Data Analysis?


“Exploratory data analysis can never be the whole story, but nothing else can serve as the foundation stone.” - John Tukey, Exploratory Data Analysis

Why Look at your Data?

Systolic Blood Pressure

Why Visualization?

EDA is about Visualization First

Running the Explorer App

We’ll start exploring the data immediately!

Go to the apps below:

We’ll split the scavenger hunt by outcome: each group works through its questions, then we all come back together to present.

Map your questions to a tab:

What is the Overview Tab for?

Data Explorer

Overview Tab

Let’s try it!

  1. How many categorical variables are there? (In R, these are called factors.)
  2. How many missing cases are there for your outcome?
  3. What is the mean age for the dataset?
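
The three questions above can also be answered in code. The following is only a minimal sketch in Python, using the standard library on a tiny made-up sample. The column names (`Age`, `Gender`, `Depression`), the category labels, and every value are assumptions for illustration, not records from the actual NHANES extract or output from the Explorer App.

```python
from statistics import mean

# Hypothetical mini-extract: made-up rows, not real NHANES records.
# Python's None marks a missing value.
rows = [
    {"Age": 34, "Gender": "female", "Depression": "Several"},
    {"Age": 51, "Gender": "male", "Depression": None},
    {"Age": 27, "Gender": "female", "Depression": "No days"},
    {"Age": 68, "Gender": "male", "Depression": "Most"},
]

outcome = "Depression"
columns = list(rows[0])


def is_categorical(column):
    """Treat a column as categorical if all its non-missing values are strings."""
    values = [row[column] for row in rows if row[column] is not None]
    return bool(values) and all(isinstance(v, str) for v in values)


# 1. How many categorical variables are there?
categorical = [c for c in columns if is_categorical(c)]

# 2. How many missing cases are there for the outcome?
n_missing = sum(row[outcome] is None for row in rows)

# 3. What is the mean age (ignoring missing ages)?
mean_age = mean(row["Age"] for row in rows if row["Age"] is not None)

print(f"categorical variables: {len(categorical)} {categorical}")
print(f"missing {outcome}: {n_missing}")
print(f"mean age: {mean_age}")
```

Treating "all non-missing values are strings" as the test for a categorical variable is a simplification chosen for this sketch; in R, the equivalent check would be whether a column is a factor.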

What is the Category Tab for?

Categorical Tab

Categorical Example

Do people with the most days of LittleInterest also have the most days of Depression?
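
One way to approach a question like this is a cross-tabulation: count how many people fall into each combination of the two variables, then look at proportions within each row. The sketch below assumes hypothetical category labels (`No days`, `Several`, `Most`) and made-up pairs of values; it illustrates the idea only and does not reproduce the Explorer App or any real NHANES result.

```python
from collections import Counter

# Hypothetical (LittleInterest, Depression) pairs, one per participant.
# Made-up values, not real NHANES records.
pairs = [
    ("Most", "Most"),
    ("Most", "Several"),
    ("Several", "Several"),
    ("No days", "No days"),
    ("No days", "No days"),
    ("Several", "No days"),
    ("Most", "Most"),
]

levels = ["No days", "Several", "Most"]
counts = Counter(pairs)  # combinations that never occur count as 0


def row_proportions(interest_level):
    """Share of each Depression level among people with this LittleInterest level."""
    total = sum(counts[(interest_level, d)] for d in levels)
    return {d: counts[(interest_level, d)] / total for d in levels}


# Contingency table: rows are LittleInterest, columns are Depression.
print("LittleInterest".ljust(16) + "".join(d.ljust(10) for d in levels))
for i in levels:
    print(i.ljust(16) + "".join(str(counts[(i, d)]).ljust(10) for d in levels))

# Among people reporting "Most" days of little interest,
# what share also report "Most" days of depression?
most = row_proportions("Most")
print(most)
```

Comparing the `Most` row against the other rows shows whether high LittleInterest tends to go along with high Depression; with real data, the counts in each cell should be large enough for the proportions to be meaningful.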