Please Note

This workshop adheres to the BioData Club Code of Conduct.

This is done to maintain a psychologically safe and inclusive environment for everyone. Please email me at laderast@ohsu.edu or text me (503-481-8470) if you see any potential violations.

If you violate the CoC, you may be asked to leave.

These Slides are Here

http://bit.ly/data_nhanes

Put your post it up when you’re ready!
If someone comes in late, please show them where the slides are

Our Overall Goal

Look at NHANES data
Understand how the variables are defined
Understand associations between outcome and variable
Understand interactions between variables
Share insights about the data with each other

Why are we here?

Further exposure to methods used by others
Just getting some more hands-on experience.
A better way to work up my experimental data
Learn more about exploratory data analysis
Some confidence that I can actually learn this and a kick in the pants to get started! I’ve taken stats classes, but they were many years ago.
To get better at EDA with my work.
Just looking to continue learning.
Learning habits and approaches from other users. Also willing to help out newer users.
Interested to see how an information/data scavenger hunt is set up
I mostly work with data visualization and don’t get to do much analysis, so I need practice!
Hands-on practice with EDA
The ability to do EDA in an open source platform.
A feel for what it’s like to do exploratory analysis.
More data science skills to apply to the working world.
To get better at analyzing data
I would like to get more comfortable with data science tools and methods. I would also like to formulate a side-project that I can build on in the near future.
A better grasp of the practice of data science and experience that can be applied to my career.
Seeing different approaches and tools, collaboration
To get an introduction to the mathematical methods behind analyzing and exploring data.
Increase my comfort with R and statistical analyses!
Seeing how burro works again because I didn’t have a laptop for the Data Wrangling Workshop.

What is NHANES and why are we looking at it?

National Health And Nutrition Examination Survey
Assess health/nutritional status of adults/children in the United States
Types of Survey Questions:
- Demographic (Age, Race, Gender, many more…)
- Socioeconomic (Marriage Status, Household Income, Education)
- Dietary (Foods consumed, dietary supplements)
- Health (Body Mass Index, Sleep Trouble, Depression)

Please Note

We’re not going to look at all of NHANES.
We’re looking at an extract from two years of the survey (which years?)
We’re ignoring how particpants were chosen/sampled from the larger population
- We’ll talk a a little about this later.

Take a Look at the Data as a Sheet

NHANES Extract in Google Sheet Form

NHANES is a valuable dataset in many ways

We can understand an outcome and look at its association with measured variables in the data.

Let’s look at three outcomes today:

Depression
Type 2 Diabetes
Physical Activity

Before we Start

Get into groups by your chosen outcome. Introduce yourselves, and pair off within your groups

Come up with one question about your outcome you’re curious about.

What do you expect is the case?

See if you can answer it!

What is Exploratory Data Analysis?

Pioneered by John Tukey
Detective work on your data
An attitude towards data, not just techniques
‘Find patterns, reveal structure, and make tenative model assessments (Behrens)’

Remember

“Exploratory data analysis can never be the whole story, but nothing else can serve as the foundation stone.” - John Tukey, Exploratory Data Analysis

Why Look at your Data?

Need to be aware of issues in the data!

Systolic Blood Pressure

EDA is about Visualization First

Visualization is a gateway
Understand the issues, not focus on coding right now
Build your foundation

Running the Explorer App

We’ll start exploring the data immediately!

Go to the apps below:

We’ll separate the scavenger hunt by outcome, and we’ll ask questions, and then come back to present.

Map your questions to a tab:

Overview
Categorical Variable
Continuous Variables

What is the Overview Tab for?

Seeing how many variables are in the dataset and which type
Seeing missing values and complete cases
Looking up a variable in the data dictionary

Data Explorer

Overview Tab

What values are missing from the dataset overall? (Visual Summary)
Are any numeric values skewed in distribution? (Tabular Summary)
How is the variable defined? (Data Dictionary)
What are the permissible values? (Data Dictionary)

Let’s try it!

How many categorical variables are there? (in R, we call them factors)
How many missing cases are there for your outcome?
What is the mean age for the dataset?

What is the Category Tab for?

Should we add a categorical variable to our model?
Does my categorical variable have predictive value?
Does adding my variable affect the number of cases I can analyse?
Is my variable missing at random or not at random?
Is my categorical variable confounded with another categorical variable?

Categorical Tab

What percentages exist for my categorical variable? (Single Category)
Is my variable associated with outcome? (Category/Outcome)
Is my variable associated with other variables? (Crosstab)
Are the missing values of my variable evenly distributed? (Missing Data)

Categorical Example

Do people with the most days of LittleInterest also have the most days of Depression?

Categories: Let’s try it!

How many categories are there for your outcome?
What is the category with the largest counts for your outcome?
Do the proportions of people with your outcome look the same for those who use marijuana versus those who don’t use it?

Continuous Tab

What is the distribution of my categorical variable? (Single Continuous)
Is my continuous variable associated with outcome? (Continuous/Outcome)
Is my continuous variable associated with another categorical variable? (Boxplot)
Is my continous variable associated with another continous variable? (Correlation)
Is my continuous variable missing values? (Correlation)

Continuous Scatter

If you get less hours of sleep per night, does that mean you have a higher BMI?

Continuous Boxplot

If you have a lot of depressed episodes, do you also get less sleep?

Continuous: Let’s Try it!

Is there something strange about Age in the dataset?
As you get older, does your BMI go up?
Do Depressed people have a higher systolic blood pressure than non depressed people?

Let’s start the scavenger hunt

Let’s learn from each other

Each group should present the findings from 2-3 interesting questions:

Where did you find it in the app?
What variables did you look at?
What did you expect?
What did you find?

Physical Activity Notes

Jess’ excellent overview of physical activity in NHANES

Some Final Notes in NHANES

NHANES was designed with a very special sampling design, that was meant to be representative to the United States
There are weights that you must utilize for real modeling and analysis

Congratulations

You are now a full fledged data explorer!

https://waynepelletier.com/work/tasty-icons

Please!

Fill out our post survey form - we need this info to do more workshops!

Post Survey Form

Let’s do more scavenger hunts

Are there other interesting public datasets that people want to look at?

We can build apps for them!

http://laderast.github.io/burro

Data Exploration of NHANES

Introductions

Please Note

These Slides are Here

http://bit.ly/data_nhanes

Our Overall Goal

Why are we here?

What is NHANES and why are we looking at it?

Please Note

Take a Look at the Data as a Sheet

NHANES is a valuable dataset in many ways

Before we Start

What is Exploratory Data Analysis?

Remember

Why Look at your Data?

Why Visualization?

EDA is about Visualization First

Running the Explorer App

Map your questions to a tab:

What is the Overview Tab for?

Overview Tab

Let’s try it!

What is the Category Tab for?

Categorical Tab

Categorical Example

Categories: Let’s try it!

Continuous Tab

Continuous Scatter

Continuous Boxplot

Continuous: Let’s Try it!

Let’s start the scavenger hunt

Let’s learn from each other

Physical Activity Notes

Some Final Notes in NHANES

Congratulations

Please!

Let’s do more scavenger hunts