How are Data Science and Systems Science Connected?

Ted Laderas

2/15/2018

Shameless Plug: Cascadia R Conference 2018

June 2, 2018 at CLSB (Collaborative Life Sciences Building)

https://cascadiarconf.com

A gRadual intro to Shiny

Learn about making interactive visualizations/dashboards in R

Please RSVP at: https://www.meetup.com/portland-r-user-group/events/247752115/

Overview

Introduction

What is Data Science?

Shlomo Argamon: At its core, data science is about making sense of the world using data.

Encompasses techniques from:

What about Systems Science and Data Science?

Carbone 2016: Further interdisciplinary advances and deeper insights will be needed for understanding:

  1. interactions among connected heterogeneous entities (namely space-time-dependent heterogeneous data structures)
  2. emergence of large-scale properties of interacting entities and clustering
  3. multi-scale data-driven approaches to identify patterns, fundamental shapes and parameters at different levels of abstraction

We need to bring more interactions to Data Science!

Data Science/Machine Learning Workflow

What is Feature Engineering?

Feature engineering is the process of transforming raw data into features that better represent the underlying problem to the predictive models, resulting in improved model accuracy on unseen data.
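As a minimal sketch of the idea, the snippet below turns raw per-patient mutation lists into gene-set hit counts. The patient IDs, gene names, gene sets, and the `engineer_features` helper are all hypothetical and made up for illustration; this is not the method used in the talk.

```python
def engineer_features(patient_mutations, gene_sets):
    """Turn raw per-patient mutation lists into gene-set hit counts."""
    features = {}
    for patient, genes in patient_mutations.items():
        mutated = set(genes)
        # One engineered feature per gene set: how many of its members are mutated.
        features[patient] = {
            name: len(mutated & members) for name, members in gene_sets.items()
        }
    return features


# Hypothetical raw data: no two patients share a mutated gene.
patient_mutations = {
    "P1": ["GENE_A", "GENE_X"],
    "P2": ["GENE_B"],
    "P3": ["GENE_Y"],
}

# Hypothetical gene sets (placeholders, not real pathways).
gene_sets = {
    "set_1": {"GENE_A", "GENE_B", "GENE_C"},
    "set_2": {"GENE_X", "GENE_Y"},
}

features = engineer_features(patient_mutations, gene_sets)
for patient, row in features.items():
    print(patient, row)
```

In this toy data P1 and P2 have no mutated gene in common, yet both score a hit in `set_1`, so the engineered feature exposes a similarity that the raw gene-level data hides.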

Let’s look at one type of feature engineering in Head and Neck cancer.

Background: Oncogenes

Oncogenic Collaboration and Hallmarks of Cancer

Cancer involves not one alteration but many, and they collaborate to disrupt cellular systems.

Surrogate Legend

One problem: We don’t target unique alterations within patients

Long-Tail-of-Cancer

Research Question

Surrogate Example Surrogate Legend

Permutation Analysis on Networks

Which subnetworks are significant? Use permutation analysis to decide on a statistical cutoff.
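One way to picture a permutation analysis on a network is sketched below. It assumes a simple null model in which the same number of mutations is scattered at random across all genes, and it scores a subnetwork by the number of mutated neighbors around a hub gene. The toy network, gene names, and both function names are hypothetical; the talk's actual procedure may differ.

```python
import random


def mutated_neighbor_count(network, hub, mutated):
    """Count how many neighbors of `hub` carry a mutation."""
    return sum(1 for gene in network[hub] if gene in mutated)


def permutation_p_value(network, hub, mutated, all_genes, n_perm=2000, seed=0):
    """Empirical p-value for the observed mutated-neighbor count.

    Assumed null model: the same number of mutations, placed on genes
    chosen uniformly at random.
    """
    rng = random.Random(seed)
    observed = mutated_neighbor_count(network, hub, mutated)
    hits = 0
    for _ in range(n_perm):
        shuffled = set(rng.sample(all_genes, len(mutated)))
        if mutated_neighbor_count(network, hub, shuffled) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return observed, (hits + 1) / (n_perm + 1)


# Hypothetical toy data: 50 genes, one hub with five neighbors.
all_genes = [f"G{i}" for i in range(50)]
network = {"G0": ["G1", "G2", "G3", "G4", "G5"]}
mutated = {"G1", "G2", "G3", "G4", "G40"}

observed, p_value = permutation_p_value(network, "G0", mutated, all_genes)
print(observed, p_value)
```

The cutoff then comes from the permuted distribution itself: a subnetwork counts as significant when few random placements of the mutations score as high as the observed one.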

Permutation Analysis

Surrogates incorporate long-tail mutations

White = unique/infrequently observed mutations, Dark Blue = frequently observed mutations

BRCA surrogates

Surrogate Mutations

Lesson: Feature Engineering needs Systems Approaches

Machine Learning for Prediction

Classification Problem:

Classification Task

http://adilmoujahid.com/posts/2016/06/introduction-deep-learning-python-caffe/

Big Data to Interpretability?

What is interpretability?

We should think of interpretability as human simulatability. A model is simulatable if a human can take in input data together with the parameters of the model and, in reasonable time, step through every calculation required to make a prediction (Lipton 2016).
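To make simulatability concrete, here is a small sketch of a model whose prediction can be followed by hand: a two-feature logistic model that records every arithmetic step. The feature names, weights, and the `predict_with_trace` helper are made up for illustration and do not come from Lipton's paper.

```python
from math import exp


def predict_with_trace(weights, bias, features):
    """Return a logistic-model probability plus the steps that produced it."""
    steps = [f"start with bias = {bias}"]
    total = bias
    for name, value in features.items():
        contribution = weights[name] * value
        total += contribution
        steps.append(
            f"{name}: {weights[name]} * {value} = {contribution}, running total = {total}"
        )
    probability = 1 / (1 + exp(-total))
    steps.append(f"sigmoid({total}) = {probability}")
    return probability, steps


# Hypothetical parameters and input.
weights = {"x1": 2.0, "x2": -1.0}
bias = -1.0
features = {"x1": 1.0, "x2": 1.0}

probability, steps = predict_with_trace(weights, bias, features)
for step in steps:
    print(step)
```

With two features a person can check every line of the trace. With millions of parameters the same trace still exists, but nobody could step through it in reasonable time, which is the point of the definition.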

Why interpretability is important

Issues of trust and bias plague machine learning and its applications!

Transparency Matters in Many Cases!

If the data scientist’s goal is to create automated processes that affect people’s lives, then he or she should regularly consider ethics in a way that academics in computer science and statistics, generally speaking, do not.

The more processes we automate, the more obvious it will become that algorithms are not inherently fair and objective, and that they need human intervention. (The Ethical Data Scientist)

Research into Interpretability is Just Starting

NIPS 2017: Interpreting, Explaining and Visualizing Deep Learning - Now What?

Proceedings of the 2017 ICML Workshop on Human Interpretability in Machine Learning (WHI 2017)

One effort: LIME

Highlighting importance of features in LIME

Feature Importance in LIME

Highlighting importance of features for image classification

LIME images

Beyond LIME: What’s missing from these explanations?

Jakulin 2004 - Interactions via Entropy

Three way interaction
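A three-way interaction can be quantified from entropies alone. The sketch below uses one common form of interaction information, I(A;B;C) = I(A,B;C) - I(A;C) - I(B;C), where positive values indicate synergy and negative values indicate redundancy. The function names and the toy XOR data are illustrative assumptions, and sign conventions vary between authors.

```python
from collections import Counter
from math import log2


def entropy(rows, cols):
    """Shannon entropy (in bits) of the joint distribution over `cols`."""
    counts = Counter(tuple(row[c] for c in cols) for row in rows)
    n = len(rows)
    return -sum((k / n) * log2(k / n) for k in counts.values())


def mutual_information(rows, xs, ys):
    """I(X;Y) = H(X) + H(Y) - H(X,Y), for lists of column indices."""
    return entropy(rows, xs) + entropy(rows, ys) - entropy(rows, xs + ys)


def interaction_information(rows, a, b, c):
    """I(A;B;C) = I(A,B;C) - I(A;C) - I(B;C)."""
    return (
        mutual_information(rows, [a, b], [c])
        - mutual_information(rows, [a], [c])
        - mutual_information(rows, [b], [c])
    )


# Toy data: the third column is the XOR of the first two.
xor_rows = [(a, b, a ^ b) for a in (0, 1) for b in (0, 1)]

# Toy data: all three columns are copies of each other.
copy_rows = [(0, 0, 0), (1, 1, 1)]

xor_interaction = interaction_information(xor_rows, 0, 1, 2)
copy_interaction = interaction_information(copy_rows, 0, 1, 2)
print(xor_interaction, copy_interaction)
```

In the XOR data neither input alone says anything about the output, so the one bit of information appears only when both inputs are considered together. Per-feature importance scores miss this kind of structure.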

Tom Fiddaman: Simulation and Data Science

Complex interplay between big data and dynamic simulation

Dynamic models can make a black box more understandable (Fiddaman):

Call to Action

Let’s Talk More!

OHSU-PSU Research Faculty Mixer for collaboration

Feb 22 - 4:30 to 6:30 p.m.

Collaborative Life Sciences Building

2730 SW Moody Ave.

It’s all about telling stories

The Complex Systems and Data Science program offered by the University of Vermont trains emerging data scientists to find, model, understand, and tell the stories of the patterns they uncover.

https://www.mastersportal.eu/studies/114025/complex-systems-and-data-science.html

Keep in Touch

This talk: http://laderast.github.io/sysc_data_sci