Posts

I just gave a workshop teaching the basics of Shiny (the interactive web visualization framework) for a group of PDX R users. We had 10 people attend, and most of the attendees managed to get through the material and had lots of good questions. I really enjoyed talking with everyone and I hope everyone learned something. We’re planning to give the workshop again to the larger PDX R user community, and some of the attendees last night have volunteered to be TAs.

CONTINUE READING

Well, the week of teaching our Python Bootcamp for Neuroscientists is over. I had the pleasure of working with a great group of students, professors and instructors in developing the material, and had a great time teaching programming and Python to complete beginners. We had the overall goal of introducing 21 Neuroscience Graduate Program students at OHSU to the basics of programming in Python using data that they were interested in: electrophysiology data and confocal microscopy data.

CONTINUE READING

Many people have asked me for informational interviews. They tell me that they are interested in Data Science and want to hear about what I do on a day-to-day basis. To be honest, I’ve begun to dread these kinds of interviews. Inevitably, I spend a lot of energy explaining what I do to someone who rarely follows up. Consequently, I don’t find these interviews rewarding at all.

CONTINUE READING

Note: after posting this, I heard back from Roberto Tyley, the creator of the BFG. I’d like to note that the BFG actually does its job really well. I was mostly really frustrated about how Git/GitHub doesn’t prevent a user from doing something that’s hard to undo. So my frustration is really about that, not really about the BFG. This post has been edited to reflect that. Greg Wilson first said it, but I’ve come to agree.

CONTINUE READING

Have you ever run into something that, no matter how many times someone explained it, you really had no idea what it was for? For me, that was Non-Standard Evaluation (NSE) in R, and its newer cousin Tidy Evaluation, or tidyeval. I had a real learning block about it. I really wanted to understand it, but for some reason I just wasn’t getting the general concepts. What is evaluation, really?

CONTINUE READING

Hi Everyone, our paper called “Teaching data science fundamentals through realistic synthetic clinical cardiovascular data” is now available to read on bioRxiv. In this paper, we talk about a dataset that we synthesized for teaching aspects of clinical data that may be tricky to understand in data science. This dataset is interesting because it’s derived from a multivariate distribution based on real patient data, modeled as a Bayesian Network. Even when we knew the true marginals for the real data, the Bayesian Network needed a lot of fine tuning.

CONTINUE READING

I just came back from the Open Data Science Conference (ODSC) in San Francisco and I found it really stimulating and interesting. I learned a ton, met some great people working in very different fields, and overall found it quite worthwhile. Here are some of the highlights from my notes. Workshops: scikit-learn intro Workshop and Advanced. I admit that I am not really a Python person. But I am helping to develop some materials for an introductory workshop and I found this workshop and its materials to be a very beginner-friendly introduction to scikit-learn and machine learning concepts, much like caret for R.

CONTINUE READING

One of the hardest concepts I have struggled with as an analyst is separating my code from my data. A related issue is making my code reproducible across data instances.

CONTINUE READING

Since I didn’t get to go to useR 2017 this year, I’m compiling the interesting talks. This is an ongoing list.

CONTINUE READING

I’m going to be giving a talk for the PDX RLang Meetup on July 11 called “How to Not Be Afraid of Your Data: Teaching EDA using Shiny”. Abstract below. Many graduate students in the basic sciences are afraid of data exploration and cleaning, which can greatly impact their downstream analysis results. By using a synthetic dataset, some simple dplyr commands, and a Shiny dashboard, we teach graduate students how to explore their data and how to handle issues that can arise (missing values, differences in units).

CONTINUE READING