Well, we just finished our clinical data wrangling workshop. This was a 12-hour workshop (spread over 4 days) where students got to work with a real research dataset (the Sleep Heart Health Study data). We developed the workshop as part of a National Library of Medicine T15 training supplement in Data Science. The following is a short report describing the workshop and its outcomes.
We designed the workshop for our incoming informatics students (in both the clinical and biological majors) to introduce them to the difficulties of working with clinical data.
I’m still in the process of recovering from my current bout of depression and anxiety. I’d like to talk about what is currently helping me moderate my anxiety. I have been practicing mindfulness and meditation for the past three years, and I’m beginning to realize how necessary it is in our information-dense age. Many of my symptoms of anxiety really come from an information overglut.
I’m currently on way too many projects and am teaching as well.
I just gave a workshop teaching the basics of Shiny (the interactive web visualization framework) for a group of PDX R users. We had 10 people attend, and most of the attendees managed to get through the material and had lots of good questions. I really enjoyed talking with everyone and I hope everyone learned something. We’re planning to give the workshop again to the larger PDX R user community, and some of the attendees last night have volunteered to be TAs.
Well, the week of teaching our Python Bootcamp for Neuroscientists is over. I had the pleasure of working with a great group of students, professors and instructors in developing the material, and had a great time teaching complete beginners to programming and Python.
We had the overall goal of introducing 21 Neuroscience Graduate Program students at OHSU to the basics of programming in Python using data that they were interested in: electrophysiology data and confocal microscopy data.
Many people have asked me for informational interviews. They tell me that they are interested in Data Science and want to hear about what I do on a day-to-day basis. To be honest, I’ve begun to dread these kinds of interviews.
Inevitably, I spend a lot of energy explaining what I do to someone who rarely follows up. Consequently, I don’t find these interviews rewarding at all.
Note: after posting this, I heard back from Roberto Tyley, the creator of the BFG. I’d like to note that the BFG actually does its job really well. I was mostly really frustrated about how Git/GitHub doesn’t prevent a user from doing something that’s hard to undo. So my frustration is really about that, not really about the BFG. This post has been edited to reflect that.
Greg Wilson first said it, but I’ve come to agree.
Have you ever run into something where, no matter how many times someone explained it, you really had no idea what it was for? For me, that was Non-Standard Evaluation (NSE) in R, and its newer cousin Tidy Evaluation, or tidyeval. I had a real learning block about it. I really wanted to understand it, but for some reason I just wasn’t getting the general concepts.
What is evaluation, really?
Hi everyone, our paper, "Teaching data science fundamentals through realistic synthetic clinical cardiovascular data", is now available to read on bioRxiv.
In this paper, we talk about a dataset that we synthesized for teaching aspects of clinical data that may be tricky to understand in data science. This dataset is interesting because it’s derived from a multivariate distribution based on real patient data, modeled as a Bayesian Network. Even though we knew the true marginals for the real data, the Bayesian Network still needed a lot of fine-tuning.
I just came back from the Open Data Science Conference (ODSC) in San Francisco and I found it really stimulating and interesting. I learned a ton, met some great people working in very different fields, and overall found it quite worthwhile.
Here are some of the highlights from my notes:
Workshops
scikit-learn Intro Workshop and Advanced
I admit that I am not really a Python person. But I am helping to develop some materials for an introductory workshop, and I found this workshop and its materials to be a very beginner-friendly introduction to scikit-learn and machine learning concepts, much like caret for R.
One of the hardest concepts that I have struggled with as an analyst is separating my code from my data. A related issue is making your code reproducible across data instances.