science hack day portland part II, the hackening

This is the second part of my write-up on what it was like organizing and participating in Science Hack Day Portland 2016. As an exciting side note (and excuse), this post is terribly overdue because I recently graduated, got a job, and relocated. More on that later!

Where were we? At the end of the first night things were winding down - our competent yeast cultures and CRISPR bacteria were warming up, the 3D printer was raging on, and I was exhausted. I pitched my tent and went to bed.

Guests slowly started streaming in the following morning, and I helped get our DIY coffee station up and running. Saturday was when the magic happened. Groups started to dig in and bring together their hardware and software ideas. I was taught how to design some 3D printed medals on the fly, and another science hacker (who I later found out was a quilter!) helped sew lanyards to hold them.

Here’s where I’m going to tell you about the project I worked on, and how it failed.

That sounds harsh, but as a scientist I'm really used to failure, and it's not at all a bad thing (particularly in the discovery phase). My first failure was picking a project in the field I already work in. While it was nice to work with something familiar, and to get the chance to teach my teammate about it, I really didn't get out of my comfort zone. I also didn't take advantage of the awesome toys and supplies that were donated. While everyone else was hacking their Raspberry Pis, I was doing what I do every day at work.

If it's not obvious, the project I chose was a bioinformatics one - more specifically, taking open data from NASA GeneLab on a 16S rRNA marker gene survey of the indoor microbiome of the International Space Station. My partner and I were interested in building some fun, hopefully interactive visualizations with this data. That turned out to be really challenging for one big reason: neither of us was familiar enough with the cloud computing resources needed to work with this data in such a short amount of time.
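
For a sense of what we were aiming for, here's a minimal sketch of the kind of first-pass figure we had in mind. Everything here is hypothetical - the file name and columns are stand-ins for a pre-processed summary of the GeneLab data:

```r
library(ggplot2)

# Hypothetical pre-processed summary of the 16S survey; the file
# name and column names are made up for illustration.
otus <- read.csv("iss_16s_summary.csv")  # columns: sample, phylum, rel_abund

# Stacked bars: microbial community composition per ISS sample
ggplot(otus, aes(x = sample, y = rel_abund, fill = phylum)) +
  geom_col() +
  labs(x = NULL, y = "relative abundance") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
```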

This is where I'd like to spin this in a positive direction - my partner and I had a crash course in tools like Docker, AWS, and Digital Ocean. That morning we scrambled from getting AWS configured, to ditching that for a hook-up on some credits to spin up a really powerful Digital Ocean droplet, and the mess ended with learning how to configure a Docker image containing the bioinformatics program QIIME along with enough storage to handle the raw 16S dataset from GeneLab.

I'm still a bit new to bioinformatics, but in my experience the challenges that day were very similar to the ones I face in my day-to-day work. I was only half-joking with my previous PI when I said that 90% of bioinformatics is figuring out how to install programs.

Anyway, the day went on, and in between bouts of yelling at my computer screen I walked around to observe and help out with other projects. I especially enjoyed being a guinea pig in Jean & Marlene's project to build a temperature-sensing glove for Raynaud's syndrome. I stuck my hand in a freezer for 30 minutes FOR SCIENCE!

Other fun things included Max showing folks how to develop black & white photographs with vitamin C and coffee, a pipette-ejecting contest, a Hack Oregon confetti cannon, an LED lamp that projects a rainbow double helix, and even pumpkins that light up LEDs when you touch them. It was freaking amazing.

And now for a bit of reflection

I learned so much from Science Hack Day - about my community & local scientists, about the logistics of organizing such an event, and a lot more about cloud computing. One thing I hope to see at the next Science Hack Day is more prep, and even infrastructure, geared toward data science with large datasets. Some of the Portland organizers are already pushing for this - they recently hosted an Open Data Day complete with tutorials on metadata and web scraping to accompany the event. We are also already in the planning stages for Science Hack Day Portland 2017, and are working towards more small events leading up to it.

science hack day portland part I

Back in July of 2016 I joined up with some folks who were organizing the first Science Hack Day in Portland, Oregon. If you're not familiar, Science Hack Day is a 24-hour (or sometimes weekend-long) event where people come together to make cool things with science. Cities worldwide have hosted their own Science Hack Days, all with the support of local volunteers and donors. One of the best things is that the event is free, and provides meals and supplies for the participants. Each city has its own flavor, but typically there's a mix of scientists, hardware & software engineers, students, children, and other nerds.

I won’t spend much time talking about background stuff but let me tell you - organizing an event like this was incredibly fun for me. I met a lot of new people, and enjoyed the challenge of working with a team (of really great nerds!) to figure out all of the outreach, logistics, and fundraising that needed to be done. Leading up to the event we had two meet & greets, along with online introductions on our Slack channel - and our community started to self-assemble.

I'm not sure if I read this somewhere or someone suggested it to me, but around the time I was an undergrad my mantra became "Don't think, just sign up for stuff and do it". For a kid who grew up shy, this has worked out well for me. So naturally I volunteered to be the first speaker for the lightning talks that night. My job was to introduce everyone to Science Hack Day - how it would work, what to expect - and basically be the hype-person. I chose to channel my childhood inspiration: Ms. Frizzle.

Maybe this is just coming from someone in academia, but I would have loved a whole day of just talks from our guests. The event was conveniently nestled between the 2016 Open Hardware Summit and the Hack Oregon Donut Sunday Summit, so we were fortunate to have guests and speakers from those events in attendance. Dr. Jenny Molloy gave a really exciting talk about open science and science hardware. I learned a ton about open hardware from Drew Fustini. Dietrich Ayala got us all passionate about the open internet and Cat Nikolovski introduced us to Hack Oregon - an amazing nonprofit group building open data projects.

When the talks were over we turned on the music, set up our beer and kombucha bar, put snacks out, and started to mingle. Alex Chally fired up the 3D printers he had brought, and Dr. Molloy, with his help, got to work printing a 3D microscope.

Thomas Hudson had the mood lighting covered after he set up the Internet of Buckets. He showed a group of folks how to program the bucket colors, which were conveniently capable of 23 hues - more than enough to give each of the amino acids used by humans its own color! Soon the Internet of Buckets was stealthily streaming the amino acid sequence of the protein known as p53 - an important player in cancer suppression.
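
I have no idea how Thomas actually programmed the buckets, but the encoding idea is simple enough to sketch in a few lines of R (the hue assignment here is my own invention, not his):

```r
# Give each of the 20 standard amino acids its own evenly spaced hue.
aa <- strsplit("ACDEFGHIKLMNPQRSTVWY", "")[[1]]
hues <- setNames(hsv(h = seq(0, 1, length.out = 21)[-21]), aa)

# The first few residues of human p53, one color per residue
p53_start <- strsplit("MEEPQSDPSV", "")[[1]]
hues[p53_start]  # a stream of colors, ready to send to the lights
```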

I was most excited by the two kits we purchased from The Odin - a DIY CRISPR kit and a DIY fluorescent yeast kit. Working with another organizer, (now Dr.!) Danielle Robinson, I helped teach someone who'd never used a pipette before to perform a CRISPR transformation on E. coli!

As the night wound down, and the microbes were put to bed (atop the 3D printer as a makeshift incubator), I pitched my tent and played camp counselor while five or so others hunkered down for the night…and the rest I'll continue in another post.

on data presentation & visualization

I recently attended a workshop by Edward Tufte, a very well-known statistician, author, and data scientist. If you haven’t seen his books they are quite inspiring and frankly gorgeous to look at and explore. The workshop was focused on data presentation and visualization. I’d like to share some of my musings on the subject, from a bioinformatics perspective…

Presenting data

There are a lot of questions you need to answer when you’re preparing to present scientific data. Who is your audience? How much time do you have? What kind of media do you have to work with? What message are you trying to convey with the data?

Edward Tufte is an outspoken opponent of the PowerPoint format, and the examples he gives in his books (namely Beautiful Evidence) are sound. If the data you present is going to be used to help your audience make a judgement on, say, safety procedures regarding the impact of damage from errant debris on a space shuttle - a PowerPoint pitch is terrible.

However, in academia I'd argue that a pitch is the best way to present research at a conference, and it should be treated as such. I was recently at the Pacific Northwest Women in Science Retreat and learned a lot about taking lessons from actors and translating them to presenting data (thanks Nancy Houfek!). One thing that really stuck with me is the importance of telling a story when you present. I realized that the most memorable presentations I've seen are ones where it feels more like the presenter is sharing a story of their work, rather than reading off details of their experiment. This brings me back to a quote from Tufte: "No matter how beautiful your interface is, it would be better if there were less of it". For a talk, fewer details are better. A good talk should leave the audience with lots of questions - questions that can be answered by reading the paper on the study, or viewing the poster, or talking to the researchers.

Visualizing data

Depending on the mode of presentation, your visualizations may be different. Tufte presents a few principles of visualizing data that I think are important.

Get rid of chartjunk

Chartjunk is just what it sounds like: clutter on a data visualization. I'd like to think this is largely a thing of the past - from the days when flashy software for data visualization was first introduced. Just because you can make your pie chart look like an actual pie doesn't mean you should. Actually, if you're going to use a pie chart you might as well…

One of my favorite terrible chart examples provided by Tufte in The Visual Display of Quantitative Information is a piece of work from the 1970s that somehow tries to illustrate college enrollment by age group.

You know, if you turn it on its side it almost looks like one of those vase/face illusion graphics…

Additionally, chartjunk reminds me of the book The Design of Everyday Things by Donald Norman. This book is a classic that most material designers read, but I think it easily translates to the design of everyday data things. The central idea Norman presents is user-oriented design, and the book explores ways in which design can be improved to optimize the use of an object by the user. My favorite example of this is a "Norman door". If you've ever encountered a push/pull door that isn't intuitively designed, in a way that has caused you (the user) frustration in discovering how to operate it…that's essentially a Norman door. This video from Vox is a great discussion on the subject.

Back to data. We don't typically operate data like we would a door, or a teapot (well, unless you're making a Shiny visualization with R - more on that after the list) - but we do interact with it. This post over at Dataremixed.com highlights how data visualization can borrow from Norman's ideas about user-oriented design. Namely that:

  1. Good visualizations are discoverable and understandable

Something I'm sure Tufte would agree with. Tufte encourages you to trust the intelligence of your audience and let them explore your data. Using Norman's principles, you can make this work by designing your visualization in such a way that your viewer doesn't need a guide to understand how to read the data.

  2. Absolute precision isn't always necessary

This goes back to data presentation. In academia, there’s a certain hierarchy that should be followed - with higher precision expected in something such as a report or print article. If you’re pitching your research at a conference it may be a good idea to sacrifice precision for better readability.
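
As for the Shiny mention above - an interactive visualization really is the closest we get to operating data like a door, and even a toy app has "handles" for the viewer to discover. A minimal sketch (everything here is made up):

```r
library(shiny)

# Toy app: the only "handle" is a single slider the viewer can
# discover and operate without a manual.
ui <- fluidPage(
  sliderInput("n", "Number of samples", min = 10, max = 500, value = 100),
  plotOutput("hist")
)

server <- function(input, output) {
  output$hist <- renderPlot(
    hist(rnorm(input$n), main = NULL, xlab = "simulated value")
  )
}

shinyApp(ui, server)
```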

Minimize data ink

Along the lines of chartjunk, Tufte introduces the concept of the "data-ink ratio". Essentially, the best data-ink ratio is one where all of the ink in an image represents only useful data. While some of Tufte's suggestions feel like extreme minimalism, I thought this example from Info-Viz Wiki highlights the concept well:

The plot above has a low data-ink ratio and contains unnecessary lines and shading that detract from the data.

The plot above has a high data-ink ratio, and minimizes the amount of visual distraction in the figure. I really like the horizontal lines within the bars that make it easy to read and compare percentages across the figure.
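
Here's roughly how I'd chase that high data-ink ratio in ggplot2 - the numbers are made up, but the trick is the same: drop the frame and shading, and draw the gridlines in white through the bars:

```r
library(ggplot2)

# Made-up percentages, just to illustrate trimming non-data ink
df <- data.frame(group = LETTERS[1:5], pct = c(62, 48, 35, 27, 12))

ggplot(df, aes(x = group, y = pct)) +
  geom_col(fill = "grey30", width = 0.6) +
  # white lines drawn over the bars stand in for gridlines, Tufte-style
  geom_hline(yintercept = seq(10, 60, by = 10), colour = "white") +
  labs(x = NULL, y = "percent") +
  theme_minimal() +
  theme(panel.grid = element_blank())  # no box, no shading, no leftover grid
```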

Data visualization tools

Over the past year I've been learning how to analyze data using the programming language R. It's hard to say what my favorite thing about R is…that it's open source? That there's an excellent graphical user interface? That there's an active community of developers behind a plethora of packages?

One of my favorite packages in R is ggplot2, along with the slew of other data manipulation packages from Hadley Wickham. In my (humble) opinion ggplot2 improves upon the base graphics in R, but more importantly for me it has made it easier to write understandable code to produce my figures. If you need convincing, read this list on why you should use ggplot2. I should probably do an entire write-up on using ggplot2 with bioinformatics data, but for now I'll just share a few links to things I found helpful (plus a small example after the list):

  • Swirl: The best way to get acquainted with RStudio. Swirl is a package that teaches you R, in RStudio. It’s amazing. Seriously.
  • RClub at University of Oregon: Going to their workshops improved my data wrangling skills, and the folks who run RClub are a great resource.
  • Colorbrewer2: This is a GREAT tool for picking colors to use with ggplot2 graphics. It includes color picking to accommodate color blindness, something really important to consider when creating data visualizations.
  • Cookbook for R: I used this site nearly every day when I took my first class that used R.
  • Cowplot: I often use this package with ggplot2 to create publication-ready graphics. The plot_grid() function makes it easy to combine plots on a grid.
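
To tie a few of those together, here's a small end-to-end sketch - toy data, but it shows ggplot2, a ColorBrewer palette, and cowplot's plot_grid() in one place:

```r
library(ggplot2)
library(cowplot)

# Toy data standing in for real measurements
set.seed(1)
df <- data.frame(
  x      = rep(1:10, times = 3),
  y      = as.vector(replicate(3, cumsum(rnorm(10)))),
  series = rep(c("a", "b", "c"), each = 10)
)

p1 <- ggplot(df, aes(x, y, colour = series)) +
  geom_line() +
  scale_colour_brewer(palette = "Dark2")  # a palette picked via colorbrewer2.org

p2 <- ggplot(df, aes(series, y, fill = series)) +
  geom_boxplot() +
  scale_fill_brewer(palette = "Dark2") +
  theme(legend.position = "none")

plot_grid(p1, p2, labels = c("A", "B"))  # cowplot: panels combined on a grid
```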

awesome fonts

I spent a good chunk of time recently playing around with icons in GIMP with the intent of making some nicely customized buttons for the navbar on this site. I actually enjoy (and miss) graphic design projects, so this was fun (but frustrating)…

But I feel pretty silly now that I know a little more about how to use CSS, and about this awesome site, fontawesome.io. Besides the fact that the fonts are completely scalable (and I don't have a good grasp on making vector images), fontawesome is just…awesome. Free, no JavaScript needed, accessible for screen readers…the list goes on. It's a simple, super handy website. Now my navbar has nice button-like links to my various social media contacts. I ♥ fontawesome.

getting started with this site

This post marks the start of my transition to the Jekyll static site builder, attached to my GitHub profile. In the past I used Weebly to host my personal site and resume, mostly because I was too busy learning about coding with genetic information to be bothered with the awesome challenge of website design. I'm looking forward to sharpening my skills in front-end web development (on the side, of course), and I hope to share some of what I learn on this blog. I'm also an adventurous baker, food fermenter, and composter, so these things may pop up from time to time.