
Introduction to Data Debugging
Monday, November 16th, 6:30-8:30pm
It has been estimated that data scientists spend between 50% and 80% of their time “collecting and preparing unruly digital data” before they ever get to the analysis. Data is often badly labeled, inconsistently sampled, incorrect in strange places, or missing altogether, and this host of errors leads to the “garbage in, garbage out” problem. While detecting the myriad ways in which data is broken can be difficult, many traditional visualization and statistical analysis techniques can be used to sanity-check datasets. In this workshop, we will walk through detecting and compensating for some of the most common problems with datasets.
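As a taste of the kind of sanity check described above, here is a minimal Python sketch. It is not taken from the workshop materials. The function name `sanity_check`, the sample `readings`, and the cutoff of 3.5 are all assumptions made for illustration. The sketch counts missing entries and flags values that sit far from the median, using the median absolute deviation as a robust measure of spread.

```python
import statistics


def sanity_check(values, threshold=3.5):
    """Count missing entries (None) and flag outliers in a list of numbers.

    Hypothetical helper for illustration. Outliers are found with the
    modified z-score, which is based on the median absolute deviation (MAD).
    The default threshold of 3.5 is a common rule of thumb, not a fixed rule.
    """
    present = [v for v in values if v is not None]
    missing = len(values) - len(present)

    if not present:
        return {"missing": missing, "outliers": []}

    median = statistics.median(present)
    mad = statistics.median([abs(v - median) for v in present])

    outliers = []
    if mad > 0:  # with zero spread, the score is undefined, so flag nothing
        for v in present:
            score = 0.6745 * abs(v - median) / mad
            if score > threshold:
                outliers.append(v)

    return {"missing": missing, "outliers": outliers}


# Made-up temperature readings with two gaps and one suspicious value.
readings = [21.5, 22.0, None, 21.8, 220.0, 22.3, None, 21.9]
report = sanity_check(readings)
print(report)
```

The median and MAD are used here instead of the mean and standard deviation because a single extreme value can inflate the latter two enough to hide itself. Whether a flagged value is an error or a genuine observation still needs a human decision.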
This workshop will be hosted by Hannah Aizenman and Jeremy March.
Please contact gc.digitalfellows@gmail.com with any questions.
Photo credit: Michael Mol