Tag: Data Cleaning

When discussing data collection, outliers inevitably come up. What is an outlier exactly? It’s a data point that is significantly different from other data points in a data set. While this definition might seem straightforward, determining what is or isn’t an outlier is actually pretty subjective, depending on the study and the breadth of information being collected. So what’s the…

Nothing is more frustrating than wrapping up a lengthy data collection exercise, aggregating all the data and looking through it, only to find missing data. At best, these missing values are a nuisance that can be fixed with a bit of work. At worst, they pose an intimidating threat to data quality and your sample size. How can you assess…

With “data scientist” hailed as the sexiest job of the 21st century, there has been an influx of “big data” companies, visualization tools, and other products. But unless the input data is cleaned and managed, all these products are fairly useless. As the saying goes: garbage in, garbage out! This blog post is about the un-sexy aspects of data science – the practices…