
Data Preparation


Data is most valuable when you have something to compare it to, but these comparisons aren't helpful if the data is bad or irrelevant. For example, it's nice to know that your program helped 150 people this year, but that doesn't tell you what you should do next…

When discussing data collection, outliers inevitably come up. What is an outlier exactly? It’s a data point that is significantly different from other data points in a data set. While this definition might seem straightforward, determining what is or isn’t an outlier is actually pretty subjective, depending on the study and the breadth of information being collected. So what’s the…
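To make the "significantly different" part concrete, here is a minimal sketch of one common rule of thumb, Tukey's fences based on the interquartile range (IQR). This is not necessarily the approach the post recommends. The multiplier `k` is exactly the subjective judgment call described above, and the household-size values are hypothetical.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences).

    k=1.5 is a conventional default, not a universal threshold; raising it
    flags fewer points, lowering it flags more.
    """
    # statistics.quantiles with n=4 returns the three quartile cut points.
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


# Hypothetical survey responses: household size per respondent.
household_sizes = [4, 5, 3, 6, 4, 5, 4, 38]
print(iqr_outliers(household_sizes))
```

A flagged value such as 38 is a candidate for review, not automatically an error: it could be a data-entry slip or a genuinely large household.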

Nothing is more frustrating than wrapping up a lengthy data collection exercise, aggregating all the data and looking through it, only to find missing data. At best, these missing values are a nuisance that can be fixed with a bit of work. At worst, they pose an intimidating threat to data quality and your sample size. How can you assess…
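A first step in assessing missing data is simply counting it per column and checking how many complete rows remain. The sketch below assumes records are dictionaries and that blanks, `None`, "NA" and "N/A" all mean "missing"; that set of codes is a hypothetical choice you would adapt to your own survey tool.

```python
# Hypothetical missing-value codes; adjust to whatever your data uses.
MISSING = {None, "", "NA", "N/A"}


def missing_report(rows):
    """Return {column: (missing_count, missing_share)} across all rows.

    A column absent from a row counts as missing for that row.
    """
    if not rows:
        return {}
    columns = sorted({col for row in rows for col in row})
    report = {}
    for col in columns:
        n_missing = sum(1 for row in rows if row.get(col) in MISSING)
        report[col] = (n_missing, n_missing / len(rows))
    return report


def complete_cases(rows):
    """Keep only rows with no missing values in any column."""
    columns = {col for row in rows for col in row}
    return [row for row in rows if all(row.get(c) not in MISSING for c in columns)]


rows = [
    {"id": 1, "age": 34, "income": 1200},
    {"id": 2, "age": "", "income": 900},
    {"id": 3, "age": 41, "income": None},
    {"id": 4, "age": "NA", "income": 1500},
]
print(missing_report(rows))
print(len(complete_cases(rows)))
```

Comparing the two outputs shows the sample-size threat directly: no column is more than half empty, yet dropping every incomplete row leaves only one of four respondents.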

For data collected through both paper and digital surveys, you should conduct some basic data checks before carrying out thorough data cleaning. Keep reading for 4 basic data checks that you can use to detect underlying errors in almost any data set.

Number of Respondents vs. Rows

For any kind of survey, you should always match the number of rows…
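The first check, comparing the number of rows with the number of respondents, can be sketched as below. This assumes each row carries a respondent identifier (the column name `respondent_id` is hypothetical) and that you know how many respondents to expect, for example from field logs. It also flags repeated IDs, since duplicate entries are one common reason the two counts disagree.

```python
from collections import Counter


def check_row_count(rows, expected_respondents, id_field="respondent_id"):
    """Compare the row count with the expected number of respondents
    and list any respondent IDs that appear more than once.

    Assumes every row has an `id_field` key (raises KeyError otherwise).
    """
    counts = Counter(row[id_field] for row in rows)
    duplicates = sorted(rid for rid, c in counts.items() if c > 1)
    return {
        "rows": len(rows),
        "expected": expected_respondents,
        "difference": len(rows) - expected_respondents,
        "duplicate_ids": duplicates,
    }


# Hypothetical data: 5 respondents surveyed, but one was entered twice
# and two were never entered.
survey_rows = [
    {"respondent_id": 101},
    {"respondent_id": 102},
    {"respondent_id": 103},
    {"respondent_id": 103},
]
print(check_row_count(survey_rows, expected_respondents=5))
```

A difference of zero is not proof of clean data on its own: as in this example, a duplicate can mask a missing respondent, which is why the duplicate check is reported alongside the count.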

The number of villages in India is anywhere between 600,000 and one million, according to various government databases. The number and the definition of villages vary across databases, making it challenging to plan across sectors for a village development plan. There are around 649,481 villages in India, according to Census 2011, the most authoritative source of information about administrative boundaries…

With data scientist being hailed as the sexiest job of the 21st century, there has been an influx of “big data” companies, visualization tools, and other products. But unless the input data is cleaned and managed, all these products are fairly useless. As the saying goes: Garbage in, garbage out! This blog post is about the un-sexy aspects of data science – the practices…