We’re excited to announce the launch of our second online course about geospatial data in R. Sign up here. When you hear “geospatial data”, what comes to mind? For many people, it’s ordinary maps. Maps are an important output of geospatial data, but the data can be used for so much more. Geospatial data is at the heart…

At the 14 July R User Meetup, hosted at Atlan, I had the pleasure of briefly introducing the relatively new tidytext package, written by Julia Silge (@juliasilge) and David Robinson (@drob). Essentially, this package brings text data into the “tidyverse”. It provides simple tools to manipulate unstructured text data in such a way that it can be analyzed…

What is an outlier? In short, it’s a data point that is significantly different from other data points in a data set. The long story? There isn’t a strong mathematical definition for what is or isn’t an outlier. In the end, detecting and handling outliers is often a somewhat subjective exercise. So how can you dive into a new data…
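
Because there is no single agreed definition, any detection rule is a convention rather than a fact about the data. As a minimal sketch of one common convention (Tukey's 1.5 × IQR rule, chosen here for illustration and not necessarily the approach the full post takes), the following flags points that lie far outside the middle half of the data. The function name `iqr_outliers`, the multiplier `k`, and the sample values are all assumptions made for this example.

```python
from statistics import quantiles


def iqr_outliers(values, k=1.5):
    """Return the values lying more than k * IQR outside the quartiles.

    The multiplier k=1.5 is a widely used default, not a mathematical law;
    changing it changes what counts as an outlier.
    """
    # quantiles(..., n=4) returns the three quartile cut points.
    q1, _median, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


# Hypothetical measurements with one suspicious reading.
data = [10, 12, 11, 13, 12, 11, 95]
print(iqr_outliers(data))  # → [95]
```

Raising or lowering `k` changes which points are flagged, which is one concrete way the subjectivity described above shows up in practice.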

Cross tabulation is a method for quantitatively analyzing the relationship between multiple variables. Also known as a contingency table or cross tab, it groups variables to show how they relate to one another and how that relationship changes from one variable grouping to another. It is usually used in statistical analysis to find patterns, trends, and probabilities within raw data. When…
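
To make the idea concrete, here is a minimal sketch of a cross tabulation built by hand: it counts how often each pair of category values occurs together. The helper `crosstab`, the field names, and the survey records are hypothetical and exist only for illustration.

```python
from collections import Counter


def crosstab(records, row_key, col_key):
    """Count co-occurrences of two categorical fields.

    Returns a nested dict: table[row_value][col_value] -> count.
    Combinations that never occur are reported as 0.
    """
    counts = Counter((rec[row_key], rec[col_key]) for rec in records)
    row_labels = sorted({r for r, _ in counts})
    col_labels = sorted({c for _, c in counts})
    return {r: {c: counts[(r, c)] for c in col_labels} for r in row_labels}


# Hypothetical survey responses.
survey = [
    {"region": "north", "answer": "yes"},
    {"region": "north", "answer": "no"},
    {"region": "north", "answer": "yes"},
    {"region": "south", "answer": "no"},
    {"region": "south", "answer": "no"},
]

table = crosstab(survey, "region", "answer")
for region, row in table.items():
    print(region, row)
```

Reading across a row shows how the answers are distributed within one region, and comparing rows shows how that distribution shifts from one grouping to another.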