Tag

Data Exploration

At the 14 July R User Meetup, hosted at Atlan, I had the pleasure of briefly introducing the relatively new tidytext package, written by Julia Silge (@juliasilge) and David Robinson (@drob). Essentially this package serves to bring text data into the “tidyverse”. It provides simple tools to manipulate unstructured text data in such a way that it can be analyzed…

What is an outlier? In short, it’s a data point that is significantly different from other data points in a data set. The long story? There isn’t a strong mathematical definition for what is or isn’t an outlier. In the end, detecting and handling outliers is often a somewhat subjective exercise. So how can you dive into a new data…
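Because there is no single definition, any detection rule is a convention rather than a fact. As a minimal sketch of one common convention (Tukey's interquartile-range fences, which the excerpt above does not itself prescribe), the following flags values that fall more than 1.5 IQRs outside the quartiles. The function name `iqr_outliers`, the multiplier `k`, and the sample values are all assumptions made for illustration.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Return the values lying outside Tukey's fences.

    This is one conventional rule of thumb, not a definition of "outlier".
    """
    # statistics.quantiles with n=4 returns the three quartile cut points.
    q1, _median, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


# Hypothetical measurements with one value far from the rest.
sample = [10, 12, 11, 13, 12, 11, 95]
flagged = iqr_outliers(sample)
print(flagged)
```

Changing `k` moves the fences, which is exactly where the subjectivity enters: a larger multiplier flags fewer points, a smaller one flags more.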

Cross tabulation is a method for quantitatively analyzing the relationship between two or more variables. Also known as a contingency table or cross tab, it groups observations by the categories of each variable to show how the variables relate to one another and how that relationship changes from one grouping to another. It is usually used in statistical analysis to find patterns, trends, and probabilities within raw data. When…
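To make the idea concrete, here is a small sketch that builds a two-way table of counts using only the Python standard library. The records, the field meanings (region and preferred product), and the helper name `crosstab` are hypothetical and chosen purely for illustration.

```python
from collections import Counter

# Hypothetical survey records: (region, preferred_product)
records = [
    ("North", "A"), ("North", "B"), ("North", "A"),
    ("South", "B"), ("South", "B"), ("South", "A"),
]


def crosstab(pairs):
    """Count how often each (row, column) combination occurs."""
    counts = Counter(pairs)
    rows = sorted({r for r, _ in pairs})
    cols = sorted({c for _, c in pairs})
    # Nested dict: table[row][col] -> count (0 when a pair never occurs).
    return {r: {c: counts[(r, c)] for c in cols} for r in rows}


table = crosstab(records)
for row, cells in table.items():
    print(row, cells)
```

Each cell holds the number of records sharing that row and column category, so reading across a row shows how one variable is distributed within a single group of the other.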