Why to Keep a Log File and How to Keep It in R - Atlan

One of the first things we are taught in Programming 101 is to write well-structured, commented code. And as any newbie would, we ignore this lesson and focus on achieving the end result. Recently, I wrote an R (the R language!) script to be run on files amounting to 30 GB! This was my first professional experience after graduation and I did not want to fuck up. So I structured the code, wrote all the comments and ran it on all the files. And what happened next?!

Not all the files were structured the same way, so my script broke on a few of them, leaving my final data set without some very important data. Moreover, my script was deleting some rows from every file, so I was tampering with the original data set without any logical, concrete reason. That might not sound as serious as failing to reach the final goal, but believe me, in data science, if your data is not representative of the true data set, your analysis is considered void.

A log file is a file that records the events that occur during a process. It lets you trace back through the process and find out whether anything went wrong.

Reasons to Keep a Log File

So how do you account for such cases? Maintain a log file! If you need more reasons for maintaining one, here are a few I can think of:

Large data sets follow Murphy’s Law: anything that can go wrong will go wrong. A log file is the best way to keep things in check.
When you run a common script on several files, a log file gives you an overview of the whole process.
A log file serves as a future reference, both for yourself and for others who will use the script or the data set again.
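
To make the second reason concrete, here is a minimal sketch of running one processing step over several files and logging what happened to each, so a file that breaks the script is recorded instead of silently dropping out. The names process_file and run_with_log, the cleaning step and the log layout are assumptions for illustration, not the script described above.

```r
# Hypothetical per-file step: read a CSV and drop incomplete rows.
process_file <- function(path) {
  df <- read.csv(path)
  df[complete.cases(df), , drop = FALSE]
}

# Run process_file() on every path and record success or failure in a log.
run_with_log <- function(paths, log_path = "process.log") {
  log_con <- file(log_path, open = "w")
  on.exit(close(log_con))  # release the connection even if something fails

  results <- list()
  for (path in paths) {
    # Catch the error so one bad file does not stop the whole run
    result <- tryCatch(process_file(path), error = function(e) e)
    if (inherits(result, "error")) {
      status <- paste("FAILED:", conditionMessage(result))
    } else {
      results[[path]] <- result
      status <- "OK"
    }
    # One tab-separated line per file: file name, then status
    writeLines(paste(basename(path), status, sep = "\t"), con = log_con)
  }
  results
}
```

After the run, every input file has exactly one line in the log, so a missing or failed file is easy to spot.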

What to Write in a Log File

So, okay! I know a log file is important, but what do I write in the log file? It depends on the use case. As someone who works with data daily, I usually maintain the following parameters in my log file:

Total number of files the script was run on
File names
Number of rows in each file (before and after processing)
Number of columns in each file (before and after processing)
Any specific parameters important to the particular data set
Processing time
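
As a rough sketch, the parameters above could be collected for each file along these lines. The function name log_record, the column names and the stand-in processing step are assumptions for illustration; adapt them to your own data set.

```r
# Build one log record for a file: dimensions before and after processing,
# plus the elapsed time. `process` is any function that takes a data frame
# and returns a data frame.
log_record <- function(file_name, df, process) {
  start <- Sys.time()
  processed <- process(df)
  elapsed <- as.numeric(difftime(Sys.time(), start, units = "secs"))

  data.frame(
    file        = file_name,
    rows_before = nrow(df),
    rows_after  = nrow(processed),
    cols_before = ncol(df),
    cols_after  = ncol(processed),
    seconds     = round(elapsed, 3)
  )
}

# Example with a stand-in processing step that drops incomplete rows.
raw <- data.frame(id = 1:4, value = c(10, NA, 30, 40))
record <- log_record("example.csv", raw,
                     function(d) d[complete.cases(d), , drop = FALSE])

# Written to a temporary directory here; pick your own location in practice.
write.csv(record, file.path(tempdir(), "process_log.csv"), row.names = FALSE)
```

With one record per file, you can rbind() the records together; the number of rows in the combined table is then the total number of files processed.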

How to Keep a Log File in R

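createLog <- function(df) {
  # Open process.log for writing and close it when the function exits
  log_con <- file("process.log", open = "w")
  on.exit(close(log_con))
  # Write the number of rows in the data frame
  cat(nrow(df), file = log_con, sep = "\n")
  # ...
}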

This is a very basic way to keep a log file. I prefer using this function in every script because it gives me the freedom to choose the contents of my log file.
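
If you would rather add to the log across several calls than rewrite it each time, one possible variation is a small helper that appends a timestamped line. The name log_line and the timestamp format are my own assumptions, not part of the function above. Passing a file path (rather than a connection) to cat() means cat() opens and closes the file itself.

```r
# Hypothetical helper: append one timestamped line to a log file.
# Everything passed in `...` is pasted together on a single line.
log_line <- function(..., log_path = "process.log") {
  stamp <- format(Sys.time(), "%Y-%m-%d %H:%M:%S")
  cat(stamp, " ", ..., "\n", file = log_path, sep = "", append = TRUE)
}

# Usage: each call adds one line to the same file.
demo_log <- file.path(tempdir(), "demo.log")
log_line("rows read: ", 120, log_path = demo_log)
log_line("rows kept: ", 115, log_path = demo_log)
```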

Happy coding!

Achyut Joshi originally published this article on his personal blog.

2 Comments

Sada 7 years ago
Would’ve been better with a short explanation of that code chunk.
- Joaquín Bruno Huete 7 years ago
  The code chunk is self-explanatory if you read it in R.
  It creates a function called createLog and a connection called log_con.
  It counts the number of rows of df: nrow(df)
  And it records that in the opened file with cat(), configured with certain parameters (for those particulars it is a good habit to visit R's ?help pages): cat(nrow(df), file = log_con, sep="\n")…
  It is true that the code itself leaves some questions unanswered, whether it is necessary to close the connection to the opened file being one of them. Also, createLog is perhaps quite a broad name for such a specific function.
