This year, over 1,000 attendees, 40 speakers, 19 sponsors, and many more from the data science and tech community witnessed the magic of The Fifth Elephant.
Now, have you ever wondered how one of India’s most popular data science conferences got its name?
We have an answer! Turns out The Fifth Elephant was named after a fantasy novel of the same name by Terry Pratchett.
In the novel, the fifth elephant is an enigma: an unseen force that controls events. And the eighth edition of the conference was indeed bigger than it looked at Bengaluru's NIMHANS Convention Centre.
The current edition of the community-led event brought together the entire Indian data ecosystem under one roof! And Atlan was one of the proud sponsors of the conference this year. We couldn’t have been more excited to meet data engineers, scientists, analysts, and diverse people from the ecosystem.
Besides the fantastic workshops, talks and fun contests, connecting with the amazing Humans of Data made the event a definite hit for us. And of course, we had the swag!
From independent consultants and data enthusiasts to leaders from organizations like Hotstar, Go-Jek, Flipkart, Walmart, Anaconda, and more, everyone was at the event to share their experiences of working with data, best data practices, hacks and tips, and more.
In case you missed attending it this year, we’ve got you covered! Below are five of our favorite talks from The Fifth Elephant 2019.
Why is data privacy critical for robust data management?
Peter Wang, co-founder and CTO of Anaconda, started his talk by sharing how we can treat data differently, borrowing the concept from Maciej Ceglowski’s 2015 talk.
While we all talk about data being "the new oil", he insists on treating it not as a precious resource but as a waste product: radioactive, toxic waste. This reframing offers a new perspective, a new mindset for looking at data.
Simple practices, like picking only the columns or variables you need from your data, or compartmentalizing access, can minimize exposure of the valuable data we deal with every day. He also talks about the future of open source.
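That column-minimization idea can be sketched in a few lines of Python. The records, field names, and salt below are purely illustrative, not from the talk:

```python
import hashlib

# Hypothetical raw event records; field names are purely illustrative.
RAW_EVENTS = [
    {"user_email": "a@example.com", "age": 34, "city": "Bengaluru", "purchase": 499},
    {"user_email": "b@example.com", "age": 27, "city": "Mumbai", "purchase": 1299},
]

# Allowlist only the columns the analysis actually needs.
NEEDED_COLUMNS = {"city", "purchase"}

def minimize(record, needed=frozenset(NEEDED_COLUMNS)):
    """Keep only allowlisted fields; everything else never leaves the source."""
    return {k: v for k, v in record.items() if k in needed}

def pseudonymize(value, salt="demo-salt"):
    """Replace a direct identifier with a salted one-way hash,
    for the cases where a join key is unavoidable."""
    return hashlib.sha256((salt + value).encode()).hexdigest()[:12]

safe_events = [minimize(e) for e in RAW_EVENTS]
```

Dropping the identifier at collection time, rather than filtering later, is what turns "toxic waste" into something safe to pass around.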
7 steps to build-your-own data pipeline for day 1 of your startup
Kumar Puspesh is the co-founder and CTO of Moonfrog, an India-based mobile gaming company. Being in the consumer market meant one thing — Moonfrog relied heavily on data from Day 1.
Moonfrog's current ingestion scale stands at 20 billion unique events and 800+ GB of uncompressed data per day. Yet Moonfrog built its own data infrastructure from the very beginning, before any production usage.
Kumar shared how they set up a lightweight data collection pipeline feeding central queues, which ingested real-time data into Redshift (their warehouse of choice because of its ease of use).
Moonfrog needed technology capable of handling real-time data in a cost-effective manner. Having identified this requirement right in their early days, they built data retention and querying capabilities that precisely served this need.
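As a rough illustration of that shape of pipeline (not Moonfrog's actual implementation), here is a minimal Python sketch, with an in-process queue standing in for the central queues and a batch flush standing in for the Redshift load:

```python
import json
import queue

# The in-process queue stands in for a real message broker, and flush()
# stands in for a batched load (e.g. a COPY) into a warehouse such as Redshift.
events = queue.Queue()

def collect(event):
    """Lightweight client-side collection: serialize the event and enqueue it."""
    events.put(json.dumps(event))

def flush(batch_size=100):
    """Drain up to batch_size events into one batch; in a real pipeline this
    batch would be staged (e.g. on S3) and bulk-loaded into the warehouse."""
    batch = []
    while not events.empty() and len(batch) < batch_size:
        batch.append(events.get())
    return batch

collect({"event": "level_complete", "player": "p1"})
collect({"event": "purchase", "player": "p2", "amount": 99})
loaded = flush()
```

Batching writes rather than inserting row by row is what keeps a warehouse-backed pipeline like this cost-effective at billions of events per day.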
Watch this talk below and his slides here to learn more.
How to build blazingly fast distributed computing like Apache Spark in-house?
Upendra Singh is a Data Scientist at Clustr (a child organization of Tally) with over 12 years of experience, and his talk digs into the story of Clustr.
Because of Clustr’s customers (small business owners), affordability remains central to everything that they build.
Since Clustr grew out of a legacy organization like Tally, it inherited C/C++ code, and distributed computing frameworks offer little C/C++ runtime support. While there are ways to integrate existing C/C++ libraries into a distributed computational framework, the approach is limited and flawed. Existing technologies like Spark solved their processing requirements but came with cost limitations.
So what did the team at Clustr do? Watch this incredible talk from Upendra Singh below and see his slides here.
The final stage of grief (about bad data) is acceptance
How many of you are happy with your data set?
Does it answer every question you want it to answer?
These are the questions that Chris Stucchio, the Head of Data Science at Simpl, attempts to answer in his talk.
All of us have dealt with bad data! So what do you do when your data is bad?
You still work with it! Chris’ talk was just about this — drawing correct inferences from low-quality data.
Instead of treating your data as a reflection of the current reality, treat your data as the imperfect picture of what’s actually out there in the world.
Chris shares how accepting bad data helps us gain a detailed grasp of the data-generation process, enhance our predictive models, and find insights for improving and repairing the entire process.
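One concrete way to treat data as an imperfect picture rather than ground truth is to model the measurement error explicitly. Here is a minimal sketch using the standard Rogan-Gladen correction, with made-up numbers; it is an illustration of the mindset, not a method from the talk:

```python
def corrected_rate(observed_rate, sensitivity, specificity):
    """Back out the underlying true rate from an observed rate, given the
    known error profile of the measurement (Rogan-Gladen estimator)."""
    return (observed_rate + specificity - 1) / (sensitivity + specificity - 1)

# Illustrative numbers only: a data-quality flag fires on 12% of records,
# but the flag itself is imperfect (90% sensitivity, 95% specificity).
estimate = corrected_rate(0.12, sensitivity=0.90, specificity=0.95)
```

Once the measurement process is modeled, the "bad" observed rate still yields a usable estimate of what is actually out there in the world.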
If you’re dealing with bad data, watch his talk below. You can check out the slides here.
A journey through Cosmos to understand users
Avinash Ramakanth, Tech Lead at InMobi, talks about building a scalable user system that handles 50+ billion requests per day.
A typical InMobi DSP processes anywhere from 250,000 to 1,000,000 queries per second, with average response times under 50 milliseconds. This requires the supporting user store to be highly scalable, cost-effective, and reliable enough to support intelligent decision-making at low latency. The journey through Cosmos covers how InMobi built a cloud-native user feedback system for its DSP product.
You can watch the talk below and access the slides here.
The event was two days of sharing, learning, and experiencing the magic that the amazing humans of data create with their work. After introducing Atlan and learning about different data teams, we are now even more committed to working with them, and for them.
Want to catch up on all the other awesome talks from the conference? Check them out from HasGeek here.
And that’s a wrap
As the event came to a successful close, we brought back some great memories, feedback, and lessons for our team at Atlan. Not to mention the friends we made and the data puns we cracked!
In case you missed out on all the fun, do follow us on Twitter to stay updated on where our team is headed next.