There is an often-quoted statistic in the world of data: analysts and data scientists spend about 80% of their time collecting, cleaning, and organizing data, all tasks that fall under data preparation. If you’re not familiar with the work that goes into prepping data for analysis, or you’re curious how to reduce the amount of time you spend on it, check out this page for starters.
Today, we’re going to talk about the undeniable importance of data preparation, as well as some things to consider during the process. A general principle to keep in mind is that the amount of value you will get out of your analysis is a function of how much work you put into the data preparation process. In other words, “garbage in, garbage out” (or rubbish, depending on where you’re reading)! Incorrect or poor-quality input will always produce faulty output.
Discover Golden Nuggets
Modern analytics relies on data from multiple sources, and unfortunately not all of those sources structure data the same way. Thankfully, analytic tools like Alteryx accept data from nearly any source and can automatically restructure it into an analysis-ready format, so you can begin exploring your data uninhibited by poor structure.
Even simple descriptive statistics like measures of frequency, central tendency, and dispersion of data often uncover golden nuggets — new and unexpected ideas or understandings.
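To make that concrete, here is a minimal sketch in Python using only the standard library. The order amounts are made up for illustration, and the `describe` helper is a hypothetical name, not part of Alteryx or any other tool mentioned here.

```python
from collections import Counter
import statistics

# Hypothetical order amounts; the 9999.0 entry stands in for a data-entry error.
order_amounts = [20.0, 22.5, 19.0, 21.0, 20.0, 9999.0, 23.5, 20.0]


def describe(values):
    """Return basic frequency, central-tendency, and dispersion measures."""
    counts = Counter(values)
    return {
        "count": len(values),
        "most_common": counts.most_common(1)[0],  # (value, frequency)
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
        "min": min(values),
        "max": max(values),
    }


summary = describe(order_amounts)
for name, value in summary.items():
    print(f"{name}: {value}")
```

In this made-up sample the mean lands far above the median, which is the kind of quick signal that points you toward a suspicious value worth investigating before any deeper analysis.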
Pay It Forward: More Manageable Data for All
Once you’ve put the thought and effort into cleansing and prepping a data set, your hard work can be shared and reproduced for the benefit of others. Self-service platforms like Alteryx allow for automated and repeatable workflows that improve efficiencies and accountability, freeing up your time. Your team plus your future self will thank you!
Whether you’re working in Alteryx, a different platform, or even writing code, it’s best to record your steps. Document any errors, duplicates, or inconsistencies you find and report them to the owners of the data set, as they may want to make changes to the underlying database.
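If you are writing code, one way to record your steps is to have the cleaning routine return a log of everything it found alongside the cleaned data. The sketch below assumes a tiny, made-up set of customer records. The field names and the `prepare` function are hypothetical, and the cleaning rules (keep the first row per id, upper-case the state code) are only examples.

```python
from collections import Counter

# Hypothetical customer records; field names are illustrative only.
records = [
    {"id": 1, "email": "ana@example.com", "state": "CA"},
    {"id": 2, "email": "bo@example.com", "state": "ca"},
    {"id": 2, "email": "bo@example.com", "state": "ca"},
    {"id": 3, "email": "", "state": "NY"},
]


def prepare(rows):
    """Clean rows and return (cleaned_rows, log of issues found)."""
    log = []

    # Record duplicated ids before dropping anything.
    id_counts = Counter(row["id"] for row in rows)
    for dup_id in sorted(i for i, n in id_counts.items() if n > 1):
        log.append(f"duplicate id {dup_id} appears {id_counts[dup_id]} times")

    cleaned, seen = [], set()
    for row in rows:
        if row["id"] in seen:
            continue  # keep only the first occurrence of each id
        seen.add(row["id"])
        fixed = dict(row)  # copy so the original input is left untouched
        if fixed["state"] != fixed["state"].upper():
            log.append(f"id {row['id']}: state {row['state']!r} normalized to upper case")
            fixed["state"] = fixed["state"].upper()
        if not fixed["email"]:
            log.append(f"id {row['id']}: missing email")
        cleaned.append(fixed)
    return cleaned, log


cleaned, issues = prepare(records)
for line in issues:
    print(line)
```

The resulting list of issues is something you can hand straight to the owners of the data set, and because the input rows are copied rather than modified, the original data stays available for comparison.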
Trust Your Data
Statistician Hadley Wickham, famous for his contributions to the open-source statistical programming language R, wrote that “every messy dataset is messy in its own way.”
As Wickham’s comment indicates, there are an infinite number of ways for data entry, structure, and semantics to go awry. Just like the humans that create them, databases and data sets are unique and fallible. By putting in the legwork and due diligence to explore, cleanse, and transform your data set, you are ensuring that it is a reliable basis for your analysis.
Trust Your Decisions
This leads us to our final, and perhaps most valuable, argument for the importance of data preparation. Well-prepped data leads to dependable algorithms and analyses, which ultimately lead to credible, profitable business decisions.
While data and statistics are fun all on their own (at least we think so), their principal use in business is to inform and guide decisions.
According to McKinsey, organizations that use their customer behavior data for strategic analysis outperform their peers by 85% in sales growth.
The analyses and algorithms businesses hope to produce, and the outcomes they seek to achieve, rest upon high-quality input, all made possible by the humble process of data preparation.
Tune in to this on-demand webinar and see how you can go from spreadsheet master to analytic guru.