The first step in any data preparation process is acquiring the data that an analyst will use for their analysis. Analysts often rely on others (like IT) to obtain this data, typically from an enterprise software system or data management system. IT will usually deliver it in an accessible format like an Excel document or CSV file.
Modern analytic software can remove the dependency on a data-wrangling middleman by tapping right into trusted sources like SQL databases, Oracle, SPSS, AWS, Snowflake, Salesforce, and Marketo. This means analysts can acquire, on their own, the critical data for their regularly scheduled reports as well as for novel analytic projects.
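As a rough illustration of querying a source directly, the sketch below uses Python's built-in `sqlite3` module with an in-memory database standing in for an enterprise system. The `orders` table, its columns, its rows, and the `fetch_orders` helper are all hypothetical and invented for this example. A real source would need its own driver and credentials.

```python
import sqlite3

# In-memory SQLite stands in for an enterprise source.
# The table name, columns, and rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, region TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO orders (region, amount) VALUES (?, ?)",
    [("North", 120.0), ("South", 95.5), ("North", 130.25)],
)


def fetch_orders(connection, region):
    """Return (id, region, amount) rows for one region.

    Uses a parameterized query so the region value is never
    spliced into the SQL text.
    """
    cursor = connection.execute(
        "SELECT id, region, amount FROM orders WHERE region = ? ORDER BY id",
        (region,),
    )
    return cursor.fetchall()


north_orders = fetch_orders(conn, "North")
```

Here `north_orders` holds only the rows for the requested region, ready to hand to the next preparation step.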
Examining and profiling data helps analysts understand how their analysis will begin to take shape. Analysts can use visual analytics and summary statistics like range, mean, and standard deviation to get an initial picture of their data. If a dataset is too large to work with easily, segmenting it can help.
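The summary statistics mentioned above can be computed with Python's standard `statistics` module. This is a minimal sketch: the `profile` helper and the `order_amounts` values are invented for illustration, and the standard deviation shown is the sample (not population) version.

```python
import statistics


def profile(values):
    """Return basic summary statistics for a list of numbers."""
    return {
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "range": max(values) - min(values),
        "mean": statistics.mean(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
    }


# Hypothetical order amounts, invented for illustration.
order_amounts = [120.0, 95.5, 130.25, 88.0, 410.0, 101.75]
summary = profile(order_amounts)
```

A range that is large relative to the mean, as it is here because of the single 410.0 value, is the kind of early signal worth investigating before going further.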
During this phase, analysts should also evaluate the quality of their dataset. Is the data complete? Do the patterns match what was expected? If not, why? Analysts should discuss what they’re seeing with the owners of the data, dig into any surprises or anomalies, and consider whether it’s even possible to improve the quality. While it can feel disappointing to disqualify a dataset based on poor quality, it is a wise move in the long run. Poor quality is only amplified as one moves through the data analytics process.
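One simple way to put a number on "Is the data complete?" is to measure the share of non-missing values per field. The sketch below assumes records are plain dictionaries and treats `None` and the empty string as missing. The `completeness` helper and the sample records are hypothetical.

```python
def completeness(records, fields):
    """Return the share of non-missing values for each field (0.0 to 1.0).

    A value counts as missing if it is None or an empty string.
    """
    total = len(records)
    report = {}
    for field in fields:
        present = sum(
            1 for rec in records if rec.get(field) not in (None, "")
        )
        report[field] = present / total if total else 0.0
    return report


# Hypothetical records, invented for illustration.
records = [
    {"region": "North", "amount": 120.0},
    {"region": "", "amount": 95.5},
    {"region": "South", "amount": None},
    {"region": "South", "amount": 10.0},
]
quality = completeness(records, ["region", "amount"])
```

What counts as an acceptable share is a judgment call to make with the owners of the data, not something the code can decide.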
During the exploration phase, analysts may notice that their data is poorly structured and in need of tidying up to improve its quality. This is where data cleansing comes into play. Cleansing data includes:
- Correcting entry errors
- Removing duplicates or outliers
- Eliminating missing data
- Masking sensitive or confidential information like names or addresses
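The cleansing steps listed above can be sketched in plain Python. This is one possible arrangement under stated assumptions, not a definitive pipeline: the field names (`name`, `region`, `amount`), the `cleanse` helper, and the sample rows are hypothetical. Missing data is handled here by dropping the row, and names are masked with a truncated SHA-256 hash, which hides the value from casual view but is not strong anonymization on its own.

```python
import hashlib


def cleanse(records):
    """Apply four cleansing steps to a list of dict records."""
    seen = set()
    cleaned = []
    for rec in records:
        # Correct simple entry errors: stray whitespace, inconsistent case.
        name = (rec.get("name") or "").strip()
        region = (rec.get("region") or "").strip().title()
        amount = rec.get("amount")

        # Eliminate rows with missing data (dropping is one option of several).
        if not name or not region or amount is None:
            continue

        # Remove exact duplicates, compared after normalization.
        key = (name, region, amount)
        if key in seen:
            continue
        seen.add(key)

        # Mask the name with a truncated one-way hash (illustrative only).
        masked = hashlib.sha256(name.encode("utf-8")).hexdigest()[:12]
        cleaned.append(
            {"customer_id": masked, "region": region, "amount": amount}
        )
    return cleaned


# Hypothetical raw rows, invented for illustration.
raw = [
    {"name": "Ana Diaz", "region": " north ", "amount": 120.0},
    {"name": "Ana Diaz", "region": "North", "amount": 120.0},  # duplicate
    {"name": "Ben Ito", "region": "south", "amount": None},    # missing amount
    {"name": "Cy Lee", "region": "SOUTH", "amount": 95.5},
]
cleaned = cleanse(raw)
```

Normalizing before the duplicate check matters: the first two rows only compare as equal once the region's whitespace and case have been corrected.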
Data comes in many shapes, sizes, and structures. Some datasets are analysis-ready, while others may look like a foreign language.
Transforming data to ensure that it’s in a format or structure that can answer the questions being asked of it is an essential step in creating meaningful outcomes. The transformations required will vary based on the software or language that an analyst uses for their data analysis.
A couple of common examples of data transformations are:
- Pivoting or changing the orientation of data
- Converting date formats
- Aggregating sales and performance data across time
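The three transformations above can be combined in a short standard-library sketch: converting date strings to ISO 8601, aggregating amounts by month, and pivoting regions from rows into columns. The sample `sales` rows, the assumed `MM/DD/YYYY` input format, and the helper names `to_iso` and `pivot_monthly` are all hypothetical.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical long-format sales rows: (date as MM/DD/YYYY, region, amount).
sales = [
    ("01/15/2024", "North", 100.0),
    ("01/28/2024", "South", 50.0),
    ("02/03/2024", "North", 75.0),
    ("02/17/2024", "North", 25.0),
]


def to_iso(date_text):
    """Convert an MM/DD/YYYY string to ISO 8601 (YYYY-MM-DD)."""
    return datetime.strptime(date_text, "%m/%d/%Y").date().isoformat()


def pivot_monthly(rows):
    """Aggregate amounts by month, with one column per region."""
    table = defaultdict(lambda: defaultdict(float))
    for date_text, region, amount in rows:
        month = to_iso(date_text)[:7]  # keep the "YYYY-MM" prefix
        table[month][region] += amount
    # Convert nested defaultdicts to plain dicts for easier inspection.
    return {month: dict(cols) for month, cols in table.items()}


monthly = pivot_monthly(sales)
```

The result maps each month to per-region totals, so the two February rows for North collapse into a single value. A dataframe library would usually express the same steps more compactly. The plain-Python version is used here only to keep the example self-contained.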