Explore: Data exploration or discovery is a way to identify patterns, trends, and missing or incomplete information in a dataset. The bulk of exploration happens before creating reports, data visualizations, or training models, but it’s common to uncover surprises and insights in a dataset during analysis too.
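A first pass at exploration can be as simple as profiling each column. The sketch below assumes records arrive as a list of Python dictionaries, with `None` marking a missing value; the sample records and the `profile` helper are hypothetical. It counts missing values per column and reports min, max, and mean for numeric columns, or the number of distinct values otherwise.

```python
import statistics

# Hypothetical sample records; None marks a missing value.
records = [
    {"customer_id": 1, "age": 34, "city": "Austin"},
    {"customer_id": 2, "age": None, "city": "Boston"},
    {"customer_id": 3, "age": 29, "city": None},
    {"customer_id": 4, "age": 41, "city": "Austin"},
]

def profile(rows):
    """Return per-column missing-value counts plus a simple summary."""
    columns = sorted({key for row in rows for key in row})
    report = {}
    for col in columns:
        values = [row.get(col) for row in rows]
        present = [v for v in values if v is not None]
        summary = {"missing": len(values) - len(present)}
        numeric = [v for v in present
                   if isinstance(v, (int, float)) and not isinstance(v, bool)]
        if numeric and len(numeric) == len(present):
            # Every observed value is numeric: report its range and mean.
            summary.update(min=min(numeric), max=max(numeric),
                           mean=statistics.mean(numeric))
        else:
            # Otherwise treat the column as categorical.
            summary["distinct"] = len(set(present))
        report[col] = summary
    return report

for col, summary in profile(records).items():
    print(col, summary)
```

A profile like this quickly surfaces the missing `age` and `city` values that later steps need to handle.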
Transform: Transforming or structuring data is important; if it is not done early on, it can compromise the rest of the wrangling process. Data transformation involves putting the data into the shape and format that a report, data visualization, or analytic or modeling process needs. It may involve creating new variables (aka features) and applying mathematical functions to the data.
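As a small illustration of transformation, the sketch below casts string fields from a hypothetical CSV export into proper types and derives two new features, `order_month` and `unit_price`. The field names and the `transform` helper are assumptions made for the example.

```python
from datetime import date

# Hypothetical raw rows as they might arrive from a CSV export: every field is a string.
raw_orders = [
    {"order_id": "A1", "order_date": "2023-01-15", "total": "120.00", "quantity": "3"},
    {"order_id": "A2", "order_date": "2023-02-03", "total": "45.50", "quantity": "1"},
]

def transform(row):
    """Cast fields to proper types and derive new features."""
    order_date = date.fromisoformat(row["order_date"])
    total = float(row["total"])
    quantity = int(row["quantity"])
    return {
        "order_id": row["order_id"],
        "order_date": order_date,
        "total": total,
        "quantity": quantity,
        # Derived features (new variables) computed from the typed fields.
        "order_month": order_date.strftime("%Y-%m"),
        "unit_price": round(total / quantity, 2),
    }

orders = [transform(r) for r in raw_orders]
```

Casting types first matters because arithmetic on the raw strings would either fail or, for `+`, silently concatenate.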
Cleanse: Data often contains errors caused by manual entry, incomplete records, automated collection from sensors, or even malfunctioning equipment. Data cleansing corrects entry errors, removes duplicates and outliers (where appropriate), and handles missing data, either by removing incomplete records or by imputing missing values based on statistical or conditional modeling, to improve data quality.
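The cleansing steps can be sketched in plain Python. This example assumes hypothetical sensor readings and shows one possible ordering: normalize an inconsistently entered label, drop records missing a required identifier, remove exact duplicates, and impute missing temperatures with the median of the observed values. Median imputation is only one simple statistical choice, and outlier handling is left out because whether to remove outliers depends on the data.

```python
import statistics

# Hypothetical sensor readings with typical problems.
readings = [
    {"sensor": "s1", "unit": "C", "temp": 21.5},
    {"sensor": "s1", "unit": "C", "temp": 21.5},   # exact duplicate
    {"sensor": "s2", "unit": " c", "temp": None},  # entry error and missing value
    {"sensor": "s3", "unit": "C", "temp": 23.0},
    {"sensor": None, "unit": "C", "temp": 22.0},   # missing required identifier
]

def cleanse(rows):
    # 1. Correct entry errors: normalize the unit label.
    fixed = [{**r, "unit": r["unit"].strip().upper()} for r in rows]
    # 2. Drop records that lack a required identifier.
    fixed = [r for r in fixed if r["sensor"] is not None]
    # 3. Remove exact duplicates while preserving the original order.
    seen, unique = set(), []
    for r in fixed:
        key = tuple(sorted(r.items()))
        if key not in seen:
            seen.add(key)
            unique.append(r)
    # 4. Impute missing temperatures with the median of observed values.
    observed = [r["temp"] for r in unique if r["temp"] is not None]
    median = statistics.median(observed)
    return [{**r, "temp": median if r["temp"] is None else r["temp"]} for r in unique]

clean = cleanse(readings)
```

Because duplicates are removed before imputation, the repeated `s1` reading does not pull the median toward its value.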
Enrich: Enrichment or blending makes a dataset more useful by integrating additional sources such as authoritative third-party census, firmographic, or demographic data. The enrichment process may also help uncover additional insights from the data within an organization or spark new ideas for capturing and storing additional customer information in the future. This is an opportunity to think strategically about what additional data might contribute to a report, model, or business process.
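Enrichment often amounts to joining an external table onto internal records. The sketch below performs a left join in plain Python, assuming a hypothetical third-party lookup keyed by region code; the region codes, figures, and the `enrich` helper are all invented for illustration. Rows with no match keep their original fields and receive `None` for the added attributes.

```python
# Hypothetical internal customer records.
customers = [
    {"customer_id": 1, "region": "R-01"},
    {"customer_id": 2, "region": "R-02"},
    {"customer_id": 3, "region": "R-99"},  # no match in the external source
]

# Hypothetical third-party demographic data keyed by region code (illustrative values).
demographics = {
    "R-01": {"median_income": 85000, "population": 12000},
    "R-02": {"median_income": 110000, "population": 4000},
}

def enrich(rows, lookup, key, fields):
    """Left-join external attributes onto each row; unmatched rows get None."""
    enriched = []
    for row in rows:
        extra = lookup.get(row[key], {})
        enriched.append({**row, **{f: extra.get(f) for f in fields}})
    return enriched

result = enrich(customers, demographics, "region", ["median_income", "population"])
```

A left join keeps every internal record, so gaps in the external source show up as `None` values that can be explored or cleansed rather than as silently dropped rows.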
Store: The last part of the wrangling process is to store or preserve the final product, along with a record of all the steps and transformations that took place, so the result can be audited, understood, and repeated in the future.
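One way to preserve both the final product and its history is to write the data alongside a manifest that records the steps applied. The sketch below assumes JSON output, a hypothetical `store` helper, and a SHA-256 checksum so a later reader can verify that the stored file matches what the manifest describes. The file names and manifest layout are illustrative choices, not a standard format.

```python
import hashlib
import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path

def store(rows, steps, directory):
    """Write the final dataset plus a manifest recording how it was produced."""
    directory = Path(directory)
    data_path = directory / "final_dataset.json"
    payload = json.dumps(rows, sort_keys=True, indent=2).encode("utf-8")
    data_path.write_bytes(payload)  # write bytes so the checksum matches exactly
    manifest = {
        "created_utc": datetime.now(timezone.utc).isoformat(),
        "data_file": data_path.name,
        "sha256": hashlib.sha256(payload).hexdigest(),
        "row_count": len(rows),
        "steps": steps,  # ordered record of the wrangling steps applied
    }
    manifest_path = directory / "manifest.json"
    manifest_path.write_text(json.dumps(manifest, indent=2), encoding="utf-8")
    return data_path, manifest_path

# Hypothetical final rows and step log.
final_rows = [{"sensor": "s1", "temp": 21.5}, {"sensor": "s2", "temp": 22.25}]
steps = [
    {"step": "cleanse", "action": "normalize unit labels"},
    {"step": "cleanse", "action": "drop duplicates"},
    {"step": "cleanse", "action": "impute missing temp with median"},
]

out_dir = tempfile.mkdtemp()
data_path, manifest_path = store(final_rows, steps, out_dir)
```

In practice the step log would more likely be appended to as each transformation runs than written by hand at the end.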