ETL is a widespread practice because it allows users to combine disparate data and to store it in a new, centralized repository.
ETL can be done manually by handwriting code or more efficiently and securely using ETL software, like our very own Alteryx Designer. Regardless of what tools and techniques you are using, these are three all too common issues you should consider as you create your ETL pipeline.
Mistake #1: Ignoring the End User
The ETL process is often performed by someone who is not the end user of data. For example, someone in the customer insights group might be extracting and loading data for a project or dashboard for the sales team.
All too often, the person doing the ETL work ignores the needs of the end user, instead basing their decisions on what is easiest or best according to their own standards. Don’t fall into this trap! Spend lots of time up front understanding the needs of the business units and individuals who will be interacting with your final product — the transformed data you load to its target location.
Know who in the organization needs the data you are loading and what data warehouses and systems they have access to. Ask what they will be using the data to analyze or create, and how often they will need it to be updated. Ask for examples of the dashboards, analyses, or models they are looking to create. If they haven’t gotten that far, create mock-ups together that can guide your ETL process decisions.
Mistake #2: Too Much, Too Soon
Alright, you’ve done your due diligence in understanding your end users’ needs from the data. It’s tempting to ETL your way to a quickly prepared target table.
As the old saying goes though, haste makes waste. If your end users need you to combine data from several sources and formats, it’s smart to subdivide each step of the ETL process into small chunks, such as removing nulls and then doing a currency conversion in a separate step. This makes your work easier to check, audit, and adapt as requirements change (as they inevitably do.)
You’ll should also document what each transformation step is doing in plain English. For example, “Multiply EUR value by USD conversion rate based on date of purchase.”
Mistake #3: Failing to Consider the Future
As we mentioned above, the requirements for your ETL process and product are likely to shift, either in the short or long term. Like they say, the only constant is change. This change could come in the form of a request to extract and incorporate a new source of information or new hardware or software.
Businesses also commonly fail to anticipate increases in volume, the amount of data that will be passed through their ETL pipeline. Leave lots of bandwidth to scale up your infrastructure, including both processing power and the amount of storage in your target loading environment.
By following these practical guidelines, you’re well on your way to a first-rate ETL process designed with both your end user and the future in mind.