So you’ve decided to transition (at least in part) your data analytics to the cloud. More specifically, you’re adopting Google Cloud Platform (GCP), one of the “big three” cloud providers. Now what? Here at Trifacta, we’ve worked with hundreds of customers in this position and have learned a thing or two about how to get up and running with self-service analytics on GCP as successfully as possible. The following five-part blog series is by no means a definitive list, but from our perspective, these tips should be top of mind for optimizing self-service analytics on GCP.
In the first post of this blog series, we recapped the concept of self-service and its importance in an age of mounting data volumes and complexity. With all of the benefits of self-service, it’s no wonder that self-service analytics and BI users will produce more analysis than data scientists this year. We also reviewed why, when moving to a platform that enables self-service analytics, such as Google Cloud Platform (GCP), it’s important not to take a lift-and-shift approach: analytics solutions must be cloud-native. That being said, not all cloud analytics solutions are created equal. In this post, we’re going to talk specifically about ETL (Extract, Transform, Load) tools and why they shouldn’t be your first choice for a cloud solution.
Lesson 2: ETL and Self-Service Aren’t a Good Fit
First, a little background on ETL. Roughly 25 years ago, the ETL market was created to automate much of the tedious coding required to integrate, standardize, and cleanse data before it was entered into a data warehouse. The idea was to give developers a layer of abstraction that would mask complexity and improve productivity. ETL promised to minimize the data janitorial work and improve analytics delivery for the business.
Now, more than two decades later, the results are decidedly mixed. Although the productivity gains versus writing code by hand are undeniable, organizations increasingly look at ETL as the bottleneck in their analytics efforts, much the same way they looked at code 25 years ago. Like hand-written code, ETL must be managed by a small number of technical specialists, who are then expected to understand and respond to the needs of the entire organization as quickly as possible. That is hardly the self-service vision that organizations currently strive toward.
That being said, ETL tools still serve an important purpose. In transitioning to GCP, for example, they are extremely effective at building pipelines that move data stored on premises and in other clouds to GCP. What we advise against is continuing to rely on ETL after a data lake or data warehouse has been filled. Because ETL is technical and complex, it doesn’t enable business users to access data and prepare it in their own way for analytics. If self-service analytics is your aim, shoehorning an outdated data preparation solution onto a new cloud platform isn’t going to do much good. Instead, business users need a new solution that will allow them to more intuitively access and prepare the data they need.
As part of its smart analytics suite, Google Cloud Platform offers a data preparation solution: Cloud Dataprep by Trifacta. It assesses data quality; refines, standardizes, and cleanses data; combines datasets; and handles a variety of data calculations.
Cloud Dataprep was designed to help a whole host of data-driven professionals, including data engineers, data analysts, and business analysts, unlock a data lake or data warehouse. Users interact directly with the content of the data to iteratively refine it and bring it together to feed downstream business-driven analytics. It was built with self-service in mind. Powered by machine learning, the solution predicts what transformation the user should make next and visually surfaces errors and outliers. Cloud Dataprep by Trifacta builds an essential foundation of clean data for any type of analytics that follows.
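To make the preparation steps described above a little more concrete, here is a minimal Python sketch of what assessing quality, standardizing and cleansing, flagging outliers, combining data, and running a calculation can look like on a tiny dataset. This is not Cloud Dataprep's interface or API; Cloud Dataprep is a visual tool. The sample records, field names, function names, and the outlier rule are all hypothetical and chosen only for illustration.

```python
from statistics import median

# Hypothetical sample records, invented purely for illustration.
orders = [
    {"order_id": "1001", "region": " west ", "amount": "120.50"},
    {"order_id": "1002", "region": "WEST", "amount": ""},
    {"order_id": "1003", "region": "East", "amount": "87.25"},
    {"order_id": "1004", "region": "east", "amount": "9400.00"},
    {"order_id": "1005", "region": "East", "amount": "101.00"},
]
region_managers = {"west": "Avery", "east": "Jordan"}


def assess_quality(rows, field):
    """Assess quality: count rows where the given field is empty."""
    missing = sum(1 for row in rows if not row[field].strip())
    return {"rows": len(rows), "missing": missing}


def standardize(rows):
    """Standardize and cleanse: normalize region names, parse amounts,
    and drop rows that have no amount."""
    cleaned = []
    for row in rows:
        if not row["amount"].strip():
            continue
        cleaned.append({
            "order_id": row["order_id"],
            "region": row["region"].strip().lower(),
            "amount": float(row["amount"]),
        })
    return cleaned


def flag_outliers(rows, factor=10.0):
    """Flag orders whose amount exceeds `factor` times the median.
    This threshold is an assumed rule of thumb, not a standard."""
    mid = median(row["amount"] for row in rows)
    return [row["order_id"] for row in rows if row["amount"] > factor * mid]


def combine(rows, lookup):
    """Combine: attach the manager for each order's region."""
    return [{**row, "manager": lookup.get(row["region"])} for row in rows]


def total_by_region(rows):
    """Calculate: sum order amounts per region."""
    totals = {}
    for row in rows:
        totals[row["region"]] = totals.get(row["region"], 0.0) + row["amount"]
    return totals


if __name__ == "__main__":
    print(assess_quality(orders, "amount"))
    prepared = standardize(orders)
    print(flag_outliers(prepared))
    print(combine(prepared, region_managers))
    print(total_by_region(prepared))
```

Even for five rows, each step has to be written and maintained as code by someone technical, which is the bottleneck a self-service tool aims to remove by letting business users perform the same steps interactively.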
The Next Step Toward Self-Service
Given that data preparation is often the biggest bottleneck in the analytics process—and consequently the biggest barrier to self-service—understanding how to use Cloud Dataprep on the Google Cloud Platform is a huge step toward self-service success. In the following posts, we’ll focus more on self-service data preparation via Cloud Dataprep and how it is essential for successful self-service analytics.
In the meantime, you can get the full list of our tips for successful self-service analytics on GCP right now by downloading our eBook, “Self-Service Analytics on the Google Cloud Platform: Five Data Preparation Lessons Learned to Ensure Success.” And stay tuned as we unpack more of these learnings on the Trifacta blog.