So you’ve decided to transition (at least in part) your data analytics to the cloud. More specifically, you’re adopting Google Cloud Platform (GCP), one of the “big three” cloud providers. Now what? Here at Trifacta, we’ve worked with hundreds of customers in this position and have learned a thing or two about how to get up and running with self-service analytics on GCP as successfully as possible. The following five-part blog series is by no means a definitive list, but from our perspective, these tips should be top of mind for optimizing self-service analytics on GCP.
We’re well on our way into this five-part series. In our last post, we broke down why ETL and self-service just aren’t a good fit. In sum, ETL was built for those with strong technical skills—not business users. And if business users can’t access and prepare data themselves, it won’t feel much like self-service, no matter how many other kinds of self-service analytics technologies are at their disposal. As a self-service alternative for data preparation, the Google Cloud Platform offers Cloud Dataprep by Trifacta. Cloud Dataprep allows any data professional to access, cleanse and prepare data for analysis.
As the first step in self-service analytics, data preparation is an important one. But there’s more to effective and efficient data preparation on GCP than just using Cloud Dataprep. In fact, it would be a mistake to assume that the answer to self-service in any capacity is just adopting a single technology. True self-service requires processes and a deep partnership with the IT organization. Even as business users become more empowered than ever, they will continue to depend on IT organizations to monitor and maintain data operations.
In the last two posts of the series, we’ll talk more about the roles and responsibilities required to build effective processes around self-service data preparation. But for now, let’s dig into the careful balance that must exist between business users and the IT organization. As organizations increase self-service, they must also increase governance. That means keeping data from proliferating out of control, complying with regulatory requirements, and maintaining trust in the data used to drive business decisions.
Lesson 3: Self-Service Is Not a Free-For-All
In order to strike the right balance between protecting data assets (from governance and security perspectives) and enabling users to collaborate and derive value from data, we have three recommendations:
Centralize Data in the Cloud
When data storage and processing are centralized in the cloud with virtually unlimited scalability, and when end users are authorized to bring their own data into the cloud, you stop data silos from proliferating. Users collect data extracts, run their own preparation routines, and create their reports in and from the cloud instead of extracting data and duplicating it in spreadsheets.
Use a Data Catalog
Shared resources such as a central catalog or glossary manage data definitions, metadata, and knowledge about the data’s lineage. They help users find data faster and enable organizations to govern data sources and monitor their lifecycles. Machine learning (ML) and artificial intelligence (AI) solutions can automate the collection and management of metadata and related knowledge about the data.
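To make the idea concrete, here is a minimal Python sketch of what a shared catalog does at its core: it stores a definition, an owner, and tags for each dataset, and lets users search across them. This is an illustration only, not how any GCP or Trifacta product is implemented. The names `CatalogEntry`, `DataCatalog`, and the sample datasets are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    """One dataset's record in a hypothetical shared catalog."""
    name: str
    definition: str
    owner: str
    tags: set[str] = field(default_factory=set)


class DataCatalog:
    """In-memory stand-in for a central catalog (illustrative only)."""

    def __init__(self) -> None:
        self._entries: dict[str, CatalogEntry] = {}

    def register(self, entry: CatalogEntry) -> None:
        # Registering under an existing name replaces the earlier entry.
        self._entries[entry.name] = entry

    def search(self, term: str) -> list[str]:
        """Return sorted names of entries whose name, definition, or tags mention the term."""
        term = term.lower()
        return sorted(
            e.name
            for e in self._entries.values()
            if term in e.name.lower()
            or term in e.definition.lower()
            or any(term in t.lower() for t in e.tags)
        )


# Made-up sample entries.
catalog = DataCatalog()
catalog.register(CatalogEntry(
    "orders_raw", "Unfiltered order events from the web store",
    "it-data-ops", {"sales", "raw"}))
catalog.register(CatalogEntry(
    "customers_clean", "Deduplicated customer master list",
    "marketing-analytics", {"crm"}))

print(catalog.search("sales"))  # ['orders_raw']
```

Even in a toy like this, the governance benefit is visible: every dataset has a named owner and a shared definition, so a business user searching for "sales" data finds the sanctioned source instead of creating another copy.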
Track and Document Data Lineage
Data lineage (that is, how data has been used and transformed, and by whom) is important for regulatory reporting and audits. It’s also important to decision-makers, who need to understand the history of the data behind analytics, visualizations, and prescriptive recommendations, as well as how new requirements will affect the data pipeline that produces the analysis.
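As a rough illustration of what tracking lineage involves, the Python sketch below records each transformation as an event (what was produced, from which inputs, by which action, and by whom) and then walks those events backward to reconstruct a dataset’s history. It is a simplified sketch under our own assumptions, not a description of any product’s lineage feature. `LineageEvent`, `LineageLog`, and the sample dataset and user names are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class LineageEvent:
    output: str              # dataset that was produced
    inputs: tuple[str, ...]  # datasets it was derived from
    action: str              # what was done
    user: str                # who did it
    at: datetime             # when it was recorded (UTC)


class LineageLog:
    """Append-only log of transformations (illustrative only)."""

    def __init__(self) -> None:
        self._events: list[LineageEvent] = []

    def record(self, output: str, inputs: list[str], action: str, user: str) -> None:
        self._events.append(
            LineageEvent(output, tuple(inputs), action, user,
                         datetime.now(timezone.utc)))

    def history(self, dataset: str) -> list[LineageEvent]:
        """Return every event that contributed to `dataset`, upstream steps first."""
        seen: set[LineageEvent] = set()
        ordered: list[LineageEvent] = []

        def visit(name: str) -> None:
            for event in self._events:
                if event.output == name and event not in seen:
                    seen.add(event)          # guards against cycles
                    for parent in event.inputs:
                        visit(parent)        # follow inputs upstream
                    ordered.append(event)

        visit(dataset)
        return ordered


# Made-up sample pipeline.
log = LineageLog()
log.record("orders_clean", ["orders_raw"], "removed duplicate rows", "analyst_a")
log.record("revenue_report", ["orders_clean", "fx_rates"],
           "joined and aggregated by month", "analyst_b")

for event in log.history("revenue_report"):
    print(f"{event.output}: {event.action} (by {event.user})")
```

An auditor asking where the revenue report came from gets the full chain back to the raw orders, including who performed each step.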
The Next Step Toward Self-Service
As we discussed, self-service is more than just any one technology or a group of business users. Enabling self-service takes a lot of backend work from IT organizations, but that work proves well worth it. The key is to treat self-service analytics as a true team effort that requires involvement from all sides.
To get the full list of our tips for successful self-service analytics on GCP right now, you can download our eBook, “Self-Service Analytics on the Google Cloud Platform: Five Data Preparation Lessons Learned to Ensure Success.” And stay tuned as we unpack more of these learnings on the Trifacta blog!