How can you trust results if you don’t know the sources of the data that went into them? Trust in data quality depends on knowing where the data came from and how it was processed on its way to insight. By removing labor and repetition from analytics, automation makes trust in the data sources and the analytic process even more important.
As companies progress on their analytics journey, business users need to know that the data they’re working with is accurate and consistent with the data other users are working with. When users lack that trust, analytic processes suffer and silos arise: users begin to store their own data in departmental databases and do their work separately from the rest of the company.
Data lineage maps out where data came from and how it moved, offering a clear view of the entire analytics automation process. With data lineage, users can see the provenance of the data they depend on and how insights were derived from it.
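One way to picture "where data came from and how it moved" is as a directed graph in which each derived dataset points back to the inputs and the operation that produced it. The Python below is a minimal sketch under stated assumptions, not the design of any particular lineage tool: it assumes one recorded step per derived dataset and an acyclic graph, and the class and method names (`LineageGraph`, `record`, `sources_of`, `trace`) as well as the dataset names are hypothetical. Walking the graph upstream answers the provenance question (which raw sources fed this result?) and lists, in order, the steps that produced it.

```python
from dataclasses import dataclass, field


@dataclass
class Step:
    """One transformation: which inputs produced which output, and how."""
    output: str
    inputs: list[str]
    operation: str


@dataclass
class LineageGraph:
    # Hypothetical structure: maps each derived dataset name to the step that produced it.
    # Assumes the lineage graph is acyclic (no dataset is derived from itself).
    steps: dict[str, Step] = field(default_factory=dict)

    def record(self, output: str, inputs: list[str], operation: str) -> None:
        """Register that `output` was derived from `inputs` via `operation`."""
        self.steps[output] = Step(output, list(inputs), operation)

    def sources_of(self, dataset: str) -> set[str]:
        """Return the raw sources (datasets with no recorded producer) upstream of `dataset`."""
        if dataset not in self.steps:
            return {dataset}
        found: set[str] = set()
        for parent in self.steps[dataset].inputs:
            found |= self.sources_of(parent)
        return found

    def trace(self, dataset: str) -> list[Step]:
        """Return the steps leading to `dataset`, upstream steps first."""
        ordered: list[Step] = []
        seen: set[str] = set()

        def visit(name: str) -> None:
            # Skip datasets already visited and raw sources (which have no step).
            if name in seen or name not in self.steps:
                return
            seen.add(name)
            step = self.steps[name]
            for parent in step.inputs:
                visit(parent)
            ordered.append(step)

        visit(dataset)
        return ordered


# Example usage with hypothetical datasets.
lineage = LineageGraph()
lineage.record("orders_clean", ["orders_raw"], "drop duplicate order IDs")
lineage.record("revenue_by_region", ["orders_clean", "regions_csv"], "join on region_id, sum amount")

print(sorted(lineage.sources_of("revenue_by_region")))  # ['orders_raw', 'regions_csv']
for step in lineage.trace("revenue_by_region"):
    print(f"{step.output} <- {step.inputs}: {step.operation}")
```

Because `trace` returns upstream steps first, a user who finds a wrong number in `revenue_by_region` can check each step in the order it ran rather than guessing where the error entered.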
Users rely on lineage to understand the structure and fields of their data and to confirm that these match everyone else’s definitions. Organizations with established data lineage can depend on that consistency and move on to creating glossaries, data dictionaries, and metric definitions. When users are convinced that metadata (data about the data) is uniform across the organization, they don’t need to create their own silos or work separately.
Data lineage enables a full understanding of the process, step-by-step debugging when errors occur, and clear communication of processes to end users. It paves the way for analytic transformation and analytic process automation.