Data Quality Template:

Validate File Data with Schema Drifts

Figure: Schema Drift Detection Flow (the flow view of this template)

Transformations:
splitrows, header, $sourcerownumber, join

This template shows how to validate file data against an expected schema and how to detect when the data's schema has drifted from what was expected. It relies on Trifacta’s ability to import data as is, without applying the inferred row-splitting step, and compares the input's header row with the expected schema's header row through a join. The results are split into two outputs: if the input file matches the expected schema, the Output – Valid Header output contains the input data; otherwise, the input data appears in the Output – Invalid Header output.
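The header comparison at the core of this flow can be sketched in plain Python. This is a rough analogy under stated assumptions, not Trifacta's implementation: the helper names are hypothetical, both files are assumed to be comma-delimited text, and a header "matches" only when the two first lines are exactly identical. Because each side contributes exactly one header row (the row at source row number 1), an inner join on the header text reduces to a string-equality check.

```python
import csv
import io


def first_line(raw_text: str) -> str:
    """Header row of an unstructured import: the record at source row 1."""
    lines = raw_text.splitlines()
    return lines[0] if lines else ""


def validate_against_schema(expected_raw: str, input_raw: str):
    """Return (valid_rows, invalid_rows) for the input file.

    Hypothetical helper mirroring the template's routing: if the input's
    header joins to the expected header, every input row goes to the valid
    output; otherwise every input row goes to the invalid output.
    """
    expected_header = first_line(expected_raw)
    input_header = first_line(input_raw)

    # An inner join of two one-row tables keyed on the header text succeeds
    # exactly when the two header strings are equal.
    header_matches = input_header == expected_header

    # Structured view of the input: the first row becomes the column names.
    rows = list(csv.DictReader(io.StringIO(input_raw)))
    if header_matches:
        return rows, []
    return [], rows
```

For example, an input whose header is `id,fullname,amount,extra` checked against an expected header of `id,name,amount` yields no valid rows, and its data rows land in the invalid output keyed by the drifted column names.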

To customize this template for your use, create three distinct datasets from two files and use them to replace the existing datasets in this flow template:

1) A file that contains the expected schema, with the header metadata in the first row. This file can also contain some sample data. Import this file into Trifacta as an unstructured file (see below).

2) An input file to validate against the expected schema. This file should also have its header metadata in the first row. Import this file into Trifacta twice: once as an unstructured file and once as a structured file.

3) Replace InvalidHeader-Source-Unstructured.csv with the unstructured dataset from step 2), replace InvalidHeader-Source-Structured.csv with the structured dataset from step 2), and replace Expected-Target-Unstructured.csv with the dataset from step 1).

A note on importing a file as unstructured:

When you import a file into Trifacta, by default it tries to infer how to split the data into records by applying a splitrows transform. This step is normally hidden, and you cannot modify it. You can, however, disable it by unchecking the “Detect structure” option on the import dataset settings page.
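As a loose analogy (not Trifacta's actual import logic), the two import modes can be modeled in Python. `import_file` and `splitrows` here are hypothetical helpers; the structured path assumes a comma-delimited file whose first row is the header, and the row numbering assumes `$sourcerownumber` is 1-based.

```python
import csv
import io


def import_file(raw_text: str, detect_structure: bool = True) -> dict:
    """Hypothetical model of the two import modes.

    detect_structure=True: records are split automatically and parsed into
    columns, with the first row treated as the header.
    detect_structure=False: the file is kept as one raw value, so splitting
    it into records is left to an explicit recipe step.
    """
    if detect_structure:
        rows = list(csv.reader(io.StringIO(raw_text)))
        if not rows:
            return {"header": [], "rows": []}
        return {"header": rows[0], "rows": rows[1:]}
    return {"raw": raw_text}


def splitrows(raw_text: str) -> list:
    """Hypothetical stand-in for an explicit splitrows step on newlines.

    Each record keeps its 1-based source row number, so the header line can
    later be picked out as the record at row 1.
    """
    return [
        {"sourcerownumber": n, "line": line}
        for n, line in enumerate(raw_text.splitlines(), start=1)
    ]
```

With structure detection off, the header line survives as literal text at row 1, which is what makes a text-level comparison of headers possible.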

New user?

If your data is mostly on Google Cloud Platform, please use Dataprep. Otherwise, choose Designer Cloud.