Data Quality Template:

Transform Data in Tables to Remove Duplicates

Remove rows where duplicate values exist in specific columns

[Figure: Removing Duplicates Flow, the flow view of this template]

Functions used: aggregate functions (count), rownumber

Trifacta has a deduplicate transformation that removes rows whose values are identical across all columns. But what if you want to remove rows where the data is duplicated in only some of the columns? This template shows how to find and remove rows that share values in a chosen subset of columns, even when the remaining columns differ. To customize the template for your own data, update the group-by parameter of the aggregate step to include every column you want to check for duplicates.
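The Python sketch below illustrates the same idea outside Trifacta. It is not the template's recipe. The sample rows, the column names, `KEY_COLUMNS`, and the helpers `count_duplicates` and `remove_duplicates` are all hypothetical. Counting rows per group of key columns mirrors an aggregate (count) step. Numbering rows within each group and keeping only row 1 mirrors a rownumber step. The sketch assumes that you want to keep the first row of each group.

```python
from collections import Counter

# Hypothetical sample rows; in practice these come from your own dataset.
rows = [
    {"id": 1, "email": "a@example.com", "country": "US", "signup": "2021-01-05"},
    {"id": 2, "email": "a@example.com", "country": "US", "signup": "2021-02-11"},
    {"id": 3, "email": "b@example.com", "country": "CA", "signup": "2021-03-02"},
    {"id": 4, "email": "b@example.com", "country": "US", "signup": "2021-03-09"},
]

# Hypothetical columns that define a duplicate (the group-by columns).
KEY_COLUMNS = ("email", "country")


def key_of(row, key_columns):
    """Build a group key from only the columns being checked."""
    return tuple(row[c] for c in key_columns)


def count_duplicates(rows, key_columns):
    """Aggregate step: count rows per key and return only keys seen more than once."""
    counts = Counter(key_of(r, key_columns) for r in rows)
    return {k: n for k, n in counts.items() if n > 1}


def remove_duplicates(rows, key_columns, order_by=None):
    """Row-number step: number rows within each key group (1, 2, ...),
    in order_by order if given (else input order), and keep only row 1.
    The result is returned in that same order."""
    ordered = sorted(rows, key=order_by) if order_by else list(rows)
    row_number = Counter()
    kept = []
    for row in ordered:
        k = key_of(row, key_columns)
        row_number[k] += 1  # this row's number within its group
        if row_number[k] == 1:
            kept.append(row)
    return kept


print(count_duplicates(rows, KEY_COLUMNS))
# {('a@example.com', 'US'): 2}
print([r["id"] for r in remove_duplicates(rows, KEY_COLUMNS, order_by=lambda r: r["signup"])])
# [1, 3, 4]
print(len(remove_duplicates(rows, tuple(rows[0].keys()))))
# 4: keying on every column removes nothing, because no two rows match in full
```

The last call shows why a full-row deduplicate is not enough here: rows 1 and 2 differ in `id` and `signup`, so only a key built from a subset of columns treats them as duplicates.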

New user?

If your data is mostly on Google Cloud Platform, please use Dataprep. Otherwise, choose Designer Cloud.