What if you had a genie for your data? We believe Transform by Example gets us closer to providing just that (no three-wish limit applies). As part of our ongoing effort to make data cleaning both powerful and intuitive, we’re introducing Transform by Example into our free Designer Cloud trial this week. Transform by Example is a new paradigm in data interaction: rather than creating data transformation steps directly, you provide examples of how you’d like the end state of your data to look, and Designer Cloud figures out the steps needed to get there.
One of the most common tasks analysts need to perform is pattern reformatting – converting multiple formats of data into a single format, by manipulating delimiters, tokens, and word lengths, while preserving semantic content. For example, suppose you have a column of phone numbers that you’d like to reformat into the common +1 ### ### #### US format.
Visualizing phone number data
Writing out data transformations to solve this task can be a time-consuming and error-prone process, especially because the data may contain many different phone number formats, as the Patterns interface shows above. Moreover, Designer Cloud’s intelligent suggestions may not always apply to the data types you’re trying to manipulate, especially when you’d like to create a new format not already present in your data (in this example, adding the country code +1).
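To make that cost concrete, here is a minimal sketch of what a hand-written version of this reformatting might look like in plain Python. It is not how Designer Cloud works internally; the function name `normalize_us_phone` and the sample inputs are hypothetical, and the rule of dropping a leading 1 from 11-digit values is an assumption about the data.

```python
import re

def normalize_us_phone(raw):
    """Return raw in '+1 ### ### ####' form, or None if it is not a 10-digit US number."""
    digits = re.sub(r"\D", "", raw)  # drop every non-digit character
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]  # assumption: a leading 1 is an existing country code
    if len(digits) != 10:
        return None  # leave unrecognized values for manual review
    return f"+1 {digits[:3]} {digits[3:6]} {digits[6:]}"

# Hypothetical inputs covering a few common formats.
for sample in ["(415) 555-0123", "415.555.0123", "1-415-555-0123", "555-0123"]:
    print(sample, "->", normalize_us_phone(sample))
```

Even this short function encodes several decisions (which characters to discard, how to treat a leading 1, what to do with short values) that you would otherwise have to discover by inspecting the data.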
On the other hand, you know exactly what you’d like your data to look like. For example, given the first record of the input column, “236.926.9604”, you know that you want it to look like “+1 236 926 9604”. Wouldn’t it be nice if you could simply provide this knowledge of the end result to Designer Cloud, and have it figure out the rest?
This is exactly the objective of Transform by Example. Rather than authoring transforms, you type out one or more examples of what you’d like your output records to look like, and Designer Cloud creates the transform to get you there.
Typing out an example
After you enter the example on the first row, Designer Cloud infers the kind of transformation you’re trying to perform. It applies this transformation to your input column and shows a preview of what your data will look like once committed. If you’re not satisfied with what Designer Cloud predicts, you can add more examples for different input records until you’re happy with the results. Finally, you can add the transformation as a step to your recipe, which can later be executed at scale on your full dataset.
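The general idea of inferring a transform from a single example can be illustrated with a toy sketch. This is a deliberately simplified stand-in, not Designer Cloud’s actual algorithm: it assumes the output is built only from alphanumeric tokens of the input plus literal characters, and the helper names `tokenize`, `infer_template`, and `apply_template` are hypothetical.

```python
import re

def tokenize(text):
    """Split text into runs of letters and digits, ignoring delimiters."""
    return re.findall(r"[A-Za-z0-9]+", text)

def infer_template(example_in, example_out):
    """Describe example_out as a list of input-token indexes (int) and literal characters (str)."""
    tokens = tokenize(example_in)
    template, pos = [], 0
    while pos < len(example_out):
        # Prefer the longest input token that appears at this position.
        best = None
        for idx, tok in enumerate(tokens):
            if example_out.startswith(tok, pos):
                if best is None or len(tok) > len(tokens[best]):
                    best = idx
        if best is not None:
            template.append(best)
            pos += len(tokens[best])
        else:
            template.append(example_out[pos])  # character not taken from the input
            pos += 1
    return template

def apply_template(template, text):
    """Rebuild an output for a new input; return None if the input has too few tokens."""
    tokens = tokenize(text)
    parts = []
    for piece in template:
        if isinstance(piece, int):
            if piece >= len(tokens):
                return None
            parts.append(tokens[piece])
        else:
            parts.append(piece)
    return "".join(parts)

template = infer_template("236.926.9604", "+1 236 926 9604")
print(apply_template(template, "(415) 555-0123"))
```

A template learned from one example generalizes only as far as that example does. An input that already carries a country code, such as "1-415-555-0123", would be tokenized differently and come out wrong, which is the kind of case where supplying a further example helps a real system narrow down the intended transform.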
Formatting heterogeneous dates by example
Let’s take a look at another example. The start_date column above is not in the format we need for our downstream analysis. There’s also a data quality issue: the column mixes several date formats. We can tackle both of these issues easily using Transform by Example.
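For comparison, a hand-written fix for mixed date formats might look like the sketch below. The candidate formats and the ISO 8601 target are assumptions chosen for illustration, since the actual formats in the start_date column are only visible in the screenshot, and `normalize_date` is a hypothetical helper name.

```python
from datetime import datetime

# Assumed input formats; a real column may need a different list.
CANDIDATE_FORMATS = ["%Y-%m-%d", "%m/%d/%Y", "%d %B %Y", "%b %d, %Y"]

def normalize_date(raw, target="%Y-%m-%d"):
    """Try each candidate format in order and return the date in the target format, or None."""
    for fmt in CANDIDATE_FORMATS:
        try:
            return datetime.strptime(raw.strip(), fmt).strftime(target)
        except ValueError:
            continue  # this format did not match, try the next one
    return None

for sample in ["2021-03-14", "03/14/2021", "14 March 2021", "Mar 14, 2021"]:
    print(sample, "->", normalize_date(sample))
```

The order of the list matters for ambiguous values such as 03/04/2021, and every new format found in the data means another entry to add by hand. Providing a few example outputs sidesteps that bookkeeping.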
Under the hood, Designer Cloud’s algorithm uses state-of-the-art research in string processing, machine learning, and graph theory to predict the transform you’re trying to apply. We will continue to refine and extend this algorithm to handle new types of data and all types of transformations. Like the rest of Designer Cloud, Transform by Example uses machine learning to become smarter over time as users interact with it.
We’re very excited about this feature, as it reflects our core philosophy of empowering the user to provide their expertise and knowledge while minimizing repetitive effort. Transform by Example is available now in Designer Cloud.