Greetings! We’re back with a new release from Alteryx (formerly Trifacta). Here are the highlights of our latest 8.8 release.
Pushdown Optimization Now Supports Additional Transformations
With the 8.8 release, we now support more than 80 data types, functions, and transformations for Pushdown Optimization. This expands the set of functions supported since Pushdown Optimization first launched in our earlier 8.3 release, and it is part of our continued effort to enable the modern data stack and integrate with cloud data warehouses. One of the key capabilities from that launch was BigQuery Pushdown with Dataprep by Trifacta on Google Cloud, which addresses use cases where the source and the destination are both within Google BigQuery. Dataprep converts the data transformation steps in your recipes to SQL statements, and because execution happens directly within BigQuery, no data moves outside the data warehouse. This innovation has become one of the most widely adopted features among our customers, with job acceleration gains of more than 20x.
We now support additional popular data types, such as arrays and objects; aggregation functions, such as List and Unique; and additional date functions, string functions, nested functions, and transformations. With this expanded support, you can use BigQuery Pushdown with Dataprep for virtually all of your data transformation requirements.
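To illustrate the idea, a recipe step that collects the unique values of a column per group can be expressed as a single aggregation in BigQuery SQL. This is only a sketch: the table and column names below are hypothetical, and the exact SQL that Dataprep generates for a real recipe may differ.

```sql
-- Hypothetical example of a pushed-down aggregation.
-- A "Unique" aggregation per group maps naturally to ARRAY_AGG(DISTINCT ...),
-- producing an array-typed result column.
SELECT
  region,
  ARRAY_AGG(DISTINCT product) AS unique_products,  -- array of distinct values
  COUNT(*) AS order_count
FROM `my_project.my_dataset.orders`
GROUP BY region;
```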
Full Execution from Files to BigQuery
Our innovation with BigQuery Pushdown continues with support for additional use cases. Starting with the 8.8 release, we address scenarios where the data source lives outside the cloud data warehouse, such as a file in Google Cloud Storage (GCS). This scenario is akin to an ETL pipeline, where the source is a file and the destination is a cloud data warehouse such as BigQuery.
Here, BigQuery references the source file as an external data structure that acts as a table. To achieve pushdown from files to BigQuery manually, you would need to write a SQL statement that defines the external structure and then create and load the table with a SELECT statement that applies the transformations, which can be complex. An example SQL statement is below.
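As a sketch of the kind of statement involved (the project, dataset, bucket, and column names here are hypothetical, and the SQL that Dataprep generates for a real recipe will differ):

```sql
-- Hypothetical example: expose a CSV file in GCS as an external table,
-- then create a native BigQuery table from it while applying transformations.
CREATE OR REPLACE EXTERNAL TABLE `my_project.my_dataset.sales_raw` (
  order_id   STRING,
  region     STRING,
  order_date STRING
)
OPTIONS (
  format = 'CSV',
  uris = ['gs://my-bucket/sales/*.csv'],
  skip_leading_rows = 1
);

CREATE OR REPLACE TABLE `my_project.my_dataset.sales_clean` AS
SELECT
  CAST(order_id AS INT64) AS order_id,              -- type conversion
  UPPER(TRIM(region)) AS region,                    -- string cleanup
  PARSE_DATE('%Y-%m-%d', order_date) AS order_date  -- date parsing
FROM `my_project.my_dataset.sales_raw`
WHERE order_id IS NOT NULL;                         -- filter bad rows
```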
With Dataprep, there is no need to write these statements by hand. These complex SQL statements are generated for you from the recipe within Dataprep using a no-code approach, reducing time and effort. With this new capability, we support a number of file formats, including CSV/TSV, JSON, plain text, logs, and compressed files. If you need to combine BigQuery tables and files in the same data pipeline (using join, union, or lookup functions), Dataprep lets you do this with a few clicks for maximum productivity. From initial benchmarks and early customer feedback, we have observed job acceleration gains of up to 23x, especially for the smaller files that are commonly stored in GCS.
Learn all about full job execution on BigQuery here.
Better Visibility into Pre- and Post-Run SQL Scripts
As part of our 8.5 release earlier this year, we launched the ability to run SQL scripts before data is ingested and after it is published to a database table. These scripts are defined on the Run Settings page inside Trifacta, enabling easier data management when you are working with cloud data warehouses. With this latest 8.8 release, we provide better visibility into these SQL scripts: you can now access them from the popular flow interface within Trifacta, making it easier to view them and take the required action.
Additionally, we now have two separate tabs for manual and scheduled settings, each displaying its respective SQL scripts and publishing actions. While the manual settings tab can be used for testing or intermediate previews, the scheduled settings tab can be used for orchestrating and operationalizing flows on a regular basis. Clicking either tab takes you to the Run Settings page, where you can view, edit, and author new scripts as required.
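For instance, a common pattern is a pre-run script that clears a staging table before publishing, paired with a post-run script that records the load. The table and column names below are hypothetical; your own scripts will depend on your warehouse schema.

```sql
-- Hypothetical pre-run script: clear the staging table before publishing.
TRUNCATE TABLE `my_project.my_dataset.staging_orders`;

-- Hypothetical post-run script: record that the load completed.
INSERT INTO `my_project.my_dataset.load_audit` (table_name, loaded_at)
VALUES ('staging_orders', CURRENT_TIMESTAMP());
```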
You can learn more about defining pre- and post-run SQL scripts from this article.
Runtime Processing on Customers’ VPC with Trifacta’s In-Memory Engine
As part of our efforts to continually meet our customers’ security and compliance requirements, we now support runtime processing using Dataprep’s in-memory engine within the customer’s VPC for Dataprep on Google Cloud. This ensures that data does not move out of your private network during runtime, and it enables you to run bigger workloads by deploying on nodes with more computing power and memory within your own VPC.
Currently in private preview, this capability requires a Google Kubernetes Engine (GKE) cluster within the customer’s account. Please contact your Trifacta sales representative if you would like to sign up for the private preview.
User Experience Enhancements
At Alteryx, we pride ourselves on a superior user interface to provide our customers with the best experience. As we add new capabilities to the product, we constantly look for ways to improve our user interface based on customer feedback and internal test efforts. As part of the 8.8 release, we have a number of enhancements to our user interface for better data engineering.
Ability to import multiple flows and plans
You can now import multiple flows simultaneously, making it easier to scale, especially when restoring backups or reverting to previous versions. You can select multiple flow archive files and import them, or drag and drop them into Trifacta. You can learn all about importing and exporting flows here.
Similarly, you can upload or import multiple plan files into Dataprep, making it easier to work with many plans at the same time. Learn all about sharing, importing, and exporting plans from this article.
Add datasets directly from the flow canvas
It is now easier to add datasets within the flow view. With a simple right-click, you can add datasets from the flow canvas in addition to the flow header menu. This small change is a big improvement to the user experience, making it seamless to add datasets directly from the canvas.
Easier identification of default settings in the workspace
Default values are now easier to identify in the workspace, especially for Boolean and enum settings. You can then either keep the defaults or change them as required.
Well, that was quite an exciting release for us with BigQuery Pushdown enhancements, in-memory engine processing within the customer’s VPC, better visibility into SQL scripts, and multiple enhancements to improve the user experience.