Use dados facilmente com Alteryx Analytics Cloud e Presto

Technology   |   David Hare   |   Mar 6, 2024 TIME TO READ: 4 MINS
TIME TO READ: 4 MINS

PrestoDB, commonly referred to as simply Presto, is an open-source distributed SQL query engine. Presto is used by companies like Meta and Netflix to efficiently query massive data volumes from disparate data sources. Alteryx offers support for Presto with both Designer and Alteryx Analytics Cloud. In this blog, we’ll focus on the integration with Alteryx Analytics Cloud specifically.

Integration with Alteryx Analytics Cloud

Presto is commonly used in conjunction with a Hadoop cluster and could be deployed in the cloud alongside a technology like Amazon EMR, or on-premises. On-prem deployments are still the most common, and this will be the scenario we focus on in this blog, although the integration with AACP is the same regardless of where Presto is deployed.

An important concept with Presto is that of the “catalog,” which is essentially a data source that Presto is configured to query. As Presto can query a large number of data sources through a single environment, in many ways, it can be thought of as a data federation or data virtualization layer. As such, many organizations have chosen to build application integrations into Presto rather than building integrations with each data source they want to query.

This becomes especially interesting with Alteryx Analytics Cloud being a SaaS cloud-hosted platform, as organizations may not want to open up connectivity to all their data sources. However, with Presto acting as a data virtualization layer, Alteryx Analytics Cloud could be granted access to Presto, with all the data sources kept private behind additional networking firewall rules. In such a scenario, only the Presto environment would need to be whitelisted to allow the Alteryx Analytics Cloud IP ranges, and the actual data sources would only be accessible through Presto.

This image is an infographic explaining the connectivity of Alteryx Analytics Cloud (AAC) with on-premises data sources using Presto. On the top left, there's a symbol representing AAC Users connected to a cloud icon labeled "Alteryx Analytics Cloud," which includes elements such as "DESIGNER CLOUD," "MACHINE LEARNING," "AUTO INSIGHTS," and "LOCATION INTELLIGENCE." Below, two key points are listed: 1) Alteryx Analytics Cloud connects to on-prem data through Presto. 2) Presto is allowed connectivity to private data sources. In the center, a diagram shows the workflow with two steps. The first step, indicated by a dotted arrow and the number '1', connects "On Prem Data Sources" to "presto." The second step, indicated by the number '2', shows solid arrows pointing from "presto" to three database icons: "HIVE," "MySQL," and "PostgreSQL." Each database icon is represented by their respective logos.

Figure 1 – Example deployment with Presto acting as a data virtualization layer between on-prem data sources and the Alteryx Analytics Cloud Platform.

Creating a Connection and Loading Data

Alteryx Analytics Cloud enforces a centralized data governance model by providing a single place for defining and sharing data connections. On the Connections page, Admins or those with the Create Connections permission can define a new Connection to Presto.

This image shows a graphical user interface for creating a database connection to Presto. The window is titled "Create Connection" and has a dark blue banner at the top. Below the banner, there is a section labeled "Presto" with a checkbox for "Import only" and a link to "Learn More". In the "CONNECTION INFORMATION" section, there are fields for "Connection Name" with "Presto Server" entered as an example, and "Connection Description (optional)" which is left blank. In the "SERVER INFORMATION" section, there are fields for "Host" with "my-presto-server.corp.com" entered, and "Port" with "8080" entered. There is a checkbox for "Enable SSL" which is checked, and a field for "Connect String Options" with "Timeout=0;" entered. At the bottom of the window, there are buttons for "Back," "Cancel," and "Create," indicating the steps to navigate through the connection setup or to finalize it.

The Create Connections panel allows you to configure the connection details to the Presto environment, including any specific connection requirements. In most deployments, Presto will be configured with LDAP Authentication to validate the user, which is also used to determine what underlying data sources the user has access to.

After creating the Connection, the user can navigate to the Data page to browse Presto for data to work with in Alteryx Analytics Cloud. This is where the configured “catalogs” come into play and are presented to the user based on their auth assignments.

This image displays a user interface for selecting a database to import data from within a system named "Demo_Presto." There's a search bar at the top with a placeholder text "Search...". Below this, there is a list with the title "NAME" on the left side, indicating different database types that can be chosen. The databases listed are "hive," "mysql," "postgresql," "system," "tpcds," and "tpch," each with a checkbox to the left of the name. Next to "hive," "mysql," and "postgresql," there are blue arrows pointing to a callout box with text that reads, "Based on the Auth model and credentials supplied, the available databases supported through Presto are presented as sources to import data from." This suggests that the user can select these databases for importing data based on their authentication and credentials provided.

With Presto, the catalog is presented first, then upon selection, you can click into a Schema/Database and ultimately view the list of tables. From this view, a user can preview data and begin to work with the Alteryx Analytics Cloud Platform to solve a business problem.

 The image shows a navigation panel on the left with a list of tables such as "amcbank_stresstest", "bnassets", "canada_transactions", "ccs_bns_assets", and "cdh_customers", indicating different datasets or categories of data. On the right, there's a preview of a selected table with columns "adverse_scenario", "baseline_scenario", "business_unit", and "capital_alloc", filled with numerical data. Most numbers are negative, suggesting they might represent financial losses or similar metrics in different scenarios. The business units are labeled as "Bank East" and "Bank Midwest", indicating the data may pertain to financial performance of different branches or divisions of a bank. The user interface is clean, with a modern design, and there are icons next to the table names that suggest different functionalities, like viewing or editing the tables.

Final Thoughts

This blog has provided a brief overview of how Presto could be used as a data virtualization layer with the Alteryx Analytics Cloud Platform, providing data and analytic users access to data sources efficiently to a broad range of data sources without creating connections to each data source. You can learn more about Presto and its integration with Alteryx Analytics Cloud using the resources below:

prestodb.io
Alteryx Help – Presto Connections