MLOps: Continuous Delivery for Machine Learning with Alteryx on AWS
Time-consuming. Disruptive. Error-prone. You can use all of these words to describe the AI project delivery process.
The process is so challenging that only 38% of AI projects ever make it into production.
And even when they do, the delivery is often manual: the projects take longer to get up and running, and they are neither scalable nor easy to update.
Most of the problems you'll encounter with AI projects are caused by manual processes, a lack of cohesion between your data and people, and the technology you use.
With automation and continuous delivery for machine learning (CD4ML), you can bypass the time-consuming steps and bring reliable machine learning models to production faster.
Continuous delivery for machine learning (CD4ML)
CD4ML brings machine learning applications to production across multiple departments by using automation to develop data, code, and models in small, easily reproducible increments. The end goal is to create models that can grow and adapt through small changes driven by new data and training.
Because of this, it's not only easier to produce machine learning models, it's also safer. The CD4ML process reduces the likelihood of errors that come with the large, one-time releases produced by a standard AI-project process.
Of course, all of that sounds easy in theory. But the good news is deploying a CD4ML model is also easy in practice.
With that said, here's how you can deliver machine learning models with an MLOps solution through Alteryx and AWS.
Deploying A Comprehensive CD4ML Solution with Alteryx and AWS
To deploy a comprehensive CD4ML solution, you'll need to do a few things first, such as:
- Automating time-consuming data access and analysis processes
- Knocking down unneeded data silos
- Creating consistent processes for your organization
- Implementing scalable solutions that anyone can learn
The Alteryx platform addresses these needs with four products:
- Alteryx Connect – a collaborative data cataloging tool
- Alteryx Designer – desktop and cloud software that enables the assembly of code-free analytic workflows and apps
- Alteryx Server – an analytical hub that allows users to scale their analytic capabilities in the cloud or on premises on enterprise hardware
- Alteryx Promote – a deployable, containerized solution that enables the easy deployment of machine learning models as highly available REST APIs
Here’s how each helps with CD4ML.
Data governance and curation using Alteryx Connect
Alteryx Connect can be used to catalog data from disparate sources including the datasets Alteryx offers as add-ons.
How to catalog data sources with Alteryx Connect
Connect also makes it easy for you and your team to discover and understand relevant data assets.
Once a data source is represented in Connect, your organization can collaborate using social validation tools like voting, commenting, and sharing to highlight the usefulness and freshness of your available data.
Once Connect is installed, which you can do in a Windows Server environment running in Amazon EC2, you can use one or more of the 25+ existing database metadata loaders to add data sources. This includes loaders for Amazon Redshift and Amazon S3, and loaders for Postgres and MySQL that can load metadata from Amazon Aurora.
If a data source is missing a metadata loader, Alteryx offers intuitive SDKs that make writing new loaders easy for developers in multiple languages and via REST APIs. Connect offers a cross-platform experience, so anyone using desktop Designer and Server can explore and utilize data assets based upon shared metadata.
Data asset lineage in Alteryx Connect
You can also augment user data with datasets from industry data providers. Alteryx Datasets can provide valuable location and business insights when combined with proprietary data. In modeling, this data is typically paired with proprietary data to add demographic and geographic features to models.
Machine learning experimentation with Alteryx Designer
You can use Alteryx Designer to import data for use in any of its predictive modeling and machine learning experimentation tool suites. Each tool suite caters to a different level of machine learning experience within your organization, and can even help users learn. Test it out for yourself with our Alteryx Intelligence Suite Free Trial.
Alteryx Designer offers several options for modeling and experimentation based on a user’s level of experience
Once your team implements a data architecture and identifies the appropriate data asset, you can begin analytics. Designer is both a code-free and a code-friendly development environment, so analysts of all skill levels can create automated analytics workflows, including those that require machine learning.
You can use Designer on a local Windows machine and in the cloud.
Alteryx is agnostic to where and how data is stored and provides connectors to over 80 different data sources. This includes an AWS Starter Kit that contains connectors for Amazon Athena, Amazon Aurora, Amazon S3, and Amazon Redshift.
Because Alteryx provides common ground for processing data from multiple sources, it is often a best practice for high-performance workloads to co-locate the data using preprocessing workflows. For example, to reduce future processing latency, you could move on-premises data to an AWS source. This can all be done with drag-and-drop, code-free data connector building blocks, so you don't need to know the CLI or SQL intricacies of the underlying infrastructure, although working at that level is possible as well.
Designer includes over 260 automation building blocks that enable the code-free processing of data. This includes building blocks for data prep, cleansing, blending, mapping, visualization, and modeling. Data cleansing, blending, and prep building blocks are often used before machine learning experimentation to prepare training, test, and validation datasets.
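The split into training, validation, and test datasets can be illustrated outside of Designer. The following is a minimal Python sketch, not Alteryx code; the function name `split_dataset` and the 70/15/15 proportions are assumptions chosen for illustration.

```python
import random


def split_dataset(rows, train_frac=0.7, valid_frac=0.15, seed=42):
    """Shuffle rows and split them into train, validation, and test lists."""
    if not 0 < train_frac + valid_frac < 1:
        raise ValueError("train_frac + valid_frac must be between 0 and 1")
    shuffled = list(rows)
    # A fixed seed keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train_frac)
    n_valid = round(len(shuffled) * valid_frac)
    train = shuffled[:n_train]
    valid = shuffled[n_train:n_train + n_valid]
    test = shuffled[n_train + n_valid:]  # whatever remains is held out
    return train, valid, test


train, valid, test = split_dataset(range(100))
print(len(train), len(valid), len(test))
```

Keeping the seed fixed matters in a CD4ML setting: a retrained model can only be compared fairly with its predecessor if both were evaluated on the same held-out rows.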
Build complex analytic workflows in Alteryx Designer
Much of the data preprocessing that occurs before modeling can also be accomplished using Alteryx’s In-Database functionality. This functionality pushes data processing tasks down to the database and delays the data import until that processing has completed and a local, in-memory action is executed.
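To make the pushdown idea concrete, here is a minimal sketch that uses Python's built-in `sqlite3` module as a stand-in database. It is not how Alteryx implements In-Database processing; the `sales` table and its values are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("east", 10.0), ("east", 5.0), ("west", 7.5), ("west", 2.5), ("west", 1.0)],
)

# Without pushdown: every row is imported into local memory, then aggregated.
all_rows = conn.execute("SELECT region, amount FROM sales").fetchall()
local_totals = {}
for region, amount in all_rows:
    local_totals[region] = local_totals.get(region, 0.0) + amount

# With pushdown: the database aggregates and only the summary is imported.
pushed_rows = conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall()
pushed_totals = dict(pushed_rows)

print(len(all_rows), "rows imported without pushdown")
print(len(pushed_rows), "rows imported with pushdown")
```

Both approaches produce the same totals, but the second moves far fewer rows across the network, which is where the benefit comes from on large tables.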
Alteryx Designer provides users with a couple of choices for machine learning.
Alteryx Predictive Suite
The Alteryx Predictive Suite offers code-free functionality for many descriptive, predictive, and prescriptive analytics tasks. You can also customize the underlying R code that powers these building blocks to address your specific use cases.
Alteryx Intelligence Suite
The Alteryx Intelligence Suite offers code-free functionality for building machine learning pipelines and additional functionality for text analytics.
The Intelligence Suite also offers Assisted Modeling, an automated modeling product designed to help business analysts learn machine learning while building validated models that solve their specific business problems.
Assisted Modeling is built on open-source libraries and provides you with the option to export your drag-and-drop or wizard-created models as Python scripts.
With these two options, you can use code-friendly building blocks that support R and Python to write machine learning code that is embedded in an otherwise code-free workflow. You can use these building blocks to work with your preferred frameworks and libraries, and the built-in Jupyter notebook integration enables interactive data experimentation.
Compare trained models in the Assisted Modeling leaderboard
Productionized ML pipelines with Alteryx Server
You can leverage Alteryx Server to operationalize workflows, including those that are used for data governance. Server offers a componentized installation experience that works natively in AWS.
Alteryx Server can be installed easily in AWS to productionize machine learning and data governance workflows
Alteryx Server scales to accommodate larger training data, hyperparameter tuning, and productionization. You can use it to manage and deploy analytic assets.
You can also use it to easily append CPU-optimized machines to a Server cluster that can be specified for use by machine learning training pipelines. By executing long-running training jobs in Server, you get the flexibility to continue designing analytic workflows in Designer while the training job executes.
Server also enables the scheduling and sequencing of analytic workflows. Each of these features can be used as part of CI/CD pipelines that ensure the quality of models that have been deployed to production. Using REST APIs, you can programmatically trigger workflows and monitor the status to integrate into established DevOps and CI/CD setups.
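The trigger-and-monitor pattern described here can be sketched as a small polling loop that a CI/CD step could call. This is an illustration only: `submit_job` and `get_status` are hypothetical callables that would wrap the real HTTP requests, and the status strings `"Completed"` and `"Error"` are assumptions rather than values taken from the Server API.

```python
import time


def run_and_wait(submit_job, get_status, poll_seconds=5.0, timeout_seconds=3600.0,
                 sleep=time.sleep, clock=time.monotonic):
    """Submit a job, then poll until it reaches a terminal state or times out.

    submit_job() -> job_id and get_status(job_id) -> str are hypothetical
    wrappers around whatever HTTP client you use.
    """
    job_id = submit_job()
    deadline = clock() + timeout_seconds
    while True:
        status = get_status(job_id)
        if status in ("Completed", "Error"):  # assumed terminal states
            return job_id, status
        if clock() >= deadline:
            raise TimeoutError(f"job {job_id} still {status!r} after timeout")
        sleep(poll_seconds)


# Demo with fake callables so the sketch runs without a server.
_fake_statuses = iter(["Queued", "Running", "Completed"])
job_id, final_status = run_and_wait(
    submit_job=lambda: "job-1",
    get_status=lambda _job_id: next(_fake_statuses),
    sleep=lambda _seconds: None,  # skip real waiting in the demo
)
print(job_id, final_status)
```

A pipeline step would typically fail the build when the final status is anything other than the success state, which keeps a broken training workflow from reaching deployment.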
Alteryx Server can be installed in an on-premises data center or the AWS Cloud and supports single and multi-node configurations. It’s offered as an Amazon Machine Image (AMI) in the AWS Marketplace for easy one-click deployments. Customized instances can also be deployed in a private subnet using Amazon Virtual Private Cloud. Server offers many options for customization, one of which is the option to store Server metadata in a user-managed MongoDB instance, for which AWS offers a Quick Start.
For detailed guidance, see Best Practices for Deploying Alteryx Server on AWS.
Alteryx Server offers built-in governance and version control of analytic assets, which can be used in place of, or in addition to, other source control solutions.
Model serving and deployment with Alteryx Promote
Alteryx Promote ties the platform together, offering a solution for model management, real-time model serving, and model monitoring.
Alteryx Promote offers an MLOps solution providing model management and highly-available, low-latency model serving
The Alteryx APA Platform offers several options for model deployment. Promote is used primarily for real-time deployments, common for models that interact with web applications. Promote enables the rapid deployment of pre-trained machine learning models through easy-to-use Python and R client libraries or in a code-free manner using Alteryx Designer.
Models that have been deployed to a Promote cluster server environment are packaged as Docker containers, replicated across nodes, and made accessible as highly available REST APIs that host in-memory inference methods. The number of replications of each model is configurable, as is the number of nodes available in the Promote cluster. An internal load balancer spreads requests across the available replications.
Monitor the performance of your models in production with Promote
Like Server and Connect, Promote can be installed in an AWS Cloud environment or an on-premises data center. The recommended setup also includes an external load balancer, such as Elastic Load Balancing, to distribute prediction requests across every Promote node. Promote is ideal for inference cases in which throughput is already known or can acceptably be changed on demand. While automatic scaling is technically possible, it’s beyond the intended use of the product.
Alteryx Server is the recommended solution for models that require batch inference on known existing hardware. Batch models can be packaged for prediction in workflow or analytic apps and can be scheduled to run in Server on compute-optimized nodes.
You can also leverage the workflow management functionality of Server to ensure that predictions are made only after up-to-date features have been generated through data preprocessing.
Additionally, users often find they need a hybrid of Alteryx and AWS solutions to deploy complex models at scale. One usage pattern we have observed is using our Assisted Modeling tool on the desktop to prototype a model on sample data. Using Designer and Server, clients prep/blend data from local sources and push the resulting data to S3.
Then, the model code from Assisted Modeling can be pushed to SageMaker, where the model can be trained on the entire dataset resident in Amazon S3 and deployed as an API in the SageMaker ecosystem to take advantage of containerization, scaling, and serverless capabilities. Because Alteryx focuses on friendly model building, this is often the best path for organizations that are light on data science expertise but have substantial DevOps or engineering resources.
Model testing and quality
Alteryx enables model testing throughout the modeling and deployment process. During the experimentation phase, Predictive building blocks and Assisted Modeling report performance metrics and visualizations, making it possible to compare the generalizability of each model.
Assisted Modeling also offers Explainable AI (XAI) reporting in the form of feature importance scores, calculated using the permutation importance approach.
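For readers unfamiliar with the permutation importance approach, the idea is to shuffle one feature at a time and measure how much the model's score drops. The sketch below is a generic, plain-Python illustration of that technique, not the Assisted Modeling implementation; the toy model and data are assumptions.

```python
import random


def accuracy(model, X, y):
    """Fraction of rows for which the model predicts the true label."""
    return sum(model(row) == label for row, label in zip(X, y)) / len(y)


def permutation_importance(model, X, y, n_repeats=10, seed=0):
    """Mean drop in accuracy when each feature column is shuffled."""
    rng = random.Random(seed)
    baseline = accuracy(model, X, y)
    importances = []
    for col in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            shuffled_col = [row[col] for row in X]
            rng.shuffle(shuffled_col)
            # Rebuild the dataset with only this one column permuted.
            X_perm = [row[:col] + [v] + row[col + 1:]
                      for row, v in zip(X, shuffled_col)]
            drops.append(baseline - accuracy(model, X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return importances


def toy_model(row):
    """Toy classifier whose prediction depends only on feature 0."""
    return int(row[0] > 0.5)


data_rng = random.Random(1)
X = [[data_rng.random(), data_rng.random()] for _ in range(200)]
y = [toy_model(row) for row in X]
scores = permutation_importance(toy_model, X, y)
print(scores)
```

Feature 1 scores exactly zero because the toy model never reads it, while feature 0 scores well above zero because shuffling it breaks the link between inputs and labels.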
During model deployment, it’s easy to add test data to a Promote deployment script. The testing step can be used to conditionally allow or disallow the deployment of that model version.
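A deployment gate of this kind can be as simple as scoring the candidate on held-out test cases and refusing to proceed below a threshold. The following sketch is hypothetical and does not use the Promote client library; `should_deploy`, the 0.9 threshold, and the toy models are assumptions.

```python
def should_deploy(predict, test_cases, min_accuracy=0.9):
    """Return (allow, accuracy) for a candidate model on labeled test cases."""
    correct = sum(predict(features) == expected
                  for features, expected in test_cases)
    accuracy = correct / len(test_cases)
    return accuracy >= min_accuracy, accuracy


test_cases = [({"x": 0.9}, 1), ({"x": 0.1}, 0), ({"x": 0.7}, 1), ({"x": 0.3}, 0)]


def good_model(features):
    return int(features["x"] > 0.5)


def bad_model(features):
    return 1  # always predicts the positive class


deploy_good, acc_good = should_deploy(good_model, test_cases)
deploy_bad, acc_bad = should_deploy(bad_model, test_cases)
print(deploy_good, acc_good)
print(deploy_bad, acc_bad)
```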
New Promote model versions are initially hosted in logical development and staging environments, allowing users to run a new model in parallel with the previously running production model. Testers can set up their systems to make predictions on both the production and staging model versions before deciding to replace the production model, which can be done through an API.
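One way to use the parallel run is to send the same requests to both versions and review where they disagree before promoting the staging model. This is a minimal sketch under assumed names: `production` and `staging` stand in for calls to the two model endpoints, and the request shape is invented for the example.

```python
def compare_versions(production, staging, requests):
    """Return the agreement rate and the requests on which versions disagree."""
    disagreements = [r for r in requests if production(r) != staging(r)]
    matches = len(requests) - len(disagreements)
    return matches / len(requests), disagreements


def production(request):
    return request["score"] >= 0.5


def staging(request):
    return request["score"] >= 0.6  # candidate uses a stricter cutoff


requests = [{"score": s} for s in (0.1, 0.4, 0.55, 0.7, 0.9)]
agreement, disagreements = compare_versions(production, staging, requests)
print(agreement, disagreements)
```

The disagreeing requests are usually more informative than the agreement rate itself, because they show exactly which kinds of inputs the new version treats differently.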
Promote also records all request and response data, making it possible for users to develop custom workflows that leverage that data to test for bias, fairness, and concept drift.
In addition to recording all incoming requests and their responses, Promote tracks aggregated metrics in Amazon Elasticsearch Service so administrators can observe the performance of the models they have deployed. Metrics for requests, errors, and latency over the previous month inform whether the model needs to be replicated further. Additional system utilization reporting helps administrators determine if additional nodes must be added to the Promote cluster.
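The kind of summary an administrator would look at can be sketched from raw request records. This example is illustrative only and does not query Promote or Elasticsearch; the record fields `latency_ms` and `ok` are assumptions, and the 95th percentile uses the simple nearest-rank method.

```python
import math


def summarize(records):
    """Summarize request count, error rate, and p95 latency.

    records: list of dicts with 'latency_ms' (number) and 'ok' (bool).
    """
    latencies = sorted(r["latency_ms"] for r in records)
    errors = sum(not r["ok"] for r in records)
    rank = math.ceil(0.95 * len(latencies))  # nearest-rank 95th percentile
    return {
        "requests": len(records),
        "error_rate": errors / len(records),
        "p95_latency_ms": latencies[rank - 1],
    }


# Twenty fake requests with latencies 10, 20, ..., 200 ms and one failure.
records = [{"latency_ms": 10 * i, "ok": i != 7} for i in range(1, 21)]
summary = summarize(records)
print(summary)
```

A rising p95 latency with a flat error rate suggests the model needs more replicas, whereas a rising error rate points to a problem with the model or its inputs.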
Finally, you can export the historical request data for concept or data drift analysis. These analyses can be performed in Alteryx Designer, scheduled to run in Server, and can kick off the CD pipeline if drift is detected.
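As one example of a drift check that could run on the exported request data, the sketch below computes the Population Stability Index (PSI) between a training-time feature distribution and recent values. PSI is a technique chosen here for illustration, not something the Alteryx documentation prescribes; the 0.2 threshold is a common rule of thumb and the simulated data is made up.

```python
import math
import random


def population_stability_index(expected, actual, n_bins=10, eps=1e-6):
    """PSI between two samples, using equal-width bins from the expected sample."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / n_bins or 1.0  # avoid zero width for constant data

    def bin_fractions(values):
        counts = [0] * n_bins
        for v in values:
            idx = int((v - lo) / width)
            idx = min(max(idx, 0), n_bins - 1)  # clamp out-of-range values
            counts[idx] += 1
        # eps keeps empty bins from producing log(0) or division by zero.
        return [max(c / len(values), eps) for c in counts]

    e, a = bin_fractions(expected), bin_fractions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


rng = random.Random(0)
training = [rng.gauss(0.0, 1.0) for _ in range(5000)]
recent_same = [rng.gauss(0.0, 1.0) for _ in range(5000)]
recent_shifted = [rng.gauss(1.0, 1.0) for _ in range(5000)]

DRIFT_THRESHOLD = 0.2  # assumed rule of thumb, tune for your own data
psi_same = population_stability_index(training, recent_same)
psi_shifted = population_stability_index(training, recent_shifted)
print(psi_same > DRIFT_THRESHOLD, psi_shifted > DRIFT_THRESHOLD)
```

In a scheduled workflow, a result above the threshold would be the signal that kicks off retraining through the CD pipeline.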
To deploy a comprehensive CD4ML solution, you need to automate your data access and processes, remove data silos, and implement scalable solutions.
With Alteryx and AWS, you can.
The Alteryx APA Platform is an end-to-end platform. It provides the data connectors, building blocks, and functionality to create and deploy modeling solutions with little, if any, coding required.
It includes an open ecosystem in terms of APIs, third-party data connectors, and open-source solutions, which provides developers the ability to mix and match the Alteryx solution with AWS native components.
With this, you have the freedom to deploy machine learning as it best fits your business requirements.
Start deploying machine learning models with the Intelligence Suite Starter Kit.
Learn how to scale with Best Practices for Deploying Alteryx Server on AWS and deploy Alteryx Server from the AWS Marketplace.