Alteryx Analytics

Predictive Analytics Tools

Alteryx Analytics makes predictive analytics tools accessible to users of all types and skill sets. It delivers more than 30 prepackaged tools covering the most widely used procedures for predictive analytics, grouping, and forecasting to help analysts throughout the predictive analytics process. These tools are built on the R framework and exposed through a drag-and-drop interface, eliminating the need for programming and scripting.

This is a sample of the tools available in the Alteryx Designer. For the full list of tools, click here.

Icon Tool Description
AB Test Analysis AB Test Analysis Compare the percentage change in a performance measure to the same measure one year prior.
AB Controls AB Controls The Control Select tool matches one to ten control units (e.g., stores, customers, etc.) to each member of a set of previously selected test units, on criteria such as seasonal patterns and growth trends for a key performance indicator, along with other user-provided criteria.
AB Treatments AB Treatments Determine which group is the best fit for AB testing.
AB Trends AB Trends Create measures of trend and seasonal patterns that can be used to help match treatment units to control units (e.g., stores or customers) for A/B testing. The trend measure is based on period-to-period percentage changes in the rolling average (taken over a one-year period) of a performance measure of interest. The same measure is used to assess seasonal effects. In particular, the percentage of the total level of the measure in each reporting period is used to assess seasonal patterns.
Market Basket Rules Market Basket Rules Step 1 of a Market Basket Analysis: Take transaction-oriented data and create either a set of association rules or frequent item sets. A summary report of both the transaction data and the rules/item sets is produced, along with a model object that can be further investigated in an MB Inspect tool.
Market Basket Inspect Market Basket Inspect Step 2 of a Market Basket Analysis: Take the output of the MB Rules tool, and provide a listing and analysis of those rules that can be filtered on several criteria in order to reduce the number of returned rules or item sets to a manageable number.
Market Basket Affinity Market Basket Affinity Used to construct a matrix of affinity measures between different items with respect to their likelihood of being part of the same action or transaction.
Boosted Model Boosted Model Provides generalized boosted regression models based on the gradient boosting methods of Friedman.* It works by serially adding simple decision tree models to a model ensemble so as to minimize an appropriate loss function.

Count Regression Count Regression Estimate regression models for count data (e.g., the number of store visits a customer makes in a year), using Poisson regression, quasi-Poisson regression, or negative binomial regression. The R functions used to accomplish this are glm() (from the R stats package) and glm.nb() (from the MASS package).
Decision Tree Decision Tree Predict a target variable using one or more predictor variables that are expected to have an influence on the target variable by constructing a set of if-then split rules that optimize a criterion. If the target variable identifies membership in one of a set of categories, a classification tree is constructed (based on the Gini coefficient) to maximize the 'purity' at each split. If the target variable is a continuous variable, a regression tree is constructed using the split criterion of 'minimize the sum of the squared errors' at each split.
Forest Model Forest Model Predict a target variable using one or more predictor variables that are expected to have an influence on the target variable, by constructing and combining a set of decision tree models (an "ensemble" of decision tree models).
Gamma Regression Gamma Regression Estimate a Gamma regression, a generalized linear model (available in R and Revo) based on an underlying Gamma distribution. It handles strictly positive target variables that have a long right tail (most values are relatively small, and there is a long right-hand tail to the distribution).
Lift Chart Lift Chart Compare the improvement (or lift) that various models provide relative to each other and to a 'random guess' to help determine which model is 'best'. Produce a cumulative captured response chart (also called a gains chart) or an incremental response rate chart.
Nested Test Nested Test Examine whether two models, one of which contains a subset of the variables contained in the other, are statistically equivalent in terms of their predictive capability.
Linear Regression Linear Regression Relate a variable of interest (target variable) to one or more variables (predictor variables) that are expected to have an influence on the target variable. (Also known as a linear model or a least-squares regression.)
Logistic Regression Logistic Regression Relate a binary (yes/no) variable of interest (target variable) to one or more variables (predictor variables) that are expected to have an influence on the target variable.
Naive Bayes Naive Bayes Creates a binomial or multinomial probabilistic classification model of the relationship between a set of predictor variables and a categorical target variable. The Naive Bayes classifier assumes that all predictor variables are independent of one another and predicts, based on a sample input, a probability distribution over a set of classes, thus calculating the probability of belonging to each class of the target variable.
Neural Networks Neural Networks Create a feedforward perceptron neural network model with a single hidden layer. The neurons in the hidden layer use a logistic (also known as a sigmoid) activation function, and the output activation function depends on the nature of the target field: logistic for binary classification problems (e.g., the probability a customer buys or does not buy), softmax for multinomial classification problems (e.g., the probability a customer chooses option A, B, or C), and linear for regression problems (where the target is a continuous, numeric field).
Support Vector Machine Support Vector Machine Support Vector Machines (SVM), or Support Vector Networks (SVN), are popular supervised learning algorithms used for classification problems, and are meant to accommodate instances where the data (i.e., observations) are considered linearly non-separable. In other words, the target values cannot be separated into their underlying classes using a simple, single linear boundary.
Spline Model Spline Model This tool implements Friedman's multivariate adaptive regression spline (MARS) model. It belongs to the more modern class of models (like the Forest and Boosted Models) that handle both variable selection and non-linear relationships directly within the algorithm. In some ways it is similar to a decision tree, but instead of making discrete jumps at "splits", the splits (called "knots" in this method) put a "hinge" in place, where the slope of the effect of a predictor on a target changes. As a result, the effect of numeric predictors is modeled as piecewise linear components.
Stepwise Stepwise Determine the "best" predictor variables to include in a model out of a larger set of potential predictor variables for linear, logistic, and other traditional regression models. The Alteryx R-based stepwise regression tool makes use of both backward variable selection and mixed backward and forward variable selection.
Score Score Calculate a predicted value for the target variable in the model. This is done by appending a 'Score' field to each record in the output of the data stream, based on the inputs: an R model object (produced by the Logistic Regression, Decision Tree, Forest Model, or Linear Regression) and a data stream consistent with the model object (in terms of field names and the field types).
Test of Means Test of Means Compare the difference in mean values (using a Welch two sample t-test) for a numeric response field between a control group and one or more treatment groups.
Network Analysis Network Analysis Creates an interactive visualization of a network along with summary statistics and distribution of node centrality measures.
Boosted Model Boosted Model Provides generalized boosted regression models based on the gradient boosting methods of Friedman. It works by serially adding simple decision tree models to a model ensemble so as to minimize an appropriate loss function. Accessible via the regular predictive tool palette and will automatically convert to the In-DB version of the tool if an In-DB connection exists.

*only available in Microsoft SQL Server 2016 and Teradata
Decision Tree Decision Tree Predict a target variable using one or more predictor variables that are expected to have an influence on the target variable by constructing a set of if-then split rules that optimize a criterion. If the target variable identifies membership in one of a set of categories, a classification tree is constructed (based on the Gini coefficient) to maximize the 'purity' at each split. If the target variable is a continuous variable, a regression tree is constructed using the split criterion of 'minimize the sum of the squared errors' at each split. Accessible via the regular predictive tool palette and will automatically convert to the In-DB version of the tool if an In-DB connection exists.

*only available in Microsoft SQL Server 2016 and Teradata
Forest Model Forest Model Predict a target variable using one or more predictor variables that are expected to have an influence on the target variable, by constructing and combining a set of decision tree models (an "ensemble" of decision tree models). Accessible via the regular predictive tool palette and will automatically convert to the In-DB version of the tool if an In-DB connection exists.

*only available in Microsoft SQL Server 2016 and Teradata
Linear Regression In-DB Linear Regression In-DB Uses the database's native language (e.g., R) to create an expression to relate a variable of interest (target variable) to one or more variables (predictor variables) that are expected to have an influence on the target variable. (Also known as a linear model or a least-squares regression.) Accessible via the regular predictive tool palette and will automatically convert to the In-DB version of the tool if an In-DB connection exists.

*only available in Oracle R, Microsoft SQL Server 2016 and Teradata
Logistic Regression In-DB Logistic Regression In-DB Uses the database's native language (e.g., R) to create an expression to relate a binary (yes/no) variable of interest (target variable) to one or more variables (predictor variables) that are expected to have an influence on the target variable. Accessible via the regular predictive tool palette and will automatically convert to the In-DB version of the tool if an In-DB connection exists.

*only available in Oracle R, Microsoft SQL Server 2016 and Teradata
Scoring In-DB Scoring In-DB Uses the database's native language (e.g., R) to create an expression to calculate a predicted value for the target variable in the model. This is done by appending a 'Score' field to each record in the output of the data stream, based on the inputs: an R model object (produced by the Logistic Regression, Decision Tree, Forest Model, or Linear Regression) and a data stream consistent with the model object (in terms of field names and the field types). Accessible via the regular predictive tool palette and will automatically convert to the In-DB version of the tool if an In-DB connection exists.

*only available in Oracle R, Microsoft SQL Server 2016 and Teradata
Optimization Optimization Solve linear programming (LP), mixed integer linear programming (MILP), and quadratic programming optimization problems using matrix, manual, and file inputs.
Simulation Sampling Simulation Sampling Sample data parametrically from a distribution, from input data, or as a combination of the two (fitting the best distribution to the input data and sampling from it). The data can also be "drawn" when the parameters of a distribution are uncertain and no data is available.
Simulation Summary Simulation Summary Simulation summary contains two main components. Firstly, it allows for a visualization of simulated distributions, and results from operations on those distributions. Secondly, it allows for visual and quantitative analysis of input vs. output variables.
Simulation Scoring Simulation Scoring Sample from an approximation of a model object error distribution.
TS ARIMA TS ARIMA Estimate a univariate time series forecasting model using an autoregressive integrated moving average (or ARIMA) method.
TS Compare TS Compare Compare one or more univariate time series models created with either the ETS or ARIMA tools.
TS ETS TS ETS Estimate a univariate time series forecasting model using an exponential smoothing method.
TS Filler TS Filler This tool allows a user to take a data stream of time series data and "fill in" any gaps in the series.

TS Covariate Forecast TS Covariate Forecast The TS Covariate Forecast tool provides forecasts from an ARIMA model estimated using covariates for a user-specified number of future periods. In addition, upper and lower confidence interval bounds are provided for two different (user-specified) percentage confidence levels. For each confidence level, the expected probability that the true value will fall within the provided bounds corresponds to the confidence level percentage. In addition to the model, the covariate values for the forecast horizon must also be provided.
TS Forecast TS Forecast Provide forecasts from either an ARIMA or ETS model for a specific number of future periods.
TS Plot TS Plot Create a number of different univariate time series plots to aid in understanding the time series data and in determining how to develop a forecasting model.
TS Forecast Factory TS Forecast Factory Generate point forecasts and confidence intervals from groups of either ARIMA or ETS models for a user-specified number of periods.

TS Model Factory TS Model Factory Estimate time series forecasting models for multiple groups at once using either the ARIMA method (with or without covariates) or the ETS method.

Append Cluster Append Cluster Appends the cluster assignments from a K-Centroids Cluster Analysis tool to a data stream containing the set of fields (with the same names, but not necessarily the same values) used to create the original cluster solution.
K-Centroids Analysis K-Centroids Analysis Partition records into "K groups" around centroids by assigning cluster memberships, using K-Means, K-Medians, or Neural Gas clustering.
K-Centroids Diagnostics K-Centroids Diagnostics Assess the appropriate number of clusters to specify, given the data and the selected Predictive Grouping algorithm (K-Means, K-Medians, or Neural Gas).
K-Nearest Neighbor K-Nearest Neighbor Find the selected number of nearest neighbors in the "data" stream that corresponds to each record in the "query" stream based on their Euclidean distance.
Principal Components Principal Components Reduce the dimensions (number of numeric fields) in a database by transforming the original set of fields into a smaller set that accounts for most of the variance (i.e., information) in the data. The new fields are called factors, or principal components.
R Tool R Tool Execute an R language script and link incoming and outgoing data from Alteryx to R, an open-source tool used for statistical and predictive analysis.
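
To make the Decision Tree description above more concrete, the following is a minimal sketch of how a single Gini-based split could be chosen for one numeric predictor. It is written in plain Python for illustration only; it is not the code Alteryx or its underlying R packages run, and the function names (`gini_impurity`, `best_split`) and the toy data are hypothetical.

```python
from collections import Counter


def gini_impurity(labels):
    """Gini impurity of a list of class labels (0.0 means perfectly pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(values, labels):
    """Return (threshold, weighted_impurity) for the best 'value <= threshold' rule.

    If no split improves on the unsplit data, threshold is None.
    """
    best = (None, gini_impurity(labels))
    # Every distinct value except the largest is a candidate threshold.
    for threshold in sorted(set(values))[:-1]:
        left = [lab for v, lab in zip(values, labels) if v <= threshold]
        right = [lab for v, lab in zip(values, labels) if v > threshold]
        weighted = (
            len(left) * gini_impurity(left) + len(right) * gini_impurity(right)
        ) / len(labels)
        if weighted < best[1]:
            best = (threshold, weighted)
    return best


# Hypothetical toy data: yearly store visits and whether the customer responded.
visits = [1, 2, 3, 10, 11, 12]
responded = ["no", "no", "no", "yes", "yes", "yes"]
threshold, impurity = best_split(visits, responded)
print(threshold, impurity)
```

A full tree would repeat this search over every predictor and then recurse into each side of the chosen split; a regression tree would swap the impurity measure for the sum of squared errors.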

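
The Welch two-sample t-test named in the Test of Means description can be sketched as follows. This is an illustrative calculation using only Python's standard library, not the tool's actual implementation; the function name `welch_t` and the sample values are assumptions. It returns the test statistic and the Welch-Satterthwaite degrees of freedom; turning these into a p-value requires a t-distribution, which the standard library does not provide.

```python
from statistics import mean, variance


def welch_t(control, treatment):
    """Welch's t statistic and approximate degrees of freedom.

    Unlike the pooled t-test, the two groups may have unequal
    variances and unequal sizes.
    """
    n1, n2 = len(control), len(treatment)
    # Squared standard error of each group mean (sample variance / n).
    se1 = variance(control) / n1
    se2 = variance(treatment) / n2
    t = (mean(treatment) - mean(control)) / (se1 + se2) ** 0.5
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = (se1 + se2) ** 2 / (se1 ** 2 / (n1 - 1) + se2 ** 2 / (n2 - 1))
    return t, df


# Hypothetical response values for a control group and one treatment group.
control = [10, 12, 11, 13]
treatment = [14, 15, 16, 15]
t_stat, dof = welch_t(control, treatment)
print(round(t_stat, 3), round(dof, 2))
```

With more than one treatment group, the same comparison would be repeated for each treatment group against the control group.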

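
As a rough illustration of the K-Nearest Neighbor description, the sketch below finds, for one record from a "query" stream, the closest records in a "data" stream by Euclidean distance. It is a brute-force example in plain Python under assumed inputs (records as tuples of numeric fields); the function name `nearest_neighbors` is hypothetical and the tool itself may use a different search strategy.

```python
import math


def nearest_neighbors(data, query, k):
    """Return the k records in `data` closest to `query` by Euclidean distance."""
    return sorted(data, key=lambda record: math.dist(record, query))[:k]


# Hypothetical "data" stream records and a single "query" stream record.
data = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0), (6.0, 5.0)]
query = (0.9, 1.2)
neighbors = nearest_neighbors(data, query, k=2)
print(neighbors)
```

Because Euclidean distance is sensitive to the scale of each field, numeric fields are usually standardized before a search like this.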