Predictive Analytics Tools
Alteryx Analytics makes predictive analytics tools accessible to users of all types and skill sets. It delivers over 50 prepackaged tools covering the most widely used procedures for predictive analytics, grouping, and forecasting to help analysts throughout the predictive analytics process. These tools are built on the R framework and exposed through a drag-and-drop interface, eliminating the need for programming and scripting.
This is a sample of the tools available in the Alteryx Designer. For the full list of tools, click here.
Tool  Description 

AB Test Analysis  Compare the percentage change in a performance measure to the same measure one year prior.  
AB Controls  The Control Select tool matches one to ten control units (e.g., stores or customers) to each member of a set of previously selected test units, on criteria such as seasonal patterns and growth trends for a key performance indicator, along with other user-provided criteria.  
AB Treatments  Determine which group is the best fit for AB testing.  
AB Trends  Create measures of trend and seasonal patterns that can be used to help match treatment units to control units (e.g., stores or customers) for A/B testing. The trend measure is based on period-to-period percentage changes in the rolling average (taken over a one-year period) of a performance measure of interest. The same performance measure is used to assess seasonal effects: the percentage of the total level of the measure in each reporting period is used to assess seasonal patterns.  
Market Basket Rules  Step 1 of a Market Basket Analysis: Take transaction-oriented data and create either a set of association rules or frequent item sets. A summary report of both the transaction data and the rules/item sets is produced, along with a model object that can be further investigated in an MB Inspect tool.  
Market Basket Inspect  Step 2 of a Market Basket Analysis: Take the output of the MB Rules tool, and provide a listing and analysis of those rules that can be filtered on several criteria in order to reduce the number of returned rules or item sets to a manageable number.  
Market Basket Affinity  Used to construct a matrix of affinity measures between different items with respect to their likelihood of being part of the same action or transaction.  
Boosted Model  Provides generalized boosted regression models based on the gradient boosting methods of Friedman.* It works by serially adding simple decision tree models to a model ensemble so as to minimize an appropriate loss function. 

Count Regression  Estimate regression models for count data (e.g., the number of store visits a customer makes in a year), using Poisson regression, quasi-Poisson regression, or negative binomial regression. The R functions used to accomplish this are glm() (from the R stats package) and glm.nb() (from the MASS package).  
Decision Tree  Predict a target variable using one or more predictor variables that are expected to have an influence on the target variable by constructing a set of if-then split rules that optimize a criterion. If the target variable identifies membership in one of a set of categories, a classification tree is constructed (based on the Gini coefficient) to maximize the 'purity' at each split. If the target variable is a continuous variable, a regression tree is constructed using the split criterion of 'minimize the sum of the squared errors' at each split.  
Forest Model  Predict a target variable using one or more predictor variables that are expected to have an influence on the target variable, by constructing and combining a set of decision tree models (an "ensemble" of decision tree models).  
Gamma Regression  Estimate the Gamma Regression, a generalized linear model available in R and Revo that is based on an underlying Gamma distribution. It handles strictly positive target variables whose distribution has a long right-hand tail (so most values are relatively small).  
Lift Chart  Compare the improvement (or lift) that various models provide relative to each other and to a 'random guess' to help determine which model is 'best'. Produce a cumulative captured response chart (also called a gains chart) or an incremental response rate chart.  
Nested Test  Examine whether two models, one of which contains a subset of the variables contained in the other, are statistically equivalent in terms of their predictive capability.  
Linear Regression  Relate a variable of interest (target variable) to one or more variables (predictor variables) that are expected to have an influence on the target variable. (Also known as a linear model or a least-squares regression.)  
Logistic Regression  Relate a binary (yes/no) variable of interest (target variable) to one or more variables (predictor variables) that are expected to have an influence on the target variable.  
Naive Bayes Classifier  Creates a binomial or multinomial probabilistic classification model of the relationship between a set of predictor variables and a categorical target variable. The Naive Bayes classifier assumes that all predictor variables are independent of one another and predicts, based on a sample input, a probability distribution over a set of classes, thus calculating the probability of belonging to each class of the target variable.  
Neural Networks  Create a feedforward perceptron neural network model with a single hidden layer. The neurons in the hidden layer use a logistic (also known as sigmoid) activation function, and the output activation function depends on the nature of the target field: logistic for binary classification problems (e.g., the probability a customer buys or does not buy), softmax for multinomial classification problems (e.g., the probability a customer chooses option A, B, or C), and linear for regression problems (where the target is a continuous, numeric field).  
Support Vector Machine  Support Vector Machines (SVM), or Support Vector Networks (SVN), are popular supervised learning algorithms used for classification problems, and are meant to accommodate instances where the data (i.e., observations) are not linearly separable. In other words, the target values cannot be separated into their underlying classes using a simple, single linear boundary.  
Spline Model  This tool implements Friedman's multivariate adaptive regression spline (MARS) model. It belongs to the more modern class of models (like the Forest and Boosted Models) that handle both variable selection and nonlinear relationships directly within the algorithm. In some ways it is similar to a decision tree, but instead of making discrete jumps at "splits", the splits (called "knots" in this method) put in place a "hinge", where the slope of the effect of a predictor on a target changes, so the effect of numeric predictors is modeled as piecewise linear components.  
Stepwise  Determine the "best" predictor variables to include in a model out of a larger set of potential predictor variables for linear, logistic, and other traditional regression models. The Alteryx R-based stepwise regression tool makes use of both backward variable selection and mixed backward and forward variable selection.  
Score  Calculate a predicted value for the target variable in the model. This is done by appending a 'Score' field to each record in the output of the data stream, based on the inputs: an R model object (produced by the Logistic Regression, Decision Tree, Forest Model, or Linear Regression tool) and a data stream consistent with the model object (in terms of field names and field types).  
Test of Means  Compare the difference in mean values (using a Welch two-sample t-test) for a numeric response field between a control group and one or more treatment groups.  
Network Analysis  Creates an interactive visualization of a network along with summary statistics and distribution of node centrality measures.  
Boosted Model  Provides generalized boosted regression models based on the gradient boosting methods of Friedman. It works by serially adding simple decision tree models to a model ensemble so as to minimize an appropriate loss function. It is accessible via the regular predictive tool palette and automatically converts to the In-DB version of the tool if an In-DB connection exists. *Only available in Microsoft SQL Server 2016 and Teradata. 

Decision Tree  Predict a target variable using one or more predictor variables that are expected to have an influence on the target variable by constructing a set of if-then split rules that optimize a criterion. If the target variable identifies membership in one of a set of categories, a classification tree is constructed (based on the Gini coefficient) to maximize the 'purity' at each split. If the target variable is a continuous variable, a regression tree is constructed using the split criterion of 'minimize the sum of the squared errors' at each split. It is accessible via the regular predictive tool palette and automatically converts to the In-DB version of the tool if an In-DB connection exists. *Only available in Microsoft SQL Server 2016 and Teradata. 

Forest Model  Predict a target variable using one or more predictor variables that are expected to have an influence on the target variable, by constructing and combining a set of decision tree models (an "ensemble" of decision tree models). It is accessible via the regular predictive tool palette and automatically converts to the In-DB version of the tool if an In-DB connection exists. *Only available in Microsoft SQL Server 2016 and Teradata. 

Linear Regression In-DB  Uses the database's native language (e.g., R) to create an expression to relate a variable of interest (target variable) to one or more variables (predictor variables) that are expected to have an influence on the target variable. (Also known as a linear model or a least-squares regression.) It is accessible via the regular predictive tool palette and automatically converts to the In-DB version of the tool if an In-DB connection exists. *Only available in Oracle R, Microsoft SQL Server 2016 and Teradata. 

Logistic Regression In-DB  Uses the database's native language (e.g., R) to create an expression to relate a binary (yes/no) variable of interest (target variable) to one or more variables (predictor variables) that are expected to have an influence on the target variable. It is accessible via the regular predictive tool palette and automatically converts to the In-DB version of the tool if an In-DB connection exists. *Only available in Oracle R, Microsoft SQL Server 2016 and Teradata. 

Scoring In-DB  Uses the database's native language (e.g., R) to create an expression to calculate a predicted value for the target variable in the model. This is done by appending a 'Score' field to each record in the output of the data stream, based on the inputs: an R model object (produced by the Logistic Regression, Decision Tree, Forest Model, or Linear Regression tool) and a data stream consistent with the model object (in terms of field names and field types). It is accessible via the regular predictive tool palette and automatically converts to the In-DB version of the tool if an In-DB connection exists. *Only available in Oracle R, Microsoft SQL Server 2016 and Teradata. 

Optimization  Solve linear programming (LP), mixed integer linear programming (MILP), and quadratic programming optimization problems using matrix, manual, and file inputs.  
Simulation Sampling  Draw samples for use in a simulation. Samples can be drawn parametrically from a distribution, from input data, or as a combination of the two (fitting the best distribution to the input data and sampling from it). The data can also be "drawn" if one is unsure of the parameters of a distribution and also lacks data.  
Simulation Summary  Simulation Summary has two main components. First, it visualizes simulated distributions and the results of operations on those distributions. Second, it supports visual and quantitative analysis of input versus output variables.  
Simulation Scoring  Sample from an approximation of a model object error distribution.  
TS ARIMA  Estimate a univariate time series forecasting model using an autoregressive integrated moving average (or ARIMA) method.  
TS Compare  Compare one or more univariate time series models created with either the ETS or ARIMA tools.  
TS ETS  Estimate a univariate time series forecasting model using an exponential smoothing method.  
TS Filler  This tool allows a user to take a data stream of time series data and "fill in" any gaps in the series. 

TS Covariate Forecast  The TS Covariate Forecast tool provides forecasts from an ARIMA model estimated using covariates for a user-specified number of future periods. In addition, upper and lower confidence interval bounds are provided for two different (user-specified) percentage confidence levels. For each confidence level, the expected probability that the true value will fall within the provided bounds corresponds to the confidence level percentage. In addition to the model, the covariate values for the forecast horizon must also be provided.  
TS Forecast  Provide forecasts from either an ARIMA or ETS model for a specific number of future periods.  
TS Plot  Create a number of different univariate time series plots to aid in understanding the time series data and determining how to develop a forecasting model.  
TS Forecast Factory  Generate point forecasts and confidence intervals from groups of either ARIMA or ETS models for a user-specified number of periods. 

TS Model Factory  Estimate time series forecasting models for multiple groups at once using either the ARIMA method (with or without covariates) or the ETS method. 

Append Cluster  Appends the cluster assignments from a K-Centroids Cluster Analysis tool to a data stream containing the set of fields (with the same names, but not necessarily the same values) used to create the original cluster solution.  
K-Centroids Analysis  Partition records into "K groups" around centroids by assigning cluster memberships, using K-Means, K-Medians, or Neural Gas clustering.  
K-Centroids Diagnostics  Assess the appropriate number of clusters to specify, given the data and the selected Predictive Grouping algorithm (K-Means, K-Medians, or Neural Gas).  
K-Nearest Neighbor  Find the selected number of nearest neighbors in the "data" stream that corresponds to each record in the "query" stream based on their Euclidean distance.  
Principal Components  Reduce the dimensions (number of numeric fields) in a database by transforming the original set of fields into a smaller set that accounts for most of the variance (i.e., information) in the data. The new fields are called factors, or principal components.  
R Tool  Execute an R language script and link incoming and outgoing data from Alteryx to R, an open-source tool used for statistical and predictive analysis. 
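
To make one of the table entries concrete, the Welch two-sample t-test named in the Test of Means row can be sketched in a few lines. This is only an illustration of the statistic, not the Alteryx tool's implementation (which the page says is built on R). The function name `welch_t` and the sample values are assumptions made up for the example, and Python is used purely for readability.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(control, treatment):
    """Return (t statistic, Welch-Satterthwaite degrees of freedom).

    Hypothetical helper for illustration; not part of any Alteryx API.
    """
    n1, n2 = len(control), len(treatment)
    # Squared standard error of each group mean (sample variance / n).
    v1 = variance(control) / n1
    v2 = variance(treatment) / n2
    # Difference in means, scaled by the combined standard error.
    t = (mean(treatment) - mean(control)) / sqrt(v1 + v2)
    # Welch-Satterthwaite approximation; no equal-variance assumption.
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


# Made-up response values for a control group and one treatment group.
control = [1, 2, 3, 4, 5]
treatment = [2, 4, 6, 8, 10]
t_stat, dof = welch_t(control, treatment)
print(f"t = {t_stat:.4f}, df = {dof:.4f}")
```

Turning the statistic into a p-value requires the Student t distribution, which the Python standard library does not provide, so that step is left out here. With more than one treatment group, the same comparison would be repeated against the control group for each treatment group.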