Data Investigation Tools
Before a process or analysis takes place, users need to understand the details of their data. Alteryx gives users an array of Data Investigation tools to quickly and easily understand their data.
This is a sample of the tools available in Alteryx Designer.
|Association Analysis||Determine which fields in a database have a bivariate association with one another.|
|Contingency Table||Create a contingency table based on selected fields, to list all combinations of the field values with frequency and percent columns.|
|Create Samples||Split the data stream into two or three random samples with a specified percentage of records in the estimation and validation samples. If the total is less than 100%, the remaining records fall in the holdout sample.|
|Distribution Analysis||Fit one or more distributions to the input data and compare them based on a number of goodness-of-fit statistics. Based on the statistical significance (p-values) of the results of these tests, the user can determine which distribution best represents the data.|
|Field Summary Report||Produce a concise summary report of descriptive statistics for the selected data fields.|
|Frequency Table||Produce a frequency analysis for selected fields - output includes a summary of the selected field(s) with frequency counts and percentages for each value in a field.|
|Histogram||Provides a histogram plot for a numeric field. Optionally, it provides a smoothed empirical density plot. Frequencies are displayed when a density plot is not selected, and probabilities when this option is selected. The number of breaks can be set by the user, or determined automatically using the method of Sturges.|
|Heat Plot||Plots the empirical bivariate density of two numeric fields, using colors to indicate variations in the density of the data for different levels of the two fields.|
|Oversample Field||Sample incoming data so that there is equal representation of data values to enable effective use in a predictive model.|
|Pearson Correlation||Replaces the Pearson Correlation Coefficient in previous versions… The Pearson coefficient is obtained by dividing the covariance of the two variables by the product of their standard deviations.|
|Plot of Means||Take a numeric or binary categorical (converted into a set of zero and one values) field as a response field along with a categorical field and plot the mean of the response field for each of the categories (levels) of the categorical field.|
|Scatterplot||Produce enhanced scatterplots, with options to include boxplots in the margins, a linear regression line, a smooth curve via non-parametric regression, a smoothed conditional spread, and outlier identification. The smooth curve can expose the relationship between two variables more clearly than a traditional scatterplot, particularly in cases with many observations or a high level of dispersion in the data.|
|Spearman Correlation Coefficient||Assesses how well an arbitrary monotonic function could describe the relationship between two variables without making any other assumptions about the particular nature of the relationship between the variables.|
|Violin Plot||Shows the distribution of a single numeric variable, and conveys the density of the distribution based on a kernel smoother that indicates the density of values (via width) of the numeric field. In addition to concisely showing the nature of the distribution of a numeric variable, violin plots are an excellent way of visualizing the relationship between a numeric and categorical variable by creating a separate violin plot for each value of the categorical variable.|
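The Pearson and Spearman entries in the table describe two related calculations: covariance divided by the product of the standard deviations, and a measure of how well a monotonic function describes a relationship. The following is a minimal Python sketch of both ideas, not Alteryx's implementation. The function names `pearson`, `average_ranks`, and `spearman` are hypothetical, and Spearman is computed here in the usual way, as the Pearson coefficient of the ranked values.

```python
from statistics import mean


def pearson(xs, ys):
    """Covariance of xs and ys divided by the product of their standard deviations."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mx, my = mean(xs), mean(ys)
    # The 1/(n-1) factors in the covariance and the standard deviations cancel,
    # so plain sums of products and squares are sufficient.
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant field")
    return cov / (sx * sy)


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Pearson correlation of the ranks, so any monotonic relationship scores +/-1."""
    return pearson(average_ranks(xs), average_ranks(ys))


if __name__ == "__main__":
    x = [1, 2, 3, 4, 5]
    y = [v ** 3 for v in x]  # monotonic but not linear
    print("Pearson: ", round(pearson(x, y), 3))
    print("Spearman:", round(spearman(x, y), 3))
```

On the cubic example, the Pearson coefficient is below 1 because the relationship is not linear, while the Spearman coefficient is 1 because the relationship is perfectly monotonic.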
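The Create Samples entry describes splitting records into estimation and validation samples by percentage, with any remainder falling into a holdout sample. A rough sketch of that logic in Python might look like the following. The function name `create_samples`, its parameters, and the fixed default seed are assumptions made for illustration and do not reflect the tool's actual configuration options.

```python
import random


def create_samples(records, estimation_pct, validation_pct, seed=1):
    """Randomly split records into (estimation, validation, holdout) lists.

    Percentages are given on a 0-100 scale. Whatever is not assigned to
    estimation or validation ends up in the holdout sample.
    """
    if estimation_pct < 0 or validation_pct < 0:
        raise ValueError("percentages must be non-negative")
    if estimation_pct + validation_pct > 100:
        raise ValueError("estimation and validation cannot exceed 100% combined")

    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is repeatable

    n = len(shuffled)
    n_est = round(n * estimation_pct / 100)
    # Guard against rounding pushing the two samples past the record count.
    n_val = min(round(n * validation_pct / 100), n - n_est)

    estimation = shuffled[:n_est]
    validation = shuffled[n_est:n_est + n_val]
    holdout = shuffled[n_est + n_val:]
    return estimation, validation, holdout


if __name__ == "__main__":
    est, val, hold = create_samples(range(100), 60, 20)
    print(len(est), len(val), len(hold))
```

With 60% and 20% specified, the remaining 20% of records form the holdout sample. With percentages that total 100, the holdout list is empty.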
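The Histogram entry mentions choosing the number of breaks automatically with the method of Sturges. As commonly stated, Sturges' rule sets the number of bins to the ceiling of log2(n) plus one, where n is the number of observations. The sketch below assumes that common form. The function name `sturges_bins` is hypothetical, and the tool's exact break placement may differ.

```python
import math


def sturges_bins(n):
    """Number of histogram bins suggested by Sturges' rule for n observations."""
    if n < 1:
        raise ValueError("need at least one observation")
    return math.ceil(math.log2(n)) + 1


if __name__ == "__main__":
    for n in (10, 100, 1000, 10000):
        print(n, "observations ->", sturges_bins(n), "bins")
```

Because the bin count grows with the logarithm of the sample size, even large inputs produce a modest number of bins, which is one reason a user-specified number of breaks is also offered.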