Data Science and Data Analytics Glossary of Terms

Advanced Analytics

Advanced analytics uses sophisticated techniques to uncover insights, identify patterns, predict outcomes, and generate recommendations.

...

Analytics Maturity Model

The higher your organization’s level of analytics maturity, the more capable it is of using data to deliver business outcomes.

...

AutoML

Automated machine learning, or AutoML, makes ML accessible to non-experts by enabling them to build, validate, iterate, and explore ML models through an automated experience.

...

Batch Processing

Batch processing refers to the scheduling and processing of large volumes of data simultaneously, generally at periods of time when computing resources are experiencing low demand. Batch jobs are typically repetitive in nature and are often scheduled (automated) to occur at set intervals, such as ...

Business Analytics

Business analytics is the process analyzing data using statistical and quantitative methods to make decisions that drive better business outcomes.

...

Business Intelligence

Business intelligence is the cumulative outcome of an organization’s data, software, infrastructure, business processes, and human intuition that delivers actionable insights.

...

Cloud Analytics

Cloud analytics involves both using data stored in the cloud for analytic processes and leveraging the fast computing power of the cloud for faster analytics.

...

Cloud Data Warehouse (CDW)

A cloud data warehouse is a database that is managed as a service and delivered by a third party, such as Google Cloud Platform (GCP), Amazon Web Services (AWS), or Microsoft Azure. Cloud data architectures are distinct from on-premise data architectures, where organizations manage their own phys ...

Data Aggregation

Data aggregation is the process of compiling data (often from multiple data sources) to provide high-level summary information that can be used for statistical analysis. An example of a simple data aggregation is finding the sum of the sales in a particular product category for each region you op ...

Data Analytics

Data analytics is the process of exploring, transforming, and analyzing data to identify meaningful insights and efficiencies that support decision-making.

...

Data Application

Data applications are applications built on top of databases that solve a niche data problem and, by means of a visual interface, allow for multiple queries at the same time to explore and interact with that data. Data applications do not require coding knowledge in order to procure or understand ...

Data Blending

Data blending is the act of bringing data together from a wide variety of sources into one useful dataset to perform deeper, more complex analyses.

...

Data Catalog

A data catalog is a comprehensive collection of an organization’s data assets, which are compiled to make it easier for professionals across the organization to locate the data they need. Just as book catalogs help users quickly locate books in libraries, data catalogs help users quickly search ...

Data Cleansing

Data cleansing, also known as data cleaning or scrubbing, identifies and fixes errors, duplicates, and irrelevant data from a raw dataset.

...

Data Enrichment

Data enrichment is the process of combining first party data from internal sources with disparate data from other internal systems or third party data from external sources. The data enrichment process makes data more useful and insightful. A well-functioning data enrichment process is a fundamen ...

Data Exploration

Data exploration is one of the initial steps in the analysis process that is used to begin exploring and determining what patterns and trends are found in the dataset. An analyst will usually begin data exploration by using data visualization techniques and other tools to describe the characteris ...

Data Governance

Data governance is the collection of policies, processes and standards that define how data assets can be used within an organization and who has authority over them. Governance dictates who can use what data and in what way. This ensures that data assets remain secure and adhere to agreed upon q ...

Data Ingestion

Data ingestion is the process of collecting data from its source(s) and moving it to a target environment where it can be accessed, used, or analyzed.

...

Data Integrity

Data integrity refers to the accuracy and consistency of data over its entire lifecycle, as well as compliance with necessary permissioning constraints and other security measures. In short, it is the trustworthiness of your data.

...

Data Lakehouse

A data lakehouse is a data management architecture that seeks to combine the strengths of data lakes with the strengths of data warehouses.

...

Data Lineage

Track where an organization’s data comes from, the journey it takes through the system, and keep business data compliant and accurate.

...

Data Mesh

A data mesh is a new approach to designing data architectures. It takes a decentralized approach to data storage and management, having individual business domains retain ownership over their datasets rather than flowing all of an organization’s data into a centrally owned data lake. Data is ac ...

Data Munging

Data munging is the process of manual data cleansing prior to analysis. It is a time consuming process that often gets in the way of extracting true value and potential from data. In many organizations, 80% of the time spent on data analytics is allocated to data munging, where IT manually cleans ...

Data Observability

Data observability refers to the ability of an organization to monitor, track, and make recommendations about what’s happening inside their data systems in order to maintain system health and reduce downtime. Its objective is to ensure that data pipelines are productive and can continue running ...

Data Onboarding

Data onboarding is the process of preparing and uploading customer data into an online environment. It allows organizations to bring customer records gathered through offline means into online systems, such as CRMs. Data onboarding requires significant data cleansing to correct for errors and for ...

Data Pipeline

A data pipeline is a sequence of steps that collect, process, and move data between sources for storage, analytics, machine learning, or other uses. For example, data pipelines are often used to send data from applications to storage devices like data warehouses or data lakes. Data pipelines are ...

Data Preparation

Data preparation, also sometimes called “pre-processing,” is the act of cleaning and consolidating raw data prior to using it for business analysis and machine learning.

...

Data Profiling

Data profiling helps discover, understand, and organize data by identifying its characteristics and assessing its quality.

...

Data Science

Data science is a form of applied statistics that incorporates elements of computer science and mathematics to extract insight from both quantitative and qualitative data.

...

Data Science vs Machine Learning

Data science and machine learning are buzzwords in the technology world. Both enhance AI operations across the business and industry spectrum. But which is best?

...

Data Source

A data source is the digital or physical location where data originates from or is stored, which influences how it is stored per its location (e.g. data table or data object) and its connectivity properties.

...

Data Standardization

Data standardization abstracts away all the complex semantics of how data is captured, standardized, and cobbled together to provide businesses with faster and more accurate analytics.

...

Data Transformation

Data transformation is the process of converting data into a different format that is more useful to an organization. It is used to standardize data between data sets, or to make data more useful for analysis and machine learning. The most common data transformations involve converting raw data i ...

Data Validation

Data validation is the process of ensuring that your data is accurate and clean. Data validation is critical at every point of a data project’s life—from application development to file transfer to data wrangling—in order to ensure correctness. Without data validation from inception to iter ...

Data Visualization

Data visualization is the visual representation of data by using graphs, charts, plots or information graphics.

...

Data Wrangling

Data wrangling is the act of transforming, cleansing, and enriching data to make it more applicable, consumable, and useful to make smarter business decisions.

...

Decision Intelligence

Decision intelligence is the process of applying analytics, AI and automation to decisions that impact< ...

Demand Forecasting

Demand forecasts estimate future demand for products and services, which helps inform business decisions. Demand forecasts include granular data, historical sales data, questionnaires, and more.

...

Descriptive Analytics

Descriptive analytics answers the question “What happened?” by drawing conclusions from large, raw datasets. The findings are then visualized into accessible line graphs, tables, pie and bar charts, and generated narratives.

...

ETL

ETL is the process used to copy, combine, and convert data from different sources and formats and load it into a new destination such as a data warehouse or data lake.

...

ETL Developer

An ETL Developer is an IT specialist who designs, develops, automates, and supports complex applications to extract, transform, and load data. An ETL Developer plays an important role in determining the data storage needs of their organization.

...

Feature Engineering

With feature engineering, organizations can make sense of their data and turn it into something beneficial.

...

Machine Learning

Machine learning is the iterative process a computer uses to identify patterns in a dataset given specific constraints.

...

MLOps (Machine Learning Operations)

Machine learning models (MLOps) provide valuable insights to the business, but only if those models can access and analyze the organization’s data on an ongoing basis. MLOps is the critical process that makes this possible.

...

Predictive Analytics

Predictive analytics is a type of data analysis that uses statistics, data science, machine learning, and other techniques to predict what will happen in the future.

...

Prescriptive Analytics

Prescriptive analytics answers the question “What should/can be done?” by using machine learning, graph analysis, simulation, heuristics, and other methods.

...

Regex (Regular Expression)

A regex (short for regular expression) is a sequence of characters used to specify a search pattern. It allows users to easily conduct searches matching very specific criteria, saving large amounts of time for those who regularly work with text or analyze large volumes of data. An example of a re ...

Sales Analytics

Sales analytics is the practice of generating insights from data and used to set goals, metrics, and a larger strategy.

...

Source-to-Target Mapping

Source-to-Target Mapping is a set of data transformation instructions that determine how to convert the structure and content of data in the source system to the structure and content needed in the target system.

...

Spatial Analysis

Spatial analysis models problems geographically, allowing a company to analyze the locations, relationships, attributes, and proximities in geospatial data to answer questions and develop insights.

...

Supervised vs Unsupervised Learning

Supervised and unsupervised learning have one key difference. Supervised learning uses labeled datasets, whereas unsupervised learning uses unlabeled datasets.

...

Systems of Intelligence

Systems of intelligence help organizations extract value from their tech stack
Read More

User Defined Function (UDF)

A User Defined Function (UDF) is a custom programming function that allows users to reuse processes without having to rewrite code. For example, a complex calculation can be programmed using SQL and stored as a UDF. When this calculation needs to be used in the future on a different set of data, ...

WHY WORK WITH US?

Partner Program

TRUST CENTER

GENERATIVE AI

Alteryx AiDIN

PLATFORM OVERVIEW

PLATFORM CAPABILITIES

ALTERYX ANALYTICS CLOUD

ON-PREMISES PRODUCTS

DEPARTMENT

INDUSTRY

ROLE

PLATFORM TOUR

Take a tour of the Alteryx AI Platform for Enterprise Analytics

RESOURCES

LEARN

EVENTS

TAKE THE ASSESSMENT

Data Scorecard

COMPANY

LIFE AT ALTERYX

NEWSROOM

TRY FOR FREE

Transform Your Analytics

Data Science and Data Analytics Glossary

Transform Your Analytics

About Alteryx

Resources

Support

Community

Trending Now

Popular

Company

Data Science and Data Analytics Glossary

Inspire 2024 is just weeks away!

May 13-16 | The Analytics Event of the Year