The Data Stack in the Age of Generative AI

Strategy   |   Heather Ferguson   |   Mar 11, 2024   |   Time to read: 8 mins

Generative AI has quickly become an undisputed game changer in business. 82% of business leaders agree that AI is significantly impacting organizational goals, and 46% of board members have stated that generative AI is their top priority over everything else.

We have moved beyond the stage of advocating for the adoption of generative AI; the focus is now on how businesses will use it. With genAI now reaching the peak of the hype cycle, IT teams are realizing that there is one clear differentiating element that can make or break a generative AI project: data.

Data is one of the most significant pieces of a successful generative AI program; teams need access to high-quality, governed, and reliable data to run tests, experiment, and explore results.

Are IT teams and their data stacks ready? Are their systems, technologies, and business cultures set up for the massive shift that is generative AI?

We interviewed 3,100 global IT leaders about their data stacks, organizational structure, and approaches to data strategy. Here’s what we learned.

Businesses are confident in their data. Should they be?

This image shows a graph titled "Level of trust in organization’s data". Five levels of trust are shown, each with a percentage value next to a horizontal bar:

  • Complete trust (20%): data is trusted implicitly, and decisions are made with confidence in the data’s accuracy and validity.
  • High trust (31%): data is highly trusted, and users have confidence in its accuracy, completeness, and relevance.
  • Moderate trust (25%): data is considered reliable and accurate and is used for routine decisions.
  • Limited trust (17%): data is somewhat reliable, but with reservations about its accuracy or completeness.
  • Low / No trust (7%): there is little confidence in the data’s accuracy and quality.

Figure 1 – What level of trust do you have in your organization’s data?

Throughout the survey responses, one theme emerged: overall, IT leaders are satisfied with their data. Over half of companies (54%) rated their data maturity as good or advanced, and 76% trust their data.
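The 76% figure appears to be the sum of the top three trust levels in Figure 1. Here is a minimal sketch of that arithmetic in Python, assuming that "trust their data" means moderate trust or better (the survey text does not spell out the grouping, so treat it as an assumption):

```python
# Percentages transcribed from Figure 1.
trust_levels = {
    "Complete trust": 20,
    "High trust": 31,
    "Moderate trust": 25,
    "Limited trust": 17,
    "Low / No trust": 7,
}

# Assumption: respondents who "trust their data" are those reporting
# moderate trust or better. The grouping is not stated in the survey text.
trusting_levels = {"Complete trust", "High trust", "Moderate trust"}

trust_share = sum(
    pct for level, pct in trust_levels.items() if level in trusting_levels
)
remaining_share = sum(trust_levels.values()) - trust_share

print(f"Trust their data: {trust_share}%")
print(f"Limited, low, or no trust: {remaining_share}%")
```

Under that assumption, the remaining share is the group of respondents reporting limited, low, or no trust.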

This is good news, right?

Turns out there are some cracks in the facade.

Leaders do feel confident in their data, but a little digging turns up some evidence to the contrary. Only 10% of businesses surveyed state they have a “modern” data stack, and almost half (47%) are still updating their data stack infrastructure to make it more modern. Roughly one in five companies (22%) faces data bias challenges, and 20% need help with data quality. These figures don’t represent the majority, but they are enough to raise some questions.

How does this square with the confidence that leaders are feeling? Addressing data quality and bias could be the key to getting the other 24% into the same positive position. In fact, you’ll see below that IT leaders ranked improving data quality as their top goal for new technology investments. The secret to success in a genAI-augmented world will be doubling down on efforts to improve or maintain data quality and infrastructure throughout the disruption.

Is it time for the death of the spreadsheet?

This image is a graph comparing the current and predicted future composition of data stacks in organizations. Each category is represented by a horizontal bar whose length corresponds to the percentage.

The left side, labeled "Current composition of data stack" and indicated by blue bars, lists data technology categories with their adoption percentages:

  • Customer Relationship Management (CRM) software (e.g., Salesforce): 41%
  • Enterprise Resource Planning (ERP) software (e.g., SAP): 35%
  • Spreadsheet software (e.g., Microsoft Excel): 35%
  • Data Science and AI Platform: 33%
  • Generative AI (e.g., ChatGPT): 32%
  • Virtualization software (e.g., VMware, Hyper-V): 28%
  • Enterprise reporting platform: 26%
  • Business intelligence tools (e.g., Tableau): 25%
  • Location Intelligence platform: 25%

The right side, labeled "Composition in 3 years’ time" and indicated by orange bars, predicts the following percentages:

  • Customer Relationship Management (CRM) software (e.g., Salesforce): 43%
  • Data Science and AI Platform: 40%
  • Enterprise Resource Planning (ERP) software (e.g., SAP): 39%
  • Generative AI (e.g., ChatGPT): 36%
  • Virtualization software (e.g., VMware, Hyper-V): 32%
  • Enterprise reporting platform: 31%
  • Spreadsheet software (e.g., Microsoft Excel): 30%
  • Location Intelligence platform: 29%
  • Business intelligence tools (e.g., Tableau): 29%

Figure 2 – What is the composition of your data stack currently? And what do you expect your data technology stack to look like in 3 years’ time?

Is now finally the time for the spreadsheet to bite the dust? We asked IT leaders what technologies make up their current data stacks, and it’s what you would expect: the top three were CRM (Customer Relationship Management) software, ERP (Enterprise Resource Planning) software, and spreadsheets.

When we asked about the future makeup of the data stack, we saw some shifts. IT leaders expected the top three elements of the data stack in three years to be CRM software, data science and AI platforms, and ERP software. Spreadsheets dropped from 3rd in the stack to 7th.

What’s driving this? Is the spreadsheet on its way out?

The increase in data science and AI platforms from 4th to 2nd position in the future data stack, along with generative AI’s close 4th position, could partially explain this. IT leaders likely hope these more advanced platforms will allow companies to automate many tedious, manual aspects of managing data — and, perhaps even more significantly, make sophisticated modeling and predictive capabilities more accessible.
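The rank shifts described above can be reproduced from the Figure 2 percentages. The following is a minimal sketch in Python; the shortened category names and the `ranks` helper are illustrative, and ties are assumed to keep the order in which the figure lists them:

```python
# Percentages transcribed from Figure 2, in the order the figure lists them.
current = {
    "CRM": 41, "ERP": 35, "Spreadsheets": 35, "Data science and AI": 33,
    "Generative AI": 32, "Virtualization": 28, "Enterprise reporting": 26,
    "Business intelligence": 25, "Location intelligence": 25,
}
future = {
    "CRM": 43, "Data science and AI": 40, "ERP": 39, "Generative AI": 36,
    "Virtualization": 32, "Enterprise reporting": 31, "Spreadsheets": 30,
    "Location intelligence": 29, "Business intelligence": 29,
}


def ranks(shares):
    """Return a 1-based rank per category, highest percentage first.

    Python's sort is stable, so tied categories keep the figure's order
    (an assumption about how the figure breaks ties).
    """
    ordered = sorted(shares, key=lambda name: -shares[name])
    return {name: position for position, name in enumerate(ordered, start=1)}


current_rank = ranks(current)
future_rank = ranks(future)

# Map each category to its (current rank, rank in three years).
rank_shift = {name: (current_rank[name], future_rank[name]) for name in current}

for name, (now, later) in rank_shift.items():
    print(f"{name}: {now} -> {later}")
```

Sorting by percentage keeps the comparison about relative position in the stack, which is what the ranking claims in the text rely on.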

So, does this mean the end of the spreadsheet? Our data suggests not. Even after falling from 3rd to 7th place in the data stack, spreadsheets are still expected to make up 31% of the data stack in 3 years. It turns out that reports of the death of the spreadsheet are greatly exaggerated. However, teams may well rely less on spreadsheets for sophisticated data integration, manipulation, and analysis as other technologies step in.

How are data stacks built, anyway?

The image is a bar graph titled "Defining the data stack," showing the distribution of four types of data stacks, each with a corresponding percentage:

  • Hybrid data stack (43%): combines modern and legacy technologies.
  • Legacy data stack (30%): on-premises data centers that lack scalability, inflexible architecture, data volume in GB-TB, data generated daily.
  • Transitional data stack (17%): in transition from a legacy data stack to a modern data stack.
  • Modern data stack (10%): cloud-based infrastructure (like a cloud data lake), flexible, modular tools, new data sources like IoT, data generated in real time.

The percentages suggest that hybrid data stacks currently dominate and that a smaller proportion of organizations have fully modern data stacks.

Figure 3 – Would you classify your organization’s data stack as:

How a data stack is built says a lot about a company’s data strategy; a business with a data stack in the cloud will approach its strategies differently than a company managing and maintaining legacy and on-prem systems.

So, what is driving how a data stack is built? It’s a bit of a chicken-and-egg scenario: does strategy build the stack, or does the stack drive the strategy?

The image is a bar chart titled "Top drivers determining the structure of the data stack." It has a dark blue background with bars in shades of blue. Each factor has a bar whose length is proportional to the percentage of respondents who selected it. The factors, in descending order:

  • Existing IT Infrastructure: 22%
  • Data Sources: 22%
  • Technical Expertise: 22%
  • Cost-Benefit Analysis: 21%
  • Business Objectives: 21%
  • User requirements for using/accessing data: 20%
  • Data Volume and Variety: 20%
  • Regulatory and Compliance Requirements: 20%
  • Data Lifecycle: 18%
  • Budget Constraints: 18%
  • Data Latency and Real-time Requirements: 18%
  • Scalability and Growth Plans: 17%
  • Data Culture and Organizational Buy-In: 16%
  • Industry Best Practices: 15%
  • Remaining competitive: 15%
  • Vendor Ecosystem: 12%

Existing IT infrastructure, data sources, and technical expertise are currently seen as the most significant drivers in shaping the structure of a data stack.

Figure 4 – What are the top 3 drivers determining the structure of your organization’s data stack?

According to our data, the top three drivers of a data stack structure are:

  • Existing IT infrastructure (22%)
  • Data sources (22%)
  • Technical expertise (22%)

Interestingly, these three factors outweigh business objectives, which came slightly behind at 21%.

This may help explain why so many companies have hybrid data stacks. Not surprisingly, in the current economic climate, leaders feel that they must work with the infrastructure investments they already have and build strategy from there.

What about new technology investments? The top factors for net-new technology are:

  • Cost (32%)
  • Ease of use (29%)
  • Security and compliance credentials (27%)

To add another dimension of information to this picture, we asked IT leaders what they were looking for in their new technology investments – the top response was improving data quality (23%).

Are data teams set up for AI failure?

Outside of the technology powering data stacks, how data teams are managed and organized, and the operating procedures around data, will significantly affect businesses’ ability to adopt and adapt to generative AI. We asked several questions about how data teams are run, and the results point to a few scenarios that may hinder generative AI adoption.

Who owns data?

When asked who owns data within organizations, the most common answer was the CDO (22%), though the results varied widely and showed no consensus across respondents regarding data ownership.

For 11% of companies, the board of directors was the ultimate owner of the data; for 8% of businesses, senior executives owned the data.

While there’s no correct answer for who should own data, the lack of consensus may be concerning when data access and management are requirements for a successful generative AI implementation.

Who owns the budget?

IT teams, in general, are responsible for data technology budgets, but the reality of how those budgets are allocated and adjusted tells a story that may have made generative AI adoption difficult over the last year.

Over half of businesses (54%) state that budgets are not reviewed or adjusted throughout the year: if other priorities, projects, or spending needs arise after budgets are allocated, the budgets cannot be changed. This must have been a considerable challenge in 2023, when the pressure to adopt generative AI grew rapidly.

Considering how quickly generative AI moved in the last year and how quickly it continues to change, the way IT budgets are allocated and reviewed may need to be revised to allow for innovation.

How are data teams structured?

Two-fifths of companies (41%) do not have a centralized data or analytics function that maintains data as a shared resource for the business; instead, individual departments manage their own data. On its own, this might not be an obstacle to generative AI, but these departments don’t seem to be sharing their data across the company: 48% of respondents stated that data is kept within the department that generates it.

This further emphasizes that data has no clear owner and is isolated in silos. Generative AI delivers its full impact only when it can draw on as much clean, accessible, governed data as possible. It will be interesting to see if this approach changes as generative AI adoption moves across the business.

What is next for IT teams and genAI?

We are still only in the early stages of seeing the impact of generative AI on companies. It will be interesting to see how the fundamental elements of data teams in the enterprise shift, whether it’s how data stacks are managed, what tools are prioritized, or how data teams are structured.

To dig into the full results of the research, download the report, The Data Stack Evolution: Legacy Challenges and AI Opportunities, here.