Data readiness isn’t new, but AI demands more from it
It’s tempting to think of AI as a new frontier requiring new rules. But in many ways, the principles of data readiness remain unchanged. Clean, well-structured, and well-documented data has always been critical to deriving insights. The difference now is that machines, rather than humans, are increasingly consuming and acting on that data, often in opaque or probabilistic ways.
In a recent Alter Everything podcast, data orchestration expert Nick Schrock, CTO and founder of Dagster Labs, joined the discussion to explore what it truly means to prepare data for AI. Schrock offered practical insights into how organizations can prepare AI-ready data, overcome challenges in context engineering, and establish effective governance for AI-driven data workflows.
The rise of context engineering
Preparing data for machine consumption is where the concept of context engineering comes in.
For years, “prompt engineering” referred to the art of crafting the perfect input for a chatbot. But as Schrock explains, enterprise AI requires something more sophisticated: orchestrating the right context, to the right model, at the right time.
This is the evolution from simple prompting to true engineering. Instead of relying on ad hoc prompts, organizations must design systems that manage data context as an intentional, reusable asset.
Schrock warns that more context isn’t always better. Providing too much information, or conflicting information, can lead to confusion and hallucinations. He also highlights the problem of context rot, where outdated or irrelevant data accumulates, degrading performance over time.
Success depends on precision: curating relevant, high-quality context and delivering it efficiently to the model. For many organizations, that’s a new discipline and a new engineering challenge.
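The curation step Schrock describes can be illustrated with a minimal sketch. This is not Dagster's implementation; it is a hypothetical example assuming context items carry a relevance score and a last-updated timestamp, with a freshness filter as one simple guard against context rot and a character budget as a stand-in for a token limit:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass
class ContextItem:
    text: str
    relevance: float        # 0.0-1.0, e.g. from a retrieval score
    updated_at: datetime

def curate_context(items, max_chars=2000, max_age_days=90, min_relevance=0.5):
    """Keep only fresh, relevant items, highest relevance first,
    within a fixed character budget -- one way to guard against
    both context overload and context rot."""
    cutoff = datetime.now() - timedelta(days=max_age_days)
    fresh = [i for i in items
             if i.updated_at >= cutoff and i.relevance >= min_relevance]
    fresh.sort(key=lambda i: i.relevance, reverse=True)
    selected, used = [], 0
    for item in fresh:
        if used + len(item.text) > max_chars:
            continue  # skip items that would blow the budget
        selected.append(item)
        used += len(item.text)
    return "\n".join(i.text for i in selected)
```

The thresholds and scoring here are placeholders; the point is that what reaches the model is a deliberate, filtered selection rather than everything available.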
Governance in AI data workflows
Data governance has always been essential, but AI raises the stakes. Governance now extends beyond compliance and data security; it’s about trust, explainability, and control in a world where AI can generate and modify data on the fly.
Schrock emphasizes the need for guardrails that define how AI operates within data pipelines. At Dagster Labs, his team designs abstractions that confine AI operations to smaller, modular units, preventing what he calls technical debt superspreading. Without such boundaries, AI tools can replicate poor patterns across an entire codebase, compounding errors rather than resolving them. Some of Schrock's key recommendations include:
- Treating prompts and metadata like code. They should be version-controlled, reviewable, and reversible.
- Compartmentalizing AI operations. Limit where and how AI interacts with data pipelines to maintain oversight.
- Establishing model observability. Monitor AI outputs through evaluations to ensure consistent performance and detect when quality drifts.
As Schrock puts it, model observability is still an undiscovered country. Few organizations fully understand why models behave as they do. But introducing governance frameworks and versioning practices helps demystify AI systems and lays the groundwork for accountability.
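The first and third recommendations above can be sketched in a few lines. This is a hypothetical illustration, not a real registry or evaluation framework: a prompt is stored with a version and a content hash so changes are reviewable and reversible, and a simple evaluation gate scores outputs against checks to flag quality drift:

```python
import hashlib

def register_prompt(registry, name, version, template):
    """Treat prompts like code: store each version with a content
    hash so any change is visible in review and easy to roll back."""
    digest = hashlib.sha256(template.encode()).hexdigest()[:12]
    registry.setdefault(name, {})[version] = {"template": template, "sha": digest}
    return digest

def passes_eval(outputs, checks, threshold=0.9):
    """A minimal observability gate: score model outputs against
    per-output checks and fail when quality drops below a threshold."""
    scores = [1.0 if check(out) else 0.0 for out, check in zip(outputs, checks)]
    return sum(scores) / len(scores) >= threshold
```

In practice the registry would live in version control and the checks would be a real evaluation suite, but the principle is the same: prompts and their quality criteria are managed artifacts, not ad hoc strings.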
Balancing speed and quality
Generative AI tools have lowered the cost of experimentation, enabling teams to prototype solutions faster than ever. But rapid iteration can also create fragile systems that collapse under real-world conditions.
Schrock describes this tension as the skyscraper problem: AI makes it easy to build tall, but not necessarily stable. Without strong foundations, organizations risk scaling instability rather than innovation.
To move fast and build reliably, enterprises must embrace phased delivery. Early prototypes are valuable for learning, but before scaling, teams need to invest in clean data models, pipeline validation, and evaluation mechanisms that ensure consistency over time. AI’s promise of speed should be used to accelerate learning, not to bypass the discipline of data engineering.
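A validation gate of the kind described above can be very small. As a hedged sketch (the field names and tolerance are hypothetical), a prototype pipeline might refuse promotion until its output rows meet basic completeness checks:

```python
def validate_rows(rows, required_fields, max_null_rate=0.01):
    """Before promoting a prototype pipeline, confirm that required
    fields are present and the null rate stays within tolerance."""
    if not rows:
        return False  # an empty output should never pass the gate
    missing = sum(
        1 for row in rows
        if any(row.get(f) is None for f in required_fields)
    )
    return missing / len(rows) <= max_null_rate
```

Even a check this simple changes the dynamic: the prototype stays fast to build, but it cannot scale until the data behind it holds up.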
The future of AI and data engineering
For all the disruption AI promises, Schrock sees it as a catalyst and not a replacement for the field of data engineering. “I’ve never been more bullish on data engineering,” he says. “The underlying value of these systems can only be exploited with good engineering.”
AI also has the potential to improve collaboration across business and technical teams. Schrock describes a use case where his team uses a Slack bot to translate natural language requests into SQL queries. The result is a shared space where non-technical stakeholders can express what they need in plain English, and data engineers can see exactly how those requests translate into database queries.
This kind of AI-enabled collaboration bridges communication gaps and accelerates problem-solving. Business users speak in their own domain language, while engineers gain visibility into how that language maps to the data model — a mutual learning process that drives better outcomes.
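The article does not describe how the bot is built, but the flow can be sketched. In this hypothetical version, `translate` stands in for the model call that turns plain English into SQL, and a read-only guardrail reflects the governance theme above; both the names and the guardrail are assumptions, not Schrock's implementation:

```python
def handle_slack_request(question, translate, run_query):
    """Hypothetical natural-language-to-SQL flow: translate the
    question, enforce a read-only guardrail, and return the SQL
    alongside the results so engineers can see the mapping."""
    sql = translate(question)  # e.g. a call to an LLM
    if not sql.lstrip().lower().startswith("select"):
        raise ValueError("only read-only queries are allowed")
    rows = run_query(sql)
    return {"question": question, "sql": sql, "rows": rows}
```

Returning the generated SQL with the answer is the key design choice: the query itself becomes the shared artifact that both business users and engineers can inspect.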
AI could also transform legacy infrastructure. With the ability to rewrite and migrate code efficiently, organizations can modernize decades-old systems faster than ever. But again, this potential depends on one thing: AI-ready data.
The path to AI value
As AI hype continues to swell, the companies that succeed won't be the ones with the flashiest demos; they'll be the ones with the strongest data foundations. AI data readiness is about engineering systems that can adapt, scale, and deliver value in a world where machines and humans increasingly share the wheel.
Now is the time to invest in the less glamorous, but more essential, parts of your AI strategy: data quality, context engineering, governance frameworks, and cross-functional fluency.