AI Changes Software Engineering, Part 3: The New Operating Model

AI stops looking like a collection of projects and starts looking like an enterprise capability that must be governed, operated, observed, funded, and evolved over time.

Answer summary

Enterprise AI adoption eventually forces organizations to move from project thinking to capability thinking. Models, prompts, embeddings, retrieval systems, evaluation datasets, agents, and governance policies all require ownership, lifecycle management, observability, evidence, and operational support.

Key takeaways

  • AI adoption eventually becomes an operating-model problem.
  • AI systems introduce new lifecycle objects: models, prompts, embeddings, retrieval systems, evaluation datasets, agents, and policies.
  • Successful organizations spend less time arguing about models and more time building operational foundations.
  • Capability thinking beats project thinking for enterprise AI.
  • AI can generate artifacts, but it does not generate responsibility.

Series navigation: Part 1 - The Bottleneck Has Moved · Part 2 - AI Exposes Enterprise Fault Lines · Part 3 of 3

Continued from Part 2.

The new operating model

A recurring theme in enterprise AI adoption is that organizations tend to begin by thinking about technology and gradually find themselves redesigning operating models instead. The transition is rarely planned and almost never announced; it accumulates. One team introduces a coding assistant, another builds an internal retrieval system, a business unit experiments with an AI-supported workflow, and for a while these efforts look entirely independent of one another. A year later the same organization notices that all of them depend on the same underlying capabilities - governance, data access, platform services, evaluation frameworks, observability, operational ownership, security review, and some process for when a system behaves in a way nobody anticipated. At that point AI stops resembling a collection of projects and begins resembling an enterprise capability.

The distinction matters because capabilities are managed differently from projects. A project has a budget, a timeline, a sponsor, and a set of deliverables, after which it is considered finished. A capability has an operating model, an ownership structure, a support model, a governance framework, and a continuous funding requirement that does not end. The strongest pattern I have observed across organizations making measurable progress with AI is that they eventually stop treating it as an innovation initiative and begin treating it as a platform capability that has to be governed, operated, and evolved over time. The organizations that remain stuck usually remain stuck in project thinking, still funding AI in discrete bursts and still mildly surprised, each time, that the previous burst now needs support. The move from project thinking to capability thinking sounds like a matter of vocabulary, but in practice it reorganizes budgets, reporting lines, and on-call rotations.

Figure: Business, engineering, platform, governance, and operations connected around AI systems as an enterprise capability.

The emerging AI engineering stack

For many years, engineering teams could reason about systems through a reasonably stable set of layers. Applications sat on services, services sat on infrastructure, and infrastructure sat on networks and hardware. The arrangement was rarely simple, but it was familiar, and most operational disciplines were organized around it.

AI introduces additional layers that many organizations are still learning to manage: models, prompts, embeddings, retrieval systems, evaluation datasets, agent frameworks, and the governance policies that bind them together. What makes these layers demanding is not their number but the fact that each carries its own lifecycle. Prompts are revised, embeddings grow stale as the underlying content changes, retrieval quality drifts quietly until someone notices the answers have degraded, models are deprecated and replaced, evaluation datasets need maintenance to stay representative, and agents accumulate dependencies that nobody fully tracks. The consequence is that engineering teams increasingly operate systems with more moving parts than the applications they are accustomed to supporting.
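One way to make these lifecycles visible is to keep an inventory in which every model, prompt, index, and agent has a named owner and a review interval. The following is a minimal sketch under stated assumptions: the `LifecycleAsset` type, the asset names, the team names, and the review intervals are all hypothetical and chosen only for illustration, not taken from any particular tool.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass(frozen=True)
class LifecycleAsset:
    """One governed AI artifact (hypothetical schema for illustration)."""
    name: str
    kind: str                 # e.g. "model", "prompt", "embedding_index", "agent"
    owner: Optional[str]      # None means nobody has accepted ownership
    last_reviewed: date
    review_interval_days: int


def needs_attention(asset: LifecycleAsset, today: date) -> list[str]:
    """Return the reasons an asset should be looked at, or an empty list."""
    reasons = []
    if asset.owner is None:
        reasons.append("no owner")
    if today - asset.last_reviewed > timedelta(days=asset.review_interval_days):
        reasons.append("review overdue")
    return reasons


# Invented example inventory; real entries would come from a registry.
inventory = [
    LifecycleAsset("support-prompt", "prompt", "team-support", date(2025, 5, 20), 30),
    LifecycleAsset("policy-docs-index", "embedding_index", "team-platform", date(2025, 1, 10), 90),
    LifecycleAsset("triage-agent", "agent", None, date(2025, 5, 1), 30),
]

today = date(2025, 6, 1)
report = {asset.name: needs_attention(asset, today) for asset in inventory}
```

The point of the sketch is less the code than the shape of the record: once ownership and review dates are data rather than tribal knowledge, stale embeddings and orphaned agents show up in a report instead of in an incident.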

A retrieval assistant illustrates this well, because it is rarely just an interface connected to a model. In practice it depends on document ingestion pipelines, metadata services, access control systems, vector indexes, evaluation frameworks, observability tooling, and governance policies, each of which can fail or degrade on its own schedule. The architecture expands, and the operational burden expands with it. None of this makes AI impractical. It does make AI more operationally demanding than many early adoption discussions acknowledged, because those discussions tended to consider the model in isolation.

The enterprise pattern I keep seeing

Across the AI programs I have observed over the last two years, one pattern recurs consistently: the organizations that make progress tend to spend less time arguing about models and more time building the operational foundations around them, while the organizations that struggle tend to invert that emphasis. This is not because models are unimportant - they obviously matter - but because model quality rarely remains the dominant constraint beyond the first few months. Once a system is in real use, the conversation reliably migrates toward a different set of questions: how outputs are evaluated consistently, who owns the system, which datasets are permitted to participate in retrieval, how the system is supported operationally, how its behaviour can be explained after the fact, and how far an agent may act on its own.

What I find striking is how similar these questions look across industries. The regulatory language differs between a bank, an insurer, a pharmaceutical company, and a public-sector organization, but the architectural concerns underneath are remarkably alike. This is one reason I have come to think of AI adoption as resembling organizational learning more than technology deployment. Organizations are not really installing new software. They are learning, often for the first time, how to operate a category of system whose behaviour is not fixed at release.

What I would do Monday morning

If I were asked to review an AI-enabled engineering organization today, I would spend considerably less time evaluating model catalogs and considerably more time examining how the surrounding system behaves. Seven questions tend to reveal more than any benchmark comparison.

1. Which bottlenecks have genuinely disappeared?

Many organizations assume their delivery has improved because coding has accelerated. The more useful question is whether the delivery system as a whole has accelerated, or whether the constraint has simply relocated. If implementation is faster but releases are not, the bottleneck has moved somewhere less visible - usually into review, integration, or decision-making - and it is worth knowing exactly where.
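The difference between faster coding and faster delivery can be checked with simple stage-level arithmetic. The sketch below assumes lead time can be split into four stages measured in days; the stage names and every number are invented purely for illustration and do not come from any measured organization.

```python
# Hypothetical average days spent per stage for one change, before and
# after introducing AI-assisted coding. All figures are illustrative.
before = {"implementation": 5.0, "review": 2.0, "integration": 1.5, "release_approval": 1.5}
after = {"implementation": 2.0, "review": 4.5, "integration": 2.0, "release_approval": 1.5}


def total_lead_time(stages: dict[str, float]) -> float:
    """End-to-end lead time is the sum of the stage durations."""
    return sum(stages.values())


def bottleneck(stages: dict[str, float]) -> str:
    """The stage consuming the most time is the current constraint."""
    return max(stages, key=stages.get)


lead_before = total_lead_time(before)
lead_after = total_lead_time(after)
moved_from, moved_to = bottleneck(before), bottleneck(after)
```

In this invented case implementation time falls from 5 days to 2, yet total lead time stays at 10 days because review and integration absorbed the difference. The constraint moved from implementation to review, which is exactly the situation a coding-speed metric alone would hide.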

2. Which bottlenecks have become more visible?

AI tends to expose weaknesses that already existed: data quality problems, governance gaps, ownership ambiguity, architectural inconsistency. Mapping those weaknesses honestly often creates more value than another round of model comparison, because they are the constraints that will limit every future system rather than only the current one.

3. Can we evaluate behaviour systematically?

Most organizations can demonstrate outputs. Far fewer can explain how quality is measured, or defend that measurement to someone skeptical. Evaluation is quietly becoming one of the more important engineering disciplines in AI-enabled environments, and its absence is usually easier to see in a steering committee than in a demo.
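A systematic evaluation does not need to start large. The sketch below assumes a small, maintained set of cases and a deliberately crude quality criterion (the answer must contain an expected phrase). The `EvalCase` type, the `stub_system` function standing in for a real assistant, the questions, and the 0.9 threshold are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class EvalCase:
    question: str
    must_contain: str  # crude criterion: expected phrase in the answer


def evaluate(system: Callable[[str], str], cases: list[EvalCase], threshold: float) -> dict:
    """Run every case and report the pass rate against an agreed threshold."""
    failed = [
        case.question
        for case in cases
        if case.must_contain.lower() not in system(case.question).lower()
    ]
    pass_rate = (len(cases) - len(failed)) / len(cases)
    return {
        "pass_rate": pass_rate,
        "meets_threshold": pass_rate >= threshold,
        "failed": failed,
    }


# Stand-in for the real system under test (assumption for illustration).
CANNED_ANSWERS = {
    "What is the refund window?": "Refunds are accepted within 30 days.",
    "Who approves exceptions?": "Exceptions are approved by the team lead.",
    "Which regions are supported?": "We support the EU and Switzerland.",
}


def stub_system(question: str) -> str:
    return CANNED_ANSWERS.get(question, "I do not know.")


cases = [
    EvalCase("What is the refund window?", "30 days"),
    EvalCase("Who approves exceptions?", "team lead"),
    EvalCase("Which regions are supported?", "EU"),
    EvalCase("What is the escalation path?", "on-call"),
]

result = evaluate(stub_system, cases, threshold=0.9)
```

Even a harness this simple changes the steering-committee conversation, because it produces a number, a threshold someone agreed to, and a list of failures that can be argued about specifically.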

4. Can we reconstruct what happened?

If a regulator, an auditor, a customer, or an executive asks why a system behaved a particular way, can the organization answer with confidence rather than improvising an explanation afterward? The answer depends on lineage, observability, governance, and evidence generation working together, and in regulated environments it is frequently the question that decides whether a system is allowed to stay in production.
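Reconstruction becomes possible only if the evidence is captured at the moment of the decision. As a minimal sketch, assuming an append-only log of JSON lines, each response could be recorded with the versions and inputs that produced it. The `DecisionRecord` fields, the identifiers, and the in-memory list standing in for durable storage are assumptions for illustration.

```python
import hashlib
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass(frozen=True)
class DecisionRecord:
    """Evidence captured per response (hypothetical schema)."""
    request_id: str
    model_version: str
    prompt_version: str
    retrieved_doc_ids: tuple[str, ...]
    output_sha256: str  # hash of the output, so the text can be verified later


def record_decision(log: list[str], request_id: str, model_version: str,
                    prompt_version: str, retrieved_doc_ids: tuple[str, ...],
                    output: str) -> DecisionRecord:
    """Append one evidence record to the log as a JSON line."""
    record = DecisionRecord(
        request_id=request_id,
        model_version=model_version,
        prompt_version=prompt_version,
        retrieved_doc_ids=retrieved_doc_ids,
        output_sha256=hashlib.sha256(output.encode("utf-8")).hexdigest(),
    )
    log.append(json.dumps(asdict(record), sort_keys=True))
    return record


def reconstruct(log: list[str], request_id: str) -> Optional[dict]:
    """Find the evidence for a given request, or None if none was captured."""
    for line in log:
        entry = json.loads(line)
        if entry["request_id"] == request_id:
            return entry
    return None


audit_log: list[str] = []  # stands in for durable, append-only storage
record_decision(audit_log, "req-001", "model-2025-03", "prompt-v7",
                ("doc-17", "doc-42"), "The policy allows a 30-day window.")
evidence = reconstruct(audit_log, "req-001")
missing = reconstruct(audit_log, "req-999")
```

The design choice worth noting is that the record names versions rather than copying content: the answer to "why did it say that" then depends on the model, prompt, and document versions themselves being retained, which is a lifecycle question as much as a logging one.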

5. Who owns the operational lifecycle?

Ownership should be explicit. Models, prompts, retrieval systems, and agents all require continuing support, and where ownership is left implicit it tends to resolve itself at the worst possible moment - during an incident, when the question of who is responsible stops being theoretical. The pattern I see most often is that a system is built by a project team that disbands before anyone has agreed who operates it afterward, and the gap goes unnoticed until the first time something breaks.

6. Where does governance execute?

Governance that lives only in documents drifts away from the running system over time. The more revealing question is whether governance is visible in the system itself - whether policy enforcement, access restrictions, and permissions are actually executing at runtime, or whether they exist only as approved intentions that nobody has reconciled against production in months.
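What "executing at runtime" means can be shown with a small example: a policy check that sits in the retrieval path and filters documents before they ever reach the model. This is a sketch under stated assumptions; the roles, the classification labels, the `POLICY` table, and the `enforce_policy` function are hypothetical, and a real system would typically load the policy from a governed source rather than hard-code it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Document:
    doc_id: str
    classification: str  # e.g. "public", "internal", "restricted"


# Hypothetical policy: which classifications each role may retrieve.
POLICY: dict[str, set[str]] = {
    "analyst": {"public", "internal"},
    "contractor": {"public"},
}


def enforce_policy(role: str, candidates: list[Document]) -> tuple[list[Document], list[str]]:
    """Filter retrieval candidates by role; unknown roles are denied everything.

    Returns the permitted documents and the ids of the denied ones, so the
    denial itself can be logged as evidence.
    """
    allowed = POLICY.get(role, set())
    permitted = [doc for doc in candidates if doc.classification in allowed]
    denied = [doc.doc_id for doc in candidates if doc.classification not in allowed]
    return permitted, denied


candidates = [
    Document("doc-1", "public"),
    Document("doc-2", "internal"),
    Document("doc-3", "restricted"),
]

analyst_docs, analyst_denied = enforce_policy("analyst", candidates)
contractor_docs, contractor_denied = enforce_policy("contractor", candidates)
unknown_docs, unknown_denied = enforce_policy("intern", candidates)
```

Because the check runs on every request and defaults to denial, the policy cannot silently drift away from production in the way a document can; if the rule changes, the running behaviour changes with it.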

7. Which engineering disciplines are becoming more important?

This question often reveals more than any productivity metric. Organizations that look closely tend to find that architecture, testing, observability, governance, and operational ownership are appreciating in value as AI adoption widens, which is to say the disciplines that surround code are becoming more valuable faster than the act of writing code itself.

The tradeoff most organizations discover late

The most common misconception in enterprise AI is that automation, by itself, simplifies software delivery. In practice, AI tends to redistribute complexity rather than remove it. Coding effort falls while evaluation effort rises; prototypes become trivial to produce while operational ownership becomes more demanding; implementation accelerates while governance expands to cover categories of risk that did not previously exist. This redistribution is neither good nor bad in itself. It is simply what happens when a powerful new capability is introduced into an already complex enterprise environment.

The organizations that adapt well tend to recognize this early and move their attention accordingly. The ones that struggle often continue to measure success through implementation speed - the metric that improved most obviously - while underestimating governance, evaluation, and operational readiness, which are precisely where the displaced effort went.

Closing reflection

The most interesting consequence of AI may not be that software becomes easier to create. Software has been getting easier to create for decades. Compilers, high-level frameworks, cloud platforms, managed services, and low-code tools each reduced the effort required to produce a working artifact, and AI continues that long trajectory rather than breaking from it. What makes the current moment worth attention is less the falling cost of producing software and more the way that fall changes the relative importance of the disciplines around it.

Architecture becomes more consequential precisely because systems can be created more quickly and mistakes propagate further. Testing becomes harder because behaviour is now probabilistic rather than deterministic. Observability expands because organizations need to understand not only whether a system is healthy but why it produced a particular result. Governance moves closer to runtime because AI systems continue making decisions long after they are deployed. Each of these shifts points in the same direction: the work surrounding code is becoming more demanding as the code itself becomes cheaper.

Through all of it, one requirement stays in place. Organizations still need people capable of exercising judgment - people who can decide what should be built, which risks are acceptable, which tradeoffs make sense, and how accountability should be assigned. AI can generate code, tests, documentation, and even a passable architecture diagram. It does not generate the responsibility for any of them, and in regulated environments responsibility is rarely something that can be delegated to a system that cannot itself be held accountable.

The organizations that benefit most from AI will probably not be the ones that automate coding first. They are more likely to be the ones that learn to govern, evaluate, observe, and operate AI-enabled systems with the same seriousness they once reserved for their systems of record. As the cost of producing software continues to fall, the value of deciding what to produce - and of standing behind that decision afterward - only becomes harder to ignore.

Start the series: Part 1 - The Bottleneck Has Moved.

References

  1. DORA: State of AI-assisted Software Development 2025 — Research on AI-assisted software development, delivery performance, organizational capabilities, and the amplifier effect of AI.
  2. arXiv: The Impact of AI on Developer Productivity - Evidence from GitHub Copilot — Controlled study reporting a 55.8 percent faster task-completion result for GitHub Copilot users.
  3. GitHub: Quantifying Copilot's impact in the enterprise with Accenture — Enterprise research context for AI-assisted coding adoption, satisfaction, and productivity variation.
  4. TechRadar Pro: Observability was built for humans. AI agents need something different — Analysis of why AI agents create new observability requirements beyond traditional human-centric telemetry.
  5. arXiv: AI Assurance - A Comprehensive Testing Strategy for Enterprise AI Systems — Research framing enterprise AI testing as continuous assurance and risk reduction rather than deterministic verification.
  6. arXiv: Cryptographic Runtime Governance for Autonomous AI Systems — Runtime-governance architecture research treating policy and legal constraints as execution conditions.

Author

Géza Kuti is a senior Data and AI executive based in Bülach (ZH), Switzerland, focused on data strategy, enterprise architecture, AI governance, hybrid cloud, and regulated delivery.
