The AI SDLC: what good looks like

How AI is reshaping the software development lifecycle (SDLC), and what it takes to get it right

That was then, this is now

What good looks like

AI is now capable of participating across the entire software delivery process, from initial research through to testing and review. This calls for a structured and careful approach: a tool that can change software at speed is potentially a great asset, but it can also be a dangerous liability, especially if left unsupervised.

The waterfall model treated software like a building: specify it completely, then construct it. Agile replaced that with iterative cycles and fast feedback. DevOps extended the loop further, integrating operations into development through continuous integration, continuous delivery, and a focus on flow. Each shift tightened the feedback loop and shortened the cycle time.

AI agents are the next iteration. They participate – or have the potential to participate – across the entire lifecycle, including in operations, and they change the economics of time and cost at every stage. This is the AI SDLC: a way of organising software delivery that accounts for AI as a first-class participant, with all the discipline that implies.

By “AI agent”, we mean an LLM-based system that can autonomously take a goal, break it into steps, use tools and context to execute those steps, and iterate on the results. This is distinct from autocomplete or chat-based assistants, and it applies across the lifecycle: agents that write code, agents that produce prototypes, agents that synthesise research, agents that review.

AI agents are a big (and very new) iteration on the way we build software, and there is currently a range of approaches. This guide looks at these approaches and shows how we’ve responded to the challenge at Softwire.

The amplifier effect: good engineering, only more so

What good looks like

AI makes good teams faster and struggling teams worse. If your engineering foundations are sound (reliable testing, frequent and automated deployments, clear standards), well-deployed AI will accelerate delivery. If they are weak, AI will produce more defects, faster, and with misplaced confidence.

Most organisations start by adopting AI as a coding assistant. That captures real value, but it also reveals something important: AI is an amplifier. It magnifies whatever practices are already in place.

For an individual developer with strong habits (clear naming, good test discipline, careful commit hygiene), an AI agent accelerates output while preserving quality. For a developer who cuts corners, AI will cut those same corners faster and more confidently.

The same dynamic applies at the team level. A team with mature CI/CD, well-maintained documentation, and clear architectural boundaries will find that AI slots in quite naturally. A team with inconsistent practices, unclear ownership, or patchy test coverage will find AI makes those problems more visible and more expensive.

The AI SDLC is everyone’s responsibility

This is why the AI SDLC is a team-level concern, not an individual productivity tool. Many of the largest gains come from transforming shared processes: how work is specified, how changes are reviewed, how quality is validated. AI works well on a one-to-one basis almost by default. Making it work across a team requires deliberate design, and there are interesting implications to work through.

When individual developers can produce working code far faster than before, new bottlenecks appear at review and at points of integration, and shared human understanding of the codebase and system as a whole becomes a major challenge.

An AI agent has no intuition about whether its output works. The only way to know it works is through automated validation: test coverage, CI/CD maturity, deployment frequency, change-failure rates. These were always the hallmarks of a well-run engineering organisation, and with AI writing code, they have become load-bearing foundations of an effective SDLC.

Selective manual inspection, by contrast, won’t be able to keep up with the output of AI (this is distinct from a team’s judgement on code quality, which remains vital). Good pipelines were always automated, but automation is now essential.

At Softwire

We run AI adoption assessments that evaluate a client team's existing engineering maturity before introducing AI tooling. Introducing agents into a team with weak foundations tends to accelerate the accumulation of technical debt. We typically start with a baseline review of test coverage, CI/CD pipeline health, documentation quality, and architectural clarity, then build an adoption plan that addresses gaps before (or alongside) AI rollout.

Equally important is the team's engineering judgement: the ability to review AI-generated code critically and efficiently, to recognise when a solution is over-engineered, and to reject output that passes tests but misses the point. Automated process and tooling support this (and are essential given the required pace), but cannot replace a team’s acquired judgement, which we work to preserve and exercise.

We treat DORA metrics and deployment frequency as leading indicators of AI readiness. Teams that can already deploy frequently with low change-failure rates have the feedback infrastructure to catch AI-generated defects early. Teams that deploy infrequently or lack automated rollback tend to need pipeline improvements before AI agents can safely contribute to production code paths.
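As a rough illustration of the two indicators mentioned above, deployment frequency and change-failure rate can be computed from a simple log of production deployments. This is a minimal sketch: the `Deployment` record and its fields are hypothetical, and in practice these figures are usually derived from CI/CD and incident-management data rather than a hand-built list.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Deployment:
    # Hypothetical record: one production deployment, and whether it led to a
    # failure that needed remediation (rollback, hotfix, incident).
    day: date
    caused_failure: bool


def deployment_frequency(deploys: list[Deployment], period_days: int) -> float:
    """Average number of production deployments per day over the period."""
    return len(deploys) / period_days


def change_failure_rate(deploys: list[Deployment]) -> float:
    """Fraction of deployments that caused a failure in production."""
    if not deploys:
        return 0.0
    failures = sum(1 for d in deploys if d.caused_failure)
    return failures / len(deploys)


if __name__ == "__main__":
    log = [
        Deployment(date(2026, 6, 1), False),
        Deployment(date(2026, 6, 2), True),
        Deployment(date(2026, 6, 3), False),
        Deployment(date(2026, 6, 4), False),
    ]
    print(deployment_frequency(log, period_days=7))
    print(change_failure_rate(log))  # 0.25
```

Tracking these two numbers before and after introducing agents gives an early signal of whether AI-generated changes are reaching production safely.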

AI in scoping and design

What good looks like

AI can cut the cost and time of early-stage product work: turning rough ideas into working prototypes in hours, and processing user research at speed. This means you can test and validate more before committing to a full build, reducing the risk of expensive mid-project course corrections.

Significant potential gains come from involving AI at the scoping and design stages, where ambiguity is highest and later rework has expensive consequences.

AI reduces the cost of producing a realistic prototype. Designers, working independently of developers, can rapidly go from a rough wireframe to a working interactive version (which is already in code and directly amenable to further rapid AI-executed iteration). This shift changes the economics of discover-and-define user research, because you can run more iteration cycles, starting earlier, with higher fidelity.

Research synthesis (e.g. interview transcripts, survey coding, affinity mapping) that previously took days can be done in hours. As prototypes are expressed directly in code, there can be a closer, more collaborative and iterative link between design and development – less “coding” than “product engineering”. It is vital, however, that appropriate data safeguards are maintained.

Moving beyond the prototype into build, you need AI-native structured design artefacts: a coherent set of broad specification and task-planning documents, along with system design and architecture decision records that live – crucially – in the repository alongside the code.

These artefacts serve a dual purpose: they are useful documentation for humans, and they provide context for AI agents working downstream. AI-friendly modern project management tooling (vs. legacy systems with poor API surfaces) helps close the loop between planning and execution.

At Softwire

We use spec-driven development on AI-enabled projects: a structured specification document (typically a SPEC.md in the repo root) is produced and reviewed before any code is written. AI agents participate in drafting and refining the spec and planning tasks, but a human signs it off. This ensures difficult questions and ambiguities are resolved before implementation starts, rather than surfacing later and disrupting the build.

Harness engineering

What good looks like

Engineers working with AI spend less time writing code and more time defining what the AI should and should not do: setting boundaries, encoding standards, and checking output. This “stewardship” role is a new discipline, and getting it right determines whether AI-generated work is consistent and trustworthy or erratic and expensive to fix.

When an AI agent writes code, the engineer's role shifts from authorship to stewardship. We call this “harness engineering”: the discipline of defining and enforcing the invariants of a codebase in a world where agents are active participants.

This dynamic works in two directions:

Constraints, context, and instructions shape what agents produce.
Agents, in turn, can enforce architectural and quality standards over time, for example by periodically reviewing the codebase against architecture decision records and flagging drift.

The term is deliberate and evocative – a harness both enables and limits. The engineer's job is to define the space in which the agent can operate effectively, and to ensure it cannot operate outside that space.

Repository-level instructions (AGENTS.md or equivalent) encode architectural decisions, code style rules, and boundaries. Linters, static analysis, complexity thresholds, test coverage metrics, and deterministic validation gates provide hard constraints that the agent cannot override. Reusable prompt libraries and role-specific skills give agents focused expertise without bloating a single configuration file.
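A deterministic gate of this kind can be very small. The sketch below assumes a hypothetical `metrics` dictionary produced earlier in the pipeline (for example, parsed from a coverage report and a complexity analyser) and illustrative thresholds; because the decision is plain code, the agent cannot argue its way past it.

```python
# Illustrative thresholds; each team would choose its own.
MIN_LINE_COVERAGE = 80.0  # percent
MAX_COMPLEXITY = 10       # highest cyclomatic complexity allowed in any function


def check_gates(metrics: dict[str, float]) -> list[str]:
    """Return human-readable gate failures; an empty list means every gate passed."""
    failures = []
    coverage = metrics["line_coverage"]
    if coverage < MIN_LINE_COVERAGE:
        failures.append(f"coverage {coverage:.1f}% is below {MIN_LINE_COVERAGE:.1f}%")
    complexity = metrics["max_complexity"]
    if complexity > MAX_COMPLEXITY:
        failures.append(f"max complexity {complexity} exceeds {MAX_COMPLEXITY}")
    return failures


def gate_exit_code(metrics: dict[str, float]) -> int:
    """Print any failures and return a process exit code (1 = fail the CI job).

    In a CI script, pass the result to sys.exit() so a failure stops the pipeline.
    """
    problems = check_gates(metrics)
    for problem in problems:
        print(f"GATE FAILED: {problem}")
    return 1 if problems else 0


if __name__ == "__main__":
    # Hypothetical measurements; a real pipeline would parse these from tool output.
    print(gate_exit_code({"line_coverage": 76.5, "max_complexity": 12}))
```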

Good examples matter more than they used to. Well-written tests, clear function signatures, and consistent library usage in the existing codebase all function as in-context learning material. The quality of a codebase directly determines the quality of AI-generated contributions to it.

In practice, harness engineering extends beyond configuration files. A mature setup includes project-specific documentation for different areas of the codebase, connections to automation tools (e.g. Playwright) for visual feedback, and sometimes a project-specific server that lets agents interact with the application under test.

As agent capabilities improve over time, orchestration becomes a further concern: managing pipelines of multiple agents with different roles and coordinating their output. AI capability also shapes technology choices; strongly typed (especially compiled) languages and well-documented platforms tend to produce better agent output, which is worth factoring into early architectural decisions.

At Softwire

We maintain an internal repository of reusable AI skills that encode best-practice approaches to common engineering tasks. These act as composable role definitions: rather than overloading a single agent configuration with every concern, engineers attach the relevant skills for a given task. This keeps agent context focused and behaviour predictable. We also build per-project harness libraries: not a single AGENTS.md file, but a structured set of documentation, skills, and tool configurations tailored to the project's architecture and the specific agents operating within it.
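One way to picture composable skills is as small, separate instruction documents that are added to an agent’s context only when a task needs them. The sketch below is hypothetical throughout: the skill names, their contents and the `build_context` helper are illustrative and do not describe Softwire’s internal format.

```python
# Hypothetical skill library: each entry is a short, focused instruction set.
SKILLS: dict[str, str] = {
    "base": "Follow the conventions in AGENTS.md. Keep changes small.",
    "api-handler": "Write one thin handler per endpoint; pass dependencies explicitly.",
    "test-writing": "Every test must assert on observable behaviour, not internals.",
    "db-migration": "Migrations must be reversible and reviewed by a human.",
}


def build_context(task: str, skill_names: list[str]) -> str:
    """Assemble agent instructions from the base skill plus task-specific ones.

    Duplicates are dropped while preserving order. Unknown skill names raise
    KeyError, so a typo fails loudly instead of silently dropping an instruction.
    """
    selected = list(dict.fromkeys(["base", *skill_names]))
    sections = [f"## Skill: {name}\n{SKILLS[name]}" for name in selected]
    return f"# Task\n{task}\n\n" + "\n\n".join(sections)


if __name__ == "__main__":
    print(build_context("Add a GET /orders endpoint", ["api-handler", "test-writing"]))
```

Only the skills named for the task reach the agent, which is the property the paragraph above describes: focused context rather than one ever-growing configuration file.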

Pairing, aligning, and the team interface

What good looks like

AI makes individual developers more productive, but risks fragmenting the team. When each person works with their own AI in isolation, shared understanding of the system erodes. Teams need deliberate practices to stay aligned on design direction and collectively own the codebase.

XP practices like pair programming and mob programming were designed to spread knowledge and catch errors through continuous human review. AI changes the shape of these practices without eliminating the need for them.

A developer working with an AI agent is already in a form of pairing, where the agent proposes and the human evaluates. The risk is that this becomes an isolated loop: one developer and one agent, producing work that nobody else has context on². Periodic mob sessions (where a team works through an AI-generated changeset together) can counter this by rebuilding shared understanding that individual AI-assisted work tends to erode.

The broader point is that AI makes individual developers more autonomous, which is a benefit until it fragments the team’s shared mental model. Practices that maintain collective ownership of the codebase, whether through mob reviews, rotating agent-configuration ownership, or shared AGENTS.md governance, become more important as individual throughput increases.

At Softwire

We hold periodic alignment sessions on AI-heavy projects, where the team reviews recent architectural changes and agent-generated patterns together. The purpose is not line-by-line code review (which is better handled asynchronously or by automated gates) but maintaining a shared understanding of how the codebase is evolving and whether the agent's output is drifting from the intended design. We also rotate ownership of AGENTS.md and our internal skill repository configurations so that harness engineering does not become siloed knowledge.

Security and simplicity

What good looks like

AI introduces new security risks that cannot be managed by policy alone; they require architectural decisions about what the AI (and the systems it helps build) is permitted to access and do. Simpler systems are easier to secure and easier to verify; complexity is the enemy of confidence.

AI agents introduce specific security risks that require architectural (not procedural) solutions. Grant the agent ‘write’ (or even just ‘read’) access only to what it needs. Enforce infrastructure boundaries (narrow IAM policies, sandboxed execution environments, isolated dev machines) so that a compromised or careless agent cannot cause lateral damage.
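At the tool level, the same least-privilege principle can be sketched as a wrapper that only lets an agent read files beneath one allowlisted directory. This is an illustrative assumption about how an agent’s file tool might be wrapped (it needs Python 3.9+ for `Path.is_relative_to`), and it complements, rather than replaces, OS-level sandboxing and narrow IAM policies.

```python
import tempfile
from pathlib import Path


class ScopedReader:
    """Read-only file access confined to a single allowlisted directory."""

    def __init__(self, root: str) -> None:
        # Resolve once so symlinks and ".." segments in the root are normalised.
        self.root = Path(root).resolve()

    def _checked(self, relative_path: str) -> Path:
        # Resolve the requested path (following symlinks) and refuse anything
        # that lands outside the root, such as "../../etc/passwd".
        target = (self.root / relative_path).resolve()
        if not target.is_relative_to(self.root):
            raise PermissionError(f"access outside {self.root} denied: {relative_path}")
        return target

    def read_text(self, relative_path: str) -> str:
        # There is deliberately no write method: this tool grants read access only.
        return self._checked(relative_path).read_text(encoding="utf-8")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        Path(workdir, "README.md").write_text("project notes", encoding="utf-8")
        reader = ScopedReader(workdir)
        print(reader.read_text("README.md"))
        try:
            reader.read_text("../secrets.env")
        except PermissionError as err:
            print(err)
```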

The same principle applies to the systems you build, not only the tools you build with. When AI agents contribute to production code, architectural controls become the primary line of defence: strong authentication and authorisation boundaries at every API surface, infrastructure-as-code with least-privilege defaults, and separation of concerns that limits the blast radius of any single defect. If the architecture constrains what is possible, individual code errors (whether human or AI-generated) are less likely to produce security failures.

Sandboxing exists on a spectrum, from lightweight process-level containers through to full virtual machine isolation with near-instant startup times. At the lighter end, application-level sandboxes can restrict what a single process is allowed to do without virtualisation overhead. The right level of isolation depends on the threat model and the acceptable friction for developers. The point is that “just run it in a container” is no longer a sufficient default when the agent has broad access to your codebase and infrastructure.

Supply chain risk has been increasing for some time; AI is not the only cause, but it has exacerbated the problem. At the same time, AI makes it cheap to build small utilities from scratch, which may be preferable to pulling in third-party dependencies with their own attack surface. Every additional dependency increases both the maintenance burden and the probability of a cascading failure, so it’s worth being highly selective about what you import, reviewing dependencies frequently, and being ready to test and promote dependency updates at speed.

Beware rampant complexity

Behind all of this sits a renewed case for simplicity³. AI has a tendency toward rampant complexity: over-abstracted class hierarchies, unnecessary indirection, clever patterns nobody asked for. An engineer reviewing AI-generated code can only convince themselves it is correct if the system is easy to reason about. Thin, single-purpose handlers (rather than large controllers with shared state and caching layers) produce code that is individually testable and straightforward to review.

DRY still applies to functional code, where the aim is to reduce complexity. But much of the historical pressure to reduce boilerplate was essentially about saving typing rather than managing complexity, and it often introduced its own problems: clever abstractions, implicit type coercion, and shared base classes that became a source of obscure bugs. When boilerplate is free to generate and each handler can be tested in isolation (and when isolated blocks of code are more amenable to AI, as well as to humans), there is even less need for this “cleverness” than in the past.
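To make the contrast concrete, here is a hedged sketch of the thin-handler style. The endpoints, the `Order` type and the dictionary standing in for a data store are all hypothetical; the point is that each handler does one thing, receives its dependencies explicitly, and can be tested without constructing a controller or warming a cache. The small amount of repetition between handlers is exactly the kind of boilerplate that is now cheap to generate.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Order:
    order_id: str
    total_pence: int


# Each handler receives its store explicitly and returns (status, body).
# There is no base class, no controller-level state and no hidden cache.

def get_order(store: dict[str, Order], order_id: str) -> tuple[int, dict]:
    order = store.get(order_id)
    if order is None:
        return 404, {"error": "order not found"}
    return 200, {"id": order.order_id, "total_pence": order.total_pence}


def create_order(store: dict[str, Order], order_id: str, total_pence: int) -> tuple[int, dict]:
    if total_pence <= 0:
        return 400, {"error": "total must be positive"}
    if order_id in store:
        return 409, {"error": "order already exists"}
    store[order_id] = Order(order_id, total_pence)
    return 201, {"id": order_id}
```

A reviewer can check each function in isolation, and a test needs nothing more than an empty dictionary.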

At Softwire

We default to sandboxed execution for AI agents, especially on projects with elevated security requirements. Our current standard baseline is container-level isolation with restricted network access and narrowly scoped filesystem permissions, with gVisor or microVM isolation where the threat model warrants it. We maintain internal guidance on dependency policy, and think carefully about when to build a small utility⁴ rather than import a package, given that AI reduces the cost of the former while the supply chain risk of the latter continues to grow.

Optimise for review

What good looks like

When AI writes the code, human review becomes the bottleneck. The response is to focus human attention where it matters most (security boundaries, critical infrastructure) and automate the rest. This requires the system to be designed so that high-risk and low-risk areas are cleanly separated, and ideally so that AI review can be effective and reliable.

As AI takes on more of the writing, review becomes the bottleneck. This has direct implications for how you structure work.

Keep changes small and self-contained. Design systems so that each change touches a narrow surface area. Apply the same principles of flow that DevOps brought to deployment: reduce batch size, shorten queue time, make the review process itself fast and predictable. If your reviews are slow or inconsistent, AI-generated volume will make them slower.
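Reducing batch size can be nudged by a simple automated check. The sketch below assumes a hypothetical limit of 400 changed lines per pull request and parses the tab-separated per-file counts that `git diff --numstat` prints; the threshold and the lockfile exemptions are illustrative choices, not a recommendation.

```python
# Hypothetical limit; teams would tune this to their own review capacity.
MAX_CHANGED_LINES = 400
# Generated files are excluded so they don't dominate the count (illustrative list).
EXEMPT_SUFFIXES = ("package-lock.json", "poetry.lock")


def changed_lines(numstat: str) -> int:
    """Sum added + deleted lines from `git diff --numstat` output.

    Each line has the form "<added><TAB><deleted><TAB><path>". Binary files
    report "-" for both counts and are skipped, as are exempt generated files.
    """
    total = 0
    for line in numstat.splitlines():
        if not line.strip():
            continue
        added, deleted, path = line.split("\t", 2)
        if added == "-" or path.endswith(EXEMPT_SUFFIXES):
            continue
        total += int(added) + int(deleted)
    return total


def is_small_enough(numstat: str) -> bool:
    """True if the change fits within the review-size limit."""
    return changed_lines(numstat) <= MAX_CHANGED_LINES
```

A check like this works best as a prompt to split the work rather than a hard block, since some legitimate changes (large renames, for example) will exceed any fixed limit.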

Pay particular attention to test review. If automated tests are the primary mechanism for validating AI-generated code, then the tests themselves become critical. A test that passes but does not actually verify the intended behaviour is worse than no test at all, because it creates false confidence.

Test code needs to be at least as readable and reviewable as production code – and as reliable. A common failure mode in AI-generated tests is the false assertion: a test that runs green but checks something trivial, or that masks a genuine failure behind a commented-out check with a promise to revisit later. Reviewing for this requires understanding the intent behind each test, which is one reason test readability matters as much as production code readability.
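The difference is easiest to see side by side. Below, a hypothetical `apply_discount` function is tested twice: the first test runs green but asserts almost nothing (and has its real check commented out), while the second pins down the intended behaviour, including rounding and invalid input.

```python
import unittest


def apply_discount(price_pence: int, percent: int) -> int:
    """Hypothetical function under test: apply a percentage discount, rounding down."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return price_pence * (100 - percent) // 100


class WeakTest(unittest.TestCase):
    # Anti-pattern: this passes for almost any implementation that returns a value.
    def test_discount_runs(self):
        result = apply_discount(1000, 10)
        self.assertIsNotNone(result)
        # self.assertEqual(result, 900)  # TODO: revisit later


class MeaningfulTest(unittest.TestCase):
    # Checks the actual intended behaviour, including the edge cases.
    def test_ten_percent_off(self):
        self.assertEqual(apply_discount(1000, 10), 900)

    def test_rounds_down(self):
        self.assertEqual(apply_discount(999, 10), 899)

    def test_rejects_invalid_percent(self):
        with self.assertRaises(ValueError):
            apply_discount(1000, 150)
```

Both classes run green with `python -m unittest`, which is precisely the problem: only a reviewer reading for intent will notice that `WeakTest` verifies nothing.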

Reviewing every line of AI-generated output with the same scrutiny is neither realistic nor a good use of human attention. Static analysis, linters, and formatters should enforce style and mechanical correctness as CI gates before any reviewer sees the code. But as volume increases, a risk-based approach becomes necessary.

Target human review at high-impact areas (infrastructure-as-code, authentication and authorisation boundaries, and data access controls are good candidates) and rely more on automated review gates and architectural constraints to manage the rest. Note that this is more than a review policy – it is only possible if the architecture supports it. A system where critical logic is cleanly separated from lower-risk application code can be reviewed selectively, but a system where those concerns are entangled cannot.
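Once the architecture separates high-risk code, the routing decision itself can be mechanical. A minimal sketch, assuming a hypothetical repository layout in which infrastructure, auth and data-access code live under known paths; in practice this is often expressed through a CODEOWNERS file and branch protection rules rather than a bespoke script.

```python
from fnmatch import fnmatch

# Hypothetical layout: glob patterns for areas that always need a human reviewer.
# Note that fnmatch's "*" also matches "/", so "infra/*" covers nested files too.
HIGH_RISK_PATTERNS = (
    "infra/*",
    "src/auth/*",
    "src/data_access/*",
    "*.tf",
)


def review_tier(changed_paths: list[str]) -> str:
    """Return "human" if any changed file touches a high-risk area.

    Otherwise return "automated", meaning the change relies on CI gates and
    AI-assisted review alone.
    """
    for path in changed_paths:
        if any(fnmatch(path, pattern) for pattern in HIGH_RISK_PATTERNS):
            return "human"
    return "automated"
```

This only works if the path boundaries reflect real architectural boundaries; if auth logic leaks into general application code, no routing rule can find it.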

AI-assisted review adds value too: checking consistency with documented architectural intent, flagging meaningful gaps in test coverage, and verifying that a change addresses the specification. Human reviewers then focus on security implications, architectural fit, and whether the change actually solves the right problem.

At Softwire

We use AI-assisted review gates in CI pipelines: a second agent evaluates PRs against the project's AGENTS.md and architectural rules before a human reviewer sees them. This pre-screening catches style violations, architectural drift, and common AI-generated anti-patterns (excessive abstraction, unnecessary dependencies) so that human reviewers can focus on intent and correctness rather than mechanical compliance.

Trade-offs

What good looks like

Every practice in this article carries a cost: time, overhead, friction. There are no free gains. The discipline is in choosing the right trade-offs for your context and investing deliberately, rather than adopting AI tools and hoping the benefits materialise on their own.

Two reasons to take the AI SDLC seriously

What good looks like

AI is already in your engineering teams, often adopted informally by individual developers. Will you govern that adoption deliberately or let it govern you? Organisations that invest in the foundations see measurable improvements in both speed and quality; those that neglect them accumulate risk.

The first is efficiency. Organisations that integrate AI across the lifecycle with proper foundations see reductions in cycle time and improvements in flow metrics. Better flow tends to produce better software, because defects are caught earlier and fixed sooner, and feedback is continuous.

The second is inevitability. AI agents are already present in most engineering teams, often adopted piecemeal and bottom-up by individual developers. Without deliberate structure, this produces inconsistent tooling, configuration drift, and ungoverned LLM access to production systems. The choice is whether you shape AI adoption intentionally and for the better, or let it shape your engineering culture by default and for the worse.

At Softwire

We work with clients across financial services, energy and utilities, and government to design and implement AI-native development practices. If you are navigating this transition, we would welcome a conversation.

Tim Benjamin

Chief Technology Officer

24 June 2026


About the Author

Tim Benjamin, Softwire’s CTO, has over 25 years’ experience leading digital transformation across startups and global enterprises. He specialises in scaling teams, delivering AI-driven solutions, and turning emerging technology into real business outcomes, combining entrepreneurial vision with enterprise discipline.

Further reading

“AI-powered” is meaningless: why energy networks need a sharper strategy

Preparing insurance for agentic AI: the urgent case for data modernisation

The cost of customer experience: why “free” is quietly killing UK retail
