
A case study by ApPello: How we use AI in software development
At ApPello, AI is not just a product feature we showcase to our Partners; it is a working discipline embedded across our entire delivery process. Having shared our experiences from the discovery phase, we now move one step further, into development itself. Here is how we use AI to write, test, and review code, and the practices that separate an impressive demo from reliably shippable software.
From a User Story to a credible plan
Development used to begin with a developer reading a ticket and forming a private mental model of what needed to be built. Now it begins with a conversation. We give the agent the relevant codebase together with the User Story, and instead of jumping straight to code, the agent conducts a structured interview: it asks clarifying questions about business rules, edge cases, and integration points, and keeps iterating until the implementation plan is clear and complete. The plan, not the prompt, becomes the contract for what gets built.
A practical lesson that multiplies the value of this: we store our system specifications in Git, alongside the code. When the existing documentation lives next to the implementation, the agent gains a far better view of the project — it reasons about how the system already behaves, not just about the lines in front of it. Context is everything, and context that travels with the code is context the agent can actually use.
Code generation that closes its own loop
With an approved plan, the agent moves on to code generation and unit tests, and crucially, it does not stop at producing text. It runs the project itself and exercises the application through the Claude browser plugin to verify behaviour end to end. It then checks its own work against a growing toolkit. SonarQube provides static analysis. Our ApPello test-tool skill, built on QApPello, handles functional validation and can even generate end-to-end tests, though that deserves a post of its own. A Jira skill lets the agent pull content straight from Jira (descriptions, comments, and attachments) and summarize it, so relevant project knowledge is drawn into the agent's context instead of staying buried in tickets nobody re-reads. The agent uses these tools iteratively as it revises the code, and tests its own work once it believes a task is done. The shift is subtle but decisive: the agent is not just writing code; it is taking responsibility for whether that code actually works.
Effective prompting and collaboration patterns
Reliability comes less from clever one-off prompts than from a disciplined pattern. Give the agent the right context up front: code, specs, and the story. Force a planning step before any code is written. Let the agent verify its output through real tools rather than trusting its own claim of success. And keep every change traceable: the agent prepares the work in GitLab and opens the Merge Request, where an AI reviewer makes a first pass over correctness, style, and risk.
The heart of it is this: an agent works reliably only when it can tell whether it did a good job, so it needs verifiable output. That comes from two sides. First, acceptance criteria written into the plan give the agent a concrete target to check itself against. Second, the agent needs access to as many real tools as possible: running the unit tests, running SonarQube analysis, triggering the automated test tool, and clicking through the application via the browser plugin. The more ways the agent has to observe the actual result of its work, the better it can judge whether that output is correct.
Where human judgment remains irreplaceable
None of this removes the engineer — it elevates them. We hold firmly to the four-eyes principle: every change passes through a human in the loop before it is accepted. The AI reviewer catches the mechanical and the obvious; the human reviewer catches what the AI cannot see — the architectural consequence two systems away, the business assumption that was never written down, the subtle reason a "correct" change is the wrong one here. AI provides the scaffold and does the tireless work. Experienced judgment determines whether the result is credible, safe, and complete. Only then do we merge.
We believe in AI, and we see the results: faster plans, self-testing code, and reviews that catch more, earlier. But the gains are real precisely because a human still owns the final call.
In the coming posts, we will continue sharing the techniques and practical approaches we have developed across our delivery process. Stay tuned!
