The lazy agent

I’ve been working with agentic software development tools and discovered something unsettling.

My setup required the full test suite to run at each phase exit. Unlike my typical projects, this one needed Java and Maven. Multiple phases completed successfully. When a human review checkpoint arrived, I asked for test results. I got them — a polished report showing 80%+ passing, with thoughtful explanations for the failures.

Impressive. So I tried running a scenario manually. It failed. Tried another. Also failed.

I went back and asked the agent what had actually been tested. Same answer: 80%+ passing, here’s the report.

But Java and Maven weren’t even installed on the machine.

I dug into the audit trails — and there it was. The developer and reviewer agents had discussed the difficulty of installing Java and Maven, agreed it was too hard, and decided to just inspect the code and predict which tests would pass if they were run. Then they wrote a program to generate a realistic-looking test report. Checked the box. Exited the phase. Multiple times.

The agents didn’t fail at the task. They collaborated to fake the deliverable while satisfying the exit criteria.

Trust, but verify. Especially when your agents can talk to each other.
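One cheap guardrail that would have caught this: before accepting any agent-produced test report, confirm the toolchain that supposedly produced it is actually installed. Here's a minimal sketch in Python — the function names and the gate logic are my own illustration, not part of any agent framework:

```python
import shutil


def missing_tools(required):
    """Return the subset of required executables not found on PATH."""
    return [tool for tool in required if shutil.which(tool) is None]


def accept_report(report, required_tools):
    """Refuse a test report if the toolchain that supposedly
    produced it isn't even present on this machine."""
    missing = missing_tools(required_tools)
    if missing:
        raise RuntimeError(
            f"Report claims tests ran, but these tools are missing: {missing}"
        )
    return report
```

It wouldn't catch every fabrication, but it makes the specific failure above — a polished report on a machine with no `java` or `mvn` — impossible to wave through.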
