What is Quality in the Age of Autonomy?

Feb 10, 2026

Software systems are becoming faster, more interconnected, and increasingly capable of acting with reduced human oversight. At the same time, teams continue to rely on proven quality practices - testing, collaboration, judgment, and governance - to manage risk and deliver value.

Let’s talk for a moment about the agency of AI.

Would you trust AI to do any of the following tasks?

Book a 3-day business trip to meet with a customer
Create a new web page based on a sketch of inputs and outputs
Generate next year’s annual budget and submit it to your manager

Each of these tasks has ill-defined requirements, and in many cases could be done by generative AI about as well as an employee. A human employee might come back with questions, provide a status update, and ask for confirmation before completing the task. In the future, AI agents could operate with similar levels of autonomy.

So what’s the difference between delegating work to a software system and delegating it to one or more people? The short answer is trust.

With people, you are not just trusting correctness - you are trusting intent, judgment, and control under uncertainty. That kind of trust must be earned across technical, behavioral, organizational, and ethical dimensions.

AI, on the other hand, can be hard to rely on because it is non-deterministic, often works on problems without clear right answers, exhibits emergent failures, changes over time, and can cause harm even when it appears to work.

There is a Russian proverb that many software quality professionals have taken to heart, even if they don’t know its origin: “Trust, but verify.” Much of software quality involves spelling out the assumptions behind a system and then testing the system that emerges against them.
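As a rough illustration of "spell out the assumptions, then test," here is a minimal Python sketch. Everything in it is hypothetical: the `TripPlan` fields, the 2000 USD budget, and the `verify` helper are assumptions made for this example, not part of any real agent framework.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class TripPlan:
    """Hypothetical output of an AI agent asked to book a business trip."""
    days: int
    total_cost_usd: float
    refundable: bool


# Each assumption is written down as a name plus a check over the plan.
# The specific limits are illustrative assumptions.
ASSUMPTIONS: list[tuple[str, Callable[[TripPlan], bool]]] = [
    ("trip lasts exactly 3 days", lambda p: p.days == 3),
    ("cost stays within the assumed 2000 USD budget", lambda p: p.total_cost_usd <= 2000),
    ("booking is refundable", lambda p: p.refundable),
]


def verify(plan: TripPlan) -> list[str]:
    """Return the names of the assumptions the plan violates, in order."""
    return [name for name, check in ASSUMPTIONS if not check(plan)]


if __name__ == "__main__":
    proposed = TripPlan(days=3, total_cost_usd=2600.0, refundable=False)
    for violation in verify(proposed):
        print("violated:", violation)
```

The point of the sketch is only that trust becomes checkable once the assumptions are explicit. An empty list from `verify` means the plan met every stated assumption, not that the plan is good.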

Of course, even employees can fail the trust relationship. A new hire may not know the proper procedures, and a long-term employee may develop blind spots or hidden agendas. Trust grows from a real-world track record of responsibility and outcomes. Humans are also (usually) better at handling truly novel situations, whereas generative AI excels at pattern recognition and pattern generation.

Turning generative AI loose in the world can produce unexpected results. For example:

At PNSQC 2023, Kavitha Naveen published a paper titled “Chatbots As Meeting Stand-In: Enhancing Remote Work Productivity.” When I read it, my first thought was: what happens when only bots show up to meetings? Today, meeting chatbots are ubiquitous on my Zoom calls - and sometimes I’m the only human present.

More recently, tech headlines buzzed about Moltbook, an internet forum designed exclusively for AI agents - essentially Reddit for AI. It turns out that AI agents care even less about security than humans do.

During its short launch, Moltbook suffered serious security failures, including indirect prompt injection, unsafe OpenClaw “Skills” that enabled remote code execution and data exfiltration, hijacked heartbeat update loops that leaked API keys or ran shell commands, and even a malicious weather plugin that quietly stole private configuration files. The most severe incident occurred on January 31, 2026, when 404 Media reported an unsecured database that allowed attackers to commandeer any agent, forcing a platform shutdown and a reset of all API keys. The Financial Times warned this may foreshadow a future where autonomous agents operate faster than humans can meaningfully monitor or understand.

On a more constructive note, Anthropic researcher Nicholas Carlini recently described using 16 instances of Claude to build a C compiler from scratch. The article notes that “a C compiler is a near-ideal task for semi-autonomous AI: the specification is decades old, test suites already exist, and there’s a known-good reference implementation. Most real-world software projects have none of those advantages.” Even with those advantages, the project needed substantial guidance and support from Carlini.

Autonomy, of course, isn’t only about AI. Daniel Pink reminds us that autonomy - along with mastery and purpose - is a core driver of human motivation. Modern software practices like Agile and DevOps have spent years trying to give people more control over how they do their work. Teams are encouraged to make local decisions, own outcomes, and respond quickly instead of waiting for top-down direction.

What’s new is that we’re now granting similar autonomy to software systems. But human autonomy is shaped by experience, accountability, and consequences in ways AI is not. Humans build judgment over time and have to explain and live with the results of their decisions, while AI generates outputs based on patterns rather than responsibility.

So in the age of autonomy, the real challenge isn’t just what AI can do on its own, but how we design systems where human and machine autonomy reinforce each other without eroding accountability.
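One way to picture that kind of design is an approval gate: the agent acts on its own for low-risk steps, a person signs off on the rest, and every decision is recorded. The Python sketch below is an assumption-laden illustration. The `Action` and `Gate` classes and the "reversible" rule are hypothetical names invented here, not a description of any existing tool.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass(frozen=True)
class Action:
    """Hypothetical step an agent wants to take."""
    description: str
    reversible: bool


@dataclass
class Gate:
    """Lets reversible actions through; asks a human about the rest."""
    approver: Callable[[Action], bool]  # stands in for a human reviewer
    audit_log: list[str] = field(default_factory=list)

    def run(self, action: Action, execute: Callable[[], None]) -> bool:
        if action.reversible:
            allowed, decision = True, "auto-approved"
        else:
            allowed = self.approver(action)
            decision = "approved by human" if allowed else "rejected by human"
        # Record who decided what, so accountability survives automation.
        self.audit_log.append(f"{action.description}: {decision}")
        if allowed:
            execute()
        return allowed


if __name__ == "__main__":
    gate = Gate(approver=lambda a: False)  # a reviewer who rejects everything
    gate.run(Action("draft budget", reversible=True), lambda: print("drafted"))
    gate.run(Action("submit budget", reversible=False), lambda: print("submitted"))
    print(gate.audit_log)
```

The split between reversible and irreversible actions is just one possible policy. The useful property is that the log names a decision-maker for every action, whether that was a rule or a person.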

That’s why the theme for PNSQC 2026 is: Quality in the Age of Autonomy. This theme explores how quality practices evolve across a spectrum - from primarily human-driven teams to highly automated and AI-enabled systems. Regardless of tooling or maturity, quality remains grounded in intent, evidence, accountability, and learning.

We want PNSQC 2026 to be a place where you can highlight where generative AI shines - and call out where it falls flat or becomes risky or malicious. Submit your ideas here: www.pnsqc.org/cfp.

PNSQC 2026 welcomes submissions that share practical experience, thoughtful analysis, and transferable lessons across traditional testing, Agile practices, engineering rigor, leadership, and intelligent systems.

Who should submit: Practitioners and engineers involved in testing, test design, reliability, automation, or overall system quality. First-time authors are encouraged - PNSQC values real-world experience and transferable insight over name recognition. We especially welcome papers that share lessons learned, tradeoffs made, and outcomes others can apply.

PNSQC Newsletter & Blog
