SINGAPORE – 20 MAY 2026
1. Artificial intelligence is entering the next phase of its evolution. Increasingly agentic systems are opening new possibilities, with significant potential to enhance productivity, improve service delivery, and enable more efficient, citizen-centric public services. At the same time, greater autonomy of agentic systems could introduce new uncertainties, making it essential to understand how AI agents behave in practice, the risks and unintended consequences that may arise, and how established governance frameworks may need to evolve to enable their use.
2. In this context, Google and the Singapore Government, specifically the Cyber Security Agency of Singapore (CSA Singapore), Government Technology Agency of Singapore (GovTech Singapore), and the Infocomm Media Development Authority (IMDA), launched a global-first AI Agents Sandbox in August 2025 to better understand how agents, specifically computer-use agents in this instance, operate in real-world settings, and to use those insights to inform how they are developed, deployed, and governed. The sandbox was conducted over approximately four months, involving close collaboration between all parties.
3. To ensure meaningful and well-rounded insights, participants prioritised three use cases spanning different levels of risk exposure.
- Automated quality assurance — exploring how AI agents could support and automate quality assurance (QA) testing of government digital services, improving reliability and freeing up engineering resources.
Outcome: The agent successfully evaluated government websites, testing response times, search functionality, and page integrity. Using natural language understanding, it correctly identified intentionally seeded inactive pages, filler text, and staging URL mismatches. This demonstrates the broader potential of agentic AI in software quality assurance. Unlike traditional scripted testing tools, AI agents apply interpretation and judgment to dynamic interfaces.
- AI safety testing — automating the safety testing of AI software such as chatbots to ensure they meet the government’s requirements prior to deployment.
Outcome: The trial showed that AI agents can reliably perform large-scale safety testing across various languages and formats, significantly reducing the manual effort needed for chatbot assessments. While the implementation is not entirely error-free, this approach offers a more scalable and consistent way to strengthen AI assurance as the technology evolves.
- Social assistance applications — assisting citizens in navigating and applying for social assistance programmes, helping to streamline complex processes.
Outcome: The trial demonstrated the agent’s ability to guide applicants or social workers through complex social assistance application processes, potentially reducing the need for substantial resources devoted to in-person assistance, helplines, and manual follow-ups to address errors, omissions, and incomplete submissions.
4. Learn more about the key outcomes of these use cases in the full report.
Managing risks related to AI agents
5. At the same time, several common risk themes and challenges emerged through the sandbox.
- Human oversight — ensuring sufficient control and accountability, particularly where decisions have real-world consequences for individuals.
- Customisation and control — balancing flexibility with safeguards, especially in testing or evaluation environments.
- Cybersecurity — most prominently indirect prompt injection, where an agent could be deceived into performing unintended actions such as remote code execution.
- Data protection and privacy — risks arising where agents interact directly with personal data, including potential privacy breaches or data leakages.
Preparing for a future with AI agents
6. In light of these opportunities and challenges, two broad sets of considerations emerged for the use of AI agents: near-term considerations covering the concrete measures, controls, and design choices that organisations may want to adopt, and longer-term issues that may require deeper study as agentic technologies evolve.
- Strengthening trust and resilience for AI agents today — key considerations include:
a. Choosing where to start with AI agents: The sandbox highlighted the importance of controlled testing and incremental real-world deployment as a means of building confidence and trust in AI agents.
b. Calibrating the level of human oversight: Oversight should be risk-based to balance control and autonomy, with safeguards distributed across the system, the organisation, and the user. Higher-risk actions may require pre-approval, while lower-risk actions can proceed with post-hoc review where outcomes are reversible and redress mechanisms exist.
c. Keeping AI agents secure: Developing and deploying AI agents securely is a shared responsibility. Safeguards should be distributed across the platform/model, system/organisational, and end-user levels, depending on which actors are best placed to anticipate and manage different risks.
d. Balancing flexibility and control in AI agents: Systems should be safe and secure by default, while allowing for calibrated flexibility and customisation where appropriate. Such flexibility should, however, remain bounded so that experimentation does not inadvertently introduce new risks.
- Looking ahead: exploring the horizons of an agentic future — this includes both technical and governance considerations, such as how AI agents are designed, integrated, and supported by underlying infrastructure as they scale, as well as broader questions involving trade-offs, value judgements, and societal implications as these systems become more capable and widely used. Key areas include:
a. Addressing the current technical limitations of AI agent technologies: While agentic systems are still at an early stage of development, the sandbox demonstrated both their potential and what may become possible as the technology matures. It also highlighted areas for further exploration — for example, how screenshot-based perception could be complemented by alternative techniques in scenarios requiring higher accuracy, particularly when handling information-dense content.
b. Potential of multi-agent approaches: During the sandbox, participants discussed the potential for multiple agents to collaborate to review, critique, and refine outputs. While still exploratory, such approaches could unlock new capabilities, but also bring interoperability and governance challenges into sharper focus as these systems evolve. In particular, if agents developed by different organisations or platforms are expected to interact, coalition-led open standards and common foundations — such as Google’s Agent2Agent protocol — will become increasingly important.
c. Building the digital infrastructure for agentic AI: The sandbox highlighted a mismatch between today’s digital environment — largely designed around human users — and one that could reliably support agentic interaction at scale. Elements of the underlying ecosystem, from identity and authentication frameworks to permission and access controls, may need to evolve to accommodate more autonomous, agent-driven interactions.
d. Balancing privacy and personalisation with AI agents: As AI agents become more autonomous and gain greater access to users’ personal context, the tension between the benefits of personalisation and privacy becomes more pronounced. The challenge is in ensuring agents can continue to leverage data to deliver value, while maintaining meaningful user control, minimising unnecessary data use, and exploring privacy-enhancing approaches that move beyond a zero-sum trade-off between utility and protection.
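As an illustration of the risk-based oversight described under "Calibrating the level of human oversight" above, the following is a minimal Python sketch of how an organisation might route agent actions to pre-approval or post-hoc review. It is not drawn from the sandbox itself: the names AgentAction, Oversight, and required_oversight are hypothetical, and the rule that anything uncertain defaults to pre-approval is an assumption made for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Oversight(Enum):
    PRE_APPROVAL = "pre-approval"
    POST_HOC_REVIEW = "post-hoc review"


@dataclass(frozen=True)
class AgentAction:
    """Hypothetical description of an action an agent wants to take."""
    name: str
    high_risk: bool
    reversible: bool
    redress_available: bool


def required_oversight(action: AgentAction) -> Oversight:
    """Pick an oversight mode for an action using a simple risk-based rule."""
    # Higher-risk actions wait for a human to approve them first.
    if action.high_risk:
        return Oversight.PRE_APPROVAL
    # Lower-risk actions may proceed and be reviewed afterwards, but only
    # where the outcome is reversible and a redress mechanism exists.
    if action.reversible and action.redress_available:
        return Oversight.POST_HOC_REVIEW
    # Assumption for this sketch: anything else falls back to pre-approval.
    return Oversight.PRE_APPROVAL


if __name__ == "__main__":
    examples = [
        AgentAction("submit assistance application", True, False, True),
        AgentAction("save draft form", False, True, True),
        AgentAction("send notification email", False, False, True),
    ]
    for action in examples:
        print(f"{action.name}: {required_oversight(action).value}")
```

In practice, the risk classification itself would come from the organisation's own assessment; the sketch only shows how such a classification could be turned into a consistent routing decision.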
7. The AI Agents Sandbox marks an important step towards building an agile, innovation-friendly approach to governance. With continued commitment, dialogue, and collaboration between industry, government, and the wider ecosystem, there is an opportunity for Singapore — and the global community — to shape a future in which AI agents are deployed in a way that is safe and trusted, and that delivers meaningful impact.
8. Learn more about the findings in the full report.