How Agentic AI Became Possible: The Honest Reality
3/18/2026 · 5 min read
I've spent the last few years building systems that coordinate autonomous AI agents in environments where failure isn't an option — government infrastructure, industrial monitoring, critical public services. My framework, Deep Workflow Orchestration (DWO), now runs in production across Hong Kong's mechanical and electrical systems. The Ritz Herald recently described that work as the architecture that "turned AI agents into infrastructure," and called me a pioneer of agentic orchestration in AI.
But I didn't invent this field out of thin air. Nobody does. Agentic AI is the product of decades of accumulated thought — some brilliant, some painful, all necessary. What I want to do in this post is lay out the full arc of how we got here, where my own contributions fit into that arc, and where I believe this is all heading. If you work in AI, build with agents, or simply want to understand why this moment matters, this timeline is for you.
The Deep History Most People Skip
The story of agentic AI doesn't start in 2023. It starts with a question Alan Turing posed in the 1950s: can machines think? That question, and the decades of work it inspired from Turing, John McCarthy, Marvin Minsky, and others, established the conceptual bedrock everything else was built on. Through the 1950s, 1960s, 1970s, and into the 1980s, researchers explored rule-based systems — programs that followed explicit logical instructions to simulate narrow forms of reasoning. These early systems were reactive rather than autonomous. They didn't plan, adapt, or pursue goals across steps. But they proved something essential: that machines could encode and apply structured knowledge in useful ways.
From the 1980s through the 2000s, two parallel developments pushed things forward. Expert systems — like MYCIN for medical diagnosis or XCON for computer configuration — demonstrated that encoding specialist knowledge into software could produce real, practical value in constrained domains. Meanwhile, a branch of Distributed AI research gave rise to Multi-Agent Systems, or MAS. The core insight behind MAS was that some problems are simply too complex, too distributed, or too dynamic for any single agent to solve alone. You need multiple agents, each with its own role, collaborating toward a shared objective. This was the conceptual seed of everything I would later build. But at the time, MAS remained largely academic. The infrastructure, the models, and the compute simply weren't there to make it operational at scale.
The Chatbot Detour
The 2010s brought a shift that, in hindsight, was both a step forward and a distraction. The rise of intent-based chatbots — particularly in customer service — showed that AI could parse natural language, recognize what a user wanted, and respond in contextually appropriate ways. This was meaningful progress. But these systems had a fundamental ceiling: they could reply, but they couldn't act. They couldn't break a complex goal into subtasks, execute those subtasks across tools and data sources, and adapt when conditions changed. They were conversational interfaces, not autonomous workers. The industry spent nearly a decade optimizing for this paradigm, and it created enormous commercial value, but it didn't solve the harder problem of genuine agency.
2023: The Framework Explosion
Everything changed in 2023. The combination of powerful large language models, cheaper inference, and maturing tooling made it suddenly practical to build systems where multiple AI agents collaborated on multi-step tasks. Microsoft released AutoGen as an open-source multi-agent system. Amazon launched Bedrock Agents, offering fully managed, goal-driven agent capabilities in the cloud. Retrieval-Augmented Generation — RAG — became a standard mechanism for giving agents access to external knowledge, essentially functioning as a memory layer that kept responses grounded in real data rather than pure hallucination.
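To make the RAG idea concrete, here is a minimal sketch of the pattern: retrieve relevant snippets from an external knowledge store, then prepend them to the prompt so the agent's answer stays grounded in real data. The corpus, scoring function, and `build_prompt` below are illustrative stand-ins, not the API of any specific framework.

```python
# Minimal RAG sketch: retrieve, then generate against the retrieved context.

def score(query: str, doc: str) -> int:
    """Crude relevance score: count query words that appear in the doc."""
    words = set(query.lower().split())
    return sum(1 for w in words if w in doc.lower())

def retrieve(query: str, corpus: list[str], k: int = 2) -> list[str]:
    """Return the k most relevant documents for the query."""
    return sorted(corpus, key=lambda d: score(query, d), reverse=True)[:k]

def build_prompt(query: str, corpus: list[str]) -> str:
    """Ground the model's answer in retrieved context, not free recall."""
    context = "\n".join(retrieve(query, corpus))
    return f"Answer using ONLY this context:\n{context}\n\nQuestion: {query}"

corpus = [
    "Pump station 7 was last serviced on 2 March.",
    "Escalator E12 reported a vibration fault overnight.",
    "The ventilation system in Tunnel B runs on a 6-hour cycle.",
]
prompt = build_prompt("When was pump station 7 serviced?", corpus)
```

Real systems swap the keyword score for vector embeddings and a proper index, but the shape is the same: the "memory layer" is just retrieval bolted onto generation.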
This was the Framework Era. For the first time, developers could spin up multi-agent systems that genuinely talked to each other, split tasks, and produced impressive results. The energy was infectious. But here's what I noticed, even then: the results were impressive in controlled settings. Demos were dazzling. Production was a different story.
2024: The Mainstream Moment — and Its Limits
By 2024, the term "agentic AI" had entered the mainstream lexicon, championed by influential voices like Andrew Ng. New frameworks arrived rapidly. CrewAI introduced role-based agent collaboration. LangGraph brought graph-based orchestration that could handle complex, iterative cycles of reasoning and action. The tooling was getting more sophisticated, and the ambition was scaling to match.
But I was watching a gap widen — a gap between what these frameworks could do on a developer's laptop and what they could survive in a real operational environment. The moment you tried to deploy multi-agent systems into high-stakes settings — government operations, industrial monitoring, critical public services — everything buckled. Hallucinations didn't just occur; they cascaded from one agent to the next, compounding errors at every step. State got lost between workflow stages. Long-running processes collapsed without recovery mechanisms. There were no deterministic safety checks, no stable governance layers, no way to keep a human in the loop at critical decision points.
The frameworks were built for developers writing scripts. They were not built for institutions running cities.
Early 2025: Deep Workflow Orchestration
This is where my work entered the picture. In April 2025, I introduced the Deep Workflow Orchestrator at an event hosted by the Hong Kong Government's Electrical and Mechanical Services Department (EMSD), the organization the system was built for. DWO was designed from the ground up to solve the problem I just described: not smarter agents, but smarter management of agents. A central nervous system for coordinating independent AI workers and human operators in real time, with high-stability communication, reduced data transfer overhead in complex agent ecosystems, and deterministic safety checks at every step.
The critical design choice — the one that made government adoption possible — was what I call the human-in-the-loop standard. In environments where pure autonomy isn't just risky but unacceptable, you need architecture that ensures people remain at the center of every critical decision. Not as an afterthought or an override button, but as a structural feature of how the system operates. DWO was built around that principle.
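The principle can be sketched in a few lines. This toy dispatcher is a hypothetical illustration of the idea, not DWO's actual API: every proposed action passes deterministic safety checks first, and anything tagged critical is held for a human decision rather than auto-executed. The names `Action`, `safety_rules`, and `approve` are my inventions for this sketch.

```python
# Human-in-the-loop as a structural feature: the gate is in the dispatch
# path itself, not an override bolted on afterward.
from dataclasses import dataclass
from typing import Callable

@dataclass
class Action:
    name: str
    critical: bool  # critical actions always require a human decision

def dispatch(action: Action,
             safety_rules: list[Callable[[Action], bool]],
             approve: Callable[[Action], bool]) -> str:
    # Deterministic checks run first; any failure blocks execution outright.
    if not all(rule(action) for rule in safety_rules):
        return "blocked"
    # Critical actions are escalated to a person, never auto-executed.
    if action.critical and not approve(action):
        return "held for review"
    return "executed"

rules = [lambda a: a.name != "shutdown_grid"]  # hard-coded deny rule
routine = dispatch(Action("restart_fan", critical=False), rules, lambda a: False)
risky = dispatch(Action("open_valve", critical=True), rules, lambda a: False)
```

The point of the design is that no code path exists in which a critical action executes without a person in the loop; the check is not optional configuration.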
DWO is now embedded in the monitoring of complex public systems across Hong Kong — one of the densest urban environments on the planet. Not a pilot. Not a sandbox. Production infrastructure that millions of people depend on.
Mid-to-Late 2025: The Standardization Wave
Once orchestration proved viable in production, the industry moved toward standardization. Anthropic's Model Context Protocol (MCP) emerged as an open standard for connecting agents to external tools and data sources, giving the ecosystem a common language for interoperability. Salesforce launched Agentforce 2.0, an enterprise-scale orchestration platform. Across the industry, the conversation shifted from "can agents work together?" to "how do we govern, standardize, and scale the systems that manage them?"
This was validation. Not of any single product, but of the entire thesis that orchestration — not raw model capability — is the bottleneck standing between experimental AI and operational AI.
2026 and Beyond: The Inflection Point
We are now entering what I believe is an inflection point. Businesses are beginning to scale what some are calling "Agentic Meshes" — modular, interconnected workflows where AI agents are managed as a first-class workforce alongside human teams. Microsoft's Agent 365 vision points in this direction. So does the broader enterprise push to treat agents not as novelty features but as durable operational resources.
The trajectory is clear: agents will become infrastructure. They will monitor systems, execute workflows, make recommendations, escalate exceptions, and operate continuously — not because they replace human judgment, but because they extend it into domains where speed, scale, and complexity exceed what people can handle alone.
The hard part was never building a clever agent. The hard part was building the connective tissue between intelligence and infrastructure — the orchestration layer that makes agents trustworthy enough to operate where it matters. That's the problem I set out to solve. That's the problem DWO was built for. And that's the problem the entire industry is now organized around.