The gap between "talking about agents" and actually building them is massive right now.
Most people stop at the chatbot wrapper stage and call it an agent. Building something that actually responds to real incidents with real consequences is a completely different problem.
Curious how you handle the trust layer. When the agent recommends an action during an incident, what does the human override flow look like?
I wouldn't give any AI access to change things in production.
Most cloud providers already handle Identity and Access Management (IAM), which controls who can access what, and how. On my team, nobody can access production with an admin role without peer approval (or a high-severity incident).
What I do is give my AI agent view-only credentials plus write permissions on tickets. The worst-case scenario is that the AI reads something from production, misinterprets it, and posts a wrong comment on a ticket... so it's pretty much harmless.
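To make that concrete, here's a minimal sketch of the idea with AWS IAM via boto3. The policy name is hypothetical and the exact read actions will vary per stack; note that write access to tickets lives in the ticketing system's own API token, never in the cloud account:

```python
import json

import boto3  # AWS SDK for Python; the same idea maps to any provider's IAM

# View-only policy for the agent: it can inspect resources, metrics,
# and logs, but it has no mutating cloud actions at all.
AGENT_READ_ONLY_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AgentReadOnly",
            "Effect": "Allow",
            "Action": [
                "ec2:Describe*",
                "cloudwatch:Get*",
                "cloudwatch:List*",
                "logs:DescribeLogGroups",
                "logs:GetLogEvents",
                "logs:FilterLogEvents",
            ],
            "Resource": "*",
        }
    ],
}

iam = boto3.client("iam")
iam.create_policy(
    PolicyName="incident-agent-read-only",  # hypothetical name
    PolicyDocument=json.dumps(AGENT_READ_ONLY_POLICY),
)
```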
Current cloud providers also support automated triggers (these predate AI): things like auto-scaling your compute or DB when traffic increases. I think the future of AI will look similar: give the AI the ability to take action on controlled things like scaling out, but don't give it permissions to anything else. Still, I think it's very early to trust AI with any write permissions.
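For reference, this is roughly what that kind of bounded, pre-AI automation looks like with AWS Application Auto Scaling via boto3 (the cluster and service names are made up). The key property is that capacity can only move within limits a human set in advance:

```python
import boto3

autoscaling = boto3.client("application-autoscaling")

# Register the service as scalable between fixed bounds -- any trigger
# (metric-based or agent-driven) can only move within this range.
autoscaling.register_scalable_target(
    ServiceNamespace="ecs",
    ResourceId="service/prod-cluster/web",  # hypothetical cluster/service
    ScalableDimension="ecs:service:DesiredCount",
    MinCapacity=2,
    MaxCapacity=10,
)

# Target tracking: scale out/in to keep average CPU near 60%.
autoscaling.put_scaling_policy(
    PolicyName="cpu-target-tracking",
    ServiceNamespace="ecs",
    ResourceId="service/prod-cluster/web",
    ScalableDimension="ecs:service:DesiredCount",
    PolicyType="TargetTrackingScaling",
    TargetTrackingScalingPolicyConfiguration={
        "TargetValue": 60.0,
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "ECSServiceAverageCPUUtilization"
        },
    },
)
```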
The furniture assembly analogy is the best explanation of agents I've seen.
Most people skip step 1 though. They jump straight to tooling without understanding the manual process first. That's why their agents break in weird ways. You can't automate what you don't understand.
The 10-step framework is solid because it forces you to earn each layer of autonomy instead of pretending you can skip to step 10.
Thanks Dhruv, that was exactly the purpose of the article 😄
Most of the time we act out of FOMO (fear of missing out). We hear people are using <a new shiny tool> and we want to jump in, without learning how it works or why it's useful.
Really like how you frame the transition as staged capability, not a single “agent switch.”
The part that stood out to me is Step 6→8: SOP-driven orchestration plus scheduled execution is where teams finally move from demo energy to operational reliability.
The "ahá" moment with AI for me is when I can trust it to do one thing very well, thanks to SOPs, and I can trust that it'll get done on time thanks to scheduling it
The gap between "talking about agents" and actually building them is massive right now.
Most people stop at the chatbot wrapper stage and call it an agent. Building something that actually responds to real incidents with real consequences is a completely different problem.
Curious how you handle the trust layer. When the agent recommends an action during an incident, what's the human override flow look like?
I wouldn't give any AI access to change things in production.
Most cloud providers already handle Identity and Access Management, which controls who can access and how they can access. In my team, nobody can access production with an admin role without peer approval (or a high-severity incident)
What I do is give my AI agent view-only credentials and write-permissions to tickets. The worst scenario is that my AI reads something from production, misinterprets it, and posts a wrong comment in a ticket... So it's pretty much harmless.
Current cloud providers also allow for automated triggers (even before AI). Things like auto-scaling your compute or DB once you receive more traffic. I think the future of AI will be similar to this: give AI the ability to take action on controlled things like scaling out, but don't give it permissions to anything. Still, I think it's very early to trust AI with any write permissions
The furniture assembly analogy is the best explanation of agents I've seen.
Most people skip step 1 though. They jump straight to tooling without understanding the manual process first. That's why their agents break in weird ways. You can't automate what you don't understand.
The 10-step framework is solid because it forces you to earn each layer of autonomy instead of pretending you can skip to step 10.
Thanks Dhruv, that was exactly the purpose of the article 😄
Most times we move out of FOMO (fear of missing out). We hear people using <a new shiny tool> and we want to jump into using it, without learning how it works or why it’s useful
Really like how you frame the transition as staged capability, not a single “agent switch.”
The part that stood out to me is Step 6→8: SOP-driven orchestration plus scheduled execution is where teams finally move from demo energy to operational reliability.
Glad you liked it!
The "ahá" moment with AI for me is when I can trust it to do one thing very well, thanks to SOPs, and I can trust that it'll get done on time thanks to scheduling it