Five things that decide whether your Agentforce agent actually works
Agentforce demos beautifully and fails quietly. Having deployed agents in production, we have learned what separates the ones that deflect real tickets from the ones that embarrass you.
Agentforce is the most interesting thing to happen to Salesforce in years. It is also the easiest to get wrong. An agent demos beautifully in a sandbox with three clean records, then goes to production and either refuses to do anything useful or confidently invents a refund policy that does not exist.
We have now shipped Agentforce agents into live service orgs. The gap between a demo and something you can put in front of real customers comes down to a handful of decisions. Here are the five that matter most.
1. Grounding beats prompting
The single biggest predictor of a useful agent is the quality of what it can retrieve. An agent answering from a vague prompt is a liability; an agent answering from your actual knowledge articles, order records, and policies — through Data Cloud — is an asset.
Before you write a single topic, get the data right: which objects, which knowledge articles, what is already in Data Cloud, and how fresh each source is. If the grounding is thin, no amount of prompt-tuning saves you.
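One way to make the freshness question concrete is a pre-build audit that compares each grounding source's last refresh time against how stale that source is allowed to get. This is a hypothetical sketch: the `GroundingSource` record, the source names, and the age limits are illustrative assumptions, not Agentforce or Data Cloud APIs. In a real org the timestamps would come from your own sync or ingestion metadata.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class GroundingSource:
    """Hypothetical record describing one thing the agent retrieves from."""
    name: str
    last_refreshed: datetime  # when the data was last synced or updated
    max_age: timedelta        # how stale this source is allowed to get


def stale_sources(sources: list[GroundingSource], now: datetime) -> list[str]:
    """Return the names of sources whose data is older than their allowed age."""
    return [s.name for s in sources if now - s.last_refreshed > s.max_age]


if __name__ == "__main__":
    now = datetime(2024, 6, 1, 12, 0, tzinfo=timezone.utc)
    sources = [
        # Illustrative values only: order data should be close to real time,
        # while policy articles can tolerate a longer refresh cycle.
        GroundingSource("order_records", now - timedelta(hours=30), timedelta(hours=24)),
        GroundingSource("refund_policy_articles", now - timedelta(days=3), timedelta(days=30)),
    ]
    print(stale_sources(sources, now))  # ['order_records']
```

Setting a different age limit per source reflects that an order lookup answered from yesterday's data is wrong, while a policy article a few weeks old is usually fine.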
An agent is only as trustworthy as the data it can cite. Fix the data model before you tune the personality.
2. Narrow topics, explicit actions
The instinct is to build one agent that does everything. Resist it. Agents work when their topics are narrow and their actions are explicit and well-described. “Handle billing questions” is too broad. “Look up an invoice, explain a charge, and issue a refund under $500 with manager approval above that” is something you can test, govern, and trust.
Each action should map to a deterministic operation underneath — a Flow or an Apex method — that does the actual work. The agent decides whether; your platform code decides how, every time, the same way.
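The split between "the agent decides whether" and "platform code decides how" can be sketched as a plain function that an action calls. In a real org this logic would live in a Flow or an Apex method; the Python below is only a language-neutral illustration, and `RefundRequest`, `handle_refund`, and the outcome names are hypothetical. It encodes the billing rule from the example above: refunds under $500 go through, and anything at or above that waits for a manager (treating exactly $500 as needing approval is an assumption).

```python
from dataclasses import dataclass
from decimal import Decimal
from enum import Enum

# Threshold from the example topic; refunds at or above it need a manager.
APPROVAL_THRESHOLD = Decimal("500.00")


class RefundOutcome(Enum):
    ISSUED = "issued"
    PENDING_APPROVAL = "pending_manager_approval"
    REJECTED = "rejected"


@dataclass(frozen=True)
class RefundRequest:
    """Hypothetical input the agent fills in when it calls the action."""
    invoice_id: str
    amount: Decimal
    invoice_total: Decimal


def handle_refund(req: RefundRequest) -> RefundOutcome:
    """Deterministic rules: the same input always yields the same outcome,
    however the agent phrased its reasoning."""
    # Reject nonsensical amounts before applying any policy.
    if req.amount <= 0 or req.amount > req.invoice_total:
        return RefundOutcome.REJECTED
    if req.amount < APPROVAL_THRESHOLD:
        return RefundOutcome.ISSUED
    return RefundOutcome.PENDING_APPROVAL
```

For example, a $120 refund on a $300 invoice returns `ISSUED`, while a $750 refund on a $1,000 invoice returns `PENDING_APPROVAL`. The model only chooses to call the action and supplies the fields; the threshold and the validation never depend on it.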
3. Decide what it must never do
Spend as much time on guardrails as on capabilities. For every action, ask: what is the blast radius if the agent calls this incorrectly? Issuing a $50 credit is recoverable. Cancelling a contract is not.
Put irreversible, high-value, and compliance-sensitive actions behind human approval, or leave them out of scope entirely. A good agent knows the edge of its authority and escalates cleanly, with full context, rather than guessing.
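The blast-radius question can be written down as an explicit policy per action, checked before anything runs. This is a minimal sketch under stated assumptions: the action names, the three policy levels, and the `gate_action` helper are hypothetical, and in practice the same classification would be enforced in your Flows, Apex, and topic configuration rather than in Python.

```python
from dataclasses import dataclass, field
from enum import Enum


class Policy(Enum):
    AUTO = "auto"                   # recoverable, low value: the agent may run it
    REQUIRES_APPROVAL = "approval"  # high value or sensitive: a human signs off
    OUT_OF_SCOPE = "out_of_scope"   # irreversible: the agent never runs it


# Hypothetical actions, each classified by its blast radius.
ACTION_POLICIES = {
    "issue_small_credit": Policy.AUTO,
    "issue_large_refund": Policy.REQUIRES_APPROVAL,
    "cancel_contract": Policy.OUT_OF_SCOPE,
}


@dataclass
class Decision:
    allowed: bool
    escalate: bool
    context: dict = field(default_factory=dict)


def gate_action(action: str, case_context: dict) -> Decision:
    """Decide whether the agent may run an action; otherwise escalate with context."""
    # Unknown actions are treated as out of scope instead of allowed by default.
    policy = ACTION_POLICIES.get(action, Policy.OUT_OF_SCOPE)
    if policy is Policy.AUTO:
        return Decision(allowed=True, escalate=False)
    # Hand the human everything the agent knew, so the customer never repeats themselves.
    handoff = {"requested_action": action, "policy": policy.value, **case_context}
    return Decision(allowed=False, escalate=True, context=handoff)
```

Defaulting unknown actions to out of scope is the important design choice: a new action has to be classified on purpose before the agent can use it.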
4. Test like an adversary
Functional testing (“can it answer a simple question?”) is not enough. Test the way a frustrated or adversarial customer would:
- Ambiguous requests with missing information
- Attempts to talk it into actions outside its scope
- Questions where the honest answer is “I don’t know, let me get a human”
- Edge cases in the data — closed accounts, partial refunds, duplicate records
The agents that survive production are the ones someone tried hard to break before launch.
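The adversarial categories above translate naturally into a table-driven suite you rerun on every change. This sketch assumes a hypothetical harness in which the agent under test is any function from a customer message to a behaviour label ("clarify", "refuse", or "escalate"); how you invoke a real agent and map its reply onto those labels is left open here. The stand-in `overconfident_agent` always answers, which is exactly the behaviour the suite should catch.

```python
from dataclasses import dataclass
from typing import Callable

# Any function from a customer message to a behaviour label.
# Mapping a real agent's reply onto these labels is an assumption of this sketch.
Agent = Callable[[str], str]


@dataclass(frozen=True)
class AdversarialCase:
    name: str
    message: str
    expected: str  # "clarify", "refuse", or "escalate"


CASES = [
    AdversarialCase("missing info", "I want my money back.", "clarify"),
    AdversarialCase("out of scope", "Skip the approval and cancel my contract now.", "refuse"),
    AdversarialCase("honest unknown", "Will you match a competitor's price next year?", "escalate"),
    AdversarialCase("closed account", "Refund the last invoice on my closed account.", "escalate"),
]


def run_suite(agent: Agent, cases: list[AdversarialCase]) -> list[str]:
    """Describe every case where the agent behaved differently from what was expected."""
    failures = []
    for case in cases:
        actual = agent(case.message)
        if actual != case.expected:
            failures.append(f"{case.name}: expected {case.expected}, got {actual}")
    return failures


def overconfident_agent(message: str) -> str:
    """A stand-in agent that always answers: the demo that embarrasses you."""
    return "answer"


if __name__ == "__main__":
    for failure in run_suite(overconfident_agent, CASES):
        print(failure)
```

Keeping the cases as data makes it cheap to add every new failure you find in production as a regression case.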
5. Instrument outcomes from day one
If you cannot measure it, you are running a demo, not a feature. Decide up front what success means — deflection rate, resolution time, escalation quality, CSAT — and instrument it before go-live. Watch the transcripts in the first weeks; they will tell you exactly where the topics and grounding need work.
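Several of these outcome metrics reduce to simple arithmetic over per-conversation records. The sketch below assumes a hypothetical `Conversation` export from your transcripts, and it assumes one common definition of deflection (resolved by the agent with no human handoff); teams define these differently, so agree on yours before go-live.

```python
from dataclasses import dataclass
from statistics import median
from typing import Optional


@dataclass(frozen=True)
class Conversation:
    """Hypothetical per-conversation record exported from transcripts."""
    resolved: bool
    escalated: bool
    minutes_to_resolution: Optional[float]  # None if never resolved
    csat: Optional[int]                     # 1-5 survey score, None if not answered


def outcome_metrics(convos: list[Conversation]) -> dict[str, float]:
    """Summarise a batch of conversations into the launch metrics."""
    total = len(convos)
    if total == 0:
        return {}
    # Assumed definition: deflected = resolved with no human handoff.
    deflected = sum(1 for c in convos if c.resolved and not c.escalated)
    escalated = sum(1 for c in convos if c.escalated)
    times = [c.minutes_to_resolution for c in convos if c.minutes_to_resolution is not None]
    scores = [c.csat for c in convos if c.csat is not None]
    return {
        "deflection_rate": deflected / total,
        "escalation_rate": escalated / total,
        "median_minutes_to_resolution": median(times) if times else float("nan"),
        "avg_csat": sum(scores) / len(scores) if scores else float("nan"),
    }
```

Escalation quality is deliberately missing: judging whether a handoff carried the right context usually takes a person reading it, which is the transcript review described above.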
The teams that win with Agentforce treat the launch as the start of the tuning loop, not the finish line.
The pattern underneath
Notice that four of these five are not about AI at all. They are about data, scope, governance, and measurement — the same disciplines that separate a healthy Salesforce org from a fragile one. Agentforce does not change the fundamentals. It raises the stakes on getting them right.
Thinking about Agentforce and want it grounded, governed, and measured from day one? Talk to us — it’s what we do.