Five things that decide whether your Agentforce agent actually works

Agentforce is the most interesting thing to happen to Salesforce in years. It is also the easiest to get wrong. An agent demos beautifully in a sandbox with three clean records, then goes to production and either refuses to do anything useful or confidently invents a refund policy that does not exist.

We have now shipped Agentforce agents into live service orgs. The gap between a demo and something you can put in front of real customers comes down to a handful of decisions. Here are the five that matter most.

1. Grounding beats prompting

The single biggest predictor of a useful agent is the quality of what it can retrieve. An agent answering from a vague prompt is a liability; an agent answering from your actual knowledge articles, order records, and policies — through Data Cloud — is an asset.

Before you write a single topic, get the data right: which objects, which knowledge, what is in Data Cloud, and how fresh it is. If the grounding is thin, no amount of prompt-tuning saves you.

An agent is only as trustworthy as the data it can cite. Fix the data model before you tune the personality.

2. Narrow topics, explicit actions

The instinct is to build one agent that does everything. Resist it. Agents work when their topics are narrow and their actions are explicit and well-described. “Handle billing questions” is too broad. “Look up an invoice, explain a charge, and issue a refund under $500 with manager approval above that” is something you can test, govern, and trust.

Each action should map to a deterministic operation underneath — a Flow or an Apex method — that does the actual work. The agent decides whether; your platform code decides how, every time, the same way.

3. Decide what it must never do

Spend as much time on guardrails as on capabilities. For every action, ask: what is the blast radius if the agent calls this incorrectly? Issuing a $50 credit is recoverable. Cancelling a contract is not.

Put the irreversible, high-value, and compliance-sensitive actions behind human approval or simply out of scope. A good agent knows the edge of its authority and escalates cleanly — with full context — rather than guessing.

4. Test like an adversary

Functional testing (“can it answer a simple question?”) is not enough. Test the way a frustrated or adversarial customer would:

Ambiguous requests with missing information
Attempts to talk it into actions outside its scope
Questions where the honest answer is “I don’t know, let me get a human”
Edge cases in the data — closed accounts, partial refunds, duplicate records

The agents that survive production are the ones that were tried to be broken before launch.

5. Instrument outcomes from day one

If you cannot measure it, you are running a demo, not a feature. Decide up front what success means — deflection rate, resolution time, escalation quality, CSAT — and instrument it before go-live. Watch the transcripts in the first weeks; they will tell you exactly where the topics and grounding need work.

The teams that win with Agentforce treat the launch as the start of the tuning loop, not the finish line.

The pattern underneath

Notice that four of these five are not about AI at all. They are about data, scope, governance, and measurement — the same disciplines that separate a healthy Salesforce org from a fragile one. Agentforce does not change the fundamentals. It raises the stakes on getting them right.

Thinking about Agentforce and want it grounded, governed, and measured from day one? Talk to us — it’s what we do.