Tutorial For Power Users

Let Hermes run until it's done: `/goal` and `/subgoal` in practice

Hermes Agent

@hermesagents

May 17, 2026


Most AI agents run one turn at a time. You type, they respond, they wait. You're the loop. The agent is just a function call.

Hermes Agent has that mode; it's the default. But it also ships a different mode, called /goal, where the agent runs the loop. You set a target plus success criteria, and the agent proposes, executes, evaluates, and retries until a separate judge LLM agrees the criteria are met. v0.14.0 (May 16, 2026) added /subgoal so you can layer extra criteria onto an active loop mid-flight without restarting (#25449).

This post walks through what's actually happening, when it's the right tool, and the failure modes that make it the wrong tool.

The default mode (for contrast)

By default, every chat session in Hermes runs one turn per user message. The agent reads your message, optionally calls tools, then returns a response. If the task isn't done, you have to nudge it: "keep going," "try again," "what about X." You are the outer loop.

This is fine for exploratory work — "explain this code," "draft a memo," "find me a bug." You want the agent to pause after each step so you can redirect.

It is not fine for tasks where the success condition is concrete and the path to get there is iterative. "Refactor this module until pytest passes" is a thirty-turn task if you drive each turn. With /goal it's a single command.

What `/goal` actually does

/goal Make all tests in tests/api/ pass. Don't change the test assertions. Done when pytest exits 0.

When you send that, three things happen:

1. The goal text becomes the target prompt for the worker. Each subsequent turn, the worker model gets a system message that includes the goal and the current best attempt.
2. A separate "judge" LLM call runs after each worker turn. The judge sees the goal, the current state, and the proposed completion. It returns either "done" (loop exits) or "keep going, here's what's still missing."
3. The loop continues until the judge says done, until you stop it with /stop, or until it hits the configured iteration limit.
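The three steps above can be sketched as a small loop. This is a minimal illustration, not Hermes's actual implementation: `run_goal`, `Verdict`, the `Worker` and `Judge` signatures, and the `should_stop` hook (standing in for /stop) are all hypothetical names chosen for the example.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class Verdict:
    done: bool
    missing: str = ""  # the judge's note on what is still unmet


# Hypothetical signatures: a worker maps (goal, previous attempt, feedback)
# to a new attempt; a judge maps (goal, attempt) to a Verdict.
Worker = Callable[[str, Optional[str], str], str]
Judge = Callable[[str, str], Verdict]


def run_goal(
    goal: str,
    worker: Worker,
    judge: Judge,
    max_iterations: int = 10,
    should_stop: Callable[[], bool] = lambda: False,
) -> Tuple[Optional[str], str]:
    """Loop until the judge says done, the user stops, or the limit is hit."""
    attempt: Optional[str] = None
    feedback = ""
    for _ in range(max_iterations):
        if should_stop():  # stands in for the user sending /stop
            return attempt, "stopped"
        attempt = worker(goal, attempt, feedback)
        verdict = judge(goal, attempt)
        if verdict.done:
            return attempt, "done"
        feedback = verdict.missing  # fed into the next worker turn
    return attempt, "iteration_limit"
```

The return value pairs the last attempt with the reason the loop ended, so a caller can tell a judged success apart from a stop or an exhausted iteration budget.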

The judge is the key piece. It is not the same LLM call as the worker, and it doesn't see the worker's chain of thought, only the goal and the current state. That separation is what makes /goal work: a worker model that's already convinced its answer is right is a bad judge. A fresh LLM call that sees none of the worker's reasoning is a much better one.
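One way to picture that separation is in how the judge's prompt gets assembled. The sketch below is an assumption about the shape of the idea, not the prompt Hermes actually sends; `WorkerTurn` and `build_judge_prompt` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class WorkerTurn:
    reasoning: str  # the worker's chain of thought; never shown to the judge
    state: str      # what the judge may see, e.g. a diff or test output


def build_judge_prompt(goal: str, turn: WorkerTurn) -> str:
    """Build the judge's input from the goal and the visible state only."""
    # turn.reasoning is deliberately not referenced anywhere in this function.
    return (
        "Goal and success criteria:\n"
        f"{goal}\n\n"
        "Current state:\n"
        f"{turn.state}\n\n"
        "Reply 'done' if every criterion is met; otherwise list what is missing."
    )
```

Because the worker's reasoning never reaches the prompt, the judge cannot be talked into agreement by the worker's own confidence.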

Inside the Hermes codebase, this pattern is called the "Ralph loop," after the canonical pseudocode `while not done: do(work); ralph = judge(work)`. v0.14.0's /subgoal extension lets the user inject new judge criteria into a running loop.

`/subgoal` — appending criteria mid-flight

You started a /goal to refactor a module. Three loops in, you realize you also want the refactor to keep cyclomatic complexity below 10 per function. You don't want to stop the loop and restart.

/subgoal Each function must have cyclomatic complexity <= 10.

The next time the judge runs, it factors that new constraint in. If the current best attempt fails it, the loop keeps going. If the current best attempt passes it along with the original criteria, the loop exits.
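A minimal sketch of that behavior, assuming the loop keeps its criteria in a plain list that /subgoal appends to. `GoalLoop`, `add_subgoal`, and `unmet` are hypothetical names, and the `check` callable stands in for the judge's per-criterion decision.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class GoalLoop:
    goal: str
    criteria: List[str] = field(default_factory=list)

    def add_subgoal(self, criterion: str) -> None:
        """Append a criterion without touching the in-progress state."""
        self.criteria.append(criterion)

    def unmet(self, attempt: str, check: Callable[[str, str], bool]) -> List[str]:
        """Return every criterion the attempt fails; empty means the loop can exit."""
        return [c for c in self.criteria if not check(c, attempt)]
```

An attempt that satisfied the original list can start failing the moment a subgoal is appended, which is exactly what keeps the loop running.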

This is the kind of feature that looks small in a release-notes bullet — "user-added criteria appended to an active /goal" — and turns out to be load-bearing for anyone who actually uses the loop. Real goals get refined as you watch the agent work. Without /subgoal, the only way to refine was /stop + redefine + /goal again, losing the in-progress state.

Practical examples

Refactor until tests pass

/goal Refactor src/api/users.py so the User class follows the new naming convention in src/conventions.md. Don't break any existing tests. Done when:
1. pytest exits 0
2. The User class matches the convention rules in conventions.md

The worker tries refactors, and the judge checks both conditions. When both are green, the loop exits.
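A criterion like "pytest exits 0" is attractive because it can be checked mechanically. The helper below is a sketch of such a check using only the standard library; `command_succeeds` is a hypothetical name, and nothing here claims this is how Hermes's judge evaluates the condition.

```python
import subprocess
import sys
from typing import Sequence


def command_succeeds(cmd: Sequence[str], timeout: float = 600.0) -> bool:
    """Return True when the command exits with status 0 within the timeout."""
    try:
        result = subprocess.run(list(cmd), capture_output=True, timeout=timeout)
    except (subprocess.TimeoutExpired, OSError):
        # A hung command or a missing executable counts as "not done".
        return False
    return result.returncode == 0


# Example call (assumes pytest is installed in the current environment):
# command_succeeds([sys.executable, "-m", "pytest", "tests/api/"])
```

Note that pytest exits with status 5 when it collects no tests, so "exits 0" also rules out the degenerate fix of deleting the test files.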

Iterate on a UI

/goal Make the button on /pricing more prominent. Done when:
1. The button is the largest interactive element above the fold on desktop
2. It uses the primary brand color (#FF5A50)
3. Existing Lighthouse accessibility score doesn't drop

The worker edits CSS, and the judge takes a screenshot via the browser tool and checks it against the criteria. Many iterations can run without you babysitting.

Find a bug

/goal Find the cause of the intermittent test failure in tests/auth/test_session.py::test_logout_clears_cookie. Done when you produce a minimal failing repro and a one-paragraph explanation.

The judge here is checking that both parts of the deliverable exist — repro and explanation — not just whether one or the other landed. /subgoal lets you add a constraint like "explanation must reference the relevant request/response cycle" if the first draft is too vague.

When not to use it

/goal is the wrong tool for tasks where:

• The success condition is fuzzy. With "Make this more elegant," the judge can't grade it consistently, so the loop oscillates or rubber-stamps. Use turn-by-turn here.
• You want to see the work as it happens. Each iteration runs to completion before the judge fires, so you don't get the same per-turn visibility. Use turn-by-turn or /handoff if mid-stream review matters.
• The cost matters more than the speed. Each loop iteration is a worker call plus a judge call, so a 10-iteration goal costs 20 LLM calls. Worth it for refactor work; wasteful for "what should I name this variable."
• You haven't thought about the success criteria. Garbage criteria → garbage loop. /goal rewards specificity, and the agent will exploit ambiguity.
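The cost point is simple arithmetic, and it can help to run the numbers before starting a long loop. The function below is a back-of-the-envelope sketch; `estimated_llm_calls` and its parameters are hypothetical, and the per-turn call counts default to the one-worker-plus-one-judge model described in the list above.

```python
def estimated_llm_calls(
    iterations: int,
    calls_per_worker_turn: int = 1,
    calls_per_judge_turn: int = 1,
) -> int:
    """Estimate total LLM calls for a goal loop of the given length."""
    if iterations < 0:
        raise ValueError("iterations must be non-negative")
    return iterations * (calls_per_worker_turn + calls_per_judge_turn)
```

With the defaults, `estimated_llm_calls(10)` gives 20. Raising `calls_per_worker_turn` is one way to model a worker that makes several model calls per turn, which is an assumption layered on top of the article's one-call-per-turn accounting.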

How `/goal` interacts with `/handoff`

v0.14.0 also shipped /handoff, which transfers a live session between models without losing context (#23395). The two compose: you can hand a goal-in-progress from a fast model to a deep-reasoning model when the goal hits something the fast model can't solve. The judge keeps grading the same criteria; the worker just got better.
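Conceptually, a handoff during a goal swaps the worker while the goal and accumulated history stay put. The sketch below illustrates only that idea; `Session`, `handoff`, and `step` are hypothetical names, and the real /handoff mechanics are not described in this post.

```python
from typing import Callable, List

# Hypothetical worker signature: (goal, history so far) -> new attempt.
Worker = Callable[[str, List[str]], str]


class Session:
    def __init__(self, goal: str, worker: Worker) -> None:
        self.goal = goal
        self.worker = worker
        self.history: List[str] = []

    def handoff(self, new_worker: Worker) -> None:
        """Swap the model doing the work; goal and history are untouched."""
        self.worker = new_worker

    def step(self) -> str:
        """Run one worker turn against the full history and record the result."""
        attempt = self.worker(self.goal, self.history)
        self.history.append(attempt)
        return attempt
```

The new worker picks up with everything the old one produced, and whatever grades the attempts still sees the same goal.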

Same with /sessions (#20805) — you can interrupt a goal, browse to a different session, and resume the goal later. The loop state is checkpointed.
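Checkpointing of this kind amounts to serializing the loop state and restoring it later. The sketch below uses JSON on disk as an assumed storage format; `LoopState`, `save_checkpoint`, and `load_checkpoint` are hypothetical names, and the fields are guesses at what a goal loop would need to resume, not Hermes's actual checkpoint schema.

```python
import json
from dataclasses import asdict, dataclass, field
from pathlib import Path
from typing import List, Union


@dataclass
class LoopState:
    goal: str
    criteria: List[str] = field(default_factory=list)
    iteration: int = 0
    best_attempt: str = ""


def save_checkpoint(state: LoopState, path: Union[str, Path]) -> None:
    """Write the loop state to disk as JSON."""
    Path(path).write_text(json.dumps(asdict(state)), encoding="utf-8")


def load_checkpoint(path: Union[str, Path]) -> LoopState:
    """Rebuild the loop state from a JSON checkpoint."""
    return LoopState(**json.loads(Path(path).read_text(encoding="utf-8")))
```

A loop resumed from `load_checkpoint` starts at the saved iteration with the saved best attempt, rather than from scratch.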

Where this fits in the agent stack

Three shapes of agent work, in increasing order of how much of the driving the agent does:

1. Turn-by-turn: you drive, agent responds. Conversational.
2. /goal: you set criteria, agent loops until met. Bounded autonomy.
3. Cron scheduling: agent runs unattended on a schedule, with delivery to messaging platforms. Unbounded autonomy in time.

/goal is the middle one. It's the right reach for a category of tasks that used to require either heavy babysitting or a custom script. v0.14.0's /subgoal makes the loop steerable mid-flight, which is the thing that turns it from a curiosity into a daily tool.