Surf Workflows
This document describes how Surf controls the way agents perform work inside a repo.
Environments define where work happens. Workflows define how work happens.
The core idea is simple: a Task does not just drop an agent into a repo with vague instructions and hope it follows the right process. Instead, Surf lets each org define a Workflow with explicit stages, deterministic checks, review loops, and completion rules.
This is how teams encode their engineering process into the platform.
Goals
- make agent behavior more reliable than prompt-only instruction files
- let teams encode their own engineering process without building a custom agent stack
- combine flexible agent work with deterministic checks and gates
- make Workflow execution observable and auditable
- keep the Workflow model simple enough that teams can adopt it quickly
Core model
Surf separates a few concepts:
- Workflow: the process a repo or org wants Tasks to follow
- Task: a single unit of work entering the platform
- Run: one execution of a Workflow for a specific Task
- Step: one stage inside a Workflow
- Artifact: structured output produced by a Step for later Steps or humans
A Workflow is the durable definition. A Run is one live execution of that definition.
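To make the separation concrete, here is a minimal TypeScript sketch of the core model. Every type and field name below (`StepKind`, `startRun`, `artifacts`, and so on) is a hypothetical illustration, not Surf's actual schema. The only point is that one durable Workflow can back many Runs, one per Task.

```typescript
// Hypothetical data model for the core concepts. Field names are
// illustrative assumptions, not Surf's actual schema.

type StepKind = "agent" | "bash";

interface Step {
  id: string; // e.g. "plan", "lint"
  kind: StepKind;
}

interface Workflow {
  id: string;
  steps: Step[]; // the durable definition
}

interface Task {
  id: string;
  description: string; // the single unit of work entering the platform
}

interface Artifact {
  stepId: string; // which Step produced it
  kind: string; // e.g. "plan", "review"
  body: unknown; // structured payload for later Steps or humans
}

interface Run {
  id: string;
  workflowId: string; // which definition is being executed
  taskId: string; // which Task it is executing for
  artifacts: Artifact[];
}

// One Workflow definition can back many Runs, one per Task.
function startRun(workflow: Workflow, task: Task, runId: string): Run {
  return { id: runId, workflowId: workflow.id, taskId: task.id, artifacts: [] };
}
```

Two Tasks started against the same Workflow produce two independent Runs that share a `workflowId` but nothing else.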
Example workflow
Here is the shape of a typical repo Workflow:
- Clarify: Agent Step using Clarify Agent. Ask focused questions when the Task is underspecified, then continue once enough context exists.
- Plan: Agent Step using Plan Agent. Understand the Task, inspect the codebase, identify touched areas, and produce a short plan.
- Code: Agent Step using Code Agent. Make the change using editing, environment, browser, schema, and knowledge tools as needed.
- Lint: Bash Step that runs lint, formatting, or similar checks before moving on.
- Review: Agent Step using Review Agent. Review the diff, check logs or browser behavior if needed, and produce structured findings.
- Done: terminal success state.
This is simple, enforceable, and already more reliable than hoping an agent faithfully follows a markdown checklist.
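The same Workflow can be written down as data. The sketch below assumes a hypothetical declarative shape (`kind`, `agent`, `command`); Surf's real configuration format is not described here. Modeling Done as a `terminal` entry is also an assumption, since the built-in Step kinds covered in this document are `agent` and `bash`.

```typescript
// Hypothetical declarative encoding of the example Workflow.
// The shape (kind, agent, command) is an assumption for illustration.

type StepDef =
  | { id: string; kind: "agent"; agent: string; mission: string }
  | { id: string; kind: "bash"; command: string }
  | { id: string; kind: "terminal" }; // assumed representation of Done

const exampleWorkflow: StepDef[] = [
  { id: "clarify", kind: "agent", agent: "Clarify Agent", mission: "Ask focused questions if the Task is underspecified" },
  { id: "plan", kind: "agent", agent: "Plan Agent", mission: "Inspect the codebase and produce a short plan" },
  { id: "code", kind: "agent", agent: "Code Agent", mission: "Make the change" },
  { id: "lint", kind: "bash", command: "bun run lint" },
  { id: "review", kind: "agent", agent: "Review Agent", mission: "Review the diff and produce structured findings" },
  { id: "done", kind: "terminal" },
];

// Describe each Step in one line, e.g. for a Run log.
function describe(step: StepDef): string {
  switch (step.kind) {
    case "agent":
      return `${step.id}: Agent Step using ${step.agent}`;
    case "bash":
      return `${step.id}: Bash Step running "${step.command}"`;
    case "terminal":
      return `${step.id}: terminal state`;
  }
}
```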
Why Workflows exist
Instruction files such as AGENTS.md are useful for background guidance, but they are not enough to enforce process.
They cannot reliably guarantee that an agent actually follows those instructions in order.
Workflows solve that by moving process from soft prompt guidance into platform-enforced execution.
Design principles
Deterministic outside the Step, flexible inside the Step
The Workflow decides which Step runs next, what counts as success, when to loop back, and when to stop or escalate. The agent decides how to accomplish the mission inside an Agent Step.
Checks happen at deliberate points
Surf does not encourage agents to constantly rerun checks while they are still in the middle of making changes.
A better default is:
- let the agent take a real pass at the Task
- run deterministic checks at a defined point
- if checks fail, loop back with those failures attached
That matches how human engineering processes already work.
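The pass, check, loop-back pattern can be sketched as a small function. `agentPass` and `runChecks` are hypothetical stand-ins for an Agent Step and a Bash Step, and `maxAttempts` anticipates the bounded loops described later; none of these names come from Surf itself.

```typescript
// Minimal sketch: let the agent take a pass, run checks at a defined point,
// and loop back with the failures attached. All names are hypothetical.

type CheckResult = { ok: boolean; failures: string[] };

function passThenCheck(
  agentPass: (previousFailures: string[]) => void,
  runChecks: () => CheckResult,
  maxAttempts: number,
): { ok: boolean; attempts: number } {
  let failures: string[] = [];
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    agentPass(failures); // the agent works without rerunning checks mid-change
    const result = runChecks(); // checks run once, at a defined point
    if (result.ok) return { ok: true, attempts: attempt };
    failures = result.failures; // loop back with the failures attached
  }
  return { ok: false, attempts: maxAttempts };
}
```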
Clarification is first-class
Many Tasks arrive with ambiguity.
Surf supports clarifying questions as an integral part of a Workflow, not as an awkward failure mode.
A strong default is a built-in Clarify Step before Plan:
- inspect the Task
- ask focused clarifying questions through the harness’ question tool when needed
- continue once the Task is well-understood and ready to be planned
If no clarification is needed, the Step passes immediately.
The same question mechanism is also available later in a Run when a Step is blocked on missing human input.
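A minimal sketch of a Clarify Step, assuming the Task carries optional fields that signal whether enough context exists. The field names and the missing-field heuristic are illustrative assumptions; a real Clarify Agent would judge ambiguity itself and ask through the harness's question tool rather than return a list.

```typescript
// Hypothetical Clarify Step: if the Task is missing context, return "blocked"
// with focused questions; otherwise pass immediately.
// The required-field check is an illustrative heuristic, not Surf's logic.

interface TaskInput {
  description: string;
  acceptanceCriteria?: string;
  targetArea?: string;
}

type ClarifyResult =
  | { outcome: "pass" }
  | { outcome: "blocked"; questions: string[] };

function clarify(task: TaskInput): ClarifyResult {
  const questions: string[] = [];
  if (!task.acceptanceCriteria) questions.push("What does done look like for this Task?");
  if (!task.targetArea) questions.push("Which part of the codebase should this change touch?");
  return questions.length === 0 ? { outcome: "pass" } : { outcome: "blocked", questions };
}

// Answers merge into the Task, and Clarify simply runs again.
function applyAnswers(task: TaskInput, answers: Partial<TaskInput>): TaskInput {
  return { ...task, ...answers };
}
```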
What a Step does
A Step is a controlled stage in a Run.
Each Step defines:
- a purpose
- what can run in that Step
- what context is available
- what outputs it can produce
- what outcomes are possible
- where each outcome routes next
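The six properties above map naturally onto a record type. This sketch assumes hypothetical field names and reuses the outcome names from the Step outcomes section; the helper shows one check a platform could run, flagging declared outcomes that have nowhere to route.

```typescript
// Hypothetical Step definition covering the six properties listed above.

type Outcome = "pass" | "issues" | "blocked" | "fail";

interface StepDefinition {
  purpose: string; // why the Step exists
  runs: { kind: "agent"; agent: string } | { kind: "bash"; command: string };
  context: string[]; // Artifacts or inputs the Step may read
  outputs: string[]; // Artifact kinds the Step may produce
  outcomes: Outcome[]; // outcomes the Step is allowed to return
  routes: Partial<Record<Outcome, string>>; // where each outcome goes next
}

const reviewStep: StepDefinition = {
  purpose: "Review the diff for correctness and risk",
  runs: { kind: "agent", agent: "Review Agent" },
  context: ["plan", "diff"],
  outputs: ["review"],
  outcomes: ["pass", "issues", "fail"],
  routes: { pass: "done", issues: "code", fail: "blocked" },
};

// Every declared outcome should have somewhere to go.
function unroutedOutcomes(step: StepDefinition): Outcome[] {
  return step.outcomes.filter((o) => step.routes[o] === undefined);
}
```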
Step kinds
Surf uses a small set of built-in Step kinds.
A Step kind defines how work is executed. An Agent defines who or what performs work inside an Agent Step.
agent
An Agent Step gives an agent a mission and a constrained tool/context surface.
Examples:
- plan the work
- implement the change
- review the diff for correctness and risk
- investigate a failure and explain the root cause
An Agent Step usually produces structured output, not just free-form text.
Each Agent Step also chooses an Agent profile.
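One way to picture an Agent Step is as a mission, plus an Agent profile, plus an allow-list of tools that the platform (not the agent) enforces. The tool names, `outputShape`, and both helpers below are assumptions made for illustration.

```typescript
// Hypothetical Agent Step: a mission, an Agent profile, and a constrained
// tool surface. Tool names are illustrative assumptions.

interface AgentProfile {
  name: string; // e.g. "Review Agent"
  outputShape: string[]; // fields the structured output must contain
}

interface AgentStep {
  mission: string;
  profile: AgentProfile;
  allowedTools: Set<string>;
}

// The platform, not the agent, decides whether a tool call is permitted.
function canUseTool(step: AgentStep, tool: string): boolean {
  return step.allowedTools.has(tool);
}

// Structured output is checked against the profile's expected fields.
function missingFields(step: AgentStep, output: Record<string, unknown>): string[] {
  return step.profile.outputShape.filter((field) => !(field in output));
}

const review: AgentStep = {
  mission: "Review the diff for correctness and risk",
  profile: { name: "Review Agent", outputShape: ["findings", "verdict"] },
  allowedTools: new Set(["read_file", "view_diff", "read_logs"]),
};
```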
bash
A Bash Step runs bash commands at a deliberate point in the Workflow.
Examples:
- bun run lint
- tsc
- pnpm run test
Bash Steps are important because they let the Workflow enforce checks, or compile changes for review, at a defined point without relying on the agent to remember to run them, and without encouraging constant retesting while the agent is still in the middle of making changes.
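A Bash Step ultimately has to turn a process result into a Step outcome. The sketch below assumes one plausible convention: exit code 0 is `pass`, any other exit code is `issues`, and a command that never produced an exit code is `fail`. The `ProcessResult` shape is hypothetical.

```typescript
// Minimal sketch mapping a Bash Step's process result to a Step outcome.
// The exit-code convention is an assumption, not documented Surf behavior.

type BashOutcome = "pass" | "issues" | "fail";

interface ProcessResult {
  exitCode: number | null; // null when the process never ran or was killed
  stdout: string;
  stderr: string;
}

function bashOutcome(result: ProcessResult): { outcome: BashOutcome; details: string } {
  if (result.exitCode === null) {
    // The command could not run at all: a real failure.
    return { outcome: "fail", details: result.stderr || "command did not run" };
  }
  if (result.exitCode === 0) {
    return { outcome: "pass", details: "" };
  }
  // Lint or test violations: actionable issues, routed back to Code.
  return { outcome: "issues", details: result.stdout + result.stderr };
}
```

In Node.js, `child_process.spawnSync` reports `status: null` when the child was terminated by a signal and sets `error` when it could not be spawned, which lines up with the `fail` branch. One caveat: POSIX shells report a missing command as exit status 127, which this sketch would classify as `issues`; a real implementation would probably treat it as `fail`.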
Standard Agents and Custom Agents
Surf provides a small set of Standard Agents that work well out of the box, such as Clarify Agent, Plan Agent, Code Agent, and Review Agent. These live in a shared Agent Library and give teams strong defaults for mission framing, output shape, and context usage.
Surf also lets teams author Custom Agents in-platform. A team can create a Custom Agent through a lightweight platform flow, save it into its own library, and then use it inside Workflows the same way it uses a Standard Agent. Examples include Verify Compliance, Security Review, Migration Review, or Docs Consistency Check.
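A sketch of how a Workflow might resolve an Agent by name from either library. `AgentLibrary`, `saveCustom`, `resolve`, and the rule that a team's Custom Agents are checked before Standard Agents are all assumptions for illustration.

```typescript
// Hypothetical Agent Library: Standard Agents ship with the platform,
// Custom Agents are saved per team, and Workflows resolve either by name.
// Checking the team library first is an assumed lookup order.

interface Agent {
  name: string;
  mission: string;
  source: "standard" | "custom";
}

const standardAgents: Agent[] = [
  { name: "Clarify Agent", mission: "Ask focused clarifying questions", source: "standard" },
  { name: "Plan Agent", mission: "Produce a short plan", source: "standard" },
  { name: "Code Agent", mission: "Make the change", source: "standard" },
  { name: "Review Agent", mission: "Review the diff", source: "standard" },
];

class AgentLibrary {
  private custom = new Map<string, Agent>();

  saveCustom(agentName: string, mission: string): void {
    this.custom.set(agentName, { name: agentName, mission, source: "custom" });
  }

  // Workflows reference Agents by name, regardless of which library holds them.
  resolve(agentName: string): Agent | undefined {
    return this.custom.get(agentName) ?? standardAgents.find((a) => a.name === agentName);
  }
}
```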
Step outcomes and Run states
Every Step finishes with a small, explicit set of outcomes.
Step outcomes describe what a single Step returned. Run states describe the status of the overall Workflow execution, such as still running, waiting on input, or finished.
A good default set is:
- pass: the Step completed successfully
- issues: the Step completed and found actionable issues
- blocked: the Step cannot proceed without external help or missing context
- fail: the Step itself failed unexpectedly
This is important because many useful loops are not true failures.
For example:
- a Review Step finding problems usually returns issues, not fail
- a Lint Step with violations usually returns issues, not fail
- a missing secret or crashed runtime is a real fail
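The distinction between Step outcomes and Run states can be expressed as two separate types. The Run state names and the mapping are assumptions drawn from the examples in this section and from the routing example later in this document (where `Lint.fail` leads to `blocked`); Surf's actual state names may differ.

```typescript
// Minimal sketch separating Step outcomes from Run states.
// Run state names are assumptions based on the examples in the text.

type StepOutcome = "pass" | "issues" | "blocked" | "fail";
type RunState = "running" | "awaiting_input" | "done" | "blocked";

// Only "fail" means something broke; "issues" is a normal loop trigger.
function isRealFailure(outcome: StepOutcome): boolean {
  return outcome === "fail";
}

// How a Step outcome could affect the overall Run state.
function runStateAfter(outcome: StepOutcome, nextIsTerminal: boolean): RunState {
  switch (outcome) {
    case "pass":
      return nextIsTerminal ? "done" : "running";
    case "issues":
      return "running"; // e.g. loop back to Code
    case "blocked":
      return "awaiting_input"; // pause until human input arrives
    case "fail":
      return "blocked"; // mirrors the Lint.fail -> blocked routing example
  }
}
```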
Transitions and loops
Workflows route based on Step outcome.
That makes patterns like this first-class:
Plan -> Code -> Lint -> Review -> Done
with loops:
Lint found issues -> Code
Review found issues -> Code
A concrete routing example looks like this:
Lint.pass -> Review
Lint.issues -> Code
Lint.fail -> blocked
Review.pass -> Done
Review.issues -> Code
This is one of the most important parts of the model. It lets Surf encode real engineering process instead of pretending every Run is a straight line.
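The routing rules above amount to a lookup table keyed by Step and outcome. This minimal sketch encodes them with hypothetical lowercase Step ids, adds the `plan.pass` and `code.pass` edges from the linear flow, and falls back to `blocked` for any unrouted outcome, which is an assumed default rather than documented behavior.

```typescript
// Hypothetical routing table: keys are "step.outcome", values are the next
// Step or a terminal path.

type Outcome = "pass" | "issues" | "blocked" | "fail";

const routes: Partial<Record<string, string>> = {
  "plan.pass": "code",
  "code.pass": "lint",
  "lint.pass": "review",
  "lint.issues": "code",
  "lint.fail": "blocked",
  "review.pass": "done",
  "review.issues": "code",
};

// Unrouted outcomes fall through to "blocked" rather than guessing (assumption).
function nextStep(step: string, outcome: Outcome): string {
  return routes[`${step}.${outcome}`] ?? "blocked";
}

// Replay a sequence of outcomes to see the path a Run takes.
function trace(start: string, outcomes: Outcome[]): string[] {
  const visited = [start];
  let current = start;
  for (const outcome of outcomes) {
    current = nextStep(current, outcome);
    visited.push(current);
  }
  return visited;
}
```

For example, a Run where Lint finds issues once and Review finds issues once goes `plan -> code -> lint -> code -> lint -> review -> code -> lint -> review -> done`.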
Bounded loops
Every loop has a limit.
Examples:
- max 2 review loops
- max 3 fix attempts after checks fail
If the limit is exceeded, the Run moves to a clear terminal path such as:
- blocked
- approval
- human review
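Loop limits can be enforced by counting how often each back-edge is taken. The `LoopGuard` class and the `review.issues` edge key below are hypothetical; the limit values and escalation names mirror the examples above.

```typescript
// Minimal sketch of a loop counter: each back-edge (e.g. review.issues -> code)
// has a limit, and exceeding it sends the Run to an escalation path.

type Escalation = "blocked" | "approval" | "human review";

class LoopGuard {
  private counts = new Map<string, number>();
  private limits: Record<string, number>; // back-edge -> max loops
  private escalation: Escalation;

  constructor(limits: Record<string, number>, escalation: Escalation) {
    this.limits = limits;
    this.escalation = escalation;
  }

  // Returns the target to follow: the requested Step, or the escalation path.
  follow(edge: string, target: string): string {
    const used = (this.counts.get(edge) ?? 0) + 1;
    this.counts.set(edge, used);
    const limit = this.limits[edge] ?? Infinity; // edges without a limit are unbounded
    return used > limit ? this.escalation : target;
  }
}
```

With `new LoopGuard({ "review.issues": 2 }, "human review")`, the first two review loops route back to Code and the third escalates to human review.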
Human in the loop
Human input is a first-class part of the Run model. It does not force Surf to rerun the entire Workflow by default.
It can happen through:
- a planned Human Gate
- an earlier approval point, such as after Plan
- an answer to a clarifying question
- feedback on a paused or completed Step
That input can then:
- move the Run forward through an approval
- resume the current Step with clarifying input
- route back to a Workflow-defined target Step when changes are requested
- route to an earlier planning Step when the scope changes
A human message sent while an Agent Step is running does not stop the Step: the agent treats it as an instruction, keeps working, and adjusts the current Step's behavior accordingly.
When a Step returns blocked because it needs human input, the Run pauses in an awaiting-input state until that input arrives.
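A sketch of how each kind of human input could be dispatched to one of the effects listed above. The input kinds, the `Action` type, and the configurable `changesTarget` and `planningStep` fields are assumptions; the text only says these targets are Workflow-defined.

```typescript
// Hypothetical dispatch of human input to Run actions.

type HumanInput =
  | { kind: "approve" }
  | { kind: "answer"; text: string }
  | { kind: "request_changes"; feedback: string }
  | { kind: "scope_change"; feedback: string };

type Action =
  | { type: "advance" } // approval moves the Run forward
  | { type: "resume_current"; input: string } // clarifying answer resumes the Step
  | { type: "route_to"; step: string; input: string };

interface GateConfig {
  changesTarget: string; // Workflow-defined target for requested changes
  planningStep: string; // earlier planning Step for scope changes
}

function handleHumanInput(input: HumanInput, gate: GateConfig): Action {
  switch (input.kind) {
    case "approve":
      return { type: "advance" };
    case "answer":
      return { type: "resume_current", input: input.text };
    case "request_changes":
      return { type: "route_to", step: gate.changesTarget, input: input.feedback };
    case "scope_change":
      return { type: "route_to", step: gate.planningStep, input: input.feedback };
  }
}
```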
Step invalidation and resume
When a Run moves backward, Surf invalidates the affected tail of the Workflow rather than blindly rerunning everything.
The general model is simple: feedback or a failed gate selects a target Step, that Step becomes the resume point, all downstream Steps are invalidated and rerun, and all earlier unaffected Steps remain valid. A Human Gate can route back to Code or Plan, a Review Step can route to a dedicated remediation Step, and an answer to a clarifying question can simply resume Clarify. This gives teams control without making the engine unpredictable.
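The invalidation rule can be sketched for the simple case of a linear Workflow. This is a deliberate simplification: a Workflow with branches would need to walk the transition graph to find the downstream set, and the `StepRecord` status names are hypothetical.

```typescript
// Minimal sketch of tail invalidation for a linear Workflow: the target Step
// becomes the resume point, everything after it is invalidated, and earlier
// Steps keep their results.

interface StepRecord {
  id: string;
  status: "valid" | "invalidated" | "pending";
}

function invalidateFrom(steps: StepRecord[], target: string): StepRecord[] {
  const index = steps.findIndex((s) => s.id === target);
  if (index === -1) throw new Error(`unknown target Step: ${target}`);
  return steps.map((s, i): StepRecord => {
    if (i < index) return s; // earlier, unaffected Steps stay valid
    if (i === index) return { ...s, status: "pending" }; // resume point
    return { ...s, status: "invalidated" }; // downstream tail will rerun
  });
}
```

A Human Gate that routes back to Code on a `clarify, plan, code, lint, review` Run leaves Clarify and Plan valid, makes Code the resume point, and invalidates Lint and Review.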
Artifacts and handoffs
Each Step can leave behind structured Artifacts for later Steps.
Examples:
- a plan with touched areas and risks
- a list of lint failures
- a structured review with findings
- a summary of browser verification
- a note explaining why the Run is blocked
Artifacts matter because later Steps do not have to reconstruct everything from scratch.
This also makes the Workflow more observable for humans.
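Structured Artifacts fit well as a discriminated union, so later Steps can pick out exactly the kind they need. The Artifact kinds below follow the examples above, but their fields and the `ArtifactStore` helper are illustrative assumptions.

```typescript
// Hypothetical Artifact types plus a tiny store that later Steps can query
// instead of reconstructing context from scratch.

type Artifact =
  | { kind: "plan"; touchedAreas: string[]; risks: string[] }
  | { kind: "lint_failures"; failures: string[] }
  | { kind: "review"; findings: { file: string; note: string }[] }
  | { kind: "blocked_note"; reason: string };

class ArtifactStore {
  private entries: { stepId: string; artifact: Artifact }[] = [];

  record(stepId: string, artifact: Artifact): void {
    this.entries.push({ stepId, artifact });
  }

  // Most recently recorded Artifact of a given kind, if any.
  latest(kind: Artifact["kind"]): Artifact | undefined {
    for (let i = this.entries.length - 1; i >= 0; i--) {
      const a = this.entries[i].artifact;
      if (a.kind === kind) return a;
    }
    return undefined;
  }
}
```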
Workflow Builder
Workflows are composed in a dedicated platform surface.
The Builder lets teams compose the core pieces of a Workflow in one place: Steps, Agent profiles, Checks, Human Gates, transitions, loop limits, and resume targets. It feels visual enough to understand the flow at a glance, but structured enough that configuring a Step still feels precise and fast.
Preview matters a lot here. The Builder makes it easy to understand what happens when a Step returns pass, issues, blocked, or fail, which Step becomes the resume point after feedback, and which parts of the Workflow rerun. It feels more like inspecting a state machine than sketching a diagram.
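The preview questions above (what happens on each outcome, and what reruns) can be answered mechanically from the Workflow definition. This sketch assumes a linear Step order plus a per-Step routing table; `preview`, `PreviewWorkflow`, and the `unrouted` marker are hypothetical names.

```typescript
// Hypothetical Builder preview: for a Step and outcome, report the next Step
// and which Steps would rerun if the route goes backward.

type Outcome = "pass" | "issues" | "blocked" | "fail";

interface PreviewWorkflow {
  order: string[]; // linear Step order, terminal states excluded
  routes: Record<string, Partial<Record<Outcome, string>>>;
}

function preview(wf: PreviewWorkflow, step: string, outcome: Outcome): { next: string; reruns: string[] } {
  const next = wf.routes[step]?.[outcome] ?? "unrouted";
  const targetIndex = wf.order.indexOf(next);
  const currentIndex = wf.order.indexOf(step);
  // Routing backward (or to the same Step) reruns the target and everything after it.
  const reruns = targetIndex !== -1 && targetIndex <= currentIndex ? wf.order.slice(targetIndex) : [];
  return { next, reruns };
}
```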
Templates are built in from the start. Most teams do not begin from a blank page. They start from a strong default such as Clarify -> Plan -> Code -> Check -> Human Gate -> Done, then adjust the flow to fit their own process.
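Starting from a template and adjusting it is ordinary list editing. A minimal sketch, using the default template named above and a hypothetical `insertAfter` helper to add a Security Review after the checks; the template array itself is left unchanged.

```typescript
// Hypothetical template adjustment. The template contents come from the text;
// the helper is an assumption for illustration.

const defaultTemplate: readonly string[] = ["Clarify", "Plan", "Code", "Check", "Human Gate", "Done"];

// Insert a Step after an existing one, returning a new flow.
function insertAfter(flow: readonly string[], anchor: string, step: string): string[] {
  const i = flow.indexOf(anchor);
  if (i === -1) throw new Error(`no Step named ${anchor}`);
  return [...flow.slice(0, i + 1), step, ...flow.slice(i + 1)];
}

const teamFlow = insertAfter(defaultTemplate, "Check", "Security Review");
```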
Default workflow patterns
A few common patterns cover most teams.
Plan, implement, validate, review
Plan -> Code -> Lint -> Review -> Done
This is a strong default shape for many repos.
Investigate, fix, validate
Investigate -> Code -> Review -> Done
Useful for bug reports and failures with unclear causes.
Why this matters
The point of Workflows is not just to make agents more rigid.
The point is to make them more trustworthy in real codebases by giving teams control over:
- when agents are allowed to write
- when deterministic checks run
- when review happens
- when a Run loops back
- when a Run is blocked
- which tools are available at each stage
This is a more reliable way to encode engineering process than long prompt files or static instructions alone.