Surf Workflows

This document describes how Surf controls the way agents perform work inside a repo.

Environments define where work happens. Workflows define how work happens.

The core idea is simple: a Task does not just drop an agent into a repo with vague instructions and hope it follows the right process. Instead, Surf lets each org define a Workflow with explicit stages, deterministic checks, review loops, and completion rules.

This is how teams encode their engineering process into the platform.

Goals

make agent behavior more reliable than prompt-only instruction files
let teams encode their own engineering process without building a custom agent stack
combine flexible agent work with deterministic checks and gates
make Workflow execution observable and auditable
keep the Workflow model simple enough that teams can adopt it quickly

Core model

Surf separates a few concepts:

Workflow — the process a repo or org wants Tasks to follow
Task — a single unit of work entering the platform
Run — one execution of a Workflow for a specific Task
Step — one stage inside a Workflow
Artifact — structured output produced by a Step for later Steps or humans

A Workflow is the durable definition. A Run is one live execution of that definition.
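
One way to picture how these concepts relate is as a handful of types. This is only a sketch: every field name here (`id`, `workflowId`, `producedBy`, and so on) and the `startRun` helper are assumptions made for illustration, not Surf's actual schema.

```typescript
// Hypothetical shapes for the core concepts. All field names are assumptions.
type StepKind = "agent" | "bash";

interface Step {
  name: string;
  kind: StepKind;
}

interface Workflow {
  id: string;
  steps: Step[]; // the durable definition
}

interface Task {
  id: string;
  description: string;
}

interface Artifact {
  producedBy: string; // name of the Step that produced it
  data: unknown;
}

interface Run {
  workflowId: string;
  taskId: string;
  currentStep: string;
  artifacts: Artifact[];
}

// A Run is one live execution: it points at a Workflow and a Task
// and starts at the Workflow's first Step. The Workflow is not mutated.
function startRun(workflow: Workflow, task: Task): Run {
  const first = workflow.steps[0];
  if (first === undefined) throw new Error("Workflow has no Steps");
  return {
    workflowId: workflow.id,
    taskId: task.id,
    currentStep: first.name,
    artifacts: [],
  };
}
```

The same Workflow can back many Runs, one per Task, which is the distinction the definitions above draw.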

Example workflow

Here is the shape of a typical repo Workflow:

Clarify: Agent Step using Clarify Agent. Ask focused questions when the Task is underspecified, then continue once enough context exists.
Plan: Agent Step using Plan Agent. Understand the Task, inspect the codebase, identify touched areas, and produce a short plan.
Code: Agent Step using Code Agent. Make the change using editing, environment, browser, schema, and knowledge tools as needed.
Lint: Bash Step that runs lint, formatting, or similar checks before moving on.
Review: Agent Step using Review Agent. Review the diff, check logs or browser behavior if needed, and produce structured findings.
Done: Terminal success state.

This is simple, enforceable, and already more reliable than hoping an agent faithfully follows a markdown checklist.
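
The example Workflow above could be written down as plain data. The sketch below is an assumption about what such a definition might look like: the property names (`kind`, `agent`, `mission`, `commands`) and the `"terminal"` kind are hypothetical, chosen only to mirror the description.

```typescript
// Hypothetical declarative form of the example Workflow.
// Property names and the "terminal" kind are illustrative assumptions.
type StepDef =
  | { name: string; kind: "agent"; agent: string; mission: string }
  | { name: string; kind: "bash"; commands: string[] }
  | { name: string; kind: "terminal" };

const exampleWorkflow: StepDef[] = [
  { name: "Clarify", kind: "agent", agent: "Clarify Agent", mission: "Ask focused questions when the Task is underspecified." },
  { name: "Plan", kind: "agent", agent: "Plan Agent", mission: "Inspect the codebase and produce a short plan." },
  { name: "Code", kind: "agent", agent: "Code Agent", mission: "Make the change." },
  { name: "Lint", kind: "bash", commands: ["bun run lint"] },
  { name: "Review", kind: "agent", agent: "Review Agent", mission: "Review the diff and produce structured findings." },
  { name: "Done", kind: "terminal" },
];

// The straight-line path through the Workflow, in definition order.
const happyPath: string = exampleWorkflow.map((step) => step.name).join(" -> ");
```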

Why Workflows exist

Instruction files such as AGENTS.md are useful for background guidance, but they are not enough to enforce process.

They cannot guarantee that an agent actually follows those instructions, let alone in the intended order.

Workflows solve that by moving process from soft prompt guidance into platform-enforced execution.

Design principles

Deterministic outside the Step, flexible inside the Step

The Workflow decides which Step runs next, what counts as success, when to loop back, and when to stop or escalate. The agent decides how to accomplish the mission inside an Agent Step.

Checks happen at deliberate points

Surf does not encourage agents to constantly rerun checks while they are still in the middle of making changes.

A better default is:

let the agent take a real pass at the Task
run deterministic checks at a defined point
if checks fail, loop back with those failures attached

That matches how human engineering processes already work.
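
The default described above can be sketched as a small loop. This is a minimal illustration under assumptions: `code` and `check` stand in for an Agent Step and a deterministic check, and the function name, signature, and `maxAttempts` parameter are hypothetical.

```typescript
type CheckResult = { ok: true } | { ok: false; failures: string[] };

// Let the agent take a full pass, run checks once at a defined point,
// and loop back with the failures attached if the checks do not pass.
function codeThenCheck(
  code: (previousFailures: string[]) => void,
  check: () => CheckResult,
  maxAttempts: number,
): { passed: boolean; attempts: number } {
  let failures: string[] = [];
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    code(failures); // the agent sees failures from the previous check, if any
    const result = check(); // checks run after the pass, not during it
    if (result.ok) return { passed: true, attempts: attempt };
    failures = result.failures;
  }
  return { passed: false, attempts: maxAttempts };
}
```

The check is never called while `code` is still running, which is the point of the pattern.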

Clarification is first-class

Many Tasks arrive with ambiguity.

Surf supports clarifying questions as an integral part of a Workflow, not as an awkward failure mode.

A strong default is a built-in Clarify Step before Plan:

inspect the Task
ask focused clarifying questions through the harness’ question tool when needed
continue once the Task is well-understood and ready to be planned

If no clarification is needed, the Step passes immediately.

The same question mechanism is also available later in a Run when a Step is blocked on missing human input.
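
A Clarify Step along these lines might behave as sketched below. The function name, the `Map` of answers, and the choice to model pending questions as a `blocked` outcome are all assumptions for illustration.

```typescript
type ClarifyResult =
  | { outcome: "pass" }
  | { outcome: "blocked"; questions: string[] };

// Passes immediately when nothing is open; otherwise reports the
// questions that still need a human answer.
function clarifyStep(
  questions: string[],
  answers: Map<string, string>,
): ClarifyResult {
  const unanswered = questions.filter((q) => !answers.has(q));
  return unanswered.length === 0
    ? { outcome: "pass" }
    : { outcome: "blocked", questions: unanswered };
}
```

Calling `clarifyStep` again after answers arrive models resuming the Step rather than restarting the Run.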

What a Step does

A Step is a controlled stage in a Run.

Each Step defines:

a purpose
what can run in that Step
what context is available
what outputs it can produce
what outcomes are possible
where each outcome routes next
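
As an illustration, the items in this list map naturally onto a record type. The field names below and the sample Review Step are hypothetical; the outcome names are the ones this document uses elsewhere.

```typescript
type Outcome = "pass" | "issues" | "blocked" | "fail";

// Hypothetical Step definition covering the properties listed above.
interface StepDefinition {
  purpose: string;
  runs: string; // what can run in this Step, e.g. an Agent or a command
  context: string[]; // what context is available
  outputs: string[]; // what outputs it can produce
  routes: Partial<Record<Outcome, string>>; // possible outcomes and where each routes
}

const reviewStep: StepDefinition = {
  purpose: "Review the diff for correctness and risk",
  runs: "Review Agent",
  context: ["diff", "plan", "lint results"],
  outputs: ["structured review findings"],
  routes: { pass: "Done", issues: "Code" },
};

// The outcomes a Step can return are exactly the ones it has routes for.
function possibleOutcomes(step: StepDefinition): Outcome[] {
  return Object.keys(step.routes) as Outcome[];
}
```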

Step kinds

Surf uses a small set of built-in Step kinds.

A Step kind defines how work is executed. An Agent defines who or what performs work inside an Agent Step.

`agent`

An Agent Step gives an agent a mission and a constrained tool/context surface.

Examples:

plan the work
implement the change
review the diff for correctness and risk
investigate a failure and explain the root cause

An Agent Step usually produces structured output, not just free-form text.

Each Agent Step also chooses an Agent profile.

`bash`

A Bash Step runs bash commands at a deliberate point in the Workflow.

Examples:

bun run lint
tsc
pnpm run test

Bash Steps matter because they let the Workflow enforce correctness, or compile changes for review, at a fixed point. The Workflow does not rely on the agent to remember when to run checks, and the agent is not encouraged to retest constantly while it is still in the middle of making changes.
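
A Bash Step could map command results to Step outcomes roughly as follows. This is a sketch under assumptions: the `exec` callback is a hypothetical stand-in for however the platform actually runs commands, and treating a non-zero exit as `issues` and a command that could not start as `fail` is one plausible policy, not a documented one.

```typescript
// exitCode is null when the command could not be started at all.
interface CommandResult {
  exitCode: number | null;
  output: string;
}

type BashOutcome =
  | { outcome: "pass" }
  | { outcome: "issues"; failures: string[] }
  | { outcome: "fail"; reason: string };

// Runs each command in order and collects failures for the next Step.
function runBashStep(
  commands: string[],
  exec: (command: string) => CommandResult,
): BashOutcome {
  const failures: string[] = [];
  for (const command of commands) {
    const { exitCode, output } = exec(command);
    if (exitCode === null) {
      return { outcome: "fail", reason: `could not run: ${command}` };
    }
    if (exitCode !== 0) failures.push(`${command}: ${output}`);
  }
  return failures.length === 0
    ? { outcome: "pass" }
    : { outcome: "issues", failures };
}
```

Injecting `exec` keeps the mapping logic separate from process handling, so the outcome policy can be reasoned about on its own.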

Standard Agents and Custom Agents

Surf provides a small set of Standard Agents that work well out of the box, such as Clarify Agent, Plan Agent, Code Agent, and Review Agent. These live in a shared Agent Library and give teams strong defaults for mission framing, output shape, and context usage.

Surf also lets teams author Custom Agents in-platform. A team can create a Custom Agent through a lightweight platform flow, save it into its own library, and then use it inside Workflows the same way it uses a Standard Agent. Examples include Verify Compliance, Security Review, Migration Review, or Docs Consistency Check.

Step outcomes and Run states

Every Step finishes with a small, explicit set of outcomes.

Step outcomes describe what a single Step returned. Run states describe the status of the overall Workflow execution, such as still running, waiting on input, or finished.

A good default set is:

pass — the Step completed successfully
issues — the Step completed and found actionable issues
blocked — the Step cannot proceed without external help or missing context
fail — the Step itself failed unexpectedly

This is important because many useful loops are not true failures.

For example:

a Review Step finding problems usually returns issues, not fail
a Lint Step with violations usually returns issues, not fail
a missing secret or crashed runtime is a real fail
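
To make the distinction between Step outcomes and Run states concrete, here is a minimal sketch. The Run state names (`running`, `awaiting_input`, `finished`) and the derivation rule are assumptions based on the description above, not a confirmed part of Surf.

```typescript
type StepOutcome = "pass" | "issues" | "blocked" | "fail";
type RunState = "running" | "awaiting_input" | "finished";

// Derives the overall Run state from the last Step outcome and the
// Step the Workflow routed to (null when there is nowhere left to go).
function runStateAfter(
  outcome: StepOutcome,
  nextStep: string | null,
  terminalSteps: string[],
): RunState {
  if (outcome === "blocked") return "awaiting_input";
  if (nextStep === null || terminalSteps.includes(nextStep)) return "finished";
  return "running";
}
```

Note that `issues` leaves the Run in `running`: the Step found problems, but the Run itself is healthy and simply loops back.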

Transitions and loops

Workflows route based on Step outcome.

That makes patterns like this first-class:

Plan -> Code -> Lint -> Review -> Done

with loops:

Lint found issues -> Code
Review found issues -> Code

A concrete routing example looks like this:

Lint.pass -> Review
Lint.issues -> Code
Lint.fail -> blocked
Review.pass -> Done
Review.issues -> Code

This is one of the most important parts of the model. It lets Surf encode real engineering process instead of pretending every Run is a straight line.
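
The routing example above is essentially a lookup table keyed by Step and outcome. The sketch below shows one way to express it; the `Routes` type and `nextStep` function are hypothetical names, and the `Plan` and `Code` entries are filled in from the straight-line path shown earlier.

```typescript
type Outcome = "pass" | "issues" | "blocked" | "fail";
type Routes = Record<string, Partial<Record<Outcome, string>>>;

const routes: Routes = {
  Plan: { pass: "Code" },
  Code: { pass: "Lint" },
  Lint: { pass: "Review", issues: "Code", fail: "blocked" },
  Review: { pass: "Done", issues: "Code" },
};

// Deterministic routing: the Workflow, not the agent, picks the next Step.
function nextStep(table: Routes, step: string, outcome: Outcome): string {
  const target = table[step]?.[outcome];
  if (target === undefined) {
    throw new Error(`No route defined for ${step}.${outcome}`);
  }
  return target;
}

// Replays a sequence of outcomes and returns the Steps visited.
function walk(table: Routes, start: string, outcomes: Outcome[]): string[] {
  const visited = [start];
  let current = start;
  for (const outcome of outcomes) {
    current = nextStep(table, current, outcome);
    visited.push(current);
  }
  return visited;
}
```

Throwing on a missing route is a design choice for the sketch: an undefined transition is treated as a Workflow authoring error rather than silently ending the Run.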

Bounded loops

Every loop has a limit.

Examples:

max 2 review loops
max 3 fix attempts after checks fail

If the limit is exceeded, the Run moves to a clear terminal path such as:

blocked
approval
human review
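
A loop limit can be enforced with a simple counter per loop. The following is a sketch only; `makeLoopBudget`, the loop names, and the fallback value are hypothetical.

```typescript
// Returns a function that hands out the loop target while the loop is
// under its limit, and the fallback path once the limit is used up.
function makeLoopBudget(
  limits: Record<string, number>,
  fallback: string,
): (loop: string, target: string) => string {
  const used = new Map<string, number>();
  return (loop, target) => {
    const count = used.get(loop) ?? 0;
    const limit = limits[loop] ?? 0; // unknown loops get no retries
    if (count >= limit) return fallback;
    used.set(loop, count + 1);
    return target;
  };
}

// Example: at most 2 review loops, then hand off to a human.
const route = makeLoopBudget({ review: 2 }, "human review");
const reviewTargets = [
  route("review", "Code"),
  route("review", "Code"),
  route("review", "Code"),
];
```

Here the third review loop exceeds the limit of 2, so the last entry in `reviewTargets` is the fallback rather than `Code`.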

Human in the loop

Human input is a first-class part of the Run model. It does not force Surf to rerun the entire Workflow by default.

It can happen through:

a planned Human Gate
an earlier approval point such as after Plan
an answer to a clarifying question
feedback on a paused or completed Step

That input can then:

move the Run forward through an approval
resume the current Step with clarifying input
route back to a Workflow-defined target Step when changes are requested
route to an earlier planning Step when the scope changes

Human messages sent while an Agent Step is running are treated as steering input: the agent keeps working on the current Step and adjusts its behavior accordingly.

When a Step returns blocked because it needs human input, the Run pauses in an awaiting-input state until that input arrives.
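
The mapping from kinds of human input to their effect on a Run can be sketched as a single function. The type names, the `kind` tags, and the `planningStep` parameter are assumptions introduced for this example.

```typescript
type HumanInput =
  | { kind: "approval" }
  | { kind: "answer"; text: string }
  | { kind: "changes_requested"; target: string }
  | { kind: "scope_change" };

type RunAction =
  | { action: "advance" }
  | { action: "resume_current"; input: string }
  | { action: "route_to"; step: string };

// Decides what a piece of human input does to the Run, without
// rerunning the whole Workflow.
function applyHumanInput(input: HumanInput, planningStep: string): RunAction {
  switch (input.kind) {
    case "approval":
      return { action: "advance" };
    case "answer":
      return { action: "resume_current", input: input.text };
    case "changes_requested":
      return { action: "route_to", step: input.target }; // Workflow-defined target
    case "scope_change":
      return { action: "route_to", step: planningStep };
  }
}
```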

Step invalidation and resume

When a Run moves backward, Surf invalidates the affected tail of the Workflow rather than blindly rerunning everything.

The general model is simple: feedback or a failed gate selects a target Step, that Step becomes the resume point, all downstream Steps are invalidated and rerun, and all earlier unaffected Steps remain valid. A Human Gate can route back to Code or Plan, a Review Step can route to a dedicated remediation Step, and an answer to a clarifying question can simply resume Clarify. This gives teams control without making the engine unpredictable.
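
For a Workflow whose Steps run in a fixed order, the invalidation rule can be sketched in a few lines. This assumes a purely linear Workflow, which is a simplification; `invalidateFrom` and its return shape are hypothetical.

```typescript
interface ResumePlan {
  resumeAt: string; // the target Step, which runs again
  stillValid: string[]; // earlier Steps whose results are kept
  invalidated: string[]; // downstream Steps that must rerun
}

// Given the Step order and a target Step, split the Workflow into the
// part that stays valid and the tail that is invalidated.
function invalidateFrom(order: string[], target: string): ResumePlan {
  const index = order.indexOf(target);
  if (index === -1) throw new Error(`Unknown Step: ${target}`);
  return {
    resumeAt: target,
    stillValid: order.slice(0, index),
    invalidated: order.slice(index + 1),
  };
}
```

For example, routing back to Code in a Clarify, Plan, Code, Lint, Review sequence keeps Clarify and Plan valid and invalidates Lint and Review.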

Artifacts and handoffs

Each Step can leave behind structured Artifacts for later Steps.

Examples:

a plan with touched areas and risks
a list of lint failures
a structured review with findings
a summary of browser verification
a note explaining why the Run is blocked

Artifacts matter because later Steps do not have to reconstruct everything from scratch.

This also makes the Workflow more observable for humans.
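
Structured Artifacts lend themselves to a tagged union, so later Steps and humans can handle each kind explicitly. The `type` tags and field names below are assumptions for illustration, loosely following the examples listed above.

```typescript
type Artifact =
  | { type: "plan"; touchedAreas: string[]; risks: string[] }
  | { type: "lint_failures"; failures: string[] }
  | { type: "review"; findings: string[] }
  | { type: "blocked_note"; reason: string };

// One-line summary of an Artifact, e.g. for a Run timeline.
function describeArtifact(artifact: Artifact): string {
  switch (artifact.type) {
    case "plan":
      return `Plan touching ${artifact.touchedAreas.join(", ")}`;
    case "lint_failures":
      return `${artifact.failures.length} lint failure(s)`;
    case "review":
      return `${artifact.findings.length} review finding(s)`;
    case "blocked_note":
      return `Blocked: ${artifact.reason}`;
  }
}

// A later Step receives earlier Artifacts instead of rediscovering them.
function handoffSummary(artifacts: Artifact[]): string[] {
  return artifacts.map(describeArtifact);
}
```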

Workflow Builder

Workflows are composed in a dedicated platform surface.

The Builder lets teams compose the core pieces of a Workflow in one place: Steps, Agent profiles, Checks, Human Gates, transitions, loop limits, and resume targets. It feels visual enough to understand the flow at a glance, but structured enough that configuring a Step still feels precise and fast.

Preview matters a lot here. The Builder makes it easy to understand what happens when a Step returns pass, issues, blocked, or fail, which Step becomes the resume point after feedback, and which parts of the Workflow rerun. It feels more like inspecting a state machine than sketching a diagram.

Templates are built in from the start. Most teams do not begin from a blank page. They start from a strong default such as Clarify -> Plan -> Code -> Check -> Human Gate -> Done, then adjust the flow to fit their own process.

Default workflow patterns

A few common patterns cover most teams.

Plan, implement, validate, review

Plan -> Code -> Lint -> Review -> Done

This is a strong default shape for many repos.

Investigate, fix, validate

Investigate -> Code -> Review -> Done

Useful for bug reports and failures with unclear causes.

Why this matters

The point of Workflows is not just to make agents more rigid.

The point is to make them more trustworthy in real codebases by giving teams control over:

when agents are allowed to write
when deterministic checks run
when review happens
when a Run loops back
when a Run is blocked
which tools are available at each stage

This is a more reliable way to encode engineering process than long prompt files or static instructions alone.