Surf Workflows

This document describes how Surf controls the way agents perform work inside a repo.

Environments define where work happens. Workflows define how work happens.

The core idea is simple: a Task does not just drop an agent into a repo with vague instructions and hope it follows the right process. Instead, Surf lets each org define a Workflow with explicit stages, deterministic checks, review loops, and completion rules.

This is how teams encode their engineering process into the platform.

Goals

Core model

Surf separates a few concepts:

A Workflow is the durable definition. A Run is one live execution of that definition.

Example workflow

Here is the shape of a typical repo Workflow:

  1. Clarify: Agent Step using Clarify Agent. Ask focused questions when the Task is underspecified, then continue once enough context exists.
  2. Plan: Agent Step using Plan Agent. Understand the Task, inspect the codebase, identify touched areas, and produce a short plan.
  3. Code: Agent Step using Code Agent. Make the change using editing, environment, browser, schema, and knowledge tools as needed.
  4. Lint: Bash Step that runs lint, formatting, or similar checks before moving on.
  5. Review: Agent Step using Review Agent. Review the diff, check logs or browser behavior if needed, and produce structured findings.
  6. Done: Terminal success state.

This is simple, enforceable, and already more reliable than hoping an agent faithfully follows a markdown checklist.
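As a rough illustration, the six-stage shape above could be written down as plain data. This is a minimal sketch in Python, not Surf's actual schema: the `Step` dataclass, its field names, the `"terminal"` kind, and the `make lint` command are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Step:
    """One stage of a Workflow (hypothetical shape, not Surf's real schema)."""
    name: str
    kind: str                      # "agent", "bash", or "terminal" (assumed labels)
    agent: Optional[str] = None    # Agent profile, only set for agent Steps
    command: Optional[str] = None  # shell command, only set for bash Steps


# The example Workflow from the list above, in execution order.
EXAMPLE_WORKFLOW: List[Step] = [
    Step("Clarify", "agent", agent="Clarify Agent"),
    Step("Plan", "agent", agent="Plan Agent"),
    Step("Code", "agent", agent="Code Agent"),
    Step("Lint", "bash", command="make lint"),  # placeholder command
    Step("Review", "agent", agent="Review Agent"),
    Step("Done", "terminal"),
]


def step_names(workflow: List[Step]) -> List[str]:
    """Return the Step names in execution order."""
    return [step.name for step in workflow]
```

Keeping the definition as data rather than prose is what lets a platform, and not the agent, decide which stage comes next.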

Why Workflows exist

Instruction files such as AGENTS.md are useful for background guidance, but they are not enough to enforce process.

On their own, they cannot guarantee that an agent actually follows those instructions, or follows them in the right order.

Workflows solve that by moving process from soft prompt guidance into platform-enforced execution.

Design principles

Deterministic outside the Step, flexible inside the Step

The Workflow decides which Step runs next, what counts as success, when to loop back, and when to stop or escalate. The agent decides how to accomplish the mission inside an Agent Step.

Checks happen at deliberate points

Surf does not encourage agents to constantly rerun checks while they are still in the middle of making changes.

A better default is:

  1. let the agent take a real pass at the Task
  2. run deterministic checks at a defined point
  3. if checks fail, loop back with those failures attached

That matches how human engineering processes already work.
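The three-part default above can be sketched as a small loop. This is an illustrative sketch only: `run_with_checks`, the callable signatures, and the `max_attempts` parameter are hypothetical names, not part of Surf.

```python
from typing import Callable, List, Tuple

# Hypothetical signatures: an agent pass receives the failures from the previous
# check; a check returns a list of failure messages (empty means it passed).
AgentPass = Callable[[List[str]], None]
Check = Callable[[], List[str]]


def run_with_checks(agent_pass: AgentPass, check: Check, max_attempts: int = 3) -> Tuple[bool, int]:
    """Alternate full agent passes with a check, feeding failures back in."""
    failures: List[str] = []
    for attempt in range(1, max_attempts + 1):
        agent_pass(failures)  # 1. the agent takes a real pass (and sees earlier failures)
        failures = check()    # 2. the deterministic check runs at a defined point
        if not failures:
            return True, attempt
        # 3. otherwise loop back with those failures attached
    return False, max_attempts
```

The check runs once per pass, after the agent has finished, instead of being triggered repeatedly by the agent while it is still editing.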

Clarification is first-class

Many Tasks arrive with ambiguity.

Surf supports clarifying questions as an integral part of a Workflow, not as an awkward failure mode.

A strong default is a built-in Clarify Step before Plan:

  1. inspect the Task
  2. ask focused clarifying questions through the harness's question tool when needed
  3. continue once the Task is well-understood and ready to be planned

If no clarification is needed, the Step passes immediately.

The same question mechanism is also available later in a Run when a Step is blocked on missing human input.

What a Step does

A Step is a controlled stage in a Run.

Each Step defines:

Step kinds

Surf uses a small set of built-in Step kinds.

A Step kind defines how work is executed. An Agent defines who or what performs work inside an Agent Step.

agent

An Agent Step gives an agent a mission and a constrained tool/context surface.

Examples:

An Agent Step usually produces structured output, not just free-form text.

Each Agent Step also chooses an Agent profile.

bash

A Bash Step runs bash commands at a deliberate point in the Workflow.

Examples:

Bash Steps are important because they let the Workflow enforce correctness, or compile changes for review, at a fixed point. The Workflow does not rely on the agent to remember when to run these commands, and it does not encourage constant retesting while the agent is still in the middle of making changes.
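One way to picture a Bash Step is as a command whose exit code becomes the Step outcome. The sketch below uses Python's standard `subprocess` module; `run_bash_step`, `BashResult`, the outcome names, and the timeout default are assumptions for illustration, not Surf's implementation.

```python
import subprocess
from dataclasses import dataclass


@dataclass
class BashResult:
    outcome: str  # "pass" or "fail" (assumed outcome names)
    output: str   # combined stdout/stderr, kept so a later Step can see the failures


def run_bash_step(command: str, cwd: str = ".", timeout_s: int = 600) -> BashResult:
    """Run one shell command and map its exit code to a Step outcome."""
    try:
        proc = subprocess.run(
            ["bash", "-c", command],
            cwd=cwd,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # merge stderr into stdout
            text=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return BashResult("fail", f"timed out after {timeout_s}s")
    return BashResult("pass" if proc.returncode == 0 else "fail", proc.stdout)
```

Because the result is derived from the exit code alone, the same command gives the same verdict regardless of what the agent believes about its own change.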

Standard Agents and Custom Agents

Surf provides a small set of Standard Agents that work well out of the box, such as Clarify Agent, Plan Agent, Code Agent, and Review Agent. These live in a shared Agent Library and give teams strong defaults for mission framing, output shape, and context usage.

Surf also lets teams author Custom Agents in-platform. A team can create a Custom Agent through a lightweight platform flow, save it into its own library, and then use it inside Workflows the same way it uses a Standard Agent. Examples include Verify Compliance, Security Review, Migration Review, or Docs Consistency Check.

Step outcomes and Run states

Every Step finishes with one of a small, explicit set of outcomes.

Step outcomes describe what a single Step returned. Run states describe the status of the overall Workflow execution, such as still running, waiting on input, or finished.
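The distinction can be sketched as two separate enumerations. The outcome names follow the ones this document uses elsewhere (pass, issues, blocked, fail); the `RunState` member names and the `run_state_after` helper are hypothetical and deliberately simplified.

```python
from enum import Enum


class StepOutcome(Enum):
    """What a single Step returned."""
    PASS = "pass"
    ISSUES = "issues"    # findings to loop back on, not a crash
    BLOCKED = "blocked"  # needs human input before it can continue
    FAIL = "fail"


class RunState(Enum):
    """Status of the overall Workflow execution (assumed names)."""
    RUNNING = "running"
    AWAITING_INPUT = "awaiting_input"
    FINISHED = "finished"


def run_state_after(outcome: StepOutcome, is_terminal_step: bool) -> RunState:
    """Derive the Run state once a Step has returned (simplified)."""
    if outcome is StepOutcome.BLOCKED:
        return RunState.AWAITING_INPUT
    if is_terminal_step:
        return RunState.FINISHED
    return RunState.RUNNING
```

Note that an `issues` or `fail` outcome on a non-terminal Step leaves the Run in `RUNNING`: the Step did not pass, but the Run as a whole carries on along whatever transition is configured.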

A good default set is pass, issues, blocked, and fail.

This is important because many useful loops are not true failures.

For example:

Transitions and loops

Workflows route based on Step outcome.

That makes patterns like this first-class:

Plan -> Code -> Lint -> Review -> Done

with loops:

A concrete routing example looks like this:

This is one of the most important parts of the model. It lets Surf encode real engineering process instead of pretending every Run is a straight line.
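Outcome-based routing can be pictured as a lookup table keyed by the current Step and its outcome. In this sketch the Step names come from the flow above, but the specific loop edges, the `Escalate` fallback, and the function names are assumptions chosen for illustration.

```python
from typing import Dict, Iterable, List, Tuple

# (current Step, outcome) -> next Step.
# The loop edges below are illustrative guesses, not Surf's defaults.
ROUTES: Dict[Tuple[str, str], str] = {
    ("Plan", "pass"): "Code",
    ("Code", "pass"): "Lint",
    ("Lint", "pass"): "Review",
    ("Lint", "fail"): "Code",      # loop back with lint failures attached
    ("Review", "pass"): "Done",
    ("Review", "issues"): "Code",  # loop back with review findings attached
}


def next_step(current: str, outcome: str, default: str = "Escalate") -> str:
    """Look up the next Step; unrouted outcomes go to a hypothetical Escalate Step."""
    return ROUTES.get((current, outcome), default)


def walk(outcomes: Iterable[str], start: str = "Plan") -> List[str]:
    """Replay a sequence of outcomes and return the Steps visited."""
    current = start
    visited = [current]
    for outcome in outcomes:
        current = next_step(current, outcome)
        visited.append(current)
    return visited
```

For example, `walk(["pass", "pass", "fail", "pass", "pass", "pass"])` visits Plan, Code, Lint, Code, Lint, Review, Done: the failed lint sends the Run back to Code once before it proceeds.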

Bounded loops

Every loop has a limit.

Examples:

If the limit is exceeded, the Run moves to a clear terminal path such as:

Human in the loop

Human input is a first-class part of the Run model. It does not force Surf to rerun the entire Workflow by default.

It can happen through:

That input can then:

Human messages sent while an Agent Step is running are treated as steering: the agent keeps working and adjusts the current Step’s behavior accordingly.

When a Step returns blocked because it needs human input, the Run pauses in an awaiting-input state until that input arrives.
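The pause-and-resume behavior can be sketched as a tiny state holder. This is a toy model under stated assumptions: the `Run` class, its fields, and the state strings are hypothetical and stand in for whatever Surf tracks internally.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Run:
    """Toy Run that pauses on a blocked Step and resumes when input arrives."""
    state: str = "running"  # "running" or "awaiting_input" (assumed labels)
    blocked_step: Optional[str] = None
    pending_question: Optional[str] = None
    answers: List[str] = field(default_factory=list)

    def block(self, step: str, question: str) -> None:
        """Called when a Step returns blocked: pause instead of failing."""
        self.state = "awaiting_input"
        self.blocked_step = step
        self.pending_question = question

    def provide_input(self, answer: str) -> str:
        """Record the human answer and return the Step to resume."""
        if self.state != "awaiting_input" or self.blocked_step is None:
            raise RuntimeError("Run is not waiting on input")
        self.answers.append(answer)
        resume_at = self.blocked_step
        self.state = "running"
        self.blocked_step = None
        self.pending_question = None
        return resume_at
```

The Run resumes at the Step that asked the question, so earlier Steps do not have to run again just because a human answered.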

Step invalidation and resume

When a Run moves backward, Surf invalidates the affected tail of the Workflow rather than blindly rerunning everything.

The general model is simple: feedback or a failed gate selects a target Step, that Step becomes the resume point, all downstream Steps are invalidated and rerun, and all earlier unaffected Steps remain valid. A Human Gate can route back to Code or Plan, a Review Step can route to a dedicated remediation Step, and an answer to a clarifying question can simply resume Clarify. This gives teams control without making the engine unpredictable.
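The invalidation rule can be sketched as a split of the completed Steps around the resume point. The function below assumes a simple linear Step order for clarity; `plan_resume` and its parameters are hypothetical names, and a real engine would presumably follow the Workflow's transition graph instead.

```python
from typing import Iterable, List, Tuple


def plan_resume(order: List[str], completed: Iterable[str], resume_at: str) -> Tuple[List[str], List[str]]:
    """Split completed Steps into (still valid, invalidated) for a resume point.

    Assumes a linear Step order; this is a simplification for illustration.
    """
    if resume_at not in order:
        raise ValueError(f"unknown Step: {resume_at}")
    done = set(completed)
    cut = order.index(resume_at)
    still_valid = [s for s in order[:cut] if s in done]  # earlier Steps keep their results
    invalidated = [s for s in order[cut:] if s in done]  # resume point and downstream rerun
    return still_valid, invalidated
```

With the order Clarify, Plan, Code, Lint, Review, Done and feedback that targets Code, the Clarify and Plan results are kept while Code, Lint, and Review are rerun.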

Artifacts and handoffs

Each Step can leave behind structured Artifacts for later Steps.

Examples:

Artifacts matter because later Steps do not have to reconstruct everything from scratch.

This also makes the Workflow more observable for humans.
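A handoff between Steps can be pictured as an append-only store of typed records. This is a minimal sketch: the `Artifact` fields, the `ArtifactStore` class, and the artifact kinds used below are assumptions for illustration, not Surf's data model.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class Artifact:
    step: str             # Step that produced it
    kind: str             # e.g. "plan" or "review_findings" (illustrative kinds)
    data: Dict[str, Any]  # structured payload rather than free-form text


@dataclass
class ArtifactStore:
    """Append-only record of what each Step handed off during a Run."""
    items: List[Artifact] = field(default_factory=list)

    def put(self, artifact: Artifact) -> None:
        self.items.append(artifact)

    def latest(self, kind: str) -> Optional[Artifact]:
        """Most recent Artifact of a kind, or None if no Step has produced one."""
        for artifact in reversed(self.items):
            if artifact.kind == kind:
                return artifact
        return None
```

A later Step can call `latest("plan")` to pick up the plan directly, and a human inspecting the Run can read the same records in order.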

Workflow Builder

Workflows are composed in a dedicated platform surface.

The Builder lets teams compose the core pieces of a Workflow in one place: Steps, Agent profiles, Checks, Human Gates, transitions, loop limits, and resume targets. It feels visual enough to understand the flow at a glance, but structured enough that configuring a Step still feels precise and fast.

Preview matters a lot here. The Builder makes it easy to understand what happens when a Step returns pass, issues, blocked, or fail, which Step becomes the resume point after feedback, and which parts of the Workflow rerun. It feels more like inspecting a state machine than sketching a diagram.

Templates are built in from the start. Most teams do not begin from a blank page. They start from a strong default such as Clarify -> Plan -> Code -> Check -> Human Gate -> Done, then adjust the flow to fit their own process.

Default workflow patterns

A few common patterns cover most teams.

Plan, implement, validate, review

Plan -> Code -> Check -> Review -> Done

This is a strong default shape for many repos.

Investigate, fix, validate

Investigate -> Code -> Review -> Done

Useful for bug reports and failures with unclear causes.

Why this matters

The point of Workflows is not just to make agents more rigid.

The point is to make them more trustworthy in real codebases by giving teams control over:

This is a more reliable way to encode engineering process than long prompt files or static instructions alone.