ADR-005: Plan-First, Human-Checkpointed Workflow

Status: Accepted
Date: 2026-05-09

Context

Embercore automates multi-step marketing workflows using LLM-powered agents. A typical workflow might involve research → copywriting → design brief → assembly. Two execution philosophies were considered:

Autonomous execution — The system plans and executes without human intervention. Fast, but risky for brand-sensitive marketing content.
Plan-first with checkpoints — Generate a plan, let humans review it, then execute with approval gates. Slower, but ensures quality and brand alignment.

Marketing teams need to maintain control over brand voice, messaging accuracy, and strategic alignment. Fully autonomous AI-generated marketing content poses reputational risk.

Decision

We implement a plan-first, human-checkpointed workflow with pause/resume support.

Phase 1: Plan generation (Athena)

Athena generates a structured plan spec from a natural-language brief:

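// plan/plan.go

// Spec is the structured plan that Athena generates from a brief.
type Spec struct {
    Name        string `yaml:"name"`
    Description string `yaml:"description"`
    Input       Input  `yaml:"input"`
    Steps       []Step `yaml:"steps"`
}

// Step is one unit of work handled by a single agent.
type Step struct {
    Name       string   `yaml:"name"`
    Agent      string   `yaml:"agent"`
    Prompt     string   `yaml:"prompt"`
    DependsOn  []string `yaml:"depends_on"` // names of steps that must finish first
    Checkpoint bool     `yaml:"checkpoint"` // true: gate this step's output on human approval
}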

Plans are validated (plan.Validate) and topologically sorted (plan.TopoSort via Kahn’s algorithm) to determine execution order and parallelizable layers.
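To make the layering concrete, here is a minimal sketch of Kahn's algorithm that groups steps into layers. It is not the actual plan.TopoSort: the helper name topoLayers, the trimmed-down Step type, and the example step names are assumptions made for illustration. The validation it performs (duplicate names, unknown dependencies, cycles) is a guess at the kind of checks plan.Validate might cover.

```go
package main

import (
	"fmt"
	"sort"
)

// Step mirrors only the fields of plan.Step that ordering needs.
type Step struct {
	Name      string
	DependsOn []string
}

// topoLayers is a hypothetical helper, not the real plan.TopoSort.
// It applies Kahn's algorithm layer by layer: every step in a layer depends
// only on steps in earlier layers, so the steps of one layer could run in parallel.
func topoLayers(steps []Step) ([][]string, error) {
	indegree := make(map[string]int, len(steps))
	dependents := make(map[string][]string, len(steps))
	for _, s := range steps {
		if _, dup := indegree[s.Name]; dup {
			return nil, fmt.Errorf("duplicate step name %q", s.Name)
		}
		indegree[s.Name] = 0
	}
	for _, s := range steps {
		for _, dep := range s.DependsOn {
			if _, ok := indegree[dep]; !ok {
				return nil, fmt.Errorf("step %q depends on unknown step %q", s.Name, dep)
			}
			indegree[s.Name]++
			dependents[dep] = append(dependents[dep], s.Name)
		}
	}

	// Seed the first layer with steps that have no dependencies.
	var ready []string
	for name, d := range indegree {
		if d == 0 {
			ready = append(ready, name)
		}
	}

	var layers [][]string
	done := 0
	for len(ready) > 0 {
		sort.Strings(ready) // deterministic order within a layer
		layers = append(layers, ready)
		done += len(ready)
		var next []string
		for _, name := range ready {
			for _, child := range dependents[name] {
				indegree[child]--
				if indegree[child] == 0 {
					next = append(next, child)
				}
			}
		}
		ready = next
	}
	// Steps that never reached indegree 0 are part of a cycle.
	if done != len(steps) {
		return nil, fmt.Errorf("dependency cycle detected")
	}
	return layers, nil
}

func main() {
	steps := []Step{
		{Name: "research"},
		{Name: "copywriting", DependsOn: []string{"research"}},
		{Name: "design-brief", DependsOn: []string{"research"}},
		{Name: "assembly", DependsOn: []string{"copywriting", "design-brief"}},
	}
	layers, err := topoLayers(steps)
	if err != nil {
		fmt.Println("invalid plan:", err)
		return
	}
	fmt.Println(layers) // [[research] [copywriting design-brief] [assembly]]
}
```

In this illustrative plan, copywriting and design-brief share a layer because both depend only on research.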

Phase 2: Human review

Before execution, the generated plan can be:

Inspected via embercore status <plan-id>
Saved to disk as YAML (--output flag)
Modified by hand before running
Approved or rejected

Phase 3: Checkpoint-gated execution (Hermes)

Hermes executes the plan step by step. Steps marked checkpoint: true trigger an approval gate:

// agents/hermes/hermes.go
type CheckpointHandler func(step plan.Step, output string) (approved bool, feedback string, err error)

At each checkpoint:

Hermes pauses execution and presents the step output
The handler (interactive stdin prompt or MCP request_approval tool) asks for human approval
If approved → execution continues to the next step
If rejected → feedback is captured, and the step can be re-executed or the run paused
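The checkpoint flow above can be sketched as follows. This is an illustration under stated assumptions, not Hermes itself: promptHandler and runSteps are hypothetical names, the Step type is cut down to the fields the loop needs, and the sketch simply stops at a rejected checkpoint and leaves the choice between re-executing and pausing to the caller.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"strings"
)

// Step stands in for plan.Step; only the fields used here are included.
type Step struct {
	Name       string
	Checkpoint bool
}

// CheckpointHandler has the same shape as the type in agents/hermes/hermes.go,
// with the local Step in place of plan.Step.
type CheckpointHandler func(step Step, output string) (approved bool, feedback string, err error)

// promptHandler is a hypothetical interactive handler. It reads one line per
// checkpoint: "y" or "yes" approves; anything else rejects and is kept as feedback.
func promptHandler(r io.Reader, w io.Writer) CheckpointHandler {
	reader := bufio.NewReader(r) // created once so buffered input survives across calls
	return func(step Step, output string) (bool, string, error) {
		fmt.Fprintf(w, "--- %s ---\n%s\nApprove? [y, or type feedback]: ", step.Name, output)
		line, err := reader.ReadString('\n')
		if err != nil && (err != io.EOF || line == "") {
			return false, "", err
		}
		answer := strings.TrimSpace(line)
		switch strings.ToLower(answer) {
		case "y", "yes":
			return true, "", nil
		default:
			return false, answer, nil
		}
	}
}

// runSteps executes steps in order and stops at the first rejected checkpoint.
// It returns the index of the step to resume from (len(steps) when finished)
// together with any feedback captured at the rejection.
func runSteps(steps []Step, exec func(Step) string, approve CheckpointHandler) (int, string, error) {
	for i, s := range steps {
		out := exec(s)
		if !s.Checkpoint {
			continue
		}
		ok, feedback, err := approve(s, out)
		if err != nil {
			return i, "", err
		}
		if !ok {
			return i, feedback, nil // caller may re-execute step i or pause the run
		}
	}
	return len(steps), "", nil
}

func main() {
	steps := []Step{{Name: "research"}, {Name: "copywriting", Checkpoint: true}, {Name: "assembly"}}
	exec := func(s Step) string { return "output of " + s.Name }
	input := strings.NewReader("tone is too formal\n") // simulated reviewer input
	next, feedback, err := runSteps(steps, exec, promptHandler(input, io.Discard))
	fmt.Println(next, feedback, err) // 1 tone is too formal <nil>
}
```

Passing the reader and writer in, rather than using os.Stdin directly, keeps the handler testable and leaves room for a non-interactive handler with the same signature.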

Phase 4: Pause and resume

Execution state is persisted in SQLite (via Hestia) and on the filesystem (internal/state/):

Hestia.RecordCheckpoint() — saves step-level state to the database
Hestia.GetRunState() / ResumeRun() — retrieves and resumes paused runs
internal/state/ — atomic file writes, PID lockfiles, session metadata (.session.meta.json)
CLI: embercore resume <run-id> picks up where execution stopped
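As a rough illustration of the atomic file writes mentioned above, the sketch below uses the common write-to-temp-then-rename pattern. Everything here is an assumption rather than the contents of internal/state/: the function names, the sessionMeta fields, and the run ID are invented for the example, and the real .session.meta.json layout is not specified in this ADR.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
	"path/filepath"
)

// sessionMeta is a hypothetical shape for .session.meta.json.
type sessionMeta struct {
	RunID    string `json:"run_id"`
	NextStep string `json:"next_step"`
}

// writeFileAtomic writes data to a temp file in the target directory, syncs it,
// and renames it over path. On POSIX filesystems the rename replaces the target
// in one step, so readers see either the old file or the new one, never a
// partial write.
func writeFileAtomic(path string, data []byte, perm os.FileMode) error {
	tmp, err := os.CreateTemp(filepath.Dir(path), filepath.Base(path)+".tmp-*")
	if err != nil {
		return err
	}
	defer os.Remove(tmp.Name()) // cleans up on failure; fails harmlessly after the rename
	if _, err := tmp.Write(data); err != nil {
		tmp.Close()
		return err
	}
	if err := tmp.Sync(); err != nil {
		tmp.Close()
		return err
	}
	if err := tmp.Close(); err != nil {
		return err
	}
	if err := os.Chmod(tmp.Name(), perm); err != nil {
		return err
	}
	return os.Rename(tmp.Name(), path)
}

// roundTrip saves meta into a fresh temp directory and reads it back,
// standing in for "persist on pause, load on resume".
func roundTrip(meta sessionMeta) (sessionMeta, error) {
	var got sessionMeta
	dir, err := os.MkdirTemp("", "embercore-state-")
	if err != nil {
		return got, err
	}
	defer os.RemoveAll(dir)

	data, err := json.Marshal(meta)
	if err != nil {
		return got, err
	}
	path := filepath.Join(dir, ".session.meta.json")
	if err := writeFileAtomic(path, data, 0o600); err != nil {
		return got, err
	}
	raw, err := os.ReadFile(path)
	if err != nil {
		return got, err
	}
	err = json.Unmarshal(raw, &got)
	return got, err
}

func main() {
	got, err := roundTrip(sessionMeta{RunID: "run-123", NextStep: "assembly"})
	fmt.Println(got, err)
}
```

Creating the temp file in the same directory as the target matters: a rename across filesystems is not atomic and may fail outright.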

CLI workflow

# Step 1: Generate a plan
embercore plan --model claude-sonnet-4-20250514 "Launch campaign for new product"

# Step 2: Review the plan
embercore status <plan-id>

# Step 3: Execute with checkpoint approvals
embercore run <plan-id>

# Step 4: Resume if paused
embercore resume <run-id>

Consequences

Benefits:

Humans remain in the loop for brand-sensitive decisions
Plans are inspectable, editable, and auditable before execution
Pause/resume enables async workflows (start a run, review checkpoints later)
Topological sorting identifies steps with no mutual dependencies, so they can run in parallel while dependencies and checkpoints still gate progress
Plan specs are portable YAML — can be version-controlled, shared, templated

Trade-offs:

Slower end-to-end execution compared to autonomous mode (by design)
Checkpoint approvals create a blocking dependency on human availability
Pause/resume adds state management complexity (SQLite + filesystem state)
No auto-approval mode yet for trusted or low-risk workflows (planned for a future release)

Related decisions:

ADR-002 — SQLite stores plan and checkpoint state
ADR-004 — Athena plans, Hermes executes, Hestia persists
ADR-007 — The shared plan/ package defines the plan spec types
