ADR-005: Plan-First, Human-Checkpointed Workflow
Status: Accepted
Date: 2026-05-09
Context
Embercore automates multi-step marketing workflows using LLM-powered agents. A typical workflow might involve research → copywriting → design brief → assembly. Two execution philosophies were considered:
- Autonomous execution — The system plans and executes without human intervention. Fast, but risky for brand-sensitive marketing content.
- Plan-first with checkpoints — Generate a plan, let humans review it, then execute with approval gates. Slower, but ensures quality and brand alignment.
Marketing teams need to maintain control over brand voice, messaging accuracy, and strategic alignment. Fully autonomous AI-generated marketing content poses reputational risk.
Decision
We implement a plan-first, human-checkpointed workflow with pause/resume support.
Phase 1: Plan generation (Athena)
Athena generates a structured plan spec from a natural-language brief:
```go
// plan/plan.go (excerpt; the Input type is not shown)
type Spec struct {
	Name        string `yaml:"name"`
	Description string `yaml:"description"`
	Input       Input  `yaml:"input"`
	Steps       []Step `yaml:"steps"`
}

type Step struct {
	Name       string   `yaml:"name"`
	Agent      string   `yaml:"agent"`
	Prompt     string   `yaml:"prompt"`
	DependsOn  []string `yaml:"depends_on"`
	Checkpoint bool     `yaml:"checkpoint"`
}
```
Plans are validated (plan.Validate) and topologically sorted (plan.TopoSort via Kahn’s algorithm) to determine execution order and parallelizable layers.
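The validation and layering step can be sketched as follows. This is a minimal illustration of Kahn's algorithm under stated assumptions, not the actual plan.Validate or plan.TopoSort: the step type is trimmed to Name and DependsOn, and the function name topoLayers and its specific checks (duplicate names, unknown dependencies, cycles) are hypothetical. Each returned layer holds steps whose dependencies all lie in earlier layers, so the steps within one layer could run in parallel.

```go
package main

import (
	"fmt"
	"sort"
)

// step is a trimmed, hypothetical stand-in for plan.Step.
type step struct {
	Name      string
	DependsOn []string
}

// topoLayers validates the dependency graph and groups steps into layers
// using Kahn's algorithm. Every step in layer i depends only on steps in
// layers < i, so steps within one layer are mutually independent.
func topoLayers(steps []step) ([][]string, error) {
	indegree := make(map[string]int, len(steps))
	dependents := make(map[string][]string)

	// Validation: step names must be unique.
	for _, s := range steps {
		if _, dup := indegree[s.Name]; dup {
			return nil, fmt.Errorf("duplicate step name %q", s.Name)
		}
		indegree[s.Name] = 0
	}
	// Validation: every dependency must refer to a known step.
	for _, s := range steps {
		for _, dep := range s.DependsOn {
			if _, ok := indegree[dep]; !ok {
				return nil, fmt.Errorf("step %q depends on unknown step %q", s.Name, dep)
			}
			indegree[s.Name]++
			dependents[dep] = append(dependents[dep], s.Name)
		}
	}

	// Kahn's algorithm: start from steps with no unmet dependencies.
	var current []string
	for name, d := range indegree {
		if d == 0 {
			current = append(current, name)
		}
	}

	var layers [][]string
	visited := 0
	for len(current) > 0 {
		sort.Strings(current) // deterministic order within a layer
		layers = append(layers, current)
		visited += len(current)
		var next []string
		for _, name := range current {
			for _, child := range dependents[name] {
				indegree[child]--
				if indegree[child] == 0 {
					next = append(next, child)
				}
			}
		}
		current = next
	}

	// Steps that never reached indegree 0 sit on (or behind) a cycle.
	if visited != len(steps) {
		return nil, fmt.Errorf("dependency cycle among %d step(s)", len(steps)-visited)
	}
	return layers, nil
}

func main() {
	steps := []step{
		{Name: "research"},
		{Name: "copywriting", DependsOn: []string{"research"}},
		{Name: "design_brief", DependsOn: []string{"research"}},
		{Name: "assembly", DependsOn: []string{"copywriting", "design_brief"}},
	}
	layers, err := topoLayers(steps)
	if err != nil {
		fmt.Println("invalid plan:", err)
		return
	}
	for i, layer := range layers {
		fmt.Printf("layer %d: %v\n", i, layer)
	}
}
```

For the research → copywriting → design brief → assembly example, this yields three layers: research alone, then copywriting and design_brief together, then assembly.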
Phase 2: Human review
Before execution, the generated plan can be:
- Inspected via embercore status <plan-id>
- Saved to disk as YAML (--output flag)
- Modified by hand before running
- Approved or rejected
Phase 3: Checkpoint-gated execution (Hermes)
Hermes executes the plan step by step. Steps marked checkpoint: true trigger an approval gate:
```go
// agents/hermes/hermes.go
type CheckpointHandler func(step plan.Step, output string) (approved bool, feedback string, err error)
```
At each checkpoint:
- Hermes pauses execution and presents the step output
- The handler (an interactive stdin prompt or the MCP request_approval tool) asks for human approval
- If approved → execution continues to the next step
- If rejected → feedback is captured, and the step can be re-executed or the run paused
Phase 4: Pause and resume
Execution state is persisted in SQLite (via Hestia) and on the filesystem (internal/state/):
- Hestia.RecordCheckpoint(): saves step-level state to the database
- Hestia.GetRunState() / ResumeRun(): retrieve and resume paused runs
- internal/state/: atomic file writes, PID lockfiles, and session metadata (.session.meta.json)
- CLI: embercore resume <run-id> picks up where execution stopped
CLI workflow
```bash
# Step 1: Generate a plan
embercore plan --model claude-sonnet-4-20250514 "Launch campaign for new product"

# Step 2: Review the plan
embercore status <plan-id>

# Step 3: Execute with checkpoint approvals
embercore run <plan-id>

# Step 4: Resume if paused
embercore resume <run-id>
```
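Resuming amounts to skipping steps whose state was already recorded as completed. The sketch below assumes a hypothetical stepState record and an in-memory memoryStore in place of the SQLite-backed state; it is not the actual Hestia.GetRunState() or ResumeRun() logic. Steps that were paused awaiting approval stay in the remaining list so that their checkpoint is presented again.

```go
package main

import "fmt"

// stepState is a hypothetical record of one step's progress, similar in
// spirit to what Hestia.RecordCheckpoint persists.
type stepState struct {
	Step   string
	Status string // e.g. "completed" or "awaiting_approval"
}

// memoryStore is an in-memory stand-in for the SQLite-backed run state.
type memoryStore map[string][]stepState

func (m memoryStore) record(runID string, s stepState) {
	m[runID] = append(m[runID], s)
}

// remainingSteps returns the steps of an ordered plan that have not yet
// completed, preserving plan order.
func remainingSteps(order []string, history []stepState) []string {
	done := make(map[string]bool)
	for _, s := range history {
		if s.Status == "completed" {
			done[s.Step] = true
		}
	}
	var rest []string
	for _, name := range order {
		if !done[name] {
			rest = append(rest, name)
		}
	}
	return rest
}

func main() {
	order := []string{"research", "copywriting", "design_brief", "assembly"}
	store := memoryStore{}
	store.record("run-1", stepState{Step: "research", Status: "completed"})
	store.record("run-1", stepState{Step: "copywriting", Status: "awaiting_approval"})

	// A resume of run-1 would continue from the first remaining step.
	fmt.Println(remainingSteps(order, store["run-1"]))
	// prints [copywriting design_brief assembly]
}
```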
Consequences
Benefits:
- Humans remain in the loop for brand-sensitive decisions
- Plans are inspectable, editable, and auditable before execution
- Pause/resume enables async workflows (start a run, review checkpoints later)
- Topological sorting maximizes parallelism within the constraints set by step dependencies and checkpoints
- Plan specs are portable YAML that can be version-controlled, shared, and templated
Trade-offs:
- Slower end-to-end execution compared to autonomous mode (by design)
- Checkpoint approvals create a blocking dependency on human availability
- Pause/resume adds state management complexity (SQLite + filesystem state)
- No auto-approval mode yet for trusted/low-risk workflows (planned for future)
Related decisions: