Dev Intel: playable checkpoints

Close "done" in agentic workflows with runtime evidence, not with PRs or generated artifacts.

Generated: 2026-05-22T02:05:00+09:00

Lane: dev topic discovery

Why this is useful:

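The part of the OpenClaw Unity hackathon article that hits hardest is the failure itself: the run got as far as generating 13 PRs and merging 12, but pressing Play in the Unity Editor showed a blue screen. This is less a code-generation capability problem than a checkpoint problem: the checkpoint stopped at `PR merged` and never closed on the runtime state the real user sees.

For 健人くん, the same applies to OpenClaw/ひめ heartbeats. Rather than fattening a long autonomous batch, it is stronger to return small intermediate results along the way: a state that can be seen, a state that can be touched, a state that has been verified. For Unity, that is a Play screenshot/log; for Web, a View screenshot/link; for secretary tasks, an exact draft/action queue; for dev tasks, a typed handoff + acceptance + evidence.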

What I made/changed:

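Converted three external articles into OpenClaw's runtime design vocabulary:

`done` means having passed a domain-specific checkpoint, not having produced a PR or a file
UI/game/secretary workflows require a screenshot/log/user-visible artifact as mandatory evidence, not only code review
handoffs are typed JSON rather than free text, and the receiver validates them before starting
plans do not stay locked inside the terminal; they move to a note/view that is readable on a phone and keeps revision history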
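
The typed-handoff point above can be sketched as a receiver-side check that runs before any work starts. This is a minimal Python sketch under assumptions, not OpenClaw's actual handoff format: the field names (`task_id`, `from_agent`, `to_agent`, `acceptance`, `evidence_required`) and the `validate_handoff` helper are hypothetical and chosen only for illustration.

```python
import json

# Hypothetical handoff fields and their expected types; not an OpenClaw schema.
REQUIRED_FIELDS = {
    "task_id": str,
    "from_agent": str,
    "to_agent": str,
    "acceptance": list,
    "evidence_required": list,
}


def validate_handoff(raw: str) -> list[str]:
    """Return a list of problems; an empty list means the handoff is usable."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return ["handoff must be a JSON object"]

    problems = []
    for name, expected in REQUIRED_FIELDS.items():
        if name not in data:
            problems.append(f"missing field: {name}")
        elif not isinstance(data[name], expected):
            problems.append(f"wrong type for {name}: expected {expected.__name__}")
    # An empty acceptance list is the "vague done criteria" failure in typed form.
    if isinstance(data.get("acceptance"), list) and not data["acceptance"]:
        problems.append("acceptance must not be empty")
    return problems
```

A receiver would refuse to start when the returned list is non-empty and send the problems back to the sender, so a vague handoff surfaces at the start of the task rather than at the end.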

View: `views/20260522-0205-dev-intel-playable-checkpoints.html`

Sources/Evidence:

Zenn, "Trying Autonomous Unity Development with OpenClaw: It's Not Quite There Yet" — OpenClaw produced 13 PRs and merged 12, but Play result was a blank blue screen; ComfyUI crash was not detected; desired improvements include Unity Editor feedback loop, intermediate-state dashboard, external-tool failure detection. https://zenn.dev/omori432/articles/open-claw-unity-hackathon-2026?locale=en
Zenn, "Creating a Continuous Development Workflow with Claude Code, Codex, and Cursor" — practical fixes were fixed responsibilities, JSON handoffs, and bottleneck metrics; failures came from vague done criteria and unmonitored stalls. https://zenn.dev/soshi1234/articles/ai-orchestration-20260307-0401?locale=en
devas.life, "Note-driven agentic coding workflow using Claude Code and Inkdrop" — plans become reviewable notes with Markdown rendering, phone review, revision history, checkbox progress, and status changes. https://www.devas.life/note-driven-agentic-coding-workflow-using-claude-code-and-inkdrop/
Zenn, "Reflecting on Code with Claude 2026" — Anthropic keynote takeaway: winning teams make upgrades cheap with automated evals, simple scaffolding, and ambitious prototypes. https://zenn.dev/noah33/articles/code-with-claude-2026-sf-keynote?locale=en

Harness component:

runtime checkpoint / artifact evidence / handoff schema

Failure category:

False done: agent reports completion before the domain-visible result has been exercised.

Gate owner_value_gate:

pass — this is directly about OpenClaw/agentic coding, and the action is concrete: redefine checkpoints around visible runtime evidence.

Gate external_action_gate:

pass — read-only web research plus local artifact/view creation only.

Gate view_source_gate:

pass — source links are preserved and the note is rendered into a View.

Gate handoff_state_gate:

pass — next action is local and reversible.

Prediction:

If OpenClaw task artifacts record `checkpoint_type`, `evidence_uri`, and `runtime_verified` before claiming done, false-completion reports should drop. The useful first schema is:

~~~json
{
  "checkpoint_type": "playable_ui | rendered_view | command_test | drafted_external_action | typed_handoff",
  "evidence_uri": "view/html/screenshot/log/test-output path",
  "runtime_verified": true,
  "acceptance": ["what the owner can inspect without reading raw logs"]
}
~~~
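
One way to turn this schema into a gate on the word "done" is sketched below. It is a rough Python sketch, not existing OpenClaw code: the `Checkpoint` dataclass and the `may_claim_done` function are hypothetical names, and the rule that all four fields must be present and non-empty is an assumption layered on top of the schema.

```python
from dataclasses import dataclass, field

# The alternatives listed in the schema's checkpoint_type value.
CHECKPOINT_TYPES = {
    "playable_ui",
    "rendered_view",
    "command_test",
    "drafted_external_action",
    "typed_handoff",
}


@dataclass
class Checkpoint:
    checkpoint_type: str
    evidence_uri: str
    runtime_verified: bool
    acceptance: list[str] = field(default_factory=list)


def may_claim_done(cp: Checkpoint) -> tuple[bool, str]:
    """Hypothetical gate: allow "done" only when runtime evidence exists."""
    if cp.checkpoint_type not in CHECKPOINT_TYPES:
        return False, f"unknown checkpoint_type: {cp.checkpoint_type}"
    if not cp.evidence_uri.strip():
        return False, "no evidence_uri"
    if not cp.runtime_verified:
        return False, "evidence exists but runtime was not verified"
    if not cp.acceptance:
        return False, "no acceptance items for the owner to inspect"
    return True, "ok"
```

Under this gate, a run like the Unity case (PRs merged, nothing exercised in Play) would carry an empty `evidence_uri` and be rejected before it could report completion.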

Verify by:

For the next 5 owner-facing heartbeat/OpenClaw artifacts, check whether each says what was visibly verified, not only what file/script ran. A weak artifact has `changed files` but no `evidence_uri`; a strong one has a View/screenshot/log/test output tied to acceptance.
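
The weak/strong check could be run over a small batch with something like the following. This is an illustrative Python sketch only: it assumes each artifact is available as a dict carrying the schema's `evidence_uri` and `acceptance` keys, and the `classify_artifact` and `audit` helpers are hypothetical.

```python
def classify_artifact(artifact: dict) -> str:
    """Label an artifact 'strong' only if evidence is tied to acceptance."""
    has_evidence = bool(str(artifact.get("evidence_uri") or "").strip())
    has_acceptance = bool(artifact.get("acceptance"))
    return "strong" if has_evidence and has_acceptance else "weak"


def audit(artifacts: list[dict]) -> dict[str, int]:
    """Count strong and weak artifacts in a batch (for example, the next 5)."""
    counts = {"strong": 0, "weak": 0}
    for artifact in artifacts:
        counts[classify_artifact(artifact)] += 1
    return counts
```

Tracking the weak count across successive batches gives a simple number to compare against the prediction.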

Observed:

The Unity case shows why parallel PR generation alone is not a product milestone. The note-driven and JSON-handoff examples point to the same repair: make progress externally inspectable and typed enough that stalls are caught before the end.

Next safe action:

Add optional checkpoint metadata to heartbeat/dev-intel artifacts first, without changing runtime behavior: `checkpoint_type`, `evidence_uri`, `runtime_verified`, and `acceptance`.

Notify:

no — this is strong source-backed design memory, but it is 2 AM and not urgent. Save the artifact/View silently; reuse it when changing OpenClaw task completion semantics.