Dev Intel: playable checkpoints
Close an agentic workflow's "done" on runtime evidence, not on PRs or generated artifacts.
Generated: 2026-05-22T02:05:00+09:00
Lane: dev topic discovery
Why this is useful:
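The sharpest lesson in the OpenClaw Unity hackathon article is the failure itself: the run got as far as "13 PRs generated, 12 merged", yet pressing Play in the Unity Editor showed only a blue screen. That is less a code-generation problem than a checkpoint problem: the checkpoint stopped at `PR merged` and never closed on the runtime state a real user sees.
For 健人くん, the same applies to the OpenClaw/ひめ heartbeat. Rather than fattening a long autonomous batch, it is stronger to return small intermediate results along the way: a visible state, a touchable state, a verified state. For Unity that is a Play screenshot/log; for Web, a View screenshot/link; for secretary tasks, an exact draft/action queue; for development tasks, a typed handoff + acceptance + evidence.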
What I made/changed:
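Translated three external articles into OpenClaw's runtime design vocabulary:
- `done` means a domain-specific checkpoint was passed, not that a PR or file was generated
- UI/game/secretary workflows require a screenshot/log/user-visible artifact as mandatory evidence, not just code review
- Handoffs are typed JSON rather than free text, and the receiver validates them before starting
- Plans are not confined to the terminal; they move out to a note/view that is readable on a phone and keeps revision history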
View: `views/20260522-0205-dev-intel-playable-checkpoints.html`
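A minimal sketch of the typed-handoff item above, assuming Python on the receiving side. The field names (`task_id`, `from_agent`, `to_agent`, `acceptance`) are hypothetical and not taken from any source article; the point is only that the receiver rejects a malformed handoff before starting work.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Handoff:
    """Hypothetical handoff payload; all field names are illustrative assumptions."""
    task_id: str
    from_agent: str
    to_agent: str
    acceptance: list


EXPECTED_KEYS = {"task_id", "from_agent", "to_agent", "acceptance"}


def parse_handoff(raw: str) -> Handoff:
    """Validate a handoff before the receiver starts; raise ValueError if it is unusable."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"handoff is not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("handoff must be a JSON object")

    # Reject both missing and unexpected keys so free-text drift is caught early.
    missing = EXPECTED_KEYS - set(data)
    extra = set(data) - EXPECTED_KEYS
    if missing or extra:
        raise ValueError(f"missing keys: {sorted(missing)}, unexpected keys: {sorted(extra)}")

    for key in ("task_id", "from_agent", "to_agent"):
        if not isinstance(data[key], str) or not data[key].strip():
            raise ValueError(f"{key} must be a non-empty string")

    acceptance = data["acceptance"]
    if (
        not isinstance(acceptance, list)
        or not acceptance
        or not all(isinstance(item, str) and item.strip() for item in acceptance)
    ):
        raise ValueError("acceptance must be a non-empty list of non-empty strings")

    return Handoff(data["task_id"], data["from_agent"], data["to_agent"], list(acceptance))
```

A free-text note wrapped in JSON, such as `{"task_id": "t1", "note": "please fix the UI"}`, fails here at the key check, so the stall surfaces at handoff time rather than at the end of the batch.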
Sources/Evidence:
- Zenn, "Trying Autonomous Unity Development with OpenClaw: It's Not Quite There Yet" — OpenClaw produced 13 PRs and merged 12, but Play result was a blank blue screen; ComfyUI crash was not detected; desired improvements include Unity Editor feedback loop, intermediate-state dashboard, external-tool failure detection. https://zenn.dev/omori432/articles/open-claw-unity-hackathon-2026?locale=en
- Zenn, "Creating a Continuous Development Workflow with Claude Code, Codex, and Cursor" — practical fixes were fixed responsibilities, JSON handoffs, and bottleneck metrics; failures came from vague done criteria and unmonitored stalls. https://zenn.dev/soshi1234/articles/ai-orchestration-20260307-0401?locale=en
- devas.life, "Note-driven agentic coding workflow using Claude Code and Inkdrop" — plans become reviewable notes with Markdown rendering, phone review, revision history, checkbox progress, and status changes. https://www.devas.life/note-driven-agentic-coding-workflow-using-claude-code-and-inkdrop/
- Zenn, "Reflecting on Code with Claude 2026" — Anthropic keynote takeaway: winning teams make upgrades cheap with automated evals, simple scaffolding, and ambitious prototypes. https://zenn.dev/noah33/articles/code-with-claude-2026-sf-keynote?locale=en
Harness component:
runtime checkpoint / artifact evidence / handoff schema
Failure category:
False done: agent reports completion before the domain-visible result has been exercised.
Gate owner_value_gate:
pass — this is directly about OpenClaw/agentic coding, and the action is concrete: redefine checkpoints around visible runtime evidence.
Gate external_action_gate:
pass — read-only web research plus local artifact/view creation only.
Gate view_source_gate:
pass — source links are preserved and the note is rendered into a View.
Gate handoff_state_gate:
pass — next action is local and reversible.
Prediction:
If OpenClaw task artifacts record `checkpoint_type`, `evidence_uri`, and `runtime_verified` before claiming done, false-completion reports should drop. The useful first schema is:
~~~json
{
  "checkpoint_type": "playable_ui | rendered_view | command_test | drafted_external_action | typed_handoff",
  "evidence_uri": "view/html/screenshot/log/test-output path",
  "runtime_verified": true,
  "acceptance": ["what the owner can inspect without reading raw logs"]
}
~~~
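One way this schema could gate a "done" claim, sketched in Python. It reads the `|`-separated `checkpoint_type` value as a list of allowed alternatives, which is an interpretation of the schema above; the function name `done_blockers` is hypothetical.

```python
# Allowed values, read from the "a | b | c" alternatives in the schema (an assumption).
ALLOWED_CHECKPOINT_TYPES = {
    "playable_ui",
    "rendered_view",
    "command_test",
    "drafted_external_action",
    "typed_handoff",
}


def done_blockers(record: dict) -> list:
    """Return the reasons a record may not be reported as done; an empty list means OK."""
    blockers = []

    checkpoint_type = record.get("checkpoint_type")
    if not isinstance(checkpoint_type, str) or checkpoint_type not in ALLOWED_CHECKPOINT_TYPES:
        blockers.append("checkpoint_type is missing or unknown")

    evidence_uri = record.get("evidence_uri")
    if not isinstance(evidence_uri, str) or not evidence_uri.strip():
        blockers.append("evidence_uri is missing")

    # Only a literal True counts; truthy strings such as "yes" are rejected.
    if record.get("runtime_verified") is not True:
        blockers.append("runtime_verified is not true")

    acceptance = record.get("acceptance")
    if not isinstance(acceptance, list) or not acceptance:
        blockers.append("acceptance is empty")

    return blockers
```

Under this sketch, a record in the state of the Unity case (PRs merged, Play never exercised) carries `runtime_verified: false` and no `evidence_uri`, so it is held back with explicit reasons instead of being reported as complete.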
Verify by:
For the next 5 owner-facing heartbeat/OpenClaw artifacts, check whether each says what was visibly verified, not only what file/script ran. A weak artifact has `changed files` but no `evidence_uri`; a strong one has a View/screenshot/log/test output tied to acceptance.
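The weak/strong check could be tallied with a small script like the following sketch. It assumes each artifact is available as a dict and that changed files are stored under a `changed_files` key; both are assumptions, as is the extra `unclear` bucket for artifacts that fit neither description.

```python
from collections import Counter


def classify_artifact(artifact: dict) -> str:
    """Classify one artifact as 'strong', 'weak', or 'unclear' per the rule above."""
    has_evidence = bool(artifact.get("evidence_uri"))
    has_acceptance = bool(artifact.get("acceptance"))
    if has_evidence and has_acceptance:
        return "strong"  # evidence tied to acceptance
    if artifact.get("changed_files") and not has_evidence:
        return "weak"  # reports what changed, not what was visibly verified
    return "unclear"  # hypothetical bucket: needs a manual look


def audit(artifacts: list) -> Counter:
    """Count classifications across a batch, e.g. the next 5 owner-facing artifacts."""
    return Counter(classify_artifact(artifact) for artifact in artifacts)
```

Comparing the `weak` count across successive batches gives a rough signal for whether the prediction holds.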
Observed:
The Unity case shows why parallel PR generation alone is not a product milestone. The note-driven and JSON-handoff examples point to the same repair: make progress externally inspectable and typed enough that stalls are caught before the end.
Next safe action:
Add optional checkpoint metadata to heartbeat/dev-intel artifacts first, without changing runtime behavior: `checkpoint_type`, `evidence_uri`, `runtime_verified`, and `acceptance`.
Notify:
no — this is strong source-backed design memory, but it is 2 AM and not urgent. Save the artifact/View silently; reuse it when changing OpenClaw task completion semantics.