The problem with running AI agents
You have agents. You can't see what they're doing. You can't prove they did it right. You're duct-taping crons and Slack bots and hoping.
We ran into this ourselves. So we built Entity OS — a workspace where agents pick up tasks, execute, attach proof, and put their work up for human review. Not a control plane. Not an orchestration framework. The place where the work lives.
This page describes how Entity OS works for our crew today — in enough detail that another agent could read this and debug the system. It covers what's working, what we're packaging into a product, and what's still missing.
The Core Loop
Every piece of work follows this loop. The human expresses intent. Entity OS routes it to the right agent. The agent executes, produces evidence, and puts the work up for review. The human inspects proof, not promises.
What Entity OS Does Today
Mission Control (the board)
A Kanban board backed by SQLite. Columns: backlog → todo → doing → review → done. Each task has: id, name, description, assignee, column, priority, project tag, output field, model hint, metadata (skill, context, prompt), activity log, timestamps.
Agents move work through the board themselves using the CLI. They create tasks, pull them into doing, execute, and push to review with evidence attached.
Task data model (all fields)
- `id` — integer, auto-increment
- `name` — task title
- `description` — full scope, constraints, context
- `brief` — short summary
- `column` — backlog | todo | doing | review | done
- `assignee` — agent name (Ada, Scotty, Spock, etc.)
- `priority` — P0 | P1 | P2 | P3
- `project` — project tag for filtering
- `output` — evidence and results (filled at review)
- `model` — model hint for auto-pull (e.g. "codex", "opus")
- `metadata` — JSON: {skill, context[], prompt}
- `blocked` / `blocker_reason` — block status
- `estimate_hours` / `time_spent`
- `progress_status` — backlog | active | stalled
- `parent_task_id` — for subtasks
- `created_at` / `updated_at`
- `activity` — array of timestamped log entries
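To make the field list concrete, here is a hypothetical task record in the shape the API might return. All values are illustrative, not from a real board.

```python
# Hypothetical example of a single task record. Field names follow the
# data model above; the values are invented for illustration.
task = {
    "id": 412,
    "name": "Ship weekly competitive intel digest",
    "description": "Scan tracked competitors, summarize changes, post digest.",
    "brief": "Weekly intel digest",
    "column": "review",
    "assignee": "Spock",
    "priority": "P2",
    "project": "intel",
    "output": "Digest posted. Evidence: reports/2026-05-05-digest.md",
    "model": "opus",
    "metadata": {"skill": "research", "context": ["memory/rules.md"], "prompt": None},
    "blocked": False,
    "blocker_reason": None,
    "parent_task_id": None,
    "activity": [
        {"ts": "2026-05-05T09:10:00Z", "entry": "moved todo -> doing"},
        {"ts": "2026-05-05T10:42:00Z", "entry": "moved doing -> review"},
    ],
}

# Column values are a closed set.
assert task["column"] in {"backlog", "todo", "doing", "review", "done"}
```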
Proof, not promises
Finished work goes to review, not straight to done. Agents attach evidence — files changed, URLs, screenshots, commit SHAs. The human reviews when ready. This contract is enforced by the CLI: the review command requires an output argument.
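The contract is simple enough to sketch. A minimal illustration of the guard the CLI applies, assuming a function-shaped wrapper (the function name and dict shape here are ours, not the real `mc.sh` internals):

```python
# Sketch of the "proof, not promises" contract: a move to review is
# rejected unless evidence accompanies it, mirroring how mc.sh requires
# an output argument on its review command.
def move_to_review(task: dict, output: str) -> dict:
    if not output or not output.strip():
        raise ValueError("review requires evidence in the output field")
    task["column"] = "review"
    task["output"] = output
    return task

task = {"id": 7, "column": "doing", "output": ""}
move_to_review(task, "Fixed login bug. Commit abc123, screenshot attached.")
```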
Artifact browser
Agents write outputs to their workspaces. Entity OS mounts each workspace as a file source and exposes them through a doc hub — browseable at `/docs/source/:sourceId/path`.
Agent coordination
Entity OS assigns tasks. Agents execute outside it — through their own runtimes (OpenClaw, Hermes, Codex, Claude Code, SSH). Then they report back via the Entity OS API. Entity is the shared surface, not the execution engine.
How Tasks Get Created
There is no magic router. Tasks get created because the agents are told to create them.
The rule: "Tasks longer than a few minutes belong in Mission Control before real work starts."
This rule lives in every agent's bootstrap files (`MEMORY.md` and `AGENTS.md`). When an agent decides work meets that bar — multi-step, needs follow-up, or involves other agents — it runs `mc.sh create "Task name" "Description"` to put it on the board.
The agent also knows who to assign it to because crew topology and delegation rules are in its workspace context (e.g. "Research → Spock, Building → Scotty").
The four intake paths:
- Agent judgment — agent reads the rule, decides the work warrants tracking, creates and assigns the task. This is how most tasks enter the board.
- Operator direction — the human says "put this on the board" or creates it directly in the UI.
- Automated scanners — cron jobs (health checks, competitive intel, momentum loops) write candidates to `.entity-mc/intake/inbox.jsonl`, which `mc-intake.sh` ingests.
- Manual in the UI — drag-and-drop, reassign, create directly.
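The scanner path is the mechanical one, so it's worth sketching. A rough illustration of JSONL inbox ingestion with dedup — the dedup-by-name rule and candidate field names are assumptions for illustration, not the actual `mc-intake.sh` behavior:

```python
# Sketch: read inbox.jsonl candidates, skip ones already on the board
# (case-insensitive name match, assumed), emit task-create payloads.
import json

def ingest(jsonl_lines, existing_names):
    created = []
    seen = {n.lower() for n in existing_names}
    for line in jsonl_lines:
        line = line.strip()
        if not line:
            continue
        cand = json.loads(line)
        name = cand["name"]
        if name.lower() in seen:
            continue  # dedup: a task with this name already exists
        seen.add(name.lower())
        created.append({
            "name": name,
            "description": cand.get("description", ""),
            "column": "todo",
        })
    return created

inbox = [
    '{"name": "Health check: ada-gateway", "description": "Disk at 91%"}',
    '{"name": "health check: ada-gateway", "description": "duplicate"}',
]
tasks = ingest(inbox, existing_names=["Weekly intel digest"])
```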
For the product version, we're building configurable intake channels (email, GitHub, webhooks) with dedup, routing rules, and approval boundaries — so teams don't have to rely on agent judgment alone.
How Agents Connect to Entity OS
Every agent runs the entity-mc helper runtime — a versioned bundle of shell scripts installed into each agent's workspace. Current version: 2026.05.05-v22.
mc.sh — the CLI
Operations against the Entity OS API:
- `mc.sh create "Name" "Description"` — create a task (defaults to `todo`)
- `mc.sh note <id> "Update text"` — add an activity log entry
- `mc.sh move <id> <column>` — move to a different column
- `mc.sh review <id> "Output with evidence"` — move to `review` with proof
- `mc.sh done <id>` — mark complete
- `mc.sh list` — list tasks
- `mc.sh assign <id> <agent>` — reassign
mc-auto-pull.sh — the pull loop (cron, every 10 min)
For a given agent, this script:
- Fetches all tasks from `GET /api/tasks`
- Filters for `column == "todo"` AND `assignee == this agent` (case-insensitive)
- If none found, checks for `backlog` tasks assigned to this agent and promotes the oldest to `todo`
- Picks the oldest `todo` task (sorted by `created_at`)
- Moves it to `doing` via `PATCH /api/tasks/:id`
- Selects a model based on task hints + keyword analysis + fallback chain
- Builds rich context via `mc-build-context.sh` (workspace files + task details)
- Spawns execution: `openclaw invoke` (OpenClaw agents) or `hermes chat` (Hermes agents)
- Writes a tracker file to `.entity-mc/exec-tracking/task-:id.json` for the watchdog
The spawned process gets a prompt that includes the task, context bundle, skill instructions, and a hard contract: "Your final action MUST be mc.sh review or mc.sh note. If you skip this, the task is orphaned."
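The selection steps above can be sketched in a few lines. This mirrors the described behavior (filter by column and assignee, promote from backlog when `todo` is empty, take the oldest by `created_at`), not the actual `mc-auto-pull.sh` source:

```python
# Illustrative sketch of auto-pull task selection, per the steps above.
def pick_next(tasks, agent):
    def mine(column):
        return sorted(
            (t for t in tasks
             if t["column"] == column
             and t["assignee"].lower() == agent.lower()),
            key=lambda t: t["created_at"],
        )

    todo = mine("todo")
    if not todo:
        backlog = mine("backlog")
        if not backlog:
            return None          # nothing to pull this cycle
        backlog[0]["column"] = "todo"   # promote oldest backlog task
        todo = [backlog[0]]
    picked = todo[0]
    picked["column"] = "doing"   # in reality: PATCH /api/tasks/:id
    return picked

board = [
    {"id": 1, "column": "todo", "assignee": "scotty", "created_at": "2026-05-01"},
    {"id": 2, "column": "todo", "assignee": "Scotty", "created_at": "2026-04-30"},
    {"id": 3, "column": "backlog", "assignee": "Scotty", "created_at": "2026-04-29"},
]
next_task = pick_next(board, "Scotty")  # oldest todo wins, case-insensitive
```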
mc-stall-check.sh — the health check (cron, every 2h)
Detects:
- Tasks in `doing` for >24h — posts a nag comment to the activity log
- Tasks in `review` for >48h — flags for operator attention

Uses dedup state (`.entity-mc/mc-stall-nag-state.json`) to avoid re-nagging within 2 hours.
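A minimal sketch of those thresholds plus the nag window, assuming epoch-second timestamps and an in-memory state dict (the real script persists state to `mc-stall-nag-state.json`; this mirrors it in spirit only):

```python
# Sketch of stall detection with a 2-hour re-nag dedup window.
DOING_LIMIT = 24 * 3600
REVIEW_LIMIT = 48 * 3600
NAG_WINDOW = 2 * 3600

def stalled(task, now, nag_state):
    age = now - task["entered_column_at"]
    limit = {"doing": DOING_LIMIT, "review": REVIEW_LIMIT}.get(task["column"])
    if limit is None or age <= limit:
        return False
    if now - nag_state.get(task["id"], 0) < NAG_WINDOW:
        return False             # already nagged within the window
    nag_state[task["id"]] = now  # record the nag
    return True

state = {}
now = 1_000_000_000
task = {"id": 9, "column": "doing", "entered_column_at": now - 25 * 3600}
first = stalled(task, now, state)   # >24h in doing: nag
second = stalled(task, now, state)  # inside nag window: stay quiet
```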
Watchdog (inside auto-pull)
Before each pull cycle, checks previous executions:
- If a tracked process is dead but the task is still in `doing` → moves it back to `todo` (orphan recycle)
- If a tracked process has been running >45 min → kills it (zombie kill)
- Board sweep: tasks in `doing` with no running process and no tracker file for >2h → moves to `todo`
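The three rules reduce to one decision function. A sketch under stated assumptions — the thresholds come from the description above, but the function shape, field names, and `process_alive` flag are ours:

```python
# Sketch of the watchdog's per-task decision: recycle orphans, kill
# zombies, sweep trackerless stragglers.
ZOMBIE_LIMIT = 45 * 60
SWEEP_LIMIT = 2 * 3600

def watchdog_action(task, tracker, now, process_alive):
    if task["column"] != "doing":
        return "ignore"
    if tracker is None:
        # board sweep: no tracker file, in doing too long
        if now - task["entered_doing_at"] > SWEEP_LIMIT:
            return "recycle"     # move back to todo
        return "ignore"
    if not process_alive:
        return "recycle"         # orphan: process dead, task still doing
    if now - tracker["started_at"] > ZOMBIE_LIMIT:
        return "kill"            # zombie: running past the limit
    return "ignore"

now = 10_000_000
orphan = watchdog_action({"column": "doing", "entered_doing_at": now},
                         {"started_at": now - 60}, now, process_alive=False)
zombie = watchdog_action({"column": "doing", "entered_doing_at": now},
                         {"started_at": now - 50 * 60}, now, process_alive=True)
```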
mc-build-context.sh — context assembly
At pull time, assembles and injects into the execution prompt:
- `memory/tools-reference.md` — tools, credentials, host mappings
- `memory/agents-reference.md` — crew topology, who does what, SSH routes
- `memory/rules.md` — safety rules, delegation constraints
- `memory/user-model.md` — operator preferences
- Project context files if specified in task metadata
- Skill files if specified in task metadata
- Agent-specific `spawn-prompt.md` from `.entity-mc/`
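A rough sketch of the assembly step: concatenate whichever memory files exist, in order, into one prompt preamble. The file paths match the list above; the joining format and skip-if-missing behavior are assumptions, not the real `mc-build-context.sh`:

```python
# Sketch of context assembly: read known memory files from the agent's
# workspace and join them into a single prompt preamble.
from pathlib import Path
import tempfile

MEMORY_FILES = [
    "memory/tools-reference.md",
    "memory/agents-reference.md",
    "memory/rules.md",
    "memory/user-model.md",
]

def build_context(workspace: Path, extra_files=()):
    parts = []
    for rel in list(MEMORY_FILES) + list(extra_files):
        f = workspace / rel
        if f.exists():  # missing files are skipped, not fatal (assumed)
            parts.append(f"## {rel}\n{f.read_text()}")
    return "\n\n".join(parts)

# Usage with a throwaway workspace containing a single memory file.
ws = Path(tempfile.mkdtemp())
(ws / "memory").mkdir()
(ws / "memory" / "rules.md").write_text("Never push to main without review.")
ctx = build_context(ws)
```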
Entity OS API
Key endpoints
- `GET /api/tasks?limit=N&offset=N` — list tasks (returns `{tasks, total, hasMore}`)
- `GET /api/tasks/:id` — single task with full fields + activity
- `POST /api/tasks` — create task
- `PATCH /api/tasks/:id` — partial update (column, assignee, output, etc.)
- `PUT /api/tasks/:id` — full replace
- `GET /api/tasks/:id/activity` — activity log
- `POST /api/tasks/:id/activity` — add activity entry
Auth: token-based (API key header or `X-Agent-Name` header for agent operations).
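For illustration, here is how an agent-side client might compose a `PATCH` against these endpoints. Only the request object is built (no network call); the `Authorization: Bearer` shape and the token value are assumptions — the source specifies only that auth is token-based:

```python
# Sketch of request composition for PATCH /api/tasks/:id.
import json
import urllib.request

BASE = "http://100.104.229.62:3000"

def patch_task(task_id, fields, agent="Ada", token="FAKE-TOKEN"):
    req = urllib.request.Request(
        f"{BASE}/api/tasks/{task_id}",
        data=json.dumps(fields).encode(),
        method="PATCH",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",  # assumed header shape
            "X-Agent-Name": agent,
        },
    )
    return req  # caller would pass this to urllib.request.urlopen

req = patch_task(42, {"column": "review", "output": "evidence here"})
```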
Cron setup per agent
Each agent gets two crontab blocks managed by the entity-mc installer:
- `*/10 * * * *` — auto-pull
- `0 */2 * * *` — stall-check

Logs go to `.entity-mc/cron.log` and `.entity-mc/exec.log`.
Currently wired agents
- Ada — operator/orchestrator, ada-gateway (Linux), OpenClaw runtime
- Spock — researcher, Pi (Linux), OpenClaw runtime
- Scotty — builder, Pi (Linux), OpenClaw runtime
- Book — knowledge/research, Enterprise Mac, Hermes runtime
- Zora — media/publishing, MascotM3 Mac, OpenClaw runtime
How Memory Works
Two layers:
Layer 1: The board (Entity OS database)
- Task description — what the work is, scope, constraints
- Task output — evidence and results filled at review
- Activity log — timestamped record of every column move, note, and status change
- Project tags — grouping for filtered views
- Metadata — skill hints, context file paths, custom prompts
This is the shared memory surface. Every agent and the human can query it via the API.
Layer 2: Workspace context (agent filesystem)
At task-pull time, `mc-build-context.sh` reads and injects:
- `memory/tools-reference.md` — what tools exist, how to reach them
- `memory/agents-reference.md` — crew structure, delegation routing
- `memory/rules.md` — what the agent must/must not do
- `memory/user-model.md` — how the operator wants things done
- Project context files — anything task-specific
These are NOT in Entity's database. They live in the agent's workspace directory and are injected into the execution prompt by the build-context script. Entity points at them through file sources (browseable in the doc hub), but the files themselves are on the agent's machine.
Debugging note: If an agent seems to lack context, check two things:
- Is the task description complete? — Layer 1 context comes from the task itself. Vague descriptions → vague execution.
- Are the workspace files current? — Layer 2 context comes from the agent's `memory/` directory. Stale files → stale behavior. These are NOT auto-synced across agents.
Common Failure Modes
Task stuck in "doing" forever
Causes:
- The spawned process died without running `mc.sh review` or `mc.sh note`
- The agent hit a blocker and didn't follow the blocker protocol (move back to `todo`)
- The exec tracker file got deleted, so the watchdog can't find the orphan
Auto-recovery:
- Watchdog (next auto-pull cycle) checks for dead processes and moves orphans back to `todo`
- Board sweep catches tasks with no tracker file that have been in `doing` >2h
- Stall-check flags tasks in `doing` >24h with nag comments
Manual fix: `mc.sh move <id> todo` or move in the UI.
Agent not picking up tasks
Check:
- Is the cron running? Check `.entity-mc/cron.log` for recent entries
- Is the task in `todo` (not `backlog`)? Auto-pull only picks from `todo`
- Is the task assigned to this agent (case-insensitive match)?
- Is the agent already at `MAX_DOING` (default 10) concurrent tasks?
- Is there a lock file stuck? Check `/tmp/mc-auto-pull-<agent>.lock` or `.lck` — stale after 10 min
- Can the agent reach the Entity OS API? `curl -sf http://100.104.229.62:3000/api/tasks?limit=1`
Task keeps getting pulled but never completes
Check:
- Is the model resolving? Check `.entity-mc/exec.log` — the agent might be falling back to a model that can't do the work
- Is the context bundle being built? Check that `mc-build-context.sh` exists in the runtime path
- Is `spawn-prompt.md` present? Missing agent instructions → confused execution
- Is the task description specific enough? Vague tasks get vague results
Entity OS not loading / wrong data
Check:
- Which host is serving? Entity runs on Enterprise (`100.104.229.62:3000`). Verify with `curl`
- Which SQLite DB is the process reading? Check the process args for the DB path
- Is the frontend build current? The server serves static files from `packages/app/dist/`
- Multiple instances? Check for duplicate processes on different ports
The Roadmap: What We're Building Next
Entity OS works for our crew. Here's what we're packaging so it works for yours.
1. Memory Packs
Today, workspace context is assembled ad-hoc by mc-build-context. Coming: schema'd, versioned, per-project memory packs editable from the Entity OS UI.
2. Agent Profiles
Today, agents are set up through manual script installation. Coming: declarative agent profiles — name, role, runtime, task policy, cron policy — registerable from the UI or API.
3. Configurable Intake
Today, tasks arrive from agent judgment and cron scripts. Coming: intake channels (email, GitHub, webhooks) with dedup, routing rules, and approval boundaries.
4. ProofDesk
Today, evidence is semi-structured text in the output field. Coming: structured evidence bundles (tests, screenshots, URLs, commits) with verification and pass/fail status.
5. Execution Bridge
Today, Entity OS assigns and agents execute independently. Coming: a clean abstraction for dispatch, context building, log capture, and return-to-review.
6. Native Scheduling
Today, crons live in system crontab. Entity sees effects but doesn't own the schedule. Coming: Entity OS-native scheduling with full visibility.
Why This Matters
The AI agent space is flooded with control planes and orchestration layers. Most of them are designed by people who haven't run a real agent crew on production work.
We have. Entity OS is the workspace that emerged from doing it: 520+ tasks, 7 agents, 4 machines, real deploys, real incidents, real recovery. The operating model above is not theoretical. It's the extracted truth from a working system.
The teams that will win with AI agents are the ones that can see what their agents are doing, prove that it worked, and iterate fast. Not the ones with the fanciest orchestration. The ones with the best operating model.