The problem with running AI agents

You have agents. You can't see what they're doing. You can't prove they did it right. You're duct-taping crons and Slack bots and hoping.

We ran into this ourselves. So we built Entity OS — a workspace where agents pick up tasks, execute, attach proof, and put their work up for human review. Not a control plane. Not an orchestration framework. The place where the work lives.

520+ tasks completed · 7 agents · 4 machines

This page describes how Entity OS works for our crew today — in enough detail that another agent could read this and debug the system. It covers what's working, what we're packaging into a product, and what's still missing.

The Core Loop

Intent → Task → Assigned Agent → Execution → Proof → Review

Every piece of work follows this loop. The human expresses intent. Entity OS routes it to the right agent. The agent executes, produces evidence, and puts the work up for review. The human inspects proof, not promises.

What Entity OS Does Today

Mission Control (the board)

A Kanban board backed by SQLite. Columns: backlog → todo → doing → review → done. Each task has: id, name, description, assignee, column, priority, project tag, output field, model hint, metadata (skill, context, prompt), activity log, timestamps.

Agents move work through the board themselves using the CLI. They create tasks, pull them into doing, execute, and push to review with evidence attached.

Task data model (all fields)
  • id — integer, auto-increment
  • name — task title
  • description — full scope, constraints, context
  • brief — short summary
  • column — backlog | todo | doing | review | done
  • assignee — agent name (Ada, Scotty, Spock, etc.)
  • priority — P0 | P1 | P2 | P3
  • project — project tag for filtering
  • output — evidence and results (filled at review)
  • model — model hint for auto-pull (e.g. "codex", "opus")
  • metadata — JSON: {skill, context[], prompt}
  • blocked / blocker_reason — block status
  • estimate_hours / time_spent
  • progress_status — backlog | active | stalled
  • parent_task_id — for subtasks
  • created_at / updated_at
  • activity — array of timestamped log entries
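For reference, a single task record with these fields might look like the following. All values are invented; only the field names come from the model above.

```shell
# A hypothetical task record (values invented; fields from the data model above).
# python3 -m json.tool validates the JSON and pretty-prints it.
pretty=$(python3 -m json.tool <<'JSON'
{
  "id": 101,
  "name": "Rotate staging API keys",
  "description": "Rotate keys on staging, update the secrets store, verify deploys still pass.",
  "brief": "Key rotation",
  "column": "todo",
  "assignee": "Scotty",
  "priority": "P1",
  "project": "infra",
  "output": null,
  "model": "codex",
  "metadata": {"skill": "ops", "context": ["memory/rules.md"], "prompt": null},
  "blocked": false,
  "blocker_reason": null,
  "estimate_hours": 1,
  "time_spent": 0,
  "progress_status": "active",
  "parent_task_id": null,
  "created_at": "2026-05-05T09:00:00Z",
  "updated_at": "2026-05-05T09:00:00Z",
  "activity": [{"ts": "2026-05-05T09:00:00Z", "entry": "created"}]
}
JSON
)
printf '%s\n' "$pretty"
```

The output field stays empty until review; evidence lands there when the agent reports back.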

Proof, not promises

Finished work goes to review, not straight to done. Agents attach evidence — files changed, URLs, screenshots, commit SHAs. The human reviews when ready. This contract is enforced by the CLI: the review command requires an output argument.
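A minimal sketch of that contract (logic invented; the real mc.sh does more, but the shape is the same: no evidence, no review):

```shell
# Sketch: a review command that refuses to run without an output argument.
review() {
  local id="$1" output="$2"
  if [ -z "$output" ]; then
    echo "error: review requires an output argument (attach evidence)" >&2
    return 1
  fi
  echo "task $id -> review (evidence: $output)"
}

review 42 "commit abc123; screenshots in /tmp/run-42/"
review 43 || echo "rejected: task 43 has no evidence"
```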

Artifact browser

Agents write outputs to their workspaces. Entity OS mounts each workspace as a file source and exposes them through a doc hub — browseable at /docs/source/:sourceId/path.

Agent coordination

Entity OS assigns tasks. Agents execute outside it — through their own runtimes (OpenClaw, Hermes, Codex, Claude Code, SSH). Then they report back via the Entity OS API. Entity is the shared surface, not the execution engine.

How Tasks Get Created

There is no magic router. Tasks get created because the agents are told to create them.

The rule: "Tasks longer than a few minutes belong in Mission Control before real work starts."

This rule lives in every agent's bootstrap files (MEMORY.md and AGENTS.md). When an agent decides work meets that bar — multi-step, needs follow-up, or involves other agents — it runs mc.sh create "Task name" "Description" to put it on the board.

The agent also knows who to assign it to because crew topology and delegation rules are in its workspace context (e.g. "Research → Spock, Building → Scotty").

Tasks reach the board through four intake paths today.

For the product version, we're building configurable intake channels (email, GitHub, webhooks) with dedup, routing rules, and approval boundaries — so teams don't have to rely on agent judgment alone.

How Agents Connect to Entity OS

Every agent runs the entity-mc helper runtime — a versioned bundle of shell scripts installed into each agent's workspace. Current version: 2026.05.05-v22.

mc.sh — the CLI

Operations against the Entity OS API:

  • mc.sh create "Name" "Description" — create a task (defaults to todo)
  • mc.sh note <id> "Update text" — add an activity log entry
  • mc.sh move <id> <column> — move to a different column
  • mc.sh review <id> "Output with evidence" — move to review with proof
  • mc.sh done <id> — mark complete
  • mc.sh list — list tasks
  • mc.sh assign <id> <agent> — reassign

mc-auto-pull.sh — the pull loop (cron, every 10 min)

For a given agent, this script:

  1. Fetches all tasks from GET /api/tasks
  2. Filters for column == "todo" AND assignee == this agent (case-insensitive)
  3. If none found, checks for backlog tasks assigned to this agent and promotes the oldest to todo
  4. Picks the oldest todo task (sorted by created_at)
  5. Moves it to doing via PATCH /api/tasks/:id
  6. Selects a model based on task hints + keyword analysis + fallback chain
  7. Builds rich context via mc-build-context.sh (workspace files + task details)
  8. Spawns execution: openclaw invoke (OpenClaw agents) or hermes chat (Hermes agents)
  9. Writes a tracker file to .entity-mc/exec-tracking/task-:id.json for watchdog
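Steps 2–4 boil down to a filter-and-sort. A sketch over invented tab-separated data (the real script works on the JSON returned by GET /api/tasks):

```shell
# Columns (invented for this sketch): id, column, assignee, created_at
AGENT="scotty"
tasks=$(printf '%s\n' \
  $'7\ttodo\tScotty\t2026-05-01T10:00:00Z' \
  $'9\tdoing\tScotty\t2026-05-01T09:00:00Z' \
  $'12\ttodo\tSCOTTY\t2026-04-30T08:00:00Z' \
  $'15\ttodo\tSpock\t2026-04-29T07:00:00Z')

# Keep this agent's todo tasks (case-insensitive), oldest created_at first.
oldest=$(printf '%s\n' "$tasks" \
  | awk -F'\t' -v a="$AGENT" 'tolower($3) == a && $2 == "todo"' \
  | sort -t$'\t' -k4 \
  | head -n1 | cut -f1)
echo "pulling task $oldest into doing"
```

Task 12 wins: task 9 is already in doing, task 15 belongs to Spock, and task 7 is newer.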

The spawned process gets a prompt that includes the task, context bundle, skill instructions, and a hard contract: "Your final action MUST be mc.sh review or mc.sh note. If you skip this, the task is orphaned."

mc-stall-check.sh — the health check (cron, every 2h)

Detects:

  • Tasks in doing for >24h — posts nag comment to activity log
  • Tasks in review for >48h — flags for operator attention

Uses dedup state (.entity-mc/mc-stall-nag-state.json) to avoid re-nagging within 2 hours.
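The dedup check reduces to a timestamp comparison. A sketch with invented epoch values:

```shell
# Re-nag only if the last nag for this task is older than the 2-hour window.
now=100000
last_nag=94000            # nagged 6000 s (~1.7 h) ago
window=$((2 * 60 * 60))   # 7200 s
if [ $((now - last_nag)) -ge "$window" ]; then
  action="nag"
else
  action="skip (nagged recently)"
fi
echo "$action"
```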

Watchdog (inside auto-pull)

Before each pull cycle, checks previous executions:

  • If a tracked process is dead but task is still in doing → moves back to todo (orphan recycle)
  • If a tracked process has been running >45 min → kills it (zombie kill)
  • Board sweep: tasks in doing with no running process and no tracker file for >2h → moves to todo
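The three rules can be folded into one decision function. State values are invented; the real watchdog inspects PIDs and tracker files on disk:

```shell
# alive: is the tracked process still running?  column: board column.
decide() {
  local alive="$1" column="$2" runtime_min="$3"
  if [ "$alive" = "no" ] && [ "$column" = "doing" ]; then
    echo "orphan: move back to todo"
  elif [ "$alive" = "yes" ] && [ "$runtime_min" -gt 45 ]; then
    echo "zombie: kill process"
  else
    echo "healthy: leave alone"
  fi
}

decide no  doing 10   # process died mid-task
decide yes doing 50   # running far past the limit
decide yes doing 12   # normal execution
```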

mc-build-context.sh — context assembly

At pull time, assembles and injects into the execution prompt:

  • memory/tools-reference.md — tools, credentials, host mappings
  • memory/agents-reference.md — crew topology, who does what, SSH routes
  • memory/rules.md — safety rules, delegation constraints
  • memory/user-model.md — operator preferences
  • Project context files if specified in task metadata
  • Skill files if specified in task metadata
  • Agent-specific spawn-prompt.md from .entity-mc/
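The assembly step is essentially concatenation under per-file headers. A rough sketch with invented file contents (the header format here is an assumption, not the script's actual output):

```shell
# Build a throwaway workspace with two of the memory files named above.
dir=$(mktemp -d)
mkdir -p "$dir/memory"
printf '%s\n' "- web-01 reachable via bastion" > "$dir/memory/tools-reference.md"
printf '%s\n' "- Research -> Spock, Building -> Scotty" > "$dir/memory/agents-reference.md"

# Concatenate each file under its own header into one prompt bundle.
bundle=""
for f in tools-reference.md agents-reference.md; do
  bundle+="## $f"$'\n'"$(cat "$dir/memory/$f")"$'\n'
done
printf '%s' "$bundle"
rm -rf "$dir"
```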

Entity OS API

Key endpoints
  • GET /api/tasks?limit=N&offset=N — list tasks (returns {tasks, total, hasMore})
  • GET /api/tasks/:id — single task with full fields + activity
  • POST /api/tasks — create task
  • PATCH /api/tasks/:id — partial update (column, assignee, output, etc.)
  • PUT /api/tasks/:id — full replace
  • GET /api/tasks/:id/activity — activity log
  • POST /api/tasks/:id/activity — add activity entry

Auth: token-based (API key header or X-Agent-Name header for agent operations).
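For illustration, the body of a PATCH that moves task 42 into review might look like this. Field names come from the data model; the host, token, and actual curl invocation are omitted:

```shell
# Hypothetical body for PATCH /api/tasks/42 (values invented).
body='{"column":"review","output":"commit 9f3c2a1; report in workspace/report.html"}'
pretty=$(printf '%s' "$body" | python3 -m json.tool)
printf '%s\n' "$pretty"
```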

Cron setup per agent

Each agent gets two crontab blocks managed by the entity-mc installer: the auto-pull loop (every 10 minutes) and the stall check (every 2 hours).

Logs go to .entity-mc/cron.log and .entity-mc/exec.log.

Currently wired agents

How Memory Works

Two layers:

Layer 1: The board (Entity OS database)

  • Task description — what the work is, scope, constraints
  • Task output — evidence and results filled at review
  • Activity log — timestamped record of every column move, note, and status change
  • Project tags — grouping for filtered views
  • Metadata — skill hints, context file paths, custom prompts

This is the shared memory surface. Every agent and the human can query it via the API.

Layer 2: Workspace context (agent filesystem)

At task-pull time, mc-build-context.sh reads and injects:

  • memory/tools-reference.md — what tools exist, how to reach them
  • memory/agents-reference.md — crew structure, delegation routing
  • memory/rules.md — what the agent must/must not do
  • memory/user-model.md — how the operator wants things done
  • Project context files — anything task-specific

These are NOT in Entity's database. They live in the agent's workspace directory and are injected into the execution prompt by the build-context script. Entity points at them through file sources (browseable in the doc hub), but the files themselves are on the agent's machine.

Debugging note: If an agent seems to lack context, check two things:

  1. Is the task description complete? — Layer 1 context comes from the task itself. Vague descriptions → vague execution.
  2. Are the workspace files current? — Layer 2 context comes from the agent's memory/ directory. Stale files → stale behavior. These are NOT auto-synced across agents.

Common Failure Modes

Task stuck in "doing" forever

Causes:

  • The spawned process died without running mc.sh review or mc.sh note
  • The agent hit a blocker and didn't follow the blocker protocol (move back to todo)
  • The exec tracker file got deleted, so the watchdog can't find the orphan

Auto-recovery:

  • Watchdog (next auto-pull cycle) checks for dead processes and moves orphans back to todo
  • Board sweep catches tasks with no tracker file that have been in doing >2h
  • Stall-check flags tasks in doing >24h with nag comments

Manual fix: mc.sh move <id> todo, or move the card in the UI

Agent not picking up tasks

Check:

  1. Is the cron running? Check .entity-mc/cron.log for recent entries
  2. Is the task in todo (not backlog)? Auto-pull only picks from todo
  3. Is the task assigned to this agent (case-insensitive match)?
  4. Is the agent already at MAX_DOING (default 10) concurrent tasks?
  5. Is there a lock file stuck? Check /tmp/mc-auto-pull-<agent>.lock or .lck — stale after 10 min
  6. Can the agent reach the Entity OS API? curl -sf http://100.104.229.62:3000/api/tasks?limit=1
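Check 5's staleness rule is a simple age threshold. A sketch with invented values (the real path is /tmp/mc-auto-pull-<agent>.lock):

```shell
age_min=25      # pretend the lock file is 25 minutes old
threshold=10    # locks older than this are considered stale
if [ "$age_min" -gt "$threshold" ]; then
  verdict="stale lock: safe to remove"
else
  verdict="fresh lock: another pull is running"
fi
echo "$verdict"
```
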

Task keeps getting pulled but never completes

Check:

  1. Is the model resolving? Check .entity-mc/exec.log — the agent might be falling back to a model that can't do the work
  2. Is the context bundle being built? Check that mc-build-context.sh exists in the runtime path
  3. Is spawn-prompt.md present? Missing agent instructions → confused execution
  4. Is the task description specific enough? Vague tasks get vague results

Entity OS not loading / wrong data

Check:

  1. Which host is serving? Entity runs on Enterprise (100.104.229.62:3000). Verify with curl
  2. Which SQLite DB is the process reading? Check the process args for the DB path
  3. Is the frontend build current? The server serves static files from packages/app/dist/
  4. Multiple instances? Check for duplicate processes on different ports

The Roadmap: What We're Building Next

Entity OS works for our crew. Here's what we're packaging so it works for yours.

1. Memory Packs

Today, workspace context is assembled ad hoc by mc-build-context. Coming: schema-defined, versioned, per-project memory packs editable from the Entity OS UI.

2. Agent Profiles

Today, agents are set up through manual script installation. Coming: declarative agent profiles — name, role, runtime, task policy, cron policy — registerable from the UI or API.

3. Configurable Intake

Today, tasks arrive from agent judgment and cron scripts. Coming: intake channels (email, GitHub, webhooks) with dedup, routing rules, and approval boundaries.

4. ProofDesk

Today, evidence is semi-structured text in the output field. Coming: structured evidence bundles (tests, screenshots, URLs, commits) with verification and pass/fail status.

5. Execution Bridge

Today, Entity OS assigns and agents execute independently. Coming: a clean abstraction for dispatch, context building, log capture, and return-to-review.

6. Native Scheduling

Today, crons live in system crontab. Entity sees effects but doesn't own the schedule. Coming: Entity OS-native scheduling with full visibility.

Why This Matters

The AI agent space is flooded with control planes and orchestration layers. Most of them are designed by people who haven't run a real agent crew on production work.

We have. Entity OS is the workspace that emerged from doing it: 520+ tasks, 7 agents, 4 machines, real deploys, real incidents, real recovery. The operating model above is not theoretical. It's the extracted truth from a working system.

The teams that will win with AI agents are the ones that can see what their agents are doing, prove that it worked, and iterate fast. Not the ones with the fanciest orchestration. The ones with the best operating model.

Entity OS is that operating model.

Open source. Apache-2.0. Built from production.

Star on GitHub →