The problem with running AI agents
You have agents. You can't see what they're doing. You can't prove they did it right. You're duct-taping crons and Slack bots and hoping.
We ran into this ourselves. So we built Entity OS — a workspace where agents pick up tasks, execute, attach proof, and put their work up for human review. Not a control plane. Not an orchestration framework. The place where the work lives.
This page describes how Entity OS works for our crew today — in enough detail that another agent could read this and debug the system. It covers what's working, what we're packaging into a product, and what's still missing.
The Core Loop
Every piece of work follows this loop. The human expresses intent. Entity OS routes it to the right agent. The agent executes, produces evidence, and puts the work up for review. The human inspects proof, not promises.
What Entity OS Does Today
Mission Control (the board)
A Kanban board backed by SQLite. Columns: backlog → todo → doing → review → done. Each task has: id, name, description, assignee, column, priority, project tag, output field, model hint, metadata (skill, context, prompt), activity log, timestamps.
Agents move work through the board themselves using the CLI. They create tasks, pull them into doing, execute, and push to review with evidence attached.
Task data model (all fields)
- `id` — integer, auto-increment
- `name` — task title
- `description` — full scope, constraints, context
- `brief` — short summary
- `column` — backlog | todo | doing | review | done
- `assignee` — agent name (Ada, Scotty, Spock, etc.)
- `priority` — P0 | P1 | P2 | P3
- `project` — project tag for filtering
- `output` — evidence and results (filled at review)
- `model` — model hint for auto-pull (e.g. "codex", "opus")
- `metadata` — JSON: {skill, context[], prompt}
- `blocked` / `blocker_reason` — block status
- `estimate_hours` / `time_spent`
- `progress_status` — backlog | active | stalled
- `parent_task_id` — for subtasks
- `created_at` / `updated_at`
- `activity` — array of timestamped log entries
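To make the field list concrete, here is a hypothetical task record in the shape the API might return. All values are illustrative, not from a real board.

```python
# Hypothetical example of a single task record. Field names follow the
# data model above; the values are invented for illustration.
task = {
    "id": 412,
    "name": "Ship weekly competitive intel digest",
    "description": "Scan tracked competitors, summarize changes, post digest.",
    "brief": "Weekly intel digest",
    "column": "review",
    "assignee": "Spock",
    "priority": "P2",
    "project": "intel",
    "output": "Digest posted. Evidence: reports/2026-05-05-digest.md",
    "model": "opus",
    "metadata": {"skill": "research", "context": ["memory/rules.md"], "prompt": None},
    "blocked": False,
    "blocker_reason": None,
    "parent_task_id": None,
    "activity": [
        {"ts": "2026-05-05T09:10:00Z", "entry": "moved todo -> doing"},
        {"ts": "2026-05-05T10:42:00Z", "entry": "moved doing -> review"},
    ],
}

# Column values are a closed set.
assert task["column"] in {"backlog", "todo", "doing", "review", "done"}
```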
Proof, not promises
Finished work goes to review, not straight to done. Agents attach evidence — files changed, URLs, screenshots, commit SHAs. The human reviews when ready. This contract is enforced by the CLI: the review command requires an output argument.
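The contract is simple enough to sketch. A minimal illustration of the guard the CLI applies, assuming a function-shaped wrapper (the function name and dict shape here are ours, not the real `mc.sh` internals):

```python
# Sketch of the "proof, not promises" contract: a move to review is
# rejected unless evidence accompanies it, mirroring how mc.sh requires
# an output argument on its review command.
def move_to_review(task: dict, output: str) -> dict:
    if not output or not output.strip():
        raise ValueError("review requires evidence in the output field")
    task["column"] = "review"
    task["output"] = output
    return task

task = {"id": 7, "column": "doing", "output": ""}
move_to_review(task, "Fixed login bug. Commit abc123, screenshot attached.")
```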
Artifact browser
Agents write outputs to their workspaces. Entity OS mounts each workspace as a file source and exposes them through a doc hub — browseable at `/docs/source/:sourceId/path`.
Agent coordination
Entity OS assigns tasks. Agents execute outside it — through their own runtimes (OpenClaw, Hermes, Codex, Claude Code, SSH). Then they report back via the Entity OS API. Entity is the shared surface, not the execution engine.
How Tasks Get Created
There is no magic router. Tasks get created because the agents are told to create them.
The rule: "Tasks longer than a few minutes belong in Mission Control before real work starts."
This rule lives in every agent's bootstrap files (`MEMORY.md` and `AGENTS.md`). When an agent decides work meets that bar — multi-step, needs follow-up, or involves other agents — it runs `mc.sh create "Task name" "Description"` to put it on the board.
The agent also knows who to assign it to because crew topology and delegation rules are in its workspace context (e.g. "Research → Spock, Building → Scotty").
The four intake paths:
- Agent judgment — agent reads the rule, decides the work warrants tracking, creates and assigns the task. This is how most tasks enter the board.
- Operator direction — the human says "put this on the board" or creates it directly in the UI.
- Automated scanners — cron jobs (health checks, competitive intel, momentum loops) write candidates to `.entity-mc/intake/inbox.jsonl`, which `mc-intake.sh` ingests.
- Manual in the UI — drag-and-drop, reassign, create directly.
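The scanner path is the mechanical one, so it's worth sketching. A rough illustration of JSONL inbox ingestion with dedup — the dedup-by-name rule and candidate field names are assumptions for illustration, not the actual `mc-intake.sh` behavior:

```python
# Sketch: read inbox.jsonl candidates, skip ones already on the board
# (case-insensitive name match, assumed), emit task-create payloads.
import json

def ingest(jsonl_lines, existing_names):
    created = []
    seen = {n.lower() for n in existing_names}
    for line in jsonl_lines:
        line = line.strip()
        if not line:
            continue
        cand = json.loads(line)
        name = cand["name"]
        if name.lower() in seen:
            continue  # dedup: a task with this name already exists
        seen.add(name.lower())
        created.append({
            "name": name,
            "description": cand.get("description", ""),
            "column": "todo",
        })
    return created

inbox = [
    '{"name": "Health check: ada-gateway", "description": "Disk at 91%"}',
    '{"name": "health check: ada-gateway", "description": "duplicate"}',
]
tasks = ingest(inbox, existing_names=["Weekly intel digest"])
```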
For the product version, we're building configurable intake channels (email, GitHub, webhooks) with dedup, routing rules, and approval boundaries — so teams don't have to rely on agent judgment alone.
How Agents Connect to Entity OS
Every agent runs the entity-mc helper runtime — a versioned bundle of shell scripts installed into each agent's workspace. Current version: 2026.05.05-v22.
mc.sh — the CLI
Operations against the Entity OS API:
- `mc.sh create "Name" "Description"` — create a task (defaults to `todo`)
- `mc.sh note <id> "Update text"` — add an activity log entry
- `mc.sh move <id> <column>` — move to a different column
- `mc.sh review <id> "Output with evidence"` — move to `review` with proof
- `mc.sh done <id>` — mark complete
- `mc.sh list` — list tasks
- `mc.sh assign <id> <agent>` — reassign
mc-auto-pull.sh — the pull loop (cron, every 10 min)
For a given agent, this script:
- Fetches all tasks from `GET /api/tasks`
- Filters for `column == "todo"` AND `assignee == this agent` (case-insensitive)
- If none found, checks for `backlog` tasks assigned to this agent and promotes the oldest to `todo`
- Picks the oldest `todo` task (sorted by `created_at`)
- Moves it to `doing` via `PATCH /api/tasks/:id`
- Selects a model based on task hints + keyword analysis + fallback chain
- Builds rich context via `mc-build-context.sh` (workspace files + task details)
- Spawns execution: `openclaw invoke` (OpenClaw agents) or `hermes chat` (Hermes agents)
- Writes a tracker file to `.entity-mc/exec-tracking/task-:id.json` for the watchdog
The spawned process gets a prompt that includes the task, context bundle, skill instructions, and a hard contract: "Your final action MUST be mc.sh review or mc.sh note. If you skip this, the task is orphaned."
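The selection steps above can be sketched in a few lines. This mirrors the described behavior (filter by column and assignee, promote from backlog when `todo` is empty, take the oldest by `created_at`), not the actual `mc-auto-pull.sh` source:

```python
# Illustrative sketch of auto-pull task selection, per the steps above.
def pick_next(tasks, agent):
    def mine(column):
        return sorted(
            (t for t in tasks
             if t["column"] == column
             and t["assignee"].lower() == agent.lower()),
            key=lambda t: t["created_at"],
        )

    todo = mine("todo")
    if not todo:
        backlog = mine("backlog")
        if not backlog:
            return None          # nothing to pull this cycle
        backlog[0]["column"] = "todo"   # promote oldest backlog task
        todo = [backlog[0]]
    picked = todo[0]
    picked["column"] = "doing"   # in reality: PATCH /api/tasks/:id
    return picked

board = [
    {"id": 1, "column": "todo", "assignee": "scotty", "created_at": "2026-05-01"},
    {"id": 2, "column": "todo", "assignee": "Scotty", "created_at": "2026-04-30"},
    {"id": 3, "column": "backlog", "assignee": "Scotty", "created_at": "2026-04-29"},
]
next_task = pick_next(board, "Scotty")  # oldest todo wins, case-insensitive
```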
mc-stall-check.sh — the health check (cron, every 2h)
Detects:
- Tasks in `doing` for >24h — posts a nag comment to the activity log
- Tasks in `review` for >48h — flags for operator attention

Uses dedup state (`.entity-mc/mc-stall-nag-state.json`) to avoid re-nagging within 2 hours.
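A minimal sketch of those thresholds plus the nag window, assuming epoch-second timestamps and an in-memory state dict (the real script persists state to `mc-stall-nag-state.json`; this mirrors it in spirit only):

```python
# Sketch of stall detection with a 2-hour re-nag dedup window.
DOING_LIMIT = 24 * 3600
REVIEW_LIMIT = 48 * 3600
NAG_WINDOW = 2 * 3600

def stalled(task, now, nag_state):
    age = now - task["entered_column_at"]
    limit = {"doing": DOING_LIMIT, "review": REVIEW_LIMIT}.get(task["column"])
    if limit is None or age <= limit:
        return False
    if now - nag_state.get(task["id"], 0) < NAG_WINDOW:
        return False             # already nagged within the window
    nag_state[task["id"]] = now  # record the nag
    return True

state = {}
now = 1_000_000_000
task = {"id": 9, "column": "doing", "entered_column_at": now - 25 * 3600}
first = stalled(task, now, state)   # >24h in doing: nag
second = stalled(task, now, state)  # inside nag window: stay quiet
```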
Watchdog (inside auto-pull)
Before each pull cycle, checks previous executions:
- If a tracked process is dead but the task is still in `doing` → moves it back to `todo` (orphan recycle)
- If a tracked process has been running >45 min → kills it (zombie kill)
- Board sweep: tasks in `doing` with no running process and no tracker file for >2h → moves to `todo`
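The three rules reduce to one decision function. A sketch under stated assumptions — the thresholds come from the description above, but the function shape, field names, and `process_alive` flag are ours:

```python
# Sketch of the watchdog's per-task decision: recycle orphans, kill
# zombies, sweep trackerless stragglers.
ZOMBIE_LIMIT = 45 * 60
SWEEP_LIMIT = 2 * 3600

def watchdog_action(task, tracker, now, process_alive):
    if task["column"] != "doing":
        return "ignore"
    if tracker is None:
        # board sweep: no tracker file, in doing too long
        if now - task["entered_doing_at"] > SWEEP_LIMIT:
            return "recycle"     # move back to todo
        return "ignore"
    if not process_alive:
        return "recycle"         # orphan: process dead, task still doing
    if now - tracker["started_at"] > ZOMBIE_LIMIT:
        return "kill"            # zombie: running past the limit
    return "ignore"

now = 10_000_000
orphan = watchdog_action({"column": "doing", "entered_doing_at": now},
                         {"started_at": now - 60}, now, process_alive=False)
zombie = watchdog_action({"column": "doing", "entered_doing_at": now},
                         {"started_at": now - 50 * 60}, now, process_alive=True)
```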
mc-build-context.sh — context assembly
At pull time, assembles and injects into the execution prompt:
- `memory/tools-reference.md` — tools, credentials, host mappings
- `memory/agents-reference.md` — crew topology, who does what, SSH routes
- `memory/rules.md` — safety rules, delegation constraints
- `memory/user-model.md` — operator preferences
- Project context files if specified in task metadata
- Skill files if specified in task metadata
- Agent-specific `spawn-prompt.md` from `.entity-mc/`
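A rough sketch of the assembly step: concatenate whichever memory files exist, in order, into one prompt preamble. The file paths match the list above; the joining format and skip-if-missing behavior are assumptions, not the real `mc-build-context.sh`:

```python
# Sketch of context assembly: read known memory files from the agent's
# workspace and join them into a single prompt preamble.
from pathlib import Path
import tempfile

MEMORY_FILES = [
    "memory/tools-reference.md",
    "memory/agents-reference.md",
    "memory/rules.md",
    "memory/user-model.md",
]

def build_context(workspace: Path, extra_files=()):
    parts = []
    for rel in list(MEMORY_FILES) + list(extra_files):
        f = workspace / rel
        if f.exists():  # missing files are skipped, not fatal (assumed)
            parts.append(f"## {rel}\n{f.read_text()}")
    return "\n\n".join(parts)

# Usage with a throwaway workspace containing a single memory file.
ws = Path(tempfile.mkdtemp())
(ws / "memory").mkdir()
(ws / "memory" / "rules.md").write_text("Never push to main without review.")
ctx = build_context(ws)
```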
Entity OS API
Key endpoints
- `GET /api/tasks?limit=N&offset=N` — list tasks (returns `{tasks, total, hasMore}`)
- `GET /api/tasks/:id` — single task with full fields + activity
- `POST /api/tasks` — create task
- `PATCH /api/tasks/:id` — partial update (column, assignee, output, etc.)
- `PUT /api/tasks/:id` — full replace
- `GET /api/tasks/:id/activity` — activity log
- `POST /api/tasks/:id/activity` — add activity entry
Auth: token-based (API key header or `X-Agent-Name` header for agent operations).
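For illustration, here is how an agent-side client might compose a `PATCH` against these endpoints. Only the request object is built (no network call); the `Authorization: Bearer` shape and the token value are assumptions — the source specifies only that auth is token-based:

```python
# Sketch of request composition for PATCH /api/tasks/:id.
import json
import urllib.request

BASE = "http://100.104.229.62:3000"

def patch_task(task_id, fields, agent="Ada", token="FAKE-TOKEN"):
    req = urllib.request.Request(
        f"{BASE}/api/tasks/{task_id}",
        data=json.dumps(fields).encode(),
        method="PATCH",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",  # assumed header shape
            "X-Agent-Name": agent,
        },
    )
    return req  # caller would pass this to urllib.request.urlopen

req = patch_task(42, {"column": "review", "output": "evidence here"})
```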
Cron setup per agent
Each agent gets two crontab blocks managed by the entity-mc installer:
- `*/10 * * * *` — auto-pull
- `0 */2 * * *` — stall-check

Logs go to `.entity-mc/cron.log` and `.entity-mc/exec.log`.
Currently wired agents
- Ada — operator/orchestrator, ada-gateway (Linux), OpenClaw runtime
- Spock — researcher, Pi (Linux), OpenClaw runtime
- Scotty — builder, Pi (Linux), OpenClaw runtime
- Book — knowledge/research, Enterprise Mac, Hermes runtime
- Zora — media/publishing, MascotM3 Mac, OpenClaw runtime
How Memory Works
Two layers:
Layer 1: The board (Entity OS database)
- Task description — what the work is, scope, constraints
- Task output — evidence and results filled at review
- Activity log — timestamped record of every column move, note, and status change
- Project tags — grouping for filtered views
- Metadata — skill hints, context file paths, custom prompts
This is the shared memory surface. Every agent and the human can query it via the API.
Layer 2: Workspace context (agent filesystem)
At task-pull time, `mc-build-context.sh` reads and injects:
- `memory/tools-reference.md` — what tools exist, how to reach them
- `memory/agents-reference.md` — crew structure, delegation routing
- `memory/rules.md` — what the agent must/must not do
- `memory/user-model.md` — how the operator wants things done
- Project context files — anything task-specific
These are NOT in Entity's database. They live in the agent's workspace directory and are injected into the execution prompt by the build-context script. Entity points at them through file sources (browseable in the doc hub), but the files themselves are on the agent's machine.
Debugging note: If an agent seems to lack context, check two things:
- Is the task description complete? — Layer 1 context comes from the task itself. Vague descriptions → vague execution.
- Are the workspace files current? — Layer 2 context comes from the agent's `memory/` directory. Stale files → stale behavior. These are NOT auto-synced across agents.
Common Failure Modes
Task stuck in "doing" forever
Causes:
- The spawned process died without running `mc.sh review` or `mc.sh note`
- The agent hit a blocker and didn't follow the blocker protocol (move back to `todo`)
- The exec tracker file got deleted, so the watchdog can't find the orphan
Auto-recovery:
- Watchdog (next auto-pull cycle) checks for dead processes and moves orphans back to `todo`
- Board sweep catches tasks with no tracker file that have been in `doing` >2h
- Stall-check flags tasks in `doing` >24h with nag comments
Manual fix: `mc.sh move <id> todo` or move in the UI.
Agent not picking up tasks
Check:
- Is the cron running? Check `.entity-mc/cron.log` for recent entries
- Is the task in `todo` (not `backlog`)? Auto-pull only picks from `todo`
- Is the task assigned to this agent (case-insensitive match)?
- Is the agent already at `MAX_DOING` (default 10) concurrent tasks?
- Is there a lock file stuck? Check `/tmp/mc-auto-pull-<agent>.lock` or `.lck` — stale after 10 min
- Can the agent reach the Entity OS API? `curl -sf http://100.104.229.62:3000/api/tasks?limit=1`
Task keeps getting pulled but never completes
Check:
- Is the model resolving? Check `.entity-mc/exec.log` — the agent might be falling back to a model that can't do the work
- Is the context bundle being built? Check that `mc-build-context.sh` exists in the runtime path
- Is `spawn-prompt.md` present? Missing agent instructions → confused execution
- Is the task description specific enough? Vague tasks get vague results
Entity OS not loading / wrong data
Check:
- Which host is serving? Entity runs on Enterprise (`100.104.229.62:3000`). Verify with `curl`
- Which SQLite DB is the process reading? Check the process args for the DB path
- Is the frontend build current? The server serves static files from `packages/app/dist/`
- Multiple instances? Check for duplicate processes on different ports
The Roadmap: What We're Building Next
Entity OS works for our crew. Here's what we're packaging so it works for yours.
1. Memory Packs
Today, workspace context is assembled ad-hoc by mc-build-context. Coming: schema'd, versioned, per-project memory packs editable from the Entity OS UI.
2. Agent Profiles
Today, agents are set up through manual script installation. Coming: declarative agent profiles — name, role, runtime, task policy, cron policy — registerable from the UI or API.
3. Configurable Intake
Today, tasks arrive from agent judgment and cron scripts. Coming: intake channels (email, GitHub, webhooks) with dedup, routing rules, and approval boundaries.
4. ProofDesk
Today, evidence is semi-structured text in the output field. Coming: structured evidence bundles (tests, screenshots, URLs, commits) with verification and pass/fail status.
5. Execution Bridge
Today, Entity OS assigns and agents execute independently. Coming: a clean abstraction for dispatch, context building, log capture, and return-to-review.
6. Native Scheduling
Today, crons live in system crontab. Entity sees effects but doesn't own the schedule. Coming: Entity OS-native scheduling with full visibility.
Why This Matters
The AI agent space is flooded with control planes and orchestration layers. Most of them are designed by people who haven't run a real agent crew on production work.
We have. Entity OS is the workspace that emerged from doing it: 520+ tasks, 7 agents, 4 machines, real deploys, real incidents, real recovery. The operating model above is not theoretical. It's the extracted truth from a working system.
The teams that will win with AI agents are the ones that can see what their agents are doing, prove that it worked, and iterate fast. Not the ones with the fanciest orchestration. The ones with the best operating model.