Hero Shrimp
Hero Shrimp is a local Hero service for assistant-driven work. It combines quick answers, delegated background tasks, memory, schedules, tool execution, audit logging, skills, and proof review into one Rust workspace.
The everyday user surface is assistant-first: ask, do, remember, schedule, jobs, inbox, workspace, and proof. Internally, durable background work is still represented as a job with job.*, job_id, artifact_job_id, and .agent/job_artifacts/ proof files.
What It Does
- Accepts assistant requests from the CLI, UI, SDK, and channel adapters.
- Answers quick questions directly and turns delegated tasks into scoped, proof-backed background jobs.
- Tracks delegated work through plans, phases, tool calls, verifier evidence, proof bundles, and postludes.
- Persists conversations, jobs, tasks, subagents, memories, playbooks, audit rows, and work scopes in SQLite.
- Provides an operator UI for chat, live jobs, proof review, memory management, schedules, and diagnostics.
- Supports Hero-managed service lifecycle through the `hero_shrimp` manager binary.
- Speaks the Agent Client Protocol (ACP), so Zed and other ACP-aware editors can drive the agent natively.
What's Unique
What no other combined coding and personal agent ships in one piece:
- Plan → Verify → Proof: every claim of phase completion is backed by a contract verifier that must exit zero. No "completed (unverified)" label exists anywhere in the system.
- Ed25519-signed wire log: every event is appended to a hash-chained JSONL file with a per-process signing key. A third party given only the log and the public key can attest that the file is unmodified since the daemon wrote it. Keys rotate via `hero_shrimp wire-log rotate-keys`.
- Persistent USD spending cap: daily, monthly, and per-provider caps survive `pkill` because they are hydrated from SQLite on boot. A runaway daemon can't outspend its budget.
- Verifier-signed RL trajectories: every Plan → Verify → Proof cycle emits a structured row to `$SHRIMP_TRAJECTORY_DIR/<day>.jsonl` with `verifier_passed: bool` taken from the contract verifier (not self-assessment). `hero_shrimp trajectory export --format atropos` bundles them as a training dataset where reward = verifier outcome.
- Persistent user model: Honcho-style dialectic state (preferences, working style, current goals) survives across sessions. It is LLM-callable via `user_model_remember` / `user_model_recall`; an optional background worker (`SHRIMP_USER_MODEL_WORKER=1`) proposes updates from recent dialogue, gated by a confidence floor.
- Drift-gated self-improving skills: markdown skills can be auto-revised by an LLM proposer (`SHRIMP_SKILL_EVOLUTION=1`). Every candidate edit passes through `skill_drift::detect_drift` before touching disk, so a confused revision can't silently break references to missing tools.
- Verifier flake and repair retries: `SHRIMP_VERIFIER_RETRY=1` re-runs the verifier once on transient failures (timeouts, port races, connection refused). `SHRIMP_VERIFIER_REPAIR=1` asks the LLM for a unified-diff fix on real failures, applies it via `git apply`, and re-verifies. Repair is bounded to one attempt, and the audit row shows exactly which diff landed.
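The hash chain is what makes edits to the wire log detectable: each entry commits to the hash of the one before it. The sketch below shows only that chaining idea, under loud assumptions. It uses std's non-cryptographic `DefaultHasher` as a dependency-free stand-in for a real cryptographic hash, it omits the Ed25519 signature entirely, and the `Entry` shape and the `append` / `first_broken` helpers are hypothetical, not Hero Shrimp's actual record format or API.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical wire-log entry: a payload plus the hash linking it to the
/// previous entry. NOT the real Hero Shrimp record layout.
#[derive(Debug, Clone)]
pub struct Entry {
    pub payload: String,
    pub prev_hash: u64,
    pub hash: u64,
}

/// Stand-in link hash H(prev_hash || payload). DefaultHasher is NOT
/// cryptographic; a real log would use a cryptographic hash and a signature.
fn link_hash(prev_hash: u64, payload: &str) -> u64 {
    let mut h = DefaultHasher::new();
    prev_hash.hash(&mut h);
    payload.hash(&mut h);
    h.finish()
}

/// Append a payload, chaining it to the last entry (the first entry links to 0).
pub fn append(log: &mut Vec<Entry>, payload: &str) {
    let prev_hash = log.last().map_or(0, |e| e.hash);
    let hash = link_hash(prev_hash, payload);
    log.push(Entry { payload: payload.to_string(), prev_hash, hash });
}

/// Recompute every link and return the index of the first broken entry.
pub fn first_broken(log: &[Entry]) -> Option<usize> {
    let mut expected_prev = 0;
    for (i, e) in log.iter().enumerate() {
        if e.prev_hash != expected_prev || e.hash != link_hash(e.prev_hash, &e.payload) {
            return Some(i);
        }
        expected_prev = e.hash;
    }
    None
}
```

Rewriting any payload changes its recomputed hash, so `first_broken` reports that index. The chain alone does not stop someone from recomputing every hash after an edit; that is the job of the per-process signature the real log adds on top.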
See docs/architecture.md for the full surface, docs/components.md for the module map, and docs/internals.md for deep technical details (hooks, sandbox, signed log, ACP).
5-Minute Quickstart
The shortest path from clone to a verifier-gated job with a custom agent.
1. Boot the daemon
```sh
make install-hero   # one-time: register Hero service
make run-hero       # start the daemon
make status-hero    # confirm it is running
```
The daemon listens on a Unix socket and mints a bearer token at
$XDG_CONFIG_HOME/hero_shrimp/rpc.token (mode 0600). The shipped CLI,
SDK, and Web UI read it automatically.
2. Ask a quick question
```sh
hero_shrimp ask "what is the difference between ask and do?"
```
Every session writes a JSONL wire log of tool calls and results. Resolve its path or tail it via RPC (see step 6).
3. Add a custom agent
Drop a YAML profile in ~/.shrimp/agents/ (or $WORKSPACE/.shrimp/agents/
for project-scoped overrides). A worked example ships at
docs/examples/agents/code-reviewer.yml:
```sh
mkdir -p ~/.shrimp/agents
cp docs/examples/agents/code-reviewer.yml ~/.shrimp/agents/
```
List loaded agents and inspect rendered prompts:
```sh
hero_shrimp agents list
hero_shrimp agents show code-reviewer
```
Activate the agent for the current chat session by sending this chat message:
```text
/agent code-reviewer
```
The pin persists for that channel/user until you send `/agent clear`. While the pin is active, delegate_task calls and message turns honor the agent's tool
allowlist, sandbox floor, and approval mode.
4. Start a verifier-gated job
```sh
hero_shrimp do "add validation for the config loader"
hero_shrimp jobs
```
Each job runs under an isolated workspace at jobs/<job_id>/. Job
status carries a top-level verifier_status field —
passed / failed / missing / running / not_required — so the
contract verifier moat is visible to UIs and scripts, not buried in
nested phase output.
```sh
hero_shrimp jobs --json | jq '.[].verifier_status'
```
A job will not be reported completed until verification passes. There
is no "Completed (unverified)" status.
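A script that consumes `hero_shrimp jobs --json` might model the field as a closed enum so that an unexpected value fails loudly instead of being treated as success. This is a hedged sketch: the five string values come from this README, but the `VerifierStatus` enum, `counts_as_completed`, and the choice to treat `not_required` as completable are illustrative assumptions, not types from `hero_shrimp_sdk`.

```rust
use std::str::FromStr;

/// The five documented `verifier_status` values.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum VerifierStatus {
    Passed,
    Failed,
    Missing,
    Running,
    NotRequired,
}

impl FromStr for VerifierStatus {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "passed" => Ok(Self::Passed),
            "failed" => Ok(Self::Failed),
            "missing" => Ok(Self::Missing),
            "running" => Ok(Self::Running),
            "not_required" => Ok(Self::NotRequired),
            // Reject anything else rather than guessing.
            other => Err(format!("unknown verifier_status: {other}")),
        }
    }
}

/// Hypothetical client-side gate: treat a job as done only when the verifier
/// passed, or (an assumption) when no verifier was required at all.
pub fn counts_as_completed(status: VerifierStatus) -> bool {
    matches!(status, VerifierStatus::Passed | VerifierStatus::NotRequired)
}
```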
5. Plug in MCP servers
Hero Shrimp speaks MCP. Add servers under mcp.servers in
shrimp.yml (stdio or socket transport). Inspect what's loaded:
```sh
hero_shrimp mcp list
hero_shrimp tools list   # MCP-sourced tools show source: "mcp"
```
The daemon performs an OSV malware preflight before spawning any
npx or uvx stdio server.
6. Replay or audit a session
Every session writes a JSONL wire log of tool calls and results. Tail
it with standard tools, or fetch the last N events via wire.tail
(RPC). The tail reader seeks from the end in 64KB chunks, so it stays
O(limit) regardless of file size.
7. Swap providers
The default config routes through OpenRouter. To use Anthropic, OpenAI,
Ollama, or Mistral directly, copy a provider overlay from
docs/examples/providers/:
```sh
cp docs/examples/providers/anthropic.yml ./shrimp.yml
export ANTHROPIC_API_KEY=sk-ant-...
make run-hero
```
See docs/examples/providers/README.md for the full matrix.
8. Benchmark
Run a JSONL fixture and emit per-task pass/fail plus an aggregate summary:
```sh
hero_shrimp eval docs/examples/eval/smoke.jsonl --judge engine
```
Judge modes: `engine` (the contract verifier decides), `external` (your shell
verifier must exit 0), or `both`. Bring your own fixtures;
docs/examples/eval/README.md documents the format.
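One way to picture how the three judge modes turn into per-task pass/fail and an aggregate summary is the sketch below. The `JudgeMode` enum, `task_passes`, and `summarize` are hypothetical names, and treating `both` as a logical AND of the two judges is an assumption this README does not spell out.

```rust
/// The three documented judge modes for `hero_shrimp eval`.
#[derive(Debug, Clone, Copy)]
pub enum JudgeMode {
    Engine,
    External,
    Both,
}

/// Hypothetical per-task verdict. `engine_passed` stands for the contract
/// verifier's outcome; `external_exit` is your shell verifier's exit code,
/// or None if it did not run.
pub fn task_passes(mode: JudgeMode, engine_passed: bool, external_exit: Option<i32>) -> bool {
    let external_passed = external_exit == Some(0); // no run counts as no pass
    match mode {
        JudgeMode::Engine => engine_passed,
        JudgeMode::External => external_passed,
        // Assumption: `both` requires both judges to report a pass.
        JudgeMode::Both => engine_passed && external_passed,
    }
}

/// Aggregate summary as (passed, total).
pub fn summarize(verdicts: &[bool]) -> (usize, usize) {
    (verdicts.iter().filter(|&&v| v).count(), verdicts.len())
}
```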
What just happened
That walk-through exercised the load-bearing moats:
- Bubblewrap-sandboxed shell: every `shell_run` invocation runs under `bwrap` with PID/UTS/IPC namespaces and an allowlisted `/etc` surface (DNS, certs, passwd, localtime; not the whole directory).
- Custom agent profiles: YAML-driven personas with per-agent tool allowlists, model overrides, and `${SHRIMP_*}` template vars, selectable per chat with `/agent`.
- Contract verifier: proof-backed jobs that surface `verifier_status` as a first-class field, not a derived afterthought.
- MCP support: stdio + HTTP-broker transports with OSV preflight, so external Model Context Protocol servers (GitHub, Postgres, filesystem, Linear, etc.) plug in without engine changes.
- Provider matrix: OpenAI-compatible and Anthropic-compatible protocols, per-phase routing, and ready-to-use overlays in docs/examples/providers/.
- Wire log: every session emits a replayable JSONL audit trail.
- Checkpoint store: file-level revert backed by a FIFO-capped store, so memory cost stays bounded regardless of session length.
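The bounded-memory property of the checkpoint store comes from its eviction policy, which can be sketched with a `VecDeque`. The `Checkpoint` and `CheckpointStore` types and their methods are hypothetical; the sketch shows only a count-based FIFO cap, not how Hero Shrimp actually stores snapshots.

```rust
use std::collections::VecDeque;

/// Hypothetical snapshot: a file path and its contents before an edit.
#[derive(Debug, Clone)]
pub struct Checkpoint {
    pub path: String,
    pub contents: Vec<u8>,
}

/// FIFO-capped store: once `cap` snapshots are held, recording a new one
/// evicts the oldest, so memory stays bounded however long a session runs.
pub struct CheckpointStore {
    cap: usize,
    entries: VecDeque<Checkpoint>,
}

impl CheckpointStore {
    pub fn new(cap: usize) -> Self {
        Self { cap, entries: VecDeque::with_capacity(cap) }
    }

    pub fn record(&mut self, path: &str, contents: &[u8]) {
        if self.cap == 0 {
            return; // a zero-capacity store retains nothing
        }
        if self.entries.len() == self.cap {
            self.entries.pop_front(); // evict the oldest snapshot
        }
        self.entries.push_back(Checkpoint { path: path.to_string(), contents: contents.to_vec() });
    }

    /// Remove and return the newest snapshot for `path`, if still retained.
    pub fn revert(&mut self, path: &str) -> Option<Vec<u8>> {
        let idx = self.entries.iter().rposition(|c| c.path == path)?;
        self.entries.remove(idx).map(|c| c.contents)
    }

    pub fn len(&self) -> usize {
        self.entries.len()
    }
}
```

The trade-off is explicit: a very old edit can fall out of the store and become unrevertable, in exchange for a hard ceiling on memory.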
Workspace Layout
```text
crates/hero_shrimp_types      shared domain and wire types
crates/hero_shrimp_store      config, SQLite storage, memory, queue, events
crates/hero_shrimp_engine     agent loop, tools, jobs, admin routes, skills
crates/hero_shrimp_server     daemon process and JSON-RPC server
crates/hero_shrimp            manager and thin CLI
crates/hero_shrimp_sdk        typed SDK and OpenRPC client helpers
crates/hero_shrimp_web        web UI service
crates/hero_shrimp_examples   integration examples
examples/                     plugin examples
scripts/                      Hero service scripts and build helpers
skills/                       embedded markdown skill bundles
docs/                         current architecture and concept docs
```
Build And Check
```sh
make check
make test
make lint
make build
```
Useful focused checks while editing:
```sh
cargo check --workspace --all-targets
cargo test -p hero_shrimp_server --lib
cargo test -p hero_shrimp_engine --test admin_endpoints
cargo test -p hero_shrimp_store work_contexts
```
Start The Service
For Hero-managed local service lifecycle:
```sh
make install-hero
make run-hero
make status-hero
make stop-hero
```
For a local development server with debug logging:
```sh
make dev
```
The manager binary is hero_shrimp. The server binary is hero_shrimp_server. The Web UI binary is hero_shrimp_web.
CLI Usage
Ask a quick question:
```sh
hero_shrimp ask "summarize the current project state"
```
Delegate a proof-backed background task:
```sh
hero_shrimp do "add validation for the config loader"
```
Remember a fact or preference:
```sh
hero_shrimp remember "status_style=concise with proof for file changes"
```
Schedule recurring work:
```sh
hero_shrimp schedule --cadence "daily 9am" "summarize active jobs and anything needing attention"
```
Check current work:
```sh
hero_shrimp jobs
hero_shrimp inbox
hero_shrimp inbox --json
```
inbox summarizes jobs needing attention, recurring schedules, and recent active memories so you do not have to inspect each surface separately.
Target a specific workspace for delegated file work:
```sh
hero_shrimp do --workdir /path/to/repo "fix the failing parser tests"
```
When no explicit workspace is supplied, delegated jobs use an isolated workspace under Shrimp's configured workspace directory, normally jobs/<job_id>/.
Job Artifacts
Each job has durable state in SQLite and human-readable proof artifacts on disk.
Common files and directories:
```text
jobs/<job_id>/                                      isolated job workspace
.agent/job_artifacts/<job_id>/                      proof and execution-control bundle
.agent/job_artifacts/<job_id>/job.json              job metadata snapshot
.agent/job_artifacts/<job_id>/job_contract.json
.agent/job_artifacts/<job_id>/job_journal.jsonl
.agent/job_artifacts/<job_id>/file_claims.json
.agent/job_artifacts/<job_id>/file_conflicts.json
.agent/job_artifacts/<job_id>/file_checkpoints/
```
The database is authoritative for queryable state. Artifact files are the operator-facing proof surface.
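Tooling that reads the proof surface can derive every path in the layout above from a job id. The `JobArtifacts` helper below is hypothetical, and so is the `root` parameter: it assumes you pass the directory that contains `.agent/`, which depends on how your workspace is configured.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical helper mirroring the artifact layout listed above.
pub struct JobArtifacts {
    dir: PathBuf,
}

impl JobArtifacts {
    /// `root` is assumed to be the directory that contains `.agent/`.
    pub fn new(root: &Path, job_id: &str) -> Self {
        Self { dir: root.join(".agent").join("job_artifacts").join(job_id) }
    }

    pub fn job_json(&self) -> PathBuf { self.dir.join("job.json") }
    pub fn contract(&self) -> PathBuf { self.dir.join("job_contract.json") }
    pub fn journal(&self) -> PathBuf { self.dir.join("job_journal.jsonl") }
    pub fn file_claims(&self) -> PathBuf { self.dir.join("file_claims.json") }
    pub fn file_conflicts(&self) -> PathBuf { self.dir.join("file_conflicts.json") }
    pub fn checkpoints_dir(&self) -> PathBuf { self.dir.join("file_checkpoints") }
}
```

For anything you need to query (status, phases, verifier outcome), go through the database or RPC instead; these files are the human-readable proof, not the source of truth.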
RPC And UI
The daemon exposes newline-delimited JSON-RPC over a Unix socket. Public job methods use the job.* namespace. Introspection methods use job-first names such as intro.recent_jobs.
The same socket speaks HTTP/1.1 for legacy SDK clients (POST /rpc,
GET /openrpc.json, GET /health) and a Server-Sent Events feed for live
chat streaming (GET /events/chat?session_id=<id>).
JSON-RPC requests may be sent singly (one object per NDJSON line or
POST body) or batched as a JSON array; the server returns an array of
responses in matching order, with notifications (no id) silently
consumed per the JSON-RPC 2.0 spec.
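The batch rule above (run everything in order, answer only requests that carry an id) can be sketched on already-parsed requests. Everything here is a simplification under stated assumptions: `Request`, `Response`, and `handle_batch` are hypothetical, ids are narrowed to `u64` although JSON-RPC also allows strings, JSON parsing is left out, and the spec's special cases for an empty array and for malformed elements are not handled.

```rust
/// Hypothetical, already-parsed JSON-RPC request; `id == None` marks a
/// notification. Real code would parse JSON with a library such as serde_json.
#[derive(Debug, Clone)]
pub struct Request {
    pub id: Option<u64>,
    pub method: String,
}

#[derive(Debug, Clone, PartialEq)]
pub struct Response {
    pub id: u64,
    pub result: String,
}

/// Run every request in order and emit responses only for requests with an
/// id, keeping their relative order. Notifications still run (for their side
/// effects) but produce no response.
pub fn handle_batch<F>(batch: &[Request], mut dispatch: F) -> Option<Vec<Response>>
where
    F: FnMut(&str) -> String,
{
    let mut responses = Vec::new();
    for req in batch {
        let result = dispatch(req.method.as_str());
        if let Some(id) = req.id {
            responses.push(Response { id, result });
        }
    }
    // JSON-RPC 2.0: a batch of only notifications gets no reply at all.
    if responses.is_empty() { None } else { Some(responses) }
}
```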
Authentication
The daemon mints a bearer token at boot, stored mode 0600 at
$XDG_CONFIG_HOME/hero_shrimp/rpc.token. Two enforcement modes:
| Mode | When | Behavior |
|---|---|---|
| Open (default) | `SHRIMP_REQUIRE_AUTH` unset | Token exists but is not required; any UDS-local process is trusted. Existing CLIs work unchanged. |
| Enforced | `SHRIMP_REQUIRE_AUTH=1` | Every non-public method requires `Authorization: Bearer <token>` (HTTP) or `auth: "<token>"` (JSON-RPC top-level). `rpc.health` and `rpc.discover` stay public so liveness probes still work. |
The shipped CLI, SDK (hero_shrimp_sdk), and web client read the token
file automatically — flipping SHRIMP_REQUIRE_AUTH=1 requires no client
change.
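The decision the table describes fits in one function. This sketch is hypothetical (the `is_authorized` and `bearer_token` names are not the daemon's API); it takes the token already extracted from either the HTTP header or the JSON-RPC `auth` field, and it compares tokens without an early exit on the first mismatching byte, a common precaution for secret comparison.

```rust
/// Methods that stay reachable without a token, per the table above.
const PUBLIC_METHODS: &[&str] = &["rpc.health", "rpc.discover"];

/// Equal-length inputs are compared without early exit, so timing does not
/// reveal how many leading bytes of a guess were right. (Length still leaks.)
fn constant_time_eq(a: &[u8], b: &[u8]) -> bool {
    if a.len() != b.len() {
        return false;
    }
    a.iter().zip(b).fold(0u8, |acc, (x, y)| acc | (x ^ y)) == 0
}

/// Hypothetical per-request check. `require_auth` mirrors
/// SHRIMP_REQUIRE_AUTH=1; `presented` is the token from either transport.
pub fn is_authorized(require_auth: bool, method: &str, presented: Option<&str>, expected: &str) -> bool {
    if !require_auth || PUBLIC_METHODS.contains(&method) {
        return true; // open mode trusts any UDS-local caller
    }
    match presented {
        Some(token) => constant_time_eq(token.as_bytes(), expected.as_bytes()),
        None => false,
    }
}

/// Extract the token from an HTTP `Authorization` header value.
pub fn bearer_token(header: &str) -> Option<&str> {
    header.strip_prefix("Bearer ")
}
```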
Tool-approval tiers
message.send accepts an optional approval_mode parameter
(plan | default | auto_edit | yolo) matching qwen-code's tiered
flow. When omitted, the engine derives the tier from sandbox_mode +
yolo. Invalid values are rejected with -32602 Invalid Params (no
silent downgrade).
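The "no silent downgrade" rule amounts to a strict parser: a missing value and an invalid value are different outcomes. In the sketch below, `ApprovalMode` and `parse_approval_mode` are hypothetical names, matching is assumed to be case-sensitive, and the derivation from `sandbox_mode` + `yolo` for an omitted value is deliberately not modeled because this README does not specify its mapping.

```rust
/// The four tiers `message.send` accepts for `approval_mode`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ApprovalMode {
    Plan,
    Default,
    AutoEdit,
    Yolo,
}

/// JSON-RPC 2.0 "Invalid params" error code.
pub const INVALID_PARAMS: i64 = -32602;

/// `Ok(None)` means "omitted": the engine then derives the tier itself (not
/// modeled here). Any unknown string is an error, never a fallback tier.
pub fn parse_approval_mode(raw: Option<&str>) -> Result<Option<ApprovalMode>, (i64, String)> {
    let mode = match raw {
        None => return Ok(None),
        Some("plan") => ApprovalMode::Plan,
        Some("default") => ApprovalMode::Default,
        Some("auto_edit") => ApprovalMode::AutoEdit,
        Some("yolo") => ApprovalMode::Yolo,
        Some(other) => {
            return Err((INVALID_PARAMS, format!("Invalid params: unknown approval_mode {other:?}")));
        }
    };
    Ok(Some(mode))
}
```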
Filesystem RPC sandbox
fs.read_file, fs.list_directory, fs.path_info, fs.git_diff, and
editor.open always refuse a hardcoded set of sensitive paths
(~/.ssh, ~/.aws/credentials, /etc/shadow, /proc/*/environ, ...).
Set SHRIMP_FS_RESTRICT_TO_WORKSPACE=1 to further restrict reads to
configured workspace roots — recommended for multi-tenant or
network-exposed deployments.
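The two layers (an always-on denylist, then an optional workspace allowlist) can be sketched as a path policy. `FsPolicy` is hypothetical, its denylist is only the subset of paths this README names (the daemon's hardcoded list is longer), and it assumes callers pass absolute, already-canonicalized paths; on raw input, symlinks or `..` segments would slip past a component-wise `starts_with` check.

```rust
use std::ffi::OsStr;
use std::path::{Component, Path, PathBuf};

/// Hypothetical policy mirroring the behavior described above.
pub struct FsPolicy {
    pub home: PathBuf,
    /// Some(roots) when SHRIMP_FS_RESTRICT_TO_WORKSPACE=1.
    pub restrict_to: Option<Vec<PathBuf>>,
}

impl FsPolicy {
    fn is_denied(&self, p: &Path) -> bool {
        let denied = [
            self.home.join(".ssh"),
            self.home.join(".aws/credentials"),
            PathBuf::from("/etc/shadow"),
        ];
        if denied.iter().any(|d| p.starts_with(d)) {
            return true;
        }
        // Matches /proc/<anything>/environ: [RootDir, "proc", _, "environ"].
        let parts: Vec<Component<'_>> = p.components().collect();
        parts.len() == 4
            && parts[1] == Component::Normal(OsStr::new("proc"))
            && parts[3] == Component::Normal(OsStr::new("environ"))
    }

    /// `p` must already be absolute and canonicalized.
    pub fn allows(&self, p: &Path) -> bool {
        if self.is_denied(p) {
            return false; // the denylist applies in every mode
        }
        match &self.restrict_to {
            Some(roots) => roots.iter().any(|r| p.starts_with(r)),
            None => true,
        }
    }
}
```

Because `Path::starts_with` compares whole components, `/srv/ws2` is not treated as inside a `/srv/ws` workspace root.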
The operator UI shows:
- Inbox, conversations, and session history.
- Live jobs and recent jobs.
- Job proof bundles, files, audit rows, and phase state.
- Memory, schedules, playbooks, dreams, skills, settings, and diagnostics.
More Docs
- docs/getting-started.md: 5-minute tutorial (start the daemon, use the CLI, open the web UI).
- docs/dataflow.md: high-level dataflow diagrams (subagents, MCP, custom tools, personality, the proof gate).
- docs/concepts.md: domain vocabulary and naming rules.
- docs/architecture.md: runtime architecture and data flow.
- docs/components.md: per-crate and per-module inventory.
- docs/internals.md: deep technical reference covering Plan → Verify → Proof, hook taxonomy, sandbox backends, signed wire log, USD caps, memory layers, MCP, and ACP.
- BRUTAL_REVIEW.md: adversarial head-to-head comparison against kimi-cli, qwen-code, picoclaw, and hermes-agent.
- COMPETITOR_AUDIT_*.md: per-competitor source-level audits.
Naming Rule
Use assistant-first language for normal user flows. Use job for technical execution records, API/status fields, proof artifacts, and storage internals.