  • Rust 77.7%
  • Python 7.7%
  • JavaScript 7.3%
  • TeX 4%
  • CSS 1.3%
  • Other 2%
xmonader 8ceeb235ee
Some checks failed
Verify / verify (push) Failing after 8s
Build Linux / build-linux (push) Failing after 5m16s
fix(judge): unstick "create X and verify" — three-cut bundle
regression: "create hello.py and verify it works" was hanging
3+ minutes and ultimately marked failed even though hello.py worked.
Three cooperating bugs:

1. Default-verifier inference too loose (judge contract). The words
   "verify"/"verification"/"build" anywhere in the prompt would
   auto-install a required pytest verifier. Hello-world has no tests;
   pytest exited 5 (no tests collected); judge marked failed. Drop
   those three trigger words — keep only "test"/"tests"/"tested" and
   the explicit-framework path. Adds 2 regression tests.

2. Hard intra-phase verification gate (executor). complete_phase was
   refused while pending_verifications > 0, expecting a test/build to
   clear debt first. The `python3 hello.py` -> mutated-file path is
   supposed to clear it, but with reused workspaces (leftover files
   from prior runs) the mutated_paths set ends up with mismatched
   entries and the heuristic misses. Agent then burns minutes on
   consult_council retries. Soften: emit a trace event, allow the
   call. The contract-level verifier still gates explicit-test tasks.

3. Phantom `running` autonomy_jobs after unclean shutdown. On clean
   shutdown the daemon already marks running rows as paused. On crash
   /kill -9 that path skips, leaving rows that the autonomy reconciler
   re-touches each tick — symptom we hit in this session was the
   whole RPC layer wedging until the rows were manually edited out
   of sqlite. Mirror the shutdown sweep at startup whenever
   previous_run_was_unclean.

Smoke: "create hello.py and verify it works" now finishes in 24s with
status=completed, verifier_status=passed. Prior run was timing out at
180s with the agent still bouncing between complete_phase refusals.
2026-05-18 21:49:46 +02:00
.claude/worktrees chore: gitignore .claude/scheduled_tasks.lock (session-local) 2026-05-16 18:42:44 +02:00
.forgejo/workflows production: supply-chain audit in CI (cargo-deny + cargo-audit) 2026-05-16 18:49:43 +02:00
.githooks config: drop legacy un-prefixed env aliases (BREAKING) 2026-04-26 18:46:09 +03:00
.github/workflows production: supply-chain audit in CI (cargo-deny + cargo-audit) 2026-05-16 18:49:43 +02:00
benchmarks clean up 2026-05-06 16:18:17 +02:00
crates fix(judge): unstick "create X and verify" — three-cut bundle 2026-05-18 21:49:46 +02:00
deploy carving into more maintainable modules 2026-05-15 23:32:12 +02:00
docs feat(tools): YAML-declared user-defined exec tools 2026-05-18 11:49:31 +02:00
examples harden: trust-first defaults, wire-log verifier, glue consolidation, legacy purge 2026-05-17 13:37:53 +02:00
scripts feat(visibility): intro.active_agents RPC + shrimp who CLI + per-tool agent log 2026-05-18 14:32:50 +02:00
skills streamline: boot path + skill catalog warnings 2026-05-18 08:01:19 +02:00
.env.example arch: approval tiers, async tool handlers, retry budget, JSON-RPC batch 2026-05-16 09:03:42 +02:00
.gitignore fix(judge): unstick "create X and verify" — three-cut bundle 2026-05-18 21:49:46 +02:00
buildenv.sh update _ui crate to be _web create 2026-05-07 21:20:20 +02:00
Cargo.lock chore(engine): drop unused calamine + pdf-extract dependencies 2026-05-18 08:23:03 +02:00
Cargo.toml refactor: rename hero_shrimp_store → hero_shrimp_runtime (Move A) 2026-05-16 22:07:36 +02:00
CONTRIBUTING.md clean up 2026-05-06 16:18:17 +02:00
deny.toml production: supply-chain audit in CI (cargo-deny + cargo-audit) 2026-05-16 18:49:43 +02:00
Dockerfile carving into more maintainable modules 2026-05-15 23:32:12 +02:00
Makefile scripts: collapse to {shrimp, demo_showcase.sh}; absorb smoke into showcase 2026-05-18 13:34:49 +02:00
README.md docs: add 5-minute getting-started tutorial 2026-05-18 11:38:01 +02:00
shrimp.yml config: six-model OpenRouter catalog, properly tiered 2026-05-18 16:01:29 +02:00
shrimp.yml.example fix(security): harden runtime safety boundaries 2026-04-25 10:28:02 +03:00

Hero Shrimp

Hero Shrimp is a local Hero service for assistant-driven work. It combines quick answers, delegated background tasks, memory, schedules, tool execution, audit logging, skills, and proof review into one Rust workspace.

The everyday user surface is assistant-first: ask, do, remember, schedule, jobs, inbox, workspace, and proof. Internally, durable background work is still represented as a job with job.*, job_id, artifact_job_id, and .agent/job_artifacts/ proof files.

What It Does

  • Accepts assistant requests from the CLI, UI, SDK, and channel adapters.
  • Answers quick questions directly and turns delegated tasks into scoped, proof-backed background jobs.
  • Tracks delegated work through plans, phases, tool calls, verifier evidence, proof bundles, and postludes.
  • Persists conversations, jobs, tasks, subagents, memories, playbooks, audit rows, and work scopes in SQLite.
  • Provides an operator UI for chat, live jobs, proof review, memory management, schedules, and diagnostics.
  • Supports Hero-managed service lifecycle through the hero_shrimp manager binary.
  • Speaks the Agent Client Protocol so Zed (and other ACP-aware editors) can drive the agent natively.

What's Unique

No other coding-plus-personal agent ships all of the following in one piece:

  • Plan → Verify → Proof — every claim of phase completion is backed by a contract verifier that must exit zero. No "completed (unverified)" labels exist anywhere in the system.
  • Ed25519-signed wire log — every event is appended to a hash-chained JSONL with a per-process signing key. A third party given only the log + the public key can attest the file is unmodified since the daemon wrote it. Keys rotate via hero_shrimp wire-log rotate-keys.
  • Persistent USD spending cap — daily + monthly + per-provider caps survive pkill, hydrated from SQLite on boot. A runaway daemon can't outspend its budget.
  • Verifier-signed RL trajectories — every Plan→Verify→Proof cycle emits a structured row to $SHRIMP_TRAJECTORY_DIR/<day>.jsonl with verifier_passed: bool from the contract verifier (not self-assessment). hero_shrimp trajectory export --format atropos bundles them as a training dataset where reward = verifier outcome.
  • Persistent user model — Honcho-style dialectic state (preferences, working style, current goals) survives across sessions. LLM-callable via user_model_remember / user_model_recall; an optional background worker (SHRIMP_USER_MODEL_WORKER=1) proposes updates from recent dialogue with a confidence floor.
  • Drift-gated self-improving skills — markdown skills can be auto-revised by an LLM proposer (SHRIMP_SKILL_EVOLUTION=1); every candidate edit passes through skill_drift::detect_drift before touching disk, so a confused revision can't silently break references to missing tools.
  • Verifier flake- and repair-retries: SHRIMP_VERIFIER_RETRY=1 re-runs the verifier once on transient failures (timeouts, port races, connection refused); SHRIMP_VERIFIER_REPAIR=1 asks the LLM for a unified-diff fix on real failures, applies it via git apply, and re-verifies. Repair is bounded to one attempt, and the audit row shows exactly which diff landed.
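
The hash-chained log described in the list above can be illustrated with a minimal sketch. It is not Hero Shrimp's actual record format: the field names (prev, event, hash), the all-zero genesis value, and the choice of SHA-256 are assumptions made for illustration, and the Ed25519 signing step is left out because Python's standard library has no Ed25519 implementation.

```python
import hashlib
import json

GENESIS = "0" * 64  # hypothetical starting value for the chain


def _digest(prev_hash: str, event: dict) -> str:
    # Hash the previous link together with a canonical encoding of the event.
    payload = prev_hash + json.dumps(event, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_event(lines: list, event: dict) -> None:
    """Append one JSONL record that commits to the previous record's hash."""
    prev_hash = json.loads(lines[-1])["hash"] if lines else GENESIS
    record = {"prev": prev_hash, "event": event, "hash": _digest(prev_hash, event)}
    lines.append(json.dumps(record, sort_keys=True))


def verify_chain(lines: list) -> bool:
    """Recompute every link; any edited or reordered record breaks the chain."""
    prev_hash = GENESIS
    for line in lines:
        record = json.loads(line)
        if record["prev"] != prev_hash:
            return False
        if record["hash"] != _digest(prev_hash, record["event"]):
            return False
        prev_hash = record["hash"]
    return True
```

In a signed variant, each record (or the final hash) would additionally carry a signature that a third party checks against the public key. A bare hash chain on its own detects edits and reordering only when the verifier already trusts the last hash.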

See docs/architecture.md for the full surface, docs/components.md for the module map, and docs/internals.md for deep technical details (hooks, sandbox, signed log, ACP).

5-Minute Quickstart

The shortest path from clone to a verifier-gated job with a custom agent.

1. Boot the daemon

make install-hero          # one-time: register Hero service
make run-hero              # start the daemon
make status-hero           # confirm it is running

The daemon listens on a Unix socket and mints a bearer token at $XDG_CONFIG_HOME/hero_shrimp/rpc.token (mode 0600). The shipped CLI, SDK, and Web UI read it automatically.

2. Ask a quick question

hero_shrimp ask "what is the difference between ask and do?"

Every session writes a JSONL wire log of tool calls and results. Resolve its path or tail it via RPC (see step 6).

3. Add a custom agent

Drop a YAML profile in ~/.shrimp/agents/ (or $WORKSPACE/.shrimp/agents/ for project-scoped overrides). A worked example ships at docs/examples/agents/code-reviewer.yml:

mkdir -p ~/.shrimp/agents
cp docs/examples/agents/code-reviewer.yml ~/.shrimp/agents/

List loaded agents and inspect rendered prompts:

hero_shrimp agents list
hero_shrimp agents show code-reviewer

Activate the agent for the current chat session by sending this as a chat message:

/agent code-reviewer

The pin persists for that channel/user until you send /agent clear. From then on, delegate_task calls and message turns honor the agent's tool allowlist, sandbox floor, and approval mode.

4. Start a verifier-gated job

hero_shrimp do "add validation for the config loader"
hero_shrimp jobs

Each job runs under an isolated workspace at jobs/<job_id>/. Job status carries a top-level verifier_status field — passed / failed / missing / running / not_required — so the contract verifier moat is visible to UIs and scripts, not buried in nested phase output.

hero_shrimp jobs --json | jq '.[].verifier_status'

A job will not be reported completed until verification passes. There is no "Completed (unverified)" status.
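
For scripts that want more than the jq one-liner, the same field can be summarized in Python. This is a sketch under assumptions: it takes the output of hero_shrimp jobs --json to be a JSON array of job objects (which the jq expression above suggests), and treating failed and missing as the statuses that need attention is an illustrative choice, not documented behavior.

```python
import json
from collections import Counter


def verifier_summary(jobs_json: str) -> Counter:
    """Count jobs per verifier_status value."""
    jobs = json.loads(jobs_json)
    # "<absent>" marks objects that carry no verifier_status field at all.
    return Counter(job.get("verifier_status", "<absent>") for job in jobs)


def jobs_needing_attention(jobs_json: str) -> list:
    """Return jobs whose verifier failed or never produced a result."""
    jobs = json.loads(jobs_json)
    return [job for job in jobs if job.get("verifier_status") in ("failed", "missing")]
```

Both functions take the raw JSON text, so they can be fed from a captured subprocess call or from a saved file.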

5. Plug in MCP servers

Hero Shrimp speaks MCP. Add servers under mcp.servers in shrimp.yml (stdio or socket transport). Inspect what's loaded:

hero_shrimp mcp list
hero_shrimp tools list   # MCP-sourced tools show source: "mcp"

The daemon performs an OSV malware preflight before spawning any npx or uvx stdio server.

6. Replay or audit a session

Every session writes a JSONL wire log of tool calls and results. Tail it with standard tools, or fetch the last N events via wire.tail (RPC). The tail reader seeks from the end in 64KB chunks, so it stays O(limit) regardless of file size.
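
The seek-from-the-end technique can be sketched as follows. This is an illustration of the general approach, not the daemon's reader: the function name and signature are hypothetical, and it assumes the log has no blank lines and that every record ends with a newline.

```python
import os

CHUNK_SIZE = 64 * 1024  # the 64KB chunk size mentioned above


def tail_lines(path: str, limit: int, chunk_size: int = CHUNK_SIZE) -> list:
    """Return the last `limit` non-empty lines, reading backwards from EOF."""
    if limit <= 0:
        return []
    with open(path, "rb") as f:
        f.seek(0, os.SEEK_END)
        pos = f.tell()
        buf = b""
        # limit + 1 newlines guarantee the oldest wanted line is complete.
        while pos > 0 and buf.count(b"\n") <= limit:
            step = min(chunk_size, pos)
            pos -= step
            f.seek(pos)
            buf = f.read(step) + buf
    # Split on bytes first so multi-byte UTF-8 sequences are never cut.
    lines = [ln for ln in buf.split(b"\n") if ln]
    return [ln.decode("utf-8") for ln in lines[-limit:]]
```

The amount of data read depends on the size of the requested lines rather than on the size of the file, which is why a tail over a very large log stays cheap.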

7. Swap providers

The default config routes through OpenRouter. To use Anthropic, OpenAI, Ollama, or Mistral directly, copy a provider overlay from docs/examples/providers/:

cp docs/examples/providers/anthropic.yml ./shrimp.yml
export ANTHROPIC_API_KEY=sk-ant-...
make run-hero

See docs/examples/providers/README.md for the full matrix.

8. Benchmark

Run a JSONL fixture and emit per-task pass/fail plus an aggregate summary:

hero_shrimp eval docs/examples/eval/smoke.jsonl --judge engine

Judge modes: engine (contract verifier), external (your shell verifier exits 0), or both. Bring your own fixtures — docs/examples/eval/README.md documents the format.

What just happened

That walk-through exercised the load-bearing moats:

  • Bubblewrap-sandboxed shell — every shell_run invocation runs under bwrap with PID/UTS/IPC namespaces and an allowlisted /etc surface (DNS, certs, passwd, localtime — not the whole directory).
  • Custom agent profiles — YAML-driven personas with per-agent tool allowlists, model overrides, and ${SHRIMP_*} template vars, selectable per chat with /agent.
  • Contract verifier — proof-backed jobs that surface verifier_status as a first-class field, not a derived afterthought.
  • MCP support — stdio + HTTP-broker transports with OSV preflight, so external Model Context Protocol servers (GitHub, Postgres, filesystem, Linear, etc.) plug in without engine changes.
  • Provider matrix — OpenAI-compatible and Anthropic-compatible protocols, per-phase routing, ready-to-use overlays in docs/examples/providers/.
  • Wire log — every session emits a replayable JSONL audit trail.
  • Checkpoint store — file-level revert backed by a FIFO-capped store so memory cost stays bounded regardless of session length.
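
As an illustration of the FIFO cap in the last bullet, here is a minimal sketch. The class and method names are hypothetical, and capping by snapshot count is an assumption; the actual store may cap by bytes or use a different eviction unit.

```python
from collections import OrderedDict
from typing import Optional, Tuple


class CheckpointStore:
    """Keeps at most `capacity` file snapshots, evicting the oldest first."""

    def __init__(self, capacity: int) -> None:
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self._snapshots = OrderedDict()  # checkpoint id -> (path, content)
        self._next_id = 0

    def save(self, path: str, content: bytes) -> int:
        """Record the pre-edit content of a file and return a checkpoint id."""
        checkpoint_id = self._next_id
        self._next_id += 1
        self._snapshots[checkpoint_id] = (path, content)
        while len(self._snapshots) > self.capacity:
            self._snapshots.popitem(last=False)  # drop the oldest snapshot
        return checkpoint_id

    def restore(self, checkpoint_id: int) -> Optional[Tuple[str, bytes]]:
        """Return (path, content), or None if the checkpoint was evicted."""
        return self._snapshots.get(checkpoint_id)
```

The trade-off of a FIFO cap is that very old checkpoints become unrecoverable, in exchange for memory use that does not grow with session length.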

Workspace Layout

crates/hero_shrimp_types        shared domain and wire types
crates/hero_shrimp_store        config, SQLite storage, memory, queue, events
crates/hero_shrimp_engine       agent loop, tools, jobs, admin routes, skills
crates/hero_shrimp_server       daemon process and JSON-RPC server
crates/hero_shrimp              manager and thin CLI
crates/hero_shrimp_sdk          typed SDK and OpenRPC client helpers
crates/hero_shrimp_web          web UI service
crates/hero_shrimp_examples     integration examples
examples/                       plugin examples
scripts/                        Hero service scripts and build helpers
skills/                         embedded markdown skill bundles
docs/                           current architecture and concept docs

Build And Check

make check
make test
make lint
make build

Useful focused checks while editing:

cargo check --workspace --all-targets
cargo test -p hero_shrimp_server --lib
cargo test -p hero_shrimp_engine --test admin_endpoints
cargo test -p hero_shrimp_store work_contexts

Start The Service

For Hero-managed local service lifecycle:

make install-hero
make run-hero
make status-hero
make stop-hero

For a local development server with debug logging:

make dev

The manager binary is hero_shrimp. The server binary is hero_shrimp_server. The Web UI binary is hero_shrimp_web.

CLI Usage

Ask a quick question:

hero_shrimp ask "summarize the current project state"

Delegate a proof-backed background task:

hero_shrimp do "add validation for the config loader"

Remember a fact or preference:

hero_shrimp remember "status_style=concise with proof for file changes"

Schedule recurring work:

hero_shrimp schedule --cadence "daily 9am" "summarize active jobs and anything needing attention"

Check current work:

hero_shrimp jobs
hero_shrimp inbox
hero_shrimp inbox --json

inbox summarizes jobs needing attention, recurring schedules, and recent active memories so you do not have to inspect each surface separately.

Target a specific workspace for delegated file work:

hero_shrimp do --workdir /path/to/repo "fix the failing parser tests"

When no explicit workspace is supplied, delegated jobs use an isolated workspace under Shrimp's configured workspace directory, normally jobs/<job_id>/.

Job Artifacts

Each job has durable state in SQLite and human-readable proof artifacts on disk.

Common files and directories:

jobs/<job_id>/                         isolated job workspace
.agent/job_artifacts/<job_id>/         proof and execution-control bundle
.agent/job_artifacts/<job_id>/job.json job metadata snapshot
.agent/job_artifacts/<job_id>/job_contract.json
.agent/job_artifacts/<job_id>/job_journal.jsonl
.agent/job_artifacts/<job_id>/file_claims.json
.agent/job_artifacts/<job_id>/file_conflicts.json
.agent/job_artifacts/<job_id>/file_checkpoints/

The database is authoritative for queryable state. Artifact files are the operator-facing proof surface.

RPC And UI

The daemon exposes newline-delimited JSON-RPC over a Unix socket. Public job methods use the job.* namespace. Introspection methods use job-first names such as intro.recent_jobs.

The same socket speaks HTTP/1.1 for legacy SDK clients (POST /rpc, GET /openrpc.json, GET /health) and a Server-Sent Events feed for live chat streaming (GET /events/chat?session_id=<id>).

JSON-RPC requests may be sent singly (one object per NDJSON line or POST body) or batched as a JSON array; the server returns an array of responses in matching order, with notifications (no id) silently consumed per the JSON-RPC 2.0 spec.
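
The batch semantics follow the JSON-RPC 2.0 specification and can be sketched as a small dispatcher. This is not the server's code: the handler table and the echo method used in the usage note are hypothetical, and params validation is omitted.

```python
import json
from typing import Any, Callable, Dict, Optional

Handler = Callable[[Any], Any]


def handle_one(request: dict, handlers: Dict[str, Handler]) -> Optional[dict]:
    """Run one request; return None for notifications (requests without id)."""
    is_notification = "id" not in request
    handler = handlers.get(request.get("method"))
    if handler is None:
        response = {"jsonrpc": "2.0", "id": request.get("id"),
                    "error": {"code": -32601, "message": "Method not found"}}
    else:
        response = {"jsonrpc": "2.0", "id": request.get("id"),
                    "result": handler(request.get("params"))}
    return None if is_notification else response


def handle_line(line: str, handlers: Dict[str, Handler]) -> Optional[str]:
    """Handle one NDJSON line holding a single request or a batch array."""
    payload = json.loads(line)
    if isinstance(payload, list):
        if not payload:  # the spec treats an empty batch as an invalid request
            return json.dumps({"jsonrpc": "2.0", "id": None,
                               "error": {"code": -32600, "message": "Invalid Request"}})
        responses = [r for r in (handle_one(req, handlers) for req in payload)
                     if r is not None]
        return json.dumps(responses) if responses else None
    response = handle_one(payload, handlers)
    return json.dumps(response) if response is not None else None
```

With handlers = {"echo": lambda params: params}, a batch of two requests and one notification yields an array of two responses in request order, and a batch made up only of notifications yields no reply at all.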

Authentication

The daemon mints a bearer token at boot, stored mode 0600 at $XDG_CONFIG_HOME/hero_shrimp/rpc.token. Two enforcement modes:

Mode              When                         Behavior
Open (default)    SHRIMP_REQUIRE_AUTH unset    Token exists but is not required; any UDS-local process is trusted. Existing CLIs work unchanged.
Enforced          SHRIMP_REQUIRE_AUTH=1        Every non-public method requires Authorization: Bearer <token> (HTTP) or auth: "<token>" (JSON-RPC top-level). rpc.health and rpc.discover stay public so liveness probes still work.

The shipped CLI, SDK (hero_shrimp_sdk), and web client read the token file automatically — flipping SHRIMP_REQUIRE_AUTH=1 requires no client change.
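
For a hand-rolled client, the token lookup and the top-level auth field might look like the sketch below. The helper names are hypothetical, and the fallback to ~/.config when XDG_CONFIG_HOME is unset follows the XDG convention rather than anything confirmed for the daemon.

```python
import json
import os
from pathlib import Path
from typing import Any, Optional


def token_path() -> Path:
    # Assumption: fall back to ~/.config when XDG_CONFIG_HOME is unset.
    base = os.environ.get("XDG_CONFIG_HOME") or str(Path.home() / ".config")
    return Path(base) / "hero_shrimp" / "rpc.token"


def load_token() -> Optional[str]:
    """Return the bearer token, or None when no token file exists."""
    try:
        return token_path().read_text(encoding="utf-8").strip()
    except FileNotFoundError:
        return None


def build_request(method: str, params: Any = None, request_id: int = 1,
                  token: Optional[str] = None) -> str:
    """Encode one JSON-RPC request as a single NDJSON line."""
    request = {"jsonrpc": "2.0", "id": request_id, "method": method}
    if params is not None:
        request["params"] = params
    if token is not None:
        request["auth"] = token  # top-level auth field, as described above
    return json.dumps(request) + "\n"
```

Sending the token unconditionally keeps a client working in both modes, since the Open mode simply does not require it.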

Tool-approval tiers

message.send accepts an optional approval_mode parameter (plan | default | auto_edit | yolo) matching qwen-code's tiered flow. When omitted, the engine derives the tier from sandbox_mode + yolo. Invalid values are rejected with -32602 Invalid Params (no silent downgrade).
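
The reject-rather-than-downgrade rule can be sketched as a small validator. The function and exception names are hypothetical, and the derivation from sandbox_mode + yolo is deliberately not modeled because its rules are not given here.

```python
from typing import Optional

APPROVAL_MODES = ("plan", "default", "auto_edit", "yolo")
INVALID_PARAMS = -32602  # JSON-RPC 2.0 "Invalid params" error code


class InvalidParams(Exception):
    code = INVALID_PARAMS


def parse_approval_mode(params: dict) -> Optional[str]:
    """Return the explicit tier, or None when the engine should derive it."""
    if "approval_mode" not in params:
        return None
    mode = params["approval_mode"]
    if mode not in APPROVAL_MODES:
        # Fail loudly instead of silently falling back to a weaker tier.
        raise InvalidParams(
            f"approval_mode must be one of {APPROVAL_MODES}, got {mode!r}"
        )
    return mode
```

Rejecting unknown values means a typo such as "auto-edit" surfaces as an error to the caller instead of running under a tier the caller did not ask for.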

Filesystem RPC sandbox

fs.read_file, fs.list_directory, fs.path_info, fs.git_diff, and editor.open always refuse a hardcoded set of sensitive paths (~/.ssh, ~/.aws/credentials, /etc/shadow, /proc/*/environ, ...). Set SHRIMP_FS_RESTRICT_TO_WORKSPACE=1 to further restrict reads to configured workspace roots — recommended for multi-tenant or network-exposed deployments.
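
A deny-list check of this kind, with the optional workspace restriction layered on top, might be sketched as follows. Only the four patterns named above are included (the real list is longer and not reproduced here), the function names are hypothetical, and the sketch normalizes paths lexically without resolving symlinks, which a production check would also need to do.

```python
import fnmatch
import posixpath
from typing import List, Optional

# Only the patterns named in the text; the full hardcoded set is not shown there.
DENIED_PATTERNS = ["~/.ssh", "~/.aws/credentials", "/etc/shadow", "/proc/*/environ"]


def _normalize(path: str, home: str) -> str:
    if path == "~" or path.startswith("~/"):
        path = home + path[1:]
    return posixpath.normpath(path)  # lexical only: collapses ".." and "//"


def is_denied(path: str, home: str) -> bool:
    """True when the path is a denied entry or lies inside a denied directory."""
    candidate = _normalize(path, home)
    for pattern in DENIED_PATTERNS:
        expanded = _normalize(pattern, home)
        if fnmatch.fnmatchcase(candidate, expanded):
            return True
        if candidate.startswith(expanded + "/"):
            return True
    return False


def is_allowed(path: str, home: str,
               workspace_roots: Optional[List[str]] = None) -> bool:
    """Apply the deny-list, then the optional workspace-root restriction."""
    if is_denied(path, home):
        return False
    if workspace_roots is None:  # restriction not enabled
        return True
    candidate = _normalize(path, home)
    roots = [_normalize(root, home) for root in workspace_roots]
    return any(candidate == r or candidate.startswith(r + "/") for r in roots)
```

Comparing against root + "/" rather than the bare prefix keeps a sibling directory such as /srv/ws2 from passing as part of /srv/ws.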

The operator UI shows:

  • Inbox, conversations, and session history.
  • Live jobs and recent jobs.
  • Job proof bundles, files, audit rows, and phase state.
  • Memory, schedules, playbooks, dreams, skills, settings, and diagnostics.

More Docs

  • docs/getting-started.md — 5-minute tutorial: start the daemon, use the CLI, open the web UI.
  • docs/dataflow.md — high-level dataflow diagrams (subagents, MCP, custom tools, personality, the proof gate).
  • docs/concepts.md — domain vocabulary and naming rules.
  • docs/architecture.md — runtime architecture and data flow.
  • docs/components.md — per-crate + per-module inventory.
  • docs/internals.md — deep technical reference: Plan→Verify→Proof, hook taxonomy, sandbox backends, signed wire log, USD caps, memory layers, MCP, ACP.
  • BRUTAL_REVIEW.md — adversarial head-to-head vs kimi-cli, qwen-code, picoclaw, hermes-agent.
  • COMPETITOR_AUDIT_*.md — per-competitor source-level audits.

Naming Rule

Use assistant-first language for normal user flows. Use job for technical execution records, API/status fields, proof artifacts, and storage internals.