AI Broker: OpenRPC-driven Python code generation and execution #18

Open
opened 2026-03-11 18:03:04 +00:00 by mik-tf · 13 comments
Owner

Summary

Evolve the AI Broker into an intelligent code-generation broker that converts OpenRPC specs into lightweight Python clients and uses AI to generate and execute code based on user intent.

Design

Core Flow

  1. Ingest OpenRPC specs — feed all OpenRPC specs we want to support (endpoint URLs are embedded in the specs)
  2. Generate two Python files per spec:
    • A client library to talk to the OpenRPC backend
    • A minimal interface file with instructions on how to use it
  3. Understand user intent — user asks something, the broker figures out what they want to do
  4. Generate Python code — feed the interface (not the full spec) to a model (e.g. mercury2), which writes Python to fulfill the intent
  5. Execute in a loop — run the generated Python, retrying on failure until it succeeds
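
Steps 4–5 can be sketched as a small loop; `generate` and `execute` here are injected stand-ins for the model call and the Python runner, so the names and shapes are illustrative, not the actual broker API:

```python
def run_until_success(generate, execute, max_attempts=3):
    """Generate code, run it, and feed errors back until it succeeds.

    generate(error) returns a script (error is None on the first attempt,
    the previous failure output afterwards); execute(code) returns an
    (ok, output) pair.  Both are injected so the loop stays testable.
    """
    error = None
    for _ in range(max_attempts):
        code = generate(error)
        ok, output = execute(code)
        if ok:
            return output
        error = output
    raise RuntimeError(f"giving up after {max_attempts} attempts: {error}")
```

On each failed attempt the previous error output becomes context for the next generation, which is what lets the loop converge.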

Versioning

When an OpenRPC spec changes (detected via hash), automatically:

  • Regenerate the Python client
  • Run it to verify it compiles and works
  • Publish the new version

Why This Is Better Than MCP

  • Much less context — only the interface file is fed to the AI, not the full spec
  • Standard broker pattern — ask intent → populate the right RPC pieces → generate code → execute
  • Language-native — generates real Python code, not protocol-level tool calls

Technical Details

  • Python runtime: uv
  • Runs locally on the Hero
  • Model: mercury2 (or configurable) for intent understanding and code generation
  • Repo: hero_aibroker (https://forge.ourworld.tf/lhumina_code/hero_aibroker)

Tasks

  • Define the OpenRPC → Python client generator
  • Define the OpenRPC → interface file generator (minimal, instruction-bearing)
  • Implement intent detection in the broker
  • Implement Python code generation from interface + intent
  • Implement execute-in-a-loop with retry until success
  • Implement hash-based versioning for spec changes
  • Integrate with uv for Python execution
  • Write instruction files per OpenRPC spec

From meeting notes 2026-03-11


Architecture Design

Three-Layer Architecture

The AI Broker evolves from an LLM proxy into an Agent & MCP Broker with three layers:

┌─────────────────────────────────────────────────────────┐
│                     AI Broker                            │
│                                                          │
│  ┌────────────────────────────────────────────────────┐  │
│  │  ACP Layer (Agent Communication Protocol)           │  │
│  │  REST API: /agents, /threads, /runs                 │  │
│  │                                                     │  │
│  │  Agents:                                            │  │
│  │    agent_hero (intent → rerank → codegen → execute) │  │
│  │    (future: agent_deploy, agent_monitor, ...)       │  │
│  └────────────────────────────────────────────────────┘  │
│                          │ uses                          │
│  ┌────────────────────────────────────────────────────┐  │
│  │  MCP Layer                                          │  │
│  │  mcp_hero: discover, ingest, list, execute          │  │
│  │  mcp_serpapi, mcp_exa, ... (existing)               │  │
│  └────────────────────────────────────────────────────┘  │
│                          │ uses                          │
│  ┌────────────────────────────────────────────────────┐  │
│  │  LLM Layer (existing)                               │  │
│  │  Provider routing, mercury2 via OpenRouter           │  │
│  │  OpenAI-compatible API                               │  │
│  └────────────────────────────────────────────────────┘  │
└─────────────────────────────────────────────────────────┘

Core Flow (agent_hero)

  1. User sends intent via ACP (POST /runs/wait)
  2. Intent + rerank (single mercury2 call): determine what the user wants + which Hero services are relevant
  3. Load interface files for selected services (lightweight AI-generated Python interfaces, not full OpenRPC specs)
  4. Generate Python code (mercury2): using interface files as context, generate a script to fulfill the intent
  5. Execute with uv: run the script, capture stdout/stderr
  6. Retry loop: on error, feed error back to mercury2, regenerate, max 3 attempts
  7. Return result via ACP run response
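
Steps 2–3 can be sketched as a single prompt construction. Only the lightweight summary lines go to the model; the prompt wording and the `SERVICES:` output marker below are illustrative assumptions, not the actual prompt:

```python
def intent_rerank_messages(user_intent, summaries):
    """Build the single mercury2 call that both restates the intent and
    selects relevant services.

    summaries maps service name -> one-line SUMMARY string; full OpenRPC
    specs are never included.
    """
    catalog = "\n".join(f"- {name}: {summary}"
                        for name, summary in sorted(summaries.items()))
    system = (
        "You are a service reranker. Given the user's request and the "
        "service catalog below, restate the intent in one sentence, then "
        "list the relevant service names one per line after a "
        "'SERVICES:' marker.\n\n" + catalog
    )
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": user_intent},
    ]
```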

Smart Caching (Hash-Based)

~/.hero/var/aibroker/services/
  hero_compute_manager/
    spec_hash: "a1b2c3..."     ← SHA-256 of OpenRPC spec
    openrpc.json                ← cached spec
    client.py                   ← AI-generated full Python client
    interface.py                ← AI-generated minimal interface

  • On discovery: call rpc.discover on each socket in ~/hero/var/sockets/
  • Hash the spec, compare to stored hash
  • Only regenerate client.py + interface.py when hash changes
  • Verify generated client works before accepting
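
A minimal sketch of the hash check, assuming the spec is hashed as canonical JSON (the actual hashing scheme may differ):

```python
import hashlib
import json
from pathlib import Path

def spec_hash(spec: dict) -> str:
    # Canonicalise before hashing so key order alone never forces a regeneration.
    canonical = json.dumps(spec, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode()).hexdigest()

def needs_regeneration(spec: dict, service_dir: Path) -> bool:
    """True when the cached hash is missing or differs from the current spec's."""
    hash_file = service_dir / "spec_hash"
    return not hash_file.exists() or hash_file.read_text().strip() != spec_hash(spec)
```

client.py and interface.py are regenerated only when this returns True, then the new hash is written back.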

Interface File Format

AI-generated, minimal, with summary line for reranking:

# SERVICE: hero_compute_manager
# SUMMARY: Manage compute resources, VMs, containers, and deployments
# SOCKET: ~/hero/var/sockets/hero_compute_manager.sock
#
# Available functions:
#   list_vms() -> list of virtual machines
#   deploy(config) -> deployment result
#   ...

The reranker only reads the SERVICE and SUMMARY lines; full interface files are loaded only for the selected services.
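
A sketch of the header scan the reranker could use (the function name is illustrative):

```python
def rerank_headers(interface_text: str) -> dict:
    """Extract only the SERVICE and SUMMARY header lines from an interface
    file, skipping everything else — this is all the reranker needs."""
    headers = {}
    for line in interface_text.splitlines():
        stripped = line.lstrip("# ").rstrip()
        for key in ("SERVICE", "SUMMARY"):
            if stripped.startswith(key + ":"):
                headers[key.lower()] = stripped[len(key) + 1:].strip()
    return headers
```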

Protocol Layering

  • ACP (Agent Communication Protocol): agent-to-agent communication, task lifecycle, discovery — spec: https://agentcommunicationprotocol.dev/
  • MCP (Model Context Protocol): tool invocation, LLM-to-tool bridge
  • LLM routing: existing provider proxy (OpenRouter, Groq, SambaNova, OpenAI)

The ACP REST endpoints are added to the existing hero_aibroker_ui Axum server.

Why Better Than MCP Alone

  • Much less context: only lightweight interface files sent to LLM, not full OpenRPC specs
  • Code generation: produces real Python, not protocol-level tool calls
  • Multi-service: single query can span multiple Hero services
  • Smart caching: regenerate only when specs change

Implementation Plan

Crate Structure

crates/
  # Core (unchanged)
  hero_aibroker/                  ← core lib (add acp + codegen modules)
  hero_aibroker_server/           ← JSON-RPC server
  hero_aibroker_ui/               ← admin dashboard + ACP REST endpoints
  hero_aibroker_sdk/
  hero_aibroker_cli/
  hero_aibroker_examples/

  # MCP servers (regrouped under crates/mcp/)
  mcp/
    mcp_common/                   ← shared MCP traits
    mcp_hero/                     ← NEW: Hero service discovery + execution tools
    mcp_ping/
    mcp_serpapi/
    mcp_exa/
    mcp_serper/
    mcp_scraperapi/
    mcp_scrapfly/

  # Agents (agent logic + ACP interface in same crate)
  agent/
    agent_hero/                   ← NEW: intent → rerank → codegen → execute
      src/
        lib.rs                    ← agent entry point
        acp.rs                    ← ACP descriptor + REST handler integration
        intent.rs                 ← intent detection + service reranking (mercury2)
        codegen.rs                ← Python code generation from interface files
        executor.rs               ← uv execution + retry loop (max 3)
        cache.rs                  ← spec hash tracking, interface file caching

Implementation Phases

Phase 1: Foundation — mcp_hero + service discovery

  • Create crates/mcp/mcp_hero/ crate
  • Implement service auto-discovery (scan ~/hero/var/sockets/*.sock, call rpc.discover)
  • Implement spec hashing (SHA-256) and caching at ~/.hero/var/aibroker/services/
  • Implement discover_services, list_services, get_interface MCP tools
  • Refactor existing mcp-* crates into crates/mcp/ directory

Phase 2: Code generation — Python client + interface generation

  • Implement OpenRPC spec → Python client generation (AI-powered via mercury2/OpenRouter)
  • Implement OpenRPC spec → interface file generation (AI-powered, minimal with SERVICE/SUMMARY headers)
  • Implement interface validation (generate → test → accept)
  • Implement hash-based regeneration (only when spec changes)
  • Implement ingest_spec MCP tool
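
The generate → test → accept gate can start with a plain compile check before any runtime verification against the live service — a sketch using the standard library:

```python
import py_compile
import tempfile
from pathlib import Path

def client_compiles(client_source: str) -> bool:
    """Cheap first gate: reject AI-generated clients that are not even
    valid Python before spending time on runtime checks."""
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "client.py"
        path.write_text(client_source)
        try:
            py_compile.compile(str(path), doraise=True)
            return True
        except py_compile.PyCompileError:
            return False
```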

Phase 3: Execution — uv runtime integration

  • Implement uv Python environment management (shared venv with base deps: httpx, pydantic)
  • Implement script execution with stdout/stderr capture
  • Implement retry loop (error → feed back to LLM → regenerate → retry, max 3)
  • Implement execute_code MCP tool
  • Storage at ~/.hero/var/aibroker/python/
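
A sketch of the execution step; the runner tuple is injectable so the same helper can be exercised without uv installed, and the (ok, output) pair feeds the retry loop directly:

```python
import subprocess
from pathlib import Path

def run_script(script: Path, runner=("uv", "run"), timeout=60):
    """Run a generated script and capture its output.

    Returns (True, stdout) on success, (False, stderr) on failure — the
    stderr is what gets fed back to the LLM for regeneration.
    """
    proc = subprocess.run(
        [*runner, str(script)],
        capture_output=True, text=True, timeout=timeout,
    )
    ok = proc.returncode == 0
    return ok, proc.stdout if ok else proc.stderr
```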

Phase 4: agent_hero — the agent loop

  • Create crates/agent/agent_hero/ crate
  • Implement intent detection + service reranking (single mercury2 call)
  • Implement code generation from selected interface files
  • Wire together: intent → rerank → codegen → execute → retry → result
  • Implement ACP descriptor for agent_hero

Phase 5: ACP integration — REST endpoints on UI server

  • Add ACP REST routes to hero_aibroker_ui Axum server
  • Implement POST /agents/search, GET /agents/{id}, GET /agents/{id}/descriptor
  • Implement POST /runs/wait (blocking execution), POST /runs/stream (streaming)
  • Implement POST /threads + POST /threads/{id}/runs (stateful conversations)
  • Implement run status tracking (pending, success, error, timeout)

Phase 6: UI updates

  • Add Services tab (discovered services, spec status, interface files)
  • Add Agents tab (registered agents, ACP descriptors)
  • Add Execution History tab (runs, results, retry traces)

Key Design Decisions

  • mercury2 via OpenRouter accessed through the broker's own LLM layer (self-referential)
  • ACP endpoints on existing UI server (not a separate service)
  • Intent + rerank merged into single LLM call for efficiency
  • Max 3 retries on code execution failures
  • Interface files include SERVICE: and SUMMARY: headers for fast reranking
  • MCP tools don't all need LLMs — discover/list/get are pure API, only ingest uses LLM

Next Steps — Phase 2: Fix Critical Gaps + Verify

Issues Found After Phase 1

  1. LLM endpoint config — agent_hero uses reqwest HTTP calls to /v1/chat/completions, but the broker only listens on Unix sockets. Fix: agent calls OpenRouter directly (not self-referential for now). Configurable via AGENT_LLM_ENDPOINT + AGENT_API_KEY.

  2. Client files not available to scripts — generated Python scripts import from client.py but it lives in a different directory. Fix: executor copies required client files to the scripts directory before execution.

  3. Agent config not configurable — hardcoded values. Fix: add AGENT_LLM_ENDPOINT, AGENT_MODEL, AGENT_API_KEY env vars to the Config struct.

  4. Missing service interface endpoint — no way to view generated interface files. Fix: add GET /services/{name}/interface endpoint.

Implementation Plan

  • Fix agent LLM endpoint to use OpenRouter directly (default https://openrouter.ai/api/v1)
  • Add AGENT_LLM_ENDPOINT, AGENT_MODEL, AGENT_API_KEY to Config
  • Fix executor to copy client files to scripts directory before execution
  • Add GET /services/{name}/interface and GET /services/{name}/client endpoints
  • Verify full workspace compiles
  • Post verification results

Phase 2 Complete — Verification Results

All fixes implemented and verified:

1. LLM endpoint fixed — agent now calls OpenRouter directly (https://openrouter.ai/api/v1 by default). Configurable via AGENT_LLM_ENDPOINT env var. Falls back to the first OPENROUTER_API_KEY for auth.

2. Client files staging — executor now copies service client files into the scripts directory before execution via stage_clients(). Scripts run with current_dir set to the scripts directory so import statements resolve correctly.

3. Agent config via env vars — added to Config struct:

  • AGENT_LLM_ENDPOINT — LLM API base URL (default: OpenRouter)
  • AGENT_MODEL — model name (default: google/gemini-2.0-flash-001)
  • AGENT_API_KEY — API key (falls back to first OpenRouter key)

4. Service artifact endpoints — added:

  • GET /services/{name}/interface — view generated interface file
  • GET /services/{name}/client — view generated client file
  • GET /services/{name}/spec — view cached OpenRPC spec
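
The client staging from fix 2 can be sketched as follows; the per-service renaming to {name}_client.py is a hypothetical convention for illustration, not necessarily what stage_clients() actually does:

```python
import shutil
from pathlib import Path

def stage_clients(services_dir: Path, scripts_dir: Path, names):
    """Copy each selected service's generated client.py next to the
    generated script so its import statements resolve from there."""
    scripts_dir.mkdir(parents=True, exist_ok=True)
    for name in names:
        src = services_dir / name / "client.py"
        # Hypothetical naming: one module per service to avoid collisions.
        shutil.copy(src, scripts_dir / f"{name}_client.py")
```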

Build verification:

cargo check: ✅ clean (1 minor warning)
cargo test:  ✅ 24/24 tests pass

Full ACP endpoint list:

Method  Path                          Description
POST    /agents/search                Discover available agents
GET     /agents/{id}                  Get agent info
GET     /agents/{id}/descriptor       Get ACP descriptor
POST    /runs/wait                    Execute agent (blocking)
GET     /services                     List cached services
POST    /services/discover            Trigger discovery + ingestion
GET     /services/{name}/interface    View interface file
GET     /services/{name}/client       View client file
GET     /services/{name}/spec         View OpenRPC spec
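
These endpoints can be exercised straight over the broker's Unix socket. A standard-library sketch of encoding such a request — the /runs/wait body fields shown are an assumption, so check the actual handler for the exact shape:

```python
import json

def encode_post(path, body, token=None):
    """Encode a minimal HTTP/1.1 POST as raw bytes, ready to write
    directly to the broker's Unix socket with socket.sendall()."""
    payload = json.dumps(body).encode()
    headers = [
        f"POST {path} HTTP/1.1",
        "Host: localhost",
        "Content-Type: application/json",
        f"Content-Length: {len(payload)}",
        "Connection: close",
    ]
    if token:
        headers.append(f"Authorization: Bearer {token}")
    return ("\r\n".join(headers) + "\r\n\r\n").encode() + payload

# Hypothetical /runs/wait body — the exact field names are an assumption:
request = encode_post(
    "/runs/wait",
    {"agent_name": "agent_hero",
     "input": [{"role": "user", "content": "list my services"}]},
    token="secret",
)
```

Sending it is then a matter of connecting a socket.socket(socket.AF_UNIX) to the expanded socket path and calling sendall(request).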

New crate summary:

Crate       Location                    Purpose
mcp_hero    crates/mcp/mcp_hero/        Service discovery, spec caching, Python execution
agent_hero  crates/agent/agent_hero/    Intent detection, reranking, codegen, ACP interface

Remaining for Phase 3:

  • UI dashboard tabs (Services, Agents, Execution History)
  • Streaming run support (POST /runs/stream)
  • Thread-based stateful conversations (POST /threads)
  • End-to-end integration test with a real Hero service

Phase 2b: Self-Referential LLM Access via Unix Socket

Problem

The agent currently calls OpenRouter directly via HTTP. But the broker itself already handles LLM routing (cheapest/best strategy, multi-provider, API key rotation). The agent should use the broker's own /v1/chat/completions endpoint over its Unix socket — making it self-referential.

Implementation

  1. Add a Unix socket HTTP client (unix_http) to mcp_hero — uses hyper + tokio::net::UnixStream to make HTTP requests over Unix sockets
  2. Refactor agent's LLM calls to support both unix: and https: endpoints
  3. Default AGENT_LLM_ENDPOINT to unix:~/hero/var/sockets/hero_aibroker_ui.sock
  4. Agent calls broker → broker routes to OpenRouter/Groq/etc → mercury2 generates code

Flow

User → ACP /runs/wait → agent_hero
  → LLM call over Unix socket → hero_aibroker_ui
    → provider routing → OpenRouter → mercury2
  ← response
  → generate Python → execute → result

This way the agent benefits from the broker's model registry, routing strategy, and API key management.
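
Step 2's endpoint handling amounts to a small dispatch on the unix: prefix — a sketch (the return shape is illustrative):

```python
from pathlib import Path

def resolve_llm_endpoint(endpoint: str):
    """Route an AGENT_LLM_ENDPOINT value to the right transport.

    unix:~/hero/var/sockets/hero_aibroker_ui.sock -> ("unix", expanded path);
    anything else is treated as an HTTP(S) base URL.
    """
    if endpoint.startswith("unix:"):
        raw = endpoint[len("unix:"):]
        return "unix", str(Path(raw).expanduser())
    return "http", endpoint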


Phase 2b: Self-Referential LLM Access — Implemented ✓

What was done

The agent (agent_hero) can now call the broker's own LLM API through its Unix socket, making the system self-referential. This means the agent benefits from the broker's provider routing, API key management, model registry, and rate limiting — instead of making direct calls to external APIs.

New modules

mcp_hero::unix_http — HTTP-over-Unix-socket client

  • Uses hyper + tokio::net::UnixStream for HTTP/1.1 over Unix domain sockets
  • post_json(socket_path, path, body, bearer_token) — POST JSON and get JSON response
  • expand_tilde(path) — resolves ~/ to home directory

agent_hero::llm_client — Unified LLM client

  • call_llm(endpoint, model, api_key, messages, temperature, max_tokens) — routes to Unix socket or HTTP(S)
  • Detects unix: prefix → uses mcp_hero::unix_http over Unix socket
  • Otherwise → uses reqwest for standard HTTPS
  • Helper functions: extract_content(), strip_code_fences()
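
A plausible sketch of the strip_code_fences() helper — the actual implementation may differ, but the job is removing the markdown fence the model often wraps generated scripts in:

```python
def strip_code_fences(text: str) -> str:
    """Strip a surrounding ```python ... ``` fence, if present, leaving
    bare runnable source.  Plain text passes through unchanged."""
    lines = text.strip().splitlines()
    if lines and lines[0].startswith("```"):
        lines = lines[1:]
    if lines and lines[-1].strip() == "```":
        lines = lines[:-1]
    return "\n".join(lines)
```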

Changes

  1. intent.rs + codegen.rs — Refactored to use llm_client::call_llm() instead of direct reqwest calls
  2. Default AGENT_LLM_ENDPOINT changed from https://openrouter.ai/api/v1 to unix:~/hero/var/sockets/hero_aibroker_ui.sock
  3. Can still override with AGENT_LLM_ENDPOINT=https://openrouter.ai/api/v1 env var for direct external access

Data flow (self-referential)

User → POST /runs/wait → agent_hero
  → intent detection → unix_http → broker socket → /v1/chat/completions → OpenRouter → LLM
  → code generation  → unix_http → broker socket → /v1/chat/completions → OpenRouter → LLM  
  → Python execution → uv venv → result
  ← response

Build status

  • cargo build ✓ (clean, 1 pre-existing warning)
  • cargo test ✓ (24/24 tests pass)

Dependencies added

  • http-body-util = "0.1" (workspace) — for hyper body types
  • hyper, hyper-util, bytes added to mcp_hero Cargo.toml

## Revised approach: MCP tools on the broker, Shrimp as the agent

After discussion with @thabeta and reconsidering the architecture, we're changing direction. The key insight: **don't rebuild an agent loop inside the broker — use Shrimp's mature agent loop and expose the Hero service capabilities as MCP tools on the broker.**

### What was wrong with the previous approach

We built `agent_hero` with its own intent detection, code generation, retry loop, and ACP REST endpoints inside the broker. This duplicated what [hero_shrimp](https://forge.ourworld.tf/lhumina_code/hero_shrimp) already does better — a full agent loop with memory, tool routing, multi-model support, safety, retries, and multi-channel support (CLI/Telegram/WhatsApp).

### New architecture

```
Shrimp (agent loop, memory, retries, multi-model, channels)
  │
  ├── Uses AI Broker as LLM backend
  │
  └── Uses AI Broker's MCP endpoint for hero_* tools
        │
        ├── hero_register_service   → register service by socket/URL
        ├── hero_list_services      → list registered services + summaries
        ├── hero_get_interface      → get Python interface file
        ├── hero_generate_code      → generate Python from intent + interfaces
        │                             (internally calls broker's own LLM)
        └── hero_execute_code       → run Python in managed uv venv
```

### Key design decisions

1. **Broker owns the service registry.** Services are registered on the broker (via config, not auto-discovery) by socket path or URL. The broker calls `rpc.discover`, ingests the OpenRPC spec, generates Python clients + interface files via its own LLM, and caches them with spec-hash-based regeneration.
2. **MCP tools, not ACP endpoints.** The capabilities are exposed as MCP tools on the broker's existing MCP server, so any MCP consumer (Shrimp, other agents) can use them.
3. **Shrimp is the agent.** Shrimp's agent loop handles orchestration — deciding which services to use, when to generate code, when to retry on failure, conversation memory, etc. We don't duplicate this in the broker.
4. **Code-gen approach preserved.** The core idea from the original issue remains: instead of exposing every RPC method as an individual MCP tool (many round trips), we generate a Python script that calls multiple service methods in one execution. The LLM sees lightweight interface files, generates a complete script, and the script runs in a managed Python venv.
5. **Self-referential LLM.** The `hero_generate_code` MCP tool calls the broker's own LLM API (via `hero_aibroker_sdk`) for code generation, and the broker routes this to the configured provider. The agent (Shrimp) doesn't need to know about this — it just calls the MCP tool.
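The spec-hash-based regeneration from decision 1 amounts to hashing a canonical form of the spec and comparing it against the cached hash. A minimal sketch, assuming the cache is a plain dict (the broker's actual cache format is not shown here):

```python
import hashlib
import json

def spec_hash(spec: dict) -> str:
    """SHA-256 of a canonicalized OpenRPC spec; key order must not affect the hash."""
    canonical = json.dumps(spec, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode()).hexdigest()

def needs_regeneration(spec: dict, cache: dict) -> bool:
    """Regenerate the Python client + interface only when the spec actually changed."""
    return cache.get("spec_hash") != spec_hash(spec)

spec = {"openrpc": "1.2.6", "methods": [{"name": "ping"}]}
cache = {"spec_hash": spec_hash(spec)}
assert not needs_regeneration(spec, cache)   # unchanged spec → use cached client

spec["methods"].append({"name": "echo"})     # spec evolved
assert needs_regeneration(spec, cache)       # → regenerate and re-publish
```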

### What stays from current work

- `mcp_hero` crate — service discovery, spec caching, Python executor, Unix socket HTTP client
- Codegen primitives (will move from `agent_hero` into `mcp_hero`)
- Self-referential broker LLM access via Unix socket

### What goes away

- `agent_hero` crate (agent loop, intent detection, ACP interface)
- `hero_aibroker_ctl` (already deleted)
- ACP REST endpoints on the broker UI

### Next steps

1. Move codegen into `mcp_hero`, wire LLM calls through `hero_aibroker_sdk`
2. Add `HERO_SERVICES` to the broker config (list of socket paths / URLs)
3. Expose `mcp_hero` capabilities as MCP tools on the broker's existing MCP server
4. Remove `agent_hero` and the ACP endpoints
5. Test with Shrimp connecting to the broker's MCP endpoint
Owner

## Implementation update — 2026-03-12

All core tasks from this issue are implemented. Following discussion with Ahmed (hero_shrimp), we pivoted the architecture slightly: rather than an agent loop inside the broker, the broker exposes Hero service capabilities as **MCP tools** via a dedicated `mcp_hero` stdio server. Shrimp becomes the agent loop.

### What was built

**`mcp_hero` stdio JSON-RPC MCP server** (`hero_aibroker` — `development_timur`)

| Tool | Does |
|------|------|
| `register_service` | Calls `rpc.discover` on a socket, stores the spec, generates Python client + interface via LLM |
| `list_services` | Lists all cached services and their status |
| `get_interface` | Returns the lightweight interface file for a service |
| `generate_code` | Feeds interface(s) + user intent to the LLM, returns a Python script |
| `execute_code` | Runs Python via `uv` in a managed venv, returns stdout/stderr/exit_code |

- Hash-based spec versioning: `register_service` detects changes and regenerates
- Python runtime via `uv`, shared venv at `~/.hero/var/aibroker/python/`
- LLM calls route through the broker Unix socket (`unix:~/hero/var/sockets/hero_aibroker_ui.sock`) or direct HTTPS
- `agent_hero` crate and ACP endpoints removed from the broker (Shrimp owns the agent loop now)
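The first step of `register_service` — calling `rpc.discover` on a service socket — can be sketched end-to-end against a stub service. One assumption here: newline-delimited JSON-RPC over the Unix socket; real Hero services may frame requests differently (e.g. HTTP over the socket), so treat the framing as illustrative:

```python
import json
import os
import socket
import tempfile
import threading

SOCK = os.path.join(tempfile.mkdtemp(), "demo_server.sock")
SPEC = {"openrpc": "1.2.6", "info": {"title": "demo"}, "methods": [{"name": "ping"}]}

# Stub Hero service: bind before starting the client so there is no accept race.
srv = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
srv.bind(SOCK)
srv.listen(1)

def serve_once():
    """Answer one newline-delimited JSON-RPC request with the OpenRPC spec."""
    conn, _ = srv.accept()
    with conn:
        req = json.loads(conn.makefile("r").readline())
        resp = {"jsonrpc": "2.0", "id": req["id"], "result": SPEC}
        conn.sendall((json.dumps(resp) + "\n").encode())

def rpc_discover(sock_path: str) -> dict:
    """What register_service does first: fetch the service's OpenRPC spec."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as c:
        c.connect(sock_path)
        req = {"jsonrpc": "2.0", "id": 1, "method": "rpc.discover"}
        c.sendall((json.dumps(req) + "\n").encode())
        return json.loads(c.makefile("r").readline())["result"]

t = threading.Thread(target=serve_once)
t.start()
spec = rpc_discover(SOCK)
t.join()
srv.close()
assert [m["name"] for m in spec["methods"]] == ["ping"]
```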

**Shrimp integration** (`hero_shrimp` — `development_timur`)

- `examples/skills/hero_services.skill.md` — Shrimp skill with YAML frontmatter guiding the agent through the register → interface → generate → execute workflow
- `examples/mcp.json.hero_example` — drop-in workspace MCP config template

### Remaining

- Execute-in-a-loop with retry (retry on failure, re-feed the error back to `generate_code`) — `PythonExecutor::execute_with_retry` exists, but needs wiring into the Shrimp skill or a dedicated `run_intent` tool
- Integration testing against a live Hero service socket
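The missing retry wiring could look like the loop below — a sketch, not `PythonExecutor::execute_with_retry` itself; the `generate_code`/`execute_code` callables and their signatures are assumptions:

```python
def run_with_retry(intent, generate_code, execute_code, max_attempts=3):
    """On failure, feed stderr back into code generation and try again.

    generate_code(intent, error) -> Python source.
    execute_code(source) -> (exit_code, stdout, stderr).
    """
    error = None
    for _ in range(max_attempts):
        source = generate_code(intent, error)
        exit_code, out, err = execute_code(source)
        if exit_code == 0:
            return out
        error = err  # re-feed the failure so the LLM can repair the script
    raise RuntimeError(f"failed after {max_attempts} attempts: {error}")

# Toy stand-ins: the "LLM" fixes the script only after seeing the error once.
def fake_generate(intent, error):
    return "fixed" if error else "buggy"

def fake_execute(source):
    return (0, "ok", "") if source == "fixed" else (1, "", "NameError: x")

assert run_with_retry("list redis keys", fake_generate, fake_execute) == "ok"
```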

Relevant PRs: [hero_aibroker#development_timur](https://forge.ourworld.tf/lhumina_code/hero_aibroker/src/branch/development_timur)

Author
Owner

## MCP Integration Status Update (from #23 Session 3)

The MCP integration between Hero Shrimp and AIBroker is **complete and verified on herodev2**.

### Architecture Decision: Path B (broker-mediated)

After evaluating both approaches discussed in this issue, we chose **Path B** — Shrimp discovers and calls MCP tools via AIBroker's REST endpoints through hero_proxy, rather than spawning `mcp_hero` as a direct stdio child process.

```
Shrimp → hero_proxy → AIBroker UI (/mcp/*) → mcp_hero (stdio) → Hero services (Unix sockets)
```

**Why Path B over Path A:**

- Single source of truth — AIBroker owns the MCP server lifecycle for all consumers
- Centralized model/key management — configure once in AIBroker
- Consistent with Hero OS design — all inter-service comms go through hero_proxy
- Zero code changes needed in Shrimp — it already supports REST broker discovery

### What's Working

All 5 `mcp_hero` tools are live and accessible through the broker:

- `register_service` — register a Hero service by socket path; auto-discovers all RPC methods
- `list_services` — list registered services and their method counts
- `get_interface` — get the typed Python interface for a service
- `generate_code` — LLM-powered Python code generation against service interfaces
- `execute_code` — run generated Python code via `uv`

Verified: registered `hero_redis_server` (20 methods) and retrieved its full typed interface.

### Branches

- `hero_aibroker`: `development_timur` — config fixes, mcp_hero wiring
- `hero_shrimp`: `development_timur` — endpoint config
- `hero_services`: `development` — TOML + build pipeline updates

No merges to `development` without devops review. Full details in #23.

### Follow-up Fix: Unified LLM Routing

Shrimp's agent loop was bypassing AIBroker — it used OpenRouter-format model names (`google/gemini-3-flash-preview`) which AIBroker didn't recognize, so it fell back directly to OpenRouter.

**Fixed**: Changed `SHRIMP_OPENROUTER_MODELS` to AIBroker model names: `gpt-4o-mini,claude-sonnet,llama-70b`. Now **all** Shrimp LLM traffic (agent loop + MCP tools) routes through AIBroker, fully consistent with the Path B architecture.

Verified on herodev2 — the Shrimp Config tab shows `gpt-4o-mini` as primary, with `claude-sonnet` and `llama-70b` as fallbacks.

Owner

> **Follow-up Fix: Unified LLM Routing**
> Shrimp's agent loop was bypassing AIBroker — it used OpenRouter-format model names (`google/gemini-3-flash-preview`) which AIBroker didn't recognize, so it fell back directly to OpenRouter.
> **Fixed**: Changed `SHRIMP_OPENROUTER_MODELS` to AIBroker model names: `gpt-4o-mini,claude-sonnet,llama-70b`. Now all Shrimp LLM traffic (agent loop + MCP tools) routes through AIBroker, fully consistent with the Path B architecture.

You should comment out the `SHRIMP_OPENROUTER_MODELS` option so Shrimp falls back to AIBroker's models. Shrimp was meant to be usable without AI Broker, but this change makes AI Broker a hard requirement.

The correct fix is to comment out `SHRIMP_OPENROUTER_MODELS` in the config.

Author
Owner

## Session 3 Progress Update — Shrimp Chat UI + MCP Integration

### What's Working

- **Chat input** in the Shrimp admin dashboard — users can send messages from the browser
- **Model selector dropdown** — 4 verified models: `claude-sonnet` (default), `gpt-4o`, `claude-haiku`, `llama-70b`
- **Agent loop through AIBroker** — all LLM traffic routes through the broker, no direct OpenRouter bypass
- **MCP tool discovery** — Shrimp sees all 5 `mcp_hero` tools via `MCP_BROKER_ENDPOINT`
- **Tool calling works** — `claude-sonnet` successfully calls MCP tools (register, list, get_interface, generate_code, execute_code)
- **Click-to-expand rows** — fixed message truncation in the admin dashboard

### Limitations Found

#### 1. Service registration is expensive and slow

`register_service` makes **2-3 LLM calls** internally (generating the Python client + interface from the OpenRPC spec). For `hero_redis_server` (20 methods), this took ~3 minutes and cost ~$0.05-0.10 with `claude-sonnet`. Auto-registering 15+ services at container startup would cost **~$1-2 per deployment**.

#### 2. Agent doesn't know socket paths

When asked "register hero_redis", the agent guesses paths like `~/hero/var/sockets/hero_redis.sock`, but the actual name is `hero_redis_server.sock`. Without a service discovery mechanism, users must provide exact socket paths.

#### 3. No auto-registration

Services need to be manually registered before they can be listed or queried. The cache is empty after every fresh container restart.

### Options to Discuss

**For auto-registration cost:**

- **Option A**: Pre-generate Python clients at Docker build time (offline, free) and bake them into the image. `mcp_hero` loads from cache — zero LLM calls at runtime.
- **Option B**: Use a cheaper model for code generation (e.g., `llama-70b` on Groq at $0.59/M tokens instead of `claude-sonnet` at $15/M). Faster too.
- **Option C**: Skip LLM-based client generation entirely. Use template-based code gen from the OpenRPC spec (no LLM needed). Deterministic, instant, free.
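A minimal sketch of Option C's template-based generation, assuming a simplified OpenRPC shape and a pre-existing `_call(method, params)` transport method (both hypothetical — real specs carry schemas, and real method names may need sanitizing):

```python
def gen_client(spec: dict) -> str:
    """Deterministic, LLM-free Python client from an OpenRPC spec.

    Each RPC method becomes a wrapper that forwards its arguments
    as JSON-RPC params via a pre-existing self._call (not shown).
    """
    lines = [f"class {spec['info']['title'].title()}Client:"]
    for method in spec["methods"]:
        params = [p["name"] for p in method.get("params", [])]
        sig = ", ".join(["self"] + params)
        args = ", ".join(f'"{p}": {p}' for p in params)
        lines += [
            f"    def {method['name']}({sig}):",
            f"        return self._call({method['name']!r}, {{{args}}})",
        ]
    return "\n".join(lines)

spec = {"info": {"title": "redis"}, "methods": [
    {"name": "get", "params": [{"name": "key"}]},
    {"name": "set", "params": [{"name": "key"}, {"name": "value"}]},
]}
src = gen_client(spec)
assert "def get(self, key):" in src and "def set(self, key, value):" in src
compile(src, "<generated>", "exec")  # the generated client is valid Python
```

Because the output is a pure function of the spec, it pairs naturally with the hash-based versioning: same spec hash, same client, no verification run needed.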

**For service discovery:**

- **Option A**: A startup script scans `/root/hero/var/sockets/*_server.sock` and registers each.
- **Option B**: `mcp_hero` gets a `--sockets-dir` flag to auto-discover on launch.
- **Option C**: `hero_services_server` provides a manifest of running services + socket paths that `mcp_hero` can query.

**Recommendation**: Option C for both — template-based code gen + the `hero_services_server` manifest. No LLM costs for registration, instant startup, deterministic Python clients. The LLM is only used when the user asks `generate_code` for a specific task.
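For comparison, discovery Option A is a one-liner over the sockets directory. A sketch (the demo runs against a temp dir standing in for the sockets directory, and `hero_mail_server` is an invented name):

```python
import tempfile
from pathlib import Path

def discover_sockets(sockets_dir: str) -> dict[str, str]:
    """Map service name -> socket path by scanning *_server.sock files."""
    return {
        p.stem: str(p)  # "hero_redis_server.sock" -> name "hero_redis_server"
        for p in sorted(Path(sockets_dir).glob("*_server.sock"))
    }

# Demo against a temp dir standing in for /root/hero/var/sockets/
d = tempfile.mkdtemp()
for name in ["hero_redis_server.sock", "hero_mail_server.sock", "notes.txt"]:
    (Path(d) / name).touch()

found = discover_sockets(d)
assert set(found) == {"hero_redis_server", "hero_mail_server"}  # notes.txt skipped
```

This would fix the "agent guesses socket paths" limitation, but unlike the manifest approach it can't distinguish a live service from a stale socket file.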

### Current Branches

- `hero_shrimp`: `development_timur` — chat UI, model selector, expandable rows
- `hero_services`: `development` — 4-model config, TOML updates
- `hero_aibroker`: `development_timur` — unchanged from Session 3

Ref: #23 Session 4

Author
Owner

@thabeta wrote in [#18 (comment)](https://forge.ourworld.tf/lhumina_code/home/issues/18#issuecomment-11910):

> **Follow-up Fix: Unified LLM Routing**
> Shrimp's agent loop was bypassing AIBroker — it used OpenRouter-format model names (`google/gemini-3-flash-preview`) which AIBroker didn't recognize, so it fell back directly to OpenRouter.
> **Fixed**: Changed `SHRIMP_OPENROUTER_MODELS` to AIBroker model names: `gpt-4o-mini,claude-sonnet,llama-70b`. Now all Shrimp LLM traffic (agent loop + MCP tools) routes through AIBroker, fully consistent with the Path B architecture.
>
> You should comment out the `SHRIMP_OPENROUTER_MODELS` option so Shrimp falls back to AIBroker's models. Shrimp was meant to be usable without AI Broker, but this change makes AI Broker a hard requirement.
>
> The correct fix is to comment out `SHRIMP_OPENROUTER_MODELS` in the config.

Excellent idea, thanks for the feedback!

I will definitely implement this. I took note in issue #23. It is not trivial, so I will make sure it's done properly and you can review the code.

Thanks again.

Author
Owner

## Status Update — 2026-03-13

All core tasks from this issue are **implemented and verified** on herodev2 and herodemo2.

### What's done

Every task in the issue body is complete, implemented as `mcp_hero` MCP tools on the broker with Shrimp as the agent (per the architecture pivot discussed with @thabeta):

| Task | Implementation |
|------|----------------|
| OpenRPC → Python client generator | `mcp_hero` `register_service` — auto-discovers RPC methods, generates a typed Python client via LLM |
| OpenRPC → interface file generator | `mcp_hero` `get_interface` — lightweight AI-generated interface with typed stubs |
| Intent detection | `mcp_hero` `generate_code` + Shrimp agent loop orchestration |
| Python code generation | `mcp_hero` `generate_code` — feeds interface + intent to the LLM |
| Execute-in-a-loop with retry | `execute_with_retry` in `mcp_hero`, orchestrated by the Shrimp agent loop |
| Hash-based versioning | SHA-256 spec hash in the service cache, auto-regenerates on change |
| `uv` integration | Managed venv at `~/.hero/var/aibroker/python/`, Python 3.12, 30s timeout |
| Interface files per spec | Auto-generated per registered service |

Full chain verified: chat → model selection → agent loop → MCP tools → code execution → result displayed.
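The execution step with its 30s timeout can be sketched as follows, using `sys.executable` as a stand-in for `uv run` inside the managed venv (the real tool shells out to `uv`):

```python
import subprocess
import sys

def execute_code(source: str, timeout: int = 30):
    """Run a generated script with a hard timeout.

    Returns (exit_code, stdout, stderr) — the tuple the agent loop
    inspects to decide whether to retry with the error fed back in.
    """
    try:
        proc = subprocess.run(
            [sys.executable, "-c", source],  # stand-in for: uv run --python 3.12 ...
            capture_output=True, text=True, timeout=timeout,
        )
        return proc.returncode, proc.stdout, proc.stderr
    except subprocess.TimeoutExpired:
        return 124, "", f"timed out after {timeout}s"

code, out, err = execute_code("print(6 * 7)")
assert code == 0 and out.strip() == "42"

code, _, err = execute_code("raise SystemExit(3)")
assert code == 3  # non-zero exit → candidate for the retry loop
```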

### What remains

1. **@thabeta feedback ([comment](https://forge.ourworld.tf/lhumina_code/home/issues/18#issuecomment-11910))**: Comment out `SHRIMP_OPENROUTER_MODELS` — Shrimp should fetch models dynamically from AIBroker `/v1/models` instead of duplicating the model list. The current config makes AIBroker a hard requirement, which breaks standalone Shrimp use.
2. **Service registration cost**: ~$0.05-0.10 per service with `claude-sonnet`, ~3 min per service. Options: pre-generate at build time, use a cheaper model, or template-based code gen.
3. **Auto-discovery**: The agent doesn't know socket paths without manual input.

### Branches

- `hero_aibroker`: `development_timur` — PR [#25](https://forge.ourworld.tf/lhumina_code/hero_aibroker/pulls/25) open (needs devops review)
- `hero_shrimp`: `development_timur` — committed (no merge without devops review)
- `hero_services`: `development` — TOMLs + build pipeline updated
Reference
lhumina_code/home#18