AI Broker: OpenRPC-driven Python code generation and execution #18
Summary
Evolve the AI Broker into an intelligent code-generation broker that converts OpenRPC specs into lightweight Python clients and uses AI to generate and execute code based on user intent.
Design
Core Flow
Versioning
When an OpenRPC spec changes (detected via hash), the Python client and interface files are regenerated automatically.
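A minimal sketch of the hash-based change detection, assuming SHA-256 over canonical JSON (the issue doesn't specify the actual hash scheme):

```python
import hashlib
import json

def spec_hash(spec: dict) -> str:
    """Hash an OpenRPC spec; canonical JSON so key order doesn't change the hash."""
    canonical = json.dumps(spec, sort_keys=True).encode()
    return hashlib.sha256(canonical).hexdigest()

old_spec = {"openrpc": "1.2.6", "methods": []}
new_spec = {"openrpc": "1.2.6", "methods": [{"name": "ping"}]}

if spec_hash(new_spec) != spec_hash(old_spec):
    print("spec changed -> regenerate client + interface files")
```

Caching the hash alongside the generated artifacts means unchanged specs cost zero LLM calls on re-discovery.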
Why This Is Better Than MCP
Technical Details
- `uv` for Python execution

Tasks

From meeting notes 2026-03-11
Architecture Design
Three-Layer Architecture
The AI Broker evolves from an LLM proxy into an Agent & MCP Broker with three layers.
Core Flow (agent_hero)
- Entry point: `POST /runs/wait`
- `uv`: run the script, capture stdout/stderr

Smart Caching (Hash-Based)

- `rpc.discover` on each socket in `~/hero/var/sockets/`

Interface File Format
AI-generated, minimal, with summary line for reranking:
The reranker only reads SERVICE + SUMMARY lines. Full interface files loaded only for selected services.
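The reranker's header scan can be sketched as follows; the `SERVICE:`/`SUMMARY:` line format follows the description above, the function name is illustrative:

```python
def read_headers(interface_text: str) -> dict:
    """Extract only the SERVICE:/SUMMARY: header lines.
    The reranker never reads past them; full interface
    bodies are loaded only for selected services."""
    out = {}
    for line in interface_text.splitlines():
        if line.startswith("SERVICE:"):
            out["service"] = line[len("SERVICE:"):].strip()
        elif line.startswith("SUMMARY:"):
            out["summary"] = line[len("SUMMARY:"):].strip()
        if len(out) == 2:
            break  # both headers found; stop scanning
    return out

doc = "SERVICE: hero_redis_server\nSUMMARY: key-value operations over JSON-RPC\n\ndef get(key: str) -> str: ..."
print(read_headers(doc))
```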
Protocol Layering
ACP REST endpoints added to the existing `hero_aibroker_ui` Axum server.

Why Better Than MCP Alone
Implementation Plan
Crate Structure
Implementation Phases
Phase 1: Foundation — mcp_hero + service discovery
- New `crates/mcp/mcp_hero/` crate
- Scan `~/hero/var/sockets/*.sock`, call `rpc.discover`
- Cache under `~/.hero/var/aibroker/services/`
- `discover_services`, `list_services`, `get_interface` MCP tools
- Move `mcp-*` crates into the `crates/mcp/` directory

Phase 2: Code generation — Python client + interface generation

- `ingest_spec` MCP tool

Phase 3: Execution — uv runtime integration

- `uv` Python environment management (shared venv with base deps: httpx, pydantic)
- `execute_code` MCP tool
- Venv at `~/.hero/var/aibroker/python/`

Phase 4: agent_hero — the agent loop

- New `crates/agent/agent_hero/` crate

Phase 5: ACP integration — REST endpoints on UI server

- Endpoints on the `hero_aibroker_ui` Axum server
- `POST /agents/search`, `GET /agents/{id}`, `GET /agents/{id}/descriptor`
- `POST /runs/wait` (blocking execution), `POST /runs/stream` (streaming)
- `POST /threads` + `POST /threads/{id}/runs` (stateful conversations)
- Run statuses: `pending`, `success`, `error`, `timeout`

Phase 6: UI updates
Key Design Decisions
- `SERVICE:` and `SUMMARY:` headers for fast reranking

Next Steps — Phase 2: Fix Critical Gaps + Verify
Issues Found After Phase 1
LLM endpoint config — `agent_hero` makes `reqwest` HTTP calls to `/v1/chat/completions`, but the broker only listens on Unix sockets. Fix: the agent calls OpenRouter directly (not self-referential for now), configurable via `AGENT_LLM_ENDPOINT` + `AGENT_API_KEY`.

Client files not available to scripts — generated Python scripts import from `client.py`, but it lives in a different directory. Fix: the executor copies the required client files into the scripts directory before execution.

Agent config not configurable — hardcoded values. Fix: add `AGENT_LLM_ENDPOINT`, `AGENT_MODEL`, `AGENT_API_KEY` env vars to the `Config` struct.

Missing service interface endpoint — no way to view generated interface files. Fix: add a `GET /services/{name}/interface` endpoint.

Implementation Plan

- Point the agent at OpenRouter directly (`https://openrouter.ai/api/v1`)
- Add `AGENT_LLM_ENDPOINT`, `AGENT_MODEL`, `AGENT_API_KEY` to `Config`
- Add `GET /services/{name}/interface` and `GET /services/{name}/client` endpoints

Phase 2 Complete — Verification Results
All fixes implemented and verified:
1. LLM endpoint fixed — agent now calls OpenRouter directly (`https://openrouter.ai/api/v1` by default). Configurable via the `AGENT_LLM_ENDPOINT` env var. Falls back to the first `OPENROUTER_API_KEY` for auth.
2. Client files staging — executor now copies service client files into the scripts directory before execution via `stage_clients()`. Scripts run with `current_dir` set to the scripts dir so `import` statements resolve correctly.
3. Agent config via env vars — added to the `Config` struct:
   - `AGENT_LLM_ENDPOINT` — LLM API base URL (default: OpenRouter)
   - `AGENT_MODEL` — model name (default: `google/gemini-2.0-flash-001`)
   - `AGENT_API_KEY` — API key (falls back to first OpenRouter key)
4. Service artifact endpoints — added:
   - `GET /services/{name}/interface` — view generated interface file
   - `GET /services/{name}/client` — view generated client file
   - `GET /services/{name}/spec` — view cached OpenRPC spec

Build verification:
Full ACP endpoint list:
- `/agents/search`
- `/agents/{id}`
- `/agents/{id}/descriptor`
- `/runs/wait`
- `/services`
- `/services/discover`
- `/services/{name}/interface`
- `/services/{name}/client`
- `/services/{name}/spec`

New crate summary:
- `mcp_hero` — `crates/mcp/mcp_hero/`
- `agent_hero` — `crates/agent/agent_hero/`

Remaining for Phase 3:
- Streaming execution (`POST /runs/stream`)
- Stateful threads (`POST /threads`)

Phase 2b: Self-Referential LLM Access via Unix Socket
Problem
The agent currently calls OpenRouter directly via HTTP. But the broker itself already handles LLM routing (cheapest/best strategy, multi-provider, API key rotation). The agent should use the broker's own `/v1/chat/completions` endpoint over its Unix socket, making it self-referential.

Implementation
- New module (`unix_http`) in `mcp_hero` — uses `hyper` + `tokio::net::UnixStream` to make HTTP requests over Unix sockets
- Support both `unix:` and `https:` endpoints
- Default `AGENT_LLM_ENDPOINT` to `unix:~/hero/var/sockets/hero_aibroker_ui.sock`

Flow
This way the agent benefits from the broker's model registry, routing strategy, and API key management.
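For illustration, the same HTTP-over-Unix-socket call can be sketched with the Python standard library (the real implementation uses `hyper` + `tokio::net::UnixStream` in Rust; `post_json` here mirrors that idea, all names are illustrative):

```python
import http.client
import json
import socket

class UnixHTTPConnection(http.client.HTTPConnection):
    """HTTP/1.1 over a Unix domain socket -- a Python analogue of the
    hyper + UnixStream client described above."""

    def __init__(self, socket_path: str, timeout: float = 10.0):
        # Host is a placeholder; routing happens via the socket path.
        super().__init__("localhost", timeout=timeout)
        self.socket_path = socket_path

    def connect(self):
        sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        sock.settimeout(self.timeout)
        sock.connect(self.socket_path)
        self.sock = sock

def post_json(socket_path: str, path: str, body: dict, bearer_token=None) -> dict:
    """POST JSON over a Unix socket and return the parsed JSON response."""
    conn = UnixHTTPConnection(socket_path)
    headers = {"Content-Type": "application/json"}
    if bearer_token:
        headers["Authorization"] = f"Bearer {bearer_token}"
    try:
        conn.request("POST", path, json.dumps(body), headers)
        resp = conn.getresponse()
        return json.loads(resp.read())
    finally:
        conn.close()
```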
Phase 2b: Self-Referential LLM Access — Implemented ✓
What was done
The agent (`agent_hero`) can now call the broker's own LLM API through its Unix socket, making the system self-referential. This means the agent benefits from the broker's provider routing, API key management, model registry, and rate limiting instead of making direct calls to external APIs.

New modules
`mcp_hero::unix_http` — HTTP-over-Unix-socket client

- `hyper` + `tokio::net::UnixStream` for HTTP/1.1 over Unix domain sockets
- `post_json(socket_path, path, body, bearer_token)` — POST JSON and get a JSON response
- `expand_tilde(path)` — resolves `~/` to the home directory

`agent_hero::llm_client` — unified LLM client

- `call_llm(endpoint, model, api_key, messages, temperature, max_tokens)` — routes to Unix socket or HTTP(S)
- `unix:` prefix → uses `mcp_hero::unix_http` over the Unix socket
- otherwise → `reqwest` for standard HTTPS
- Helpers: `extract_content()`, `strip_code_fences()`

Changes
- `intent.rs` + `codegen.rs` — refactored to use `llm_client::call_llm()` instead of direct `reqwest` calls
- Default `AGENT_LLM_ENDPOINT` changed from `https://openrouter.ai/api/v1` to `unix:~/hero/var/sockets/hero_aibroker_ui.sock`
- Set the `AGENT_LLM_ENDPOINT=https://openrouter.ai/api/v1` env var for direct external access

Data flow (self-referential)
Build status
- `cargo build` ✓ (clean, 1 pre-existing warning)
- `cargo test` ✓ (24/24 tests pass)

Dependencies added
- `http-body-util = "0.1"` (workspace) — for hyper body types
- `hyper`, `hyper-util`, `bytes` added to `mcp_hero`'s `Cargo.toml`

Revised approach: MCP tools on the broker, Shrimp as the agent
After discussion with @thabeta and reconsidering the architecture, we're changing direction. The key insight: don't rebuild an agent loop inside the broker — use Shrimp's mature agent loop and expose the Hero service capabilities as MCP tools on the broker.
What was wrong with the previous approach
We built `agent_hero` with its own intent detection, code generation, retry loop, and ACP REST endpoints inside the broker. This duplicated what `hero_shrimp` already does better — a full agent loop with memory, tool routing, multi-model support, safety, retries, and multi-channel support (CLI/Telegram/WhatsApp).

New architecture
Key design decisions
Broker owns the service registry. Services are registered on the broker (via config, not auto-discovery) by socket path or URL. The broker calls `rpc.discover`, ingests the OpenRPC spec, generates Python clients + interface files via its own LLM, and caches with spec-hash-based regeneration.

MCP tools, not ACP endpoints. The capabilities are exposed as MCP tools on the broker's existing MCP server. Any MCP consumer (Shrimp, other agents) can use them.
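Because the capabilities sit behind the standard MCP surface, any consumer can invoke them with an ordinary `tools/call` request. A sketch of such a request (the tool name comes from this issue; the argument key is an assumption):

```python
import json

# Hypothetical MCP tools/call request a consumer (e.g. Shrimp) would send
# to the broker's MCP server to register a Hero service.
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "register_service",
        "arguments": {"socket_path": "~/hero/var/sockets/hero_redis_server.sock"},
    },
}
print(json.dumps(request, indent=2))
```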
Shrimp is the agent. Shrimp's agent loop handles orchestration — deciding which services to use, when to generate code, when to retry on failure, conversation memory, etc. We don't duplicate this in the broker.
Code-gen approach preserved. The core idea from the original issue remains: instead of exposing every RPC method as an individual MCP tool (many round trips), we generate a Python script that calls multiple service methods in one execution. The LLM sees lightweight interface files, generates a complete script, and the script runs in a managed Python venv.
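The batching idea can be sketched with a stub: one generated script making several JSON-RPC calls, with a fake transport standing in for the service socket (all names are illustrative, not the actual generated API):

```python
class JsonRpcClient:
    """Minimal stand-in for a generated client; a real one would send
    each request over the service's Unix socket."""

    def __init__(self, transport):
        self._send = transport  # callable: request dict -> response dict
        self._id = 0

    def call(self, method: str, **params):
        self._id += 1
        resp = self._send({"jsonrpc": "2.0", "id": self._id,
                           "method": method, "params": params})
        return resp["result"]

# Fake transport standing in for the socket connection
def fake_transport(req: dict) -> dict:
    return {"jsonrpc": "2.0", "id": req["id"], "result": f"ok:{req['method']}"}

client = JsonRpcClient(fake_transport)
# Three method calls, one script execution -- instead of three MCP round trips
results = [client.call("set", key="a", value="1"),
           client.call("get", key="a"),
           client.call("keys")]
print(results)
```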
Self-referential LLM. The `hero_generate_code` MCP tool calls the broker's own LLM API (via `hero_aibroker_sdk`) for code generation. The broker routes this to the configured provider. The agent (Shrimp) doesn't need to know about this — it just calls the MCP tool.

What stays from current work
- `mcp_hero` crate — service discovery, spec caching, Python executor, Unix socket HTTP client
- Useful parts of `agent_hero` (folded into `mcp_hero`)

What goes away
- `agent_hero` crate (agent loop, intent detection, ACP interface)
- `hero_aibroker_ctl` (already deleted)

Next steps
- Keep `mcp_hero`, wire LLM calls through `hero_aibroker_sdk`
- Add `HERO_SERVICES` to broker config (list of socket paths / URLs)
- Expose `mcp_hero` capabilities as MCP tools on the broker's existing MCP server
- Remove `agent_hero` and the ACP endpoints

Implementation update — 2026-03-12
All core tasks from this issue are implemented. Following discussion with Ahmed (hero_shrimp), we pivoted the architecture slightly: rather than an agent loop inside the broker, the broker exposes Hero service capabilities as MCP tools via a dedicated `mcp_hero` stdio server. Shrimp becomes the agent loop.

What was built
`mcp_hero` stdio JSON-RPC MCP server (`hero_aibroker` — `development_timur`):

- `register_service` — calls `rpc.discover` on a socket, stores the spec, generates Python client + interface via LLM
- `list_services`
- `get_interface`
- `generate_code`
- `execute_code` — runs via `uv` in a managed venv, returns stdout/stderr/exit_code
- Hash-based caching — `register_service` detects changes and regenerates
- `uv`, shared venv at `~/.hero/var/aibroker/python/`
- LLM calls over the broker's Unix socket (`unix:~/hero/var/sockets/hero_aibroker_ui.sock`) or direct HTTPS
- `agent_hero` crate and ACP endpoints removed from the broker (Shrimp owns the agent loop now)

Shrimp integration (`hero_shrimp` — `development_timur`):

- `examples/skills/hero_services.skill.md` — Shrimp skill with YAML frontmatter guiding the agent through the register → interface → generate → execute workflow
- `examples/mcp.json.hero_example` — drop-in workspace MCP config template

Remaining
- Retry loop (after a failed `generate_code`) — `PythonExecutor::execute_with_retry` exists, needs wiring in a Shrimp skill or a dedicated `run_intent` tool

Relevant PRs: hero_aibroker#development_timur
mik-tf referenced this issue2026-03-12 16:20:47 +00:00
MCP Integration Status Update (from #23 Session 3)
The MCP integration between Hero Shrimp and AIBroker is complete and verified on herodev2.
Architecture Decision: Path B (broker-mediated)
After evaluating both approaches discussed in this issue, we chose Path B — Shrimp discovers and calls MCP tools via AIBroker's REST endpoints through hero_proxy, rather than spawning `mcp_hero` as a direct stdio child process.

Why Path B over Path A:
What's Working
All 5 `mcp_hero` tools are live and accessible through the broker:

- `register_service` — register a Hero service by socket path, auto-discovers all RPC methods
- `list_services` — list registered services and their method counts
- `get_interface` — get the typed Python interface for a service
- `generate_code` — LLM-powered Python code generation against service interfaces
- `execute_code` — run generated Python code via `uv`

Verified: registered `hero_redis_server` (20 methods) and retrieved its full typed interface.

Branches
- `hero_aibroker:development_timur` — config fixes, mcp_hero wiring
- `hero_shrimp:development_timur` — endpoint config
- `hero_services:development` — TOML + build pipeline updates

No merges to `development` without devops review. Full details in #23.

Follow-up Fix: Unified LLM Routing
Shrimp's agent loop was bypassing AIBroker — it used OpenRouter-format model names (`google/gemini-3-flash-preview`) which AIBroker didn't recognize, so it fell back directly to OpenRouter.

Fixed: changed `SHRIMP_OPENROUTER_MODELS` to AIBroker model names: `gpt-4o-mini`, `claude-sonnet`, `llama-70b`. Now all Shrimp LLM traffic (agent loop + MCP tools) routes through AIBroker, fully consistent with the Path B architecture.

Verified on herodev2 — the Shrimp Config tab shows `gpt-4o-mini` as primary, with `claude-sonnet` and `llama-70b` as fallbacks.

You should comment out the `SHRIMP_OPENROUTER_MODELS` option so it falls through to the AIBroker models. Shrimp was meant to be usable without AIBroker, but this change makes AIBroker a hard requirement.

The correct fix is to comment out `SHRIMP_OPENROUTER_MODELS` in the config.
Session 3 Progress Update — Shrimp Chat UI + MCP Integration
What's Working
- Model selector: `claude-sonnet` (default), `gpt-4o`, `claude-haiku`, `llama-70b`
- Shrimp reaches `mcp_hero` tools via `MCP_BROKER_ENDPOINT`
- `claude-sonnet` successfully calls MCP tools (register, list, get_interface, generate_code, execute_code)

Limitations Found
1. Service registration is expensive and slow
`register_service` makes 2-3 LLM calls internally (generate Python client + interface from the OpenRPC spec). For `hero_redis_server` (20 methods), this took ~3 minutes and cost $0.05-0.10 with `claude-sonnet`. Auto-registering 15+ services at container startup would cost $1-2 per deployment.

2. Agent doesn't know socket paths
When asked "register hero_redis", the agent guesses paths like `~/hero/var/sockets/hero_redis.sock`, but the actual name is `hero_redis_server.sock`. Without a service discovery mechanism, users must provide exact socket paths.

3. No auto-registration
Services need to be manually registered before they can be listed/queried. The cache is empty after every fresh container restart.
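Directory-based auto-discovery of running services could be sketched as a simple glob over the sockets directory (the `*_server.sock` naming convention comes from this thread; the function name is illustrative):

```python
from pathlib import Path

def discover_sockets(sockets_dir: str) -> list[str]:
    """Glob a sockets directory for *_server.sock entries, so each can
    be handed to register_service without the user typing exact paths."""
    return sorted(str(p)
                  for p in Path(sockets_dir).expanduser().glob("*_server.sock"))
```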
Options to Discuss
For auto-registration cost:
- Option A: pre-generate clients at build time — `mcp_hero` loads from cache, zero LLM calls at runtime.
- Option B: use a cheaper model (`llama-70b` on Groq at $0.59/M tokens instead of `claude-sonnet` at $15/M). Faster too.
- Option C: template-based code generation — no LLM calls at all (see the recommendation below).

For service discovery:

- Option A: a startup script scans `/root/hero/var/sockets/*_server.sock` and registers each.
- Option B: `mcp_hero` gets a `--sockets-dir` flag to auto-discover on launch.
- Option C: `hero_services_server` provides a manifest of running services + socket paths that `mcp_hero` can query.

Recommendation: Option C for both — template-based code gen + `hero_services_server` manifest. No LLM costs for registration, instant startup, deterministic Python clients. The LLM is only used when the user asks `generate_code` for a specific task.

Current Branches
- `hero_shrimp:development_timur` — chat UI, model selector, expandable rows
- `hero_services:development` — 4-model config, TOML updates
- `hero_aibroker:development_timur` — unchanged from Session 3

Ref: #23 Session 4
@thabeta wrote in #18 (comment):
Excellent idea thanks for the feedback!!
I will definitely implement this.
I took note in issue #23. It is not trivial so I will make sure that it's done properly and you can review the code.
Thanks again.
Status Update — 2026-03-13
All core tasks from this issue are implemented and verified on herodev2 and herodemo2.
What's done
Every task in the issue body is complete, implemented as `mcp_hero` MCP tools on the broker with Shrimp as the agent (per the architecture pivot discussed with @thabeta):

- `mcp_hero register_service` — auto-discovers RPC methods, generates a typed Python client via LLM
- `mcp_hero get_interface` — lightweight AI-generated interface with typed stubs
- `mcp_hero generate_code` + Shrimp agent loop orchestration
- `mcp_hero generate_code` — feeds interface + intent to the LLM
- `execute_with_retry` in mcp_hero, orchestrated by the Shrimp agent loop
- `uv` integration — shared venv at `~/.hero/var/aibroker/python/`, Python 3.12, 30s timeout

Full chain verified: chat → model selection → agent loop → MCP tools → code execution → result displayed.
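The execution step can be sketched as a thin subprocess wrapper; the `uv` command shape and 30s default are taken from this thread, while the `runner` parameter is an addition so the sketch also runs with a plain interpreter:

```python
import subprocess

def execute_code(script_path: str, timeout: int = 30,
                 runner: tuple = ("uv", "run")) -> dict:
    """Run a generated script and capture its output. The real executor
    invokes uv against the shared venv; runner is parameterised here so
    the sketch can fall back to a plain Python interpreter."""
    proc = subprocess.run(
        [*runner, script_path],
        capture_output=True, text=True, timeout=timeout,
    )
    return {"stdout": proc.stdout, "stderr": proc.stderr,
            "exit_code": proc.returncode}
```

A timeout raises `subprocess.TimeoutExpired`, which the caller can map onto a `timeout` run status.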
What remains
@thabeta feedback (comment): comment out `SHRIMP_OPENROUTER_MODELS` — Shrimp should fetch models dynamically from AIBroker's `/v1/models` instead of duplicating the model list. The current config makes AIBroker a hard requirement, which breaks standalone Shrimp use.

Service registration cost: ~$0.05-0.10 per service with `claude-sonnet`, ~3 min per service. Options: pre-generate at build time, use a cheaper model, or template-based code gen.
Auto-discovery: Agent doesn't know socket paths without manual input.
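The template-based option mentioned above could look like deterministic rendering of a client method from an OpenRPC method descriptor, with no LLM involved (the emitted `self._call` shape is illustrative, not the actual generated API):

```python
def render_method(method: dict) -> str:
    """Render one Python client method from an OpenRPC method descriptor.
    Pure string templating: deterministic output, zero LLM cost."""
    names = [p["name"] for p in method.get("params", [])]
    sig = ", ".join(["self"] + names)
    args = ", ".join(f'"{n}": {n}' for n in names)
    return (f'    def {method["name"]}({sig}):\n'
            f'        return self._call("{method["name"]}", {{{args}}})\n')

print(render_method({"name": "get", "params": [{"name": "key"}]}))
```

Looping `render_method` over a spec's `methods` array would yield the whole client body, so registration becomes a cache write rather than an LLM call.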
Branches
- `hero_aibroker:development_timur` — PR #25 open (needs devops review)
- `hero_shrimp:development_timur` — committed (no merge without devops review)
- `hero_services:development` — TOMLs + build pipeline updated