Agent loops on malformed tool-call JSON; raw JSON blocks render expanded in chat #91

Closed
opened 2026-06-07 13:21:24 +00:00 by rawan · 4 comments
Member

Symptom

After a normal reply (e.g. say "hello"), the agent starts emitting repeated file_write tool calls and loops, with reasoning like "the system is telling me my file_write call didn't actually execute." The calls also render as fully-expanded raw json code blocks in the chat (see screenshot).

Root cause

The model emits the wrong tool-call schema:

[{"tool": "file_write", "path": "hello.py", "content": "..."}]

but the recovery parser expects:

{"name": "file_write", "arguments": {"path": "hello.py", "content": "..."}}

No name key → call is silently dropped in tool_call_recovery_module.rs (~L143-146) → no execution → no role: "tool" result appended (result_processing_loop.rs L401-409) → model sees no response, assumes it failed, retries forever (until repetition/iteration guards trip).

Because the calls are never recognized as tool calls, they stay in the assistant's text as ```json fences and render via Markdown.tsx CodeBlock (no collapse) → always expanded. The UI is correct; the upstream misclassification is the bug.

Fix

  1. Prompt (orchestration/autonomy/prompt_builder.rs): steer the model to the {name, arguments} schema with explicit examples.
  2. Parser (agent_core/agent/tool_call_recovery.rs): tolerate the {"tool":..., <flat args>} dialect (map tool->name, collect remaining keys into arguments).

image

## Symptom After a normal reply (e.g. say "hello"), the agent starts emitting repeated `file_write` tool calls and loops, with reasoning like "the system is telling me my file_write call didn't actually execute." The calls also render as fully-expanded raw `json` code blocks in the chat (see screenshot). ## Root cause The model emits the wrong tool-call schema: ```json [{"tool": "file_write", "path": "hello.py", "content": "..."}] ``` but the recovery parser expects: ```json {"name": "file_write", "arguments": {"path": "hello.py", "content": "..."}} ``` No `name` key → call is silently dropped in `tool_call_recovery_module.rs` (~L143-146) → no execution → no `role: "tool"` result appended (`result_processing_loop.rs` L401-409) → model sees no response, assumes it failed, retries forever (until repetition/iteration guards trip). Because the calls are never recognized as tool calls, they stay in the assistant's text as ` ```json ` fences and render via `Markdown.tsx` `CodeBlock` (no collapse) → always expanded. The UI is correct; the upstream misclassification is the bug. ## Fix 1. **Prompt** (`orchestration/autonomy/prompt_builder.rs`): steer the model to the `{name, arguments}` schema with explicit examples. 2. **Parser** (`agent_core/agent/tool_call_recovery.rs`): tolerate the `{"tool":..., <flat args>}` dialect (map `tool`->`name`, collect remaining keys into `arguments`). ![image](/attachments/1e7d3df0-d605-4859-a15f-a0ad27ad8ab3)
rawan self-assigned this 2026-06-08 14:11:53 +00:00
Author
Member

Implementation Spec for Issue #91

Objective

Stop the agent from looping when a model emits the malformed tool-call dialect [{"tool":"file_write","path":"...","content":"..."}]. Achieve this by (1) teaching the recovery parser to recognize the {"tool": ..., <flat args>} dialect so the call is actually executed and a role:"tool" result is appended, and (2) steering the model toward the canonical {name, arguments} schema in the executor prompt with explicit examples.

Root cause (verified)

  • The bare-JSON recovery path runs through recover_tool_calls_from_content -> tool_call_from_value in tool_call_recovery.rs. The actual drop happens in tool_call_from_value (around L374-411).
  • tool_call_from_value only accepts two shapes: nested {"function":{"name",...}} and bare {"name":..., "arguments":...}. Both require a name key. The {"tool":..., <flat args>} shape has neither, so it is silently dropped.
  • Dropped -> no ToolCall promoted -> nothing executed -> no role:"tool" message pushed (result_processing_loop.rs L401-409). Model sees no tool result, assumes failure, retries forever.
  • The executor prompt (prompt_builder.rs::build_lean_executor_prompt) references tool names but never states the JSON wire schema, so a model not using native tool-calling guesses the shape.

Requirements

  • Parser must map a {"tool": "<name>", ...} object to a ToolCall with function.name = <name> and function.arguments = JSON of the remaining keys.
  • Reserved/meta keys (tool, name, function, arguments, id, tool_call_id, type) must NOT be folded into the synthesized arguments.
  • Existing accepted shapes ({name,arguments}, {function:{...}}, arrays, code-fenced, repaired JSON) must keep working unchanged.
  • The new dialect must be the lowest-priority branch so it never shadows the canonical shapes.
  • Empty/whitespace tool value must be rejected.
  • Prompt must explicitly show the canonical {"name": "...", "arguments": {...}} schema and warn against the flat {"tool":...} form.
  • New parser behavior covered by unit tests.

Files to Modify

  • crates/hero_shrimp_engine/src/agent_core/agent/tool_call_recovery.rs - Add a third branch to tool_call_from_value handling the flat-args dialect; update the module doc comment; add unit tests.
  • crates/hero_shrimp_engine/src/orchestration/autonomy/prompt_builder.rs - Add a tool-call schema directive to the executor prompt in build_lean_executor_prompt.

No new files. No changes needed to result_processing_loop.rs or tool_call_recovery_module.rs - they work correctly once the parser recognizes the call.

Implementation Plan

Step 1: Parser - accept the flat {"tool":..., <flat args>} dialect

Files: crates/hero_shrimp_engine/src/agent_core/agent/tool_call_recovery.rs

  • Make the bare {name, arguments} branch conditional on obj.contains_key("name") so a missing name falls through instead of short-circuiting the whole function.
  • Add a new final branch: require non-empty tool string (reuse nonempty_name); clone the object map and remove reserved keys (tool, name, function, arguments, id, tool_call_id, type); serialize the remainder as arguments; derive the call id from obj.get("id") or recovered_id(fallback_idx); return the ToolCall.
    Dependencies: none

Step 2: Parser - update module doc + add tests

Files: crates/hero_shrimp_engine/src/agent_core/agent/tool_call_recovery.rs

  • Update the "Shapes accepted" doc block to mention the flat dialect.
  • Add tests: flat dialect in array, flat dialect single object, reserved-key exclusion (id used as call id), empty-name dropped, canonical name wins over tool.
    Dependencies: Step 1

Step 3: Prompt - state the canonical tool-call schema

Files: crates/hero_shrimp_engine/src/orchestration/autonomy/prompt_builder.rs

  • In build_lean_executor_prompt, add a short directive to the shared trailing template so all modes get it once: the only accepted JSON shape is {"name": "<tool_name>", "arguments": { ... }} (or an array of these); include a concrete file_write example; warn against emitting {"tool":...} flat fields or wrapping tool calls in json fences.
    Dependencies: none (independent of Steps 1-2)

Step 4: Build & test

  • cargo build -p hero_shrimp_engine
  • cargo test -p hero_shrimp_engine tool_call_recovery
  • cargo test -p hero_shrimp_engine prompt_builder

Acceptance Criteria

  • tool_call_from_value returns a ToolCall for {"tool":"file_write","path":"a","content":"b"} with name file_write and arguments containing path/content but not tool.
  • Array form yields one call per element.
  • Reserved keys excluded from synthesized arguments; an id value, if present, becomes the call id.
  • Empty/whitespace tool value yields no call.
  • When both name and tool are present, the canonical name shape is used.
  • All pre-existing tests in tool_call_recovery.rs still pass.
  • Executor prompt contains the canonical schema with an example and an explicit warning.
  • cargo build and cargo test -p hero_shrimp_engine pass.

Notes

  • The caller validates recovered names against the live tool catalog, so a bogus tool value is harmless.
  • The new branch must remain lowest priority so canonical shapes are never shadowed.
  • Keep the prompt directive brief and provider-neutral to avoid undercutting the trivial fast path.
## Implementation Spec for Issue #91 ### Objective Stop the agent from looping when a model emits the malformed tool-call dialect `[{"tool":"file_write","path":"...","content":"..."}]`. Achieve this by (1) teaching the recovery parser to recognize the `{"tool": ..., <flat args>}` dialect so the call is actually executed and a `role:"tool"` result is appended, and (2) steering the model toward the canonical `{name, arguments}` schema in the executor prompt with explicit examples. ### Root cause (verified) - The bare-JSON recovery path runs through `recover_tool_calls_from_content` -> `tool_call_from_value` in `tool_call_recovery.rs`. The actual drop happens in `tool_call_from_value` (around L374-411). - `tool_call_from_value` only accepts two shapes: nested `{"function":{"name",...}}` and bare `{"name":..., "arguments":...}`. Both require a `name` key. The `{"tool":..., <flat args>}` shape has neither, so it is silently dropped. - Dropped -> no `ToolCall` promoted -> nothing executed -> no `role:"tool"` message pushed (`result_processing_loop.rs` L401-409). Model sees no tool result, assumes failure, retries forever. - The executor prompt (`prompt_builder.rs::build_lean_executor_prompt`) references tool names but never states the JSON wire schema, so a model not using native tool-calling guesses the shape. ### Requirements - Parser must map a `{"tool": "<name>", ...}` object to a `ToolCall` with `function.name = <name>` and `function.arguments` = JSON of the remaining keys. - Reserved/meta keys (`tool`, `name`, `function`, `arguments`, `id`, `tool_call_id`, `type`) must NOT be folded into the synthesized `arguments`. - Existing accepted shapes (`{name,arguments}`, `{function:{...}}`, arrays, code-fenced, repaired JSON) must keep working unchanged. - The new dialect must be the lowest-priority branch so it never shadows the canonical shapes. - Empty/whitespace `tool` value must be rejected. - Prompt must explicitly show the canonical `{"name": "...", "arguments": {...}}` schema and warn against the flat `{"tool":...}` form. - New parser behavior covered by unit tests. ### Files to Modify - `crates/hero_shrimp_engine/src/agent_core/agent/tool_call_recovery.rs` - Add a third branch to `tool_call_from_value` handling the flat-args dialect; update the module doc comment; add unit tests. - `crates/hero_shrimp_engine/src/orchestration/autonomy/prompt_builder.rs` - Add a tool-call schema directive to the executor prompt in `build_lean_executor_prompt`. No new files. No changes needed to `result_processing_loop.rs` or `tool_call_recovery_module.rs` - they work correctly once the parser recognizes the call. ### Implementation Plan #### Step 1: Parser - accept the flat `{"tool":..., <flat args>}` dialect Files: `crates/hero_shrimp_engine/src/agent_core/agent/tool_call_recovery.rs` - Make the bare `{name, arguments}` branch conditional on `obj.contains_key("name")` so a missing `name` falls through instead of short-circuiting the whole function. - Add a new final branch: require non-empty `tool` string (reuse `nonempty_name`); clone the object map and remove reserved keys (`tool`, `name`, `function`, `arguments`, `id`, `tool_call_id`, `type`); serialize the remainder as `arguments`; derive the call `id` from `obj.get("id")` or `recovered_id(fallback_idx)`; return the `ToolCall`. Dependencies: none #### Step 2: Parser - update module doc + add tests Files: `crates/hero_shrimp_engine/src/agent_core/agent/tool_call_recovery.rs` - Update the "Shapes accepted" doc block to mention the flat dialect. - Add tests: flat dialect in array, flat dialect single object, reserved-key exclusion (id used as call id), empty-name dropped, canonical `name` wins over `tool`. Dependencies: Step 1 #### Step 3: Prompt - state the canonical tool-call schema Files: `crates/hero_shrimp_engine/src/orchestration/autonomy/prompt_builder.rs` - In `build_lean_executor_prompt`, add a short directive to the shared trailing template so all modes get it once: the only accepted JSON shape is `{"name": "<tool_name>", "arguments": { ... }}` (or an array of these); include a concrete `file_write` example; warn against emitting `{"tool":...}` flat fields or wrapping tool calls in json fences. Dependencies: none (independent of Steps 1-2) #### Step 4: Build & test - `cargo build -p hero_shrimp_engine` - `cargo test -p hero_shrimp_engine tool_call_recovery` - `cargo test -p hero_shrimp_engine prompt_builder` ### Acceptance Criteria - [ ] `tool_call_from_value` returns a `ToolCall` for `{"tool":"file_write","path":"a","content":"b"}` with name `file_write` and arguments containing `path`/`content` but not `tool`. - [ ] Array form yields one call per element. - [ ] Reserved keys excluded from synthesized arguments; an `id` value, if present, becomes the call id. - [ ] Empty/whitespace `tool` value yields no call. - [ ] When both `name` and `tool` are present, the canonical `name` shape is used. - [ ] All pre-existing tests in `tool_call_recovery.rs` still pass. - [ ] Executor prompt contains the canonical schema with an example and an explicit warning. - [ ] `cargo build` and `cargo test -p hero_shrimp_engine` pass. ### Notes - The caller validates recovered names against the live tool catalog, so a bogus `tool` value is harmless. - The new branch must remain lowest priority so canonical shapes are never shadowed. - Keep the prompt directive brief and provider-neutral to avoid undercutting the trivial fast path.
Author
Member

Test Results

  • Toolchain: rustc 1.96, cargo test -p hero_shrimp_engine
  • Total: 1746
  • Passed: 1746
  • Failed: 0

All tests pass, including the 5 new tool-call recovery tests for the flat {tool, ...} dialect (recovers_flat_dialect_in_array, recovers_flat_dialect_bare_object, flat_dialect_excludes_reserved_keys, flat_dialect_empty_tool_is_dropped, canonical_name_wins_over_flat_tool) and the existing prompt_builder tests.

## Test Results - Toolchain: rustc 1.96, `cargo test -p hero_shrimp_engine` - Total: 1746 - Passed: 1746 - Failed: 0 All tests pass, including the 5 new tool-call recovery tests for the flat `{tool, ...}` dialect (recovers_flat_dialect_in_array, recovers_flat_dialect_bare_object, flat_dialect_excludes_reserved_keys, flat_dialect_empty_tool_is_dropped, canonical_name_wins_over_flat_tool) and the existing prompt_builder tests.
Author
Member

Implementation Summary

Changes

crates/hero_shrimp_engine/src/agent_core/agent/tool_call_recovery.rs

  • tool_call_from_value: the bare {name, arguments} branch is now conditional on the presence of a name key (falls through instead of short-circuiting when absent).
  • Added a new lowest-priority branch that recognizes the flat dialect {"tool": "<name>", <flat args>}: maps tool -> name, collects the remaining keys into arguments, and excludes reserved keys (tool, name, function, arguments, id, tool_call_id, type). The call id is taken from an id field if present, otherwise generated. Canonical {name, ...} and nested {function:{...}} shapes still take priority.
  • Updated the "Shapes accepted" module doc comment.
  • Added 5 unit tests: recovers_flat_dialect_in_array, recovers_flat_dialect_bare_object, flat_dialect_excludes_reserved_keys, flat_dialect_empty_tool_is_dropped, canonical_name_wins_over_flat_tool.

crates/hero_shrimp_engine/src/orchestration/autonomy/prompt_builder.rs

  • build_lean_executor_prompt: added a single TOOL CALLS directive to the shared trailing prompt template (applies to all modes once). It states the canonical accepted shape {"name": "<tool_name>", "arguments": { ... }} with a file_write example, and warns against the flat {"tool": ...} form and against wrapping tool calls in json code fences.

Effect

A model that emits [{"tool":"file_write","path":"...","content":"..."}] is now parsed into an executable tool call, so it runs once and a role:"tool" result is appended — breaking the retry loop. The prompt change reduces the chance the malformed dialect is produced in the first place. Because the calls are now classified as tool calls rather than left as assistant text, they no longer render as expanded raw json blocks in chat.

Tests

  • cargo test -p hero_shrimp_engine (rustc 1.96): 1746 passed, 0 failed.

Notes

  • result_processing_loop.rs and tool_call_recovery_module.rs required no changes — they work correctly once the parser recognizes the call.
  • The workspace requires rustc 1.96 (the default 1.95 toolchain is rejected by rust-version); use cargo +1.96.
## Implementation Summary ### Changes **`crates/hero_shrimp_engine/src/agent_core/agent/tool_call_recovery.rs`** - `tool_call_from_value`: the bare `{name, arguments}` branch is now conditional on the presence of a `name` key (falls through instead of short-circuiting when absent). - Added a new lowest-priority branch that recognizes the flat dialect `{"tool": "<name>", <flat args>}`: maps `tool` -> name, collects the remaining keys into `arguments`, and excludes reserved keys (`tool`, `name`, `function`, `arguments`, `id`, `tool_call_id`, `type`). The call `id` is taken from an `id` field if present, otherwise generated. Canonical `{name, ...}` and nested `{function:{...}}` shapes still take priority. - Updated the "Shapes accepted" module doc comment. - Added 5 unit tests: `recovers_flat_dialect_in_array`, `recovers_flat_dialect_bare_object`, `flat_dialect_excludes_reserved_keys`, `flat_dialect_empty_tool_is_dropped`, `canonical_name_wins_over_flat_tool`. **`crates/hero_shrimp_engine/src/orchestration/autonomy/prompt_builder.rs`** - `build_lean_executor_prompt`: added a single TOOL CALLS directive to the shared trailing prompt template (applies to all modes once). It states the canonical accepted shape `{"name": "<tool_name>", "arguments": { ... }}` with a `file_write` example, and warns against the flat `{"tool": ...}` form and against wrapping tool calls in json code fences. ### Effect A model that emits `[{"tool":"file_write","path":"...","content":"..."}]` is now parsed into an executable tool call, so it runs once and a `role:"tool"` result is appended — breaking the retry loop. The prompt change reduces the chance the malformed dialect is produced in the first place. Because the calls are now classified as tool calls rather than left as assistant text, they no longer render as expanded raw json blocks in chat. ### Tests - `cargo test -p hero_shrimp_engine` (rustc 1.96): 1746 passed, 0 failed. ### Notes - `result_processing_loop.rs` and `tool_call_recovery_module.rs` required no changes — they work correctly once the parser recognizes the call. - The workspace requires rustc 1.96 (the default 1.95 toolchain is rejected by `rust-version`); use `cargo +1.96`.
Author
Member

Model generation issue, not shrimp issue.

Model generation issue, not shrimp issue.
rawan closed this issue 2026-06-08 15:02:11 +00:00
Sign in to join this conversation.
No milestone
No project
No assignees
1 participant
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference
lhumina_code/hero_shrimp#91
No description provided.