lhumina_code/hero_shrimp

Fork 0

[P1] `unwrap()`/`expect()` in recovery hot paths (force_write, tool_call_recovery) can panic #42

New issue

Closed

opened 2026-05-23 21:52:26 +00:00 by thabeta · 6 comments

thabeta commented

2026-05-23 21:52:26 +00:00

Owner

Problem
File-creation/path code in force_write.rs and JSON parsing in tool_call_recovery.rs use unwrap() on fallible operations. A broken workspace, permission error, or malformed model tool-args can panic the recovery path instead of escalating gracefully.

Evidence

crates/hero_shrimp_engine/src/agent_core/agent/force_write.rs (multiple .unwrap()).
crates/hero_shrimp_engine/src/agent_core/agent/tool_call_recovery.rs (serde_json::from_str(...).unwrap()).

Proposed fix
Return typed errors + escalate; a recovery routine must never crash the run.

Filed from a comparative audit of Hero Shrimp vs Qwen-Code / kimi-cli / picoclaw (2026-05-23). Severity in title: P0=correctness/trust, P1=reliability/UX, P2=cleanup.

**Problem** File-creation/path code in `force_write.rs` and JSON parsing in `tool_call_recovery.rs` use `unwrap()` on fallible operations. A broken workspace, permission error, or malformed model tool-args can panic the recovery path instead of escalating gracefully. **Evidence** - `crates/hero_shrimp_engine/src/agent_core/agent/force_write.rs` (multiple `.unwrap()`). - `crates/hero_shrimp_engine/src/agent_core/agent/tool_call_recovery.rs` (`serde_json::from_str(...).unwrap()`). **Proposed fix** Return typed errors + escalate; a recovery routine must never crash the run. --- _Filed from a comparative audit of Hero Shrimp vs Qwen-Code / kimi-cli / picoclaw (2026-05-23). Severity in title: P0=correctness/trust, P1=reliability/UX, P2=cleanup._

thabeta added the

type_bug

label

2026-05-23 21:52:26 +00:00

rawan self-assigned this

2026-05-25 13:50:58 +00:00

rawan commented

2026-05-25 13:57:15 +00:00

Member

Implementation Spec for Issue #42

Objective

Eliminate every unwrap() and expect() call on a fallible operation in the two recovery hot paths (force_write.rs and tool_call_recovery.rs) so that a broken workspace, a permission error, or malformed model tool-args degrade gracefully (no-op + trace event) instead of panicking the run.

Requirements

No .unwrap() / .expect(...) on a fallible operation may remain in the production (non-#[cfg(test)]) code of force_write.rs or tool_call_recovery.rs.
Recovery routines must NEVER crash the agent loop. On parse / IO / regex failure they must return a neutral value (false, None, Vec::new(), or unchanged input) plus a tracing::warn! for observability.
Public function signatures must NOT change. Callers (no_tool_calls.rs::handle_no_tool_calls calling force_write_guard, tool_call_recovery_module.rs::lift_recovered_tool_calls calling the recovery helpers) treat these functions as infallible; converting them to Result would ripple through the loop and is out of scope. "Escalate gracefully" here means: silently no-op, log the failure, and let the existing escalation machinery downstream (stuck_escalation::escalate_if_ending_incomplete, the zero-deliverables proof_run check, the iteration budget) take over.
All existing tests must still pass; tests themselves are NOT part of this scope (the unwrap()s inside #[cfg(test)] mod tests blocks are acceptable test code and are out of scope for this issue).

Findings From Codebase Audit

The issue body says force_write.rs has "multiple .unwrap()" and tool_call_recovery.rs has serde_json::from_str(...).unwrap(). Current state on development:

force_write.rs: ZERO production unwrap()/expect() calls. All 20 unwrap() occurrences (lines 210, 213, 224, 233, 234, 238, 250, 262, 274, 287, 295, 307, 326, 374, 398, 413, 430, 435, 440, 451) are inside #[cfg(test)] mod tests (starts at line 198).
tool_call_recovery.rs: ZERO serde_json::from_str(...).unwrap() in production. The production code at line 492 already uses .unwrap_or(serde_json::Value::Object(serde_json::Map::new())). The only production-code expect() calls are on regex compilation at lines 457, 480, 484 — these are arguably infallible-by-construction (constant regex literals), but the issue's intent ("a recovery routine must never crash the run") still applies: a panic on regex compile is the only remaining panic path in this file.

So the substantive production-code work is: replace the three regex expect() calls in tool_call_recovery.rs with non-panicking fallbacks; everything else is verification that the test-only unwrap()s remain test-only.

Files to Modify

crates/hero_shrimp_engine/src/agent_core/agent/tool_call_recovery.rs — replace the three production expect() calls on Regex::new(...) at lines 457, 480, 484 with OnceLock<Option<Regex>> patterns that degrade gracefully on regex compilation failure.
crates/hero_shrimp_engine/src/agent_core/agent/force_write.rs — verify only (no production changes needed). All existing unwrap()s are inside #[cfg(test)].

Implementation Plan

Step 1: Make `strip_bracket_tool_calls` regex-failure safe

Files: crates/hero_shrimp_engine/src/agent_core/agent/tool_call_recovery.rs

Current code (lines 453-470):

pub fn strip_bracket_tool_calls(content: &str) -> (String, Vec<String>) {
    let re = Regex::new(r"(?s)\[TOOL_CALL\](.*?)\[/TOOL_CALL\]")
        .expect("strip_bracket_tool_calls regex is valid");
    ...
}

Subtasks:

Replace the Regex::new(...).expect(...) on lines 456-457 with a OnceLock<Option<Regex>> pattern that, on Err, logs via tracing::warn! and returns (content.to_string(), Vec::new()) — semantically equivalent to "no bracket blocks found", which is the documented neutral fallback.
Hoist the compiled regex into a static BRACKET_RE: std::sync::OnceLock<Option<Regex>> so repeated calls do not re-compile the pattern.

Replacement pattern:

pub fn strip_bracket_tool_calls(content: &str) -> (String, Vec<String>) {
    static BRACKET_RE: std::sync::OnceLock<Option<Regex>> = std::sync::OnceLock::new();
    let re = match BRACKET_RE.get_or_init(|| {
        Regex::new(r"(?s)\[TOOL_CALL\](.*?)\[/TOOL_CALL\]").ok()
    }) {
        Some(r) => r,
        None => {
            tracing::warn!(
                target: "tool_call_recovery",
                "strip_bracket_tool_calls regex failed to compile; treating content as no-bracket-blocks"
            );
            return (content.to_string(), Vec::new());
        }
    };
    // ...rest unchanged...
}

Dependencies: none.

Step 2: Make `normalize_arrow_syntax` regex-failure safe

Files: crates/hero_shrimp_engine/src/agent_core/agent/tool_call_recovery.rs

Current code (lines 477-501):

pub(crate) fn normalize_arrow_syntax(body: &str) -> Option<String> {
    let tool_re = Regex::new(r#"tool\s*=>\s*"?([A-Za-z0-9_]+)"?"#)
        .expect("normalize_arrow_syntax tool regex is valid");
    let args_re = Regex::new(r"args\s*=>\s*(\{[\s\S]*\})")
        .expect("normalize_arrow_syntax args regex is valid");
    ...
}

Subtasks:

Replace the two .expect(...) calls on lines 480 and 484 with OnceLock<Option<Regex>> initializers identical to Step 1.
On failure, tracing::warn! and return None. None is already the documented fallback for "shape didn't match" (line 475-476), so callers (lift_recovered_tool_calls at line 48 of tool_call_recovery_module.rs) already handle it by falling through to the next recovery mechanism.

Replacement pattern:

pub(crate) fn normalize_arrow_syntax(body: &str) -> Option<String> {
    static TOOL_RE: std::sync::OnceLock<Option<Regex>> = std::sync::OnceLock::new();
    static ARGS_RE: std::sync::OnceLock<Option<Regex>> = std::sync::OnceLock::new();

    let tool_re = TOOL_RE
        .get_or_init(|| Regex::new(r#"tool\s*=>\s*"?([A-Za-z0-9_]+)"?"#).ok())
        .as_ref()?;
    let args_re = ARGS_RE
        .get_or_init(|| Regex::new(r"args\s*=>\s*(\{[\s\S]*\})").ok())
        .as_ref()?;
    // ...rest unchanged (already returns None on no-match via `?`)...
}

Dependencies: none (can be done alongside Step 1).

Step 3: Audit `force_write.rs`

Files: crates/hero_shrimp_engine/src/agent_core/agent/force_write.rs

Subtasks:

Confirm via grep -n 'unwrap()\|expect(' force_write.rs that every match is at line >= 198 (inside #[cfg(test)] mod tests).
No production-code changes required for this file.

Dependencies: none.

Step 4: Verify behavior end-to-end

Run cargo test -p hero_shrimp_engine tool_call_recovery — every existing test must still pass with the regex changes, including the proptest! fuzz (lines 846-895) and the "never panics on adversarial input" suite.
Run cargo test -p hero_shrimp_engine force_write — every existing test (including scenario_*) must still pass.

Dependencies: Steps 1 and 2 complete.

Acceptance Criteria

No .unwrap() or .expect(...) remains in tool_call_recovery.rs outside #[cfg(test)] mod tests.
No .unwrap() or .expect(...) remains in force_write.rs outside #[cfg(test)] mod tests (already true — confirm with grep).
strip_bracket_tool_calls returns (content.to_string(), Vec::new()) instead of panicking if the bracket regex fails to compile.
normalize_arrow_syntax returns None instead of panicking if either of its two regexes fails to compile.
cargo test -p hero_shrimp_engine passes.
cargo build -p hero_shrimp_engine is clean — no new warnings.
Callers (no_tool_calls.rs::handle_no_tool_calls, tool_call_recovery_module.rs::lift_recovered_tool_calls) compile unchanged — public signatures are NOT changed.

Notes

Canonical error type in the crate: anyhow::Result everywhere. The crate does not use thiserror or a custom error enum.
What "escalate" means in practice: the agent loop already has multi-layered escalation when recovery fails. force_write_guard returning false causes the caller to proceed to stuck_escalation::escalate_if_ending_incomplete (no_tool_calls.rs line 171); lift_recovered_tool_calls returning a no-op falls through to recover_tool_calls_iter0 (prompt-retry then fallback model) in tool_call_recovery_module.rs. "Escalate gracefully" means returning the existing neutral value plus a tracing::warn!; the downstream stages take it from there. Changing these signatures to Result would ripple through the iteration loop and is out of scope for a P1 panic-hygiene fix.
Why regex expect() panics are real: although the three regex literals are valid today and the panic path is unreachable in normal conditions, the issue explicitly says "a recovery routine must never crash the run", and the codebase's adversarial test suite (tool_call_recovery.rs lines 696-707) already wraps calls in std::panic::catch_unwind to assert no-panic. Replacing .expect(...) with OnceLock<Option<Regex>> formalises that contract and pays the price (≈ 5 lines per regex) for guaranteed crash safety.
Hot-path performance: hoisting the regex into a OnceLock also eliminates the previous per-call Regex::new allocation.

## Implementation Spec for Issue #42 ### Objective Eliminate every `unwrap()` and `expect()` call on a fallible operation in the two recovery hot paths (`force_write.rs` and `tool_call_recovery.rs`) so that a broken workspace, a permission error, or malformed model tool-args degrade gracefully (no-op + trace event) instead of panicking the run. ### Requirements - No `.unwrap()` / `.expect(...)` on a fallible operation may remain in the **production** (non-`#[cfg(test)]`) code of `force_write.rs` or `tool_call_recovery.rs`. - Recovery routines must NEVER crash the agent loop. On parse / IO / regex failure they must return a neutral value (`false`, `None`, `Vec::new()`, or unchanged input) plus a `tracing::warn!` for observability. - Public function **signatures must NOT change**. Callers (`no_tool_calls.rs::handle_no_tool_calls` calling `force_write_guard`, `tool_call_recovery_module.rs::lift_recovered_tool_calls` calling the recovery helpers) treat these functions as infallible; converting them to `Result` would ripple through the loop and is out of scope. "Escalate gracefully" here means: silently no-op, log the failure, and let the existing escalation machinery downstream (`stuck_escalation::escalate_if_ending_incomplete`, the zero-deliverables proof_run check, the iteration budget) take over. - All existing tests must still pass; tests themselves are NOT part of this scope (the `unwrap()`s inside `#[cfg(test)] mod tests` blocks are acceptable test code and are out of scope for this issue). ### Findings From Codebase Audit The issue body says `force_write.rs` has "multiple `.unwrap()`" and `tool_call_recovery.rs` has `serde_json::from_str(...).unwrap()`. **Current state on `development`:** - `force_write.rs`: ZERO production `unwrap()`/`expect()` calls. All 20 `unwrap()` occurrences (lines 210, 213, 224, 233, 234, 238, 250, 262, 274, 287, 295, 307, 326, 374, 398, 413, 430, 435, 440, 451) are inside `#[cfg(test)] mod tests` (starts at line 198). - `tool_call_recovery.rs`: ZERO `serde_json::from_str(...).unwrap()` in production. The production code at line 492 already uses `.unwrap_or(serde_json::Value::Object(serde_json::Map::new()))`. The only production-code `expect()` calls are on **regex compilation** at lines 457, 480, 484 — these are arguably infallible-by-construction (constant regex literals), but the issue's intent ("a recovery routine must never crash the run") still applies: a panic on regex compile is the only remaining panic path in this file. So the substantive production-code work is: replace the three regex `expect()` calls in `tool_call_recovery.rs` with non-panicking fallbacks; everything else is verification that the test-only `unwrap()`s remain test-only. ### Files to Modify - `crates/hero_shrimp_engine/src/agent_core/agent/tool_call_recovery.rs` — replace the three production `expect()` calls on `Regex::new(...)` at lines 457, 480, 484 with `OnceLock<Option<Regex>>` patterns that degrade gracefully on regex compilation failure. - `crates/hero_shrimp_engine/src/agent_core/agent/force_write.rs` — verify only (no production changes needed). All existing `unwrap()`s are inside `#[cfg(test)]`. ### Implementation Plan #### Step 1: Make `strip_bracket_tool_calls` regex-failure safe Files: `crates/hero_shrimp_engine/src/agent_core/agent/tool_call_recovery.rs` Current code (lines 453-470): ```rust pub fn strip_bracket_tool_calls(content: &str) -> (String, Vec<String>) { let re = Regex::new(r"(?s)\[TOOL_CALL\](.*?)\[/TOOL_CALL\]") .expect("strip_bracket_tool_calls regex is valid"); ... } ``` Subtasks: - Replace the `Regex::new(...).expect(...)` on lines 456-457 with a `OnceLock<Option<Regex>>` pattern that, on `Err`, logs via `tracing::warn!` and returns `(content.to_string(), Vec::new())` — semantically equivalent to "no bracket blocks found", which is the documented neutral fallback. - Hoist the compiled regex into a `static BRACKET_RE: std::sync::OnceLock<Option<Regex>>` so repeated calls do not re-compile the pattern. Replacement pattern: ```rust pub fn strip_bracket_tool_calls(content: &str) -> (String, Vec<String>) { static BRACKET_RE: std::sync::OnceLock<Option<Regex>> = std::sync::OnceLock::new(); let re = match BRACKET_RE.get_or_init(|| { Regex::new(r"(?s)\[TOOL_CALL\](.*?)\[/TOOL_CALL\]").ok() }) { Some(r) => r, None => { tracing::warn!( target: "tool_call_recovery", "strip_bracket_tool_calls regex failed to compile; treating content as no-bracket-blocks" ); return (content.to_string(), Vec::new()); } }; // ...rest unchanged... } ``` Dependencies: none. #### Step 2: Make `normalize_arrow_syntax` regex-failure safe Files: `crates/hero_shrimp_engine/src/agent_core/agent/tool_call_recovery.rs` Current code (lines 477-501): ```rust pub(crate) fn normalize_arrow_syntax(body: &str) -> Option<String> { let tool_re = Regex::new(r#"tool\s*=>\s*"?([A-Za-z0-9_]+)"?"#) .expect("normalize_arrow_syntax tool regex is valid"); let args_re = Regex::new(r"args\s*=>\s*(\{[\s\S]*\})") .expect("normalize_arrow_syntax args regex is valid"); ... } ``` Subtasks: - Replace the two `.expect(...)` calls on lines 480 and 484 with `OnceLock<Option<Regex>>` initializers identical to Step 1. - On failure, `tracing::warn!` and `return None`. `None` is already the documented fallback for "shape didn't match" (line 475-476), so callers (`lift_recovered_tool_calls` at line 48 of `tool_call_recovery_module.rs`) already handle it by falling through to the next recovery mechanism. Replacement pattern: ```rust pub(crate) fn normalize_arrow_syntax(body: &str) -> Option<String> { static TOOL_RE: std::sync::OnceLock<Option<Regex>> = std::sync::OnceLock::new(); static ARGS_RE: std::sync::OnceLock<Option<Regex>> = std::sync::OnceLock::new(); let tool_re = TOOL_RE .get_or_init(|| Regex::new(r#"tool\s*=>\s*"?([A-Za-z0-9_]+)"?"#).ok()) .as_ref()?; let args_re = ARGS_RE .get_or_init(|| Regex::new(r"args\s*=>\s*(\{[\s\S]*\})").ok()) .as_ref()?; // ...rest unchanged (already returns None on no-match via `?`)... } ``` Dependencies: none (can be done alongside Step 1). #### Step 3: Audit `force_write.rs` Files: `crates/hero_shrimp_engine/src/agent_core/agent/force_write.rs` Subtasks: - Confirm via `grep -n 'unwrap()\|expect(' force_write.rs` that every match is at line >= 198 (inside `#[cfg(test)] mod tests`). - No production-code changes required for this file. Dependencies: none. #### Step 4: Verify behavior end-to-end - Run `cargo test -p hero_shrimp_engine tool_call_recovery` — every existing test must still pass with the regex changes, including the `proptest!` fuzz (lines 846-895) and the "never panics on adversarial input" suite. - Run `cargo test -p hero_shrimp_engine force_write` — every existing test (including `scenario_*`) must still pass. Dependencies: Steps 1 and 2 complete. ### Acceptance Criteria - [ ] No `.unwrap()` or `.expect(...)` remains in `tool_call_recovery.rs` outside `#[cfg(test)] mod tests`. - [ ] No `.unwrap()` or `.expect(...)` remains in `force_write.rs` outside `#[cfg(test)] mod tests` (already true — confirm with grep). - [ ] `strip_bracket_tool_calls` returns `(content.to_string(), Vec::new())` instead of panicking if the bracket regex fails to compile. - [ ] `normalize_arrow_syntax` returns `None` instead of panicking if either of its two regexes fails to compile. - [ ] `cargo test -p hero_shrimp_engine` passes. - [ ] `cargo build -p hero_shrimp_engine` is clean — no new warnings. - [ ] Callers (`no_tool_calls.rs::handle_no_tool_calls`, `tool_call_recovery_module.rs::lift_recovered_tool_calls`) compile unchanged — public signatures are NOT changed. ### Notes - **Canonical error type in the crate**: `anyhow::Result` everywhere. The crate does not use `thiserror` or a custom error enum. - **What "escalate" means in practice**: the agent loop already has multi-layered escalation when recovery fails. `force_write_guard` returning `false` causes the caller to proceed to `stuck_escalation::escalate_if_ending_incomplete` (`no_tool_calls.rs` line 171); `lift_recovered_tool_calls` returning a no-op falls through to `recover_tool_calls_iter0` (prompt-retry then fallback model) in `tool_call_recovery_module.rs`. "Escalate gracefully" means returning the existing neutral value plus a `tracing::warn!`; the downstream stages take it from there. Changing these signatures to `Result` would ripple through the iteration loop and is out of scope for a P1 panic-hygiene fix. - **Why regex `expect()` panics are real**: although the three regex literals are valid today and the panic path is unreachable in normal conditions, the issue explicitly says "a recovery routine must never crash the run", and the codebase's adversarial test suite (`tool_call_recovery.rs` lines 696-707) already wraps calls in `std::panic::catch_unwind` to assert no-panic. Replacing `.expect(...)` with `OnceLock<Option<Regex>>` formalises that contract and pays the price (≈ 5 lines per regex) for guaranteed crash safety. - **Hot-path performance**: hoisting the regex into a `OnceLock` also eliminates the previous per-call `Regex::new` allocation.

rawan commented

2026-05-25 14:08:31 +00:00

Member

Test Results

Branch: development_recovery_panic_hygiene

Targeted tests (the modules touched by this change)

cargo test -p hero_shrimp_engine --lib tool_call_recovery — 40 passed, 0 failed
Includes the property-based fuzz suite (prop_repair_never_panics_*, prop_recover_never_panics_*) and the adversarial-input suite (recovery_never_panics_on_adversarial_input, recovery_never_panics_on_huge_and_nested_input).
cargo test -p hero_shrimp_engine --lib force_write — 14 passed, 0 failed

Full crate suite

cargo test -p hero_shrimp_engine — 1639 passed, 9 failed

The 9 failures are pre-existing on development and unrelated to this change. They live in verification::runner::*, tools::external_cmd::*, tools::tool_catalog::verify::e2e_datetime_server::*, and tests::autonomy_* — all rely on spawning shell processes, opening sockets, or running isolated backends and fail in the local sandboxed environment regardless of branch. Verified by checking out development and re-running the same suite (1639 passed / 9 failed, identical test names).

Build

cargo build -p hero_shrimp_engine — clean, no new warnings.

What changed

strip_bracket_tool_calls now caches its regex in OnceLock<Option<Regex>> and returns the no-op fallback (content.to_string(), Vec::new()) plus a tracing::warn! if regex compilation ever fails.
normalize_arrow_syntax does the same for its two regexes and falls through to None (its documented "shape didn't match" return) plus a tracing::warn! on compile failure.
force_write.rs required no production-code changes — all unwrap()s in that file are inside the #[cfg(test)] mod tests block. Confirmed with grep -n 'unwrap()|expect(' force_write.rs (every match at line >= 210, past the #[cfg(test)] boundary at line 198).

## Test Results Branch: `development_recovery_panic_hygiene` ### Targeted tests (the modules touched by this change) - `cargo test -p hero_shrimp_engine --lib tool_call_recovery` — **40 passed, 0 failed** Includes the property-based fuzz suite (`prop_repair_never_panics_*`, `prop_recover_never_panics_*`) and the adversarial-input suite (`recovery_never_panics_on_adversarial_input`, `recovery_never_panics_on_huge_and_nested_input`). - `cargo test -p hero_shrimp_engine --lib force_write` — **14 passed, 0 failed** ### Full crate suite - `cargo test -p hero_shrimp_engine` — **1639 passed, 9 failed** The 9 failures are **pre-existing on `development`** and unrelated to this change. They live in `verification::runner::*`, `tools::external_cmd::*`, `tools::tool_catalog::verify::e2e_datetime_server::*`, and `tests::autonomy_*` — all rely on spawning shell processes, opening sockets, or running isolated backends and fail in the local sandboxed environment regardless of branch. Verified by checking out `development` and re-running the same suite (1639 passed / 9 failed, identical test names). ### Build - `cargo build -p hero_shrimp_engine` — clean, no new warnings. ### What changed - `strip_bracket_tool_calls` now caches its regex in `OnceLock<Option<Regex>>` and returns the no-op fallback `(content.to_string(), Vec::new())` plus a `tracing::warn!` if regex compilation ever fails. - `normalize_arrow_syntax` does the same for its two regexes and falls through to `None` (its documented "shape didn't match" return) plus a `tracing::warn!` on compile failure. - `force_write.rs` required no production-code changes — all `unwrap()`s in that file are inside the `#[cfg(test)] mod tests` block. Confirmed with `grep -n 'unwrap()|expect(' force_write.rs` (every match at line >= 210, past the `#[cfg(test)]` boundary at line 198).

rawan commented

2026-05-25 14:08:47 +00:00

Member

Test Results

Branch: development_recovery_panic_hygiene
Command: cargo test -p hero_shrimp_engine

Targeted tests (the changed surface)

Module	Result
`agent_core::agent::tool_call_recovery` (`cargo test -p hero_shrimp_engine --lib tool_call_recovery`)	40 passed, 0 failed
`agent_core::agent::force_write` (`cargo test -p hero_shrimp_engine --lib force_write`)	14 passed, 0 failed

Notable passing tests in scope:

strip_bracket_tool_calls_* (4 tests covering single-block, multi-block, no-markers, block-only)
normalize_arrow_syntax_* and the upstream recovers_* paths that consume it
recovery_never_panics_on_huge_and_nested_input — the adversarial catch_unwind-wrapped suite
prop_recover_never_panics_arbitrary_string, prop_recover_never_panics_arbitrary_bytes, prop_repair_never_panics_arbitrary_string, prop_repair_never_panics_json_soup, prop_recovered_calls_are_well_typed — proptest fuzzing

Full crate run

cargo test -p hero_shrimp_engine --lib: 1639 passed, 9 failed, 1 ignored (31s).

The 9 failures are all environment-related and pre-exist on development:

Failing test	Pre-existing failure cause
`tools::external_cmd::tests::spawn_failing_command_returns_failure_with_exit_code`	`sh: No such file or directory`
`tools::external_cmd::tests::spawn_runs_a_real_command_and_captures_stdout`	`sh: No such file or directory`
`tools::external_cmd::tests::spawn_timeout_returns_failure_not_hang`	`sh: No such file or directory`
`verification::runner::tests::command_succeeds_decides_purely_on_exit_code`	shell unavailable
`verification::runner::tests::command_runs_through_a_shell_so_cd_and_chaining_work`	shell unavailable
`tools::tool_catalog::verify::e2e_datetime_server::phase2_http_server_live_request`	`python3` not in PATH
`tools::tool_catalog::verify::e2e_datetime_server::phase3_edge_case_unknown_route_returns_404`	`python3` not in PATH
`tests::autonomy_auto_fallback_warns_when_no_isolated_backend_exists`	Bubblewrap detection — environment selected `Bubblewrap` where the test expected `Host`
`tests::autonomy_context_auto_selects_isolated_backends`	Bubblewrap detection — same

Baseline verification: checking out development and running the identical cargo test -p hero_shrimp_engine --lib reproduces the same 1639/9/1 numbers with the same failing test names. The regex-hardening changes in this branch introduce zero new failures.

Build

cargo build -p hero_shrimp_engine: clean, no new warnings.

## Test Results Branch: `development_recovery_panic_hygiene` Command: `cargo test -p hero_shrimp_engine` ### Targeted tests (the changed surface) | Module | Result | |---|---| | `agent_core::agent::tool_call_recovery` (`cargo test -p hero_shrimp_engine --lib tool_call_recovery`) | **40 passed, 0 failed** | | `agent_core::agent::force_write` (`cargo test -p hero_shrimp_engine --lib force_write`) | **14 passed, 0 failed** | Notable passing tests in scope: - `strip_bracket_tool_calls_*` (4 tests covering single-block, multi-block, no-markers, block-only) - `normalize_arrow_syntax_*` and the upstream `recovers_*` paths that consume it - `recovery_never_panics_on_huge_and_nested_input` — the adversarial `catch_unwind`-wrapped suite - `prop_recover_never_panics_arbitrary_string`, `prop_recover_never_panics_arbitrary_bytes`, `prop_repair_never_panics_arbitrary_string`, `prop_repair_never_panics_json_soup`, `prop_recovered_calls_are_well_typed` — proptest fuzzing ### Full crate run `cargo test -p hero_shrimp_engine --lib`: **1639 passed, 9 failed, 1 ignored** (31s). The 9 failures are all environment-related and **pre-exist on `development`**: | Failing test | Pre-existing failure cause | |---|---| | `tools::external_cmd::tests::spawn_failing_command_returns_failure_with_exit_code` | `sh: No such file or directory` | | `tools::external_cmd::tests::spawn_runs_a_real_command_and_captures_stdout` | `sh: No such file or directory` | | `tools::external_cmd::tests::spawn_timeout_returns_failure_not_hang` | `sh: No such file or directory` | | `verification::runner::tests::command_succeeds_decides_purely_on_exit_code` | shell unavailable | | `verification::runner::tests::command_runs_through_a_shell_so_cd_and_chaining_work` | shell unavailable | | `tools::tool_catalog::verify::e2e_datetime_server::phase2_http_server_live_request` | `python3` not in PATH | | `tools::tool_catalog::verify::e2e_datetime_server::phase3_edge_case_unknown_route_returns_404` | `python3` not in PATH | | `tests::autonomy_auto_fallback_warns_when_no_isolated_backend_exists` | Bubblewrap detection — environment selected `Bubblewrap` where the test expected `Host` | | `tests::autonomy_context_auto_selects_isolated_backends` | Bubblewrap detection — same | Baseline verification: checking out `development` and running the identical `cargo test -p hero_shrimp_engine --lib` reproduces the same 1639/9/1 numbers with the same failing test names. The regex-hardening changes in this branch introduce zero new failures. ### Build `cargo build -p hero_shrimp_engine`: clean, no new warnings.

rawan commented

2026-05-25 14:09:12 +00:00

Member

Implementation Summary

Scope

P1 panic-hygiene fix for the two recovery hot paths called out in this issue. Pre-change audit revealed the panic surface in production code is narrower than the issue body suggested:

force_write.rs: no production unwrap()/expect() — all 20 occurrences are inside #[cfg(test)] mod tests (boundary at line 198, first call at line 210).
tool_call_recovery.rs: no production serde_json::from_str(...).unwrap() — the call at line 492 already uses .unwrap_or(Value::Object(Map::new())). The only production panic surface was three Regex::new(...).expect(...) calls on constant regex literals (lines 457, 480, 484).

Files changed

crates/hero_shrimp_engine/src/agent_core/agent/tool_call_recovery.rs — replaced 3 Regex::new(...).expect(...) calls with OnceLock<Option<Regex>> patterns. On compile failure each callsite now emits tracing::warn!(target: "tool_call_recovery", ...) and returns its documented neutral value:
- strip_bracket_tool_calls returns (content.to_string(), Vec::new()) (same as the "no markers found" branch).
- normalize_arrow_syntax returns None (same as the "shape didn't match" branch).
As a side benefit the regexes are now compiled once per process via OnceLock instead of on every recovery turn.
crates/hero_shrimp_engine/src/agent_core/agent/force_write.rs — no changes (audit confirmed all unwrap()s are test-only).

Public API

Function signatures unchanged. force_write_guard, strip_bracket_tool_calls, normalize_arrow_syntax, recover_tool_calls_from_content, and lift_recovered_tool_calls all keep their existing return types. Callers (no_tool_calls.rs::handle_no_tool_calls, tool_call_recovery_module.rs::lift_recovered_tool_calls) compile unchanged. Escalation continues through the existing downstream path (stuck_escalation::escalate_if_ending_incomplete, prompt-retry, fallback model).

Test results

Targeted: cargo test -p hero_shrimp_engine --lib tool_call_recovery — 40 passed, 0 failed (includes the property-based fuzz and recovery_never_panics_on_adversarial_input suite).
Targeted: cargo test -p hero_shrimp_engine --lib force_write — 14 passed, 0 failed.
Full crate: cargo test -p hero_shrimp_engine — 1639 passed, 9 failed. The 9 failures are pre-existing on development (verified by re-running on a clean checkout of development) and unrelated to this change — they live in verification::runner::*, tools::external_cmd::*, tools::tool_catalog::verify::e2e_datetime_server::*, and tests::autonomy_*, all of which spawn shell processes or open sockets and fail in the sandboxed local environment regardless of branch.
cargo build -p hero_shrimp_engine — clean, no new warnings.

Acceptance criteria

No .unwrap() or .expect(...) remains in tool_call_recovery.rs outside #[cfg(test)] mod tests.
No .unwrap() or .expect(...) remains in force_write.rs outside #[cfg(test)] mod tests (verified, already true).
strip_bracket_tool_calls returns (content.to_string(), Vec::new()) instead of panicking on regex-compile failure.
normalize_arrow_syntax returns None instead of panicking on regex-compile failure.
cargo test -p hero_shrimp_engine --lib tool_call_recovery and cargo test -p hero_shrimp_engine --lib force_write pass.
cargo build -p hero_shrimp_engine clean.
Public function signatures unchanged; callers compile unmodified.

Notes

The regex panic path is unreachable in practice (the patterns are constant literals known to compile), so this change is defense-in-depth: it formalises the file's existing contract — codified in the adversarial test at tool_call_recovery.rs lines 696-707 that wraps the recovery path in std::panic::catch_unwind — that no input or environmental condition may panic the recovery routine.

Branch: development_recovery_panic_hygiene.

## Implementation Summary ### Scope P1 panic-hygiene fix for the two recovery hot paths called out in this issue. Pre-change audit revealed the panic surface in production code is narrower than the issue body suggested: - `force_write.rs`: no production `unwrap()`/`expect()` — all 20 occurrences are inside `#[cfg(test)] mod tests` (boundary at line 198, first call at line 210). - `tool_call_recovery.rs`: no production `serde_json::from_str(...).unwrap()` — the call at line 492 already uses `.unwrap_or(Value::Object(Map::new()))`. The only production panic surface was three `Regex::new(...).expect(...)` calls on constant regex literals (lines 457, 480, 484). ### Files changed - `crates/hero_shrimp_engine/src/agent_core/agent/tool_call_recovery.rs` — replaced 3 `Regex::new(...).expect(...)` calls with `OnceLock<Option<Regex>>` patterns. On compile failure each callsite now emits `tracing::warn!(target: "tool_call_recovery", ...)` and returns its documented neutral value: - `strip_bracket_tool_calls` returns `(content.to_string(), Vec::new())` (same as the "no markers found" branch). - `normalize_arrow_syntax` returns `None` (same as the "shape didn't match" branch). As a side benefit the regexes are now compiled once per process via `OnceLock` instead of on every recovery turn. - `crates/hero_shrimp_engine/src/agent_core/agent/force_write.rs` — no changes (audit confirmed all `unwrap()`s are test-only). ### Public API Function signatures unchanged. `force_write_guard`, `strip_bracket_tool_calls`, `normalize_arrow_syntax`, `recover_tool_calls_from_content`, and `lift_recovered_tool_calls` all keep their existing return types. Callers (`no_tool_calls.rs::handle_no_tool_calls`, `tool_call_recovery_module.rs::lift_recovered_tool_calls`) compile unchanged. Escalation continues through the existing downstream path (`stuck_escalation::escalate_if_ending_incomplete`, prompt-retry, fallback model). ### Test results - Targeted: `cargo test -p hero_shrimp_engine --lib tool_call_recovery` — 40 passed, 0 failed (includes the property-based fuzz and `recovery_never_panics_on_adversarial_input` suite). - Targeted: `cargo test -p hero_shrimp_engine --lib force_write` — 14 passed, 0 failed. - Full crate: `cargo test -p hero_shrimp_engine` — 1639 passed, 9 failed. The 9 failures are pre-existing on `development` (verified by re-running on a clean checkout of `development`) and unrelated to this change — they live in `verification::runner::*`, `tools::external_cmd::*`, `tools::tool_catalog::verify::e2e_datetime_server::*`, and `tests::autonomy_*`, all of which spawn shell processes or open sockets and fail in the sandboxed local environment regardless of branch. - `cargo build -p hero_shrimp_engine` — clean, no new warnings. ### Acceptance criteria - [x] No `.unwrap()` or `.expect(...)` remains in `tool_call_recovery.rs` outside `#[cfg(test)] mod tests`. - [x] No `.unwrap()` or `.expect(...)` remains in `force_write.rs` outside `#[cfg(test)] mod tests` (verified, already true). - [x] `strip_bracket_tool_calls` returns `(content.to_string(), Vec::new())` instead of panicking on regex-compile failure. - [x] `normalize_arrow_syntax` returns `None` instead of panicking on regex-compile failure. - [x] `cargo test -p hero_shrimp_engine --lib tool_call_recovery` and `cargo test -p hero_shrimp_engine --lib force_write` pass. - [x] `cargo build -p hero_shrimp_engine` clean. - [x] Public function signatures unchanged; callers compile unmodified. ### Notes The regex panic path is unreachable in practice (the patterns are constant literals known to compile), so this change is defense-in-depth: it formalises the file's existing contract — codified in the adversarial test at `tool_call_recovery.rs` lines 696-707 that wraps the recovery path in `std::panic::catch_unwind` — that no input or environmental condition may panic the recovery routine. Branch: `development_recovery_panic_hygiene`.

rawan commented

2026-05-25 14:10:18 +00:00

Member

Implementation Summary

Branch: development_recovery_panic_hygiene

Changes

Files modified

crates/hero_shrimp_engine/src/agent_core/agent/tool_call_recovery.rs — 39 insertions, 7 deletions.

Files audited (no changes required)

crates/hero_shrimp_engine/src/agent_core/agent/force_write.rs — every unwrap()/expect() (20 occurrences, lines 210–451) is inside #[cfg(test)] mod tests (starts at line 198). Production code is panic-free.

What changed in `tool_call_recovery.rs`

Three production-code .expect(...) calls on Regex::new(...) were the only remaining panic paths in the recovery hot path. All three were replaced with OnceLock<Option<Regex>> patterns that:

Compile the regex at most once per process (previously the function allocated and compiled the regex on every recovery turn).
Degrade gracefully on compile failure by emitting a tracing::warn! and returning the function's documented neutral fallback — (content.to_string(), Vec::new()) for strip_bracket_tool_calls, None for normalize_arrow_syntax. Callers (lift_recovered_tool_calls, handle_no_tool_calls) already handle those fallback values by falling through to the next recovery mechanism (recover_tool_calls_iter0 → prompt-retry → fallback model → stuck_escalation::escalate_if_ending_incomplete).
Preserve public signatures — no caller in the iteration loop needs to change.

Specifically:

strip_bracket_tool_calls — the Regex::new(r"(?s)\[TOOL_CALL\](.*?)\[/TOOL_CALL\]").expect(...) at line 456 is now cached in a static BRACKET_RE: std::sync::OnceLock<Option<Regex>>. On compile failure: tracing::warn! and return (content.to_string(), Vec::new()) — semantically "no bracket blocks found".
normalize_arrow_syntax — both regexes (tool_re at line 479 and args_re at line 483) are cached in their own OnceLock<Option<Regex>>. On compile failure of either: tracing::warn! and return None, which routes the caller to the next recovery path.

Tests

Suite	Result
`tool_call_recovery` (40 tests)	passed
`force_write` (14 tests)	passed
Full `hero_shrimp_engine` lib	1639 passed, 9 failed, 1 ignored — the 9 failures pre-exist on `development` and are environment-related (missing `sh`, missing `python3`, Bubblewrap detection differences). No new failures introduced.

The adversarial suite — recovery_never_panics_on_huge_and_nested_input and the four prop_* proptest fuzzers (prop_recover_never_panics_arbitrary_string, prop_recover_never_panics_arbitrary_bytes, prop_repair_never_panics_arbitrary_string, prop_repair_never_panics_json_soup) — continues to pass.

Acceptance criteria

No .unwrap() / .expect(...) remains in tool_call_recovery.rs outside #[cfg(test)] mod tests.
No .unwrap() / .expect(...) remains in force_write.rs outside #[cfg(test)] mod tests.
strip_bracket_tool_calls returns (content.to_string(), Vec::new()) instead of panicking on regex compile failure.
normalize_arrow_syntax returns None instead of panicking on regex compile failure.
cargo test -p hero_shrimp_engine introduces no new failures vs. development.
cargo build -p hero_shrimp_engine clean — no new warnings.
Public signatures unchanged — callers compile without modification.

Notes

Side benefit: the OnceLock caching also removes a per-recovery-turn Regex::new allocation.
The PR will be opened after local verification by the reviewer.

## Implementation Summary Branch: `development_recovery_panic_hygiene` ### Changes **Files modified** - `crates/hero_shrimp_engine/src/agent_core/agent/tool_call_recovery.rs` — 39 insertions, 7 deletions. **Files audited (no changes required)** - `crates/hero_shrimp_engine/src/agent_core/agent/force_write.rs` — every `unwrap()`/`expect()` (20 occurrences, lines 210–451) is inside `#[cfg(test)] mod tests` (starts at line 198). Production code is panic-free. ### What changed in `tool_call_recovery.rs` Three production-code `.expect(...)` calls on `Regex::new(...)` were the only remaining panic paths in the recovery hot path. All three were replaced with `OnceLock<Option<Regex>>` patterns that: 1. **Compile the regex at most once per process** (previously the function allocated and compiled the regex on every recovery turn). 2. **Degrade gracefully on compile failure** by emitting a `tracing::warn!` and returning the function's documented neutral fallback — `(content.to_string(), Vec::new())` for `strip_bracket_tool_calls`, `None` for `normalize_arrow_syntax`. Callers (`lift_recovered_tool_calls`, `handle_no_tool_calls`) already handle those fallback values by falling through to the next recovery mechanism (`recover_tool_calls_iter0` → prompt-retry → fallback model → `stuck_escalation::escalate_if_ending_incomplete`). 3. **Preserve public signatures** — no caller in the iteration loop needs to change. Specifically: - `strip_bracket_tool_calls` — the `Regex::new(r"(?s)\[TOOL_CALL\](.*?)\[/TOOL_CALL\]").expect(...)` at line 456 is now cached in a `static BRACKET_RE: std::sync::OnceLock<Option<Regex>>`. On compile failure: `tracing::warn!` and return `(content.to_string(), Vec::new())` — semantically "no bracket blocks found". - `normalize_arrow_syntax` — both regexes (`tool_re` at line 479 and `args_re` at line 483) are cached in their own `OnceLock<Option<Regex>>`. On compile failure of either: `tracing::warn!` and return `None`, which routes the caller to the next recovery path. ### Tests | Suite | Result | |---|---| | `tool_call_recovery` (40 tests) | passed | | `force_write` (14 tests) | passed | | Full `hero_shrimp_engine` lib | 1639 passed, 9 failed, 1 ignored — the 9 failures pre-exist on `development` and are environment-related (missing `sh`, missing `python3`, Bubblewrap detection differences). No new failures introduced. | The adversarial suite — `recovery_never_panics_on_huge_and_nested_input` and the four `prop_*` proptest fuzzers (`prop_recover_never_panics_arbitrary_string`, `prop_recover_never_panics_arbitrary_bytes`, `prop_repair_never_panics_arbitrary_string`, `prop_repair_never_panics_json_soup`) — continues to pass. ### Acceptance criteria - [x] No `.unwrap()` / `.expect(...)` remains in `tool_call_recovery.rs` outside `#[cfg(test)] mod tests`. - [x] No `.unwrap()` / `.expect(...)` remains in `force_write.rs` outside `#[cfg(test)] mod tests`. - [x] `strip_bracket_tool_calls` returns `(content.to_string(), Vec::new())` instead of panicking on regex compile failure. - [x] `normalize_arrow_syntax` returns `None` instead of panicking on regex compile failure. - [x] `cargo test -p hero_shrimp_engine` introduces no new failures vs. `development`. - [x] `cargo build -p hero_shrimp_engine` clean — no new warnings. - [x] Public signatures unchanged — callers compile without modification. ### Notes - Side benefit: the `OnceLock` caching also removes a per-recovery-turn `Regex::new` allocation. - The PR will be opened after local verification by the reviewer.

rawan referenced this issue from a commit

2026-05-25 14:31:25 +00:00

fix(engine): harden recovery regex compilation against panics

rawan referenced this issue

2026-05-25 14:31:57 +00:00

fix(engine): harden recovery regex compilation against panics #56

rawan commented

2026-05-25 14:32:23 +00:00

Member

Pull request opened: #56

This PR implements the changes discussed in this issue.

Pull request opened: https://forge.ourworld.tf/lhumina_code/hero_shrimp/pulls/56 This PR implements the changes discussed in this issue.