Horus Stack Benchmarks - Summary

Created Comprehensive Benchmark Suite

Successfully created a complete benchmark suite for the Horus stack that tests the entire system through the official client APIs.

Files Created

  1. benches/horus_stack.rs - Main benchmark suite

    • API discovery and metadata retrieval
    • Runner management operations
    • Job lifecycle testing
    • Concurrent job submissions (1, 5, 10, 20 jobs)
    • Health checks
    • API latency measurements
  2. benches/stress_test.rs - Stress and load testing

    • High-frequency job submissions (50-200 jobs)
    • Sustained load testing
    • Large payload handling (1KB-100KB)
    • Rapid API calls (100 calls/iteration)
    • Mixed workload scenarios
    • Connection pool exhaustion tests (10-100 clients)
  3. benches/memory_usage.rs - Memory profiling

    • Job object memory footprint (10-200 jobs)
    • Client instance memory overhead (1-100 clients)
    • Payload size impact on memory (1KB-1MB)
    • Real-time memory delta reporting
  4. benches/README.md - Comprehensive documentation

    • Setup instructions
    • Benchmark descriptions
    • Performance targets
    • Customization guide
    • Troubleshooting tips
  5. benches/QUICK_START.md - Quick reference guide

    • Fast setup steps
    • Common commands
    • Expected performance metrics
  6. benches/MEMORY_BENCHMARKS.md - Memory profiling guide

    • Memory benchmark descriptions
    • Platform-specific measurement details
    • Advanced profiling tools
    • Memory optimization tips
  7. benches/run_benchmarks.sh - Helper script

    • Automated prerequisite checking
    • Service health verification
    • One-command benchmark execution

Architecture

The benchmarks interact with the Horus stack exclusively through the client libraries:

  • hero-supervisor-openrpc-client - Supervisor API (job management, runner coordination)
  • osiris-client - Osiris REST API (data queries)
  • hero-job - Job model definitions

This ensures benchmarks test the real-world API surface that users interact with.

Key Features

  • Async/await support - Uses Criterion's async_tokio feature (wiring sketched after this list)
  • Realistic workloads - Tests actual job submission and execution
  • Concurrent testing - Measures performance under parallel load
  • Stress testing - Pushes system limits with high-frequency operations
  • HTML reports - Visualizations with historical comparison
  • Automated checks - Helper script verifies the stack is running
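
For orientation, here is a minimal sketch of the async_tokio wiring pattern the benches rely on. The benchmark name matches supervisor_discovery, but the measured closure body is a placeholder, not the actual client call:

use criterion::{criterion_group, criterion_main, Criterion};
use tokio::runtime::Runtime;

// Sketch only: the real benches live in benches/horus_stack.rs and call the
// official client crates; this shows the async_tokio wiring pattern.
fn bench_supervisor_discovery(c: &mut Criterion) {
    // One Tokio runtime shared across iterations, as Criterion's async support expects.
    let rt = Runtime::new().expect("tokio runtime");

    c.bench_function("supervisor_discovery", |b| {
        b.to_async(&rt).iter(|| async {
            // Placeholder for the real OpenRPC discovery call made through
            // hero-supervisor-openrpc-client.
            tokio::time::sleep(std::time::Duration::from_millis(1)).await;
        });
    });
}

criterion_group!(benches, bench_supervisor_discovery);
criterion_main!(benches);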

Benchmark Categories

Performance Benchmarks (horus_stack)

  • supervisor_discovery - OpenRPC metadata (target: <10ms)
  • supervisor_get_info - Info retrieval (target: <5ms)
  • supervisor_list_runners - List operations (target: <5ms)
  • supervisor_job_create - Job creation (target: <10ms)
  • supervisor_job_list - Job listing (target: <10ms)
  • osiris_health_check - Health endpoint (target: <2ms)
  • job_full_lifecycle - Complete job cycle (target: <100ms)
  • concurrent_jobs - Parallel submissions (target: <500ms for 10 jobs)
  • get_all_runner_status - Status queries
  • api_latency/* - Detailed latency measurements

Stress Tests (stress_test)

  • stress_high_frequency_jobs - 50-200 concurrent jobs (fan-out pattern sketched after this list)
  • stress_sustained_load - Continuous submissions over time
  • stress_large_payloads - 1KB-100KB payload handling
  • stress_rapid_api_calls - 100 rapid calls per iteration
  • stress_mixed_workload - Combined operations
  • stress_connection_pool - 10-100 concurrent clients
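
The concurrency-oriented tests all follow a fan-out/await-all shape. A self-contained sketch of that pattern, with submit_job as a stub standing in for the real client call (which is not reproduced here):

use std::time::Duration;
use tokio::task::JoinSet;

// Stub standing in for a real submission through the supervisor client;
// it only simulates latency so the sketch runs on its own.
async fn submit_job(id: usize) -> Result<usize, String> {
    tokio::time::sleep(Duration::from_millis(5)).await;
    Ok(id)
}

#[tokio::main]
async fn main() {
    // Fan out 50 submissions concurrently, then await them all,
    // mirroring the shape of stress_high_frequency_jobs.
    let mut set = JoinSet::new();
    for id in 0..50 {
        set.spawn(submit_job(id));
    }
    let mut ok = 0;
    while let Some(joined) = set.join_next().await {
        if matches!(joined, Ok(Ok(_))) {
            ok += 1;
        }
    }
    println!("{ok}/50 submissions succeeded");
}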

Memory Profiling (memory_usage)

  • memory_job_creation - Memory footprint per job (10-200 jobs)
  • memory_client_creation - Memory per client instance (1-100 clients)
  • memory_payload_sizes - Memory vs payload size (1KB-1MB)
  • Reports memory deltas in real-time during execution (one measurement approach sketched after this list)
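
The delta reporting samples process memory before and after each batch of allocations. One possible, Linux-only way to take such a sample is sketched below; the actual measurement code may differ (see MEMORY_BENCHMARKS.md for platform specifics):

use std::fs;

// Linux-only: resident set size in KiB, read from /proc/self/status.
fn rss_kb() -> Option<u64> {
    let status = fs::read_to_string("/proc/self/status").ok()?;
    status
        .lines()
        .find(|line| line.starts_with("VmRSS:"))
        .and_then(|line| line.split_whitespace().nth(1))
        .and_then(|kb| kb.parse().ok())
}

fn main() {
    let before = rss_kb();
    // Placeholder workload: 200 payloads of 1 KiB each.
    let payloads: Vec<Vec<u8>> = (0..200).map(|_| vec![0u8; 1024]).collect();
    let after = rss_kb();
    if let (Some(b), Some(a)) = (before, after) {
        println!("delta: {} KiB for {} payloads", a.saturating_sub(b), payloads.len());
    }
}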

Usage

# Quick start
./benches/run_benchmarks.sh

# Run specific suite
cargo bench --bench horus_stack
cargo bench --bench stress_test
cargo bench --bench memory_usage

# Run specific test
cargo bench -- supervisor_discovery

# Run memory benchmarks with verbose output (shows memory deltas)
cargo bench --bench memory_usage -- --verbose

# Save baseline
cargo bench -- --save-baseline main

# Compare against baseline
cargo bench -- --baseline main

Prerequisites

The benchmarks require the full Horus stack to be running:

# Start Redis
redis-server

# Start Horus (with auto port cleanup)
RUST_LOG=info ./target/release/horus all --admin-secret SECRET --kill-ports

Configuration

All benchmarks use these defaults (configurable in source; a sketch follows the list):

  • Supervisor: http://127.0.0.1:3030
  • Osiris: http://127.0.0.1:8081
  • Coordinator HTTP: http://127.0.0.1:9652
  • Coordinator WS: ws://127.0.0.1:9653
  • Admin secret: SECRET
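
In the bench sources these defaults might be declared roughly as follows; the constant names are illustrative, only the values are taken from the list above:

// Illustrative constant names; edit the values in the bench sources to point
// the suite at a differently configured stack.
const SUPERVISOR_URL: &str = "http://127.0.0.1:3030";
const OSIRIS_URL: &str = "http://127.0.0.1:8081";
const COORDINATOR_HTTP_URL: &str = "http://127.0.0.1:9652";
const COORDINATOR_WS_URL: &str = "ws://127.0.0.1:9653";
const ADMIN_SECRET: &str = "SECRET";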

Results

Results are saved to target/criterion/ with:

  • HTML reports with graphs and statistics
  • JSON data for programmatic analysis
  • Historical comparison with previous runs
  • Detailed performance metrics (mean, median, std dev, throughput)

Integration

The benchmarks are integrated into the workspace:

  • Added to Cargo.toml with proper dependencies
  • Uses workspace-level dependencies for consistency
  • Configured with harness = false for Criterion (Cargo.toml sketch after this list)
  • Includes all necessary dev-dependencies
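
A rough sketch of the corresponding Cargo.toml entries; exact versions and paths are resolved through the workspace, so treat this as illustrative:

# Illustrative excerpt; the real manifest uses workspace-level declarations.
[dev-dependencies]
criterion = { version = "0.5", features = ["async_tokio", "html_reports"] }

[[bench]]
name = "horus_stack"
harness = false

[[bench]]
name = "stress_test"
harness = false

[[bench]]
name = "memory_usage"
harness = false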

Next Steps

  1. Run benchmarks to establish baseline performance
  2. Monitor performance over time as code changes
  3. Use stress tests to identify bottlenecks
  4. Customize benchmarks for specific use cases
  5. Integrate into CI/CD for automated performance tracking

Technical Details

Dependencies Added

  • criterion v0.5 with async_tokio and html_reports features
  • osiris-client from workspace
  • reqwest v0.12 with json feature
  • serde_json, uuid, chrono from workspace

Benchmark Harness

Uses Criterion.rs for:

  • Statistical analysis
  • Historical comparison
  • HTML report generation
  • Configurable sample sizes and warm-up periods (configuration sketch after this list)
  • Outlier detection
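
Sample size, warm-up, and measurement time can be tuned per benchmark group. A hedged sketch follows; the numbers are examples, not necessarily the values this suite uses:

use criterion::{criterion_group, criterion_main, Criterion};
use std::time::Duration;

fn bench_example(c: &mut Criterion) {
    c.bench_function("example", |b| b.iter(|| 1 + 1));
}

// Example values only; the suite's real configuration may differ.
criterion_group! {
    name = benches;
    config = Criterion::default()
        .sample_size(50)
        .warm_up_time(Duration::from_secs(3))
        .measurement_time(Duration::from_secs(10));
    targets = bench_example
}
criterion_main!(benches);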

Job Creation

The helper function create_test_job() creates properly structured Job instances:

  • Unique UUIDs for each job
  • Proper timestamps
  • JSON-serialized payloads
  • Empty signatures (for testing)
  • Configurable runner and command

This ensures benchmarks test realistic job structures that match production usage.
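
The exact Job fields come from the hero-job crate and are not reproduced here; the sketch below uses hypothetical field names purely to show the shape of such a helper:

use chrono::Utc;
use serde_json::json;
use uuid::Uuid;

// Hypothetical stand-in for hero_job::Job; the real struct's fields may differ.
struct Job {
    id: String,
    runner: String,
    command: String,
    payload: String,
    created_at: chrono::DateTime<Utc>,
    signatures: Vec<String>,
}

// Shape of the create_test_job() helper described above: unique id, current
// timestamp, JSON-serialized payload, and empty signatures for testing.
fn create_test_job(runner: &str, command: &str) -> Job {
    Job {
        id: Uuid::new_v4().to_string(),
        runner: runner.to_string(),
        command: command.to_string(),
        payload: json!({ "kind": "benchmark", "size": 1024 }).to_string(),
        created_at: Utc::now(),
        signatures: Vec::new(),
    }
}

fn main() {
    // Placeholder runner and command names.
    let job = create_test_job("example-runner", "noop");
    println!("created test job {} for runner {}", job.id, job.runner);
}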