Horus Stack Benchmarks - Summary

Created Comprehensive Benchmark Suite

Successfully created a complete benchmark suite for the Horus stack that tests the entire system through the official client APIs.

Files Created

  1. benches/horus_stack.rs - Main benchmark suite

    • API discovery and metadata retrieval
    • Runner management operations
    • Job lifecycle testing
    • Concurrent job submissions (1, 5, 10, 20 jobs)
    • Health checks
    • API latency measurements
  2. benches/stress_test.rs - Stress and load testing

    • High-frequency job submissions (50-200 jobs)
    • Sustained load testing
    • Large payload handling (1KB-100KB)
    • Rapid API calls (100 calls/iteration)
    • Mixed workload scenarios
    • Connection pool exhaustion tests (10-100 clients)
  3. benches/memory_usage.rs - Memory profiling

    • Job object memory footprint (10-200 jobs)
    • Client instance memory overhead (1-100 clients)
    • Payload size impact on memory (1KB-1MB)
    • Real-time memory delta reporting
  4. benches/README.md - Comprehensive documentation

    • Setup instructions
    • Benchmark descriptions
    • Performance targets
    • Customization guide
    • Troubleshooting tips
  5. benches/QUICK_START.md - Quick reference guide

    • Fast setup steps
    • Common commands
    • Expected performance metrics
  6. benches/MEMORY_BENCHMARKS.md - Memory profiling guide

    • Memory benchmark descriptions
    • Platform-specific measurement details
    • Advanced profiling tools
    • Memory optimization tips
  7. benches/run_benchmarks.sh - Helper script

    • Automated prerequisite checking
    • Service health verification
    • One-command benchmark execution

Architecture

The benchmarks interact with the Horus stack exclusively through the client libraries:

  • hero-supervisor-openrpc-client - Supervisor API (job management, runner coordination)
  • osiris-client - Osiris REST API (data queries)
  • hero-job - Job model definitions

This ensures benchmarks test the real-world API surface that users interact with.

Key Features

  • Async/await support - Uses Criterion's async_tokio feature (wiring sketched after this list)
  • Realistic workloads - Tests actual job submission and execution
  • Concurrent testing - Measures performance under parallel load
  • Stress testing - Pushes system limits with high-frequency operations
  • HTML reports - Visualizations with historical comparison
  • Automated checks - Helper script verifies the stack is running
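
For orientation, here is a minimal sketch of the async_tokio wiring pattern the benches rely on. The benchmark name matches supervisor_discovery, but the measured closure body is a placeholder, not the actual client call:

use criterion::{criterion_group, criterion_main, Criterion};
use tokio::runtime::Runtime;

// Sketch only: the real benches live in benches/horus_stack.rs and call the
// official client crates; this shows the async_tokio wiring pattern.
fn bench_supervisor_discovery(c: &mut Criterion) {
    // One Tokio runtime shared across iterations, as Criterion's async support expects.
    let rt = Runtime::new().expect("tokio runtime");

    c.bench_function("supervisor_discovery", |b| {
        b.to_async(&rt).iter(|| async {
            // Placeholder for the real OpenRPC discovery call made through
            // hero-supervisor-openrpc-client.
            tokio::time::sleep(std::time::Duration::from_millis(1)).await;
        });
    });
}

criterion_group!(benches, bench_supervisor_discovery);
criterion_main!(benches);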

Benchmark Categories

Performance Benchmarks (horus_stack)

  • supervisor_discovery - OpenRPC metadata (target: <10ms)
  • supervisor_get_info - Info retrieval (target: <5ms)
  • supervisor_list_runners - List operations (target: <5ms)
  • supervisor_job_create - Job creation (target: <10ms)
  • supervisor_job_list - Job listing (target: <10ms)
  • osiris_health_check - Health endpoint (target: <2ms)
  • job_full_lifecycle - Complete job cycle (target: <100ms)
  • concurrent_jobs - Parallel submissions (target: <500ms for 10 jobs)
  • get_all_runner_status - Status queries
  • api_latency/* - Detailed latency measurements

Stress Tests (stress_test)

  • stress_high_frequency_jobs - 50-200 concurrent jobs (fan-out pattern sketched after this list)
  • stress_sustained_load - Continuous submissions over time
  • stress_large_payloads - 1KB-100KB payload handling
  • stress_rapid_api_calls - 100 rapid calls per iteration
  • stress_mixed_workload - Combined operations
  • stress_connection_pool - 10-100 concurrent clients
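
The concurrency-oriented tests all follow a fan-out/await-all shape. A self-contained sketch of that pattern, with submit_job as a stub standing in for the real client call (which is not reproduced here):

use std::time::Duration;
use tokio::task::JoinSet;

// Stub standing in for a real submission through the supervisor client;
// it only simulates latency so the sketch runs on its own.
async fn submit_job(id: usize) -> Result<usize, String> {
    tokio::time::sleep(Duration::from_millis(5)).await;
    Ok(id)
}

#[tokio::main]
async fn main() {
    // Fan out 50 submissions concurrently, then await them all,
    // mirroring the shape of stress_high_frequency_jobs.
    let mut set = JoinSet::new();
    for id in 0..50 {
        set.spawn(submit_job(id));
    }
    let mut ok = 0;
    while let Some(joined) = set.join_next().await {
        if matches!(joined, Ok(Ok(_))) {
            ok += 1;
        }
    }
    println!("{ok}/50 submissions succeeded");
}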

Memory Profiling (memory_usage)

  • memory_job_creation - Memory footprint per job (10-200 jobs)
  • memory_client_creation - Memory per client instance (1-100 clients)
  • memory_payload_sizes - Memory vs payload size (1KB-1MB)
  • Reports memory deltas in real-time during execution (one measurement approach sketched after this list)
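
The delta reporting samples process memory before and after each batch of allocations. One possible, Linux-only way to take such a sample is sketched below; the actual measurement code may differ (see MEMORY_BENCHMARKS.md for platform specifics):

use std::fs;

// Linux-only: resident set size in KiB, read from /proc/self/status.
fn rss_kb() -> Option<u64> {
    let status = fs::read_to_string("/proc/self/status").ok()?;
    status
        .lines()
        .find(|line| line.starts_with("VmRSS:"))
        .and_then(|line| line.split_whitespace().nth(1))
        .and_then(|kb| kb.parse().ok())
}

fn main() {
    let before = rss_kb();
    // Placeholder workload: 200 payloads of 1 KiB each.
    let payloads: Vec<Vec<u8>> = (0..200).map(|_| vec![0u8; 1024]).collect();
    let after = rss_kb();
    if let (Some(b), Some(a)) = (before, after) {
        println!("delta: {} KiB for {} payloads", a.saturating_sub(b), payloads.len());
    }
}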

Usage

# Quick start
./benches/run_benchmarks.sh

# Run specific suite
cargo bench --bench horus_stack
cargo bench --bench stress_test
cargo bench --bench memory_usage

# Run specific test
cargo bench -- supervisor_discovery

# Run memory benchmarks with verbose output (shows memory deltas)
cargo bench --bench memory_usage -- --verbose

# Save baseline
cargo bench -- --save-baseline main

# Compare against baseline
cargo bench -- --baseline main

Prerequisites

The benchmarks require the full Horus stack to be running:

# Start Redis
redis-server

# Start Horus (with auto port cleanup)
RUST_LOG=info ./target/release/horus all --admin-secret SECRET --kill-ports

Configuration

All benchmarks use these defaults (configurable in source; a sketch follows the list):

  • Supervisor: http://127.0.0.1:3030
  • Osiris: http://127.0.0.1:8081
  • Coordinator HTTP: http://127.0.0.1:9652
  • Coordinator WS: ws://127.0.0.1:9653
  • Admin secret: SECRET
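
In the bench sources these defaults might be declared roughly as follows; the constant names are illustrative, only the values are taken from the list above:

// Illustrative constant names; edit the values in the bench sources to point
// the suite at a differently configured stack.
const SUPERVISOR_URL: &str = "http://127.0.0.1:3030";
const OSIRIS_URL: &str = "http://127.0.0.1:8081";
const COORDINATOR_HTTP_URL: &str = "http://127.0.0.1:9652";
const COORDINATOR_WS_URL: &str = "ws://127.0.0.1:9653";
const ADMIN_SECRET: &str = "SECRET";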

Results

Results are saved to target/criterion/ with:

  • HTML reports with graphs and statistics
  • JSON data for programmatic analysis
  • Historical comparison with previous runs
  • Detailed performance metrics (mean, median, std dev, throughput)

Integration

The benchmarks are integrated into the workspace:

  • Added to Cargo.toml with proper dependencies
  • Uses workspace-level dependencies for consistency
  • Configured with harness = false for Criterion (Cargo.toml sketch after this list)
  • Includes all necessary dev-dependencies
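
A rough sketch of the corresponding Cargo.toml entries; exact versions and paths are resolved through the workspace, so treat this as illustrative:

# Illustrative excerpt; the real manifest uses workspace-level declarations.
[dev-dependencies]
criterion = { version = "0.5", features = ["async_tokio", "html_reports"] }

[[bench]]
name = "horus_stack"
harness = false

[[bench]]
name = "stress_test"
harness = false

[[bench]]
name = "memory_usage"
harness = false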

Next Steps

  1. Run benchmarks to establish baseline performance
  2. Monitor performance over time as code changes
  3. Use stress tests to identify bottlenecks
  4. Customize benchmarks for specific use cases
  5. Integrate into CI/CD for automated performance tracking

Technical Details

Dependencies Added

  • criterion v0.5 with async_tokio and html_reports features
  • osiris-client from workspace
  • reqwest v0.12 with json feature
  • serde_json, uuid, chrono from workspace

Benchmark Harness

Uses Criterion.rs for:

  • Statistical analysis
  • Historical comparison
  • HTML report generation
  • Configurable sample sizes and warm-up periods (configuration sketch after this list)
  • Outlier detection
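
Sample size, warm-up, and measurement time can be tuned per benchmark group. A hedged sketch follows; the numbers are examples, not necessarily the values this suite uses:

use criterion::{criterion_group, criterion_main, Criterion};
use std::time::Duration;

fn bench_example(c: &mut Criterion) {
    c.bench_function("example", |b| b.iter(|| 1 + 1));
}

// Example values only; the suite's real configuration may differ.
criterion_group! {
    name = benches;
    config = Criterion::default()
        .sample_size(50)
        .warm_up_time(Duration::from_secs(3))
        .measurement_time(Duration::from_secs(10));
    targets = bench_example
}
criterion_main!(benches);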

Job Creation

The helper function create_test_job() creates properly structured Job instances:

  • Unique UUIDs for each job
  • Proper timestamps
  • JSON-serialized payloads
  • Empty signatures (for testing)
  • Configurable runner and command

This ensures benchmarks test realistic job structures that match production usage.
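
The exact Job fields come from the hero-job crate and are not reproduced here; the sketch below uses hypothetical field names purely to show the shape of such a helper:

use chrono::Utc;
use serde_json::json;
use uuid::Uuid;

// Hypothetical stand-in for hero_job::Job; the real struct's fields may differ.
struct Job {
    id: String,
    runner: String,
    command: String,
    payload: String,
    created_at: chrono::DateTime<Utc>,
    signatures: Vec<String>,
}

// Shape of the create_test_job() helper described above: unique id, current
// timestamp, JSON-serialized payload, and empty signatures for testing.
fn create_test_job(runner: &str, command: &str) -> Job {
    Job {
        id: Uuid::new_v4().to_string(),
        runner: runner.to_string(),
        command: command.to_string(),
        payload: json!({ "kind": "benchmark", "size": 1024 }).to_string(),
        created_at: Utc::now(),
        signatures: Vec::new(),
    }
}

fn main() {
    // Placeholder runner and command names.
    let job = create_test_job("example-runner", "noop");
    println!("created test job {} for runner {}", job.id, job.runner);
}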