# HeroDB Performance Benchmarking Guide

## Overview

This document describes the comprehensive benchmarking suite for HeroDB, designed to measure and compare the performance characteristics of the two storage backends: **redb** (default) and **sled**.

## Benchmark Architecture

### Design Principles

1. **Fair Comparison**: Identical test datasets and operations across all backends
2. **Statistical Rigor**: Using Criterion for statistically sound measurements
3. **Real-World Scenarios**: Mix of synthetic and realistic workload patterns
4. **Reproducibility**: Deterministic test data generation with fixed seeds
5. **Isolation**: Each benchmark runs in a clean environment

### Benchmark Categories

#### 1. Single-Operation CRUD Benchmarks

Measures the performance of individual database operations:

- **String Operations**
  - `SET` - Write a single key-value pair
  - `GET` - Read a single key-value pair
  - `DEL` - Delete a single key
  - `EXISTS` - Check key existence
- **Hash Operations**
  - `HSET` - Set a single field in a hash
  - `HGET` - Get a single field from a hash
  - `HGETALL` - Get all fields from a hash
  - `HDEL` - Delete a field from a hash
  - `HEXISTS` - Check field existence
- **List Operations**
  - `LPUSH` - Push to list head
  - `RPUSH` - Push to list tail
  - `LPOP` - Pop from list head
  - `RPOP` - Pop from list tail
  - `LRANGE` - Get a range of elements

#### 2. Bulk Operation Benchmarks

Tests throughput with varying batch sizes:

- **Bulk Insert**: 100, 1,000, 10,000 records
- **Bulk Read**: Sequential and random access patterns
- **Bulk Update**: Modify existing records
- **Bulk Delete**: Remove multiple records

#### 3. Query and Scan Benchmarks

Evaluates iteration and filtering performance:

- **SCAN**: Cursor-based key iteration
- **HSCAN**: Hash field iteration
- **KEYS**: Pattern matching (with various patterns)
- **Range Queries**: List range operations

#### 4. Concurrent Operation Benchmarks

Simulates multi-client scenarios:

- **10 Concurrent Clients**: Light load
- **50 Concurrent Clients**: Medium load
- **Mixed Workload**: 70% reads, 30% writes

#### 5. Memory Profiling

Tracks memory usage patterns:

- **Allocation Tracking**: Total allocations per operation
- **Peak Memory**: Maximum memory usage
- **Memory Efficiency**: Bytes per record stored

### Test Data Specifications

#### Dataset Sizes

- **Small**: 1,000 records
- **Medium**: 10,000 records (primary focus)

#### Data Characteristics

- **Key Format**: `bench:key:{id}` (predictable, sortable)
- **Value Sizes**:
  - Small: 50-100 bytes
  - Medium: 500-1000 bytes
  - Large: 5000-10000 bytes
- **Hash Fields**: 5-20 fields per hash
- **List Elements**: 10-100 elements per list

### Metrics Collected

For each benchmark, we collect:

1. **Latency Metrics** (see the sketch after this list)
   - Mean execution time
   - Median (p50)
   - 95th percentile (p95)
   - 99th percentile (p99)
   - Standard deviation
2. **Throughput Metrics**
   - Operations per second
   - Records per second (for bulk operations)
3. **Memory Metrics**
   - Total allocations
   - Peak memory usage
   - Average bytes per operation
4. **Initialization Overhead**
   - Database startup time
   - First operation latency (cold cache)
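For reference, the sketch below shows how the percentile figures could be derived from raw per-iteration samples. The `LatencyStats` struct and the index-based percentile helper are illustrative only; the actual logic lives in `benches/common/metrics.rs` and may differ.

```rust
/// Illustrative latency summary; `benches/common/metrics.rs` may differ.
#[derive(Debug)]
pub struct LatencyStats {
    pub mean_ns: f64,
    pub p50_ns: u64,
    pub p95_ns: u64,
    pub p99_ns: u64,
    pub std_dev_ns: f64,
}

/// Summarize raw per-iteration latencies (in nanoseconds).
/// Panics on an empty sample set, which benchmarks never produce.
pub fn summarize(mut samples_ns: Vec<u64>) -> LatencyStats {
    samples_ns.sort_unstable();
    let n = samples_ns.len();
    let mean = samples_ns.iter().sum::<u64>() as f64 / n as f64;
    let variance = samples_ns
        .iter()
        .map(|&s| (s as f64 - mean).powi(2))
        .sum::<f64>()
        / n as f64;
    // Index-based percentile (no interpolation) over the sorted samples.
    let pct = |p: f64| samples_ns[((p * (n - 1) as f64).round() as usize).min(n - 1)];
    LatencyStats {
        mean_ns: mean,
        p50_ns: pct(0.50),
        p95_ns: pct(0.95),
        p99_ns: pct(0.99),
        std_dev_ns: variance.sqrt(),
    }
}
```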
## Benchmark Structure

### Directory Layout

```
benches/
├── common/
│   ├── mod.rs              # Shared utilities
│   ├── data_generator.rs   # Test data generation
│   ├── metrics.rs          # Custom metrics collection
│   └── backends.rs         # Backend setup helpers
├── single_ops.rs           # Single-operation benchmarks
├── bulk_ops.rs             # Bulk operation benchmarks
├── scan_ops.rs             # Scan and query benchmarks
├── concurrent_ops.rs       # Concurrent operation benchmarks
└── memory_profile.rs       # Memory profiling benchmarks
```

### Running Benchmarks

#### Run All Benchmarks

```bash
cargo bench
```

#### Run Specific Benchmark Suite

```bash
cargo bench --bench single_ops
cargo bench --bench bulk_ops
cargo bench --bench concurrent_ops
```

#### Run Specific Backend

```bash
cargo bench -- redb
cargo bench -- sled
```

#### Generate Reports

```bash
# Run benchmarks and save results
cargo bench -- --save-baseline main

# Compare against baseline
cargo bench -- --baseline main

# Export to CSV
cargo bench -- --output-format csv > results.csv
```

### Output Formats

#### 1. Terminal Output (Default)

Real-time progress with statistical summaries:

```
single_ops/redb/set/small
                        time:   [1.234 µs 1.245 µs 1.256 µs]
                        thrpt:  [802.5K ops/s 810.2K ops/s 818.1K ops/s]
```

#### 2. CSV Export

Structured data for analysis:

```csv
backend,operation,dataset_size,mean_ns,median_ns,p95_ns,p99_ns,throughput_ops_sec
redb,set,small,1245,1240,1890,2100,810200
sled,set,small,1567,1550,2340,2890,638000
```

#### 3. JSON Export

Detailed metrics for programmatic processing:

```json
{
  "benchmark": "single_ops/redb/set/small",
  "metrics": {
    "mean": 1245,
    "median": 1240,
    "p95": 1890,
    "p99": 2100,
    "std_dev": 145,
    "throughput": 810200
  },
  "memory": {
    "allocations": 3,
    "peak_bytes": 4096
  }
}
```

## Benchmark Implementation Details

### Backend Setup

Each benchmark creates isolated database instances:

```rust
// Redb backend
let temp_dir = TempDir::new()?;
let db_path = temp_dir.path().join("bench.db");
let storage = Storage::new(db_path, false, None)?;

// Sled backend
let temp_dir = TempDir::new()?;
let db_path = temp_dir.path().join("bench.sled");
let storage = SledStorage::new(db_path, false, None)?;
```

### Data Generation

Deterministic data generation ensures reproducibility:

```rust
use rand::distributions::Alphanumeric;
use rand::rngs::StdRng;
use rand::{Rng, SeedableRng};

/// Hypothetical sketch of the helper from common/data_generator.rs:
/// a random alphanumeric string of the requested length.
fn generate_value(rng: &mut StdRng, len: usize) -> String {
    (0..len).map(|_| rng.sample(Alphanumeric) as char).collect()
}

fn generate_test_data(count: usize, seed: u64) -> Vec<(String, String)> {
    let mut rng = StdRng::seed_from_u64(seed);
    (0..count)
        .map(|i| {
            let key = format!("bench:key:{:08}", i);
            let value = generate_value(&mut rng, 100);
            (key, value)
        })
        .collect()
}
```

### Concurrent Testing

Using Tokio for async concurrent operations:

```rust
use std::sync::Arc;

async fn concurrent_benchmark(
    storage: Arc<Storage>, // swap in Arc<SledStorage> to benchmark sled
    num_clients: usize,
    operations: usize,
) {
    let tasks: Vec<_> = (0..num_clients)
        .map(|client_id| {
            let storage = storage.clone();
            tokio::spawn(async move {
                for i in 0..operations {
                    let key = format!("client:{}:key:{}", client_id, i);
                    storage.set(key, "value".to_string()).unwrap();
                }
            })
        })
        .collect();

    futures::future::join_all(tasks).await;
}
```
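The mixed-workload scenario listed under the concurrent benchmarks (70% reads, 30% writes) can be sketched along the same lines. The `get` accessor and the `Storage` handle below follow the hypothetical API used above; the exact method signatures in HeroDB may differ.

```rust
use std::sync::Arc;

use rand::rngs::StdRng;
use rand::{Rng, SeedableRng};

/// Drive one client with a 70/30 read/write mix over a fixed key space.
async fn mixed_workload(storage: Arc<Storage>, operations: usize, seed: u64) {
    let mut rng = StdRng::seed_from_u64(seed); // fixed seed keeps runs reproducible
    for i in 0..operations {
        let key = format!("bench:key:{:08}", i % 10_000);
        if rng.gen_range(0..100) < 70 {
            // 70% reads (assumed `get` accessor)
            let _ = storage.get(&key);
        } else {
            // 30% writes
            storage.set(key, "value".to_string()).unwrap();
        }
    }
}
```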
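Putting these pieces together, a single-operation benchmark can be registered with Criterion roughly as follows. `setup_redb_backend` is a hypothetical helper standing in for the backend setup code above, and the `Storage` API is assumed as before.

```rust
use criterion::{criterion_group, criterion_main, Criterion, Throughput};

fn bench_set(c: &mut Criterion) {
    let storage = setup_redb_backend();        // hypothetical helper from common/backends.rs
    let data = generate_test_data(10_000, 42); // fixed seed for reproducibility

    let mut group = c.benchmark_group("single_ops/redb");
    group.throughput(Throughput::Elements(1)); // report ops/s alongside latency
    group.bench_function("set/small", |b| {
        let mut i = 0;
        b.iter(|| {
            let (key, value) = &data[i % data.len()];
            storage.set(key.clone(), value.clone()).unwrap();
            i += 1;
        });
    });
    group.finish();
}

criterion_group!(benches, bench_set);
criterion_main!(benches);
```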
## Interpreting Results

### Performance Comparison

When comparing backends, consider:

1. **Latency vs. Throughput Trade-offs**
   - Lower latency: better for interactive workloads
   - Higher throughput: better for batch processing
2. **Consistency**
   - Lower standard deviation: more predictable performance
   - Check p95/p99 for tail latency
3. **Scalability**
   - How performance changes with dataset size
   - Concurrent operation efficiency

### Backend Selection Guidelines

Based on benchmark results, choose:

**redb** when:
- Predictable latency is needed
- Data is structured (separate tables)
- High concurrent read performance is required
- Memory efficiency is important

**sled** when:
- High write throughput is needed
- Data types are uniform
- Lock-free operations are required
- Crash recovery is critical

## Memory Profiling

### Using DHAT

For detailed memory profiling:

```bash
# Install valgrind and DHAT
sudo apt-get install valgrind

# Run with DHAT
cargo bench --bench memory_profile -- --profile-time=10
```

### Custom Allocation Tracking

The benchmarks include custom allocation tracking:

```rust
#[global_allocator]
static ALLOC: dhat::Alloc = dhat::Alloc;

fn track_allocations<F>(f: F) -> dhat::HeapStats
where
    F: FnOnce(),
{
    let _profiler = dhat::Profiler::new_heap();
    f();
    // Extract stats from the profiler before it is dropped
    dhat::HeapStats::get()
}
```

## Continuous Benchmarking

### Regression Detection

Compare against a baseline to detect performance regressions:

```bash
# Save current performance as baseline
cargo bench -- --save-baseline v0.1.0

# After changes, compare
cargo bench -- --baseline v0.1.0

# Criterion will highlight significant changes
```

### CI Integration

Add to the CI pipeline:

```yaml
- name: Run Benchmarks
  run: |
    cargo bench --no-fail-fast -- --output-format json > bench-results.json

- name: Compare Results
  run: |
    python scripts/compare_benchmarks.py \
      --baseline baseline.json \
      --current bench-results.json \
      --threshold 10  # Fail if >10% regression
```

## Troubleshooting

### Common Issues

1. **Inconsistent Results**
   - Ensure the system is idle during benchmarks
   - Disable CPU frequency scaling
   - Run multiple iterations
2. **Out of Memory**
   - Reduce dataset sizes
   - Run benchmarks sequentially
   - Increase system swap space
3. **Slow Benchmarks**
   - Reduce the sample size in the Criterion config
   - Use the `--quick` flag for faster runs
   - Focus on specific benchmarks

### Performance Tips

```bash
# Quick benchmark run (fewer samples)
cargo bench -- --quick

# Verbose output for debugging
cargo bench -- --verbose

# Profile a specific operation
cargo bench -- single_ops/redb/set
```

## Future Enhancements

Potential additions to the benchmark suite:

1. **Transaction Performance**: Measure MULTI/EXEC overhead
2. **Encryption Overhead**: Compare encrypted vs. non-encrypted storage
3. **Persistence Testing**: Measure flush/sync performance
4. **Recovery Time**: Database restart and recovery speed
5. **Network Overhead**: Redis protocol parsing impact
6. **Long-Running Stability**: Performance over extended periods

## References

- [Criterion.rs Documentation](https://bheisler.github.io/criterion.rs/book/)
- [DHAT Memory Profiler](https://valgrind.org/docs/manual/dh-manual.html)
- [Rust Performance Book](https://nnethercote.github.io/perf-book/)