benchmarking

docs/benchmarking.md · 409 lines · new file
# HeroDB Performance Benchmarking Guide

## Overview

This document describes the comprehensive benchmarking suite for HeroDB, designed to measure and compare the performance characteristics of the two storage backends: **redb** (default) and **sled**.

## Benchmark Architecture

### Design Principles

1. **Fair Comparison**: Identical test datasets and operations across all backends
2. **Statistical Rigor**: Using Criterion for statistically sound measurements
3. **Real-World Scenarios**: Mix of synthetic and realistic workload patterns
4. **Reproducibility**: Deterministic test data generation with fixed seeds
5. **Isolation**: Each benchmark runs in a clean environment

### Benchmark Categories

#### 1. Single-Operation CRUD Benchmarks
Measures the performance of individual database operations (a minimal sketch follows the lists below):

- **String Operations**
  - `SET` - Write a single key-value pair
  - `GET` - Read a single key-value pair
  - `DEL` - Delete a single key
  - `EXISTS` - Check key existence

- **Hash Operations**
  - `HSET` - Set single field in hash
  - `HGET` - Get single field from hash
  - `HGETALL` - Get all fields from hash
  - `HDEL` - Delete field from hash
  - `HEXISTS` - Check field existence

- **List Operations**
  - `LPUSH` - Push to list head
  - `RPUSH` - Push to list tail
  - `LPOP` - Pop from list head
  - `RPOP` - Pop from list tail
  - `LRANGE` - Get range of elements

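Each of these is driven by Criterion. As a minimal sketch of the shape of such a benchmark, here is a `SET` case. The `herodb::storage::Storage` import path is an assumption; the constructor and `set` signatures follow the backend-setup and concurrency code shown later in this guide:

```rust
use criterion::{criterion_group, criterion_main, Criterion};
use tempfile::TempDir;

// Assumed crate path; the constructor arguments mirror the
// backend-setup snippet later in this document.
use herodb::storage::Storage;

fn bench_set(c: &mut Criterion) {
    let temp_dir = TempDir::new().unwrap();
    let storage = Storage::new(temp_dir.path().join("bench.db"), false, None).unwrap();

    let mut i = 0u64;
    c.bench_function("single_ops/redb/set/small", |b| {
        b.iter(|| {
            // Use a fresh key each iteration so we measure inserts,
            // not repeated overwrites of one key.
            let key = format!("bench:key:{:08}", i);
            i += 1;
            storage.set(key, "value".to_string()).unwrap();
        })
    });
}

criterion_group!(benches, bench_set);
criterion_main!(benches);
```
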
#### 2. Bulk Operation Benchmarks
Tests throughput with varying batch sizes (see the sketch after this list):

- **Bulk Insert**: 100, 1,000, 10,000 records
- **Bulk Read**: Sequential and random access patterns
- **Bulk Update**: Modify existing records
- **Bulk Delete**: Remove multiple records

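A sketch of the bulk-insert case, using Criterion's `Throughput::Elements` so results are reported as records per second. The `Storage` type is the same assumption as in the sketch above:

```rust
use criterion::{BenchmarkId, Criterion, Throughput};

fn bench_bulk_insert(c: &mut Criterion, storage: &Storage) {
    let mut group = c.benchmark_group("bulk_ops/redb/insert");
    for &size in &[100usize, 1_000, 10_000] {
        // Report throughput as records/second rather than raw iteration time.
        group.throughput(Throughput::Elements(size as u64));
        group.bench_with_input(BenchmarkId::from_parameter(size), &size, |b, &size| {
            b.iter(|| {
                for i in 0..size {
                    storage
                        .set(format!("bench:key:{:08}", i), "value".to_string())
                        .unwrap();
                }
            })
        });
    }
    group.finish();
}
```
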
#### 3. Query and Scan Benchmarks
Evaluates iteration and filtering performance (see the sketch after this list):

- **SCAN**: Cursor-based key iteration
- **HSCAN**: Hash field iteration
- **KEYS**: Pattern matching (with various patterns)
- **Range Queries**: List range operations

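The scan benchmarks loop a cursor until the backend reports exhaustion. The `scan` signature below is a hypothetical illustration; the real method lives on the project's storage trait and may differ:

```rust
// Hypothetical cursor-based iteration: fetch batches of up to 100 keys
// matching `pattern` until the returned cursor is 0, mirroring Redis
// SCAN semantics.
fn scan_all(storage: &Storage, pattern: &str) -> usize {
    let mut cursor = 0u64;
    let mut total = 0;
    loop {
        // Assumed API: returns the next cursor plus a batch of matching keys.
        let (next_cursor, keys) = storage.scan(cursor, Some(pattern), 100).unwrap();
        total += keys.len();
        if next_cursor == 0 {
            return total;
        }
        cursor = next_cursor;
    }
}
```
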
#### 4. Concurrent Operation Benchmarks
Simulates multi-client scenarios (the mixed workload is sketched after this list):

- **10 Concurrent Clients**: Light load
- **50 Concurrent Clients**: Medium load
- **Mixed Workload**: 70% reads, 30% writes

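For the mixed workload, a seeded RNG decides per operation whether to read or write, so the 70/30 split is reproducible across runs. Each client would run this loop on its own task, as in the concurrency snippet later in this guide; the `get` method, like `set`, is an assumed method on the project's `StorageBackend` trait:

```rust
use std::sync::Arc;
use rand::rngs::StdRng;
use rand::{Rng, SeedableRng};

fn mixed_workload(storage: Arc<dyn StorageBackend>, operations: usize, seed: u64) {
    let mut rng = StdRng::seed_from_u64(seed);
    for i in 0..operations {
        let key = format!("bench:key:{:08}", rng.gen_range(0..10_000));
        if rng.gen_range(0..100) < 70 {
            // Read path: 70% of operations.
            let _ = storage.get(&key);
        } else {
            // Write path: 30% of operations.
            storage.set(key, format!("value:{}", i)).unwrap();
        }
    }
}
```
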
#### 5. Memory Profiling
Tracks memory usage patterns:

- **Allocation Tracking**: Total allocations per operation
- **Peak Memory**: Maximum memory usage
- **Memory Efficiency**: Bytes per record stored

### Test Data Specifications

#### Dataset Sizes
- **Small**: 1,000 - 10,000 records
- **Medium**: 10,000 records (primary focus)

#### Data Characteristics
- **Key Format**: `bench:key:{id}` (predictable, sortable)
- **Value Sizes**:
  - Small: 50-100 bytes
  - Medium: 500-1000 bytes
  - Large: 5000-10000 bytes
- **Hash Fields**: 5-20 fields per hash
- **List Elements**: 10-100 elements per list

### Metrics Collected

For each benchmark, we collect the following (a sketch of the percentile computation appears after this list):

1. **Latency Metrics**
   - Mean execution time
   - Median (p50)
   - 95th percentile (p95)
   - 99th percentile (p99)
   - Standard deviation

2. **Throughput Metrics**
   - Operations per second
   - Records per second (for bulk operations)

3. **Memory Metrics**
   - Total allocations
   - Peak memory usage
   - Average bytes per operation

4. **Initialization Overhead**
   - Database startup time
   - First operation latency (cold cache)

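Criterion reports these statistics itself; for the custom metrics module, percentiles can be extracted from raw samples with a simple nearest-rank computation, sketched here:

```rust
/// Nearest-rank percentile over a pre-sorted slice of nanosecond samples.
fn percentile(sorted_ns: &[u64], p: f64) -> u64 {
    assert!(!sorted_ns.is_empty() && (0.0..=100.0).contains(&p));
    let rank = ((p / 100.0) * (sorted_ns.len() - 1) as f64).round() as usize;
    sorted_ns[rank]
}

/// Returns (p50, p95, p99) for a set of latency samples.
fn latency_summary(mut samples_ns: Vec<u64>) -> (u64, u64, u64) {
    samples_ns.sort_unstable();
    (
        percentile(&samples_ns, 50.0),
        percentile(&samples_ns, 95.0),
        percentile(&samples_ns, 99.0),
    )
}
```
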
## Benchmark Structure

### Directory Layout

```
benches/
├── common/
│   ├── mod.rs              # Shared utilities
│   ├── data_generator.rs   # Test data generation
│   ├── metrics.rs          # Custom metrics collection
│   └── backends.rs         # Backend setup helpers
├── single_ops.rs           # Single-operation benchmarks
├── bulk_ops.rs             # Bulk operation benchmarks
├── scan_ops.rs             # Scan and query benchmarks
├── concurrent_ops.rs       # Concurrent operation benchmarks
└── memory_profile.rs       # Memory profiling benchmarks
```

### Running Benchmarks

#### Run All Benchmarks
```bash
cargo bench
```

#### Run Specific Benchmark Suite
```bash
cargo bench --bench single_ops
cargo bench --bench bulk_ops
cargo bench --bench concurrent_ops
```

#### Run Specific Backend
```bash
cargo bench -- redb
cargo bench -- sled
```

#### Generate Reports
```bash
# Run benchmarks and save results
cargo bench -- --save-baseline main

# Compare against baseline
cargo bench -- --baseline main

# Export to CSV
cargo bench -- --output-format csv > results.csv
```

### Output Formats

#### 1. Terminal Output (Default)
Real-time progress with statistical summaries:
```
single_ops/redb/set/small
                        time:   [1.234 µs 1.245 µs 1.256 µs]
                        thrpt:  [802.5K ops/s 810.2K ops/s 818.1K ops/s]
```

#### 2. CSV Export
Structured data for analysis:
```csv
backend,operation,dataset_size,mean_ns,median_ns,p95_ns,p99_ns,throughput_ops_sec
redb,set,small,1245,1240,1890,2100,810200
sled,set,small,1567,1550,2340,2890,638000
```

#### 3. JSON Export
Detailed metrics for programmatic processing:
```json
{
  "benchmark": "single_ops/redb/set/small",
  "metrics": {
    "mean": 1245,
    "median": 1240,
    "p95": 1890,
    "p99": 2100,
    "std_dev": 145,
    "throughput": 810200
  },
  "memory": {
    "allocations": 3,
    "peak_bytes": 4096
  }
}
```

## Benchmark Implementation Details

### Backend Setup

Each benchmark creates isolated database instances:

```rust
// Redb backend
let temp_dir = TempDir::new()?;
let db_path = temp_dir.path().join("bench.db");
let storage = Storage::new(db_path, false, None)?;

// Sled backend
let temp_dir = TempDir::new()?;
let db_path = temp_dir.path().join("bench.sled");
let storage = SledStorage::new(db_path, false, None)?;
```

### Data Generation

Deterministic data generation ensures reproducibility:

```rust
use rand::{SeedableRng, Rng};
use rand::rngs::StdRng;

fn generate_test_data(count: usize, seed: u64) -> Vec<(String, String)> {
    let mut rng = StdRng::seed_from_u64(seed);
    (0..count)
        .map(|i| {
            let key = format!("bench:key:{:08}", i);
            let value = generate_value(&mut rng, 100);
            (key, value)
        })
        .collect()
}
```
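`generate_value` is not shown above; one possible implementation, consistent with the value-size classes listed earlier (the actual helper in `data_generator.rs` may differ):

```rust
use rand::distributions::Alphanumeric;
use rand::rngs::StdRng;
use rand::Rng;

// Produce a printable ASCII value of exactly `len` bytes, drawn from the
// seeded RNG so generation stays deterministic for a given seed.
fn generate_value(rng: &mut StdRng, len: usize) -> String {
    (0..len).map(|_| rng.sample(Alphanumeric) as char).collect()
}
```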

### Concurrent Testing

Using Tokio for async concurrent operations:

```rust
use std::sync::Arc;

async fn concurrent_benchmark(
    storage: Arc<dyn StorageBackend>,
    num_clients: usize,
    operations: usize,
) {
    // Each client runs on its own Tokio task, so the backend trait object
    // must be Send + Sync + 'static.
    let tasks: Vec<_> = (0..num_clients)
        .map(|client_id| {
            let storage = storage.clone();
            tokio::spawn(async move {
                for i in 0..operations {
                    let key = format!("client:{}:key:{}", client_id, i);
                    storage.set(key, "value".to_string()).unwrap();
                }
            })
        })
        .collect();

    // Wait for every client to finish (requires the `futures` crate).
    futures::future::join_all(tasks).await;
}
```

## Interpreting Results

### Performance Comparison

When comparing backends, consider:

1. **Latency vs Throughput Trade-offs**
   - Lower latency = better for interactive workloads
   - Higher throughput = better for batch processing

2. **Consistency**
   - Lower standard deviation = more predictable performance
   - Check p95/p99 for tail latency

3. **Scalability**
   - How performance changes with dataset size
   - Concurrent operation efficiency

### Backend Selection Guidelines

Based on benchmark results, choose:

**redb** when:
- Need predictable latency
- Working with structured data (separate tables)
- Require high concurrent read performance
- Memory efficiency is important

**sled** when:
- Need high write throughput
- Working with uniform data types
- Require lock-free operations
- Crash recovery is critical

## Memory Profiling

### Using DHAT

For detailed memory profiling:

```bash
# Install valgrind (the original DHAT tool ships with it)
sudo apt-get install valgrind

# Run the memory benchmarks in profiling mode
cargo bench --bench memory_profile -- --profile-time=10
```

### Custom Allocation Tracking

The benchmarks include custom allocation tracking via the `dhat` crate:

```rust
// Route all allocations through dhat's instrumented allocator.
#[global_allocator]
static ALLOC: dhat::Alloc = dhat::Alloc;

fn track_allocations<F>(f: F) -> dhat::HeapStats
where
    F: FnOnce(),
{
    // The profiler must stay alive while `f` runs; stats are read before it drops.
    let _profiler = dhat::Profiler::new_heap();
    f();
    dhat::HeapStats::get()
}
```
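Wrapping a workload then yields the collected statistics. For example, assuming a `storage` instance is in scope as in the earlier snippets:

```rust
// Allocation profile of 1,000 SET operations.
let stats = track_allocations(|| {
    for i in 0..1_000 {
        storage.set(format!("bench:key:{:08}", i), "value".to_string()).unwrap();
    }
});
println!(
    "total allocations: {}, peak heap: {} bytes",
    stats.total_blocks, stats.max_bytes
);
```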

## Continuous Benchmarking

### Regression Detection

Compare against a baseline to detect performance regressions:

```bash
# Save current performance as baseline
cargo bench -- --save-baseline v0.1.0

# After changes, compare
cargo bench -- --baseline v0.1.0

# Criterion will highlight significant changes
```

### CI Integration

Add to the CI pipeline:

```yaml
- name: Run Benchmarks
  run: |
    cargo bench --no-fail-fast -- --output-format json > bench-results.json

- name: Compare Results
  run: |
    python scripts/compare_benchmarks.py \
      --baseline baseline.json \
      --current bench-results.json \
      --threshold 10  # Fail if >10% regression
```
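However the comparison script is implemented, the core check reduces to a relative-difference threshold; a minimal sketch in Rust:

```rust
/// True when `current_ns` is more than `threshold_pct` percent slower
/// than `baseline_ns`.
fn is_regression(baseline_ns: f64, current_ns: f64, threshold_pct: f64) -> bool {
    (current_ns - baseline_ns) / baseline_ns * 100.0 > threshold_pct
}

fn main() {
    assert!(is_regression(1_000.0, 1_150.0, 10.0));  // 15% slower: flagged
    assert!(!is_regression(1_000.0, 1_050.0, 10.0)); // 5% slower: within threshold
}
```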

## Troubleshooting

### Common Issues

1. **Inconsistent Results**
   - Ensure the system is idle during benchmarks
   - Disable CPU frequency scaling
   - Run multiple iterations

2. **Out of Memory**
   - Reduce dataset sizes
   - Run benchmarks sequentially
   - Increase system swap space

3. **Slow Benchmarks**
   - Reduce the sample size in the Criterion config
   - Use the `--quick` flag for faster runs
   - Focus on specific benchmarks

### Performance Tips

```bash
# Quick benchmark run (fewer samples)
cargo bench -- --quick

# Verbose output for debugging
cargo bench -- --verbose

# Profile specific operation
cargo bench -- single_ops/redb/set
```

## Future Enhancements

Potential additions to the benchmark suite:

1. **Transaction Performance**: Measure MULTI/EXEC overhead
2. **Encryption Overhead**: Compare encrypted vs non-encrypted databases
3. **Persistence Testing**: Measure flush/sync performance
4. **Recovery Time**: Database restart and recovery speed
5. **Network Overhead**: Redis protocol parsing impact
6. **Long-Running Stability**: Performance over extended periods

## References

- [Criterion.rs Documentation](https://bheisler.github.io/criterion.rs/book/)
- [DHAT Memory Profiler](https://valgrind.org/docs/manual/dh-manual.html)
- [Rust Performance Book](https://nnethercote.github.io/perf-book/)