benchmarking

This commit is contained in:
Maxime Van Hees
2025-10-30 11:17:26 +01:00
parent 592b6c1ea9
commit 9136e5f3c0
16 changed files with 3611 additions and 0 deletions

docs/benchmarking.md (new file)
@@ -0,0 +1,409 @@
# HeroDB Performance Benchmarking Guide
## Overview
This document describes the comprehensive benchmarking suite for HeroDB, designed to measure and compare the performance characteristics of the two storage backends: **redb** (default) and **sled**.
## Benchmark Architecture
### Design Principles
1. **Fair Comparison**: Identical test datasets and operations across all backends
2. **Statistical Rigor**: Using Criterion for statistically sound measurements
3. **Real-World Scenarios**: Mix of synthetic and realistic workload patterns
4. **Reproducibility**: Deterministic test data generation with fixed seeds
5. **Isolation**: Each benchmark runs in a clean environment
### Benchmark Categories
#### 1. Single-Operation CRUD Benchmarks
Measures the performance of individual database operations (a Criterion sketch follows this list):
- **String Operations**
- `SET` - Write a single key-value pair
- `GET` - Read a single key-value pair
- `DEL` - Delete a single key
- `EXISTS` - Check key existence
- **Hash Operations**
- `HSET` - Set single field in hash
- `HGET` - Get single field from hash
- `HGETALL` - Get all fields from hash
- `HDEL` - Delete field from hash
- `HEXISTS` - Check field existence
- **List Operations**
- `LPUSH` - Push to list head
- `RPUSH` - Push to list tail
- `LPOP` - Pop from list head
- `RPOP` - Pop from list tail
- `LRANGE` - Get range of elements
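A minimal sketch of one of these single-operation benchmarks, assuming the `Storage` constructor and `set` method shown later in this guide (HeroDB's actual API may differ in detail):
```rust
use criterion::{criterion_group, criterion_main, Criterion};
use tempfile::TempDir;
// `Storage` is HeroDB's redb-backed storage type, as used elsewhere in this guide.

fn bench_set(c: &mut Criterion) {
    let temp_dir = TempDir::new().unwrap();
    let storage = Storage::new(temp_dir.path().join("bench.db"), false, None).unwrap();
    let mut i: u64 = 0;
    c.bench_function("single_ops/redb/set/small", |b| {
        b.iter(|| {
            // Use a fresh key per iteration so every call takes the insert path.
            i += 1;
            storage.set(format!("bench:key:{:08}", i), "value".to_string()).unwrap();
        })
    });
}

criterion_group!(benches, bench_set);
criterion_main!(benches);
```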
#### 2. Bulk Operation Benchmarks
Tests throughput with varying batch sizes (see the sketch after this list):
- **Bulk Insert**: 100, 1,000, 10,000 records
- **Bulk Read**: Sequential and random access patterns
- **Bulk Update**: Modify existing records
- **Bulk Delete**: Remove multiple records
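A sketch of a bulk-insert benchmark across these batch sizes, using Criterion's parameterized groups and element throughput; `Storage` and `generate_test_data` are the constructor and generator described later in this guide:
```rust
use criterion::{BenchmarkId, Criterion, Throughput};

fn bench_bulk_insert(c: &mut Criterion) {
    let mut group = c.benchmark_group("bulk_insert/redb");
    for &size in &[100usize, 1_000, 10_000] {
        // Report throughput as records per second rather than raw time.
        group.throughput(Throughput::Elements(size as u64));
        group.bench_with_input(BenchmarkId::from_parameter(size), &size, |b, &size| {
            b.iter(|| {
                // A fresh database per iteration keeps every batch inserting
                // into an empty tree (setup cost is included in the timing).
                let temp_dir = tempfile::TempDir::new().unwrap();
                let storage = Storage::new(temp_dir.path().join("bench.db"), false, None).unwrap();
                for (key, value) in generate_test_data(size, 42) {
                    storage.set(key, value).unwrap();
                }
            })
        });
    }
    group.finish();
}
```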
#### 3. Query and Scan Benchmarks
Evaluates iteration and filtering performance (a cursor-loop sketch follows this list):
- **SCAN**: Cursor-based key iteration
- **HSCAN**: Hash field iteration
- **KEYS**: Pattern matching (with various patterns)
- **Range Queries**: List range operations
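A sketch of the cursor loop such a SCAN benchmark times; the `scan(cursor, count)` method and its `(next_cursor, keys)` return shape are assumptions for illustration, not HeroDB's actual API:
```rust
// Walk the full keyspace in fixed-size pages, Redis-SCAN style.
// Assumed contract: cursor 0 starts the scan, and a returned cursor
// of 0 means iteration is complete.
fn scan_all(storage: &Storage) -> usize {
    let mut cursor = 0u64;
    let mut total = 0;
    loop {
        let (next_cursor, keys) = storage.scan(cursor, 100).unwrap();
        total += keys.len();
        if next_cursor == 0 {
            break;
        }
        cursor = next_cursor;
    }
    total
}
```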
#### 4. Concurrent Operation Benchmarks
Simulates multi-client scenarios (a mixed-workload sketch follows this list):
- **10 Concurrent Clients**: Light load
- **50 Concurrent Clients**: Medium load
- **Mixed Workload**: 70% reads, 30% writes
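A sketch of how the 70/30 mix can be driven deterministically per client; the `get`/`set` methods on the backend trait are assumed from the concurrent example later in this guide:
```rust
use rand::rngs::StdRng;
use rand::Rng;

// One operation of the mixed workload: a 7-in-10 draw selects a read.
fn mixed_op(storage: &dyn StorageBackend, rng: &mut StdRng, i: usize) {
    let key = format!("bench:key:{:08}", i % 10_000);
    if rng.gen_ratio(7, 10) {
        let _ = storage.get(&key); // 70% reads
    } else {
        storage.set(key, "value".to_string()).unwrap(); // 30% writes
    }
}
```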
#### 5. Memory Profiling
Tracks memory usage patterns:
- **Allocation Tracking**: Total allocations per operation
- **Peak Memory**: Maximum memory usage
- **Memory Efficiency**: Bytes per record stored
### Test Data Specifications
#### Dataset Sizes
- **Small**: 1,000 records
- **Medium**: 10,000 records (primary focus)
#### Data Characteristics
- **Key Format**: `bench:key:{id}` (predictable, sortable)
- **Value Sizes**:
- Small: 50-100 bytes
- Medium: 500-1000 bytes
- Large: 5000-10000 bytes
- **Hash Fields**: 5-20 fields per hash
- **List Elements**: 10-100 elements per list
### Metrics Collected
For each benchmark, we collect the following (a percentile-extraction sketch follows the list):
1. **Latency Metrics**
- Mean execution time
- Median (p50)
- 95th percentile (p95)
- 99th percentile (p99)
- Standard deviation
2. **Throughput Metrics**
- Operations per second
- Records per second (for bulk operations)
3. **Memory Metrics**
- Total allocations
- Peak memory usage
- Average bytes per operation
4. **Initialization Overhead**
- Database startup time
- First operation latency (cold cache)
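Criterion reports these latency statistics itself; for the custom metrics collection in `benches/common/metrics.rs`, the percentile figures can be extracted from raw samples along these lines (a sketch, assuming a non-empty sample set):
```rust
/// Nearest-rank percentile over a sorted slice of nanosecond samples.
fn percentile(sorted: &[u64], p: f64) -> u64 {
    let idx = ((sorted.len() as f64 - 1.0) * p / 100.0).round() as usize;
    sorted[idx]
}

/// Returns (p50, p95, p99) for a batch of raw latency samples.
fn summarize(mut samples: Vec<u64>) -> (u64, u64, u64) {
    samples.sort_unstable();
    (
        percentile(&samples, 50.0),
        percentile(&samples, 95.0),
        percentile(&samples, 99.0),
    )
}
```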
## Benchmark Structure
### Directory Layout
```
benches/
├── common/
│ ├── mod.rs # Shared utilities
│ ├── data_generator.rs # Test data generation
│ ├── metrics.rs # Custom metrics collection
│ └── backends.rs # Backend setup helpers
├── single_ops.rs # Single-operation benchmarks
├── bulk_ops.rs # Bulk operation benchmarks
├── scan_ops.rs # Scan and query benchmarks
├── concurrent_ops.rs # Concurrent operation benchmarks
└── memory_profile.rs # Memory profiling benchmarks
```
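Each top-level file in `benches/` must also be registered in `Cargo.toml` with a `[[bench]]` entry that sets `harness = false`, since Criterion supplies its own benchmark harness in place of the default libtest one.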
### Running Benchmarks
#### Run All Benchmarks
```bash
cargo bench
```
#### Run Specific Benchmark Suite
```bash
cargo bench --bench single_ops
cargo bench --bench bulk_ops
cargo bench --bench concurrent_ops
```
#### Run Specific Backend
```bash
cargo bench -- redb
cargo bench -- sled
```
#### Generate Reports
```bash
# Run benchmarks and save results
cargo bench -- --save-baseline main
# Compare against baseline
cargo bench -- --baseline main
# Export to CSV (flag support varies by Criterion version)
cargo bench -- --output-format csv > results.csv
```
### Output Formats
#### 1. Terminal Output (Default)
Real-time progress with statistical summaries:
```
single_ops/redb/set/small
time: [1.234 µs 1.245 µs 1.256 µs]
thrpt: [802.5K ops/s 810.2K ops/s 818.1K ops/s]
```
#### 2. CSV Export
Structured data for analysis:
```csv
backend,operation,dataset_size,mean_ns,median_ns,p95_ns,p99_ns,throughput_ops_sec
redb,set,small,1245,1240,1890,2100,810200
sled,set,small,1567,1550,2340,2890,638000
```
#### 3. JSON Export
Detailed metrics for programmatic processing:
```json
{
"benchmark": "single_ops/redb/set/small",
"metrics": {
"mean": 1245,
"median": 1240,
"p95": 1890,
"p99": 2100,
"std_dev": 145,
"throughput": 810200
},
"memory": {
"allocations": 3,
"peak_bytes": 4096
}
}
```
## Benchmark Implementation Details
### Backend Setup
Each benchmark creates isolated database instances:
```rust
use tempfile::TempDir;

// redb backend: fresh temporary directory per run for isolation
let redb_dir = TempDir::new()?;
let redb_storage = Storage::new(redb_dir.path().join("bench.db"), false, None)?;

// sled backend: separate directory so the two instances never share state
let sled_dir = TempDir::new()?;
let sled_storage = SledStorage::new(sled_dir.path().join("bench.sled"), false, None)?;
```
### Data Generation
Deterministic data generation ensures reproducibility:
```rust
use rand::rngs::StdRng;
use rand::{Rng, SeedableRng};

/// Generate `count` key-value pairs deterministically from a fixed seed.
fn generate_test_data(count: usize, seed: u64) -> Vec<(String, String)> {
    let mut rng = StdRng::seed_from_u64(seed);
    (0..count)
        .map(|i| {
            let key = format!("bench:key:{:08}", i);
            let value = generate_value(&mut rng, 100);
            (key, value)
        })
        .collect()
}

/// Build a printable value of `len` bytes from the seeded RNG.
fn generate_value(rng: &mut StdRng, len: usize) -> String {
    (0..len).map(|_| rng.gen_range(b'a'..=b'z') as char).collect()
}
```
### Concurrent Testing
Using Tokio for async concurrent operations:
```rust
use std::sync::Arc;

async fn concurrent_benchmark(
    storage: Arc<dyn StorageBackend>,
    num_clients: usize,
    operations: usize,
) {
    // One Tokio task per simulated client, each writing its own key space.
    let tasks: Vec<_> = (0..num_clients)
        .map(|client_id| {
            let storage = storage.clone();
            tokio::spawn(async move {
                for i in 0..operations {
                    let key = format!("client:{}:key:{}", client_id, i);
                    storage.set(key, "value".to_string()).unwrap();
                }
            })
        })
        .collect();
    // Wait for every client task to complete (requires the `futures` crate).
    futures::future::join_all(tasks).await;
}
```
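Note that `tokio::spawn` requires the captured `Arc<dyn StorageBackend>` to be `Send + Sync + 'static`; if the backend calls are blocking rather than async, `tokio::task::spawn_blocking` is the safer primitive for a benchmark like this.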
## Interpreting Results
### Performance Comparison
When comparing backends, consider:
1. **Latency vs Throughput Trade-offs**
- Lower latency = better for interactive workloads
- Higher throughput = better for batch processing
2. **Consistency**
- Lower standard deviation = more predictable performance
- Check p95/p99 for tail latency
3. **Scalability**
- How performance changes with dataset size
- Concurrent operation efficiency
### Backend Selection Guidelines
Based on benchmark results, choose:
**redb** when:
- Need predictable latency
- Working with structured data (separate tables)
- Require high concurrent read performance
- Memory efficiency is important
**sled** when:
- Need high write throughput
- Working with uniform data types
- Require lock-free operations
- Crash recovery is critical
## Memory Profiling
### Using DHAT
For detailed memory profiling:
```bash
# Install valgrind (its DHAT viewer, dh_view.html, renders dhat-rs output)
sudo apt-get install valgrind

# Run the memory-profile benchmark with a fixed profiling window
cargo bench --bench memory_profile -- --profile-time=10
```
### Custom Allocation Tracking
The benchmarks include custom allocation tracking built on the `dhat` crate (declared as a dev-dependency):
```rust
// The dhat profiler must be created before the workload allocates.
#[global_allocator]
static ALLOC: dhat::Alloc = dhat::Alloc;

fn track_allocations<F>(f: F) -> dhat::HeapStats
where
    F: FnOnce(),
{
    let _profiler = dhat::Profiler::new_heap();
    f();
    // Snapshot the heap statistics gathered while `f` ran.
    dhat::HeapStats::get()
}
```
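`dhat::HeapStats` exposes totals (`total_blocks`, `total_bytes`) and peaks (`max_blocks`, `max_bytes`), which map directly onto the allocation and peak-memory metrics listed earlier.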
## Continuous Benchmarking
### Regression Detection
Compare against baseline to detect performance regressions:
```bash
# Save current performance as baseline
cargo bench -- --save-baseline v0.1.0
# After changes, compare
cargo bench -- --baseline v0.1.0
# Criterion will highlight significant changes
```
### CI Integration
Add to CI pipeline:
```yaml
- name: Run Benchmarks
  run: |
    cargo bench --no-fail-fast -- --output-format json > bench-results.json

- name: Compare Results
  run: |
    python scripts/compare_benchmarks.py \
      --baseline baseline.json \
      --current bench-results.json \
      --threshold 10  # Fail if >10% regression
```
## Troubleshooting
### Common Issues
1. **Inconsistent Results**
- Ensure system is idle during benchmarks
- Disable CPU frequency scaling
- Run multiple iterations
2. **Out of Memory**
- Reduce dataset sizes
- Run benchmarks sequentially
- Increase system swap space
3. **Slow Benchmarks**
- Reduce the sample size in the Criterion config (see the sketch below)
- Use `--quick` flag for faster runs
- Focus on specific benchmarks
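For the sample-size reduction mentioned above, Criterion's group configuration can be overridden; a minimal sketch (`bench_set` is the function from the earlier single-operation sketch):
```rust
use criterion::{criterion_group, criterion_main, Criterion};
use std::time::Duration;

fn quick_config() -> Criterion {
    Criterion::default()
        .sample_size(20)                          // default is 100
        .measurement_time(Duration::from_secs(2)) // default is 5 seconds
}

criterion_group! {
    name = benches;
    config = quick_config();
    targets = bench_set
}
criterion_main!(benches);
```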
### Performance Tips
```bash
# Quick benchmark run (fewer samples)
cargo bench -- --quick
# Verbose output for debugging
cargo bench -- --verbose
# Profile specific operation
cargo bench -- single_ops/redb/set
```
## Future Enhancements
Potential additions to the benchmark suite:
1. **Transaction Performance**: Measure MULTI/EXEC overhead
2. **Encryption Overhead**: Compare encrypted vs non-encrypted
3. **Persistence Testing**: Measure flush/sync performance
4. **Recovery Time**: Database restart and recovery speed
5. **Network Overhead**: Redis protocol parsing impact
6. **Long-Running Stability**: Performance over extended periods
## References
- [Criterion.rs Documentation](https://bheisler.github.io/criterion.rs/book/)
- [DHAT Memory Profiler](https://valgrind.org/docs/manual/dh-manual.html)
- [Rust Performance Book](https://nnethercote.github.io/perf-book/)