# HeroDB Performance Benchmarking Guide

## Overview
This document describes the comprehensive benchmarking suite for HeroDB, designed to measure and compare the performance characteristics of the two storage backends: redb (default) and sled.

## Benchmark Architecture

### Design Principles

- **Fair Comparison**: Identical test datasets and operations across all backends
- **Statistical Rigor**: Criterion is used for statistically sound measurements
- **Real-World Scenarios**: A mix of synthetic and realistic workload patterns
- **Reproducibility**: Deterministic test data generation with fixed seeds
- **Isolation**: Each benchmark runs in a clean environment

### Benchmark Categories

#### 1. Single-Operation CRUD Benchmarks

Measures the performance of individual database operations (a sketch of one such benchmark follows the list):

- **String Operations**
  - `SET` - Write a single key-value pair
  - `GET` - Read a single key-value pair
  - `DEL` - Delete a single key
  - `EXISTS` - Check key existence
- **Hash Operations**
  - `HSET` - Set a single field in a hash
  - `HGET` - Get a single field from a hash
  - `HGETALL` - Get all fields from a hash
  - `HDEL` - Delete a field from a hash
  - `HEXISTS` - Check field existence
- **List Operations**
  - `LPUSH` - Push to the list head
  - `RPUSH` - Push to the list tail
  - `LPOP` - Pop from the list head
  - `RPOP` - Pop from the list tail
  - `LRANGE` - Get a range of elements
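
As a concrete illustration, one of these single-operation benchmarks could be wired up with Criterion roughly as follows. This is a minimal sketch, not an excerpt from `benches/single_ops.rs`: the `Storage::new` constructor and a synchronous `set` method are assumed from the backend-setup snippets later in this guide.

```rust
// Sketch only: assumes HeroDB's `Storage::new(path, encrypted, key)` constructor
// and a synchronous `set(key, value)` method, as shown later in this guide.
use criterion::{black_box, criterion_group, criterion_main, Criterion};
use tempfile::TempDir;

fn bench_set_small(c: &mut Criterion) {
    // Isolated database instance, created once for the whole benchmark.
    let temp_dir = TempDir::new().unwrap();
    let storage = Storage::new(temp_dir.path().join("bench.db"), false, None).unwrap();

    let mut id: u64 = 0;
    c.bench_function("single_ops/redb/set/small", |b| {
        b.iter(|| {
            // Unique key per iteration so we measure inserts rather than overwrites.
            let key = format!("bench:key:{:08}", id);
            id += 1;
            storage.set(black_box(key), black_box("value".to_string())).unwrap();
        })
    });
}

criterion_group!(benches, bench_set_small);
criterion_main!(benches);
```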

#### 2. Bulk Operation Benchmarks

Tests throughput with varying batch sizes (see the sketch after the list):

- **Bulk Insert**: 100, 1,000, and 10,000 records
- **Bulk Read**: Sequential and random access patterns
- **Bulk Update**: Modify existing records
- **Bulk Delete**: Remove multiple records
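
A bulk-insert benchmark parameterized over those batch sizes might look like the sketch below. It is an assumption-laden outline reusing the `Storage` and `generate_test_data` shapes from later sections, not HeroDB's actual `benches/bulk_ops.rs`.

```rust
// Sketch only: `Storage` and `generate_test_data` follow the examples later in this guide.
use criterion::{criterion_group, criterion_main, BatchSize, BenchmarkId, Criterion, Throughput};
use tempfile::TempDir;

fn bench_bulk_insert(c: &mut Criterion) {
    let mut group = c.benchmark_group("bulk_ops/redb/insert");
    for &size in &[100usize, 1_000, 10_000] {
        // Report records/second alongside wall-clock time.
        group.throughput(Throughput::Elements(size as u64));
        group.bench_with_input(BenchmarkId::from_parameter(size), &size, |b, &size| {
            b.iter_batched(
                || {
                    // Fresh database and deterministic dataset for every sample.
                    let dir = TempDir::new().unwrap();
                    let storage = Storage::new(dir.path().join("bench.db"), false, None).unwrap();
                    (dir, storage, generate_test_data(size, 42))
                },
                |(_dir, storage, data)| {
                    for (key, value) in data {
                        storage.set(key, value).unwrap();
                    }
                },
                BatchSize::PerIteration,
            )
        });
    }
    group.finish();
}

criterion_group!(benches, bench_bulk_insert);
criterion_main!(benches);
```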

#### 3. Query and Scan Benchmarks

Evaluates iteration and filtering performance:

- `SCAN`: Cursor-based key iteration
- `HSCAN`: Hash field iteration
- `KEYS`: Pattern matching (with various patterns)
- **Range Queries**: List range operations

#### 4. Concurrent Operation Benchmarks

Simulates multi-client scenarios (a per-client sketch of the mixed workload follows the list):

- **10 Concurrent Clients**: Light load
- **50 Concurrent Clients**: Medium load
- **Mixed Workload**: 70% reads, 30% writes
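
The read/write mix for a single client could be driven by a seeded RNG, roughly as below. This is a sketch under assumptions: the `get`/`set` signatures on `Storage` mirror the other snippets in this guide and are not taken from HeroDB's source.

```rust
// Sketch only: one client's loop in the 70% read / 30% write mixed workload.
// `Storage::get`/`Storage::set` signatures are assumptions based on this guide.
use rand::{rngs::StdRng, Rng, SeedableRng};

fn mixed_workload(storage: &Storage, client_id: u64, operations: usize) {
    // Per-client seed keeps the operation sequence reproducible across runs.
    let mut rng = StdRng::seed_from_u64(client_id);
    for i in 0..operations {
        let key = format!("bench:key:{:08}", i % 10_000);
        if rng.gen_bool(0.7) {
            // 70% of operations are reads.
            let _ = storage.get(&key);
        } else {
            // 30% of operations are writes.
            storage.set(key, "value".to_string()).unwrap();
        }
    }
}
```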

#### 5. Memory Profiling

Tracks memory usage patterns:

- **Allocation Tracking**: Total allocations per operation
- **Peak Memory**: Maximum memory usage
- **Memory Efficiency**: Bytes per record stored

## Test Data Specifications

### Dataset Sizes

- **Small**: 1,000 records
- **Medium**: 10,000 records (primary focus)

### Data Characteristics

- **Key Format**: `bench:key:{id}` (predictable, sortable)
- **Value Sizes**:
  - Small: 50-100 bytes
  - Medium: 500-1,000 bytes
  - Large: 5,000-10,000 bytes
- **Hash Fields**: 5-20 fields per hash
- **List Elements**: 10-100 elements per list

## Metrics Collected

For each benchmark, we collect the following (a sketch of the latency aggregation follows the list):

- **Latency Metrics**
  - Mean execution time
  - Median (p50)
  - 95th percentile (p95)
  - 99th percentile (p99)
  - Standard deviation
- **Throughput Metrics**
  - Operations per second
  - Records per second (for bulk operations)
- **Memory Metrics**
  - Total allocations
  - Peak memory usage
  - Average bytes per operation
- **Initialization Overhead**
  - Database startup time
  - First-operation latency (cold cache)
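
Criterion's standard report covers mean, median, and standard deviation; the p95/p99 figures imply some custom aggregation, presumably in `benches/common/metrics.rs`. The helper below is a minimal sketch of that idea; the struct and field names are illustrative, not HeroDB's actual code.

```rust
use std::time::Duration;

/// Illustrative latency summary; names are assumptions, not HeroDB's actual types.
pub struct LatencySummary {
    pub mean: Duration,
    pub p50: Duration,
    pub p95: Duration,
    pub p99: Duration,
}

/// Aggregates raw per-operation timings into the percentiles reported above.
pub fn summarize(mut samples: Vec<Duration>) -> LatencySummary {
    assert!(!samples.is_empty());
    samples.sort();
    // Nearest-rank percentile over the sorted samples.
    let pct = |p: f64| {
        let rank = (samples.len() as f64 * p).ceil() as usize;
        samples[rank.saturating_sub(1).min(samples.len() - 1)]
    };
    let total: Duration = samples.iter().sum();
    LatencySummary {
        mean: total / samples.len() as u32,
        p50: pct(0.50),
        p95: pct(0.95),
        p99: pct(0.99),
    }
}
```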

## Benchmark Structure

### Directory Layout

```text
benches/
├── common/
│   ├── mod.rs              # Shared utilities
│   ├── data_generator.rs   # Test data generation
│   ├── metrics.rs          # Custom metrics collection
│   └── backends.rs         # Backend setup helpers
├── single_ops.rs           # Single-operation benchmarks
├── bulk_ops.rs             # Bulk operation benchmarks
├── scan_ops.rs             # Scan and query benchmarks
├── concurrent_ops.rs       # Concurrent operation benchmarks
└── memory_profile.rs       # Memory profiling benchmarks
```

## Running Benchmarks

### Run All Benchmarks

```bash
cargo bench
```

### Run a Specific Benchmark Suite

```bash
cargo bench --bench single_ops
cargo bench --bench bulk_ops
cargo bench --bench concurrent_ops
```

### Run a Specific Backend

```bash
cargo bench -- redb
cargo bench -- sled
```

### Generate Reports

```bash
# Run benchmarks and save results
cargo bench -- --save-baseline main

# Compare against baseline
cargo bench -- --baseline main

# Export to CSV
cargo bench -- --output-format csv > results.csv
```

## Output Formats

### 1. Terminal Output (Default)

Real-time progress with statistical summaries:

```text
single_ops/redb/set/small
                        time:   [1.234 µs 1.245 µs 1.256 µs]
                        thrpt:  [802.5K ops/s 810.2K ops/s 818.1K ops/s]
```

### 2. CSV Export

Structured data for analysis:

```csv
backend,operation,dataset_size,mean_ns,median_ns,p95_ns,p99_ns,throughput_ops_sec
redb,set,small,1245,1240,1890,2100,810200
sled,set,small,1567,1550,2340,2890,638000
```

### 3. JSON Export

Detailed metrics for programmatic processing (a parsing sketch follows the example):

```json
{
  "benchmark": "single_ops/redb/set/small",
  "metrics": {
    "mean": 1245,
    "median": 1240,
    "p95": 1890,
    "p99": 2100,
    "std_dev": 145,
    "throughput": 810200
  },
  "memory": {
    "allocations": 3,
    "peak_bytes": 4096
  }
}
```
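
For downstream tooling, the export can be deserialized with `serde`; the struct layout below simply mirrors the example above and is an assumption about the export format, not a published schema.

```rust
// Sketch only: field names mirror the JSON example above.
use serde::Deserialize;

#[derive(Deserialize)]
struct BenchResult {
    benchmark: String,
    metrics: Metrics,
    memory: Memory,
}

#[derive(Deserialize)]
struct Metrics {
    mean: u64,
    median: u64,
    p95: u64,
    p99: u64,
    std_dev: u64,
    throughput: u64,
}

#[derive(Deserialize)]
struct Memory {
    allocations: u64,
    peak_bytes: u64,
}

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let raw = std::fs::read_to_string("bench-results.json")?;
    let result: BenchResult = serde_json::from_str(&raw)?;
    println!("{}: p99 = {} ns", result.benchmark, result.metrics.p99);
    Ok(())
}
```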

## Benchmark Implementation Details

### Backend Setup

Each benchmark creates isolated database instances:

```rust
// Redb backend
let temp_dir = TempDir::new()?;
let db_path = temp_dir.path().join("bench.db");
let storage = Storage::new(db_path, false, None)?;

// Sled backend
let temp_dir = TempDir::new()?;
let db_path = temp_dir.path().join("bench.sled");
let storage = SledStorage::new(db_path, false, None)?;
```

### Data Generation

Deterministic data generation ensures reproducibility (the `generate_value` helper is sketched after this block):

```rust
use rand::{SeedableRng, Rng};
use rand::rngs::StdRng;

fn generate_test_data(count: usize, seed: u64) -> Vec<(String, String)> {
    let mut rng = StdRng::seed_from_u64(seed);
    (0..count)
        .map(|i| {
            let key = format!("bench:key:{:08}", i);
            let value = generate_value(&mut rng, 100);
            (key, value)
        })
        .collect()
}
```
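
The `generate_value` helper referenced above is not defined in this document; a minimal version consistent with the value-size buckets listed earlier might look like this (the alphanumeric payload and the ±25% size jitter are assumptions):

```rust
use rand::distributions::Alphanumeric;
use rand::{rngs::StdRng, Rng};

/// Illustrative helper: builds a random alphanumeric value of roughly `target_len` bytes.
fn generate_value(rng: &mut StdRng, target_len: usize) -> String {
    // +/- 25% jitter so values are not all exactly the same size.
    let len = rng.gen_range((target_len * 3 / 4)..=(target_len * 5 / 4));
    (0..len).map(|_| rng.sample(Alphanumeric) as char).collect()
}
```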

### Concurrent Testing

Using Tokio for async concurrent operations:

```rust
use std::sync::Arc;

async fn concurrent_benchmark(
    storage: Arc<dyn StorageBackend>,
    num_clients: usize,
    operations: usize,
) {
    let tasks: Vec<_> = (0..num_clients)
        .map(|client_id| {
            // Each simulated client gets its own handle to the shared backend.
            let storage = storage.clone();
            tokio::spawn(async move {
                for i in 0..operations {
                    let key = format!("client:{}:key:{}", client_id, i);
                    storage.set(key, "value".to_string()).unwrap();
                }
            })
        })
        .collect();

    futures::future::join_all(tasks).await;
}
```

## Interpreting Results

### Performance Comparison

When comparing backends, consider:

- **Latency vs. Throughput Trade-offs**
  - Lower latency is better for interactive workloads
  - Higher throughput is better for batch processing
- **Consistency**
  - A lower standard deviation means more predictable performance
  - Check p95/p99 for tail latency
- **Scalability**
  - How performance changes with dataset size
  - Concurrent operation efficiency

### Backend Selection Guidelines

Based on benchmark results, choose:

**redb** when:
- You need predictable latency
- You work with structured data (separate tables)
- You require high concurrent read performance
- Memory efficiency is important

**sled** when:
- You need high write throughput
- You work with uniform data types
- You require lock-free operations
- Crash recovery is critical

## Memory Profiling

### Using DHAT

For detailed memory profiling:

```bash
# Install valgrind and dhat
sudo apt-get install valgrind

# Run with DHAT
cargo bench --bench memory_profile -- --profile-time=10
```

### Custom Allocation Tracking

The benchmarks include custom allocation tracking:

```rust
// Route all heap allocations through dhat so they can be counted.
#[global_allocator]
static ALLOC: dhat::Alloc = dhat::Alloc;

fn track_allocations<F>(f: F) -> dhat::HeapStats
where
    F: FnOnce(),
{
    // The profiler records every allocation made while it is alive.
    let _profiler = dhat::Profiler::new_heap();
    f();
    // Snapshot the stats (total/current/peak blocks and bytes) before the profiler drops.
    dhat::HeapStats::get()
}
```
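
A call site for this helper could look like the following sketch; `storage` and `generate_test_data` are the assumed shapes from earlier sections, and `total_blocks`/`max_bytes` are fields of the `dhat` crate's `HeapStats`.

```rust
// Sketch only: profiles the allocation cost of 1,000 SET operations.
fn profile_bulk_set(storage: &Storage) {
    let stats = track_allocations(|| {
        for (key, value) in generate_test_data(1_000, 42) {
            storage.set(key, value).unwrap();
        }
    });
    println!(
        "total allocations: {}, peak heap: {} bytes",
        stats.total_blocks, stats.max_bytes
    );
}
```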

## Continuous Benchmarking

### Regression Detection

Compare against a baseline to detect performance regressions:

```bash
# Save current performance as baseline
cargo bench -- --save-baseline v0.1.0

# After changes, compare; Criterion will highlight significant changes
cargo bench -- --baseline v0.1.0
```

### CI Integration

Add to the CI pipeline:

```yaml
- name: Run Benchmarks
  run: |
    cargo bench --no-fail-fast -- --output-format json > bench-results.json

- name: Compare Results
  run: |
    python scripts/compare_benchmarks.py \
      --baseline baseline.json \
      --current bench-results.json \
      --threshold 10  # Fail if >10% regression
```

## Troubleshooting

### Common Issues

- **Inconsistent Results**
  - Ensure the system is idle during benchmarks
  - Disable CPU frequency scaling
  - Run multiple iterations
- **Out of Memory**
  - Reduce dataset sizes
  - Run benchmarks sequentially
  - Increase system swap space
- **Slow Benchmarks**
  - Reduce the sample size in the Criterion config
  - Use the `--quick` flag for faster runs
  - Focus on specific benchmarks

### Performance Tips

```bash
# Quick benchmark run (fewer samples)
cargo bench -- --quick

# Verbose output for debugging
cargo bench -- --verbose

# Profile a specific operation
cargo bench -- single_ops/redb/set
```

## Future Enhancements

Potential additions to the benchmark suite:

- **Transaction Performance**: Measure MULTI/EXEC overhead
- **Encryption Overhead**: Compare encrypted vs. non-encrypted storage
- **Persistence Testing**: Measure flush/sync performance
- **Recovery Time**: Database restart and recovery speed
- **Network Overhead**: Redis protocol parsing impact
- **Long-Running Stability**: Performance over extended periods