HeroDB Performance Benchmarking Guide
Overview
This document describes the comprehensive benchmarking suite for HeroDB, designed to measure and compare the performance characteristics of the two storage backends: redb (default) and sled.
Benchmark Architecture
Design Principles
- Fair Comparison: Identical test datasets and operations across all backends
- Statistical Rigor: Using Criterion for statistically sound measurements
- Real-World Scenarios: Mix of synthetic and realistic workload patterns
- Reproducibility: Deterministic test data generation with fixed seeds
- Isolation: Each benchmark runs in a clean environment
Benchmark Categories
1. Single-Operation CRUD Benchmarks
Measures the performance of individual database operations:
- String Operations
  - SET - Write a single key-value pair
  - GET - Read a single key-value pair
  - DEL - Delete a single key
  - EXISTS - Check key existence
- Hash Operations
  - HSET - Set single field in hash
  - HGET - Get single field from hash
  - HGETALL - Get all fields from hash
  - HDEL - Delete field from hash
  - HEXISTS - Check field existence
- List Operations
  - LPUSH - Push to list head
  - RPUSH - Push to list tail
  - LPOP - Pop from list head
  - RPOP - Pop from list tail
  - LRANGE - Get range of elements
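As a concrete example, a single-operation SET benchmark for the redb backend could be wired up with Criterion roughly as sketched below. The Storage::new and storage.set calls mirror the backend-setup snippets later in this guide; the exact module paths and signatures are assumptions and may differ in the actual crate.

```rust
use criterion::{criterion_group, criterion_main, Criterion};
use tempfile::TempDir;
// `Storage` is HeroDB's redb-backed storage type; the import path is assumed.

fn bench_set_redb(c: &mut Criterion) {
    // Isolated database instance in a fresh temp directory
    let temp_dir = TempDir::new().unwrap();
    let storage = Storage::new(temp_dir.path().join("bench.db"), false, None).unwrap();

    let mut i: u64 = 0;
    c.bench_function("single_ops/redb/set/small", |b| {
        b.iter(|| {
            // One SET per iteration; unique keys avoid measuring only overwrites
            let key = format!("bench:key:{:08}", i);
            i += 1;
            storage.set(key, "value".to_string()).unwrap();
        })
    });
}

criterion_group!(single_ops, bench_set_redb);
criterion_main!(single_ops);
```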
2. Bulk Operation Benchmarks
Tests throughput with varying batch sizes:
- Bulk Insert: 100, 1,000, 10,000 records
- Bulk Read: Sequential and random access patterns
- Bulk Update: Modify existing records
- Bulk Delete: Remove multiple records
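One way to express the batch-size sweep is a Criterion benchmark group with Throughput::Elements, so results are reported as records per second. The sketch below assumes the same Storage API as above and the generate_test_data helper described under Data Generation.

```rust
use criterion::{criterion_group, criterion_main, BenchmarkId, Criterion, Throughput};

fn bench_bulk_insert_redb(c: &mut Criterion) {
    let temp_dir = tempfile::TempDir::new().unwrap();
    let storage = Storage::new(temp_dir.path().join("bench.db"), false, None).unwrap();

    let mut group = c.benchmark_group("bulk_ops/redb/insert");
    for &size in &[100usize, 1_000, 10_000] {
        // Deterministic dataset (fixed seed), see "Data Generation" below
        let data = generate_test_data(size, 42);
        // Report throughput as records per second rather than iterations per second
        group.throughput(Throughput::Elements(size as u64));
        group.bench_with_input(BenchmarkId::from_parameter(size), &data, |b, data| {
            b.iter(|| {
                for (key, value) in data.iter() {
                    storage.set(key.clone(), value.clone()).unwrap();
                }
            })
        });
    }
    group.finish();
}

criterion_group!(bulk_ops, bench_bulk_insert_redb);
criterion_main!(bulk_ops);
```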
3. Query and Scan Benchmarks
Evaluates iteration and filtering performance:
- SCAN: Cursor-based key iteration
- HSCAN: Hash field iteration
- KEYS: Pattern matching (with various patterns)
- Range Queries: List range operations
4. Concurrent Operation Benchmarks
Simulates multi-client scenarios:
- 10 Concurrent Clients: Light load
- 50 Concurrent Clients: Medium load
- Mixed Workload: 70% reads, 30% writes
5. Memory Profiling
Tracks memory usage patterns:
- Allocation Tracking: Total allocations per operation
- Peak Memory: Maximum memory usage
- Memory Efficiency: Bytes per record stored
Test Data Specifications
Dataset Sizes
- Small: 1,000 - 10,000 records
- Medium: 10,000 records (primary focus)
Data Characteristics
- Key Format: bench:key:{id} (predictable, sortable)
- Value Sizes:
  - Small: 50-100 bytes
  - Medium: 500-1000 bytes
  - Large: 5000-10000 bytes
- Hash Fields: 5-20 fields per hash
- List Elements: 10-100 elements per list
Metrics Collected
For each benchmark, we collect:
- Latency Metrics
  - Mean execution time
  - Median (p50)
  - 95th percentile (p95)
  - 99th percentile (p99)
  - Standard deviation
- Throughput Metrics
  - Operations per second
  - Records per second (for bulk operations)
- Memory Metrics
  - Total allocations
  - Peak memory usage
  - Average bytes per operation
- Initialization Overhead
  - Database startup time
  - First operation latency (cold cache)
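Criterion computes most of these statistics itself; for metrics gathered outside Criterion (for example, raw per-operation samples from the memory or concurrency harnesses), percentiles can be derived with a small helper like the following sketch, which could live in benches/common/metrics.rs:

```rust
use std::time::Duration;

/// Nearest-rank percentile of a set of latency samples (p in 0..=100).
fn percentile(samples: &mut [Duration], p: f64) -> Duration {
    assert!(!samples.is_empty() && (0.0..=100.0).contains(&p));
    samples.sort_unstable();
    // Nearest-rank method: the smallest sample such that at least p% of samples are <= it
    let rank = ((p / 100.0) * samples.len() as f64).ceil().max(1.0) as usize;
    samples[rank - 1]
}

fn main() {
    // 1 µs .. 1000 µs, one sample per microsecond
    let mut samples: Vec<Duration> = (1..=1000).map(Duration::from_micros).collect();
    println!("p50 = {:?}", percentile(&mut samples, 50.0)); // 500 µs
    println!("p95 = {:?}", percentile(&mut samples, 95.0)); // 950 µs
    println!("p99 = {:?}", percentile(&mut samples, 99.0)); // 990 µs
}
```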
Benchmark Structure
Directory Layout
benches/
├── common/
│ ├── mod.rs # Shared utilities
│ ├── data_generator.rs # Test data generation
│ ├── metrics.rs # Custom metrics collection
│ └── backends.rs # Backend setup helpers
├── single_ops.rs # Single-operation benchmarks
├── bulk_ops.rs # Bulk operation benchmarks
├── scan_ops.rs # Scan and query benchmarks
├── concurrent_ops.rs # Concurrent operation benchmarks
└── memory_profile.rs # Memory profiling benchmarks
Running Benchmarks
Run All Benchmarks
cargo bench
Run Specific Benchmark Suite
cargo bench --bench single_ops
cargo bench --bench bulk_ops
cargo bench --bench concurrent_ops
Run Specific Backend
cargo bench -- redb
cargo bench -- sled
Generate Reports
# Run benchmarks and save results
cargo bench -- --save-baseline main
# Compare against baseline
cargo bench -- --baseline main
# Export to CSV
cargo bench -- --output-format csv > results.csv
Output Formats
1. Terminal Output (Default)
Real-time progress with statistical summaries:
single_ops/redb/set/small
time: [1.234 µs 1.245 µs 1.256 µs]
thrpt: [802.5K ops/s 810.2K ops/s 818.1K ops/s]
2. CSV Export
Structured data for analysis:
backend,operation,dataset_size,mean_ns,median_ns,p95_ns,p99_ns,throughput_ops_sec
redb,set,small,1245,1240,1890,2100,810200
sled,set,small,1567,1550,2340,2890,638000
3. JSON Export
Detailed metrics for programmatic processing:
{
  "benchmark": "single_ops/redb/set/small",
  "metrics": {
    "mean": 1245,
    "median": 1240,
    "p95": 1890,
    "p99": 2100,
    "std_dev": 145,
    "throughput": 810200
  },
  "memory": {
    "allocations": 3,
    "peak_bytes": 4096
  }
}
Benchmark Implementation Details
Backend Setup
Each benchmark creates isolated database instances:
// Redb backend
let temp_dir = TempDir::new()?;
let db_path = temp_dir.path().join("bench.db");
let storage = Storage::new(db_path, false, None)?;
// Sled backend
let temp_dir = TempDir::new()?;
let db_path = temp_dir.path().join("bench.sled");
let storage = SledStorage::new(db_path, false, None)?;
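To keep benchmark bodies backend-agnostic, the setup can be factored into a helper that returns a trait object. This is a sketch assuming both backends implement the StorageBackend trait used in the concurrent example below; constructor signatures follow the snippets above.

```rust
use std::sync::Arc;
use tempfile::TempDir;

/// Backend under test.
enum Backend {
    Redb,
    Sled,
}

/// Create an isolated database instance in a fresh temp directory.
/// The TempDir is returned alongside the storage so the on-disk files
/// outlive the benchmark that uses them.
fn setup_backend(backend: Backend) -> (TempDir, Arc<dyn StorageBackend>) {
    let temp_dir = TempDir::new().expect("create temp dir");
    let storage: Arc<dyn StorageBackend> = match backend {
        Backend::Redb => Arc::new(
            Storage::new(temp_dir.path().join("bench.db"), false, None).unwrap(),
        ),
        Backend::Sled => Arc::new(
            SledStorage::new(temp_dir.path().join("bench.sled"), false, None).unwrap(),
        ),
    };
    (temp_dir, storage)
}
```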
Data Generation
Deterministic data generation ensures reproducibility:
use rand::{SeedableRng, Rng};
use rand::rngs::StdRng;

fn generate_test_data(count: usize, seed: u64) -> Vec<(String, String)> {
    let mut rng = StdRng::seed_from_u64(seed);
    (0..count)
        .map(|i| {
            let key = format!("bench:key:{:08}", i);
            let value = generate_value(&mut rng, 100);
            (key, value)
        })
        .collect()
}
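generate_value is not shown above; a minimal version that fills values to the requested size with printable bytes could look like this (illustrative only, assuming rand 0.8's gen_range):

```rust
use rand::rngs::StdRng;
use rand::Rng;

/// Generate a printable ASCII value of exactly `len` bytes.
fn generate_value(rng: &mut StdRng, len: usize) -> String {
    (0..len).map(|_| rng.gen_range(b'a'..=b'z') as char).collect()
}
```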
Concurrent Testing
Using Tokio for async concurrent operations:
async fn concurrent_benchmark(
    storage: Arc<dyn StorageBackend>,
    num_clients: usize,
    operations: usize,
) {
    let tasks: Vec<_> = (0..num_clients)
        .map(|client_id| {
            let storage = storage.clone();
            tokio::spawn(async move {
                for i in 0..operations {
                    let key = format!("client:{}:key:{}", client_id, i);
                    storage.set(key, "value".to_string()).unwrap();
                }
            })
        })
        .collect();

    futures::future::join_all(tasks).await;
}
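The mixed workload (70% reads, 30% writes) from the concurrent category can reuse the same structure, picking the operation per iteration. This sketch assumes a storage.get counterpart to set on the same trait; the exact signature may differ.

```rust
use std::sync::Arc;
use rand::rngs::StdRng;
use rand::{Rng, SeedableRng};

async fn mixed_workload(
    storage: Arc<dyn StorageBackend>,
    num_clients: usize,
    operations: usize,
) {
    let tasks: Vec<_> = (0..num_clients)
        .map(|client_id| {
            let storage = storage.clone();
            tokio::spawn(async move {
                // Per-client RNG seeded by client id keeps runs deterministic
                let mut rng = StdRng::seed_from_u64(client_id as u64);
                for i in 0..operations {
                    let key = format!("client:{}:key:{}", client_id, i % 100);
                    if rng.gen_bool(0.7) {
                        // 70% reads
                        let _ = storage.get(&key);
                    } else {
                        // 30% writes
                        storage.set(key, "value".to_string()).unwrap();
                    }
                }
            })
        })
        .collect();

    futures::future::join_all(tasks).await;
}
```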
Interpreting Results
Performance Comparison
When comparing backends, consider:
- Latency vs Throughput Trade-offs
  - Lower latency = better for interactive workloads
  - Higher throughput = better for batch processing
- Consistency
  - Lower standard deviation = more predictable performance
  - Check p95/p99 for tail latency
- Scalability
  - How performance changes with dataset size
  - Concurrent operation efficiency
Backend Selection Guidelines
Based on benchmark results, choose:
redb when:
- Need predictable latency
- Working with structured data (separate tables)
- Require high concurrent read performance
- Memory efficiency is important
sled when:
- Need high write throughput
- Working with uniform data types
- Require lock-free operations
- Crash recovery is critical
Memory Profiling
Using DHAT
For detailed memory profiling:
# Install valgrind and dhat
sudo apt-get install valgrind
# Run with DHAT
cargo bench --bench memory_profile -- --profile-time=10
Custom Allocation Tracking
The benchmarks include custom allocation tracking:
#[global_allocator]
static ALLOC: dhat::Alloc = dhat::Alloc;

// dhat's heap statistics (total and peak allocation counters) serve as the stats type
type AllocationStats = dhat::HeapStats;

fn track_allocations<F>(f: F) -> AllocationStats
where
    F: FnOnce(),
{
    // Profiling is active while `_profiler` is alive
    let _profiler = dhat::Profiler::new_heap();
    f();
    // Snapshot the allocation counters recorded while `f` ran
    dhat::HeapStats::get()
}
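With dhat::HeapStats as the stats type, a profiling benchmark could then wrap its workload like this (field names are from the dhat crate):

```rust
fn main() {
    let stats = track_allocations(|| {
        // Workload under measurement: build and drop 10,000 small values
        let values: Vec<String> = (0..10_000).map(|i| format!("value:{}", i)).collect();
        drop(values);
    });
    println!(
        "total allocations: {}, peak bytes: {}",
        stats.total_blocks, stats.max_bytes
    );
}
```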
Continuous Benchmarking
Regression Detection
Compare against baseline to detect performance regressions:
# Save current performance as baseline
cargo bench -- --save-baseline v0.1.0
# After changes, compare
cargo bench -- --baseline v0.1.0
# Criterion will highlight significant changes
CI Integration
Add to CI pipeline:
- name: Run Benchmarks
  run: |
    cargo bench --no-fail-fast -- --output-format json > bench-results.json

- name: Compare Results
  run: |
    python scripts/compare_benchmarks.py \
      --baseline baseline.json \
      --current bench-results.json \
      --threshold 10  # Fail if >10% regression
Troubleshooting
Common Issues
- Inconsistent Results
  - Ensure system is idle during benchmarks
  - Disable CPU frequency scaling
  - Run multiple iterations
- Out of Memory
  - Reduce dataset sizes
  - Run benchmarks sequentially
  - Increase system swap space
- Slow Benchmarks
  - Reduce sample size in Criterion config
  - Use the --quick flag for faster runs
  - Focus on specific benchmarks
Performance Tips
# Quick benchmark run (fewer samples)
cargo bench -- --quick
# Verbose output for debugging
cargo bench -- --verbose
# Profile specific operation
cargo bench -- single_ops/redb/set
Future Enhancements
Potential additions to the benchmark suite:
- Transaction Performance: Measure MULTI/EXEC overhead
- Encryption Overhead: Compare encrypted vs non-encrypted
- Persistence Testing: Measure flush/sync performance
- Recovery Time: Database restart and recovery speed
- Network Overhead: Redis protocol parsing impact
- Long-Running Stability: Performance over extended periods