benchmarking

This commit is contained in:
Maxime Van Hees
2025-10-30 11:17:26 +01:00
parent 592b6c1ea9
commit 9136e5f3c0
16 changed files with 3611 additions and 0 deletions

docs/benchmarking.md (new file)
@@ -0,0 +1,409 @@
# HeroDB Performance Benchmarking Guide
## Overview
This document describes the comprehensive benchmarking suite for HeroDB, designed to measure and compare the performance characteristics of the two storage backends: **redb** (default) and **sled**.
## Benchmark Architecture
### Design Principles
1. **Fair Comparison**: Identical test datasets and operations across all backends
2. **Statistical Rigor**: Using Criterion for statistically sound measurements
3. **Real-World Scenarios**: Mix of synthetic and realistic workload patterns
4. **Reproducibility**: Deterministic test data generation with fixed seeds
5. **Isolation**: Each benchmark runs in a clean environment
### Benchmark Categories
#### 1. Single-Operation CRUD Benchmarks
Measures the performance of individual database operations (a Criterion sketch follows this list):
- **String Operations**
- `SET` - Write a single key-value pair
- `GET` - Read a single key-value pair
- `DEL` - Delete a single key
- `EXISTS` - Check key existence
- **Hash Operations**
- `HSET` - Set single field in hash
- `HGET` - Get single field from hash
- `HGETALL` - Get all fields from hash
- `HDEL` - Delete field from hash
- `HEXISTS` - Check field existence
- **List Operations**
- `LPUSH` - Push to list head
- `RPUSH` - Push to list tail
- `LPOP` - Pop from list head
- `RPOP` - Pop from list tail
- `LRANGE` - Get range of elements
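A minimal sketch of one of these single-operation benchmarks, assuming the `Storage` constructor and `set` method shown later in this guide (HeroDB's actual API may differ in detail):
```rust
use criterion::{criterion_group, criterion_main, Criterion};
use tempfile::TempDir;
// `Storage` is HeroDB's redb-backed storage type, as used elsewhere in this guide.

fn bench_set(c: &mut Criterion) {
    let temp_dir = TempDir::new().unwrap();
    let storage = Storage::new(temp_dir.path().join("bench.db"), false, None).unwrap();
    let mut i: u64 = 0;
    c.bench_function("single_ops/redb/set/small", |b| {
        b.iter(|| {
            // Use a fresh key per iteration so every call takes the insert path.
            i += 1;
            storage.set(format!("bench:key:{:08}", i), "value".to_string()).unwrap();
        })
    });
}

criterion_group!(benches, bench_set);
criterion_main!(benches);
```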
#### 2. Bulk Operation Benchmarks
Tests throughput with varying batch sizes (see the sketch after this list):
- **Bulk Insert**: 100, 1,000, 10,000 records
- **Bulk Read**: Sequential and random access patterns
- **Bulk Update**: Modify existing records
- **Bulk Delete**: Remove multiple records
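A sketch of a bulk-insert benchmark across these batch sizes, using Criterion's parameterized groups and element throughput; `Storage` and `generate_test_data` are the constructor and generator described later in this guide:
```rust
use criterion::{BenchmarkId, Criterion, Throughput};

fn bench_bulk_insert(c: &mut Criterion) {
    let mut group = c.benchmark_group("bulk_insert/redb");
    for &size in &[100usize, 1_000, 10_000] {
        // Report throughput as records per second rather than raw time.
        group.throughput(Throughput::Elements(size as u64));
        group.bench_with_input(BenchmarkId::from_parameter(size), &size, |b, &size| {
            b.iter(|| {
                // A fresh database per iteration keeps every batch inserting
                // into an empty tree (setup cost is included in the timing).
                let temp_dir = tempfile::TempDir::new().unwrap();
                let storage = Storage::new(temp_dir.path().join("bench.db"), false, None).unwrap();
                for (key, value) in generate_test_data(size, 42) {
                    storage.set(key, value).unwrap();
                }
            })
        });
    }
    group.finish();
}
```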
#### 3. Query and Scan Benchmarks
Evaluates iteration and filtering performance (a cursor-loop sketch follows this list):
- **SCAN**: Cursor-based key iteration
- **HSCAN**: Hash field iteration
- **KEYS**: Pattern matching (with various patterns)
- **Range Queries**: List range operations
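A sketch of the cursor loop such a SCAN benchmark times; the `scan(cursor, count)` method and its `(next_cursor, keys)` return shape are assumptions for illustration, not HeroDB's actual API:
```rust
// Walk the full keyspace in fixed-size pages, Redis-SCAN style.
// Assumed contract: cursor 0 starts the scan, and a returned cursor
// of 0 means iteration is complete.
fn scan_all(storage: &Storage) -> usize {
    let mut cursor = 0u64;
    let mut total = 0;
    loop {
        let (next_cursor, keys) = storage.scan(cursor, 100).unwrap();
        total += keys.len();
        if next_cursor == 0 {
            break;
        }
        cursor = next_cursor;
    }
    total
}
```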
#### 4. Concurrent Operation Benchmarks
Simulates multi-client scenarios (a mixed-workload sketch follows this list):
- **10 Concurrent Clients**: Light load
- **50 Concurrent Clients**: Medium load
- **Mixed Workload**: 70% reads, 30% writes
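A sketch of how the 70/30 mix can be driven deterministically per client; the `get`/`set` methods on the backend trait are assumed from the concurrent example later in this guide:
```rust
use rand::rngs::StdRng;
use rand::Rng;

// One operation of the mixed workload: a 7-in-10 draw selects a read.
fn mixed_op(storage: &dyn StorageBackend, rng: &mut StdRng, i: usize) {
    let key = format!("bench:key:{:08}", i % 10_000);
    if rng.gen_ratio(7, 10) {
        let _ = storage.get(&key); // 70% reads
    } else {
        storage.set(key, "value".to_string()).unwrap(); // 30% writes
    }
}
```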
#### 5. Memory Profiling
Tracks memory usage patterns:
- **Allocation Tracking**: Total allocations per operation
- **Peak Memory**: Maximum memory usage
- **Memory Efficiency**: Bytes per record stored
### Test Data Specifications
#### Dataset Sizes
- **Small**: 1,000 records
- **Medium**: 10,000 records (primary focus)
#### Data Characteristics
- **Key Format**: `bench:key:{id}` (predictable, sortable)
- **Value Sizes**:
- Small: 50-100 bytes
- Medium: 500-1000 bytes
- Large: 5000-10000 bytes
- **Hash Fields**: 5-20 fields per hash
- **List Elements**: 10-100 elements per list
### Metrics Collected
For each benchmark, we collect the following (a percentile-extraction sketch follows the list):
1. **Latency Metrics**
- Mean execution time
- Median (p50)
- 95th percentile (p95)
- 99th percentile (p99)
- Standard deviation
2. **Throughput Metrics**
- Operations per second
- Records per second (for bulk operations)
3. **Memory Metrics**
- Total allocations
- Peak memory usage
- Average bytes per operation
4. **Initialization Overhead**
- Database startup time
- First operation latency (cold cache)
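Criterion reports these latency statistics itself; for the custom metrics collection in `benches/common/metrics.rs`, the percentile figures can be extracted from raw samples along these lines (a sketch, assuming a non-empty sample set):
```rust
/// Nearest-rank percentile over a sorted slice of nanosecond samples.
fn percentile(sorted: &[u64], p: f64) -> u64 {
    let idx = ((sorted.len() as f64 - 1.0) * p / 100.0).round() as usize;
    sorted[idx]
}

/// Returns (p50, p95, p99) for a batch of raw latency samples.
fn summarize(mut samples: Vec<u64>) -> (u64, u64, u64) {
    samples.sort_unstable();
    (
        percentile(&samples, 50.0),
        percentile(&samples, 95.0),
        percentile(&samples, 99.0),
    )
}
```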
## Benchmark Structure
### Directory Layout
```
benches/
├── common/
│ ├── mod.rs # Shared utilities
│ ├── data_generator.rs # Test data generation
│ ├── metrics.rs # Custom metrics collection
│ └── backends.rs # Backend setup helpers
├── single_ops.rs # Single-operation benchmarks
├── bulk_ops.rs # Bulk operation benchmarks
├── scan_ops.rs # Scan and query benchmarks
├── concurrent_ops.rs # Concurrent operation benchmarks
└── memory_profile.rs # Memory profiling benchmarks
```
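Each top-level file in `benches/` must also be registered in `Cargo.toml` with a `[[bench]]` entry that sets `harness = false`, since Criterion supplies its own benchmark harness in place of the default libtest one.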
### Running Benchmarks
#### Run All Benchmarks
```bash
cargo bench
```
#### Run Specific Benchmark Suite
```bash
cargo bench --bench single_ops
cargo bench --bench bulk_ops
cargo bench --bench concurrent_ops
```
#### Run Specific Backend
```bash
cargo bench -- redb
cargo bench -- sled
```
#### Generate Reports
```bash
# Run benchmarks and save results
cargo bench -- --save-baseline main
# Compare against baseline
cargo bench -- --baseline main
# Export to CSV (flag support varies by Criterion version)
cargo bench -- --output-format csv > results.csv
```
### Output Formats
#### 1. Terminal Output (Default)
Real-time progress with statistical summaries:
```
single_ops/redb/set/small
time: [1.234 µs 1.245 µs 1.256 µs]
thrpt: [802.5K ops/s 810.2K ops/s 818.1K ops/s]
```
#### 2. CSV Export
Structured data for analysis:
```csv
backend,operation,dataset_size,mean_ns,median_ns,p95_ns,p99_ns,throughput_ops_sec
redb,set,small,1245,1240,1890,2100,810200
sled,set,small,1567,1550,2340,2890,638000
```
#### 3. JSON Export
Detailed metrics for programmatic processing:
```json
{
"benchmark": "single_ops/redb/set/small",
"metrics": {
"mean": 1245,
"median": 1240,
"p95": 1890,
"p99": 2100,
"std_dev": 145,
"throughput": 810200
},
"memory": {
"allocations": 3,
"peak_bytes": 4096
}
}
```
## Benchmark Implementation Details
### Backend Setup
Each benchmark creates isolated database instances:
```rust
use tempfile::TempDir;

// redb backend: fresh temporary directory per run for isolation
let redb_dir = TempDir::new()?;
let redb_storage = Storage::new(redb_dir.path().join("bench.db"), false, None)?;

// sled backend: separate directory so the two instances never share state
let sled_dir = TempDir::new()?;
let sled_storage = SledStorage::new(sled_dir.path().join("bench.sled"), false, None)?;
```
### Data Generation
Deterministic data generation ensures reproducibility:
```rust
use rand::rngs::StdRng;
use rand::{Rng, SeedableRng};

/// Generate `count` key-value pairs deterministically from a fixed seed.
fn generate_test_data(count: usize, seed: u64) -> Vec<(String, String)> {
    let mut rng = StdRng::seed_from_u64(seed);
    (0..count)
        .map(|i| {
            let key = format!("bench:key:{:08}", i);
            let value = generate_value(&mut rng, 100);
            (key, value)
        })
        .collect()
}

/// Build a printable value of `len` bytes from the seeded RNG.
fn generate_value(rng: &mut StdRng, len: usize) -> String {
    (0..len).map(|_| rng.gen_range(b'a'..=b'z') as char).collect()
}
```
### Concurrent Testing
Using Tokio for async concurrent operations:
```rust
use std::sync::Arc;

async fn concurrent_benchmark(
    storage: Arc<dyn StorageBackend>,
    num_clients: usize,
    operations: usize,
) {
    // One Tokio task per simulated client, each writing its own key space.
    let tasks: Vec<_> = (0..num_clients)
        .map(|client_id| {
            let storage = storage.clone();
            tokio::spawn(async move {
                for i in 0..operations {
                    let key = format!("client:{}:key:{}", client_id, i);
                    storage.set(key, "value".to_string()).unwrap();
                }
            })
        })
        .collect();
    // Wait for every client task to complete (requires the `futures` crate).
    futures::future::join_all(tasks).await;
}
```
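Note that `tokio::spawn` requires the captured `Arc<dyn StorageBackend>` to be `Send + Sync + 'static`; if the backend calls are blocking rather than async, `tokio::task::spawn_blocking` is the safer primitive for a benchmark like this.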
## Interpreting Results
### Performance Comparison
When comparing backends, consider:
1. **Latency vs Throughput Trade-offs**
- Lower latency = better for interactive workloads
- Higher throughput = better for batch processing
2. **Consistency**
- Lower standard deviation = more predictable performance
- Check p95/p99 for tail latency
3. **Scalability**
- How performance changes with dataset size
- Concurrent operation efficiency
### Backend Selection Guidelines
Based on benchmark results, choose:
**redb** when:
- Need predictable latency
- Working with structured data (separate tables)
- Require high concurrent read performance
- Memory efficiency is important
**sled** when:
- Need high write throughput
- Working with uniform data types
- Require lock-free operations
- Crash recovery is critical
## Memory Profiling
### Using DHAT
For detailed memory profiling:
```bash
# Install valgrind (its DHAT viewer, dh_view.html, renders dhat-rs output)
sudo apt-get install valgrind

# Run the memory-profile benchmark with a fixed profiling window
cargo bench --bench memory_profile -- --profile-time=10
```
### Custom Allocation Tracking
The benchmarks include custom allocation tracking built on the `dhat` crate (declared as a dev-dependency):
```rust
// The dhat profiler must be created before the workload allocates.
#[global_allocator]
static ALLOC: dhat::Alloc = dhat::Alloc;

fn track_allocations<F>(f: F) -> dhat::HeapStats
where
    F: FnOnce(),
{
    let _profiler = dhat::Profiler::new_heap();
    f();
    // Snapshot the heap statistics gathered while `f` ran.
    dhat::HeapStats::get()
}
```
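`dhat::HeapStats` exposes totals (`total_blocks`, `total_bytes`) and peaks (`max_blocks`, `max_bytes`), which map directly onto the allocation and peak-memory metrics listed earlier.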
## Continuous Benchmarking
### Regression Detection
Compare against baseline to detect performance regressions:
```bash
# Save current performance as baseline
cargo bench -- --save-baseline v0.1.0
# After changes, compare
cargo bench -- --baseline v0.1.0
# Criterion will highlight significant changes
```
### CI Integration
Add to CI pipeline:
```yaml
- name: Run Benchmarks
  run: |
    cargo bench --no-fail-fast -- --output-format json > bench-results.json

- name: Compare Results
  run: |
    python scripts/compare_benchmarks.py \
      --baseline baseline.json \
      --current bench-results.json \
      --threshold 10  # Fail if >10% regression
```
## Troubleshooting
### Common Issues
1. **Inconsistent Results**
- Ensure system is idle during benchmarks
- Disable CPU frequency scaling
- Run multiple iterations
2. **Out of Memory**
- Reduce dataset sizes
- Run benchmarks sequentially
- Increase system swap space
3. **Slow Benchmarks**
- Reduce the sample size in the Criterion config (see the sketch below)
- Use `--quick` flag for faster runs
- Focus on specific benchmarks
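For the sample-size reduction mentioned above, Criterion's group configuration can be overridden; a minimal sketch (`bench_set` is the function from the earlier single-operation sketch):
```rust
use criterion::{criterion_group, criterion_main, Criterion};
use std::time::Duration;

fn quick_config() -> Criterion {
    Criterion::default()
        .sample_size(20)                          // default is 100
        .measurement_time(Duration::from_secs(2)) // default is 5 seconds
}

criterion_group! {
    name = benches;
    config = quick_config();
    targets = bench_set
}
criterion_main!(benches);
```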
### Performance Tips
```bash
# Quick benchmark run (fewer samples)
cargo bench -- --quick
# Verbose output for debugging
cargo bench -- --verbose
# Profile specific operation
cargo bench -- single_ops/redb/set
```
## Future Enhancements
Potential additions to the benchmark suite:
1. **Transaction Performance**: Measure MULTI/EXEC overhead
2. **Encryption Overhead**: Compare encrypted vs non-encrypted
3. **Persistence Testing**: Measure flush/sync performance
4. **Recovery Time**: Database restart and recovery speed
5. **Network Overhead**: Redis protocol parsing impact
6. **Long-Running Stability**: Performance over extended periods
## References
- [Criterion.rs Documentation](https://bheisler.github.io/criterion.rs/book/)
- [DHAT Memory Profiler](https://valgrind.org/docs/manual/dh-manual.html)
- [Rust Performance Book](https://nnethercote.github.io/perf-book/)