# HeroDB Performance Benchmarking Guide

## Overview

This document describes the comprehensive benchmarking suite for HeroDB, designed to measure and compare the performance characteristics of the two storage backends: **redb** (default) and **sled**.

## Benchmark Architecture

### Design Principles

1. **Fair Comparison**: Identical test datasets and operations across all backends
2. **Statistical Rigor**: Criterion is used for statistically sound measurements
3. **Real-World Scenarios**: A mix of synthetic and realistic workload patterns
4. **Reproducibility**: Deterministic test data generation with fixed seeds
5. **Isolation**: Each benchmark runs in a clean environment

### Benchmark Categories

#### 1. Single-Operation CRUD Benchmarks
Measures the performance of individual database operations (see the sketch after this list):

- **String Operations**
  - `SET` - Write a single key-value pair
  - `GET` - Read a single key-value pair
  - `DEL` - Delete a single key
  - `EXISTS` - Check key existence

- **Hash Operations**
  - `HSET` - Set a single field in a hash
  - `HGET` - Get a single field from a hash
  - `HGETALL` - Get all fields from a hash
  - `HDEL` - Delete a field from a hash
  - `HEXISTS` - Check field existence

- **List Operations**
  - `LPUSH` - Push to list head
  - `RPUSH` - Push to list tail
  - `LPOP` - Pop from list head
  - `RPOP` - Pop from list tail
  - `LRANGE` - Get a range of elements

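As a minimal sketch, a single-operation benchmark might look like the following. It assumes the `Storage` API shown under Backend Setup below; the exact import path depends on the crate layout.

```rust
use criterion::{criterion_group, criterion_main, Criterion};
use tempfile::TempDir;

fn bench_set(c: &mut Criterion) {
    // Isolated database instance, as described under Backend Setup.
    // `Storage` is HeroDB's redb-backed storage type.
    let temp_dir = TempDir::new().unwrap();
    let storage = Storage::new(temp_dir.path().join("bench.db"), false, None).unwrap();

    let mut i: u64 = 0;
    c.bench_function("single_ops/redb/set/small", |b| {
        b.iter(|| {
            // Unique, sortable keys in the documented bench:key:{id} format.
            let key = format!("bench:key:{:08}", i);
            i += 1;
            storage.set(key, "value".to_string()).unwrap();
        })
    });
}

criterion_group!(benches, bench_set);
criterion_main!(benches);
```
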
#### 2. Bulk Operation Benchmarks
Tests throughput with varying batch sizes (see the sketch after this list):

- **Bulk Insert**: 100, 1,000, and 10,000 records
- **Bulk Read**: Sequential and random access patterns
- **Bulk Update**: Modify existing records
- **Bulk Delete**: Remove multiple records

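Declaring throughput lets Criterion report records per second alongside latency. A sketch of a bulk-insert benchmark, assuming the `Storage` type from Backend Setup and the `generate_test_data` helper from Data Generation below:

```rust
use criterion::{BatchSize, BenchmarkId, Criterion, Throughput};
use tempfile::TempDir;

fn bench_bulk_insert(c: &mut Criterion) {
    let temp_dir = TempDir::new().unwrap();
    let storage = Storage::new(temp_dir.path().join("bench.db"), false, None).unwrap();

    let mut group = c.benchmark_group("bulk_ops/redb/insert");
    for &size in &[100usize, 1_000, 10_000] {
        // Criterion divides elapsed time by this count to report records/sec.
        group.throughput(Throughput::Elements(size as u64));
        group.bench_with_input(BenchmarkId::from_parameter(size), &size, |b, &size| {
            b.iter_batched(
                || generate_test_data(size, 42), // fixed seed, built outside the timing loop
                |data| {
                    for (key, value) in data {
                        storage.set(key, value).unwrap();
                    }
                },
                BatchSize::SmallInput,
            )
        });
    }
    group.finish();
}
```
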
#### 3. Query and Scan Benchmarks
Evaluates iteration and filtering performance (see the sketch after this list):

- **SCAN**: Cursor-based key iteration
- **HSCAN**: Hash field iteration
- **KEYS**: Pattern matching (with various patterns)
- **Range Queries**: List range operations

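For illustration, a KEYS-style benchmark might look like this. Both `setup_populated_storage` and the `keys` method are hypothetical stand-ins; the real suite would use the helpers in `benches/common/` and whatever scan API the backends expose.

```rust
use criterion::Criterion;

fn bench_keys(c: &mut Criterion) {
    // Hypothetical helper: a storage instance pre-loaded with 10,000
    // records in the bench:key:{id} format.
    let storage = setup_populated_storage(10_000);

    c.bench_function("scan_ops/redb/keys/prefix", |b| {
        // `keys` is assumed to implement KEYS-style glob matching.
        b.iter(|| storage.keys("bench:key:*").unwrap())
    });
}
```
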
#### 4. Concurrent Operation Benchmarks
Simulates multi-client scenarios:

- **10 Concurrent Clients**: Light load
- **50 Concurrent Clients**: Medium load
- **Mixed Workload**: 70% reads, 30% writes

#### 5. Memory Profiling
Tracks memory usage patterns:

- **Allocation Tracking**: Total allocations per operation
- **Peak Memory**: Maximum memory usage
- **Memory Efficiency**: Bytes per record stored

### Test Data Specifications

#### Dataset Sizes
- **Small**: 1,000 - 10,000 records
- **Medium**: 10,000 records (primary focus)

#### Data Characteristics
- **Key Format**: `bench:key:{id}` (predictable, sortable)
- **Value Sizes**:
  - Small: 50-100 bytes
  - Medium: 500-1000 bytes
  - Large: 5000-10000 bytes
- **Hash Fields**: 5-20 fields per hash
- **List Elements**: 10-100 elements per list

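The value-size classes above can be produced by a small helper. A sketch of one possible `generate_value` (the name matches the call in the Data Generation section below, but the actual implementation in `data_generator.rs` may differ):

```rust
use rand::Rng;
use rand::rngs::StdRng;

/// Generate a printable ASCII value whose length falls in the documented
/// range for a size class (e.g. 50-100 bytes when `max_len` is 100).
fn generate_value(rng: &mut StdRng, max_len: usize) -> String {
    let len = rng.gen_range(max_len / 2..=max_len);
    (0..len).map(|_| rng.gen_range(b'a'..=b'z') as char).collect()
}
```
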
### Metrics Collected

For each benchmark, we collect:

1. **Latency Metrics**
   - Mean execution time
   - Median (p50)
   - 95th percentile (p95)
   - 99th percentile (p99)
   - Standard deviation

2. **Throughput Metrics**
   - Operations per second
   - Records per second (for bulk operations)

3. **Memory Metrics**
   - Total allocations
   - Peak memory usage
   - Average bytes per operation

4. **Initialization Overhead**
   - Database startup time
   - First operation latency (cold cache)

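Criterion's sample size and measurement time control how tightly these statistics converge. A sketch of a custom configuration (the values are illustrative, not the suite's actual settings; `bench_set` is the earlier single-operation sketch):

```rust
use std::time::Duration;
use criterion::{criterion_group, Criterion};

fn custom_criterion() -> Criterion {
    Criterion::default()
        .sample_size(200)                          // more samples -> tighter percentile estimates
        .measurement_time(Duration::from_secs(10)) // longer runs reduce noise
        .warm_up_time(Duration::from_secs(3))      // warm caches before measuring
}

criterion_group! {
    name = benches;
    config = custom_criterion();
    targets = bench_set
}
```
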
## Benchmark Structure

### Directory Layout

```
benches/
├── common/
│   ├── mod.rs              # Shared utilities
│   ├── data_generator.rs   # Test data generation
│   ├── metrics.rs          # Custom metrics collection
│   └── backends.rs         # Backend setup helpers
├── single_ops.rs           # Single-operation benchmarks
├── bulk_ops.rs             # Bulk operation benchmarks
├── scan_ops.rs             # Scan and query benchmarks
├── concurrent_ops.rs       # Concurrent operation benchmarks
└── memory_profile.rs       # Memory profiling benchmarks
```

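Each top-level file under `benches/` must be registered in `Cargo.toml` with the default test harness disabled so that Criterion can drive the run. A sketch (the dependency version is illustrative):

```toml
[dev-dependencies]
criterion = "0.5"

[[bench]]
name = "single_ops"
harness = false
```
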
### Running Benchmarks

#### Run All Benchmarks
```bash
cargo bench
```

#### Run Specific Benchmark Suite
```bash
cargo bench --bench single_ops
cargo bench --bench bulk_ops
cargo bench --bench concurrent_ops
```

#### Run Specific Backend
```bash
cargo bench -- redb
cargo bench -- sled
```

#### Generate Reports
```bash
# Run benchmarks and save results
cargo bench -- --save-baseline main

# Compare against baseline
cargo bench -- --baseline main

# Export to CSV
cargo bench -- --output-format csv > results.csv
```

### Output Formats

#### 1. Terminal Output (Default)
Real-time progress with statistical summaries:
```
single_ops/redb/set/small
                        time:   [1.234 µs 1.245 µs 1.256 µs]
                        thrpt:  [802.5K ops/s 810.2K ops/s 818.1K ops/s]
```

#### 2. CSV Export
Structured data for analysis:
```csv
backend,operation,dataset_size,mean_ns,median_ns,p95_ns,p99_ns,throughput_ops_sec
redb,set,small,1245,1240,1890,2100,810200
sled,set,small,1567,1550,2340,2890,638000
```

#### 3. JSON Export
Detailed metrics for programmatic processing:
```json
{
  "benchmark": "single_ops/redb/set/small",
  "metrics": {
    "mean": 1245,
    "median": 1240,
    "p95": 1890,
    "p99": 2100,
    "std_dev": 145,
    "throughput": 810200
  },
  "memory": {
    "allocations": 3,
    "peak_bytes": 4096
  }
}
```

## Benchmark Implementation Details

### Backend Setup

Each benchmark creates isolated database instances:

```rust
use tempfile::TempDir;

// Redb backend
let temp_dir = TempDir::new()?;
let db_path = temp_dir.path().join("bench.db");
let storage = Storage::new(db_path, false, None)?;

// Sled backend
let temp_dir = TempDir::new()?;
let db_path = temp_dir.path().join("bench.sled");
let storage = SledStorage::new(db_path, false, None)?;
```

### Data Generation

Deterministic data generation ensures reproducibility:

```rust
use rand::{SeedableRng, Rng};
use rand::rngs::StdRng;

fn generate_test_data(count: usize, seed: u64) -> Vec<(String, String)> {
    let mut rng = StdRng::seed_from_u64(seed);
    (0..count)
        .map(|i| {
            let key = format!("bench:key:{:08}", i);
            let value = generate_value(&mut rng, 100);
            (key, value)
        })
        .collect()
}
```

### Concurrent Testing

Using Tokio for async concurrent operations:

```rust
use std::sync::Arc;

async fn concurrent_benchmark(
    storage: Arc<dyn StorageBackend>,
    num_clients: usize,
    operations: usize,
) {
    // One spawned task per simulated client, all writing concurrently.
    let tasks: Vec<_> = (0..num_clients)
        .map(|client_id| {
            let storage = storage.clone();
            tokio::spawn(async move {
                for i in 0..operations {
                    let key = format!("client:{}:key:{}", client_id, i);
                    storage.set(key, "value".to_string()).unwrap();
                }
            })
        })
        .collect();

    futures::future::join_all(tasks).await;
}
```

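To drive this from Criterion, the async bencher can be used. A sketch, assuming Criterion's `async_tokio` feature is enabled and that a `setup_storage` helper (hypothetical here) returns a shared backend handle:

```rust
use std::sync::Arc;
use criterion::Criterion;
use tokio::runtime::Runtime;

fn bench_concurrent(c: &mut Criterion) {
    let rt = Runtime::new().unwrap();
    // Hypothetical helper returning Arc<dyn StorageBackend>.
    let storage: Arc<dyn StorageBackend> = setup_storage();

    c.bench_function("concurrent_ops/redb/10_clients", |b| {
        // 10 clients, 100 operations each, per measured iteration.
        b.to_async(&rt)
            .iter(|| concurrent_benchmark(storage.clone(), 10, 100))
    });
}
```
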
## Interpreting Results

### Performance Comparison

When comparing backends, consider:

1. **Latency vs Throughput Trade-offs**
   - Lower latency = better for interactive workloads
   - Higher throughput = better for batch processing

2. **Consistency**
   - Lower standard deviation = more predictable performance
   - Check p95/p99 for tail latency

3. **Scalability**
   - How performance changes with dataset size
   - Concurrent operation efficiency

### Backend Selection Guidelines

Based on benchmark results, choose:

**redb** when:
- You need predictable latency
- You work with structured data (separate tables)
- You require high concurrent read performance
- Memory efficiency is important

**sled** when:
- You need high write throughput
- You work with uniform data types
- You require lock-free operations
- Crash recovery is critical

## Memory Profiling

### Using DHAT

For detailed memory profiling:

```bash
# Install valgrind (the DHAT viewer ships with it)
sudo apt-get install valgrind

# Run with DHAT
cargo bench --bench memory_profile -- --profile-time=10
```

### Custom Allocation Tracking

The benchmarks include custom allocation tracking:

```rust
#[global_allocator]
static ALLOC: dhat::Alloc = dhat::Alloc;

fn track_allocations<F>(f: F) -> dhat::HeapStats
where
    F: FnOnce(),
{
    // Heap profiling is active while `_profiler` is alive.
    let _profiler = dhat::Profiler::new_heap();
    f();
    // Snapshot of allocation totals and peaks for the profiled closure.
    dhat::HeapStats::get()
}
```

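A usage sketch, assuming a `storage` instance from Backend Setup is in scope (`total_blocks` and `max_bytes` are fields of `dhat::HeapStats`):

```rust
let stats = track_allocations(|| {
    storage.set("bench:key:00000001".to_string(), "value".to_string()).unwrap();
});
println!("total allocations: {}", stats.total_blocks);
println!("peak heap bytes:   {}", stats.max_bytes);
```
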
## Continuous Benchmarking

### Regression Detection

Compare against a baseline to detect performance regressions:

```bash
# Save current performance as baseline
cargo bench -- --save-baseline v0.1.0

# After changes, compare
cargo bench -- --baseline v0.1.0

# Criterion will highlight significant changes
```

### CI Integration

Add to the CI pipeline:

```yaml
- name: Run Benchmarks
  run: |
    cargo bench --no-fail-fast -- --output-format json > bench-results.json

- name: Compare Results
  run: |
    python scripts/compare_benchmarks.py \
      --baseline baseline.json \
      --current bench-results.json \
      --threshold 10  # Fail if >10% regression
```

## Troubleshooting

### Common Issues

1. **Inconsistent Results**
   - Ensure the system is idle during benchmarks
   - Disable CPU frequency scaling
   - Run multiple iterations

2. **Out of Memory**
   - Reduce dataset sizes
   - Run benchmarks sequentially
   - Increase system swap space

3. **Slow Benchmarks**
   - Reduce the sample size in the Criterion config
   - Use the `--quick` flag for faster runs
   - Focus on specific benchmarks

### Performance Tips

```bash
# Quick benchmark run (fewer samples)
cargo bench -- --quick

# Verbose output for debugging
cargo bench -- --verbose

# Profile specific operation
cargo bench -- single_ops/redb/set
```

## Future Enhancements

Potential additions to the benchmark suite:

1. **Transaction Performance**: Measure MULTI/EXEC overhead
2. **Encryption Overhead**: Compare encrypted vs. non-encrypted operation
3. **Persistence Testing**: Measure flush/sync performance
4. **Recovery Time**: Database restart and recovery speed
5. **Network Overhead**: Redis protocol parsing impact
6. **Long-Running Stability**: Performance over extended periods

## References

- [Criterion.rs Documentation](https://bheisler.github.io/criterion.rs/book/)
- [DHAT Memory Profiler](https://valgrind.org/docs/manual/dh-manual.html)
- [Rust Performance Book](https://nnethercote.github.io/perf-book/)