HeroDB Performance Benchmarking Guide

Overview

This document describes the comprehensive benchmarking suite for HeroDB, designed to measure and compare the performance characteristics of the two storage backends: redb (default) and sled.

Benchmark Architecture

Design Principles

  1. Fair Comparison: Identical test datasets and operations across all backends
  2. Statistical Rigor: Using Criterion for statistically sound measurements
  3. Real-World Scenarios: Mix of synthetic and realistic workload patterns
  4. Reproducibility: Deterministic test data generation with fixed seeds
  5. Isolation: Each benchmark runs in a clean environment

Benchmark Categories

1. Single-Operation CRUD Benchmarks

Measures the performance of individual database operations (a sketch of one such benchmark follows this list):

  • String Operations

    • SET - Write a single key-value pair
    • GET - Read a single key-value pair
    • DEL - Delete a single key
    • EXISTS - Check key existence
  • Hash Operations

    • HSET - Set single field in hash
    • HGET - Get single field from hash
    • HGETALL - Get all fields from hash
    • HDEL - Delete field from hash
    • HEXISTS - Check field existence
  • List Operations

    • LPUSH - Push to list head
    • RPUSH - Push to list tail
    • LPOP - Pop from list head
    • RPOP - Pop from list tail
    • LRANGE - Get range of elements
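
A minimal sketch of a single-operation benchmark, assuming Criterion and the Storage constructor shown in the backend-setup example later in this guide; the benchmark name, key format, and value contents are illustrative only, and the import path for Storage is omitted:

use criterion::{criterion_group, criterion_main, Criterion};
use tempfile::TempDir;

// Hypothetical SET benchmark against the redb backend; `Storage` is the
// constructor used in the backend-setup example below.
fn bench_set(c: &mut Criterion) {
    let temp_dir = TempDir::new().unwrap();
    let storage = Storage::new(temp_dir.path().join("bench.db"), false, None).unwrap();

    let mut i: u64 = 0;
    c.bench_function("single_ops/redb/set/small", |b| {
        b.iter(|| {
            // Use a fresh key per iteration so we measure inserts, not overwrites.
            let key = format!("bench:key:{:08}", i);
            i += 1;
            storage.set(key, "value".to_string()).unwrap();
        })
    });
}

criterion_group!(benches, bench_set);
criterion_main!(benches);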

2. Bulk Operation Benchmarks

Tests throughput with varying batch sizes (a parameterized sketch follows this list):

  • Bulk Insert: 100, 1,000, 10,000 records
  • Bulk Read: Sequential and random access patterns
  • Bulk Update: Modify existing records
  • Bulk Delete: Remove multiple records
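
A sketch of how the bulk-insert case might be parameterized over batch size, reusing the same hypothetical Storage constructor as above; reporting Throughput::Elements makes Criterion print records per second:

use criterion::{BatchSize, BenchmarkId, Criterion, Throughput};
use tempfile::TempDir;

// Hypothetical bulk-insert benchmark parameterized over batch size.
fn bench_bulk_insert(c: &mut Criterion) {
    let mut group = c.benchmark_group("bulk_ops/redb/insert");
    for &size in &[100usize, 1_000, 10_000] {
        // Report throughput as records per second rather than iterations per second.
        group.throughput(Throughput::Elements(size as u64));
        group.bench_with_input(BenchmarkId::from_parameter(size), &size, |b, &size| {
            b.iter_batched(
                || {
                    // Fresh database per measurement so inserts always hit an empty store.
                    let temp_dir = TempDir::new().unwrap();
                    let storage = Storage::new(temp_dir.path().join("bench.db"), false, None).unwrap();
                    (temp_dir, storage)
                },
                |(_temp_dir, storage)| {
                    for i in 0..size {
                        storage.set(format!("bench:key:{:08}", i), "value".to_string()).unwrap();
                    }
                },
                BatchSize::PerIteration,
            )
        });
    }
    group.finish();
}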

3. Query and Scan Benchmarks

Evaluates iteration and filtering performance:

  • SCAN: Cursor-based key iteration
  • HSCAN: Hash field iteration
  • KEYS: Pattern matching (with various patterns)
  • Range Queries: List range operations

4. Concurrent Operation Benchmarks

Simulates multi-client scenarios (a sketch of the mixed read/write driver follows this list):

  • 10 Concurrent Clients: Light load
  • 50 Concurrent Clients: Medium load
  • Mixed Workload: 70% reads, 30% writes
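
One way to drive the mixed workload is to have each simulated client flip a seeded coin per operation. The sketch below assumes the Arc<dyn StorageBackend> trait object from the concurrent example later in this guide, plus a get method mirroring set (both hypothetical here); seeding the RNG with the client id keeps the operation sequence reproducible:

use rand::{Rng, SeedableRng};
use rand::rngs::StdRng;
use std::sync::Arc;

// Hypothetical per-client driver for the 70% read / 30% write workload.
fn mixed_workload(storage: Arc<dyn StorageBackend>, client_id: usize, operations: usize) {
    // Seed with the client id so every run issues the same sequence of operations.
    let mut rng = StdRng::seed_from_u64(client_id as u64);
    for i in 0..operations {
        let key = format!("client:{}:key:{}", client_id, i % 1_000);
        if rng.gen_bool(0.7) {
            // Read path; `get` is assumed to mirror the `set` call used elsewhere.
            let _ = storage.get(key);
        } else {
            storage.set(key, "value".to_string()).unwrap();
        }
    }
}

Each spawned client task in the concurrent harness shown later could call this in place of its write-only loop.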

5. Memory Profiling

Tracks memory usage patterns:

  • Allocation Tracking: Total allocations per operation
  • Peak Memory: Maximum memory usage
  • Memory Efficiency: Bytes per record stored

Test Data Specifications

Dataset Sizes

  • Small: 1,000 - 10,000 records
  • Medium: 10,000 records (primary focus)

Data Characteristics

  • Key Format: bench:key:{id} (predictable, sortable)
  • Value Sizes:
    • Small: 50-100 bytes
    • Medium: 500-1,000 bytes
    • Large: 5,000-10,000 bytes
  • Hash Fields: 5-20 fields per hash
  • List Elements: 10-100 elements per list

Metrics Collected

For each benchmark, we collect the following (a result record capturing these fields is sketched after this list):

  1. Latency Metrics

    • Mean execution time
    • Median (p50)
    • 95th percentile (p95)
    • 99th percentile (p99)
    • Standard deviation
  2. Throughput Metrics

    • Operations per second
    • Records per second (for bulk operations)
  3. Memory Metrics

    • Total allocations
    • Peak memory usage
    • Average bytes per operation
  4. Initialization Overhead

    • Database startup time
    • First operation latency (cold cache)
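
For post-processing, these measurements can be gathered into a simple record; the field names below are illustrative and simply mirror the columns used in the CSV and JSON export examples further down:

// Illustrative result record; field names follow the export examples below.
#[derive(Debug, Clone)]
struct BenchmarkResult {
    backend: String,         // "redb" or "sled"
    operation: String,       // e.g. "set", "hget", "lpush"
    dataset_size: String,    // "small", "medium", "large"
    mean_ns: u64,            // mean latency in nanoseconds
    median_ns: u64,          // p50 latency
    p95_ns: u64,             // 95th percentile latency
    p99_ns: u64,             // 99th percentile latency
    std_dev_ns: u64,         // standard deviation of the latency samples
    throughput_ops_sec: f64, // operations (or records) per second
    allocations: u64,        // total allocations during the measurement
    peak_bytes: u64,         // peak memory usage in bytes
}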

Benchmark Structure

Directory Layout

benches/
├── common/
│   ├── mod.rs              # Shared utilities
│   ├── data_generator.rs   # Test data generation
│   ├── metrics.rs          # Custom metrics collection
│   └── backends.rs         # Backend setup helpers
├── single_ops.rs           # Single-operation benchmarks
├── bulk_ops.rs             # Bulk operation benchmarks
├── scan_ops.rs             # Scan and query benchmarks
├── concurrent_ops.rs       # Concurrent operation benchmarks
└── memory_profile.rs       # Memory profiling benchmarks

Running Benchmarks

Run All Benchmarks

cargo bench

Run Specific Benchmark Suite

cargo bench --bench single_ops
cargo bench --bench bulk_ops
cargo bench --bench concurrent_ops

Run Specific Backend

cargo bench -- redb
cargo bench -- sled

Generate Reports

# Run benchmarks and save results
cargo bench -- --save-baseline main

# Compare against baseline
cargo bench -- --baseline main

# Export to CSV
cargo bench -- --output-format csv > results.csv

Output Formats

1. Terminal Output (Default)

Real-time progress with statistical summaries:

single_ops/redb/set/small
                        time:   [1.234 µs 1.245 µs 1.256 µs]
                        thrpt:  [802.5K ops/s 810.2K ops/s 818.1K ops/s]

2. CSV Export

Structured data for analysis:

backend,operation,dataset_size,mean_ns,median_ns,p95_ns,p99_ns,throughput_ops_sec
redb,set,small,1245,1240,1890,2100,810200
sled,set,small,1567,1550,2340,2890,638000

3. JSON Export

Detailed metrics for programmatic processing:

{
  "benchmark": "single_ops/redb/set/small",
  "metrics": {
    "mean": 1245,
    "median": 1240,
    "p95": 1890,
    "p99": 2100,
    "std_dev": 145,
    "throughput": 810200
  },
  "memory": {
    "allocations": 3,
    "peak_bytes": 4096
  }
}

Benchmark Implementation Details

Backend Setup

Each benchmark creates isolated database instances:

use tempfile::TempDir;

// Redb backend (default)
let temp_dir = TempDir::new()?;
let db_path = temp_dir.path().join("bench.db");
let storage = Storage::new(db_path, false, None)?;

// Sled backend
let temp_dir = TempDir::new()?;
let db_path = temp_dir.path().join("bench.sled");
let storage = SledStorage::new(db_path, false, None)?;
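
To keep the comparison fair, the same benchmark body can be driven against either backend through a small helper. The sketch below assumes both Storage and SledStorage implement the StorageBackend trait used in the concurrent example further down:

use std::path::Path;

// Sketch of a backend selector so one benchmark body can exercise both engines.
enum BackendType {
    Redb,
    Sled,
}

fn setup_backend(kind: BackendType, dir: &Path) -> Box<dyn StorageBackend> {
    match kind {
        BackendType::Redb => Box::new(Storage::new(dir.join("bench.db"), false, None).unwrap()),
        BackendType::Sled => Box::new(SledStorage::new(dir.join("bench.sled"), false, None).unwrap()),
    }
}

The benchmark loop can then iterate over both variants so redb and sled see identical data and operations.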

Data Generation

Deterministic data generation ensures reproducibility:

use rand::{SeedableRng, Rng};
use rand::rngs::StdRng;

fn generate_test_data(count: usize, seed: u64) -> Vec<(String, String)> {
    let mut rng = StdRng::seed_from_u64(seed);
    (0..count)
        .map(|i| {
            let key = format!("bench:key:{:08}", i);
            let value = generate_value(&mut rng, 100);
            (key, value)
        })
        .collect()
}

// Simple example implementation: produce a printable value of `len` bytes
// from the seeded RNG so every run generates identical data.
fn generate_value(rng: &mut StdRng, len: usize) -> String {
    (0..len)
        .map(|_| rng.gen_range(b'a'..=b'z') as char)
        .collect()
}

Concurrent Testing

Using Tokio for async concurrent operations:

use std::sync::Arc;

async fn concurrent_benchmark(
    storage: Arc<dyn StorageBackend>,
    num_clients: usize,
    operations: usize,
) {
    let tasks: Vec<_> = (0..num_clients)
        .map(|client_id| {
            let storage = storage.clone();
            tokio::spawn(async move {
                for i in 0..operations {
                    let key = format!("client:{}:key:{}", client_id, i);
                    storage.set(key, "value".to_string()).unwrap();
                }
            })
        })
        .collect();

    // Wait for every client task to finish before the measurement ends.
    futures::future::join_all(tasks).await;
}

Interpreting Results

Performance Comparison

When comparing backends, consider:

  1. Latency vs Throughput Trade-offs

    • Lower latency = better for interactive workloads
    • Higher throughput = better for batch processing
  2. Consistency

    • Lower standard deviation = more predictable performance
    • Check p95/p99 for tail latency
  3. Scalability

    • How performance changes with dataset size
    • Concurrent operation efficiency

Backend Selection Guidelines

Based on benchmark results, choose:

redb when:

  • Need predictable latency
  • Working with structured data (separate tables)
  • Require high concurrent read performance
  • Memory efficiency is important

sled when:

  • Need high write throughput
  • Working with uniform data types
  • Require lock-free operations
  • Crash recovery is critical

Memory Profiling

Using DHAT

For detailed memory profiling:

# Install valgrind (provides the DHAT heap profiling tool)
sudo apt-get install valgrind

# Run with DHAT
cargo bench --bench memory_profile -- --profile-time=10

Custom Allocation Tracking

The benchmarks include custom allocation tracking:

#[global_allocator]
static ALLOC: dhat::Alloc = dhat::Alloc;

fn track_allocations<F>(f: F) -> dhat::HeapStats
where
    F: FnOnce(),
{
    // The profiler records every allocation made while it is alive.
    let _profiler = dhat::Profiler::new_heap();
    f();
    // Snapshot the collected stats (total, current, and peak blocks/bytes)
    // before the profiler is dropped at the end of this function.
    dhat::HeapStats::get()
}
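
The returned dhat::HeapStats can then be recorded next to the timing results; for example, using the storage handle from the backend-setup section (hypothetical wiring):

// Measure allocations for 1,000 SETs (illustrative workload).
let stats = track_allocations(|| {
    for i in 0..1_000 {
        storage.set(format!("bench:key:{:08}", i), "value".to_string()).unwrap();
    }
});
println!("total allocations: {}, peak bytes: {}", stats.total_blocks, stats.max_bytes);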

Continuous Benchmarking

Regression Detection

Compare against baseline to detect performance regressions:

# Save current performance as baseline
cargo bench -- --save-baseline v0.1.0

# After changes, compare
cargo bench -- --baseline v0.1.0

# Criterion will highlight significant changes

CI Integration

Add to CI pipeline:

- name: Run Benchmarks
  run: |
    cargo bench --no-fail-fast -- --output-format json > bench-results.json
    
- name: Compare Results
  run: |
    python scripts/compare_benchmarks.py \
      --baseline baseline.json \
      --current bench-results.json \
      --threshold 10  # Fail if >10% regression

Troubleshooting

Common Issues

  1. Inconsistent Results

    • Ensure system is idle during benchmarks
    • Disable CPU frequency scaling
    • Run multiple iterations
  2. Out of Memory

    • Reduce dataset sizes
    • Run benchmarks sequentially
    • Increase system swap space
  3. Slow Benchmarks

    • Reduce sample size in Criterion config
    • Use --quick flag for faster runs
    • Focus on specific benchmarks

Performance Tips

# Quick benchmark run (fewer samples)
cargo bench -- --quick

# Verbose output for debugging
cargo bench -- --verbose

# Profile specific operation
cargo bench -- single_ops/redb/set

Future Enhancements

Potential additions to the benchmark suite:

  1. Transaction Performance: Measure MULTI/EXEC overhead
  2. Encryption Overhead: Compare encrypted vs non-encrypted
  3. Persistence Testing: Measure flush/sync performance
  4. Recovery Time: Database restart and recovery speed
  5. Network Overhead: Redis protocol parsing impact
  6. Long-Running Stability: Performance over extended periods
