HeroDB Performance Benchmarking Guide
Overview
This document describes the comprehensive benchmarking suite for HeroDB, designed to measure and compare the performance characteristics of the two storage backends: redb (default) and sled.
Benchmark Architecture
Design Principles
- Fair Comparison: Identical test datasets and operations across all backends
- Statistical Rigor: Using Criterion for statistically sound measurements
- Real-World Scenarios: Mix of synthetic and realistic workload patterns
- Reproducibility: Deterministic test data generation with fixed seeds
- Isolation: Each benchmark runs in a clean environment
Benchmark Categories
1. Single-Operation CRUD Benchmarks
Measures the performance of individual database operations:
- String Operations
  - SET - Write a single key-value pair
  - GET - Read a single key-value pair
  - DEL - Delete a single key
  - EXISTS - Check key existence
- Hash Operations
  - HSET - Set single field in hash
  - HGET - Get single field from hash
  - HGETALL - Get all fields from hash
  - HDEL - Delete field from hash
  - HEXISTS - Check field existence
- List Operations
  - LPUSH - Push to list head
  - RPUSH - Push to list tail
  - LPOP - Pop from list head
  - RPOP - Pop from list tail
  - LRANGE - Get range of elements
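As a concrete example, a single-operation SET benchmark for the redb backend could be wired up with Criterion roughly as sketched below. The Storage::new and storage.set calls mirror the backend-setup snippets later in this guide; the exact module paths and signatures are assumptions and may differ in the actual crate.

```rust
use criterion::{criterion_group, criterion_main, Criterion};
use tempfile::TempDir;
// `Storage` is HeroDB's redb-backed storage type; the import path is assumed.

fn bench_set_redb(c: &mut Criterion) {
    // Isolated database instance in a fresh temp directory
    let temp_dir = TempDir::new().unwrap();
    let storage = Storage::new(temp_dir.path().join("bench.db"), false, None).unwrap();

    let mut i: u64 = 0;
    c.bench_function("single_ops/redb/set/small", |b| {
        b.iter(|| {
            // One SET per iteration; unique keys avoid measuring only overwrites
            let key = format!("bench:key:{:08}", i);
            i += 1;
            storage.set(key, "value".to_string()).unwrap();
        })
    });
}

criterion_group!(single_ops, bench_set_redb);
criterion_main!(single_ops);
```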
2. Bulk Operation Benchmarks
Tests throughput with varying batch sizes:
- Bulk Insert: 100, 1,000, 10,000 records
- Bulk Read: Sequential and random access patterns
- Bulk Update: Modify existing records
- Bulk Delete: Remove multiple records
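One way to express the batch-size sweep is a Criterion benchmark group with Throughput::Elements, so results are reported as records per second. The sketch below assumes the same Storage API as above and the generate_test_data helper described under Data Generation.

```rust
use criterion::{criterion_group, criterion_main, BenchmarkId, Criterion, Throughput};

fn bench_bulk_insert_redb(c: &mut Criterion) {
    let temp_dir = tempfile::TempDir::new().unwrap();
    let storage = Storage::new(temp_dir.path().join("bench.db"), false, None).unwrap();

    let mut group = c.benchmark_group("bulk_ops/redb/insert");
    for &size in &[100usize, 1_000, 10_000] {
        // Deterministic dataset (fixed seed), see "Data Generation" below
        let data = generate_test_data(size, 42);
        // Report throughput as records per second rather than iterations per second
        group.throughput(Throughput::Elements(size as u64));
        group.bench_with_input(BenchmarkId::from_parameter(size), &data, |b, data| {
            b.iter(|| {
                for (key, value) in data.iter() {
                    storage.set(key.clone(), value.clone()).unwrap();
                }
            })
        });
    }
    group.finish();
}

criterion_group!(bulk_ops, bench_bulk_insert_redb);
criterion_main!(bulk_ops);
```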
3. Query and Scan Benchmarks
Evaluates iteration and filtering performance:
- SCAN: Cursor-based key iteration
- HSCAN: Hash field iteration
- KEYS: Pattern matching (with various patterns)
- Range Queries: List range operations
4. Concurrent Operation Benchmarks
Simulates multi-client scenarios:
- 10 Concurrent Clients: Light load
- 50 Concurrent Clients: Medium load
- Mixed Workload: 70% reads, 30% writes
5. Memory Profiling
Tracks memory usage patterns:
- Allocation Tracking: Total allocations per operation
- Peak Memory: Maximum memory usage
- Memory Efficiency: Bytes per record stored
Test Data Specifications
Dataset Sizes
- Small: 1,000 - 10,000 records
- Medium: 10,000 records (primary focus)
Data Characteristics
- Key Format: bench:key:{id} (predictable, sortable)
- Value Sizes:
  - Small: 50-100 bytes
  - Medium: 500-1000 bytes
  - Large: 5000-10000 bytes
- Hash Fields: 5-20 fields per hash
- List Elements: 10-100 elements per list
Metrics Collected
For each benchmark, we collect:
- Latency Metrics
  - Mean execution time
  - Median (p50)
  - 95th percentile (p95)
  - 99th percentile (p99)
  - Standard deviation
- Throughput Metrics
  - Operations per second
  - Records per second (for bulk operations)
- Memory Metrics
  - Total allocations
  - Peak memory usage
  - Average bytes per operation
- Initialization Overhead
  - Database startup time
  - First operation latency (cold cache)
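Criterion computes most of these statistics itself; for metrics gathered outside Criterion (for example, raw per-operation samples from the memory or concurrency harnesses), percentiles can be derived with a small helper like the following sketch, which could live in benches/common/metrics.rs:

```rust
use std::time::Duration;

/// Nearest-rank percentile of a set of latency samples (p in 0..=100).
fn percentile(samples: &mut [Duration], p: f64) -> Duration {
    assert!(!samples.is_empty() && (0.0..=100.0).contains(&p));
    samples.sort_unstable();
    // Nearest-rank method: the smallest sample such that at least p% of samples are <= it
    let rank = ((p / 100.0) * samples.len() as f64).ceil().max(1.0) as usize;
    samples[rank - 1]
}

fn main() {
    // 1 µs .. 1000 µs, one sample per microsecond
    let mut samples: Vec<Duration> = (1..=1000).map(Duration::from_micros).collect();
    println!("p50 = {:?}", percentile(&mut samples, 50.0)); // 500 µs
    println!("p95 = {:?}", percentile(&mut samples, 95.0)); // 950 µs
    println!("p99 = {:?}", percentile(&mut samples, 99.0)); // 990 µs
}
```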
Benchmark Structure
Directory Layout
benches/
├── common/
│ ├── mod.rs # Shared utilities
│ ├── data_generator.rs # Test data generation
│ ├── metrics.rs # Custom metrics collection
│ └── backends.rs # Backend setup helpers
├── single_ops.rs # Single-operation benchmarks
├── bulk_ops.rs # Bulk operation benchmarks
├── scan_ops.rs # Scan and query benchmarks
├── concurrent_ops.rs # Concurrent operation benchmarks
└── memory_profile.rs # Memory profiling benchmarks
Running Benchmarks
Run All Benchmarks
cargo bench
Run Specific Benchmark Suite
cargo bench --bench single_ops
cargo bench --bench bulk_ops
cargo bench --bench concurrent_ops
Run Specific Backend
cargo bench -- redb
cargo bench -- sled
Generate Reports
# Run benchmarks and save results
cargo bench -- --save-baseline main
# Compare against baseline
cargo bench -- --baseline main
# Export to CSV
cargo bench -- --output-format csv > results.csv
Output Formats
1. Terminal Output (Default)
Real-time progress with statistical summaries:
single_ops/redb/set/small
time: [1.234 µs 1.245 µs 1.256 µs]
thrpt: [802.5K ops/s 810.2K ops/s 818.1K ops/s]
2. CSV Export
Structured data for analysis:
backend,operation,dataset_size,mean_ns,median_ns,p95_ns,p99_ns,throughput_ops_sec
redb,set,small,1245,1240,1890,2100,810200
sled,set,small,1567,1550,2340,2890,638000
3. JSON Export
Detailed metrics for programmatic processing:
{
  "benchmark": "single_ops/redb/set/small",
  "metrics": {
    "mean": 1245,
    "median": 1240,
    "p95": 1890,
    "p99": 2100,
    "std_dev": 145,
    "throughput": 810200
  },
  "memory": {
    "allocations": 3,
    "peak_bytes": 4096
  }
}
Benchmark Implementation Details
Backend Setup
Each benchmark creates isolated database instances:
// Redb backend
let temp_dir = TempDir::new()?;
let db_path = temp_dir.path().join("bench.db");
let storage = Storage::new(db_path, false, None)?;
// Sled backend
let temp_dir = TempDir::new()?;
let db_path = temp_dir.path().join("bench.sled");
let storage = SledStorage::new(db_path, false, None)?;
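To keep benchmark bodies backend-agnostic, the setup can be factored into a helper that returns a trait object. This is a sketch assuming both backends implement the StorageBackend trait used in the concurrent example below; constructor signatures follow the snippets above.

```rust
use std::sync::Arc;
use tempfile::TempDir;

/// Backend under test.
enum Backend {
    Redb,
    Sled,
}

/// Create an isolated database instance in a fresh temp directory.
/// The TempDir is returned alongside the storage so the on-disk files
/// outlive the benchmark that uses them.
fn setup_backend(backend: Backend) -> (TempDir, Arc<dyn StorageBackend>) {
    let temp_dir = TempDir::new().expect("create temp dir");
    let storage: Arc<dyn StorageBackend> = match backend {
        Backend::Redb => Arc::new(
            Storage::new(temp_dir.path().join("bench.db"), false, None).unwrap(),
        ),
        Backend::Sled => Arc::new(
            SledStorage::new(temp_dir.path().join("bench.sled"), false, None).unwrap(),
        ),
    };
    (temp_dir, storage)
}
```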
Data Generation
Deterministic data generation ensures reproducibility:
use rand::{SeedableRng, Rng};
use rand::rngs::StdRng;

fn generate_test_data(count: usize, seed: u64) -> Vec<(String, String)> {
    let mut rng = StdRng::seed_from_u64(seed);
    (0..count)
        .map(|i| {
            let key = format!("bench:key:{:08}", i);
            let value = generate_value(&mut rng, 100);
            (key, value)
        })
        .collect()
}
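generate_value is not shown above; a minimal version that fills values to the requested size with printable bytes could look like this (illustrative only, assuming rand 0.8's gen_range):

```rust
use rand::rngs::StdRng;
use rand::Rng;

/// Generate a printable ASCII value of exactly `len` bytes.
fn generate_value(rng: &mut StdRng, len: usize) -> String {
    (0..len).map(|_| rng.gen_range(b'a'..=b'z') as char).collect()
}
```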
Concurrent Testing
Using Tokio for async concurrent operations:
async fn concurrent_benchmark(
    storage: Arc<dyn StorageBackend>,
    num_clients: usize,
    operations: usize,
) {
    let tasks: Vec<_> = (0..num_clients)
        .map(|client_id| {
            let storage = storage.clone();
            tokio::spawn(async move {
                for i in 0..operations {
                    let key = format!("client:{}:key:{}", client_id, i);
                    storage.set(key, "value".to_string()).unwrap();
                }
            })
        })
        .collect();

    futures::future::join_all(tasks).await;
}
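The mixed workload (70% reads, 30% writes) from the concurrent category can reuse the same structure, picking the operation per iteration. This sketch assumes a storage.get counterpart to set on the same trait; the exact signature may differ.

```rust
use std::sync::Arc;
use rand::rngs::StdRng;
use rand::{Rng, SeedableRng};

async fn mixed_workload(
    storage: Arc<dyn StorageBackend>,
    num_clients: usize,
    operations: usize,
) {
    let tasks: Vec<_> = (0..num_clients)
        .map(|client_id| {
            let storage = storage.clone();
            tokio::spawn(async move {
                // Per-client RNG seeded by client id keeps runs deterministic
                let mut rng = StdRng::seed_from_u64(client_id as u64);
                for i in 0..operations {
                    let key = format!("client:{}:key:{}", client_id, i % 100);
                    if rng.gen_bool(0.7) {
                        // 70% reads
                        let _ = storage.get(&key);
                    } else {
                        // 30% writes
                        storage.set(key, "value".to_string()).unwrap();
                    }
                }
            })
        })
        .collect();

    futures::future::join_all(tasks).await;
}
```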
Interpreting Results
Performance Comparison
When comparing backends, consider:
- Latency vs Throughput Trade-offs
  - Lower latency = better for interactive workloads
  - Higher throughput = better for batch processing
- Consistency
  - Lower standard deviation = more predictable performance
  - Check p95/p99 for tail latency
- Scalability
  - How performance changes with dataset size
  - Concurrent operation efficiency
Backend Selection Guidelines
Based on benchmark results, choose:
redb when:
- Need predictable latency
- Working with structured data (separate tables)
- Require high concurrent read performance
- Memory efficiency is important
sled when:
- Need high write throughput
- Working with uniform data types
- Require lock-free operations
- Crash recovery is critical
Memory Profiling
Using DHAT
For detailed memory profiling:
# Install valgrind and dhat
sudo apt-get install valgrind
# Run with DHAT
cargo bench --bench memory_profile -- --profile-time=10
Custom Allocation Tracking
The benchmarks include custom allocation tracking:
#[global_allocator]
static ALLOC: dhat::Alloc = dhat::Alloc;

// dhat's heap statistics (total and peak allocation counters) serve as the stats type
type AllocationStats = dhat::HeapStats;

fn track_allocations<F>(f: F) -> AllocationStats
where
    F: FnOnce(),
{
    // Profiling is active while `_profiler` is alive
    let _profiler = dhat::Profiler::new_heap();
    f();
    // Snapshot the allocation counters recorded while `f` ran
    dhat::HeapStats::get()
}
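With dhat::HeapStats as the stats type, a profiling benchmark could then wrap its workload like this (field names are from the dhat crate):

```rust
fn main() {
    let stats = track_allocations(|| {
        // Workload under measurement: build and drop 10,000 small values
        let values: Vec<String> = (0..10_000).map(|i| format!("value:{}", i)).collect();
        drop(values);
    });
    println!(
        "total allocations: {}, peak bytes: {}",
        stats.total_blocks, stats.max_bytes
    );
}
```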
Continuous Benchmarking
Regression Detection
Compare against baseline to detect performance regressions:
# Save current performance as baseline
cargo bench -- --save-baseline v0.1.0
# After changes, compare
cargo bench -- --baseline v0.1.0
# Criterion will highlight significant changes
CI Integration
Add to CI pipeline:
- name: Run Benchmarks
  run: |
    cargo bench --no-fail-fast -- --output-format json > bench-results.json

- name: Compare Results
  run: |
    python scripts/compare_benchmarks.py \
      --baseline baseline.json \
      --current bench-results.json \
      --threshold 10  # Fail if >10% regression
Troubleshooting
Common Issues
- Inconsistent Results
  - Ensure system is idle during benchmarks
  - Disable CPU frequency scaling
  - Run multiple iterations
- Out of Memory
  - Reduce dataset sizes
  - Run benchmarks sequentially
  - Increase system swap space
- Slow Benchmarks
  - Reduce sample size in Criterion config
  - Use the --quick flag for faster runs
  - Focus on specific benchmarks
Performance Tips
# Quick benchmark run (fewer samples)
cargo bench -- --quick
# Verbose output for debugging
cargo bench -- --verbose
# Profile specific operation
cargo bench -- single_ops/redb/set
Future Enhancements
Potential additions to the benchmark suite:
- Transaction Performance: Measure MULTI/EXEC overhead
- Encryption Overhead: Compare encrypted vs non-encrypted
- Persistence Testing: Measure flush/sync performance
- Recovery Time: Database restart and recovery speed
- Network Overhead: Redis protocol parsing impact
- Long-Running Stability: Performance over extended periods