`index_manager.create` is non-atomic; concurrent first-writes per kind can race the on-disk index creation #22

New issue

Open

opened 2026-05-01 15:07:53 +00:00 by mik-tf · 0 comments

mik-tf commented

2026-05-01 15:07:53 +00:00

Owner

Problem

The IndexManager::create path on hero_indexer_server (which leads to tantivy::Index::create_in_dir after std::fs::create_dir_all) is not atomic with respect to concurrent first-write requests for the same kind:

// crates/hero_indexer/src/modules/index_manager.rs:58-93
pub async fn create(
    &self,
    name: &str,
    schema_def: SchemaDefinition,
) -> Result<Arc<IndexHandle>, AppError> {
    let path = self.base_dir.join(name);

    if path.exists() {                                 // (1) check
        return Err(AppError::DatabaseExists { name: name.to_string() });
    }

    // ... spawn_blocking ...
    std::fs::create_dir_all(&path_clone)?;             // (2) mkdir
    let schema = schema_def.to_tantivy_schema()?;
    let index = tantivy::Index::create_in_dir(&path_clone, schema)?;  // (3) write meta.json
    // ...
}

Concurrent first-write requests for the same kind:

Request A arrives: path.exists() → false → spawn_blocking → create_dir_all → begins Index::create_in_dir.
Request B arrives mid-creation (before A's create_in_dir finishes writing meta.json).
B's path.exists() → true (A created it, step 2) → returns DatabaseExists early.
B's caller treats DatabaseExists as benign ("kind already initialized, proceed to write") and issues a db.select + write — but meta.json doesn't exist yet (A hasn't finished step 3).
B's subsequent db.select errors with FileDoesNotExist(meta.json).

This is a server-side TOCTOU between the dir-exists check and the index-meta-written observation. Two parallel calls to create_in_dir against the same path can also corrupt the index state.

Reproduction

Spawn N concurrent first-write requests for the same fresh kind. Some fraction will hit FileDoesNotExist(meta.json) on subsequent operations. In practice this happens during initial backfill of a fresh deployment when many entity kinds get their first write simultaneously, or under cold-start load when an admin reindex RPC fans out across many kinds.

Workaround (downstream)

Serialize per-kind first-write client-side. Hold a mutex across the entire (cache-check + db.create + cache-insert) sequence so a single flight per kind reaches the server. Pseudocode:

let mut cache = INITIALIZED_KINDS.lock().await;
if !cache.contains(kind) {
    indexer_client.db_create(kind, schema).await?;
    cache.insert(kind.to_string());
}
// cache mutex held until insertion — no second client can race

This is what downstream consumers have shipped to paper over the issue.

Suggested fix (server-side)

Make IndexManager::create atomic. Two options:

Per-kind in-flight set. Hold an RwLock<HashMap<String, ...>> keyed by kind name across the whole create operation; concurrent requests for the same kind block on it until the first completes.
create_or_get instead of create. Single idempotent operation that returns the existing index if creation is in flight (or completed), OR completes creation if not started. Removes the client-side round-trip and the TOCTOU window entirely.

Option 2 is preferred because it eliminates the need for clients to do their own caching at all.

Impact

Currently every consumer that does parallel first-writes per kind must implement the client-side serialization workaround. Without it, backfill flows fail intermittently in a way that's hard to diagnose (the error surfaces on a later db.select, not on the racing db.create).

Not blocking — workaround is small. Filing as ecosystem-correctness signal so the next consumer doesn't independently rediscover this.

## Problem The `IndexManager::create` path on `hero_indexer_server` (which leads to `tantivy::Index::create_in_dir` after `std::fs::create_dir_all`) is not atomic with respect to concurrent first-write requests for the same kind: ```rust // crates/hero_indexer/src/modules/index_manager.rs:58-93 pub async fn create( &self, name: &str, schema_def: SchemaDefinition, ) -> Result<Arc<IndexHandle>, AppError> { let path = self.base_dir.join(name); if path.exists() { // (1) check return Err(AppError::DatabaseExists { name: name.to_string() }); } // ... spawn_blocking ... std::fs::create_dir_all(&path_clone)?; // (2) mkdir let schema = schema_def.to_tantivy_schema()?; let index = tantivy::Index::create_in_dir(&path_clone, schema)?; // (3) write meta.json // ... } ``` Concurrent first-write requests for the same kind: 1. Request A arrives: `path.exists()` → false → spawn_blocking → `create_dir_all` → begins `Index::create_in_dir`. 2. Request B arrives mid-creation (before A's `create_in_dir` finishes writing `meta.json`). 3. B's `path.exists()` → **true** (A created it, step 2) → returns `DatabaseExists` early. 4. B's caller treats `DatabaseExists` as benign ("kind already initialized, proceed to write") and issues a `db.select` + write — but `meta.json` doesn't exist yet (A hasn't finished step 3). 5. B's subsequent `db.select` errors with `FileDoesNotExist(meta.json)`. This is a server-side TOCTOU between the dir-exists check and the index-meta-written observation. Two parallel calls to `create_in_dir` against the same path can also corrupt the index state. ## Reproduction Spawn N concurrent first-write requests for the same fresh kind. Some fraction will hit `FileDoesNotExist(meta.json)` on subsequent operations. In practice this happens during initial backfill of a fresh deployment when many entity kinds get their first write simultaneously, or under cold-start load when an admin reindex RPC fans out across many kinds. ## Workaround (downstream) Serialize per-kind first-write client-side. Hold a mutex across the entire (cache-check + db.create + cache-insert) sequence so a single flight per kind reaches the server. Pseudocode: ```rust let mut cache = INITIALIZED_KINDS.lock().await; if !cache.contains(kind) { indexer_client.db_create(kind, schema).await?; cache.insert(kind.to_string()); } // cache mutex held until insertion — no second client can race ``` This is what downstream consumers have shipped to paper over the issue. ## Suggested fix (server-side) Make `IndexManager::create` atomic. Two options: 1. **Per-kind in-flight set.** Hold an `RwLock<HashMap<String, ...>>` keyed by kind name across the whole create operation; concurrent requests for the same kind block on it until the first completes. 2. **`create_or_get` instead of `create`.** Single idempotent operation that returns the existing index if creation is in flight (or completed), OR completes creation if not started. Removes the client-side round-trip and the TOCTOU window entirely. Option 2 is preferred because it eliminates the need for clients to do their own caching at all. ## Impact Currently every consumer that does parallel first-writes per kind must implement the client-side serialization workaround. Without it, backfill flows fail intermittently in a way that's hard to diagnose (the error surfaces on a later `db.select`, not on the racing `db.create`). Not blocking — workaround is small. Filing as ecosystem-correctness signal so the next consumer doesn't independently rediscover this.