# Full-Text Search with Tantivy

HeroDB provides full-text search powered by [Tantivy](https://github.com/quickwit-oss/tantivy), a fast full-text search engine library written in Rust. Search is exposed through Redis-compatible `FT.*` commands similar to those of RediSearch.

## Overview

The search functionality allows you to:
- Create search indexes with custom schemas
- Index documents with multiple field types
- Perform complex queries with filters
- Work with text, numeric, date, and geographic data
- Search in real time with high performance

## Search Commands

### FT.CREATE - Create Search Index

Create a new search index with a defined schema.

```bash
FT.CREATE index_name SCHEMA field_name field_type [options] [field_name field_type [options] ...]
```

**Field Types:**
- `TEXT` - Full-text searchable text fields
- `NUMERIC` - Numeric fields (integers, floats)
- `TAG` - Tag fields for exact matching
- `GEO` - Geographic coordinates (lat,lon)
- `DATE` - Date/timestamp fields

**Field Options:**
- `STORED` - Store field value for retrieval
- `INDEXED` - Make field searchable
- `TOKENIZED` - Enable tokenization for text fields
- `FAST` - Enable fast access for numeric fields

**Example:**
```bash
# Create a product search index
FT.CREATE products SCHEMA 
  title TEXT STORED INDEXED TOKENIZED
  description TEXT STORED INDEXED TOKENIZED  
  price NUMERIC STORED INDEXED FAST
  category TAG STORED
  location GEO STORED
  created_date DATE STORED INDEXED
```
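
A schema like the one above can also be assembled programmatically. The following is a minimal sketch, assuming a client that sends commands as argument lists; `build_ft_create` is a hypothetical helper, not part of HeroDB, and it only checks field types and options against the lists documented in this section.

```python
# Hypothetical helper (not part of HeroDB): assembles the argument list
# for the FT.CREATE syntax documented above.
FIELD_TYPES = {"TEXT", "NUMERIC", "TAG", "GEO", "DATE"}
FIELD_OPTIONS = {"STORED", "INDEXED", "TOKENIZED", "FAST"}


def build_ft_create(index_name, fields):
    """Return FT.CREATE arguments for a list of (name, type, options) tuples."""
    if not fields:
        raise ValueError("a schema needs at least one field")
    args = ["FT.CREATE", index_name, "SCHEMA"]
    for name, field_type, options in fields:
        if field_type not in FIELD_TYPES:
            raise ValueError(f"unknown field type: {field_type}")
        unknown = [opt for opt in options if opt not in FIELD_OPTIONS]
        if unknown:
            raise ValueError(f"unknown options for {name}: {unknown}")
        args += [name, field_type, *options]
    return args


products_schema = [
    ("title", "TEXT", ["STORED", "INDEXED", "TOKENIZED"]),
    ("price", "NUMERIC", ["STORED", "INDEXED", "FAST"]),
    ("category", "TAG", ["STORED"]),
]
create_args = build_ft_create("products", products_schema)
print(" ".join(create_args))
```

Validating types and options on the client catches typos before the command reaches the server.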

### FT.ADD - Add Document to Index

Add a document to a search index.

```bash
FT.ADD index_name doc_id [SCORE score] FIELDS field_name field_value [field_name field_value ...]
```

**Example:**
```bash
# Add a product document
FT.ADD products product:1 SCORE 1.0 FIELDS 
  title "Wireless Headphones" 
  description "High-quality wireless headphones with noise cancellation"
  price 199.99
  category "electronics"
  location "37.7749,-122.4194"
  created_date 1640995200000
```
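
When documents come from application data, the `FT.ADD` arguments can be derived from a dictionary. This is a rough sketch under the assumption that all field values are sent as strings; `build_ft_add` is a hypothetical name used only for illustration.

```python
# Hypothetical helper (not part of HeroDB): flattens a dict of field values
# into the FT.ADD argument order shown above.
def build_ft_add(index_name, doc_id, fields, score=None):
    """Return FT.ADD arguments; the SCORE clause is added only when given."""
    if not fields:
        raise ValueError("a document needs at least one field")
    args = ["FT.ADD", index_name, doc_id]
    if score is not None:
        args += ["SCORE", str(score)]
    args.append("FIELDS")
    for name, value in fields.items():
        # Assumption: every value is transmitted in its string form.
        args += [name, str(value)]
    return args


add_args = build_ft_add(
    "products",
    "product:1",
    {"title": "Wireless Headphones", "price": 199.99},
    score=1.0,
)
print(add_args)
```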

### FT.SEARCH - Search Documents

Search for documents in an index.

```bash
FT.SEARCH index_name query [LIMIT offset count] [FILTER field min max] [RETURN field [field ...]]
```

**Query Syntax:**
- Simple terms: `wireless headphones`
- Phrase queries: `"noise cancellation"`
- Field-specific: `title:wireless`
- Boolean operators: `wireless AND headphones`
- Wildcards: `head*`

**Examples:**
```bash
# Simple text search
FT.SEARCH products "wireless headphones"

# Search with filters
FT.SEARCH products "headphones" FILTER price 100 300 LIMIT 0 10

# Field-specific search
FT.SEARCH products "title:wireless AND category:electronics"

# Return specific fields only
FT.SEARCH products "*" RETURN title price
```
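
The optional clauses can be combined from code as well. The sketch below uses a hypothetical `build_ft_search` helper and emits clauses in the order of the syntax line above; whether HeroDB requires a particular clause order is not stated here, so treat the ordering as an assumption.

```python
# Hypothetical helper (not part of HeroDB): builds FT.SEARCH arguments with
# optional LIMIT, FILTER, and RETURN clauses.
def build_ft_search(index_name, query, limit=None, filters=(), return_fields=()):
    args = ["FT.SEARCH", index_name, query]
    if limit is not None:
        offset, count = limit
        if offset < 0 or count < 0:
            raise ValueError("LIMIT offset and count must be non-negative")
        args += ["LIMIT", str(offset), str(count)]
    for field, low, high in filters:
        if low > high:
            raise ValueError(f"FILTER {field}: min is greater than max")
        args += ["FILTER", field, str(low), str(high)]
    if return_fields:
        args += ["RETURN", *return_fields]
    return args


search_args = build_ft_search(
    "products",
    "headphones",
    limit=(0, 10),
    filters=[("price", 100, 300)],
    return_fields=["title", "price"],
)
print(" ".join(search_args))
```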

### FT.DEL - Delete Document

Remove a document from the search index.

```bash
FT.DEL index_name doc_id
```

**Example:**
```bash
FT.DEL products product:1
```

### FT.INFO - Get Index Information

Get information about a search index.

```bash
FT.INFO index_name
```

**Returns:**
- Index name and document count
- Field definitions and types
- Index configuration

**Example:**
```bash
FT.INFO products
```

### FT.DROP - Drop Index

Delete an entire search index.

```bash
FT.DROP index_name
```

**Example:**
```bash
FT.DROP products
```

### FT.ALTER - Alter Index Schema

Add new fields to an existing index.

```bash
FT.ALTER index_name SCHEMA ADD field_name field_type [options]
```

**Example:**
```bash
FT.ALTER products SCHEMA ADD brand TAG STORED
```

### FT.AGGREGATE - Aggregate Search Results

Perform aggregations on search results.

```bash
FT.AGGREGATE index_name query [GROUPBY field] [REDUCE function field AS alias]
```

**Example:**
```bash
# Group products by category and count
FT.AGGREGATE products "*" GROUPBY category REDUCE COUNT 0 AS count
```
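
To make the semantics concrete, the following sketch reproduces a group-and-count on the client side. It illustrates what the result means, not how HeroDB computes it; the sample documents and the `group_and_count` function are made up for this example.

```python
from collections import Counter

# Made-up sample documents, standing in for search results.
docs = [
    {"id": "product:1", "category": "electronics"},
    {"id": "product:2", "category": "electronics"},
    {"id": "product:3", "category": "books"},
]


def group_and_count(documents, field):
    """Count documents per distinct value of `field`, skipping docs without it."""
    return Counter(doc[field] for doc in documents if field in doc)


counts = group_and_count(docs, "category")
print(dict(counts))
```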

## Field Types in Detail

### TEXT Fields
- **Purpose**: Full-text search on natural language content
- **Features**: Tokenization, stemming, stop-word removal
- **Options**: `STORED`, `INDEXED`, `TOKENIZED`
- **Example**: Product titles, descriptions, content

### NUMERIC Fields
- **Purpose**: Numeric data for range queries and sorting
- **Types**: I64, U64, F64
- **Options**: `STORED`, `INDEXED`, `FAST`
- **Example**: Prices, quantities, ratings

### TAG Fields
- **Purpose**: Exact-match categorical data
- **Features**: No tokenization, exact string matching
- **Options**: `STORED`, case sensitivity control
- **Example**: Categories, brands, status values

### GEO Fields
- **Purpose**: Geographic coordinates
- **Format**: "latitude,longitude" (e.g., "37.7749,-122.4194")
- **Features**: Geographic distance queries (planned, not yet available)
- **Options**: `STORED`

### DATE Fields
- **Purpose**: Timestamp and date data
- **Format**: Unix timestamp in milliseconds
- **Features**: Range queries, temporal filtering
- **Options**: `STORED`, `INDEXED`, `FAST`
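
Since DATE values are Unix timestamps in milliseconds, application dates need converting before indexing. A minimal sketch using only the Python standard library; the function names are illustrative, and integer arithmetic is used to avoid floating-point rounding.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_epoch_millis(moment):
    """Convert a timezone-aware datetime to whole milliseconds since the epoch."""
    if moment.tzinfo is None:
        raise ValueError("use a timezone-aware datetime to avoid local-time ambiguity")
    return (moment - EPOCH) // timedelta(milliseconds=1)


def from_epoch_millis(millis):
    """Convert milliseconds since the epoch back to a UTC datetime."""
    return EPOCH + timedelta(milliseconds=millis)


start_2022 = to_epoch_millis(datetime(2022, 1, 1, tzinfo=timezone.utc))
print(start_2022)
```

The value for 1 January 2022 UTC matches the `created_date` used in the `FT.ADD` example.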

## Search Query Syntax

### Basic Queries
```bash
# Single term
FT.SEARCH products "wireless"

# Multiple terms (AND by default)
FT.SEARCH products "wireless headphones"

# Phrase query
FT.SEARCH products "\"noise cancellation\""
```

### Field-Specific Queries
```bash
# Search in specific field
FT.SEARCH products "title:wireless"

# Multiple field queries
FT.SEARCH products "title:wireless AND description:bluetooth"
```

### Boolean Operators
```bash
# AND operator
FT.SEARCH products "wireless AND headphones"

# OR operator
FT.SEARCH products "wireless OR bluetooth"

# NOT operator
FT.SEARCH products "headphones NOT wired"
```

### Wildcards and Fuzzy Search
```bash
# Wildcard search
FT.SEARCH products "head*"

# Fuzzy search (approximate matching)
FT.SEARCH products "%headphone%"
```

### Range Queries
```bash
# Numeric range in query
FT.SEARCH products "@price:[100 300]"

# Date range
FT.SEARCH products "@created_date:[1640995200000 1672531200000]"
```
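
Query strings in the forms shown above can be composed from small building blocks. This sketch only does string formatting for the documented syntax; the helper names are hypothetical, and how HeroDB handles operator precedence or escaping is not covered here, so phrases containing quotes are rejected rather than escaped.

```python
# Hypothetical helpers (not part of HeroDB) that format query fragments.
def phrase(text):
    """Wrap text in double quotes to form a phrase query."""
    if '"' in text:
        raise ValueError("phrases containing double quotes are not handled")
    return f'"{text}"'


def field_term(field, term):
    """Restrict a term to one field, as in title:wireless."""
    return f"{field}:{term}"


def numeric_range(field, low, high):
    """Build an inclusive range clause, as in @price:[100 300]."""
    if low > high:
        raise ValueError("range minimum is greater than maximum")
    return f"@{field}:[{low} {high}]"


def all_of(*clauses):
    """Join clauses with the AND operator."""
    return " AND ".join(clauses)


query = all_of(field_term("title", "wireless"), field_term("description", "bluetooth"))
print(query)
print(numeric_range("price", 100, 300))
```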

## Filtering and Sorting

### FILTER Clause
```bash
# Numeric filter
FT.SEARCH products "headphones" FILTER price 100 300

# Multiple filters
FT.SEARCH products "*" FILTER price 100 500 FILTER created_date 1640995200000 1672531200000
```

### LIMIT Clause
```bash
# Pagination
FT.SEARCH products "wireless" LIMIT 0 10    # First 10 results
FT.SEARCH products "wireless" LIMIT 10 10   # Next 10 results
```
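
The offset for a given page follows directly from the page number and page size. A small sketch, assuming 1-based page numbers; `page_limit` is an illustrative name.

```python
def page_limit(page, page_size):
    """Return the (offset, count) pair for a 1-based page number."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be at least 1")
    return ((page - 1) * page_size, page_size)


first_page = page_limit(1, 10)
second_page = page_limit(2, 10)
print(first_page, second_page)
```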

### RETURN Clause
```bash
# Return specific fields
FT.SEARCH products "*" RETURN title price

# Return all stored fields (default)
FT.SEARCH products "*"
```

## Performance Considerations

### Indexing Strategy
- Only index fields you need to search on
- Use `FAST` option for frequently filtered numeric fields
- Mark a field `STORED` only if you need it returned in results; stored values increase index size

### Query Optimization
- Use specific field queries when possible
- Combine filters with text queries for better performance
- Use pagination with LIMIT for large result sets

### Memory Usage
- Tantivy indexes are memory-mapped for performance
- Index size depends on document count and field configuration
- Monitor disk space for index storage

## Integration with Redis Commands

Search indexes work alongside regular Redis data:

```bash
# Store product data in Redis hash
HSET product:1 title "Wireless Headphones" price "199.99"

# Index the same data for search
FT.ADD products product:1 FIELDS title "Wireless Headphones" price 199.99

# Search returns document IDs that can be used with Redis commands
FT.SEARCH products "wireless"
# Returns: product:1

# Retrieve full data using Redis
HGETALL product:1
```
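
Because the hash and the index entry are written separately, it helps to derive both commands from the same record so they cannot drift apart. A minimal sketch, assuming string-encoded values; `commands_for_record` and the sample record are hypothetical.

```python
# Hypothetical helper (not part of HeroDB): derives the HSET and FT.ADD
# commands for one record from a single source of truth.
def commands_for_record(index_name, doc_id, record, indexed_fields):
    missing = [name for name in indexed_fields if name not in record]
    if missing:
        raise KeyError(f"record is missing indexed fields: {missing}")
    hset = ["HSET", doc_id]
    for name, value in record.items():
        hset += [name, str(value)]
    ft_add = ["FT.ADD", index_name, doc_id, "FIELDS"]
    for name in indexed_fields:
        ft_add += [name, str(record[name])]
    return [hset, ft_add]


record = {"title": "Wireless Headphones", "price": 199.99, "sku": "WH-1"}
hset_cmd, ft_add_cmd = commands_for_record(
    "products", "product:1", record, indexed_fields=["title", "price"]
)
print(hset_cmd)
print(ft_add_cmd)
```

Each argument list could then be passed to a client's generic command method, for example `execute_command(*args)` in redis-py.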

## Example Use Cases

### E-commerce Product Search
```bash
# Create product catalog index
FT.CREATE catalog SCHEMA 
  name TEXT STORED INDEXED TOKENIZED
  description TEXT INDEXED TOKENIZED
  price NUMERIC STORED INDEXED FAST
  category TAG STORED
  brand TAG STORED
  rating NUMERIC STORED FAST

# Add products
FT.ADD catalog prod:1 FIELDS name "iPhone 14" price 999 category "phones" brand "apple" rating 4.5
FT.ADD catalog prod:2 FIELDS name "Samsung Galaxy" price 899 category "phones" brand "samsung" rating 4.3

# Search queries
FT.SEARCH catalog "iPhone"
FT.SEARCH catalog "phones" FILTER price 800 1000
FT.SEARCH catalog "brand:apple"
```

### Content Management
```bash
# Create content index
FT.CREATE content SCHEMA
  title TEXT STORED INDEXED TOKENIZED
  body TEXT INDEXED TOKENIZED
  author TAG STORED
  published DATE STORED INDEXED
  tags TAG STORED

# Search content
FT.SEARCH content "machine learning"
FT.SEARCH content "author:john AND tags:ai"
FT.SEARCH content "*" FILTER published 1640995200000 1672531200000
```

### Geographic Search
```bash
# Create location-based index
FT.CREATE places SCHEMA
  name TEXT STORED INDEXED TOKENIZED
  location GEO STORED
  type TAG STORED

# Add locations
FT.ADD places place:1 FIELDS name "Golden Gate Bridge" location "37.8199,-122.4783" type "landmark"

# Geographic distance query (planned syntax, not yet supported)
FT.SEARCH places "@location:[37.7749 -122.4194 10 km]"
```
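
Until distance queries are available, one possible workaround is to filter stored `location` values on the client. The sketch below parses the "latitude,longitude" format and applies the haversine formula; it is a client-side approximation on a spherical Earth, not a HeroDB feature, and the function names are illustrative.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, spherical approximation


def parse_geo(value):
    """Parse a "latitude,longitude" string into a (lat, lon) tuple of floats."""
    lat_text, lon_text = value.split(",")
    lat, lon = float(lat_text), float(lon_text)
    if not (-90 <= lat <= 90 and -180 <= lon <= 180):
        raise ValueError(f"coordinates out of range: {value}")
    return lat, lon


def haversine_km(a, b):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat, dlon = lat2 - lat1, lon2 - lon1
    h = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


center = parse_geo("37.7749,-122.4194")
bridge = parse_geo("37.8199,-122.4783")
distance = haversine_km(center, bridge)
print(f"{distance:.1f} km, within 10 km: {distance <= 10}")
```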

## Error Handling

Common error responses:
- `ERR index not found` - Index doesn't exist
- `ERR field not found` - Field not defined in schema
- `ERR invalid query syntax` - Malformed query
- `ERR document not found` - Document ID doesn't exist
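
Client code may want to turn these responses into typed exceptions. A minimal sketch, assuming the server message starts with one of the strings listed above; the exception classes and `to_exception` are hypothetical names.

```python
# Hypothetical exception hierarchy for the error responses listed above.
class SearchError(Exception):
    """Base class for search-related errors."""


class IndexNotFound(SearchError):
    pass


class FieldNotFound(SearchError):
    pass


class InvalidQuery(SearchError):
    pass


class DocumentNotFound(SearchError):
    pass


ERROR_PREFIXES = {
    "ERR index not found": IndexNotFound,
    "ERR field not found": FieldNotFound,
    "ERR invalid query syntax": InvalidQuery,
    "ERR document not found": DocumentNotFound,
}


def to_exception(message):
    """Map an error message to a typed exception; unknown messages fall back to SearchError."""
    for prefix, exc_type in ERROR_PREFIXES.items():
        if message.startswith(prefix):
            return exc_type(message)
    return SearchError(message)


error = to_exception("ERR index not found")
print(type(error).__name__)
```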

## Best Practices

1. **Schema Design**: Plan your schema carefully - `FT.ALTER` can add new fields, but other schema changes require reindexing
2. **Field Selection**: Only store and index fields you actually need
3. **Batch Operations**: When loading many documents, send `FT.ADD` commands in batches rather than one at a time
4. **Query Testing**: Test queries for performance with realistic data
5. **Monitoring**: Monitor index size and query performance
6. **Backup**: Include search indexes in backup strategies
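
For batch loading, documents can be split into fixed-size chunks before being sent. A small sketch of the chunking step only; `batched` is an illustrative helper, and how each chunk is transmitted (for example, through a client pipeline) is left as an assumption.

```python
from itertools import islice


def batched(items, size):
    """Yield successive lists of at most `size` items from any iterable."""
    if size < 1:
        raise ValueError("batch size must be at least 1")
    iterator = iter(items)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


doc_ids = [f"product:{n}" for n in range(1, 6)]
batches = list(batched(doc_ids, 2))
print(batches)
```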

## Future Enhancements

Planned features:
- Geographic distance queries
- Advanced aggregations and faceting
- Highlighting of search results
- Synonyms and custom analyzers
- Real-time suggestions and autocomplete
- Index replication and sharding