update api, fix tests and examples

docs/README.md (280 lines, Normal file)

@@ -0,0 +1,280 @@

# Hero Supervisor Documentation

## Overview

Hero Supervisor is a distributed job execution system that manages runners and coordinates job processing across multiple worker nodes. It exposes an OpenRPC (JSON-RPC) API for job management and runner administration.

## Architecture

The supervisor consists of several key components:

- **Supervisor Core**: Central coordinator that manages runners and job dispatch
- **OpenRPC Server**: JSON-RPC API server for remote management
- **Redis Backend**: Job queue and state management
- **Process Manager**: Runner lifecycle management (Simple or Tmux)
- **Client Libraries**: Native Rust and WASM clients for integration

## Quick Start

### Starting the Supervisor

```bash
# With default configuration
./supervisor

# With custom configuration file
./supervisor --config /path/to/config.toml
```

### Example Configuration

```toml
# config.toml
redis_url = "redis://localhost:6379"
namespace = "hero"
bind_address = "127.0.0.1"
port = 3030

# Admin secrets for full access
admin_secrets = ["admin-secret-123"]

# User secrets for job operations
user_secrets = ["user-secret-456"]

# Register secrets for runner registration
register_secrets = ["register-secret-789"]

[[actors]]
id = "sal_runner_1"
name = "sal_runner_1"
binary_path = "/path/to/sal_runner"
db_path = "/tmp/sal_db"
redis_url = "redis://localhost:6379"
process_manager = "simple"

[[actors]]
id = "osis_runner_1"
name = "osis_runner_1"
binary_path = "/path/to/osis_runner"
db_path = "/tmp/osis_db"
redis_url = "redis://localhost:6379"
process_manager = "tmux:osis_session"
```
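
The two `process_manager` values in the example (`"simple"` and `"tmux:osis_session"`) suggest a `kind[:session]` format. The following is a minimal sketch of how such a value could be parsed, using only the standard library. The `ProcessManager` enum and `parse_process_manager` function are hypothetical names for illustration, not the supervisor's actual types, and whether a bare `tmux` without a session name is accepted is not stated here, so this sketch rejects it.

```rust
/// How a runner process is supervised, mirroring the two forms of
/// `process_manager` shown in the example configuration.
/// (Hypothetical type; the real supervisor may model this differently.)
#[derive(Debug, PartialEq)]
enum ProcessManager {
    Simple,
    Tmux { session: String },
}

/// Parse a `process_manager` string such as "simple" or "tmux:osis_session".
fn parse_process_manager(value: &str) -> Result<ProcessManager, String> {
    match value.split_once(':') {
        // No colon: only the plain "simple" form is recognized.
        None if value == "simple" => Ok(ProcessManager::Simple),
        // "tmux:<session>" with a non-empty session name.
        Some(("tmux", session)) if !session.is_empty() => Ok(ProcessManager::Tmux {
            session: session.to_string(),
        }),
        // Anything else is rejected (an assumption of this sketch).
        _ => Err(format!("unrecognized process_manager: {value:?}")),
    }
}

fn main() {
    for value in ["simple", "tmux:osis_session", "tmux:", "screen"] {
        println!("{value:?} -> {:?}", parse_process_manager(value));
    }
}
```

Rejecting unknown values early keeps a typo in the configuration from silently falling back to a default process manager.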

## API Documentation

### Job API Convention

The Hero Supervisor follows a consistent naming convention for job operations:

- **`jobs.`** - General job operations (create, list)
- **`job.`** - Specific job operations (run, start, status, result)

See [Job API Convention](job-api-convention.md) for detailed documentation.
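
As an illustration of the convention, the sketch below classifies a method name by its prefix. The `JobScope` enum and `job_scope` function are hypothetical helpers written for this example and are not part of the supervisor or its clients.

```rust
/// Which side of the naming convention a method falls on.
/// (Hypothetical type used only for this illustration.)
#[derive(Debug, PartialEq)]
enum JobScope {
    /// `jobs.*`: general operations such as create and list.
    Collection,
    /// `job.*`: operations on one specific job.
    Single,
}

/// Classify a method name by its namespace prefix.
/// Returns `None` for methods outside the job namespaces.
fn job_scope(method: &str) -> Option<JobScope> {
    let (prefix, action) = method.split_once('.')?;
    if action.is_empty() {
        return None;
    }
    match prefix {
        "jobs" => Some(JobScope::Collection),
        "job" => Some(JobScope::Single),
        _ => None,
    }
}

fn main() {
    for method in ["jobs.create", "job.status", "list_runners", "rpc.discover"] {
        println!("{method} -> {:?}", job_scope(method));
    }
}
```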

### Core Methods

#### Runner Management
- `register_runner` - Register a new runner
- `list_runners` - List all registered runners
- `start_runner` / `stop_runner` - Control runner lifecycle
- `get_runner_status` - Get runner status
- `get_runner_logs` - Retrieve runner logs

#### Job Management
- `jobs.create` - Create a job without queuing
- `jobs.list` - List all jobs with full details
- `job.run` - Run a job and return result
- `job.start` - Start a created job
- `job.stop` - Stop a running job
- `job.delete` - Delete a job from the system
- `job.status` - Get job status (non-blocking)
- `job.result` - Get job result (blocking)

#### Administration
- `add_secret` / `remove_secret` - Manage authentication secrets
- `get_supervisor_info` - Get system information
- `rpc.discover` - OpenRPC specification discovery
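
Since the server speaks JSON-RPC, each of these methods is invoked with a JSON-RPC 2.0 request envelope. The sketch below builds such an envelope for `rpc.discover`, which takes no parameters. The `json_rpc_request` helper is hypothetical and hand-formats the JSON to stay dependency-free; it assumes the method name contains no characters that need JSON escaping. The parameter shapes of the other methods are not documented here, so they are not shown.

```rust
/// Build a JSON-RPC 2.0 request body.
/// `params_json` must already be valid JSON (for example `[]`).
/// (Hypothetical helper; a real client would use a JSON library.)
fn json_rpc_request(id: u64, method: &str, params_json: &str) -> String {
    format!(r#"{{"jsonrpc":"2.0","id":{id},"method":"{method}","params":{params_json}}}"#)
}

fn main() {
    // `rpc.discover` takes no parameters, so an empty array is sent.
    let body = json_rpc_request(1, "rpc.discover", "[]");
    println!("{body}");
}
```

The resulting body can be sent as an HTTP POST to the supervisor's address, for example with `curl`, to retrieve the OpenRPC specification and confirm which methods the server supports.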

## Client Usage

### Rust Client

```rust
use std::time::Duration;

// JobResult is assumed to be exported by the client crate alongside SupervisorClient
use hero_supervisor_openrpc_client::{JobBuilder, JobResult, SupervisorClient};

// The statements below run inside an async function that returns a Result

// Create client
let client = SupervisorClient::new("http://localhost:3030")?;

// Create a job
let job = JobBuilder::new()
    .caller_id("my_client")
    .context_id("my_context")
    .payload("print('Hello World')")
    .executor("osis")
    .runner("osis_runner_1")
    .timeout(60)
    .build()?;

// Option 1: Run the job and wait for its result
let result = client.job_run("user-secret", job.clone()).await?;
match result {
    JobResult::Success { success } => println!("Output: {}", success),
    JobResult::Error { error } => println!("Error: {}", error),
}

// Option 2: Asynchronous execution
let job_id = client.jobs_create("user-secret", job).await?;
client.job_start("user-secret", &job_id).await?;

// Poll for completion
loop {
    let status = client.job_status(&job_id).await?;
    if status.status == "completed" || status.status == "failed" {
        break;
    }
    tokio::time::sleep(Duration::from_secs(1)).await;
}

let result = client.job_result(&job_id).await?;

// Option 3: Job management
// Stop a running job
client.job_stop("user-secret", &job_id).await?;

// Delete a job
client.job_delete("user-secret", &job_id).await?;

// List all jobs (returns full Job objects)
let jobs = client.jobs_list("user-secret").await?;
for job in jobs {
    println!("Job {}: {} ({})", job.id, job.executor, job.payload);
}
```
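
The polling loop in Option 2 runs until the job reaches a terminal state, so it never returns if the job stalls. One way to bound it is sketched below. To stay self-contained, this version is synchronous and takes the status lookup as a closure rather than calling the real client; `poll_until_terminal` is a hypothetical helper, and the terminal states `"completed"` and `"failed"` are the ones used in the example above.

```rust
use std::thread;
use std::time::Duration;

/// Poll `get_status` until it reports a terminal state or the attempt
/// budget runs out. Returns the terminal status, or `None` on timeout.
/// (Hypothetical helper; an async client would await a sleep instead.)
fn poll_until_terminal<F>(mut get_status: F, max_attempts: u32, delay: Duration) -> Option<String>
where
    F: FnMut() -> String,
{
    for attempt in 0..max_attempts {
        let status = get_status();
        if status == "completed" || status == "failed" {
            return Some(status);
        }
        // Sleep between attempts, but not after the last one.
        if attempt + 1 < max_attempts {
            thread::sleep(delay);
        }
    }
    None
}

fn main() {
    // Stand-in for a status call: reports "running" twice, then "completed".
    let mut calls = 0;
    let fake_status = || {
        calls += 1;
        if calls < 3 {
            "running".to_string()
        } else {
            "completed".to_string()
        }
    };
    let outcome = poll_until_terminal(fake_status, 10, Duration::from_millis(10));
    println!("{outcome:?}");
}
```

Returning `None` on exhaustion lets the caller decide whether to stop the job, keep waiting, or report an error.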

### WASM Client

```javascript
import { WasmSupervisorClient, WasmJob } from 'hero-supervisor-openrpc-client';

// Create client
const client = new WasmSupervisorClient('http://localhost:3030');

// Create and run job
const job = new WasmJob('job-id', 'print("Hello")', 'osis', 'osis_runner_1');
const result = await client.create_job('user-secret', job);
```

## Security

### Authentication Levels

1. **Admin Secrets**: Full system access
   - All runner management operations
   - All job operations
   - Secret management
   - System information access

2. **User Secrets**: Job operations only
   - Create, run, start jobs
   - Get job status and results
   - No runner or secret management

3. **Register Secrets**: Runner registration only
   - Register new runners
   - No other operations
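
The three levels can be read as a simple permission check per method. The sketch below encodes that reading; `SecretLevel` and `is_allowed` are hypothetical names, and treating every `job.` and `jobs.` method as a job operation available to user secrets is an assumption of this example rather than documented server behavior.

```rust
/// The three kinds of secret described above.
/// (Hypothetical type used only for this illustration.)
#[derive(Debug, Clone, Copy, PartialEq)]
enum SecretLevel {
    Admin,
    User,
    Register,
}

/// Decide whether a secret of the given level may call `method`.
fn is_allowed(level: SecretLevel, method: &str) -> bool {
    match level {
        // Admin secrets have full system access.
        SecretLevel::Admin => true,
        // User secrets: job operations only (assumed to mean the
        // `job.` and `jobs.` namespaces).
        SecretLevel::User => method.starts_with("job.") || method.starts_with("jobs."),
        // Register secrets: runner registration only.
        SecretLevel::Register => method == "register_runner",
    }
}

fn main() {
    let checks = [
        (SecretLevel::Admin, "add_secret"),
        (SecretLevel::User, "job.run"),
        (SecretLevel::User, "stop_runner"),
        (SecretLevel::Register, "register_runner"),
        (SecretLevel::Register, "jobs.list"),
    ];
    for (level, method) in checks {
        println!("{level:?} {method}: {}", is_allowed(level, method));
    }
}
```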

### Best Practices

- Use different secret types for different access levels
- Rotate secrets regularly
- Store secrets securely (environment variables, secret management systems)
- Use HTTPS in production environments
- Implement proper logging and monitoring

## Development

### Building

```bash
# Build supervisor binary
cargo build --release

# Build with OpenRPC feature
cargo build --release --features openrpc

# Build client library
cd clients/openrpc
cargo build --release
```

### Testing

```bash
# Run tests
cargo test

# Run the ignored tests, which require a running Redis server
docker run -d -p 6379:6379 redis:alpine
cargo test -- --ignored
```

### Examples

See the `examples/` directory for:
- Basic supervisor setup
- Mock runner implementation
- Comprehensive OpenRPC client usage
- Configuration examples

## Troubleshooting

### Common Issues

1. **Redis Connection Failed**
   - Ensure Redis server is running
   - Check Redis URL in configuration
   - Verify network connectivity

2. **Runner Registration Failed**
   - Check register secret validity
   - Verify runner binary path exists
   - Ensure runner has proper permissions

3. **Job Execution Timeout**
   - Increase job timeout value
   - Check runner resource availability
   - Monitor runner logs for issues

4. **OpenRPC Method Not Found**
   - Verify method name spelling
   - Check OpenRPC specification
   - Ensure server supports the method

### Logging

Enable debug logging:
```bash
RUST_LOG=debug ./supervisor --config config.toml
```

### Monitoring

Monitor key metrics:
- Runner status and health
- Job queue lengths
- Job success/failure rates
- Response times
- Redis connection status

## Contributing

1. Fork the repository
2. Create a feature branch
3. Make changes with tests
4. Update documentation
5. Submit a pull request

## License

[License information here]