Squashed 'components/zinit/' content from commit 1b76c06

git-subtree-dir: components/zinit
git-subtree-split: 1b76c062fe31d552d1b7b23484ce163995a81482
commit 2fda71af11 (2025-08-16 21:12:16 +02:00)
48 changed files with 11203 additions and 0 deletions

docs/cmd.md (Normal file, 256 lines)

@@ -0,0 +1,256 @@
# Zinit Command Line Reference
This document provides a comprehensive reference for all Zinit command line options and commands.
## Command Structure
Zinit uses a command-based CLI with the following general structure:
```bash
zinit [FLAGS] [OPTIONS] [SUBCOMMAND]
```
## Global Flags and Options
These flags and options apply to all Zinit commands:
| Flag/Option | Description |
|-------------|-------------|
| `-d, --debug` | Run in debug mode with increased verbosity |
| `-h, --help` | Display help information |
| `-V, --version` | Display version information |
| `-s, --socket <PATH>` | Path to Unix socket (default: `/var/run/zinit.sock`) |
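The global options can be combined with any subcommand. For example, the sketch below assumes a Zinit instance is already serving on the non-default socket path shown:
```bash
# Talk to a zinit instance on a custom socket, with debug output enabled
# (the socket path here is only an example)
zinit -d -s /tmp/zinit-test.sock list
```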
## Subcommands
### Main Mode
#### `init`
Run Zinit in init mode, starting and maintaining configured services.
```bash
zinit init [FLAGS] [OPTIONS]
```
**Flags:**
- `--container`: Run in container mode, exiting on signal instead of rebooting
**Options:**
- `-c, --config <DIR>`: Service configurations directory (default: `/etc/zinit/`)
- `-b, --buffer <SIZE>`: Buffer size (in lines) to keep service logs (default: `2000`)
**Example:**
```bash
# Run in init mode with custom config directory
zinit init -c /opt/services/
# Run in container mode
zinit init --container
```
### Service Management
#### `list`
Display a quick view of all currently known services and their status.
```bash
zinit list
```
**Output:**
A JSON object with service names as keys and their status as values.
**Example:**
```bash
# List all services
zinit list
```
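The exact state strings depend on the configured services; an illustrative run might look like this:
```bash
zinit list
# Illustrative output (state names vary by service and zinit version):
# {"hello": "Success", "redis": "Running"}
```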
#### `status`
Show detailed status information for a specific service.
```bash
zinit status <SERVICE>
```
**Arguments:**
- `<SERVICE>`: Name of the service to show status for
**Example:**
```bash
# Check status of redis service
zinit status redis
```
#### `start`
Start a service. Has no effect if the service is already running.
```bash
zinit start <SERVICE>
```
**Arguments:**
- `<SERVICE>`: Name of the service to start
**Example:**
```bash
# Start the nginx service
zinit start nginx
```
#### `stop`
Stop a service. Sets the target state to "down" and sends the stop signal.
```bash
zinit stop <SERVICE>
```
**Arguments:**
- `<SERVICE>`: Name of the service to stop
**Example:**
```bash
# Stop the redis service
zinit stop redis
```
#### `restart`
Restart a service. If it fails to stop, it will be killed and then started again.
```bash
zinit restart <SERVICE>
```
**Arguments:**
- `<SERVICE>`: Name of the service to restart
**Example:**
```bash
# Restart the web service
zinit restart web
```
#### `monitor`
Start monitoring a service. The configuration is loaded from the server's config directory.
```bash
zinit monitor <SERVICE>
```
**Arguments:**
- `<SERVICE>`: Name of the service to monitor
**Example:**
```bash
# Monitor the database service
zinit monitor database
```
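A typical workflow is to drop a new configuration file into the config directory and then ask Zinit to pick it up. The sketch below assumes the default `/etc/zinit/` directory and a hypothetical `myservice` unit:
```bash
# Create a new service configuration (hypothetical service)
cat << EOF | sudo tee /etc/zinit/myservice.yaml
exec: "/usr/local/bin/myservice"
EOF

# Ask zinit to load and start monitoring it
zinit monitor myservice
```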
#### `forget`
Remove a service from monitoring. You can only forget a stopped service.
```bash
zinit forget <SERVICE>
```
**Arguments:**
- `<SERVICE>`: Name of the service to forget
**Example:**
```bash
# Forget the backup service
zinit forget backup
```
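Because only stopped services can be forgotten, the usual sequence stops the service first:
```bash
# Stop the service, then remove it from monitoring
zinit stop backup
zinit forget backup
```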
#### `kill`
Send a signal to a running service.
```bash
zinit kill <SERVICE> <SIGNAL>
```
**Arguments:**
- `<SERVICE>`: Name of the service to send signal to
- `<SIGNAL>`: Signal name (e.g., SIGTERM, SIGKILL, SIGINT)
**Example:**
```bash
# Send SIGTERM to the redis service
zinit kill redis SIGTERM
# Send SIGKILL to force terminate a service
zinit kill stuck-service SIGKILL
```
### System Operations
#### `shutdown`
Stop all services in dependency order and power off the system.
```bash
zinit shutdown
```
**Example:**
```bash
# Shutdown the system
zinit shutdown
```
#### `reboot`
Stop all services in dependency order and reboot the system.
```bash
zinit reboot
```
**Example:**
```bash
# Reboot the system
zinit reboot
```
### Logging
#### `log`
View service logs from the Zinit ring buffer.
```bash
zinit log [FLAGS] [FILTER]
```
**Flags:**
- `-s, --snapshot`: Print the current contents of the log buffer and exit, instead of following new output
**Arguments:**
- `[FILTER]`: Optional service name to filter logs for
**Examples:**
```bash
# View logs for all services and follow new logs
zinit log
# View current logs for the nginx service without following
zinit log -s nginx
```
## Exit Codes
Zinit commands return the following exit codes:
| Code | Description |
|------|-------------|
| 0 | Success |
| 1 | Error (with error message printed to stderr) |
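Since failures return a non-zero exit code, Zinit commands compose with normal shell error handling; a minimal sketch (the service name is illustrative):
```bash
if ! zinit restart web; then
    echo "restart failed, dumping recent logs" >&2
    zinit log -s web
fi
```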

docs/installation.md (Normal file, 197 lines)

@@ -0,0 +1,197 @@
# Installing Zinit
This guide provides detailed instructions for installing Zinit on various platforms.
## System Requirements
Zinit has minimal system requirements:
- Linux-based operating system
- Root access (for running as init system)
## Pre-built Binaries
If pre-built binaries are available for your system, you can install them directly:
```bash
# Download the binary (replace with actual URL)
wget https://github.com/threefoldtech/zinit/releases/download/vX.Y.Z/zinit-x86_64-unknown-linux-musl
# Make it executable
chmod +x zinit-x86_64-unknown-linux-musl
# Move to a location in your PATH
sudo mv zinit-x86_64-unknown-linux-musl /usr/local/bin/zinit
```
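To confirm the binary is on your `PATH` and runs, query its version:
```bash
zinit --version
```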
## Building from Source
### Prerequisites
To build Zinit from source, you'll need:
- Rust toolchain (1.46.0 or later recommended)
- musl and musl-tools packages
- GNU Make
#### Install Rust
If you don't have Rust installed, use rustup:
```bash
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
source $HOME/.cargo/env
```
#### Install musl development tools
On Debian/Ubuntu:
```bash
sudo apt update
sudo apt install musl musl-tools
```
On Fedora:
```bash
sudo dnf install musl musl-devel
```
On Alpine Linux (musl is already the default libc):
```bash
apk add build-base
```
### Build Process
1. Clone the repository:
```bash
git clone https://github.com/threefoldtech/zinit.git
cd zinit
```
2. Build using make:
```bash
make
```
This will create a statically linked binary at `target/x86_64-unknown-linux-musl/release/zinit`.
3. Install the binary:
```bash
sudo cp target/x86_64-unknown-linux-musl/release/zinit /usr/local/bin/
```
### Development Build
For development or debugging:
```bash
make dev
```
## Docker Installation
### Using the Provided Dockerfile
Zinit includes a test Docker image:
```bash
# Build the Docker image
make docker
# Run the container
docker run -dt --device=/dev/kmsg:/dev/kmsg:rw zinit
```
> To reach the Zinit proxy from outside the container, publish a port with the `-p XXXX:YYYY` flag when running it.
### Custom Docker Setup
To create your own Dockerfile with Zinit:
```dockerfile
FROM alpine:latest
# Install dependencies if needed
RUN apk add --no-cache bash curl
# Copy the zinit binary
COPY zinit /usr/local/bin/zinit
RUN chmod +x /usr/local/bin/zinit
# Create configuration directory
RUN mkdir -p /etc/zinit
# Add your service configurations
COPY services/*.yaml /etc/zinit/
# Set zinit as the entrypoint
ENTRYPOINT ["/usr/local/bin/zinit", "init", "--container"]
```
## Using Zinit as the Init System
To use Zinit as the init system (PID 1) on a Linux system:
### On a Standard Linux System
1. Install Zinit as described above
2. Create your service configurations in `/etc/zinit/`
3. Configure your bootloader to use zinit as init
For GRUB, add `init=/usr/local/bin/zinit` to the kernel command line:
```bash
# Edit GRUB configuration
sudo nano /etc/default/grub
# Add init parameter to GRUB_CMDLINE_LINUX
# Example:
# GRUB_CMDLINE_LINUX="init=/usr/local/bin/zinit"
# Update GRUB
sudo update-grub
```
### In a Container Environment
For containers, simply set Zinit as the entrypoint:
```bash
docker run -dt --device=/dev/kmsg:/dev/kmsg:rw \
--entrypoint /usr/local/bin/zinit \
your-image init --container
```
## First-time Setup
After installation, you'll need to create a basic configuration:
1. Create the configuration directory:
```bash
sudo mkdir -p /etc/zinit
```
2. Create a simple service configuration:
```bash
cat << EOF | sudo tee /etc/zinit/hello.yaml
exec: "echo 'Hello from Zinit!'"
oneshot: true
EOF
```
3. Test Zinit without running as init:
```bash
# For testing only - doesn't replace system init
sudo zinit init
```
If all is working correctly, you should see Zinit start and run your service.
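From a second terminal you can verify the result (assuming the default socket path):
```bash
# Confirm the oneshot service was picked up and ran
zinit list
zinit status hello

# Inspect its captured output
zinit log -s hello
```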

docs/osx_cross_compile.md (Normal file, 78 lines)

@@ -0,0 +1,78 @@
# macOS Guide for Zinit
This guide covers both building Zinit natively on macOS and cross-compiling from macOS to Linux targets.
## Building Zinit Natively on macOS
Zinit can now be built and run directly on macOS. The code has been updated to handle platform-specific differences between Linux and macOS.
### Building for macOS
```bash
# Build a release version for macOS
make release-macos
# Install to ~/hero/bin (if it exists)
make install-macos
```
The native macOS build provides most of Zinit's functionality, with the following limitations:
- System reboot and shutdown operations are not supported (they will exit the process instead)
- Some Linux-specific features are disabled
## Cross-Compilation from macOS to Linux
This section outlines the steps to set up your macOS environment for cross-compiling Rust projects to the `aarch64-unknown-linux-musl` target. This is particularly useful for building binaries that can run on ARM-based Linux systems (e.g., Raspberry Pi, AWS Graviton) using musl libc.
### Prerequisites
* Homebrew (https://brew.sh/) installed on your macOS system.
* Rust and Cargo installed (e.g., via `rustup`).
### Step 1: Install the `aarch64-linux-musl-gcc` Toolchain
The `aarch64-linux-musl-gcc` toolchain is required for linking when cross-compiling to `aarch64-unknown-linux-musl`. You can install it using Homebrew:
```bash
brew install messense/macos-cross-toolchains/aarch64-linux-musl-cross
```
### Step 2: Link `musl-gcc`
Some build scripts or tools might look for `musl-gcc`. To ensure compatibility, create a symbolic link:
```bash
sudo ln -s /opt/homebrew/bin/aarch64-linux-musl-gcc /opt/homebrew/bin/musl-gcc
```
You might be prompted for your system password to complete this operation.
### Step 3: Add the Rust Target
Add the `aarch64-unknown-linux-musl` target to your Rust toolchain:
```bash
rustup target add aarch64-unknown-linux-musl
```
### Step 4: Build Your Project
Now you can build your Rust project for the `aarch64-unknown-linux-musl` target using Cargo:
```bash
cargo build --release --target aarch64-unknown-linux-musl
```
Alternatively, if you are using the provided `Makefile`, you can use the new target:
```bash
make release-aarch64-musl
```
This will produce a release binary located in `target/aarch64-unknown-linux-musl/release/`.
### Step 5: Copy the Binary to `~/hero/bin` on macOS
```bash
cp target/aarch64-unknown-linux-musl/release/zinit ~/hero/bin
```

docs/services.md (Normal file, 217 lines)

@@ -0,0 +1,217 @@
# Service Configuration Format
This document describes the structure and options for Zinit service configuration files.
## File Format
Zinit uses YAML files for service configuration. Each service has its own configuration file stored in the Zinit configuration directory (default: `/etc/zinit`).
### File Naming and Location
- **Location**: `/etc/zinit/` (default, can be changed with `-c` flag)
- On macOS, the default is `~/hero/cfg/zinit`
- **Naming**: `<service-name>.yaml`
For example:
- `/etc/zinit/nginx.yaml`
- `/etc/zinit/redis.yaml`
## Configuration Schema
Service configuration files use the following schema:
```yaml
# Command to run (required)
exec: "command line to start service"

# Command to test if service is running (optional)
test: "command line to test service"

# If true, run once and do not restart on exit (optional, default: false)
oneshot: true|false

# Maximum time to wait for service to stop during shutdown (optional, default: 10)
shutdown_timeout: 30

# Services that must be running before this one starts (optional)
after:
- service1_name
- service2_name

# Signals configuration (optional)
signal:
  stop: SIGKILL # signal sent on 'stop' action (default: SIGTERM)

# Log handling configuration (optional, default: ring)
log: null|ring|stdout

# Environment variables for the service (optional)
env:
  KEY1: "VALUE1"
  KEY2: "VALUE2"

# Working directory for the service (optional)
dir: "/path/to/working/directory"
```
## Configuration Options
### Required Fields
| Field | Description |
|-------|-------------|
| `exec` | Command line to execute when starting the service |
### Optional Fields
| Field | Type | Default | Description |
|-------|------|---------|-------------|
| `test` | String | - | Command to determine if service is running |
| `oneshot` | Boolean | `false` | If true, service won't be restarted after exit |
| `shutdown_timeout` | Integer | 10 | Seconds to wait for service to stop during shutdown |
| `after` | String[] | `[]` | List of services that must be running first |
| `signal.stop` | String | `"sigterm"` | Signal to send when stopping the service |
| `log` | Enum | `ring` | How to handle service output (null, ring, stdout) |
| `env` | Object | `{}` | Environment variables to pass to the service |
| `dir` | String | `""` | Working directory for the service |
## Field Details
### exec
The command to run when starting the service. This is the only required field in the configuration.
```yaml
exec: "/usr/bin/redis-server --port 6379"
```
Shell-style commands are supported:
```yaml
exec: "sh -c 'echo Starting service && /usr/local/bin/myservice'"
```
### test
Command that tests whether the service is running properly. Zinit runs this command periodically until it succeeds (exit code 0), at which point the service is considered running.
```yaml
test: "redis-cli -p 6379 PING"
```
If no test command is provided, the service is considered running as soon as it's started.
### oneshot
When set to `true`, the service will not be automatically restarted when it exits. This is useful for initialization tasks or commands that should run only once.
```yaml
oneshot: true
```
Services that depend on a oneshot service will start only after the oneshot service has exited successfully.
### shutdown_timeout
How long (in seconds) to wait for the service to stop during system shutdown before giving up:
```yaml
shutdown_timeout: 30 # Wait up to 30 seconds
```
### after
List of service names that must be running (or completed successfully for oneshot services) before this service starts:
```yaml
after:
- networking
- database
```
### signal
Custom signals to use for operations. Currently, only the `stop` signal is configurable:
```yaml
signal:
  stop: SIGKILL # Use SIGKILL instead of default SIGTERM
```
Valid signal names follow the standard UNIX signal naming (SIGTERM, SIGKILL, SIGINT, etc).
### log
How to handle stdout/stderr output from the service:
```yaml
log: stdout # Print output to zinit's stdout
```
Options:
- `null`: Ignore all service output (like redirecting to /dev/null)
- `ring`: Store logs in the kernel ring buffer with service name prefix (default)
- `stdout`: Send service output to zinit's stdout
> **Note**: To use `ring` inside Docker, make sure to add the `kmsg` device:
> ```
> docker run -dt --device=/dev/kmsg:/dev/kmsg:rw zinit
> ```
### env
Additional environment variables for the service. These are added to the existing environment:
```yaml
env:
  PORT: "8080"
  DEBUG: "true"
  NODE_ENV: "production"
```
### dir
Working directory for the service process:
```yaml
dir: "/var/lib/myservice"
```
If not specified, the process inherits zinit's working directory.
## Example Configurations
### Web Server
```yaml
exec: "/usr/bin/nginx -g 'daemon off;'"
test: "curl -s http://localhost > /dev/null"
after:
- networking
log: stdout
```
### Database Initialization
```yaml
exec: "sh -c 'echo Creating database schema && /usr/bin/db-migrate'"
oneshot: true
dir: "/opt/myapp"
env:
  DB_HOST: "localhost"
  DB_USER: "admin"
```
### Application with Dependencies
```yaml
exec: "/usr/bin/myapp --config /etc/myapp.conf"
test: "curl -s http://localhost:8080/health > /dev/null"
after:
- database
- cache
signal:
  stop: SIGINT # Use SIGINT for graceful shutdown
env:
  PORT: "8080"
shutdown_timeout: 20
```


@@ -0,0 +1,366 @@
# Zinit Shutdown Functionality Improvement Plan
## Current Issues
1. **Incomplete Child Process Termination**: When services are stopped, child processes may remain running.
2. **Lack of Verification**: There's no verification that all processes are actually terminated.
3. **Improper Graceful Shutdown**: Zinit doesn't wait for all processes to terminate before exiting.
## Solution Overview
We'll implement a robust shutdown mechanism that:
1. Uses our stats functionality to detect all child processes
2. Properly manages process groups
3. Verifies all processes are terminated before Zinit exits
## Implementation Plan
```mermaid
flowchart TD
A[Enhance stop method] --> B[Improve kill_process_tree]
B --> C[Add process verification]
C --> D[Implement graceful shutdown]
A1[Use stats to detect child processes] --> A
A2[Send signals to all processes] --> A
A3[Implement cascading termination] --> A
B1[Ensure proper process group handling] --> B
B2[Add timeout and escalation logic] --> B
C1[Create verification mechanism] --> C
C2[Add polling for process existence] --> C
D1[Wait for all processes to terminate] --> D
D2[Add cleanup of resources] --> D
D3[Implement clean exit] --> D
```
## Detailed Implementation Steps
### 1. Enhance the `stop` Method in `LifecycleManager`
```rust
pub async fn stop<S: AsRef<str>>(&self, name: S) -> Result<()> {
    // Get service information
    let table = self.services.read().await;
    let service = table.get(name.as_ref())
        .ok_or_else(|| ZInitError::unknown_service(name.as_ref()))?;

    let mut service = service.write().await;
    service.set_target(Target::Down);

    // Get the main process PID
    let pid = service.pid;
    if pid.as_raw() == 0 {
        return Ok(());
    }

    // Get the signal to use
    let signal = signal::Signal::from_str(&service.service.signal.stop.to_uppercase())
        .map_err(|err| anyhow::anyhow!("unknown stop signal: {}", err))?;

    // Release the lock before potentially long-running operations
    drop(service);
    drop(table);

    // Get all child processes using our stats functionality
    let children = self.get_child_process_stats(pid.as_raw()).await?;

    // First try to stop the process group
    let _ = self.pm.signal(pid, signal);

    // Wait a short time for processes to terminate gracefully
    sleep(std::time::Duration::from_millis(500)).await;

    // Check if processes are still running and use SIGKILL if needed
    self.ensure_processes_terminated(pid.as_raw(), &children).await?;

    Ok(())
}
```
### 2. Add a New `ensure_processes_terminated` Method
```rust
async fn ensure_processes_terminated(&self, parent_pid: i32, children: &[ProcessStats]) -> Result<()> {
    // Check if parent is still running
    let parent_running = self.is_process_running(parent_pid).await?;

    // If parent is still running, send SIGKILL
    if parent_running {
        debug!("Process {} still running after SIGTERM, sending SIGKILL", parent_pid);
        let _ = self.pm.signal(Pid::from_raw(parent_pid), signal::Signal::SIGKILL);
    }

    // Check and kill any remaining child processes
    for child in children {
        if self.is_process_running(child.pid).await? {
            debug!("Child process {} still running, sending SIGKILL", child.pid);
            let _ = signal::kill(Pid::from_raw(child.pid), signal::Signal::SIGKILL);
        }
    }

    // Verify all processes are gone
    let mut retries = 5;
    while retries > 0 {
        let mut all_terminated = true;

        // Check parent
        if self.is_process_running(parent_pid).await? {
            all_terminated = false;
        }

        // Check children
        for child in children {
            if self.is_process_running(child.pid).await? {
                all_terminated = false;
                break;
            }
        }

        if all_terminated {
            return Ok(());
        }

        // Wait before retrying
        sleep(std::time::Duration::from_millis(100)).await;
        retries -= 1;
    }

    // If we get here, some processes might still be running
    warn!("Some processes may still be running after shutdown attempts");
    Ok(())
}
```
### 3. Add a Helper Method to Check if a Process is Running
```rust
async fn is_process_running(&self, pid: i32) -> Result<bool> {
    // Use sysinfo to check if process exists
    let mut system = System::new();
    let sys_pid = sysinfo::Pid::from(pid as usize);
    system.refresh_process(sys_pid);
    Ok(system.process(sys_pid).is_some())
}
```
### 4. Improve the `kill_process_tree` Method
```rust
#[cfg(target_os = "linux")]
async fn kill_process_tree(
    &self,
    mut dag: ProcessDAG,
    mut state_channels: HashMap<String, Watcher<State>>,
    mut shutdown_timeouts: HashMap<String, u64>,
) -> Result<()> {
    let (tx, mut rx) = mpsc::unbounded_channel();
    tx.send(DUMMY_ROOT.into())?;

    let mut count = dag.count;
    while let Some(name) = rx.recv().await {
        debug!("{} has been killed (or was inactive) adding its children", name);
        for child in dag.adj.get(&name).unwrap_or(&Vec::new()) {
            let child_indegree: &mut u32 = dag.indegree.entry(child.clone()).or_default();
            *child_indegree -= 1;
            debug!("decrementing child {} indegree to {}", child, child_indegree);

            if *child_indegree == 0 {
                let watcher = state_channels.remove(child);
                if watcher.is_none() {
                    // not an active service
                    tx.send(child.to_string())?;
                    continue;
                }

                let shutdown_timeout = shutdown_timeouts.remove(child);
                let lifecycle = self.clone_lifecycle();

                // Spawn a task to kill the service and wait for it to terminate
                let kill_task = tokio::spawn(Self::kill_wait_enhanced(
                    lifecycle,
                    child.to_string(),
                    tx.clone(),
                    watcher.unwrap(),
                    shutdown_timeout.unwrap_or(config::DEFAULT_SHUTDOWN_TIMEOUT),
                ));

                // Add a timeout to ensure we don't wait forever
                let _ = tokio::time::timeout(
                    std::time::Duration::from_secs(shutdown_timeout.unwrap_or(config::DEFAULT_SHUTDOWN_TIMEOUT) + 2),
                    kill_task
                ).await;
            }
        }

        count -= 1;
        if count == 0 {
            break;
        }
    }

    // Final verification that all processes are gone
    self.verify_all_processes_terminated().await?;
    Ok(())
}
```
### 5. Add an Enhanced `kill_wait` Method
```rust
#[cfg(target_os = "linux")]
async fn kill_wait_enhanced(
    self,
    name: String,
    ch: mpsc::UnboundedSender<String>,
    mut rx: Watcher<State>,
    shutdown_timeout: u64,
) {
    debug!("kill_wait {}", name);

    // Try to stop the service gracefully
    let stop_result = self.stop(name.clone()).await;

    // Wait for the service to become inactive or timeout
    let fut = timeout(
        std::time::Duration::from_secs(shutdown_timeout),
        async move {
            while let Some(state) = rx.next().await {
                if !state.is_active() {
                    return;
                }
            }
        },
    );

    match stop_result {
        Ok(_) => {
            let _ = fut.await;
        }
        Err(e) => error!("couldn't stop service {}: {}", name.clone(), e),
    }

    // Verify the service is actually stopped
    if let Ok(status) = self.status(&name).await {
        if status.pid != 0 {
            // Service is still running, try to kill it
            let _ = self.kill(&name, signal::Signal::SIGKILL).await;
        }
    }

    debug!("sending to the death channel {}", name.clone());
    if let Err(e) = ch.send(name.clone()) {
        error!(
            "error: couldn't send the service {} to the shutdown loop: {}",
            name, e
        );
    }
}
```
### 6. Add a Method to Verify All Processes are Terminated
```rust
async fn verify_all_processes_terminated(&self) -> Result<()> {
    // Get all services
    let table = self.services.read().await;

    // Check each service
    for (name, service) in table.iter() {
        let service = service.read().await;
        let pid = service.pid.as_raw();

        // Skip services with no PID
        if pid == 0 {
            continue;
        }

        // Check if the main process is still running
        if self.is_process_running(pid).await? {
            warn!("Service {} (PID {}) is still running after shutdown", name, pid);

            // Try to kill it with SIGKILL
            let _ = signal::kill(Pid::from_raw(pid), signal::Signal::SIGKILL);
        }

        // Check for child processes
        if let Ok(children) = self.get_child_process_stats(pid).await {
            for child in children {
                if self.is_process_running(child.pid).await? {
                    warn!("Child process {} of service {} is still running after shutdown",
                        child.pid, name);

                    // Try to kill it with SIGKILL
                    let _ = signal::kill(Pid::from_raw(child.pid), signal::Signal::SIGKILL);
                }
            }
        }
    }

    Ok(())
}
```
### 7. Update the `shutdown` and `reboot` Methods
```rust
pub async fn shutdown(&self) -> Result<()> {
    info!("shutting down");

    // Set the shutdown flag
    *self.shutdown.write().await = true;

    #[cfg(target_os = "linux")]
    {
        // Power off using our enhanced method
        let result = self.power(RebootMode::RB_POWER_OFF).await;

        // Final verification before exit
        self.verify_all_processes_terminated().await?;

        return result;
    }

    #[cfg(not(target_os = "linux"))]
    {
        // Stop all services
        let services = self.list().await?;
        for service in services {
            let _ = self.stop(&service).await;
        }

        // Verify all processes are terminated
        self.verify_all_processes_terminated().await?;

        if self.container {
            std::process::exit(0);
        } else {
            info!("System shutdown not supported on this platform");
            std::process::exit(0);
        }
    }
}
```
## Testing Plan
1. **Basic Service Termination**: Test that a simple service is properly terminated
2. **Child Process Termination**: Test that a service with child processes has all processes terminated
3. **Graceful Shutdown**: Test that Zinit exits cleanly after all services are stopped
4. **Edge Cases** (a test-fixture sketch follows this list):
   - Test with services that spawn many child processes
   - Test with services that spawn child processes that change their process group
   - Test with services that ignore SIGTERM
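As a starting point for those edge cases, a hypothetical "stubborn" fixture script (not part of the current code base) could ignore SIGTERM and detach a child into its own session:
```bash
#!/bin/sh
# stubborn.sh - hypothetical shutdown-test fixture
trap '' TERM        # ignore SIGTERM so only SIGKILL can stop this process
setsid sleep 600 &  # child in a new session (and new process group)
sleep 600           # keep the parent alive as well
```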
## Implementation Timeline
1. **Phase 1**: Enhance the `stop` method and add the helper methods (1-2 hours)
2. **Phase 2**: Improve the `kill_process_tree` and `kill_wait` methods (1-2 hours)
3. **Phase 3**: Update the `shutdown` and `reboot` methods (1 hour)
4. **Phase 4**: Testing and debugging (2-3 hours)

docs/stats.md (Normal file, 125 lines)

@@ -0,0 +1,125 @@
# Service Stats Functionality
This document describes the stats functionality in Zinit, which provides memory and CPU usage information for services and their child processes.
## Overview
The stats functionality allows you to monitor the resource usage of services managed by Zinit. It provides information about:
- Memory usage (in bytes)
- CPU usage (as a percentage)
- Child processes and their resource usage
This is particularly useful for monitoring system resources and identifying services that might be consuming excessive resources.
## Command Line Usage
To get stats for a service using the command line:
```bash
zinit stats <service-name>
```
Example:
```bash
zinit stats nginx
```
This will output YAML-formatted stats information:
```yaml
name: nginx
pid: 1234
memory_usage: 10485760 # Memory usage in bytes (10MB)
cpu_usage: 2.5 # CPU usage as percentage
children: # Stats for child processes
- pid: 1235
  memory_usage: 5242880
  cpu_usage: 1.2
- pid: 1236
  memory_usage: 4194304
  cpu_usage: 0.8
```
## JSON-RPC API
The stats functionality is also available through the JSON-RPC API:
### Method: `service_stats`
Get memory and CPU usage statistics for a service.
**Parameters:**
- `name` (string, required): The name of the service to get stats for
**Returns:**
- Object containing stats information:
- `name` (string): Service name
- `pid` (integer): Process ID of the service
- `memory_usage` (integer): Memory usage in bytes
- `cpu_usage` (number): CPU usage as a percentage (0-100)
- `children` (array): Stats for child processes
- Each child has:
- `pid` (integer): Process ID of the child process
- `memory_usage` (integer): Memory usage in bytes
- `cpu_usage` (number): CPU usage as a percentage (0-100)
**Example Request:**
```json
{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "service_stats",
  "params": {
    "name": "nginx"
  }
}
```
**Example Response:**
```json
{
  "jsonrpc": "2.0",
  "id": 1,
  "result": {
    "name": "nginx",
    "pid": 1234,
    "memory_usage": 10485760,
    "cpu_usage": 2.5,
    "children": [
      {
        "pid": 1235,
        "memory_usage": 5242880,
        "cpu_usage": 1.2
      },
      {
        "pid": 1236,
        "memory_usage": 4194304,
        "cpu_usage": 0.8
      }
    ]
  }
}
```
**Possible Errors:**
- `-32000`: Service not found
- `-32003`: Service is down
## Implementation Details
The stats functionality works by:
1. Reading process information from `/proc/<pid>/` directories on Linux systems
2. Calculating memory usage from `/proc/<pid>/status` (VmRSS field)
3. Calculating CPU usage by sampling `/proc/<pid>/stat` over a short interval
4. Identifying child processes by checking the PPid field in `/proc/<pid>/status`
On non-Linux systems, the functionality provides placeholder values as the `/proc` filesystem is specific to Linux.
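For reference, the same figures can be inspected by hand on a Linux system, roughly mirroring what the implementation reads (replace `1234` with a real PID):
```bash
# Resident memory (VmRSS, in kB) and parent PID of a process
grep -E 'VmRSS|PPid' /proc/1234/status

# Raw scheduler counters sampled to derive the CPU-usage percentage
cat /proc/1234/stat
```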
## Notes
- Memory usage is reported in bytes
- CPU usage is reported as a percentage (0-100)
- The service must be running to get stats (otherwise an error is returned)
- Child processes are identified by their parent PID matching the service's PID