Squashed 'components/zinit/' content from commit 1b76c06

git-subtree-dir: components/zinit
git-subtree-split: 1b76c062fe31d552d1b7b23484ce163995a81482

docs/cmd.md

# Zinit Command Line Reference

This document provides a comprehensive reference for all Zinit command-line options and commands.

## Command Structure

Zinit uses a subcommand-based CLI with the following general structure:

```bash
zinit [FLAGS] [OPTIONS] [SUBCOMMAND]
```

## Global Flags and Options

These flags and options apply to all Zinit commands:

| Flag/Option | Description |
|-------------|-------------|
| `-d, --debug` | Run in debug mode with increased verbosity |
| `-h, --help` | Display help information |
| `-V, --version` | Display version information |
| `-s, --socket <PATH>` | Path to the Unix socket (default: `/var/run/zinit.sock`) |

## Subcommands

### Main Mode

#### `init`

Run Zinit in init mode, starting and maintaining the configured services.

```bash
zinit init [FLAGS] [OPTIONS]
```

**Flags:**

- `--container`: Run in container mode, exiting on signal instead of rebooting

**Options:**

- `-c, --config <DIR>`: Service configuration directory (default: `/etc/zinit/`)
- `-b, --buffer <SIZE>`: Buffer size (in lines) for keeping service logs (default: `2000`)

**Example:**

```bash
# Run in init mode with a custom config directory
zinit init -c /opt/services/

# Run in container mode
zinit init --container
```

### Service Management

#### `list`

Display a quick view of all currently known services and their status.

```bash
zinit list
```

**Output:**

A JSON object with service names as keys and their statuses as values.

**Example:**

```bash
# List all services
zinit list
```

#### `status`

Show detailed status information for a specific service.

```bash
zinit status <SERVICE>
```

**Arguments:**

- `<SERVICE>`: Name of the service to show the status of

**Example:**

```bash
# Check the status of the redis service
zinit status redis
```

#### `start`

Start a service. This has no effect if the service is already running.

```bash
zinit start <SERVICE>
```

**Arguments:**

- `<SERVICE>`: Name of the service to start

**Example:**

```bash
# Start the nginx service
zinit start nginx
```

#### `stop`

Stop a service. This sets the service's target state to "down" and sends it the configured stop signal.

```bash
zinit stop <SERVICE>
```

**Arguments:**

- `<SERVICE>`: Name of the service to stop

**Example:**

```bash
# Stop the redis service
zinit stop redis
```

#### `restart`

Restart a service. If the service does not stop, it is killed and then started again.

```bash
zinit restart <SERVICE>
```

**Arguments:**

- `<SERVICE>`: Name of the service to restart

**Example:**

```bash
# Restart the web service
zinit restart web
```

#### `monitor`

Start monitoring a service. Its configuration is loaded from the server's configuration directory.

```bash
zinit monitor <SERVICE>
```

**Arguments:**

- `<SERVICE>`: Name of the service to monitor

**Example:**

```bash
# Monitor the database service
zinit monitor database
```

#### `forget`

Remove a service from monitoring. Only a stopped service can be forgotten.

```bash
zinit forget <SERVICE>
```

**Arguments:**

- `<SERVICE>`: Name of the service to forget

**Example:**

```bash
# Forget the backup service
zinit forget backup
```

#### `kill`

Send a signal to a running service.

```bash
zinit kill <SERVICE> <SIGNAL>
```

**Arguments:**

- `<SERVICE>`: Name of the service to send the signal to
- `<SIGNAL>`: Signal name (e.g., `SIGTERM`, `SIGKILL`, `SIGINT`)

**Example:**

```bash
# Send SIGTERM to the redis service
zinit kill redis SIGTERM

# Send SIGKILL to forcibly terminate a service
zinit kill stuck-service SIGKILL
```

### System Operations

#### `shutdown`

Stop all services in dependency order and power off the system.

```bash
zinit shutdown
```

**Example:**

```bash
# Shut down the system
zinit shutdown
```

#### `reboot`

Stop all services in dependency order and reboot the system.

```bash
zinit reboot
```

**Example:**

```bash
# Reboot the system
zinit reboot
```

### Logging

#### `log`

View service logs from the Zinit ring buffer.

```bash
zinit log [FLAGS] [FILTER]
```

**Flags:**

- `-s, --snapshot`: Print the current contents of the buffer without following new log lines

**Arguments:**

- `[FILTER]`: Optional service name to filter the logs by

**Examples:**

```bash
# View logs for all services and follow new output
zinit log

# Print the current logs for the nginx service without following
zinit log -s nginx
```
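
The `--buffer` option of `init` and the `--snapshot` flag of `log` suggest a bounded, line-oriented ring buffer. Once the limit is reached, each new line evicts the oldest one, and a snapshot copies out whatever is currently held. The Rust sketch below illustrates that idea. The `LogRing` type and the `"<service>: "` line prefix used for filtering are assumptions made for illustration; they do not describe Zinit's actual log format or implementation.

```rust
use std::collections::VecDeque;

/// Hypothetical fixed-capacity log buffer: once `capacity` lines are stored,
/// each new line evicts the oldest one. Not Zinit's actual implementation.
struct LogRing {
    capacity: usize,
    lines: VecDeque<String>,
}

impl LogRing {
    fn new(capacity: usize) -> Self {
        LogRing { capacity, lines: VecDeque::with_capacity(capacity) }
    }

    /// Appends a line, dropping the oldest one when the buffer is full.
    fn push(&mut self, line: impl Into<String>) {
        if self.capacity == 0 {
            return; // a zero-sized buffer retains nothing
        }
        if self.lines.len() == self.capacity {
            self.lines.pop_front();
        }
        self.lines.push_back(line.into());
    }

    /// Snapshot mode: copies out the currently buffered lines, optionally
    /// keeping only one service's lines. Assumes lines look like "<service>: <text>".
    fn snapshot(&self, service: Option<&str>) -> Vec<String> {
        let prefix = service.map(|s| format!("{}: ", s));
        self.lines
            .iter()
            .filter(|line| prefix.as_ref().map_or(true, |p| line.starts_with(p.as_str())))
            .cloned()
            .collect()
    }
}

fn main() {
    // Capacity 3 stands in for `zinit init -b 3`.
    let mut ring = LogRing::new(3);
    for line in ["nginx: started", "redis: started", "nginx: GET /", "redis: saved"] {
        ring.push(line);
    }
    // The first line has been evicted.
    println!("{:?}", ring.snapshot(None)); // ["redis: started", "nginx: GET /", "redis: saved"]
    println!("{:?}", ring.snapshot(Some("nginx"))); // ["nginx: GET /"]
}
```

Following mode (plain `zinit log`) would additionally stream lines as they are pushed. A snapshot only needs read access to the current contents.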

## Exit Codes

Zinit commands return the following exit codes:

| Code | Description |
|------|-------------|
| 0 | Success |
| 1 | Error (an error message is printed to stderr) |
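
Zinit signals failure through the exit code and prints the error message to stderr. A program driving the CLI can therefore branch on the status alone. The Rust sketch below wraps a command invocation in that way. `run_cli` is a hypothetical helper, and the demo runs `sh` in place of `zinit` so that it works without a running Zinit server.

```rust
use std::process::Command;

/// Hypothetical helper: runs `program` with `args` and maps the exit-code
/// convention onto a Result: exit code 0 -> Ok(stdout), otherwise -> Err(stderr).
fn run_cli(program: &str, args: &[&str]) -> Result<String, String> {
    let output = Command::new(program)
        .args(args)
        .output()
        .map_err(|e| format!("failed to launch {}: {}", program, e))?;
    if output.status.success() {
        Ok(String::from_utf8_lossy(&output.stdout).into_owned())
    } else {
        // On failure the error message is expected on stderr.
        Err(String::from_utf8_lossy(&output.stderr).trim().to_string())
    }
}

fn main() {
    // Against a real server this would be, e.g., run_cli("zinit", &["status", "redis"]).
    for script in ["echo ok", "echo 'unknown service' >&2; exit 1"] {
        match run_cli("sh", &["-c", script]) {
            Ok(out) => print!("success: {}", out),
            Err(err) => eprintln!("error: {}", err),
        }
    }
}
```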

docs/installation.md

# Installing Zinit

This guide provides detailed instructions for installing Zinit on various platforms.

## System Requirements

Zinit has minimal system requirements:

- A Linux-based operating system
- Root access (when running as the init system)

## Pre-built Binaries

If pre-built binaries are available for your system, you can install one directly:

```bash
# Download the binary (replace with the actual release URL)
wget https://github.com/threefoldtech/zinit/releases/download/vX.Y.Z/zinit-x86_64-unknown-linux-musl

# Make it executable
chmod +x zinit-x86_64-unknown-linux-musl

# Move it to a location in your PATH
sudo mv zinit-x86_64-unknown-linux-musl /usr/local/bin/zinit
```

## Building from Source

### Prerequisites

To build Zinit from source, you'll need:

- Rust toolchain (1.46.0 or later recommended)
- musl and musl-tools packages
- GNU Make

#### Install Rust

If you don't have Rust installed, use rustup:

```bash
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
source "$HOME/.cargo/env"
```

#### Install musl development tools

On Debian/Ubuntu:

```bash
sudo apt update
sudo apt install musl musl-tools
```

On Fedora:

```bash
sudo dnf install musl musl-devel
```

On Alpine Linux (musl is already the default libc):

```bash
apk add build-base
```

### Build Process

1. Clone the repository:

```bash
git clone https://github.com/threefoldtech/zinit.git
cd zinit
```

2. Build using make:

```bash
make
```

This creates a statically linked binary at `target/x86_64-unknown-linux-musl/release/zinit`.

3. Install the binary:

```bash
sudo cp target/x86_64-unknown-linux-musl/release/zinit /usr/local/bin/
```

### Development Build

For development or debugging:

```bash
make dev
```

## Docker Installation

### Using the Provided Dockerfile

Zinit includes a Docker image for testing:

```bash
# Build the Docker image
make docker

# Run the container
docker run -dt --device=/dev/kmsg:/dev/kmsg:rw zinit
```

> If you need access to the Zinit proxy, remember to publish a port with the `-p XXXX:YYYY` flag when running the container.

### Custom Docker Setup

To create your own Dockerfile with Zinit:

```dockerfile
FROM alpine:latest

# Install dependencies if needed
RUN apk add --no-cache bash curl

# Copy the zinit binary
COPY zinit /usr/local/bin/zinit
RUN chmod +x /usr/local/bin/zinit

# Create the configuration directory
RUN mkdir -p /etc/zinit

# Add your service configurations
COPY services/*.yaml /etc/zinit/

# Set zinit as the entrypoint
ENTRYPOINT ["/usr/local/bin/zinit", "init", "--container"]
```

## Using Zinit as the Init System

To use Zinit as the init system (PID 1) on a Linux system:

### On a Standard Linux System

1. Install Zinit as described above
2. Create your service configurations in `/etc/zinit/`
3. Configure your bootloader to use Zinit as init

For GRUB, add `init=/usr/local/bin/zinit` to the kernel command line:

```bash
# Edit the GRUB configuration
sudo nano /etc/default/grub

# Append the init parameter to GRUB_CMDLINE_LINUX, for example:
# GRUB_CMDLINE_LINUX="init=/usr/local/bin/zinit"

# Regenerate the GRUB configuration
# (update-grub is the Debian/Ubuntu wrapper around grub-mkconfig)
sudo update-grub
```

### In a Container Environment

For containers, set Zinit as the entrypoint:

```bash
docker run -dt --device=/dev/kmsg:/dev/kmsg:rw \
  --entrypoint /usr/local/bin/zinit \
  your-image init --container
```

## First-time Setup

After installation, create a basic configuration:

1. Create the configuration directory:

```bash
sudo mkdir -p /etc/zinit
```

2. Create a simple service configuration:

```bash
cat << EOF | sudo tee /etc/zinit/hello.yaml
exec: "echo 'Hello from Zinit!'"
oneshot: true
EOF
```

3. Test Zinit without running it as init:

```bash
# For testing only; this does not replace the system init
sudo zinit init
```

If everything is working correctly, you should see Zinit start and run your service.

docs/osx_cross_compile.md

# macOS Guide for Zinit

This guide covers both building Zinit natively on macOS and cross-compiling from macOS to Linux targets.

## Building Zinit Natively on macOS

Zinit can now be built and run directly on macOS. The code handles the platform-specific differences between Linux and macOS.

### Building for macOS

```bash
# Build a release version for macOS
make release-macos

# Install to ~/hero/bin (if it exists)
make install-macos
```

The native macOS build provides most of Zinit's functionality, with the following limitations:

- System reboot and shutdown operations are not supported (they exit the process instead)
- Some Linux-specific features are disabled

## Cross-Compilation from macOS to Linux

This section outlines how to set up your macOS environment for cross-compiling Rust projects to the `aarch64-unknown-linux-musl` target. This is useful for building binaries that run on ARM-based Linux systems (e.g., Raspberry Pi, AWS Graviton) using musl libc.

### Prerequisites

* Homebrew (https://brew.sh/) installed on your macOS system.
* Rust and Cargo installed (e.g., via `rustup`).

### Step 1: Install the `aarch64-linux-musl-gcc` Toolchain

The `aarch64-linux-musl-gcc` toolchain is required for linking when cross-compiling to `aarch64-unknown-linux-musl`. You can install it using Homebrew:

```bash
brew install messense/macos-cross-toolchains/aarch64-linux-musl-cross
```

### Step 2: Link `musl-gcc`

Some build scripts or tools look for `musl-gcc`. To ensure compatibility, create a symbolic link. The paths below assume Homebrew's default prefix on Apple Silicon (`/opt/homebrew`); on Intel Macs the prefix is `/usr/local`.

```bash
sudo ln -s /opt/homebrew/bin/aarch64-linux-musl-gcc /opt/homebrew/bin/musl-gcc
```

You might be prompted for your system password to complete this operation.

### Step 3: Add the Rust Target

Add the `aarch64-unknown-linux-musl` target to your Rust toolchain:

```bash
rustup target add aarch64-unknown-linux-musl
```

### Step 4: Build Your Project

You can now build your Rust project for the `aarch64-unknown-linux-musl` target using Cargo:

```bash
cargo build --release --target aarch64-unknown-linux-musl
```

Alternatively, if you are using the provided `Makefile`, you can use its dedicated target:

```bash
make release-aarch64-musl
```

This produces a release binary in `target/aarch64-unknown-linux-musl/release/`.

### Step 5: Copy the Binary to `~/hero/bin`

```bash
cp target/aarch64-unknown-linux-musl/release/zinit ~/hero/bin
```

docs/services.md

# Service Configuration Format

This document describes the structure and options of Zinit service configuration files.

## File Format

Zinit uses YAML files for service configuration. Each service has its own configuration file stored in the Zinit configuration directory (default: `/etc/zinit`).

### File Naming and Location

- **Location**: `/etc/zinit/` (default; can be changed with the `-c` flag)
  - On macOS: `~/hero/cfg/zinit`
- **Naming**: `<service-name>.yaml`

For example:

- `/etc/zinit/nginx.yaml`
- `/etc/zinit/redis.yaml`

## Configuration Schema

Service configuration files use the following schema:

```yaml
# Command to run (required)
exec: "command line to start service"

# Command to test if the service is running (optional)
test: "command line to test service"

# If true, the service is not restarted after it exits (optional, default: false)
oneshot: true|false

# Maximum time in seconds to wait for the service to stop during shutdown (optional, default: 10)
shutdown_timeout: 30

# Services that must be running before this one starts (optional)
after:
  - service1_name
  - service2_name

# Signals configuration (optional)
signal:
  stop: SIGKILL # signal sent on 'stop' action (default: SIGTERM)

# Log handling configuration (optional, default: ring)
log: null|ring|stdout

# Environment variables for the service (optional)
env:
  KEY1: "VALUE1"
  KEY2: "VALUE2"

# Working directory for the service (optional)
dir: "/path/to/working/directory"
```

## Configuration Options

### Required Fields

| Field | Description |
|-------|-------------|
| `exec` | Command line to execute when starting the service |

### Optional Fields

| Field | Type | Default | Description |
|-------|------|---------|-------------|
| `test` | String | - | Command used to determine whether the service is running |
| `oneshot` | Boolean | `false` | If true, the service is not restarted after it exits |
| `shutdown_timeout` | Integer | `10` | Seconds to wait for the service to stop during shutdown |
| `after` | String[] | `[]` | Services that must be running (or, for oneshot services, have completed successfully) first |
| `signal.stop` | String | `"sigterm"` | Signal to send when stopping the service |
| `log` | Enum | `ring` | How to handle service output (`null`, `ring`, `stdout`) |
| `env` | Object | `{}` | Environment variables to pass to the service |
| `dir` | String | `""` | Working directory for the service |
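
To make the defaults above concrete, here is a minimal Rust model of the schema with a constructor that takes only the required `exec` field and fills in every documented default. The field names mirror the YAML keys, but the `ServiceConfig` and `LogMode` types are hypothetical. Zinit's real configuration type and its YAML deserialization are not described in this document.

```rust
use std::collections::HashMap;

/// How service output is handled (the `log` field).
#[derive(Debug, Clone, PartialEq)]
enum LogMode {
    Null,
    Ring,
    Stdout,
}

/// Hypothetical in-memory model of one `<service-name>.yaml` file.
#[derive(Debug, Clone)]
struct ServiceConfig {
    exec: String,                 // required
    test: Option<String>,         // no test command by default
    oneshot: bool,                // default: false
    shutdown_timeout: u64,        // seconds, default: 10
    after: Vec<String>,           // default: []
    stop_signal: String,          // `signal.stop`, default: "sigterm"
    log: LogMode,                 // default: ring
    env: HashMap<String, String>, // default: {}
    dir: String,                  // default: "" (inherit zinit's working directory)
}

impl ServiceConfig {
    /// Builds a config with only `exec` set and every optional field at its default.
    fn new(exec: impl Into<String>) -> Self {
        ServiceConfig {
            exec: exec.into(),
            test: None,
            oneshot: false,
            shutdown_timeout: 10,
            after: Vec::new(),
            stop_signal: "sigterm".to_string(),
            log: LogMode::Ring,
            env: HashMap::new(),
            dir: String::new(),
        }
    }
}

fn main() {
    let cfg = ServiceConfig::new("/usr/bin/redis-server --port 6379");
    println!("{:#?}", cfg);
}
```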

## Field Details

### exec

The command to run when starting the service. This is the only required field in the configuration.

```yaml
exec: "/usr/bin/redis-server --port 6379"
```

Shell constructs such as `&&` can be used by invoking a shell explicitly:

```yaml
exec: "sh -c 'echo Starting service && /usr/local/bin/myservice'"
```

### test

A command that checks whether the service is running properly. Zinit runs this command periodically until it succeeds (exits with code 0), at which point the service is considered running.

```yaml
test: "redis-cli -p 6379 PING"
```

If no test command is provided, the service is considered running as soon as it has started.
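
The readiness check can be pictured as a loop that re-runs the test command until it exits with code 0. The Rust sketch below assumes a fixed poll interval and an attempt limit. `interval` and `max_attempts` are illustrative parameters, not documented Zinit settings, and running the test through `sh -c` is a simplification of this sketch.

```rust
use std::process::{Command, Stdio};
use std::thread::sleep;
use std::time::Duration;

/// Re-runs `test_cmd` until it exits with status 0.
/// Returns the attempt number that succeeded, or None if none did.
/// `interval` and `max_attempts` are hypothetical knobs for this sketch.
fn wait_until_ready(test_cmd: &str, interval: Duration, max_attempts: u32) -> Option<u32> {
    for attempt in 1..=max_attempts {
        let passed = Command::new("sh")
            .arg("-c")
            .arg(test_cmd)
            .stdout(Stdio::null())
            .stderr(Stdio::null())
            .status()
            .map(|status| status.success())
            .unwrap_or(false); // failing to launch counts as "not ready yet"
        if passed {
            return Some(attempt);
        }
        if attempt < max_attempts {
            sleep(interval); // no point sleeping after the final attempt
        }
    }
    None
}

fn main() {
    // With the example above this would be "redis-cli -p 6379 PING".
    match wait_until_ready("true", Duration::from_millis(100), 5) {
        Some(n) => println!("service considered running after {} attempt(s)", n),
        None => println!("test command never succeeded"),
    }
}
```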

### oneshot

When set to `true`, the service is not restarted automatically when it exits. This is useful for initialization tasks or commands that should run only once.

```yaml
oneshot: true
```

Services that depend on a oneshot service start only after the oneshot service has exited successfully.
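
The `after` rule therefore depends on the kind of dependency. A long-running dependency must be running, whereas a oneshot dependency must have exited successfully. The Rust sketch below captures that rule using a hypothetical `State` enum. Zinit's real state names are not documented here.

```rust
/// Hypothetical lifecycle states for this sketch.
#[derive(Debug, Clone, Copy, PartialEq)]
enum State {
    Starting,
    Running,
    Success, // exited with code 0
    Error,   // exited with a non-zero code
}

/// Decides whether a dependency no longer blocks services that list it in `after`.
fn dependency_satisfied(oneshot: bool, state: State) -> bool {
    if oneshot {
        // A oneshot dependency must have finished successfully.
        state == State::Success
    } else {
        // A long-running dependency must be up (past its `test`, if it has one).
        state == State::Running
    }
}

fn main() {
    println!("{}", dependency_satisfied(true, State::Running)); // false: still running
    println!("{}", dependency_satisfied(true, State::Success)); // true: finished OK
    println!("{}", dependency_satisfied(false, State::Running)); // true: service is up
}
```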

### shutdown_timeout

How long (in seconds) to wait for the service to stop during system shutdown before giving up:

```yaml
shutdown_timeout: 30 # Wait up to 30 seconds
```
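
One way to enforce such a timeout is to send the stop signal, poll for exit until the deadline, and then force-kill. The shutdown improvement plan describes a similar escalation. The Rust sketch below shows this pattern using only the standard library. `stop_with_timeout` is a hypothetical helper, not Zinit's shutdown code. Because `std` cannot send arbitrary signals, the stop signal is delivered through the shell's `kill -s`, and `Child::kill` (SIGKILL on Unix) serves as the fallback.

```rust
use std::process::{Child, Command};
use std::thread::sleep;
use std::time::{Duration, Instant};

#[derive(Debug, PartialEq)]
enum StopOutcome {
    Exited,      // stopped within the timeout
    ForceKilled, // timeout expired; SIGKILL was sent
}

/// Sends `stop_signal` (e.g. "TERM"), waits up to `timeout`, then force-kills.
fn stop_with_timeout(child: &mut Child, stop_signal: &str, timeout: Duration) -> std::io::Result<StopOutcome> {
    let pid = child.id().to_string();
    // Use the shell's `kill -s`; passing the values as positional parameters
    // ($0, $1) avoids interpolating them into the script text.
    Command::new("sh")
        .args(["-c", r#"kill -s "$0" "$1""#, stop_signal, pid.as_str()])
        .status()?;

    let deadline = Instant::now() + timeout;
    while Instant::now() < deadline {
        if child.try_wait()?.is_some() {
            return Ok(StopOutcome::Exited);
        }
        sleep(Duration::from_millis(50));
    }

    // Deadline passed: escalate. On Unix, Child::kill sends SIGKILL.
    child.kill()?;
    child.wait()?; // reap it so no zombie is left behind
    Ok(StopOutcome::ForceKilled)
}

fn main() {
    // `sleep` exits on SIGTERM, so this should report Exited.
    let mut child = Command::new("sleep").arg("30").spawn().expect("spawn sleep");
    let outcome = stop_with_timeout(&mut child, "TERM", Duration::from_secs(10)).expect("stop");
    println!("{:?}", outcome);
}
```

Only the direct child is signalled here. Grandchildren that a service spawns are not covered, which is exactly the gap the shutdown improvement plan sets out to close.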

### after

A list of service names that must be running (or, for oneshot services, must have completed successfully) before this service starts:

```yaml
after:
  - networking
  - database
```
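
Taken together, the `after` lists form a dependency graph, and any valid start order is a topological order of that graph. The Rust sketch below uses Kahn's algorithm and assumes the services are given as a map from each name to its `after` list. It computes an order up front purely to illustrate the constraint and is not taken from Zinit's scheduler. Dependencies that are not themselves configured are simply ignored.

```rust
use std::collections::{BTreeMap, VecDeque};

/// Returns a start order in which every service follows all entries of its
/// `after` list, or Err with the services stuck in (or behind) a cycle.
fn start_order(after: &BTreeMap<&str, Vec<&str>>) -> Result<Vec<String>, Vec<String>> {
    // indegree[s] = number of configured services that s still waits for
    let mut indegree: BTreeMap<&str, usize> = BTreeMap::new();
    // dependents[d] = services that list d in their `after`
    let mut dependents: BTreeMap<&str, Vec<&str>> = BTreeMap::new();
    for (&svc, deps) in after {
        indegree.entry(svc).or_insert(0);
        for &dep in deps {
            if after.contains_key(dep) {
                *indegree.entry(svc).or_insert(0) += 1;
                dependents.entry(dep).or_default().push(svc);
            }
        }
    }

    // Begin with services that wait for nothing (BTreeMap keeps this deterministic).
    let mut ready: VecDeque<&str> =
        indegree.iter().filter(|&(_, &n)| n == 0).map(|(&s, _)| s).collect();
    let mut order = Vec::new();
    while let Some(svc) = ready.pop_front() {
        order.push(svc.to_string());
        for &next in dependents.get(svc).map(|v| v.as_slice()).unwrap_or(&[]) {
            let n = indegree.get_mut(next).unwrap();
            *n -= 1;
            if *n == 0 {
                ready.push_back(next);
            }
        }
    }

    if order.len() == indegree.len() {
        Ok(order)
    } else {
        Err(indegree.into_iter().filter(|(_, n)| *n > 0).map(|(s, _)| s.to_string()).collect())
    }
}

fn main() {
    let mut services: BTreeMap<&str, Vec<&str>> = BTreeMap::new();
    services.insert("networking", vec![]);
    services.insert("database", vec!["networking"]);
    services.insert("web", vec!["networking", "database"]);
    println!("{:?}", start_order(&services)); // Ok(["networking", "database", "web"])
}
```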

### signal

Custom signals to use for operations. Currently, only the `stop` signal is configurable:

```yaml
signal:
  stop: SIGKILL # Use SIGKILL instead of the default SIGTERM
```

Valid signal names follow standard UNIX signal naming (SIGTERM, SIGKILL, SIGINT, etc.).
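
The default is written in lowercase (`"sigterm"`), while the examples use `SIGKILL`, so names are best matched case-insensitively. The shutdown improvement plan does this by calling `to_uppercase()` before parsing. The Rust sketch below follows the same approach. Its table is deliberately small (a few common signals with their standard Linux numbers) and is not Zinit's actual lookup.

```rust
/// Maps a signal name such as "SIGTERM" or "sigterm" to its Linux signal number.
/// Only a handful of common signals are listed in this sketch.
fn signal_number(name: &str) -> Option<i32> {
    // Normalize case so the documented default "sigterm" and "SIGTERM" both match.
    match name.trim().to_uppercase().as_str() {
        "SIGHUP" => Some(1),
        "SIGINT" => Some(2),
        "SIGQUIT" => Some(3),
        "SIGKILL" => Some(9),
        "SIGTERM" => Some(15),
        _ => None,
    }
}

fn main() {
    for name in ["sigterm", "SIGKILL", "SigInt", "SIGFOO"] {
        println!("{} -> {:?}", name, signal_number(name));
    }
}
```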

### log

How to handle the service's stdout/stderr output:

```yaml
log: stdout # Print output to zinit's stdout
```

Options:

- `null`: Ignore all service output (like redirecting to /dev/null)
- `ring`: Store logs in the kernel ring buffer with a service name prefix (default)
- `stdout`: Send service output to zinit's stdout

> **Note**: To use `ring` inside Docker, make sure to add the `kmsg` device:
> ```
> docker run -dt --device=/dev/kmsg:/dev/kmsg:rw zinit
> ```

### env

Additional environment variables for the service. These are added to the existing environment:

```yaml
env:
  PORT: "8080"
  DEBUG: "true"
  NODE_ENV: "production"
```

### dir

Working directory for the service process:

```yaml
dir: "/var/lib/myservice"
```

If not specified, the process inherits zinit's working directory.
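
At the process level, `env` extends the inherited environment and `dir` sets the child's working directory, with an empty `dir` meaning "inherit". The Rust sketch below shows that mapping using `std::process::Command`. The `build_command` helper is hypothetical. Running `exec` through `sh -c` is a simplification, since this document does not say how Zinit splits the command line.

```rust
use std::collections::HashMap;
use std::process::Command;

/// Hypothetical helper: prepares (but does not start) a service process.
fn build_command(exec: &str, env: &HashMap<String, String>, dir: &str) -> Command {
    let mut cmd = Command::new("sh");
    cmd.arg("-c").arg(exec);
    // `envs` adds or overrides variables on top of the inherited environment;
    // nothing is cleared, matching "added to the existing environment".
    cmd.envs(env);
    // An empty `dir` (the default) leaves the working directory inherited.
    if !dir.is_empty() {
        cmd.current_dir(dir);
    }
    cmd
}

fn main() {
    let mut env = HashMap::new();
    env.insert("PORT".to_string(), "8080".to_string());
    let out = build_command("echo \"port=$PORT cwd=$(pwd -P)\"", &env, "/")
        .output()
        .expect("failed to run sh");
    print!("{}", String::from_utf8_lossy(&out.stdout)); // port=8080 cwd=/
}
```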

## Example Configurations

### Web Server

```yaml
exec: "/usr/bin/nginx -g 'daemon off;'"
test: "curl -sf http://localhost > /dev/null"
after:
  - networking
log: stdout
```

### Database Initialization

```yaml
exec: "sh -c 'echo Creating database schema && /usr/bin/db-migrate'"
oneshot: true
dir: "/opt/myapp"
env:
  DB_HOST: "localhost"
  DB_USER: "admin"
```

### Application with Dependencies

```yaml
exec: "/usr/bin/myapp --config /etc/myapp.conf"
test: "curl -sf http://localhost:8080/health > /dev/null"
after:
  - database
  - cache
signal:
  stop: SIGINT # Use SIGINT for graceful shutdown
env:
  PORT: "8080"
shutdown_timeout: 20
```

docs/shutdown_improvement_plan.md

# Zinit Shutdown Functionality Improvement Plan

## Current Issues

1. **Incomplete Child Process Termination**: When services are stopped, their child processes may remain running.
2. **Lack of Verification**: Nothing verifies that all processes have actually terminated.
3. **Improper Graceful Shutdown**: Zinit doesn't wait for all processes to terminate before exiting.

## Solution Overview

We'll implement a robust shutdown mechanism that:

1. Uses our stats functionality to detect all child processes
2. Properly manages process groups
3. Verifies that all processes are terminated before Zinit exits

## Implementation Plan

```mermaid
flowchart TD
    A[Enhance stop method] --> B[Improve kill_process_tree]
    B --> C[Add process verification]
    C --> D[Implement graceful shutdown]

    A1[Use stats to detect child processes] --> A
    A2[Send signals to all processes] --> A
    A3[Implement cascading termination] --> A

    B1[Ensure proper process group handling] --> B
    B2[Add timeout and escalation logic] --> B

    C1[Create verification mechanism] --> C
    C2[Add polling for process existence] --> C

    D1[Wait for all processes to terminate] --> D
    D2[Add cleanup of resources] --> D
    D3[Implement clean exit] --> D
```

## Detailed Implementation Steps

### 1. Enhance the `stop` Method in `LifecycleManager`

```rust
pub async fn stop<S: AsRef<str>>(&self, name: S) -> Result<()> {
    // Get service information
    let table = self.services.read().await;
    let service = table
        .get(name.as_ref())
        .ok_or_else(|| ZInitError::unknown_service(name.as_ref()))?;

    let mut service = service.write().await;
    service.set_target(Target::Down);

    // Get the main process PID; 0 means the service is not running
    let pid = service.pid;
    if pid.as_raw() == 0 {
        return Ok(());
    }

    // Get the configured stop signal (`from_str` needs `use std::str::FromStr;`)
    let signal = signal::Signal::from_str(&service.service.signal.stop.to_uppercase())
        .map_err(|err| anyhow::anyhow!("unknown stop signal: {}", err))?;

    // Release the locks before potentially long-running operations
    drop(service);
    drop(table);

    // Snapshot all child processes using our stats functionality *before*
    // signalling: once the parent dies, its children are re-parented
    let children = self.get_child_process_stats(pid.as_raw()).await?;

    // First try to stop the process group
    let _ = self.pm.signal(pid, signal);

    // Wait a short time for processes to terminate gracefully
    sleep(std::time::Duration::from_millis(500)).await;

    // Check whether processes are still running and use SIGKILL if needed
    self.ensure_processes_terminated(pid.as_raw(), &children).await?;

    Ok(())
}
```
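
`get_child_process_stats` is not shown in this plan. On Linux, one way to collect a service's descendants without extra crates is to read the parent PID from every `/proc/<pid>/stat` and then walk the resulting tree. The Rust functions below are a hypothetical stand-in that returns PIDs only, without memory or CPU statistics, and they are Linux-specific. As in step 1, they should run before any signal is sent, because the children of a dead parent are re-parented and no longer appear under it.

```rust
use std::collections::{HashMap, HashSet};
use std::fs;

/// Reads the parent PID from /proc/<pid>/stat. The command name (field 2) is
/// wrapped in parentheses and may contain spaces, so parse after the last ')'.
fn read_ppid(pid: i32) -> Option<i32> {
    let stat = fs::read_to_string(format!("/proc/{}/stat", pid)).ok()?;
    let rest = &stat[stat.rfind(')')? + 1..];
    // rest = " <state> <ppid> ..."
    rest.split_whitespace().nth(1)?.parse().ok()
}

/// Returns all descendants (children, grandchildren, ...) of `root`.
/// Linux-only, PIDs only; processes that exit mid-scan are simply skipped.
fn descendant_pids(root: i32) -> Vec<i32> {
    // Build a parent -> children map from one pass over /proc.
    let mut children: HashMap<i32, Vec<i32>> = HashMap::new();
    if let Ok(entries) = fs::read_dir("/proc") {
        for entry in entries.flatten() {
            if let Ok(pid) = entry.file_name().to_string_lossy().parse::<i32>() {
                if let Some(ppid) = read_ppid(pid) {
                    children.entry(ppid).or_default().push(pid);
                }
            }
        }
    }
    // Depth-first walk from the root; `seen` guards against revisiting a PID.
    let mut result = Vec::new();
    let mut seen = HashSet::new();
    let mut stack = vec![root];
    while let Some(p) = stack.pop() {
        for &c in children.get(&p).into_iter().flatten() {
            if seen.insert(c) {
                result.push(c);
                stack.push(c);
            }
        }
    }
    result
}

fn main() {
    let mut child = std::process::Command::new("sleep").arg("2").spawn().expect("spawn");
    let me = std::process::id() as i32;
    println!("descendants of {}: {:?}", me, descendant_pids(me));
    child.kill().ok();
    child.wait().ok();
}
```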

### 2. Add a New `ensure_processes_terminated` Method

```rust
async fn ensure_processes_terminated(&self, parent_pid: i32, children: &[ProcessStats]) -> Result<()> {
    // Check whether the parent is still running
    let parent_running = self.is_process_running(parent_pid).await?;

    // If the parent survived the stop signal, send SIGKILL
    if parent_running {
        debug!("Process {} still running after stop signal, sending SIGKILL", parent_pid);
        let _ = self.pm.signal(Pid::from_raw(parent_pid), signal::Signal::SIGKILL);
    }

    // Check and kill any remaining child processes.
    // Note: these PIDs come from an earlier snapshot; a PID could in principle
    // have been reused by an unrelated process in the meantime.
    for child in children {
        if self.is_process_running(child.pid).await? {
            debug!("Child process {} still running, sending SIGKILL", child.pid);
            let _ = signal::kill(Pid::from_raw(child.pid), signal::Signal::SIGKILL);
        }
    }

    // Verify that all processes are gone (up to 5 checks, 100 ms apart)
    let mut retries = 5;
    while retries > 0 {
        let mut all_terminated = true;

        // Check the parent
        if self.is_process_running(parent_pid).await? {
            all_terminated = false;
        }

        // Check the children
        for child in children {
            if self.is_process_running(child.pid).await? {
                all_terminated = false;
                break;
            }
        }

        if all_terminated {
            return Ok(());
        }

        // Wait before retrying
        sleep(std::time::Duration::from_millis(100)).await;
        retries -= 1;
    }

    // If we get here, some processes might still be running
    warn!("Some processes may still be running after shutdown attempts");
    Ok(())
}
```

### 3. Add a Helper Method to Check if a Process is Running

```rust
async fn is_process_running(&self, pid: i32) -> Result<bool> {
    // Use sysinfo to check whether the process exists.
    // Note: a zombie (exited but not yet reaped) still counts as existing here.
    let mut system = System::new();
    let sys_pid = sysinfo::Pid::from(pid as usize);
    system.refresh_process(sys_pid);

    Ok(system.process(sys_pid).is_some())
}
```
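
If pulling in `sysinfo` for this check is undesirable, a Linux-only alternative is to read `/proc/<pid>/stat` directly. A sysinfo lookup still finds zombie processes. The sketch below treats a zombie (state `Z`) as not running, which matters when the caller is also responsible for reaping. That choice, and the free-function form, are assumptions of this sketch.

```rust
use std::fs;

/// Linux-only: returns true if `pid` exists and is not a zombie or dead task.
/// The state is the first field after the parenthesised command name in /proc/<pid>/stat.
fn is_process_running(pid: i32) -> bool {
    if pid <= 0 {
        return false; // for kill(2), 0 and negative values address process groups
    }
    let Ok(stat) = fs::read_to_string(format!("/proc/{}/stat", pid)) else {
        return false; // no such process (or not visible to us)
    };
    match stat.rfind(')') {
        Some(i) => {
            let state = stat[i + 1..].split_whitespace().next();
            // 'Z' = zombie, 'X' = dead; anything else counts as running.
            !matches!(state, Some("Z") | Some("X") | None)
        }
        None => false,
    }
}

fn main() {
    let me = std::process::id() as i32;
    println!("self running: {}", is_process_running(me)); // true

    let mut child = std::process::Command::new("true").spawn().expect("spawn");
    let pid = child.id() as i32;
    std::thread::sleep(std::time::Duration::from_millis(300)); // let it exit, unreaped
    println!("exited, not yet reaped: {}", is_process_running(pid)); // false (zombie)
    child.wait().expect("wait");
}
```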

### 4. Improve the `kill_process_tree` Method

```rust
#[cfg(target_os = "linux")]
async fn kill_process_tree(
    &self,
    mut dag: ProcessDAG,
    mut state_channels: HashMap<String, Watcher<State>>,
    mut shutdown_timeouts: HashMap<String, u64>,
) -> Result<()> {
    let (tx, mut rx) = mpsc::unbounded_channel();
    tx.send(DUMMY_ROOT.into())?;

    let mut count = dag.count;
    while let Some(name) = rx.recv().await {
        debug!("{} has been killed (or was inactive) adding its children", name);

        for child in dag.adj.get(&name).unwrap_or(&Vec::new()) {
            let child_indegree: &mut u32 = dag.indegree.entry(child.clone()).or_default();
            *child_indegree -= 1;

            debug!("decrementing child {} indegree to {}", child, child_indegree);

            if *child_indegree == 0 {
                let Some(watcher) = state_channels.remove(child) else {
                    // Not an active service: treat it as already stopped
                    tx.send(child.to_string())?;
                    continue;
                };

                let shutdown_timeout =
                    shutdown_timeouts.remove(child).unwrap_or(config::DEFAULT_SHUTDOWN_TIMEOUT);
                let lifecycle = self.clone_lifecycle();

                // Spawn a task to kill the service and wait for it to terminate
                let kill_task = tokio::spawn(Self::kill_wait_enhanced(
                    lifecycle,
                    child.to_string(),
                    tx.clone(),
                    watcher,
                    shutdown_timeout,
                ));

                // Add a timeout (2 s beyond the service's own) so we never wait forever.
                // Note: awaiting here stops sibling services one at a time, not concurrently.
                let _ = tokio::time::timeout(
                    std::time::Duration::from_secs(shutdown_timeout + 2),
                    kill_task,
                )
                .await;
            }
        }

        count -= 1;
        if count == 0 {
            break;
        }
    }

    // Final verification that all processes are gone
    self.verify_all_processes_terminated().await?;

    Ok(())
}
```
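
The indegree bookkeeping above appears to amount to a reverse topological traversal. A service is stopped only after every service that depends on it (that is, lists it in `after`) has been stopped. The standard-library-only Rust sketch below groups services into such shutdown "waves" based on their `after` lists. The map layout and function name are illustrative. The sketch ignores timeouts and the `DUMMY_ROOT` node that the plan uses as its starting point.

```rust
use std::collections::BTreeMap;

/// Groups services into shutdown waves. Wave 0 holds services nothing depends on;
/// each later wave holds services whose dependents have all been stopped already.
/// `after` maps each service to the services it depends on.
fn shutdown_waves(after: &BTreeMap<&str, Vec<&str>>) -> Vec<Vec<String>> {
    // remaining[s] = number of configured services that still depend on s
    let mut remaining: BTreeMap<&str, usize> = after.keys().map(|&s| (s, 0)).collect();
    for deps in after.values() {
        for &dep in deps {
            if let Some(n) = remaining.get_mut(dep) {
                *n += 1;
            }
        }
    }

    let mut waves = Vec::new();
    let mut current: Vec<&str> =
        remaining.iter().filter(|&(_, &n)| n == 0).map(|(&s, _)| s).collect();
    while !current.is_empty() {
        let mut next = Vec::new();
        for &svc in &current {
            // Stopping `svc` releases one dependent-count on each of its dependencies.
            for &dep in &after[svc] {
                if let Some(n) = remaining.get_mut(dep) {
                    *n -= 1;
                    if *n == 0 {
                        next.push(dep);
                    }
                }
            }
        }
        waves.push(current.iter().map(|s| s.to_string()).collect());
        next.sort(); // deterministic order within a wave
        current = next;
    }
    // Services caught in a dependency cycle never reach zero and are omitted.
    waves
}

fn main() {
    let mut after: BTreeMap<&str, Vec<&str>> = BTreeMap::new();
    after.insert("networking", vec![]);
    after.insert("database", vec!["networking"]);
    after.insert("cache", vec!["networking"]);
    after.insert("myapp", vec!["database", "cache"]);
    for (i, wave) in shutdown_waves(&after).iter().enumerate() {
        println!("wave {}: {:?}", i, wave);
    }
    // wave 0: ["myapp"], wave 1: ["cache", "database"], wave 2: ["networking"]
}
```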
|
||||
|
||||
### 5. Add an Enhanced `kill_wait` Method
|
||||
|
||||
```rust
|
||||
#[cfg(target_os = "linux")]
|
||||
async fn kill_wait_enhanced(
|
||||
self,
|
||||
name: String,
|
||||
ch: mpsc::UnboundedSender<String>,
|
||||
mut rx: Watcher<State>,
|
||||
shutdown_timeout: u64,
|
||||
) {
|
||||
debug!("kill_wait {}", name);
|
||||
|
||||
// Try to stop the service gracefully
|
||||
let stop_result = self.stop(name.clone()).await;
|
||||
|
||||
// Wait for the service to become inactive or timeout
|
||||
let fut = timeout(
|
||||
std::time::Duration::from_secs(shutdown_timeout),
|
||||
async move {
|
||||
while let Some(state) = rx.next().await {
|
||||
if !state.is_active() {
|
||||
return;
|
||||
}
|
||||
}
|
||||
},
|
||||
);
|
||||
|
||||
match stop_result {
|
||||
Ok(_) => {
|
||||
let _ = fut.await;
|
||||
}
|
||||
Err(e) => error!("couldn't stop service {}: {}", name.clone(), e),
|
||||
}
|
||||
|
||||
// Verify the service is actually stopped
|
||||
if let Ok(status) = self.status(&name).await {
|
||||
if status.pid != 0 {
|
||||
// Service is still running, try to kill it
|
||||
let _ = self.kill(&name, signal::Signal::SIGKILL).await;
|
||||
}
|
||||
}
|
||||
|
||||
debug!("sending to the death channel {}", name.clone());
|
||||
if let Err(e) = ch.send(name.clone()) {
|
||||
error!(
|
||||
"error: couldn't send the service {} to the shutdown loop: {}",
|
||||
name, e
|
||||
);
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
### 6. Add a Method to Verify All Processes are Terminated
|
||||
|
||||
```rust
|
||||
async fn verify_all_processes_terminated(&self) -> Result<()> {
|
||||
// Get all services
|
||||
let table = self.services.read().await;
|
||||
|
||||
// Check each service
|
||||
for (name, service) in table.iter() {
|
||||
let service = service.read().await;
|
||||
let pid = service.pid.as_raw();
|
||||
|
||||
// Skip services with no PID
|
||||
if pid == 0 {
|
||||
continue;
|
||||
}
|
||||
|
||||
// Check if the main process is still running
|
||||
if self.is_process_running(pid).await? {
|
||||
warn!("Service {} (PID {}) is still running after shutdown", name, pid);
|
||||
|
||||
// Try to kill it with SIGKILL
|
||||
let _ = signal::kill(Pid::from_raw(pid), signal::Signal::SIGKILL);
|
||||
}
|
||||
|
||||
// Check for child processes
|
||||
if let Ok(children) = self.get_child_process_stats(pid).await {
|
||||
for child in children {
|
||||
if self.is_process_running(child.pid).await? {
|
||||
warn!("Child process {} of service {} is still running after shutdown",
|
||||
child.pid, name);
|
||||
|
||||
// Try to kill it with SIGKILL
|
||||
let _ = signal::kill(Pid::from_raw(child.pid), signal::Signal::SIGKILL);
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
Ok(())
|
||||
}
|
||||
```
|
||||
|
||||
### 7. Update the `shutdown` and `reboot` Methods

```rust
pub async fn shutdown(&self) -> Result<()> {
    info!("shutting down");

    // Set the shutdown flag
    *self.shutdown.write().await = true;

    #[cfg(target_os = "linux")]
    {
        // Power off using our enhanced method
        let result = self.power(RebootMode::RB_POWER_OFF).await;

        // Final verification before exit. Note: if `power` finishes by calling
        // `reboot(2)` (or by exiting in container mode), this line is reached
        // only when `power` returns an error; to run the check on every
        // shutdown, call it inside `power` right before that final step.
        self.verify_all_processes_terminated().await?;

        return result;
    }

    #[cfg(not(target_os = "linux"))]
    {
        // Stop all services
        let services = self.list().await?;
        for service in services {
            let _ = self.stop(&service).await;
        }

        // Verify all processes are terminated
        self.verify_all_processes_terminated().await?;

        if self.container {
            std::process::exit(0);
        } else {
            info!("System shutdown not supported on this platform");
            std::process::exit(0);
        }
    }
}
```

## Testing Plan

1. **Basic Service Termination**: Test that a simple service is properly terminated
2. **Child Process Termination**: Test that a service with child processes has all of its processes terminated
3. **Graceful Shutdown**: Test that Zinit exits cleanly after all services are stopped
4. **Edge Cases**:
   - Test with services that spawn many child processes
   - Test with services that spawn child processes that change their process group
   - Test with services that ignore SIGTERM

## Implementation Timeline

1. **Phase 1**: Enhance the `stop` method and add the helper methods (1-2 hours)
2. **Phase 2**: Improve the `kill_process_tree` and `kill_wait` methods (1-2 hours)
3. **Phase 3**: Update the `shutdown` and `reboot` methods (1 hour)
4. **Phase 4**: Testing and debugging (2-3 hours)

125
docs/stats.md
Normal file
@@ -0,0 +1,125 @@
# Service Stats Functionality

This document describes the stats functionality in Zinit, which reports memory and CPU usage for services and their child processes.

## Overview

The stats functionality lets you monitor the resource usage of services managed by Zinit. It reports:

- Memory usage (in bytes)
- CPU usage (as a percentage)
- Child processes and their resource usage

This makes it easy to spot services that consume excessive memory or CPU.

## Command Line Usage

To get stats for a service from the command line:

```bash
zinit stats <service-name>
```

Example:
```bash
zinit stats nginx
```

The command prints the stats as YAML:

```yaml
name: nginx
pid: 1234
memory_usage: 10485760  # Memory usage in bytes (10 MiB)
cpu_usage: 2.5          # CPU usage as a percentage
children:               # Stats for child processes
  - pid: 1235
    memory_usage: 5242880
    cpu_usage: 1.2
  - pid: 1236
    memory_usage: 4194304
    cpu_usage: 0.8
```

## JSON-RPC API

The stats functionality is also available through the JSON-RPC API:

### Method: `service_stats`

Get memory and CPU usage statistics for a service.

**Parameters:**
- `name` (string, required): The name of the service to get stats for

**Returns:**
- Object containing stats information:
  - `name` (string): Service name
  - `pid` (integer): Process ID of the service
  - `memory_usage` (integer): Memory usage in bytes
  - `cpu_usage` (number): CPU usage as a percentage (0-100)
  - `children` (array): Stats for child processes
    - Each child has:
      - `pid` (integer): Process ID of the child process
      - `memory_usage` (integer): Memory usage in bytes
      - `cpu_usage` (number): CPU usage as a percentage (0-100)

**Example Request:**
```json
{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "service_stats",
  "params": {
    "name": "nginx"
  }
}
```

**Example Response:**
```json
{
  "jsonrpc": "2.0",
  "id": 1,
  "result": {
    "name": "nginx",
    "pid": 1234,
    "memory_usage": 10485760,
    "cpu_usage": 2.5,
    "children": [
      {
        "pid": 1235,
        "memory_usage": 5242880,
        "cpu_usage": 1.2
      },
      {
        "pid": 1236,
        "memory_usage": 4194304,
        "cpu_usage": 0.8
      }
    ]
  }
}
```

**Possible Errors:**
- `-32000`: Service not found
- `-32003`: Service is down

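A client that wants one figure per service can add the service's own numbers to those of its children. The sketch below mirrors the documented result object in hypothetical `ServiceStats` and `ChildStats` structs (deserializing the JSON, for example with serde, is left out) and assumes, as with per-process RSS, that the service's `memory_usage` does not already include its children.

```rust
/// Hypothetical client-side mirror of one entry in `children`.
#[allow(dead_code)] // `pid` is kept to match the documented fields
struct ChildStats {
    pid: i32,
    memory_usage: u64, // bytes
    cpu_usage: f64,    // percent
}

/// Hypothetical client-side mirror of the `service_stats` result object.
struct ServiceStats {
    name: String,
    pid: i32,
    memory_usage: u64,
    cpu_usage: f64,
    children: Vec<ChildStats>,
}

impl ServiceStats {
    /// Service RSS plus the RSS of every listed child. Pages shared between
    /// processes are counted once per process, so this can overestimate.
    fn total_memory(&self) -> u64 {
        self.memory_usage + self.children.iter().map(|c| c.memory_usage).sum::<u64>()
    }

    /// Service CPU percentage plus that of every listed child.
    fn total_cpu(&self) -> f64 {
        self.cpu_usage + self.children.iter().map(|c| c.cpu_usage).sum::<f64>()
    }
}

fn main() {
    // Values taken from the example response
    let stats = ServiceStats {
        name: "nginx".to_string(),
        pid: 1234,
        memory_usage: 10_485_760,
        cpu_usage: 2.5,
        children: vec![
            ChildStats { pid: 1235, memory_usage: 5_242_880, cpu_usage: 1.2 },
            ChildStats { pid: 1236, memory_usage: 4_194_304, cpu_usage: 0.8 },
        ],
    };
    println!(
        "{} (PID {}): {} bytes, {:.1}% CPU in total",
        stats.name,
        stats.pid,
        stats.total_memory(),
        stats.total_cpu()
    );
}
```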
## Implementation Details

The stats functionality works by:

1. Reading process information from the `/proc/<pid>/` directories on Linux systems
2. Reading memory usage (resident set size) from the `VmRSS` field of `/proc/<pid>/status`
3. Calculating CPU usage by sampling `/proc/<pid>/stat` over a short interval
4. Identifying child processes by checking the `PPid` field in `/proc/<pid>/status`

On non-Linux systems, the functionality returns placeholder values, because the `/proc` layout it reads is specific to Linux.

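The four steps above can be sketched with the Rust standard library alone. The parsers take file contents as strings so they can be exercised without a live `/proc`; the function names, the 100 ticks-per-second rate (the usual value of `getconf CLK_TCK`), and the one-second sampling interval are assumptions for illustration, not Zinit's actual implementation.

```rust
/// Extract `VmRSS` from the text of `/proc/<pid>/status` and convert it to
/// bytes. The kernel reports this field in kB (units of 1024 bytes).
fn parse_vm_rss_bytes(status: &str) -> Option<u64> {
    status
        .lines()
        .find_map(|line| line.strip_prefix("VmRSS:"))
        .and_then(|rest| rest.split_whitespace().next())
        .and_then(|kb| kb.parse::<u64>().ok())
        .map(|kb| kb * 1024)
}

/// Extract the parent PID from the text of `/proc/<pid>/status`.
fn parse_ppid(status: &str) -> Option<i32> {
    status
        .lines()
        .find_map(|line| line.strip_prefix("PPid:"))
        .and_then(|rest| rest.trim().parse().ok())
}

/// Return `utime + stime` (in clock ticks) from the text of `/proc/<pid>/stat`.
/// The command name in field 2 may contain spaces or `)`, so the remaining
/// fields are split after the *last* `)`.
fn parse_cpu_ticks(stat: &str) -> Option<u64> {
    let after_comm = &stat[stat.rfind(')')? + 1..];
    let fields: Vec<&str> = after_comm.split_whitespace().collect();
    // fields[0] is field 3 (state), so field N is fields[N - 3]:
    // utime (field 14) -> fields[11], stime (field 15) -> fields[12]
    let utime: u64 = fields.get(11)?.parse().ok()?;
    let stime: u64 = fields.get(12)?.parse().ok()?;
    Some(utime + stime)
}

/// CPU percentage over a sampling interval, from two tick readings.
/// The result is relative to one CPU.
fn cpu_percent(ticks_before: u64, ticks_after: u64, ticks_per_sec: u64, interval_secs: f64) -> f64 {
    let busy_secs = ticks_after.saturating_sub(ticks_before) as f64 / ticks_per_sec as f64;
    100.0 * busy_secs / interval_secs
}

fn main() {
    // Example file contents (trimmed); on a live system these would be read
    // from /proc/<pid>/status and /proc/<pid>/stat one interval apart
    let status = "Name:\tnginx\nPPid:\t1\nVmRSS:\t   10240 kB\n";
    let stat_t0 = "1234 (nginx) S 1 1234 1234 0 -1 4194560 100 0 0 0 40 10 0 0 20 0 1 0 500";
    let stat_t1 = "1234 (nginx) S 1 1234 1234 0 -1 4194560 100 0 0 0 42 11 0 0 20 0 1 0 500";

    let rss = parse_vm_rss_bytes(status).unwrap_or(0);
    let ppid = parse_ppid(status).unwrap_or(0);
    let t0 = parse_cpu_ticks(stat_t0).unwrap_or(0);
    let t1 = parse_cpu_ticks(stat_t1).unwrap_or(0);
    // Two samples taken one second apart, assuming USER_HZ = 100
    let cpu = cpu_percent(t0, t1, 100, 1.0);
    println!("ppid={ppid} memory_usage={rss} cpu_usage={cpu:.1}");
    // prints: ppid=1 memory_usage=10485760 cpu_usage=3.0
}
```

Because `utime` and `stime` accumulate across all cores, this formula can exceed 100 for a multi-threaded process; dividing by the number of CPUs is one way to keep the result within the 0-100 range documented above.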
## Notes

- Memory usage is reported in bytes
- CPU usage is reported as a percentage (0-100)
- The service must be running to get stats; otherwise an error is returned (`-32003` over JSON-RPC)
- Child processes are identified by their parent PID matching the service's PID