build: remove testing.sh in favor of runit.sh; add claude.md reference

Replace inline boot testing with standalone runit.sh runner for clarity:
- Remove scripts/lib/testing.sh source and boot_tests stage from build.sh
- Remove --skip-tests option from build.sh and rebuild-after-zinit.sh
- Update all docs to reference runit.sh for QEMU/cloud-hypervisor testing
- Add comprehensive claude.md as AI assistant entry point with guidelines

Testing is now fully decoupled from build pipeline; use ./runit.sh for
QEMU/cloud-hypervisor validation after builds complete.
2025-11-04 13:47:24 +01:00
parent 334821dacf
commit 721e26a855
8 changed files with 525 additions and 68 deletions

claude.md (new file, 474 lines)

@@ -0,0 +1,474 @@
# Claude Code Reference for Zero-OS Builder
This document provides essential context for Claude Code (or any AI assistant) working with this Zero-OS Alpine Initramfs Builder repository.
## Project Overview
**What is this?**
A build system for creating custom Alpine Linux 3.22 x86_64 initramfs images with zinit process management, designed for Zero-OS deployment on the ThreeFold Grid.
**Key Features:**
- Container-based reproducible builds (rootless podman/docker)
- Incremental staged build pipeline with completion markers
- zinit process manager (complete OpenRC replacement)
- RFS (Remote File System) for lazy-loading modules/firmware from S3
- Rust components built with musl static linking
- Aggressive size optimization (strip + UPX)
- Embedded initramfs in kernel (single vmlinuz.efi output)
## Repository Structure
```
zosbuilder/
├── config/ # All configuration files
│ ├── build.conf # Build settings (versions, paths, flags)
│ ├── packages.list # Alpine packages to install
│ ├── sources.conf # ThreeFold components to build
│ ├── modules.conf # 2-stage kernel module loading
│ ├── firmware.conf # Firmware to include in initramfs
│ ├── kernel.config # Linux kernel configuration
│ ├── init # /init script for initramfs
│ └── zinit/ # zinit service definitions (YAML)
├── scripts/
│ ├── build.sh # Main orchestrator (DO NOT EDIT LIGHTLY)
│ ├── clean.sh # Clean all artifacts
│ ├── dev-container.sh # Persistent dev container manager
│ ├── rebuild-after-zinit.sh # Quick rebuild helper
│ ├── lib/ # Modular build libraries
│ │ ├── common.sh # Logging, path normalization, utilities
│ │ ├── stages.sh # Incremental stage tracking
│ │ ├── docker.sh # Container lifecycle
│ │ ├── alpine.sh # Alpine extraction, packages, cleanup
│ │ ├── components.sh # Build Rust components from sources.conf
│ │ ├── initramfs.sh # Assembly, optimization, CPIO creation
│ │ └── kernel.sh # Kernel download, config, build, embed
│ └── rfs/ # RFS flist generation scripts
│ ├── common.sh # S3 config, version computation
│ ├── pack-modules.sh # Create modules flist
│ ├── pack-firmware.sh # Create firmware flist
│ └── verify-flist.sh # Inspect/test flists
├── docs/ # Detailed documentation
│ ├── NOTES.md # Operational knowledge & troubleshooting
│ ├── PROMPT.md # Agent guidance (strict debugger mode)
│ ├── TODO.md # Persistent checklist with code refs
│ ├── AGENTS.md # Quick reference for agents
│ ├── rfs-flists.md # RFS design and runtime flow
│ ├── review-rfs-integration.md # Integration points
│ └── depmod-behavior.md # Module dependency details
├── runit.sh # Test runner (QEMU/cloud-hypervisor)
├── initramfs/ # Generated initramfs tree
├── components/ # Generated component sources
├── kernel/ # Generated kernel source
├── dist/ # Final outputs
│ ├── vmlinuz.efi # Kernel with embedded initramfs
│ └── initramfs.cpio.xz # Standalone initramfs archive
└── .build-stages/ # Incremental build markers (*.done files)
```
## Core Concepts
### 1. Incremental Staged Builds
**How it works:**
- Each stage creates a `.build-stages/<stage_name>.done` marker on success
- Subsequent builds skip completed stages unless forced
- Use `./scripts/build.sh --show-stages` to see status
- Use `./scripts/build.sh --rebuild-from=<stage>` to restart from a specific stage
- Manually remove `.done` files to re-run specific stages
**Build stages (in order):**
```
alpine_extract → alpine_configure → alpine_packages → alpine_firmware
→ components_build → components_verify → kernel_modules
→ init_script → components_copy → zinit_setup
→ modules_setup → modules_copy → cleanup → rfs_flists
→ validation → initramfs_create → initramfs_test → kernel_build
```
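The marker mechanics can be sketched in a few lines of bash. This is a minimal illustration under stated assumptions, not the contents of `scripts/lib/stages.sh`: the helpers `stage_done`, `mark_stage_done`, `invalidate_from`, and `run_stage`, and the `STAGES_DIR` and `STAGE_ORDER` variables, are hypothetical. It assumes that rebuilding "from" a stage means discarding that stage's marker and every later one.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of .done-marker stage tracking (not the real stages.sh).
set -euo pipefail

STAGES_DIR="${STAGES_DIR:-.build-stages}"

# Stage order as listed above; invalidation walks this array.
STAGE_ORDER=(
  alpine_extract alpine_configure alpine_packages alpine_firmware
  components_build components_verify kernel_modules
  init_script components_copy zinit_setup
  modules_setup modules_copy cleanup rfs_flists
  validation initramfs_create initramfs_test kernel_build
)

# Succeeds if the stage has a completion marker.
stage_done() {
  [[ -f "${STAGES_DIR}/$1.done" ]]
}

# Record successful completion of a stage.
mark_stage_done() {
  mkdir -p "${STAGES_DIR}"
  touch "${STAGES_DIR}/$1.done"
}

# Remove the marker of the given stage and of every stage after it.
invalidate_from() {
  local target="$1" found=0 stage
  for stage in "${STAGE_ORDER[@]}"; do
    if [[ "${stage}" == "${target}" ]]; then
      found=1
    fi
    if (( found )); then
      rm -f "${STAGES_DIR}/${stage}.done"
    fi
  done
  if (( ! found )); then
    echo "unknown stage: ${target}" >&2
    return 1
  fi
}

# Run a stage command unless its marker exists; mark it only on success.
run_stage() {
  local name="$1"; shift
  if stage_done "${name}"; then
    echo "skip ${name}"
    return 0
  fi
  "$@" || return
  mark_stage_done "${name}"
}
```

For example, `invalidate_from zinit_setup` removes the markers from `zinit_setup` onward, which is roughly the effect `--rebuild-from=zinit_setup` is described as having; the real option may be implemented differently.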
**Key insight:** The build ALWAYS runs inside a container. Host invocations auto-spawn containers.
### 2. Container-First Architecture
**Why containers?**
- Reproducible toolchain (Alpine 3.22 base with exact dependencies)
- Rootless execution (no privileged access needed)
- Isolation from host environment
- GitHub Actions compatible
**Container modes:**
- **Transient:** `./scripts/build.sh` spawns, builds, exits
- **Persistent:** `./scripts/dev-container.sh start/shell/build`
**Important:** Directory paths are normalized to absolute PROJECT_ROOT to avoid CWD issues when stages change directories (especially kernel builds).
### 3. Component Build System
**sources.conf format:**
```
TYPE:NAME:URL:VERSION:BUILD_FUNCTION[:EXTRA]
```
**Example:**
```
git:zinit:https://github.com/threefoldtech/zinit:master:build_zinit
git:rfs:https://github.com/threefoldtech/rfs:development:build_rfs
git:mycelium:https://github.com/threefoldtech/mycelium:0.6.1:build_mycelium
release:corex:https://github.com/threefoldtech/corex/releases/download/2.1.4/corex-2.1.4-amd64-linux-static:2.1.4:install_corex:rename=corex
```
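One subtlety of this format: the URL field itself contains a colon (`https://`), so splitting on `:` alone yields too many fields. The hypothetical `parse_source_line` below sidesteps that with a bash regex anchored on the URL scheme. It is a sketch, not the parser in `scripts/lib/components.sh`, and it assumes URLs contain no further colons (no explicit ports) and that only the optional EXTRA field may follow BUILD_FUNCTION.

```bash
#!/usr/bin/env bash
# Hypothetical parser for one sources.conf entry (not the real components.sh).
# A naive IFS=':' split breaks on the "://" inside the URL, so the regex
# anchors on the scheme instead. Assumes the URL has no other colons.
set -euo pipefail

parse_source_line() {
  local line="$1"
  local re='^([^:]+):([^:]+):([a-z][a-z0-9+.-]*://[^:]+):([^:]+):([^:]+)(:(.*))?$'
  if [[ ! "${line}" =~ ${re} ]]; then
    echo "malformed sources.conf line: ${line}" >&2
    return 1
  fi
  SRC_TYPE="${BASH_REMATCH[1]}"
  SRC_NAME="${BASH_REMATCH[2]}"
  SRC_URL="${BASH_REMATCH[3]}"
  SRC_VERSION="${BASH_REMATCH[4]}"
  SRC_BUILD_FN="${BASH_REMATCH[5]}"
  SRC_EXTRA="${BASH_REMATCH[7]:-}"   # empty when no EXTRA field is present
}
```

A caller would typically loop with `while IFS= read -r line`, skipping blank lines and any comment lines, and call `parse_source_line` on each entry.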
**Build functions** are defined in `scripts/lib/components.sh` and handle:
- Rust builds with `x86_64-unknown-linux-musl` target
- Static linking via `RUSTFLAGS="-C target-feature=+crt-static"`
- Special cases (e.g., mycelium builds in `myceliumd/` subdirectory)
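A build function along these lines might look like the following. `build_rust_static` is a hypothetical name, and the real functions in `scripts/lib/components.sh` likely do more (fetching sources, copying and stripping binaries). The optional subdirectory argument models the mycelium case, and the `cd` runs in a subshell so the caller's working directory is left unchanged.

```bash
#!/usr/bin/env bash
# Hypothetical Rust build helper (not the real components.sh function).
set -euo pipefail

RUST_TARGET="${RUST_TARGET:-x86_64-unknown-linux-musl}"

# build_rust_static <source_dir> [subdir]
# Builds a release binary for the musl target with the C runtime linked
# statically. The optional subdir covers crates nested in a workspace,
# such as mycelium's myceliumd/.
build_rust_static() {
  local src_dir="$1" subdir="${2:-}"
  local build_dir="${src_dir}${subdir:+/${subdir}}"
  (
    # Subshell: the cd does not leak into the caller's CWD.
    cd "${build_dir}"
    RUSTFLAGS="-C target-feature=+crt-static" \
      cargo build --release --target "${RUST_TARGET}"
  )
}
```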
### 4. RFS Flists (Remote File System)
**Purpose:** Lazy-load kernel modules and firmware from S3 at runtime
**Flow:**
1. Build stage creates flists: `modules-<KERNEL_VERSION>.fl` and `firmware-<TAG>.fl`
2. Flists are SQLite databases containing:
- Content-addressed blob references
- S3 store URIs (patched for read-only access)
- Directory tree metadata
3. Flists embedded in initramfs at `/etc/rfs/`
4. Runtime: zinit units mount flists over `/lib/modules/` and `/lib/firmware/`
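5. Dual udev coldplug: an early pass (before RFS) brings up networking, and a post-RFS pass handles devices whose modules or firmware only become available once the flists are mounted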
**Key files:**
- `scripts/rfs/pack-modules.sh` - Creates modules flist from container `/lib/modules/`
- `scripts/rfs/pack-firmware.sh` - Creates firmware flist from Alpine packages
- `config/zinit/init/modules.sh` - Runtime mount script
- `config/zinit/init/firmware.sh` - Runtime mount script
### 5. zinit Service Management
**No OpenRC:** This system uses zinit exclusively for process management.
**Service graph:**
```
/init → zinit → [stage1-modules, udevd, depmod]
→ udev-trigger (early coldplug)
→ network
→ rfs-modules + rfs-firmware (mount flists)
→ udev-rfs (post-RFS coldplug)
→ services
```
**Service definitions:** YAML files in `config/zinit/` with `after:`, `needs:`, `wants:` dependencies
## Critical Conventions
### Path Normalization
**Problem:** Stages can change CWD (kernel build uses `/workspace/kernel/current`)
**Solution:** All paths normalized to absolute at startup in `scripts/lib/common.sh:244`
**Variables affected:**
- `INSTALL_DIR` (initramfs/)
- `COMPONENTS_DIR` (components/)
- `KERNEL_DIR` (kernel/)
- `DIST_DIR` (dist/)
**Never use relative paths** when calling functions that may run from a different CWD.
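A minimal sketch of what normalization at startup might look like follows. `to_abs` and `normalize_build_paths` are hypothetical names, the default directory names are taken from the list above, and the real code near `scripts/lib/common.sh:244` may differ (for example by resolving symlinks or requiring the directories to exist).

```bash
#!/usr/bin/env bash
# Hypothetical sketch of startup path normalization (real logic lives
# around scripts/lib/common.sh:244 and may differ).
set -euo pipefail

# Resolve a possibly relative path against PROJECT_ROOT without requiring
# it to exist yet (generated directories may not).
to_abs() {
  local p="$1"
  case "${p}" in
    /*) printf '%s\n' "${p}" ;;
    *)  printf '%s\n' "${PROJECT_ROOT%/}/${p#./}" ;;
  esac
}

normalize_build_paths() {
  : "${PROJECT_ROOT:?PROJECT_ROOT must be set}"
  INSTALL_DIR="$(to_abs "${INSTALL_DIR:-initramfs}")"
  COMPONENTS_DIR="$(to_abs "${COMPONENTS_DIR:-components}")"
  KERNEL_DIR="$(to_abs "${KERNEL_DIR:-kernel}")"
  DIST_DIR="$(to_abs "${DIST_DIR:-dist}")"
  export INSTALL_DIR COMPONENTS_DIR KERNEL_DIR DIST_DIR
}
```

Once this runs before any stage, a later `cd` (such as into the kernel tree) no longer changes what these variables point to.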
### Branding and Security
**Passwordless root enforcement:**
- Applied in `scripts/lib/initramfs.sh:575` via `passwd -d -R "${initramfs_dir}" root`
- Creates `root::` in `/etc/shadow` (empty password field)
- Controlled by `ZEROOS_BRANDING` and `ZEROOS_PASSWORDLESS_ROOT` flags
**Never edit /etc/shadow manually** - always use `passwd` or `chpasswd` with chroot.
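To verify the result on an extracted tree, a check like the hypothetical `check_passwordless_root` below reads the second `/etc/shadow` field of the `root` entry, which is empty when no password is set. It is a sketch and not part of the build scripts.

```bash
#!/usr/bin/env bash
# Hypothetical check (not part of the build scripts): confirm the root
# entry in an extracted initramfs has an empty password field.
set -euo pipefail

# check_passwordless_root <initramfs_root>
# Returns 0 if root has an empty hash, 1 otherwise, 2 if shadow is missing.
check_passwordless_root() {
  local shadow="$1/etc/shadow"
  [[ -r "${shadow}" ]] || { echo "missing ${shadow}" >&2; return 2; }
  # Field 2 holds the password hash; fail if root is absent entirely.
  awk -F: '$1 == "root" { found = 1; exit ($2 == "" ? 0 : 1) }
           END { if (!found) exit 1 }' "${shadow}"
}
```

For example, assuming `dist/initramfs.cpio.xz` is a plain xz-compressed cpio archive as its name suggests, it could be unpacked with `mkdir -p /tmp/ird && xz -dc dist/initramfs.cpio.xz | (cd /tmp/ird && cpio -idm)` and then checked with `check_passwordless_root /tmp/ird`.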
### Module Loading Strategy
**2-stage approach:**
- **Stage 1:** Critical boot modules (virtio, e1000, scsi) - loaded by zinit early
- **Stage 2:** Extended hardware (igb, ixgbe, i40e) - loaded after network
**Config:** `config/modules.conf` with `stage1:` and `stage2:` prefixes
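Reading the stage lists could look like this sketch. The exact line syntax of `modules.conf` is an assumption here (`stage1:<module>` or `stage2:<module>`, optionally followed by more `:`-separated fields, with `#` comments and blank lines allowed), and `modules_for_stage` is a hypothetical helper. Trailing fields are dropped, in line with the firmware policy that treats `config/firmware.conf` as the only firmware source for the initramfs.

```bash
#!/usr/bin/env bash
# Hypothetical reader for modules.conf. Assumed line format:
#   stage1:<module>[:<ignored fields>]
# Comments (#) and blank lines are skipped; whitespace around fields is trimmed.
set -euo pipefail

# modules_for_stage <stage> <file>  -> one module name per line
modules_for_stage() {
  local want="$1" file="$2"
  awk -F: -v want="${want}" '
    /^[[:space:]]*#/ || /^[[:space:]]*$/ { next }
    { gsub(/[[:space:]]+/, "", $1); gsub(/[[:space:]]+/, "", $2) }
    $1 == want && $2 != "" { print $2 }
  ' "${file}"
}
```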
**Dependency resolution:**
- Uses `modinfo` to build dependency tree
- Resolves from container `/lib/modules/<FULL_VERSION>/`
- Must run after `kernel_modules` stage
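Dependency resolution with `modinfo` amounts to a depth-first walk: kmod's `modinfo -F depends` prints a module's direct dependencies as a comma-separated list, and emitting each module after its dependencies yields a load order. The sketch below is illustrative only; `resolve_deps`, `modinfo_depends`, and the `MODINFO_DEPENDS` and `KVER` variables are hypothetical, and pointing `KVER` at the container's `<FULL_VERSION>` is an assumption about how the module tree is selected.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of module dependency resolution via modinfo.
set -euo pipefail

KVER="${KVER:-$(uname -r)}"                            # kernel version to query (assumed)
MODINFO_DEPENDS="${MODINFO_DEPENDS:-modinfo_depends}"  # lookup hook (assumed)

# Print the direct dependencies of one module as "a,b,c" (kmod modinfo).
modinfo_depends() {
  modinfo -k "${KVER}" -F depends "$1" 2>/dev/null || true
}

declare -A _seen=()

# resolve_deps <module>...
# Depth-first walk: each module is printed after all of its dependencies,
# and each module is printed only once.
resolve_deps() {
  local mod dep deps
  for mod in "$@"; do
    if [[ -n "${_seen[${mod}]:-}" ]]; then
      continue
    fi
    _seen[${mod}]=1
    deps="$("${MODINFO_DEPENDS}" "${mod}")"
    for dep in ${deps//,/ }; do
      resolve_deps "${dep}"
    done
    printf '%s\n' "${mod}"
  done
}
```

The `_seen` table makes the walk visit each module once even when several modules share a dependency; reset it with `_seen=()` before resolving an unrelated set.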
### Firmware Policy
**For initramfs:** `config/firmware.conf` is the SINGLE source of truth
- Any firmware hints in `modules.conf` are IGNORED
- Prevents duplication/version mismatches
**For RFS:** Full Alpine `linux-firmware*` packages installed in container
- Packed from container `/lib/firmware/`
- Mounted over `/lib/firmware/` at runtime to cover extended hardware
## Common Workflows
### Full Build from Scratch
```bash
# Clean everything and rebuild
./scripts/build.sh --clean
# Or just rebuild all stages
./scripts/build.sh --force-rebuild
```
### Quick Iteration After Config Changes
```bash
# After editing zinit configs, init script, or modules.conf
./scripts/rebuild-after-zinit.sh
# With kernel rebuild
./scripts/rebuild-after-zinit.sh --with-kernel
# Dry-run to see what changed
./scripts/rebuild-after-zinit.sh --verify-only
```
### Minimal Manual Rebuild
```bash
# Remove specific stages
rm -f .build-stages/initramfs_create.done
rm -f .build-stages/validation.done
# Rebuild only those stages
DEBUG=1 ./scripts/build.sh
```
### Testing the Built Kernel
```bash
# QEMU (default)
./runit.sh
# cloud-hypervisor with 5 disks
./runit.sh --hypervisor ch --disks 5 --reset
# Custom memory and bridge
./runit.sh --memory 4096 --bridge zosbr
```
### Persistent Dev Container
```bash
# Start persistent container
./scripts/dev-container.sh start
# Enter shell
./scripts/dev-container.sh shell
# Run build inside
./scripts/dev-container.sh build
# Stop container
./scripts/dev-container.sh stop
```
## Debugging Guidelines
### Diagnostics-First Approach
**ALWAYS add diagnostics before fixes:**
1. Enable `DEBUG=1` for verbose safe_execute logs
2. Add strategic `log_debug` statements
3. Confirm hypothesis in logs
4. Then apply minimal fix
**Example:**
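```bash
# Bad: guess at the cause and edit the file directly.

# Good: diagnose first.
# 1. Add a diagnostic at the suspect point (in the relevant script):
log_debug "Variable X=${X}, resolved=${resolved_path}"
# 2. Rebuild with verbose logging:
DEBUG=1 ./scripts/build.sh
# 3. Confirm the hypothesis in the output.
# 4. Apply the fix, backed by that evidence.
```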
### Key Diagnostic Functions
- `scripts/lib/common.sh`: `log_info`, `log_warn`, `log_error`, `log_debug`
- `scripts/lib/initramfs.sh:820`: Validation debug prints (input, PWD, PROJECT_ROOT, resolved paths)
- `scripts/lib/initramfs.sh:691`: Pre-CPIO sanity checks with file listings
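For reference, helpers with the behavior described above can be as small as this. These are hypothetical stand-ins, not the definitions in `scripts/lib/common.sh`, whose format and output stream may differ; the point is that `log_debug` is silent unless `DEBUG=1`, so diagnostics such as `log_debug "Variable X=${X}, resolved=${resolved_path}"` can stay in place between runs.

```bash
#!/usr/bin/env bash
# Hypothetical minimal logging helpers; the real ones in
# scripts/lib/common.sh may format output differently.
set -euo pipefail

# Common formatter: timestamp, level, message, written to stderr.
_log() {
  local level="$1"; shift
  printf '[%s] %s: %s\n' "$(date '+%H:%M:%S')" "${level}" "$*" >&2
}
log_info()  { _log INFO "$@"; }
log_warn()  { _log WARN "$@"; }
log_error() { _log ERROR "$@"; }

# Debug output appears only when DEBUG=1.
log_debug() {
  if [[ "${DEBUG:-0}" == "1" ]]; then
    _log DEBUG "$@"
  fi
}
```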
### Common Issues and Solutions
**"Initramfs directory not found"**
- **Cause:** INSTALL_DIR interpreted as relative in different CWD
- **Fix:** Already patched - paths normalized at startup
- **Check:** Look for "Validation debug:" logs showing resolved paths
**"INITRAMFS_ARCHIVE unbound"**
- **Cause:** Incremental build skipped initramfs_create stage
- **Fix:** Already patched - stages default INITRAMFS_ARCHIVE if unset
- **Check:** `scripts/build.sh:401` logs "defaulting INITRAMFS_ARCHIVE"
**"Module dependency resolution fails"**
- **Cause:** Container `/lib/modules/<FULL_VERSION>` missing or stale
- **Fix:** `./scripts/rebuild-after-zinit.sh --refresh-container-mods`
- **Check:** Ensure `kernel_modules` stage completed successfully
**"Passwordless root not working"**
- **Cause:** Branding disabled or shadow file not updated
- **Fix:** Check ZEROOS_BRANDING=true in logs, verify /etc/shadow has `root::`
- **Verify:** Extract initramfs and `grep '^root:' etc/shadow`
## Important Files Quick Reference
### Must-Read Before Editing
- `scripts/build.sh` - Orchestrator with precise stage order
- `scripts/lib/common.sh` - Path normalization, logging, utilities
- `scripts/lib/stages.sh` - Stage tracking logic
- `config/build.conf` - Version pins, directory settings, flags
### Safe to Edit
- `config/zinit/*.yaml` - Service definitions
- `config/zinit/init/*.sh` - Runtime initialization scripts
- `config/modules.conf` - Module lists (stage1/stage2)
- `config/firmware.conf` - Initramfs firmware selection
- `config/packages.list` - Alpine packages
### Generated (Never Edit)
- `initramfs/` - Assembled initramfs tree
- `components/` - Downloaded component sources
- `kernel/` - Kernel source tree
- `dist/` - Build outputs
- `.build-stages/` - Completion markers
## Testing Architecture
**No boot tests during the build:** booting the image under QEMU/cloud-hypervisor happens separately via `runit.sh`.
**Why?**
- Build is for assembly, not validation
- Tests require hypervisor (QEMU/cloud-hypervisor)
- Separation allows faster iteration
**runit.sh features:**
- Multi-disk support (qcow2 for QEMU, raw for cloud-hypervisor)
- Network bridge/TAP configuration
- Persistent volumes (reset with `--reset`)
- Serial console logging
## Quick Command Reference
```bash
# Build
./scripts/build.sh # Incremental build
./scripts/build.sh --clean # Clean build
./scripts/build.sh --show-stages # Show completion status
./scripts/build.sh --rebuild-from=zinit_setup # Rebuild from stage
DEBUG=1 ./scripts/build.sh # Verbose output
# Rebuild helpers
./scripts/rebuild-after-zinit.sh # After zinit/init/modules changes
./scripts/rebuild-after-zinit.sh --with-kernel # Also rebuild kernel
./scripts/rebuild-after-zinit.sh --verify-only # Dry-run
# Testing
./runit.sh # QEMU test
./runit.sh --hypervisor ch # cloud-hypervisor test
./runit.sh --help # All options
# Dev container
./scripts/dev-container.sh start # Start persistent container
./scripts/dev-container.sh shell # Enter shell
./scripts/dev-container.sh build # Build inside container
./scripts/dev-container.sh stop # Stop container
# Cleanup
./scripts/clean.sh # Remove all generated files
rm -rf .build-stages/ # Reset stage markers
```
## Environment Variables
**Build control:**
- `DEBUG=1` - Enable verbose logging
- `FORCE_REBUILD=true` - Force rebuild all stages
- `REBUILD_FROM_STAGE=<name>` - Rebuild from specific stage
**Version overrides:**
- `ALPINE_VERSION=3.22` - Alpine Linux version
- `KERNEL_VERSION=6.12.44` - Linux kernel version
- `RUST_TARGET=x86_64-unknown-linux-musl` - Rust compilation target
**Firmware tagging:**
- `FIRMWARE_TAG=20250908` - Firmware flist version tag
**RFS configuration:**
- See `config/rfs.conf` for S3 endpoint, credentials, paths
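These overrides follow the usual shell pattern of defaults that yield to the caller's environment. The sketch below is hypothetical (`apply_build_defaults` is an invented name, and the real defaults are set in `config/build.conf`); the values are copied from the list above and may not match the shipped defaults. The same default-expansion idiom is what lets a stage survive `set -u` when an earlier stage that normally sets a variable, such as `INITRAMFS_ARCHIVE`, was skipped.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: environment-overridable build defaults.
set -euo pipefail

apply_build_defaults() {
  # ":=" assigns only if the variable is unset or empty, so a value the
  # caller exported (e.g. KERNEL_VERSION=... ./scripts/build.sh) wins.
  : "${DEBUG:=0}"
  : "${FORCE_REBUILD:=false}"
  : "${ALPINE_VERSION:=3.22}"
  : "${KERNEL_VERSION:=6.12.44}"
  : "${RUST_TARGET:=x86_64-unknown-linux-musl}"

  # A default expansion keeps `set -u` from aborting with
  # "unbound variable" when the stage that normally sets this was skipped.
  # The dist/initramfs.cpio.xz location mirrors the repository layout.
  INITRAMFS_ARCHIVE="${INITRAMFS_ARCHIVE:-${DIST_DIR:-dist}/initramfs.cpio.xz}"

  export DEBUG FORCE_REBUILD ALPINE_VERSION KERNEL_VERSION RUST_TARGET INITRAMFS_ARCHIVE
}
```

With defaults applied this way, `KERNEL_VERSION=6.12.44 DEBUG=1 ./scripts/build.sh` would override both values for a single run.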
## Documentation Hierarchy
**Start here:**
1. `README.md` - User-facing guide with features and setup
2. This file (`claude.md`) - AI assistant context
**For development:**
3. `docs/NOTES.md` - Operational knowledge, troubleshooting
4. `docs/AGENTS.md` - Quick agent reference
5. `docs/TODO.md` - Current work checklist with code links
**For deep dives:**
6. `docs/PROMPT.md` - Strict debugger agent mode (diagnostics-first)
7. `docs/rfs-flists.md` - RFS design and implementation
8. `docs/review-rfs-integration.md` - Integration points analysis
9. `docs/depmod-behavior.md` - Module dependency deep dive
**Historical:**
10. `IMPLEMENTATION_PLAN.md` - Original design document
11. `GITHUB_ACTIONS.md` - CI/CD setup guide
## Project Philosophy
1. **Reproducibility:** Container-based builds ensure identical results
2. **Incrementality:** Stage markers minimize rebuild time
3. **Diagnostics-first:** Log before fixing, validate assumptions
4. **Minimal intervention:** Alpine + zinit only, no systemd/OpenRC
5. **Size-optimized:** Aggressive cleanup, strip, UPX compression
6. **Remote-ready:** RFS enables lazy-loading for extended hardware support
## Commit Message Guidelines
**DO NOT add Claude Code or AI assistant references to commit messages.**
Keep commits clean and professional:
- Focus on what changed and why
- Use conventional commit prefixes: `build:`, `docs:`, `fix:`, `feat:`, `refactor:`
- Be concise but descriptive
- No emoji unless project convention
- No "Generated with Claude Code" or "Co-Authored-By: Claude" footers
**Good example:**
```
build: remove testing.sh in favor of runit.sh
Replace inline boot testing with standalone runit.sh runner.
Tests now run separately from build pipeline for faster iteration.
```
**Bad example:**
```
build: remove testing.sh 🤖
Made some changes to testing.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
```
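The guidelines can also be enforced mechanically with a Git `commit-msg` hook, which Git invokes with the path to the message file and which aborts the commit on a non-zero exit. The hook below is a hypothetical sketch, not something the repository ships; the accepted prefixes are taken from the list above and can be extended.

```bash
#!/usr/bin/env bash
# Hypothetical .git/hooks/commit-msg enforcing the commit guidelines.
set -euo pipefail

check_commit_msg() {
  local msg_file="$1" subject
  local prefix_re='^(build|docs|fix|feat|refactor): .+'
  subject="$(head -n 1 "${msg_file}")"

  # Subject must use one of the conventional prefixes listed above.
  if ! [[ "${subject}" =~ ${prefix_re} ]]; then
    echo "commit-msg: subject needs a build:/docs:/fix:/feat:/refactor: prefix" >&2
    return 1
  fi

  # Reject AI attribution footers anywhere in the message (case-insensitive).
  if grep -qiE '^co-authored-by: *claude|generated with \[?claude code' "${msg_file}"; then
    echo "commit-msg: remove AI assistant attribution lines" >&2
    return 1
  fi
}

# Git passes the message file path as $1; a non-zero exit aborts the commit.
if (( $# >= 1 )); then
  check_commit_msg "$1" || exit 1
fi
```

To try it, save it as `.git/hooks/commit-msg` and make it executable with `chmod +x .git/hooks/commit-msg`.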
## Keywords for Quick Search
- **Build fails:** Check `DEBUG=1` logs, stage completion markers, container state
- **Module issues:** `kernel_modules` stage, `CONTAINER_MODULES_PATH`, depmod logs
- **Firmware missing:** `config/firmware.conf` for initramfs, RFS flist for runtime
- **zinit problems:** Service YAML syntax, dependency order, init script errors
- **Path errors:** Absolute path normalization in `common.sh:244`
- **Size too large:** Check cleanup stage, strip/UPX execution, package list
- **Container issues:** Rootless setup, subuid/subgid, podman vs docker
- **RFS mount fails:** S3 credentials, network readiness, flist manifest paths
---
**Last updated:** 2025-11-04
**Maintainer notes:** This file is the entry point for AI assistants. Keep it updated when architecture changes. Cross-reference with `docs/NOTES.md` for operational details.