build: remove testing.sh in favor of runit.sh; add claude.md reference

Replace inline boot testing with standalone runit.sh runner for clarity:

- Remove scripts/lib/testing.sh source and boot_tests stage from build.sh
- Remove --skip-tests option from build.sh and rebuild-after-zinit.sh
- Update all docs to reference runit.sh for QEMU/cloud-hypervisor testing
- Add comprehensive claude.md as AI assistant entry point with guidelines

Testing is now fully decoupled from build pipeline; use ./runit.sh for
QEMU/cloud-hypervisor validation after builds complete.
2025-11-04 13:47:24 +01:00
parent 334821dacf
commit 721e26a855
8 changed files with 525 additions and 68 deletions
--- a/claude.md
+++ b/claude.md
@@ -0,0 +1,474 @@
+# Claude Code Reference for Zero-OS Builder
+
+This document provides essential context for Claude Code (or any AI assistant) working with this Zero-OS Alpine Initramfs Builder repository.
+
+## Project Overview
+
+**What is this?**
+A build system that creates custom Alpine Linux 3.22 x86_64 initramfs images with zinit process management, designed for Zero-OS deployment on the ThreeFold Grid.
+
+**Key Features:**
+- Container-based reproducible builds (rootless podman/docker)
+- Incremental staged build pipeline with completion markers
+- zinit process manager (complete OpenRC replacement)
+- RFS (Remote File System) for lazy-loading modules/firmware from S3
+- Rust components built with musl static linking
+- Aggressive size optimization (strip + UPX)
+- Embedded initramfs in kernel (single vmlinuz.efi output)
+
+## Repository Structure
+
+```
+zosbuilder/
+├── config/                  # All configuration files
+│   ├── build.conf          # Build settings (versions, paths, flags)
+│   ├── packages.list       # Alpine packages to install
+│   ├── sources.conf        # ThreeFold components to build
+│   ├── modules.conf        # 2-stage kernel module loading
+│   ├── firmware.conf       # Firmware to include in initramfs
+│   ├── kernel.config       # Linux kernel configuration
+│   ├── init                # /init script for initramfs
+│   └── zinit/              # zinit service definitions (YAML)
+│
+├── scripts/
+│   ├── build.sh            # Main orchestrator (DO NOT EDIT LIGHTLY)
+│   ├── clean.sh            # Clean all artifacts
+│   ├── dev-container.sh    # Persistent dev container manager
+│   ├── rebuild-after-zinit.sh  # Quick rebuild helper
+│   ├── lib/                # Modular build libraries
+│   │   ├── common.sh       # Logging, path normalization, utilities
+│   │   ├── stages.sh       # Incremental stage tracking
+│   │   ├── docker.sh       # Container lifecycle
+│   │   ├── alpine.sh       # Alpine extraction, packages, cleanup
+│   │   ├── components.sh   # Build Rust components from sources.conf
+│   │   ├── initramfs.sh    # Assembly, optimization, CPIO creation
+│   │   └── kernel.sh       # Kernel download, config, build, embed
+│   └── rfs/                # RFS flist generation scripts
+│       ├── common.sh       # S3 config, version computation
+│       ├── pack-modules.sh # Create modules flist
+│       ├── pack-firmware.sh # Create firmware flist
+│       └── verify-flist.sh  # Inspect/test flists
+│
+├── docs/                   # Detailed documentation
+│   ├── NOTES.md           # Operational knowledge & troubleshooting
+│   ├── PROMPT.md          # Agent guidance (strict debugger mode)
+│   ├── TODO.md            # Persistent checklist with code refs
+│   ├── AGENTS.md          # Quick reference for agents
+│   ├── rfs-flists.md      # RFS design and runtime flow
+│   ├── review-rfs-integration.md  # Integration points
+│   └── depmod-behavior.md # Module dependency details
+│
+├── runit.sh               # Test runner (QEMU/cloud-hypervisor)
+├── initramfs/             # Generated initramfs tree
+├── components/            # Generated component sources
+├── kernel/                # Generated kernel source
+├── dist/                  # Final outputs
+│   ├── vmlinuz.efi        # Kernel with embedded initramfs
+│   └── initramfs.cpio.xz  # Standalone initramfs archive
+└── .build-stages/         # Incremental build markers (*.done files)
+```
+
+## Core Concepts
+
+### 1. Incremental Staged Builds
+
+**How it works:**
+- Each stage creates a `.build-stages/<stage_name>.done` marker on success
+- Subsequent builds skip completed stages unless forced
+- Use `./scripts/build.sh --show-stages` to see status
+- Use `./scripts/build.sh --rebuild-from=<stage>` to restart from a specific stage
+- Manually remove `.done` files to re-run specific stages
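+
+The marker mechanism above can be sketched in a few lines of shell. This is a minimal illustration, not the code in `scripts/lib/stages.sh`: the helper names `stage_done`, `mark_stage_done`, and `run_stage`, and the `STAGES_DIR` variable, are assumptions made for this example.
+
+```shell
+#!/usr/bin/env bash
+# Hypothetical sketch of marker-based stage skipping (all names are assumptions).
+STAGES_DIR="${STAGES_DIR:-.build-stages}"
+
+# Succeeds if the stage already has a completion marker.
+stage_done() {
+    [ -f "${STAGES_DIR}/$1.done" ]
+}
+
+# Records a completion marker for the stage.
+mark_stage_done() {
+    mkdir -p "${STAGES_DIR}" && : > "${STAGES_DIR}/$1.done"
+}
+
+# Runs the command "$2..." as stage "$1" unless a marker exists;
+# the marker is written only when the command succeeds.
+run_stage() {
+    local name="$1"
+    shift
+    if stage_done "${name}"; then
+        echo "skip ${name}"
+        return 0
+    fi
+    "$@" && mark_stage_done "${name}"
+}
+```
+
+In this sketch, deleting `${STAGES_DIR}/<stage_name>.done` makes `run_stage` execute that stage again on the next invocation, which mirrors the manual marker removal described above.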
+
+**Build stages (in order):**
+```
+alpine_extract → alpine_configure → alpine_packages → alpine_firmware
+  → components_build → components_verify → kernel_modules
+  → init_script → components_copy → zinit_setup
+  → modules_setup → modules_copy → cleanup → rfs_flists
+  → validation → initramfs_create → initramfs_test → kernel_build
+```
+
+**Key insight:** The build ALWAYS runs inside a container. Host invocations auto-spawn containers.
+
+### 2. Container-First Architecture
+
+**Why containers?**
+- Reproducible toolchain (Alpine 3.22 base with exact dependencies)
+- Rootless execution (no privileged access needed)
+- Isolation from host environment
+- GitHub Actions compatible
+
+**Container modes:**
+- **Transient:** `./scripts/build.sh` spawns, builds, exits
+- **Persistent:** `./scripts/dev-container.sh start/shell/build`
+
+**Important:** Directory paths are normalized to absolute paths under PROJECT_ROOT to avoid CWD issues when stages change directories (especially kernel builds).
+
+### 3. Component Build System
+
+**sources.conf format:**
+```
+TYPE:NAME:URL:VERSION:BUILD_FUNCTION[:EXTRA]
+```
+
+**Example:**
+```bash
+git:zinit:https://github.com/threefoldtech/zinit:master:build_zinit
+git:rfs:https://github.com/threefoldtech/rfs:development:build_rfs
+git:mycelium:https://github.com/threefoldtech/mycelium:0.6.1:build_mycelium
+release:corex:https://github.com/threefoldtech/corex/releases/download/2.1.4/corex-2.1.4-amd64-linux-static:2.1.4:install_corex:rename=corex
+```
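+
+Because the fields are colon-separated and the URL itself contains a colon, a naive split needs one extra step. The sketch below shows one possible way to handle that; it is not the parser used in `scripts/lib/components.sh`. The function name `parse_source_line` is hypothetical, and the approach assumes the URL contains exactly one colon (the one in `https://`).
+
+```shell
+#!/usr/bin/env bash
+# Hypothetical parser for one sources.conf line (not the repository's code).
+# Assumption: the URL has exactly one ":" (after the scheme), so splitting on
+# ":" breaks it into two pieces that are rejoined below.
+parse_source_line() {
+    local type name scheme rest version build_fn extra
+    IFS=: read -r type name scheme rest version build_fn extra <<< "$1"
+    printf 'type=%s name=%s url=%s version=%s build=%s extra=%s\n' \
+        "${type}" "${name}" "${scheme}:${rest}" "${version}" "${build_fn}" "${extra}"
+}
+
+parse_source_line "git:mycelium:https://github.com/threefoldtech/mycelium:0.6.1:build_mycelium"
+```
+
+Since `read` assigns everything left over to its last variable, an optional `EXTRA` field such as `rename=corex` ends up intact in `extra`. A URL with an explicit port would break the one-colon assumption and need a different split.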
+
+**Build functions** are defined in `scripts/lib/components.sh` and handle:
+- Rust builds with `x86_64-unknown-linux-musl` target
+- Static linking via `RUSTFLAGS="-C target-feature=+crt-static"`
+- Special cases (e.g., mycelium builds in `myceliumd/` subdirectory)
+
+### 4. RFS Flists (Remote File System)
+
+**Purpose:** Lazy-load kernel modules and firmware from S3 at runtime
+
+**Flow:**
+1. Build stage creates flists: `modules-<KERNEL_VERSION>.fl` and `firmware-<TAG>.fl`
+2. Flists are SQLite databases containing:
+   - Content-addressed blob references
+   - S3 store URIs (patched for read-only access)
+   - Directory tree metadata
+3. Flists embedded in initramfs at `/etc/rfs/`
+4. Runtime: zinit units mount flists over `/lib/modules/` and `/lib/firmware/`
+5. Dual udev coldplug: early (before RFS) for networking, post-RFS for new hardware
+
+**Key files:**
+- `scripts/rfs/pack-modules.sh` - Creates modules flist from container `/lib/modules/`
+- `scripts/rfs/pack-firmware.sh` - Creates firmware flist from Alpine packages
+- `config/zinit/init/modules.sh` - Runtime mount script
+- `config/zinit/init/firmware.sh` - Runtime mount script
+
+### 5. zinit Service Management
+
+**No OpenRC:** This system uses zinit exclusively for process management.
+
+**Service graph:**
+```
+/init → zinit → [stage1-modules, udevd, depmod]
+                → udev-trigger (early coldplug)
+                → network
+                → rfs-modules + rfs-firmware (mount flists)
+                → udev-rfs (post-RFS coldplug)
+                → services
+```
+
+**Service definitions:** YAML files in `config/zinit/` with `after:`, `needs:`, `wants:` dependencies
+
+## Critical Conventions
+
+### Path Normalization
+**Problem:** Stages can change CWD (kernel build uses `/workspace/kernel/current`)
+**Solution:** All paths normalized to absolute at startup in `scripts/lib/common.sh:244`
+
+**Variables affected:**
+- `INSTALL_DIR` (initramfs/)
+- `COMPONENTS_DIR` (components/)
+- `KERNEL_DIR` (kernel/)
+- `DIST_DIR` (dist/)
+
+**Never use relative paths** when calling functions that may run from a different CWD.
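+
+As an illustration of the idea, the following sketch anchors relative paths at `PROJECT_ROOT`. It is not the code at `scripts/lib/common.sh:244`; the helper name `normalize_path` is an assumption introduced for this example.
+
+```shell
+#!/usr/bin/env bash
+# Hypothetical sketch of path normalization (helper name is an assumption).
+PROJECT_ROOT="${PROJECT_ROOT:-$(pwd)}"
+
+# Prints "$1" unchanged if it is already absolute; otherwise anchors it at
+# PROJECT_ROOT, dropping a leading "./" if present.
+normalize_path() {
+    case "$1" in
+        /*) printf '%s\n' "$1" ;;
+        *)  printf '%s\n' "${PROJECT_ROOT}/${1#./}" ;;
+    esac
+}
+
+INSTALL_DIR="$(normalize_path "${INSTALL_DIR:-initramfs}")"
+echo "INSTALL_DIR=${INSTALL_DIR}"
+```
+
+Resolving each directory variable once at startup means a later `cd` into the kernel tree cannot change what the variable refers to.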
+
+### Branding and Security
+**Passwordless root enforcement:**
+- Applied in `scripts/lib/initramfs.sh:575` via `passwd -d -R "${initramfs_dir}" root`
+- Creates `root::` in `/etc/shadow` (empty password field)
+- Controlled by `ZEROOS_BRANDING` and `ZEROOS_PASSWORDLESS_ROOT` flags
+
+**Never edit /etc/shadow manually** - always use `passwd` or `chpasswd` with chroot.
+
+### Module Loading Strategy
+**2-stage approach:**
+- **Stage 1:** Critical boot modules (virtio, e1000, scsi) - loaded by zinit early
+- **Stage 2:** Extended hardware (igb, ixgbe, i40e) - loaded after network
+
+**Config:** `config/modules.conf` with `stage1:` and `stage2:` prefixes
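+
+To see which modules a stage would load, the prefix can be filtered with standard tools. The helper below is a hypothetical sketch, not part of the repository; it assumes each entry has the shape `<stage>:<module>` with any further fields following another colon, which may not match the full file format.
+
+```shell
+#!/usr/bin/env bash
+# Hypothetical helper (name and entry layout are assumptions): print the
+# module names listed under a given stage prefix in a modules.conf-style file.
+modules_for_stage() {
+    local stage="$1" conf="$2"
+    # Keep only lines that start with "<stage>:" and print the second field.
+    grep "^${stage}:" "${conf}" | cut -d: -f2
+}
+```
+
+For example, `modules_for_stage stage1 config/modules.conf` would print one stage 1 module name per line under these assumptions.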
+
+**Dependency resolution:**
+- Uses `modinfo` to build dependency tree
+- Resolves from container `/lib/modules/<FULL_VERSION>/`
+- Must run after `kernel_modules` stage
+
+### Firmware Policy
+**For initramfs:** `config/firmware.conf` is the SINGLE source of truth
+- Any firmware hints in `modules.conf` are IGNORED
+- Prevents duplication/version mismatches
+
+**For RFS:** Full Alpine `linux-firmware*` packages installed in container
+- Packed from container `/lib/firmware/`
+- Overmounts at runtime for extended hardware
+
+## Common Workflows
+
+### Full Build from Scratch
+```bash
+# Clean everything and rebuild
+./scripts/build.sh --clean
+
+# Or just rebuild all stages
+./scripts/build.sh --force-rebuild
+```
+
+### Quick Iteration After Config Changes
+```bash
+# After editing zinit configs, init script, or modules.conf
+./scripts/rebuild-after-zinit.sh
+
+# With kernel rebuild
+./scripts/rebuild-after-zinit.sh --with-kernel
+
+# Dry-run to see what changed
+./scripts/rebuild-after-zinit.sh --verify-only
+```
+
+### Minimal Manual Rebuild
+```bash
+# Remove specific stages
+rm -f .build-stages/initramfs_create.done
+rm -f .build-stages/validation.done
+
+# Rebuild only those stages
+DEBUG=1 ./scripts/build.sh
+```
+
+### Testing the Built Kernel
+```bash
+# QEMU (default)
+./runit.sh
+
+# cloud-hypervisor with 5 disks, resetting persistent volumes
+./runit.sh --hypervisor ch --disks 5 --reset
+
+# Custom memory and bridge
+./runit.sh --memory 4096 --bridge zosbr
+```
+
+### Persistent Dev Container
+```bash
+# Start persistent container
+./scripts/dev-container.sh start
+
+# Enter shell
+./scripts/dev-container.sh shell
+
+# Run build inside
+./scripts/dev-container.sh build
+
+# Stop container
+./scripts/dev-container.sh stop
+```
+
+## Debugging Guidelines
+
+### Diagnostics-First Approach
+**ALWAYS add diagnostics before fixes:**
+1. Enable `DEBUG=1` for verbose safe_execute logs
+2. Add strategic `log_debug` statements
+3. Confirm hypothesis in logs
+4. Then apply minimal fix
+
+**Example:**
+```bash
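+# Bad: guess and fix
+#   Edit the file to "fix" a suspected issue without evidence.
+
+# Good: diagnose first
+# 1. Add a diagnostic line to the suspect function, for example:
+#      log_debug "Variable X=${X}, resolved=${resolved_path}"
+# 2. Re-run the build with verbose logging:
+DEBUG=1 ./scripts/build.sh
+# 3. Confirm the hypothesis in the output
+# 4. Apply the minimal fix, backed by that evidence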
+```
+
+### Key Diagnostic Functions
+- `scripts/lib/common.sh`: `log_info`, `log_warn`, `log_error`, `log_debug`
+- `scripts/lib/initramfs.sh:820`: Validation debug prints (input, PWD, PROJECT_ROOT, resolved paths)
+- `scripts/lib/initramfs.sh:691`: Pre-CPIO sanity checks with file listings
+
+### Common Issues and Solutions
+
+**"Initramfs directory not found"**
+- **Cause:** INSTALL_DIR interpreted as relative in different CWD
+- **Fix:** Already patched - paths normalized at startup
+- **Check:** Look for "Validation debug:" logs showing resolved paths
+
+**"INITRAMFS_ARCHIVE unbound"**
+- **Cause:** Incremental build skipped initramfs_create stage
+- **Fix:** Already patched - stages default INITRAMFS_ARCHIVE if unset
+- **Check:** `scripts/build.sh:401` logs "defaulting INITRAMFS_ARCHIVE"
+
+**"Module dependency resolution fails"**
+- **Cause:** Container `/lib/modules/<FULL_VERSION>` missing or stale
+- **Fix:** `./scripts/rebuild-after-zinit.sh --refresh-container-mods`
+- **Check:** Ensure `kernel_modules` stage completed successfully
+
+**"Passwordless root not working"**
+- **Cause:** Branding disabled or shadow file not updated
+- **Fix:** Check ZEROOS_BRANDING=true in logs, verify /etc/shadow has `root::`
+- **Verify:** Extract initramfs and `grep '^root:' etc/shadow`
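+
+The `root::` check from the last item can be made scriptable. The function below is a hypothetical sketch (the name `root_is_passwordless` is not from the repository); it assumes the standard colon-separated shadow layout, where the second field holds the password hash.
+
+```shell
+#!/usr/bin/env bash
+# Hypothetical check (function name is an assumption): exits 0 only if the
+# root entry in the given shadow file has an empty password field ("root::").
+root_is_passwordless() {
+    awk -F: '$1 == "root" { found = ($2 == "") } END { exit !found }' "$1"
+}
+```
+
+Run from the root of an extracted initramfs tree, `root_is_passwordless etc/shadow && echo ok` would print `ok` only when the root password field is empty.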
+
+## Important Files Quick Reference
+
+### Must-Read Before Editing
+- `scripts/build.sh` - Orchestrator with precise stage order
+- `scripts/lib/common.sh` - Path normalization, logging, utilities
+- `scripts/lib/stages.sh` - Stage tracking logic
+- `config/build.conf` - Version pins, directory settings, flags
+
+### Safe to Edit
+- `config/zinit/*.yaml` - Service definitions
+- `config/zinit/init/*.sh` - Runtime initialization scripts
+- `config/modules.conf` - Module lists (stage1/stage2)
+- `config/firmware.conf` - Initramfs firmware selection
+- `config/packages.list` - Alpine packages
+
+### Generated (Never Edit)
+- `initramfs/` - Assembled initramfs tree
+- `components/` - Downloaded component sources
+- `kernel/` - Kernel source tree
+- `dist/` - Build outputs
+- `.build-stages/` - Completion markers
+
+## Testing Architecture
+
+**No boot tests during build** - Boot tests run separately via `runit.sh`
+
+**Why?**
+- Build is for assembly, not validation
+- Tests require hypervisor (QEMU/cloud-hypervisor)
+- Separation allows faster iteration
+
+**runit.sh features:**
+- Multi-disk support (qcow2 for QEMU, raw for cloud-hypervisor)
+- Network bridge/TAP configuration
+- Persistent volumes (reset with `--reset`)
+- Serial console logging
+
+## Quick Command Reference
+
+```bash
+# Build
+./scripts/build.sh                      # Incremental build
+./scripts/build.sh --clean             # Clean build
+./scripts/build.sh --show-stages       # Show completion status
+./scripts/build.sh --rebuild-from=zinit_setup  # Rebuild from stage
+DEBUG=1 ./scripts/build.sh             # Verbose output
+
+# Rebuild helpers
+./scripts/rebuild-after-zinit.sh       # After zinit/init/modules changes
+./scripts/rebuild-after-zinit.sh --with-kernel  # Also rebuild kernel
+./scripts/rebuild-after-zinit.sh --verify-only  # Dry-run
+
+# Testing
+./runit.sh                             # QEMU test
+./runit.sh --hypervisor ch             # cloud-hypervisor test
+./runit.sh --help                      # All options
+
+# Dev container
+./scripts/dev-container.sh start       # Start persistent container
+./scripts/dev-container.sh shell       # Enter shell
+./scripts/dev-container.sh build       # Build inside container
+./scripts/dev-container.sh stop        # Stop container
+
+# Cleanup
+./scripts/clean.sh                     # Remove all generated files
+rm -rf .build-stages/                  # Reset stage markers
+```
+
+## Environment Variables
+
+**Build control:**
+- `DEBUG=1` - Enable verbose logging
+- `FORCE_REBUILD=true` - Force rebuild all stages
+- `REBUILD_FROM_STAGE=<name>` - Rebuild from specific stage
+
+**Version overrides:**
+- `ALPINE_VERSION=3.22` - Alpine Linux version
+- `KERNEL_VERSION=6.12.44` - Linux kernel version
+- `RUST_TARGET=x86_64-unknown-linux-musl` - Rust compilation target
+
+**Firmware tagging:**
+- `FIRMWARE_TAG=20250908` - Firmware flist version tag
+
+**RFS configuration:**
+- See `config/rfs.conf` for S3 endpoint, credentials, paths
+
+## Documentation Hierarchy
+
+**Start here:**
+1. `README.md` - User-facing guide with features and setup
+2. This file (`claude.md`) - AI assistant context
+
+**For development:**
+3. `docs/NOTES.md` - Operational knowledge, troubleshooting
+4. `docs/AGENTS.md` - Quick agent reference
+5. `docs/TODO.md` - Current work checklist with code links
+
+**For deep dives:**
+6. `docs/PROMPT.md` - Strict debugger agent mode (diagnostics-first)
+7. `docs/rfs-flists.md` - RFS design and implementation
+8. `docs/review-rfs-integration.md` - Integration points analysis
+9. `docs/depmod-behavior.md` - Module dependency deep dive
+
+**Historical:**
+10. `IMPLEMENTATION_PLAN.md` - Original design document
+11. `GITHUB_ACTIONS.md` - CI/CD setup guide
+
+## Project Philosophy
+
+1. **Reproducibility:** Container-based builds ensure identical results
+2. **Incrementality:** Stage markers minimize rebuild time
+3. **Diagnostics-first:** Log before fixing, validate assumptions
+4. **Minimal intervention:** Alpine + zinit only, no systemd/OpenRC
+5. **Size-optimized:** Aggressive cleanup, strip, UPX compression
+6. **Remote-ready:** RFS enables lazy-loading for extended hardware support
+
+## Commit Message Guidelines
+
+**DO NOT add Claude Code or AI assistant references to commit messages.**
+
+Keep commits clean and professional:
+- Focus on what changed and why
+- Use conventional commit prefixes: `build:`, `docs:`, `fix:`, `feat:`, `refactor:`
+- Be concise but descriptive
+- No emoji unless project convention
+- No "Generated with Claude Code" or "Co-Authored-By: Claude" footers
+
+**Good example:**
+```
+build: remove testing.sh in favor of runit.sh
+
+Replace inline boot testing with standalone runit.sh runner.
+Tests now run separately from build pipeline for faster iteration.
+```
+
+**Bad example:**
+```
+build: remove testing.sh 🤖
+
+Made some changes to testing.
+
+🤖 Generated with [Claude Code](https://claude.com/claude-code)
+Co-Authored-By: Claude <noreply@anthropic.com>
+```
+
+## Keywords for Quick Search
+
+- **Build fails:** Check `DEBUG=1` logs, stage completion markers, container state
+- **Module issues:** `kernel_modules` stage, `CONTAINER_MODULES_PATH`, depmod logs
+- **Firmware missing:** `config/firmware.conf` for initramfs, RFS flist for runtime
+- **zinit problems:** Service YAML syntax, dependency order, init script errors
+- **Path errors:** Absolute path normalization in `common.sh:244`
+- **Size too large:** Check cleanup stage, strip/UPX execution, package list
+- **Container issues:** Rootless setup, subuid/subgid, podman vs docker
+- **RFS mount fails:** S3 credentials, network readiness, flist manifest paths
+
+---
+
+**Last updated:** 2025-11-04
+
+**Maintainer notes:** This file is the entry point for AI assistants. Keep it updated when architecture changes. Cross-reference with `docs/NOTES.md` for operational details.