manual removed files
@@ -1,12 +0,0 @@
@startuml
start
:power on node;
repeat
:mount boot flist;
:copy files to node root;
:reconfigure services;
:restart services;
repeat while (new flist version?) is (yes)
-> power off;
stop
@enduml

@@ -1,50 +0,0 @@
@startuml

package "node-ready" {
[local-modprobe]
[udev-trigger]
[redis]
[haveged]
[cgroup]
}

package "boot" {
[storaged]
[internet]
[networkd]
[identityd]
}

package "internal modules" {
[flistd]
[containerd]
[contd]
[upgraded]
[provisiond]
}

[local-modprobe] <-- [udev-trigger]
[udev-trigger] <-- [storaged]
[udev-trigger] <-- [internet]
[storaged] <-- [identityd]

[identityd] <- [networkd]

[internet] <-- [networkd]
[networkd] <-- [containerd]
[storaged] <-- [containerd]

[containerd] <-- [contd]

[storaged] <-- [flistd]
[networkd] <-- [flistd]

[flistd] <-- [upgraded]
[networkd] <-- [upgraded]

[networkd] <-- [provisiond]
[flistd] <-- [provisiond]
[contd] <-- [provisiond]

@enduml

@@ -1,37 +0,0 @@
@startuml
title Provisioning of a resource space

autonumber
actor User as user
' entity Farmer as farmer
entity Network as network
database Blockchain as bc
boundary Node as node
collections "Resource space" as rs

== Resource research ==
user -> network: Send resource request
activate network
network -> node: broadcast resource request
activate node
deactivate network
...broadcast to all nodes...
node -> user: Send offer
user -> user: inspect offer

== Resource space negotiation ==
user -> node: accept offer
user <-> node: key exchange
user -> bc: money is locked on blockchain
...
node -> rs: create resource space
activate rs
node -> user: notify space is created
node -> bc: notify that it created the space
user -> rs: make sure it can access the space
user -> bc: validate it can access the space
bc -> node: money is released to the node
deactivate node
== Usage of the space ==
user -> rs: deploy workload
@enduml

@@ -1,42 +0,0 @@
@startuml
title Provisioning a workload on the TFGrid

autonumber
actor "User" as user
actor "Farmer" as farmer
database "TF Explorer" as explorer
database Blockchain as blockchain
boundary Node as node

== Price definition ==
farmer -> explorer: Farmer sets the price of its resource units
== Resource research ==
activate explorer
user -> explorer: User looks where to deploy the workload
user <- explorer: Gives details about the farmer owning the selected node
== Resource reservation ==
user -> explorer: write description of the workload
explorer -> user: return a list of transactions to execute on the blockchain
== Reservation processing ==
user -> blockchain: execute transactions
explorer <-> blockchain: verify transactions are done
explorer -> explorer: reservation status changed to `deploy`
== Resource provisioning ==
node <-> explorer: read description of the workloads
node -> node: provision workload
alt provision successful
node -> explorer: write result of the provisioning
explorer -> blockchain: forward tokens to the farmer
blockchain -> farmer: tokens are available to the farmer
user <- explorer: read the connection information of his workload
else provision error
node -> explorer: write result of the provisioning
explorer -> explorer: cancel reservation
node -> node: free up capacity
explorer -> blockchain: tokens refunded to the user
blockchain <-> user: tokens are available to the user again
end
deactivate explorer
== Resource monitoring ==
user <-> node: use / monitor workload
@enduml

@@ -1,20 +0,0 @@
@startuml

== Initialization ==
Module -> MsgBroker: Announce Module
MsgBroker -> Module: create bi-directional channel

== Utilisation ==
loop
DSL -> MsgBroker: put RPC message
activate MsgBroker
Module <- MsgBroker: pull RPC message
activate Module
Module -> Module: execute method
Module -> MsgBroker: put response
deactivate Module
MsgBroker -> DSL : read response
deactivate MsgBroker
end

@enduml

@@ -1,22 +0,0 @@
@startuml
actor User as user
box "To Be Defined" #LightBlue
participant Market
end box
entity Farmer as farmer
boundary Node as node

user -> farmer: Request space
activate farmer
farmer -> node: reserve space
activate node
farmer -> user: confirmation
deactivate farmer
...
note over user, node: communication is allowed only for the owner of the space
user -> node: deploy services
...
user -> farmer: destroy space
farmer -> node: delete space
deactivate node
@enduml

@@ -1,30 +0,0 @@
#!/bin/bash

# This is the same as the first case at qemu/README.md in a single script

sudo ip link add zos0 type bridge
sudo ip link set zos0 up

sudo ip addr add 192.168.123.1/24 dev zos0
# derive a ULA prefix from the current user name
md5=$(echo "$USER" | md5sum)
ULA=${md5:0:2}:${md5:2:4}:${md5:6:4}
sudo ip addr add fd${ULA}::1/64 dev zos0
# you might want to add fe80::1/64
sudo ip addr add fe80::1/64 dev zos0

sudo iptables -t nat -I POSTROUTING -s 192.168.123.0/24 -j MASQUERADE
sudo ip6tables -t nat -I POSTROUTING -s fd${ULA}::/64 -j MASQUERADE
sudo iptables -t filter -I FORWARD --source 192.168.123.0/24 -j ACCEPT
sudo iptables -t filter -I FORWARD --destination 192.168.123.0/24 -j ACCEPT
sudo sysctl -w net.ipv4.ip_forward=1

sudo dnsmasq --strict-order \
    --except-interface=lo \
    --interface=zos0 \
    --bind-interfaces \
    --dhcp-range=192.168.123.20,192.168.123.50 \
    --dhcp-range=::1000,::1fff,constructor:zos0,ra-stateless,12h \
    --conf-file="" \
    --pid-file=/var/run/qemu-dnsmasq-zos0.pid \
    --dhcp-leasefile=/var/run/qemu-dnsmasq-zos0.leases \
    --dhcp-no-override

@@ -1,61 +0,0 @@
# Adding a new package

Binary packages are added by providing [a build script](../../bins/); an automated workflow will then build and publish an flist with the binary.

For example, to add the `rmb` binary, we need to provide a bash script with a `build_rmb` function:

```bash
RMB_VERSION="0.1.2"
RMB_CHECKSUM="4fefd664f261523b348fc48e9f1c980b"
RMB_LINK="https://github.com/threefoldtech/rmb-rs/releases/download/v${RMB_VERSION}/rmb"

download_rmb() {
    echo "download rmb"
    download_file ${RMB_LINK} ${RMB_CHECKSUM} rmb
}

prepare_rmb() {
    echo "[+] prepare rmb"
    github_name "rmb-${RMB_VERSION}"
}

install_rmb() {
    echo "[+] install rmb"

    mkdir -p "${ROOTDIR}/bin"

    cp ${DISTDIR}/rmb ${ROOTDIR}/bin/
    chmod +x ${ROOTDIR}/bin/*
}

build_rmb() {
    pushd "${DISTDIR}"

    download_rmb
    popd

    prepare_rmb
    install_rmb
}
```

Note that you can just download a statically built binary instead of building it from source.

The other step is to add your binary's job to the [bins workflow](../../.github/workflows/bins.yaml) so it gets built automatically:

```yaml
jobs:
  containerd:
    ...
    ...
  rmb:
    uses: ./.github/workflows/bin-package.yaml
    with:
      package: rmb
    secrets:
      token: ${{ secrets.HUB_JWT }}
```

Once e.g. a `devnet` release is published, your package will be built and pushed to an flist repository. After that, you can start your local zos node, wait for it to finish downloading, and you should find your binary available.

@@ -1,70 +0,0 @@
# Quick start

- [Quick start](#quick-start)
  - [Starting a local zos node](#starting-a-local-zos-node)
  - [Accessing node](#accessing-node)
  - [Development](#development)

## Starting a local zos node

* Make sure `qemu` and `dnsmasq` are installed
* [Create a farm](../manual/manual.md#creating-a-farm)
* [Download a zos image](https://bootstrap.grid.tf/kernel/zero-os-development-zos-v3-generic-7e587e499a.efi)
* Make sure the `zos0` bridge is allowed by qemu: add `allow zos0` to `/etc/qemu/bridge.conf` (create the file if it's not there)
* Set up the network using [this script](./net.sh)

Then, inside the zos repository:

```
make -C cmds
cd qemu
mv <downloaded image path> ./zos.efi
sudo ./vm.sh -n myzos-01 -c "farmer_id=<your farm id here> printk.devmsg=on runmode=dev"
```

You should see the qemu console and boot logs. Wait for a while, then you can [browse farms](https://dashboard.dev.grid.tf/explorer/farms) to see that your node is added/detected automatically.

To stop the machine, press `Control + a` then `x`.

You can read more about setting up a qemu development environment and more network options [here](../../qemu/README.md).

## Accessing node

After booting up, the node starts downloading external packages; this takes some time depending on your internet connection.

See [how to ssh into it.](../../qemu/README.md#to-ssh-into-the-machine)

How to get the node IP?
Given the network script's `dhcp-range`, it usually is one of `192.168.123.43`, `192.168.123.44` or `192.168.123.45`.

Or you can simply install `arp-scan` and do something like:

```
✗ sudo arp-scan --interface=zos0 --localnet
Interface: zos0, type: EN10MB, MAC: de:26:45:e6:87:95, IPv4: 192.168.123.1
Starting arp-scan 1.9.7 with 256 hosts (https://github.com/royhills/arp-scan)
192.168.123.44 54:43:83:1f:eb:81 (Unknown)
```

Now we know for sure it's `192.168.123.44`.

To check logs and see whether the downloading of packages is still in progress, you can simply do:

```
zinit log
```

## Development

While the overlay enables you to boot with binaries that were built locally, sometimes you'll need to test changes to certain modules without restarting the node (or while intending to do so, e.g. when testing a migration).

For example, if we changed anything related to `noded`, we can do the following inside the zos repository:

* Build the binaries locally
  * `make -C cmds`
* Copy the binary inside the machine
  * `scp bin/zos root@192.168.123.44:/bin/noded`
* SSH into the machine, then use `zinit` to restart it:
  * `zinit stop noded && zinit start noded`

@@ -1,6 +0,0 @@
Development
===========

* [Quick start](./quickstart.md)
* [Testing](./testing.md)
* [Binary packages](./packages.md)

@@ -1,157 +0,0 @@
# Testing

Besides unit testing, you might want to test your change in an integrated environment. The following are two options for doing that.

- [Testing](#testing)
  - [Using grid/node client](#using-gridnode-client)
  - [Using a test app](#using-a-test-app)
    - [An example to talk to container and qsfs modules](#an-example-to-talk-to-container-and-qsfs-modules)
    - [An example of directly using zinit package](#an-example-of-directly-using-zinit-package)

## Using grid/node client

You can simply use any grid client to deploy a workload of any type; you should specify your node's twin ID (and make sure you are on the correct network).

Inside the node, you can do `noded -id` and `noded -net` to get your current node ID and network. Also, [you can check your farm](https://dashboard.dev.grid.tf/explorer/farms) and get node information from there.

Another option is the golang [node client](../manual/manual.md#interaction).

While deploying on your local node, logs with `zinit log` would be helpful to see any possible errors and to debug your code.

## Using a test app

If you need to test a specific module or functionality, you can create a simple test app inside e.g. the [tools directory](../../tools/).

Inside this simple test app, you can import any module or talk to another one using [zbus](../internals/internals.md#ipc).

### An example to talk to container and qsfs modules

```go
// tools/del/main.go

package main

import (
	"context"
	"flag"
	"strings"
	"time"

	"github.com/rs/zerolog"
	"github.com/rs/zerolog/log"

	"github.com/threefoldtech/zbus"
	"github.com/threefoldtech/zos/pkg"
	"github.com/threefoldtech/zos/pkg/stubs"
)

func main() {
	zerolog.SetGlobalLevel(zerolog.DebugLevel)

	zbus, err := zbus.NewRedisClient("unix:///var/run/redis.sock")
	if err != nil {
		log.Err(err).Msg("cannot init zbus client")
		return
	}

	var workloadType, workloadID string

	flag.StringVar(&workloadType, "type", "", "workload type (qsfs or container)")
	flag.StringVar(&workloadID, "id", "", "workload ID")

	flag.Parse()

	if workloadType == "" || workloadID == "" {
		log.Error().Msg("you need to provide both type and id")
		return
	}

	ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
	defer cancel()

	if workloadType == "qsfs" {
		qsfsd := stubs.NewQSFSDStub(zbus)
		err := qsfsd.SignalDelete(ctx, workloadID)
		if err != nil {
			log.Err(err).Msg("cannot delete qsfs workload")
		}
	} else if workloadType == "container" {
		args := strings.Split(workloadID, ":")
		if len(args) != 2 {
			log.Error().Msg("container id must contain the namespace, e.g. qsfs:wl129")
			return
		}

		containerd := stubs.NewContainerModuleStub(zbus)
		err := containerd.SignalDelete(ctx, args[0], pkg.ContainerID(args[1]))
		if err != nil {
			log.Err(err).Msg("cannot delete container workload")
		}
	}
}
```

Then we can simply build, upload and execute this in our node:

```
cd tools/del
go build
scp del root@192.168.123.44:/root/del
```

Then ssh into `192.168.123.44` and simply execute your test app:

```
./del
```

### An example of directly using zinit package

```go
// tools/zinit_test
package main

import (
	"encoding/json"
	"fmt"
	"regexp"

	"github.com/rs/zerolog"
	"github.com/rs/zerolog/log"

	"github.com/threefoldtech/zos/pkg/zinit"
)

func main() {
	zerolog.SetGlobalLevel(zerolog.DebugLevel)
	z := zinit.New("/var/run/zinit.sock")

	regex := fmt.Sprintf(`^ip netns exec %s %s`, "ndmz", "/sbin/udhcpc")
	_, err := regexp.Compile(regex)
	if err != nil {
		log.Err(err).Msgf("cannot compile %s", regex)
		return
	}

	// try match
	matched, err := z.Matches(zinit.WithExecRegex(regex))
	if err != nil {
		log.Err(err).Msg("cannot filter services")
	}

	matchedStr, err := json.Marshal(matched)
	if err != nil {
		log.Err(err).Msg("cannot convert matched map to json")
	}

	log.Debug().Str("matched", string(matchedStr)).Msg("matched services")

	// // try destroy
	// err = z.Destroy(10*time.Second, matched...)
	// if err != nil {
	// 	log.Err(err).Msg("cannot destroy matched services")
	// }
}
```

@@ -1,6 +0,0 @@
# FAQ

This section consolidates the common questions we get about how 0-OS works and how to operate it.

- **Q**: What is the preferred configuration for my raid controller when running 0-OS?
  **A**: 0-OS's goal is to expose raw capacity, so it is best to give it the most direct access to the disks possible. In the case of raid controllers, try to set the controller to [JBOD](https://en.wikipedia.org/wiki/Non-RAID_drive_architectures#JBOD) mode if available.

@@ -1,11 +0,0 @@
# Services Boot Sequence

Here is the dependency graph of all the services started by 0-OS:

![boot sequence](../assets/boot_sequence.png)

## Pseudo boot steps

Both `node-ready` and `boot` are not actual services; instead they define a `boot stage`. For example, once the `node-ready` service is (ready), it means all crucial system services defined by 0-initramfs are now running.

The `boot` service is similar, but guarantees that some 0-OS services are running (for example `storaged`) before starting other services like `flistd`, which requires `storaged`.

@@ -1,89 +0,0 @@
<h1>Capacity</h1>

<h2> Table of Contents </h2>

- [Introduction](#introduction)
- [System reserved capacity](#system-reserved-capacity)
  - [Reserved Memory](#reserved-memory)
  - [Reserved Storage](#reserved-storage)
- [User Capacity](#user-capacity)
  - [Memory](#memory)
  - [Storage](#storage)

***

## Introduction

This document describes how ZOS handles the following tasks:

- Reserving system resources
  - Memory
  - Storage
- Calculating the free usable capacity for user workloads

## System reserved capacity

ZOS always reserves some amount of the available physical resources for its own operation. The system tries to be as protective
as possible of its critical services, to make sure that the node is always reachable and usable even when it's under heavy load.

ZOS reserves memory and storage (but not CPU) as follows:

### Reserved Memory

ZOS reserves 10% of the available system memory for basic services and operation overhead. The operation overhead can happen as a side effect of running user workloads. For example, a user network, while in theory it consumes no memory, does in fact consume some (kernel buffers, etc.). The same goes for a VM: a user VM can be assigned, say, 5G, but the process running the VM will take a few extra megabytes to operate.

This is why we decided to play it safe and reserve 10% of total system memory for system overhead, with a **minimum** reserved memory of 2GB:

```python
reserved = max(total_in_gb * 0.1, 2G)
```

### Reserved Storage

While ZOS does not require installation, it needs to download and store many things to operate correctly. This includes the following:

- Node identity: information about the node ID and keys
- The system binaries: everything zos needs to join the grid and operate as expected
- Workload flists: the flists of the user workloads. Those are downloaded on demand, so they don't always exist.
- State information: tracking information maintained by ZOS about the state of workloads, ownership, and more.

This is why on first start the system allocates and reserves a part of the available SSD storage, called `zos-cache`. It is initially `5G` (it was 100G in older versions), but because of the `dynamic` nature of the cache it can't be fixed at `5G`.

The space the system needs to reserve can change dramatically based on the workloads running on it. For example, if many users are running many different VMs, the system will need to download (and cache) many different VM images, hence requiring more cache.

This is why the system periodically checks the reserved storage and dynamically expands or shrinks it to a more suitable value, in increments of 5G. Expansion happens when free cache space drops to around 20% of the current cache size, and shrinking when usage goes below that.

## User Capacity

All workloads require some sort of resource(s) to run, and that is what the user has to pay for. A workload can consume resources of the following types:

- CU (compute unit in vCPU)
- MU (memory unit in bytes)
- NU (network unit in bytes)
- SU (ssd storage in bytes)
- HU (hdd storage in bytes)

A workload, based on its type, can consume one or more of those resource types. Some workloads have a well-known "size" on creation; others are dynamic and their consumption won't be known until later.

For example, the SU consumption of a disk workload is known ahead of time, unlike the NU used by a network, which is only known after usage over a certain period of time.

A single deployment can have multiple workloads, each requiring a certain amount of one or more of the capacity types listed above. For each workload, ZOS computes the amount of resources needed and then checks whether it can provide that amount of capacity.

> This means that a deployment that defines 2 VMs can partially succeed, deploying one of the VMs but not the other, if the amount of resources requested is higher than what the node can provide.

### Memory

The system decides whether there is enough memory to run a workload that demands MU resources as follows:

- Compute the "theoretically used" memory by all user workloads excluding `self`. This is the sum of the consumed MU units of all active workloads (as defined by their corresponding deployments, not as actually used on the system).
- The theoretically used memory is topped up with the system reserved memory.
- The system then checks the actually used memory on the system; this is done simply as `actual_used = memory.total - memory.available`.
- The system can now `assume` an accurate used memory as `used = max(actual_used, theoretically_used)`.
- Then `available = total - used`.
- Then simply checks that `available` memory is enough to hold the requested workload memory!

### Storage

Storage is much simpler to allocate than memory. It's left entirely to the storage subsystem to find out whether it can fit the requested storage on the available physical disks; if it can't, the workload is marked as errored.

Storage tries to find the requested space based on type (SU or HU), then finds the optimal way to fit it on the available disks, or spins up a new one if needed.

@@ -1,14 +0,0 @@
# Compatibility list

This document tracks all the hardware that has been tested, the issues encountered, and possible workarounds.

**Legend**
✅ : fully supported
⚠️ : supported with some tweaking
🛑 : not supported

| Vendor | Hardware | Support | Issues | Workaround |
| --- | --- | --- | --- | --- |
| Supermicro | SYS-5038ML-H8TRF | ✅ | | |
| Gigabyte Technology Co | AB350N-Gaming WIFI | ✅ | | |

@@ -1,106 +0,0 @@
<h1>Container Module</h1>

<h2> Table of Contents </h2>

- [ZBus](#zbus)
- [Home Directory](#home-directory)
- [Introduction](#introduction)
  - [zinit unit](#zinit-unit)
- [Interface](#interface)

***

## ZBus

The container module is available on zbus over the following channel:

| module | object | version |
|--------|--------|---------|
| container|[container](#interface)| 0.0.1|

## Home Directory

contd keeps some data in the following locations:

| directory | path|
|----|---|
| root| `/var/cache/modules/containerd`|

## Introduction

The container module is a proxy to [containerd](https://github.com/containerd/containerd). The proxy provides integration with zbus.

The implementation at the moment is straightforward: it prepares the OCI spec for the container and the tenant containerd namespace,
sets up the proper capabilities, and finally creates the container instance on `containerd`.

The module is fully stateless; all container information is queried at runtime from `containerd`.

### zinit unit

`contd` must run after containerd is running and the node boot process is complete. Since it doesn't keep state, no dependency on `storaged` is needed.

```yaml
exec: contd -broker unix:///var/run/redis.sock -root /var/cache/modules/containerd
after:
  - containerd
  - boot
```

## Interface

```go
package pkg

// ContainerID type
type ContainerID string

// NetworkInfo defines a network configuration for a container
type NetworkInfo struct {
	// Currently a container can only join one (and only one)
	// network namespace that has to be pre defined on the node
	// for the container tenant

	// Containers don't need to know anything about bridges,
	// IPs, wireguards since all of this is only known by the network
	// resource, which is out of the scope of this module
	Namespace string
}

// MountInfo defines a mount point
type MountInfo struct {
	Source  string   // source of the mount point on the host
	Target  string   // target of mount inside the container
	Type    string   // mount type
	Options []string // mount options
}

// Container creation info
type Container struct {
	// Name of container
	Name string
	// path to the rootfs of the container
	RootFS string
	// Env env variables to container in format {'KEY=VALUE', 'KEY2=VALUE2'}
	Env []string
	// Network network info for container
	Network NetworkInfo
	// Mounts extra mounts for container
	Mounts []MountInfo
	// Entrypoint the process to start inside the container
	Entrypoint string
	// Interactive enables Core X as PID 1 in the container
	Interactive bool
}

// ContainerModule defines the rpc interface to containerd
type ContainerModule interface {
	// Run creates and starts a container on the node. It also auto
	// starts the command defined by `entrypoint` inside the container
	// ns: tenant namespace
	// data: Container info
	Run(ns string, data Container) (ContainerID, error)

	// Inspect returns information about the container, given its container id
	Inspect(ns string, id ContainerID) (Container, error)
	Delete(ns string, id ContainerID) error
}
```

@@ -1,74 +0,0 @@
<h1>Flist Module</h1>

<h2> Table of Contents </h2>

- [Zbus](#zbus)
- [Home Directory](#home-directory)
- [Introduction](#introduction)
- [Public interface ](#public-interface-)
- [zinit unit](#zinit-unit)

***

## Zbus

The flist module is available on zbus over the following channel:

| module | object | version |
|--------|--------|---------|
|flist |[flist](#public-interface)| 0.0.1

## Home Directory

flist keeps some data in the following locations:

| directory | path|
|----|---|
| root| `/var/cache/modules/containerd`|

## Introduction

This module is responsible for "mounting an flist" into the filesystem of the node. The mounted directory contains all the files required by containers or (in the future) VMs.

The flist module interface is very simple. It does not expose any way to choose where to mount the flist, and it holds no reference to containers or VMs. Its only functionality is to mount a given flist and return the location where it is mounted. It is up to the layer above to do something useful with this information.

The flist module itself doesn't contain the logic to understand the flist format or to run the fuse filesystem. It is just a wrapper that manages [0-fs](https://github.com/threefoldtech/0-fs) processes.

Its only job is to download the flist, prepare the isolation of all the data, and then start 0-fs with the proper arguments.

## Public interface [](https://godoc.org/github.com/threefoldtech/zos/pkg/flist)

```go

// Flister is the interface for the flist module
type Flister interface {
	// Mount mounts an flist located at url using the 0-db located at storage
	// in RO mode. Note that there is no way you can unmount a RO flist because
	// it can be shared by many users; it's then up to the system to decide if the
	// mount is not needed anymore and clean it up
	Mount(name, url string, opt MountOptions) (path string, err error)

	// UpdateMountSize changes the mount size
	UpdateMountSize(name string, limit gridtypes.Unit) (path string, err error)

	// Unmount unmounts a RW mount. This only unmounts the RW layer and removes
	// the assigned volume.
	Unmount(name string) error

	// HashFromRootPath returns the flist hash of a running g8ufs mounted with NamedMount
	HashFromRootPath(name string) (string, error)

	// FlistHash returns the md5 of the flist if available (requesting the hub)
	FlistHash(url string) (string, error)

	Exists(name string) (bool, error)
}

```

## zinit unit

The zinit unit file of the module specifies the command line, test command, and the order in which the services need to be booted.

The flist module depends on the storage and network pkgs:
it needs connectivity to download flists and data, and it needs storage to be able to cache the data once downloaded.

Flist doesn't do anything special on the system except creating a bunch of directories it will use during its lifetime.
@@ -1,121 +0,0 @@

# Gateway Module

## ZBus

The gateway module is available on zbus over the following channel

| module  | object                | version |
| ------- | --------------------- | ------- |
| gateway | [gateway](#interface) | 0.0.1   |

## Home Directory

gateway keeps some data in the following locations

| directory | path                         |
| --------- | ---------------------------- |
| root      | `/var/cache/modules/gateway` |

The directory `/var/cache/modules/gateway/proxy` contains the route information used by traefik to forward traffic.

## Introduction

The gateway module is used to register traefik routes and services to act as a reverse proxy. It's the backend supporting two kinds of workloads: `gateway-fqdn-proxy` and `gateway-name-proxy`.

For the FQDN type, it receives the domain and a list of backends in the form `http://ip:port` or `https://ip:port` and registers a route for this domain, forwarding traffic to these backends. It's a requirement that the domain resolves to the gateway public IP. The `tls_passthrough` parameter determines whether the TLS termination happens on the gateway or in the backends. When it's true, the backends must be in the form `https://ip:port` and must be https-enabled servers.

The name type is the same as the FQDN type except that the `name` parameter is added as a prefix to the gateway domain to determine the FQDN. It's forbidden to use a FQDN type workload to reserve a domain managed by the gateway.

The FQDN type is enabled only if there's a public config on the node. The name type works only if a domain exists in the public config. To make a full-fledged gateway node, these DNS records are required:

```
gatewaydomain.com                  A      ip.of.the.gateway
*.gatewaydomain.com                CNAME  gatewaydomain.com
_acme-challenge.gatewaydomain.com  NS     gatewaydomain.com
```

### zinit unit

```yaml
exec: gateway --broker unix:///var/run/redis.sock --root /var/cache/modules/gateway
after:
  - boot
```

## Implementation details

Traefik is used as the reverse proxy forwarding traffic to upstream servers. All workloads deployed on the node are associated with a domain that resolves to the node IP. In the name workload case, it's a subdomain of the gateway main domain. In the FQDN case, the user must create a DNS A record pointing it to the node IP. The node by default redirects all http traffic to https.

When an https request reaches the node, it looks at the domain and determines the correct service that should handle the request. The service definitions are in `/var/cache/modules/gateway/proxy/` and are hot-reloaded by traefik every time a service is added to or removed from it. Zos currently supports enabling `tls_passthrough`, in which case the https request is passed as-is to the backend (at the TCP level). By default `tls_passthrough` is false, which means the node terminates the TLS traffic and then forwards the request as http to the backend.

Example of a FQDN service definition with tls_passthrough enabled:

```yaml
tcp:
  routers:
    37-2039-testname-route:
      rule: HostSNI(`remote.omar.grid.tf`)
      service: 37-2039-testname
      tls:
        passthrough: "true"
  services:
    37-2039-testname:
      loadbalancer:
        servers:
          - address: 137.184.106.152:443
```

Example of a "name" service definition with tls_passthrough disabled:

```yaml
http:
  routers:
    37-1976-workloadname-route:
      rule: Host(`workloadname.gent01.dev.grid.tf`)
      service: 40-1976-workloadname
      tls:
        certResolver: dnsresolver
        domains:
          - sans:
              - '*.gent01.dev.grid.tf'
  services:
    40-1976-workloadname:
      loadbalancer:
        servers:
          - url: http://[backendip]:9000
```

The `certResolver` option has two valid values, `resolver` and `dnsresolver`. The `resolver` is an http resolver and is used in FQDN services with `tls_passthrough` disabled. It uses the http challenge to generate a single-domain certificate. The `dnsresolver` is used for name services with `tls_passthrough` disabled. The `dnsresolver` is responsible for generating a wildcard certificate to be used for all subdomains of the gateway domain. Its flow is described below.

The CNAME record is used to make all subdomains (reserved or not) resolve to the IP of the gateway. Generating a wildcard certificate requires adding a TXT record at `_acme-challenge.gatewaydomain.com`. The NS record is used to delegate this specific subdomain to the node. So if someone did `dig TXT _acme-challenge.gatewaydomain.com`, the query is served by the node, not the DNS provider used for the gateway domain.

Traefik has, as a config parameter, multiple DNS [providers](https://doc.traefik.io/traefik/https/acme/#providers) to communicate with when it wants to add the required TXT record. For non-supported providers, a bash script can be provided to do the record generation and cleanup (i.e. an external program). The bash [script](https://github.com/threefoldtech/zos/blob/main/pkg/gateway/static/cert.sh) starts dnsmasq managing a DNS zone for the `_acme-challenge` subdomain with the given TXT record. It then kills the dnsmasq process and removes the config file during cleanup.

## Interface

```go
type Backend string

// GatewayFQDNProxy definition. this will proxy fqdn to backends
type GatewayFQDNProxy struct {
	// FQDN the fully qualified domain name to use (cannot be present with Name)
	FQDN string `json:"fqdn"`

	// TLSPassthrough whether to pass TLS traffic as-is or not
	TLSPassthrough bool `json:"tls_passthrough"`

	// Backends are list of backend ips
	Backends []Backend `json:"backends"`
}

// GatewayNameProxy definition. this will proxy name.<zos.domain> to backends
type GatewayNameProxy struct {
	// Name the name prefix to use (cannot be present with FQDN)
	Name string `json:"name"`

	// TLSPassthrough whether to pass TLS traffic as-is or not
	TLSPassthrough bool `json:"tls_passthrough"`

	// Backends are list of backend ips
	Backends []Backend `json:"backends"`
}

type Gateway interface {
	SetNamedProxy(wlID string, prefix string, backends []string, TLSPassthrough bool) (string, error)
	SetFQDNProxy(wlID string, fqdn string, backends []string, TLSPassthrough bool) error
	DeleteNamedProxy(wlID string) error
	Metrics() (GatewayMetrics, error)
}
```
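Two of the rules above are easy to capture in code: the name workload's domain is the reserved name prefixed to the gateway domain, and `tls_passthrough` constrains the backend scheme. The helpers below are our illustration, not part of the zos Gateway interface:

```go
package main

import (
	"fmt"
	"strings"
)

// nameToFQDN composes the domain served by a gateway-name-proxy workload:
// the reserved name is prefixed to the gateway's own domain.
func nameToFQDN(name, gatewayDomain string) string {
	return name + "." + gatewayDomain
}

// validateBackend checks the constraint described earlier: with
// tls_passthrough enabled, backends must be in the https://ip:port form.
func validateBackend(backend string, tlsPassthrough bool) error {
	if tlsPassthrough && !strings.HasPrefix(backend, "https://") {
		return fmt.Errorf("tls_passthrough requires an https backend, got %q", backend)
	}
	return nil
}

func main() {
	fmt.Println(nameToFQDN("workloadname", "gent01.dev.grid.tf"))
	fmt.Println(validateBackend("http://10.0.0.1:9000", true)) // rejected: not https
}
```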
@@ -1,99 +0,0 @@

# 0-OS, a bit of history and introduction to Version 2

## Once upon a time

----
A few years ago, we were trying to come up with solutions to the problem of self-healing IT.
We boldly stated that the current model of cloud computing in huge data centers is not going to be able to scale to fit the demand in IT capacity.

The approach we took to solve this problem was to enable localized compute and storage units at the edge of the network, close to where they are needed.
That basically meant that if we were to deploy physical hardware to the edges, nearby the users, we would have to allow information providers to deploy their solutions on that edge network and hardware. That also means sharing hardware resources between users, where we would have to make damn sure no one can peek around in things that are not his.

When we talk about sharing capacity in a secure environment, virtualization comes to mind. It's not a new technology and it has been around for quite some time. This solution comes with a cost though. Virtual machines, emulating a full hardware platform on real hardware, are costly in terms of used resources, and eat away at the already scarce resources we want to provide for our users.

Containerization technologies were starting to get some hype at the time. Containers provide basically the same level of isolation as full virtualization, but are a lot less expensive in terms of resource utilization.

With that in mind, we started designing the first version of 0-OS. The required features were:

- be fully in control of the hardware
- give different users the possibility to share the same hardware
- deploy this capacity at the edge, close to where it is needed
- self-heal. Because of their location and sheer scale, manual maintenance was not an option. Self-healing is a broad topic, and will require a lot of experience and fine-tuning, but it was meant to culminate at some point in automating most of the actions that sysadmins execute.
- have as small an attack surface as possible, both against remote types of attack and for protecting users from each other

That thought process resulted in 0-OS v1: a Linux kernel with the minimal components on top to provide these features.

In the first incarnation of 0-OS, the core framework was a single big binary that got started as the first process of the system (PID 1). All the management features were exposed through an API that was only accessible locally.

The idea was to have an orchestration system running on top that was going to be responsible for deploying virtual machines and containers on the system using that API.

This API exposes 3 main primitives:

- networking: zerotier, vlan, macvlan, bridge, openvswitch...
- storage: plain disk, 0-db, ...
- compute: VMs, containers

That was all great and it allowed us to learn a lot. But some limitations started to appear. Here is a non-exhaustive list of the limitations we had to face after a couple of years of utilization:

- Difficulty to push new versions and fixes to the nodes. The fact that 0-OS was a single process running as PID 1 forced us to completely reboot the node every time we wanted to push an update.
- The API, while powerful, still required some logic on top to actually deploy usable solutions.
- We noticed that some features we implemented were never or extremely rarely used. This just increased the possible attack surface for no real benefit.
- The main networking solution we chose at the time, zerotier, was not scaling as well as we hoped.
- We wrote a lot of code ourselves instead of relying on existing open source libraries that would have made the task a lot easier. These libraries were also a lot more mature and had a lot more exposure for ironing out possible bugs and vulnerabilities than we could have created and tested ourselves with the little resources we have at hand.

## Now what?

With the knowledge and lessons gathered during these first years of usage, we concluded that trying to fix the already existing codebase would be cumbersome, and we also wanted to avoid any technical debt that could haunt us for years after. So we decided on a complete rewrite of that stack, taking a new and fully modular approach, where every component could be easily replaced and upgraded without the need for a reboot.

Hence Version 2 saw the light of day.

Instead of trial and error, and muddling along trying to fit new features in that big monolithic codebase, we wanted to be sure that the components were reduced to a more manageable size, having a clearly cut domain separation.

Instead of creating solutions waiting for a problem, we started looking at things the other way around. Which is logical, as by now we had learned what the real puzzles to solve were, albeit sometimes by painful experience.

## Tadaa!

----
The [first commit](https://github.com/threefoldtech/zosv2/commit/7b783c888673d1e9bc400e4abbb17272e995f5a4) of the v2 repository took place on the 11th of February 2019.
We are now 6 months in, and about to bake the first release of 0-OS v2.
Clocking in at almost 27 KLoC, it was a very busy half-year. (Admittedly, the specs and docs are in that count too ;-) )

Let's go over the main design decisions that were made and briefly explain each component.

While this is just an introduction, we'll add more articles digging deeper into the technicalities and approaches of each component.

## Solutions to puzzles (there are no problems)

----
**UPDATES**

One of the first puzzles we wanted to solve was the difficulty of pushing upgrades.
To solve that, we designed 0-OS components as completely stand-alone modules. Each subsystem, be it storage, networking, or containers/VMs, is managed by its own component (mostly a daemon), and they communicate with each other through a local bus. As we said, each component can then be upgraded separately, together with any data migrations that might be required.

**WHAT API?**

The second big change is our approach to the API, or better, the lack thereof.
In V2 we dropped the idea of exposing the primitives of the node over an API.
Instead, all the required knowledge to deploy workloads is directly embedded in 0-OS.
So in order to have the node deploy a workload, we have created a blueprint-like system where the user describes his requirements in terms of compute power, storage and networking, and the node applies that blueprint to make it reality.
That approach has a few advantages:

- It greatly reduces the attack surface of the node because there is no more direct interaction between a user and a node.
- It allows us to have greater control over how things are organized in the node itself. The node, being its own boss, can decide to reorganize itself whenever needed to optimize the capacity it can provide.
- Having a blueprint with requirements gives the grid the possibility to verify that blueprint on multiple levels before applying it. That is: both on the top level and on the node level, a blueprint can be verified for validity and signatures before any other action is executed.

**PING**

The last major change is how we want to handle networking.
The solution used during the lifetime of V1 exposed its limitations when we started scaling our networks to hundreds of nodes.
So here again we started from scratch and created our own overlay network solution.
That solution is based on the 'new kid on the block' in terms of VPN: [Wireguard](https://wireguard.io). Its approach and usage will be fully explained in the next 0-OS article.
For the eager ones among you, there are some specifications and documentation [here](https://github.com/threefoldtech/zosv2/tree/master/docs/network) and [there](https://github.com/threefoldtech/zosv2/tree/master/specs/network).

## That's All, Folks (for now)

Consider this little article an intro to the brave new world of 0-OS.
The Zero-OS team commits to keeping you regularly updated on its progress, on the new features that will surely be added, and, for the so inclined, to adding a lot more content for techies on how to actually use this novel beast.

[Till next time](https://youtu.be/b9434BoGkNQ)
@@ -1,143 +0,0 @@

<h1> Node ID Generation</h1>

<h2> Table of Contents </h2>

- [Introduction](#introduction)
- [ZBus](#zbus)
- [Home Directory](#home-directory)
- [Introduction](#introduction-1)
- [On Node Booting](#on-node-booting)
- [ID generation](#id-generation)
- [Cryptography](#cryptography)
- [zinit unit](#zinit-unit)
- [Interface](#interface)

***

## Introduction

We explain the node ID generation process.

## ZBus

The identity module is available on zbus over the following channel

| module   | object                | version |
| -------- | --------------------- | ------- |
| identity | [manager](#interface) | 0.0.1   |

## Home Directory

identity keeps some data in the following locations

| directory | path                          |
| --------- | ----------------------------- |
| root      | `/var/cache/modules/identity` |

## Introduction

The identity manager is responsible for maintaining the node identity (public key). The manager makes sure the node has one valid ID during the entire lifetime of the node. It also provides services to sign, encrypt and decrypt data using the node identity.

On first boot, the identity manager will generate an ID and then persist this ID for life.

Since the identity daemon is the only one that can access the node private key, it provides an interface to sign, verify and encrypt data. These methods are available for other modules on the local node to use.

## On Node Booting

- Check if the node already has a seed generated
- If yes, load the node identity
- If not, generate a new ID
- Start the zbus daemon.

## ID generation

At this time of development, the ID generated by identityd is the base58-encoded public key of an ed25519 key pair.

The key pair itself is generated from a random seed of 32 bytes. It is this seed that is actually saved on the node, and during boot the key pair is regenerated from this seed if it exists.

## Cryptography

The signing and encryption capabilities of the identity module rely on this ed25519 key pair.

For signing, the key pair is used directly.
For public key encryption, the ed25519 key pair is converted to its curve25519 equivalent, which is then used to encrypt the data.

### zinit unit

The zinit unit file of the module specifies the command line, test command, and the order in which the services need to be booted.

`identityd` requires `storaged` to make sure the seed is persisted over reboots, so the node keeps the same ID during its full lifetime.
The identityd daemon is only considered running if the seed file exists.

```yaml
exec: /bin/identityd
test: test -e /var/cache/modules/identity/seed.txt
after:
  - storaged
```

## Interface

For an up-to-date interface please check the code [here](https://github.com/threefoldtech/zos/blob/main/pkg/identity.go)

```go
package pkg

// Identifier is the interface that defines
// how an object can be used as an identity
type Identifier interface {
	Identity() string
}

// StrIdentifier is a helper type that implements the Identifier interface
// on top of a simple string
type StrIdentifier string

// Identity implements the Identifier interface
func (s StrIdentifier) Identity() string {
	return string(s)
}

// IdentityManager interface.
type IdentityManager interface {
	// NodeID returns the node id (public key)
	NodeID() StrIdentifier

	// NodeIDNumeric returns the node registered ID.
	NodeIDNumeric() (uint32, error)

	// FarmID returns the farm id this node is part of. This is usually a configuration
	// that the node is booted with. An error is returned if the farmer id is not configured
	FarmID() (FarmID, error)

	// Farm returns the name of the farm. Or an error
	Farm() (string, error)

	// FarmSecret gets the farm secret as defined in the boot params
	FarmSecret() (string, error)

	// Sign signs the message with privateKey and returns a signature.
	Sign(message []byte) ([]byte, error)

	// Verify reports whether sig is a valid signature of message by publicKey.
	Verify(message, sig []byte) error

	// Encrypt encrypts message with the public key of the node
	Encrypt(message []byte) ([]byte, error)

	// Decrypt decrypts message with the private key of the node
	Decrypt(message []byte) ([]byte, error)

	// EncryptECDH aes encrypts msg using a shared key derived from the private key of the node
	// and the public key of the other party using the Elliptic curve Diffie-Hellman algorithm.
	// The nonce is prepended to the encrypted message
	EncryptECDH(msg []byte, publicKey []byte) ([]byte, error)

	// DecryptECDH decrypts an aes encrypted msg using a shared key derived from the private key
	// of the node and the public key of the other party using the Elliptic curve Diffie-Hellman algorithm
	DecryptECDH(msg []byte, publicKey []byte) ([]byte, error)

	// PrivateKey returns the key pair
	PrivateKey() []byte
}

// FarmID is the identification of a farm
type FarmID uint32
```
@@ -1,8 +0,0 @@

<h1> Identity Module </h1>

The identity daemon is responsible for two major operations that are crucial for node operation.

<h2> Table of Contents </h2>

- [Node ID Generation](identity.md)
- [Node Live Software Update](upgrade.md)
@@ -1,98 +0,0 @@

<h1> Node Upgrade</h1>

<h2> Table of Contents </h2>

- [Introduction](#introduction)
- [Philosophy](#philosophy)
- [Booting a new node](#booting-a-new-node)
- [Runtime upgrade of a node](#runtime-upgrade-of-a-node)
- [Technical](#technical)
- [Flist layout](#flist-layout)

***

## Introduction

We provide information concerning node upgrade with ZOS. We also explain the philosophy behind ZOS.

## Philosophy

0-OS is meant to be a black box no one can access. While this provides some nice security features, it also makes it harder to manage, especially when it comes to updates/upgrades.

Hence, zos only trusts a few sources for upgrade packages. When the node boots up it checks the sources for the latest release and makes sure all the local binaries are up-to-date before continuing the boot. The flist source must be rock-solid secured; that's another topic for different documentation.

The run mode defines which flist the node is going to use to boot. The run mode can be specified by passing `runmode=<mode>` to the kernel boot params. Currently we have these run modes:

- dev: ephemeral network only set up to develop and test new features. Can be created and reset at any time.
- test: Mostly stable features that need to be tested at scale; allows preview and testing of new features. Always the latest and greatest. This network can be reset sometimes, but should be relatively stable.
- prod: Releases of stable versions. Used to run the real grid with real money. Cannot ever be reset. Only stable and battle-tested features reach this level.

## Booting a new node

The base image for zos contains a very small subset of tools, plus the boot program. Standing alone, the image is not really useful. On boot and after initial start of the system, the boot program kicks in and does the following:

- Detect the boot flist that the node must use to fully start. The default is hard-coded into zos, but this can be overridden by the `flist=` kernel param. The `flist=` kernel param can get deprecated without a warning, since it's a development flag.
- The bootstrap will then mount this flist using 0-fs; this of course requires a working connection to the internet. Hence bootstrap is configured to wait for the `internet` service.
- The flist information (name, and version) is saved under `/tmp/flist.name` and `/tmp/flist.info`.
- The bootstrap makes sure to copy all files in the flist to the proper locations under the system rootfs; this includes the `zinit` config files.
- Then zinit is asked to monitor the newly installed services; zinit takes care of those services and makes sure they are properly working at all times.
- Bootstrap unmounts the flist and cleans up before it exits.
- The boot process continues.

## Runtime upgrade of a node

Once the node is up and running, identityd takes over and does the following:

- It loads the boot info files `/tmp/flist.name` and `/tmp/flist.info`.
- If the `flist.name` file does **not** exist, `identityd` will assume the node was booted by other means than an flist (for example an overlay). In that case, identityd will log this and disable live upgrade of the node.
- If the `flist.name` file exists, the flist will be monitored on `https://hub.grid.tf` for changes. Any change in the version will initiate a live upgrade routine.
- Once an flist change is detected, identityd will mount the flist and make sure identityd itself is running the latest version. If not, identityd will update itself first before continuing.
- Services that need an update will be gracefully stopped.
- `identityd` will then make sure to update all services and config files from the flist, and restart the services properly.
- Services are started again after all binaries have been copied.
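The detection step of the routine above can be sketched as follows. `FlistHash` mirrors the call on the flist module's interface documented elsewhere in these docs; the watcher itself and all names here are our illustration, not identityd's real code:

```go
package main

import "fmt"

// hashFetcher abstracts the flist module's FlistHash(url) call that the
// upgrade routine uses to ask the hub for the current flist hash.
type hashFetcher func(url string) (string, error)

// upgradeNeeded compares the hash the node booted with against the hub's
// latest. The real routine then mounts the flist, updates identityd itself
// first, and gracefully stops, updates, and restarts the affected services.
func upgradeNeeded(fetch hashFetcher, url, runningHash string) (bool, error) {
	latest, err := fetch(url)
	if err != nil {
		return false, err
	}
	return latest != runningHash, nil
}

func main() {
	// fake fetcher standing in for a request to the hub
	fake := func(url string) (string, error) { return "newhash", nil }
	upgrade, err := upgradeNeeded(fake, "https://hub.grid.tf/tf-zos/zos:production-3:latest.flist", "oldhash")
	if err != nil {
		panic(err)
	}
	fmt.Println("upgrade needed:", upgrade)
}
```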

## Technical

0-OS is designed to provide maximum uptime for its workloads; rebooting a node should never be required to upgrade any of its components (except when we push a kernel upgrade).



### Flist layout

The files in the upgrade flist need to be located in the filesystem tree at the same destination they would need to be in 0-OS. This allows the upgrade code to stay simple and only do a copy from the flist to the root filesystem of the node.

Booting a new node and updating a node use the same flist. Hence, a boot flist must contain all required services for node operation.

Example:

0-OS filesystem:

```
/etc/zinit/identityd.yaml
/etc/zinit/networkd.yaml
/etc/zinit/contd.yaml
/etc/zinit/init/node-ready.sh
/etc/zinit/init
/etc/zinit/redis.yaml
/etc/zinit/storaged.yaml
/etc/zinit/flistd.yaml
/etc/zinit/readme.md
/etc/zinit/internet.yaml
/etc/zinit/containerd.yaml
/etc/zinit/boot.yaml
/etc/zinit/provisiond.yaml
/etc/zinit/node-ready.yaml
/etc/zinit
/etc
/bin/zlf
/bin/provisiond
/bin/flistd
/bin/identityd
/bin/contd
/bin/capacityd
/bin/storaged
/bin/networkd
/bin/internet
/bin
```
@@ -1,88 +0,0 @@
|
||||
<h1> Internal Modules</h1>
|
||||
|
||||
<h2> Table of Contents </h2>
|
||||
|
||||
- [Introduction](#introduction)
|
||||
- [Booting](#booting)
|
||||
- [Bootstrap](#bootstrap)
|
||||
- [Zinit](#zinit)
|
||||
- [Architecture](#architecture)
|
||||
- [IPC](#ipc)
|
||||
- [ZOS Processes (modules)](#zos-processes-modules)
|
||||
- [Capacity](#capacity)
|
||||
|
||||
***
|
||||
|
||||
## Introduction
|
||||
|
||||
This document explains in a nutshell the internals of ZOS. This includes the boot process, architecture, the internal modules (and their responsibilities), and the inter-process communication.
|
||||
|
||||
## Booting
|
||||
|
||||
ZOS is a linux based operating system in the sense that we use the main-stream linux kernel with no modifications (but heavily customized). The base image of ZOS includes linux, busybox, [zinit](https://github.com/threefoldtech/zinit) and other required tools that are needed during the boot process. The base image is also shipped with a bootstrap utility that is self-updating on boot which kick starts everything.
|
||||
|
||||
For more details about the ZOS base image please check [0-initramfs](https://github.com/threefoldtech/0-initramfs).
|
||||
|
||||
`ZOS` uses zinit as its `init` or `PID 1` process. `zinit` acts as a process manager and it takes care of starting all required services in the right order. Using simple configuration that is available under `/etc/zinit`.
|
||||
|
||||
The base `ZOS` image has a zinit config to start the basic services that are required for booting. These include (mainly) but are not limited to:
|
||||
|
||||
- internet: A very basic service that tries to connect zos to the internet as fast (and as simple) as possible (over ethernet) using dhcp. This is needed so the system can continue the boot process. Once this one succeeds, it exits and leaves node network management to the more sophisticated ZOS module `networkd` which is yet to be downloaded and started by bootstrap.
|
||||
- redis: This is required by all zos modules for its IPC (inter process communication).
|
||||
- bootstrap: The bootstrap process which takes care of downloading all required zos binaries and modules. This one requires the `internet` service to actually succeed.
|
||||
|
||||
## Bootstrap
|
||||
|
||||
`bootstrap` is a utility that resides on the base image. It takes care of downloading and configuring all zos main services by doing the following:
|
||||
|
||||
- It checks if there is a more recent version of itself available. If it exists, the process first updates itself before proceeding.
|
||||
- It checks zos boot parameters (for example, which network you are booting into) as set by <https://bootstrap.grid.tf/>.
|
||||
- Once the network is known, let's call it `${network}`. This can either be `production`, `testing`, or `development`. The proper release is downloaded as follows:
|
||||
- All flists are downloaded from one of the [hub](https://hub.grid.tf/) `tf-zos-v3-bins.dev`, `tf-zos-v3-bins.test`, or `tf-zos-v3-bins` repos. Based on the network, only one of those repos is used to download all the support tools and binaries. Those are not included in the base image because they can be updated, added, or removed.
|
||||
- The flist `https://hub.grid.tf/tf-zos/zos:${network}-3:latest.flist.md` is downloaded (note that ${network} is replaced with the actual value). This flist includes all zos services from this repository. More information about the zos modules are explained later.
|
||||
- Once all binaries are downloaded, `bootstrap` finishes by asking zinit to start monitoring the newly installed services. The bootstrap exits and will never be started again as long as zos is running.
|
||||
- If zos is restarted the entire bootstrap process happens again including downloading the binaries because ZOS is completely stateless (except for some cached runtime data that is preserved across reboots on a cache disk).
|
||||
|
||||
## Zinit
|
||||
|
||||
As mentioned earlier, `zinit` is the process manager of zos. Bootstrap makes sure it registers all zos services for zinit to monitor. This means that zinit will take care that those services are always running, and restart them if they have crashed for any reason.
|
||||
|
||||
## Architecture

For `ZOS` to be able to run workloads of different types, its functionality is split into smaller modules, where each module is responsible for a single concern. For example, `storaged` manages the machine's storage, and hence can provide low level storage capacity to other services that need it.

As an example, imagine that you want to start a `virtual machine`. For a `virtual machine` to run, it requires a `rootfs` image, or the image of the VM itself; this is normally provided via an `flist` (managed by `flistd`). You then need actual persistent storage (managed by `storaged`), a virtual nic (managed by `networkd`), and another service that can put everything together in the form of a VM (`vmd`). Finally, you need a service that orchestrates all of this and translates the user request into an actual workload: `provisiond`. You get the picture.
### IPC

All modules running in zos need to be able to interact with each other, as the previous example shows. For example, the `provision` daemon needs to be able to ask the `storage` daemon to prepare a virtual disk. A new `inter-process communication` protocol and library was developed to enable this, with these extra features:

- Modules do not need to know where other modules live; there are no ports and/or urls that have to be known by all services.
- A single module can run multiple versions of an API.
- Ease of development.
- Auto generated clients.

For more details about the message bus please check [zbus](https://github.com/threefoldtech/zbus).

`zbus` uses redis as a message bus, hence redis is started in the early stages of zos booting.

`zbus` allows auto generation of `stubs`, which are generated clients against a certain module interface. Hence a module X can interact with a module Y by importing the generated client and then making function calls.
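The stub idea can be illustrated with a toy, in-memory dispatcher. This is not the real `zbus` API (which runs over redis); it is only a sketch of how a generated client hides the transport behind ordinary function calls:

```go
package main

import (
	"errors"
	"fmt"
)

// bus maps "module.method" names to handlers; in zbus the transport
// is redis, here it is an in-memory map for illustration only.
type bus map[string]func(arg string) (string, error)

func (b bus) call(name, arg string) (string, error) {
	h, ok := b[name]
	if !ok {
		return "", errors.New("unknown method: " + name)
	}
	return h(arg)
}

// storageStub plays the role of a generated client: module X imports
// it and makes plain function calls, unaware of the transport.
type storageStub struct{ b bus }

func (s storageStub) DiskCreate(name string) (string, error) {
	return s.b.call("storage.DiskCreate", name)
}

func main() {
	b := bus{
		// the "storage" module registers its implementation
		"storage.DiskCreate": func(name string) (string, error) {
			return "/dev/vdisk/" + name, nil
		},
	}
	stub := storageStub{b}
	path, _ := stub.DiskCreate("vm1")
	fmt.Println(path)
}
```

In real zbus the handlers live in another process and the call is serialized over redis, but the caller's code looks the same: a plain method call on a stub.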
## ZOS Processes (modules)

Modules of zos are completely internal. There is no way for an external user to talk to them directly. The idea is that the node exposes a public API over rmb, while internally this API talks to the internal modules over `zbus`.

Here is a list of the major ZOS modules:

- [Identity](identity/index.md)
- [Node](node/index.md)
- [Storage](storage/index.md)
- [Network](network/index.md)
- [Flist](flist/index.md)
- [Container](container/index.md)
- [VM](vmd/index.md)
- [Provision](provision/index.md)
## Capacity

In [this document](./capacity.md), you can find a detailed description of how ZOS does capacity planning.
> Note: This is unmaintained, use at your own risk.

# MacOS Developer

0-OS (v2) uses a Linux kernel and is really built with a Linux environment in mind.
As a developer working from a MacOS environment you will have trouble running the 0-OS code.

Using [Docker][docker] you can work from a Linux development environment, hosted on your MacOS host machine.
In this README we'll do exactly that, using the standard Ubuntu [Docker][docker] container as our base.
## Setup

0. Make sure to have Docker installed and configured (also make sure you have your code folder path shared in your Docker preferences).
1. Start an _Ubuntu_ Docker container with your shared code directory mounted as a volume:
   ```bash
   docker run -ti -v "$HOME/oss":/oss ubuntu /bin/bash
   ```
2. Make sure your environment is updated and upgraded using `apt-get`.
3. Install Go (`1.13`) from source using the following link or the one you found on [the downloads page](https://golang.org/dl/):
   ```bash
   wget https://dl.google.com/go/go1.13.3.linux-amd64.tar.gz
   sudo tar -xvf go1.13.3.linux-amd64.tar.gz
   sudo mv go /usr/local
   ```
4. Add the following to your `$HOME/.bashrc` and `source` it:
   ```bash
   export GOROOT=/usr/local/go
   export GOPATH=$HOME/go
   export PATH=$GOPATH/bin:$GOROOT/bin:$PATH
   ```
5. Confirm you have Go installed correctly:
   ```bash
   go version && go env
   ```
6. Go to your `zos` code `pkg` directory, hosted from your MacOS development machine, within your docker `/bin/bash`:
   ```bash
   cd /oss/github.com/threefoldtech/zos/pkg
   ```
7. Install the dependencies for testing:
   ```bash
   make getdeps
   ```
8. Run tests and verify all works as expected:
   ```bash
   make test
   ```
9. Build `zos`:
   ```bash
   make build
   ```

If you can successfully do step (8) and step (9), you can now contribute to `zos` as a MacOS developer.
Testing and compiling you'll do from within your container's shell;
coding you can do from your beloved IDE on your MacOS development environment.

[docker]: https://www.docker.com
## Farmers providing transit for Tenant Networks (TN or Network)

For networks of a user to be reachable, these networks need penultimate Network Resources that act as exit nodes for the WireGuard mesh.

For that, Users need to solicit a routable network with farmers that provide such a service.

### Global registry for network resources (`GRNR`?)

Threefold, through BCDB, should keep a store where Farmers can also register a network service for Tenant Network (TN) reachability.

In a network transaction, the first thing asked should be where a user wants to purchase its transit. That can be with a nearby (latency or geolocation) Exit Provider (which can e.g. be a Farmer), or with an Exit Provider outside of the geolocation for easier routing towards the primary entrypoint (VPN-like services come to mind).

With this, we could envision in a later stage having the Network Resources be IPv6 multihomed with policy-based routing. That adds the possibility of having multiple exit nodes for the same Network, with different IPv6 routes to them.
### Datastructure

A registered Farmer can also register his (dc-located?) network to be sold as transit space. For that he registers:

- the IPv4 addresses that can be allocated to exit nodes
- the IPv6 prefix he obtained to be used in the Grid
- the nodes that will serve as exit nodes

These nodes need to have IPv[46] access to routable address space through:

- physical access on an interface of the node
- access on a public `vlan` or via `vxlan / mpls / gre`

Together with the registered nodes that will be part of that public segment, the TNoDB (BCDB) can verify a Network Object containing an ExitPoint for a Network and add it to the queue for ExitNodes to fetch and apply.

Physically, Nodes can be connected in several ways:

- living directly on the Internet (with a routable IPv4 and/or IPv6 address) without provider-enforced firewalling (outgoing traffic only)
- having an IPv4 allocation --and-- an IPv6 allocation
- having a single IPv4 address --and-- a single IPv6 allocation (/64) or even (Oh God Why) a single IPv6 address
- living in a Farm that has Nodes only reachable through NAT for IPv4 and no IPv6
- living in a Farm that has NAT IPv4 and routable IPv6 with an allocation
- living in a single segment having IPv4 RFC1918 and only one IPv6 /64 prefix (home Nodes mostly)
#### A Network Resource allocation

We define a Network Resource (NR) as a routable IPv6 `/64` prefix. So every time a new TNo is generated and validated, containing a new serial number and an added/removed NR, there has been a request to obtain a valid IPv6 prefix (/64) to be added to the TNo.

Basically it's just a list of allocations in that prefix that are in use. Any free prefix will do, as we do routing in the exit nodes with a `/64` granularity.

The TNoDB (BCDB) then validates/updates the Tenant Network Object with that new Network Resource and places it on a queue to be fetched by the interested Nodes.
#### The Nodes responsible for ExitPoints

A Node responsible for ExitPoints as well as a public endpoint will know so because of how it's registered in the TNoDB (BCDB). That is:

- it is defined as an exit node
- the TNoDB hands out an Object that describes its public connectivity, i.e.:
  - the public IPv4 address(es) it can use
  - the IPv6 prefix in the network segment that contains the penultimate default route
  - an eventual private BGP AS number for announcing the `/64` prefixes of a Tenant Network, and the BGP peer(s)

With that information, a Node can then build the Network Namespace from which it builds the WireGuard interfaces prior to sending them into the ExitPoint Namespace.

So the TNoDB (BCDB) hands out:

- Tenant Network Objects
- Public Interface Objects

They are related:

- a Node can have Network Resources
- a Network Resource can have (1) Public Interface
- both are part of a Tenant Network

A TNo defines a Network where ONLY the ExitPoint is flagged as being one. No more.
When the Node (networkd) needs to set up a public node, it will need to act differently:

- verify if the Node is **really** public; if so, use the standard WG interface setup
- if not, verify if there is already a Public Exit Namespace defined, and create the WG interface there
- if there is no Public Exit Namespace yet, request one, and set it up first
# Network

- [How does a farmer configure a node as exit node](#How-does-a-farmer-configure-a-node-as-exit-node)
- [How to create a user private network](#How-to-create-a-user-private-network)
## How does a farmer configure a node as exit node

For the network of the grid to work properly, some of the nodes in the grid need to be configured as "exit nodes". An "exit node" is a node that has a publicly accessible IP address and that is responsible for routing IPv6 traffic, or proxying IPv4 traffic.

A farmer that wants to configure one of his nodes as "exit node" needs to register it in the TNODB. The node will then automatically detect it has been configured to be an exit node and do the necessary network configuration to start acting as one.

At the current state of the development, we have a [TNODB mock](../../tools/tnodb_mock) server and a [tffarmer CLI](../../tools/tffarm) tool that can be used to do this configuration.

Here is an example of how a farmer could register one of his nodes as "exit node":
1. Farmer needs to create its farm identity:

```bash
tffarmer register --seed myfarm.seed "mytestfarm"
Farm registered successfully
Name: mytestfarm
Identity: ZF6jtCblLhTgAqp2jvxKkOxBgSSIlrRh1mRGiZaRr7E=
```
2. Boot your nodes with your farm identity specified in the kernel parameters.

Take the farm identity created in step 1 and boot your node with the kernel parameter `farmer_id=<identity>`.

For your test farm that would be `farmer_id=ZF6jtCblLhTgAqp2jvxKkOxBgSSIlrRh1mRGiZaRr7E=`

Once the node is booted, it will automatically register itself as being part of your farm into the [TNODB](../../tools/tnodb_mock) server.

You can verify that your node registered itself properly by listing all the nodes from the TNODB with a GET request on the `/nodes` endpoint:

```bash
curl http://tnodb_addr/nodes
[{"node_id":"kV3u7GJKWA7Js32LmNA5+G3A0WWnUG9h+5gnL6kr6lA=","farm_id":"ZF6jtCblLhTgAqp2jvxKkOxBgSSIlrRh1mRGiZaRr7E=","Ifaces":[]}]
```

3. Farmer needs to specify its public allocation range to the TNODB:

```bash
tffarmer give-alloc 2a02:2788:0000::/32 --seed myfarm.seed
prefix registered successfully
```
4. Configure the public interface of the exit node if needed.

In this step the farmer tells his node how it needs to connect to the public internet. This configuration depends on the farm network setup; this is why it is up to the farmer to provide the detail of how the node needs to configure itself.

In a first phase, we create the internet access in 2 ways:

- the node is fully public: you don't need to configure a public interface, and you can skip this step
- the node has a management interface and a NIC for public traffic:
  then `configure-public` is required, and the farmer has the public interface connected to a specific public segment with a router to the internet in front of it.

```bash
tffarmer configure-public --ip 172.20.0.2/24 --gw 172.20.0.1 --iface eth1 kV3u7GJKWA7Js32LmNA5+G3A0WWnUG9h+5gnL6kr6lA=
#public interface configured on node kV3u7GJKWA7Js32LmNA5+G3A0WWnUG9h+5gnL6kr6lA=
```

We still need to figure out a way to get the routes properly installed; we'll do static routes on the top-level router for now to do a demo.

The node is now configured to be used as an exit node.
5. Mark a node as being an exit node.

The farmer then needs to select which node he agrees to use as an exit node for the grid:

```bash
tffarmer select-exit kV3u7GJKWA7Js32LmNA5+G3A0WWnUG9h+5gnL6kr6lA=
#Node kV3u7GJKWA7Js32LmNA5+G3A0WWnUG9h+5gnL6kr6lA= marked as exit node
```
## How to create a user private network

1. Choose an exit node
2. Request a new allocation from the farm of the exit node
   - a GET request on the tnodb_mock at `/allocations/{farm_id}` will give you a new allocation
3. Create the network schema

Steps 1 and 2 are easy enough to be done even manually, but step 3 requires deep knowledge of how networking works
as well as the specific requirements of the 0-OS network system.
This is why we provide a tool that simplifies this process for you: [tfuser](../../tools/tfuser).

Using tfuser, creating a network becomes trivial:

```bash
# creates a new network with node DLFF6CAshvyhCrpyTHq1dMd6QP6kFyhrVGegTgudk6xk as exit node
# and outputs the result into network.json
tfuser generate --schema network.json network create --node DLFF6CAshvyhCrpyTHq1dMd6QP6kFyhrVGegTgudk6xk
```

network.json will now contain something like:
```json
{
  "id": "",
  "tenant": "",
  "reply-to": "",
  "type": "network",
  "data": {
    "network_id": "J1UHHAizuCU6s9jPax1i1TUhUEQzWkKiPhBA452RagEp",
    "resources": [
      {
        "node_id": {
          "id": "DLFF6CAshvyhCrpyTHq1dMd6QP6kFyhrVGegTgudk6xk",
          "farmer_id": "7koUE4nRbdsqEbtUVBhx3qvRqF58gfeHGMRGJxjqwfZi",
          "reachability_v4": "public",
          "reachability_v6": "public"
        },
        "prefix": "2001:b:a:8ac6::/64",
        "link_local": "fe80::8ac6/64",
        "peers": [
          {
            "type": "wireguard",
            "prefix": "2001:b:a:8ac6::/64",
            "Connection": {
              "ip": "2a02:1802:5e::223",
              "port": 1600,
              "key": "PK1L7n+5Fo1znwD/Dt9lAupL19i7a6zzDopaEY7uOUE=",
              "private_key": "9220e4e29f0acbf3bd7ef500645b78ae64b688399eb0e9e4e7e803afc4dd72418a1c5196208cb147308d7faf1212758042f19f06f64bad6ffe1f5ed707142dc8cc0a67130b9124db521e3a65e4aee18a0abf00b6f57dd59829f59662"
            }
          }
        ],
        "exit_point": true
      }
    ],
    "prefix_zero": "2001:b:a::/64",
    "exit_point": {
      "ipv4_conf": null,
      "ipv4_dnat": null,
      "ipv6_conf": {
        "addr": "fe80::8ac6/64",
        "gateway": "fe80::1",
        "metric": 0,
        "iface": "public"
      },
      "ipv6_allow": []
    },
    "allocation_nr": 0,
    "version": 0
  }
}
```
Which is a valid network schema. This network only contains a single exit node though, so it is not really useful yet.
Let's add another node to the network:

```bash
tfuser generate --schema network.json network add-node --node 4hpUjrbYS4YeFbvLoeSR8LGJKVkB97JyS83UEhFUU3S4
```

The result looks like:
```json
{
  "id": "",
  "tenant": "",
  "reply-to": "",
  "type": "network",
  "data": {
    "network_id": "J1UHHAizuCU6s9jPax1i1TUhUEQzWkKiPhBA452RagEp",
    "resources": [
      {
        "node_id": {
          "id": "DLFF6CAshvyhCrpyTHq1dMd6QP6kFyhrVGegTgudk6xk",
          "farmer_id": "7koUE4nRbdsqEbtUVBhx3qvRqF58gfeHGMRGJxjqwfZi",
          "reachability_v4": "public",
          "reachability_v6": "public"
        },
        "prefix": "2001:b:a:8ac6::/64",
        "link_local": "fe80::8ac6/64",
        "peers": [
          {
            "type": "wireguard",
            "prefix": "2001:b:a:8ac6::/64",
            "Connection": {
              "ip": "2a02:1802:5e::223",
              "port": 1600,
              "key": "PK1L7n+5Fo1znwD/Dt9lAupL19i7a6zzDopaEY7uOUE=",
              "private_key": "9220e4e29f0acbf3bd7ef500645b78ae64b688399eb0e9e4e7e803afc4dd72418a1c5196208cb147308d7faf1212758042f19f06f64bad6ffe1f5ed707142dc8cc0a67130b9124db521e3a65e4aee18a0abf00b6f57dd59829f59662"
            }
          },
          {
            "type": "wireguard",
            "prefix": "2001:b:a:b744::/64",
            "Connection": {
              "ip": "<nil>",
              "port": 0,
              "key": "3auHJw3XHFBiaI34C9pB/rmbomW3yQlItLD4YSzRvwc=",
              "private_key": "96dc64ff11d05e8860272b91bf09d52d306b8ad71e5c010c0ccbcc8d8d8f602c57a30e786d0299731b86908382e4ea5a82f15b41ebe6ce09a61cfb8373d2024c55786be3ecad21fe0ee100339b5fa904961fbbbd25699198c1da86c5"
            }
          }
        ],
        "exit_point": true
      },
      {
        "node_id": {
          "id": "4hpUjrbYS4YeFbvLoeSR8LGJKVkB97JyS83UEhFUU3S4",
          "farmer_id": "7koUE4nRbdsqEbtUVBhx3qvRqF58gfeHGMRGJxjqwfZi",
          "reachability_v4": "hidden",
          "reachability_v6": "hidden"
        },
        "prefix": "2001:b:a:b744::/64",
        "link_local": "fe80::b744/64",
        "peers": [
          {
            "type": "wireguard",
            "prefix": "2001:b:a:8ac6::/64",
            "Connection": {
              "ip": "2a02:1802:5e::223",
              "port": 1600,
              "key": "PK1L7n+5Fo1znwD/Dt9lAupL19i7a6zzDopaEY7uOUE=",
              "private_key": "9220e4e29f0acbf3bd7ef500645b78ae64b688399eb0e9e4e7e803afc4dd72418a1c5196208cb147308d7faf1212758042f19f06f64bad6ffe1f5ed707142dc8cc0a67130b9124db521e3a65e4aee18a0abf00b6f57dd59829f59662"
            }
          },
          {
            "type": "wireguard",
            "prefix": "2001:b:a:b744::/64",
            "Connection": {
              "ip": "<nil>",
              "port": 0,
              "key": "3auHJw3XHFBiaI34C9pB/rmbomW3yQlItLD4YSzRvwc=",
              "private_key": "96dc64ff11d05e8860272b91bf09d52d306b8ad71e5c010c0ccbcc8d8d8f602c57a30e786d0299731b86908382e4ea5a82f15b41ebe6ce09a61cfb8373d2024c55786be3ecad21fe0ee100339b5fa904961fbbbd25699198c1da86c5"
            }
          }
        ],
        "exit_point": false
      }
    ],
    "prefix_zero": "2001:b:a::/64",
    "exit_point": {
      "ipv4_conf": null,
      "ipv4_dnat": null,
      "ipv6_conf": {
        "addr": "fe80::8ac6/64",
        "gateway": "fe80::1",
        "metric": 0,
        "iface": "public"
      },
      "ipv6_allow": []
    },
    "allocation_nr": 0,
    "version": 1
  }
}
```
Our network schema is now ready, but before we can provision it onto a node, we need to sign it and send it to the BCDB.
To be able to sign it we need a key pair. You can use the `tfuser id` command to create an identity:

```bash
tfuser id --output user.seed
```

We can now provision the network on both nodes:

```bash
tfuser provision --schema network.json \
--node DLFF6CAshvyhCrpyTHq1dMd6QP6kFyhrVGegTgudk6xk \
--node 4hpUjrbYS4YeFbvLoeSR8LGJKVkB97JyS83UEhFUU3S4 \
--seed user.seed
```
#!/usr/bin/bash

# MAC addresses of the management NICs
mgmtnic=(
0c:c4:7a:51:e3:6a
0c:c4:7a:51:e9:e6
0c:c4:7a:51:ea:18
0c:c4:7a:51:e3:78
0c:c4:7a:51:e7:f8
0c:c4:7a:51:e8:ba
0c:c4:7a:51:e8:0c
0c:c4:7a:51:e7:fa
)

# MAC addresses of the IPMI NICs
ipminic=(
0c:c4:7a:4c:f3:b6
0c:c4:7a:4d:02:8c
0c:c4:7a:4d:02:91
0c:c4:7a:4d:02:62
0c:c4:7a:4c:f3:7e
0c:c4:7a:4d:02:98
0c:c4:7a:4d:02:19
0c:c4:7a:4c:f2:e0
)

# emit a DHCP host entry per management NIC (10.5.0.11 and up)
cnt=1
for i in ${mgmtnic[*]} ; do
cat << EOF
config host
option name 'zosv2tst-${cnt}'
option dns '1'
option mac '${i}'
option ip '10.5.0.$((${cnt} + 10))'

EOF
let cnt++
done

# emit a DHCP host entry per IPMI NIC (10.5.0.101 and up)
cnt=1
for i in ${ipminic[*]} ; do
cat << EOF
config host
option name 'ipmiv2tst-${cnt}'
option dns '1'
option mac '${i}'
option ip '10.5.0.$((${cnt} + 100))'

EOF
let cnt++
done

# print the symlink commands for the per-MAC boot config files
for i in ${mgmtnic[*]} ; do
echo ln -s zoststconf 01-$(echo $i | sed s/:/-/g)
done
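For the first management MAC (`cnt=1`), the first loop above emits a host stanza like the following (an OpenWrt-style `config host` entry, substituting the counter into the name and IP):

```
config host
option name 'zosv2tst-1'
option dns '1'
option mac '0c:c4:7a:51:e3:6a'
option ip '10.5.0.11'
```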
<h1> Definitions</h1>

<h2> Table of Contents </h2>

- [Introduction](#introduction)
- [Node](#node)
- [TNo : Tenant Network Object](#tno--tenant-network-object)
- [NR: Network Resource](#nr-network-resource)

***

## Introduction

We present definitions of words used throughout the documentation.

## Node

TL;DR: Computer.
A Node is a computer with CPU, memory, and disks (or SSDs, NVMe) connected to _a_ network that has Internet access (i.e. it can reach www.google.com, just like you on your phone, at home).
Once it has received an IP address (IPv4 or IPv6), that Node will register itself when it's new, or confirm its identity and its online-ness (for lack of a better word).

## TNo : Tenant Network Object

TL;DR: The Network Description.
We named it so because it is a data structure that describes the __whole__ network a user can request (or set up).
That network is a virtualized overlay network.
Basically that means that transfer of data in that network *always* is encrypted, protected from prying eyes, and __resources in that network can only communicate with each other__ **unless** there is a special rule that allows access. Be it by allowing access through firewall rules, *and/or* through a proxy (a service that forwards requests on behalf of, and ships replies back to, the client).

## NR: Network Resource

TL;DR: the Node-local part of a TNo.
The main building block of a TNo; i.e. each service of a user in a Node lives in an NR.
Each Node hosts User services, whatever type of service that is. Every service in that specific node will always be solely part of the Tenant's Network (read that twice).
So: a Network Resource is the thing that interconnects all other network resources of the TN (Tenant Network), and provides routing/firewalling for these interconnects, including the default route to the BBI (Big Bad Internet), aka the ExitPoint.
All User services that run in a Node are in some way or another connected to the Network Resource (NR), which provides IP packet forwarding and firewalling to all other network resources (including the ExitPoint) of the user's TN (Tenant Network). (Read that three times, and the last time, read it slowly and out loud.)
# 0-OS v2 and its network setup

## Introduction

0-OS nodes participating in the Threefold grid need connectivity, of course. They need to be able to communicate over
the Internet with each other in order to do various things:

- download OS modules
- perform OS module upgrades
- register themselves to the grid, and send regular updates about their status
- query the grid for tasks to execute
- build and run the Overlay Network
- download flists and the effective files to cache

The nodes themselves can have connectivity in a few different ways:

- Only RFC1918 private addresses, connected to the Internet through NAT, NO IPv6.
  Mostly, these are single-NIC (network card) machines that can host some workloads through the Overlay Network, but
  can't expose services directly. These are HIDDEN nodes, and are mostly booted with a USB stick from
  bootstrap.grid.tf .
- Dual-stacked: having RFC1918 private IPv4 and public IPv6, where the IPv6 addresses are received from a home router,
  but firewalled for outgoing traffic only. These nodes are effectively also HIDDEN.
- Nodes with 2 NICs: one NIC connected to a segment that has real public
  addresses (IPv4 and/or IPv6), and one NIC that is used for booting and local
  management (OOB) (like in the drawing for the farmer setup).

For Farmers, we need Nodes to be reachable over IPv6, so that the nodes can:

- expose services to be proxied into containers/vms
- act as aggregating nodes for Overlay Networks for HIDDEN Nodes

Some Nodes in Farms should also have a publicly reachable IPv4, to make sure that clients that only have IPv4 can
effectively reach exposed services.

But we need to stress the importance of IPv6 availability when you're running a multi-node farm in a datacentre: as the
grid is boldly claiming to be a new Internet, we should make sure we adhere to the new protocols that are future-proof.
Hence: IPv6 is the base, and IPv4 is just there to accommodate the transition.

Nowadays, RIPE can't even hand out consecutive /22 IPv4 blocks any more for new LIRs, so you'll be bound to the market to
get IPv4, mostly at rates of 10-15 Euro per IP. Things tend to get costly that way.

So anyway, IPv6 is not an afterthought in 0-OS; we're starting with it.
## Network setup for farmers

This is a quick manual on what is needed for connecting a node with zero-OS V2.0.

### Step 1. Testing for IPv6 availability in your location

As described above, the network in which the node is installed has to be IPv6 enabled. This is not an afterthought; as we are building a new internet, it has to be based on the new and forward-looking IP addressing scheme. This is something you have to investigate and negotiate with your connectivity provider. Many (but not all) home connectivity products, and certainly most datacenters, can provide you with IPv6. There are many sources of information on how to test and check whether your connection is IPv6 enabled; [here is a starting point](http://www.ipv6enabled.org/ipv6_enabled/ipv6_enable.php).

### Step 2. Choosing your setup for connecting your nodes

Once you have established that you have IPv6 enabled on the network you are about to deploy on, you have to make sure that there is an IPv6 DHCP facility available. Zero-OS does not work with static IPv6 addresses (at this point in time). So you have to choose and create one of the following setups:

#### 2.1 Home setup

Use your (home) ISP router's IPv6 DHCP capabilities to provide (private) IPv6 addresses. The principle works the same as for IPv4 home connections: everything happens via Network Address Translation (just like anything else that uses internet connectivity). This should be relatively straightforward if you have established that your connection has IPv6 enabled.

#### 2.2 Datacenter / Expert setup

In this situation there are many options for how to set up your node. This requires you, as the expert, to make a few decisions on how to connect, and on what the best setup is that you can support for the operational lifetime of your farm. The same basic principles apply:

- You have to have a block of (public) IPv6 routed to your router, or you have to have your router set up to provide Network Address Translation (NAT).
- You have to have a DHCP server in your network that manages and controls IPv6 address leases. Depending on your specific setup, you either have this DHCP server manage a public IPv6 range, which makes all nodes directly connected to the public internet, or you have this DHCP server manage a private block of IPv6 addresses, which makes all your nodes connect to the internet through NAT.

As a farmer, you are in charge of selecting and creating the appropriate network setup for your farm.

## General notes

The above setup will allow your node(s) to appear in the explorer on the TF Grid and will allow you to earn farming tokens. As stated in the introduction, ThreeFold is creating next generation internet capacity and therefore has IPv6 as its base building block. Connecting to the current (dominant) IPv4 network happens for IT workloads through so-called webgateways. As the word says, these are gateways that provide connectivity between the current leading IPv4 addressing scheme and IPv6.

We have started a forum where people share their experiences and configurations. This will be a work in progress, forever growing.

**IMPORTANT**: You as a farmer do not need access to IPv4 to be able to rent out capacity for IT workloads that need to be visible on IPv4; this is something that can happen elsewhere on the TF Grid.
<h1> Introduction to Networkd</h1>

<h2> Table of Contents </h2>

- [Introduction](#introduction)
- [Boot and initial setup](#boot-and-initial-setup)
- [Networkd functionality](#networkd-functionality)
- [Techie talk](#techie-talk)
- [Wireguard explanations](#wireguard-explanations)
- [Caveats](#caveats)

***

## Introduction

We provide an introduction to Networkd, the network manager of 0-OS.

## Boot and initial setup

At boot, be it from a USB stick or PXE, ZOS starts up the kernel with a few necessary parameters like farm ID and/or possible network parameters. Once the kernel has started, [zinit](https://github.com/threefoldtech/zinit), among other things, starts the network initializer.

In short, that process loops over the available network interfaces and tries to obtain an IP address that also provides a default gateway. That means: it tries to get Internet connectivity. Without it, ZOS stops there; as it would be unable to register itself or start other processes, there wouldn't be any use for it to be started anyway.

Once it has obtained Internet connectivity, ZOS can then proceed to make itself known to the Grid, and acknowledge its existence. It will then regularly poll the Grid for tasks.

Once initialized, with the network daemon running (a process that handles all things related to networking), ZOS sets up some basic services so that workloads can themselves use that network.
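The probing loop described above can be sketched as a small, pure helper: given the addresses an interface obtained, decide whether the attempt yielded a usable (non-link-local) unicast address. This is an illustration only, not the actual zos `internet` process, which additionally requires a default gateway:

```go
package main

import (
	"fmt"
	"net/netip"
)

// usable reports whether any of the interface's addresses is a
// plain unicast address (i.e. not loopback, link-local, multicast,
// or unspecified). RFC1918 addresses count too: HIDDEN nodes reach
// the Internet through NAT.
func usable(addrs []string) bool {
	for _, a := range addrs {
		ip, err := netip.ParseAddr(a)
		if err != nil {
			continue
		}
		if ip.IsGlobalUnicast() {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(usable([]string{"fe80::1"}))                // link-local only
	fmt.Println(usable([]string{"fe80::1", "192.168.1.7"})) // NATed unicast
}
```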
## Networkd functionality

The network daemon is responsible for a few tasks. Working together with the [provision daemon](../provision), it mainly sets up the local infrastructure for the user's network resources, together with the wireguard configurations for the user's mesh network.

The Wireguard mesh is an overlay network. That means that traffic on that network is encrypted and encapsulated in new traffic frames that then get transferred over the underlay network, in essence the network that was set up during boot of the node.

For users or workloads that run on top of the mesh, the mesh network looks and behaves like any other directly connected network, and a workload can reach other workloads or services in that mesh with the added advantage that the traffic is encrypted, protecting services and communications over that mesh from too curious eyes.

That also means that traffic between workloads on nodes in a farmer's local network is protected even from the farmer himself, in essence protecting the user from a farmer who might become too curious.

As the nodes do not provide any way to be accessed, be it over the underlying network or even the local console of the node, a user can be sure that their workload cannot be snooped upon.
## Techie talk

- **boot and initial setup**

  For ZOS to work at all (the network is the computer), it needs an Internet connection. That is: it needs to be able to communicate with the BCDB over the Internet.

  So ZOS starts with exactly that: the `internet` process tries to get the node an IP address. That process sets up a bridge (`zos`), connected to an interface that is on an Internet-capable network. That bridge will have an IP address that has Internet access.

  That bridge is also there for future public interfaces into workloads.

  Once ZOS can reach the Internet, the rest of the system can be started, until ultimately the `networkd` daemon is running.
- **networkd initial setup**

  `networkd` starts by enumerating the available network interfaces and registers them to the BCDB (grid database), so that farmers can specify non-standard configs, for example for multi-NIC machines. Once that is done, `networkd` registers itself to the zbus, so it can receive tasks to execute from the provisioning daemon (`provisiond`).

  These tasks are mostly setting up network resources for users, where a network resource is a subnet in the user's wireguard mesh.
- **multi-nic setups**

  When a farmer runs nodes in a datacentre where the nodes have multiple NICs, it is advisable (though not necessary) to separate OOB traffic (like the initial boot setup) from user traffic (the overlay network as well as the outgoing IPv4 NAT for nodes) onto different NICs. With such a setup, the farmer has to make sure the switches are properly configured; more on that in later docs.
- **registering and configurations**

  Once a node has booted and properly initialized, registering and configuring the node so it can accept workloads and their associated network configs is a two-step process.

  First, the node registers its live network setup to the BCDB. That is: all NICs with their associated IP addresses and routes are registered, so that a farm admin can, in a second phase, configure separate NICs to handle different kinds of workloads.

  In that second phase, a farm admin can then set up the NICs and their associated IPs manually, so that workloads can start using them.
## Wireguard explanations

- **wireguard as point-to-point links and what that means**

  Wireguard is a special type of VPN, where every instance is both a server for multiple peers and a client towards multiple peers. That way you can create fanning-out connections as well as receive connections from multiple peers, effectively creating a mesh of connections, like this: 
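  That fan-in/fan-out shape maps directly onto a wireguard configuration: one `[Interface]` with a listen port, plus one `[Peer]` section per remote. A sketch of a single node's config, where the keys, endpoints and subnets are purely illustrative and not real grid values:

  ```text
  # Hypothetical wireguard config of one mesh member (illustrative values only).
  [Interface]
  PrivateKey = <this-node-private-key>
  ListenPort = 51820

  # Each [Peer] makes this instance a client towards that peer,
  # while ListenPort makes it a server for peers connecting here.
  [Peer]
  PublicKey = <peer-A-public-key>
  Endpoint = 185.69.166.10:51820
  AllowedIPs = 10.1.2.0/24

  [Peer]
  PublicKey = <peer-B-public-key>
  Endpoint = [2a02:1802:5e::abcd]:51821
  AllowedIPs = 10.1.3.0/24
  ```

  The `AllowedIPs` entry per peer is what makes the mesh routable: it tells wireguard which subnet lives behind which peer.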
- **wireguard port management**

  Every wireguard endpoint (a network resource point) needs a destination/port combo when it is publicly reachable. The destination is a public IP, but the port is the differentiator. So we need to make sure that every network's wireguard listening port is unique in the node where it runs, and can be reapplied after a node reboot.

  ZOS registers the ports **already in use** to the BCDB, so a user can then pick a port that is not yet used.
- **wireguard and hidden nodes**

  Hidden nodes are nodes that are in essence hidden behind a firewall and unreachable from the Internet, be it as IPv4-NATed hosts or as IPv6 hosts that are firewalled in any way, so that it is impossible to initiate connections from the Internet to the node.

  As such, these nodes can only partake in a network as clients towards publicly reachable peers, and can only initiate the connections themselves (ref. previous drawing).

  To make sure connectivity stays up, all clients have a keepalive towards all their peers, so that communications towards network resources in hidden nodes can be established.
## Caveats

- **hidden nodes**

  Hidden nodes live (mostly) behind firewalls that keep state about connections, and these states have a lifetime. We try our best to keep these communications going, but depending on the firewall, your mileage may vary (YMMV ;-))
- **local underlay network reachability**

  When multiple nodes live in the same hidden network, at the moment we don't try to have the nodes establish connectivity between themselves, so all nodes in that hidden network can only reach each other through the intermediary of a publicly reachable node. So to get some performance, a farmer will have to have really routable nodes available in the vicinity.

  For now, a farmer is better off having his nodes really reachable over a public network.
- **IPv6 and IPv4 considerations**

  While the mesh can work over IPv4 __and__ IPv6 at the same time, a given peer can only be reached through one protocol at a time. That is: a peer is IPv4 __or__ IPv6, not both. Hence if a peer is reachable over IPv4, the client towards that peer needs to reach it over IPv4 too, and thus needs an IPv4 address.

  We strongly advise having all nodes properly set up on a routable, unfirewalled IPv6 network, so that these problems have no reason to exist.
<h1> Zero-Mesh</h1>

<h2> Table of Contents </h2>

- [What It Is](#what-it-is)
- [Overlay Network](#overlay-network)
- [ZOS networkd](#zos-networkd)
- [Internet reachability per Network Resource](#internet-reachability-per-network-resource)
- [Interworkings](#interworkings)
- [Network Resource Internals](#network-resource-internals)

***

## What It Is
When a user wants to deploy a workload, whatever that may be, that workload needs connectivity.

If there is just one service to be run, things can be simple, but in general more than one service needs to interact to provide a full stack. Sometimes these services can live on one node, but mostly these services will be deployed over multiple nodes, in different containers.

The Mesh was created for that: containers can communicate over an encrypted path, and the user can specify that network in terms of IP addresses.
## Overlay Network

Zero-Mesh is an overlay network. That requires nodes to have a properly working network with existing access to the Internet in the first place, be it full-blown public access, or behind a firewall/home router that provides private-IP NAT to the Internet.

Right now Zero-Mesh supports both, where nodes behind a firewall are HIDDEN nodes, and nodes that are directly connected, be it over IPv6 or IPv4, are 'normal' nodes.

Hidden nodes can thus only participate as client nodes in a specific user Mesh, while all publicly reachable nodes can act as aggregators for hidden clients in that user Mesh.

Also, a Mesh is static: once it is configured, and thus during the lifetime of the network, there is one node containing the aggregator for Mesh clients that live on hidden nodes. So if an aggregator node dies or is no longer reachable, the mesh needs to be reapplied, with __some__ publicly reachable node as the aggregator node.
It goes a bit like this: 

The exit labeled NR in that graph is the point where Network Resources in hidden nodes connect to. These Exit NRs are then the transfer nodes between hidden NRs.
## ZOS networkd

The networkd daemon receives tasks from the provisioning daemon, so that it can create the necessary resources for a Mesh participant in the User Network (a Network Resource, NR).

A network is defined as a whole by the user, using the tools in the 3bot to generate a proper configuration that can be used by the network daemon.

What networkd takes care of is the establishment of the mesh itself, in accordance with the configuration a farmer has given to his nodes. What is configured on top of the Mesh is user defined, and applied as such by networkd.
## Internet reachability per Network Resource

Every node that participates in a User mesh will also provide Internet access for every network resource.

That means that every NR has the same Internet access as the node itself. It also means, in terms of security, that a firewall in the node blocks all types of entry to the NR, effectively acting as an Internet-access diode: outgoing and related traffic only.

In a later phase a user will be able to define one network resource as the __sole__ outgoing Internet access point, but for now that is not yet defined.
## Interworkings

So how is that set up?

Every node participating in a User Network sets up a Network Resource.

Basically, it's a Linux network namespace (sort of a network virtual machine) that contains a wireguard interface with a list of other Network Resources it needs to route encrypted packets towards.

A User Network has a user-defined range, typically a `/16` (like `10.1.0.0/16`). The user then picks a subnet from that range (e.g. `10.1.1.0/24`) to assign to every new NR he wants to participate in that Network.
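As a toy illustration of that numbering, assuming the third octet is used as the NR index (an assumption for the example, not the actual ZOS allocator):

```shell
# Sketch: derive per-NR /24 subnets from a user-defined /16 range
# (the range and the index convention are illustrative).
prefix="10.1"          # the user's 10.1.0.0/16 network
for nr_index in 1 2 3; do
    echo "NR ${nr_index}: ${prefix}.${nr_index}.0/24"
done
```

Each such `/24` then becomes the `AllowedIPs` entry other peers use to route traffic towards that NR.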
Workloads that are then provisioned are started in a newly created container, and that container gets a user-assigned IP __in__ that subnet of the Network Resource.

The Network Resource itself then handles the routing and firewalling for the containers connected to it. The Network Resource also takes care of Internet connectivity, so that the container can reach out to other services on the Internet.

In a later phase, a user will also be able to add IPv6 prefixes to his Network Resources, so that containers are reachable over IPv6.

Fully-routed IPv6 will then be available, where an Exit NR will be the entrypoint towards that network.

## Network Resource Internals
Each NR is basically a router for the User Network, but to allow NRs to access the Internet through the node's local connection, some other internal routers need to be added.

Internally it looks like this:
```text
+------------------------------------------------------------------------------+
| |wg mesh |
| +-------------+ +-----+-------+ |
| | | | NR cust1 | 100.64.0.123/16 |
| | container +----------+ 10.3.1.0/24 +----------------------+ |
| | cust1 | veth| | public | |
| +-------------+ +-------------+ | |
| | |
| +-------------+ +-------------+ | |
| | | | NR cust200 | 100.64.4.200/16 | |
| | container +----------+ 10.3.1.0/24 +----------------------+ |
| | cust200 | veth| | public | |
| +-------------+ +------+------+ | |
| |wg mesh | |
| 10.101.123.34/16 | |
| +------------+ |tonrs |
| | | +------------------+ |
| | zos +------+ | 100.64.0.1/16 | |
| | | | 10.101.12.231/16| ndmz | |
| +---+--------+ NIC +-----------------------------+ | |
| | | public +------------------+ |
| +--------+------+ |
| | |
| | |
+------------------------------------------------------------------------------+
                 |
                 |
                 | 10.101.0.0/16            10.101.0.1
+------------------+------------------------------------------------------------

NAT
--------
rules NR custA
nft add rule inet nat postrouting oifname public masquerade
nft add rule inet filter input iifname public ct state { established, related } accept
nft add rule inet filter input iifname public drop

rules NR custB
nft add rule inet nat postrouting oifname public masquerade
nft add rule inet filter input iifname public ct state { established, related } accept
nft add rule inet filter input iifname public drop

rules ndmz
nft add rule inet nat postrouting oifname public masquerade
nft add rule inet filter input iifname public ct state { established, related } accept
nft add rule inet filter input iifname public drop

Routing

if an NR only needs to get out:
ip route add default via 100.64.0.1 dev public

if an NR wants to use another NR as exitpoint:
ip route add default via destnr
with AllowedIPs 0.0.0.0/0 on that wg peer
```
During startup of the node, the ndmz is put in place following the configuration: either the node has a single Internet connection, or, in a dual-NIC setup, a separate NIC is used for Internet access.

The ndmz network uses the carrier-grade NAT allocation, so we don't interfere with the RFC1918 private IPv4 address space; users can use any of those ranges (but not `100.64.0.0/10`, of course).
# 0-OS v2 and its network
## Introduction

0-OS nodes participating in the Threefold grid need connectivity, of course. They need to be able to communicate with each other over the Internet in order to do various things:

- download their OS modules
- perform OS module upgrades
- register themselves to the grid, and send regular updates about their status
- query the grid for tasks to execute
- build and run the Overlay Network
- download flists and the effective files to cache

The nodes themselves can have connectivity in a few different ways:

- Only RFC1918 private addresses, connected to the Internet through NAT, no IPv6.
  Mostly, these are single-NIC (network card) machines that can host some workloads through the Overlay Network, but can't expose services directly. These are HIDDEN nodes, mostly booted with a USB stick from bootstrap.grid.tf.
- Dual-stacked: RFC1918 private IPv4 and public IPv6, where the IPv6 addresses are received from a home router but firewalled for outgoing traffic only. These nodes are effectively also HIDDEN.
- Nodes with 2 NICs: one connected to a segment that has real public addresses (IPv4 and/or IPv6), and one used for booting and local management (OOB), as in the drawing of the farmer setup.

For farmers, we need nodes to be reachable over IPv6, so that the nodes can:

- expose services to be proxied into containers/VMs
- act as aggregating nodes for Overlay Networks for HIDDEN nodes

Some nodes in farms should also have a publicly reachable IPv4, to make sure that clients that only have IPv4 can effectively reach exposed services.

But we need to stress the importance of IPv6 availability when you're running a multi-node farm in a datacentre: as the grid is boldly claiming to be a new Internet, we should make sure we adhere to the new protocols that are future-proof. Hence: IPv6 is the base, and IPv4 is just there to accommodate the transition.

Nowadays, RIPE can't even hand out consecutive /22 IPv4 blocks any more to new LIRs, so you'll be bound to the market to get IPv4, mostly at rates of 10-15 euro per IP. Things tend to get costly that way.

So anyway, IPv6 is not an afterthought in 0-OS; we're starting with it.

## Physical setup for farmers
```text
          XXXXX XXX
     XX XXX XXXXX XXX
    X              X XXX
    X                  X
    X    INTERNET      X
     XXX              X X
       XXXXX XX XX XXXX
           +X XXXX XX XXXXX
            |
            |
            |
            |
    +------+--------+
    |   FIREWALL/   |
    |    ROUTER     |
    +--+----------+-+
       |          |
+-----------+----+ +-+--------------+
| switch/        | | switch/        |
| vlan segment   | | vlan segment   |
+-+---------+----+ +---+------------+
  |         |          |
+-------+-------+      |OOB         | PUBLIC
| PXE / dhcp    |      |            |
| Server        |      |            |
+---------------+      |            |
                       |            |
       +-----+------------+----------+
       |                             |
       |                             +--+
       |                             |  |
       |  NODES                      |  +--+
       +--+--------------------------+  |  |
          |                             |  |
          +--+--------------------------+  |
             |                             |
             +-----------------------------+
```
The PXE/dhcp can also be done by the firewall; your mileage may vary.
## Switch and firewall configs

Single switch or multiple switches, it all boils down to the same:

- one port is an access port on an OOB vlan/segment
- one port is connected to a public vlan/segment

The farmer makes sure that every node properly receives an IPv4 address in the OOB segment through DHCP, so that with a PXE config or USB, a node can effectively start its boot process:

- Download kernel and initrd
- Download and mount the system flists so that the 0-OS daemons can start
- Register itself on the grid
- Query the grid for tasks to execute

For the PUBLIC side of the nodes, there are a few things to consider:

- It's the farmer's job to inform the grid which node gets an IP address, be it IPv4 or IPv6.
- Nodes that don't receive an IPv4 address will connect to the IPv4 net through the NATed OOB network.
- A farmer is responsible for providing an IPv6 prefix on at least one segment, and for having a Router Advertisement daemon running to provide SLAAC addressing on that segment.
- That IPv6 prefix on the public segment should not be firewalled, as it's impossible to know in your firewall which ports will get exposed for the proxies.
The nodes themselves have nothing listening that points into the host OS itself, and are by themselves also firewalled. In dev mode, there is an ssh server with key-only login, accessible by a select few ;-)
## DHCP/Radvd/RA/DHCP6

For home networks there is not much to do: a node will get a private (RFC1918) IPv4 address, and most probably an IPv6 address in a /64 prefix, but it is not reachable over IPv6 unless the firewall is disabled for IPv6. As we can't rely on that being possible, we assume these nodes to be HIDDEN.

A normal, self-respecting firewall or IP-capable switch can hand out IP[46] addresses; some can even bootp/tftp to get nodes booted over the network. We are (full of hope) assuming that you have such a beast to configure and splice your network into multiple segments.

A segment is a physical network separation. That can be port-based vlans, or even separate switches, whatever rocks your boat; the keyword here is **separate**.

On both segments you will need a way to hand out IPv4 addresses based on the MAC addresses of the nodes. Yes, there is some administration to do, but it's a one-off, and really necessary, because you really need to know which physical machine has which IP. For lights-out management and locating machines, that is a must.

So you'll need a list of MAC addresses to add to your DHCP server for IPv4, to make sure you know which machine has received which IPv4 address. That is necessary for 2 things:

- locate the node if something is amiss, e.g. to pinpoint a node's disk in case it broke (which it will)
- have the node be reachable all the time, without the need to update the grid and network configs every time the node boots.
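Such a MAC-to-IP mapping could look like the following ISC dhcpd fragment; host names, MAC addresses and IP ranges are of course placeholders:

```text
# Illustrative ISC dhcpd config: one fixed IPv4 per node MAC (placeholder values).
subnet 10.10.0.0 netmask 255.255.255.0 {
  option routers 10.10.0.1;
  range 10.10.0.100 10.10.0.200;
}

host node-01 {
  hardware ethernet aa:bb:cc:dd:ee:01;
  fixed-address 10.10.0.11;
}

host node-02 {
  hardware ethernet aa:bb:cc:dd:ee:02;
  fixed-address 10.10.0.12;
}
```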
## What happens under the hood (farmer)

While we did our utmost best to keep IPv4 address needs to a strict minimum, at least one node will need an IPv4 address for handling everything related to Overlay Networks. For containers to reach the Internet, any type of connectivity will do, be it NAT or through an internal DMZ that has a routable IPv4 address.

Internally, a lot of things are set up to have a node properly participate in the grid, as well as to be prepared to partake in the users' Overlay Networks.

A node connects itself to 'the Internet' depending on a few states:

1. It lives in a fully private network (like it would be when connected directly to a port on a home router)

```
       XX XXX
    XXX     XXXXXX
   X    Internet   X
    XXXXXXX   XXXXX
         XX XXX
          XX X
           X+X
            |
            |
 +--------+-----------+
 |       HOME /       |
 |     SOHO router    |
 |                    |
 +--------+-----------+
          |
          | Private space IPv4
          | (192.168.1.0/24)
          |
 +---------+------------+
 |                      |
 |        NODE          |
 |                      |
 |                      |
 |                      |
 |                      |
 |                      |
 +----------------------+
```
1. It lives in a fully public network (like it is connected directly to an uplink and has a public IPv4 address)

```
       XX XXX
    XXX     XXXXXX
   X    Internet   X
    XXXXXXX   XXXXX
         XX XXX
          XX X
           X+X
            |
            | fully public space ipv4/6
            | 185.69.166.0/24
            | 2a02:1802:5e:0:1000::abcd/64
            |
 +---------+------------+
 |                      |
 |        NODE          |
 |                      |
 +----------------------+
```
The node is fully reachable.

1. It lives in a datacentre, where a farmer manages the network.

A little drawing:

```text
+----------------------------------------------------+
|                      switch                        |
|                                                    |
|                                                    |
+----------+-------------------------------------+---+
           |                                     |
  access   |                                     |
  mgmt     |                     +---------------+
  vlan     |                     |  access
           |                     |  public
           |                     |  vlan
           |                     |
   +-------+---------------------+------+
   |                                    |
   |  nic1                       nic2   |
   |                                    |
   |                                    |
   |                                    |
   |               NODE                 |
   |                                    |
   |                                    |
   |                                    |
   +------------------------------------+
```
Or see the more elaborate drawing on top, which should be sufficient for a sysadmin to comprehend.

Although:

- we don't (yet) support NIC bonding (next release)
- we don't (yet) support vlans, so your ports on the switch/router need to be access ports to vlans towards your router/firewall

## Yeah, but really ... what now?

Ok, what are the constraints?

A little foreword: ZosV2 uses IPv6 as its base for networking, where the oldie IPv4 is merely an afterthought. So for it to work properly in its current incarnation (we are working to get it to do IPv4-only too), for now we need the node to live in a network that provides IPv6 __too__.

IPv4 and IPv6 are very different beasts, and any machine connected to the Internet will do both on the same network. So basically your computer talks 2 different languages when it comes to communicating. The same goes for ZOS, where right now its mother tongue is IPv6.

So your ZosV2 node can start in different settings:

1) You are a farmer, and your ISP can provide you with IPv6.
   You're all set: aside from a public IPv4 DHCP, you need to run a stateless-only SLAAC Router Advertiser (ZOS does NOT do DHCP6).
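   Such a SLAAC-only advertiser could, for example, be radvd with a config along these lines; the interface name and prefix are placeholders:

   ```text
   # Illustrative radvd.conf: stateless SLAAC announcements, no DHCPv6 (placeholder prefix).
   interface eth1
   {
       AdvSendAdvert on;
       # unmanaged: clients autoconfigure their addresses themselves (SLAAC)
       AdvManagedFlag off;
       AdvOtherConfigFlag off;
       prefix 2a02:1802:5e::/64
       {
           AdvOnLink on;
           AdvAutonomous on;
       };
   };
   ```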
1) You are a farmer, and your ISP asks you what the hell IPv6 is.
   That is problematic right now; wait for the next release of ZosV2.

1) You are a farmer with only one node at home, and on your PC https://ipv6.net tells you that you have IPv6 on your PC.
   That means your home router received an IPv6 allocation from the ISP. You're all set: your node will boot and register to the grid. If you know what you're doing, you can configure your router to allow all IPv6 traffic in forwarding mode to the specific MAC address of your node (we'll explain later).

1) You are a farmer with a few nodes somewhere that are registered on the grid in V1, but you have no clue if IPv6 is supported where these nodes live.

1) You have a ThreefoldToken node at home, and still do not have a clue.
Basically it also boils down to a few other cases:

1) the physical network where a node lives has: IPv6 and private-space IPv4
1) the physical network where a node lives has: IPv6 and public IPv4
1) the physical network where a node lives has: only IPv4

But it all boils down to: call your ISP and ask for IPv6. It's the future; for your ISP, it's time. There is no way to circumvent it. No way.

OK then, now what?

1) You're a farmer with a bunch of nodes somewhere in a DC.

   - Your nodes are connected once (with one NIC) to a switch/router. Then your router will have:
     - a segment that carries IPv4 __and__ IPv6:
       - for IPv4, there are 2 possibilities:
         - it's RFC1918 (private space) -> you NAT that subnet (e.g. 192.168.1.0/24) towards the public Internet
           - you __will__ have difficulty designating an IPv4 public entrypoint into your farm
           - your workloads will only be reachable through the overlay
           - your storage will not be reachable
         - you received an IPv4 range you can use (small, because of the scarcity of IPv4 addresses; your ISP will give you only limited and pricey IPv4 addresses)
           - things are better: the nodes can live in public IPv4 space, where they can be used as entrypoints
           - a standard configuration that works
       - for IPv6, your router is a Router Advertiser that provides SLAAC (stateless, unmanaged) for that segment, working with a /64 prefix
         - the nodes will be reachable over IPv6
         - the storage backend will be available for the full grid
         - everything will just work

   Best solution for a single NIC:
   - an IPv6 prefix
   - an IPv4 subnet (however small)

   - Your nodes have 2 connections, and you want to separate management traffic from user traffic:
     - the same applies as above, where the best outcome will be obtained with a real IPv6 prefix allocation and a small public subnet that is routable
     - the second NIC (typically 10GBit) will then carry everything public, and the first NIC will just be there for management, living in private space for IPv4, mostly without IPv6
     - your switch needs to be configured to provide port-based vlans, so the segments are properly separated, and your router needs to reflect that vlan config so that separation is handled by the firewall in the router (iptables, pf, acl, ...)
<h1> Zero-OS Networking </h1>

<h2> Table of Contents </h2>

- [Introduction to networkd](./introduction.md)
- [Vocabulary Definitions](./definitions.md)
- [Wireguard Mesh Details](./mesh.md)
- [Farm Network Setup](./setup_farm_network.md)

<h1>Setup</h1>

<h2> Table of Contents </h2>

- [Introduction](#introduction)
- [Running ZOS (v2) at home](#running-zos-v2-at-home)
- [Running ZOS (v2) in a multi-node farm in a DC](#running-zos-v2-in-a-multi-node-farm-in-a-dc)
  - [Necessities](#necessities)
  - [IPv6](#ipv6)
  - [Routing/firewalling](#routingfirewalling)
  - [Multi-NIC Nodes](#multi-nic-nodes)
  - [Farmers and the grid](#farmers-and-the-grid)

***

## Introduction

We present ZOSv2 network considerations.

Running ZOS on a node is just a matter of booting it with a USB stick, or with a dhcp/bootp/tftp server with the right configuration, so that the node can start the OS. Once it starts booting, the OS detects the NICs and starts the network configuration. A node can only complete its boot process once it has effectively received an IP address and a route to the Internet. Without that, the node will retry indefinitely to obtain Internet access and will not finish its startup.

So a node needs to be connected to a __wired__ network that provides a DHCP server and a default gateway to the Internet, be it NATed or plainly on the public network; any route to the Internet, be it IPv4 or IPv6 or both, is sufficient.

For a node to have the ability to host user networks, we **strongly** advise a working IPv6 setup, as that is the primary IP stack we're using for the User Network's mesh to function.
## Running ZOS (v2) at home

Running a ZOS node at home is plain and simple: connect it to your router, plug it into the network, insert the preconfigured USB stick containing the bootloader and the `farmer_id`, and power it on.
You will then see it appear in the Cockpit (`https://cockpit.testnet.grid.tf/capacity`), under your farm.
## Running ZOS (v2) in a multi-node farm in a DC

Multi-node farms, where a farmer wants to host the nodes in a data centre, have basically the same simplicity, but the nodes can boot from a boot server that provides DHCP and also delivers the iPXE image to load, without the need for a USB stick in every node.

A boot server is not really necessary, but it helps ;-). That server has a list of the MAC addresses of the nodes, and delivers the bootloader over PXE. The farmer is responsible for setting up the network and configuring the boot server.
### Necessities

The farmer needs to:

- Obtain an IPv6 prefix allocation from the provider. A publicly reachable `/64` will do, but a `/48` is advisable if the farmer wants to provide IPv6 transit for User Networks.
- If IPv6 is not an option, obtain an IPv4 subnet from the provider. At least one IPv4 address per node is needed, where all IP addresses are publicly reachable.
- Have the nodes connected to that public network with a switch, so that all nodes are publicly reachable.
- In case of multiple NICs, also make sure the farm is properly registered in BCDB, so that the nodes' public IP addresses are registered.
- Properly list the MAC addresses of the nodes, and configure the DHCP server to provide an IP address, and in case of multiple NICs also provide private IP addresses over DHCP per node.
- Make sure that after first boot, the nodes are reachable.
### IPv6

IPv6, although a real protocol since '98, has seen reluctant adoption over its lifetime, mostly because ISPs and carriers were reluctant to deploy it, not seeing the need after the advent of NAT and private IP space, which give a false impression of security.

But this month (10/2019), RIPE sent a mail to all its LIRs saying that the last consecutive /22 of IPv4 has been allocated. Needless to say, that makes the transition to IPv6 in 2019 of utmost importance and necessity.

Hence, ZOS starts with IPv6, and IPv4 is merely an afterthought ;-)

So in a nutshell: we greatly encourage farmers to have IPv6 on the node's network.
### Routing/firewalling

Basically, the nodes are self-protecting, in the sense that they run no listening processes at all through which they could be accessed. No service is active on the node itself, and User Networks function solely on an overlay.

That also means there is no need for a farm admin to protect the nodes from exterior access, albeit some DDoS protection might be a good idea.

In the first phase we will still allow the host OS (ZOS) to reply to ICMP ping requests, but that 'feature' might as well be blocked in the future: once a node is able to register itself, there is no real need to ever reach it.
### Multi-NIC Nodes

Nodes that Farmers deploy are typically multi-NIC Nodes, where one NIC (typically 1GBit) is used for running a proper DHCP server from which the Nodes can boot, and another NIC (1GBit or even 10GBit) is used for transfers of User Data. That gives a clean separation, and injection of bogus data is not possible.

That means that there are two networks, separated either by different physical switches, or by port-based VLANs in the switch (if there is only one).

- Management NICs

  The Management NIC is used by ZOS to boot and register itself to the Grid. All communications from the Node to the Grid also happen from there.

- Public NICs
### Farmers and the grid

A Node, being part of the Grid, has no concept of 'Farmer'. The only relationship a Node has with a Farmer is the fact that it is registered 'somewhere (TM)', and that as such, workloads on the Node will be remunerated with Tokens. For the rest, a Node is a wholly stand-alone thing that participates in the Grid.
```text
172.16.1.0/24
2a02:1807:1100:10::/64
+--------------------------------------+
| +--------------+ | +-----------------------+
| |Node ZOS | +-------+ | |
| | +-------------+1GBit +--------------------+ 1GBit switch |
| | | br-zos +-------+ | |
| | | | | |
| | | | | |
| | | | +------------------+----+
| +--------------+ | | +-----------+
| | OOB Network | | |
| | +----------+ ROUTER |
| | | |
| | | |
| | | |
| +------------+ | +----------+ |
| | Public | | | | |
| | container | | | +-----+-----+
| | | | | |
| | | | | |
| +---+--------+ | +-------------------+--------+ |
| | | | 10GBit Switch | |
| br-pub| +-------+ | | |
| +-----+10GBit +-------------------+ | +---------->
| +-------+ | | Internet
| | | |
| | +----------------------------+
+--------------------------------------+
185.69.167.128/26 Public network
2a02:1807:1100:0::/64
```
Where the underlay part of the wireguard interfaces gets instantiated in the Public container (namespace); once created, these wireguard interfaces get sent into the User Network (Network Resource), where a user can then configure the interface as he sees fit.

The router of the farmer fulfills 2 roles:

- NAT everything in the OOB network to the outside, so that nodes can boot and register themselves, as well as get tasks to execute from the BCDB.
- Route the assigned IPv4 subnet and IPv6 public prefix on the public segment, to which the public container is connected.

As such, in case the farmer wants to provide IPv4 public access for grid proxies, the node will need at least one (1) IPv4 address. The farmer is free to assign IPv4 addresses to only a part of the Nodes.

On the other hand, it is quite important to have a proper IPv6 setup, because things will work out better.

It's the Farmer's task to set up the router and the switches.

In a simpler setup (a small number of nodes, for instance), the farmer could set up a single switch and make 2 port-based VLANs to separate OOB and Public, or, with single-NIC nodes, just put them directly on the public segment, but then he will have to provide a DHCP server on the Public network.
# On boot

> This is set up by the `internet` daemon, which is part of the bootstrap process.

The first basic network setup is done here; the point of this setup is to connect the node to the internet, to be able to continue the rest of the boot process:

- Go over all **PLUGGED, and PHYSICAL** interfaces
- For each matching interface, test whether the interface can get both IPv4 and IPv6
- If multiple interfaces have been found to receive IPv4 from DHCP, pick the one with the `smallest` IP that also has a private gateway IP; if no private gateway IP is found, just pick the one with the smallest IP.
- Once the interface is found (we will call this interface **eth**), do the following:
  - Create a bridge named `zos`
  - Disable IPv6 on this bridge, and IPv6 forwarding
  - Run `udhcpc` on the zos bridge

Once this setup is complete, the node has access to the internet, which allows it to download and run `networkd`. That daemon takes over the network stack and continues the process as follows.

# Network Daemon

- Validate the zos setup created by the `internet` on-boot daemon
- Send information about all local NICs to the explorer (?)

## Setting up `ndmz`

First we need to find the master interface for ndmz. We have the following cases:

- the master of `public_config`, if set. Public Config is an external configuration that is set by the farmer on the node object; that information is retrieved by the node from the public explorer.
- otherwise (if public_config is not set), check if the public namespace is set (this is likely a dead branch: if this exists, or can exist, it means the master is always set, which means it will always get used)
- otherwise, the first interface with IPv6
- otherwise, check if zos has a global unicast IPv6
- otherwise, hidden node (still uses zos, but in the hidden-node setup)
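The cascade above can be sketched as a simple fallthrough. The field and return names are assumptions standing in for the real checks in `networkd`.

```go
package main

import "fmt"

// node is a hypothetical summary of the state networkd inspects.
type node struct {
	publicConfigMaster string // master from public_config, "" if unset
	publicNamespaceSet bool
	firstIPv6Iface     string // first interface with IPv6, "" if none
	zosHasGlobalIPv6   bool
}

// ndmzMaster mirrors the selection order described above and returns the
// chosen master plus whether the node ends up hidden.
func ndmzMaster(n node) (master string, hidden bool) {
	switch {
	case n.publicConfigMaster != "":
		return n.publicConfigMaster, false
	case n.publicNamespaceSet:
		return "public", false // likely dead branch, kept for completeness
	case n.firstIPv6Iface != "":
		return n.firstIPv6Iface, false
	case n.zosHasGlobalIPv6:
		return "zos", false
	default:
		return "zos", true // hidden node, still uses the zos bridge
	}
}

func main() {
	m, hidden := ndmzMaster(node{firstIPv6Iface: "eth1"})
	fmt.Println(m, hidden) // eth1 false
}
```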
### Hidden node ndmz

![](png/)

### Dualstack ndmz

![](png/)

## Setting up Public Config

This is an external configuration step that is configured by the farmer on the node object; the node then must have this setup in the explorer.

![](png/)
## Setting up Yggdrasil

- Get a list of all public peers with status `up`
- If hidden node:
  - Find peers with IPv4 addresses
- If dual stack node:
  - Filter out all peers with the same prefix as the node, to avoid connecting only locally
- Write down the yggdrasil config, and start the yggdrasil daemon via zinit
- yggdrasil runs inside the ndmz namespace
- Add an IPv6 address to npub in the same prefix as yggdrasil. This way, when npub6 is used as a gateway for this prefix, traffic will be routed through yggdrasil.
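The peer-filtering rules above can be sketched like this. The `peer` type and the string-prefix check are simplified assumptions (real yggdrasil peer URIs also carry a scheme and port).

```go
package main

import (
	"fmt"
	"net"
	"strings"
)

type peer struct {
	host string // peer address, IPv4 or IPv6 literal
	up   bool
}

// filterPeers keeps only peers with status up, then applies the
// hidden/dual-stack rules described above.
func filterPeers(peers []peer, hidden bool, nodePrefix string) []string {
	var out []string
	for _, p := range peers {
		if !p.up {
			continue
		}
		ip := net.ParseIP(p.host)
		if ip == nil {
			continue
		}
		if hidden {
			// hidden (IPv4-only) nodes can only dial IPv4 peers
			if ip.To4() == nil {
				continue
			}
		} else if strings.HasPrefix(p.host, nodePrefix) {
			// dual stack: skip peers in our own prefix
			continue
		}
		out = append(out, p.host)
	}
	return out
}

func main() {
	peers := []peer{
		{"203.0.113.10", true},
		{"2001:db8:1::5", true},
		{"2001:db8:2::9", false},
	}
	fmt.Println(filterPeers(peers, true, "2001:db8:1:")) // [203.0.113.10]
}
```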
# Creating a network resource

A network resource (`NR` for short) is a user private network that lives on the node and can span multiple nodes over wireguard. When a network is deployed, the node builds a user namespace as follows:

- A unique network id is generated as md5sum(user_id + network_name), keeping only the first 13 bytes. We will call this `net-id`.

## Create the wireguard interface

If the node has a `public_config`, the `public` namespace exists; in that case the wireguard device is first created inside the `public` namespace and then moved to the network-resource namespace.

Otherwise, the device is created in the host namespace and then moved to the network-resource namespace. The final result is

![](png/)

Finally, the wireguard peer list is applied and configured; routing rules are also configured to route traffic to the wireguard interface.

# Member joining a user network (network resource)

@startuml
[zos\nbridge] as zos
[br-pub\nbridge] as brpub
[br-ndmz\nbridge] as brndmz
note top of brndmz
disable ipv6
- net.ipv6.conf.br-ndmz.disable_ipv6 = 1
end note
' brpub -left- zos : veth pair\n(tozos)
brpub -down- master
note right of master
master is found as described
in the readme (this can be zos bridge)
in case of a single node machine
end note

package "ndmz namespace" {
[tonrs\nmacvlan] as tonrs
note bottom of tonrs
- net.ipv4.conf.tonrs.proxy_arp = 0
- net.ipv6.conf.tonrs.disable_ipv6 = 0

Addresses:
100.127.0.1/16
fe80::1/64
fd00::1
end note
tonrs - brndmz: macvlan

[npub6\nmacvlan] as npub6
npub6 -down- brpub: macvlan

[npub4\nmacvlan] as npub4
npub4 -down- zos: macvlan

note as MAC
gets static mac address generated
from node id. to make sure it receives
same ip address.
end note

MAC .. npub4
MAC .. npub6

note as setup
- net.ipv6.conf.all.forwarding = 1
end note

[ygg0]
note bottom of ygg0
this will be added by yggdrasil setup
in the next step
end note
}

footer (hidden node) no master with global unicast ipv6 found
@enduml
@startuml
[zos\nbridge] as zos
note left of zos
currently selected master
for the hidden ndmz setup
end note
[br-pub\nbridge] as brpub
[br-ndmz\nbridge] as brndmz
note top of brndmz
disable ipv6
- net.ipv6.conf.br-ndmz.disable_ipv6 = 1
end note
brpub -left- zos : veth pair\n(tozos)

package "ndmz namespace" {
[tonrs\nmacvlan] as tonrs
note bottom of tonrs
- net.ipv4.conf.tonrs.proxy_arp = 0
- net.ipv6.conf.tonrs.disable_ipv6 = 0

Addresses:
100.127.0.1/16
fe80::1/64
fd00::1
end note
tonrs - brndmz: macvlan

[npub6\nmacvlan] as npub6
npub6 -right- brpub: macvlan

[npub4\nmacvlan] as npub4
npub4 -down- zos: macvlan

note as MAC
gets static mac address generated
from node id. to make sure it receives
same ip address.
end note

MAC .. npub4
MAC .. npub6

note as setup
- net.ipv6.conf.all.forwarding = 1
end note

[ygg0]
note bottom of ygg0
this will be added by yggdrasil setup
in the next step
end note
}

footer (hidden node) no master with global unicast ipv6 found
@enduml
@startuml

component "br-pub" as public
component "b-<netid>\nbridge" as bridge
package "<reservation-id> namespace" {
component eth0 as eth
note right of eth
set ip as configured in the reservation
it must be in the subnet assigned to n-<netid>
in the user resource above.
- set default route through n-<netid>
end note
eth .. bridge: veth

component [pub\nmacvlan] as pub
pub .. public

note right of pub
only if public ipv6 is requested
also gets a consistent MAC address
end note
}
@enduml
@startuml
component [b-<netid>] as bridge
note left of bridge
- net.ipv6.conf.b-<netid>.disable_ipv6 = 1
end note

package "n-<netid> namespace" {
component [n-<netid>\nmacvlan] as nic
bridge .. nic: macvlan

note bottom of nic
- nic gets the first ip ".1" in the assigned
user subnet.
- an ipv6 derived from the assigned ipv4
- fe80::1/64
end note
component [public\nmacvlan] as public
note bottom of public
- gets an ipv4 in 100.127.0.9/16 range
- gets an ipv6 in the fd00::/64 prefix
- route over 100.127.0.1
- route over fe80::1/64
end note
note as G
- net.ipv6.conf.all.forwarding = 1
end note
}

component [br-ndmz] as brndmz
brndmz .. public: macvlan
@enduml
@startuml
component [b-<netid>] as bridge
note left of bridge
- net.ipv6.conf.b-<netid>.disable_ipv6 = 1
end note

package "n-<netid> namespace" {
component [n-<netid>\nmacvlan] as nic
bridge .. nic: macvlan

note bottom of nic
- nic gets the first ip ".1" in the assigned
user subnet.
- an ipv6 derived from the assigned ipv4
- fe80::1/64
end note
component [public\nmacvlan] as public
note bottom of public
- gets an ipv4 in 100.127.0.9/16 range
- gets an ipv6 in the fd00::/64 prefix
- route over 100.127.0.1
- route over fe80::1/64
end note
note as G
- net.ipv6.conf.all.forwarding = 1
end note
component [w-<netid>\nwireguard]
}

component [br-ndmz] as brndmz
brndmz .. public: macvlan
@enduml
@startuml

() "br-pub (Public Bridge)" as brpub

note bottom of brpub
This bridge is always created on boot, and is either
connected to the zos bridge (in a single nic setup),
or to the second nic with public IPv6 (in a dual nic setup)
end note

package "public namespace" {

[public\nmacvlan] as public
public -down- brpub: macvlan
note right of public
- have a static mac generated from node id
- set the ips as configured
- set the default gateways as configured
end note

note as global
inside namespace
- net.ipv6.conf.all.accept_ra = 2
- net.ipv6.conf.all.accept_ra_defrtr = 1
end note
}

@enduml
@startuml
() eth
[zos]
eth -up- zos
note left of zos
bridge takes same mac address as eth
(ipv6 is enabled on the bridge)
- net.ipv6.conf.zos.disable_ipv6 = 0
end note
note left of eth
disable ipv6 on interface:
(ipv6 is disabled on the nic)
- net.ipv6.conf.<eth>.disable_ipv6 = 1
- net.ipv6.conf.all.forwarding = 0
end note
@enduml
# Yggdrasil integration in 0-OS

Since day one, 0-OS v2 networking has been designed around IPv6. The goal was to avoid having to deal with exhausted IPv4 address space and to be ready for the future.

While this decision makes sense in the long term, it poses trouble in the short term for farmers that only have access to IPv4 and are unable to ask their ISPs for an upgrade.

In order to allow these IPv4-only nodes to join the grid, another overlay network has to be created between all the nodes. To achieve this, Yggdrasil has been selected.

## Yggdrasil

The [Yggdrasil network project](https://yggdrasil-network.github.io/) has been selected to be integrated into 0-OS. All 0-OS nodes will run an yggdrasil daemon, which means all 0-OS nodes can now communicate over the yggdrasil network. The yggdrasil integration is an experiment planned in multiple phases:

Phase 1: Allow 0-DB containers to be exposed over the yggdrasil network. Implemented in v0.3.5
Phase 2: Allow containers to request an interface with an yggdrasil IP address.

## networkd bootstrap

When booting, networkd will wait 2 minutes to receive an IPv6 address through router advertisement on its `npub6` interface in the ndmz network namespace.
If after 2 minutes no IPv6 is received, networkd will consider the node to be an IPv4-only node, switch to this mode, and continue booting.
### 0-DB containers

For IPv4-only nodes, the 0-DB containers will be exposed on top of an yggdrasil IPv6 address. Since all 0-OS nodes also run yggdrasil, these 0-DB containers will always be reachable from any container in the grid.

For dual stack nodes, the 0-DB containers will also get an yggdrasil IP in addition to the already present public IPv6.
# Network module

## ZBus

The network module is available on zbus over the following channel

| module | object | version |
|--------|--------|---------|
| network|[network](#interface)| 0.0.1|

## Home Directory

network keeps some data in the following locations

| directory | path|
|----|---|
| root| `/var/cache/modules/network`|

## Interface
```go
// Networker is the interface for the network module
type Networker interface {
	// CreateNR creates a new network resource
	CreateNR(Network) (string, error)
	// DeleteNR deletes a network resource
	DeleteNR(Network) error

	// Join a network (with network id) will create a new isolated namespace
	// that is hooked to the network bridge with a veth pair, and assign it a
	// new IP from the network resource range. The method returns the new
	// namespace name.
	// The member name specifies the name of the member, and must be unique.
	// The NetID is the network id to join.
	Join(networkID NetID, containerID string, addrs []string) (join Member, err error)

	// ZDBPrepare creates a network namespace with a macvlan interface in it
	// to allow the 0-db container to be publicly accessible.
	// It returns the name of the network namespace created.
	ZDBPrepare() (string, error)

	// Addrs returns the IP addresses of an interface;
	// if the interface is in a network namespace, netns needs to be non-empty
	Addrs(iface string, netns string) ([]net.IP, error)
}
```
<h1> Node Module</h1>

<h2> Table of Contents </h2>

- [Introduction](#introduction)
- [Zbus](#zbus)
- [Example](#example)

***

## Introduction

This module is responsible for registering the node on the grid and for handling grid events. The node daemon broadcasts the relevant events on zbus for other modules that are interested in those events.

The node also provides zbus interfaces to query some of the node information.

## Zbus

Node module is available on [zbus](https://github.com/threefoldtech/zbus) over the following channels

| module | object | version |
|--------|--------|---------|
|host |host| 0.0.1
|system |system| 0.0.1
|events |events| 0.0.1

## Example
```go
// SystemMonitor interface (provided by noded)
type SystemMonitor interface {
	NodeID() uint32
	Memory(ctx context.Context) <-chan VirtualMemoryStat
	CPU(ctx context.Context) <-chan TimesStat
	Disks(ctx context.Context) <-chan DisksIOCountersStat
	Nics(ctx context.Context) <-chan NicsIOCounterStat
}

// HostMonitor interface (provided by noded)
type HostMonitor interface {
	Uptime(ctx context.Context) <-chan time.Duration
}

// Events interface
type Events interface {
	PublicConfigEvent(ctx context.Context) <-chan PublicConfigEvent
	ContractCancelledEvent(ctx context.Context) <-chan ContractCancelledEvent
}
```
<h1>Provision Module</h1>

<h2> Table of Contents </h2>

- [ZBus](#zbus)
- [Introduction](#introduction)
- [Supported workload](#supported-workload)

***

## ZBus

This module is an autonomous module and is not reachable over `zbus`.

## Introduction

This module is responsible for provisioning/decommissioning workloads on the node.

It accepts new deployments over `rmb` and tries to bring them to reality by running a series of provisioning workflows based on the workload `type`.

`provisiond` knows about all available daemons and contacts them over `zbus` to ask for the needed services. It pulls everything together and updates the deployment with the workload state.

If the node was restarted, `provisiond` tries to bring all active workloads back to their original state.

## Supported workload

0-OS currently supports 8 types of workloads:

- network
- `zmachine` (virtual machine)
- `zmount` (disk): usable only by a `zmachine`
- `public-ip` (v4 and/or v6): usable only by a `zmachine`
- [`zdb`](https://github.com/threefoldtech/0-DB) `namespace`
- [`qsfs`](https://github.com/threefoldtech/quantum-storage)
- `zlogs`
- `gateway`
<h1> Storage Module</h1>

<h2> Table of Contents </h2>

- [Introduction](#introduction)
- [ZBus](#zbus)
- [Overview](#overview)
- [List of sub-modules](#list-of-sub-modules)
- [On Node Booting](#on-node-booting)
- [zinit unit](#zinit-unit)
- [Interface](#interface)

***

## Introduction

This module is responsible for managing everything related to storage.

## ZBus

Storage module is available on zbus over the following channel

| module | object | version |
|--------|--------|---------|
| storage|[storage](#interface)| 0.0.1|
## Overview

On start, storaged takes ownership of all node disks and separates them into 2 different sets:

- SSD Storage: for each SSD disk available, a storage pool of type SSD is created
- HDD Storage: for each HDD disk available, a storage pool of type HDD is created

Then `storaged` can provide the following storage primitives:

- `subvolume` (with quota): a btrfs subvolume that can be used by `flistd` to support read-write operations on flists. Hence it can be used as rootfs for containers and VMs. This storage primitive is only supported on `ssd` pools.
  - On boot, storaged will always create a permanent subvolume with id `zos-cache` (of 100G) which will be used by the system to persist state and to hold a cache of downloaded files.
- `vdisk`: a virtual disk that can be attached to virtual machines. This is only possible on `ssd` pools.
- `device`: a full disk that gets allocated and used by a single `0-db` service. Note that a single 0-db instance can serve multiple zdb namespaces for multiple users. This is only possible on `hdd` pools.

You can already tell that ZOS can work fine with no HDD (it will not be able to serve zdb workloads though), but not without SSD. Hence a zos node with no SSD will never register on the grid.
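The primitive-to-pool constraints above can be captured in a tiny validity check; the type and function names here are illustrative, not the storaged API.

```go
package main

import "fmt"

type poolKind string
type primitive string

const (
	ssd poolKind = "ssd"
	hdd poolKind = "hdd"

	subvolume primitive = "subvolume"
	vdisk     primitive = "vdisk"
	device    primitive = "device"
)

// allowed encodes the rules above: subvolume and vdisk only on ssd pools,
// device (a full disk for 0-db) only on hdd pools.
func allowed(p primitive, k poolKind) bool {
	switch p {
	case subvolume, vdisk:
		return k == ssd
	case device:
		return k == hdd
	}
	return false
}

func main() {
	fmt.Println(allowed(vdisk, ssd), allowed(device, ssd)) // true false
}
```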
## List of sub-modules

- disks
- 0-db
- booting

## On Node Booting

When the module boots:

- Make sure to mount all available pools
- Scan available disks that are not used by any pool and create new pools on those disks (all pools are now created with the `RaidSingle` policy)
- Try to find and mount a cache sub-volume under /var/cache
- If no cache sub-volume is available, a new one is created and then mounted
### zinit unit

The zinit unit file of the module specifies the command line, the test command, and the order in which the services need to be booted.

The storage module is a dependency for almost all other system modules, hence it has high boot precedence (calculated on boot by zinit based on the configuration).

The storage module is only considered running if (and only if) /var/cache is ready:

```yaml
exec: storaged
test: mountpoint /var/cache
```
### Interface

```go
// StorageModule is the storage subsystem interface
// this should allow you to work with the following types of storage medium
// - full disks (device) (these are used by zdb)
// - subvolumes these are used as a read-write layers for 0-fs mounts
// - vdisks are used by zmachines
// this works as following:
// a storage module maintains a list of ALL disks on the system
// separated in 2 sets of pools (SSDs, and HDDs)
// ssd pools can only be used for
// - subvolumes
// - vdisks
// hdd pools are only used by zdb as one disk
type StorageModule interface {
	// Cache method returns information about the zos cache volume
	Cache() (Volume, error)

	// Total gives the total amount of storage available for a device type
	Total(kind DeviceType) (uint64, error)
	// BrokenPools lists the broken storage pools that have been detected
	BrokenPools() []BrokenPool
	// BrokenDevices lists the broken devices that have been detected
	BrokenDevices() []BrokenDevice
	// Monitor returns a stats stream about pools
	Monitor(ctx context.Context) <-chan PoolsStats

	// Volume management

	// VolumeCreate creates a new volume
	VolumeCreate(name string, size gridtypes.Unit) (Volume, error)

	// VolumeUpdate updates the size of an existing volume
	VolumeUpdate(name string, size gridtypes.Unit) error

	// VolumeLookup returns volume information for a given name
	VolumeLookup(name string) (Volume, error)

	// VolumeDelete deletes a volume by name
	VolumeDelete(name string) error

	// VolumeList lists all volumes
	VolumeList() ([]Volume, error)

	// Virtual disk management

	// DiskCreate creates a virtual disk given name and size
	DiskCreate(name string, size gridtypes.Unit) (VDisk, error)

	// DiskResize resizes the disk to the given size
	DiskResize(name string, size gridtypes.Unit) (VDisk, error)

	// DiskWrite writes the given raw image to disk
	DiskWrite(name string, image string) error

	// DiskFormat makes sure the disk has a filesystem; if it is already formatted nothing happens
	DiskFormat(name string) error

	// DiskLookup looks up a vdisk by name
	DiskLookup(name string) (VDisk, error)

	// DiskExists checks if a disk exists
	DiskExists(name string) bool

	// DiskDelete deletes a disk
	DiskDelete(name string) error

	DiskList() ([]VDisk, error)

	// Device management

	// Devices lists all "allocated" devices
	Devices() ([]Device, error)

	// DeviceAllocate allocates a new device (formats it and gives it a new ID)
	DeviceAllocate(min gridtypes.Unit) (Device, error)

	// DeviceLookup inspects a previously allocated device
	DeviceLookup(name string) (Device, error)
}
```
<h1>VMD Module</h1>

<h2> Table of Contents </h2>

- [ZBus](#zbus)
- [Home Directory](#home-directory)
- [Introduction](#introduction)
- [zinit unit](#zinit-unit)
- [Interface](#interface)

***

## ZBus

The vmd module is available on zbus over the following channel

| module | object | version |
|--------|--------|---------|
| vmd|[vmd](#interface)| 0.0.1|

## Home Directory

vmd keeps some data in the following locations

| directory | path|
|----|---|
| root| `/var/cache/modules/containerd`|

## Introduction

The vmd module manages all virtual machine processes. It provides the interface to create, inspect, and delete virtual machines. It also monitors the VMs to make sure they are re-spawned if they crash. Internally it uses `cloud-hypervisor` to start the VM processes.

It also provides the interface to configure VM log streamers.

### zinit unit

`vmd` must run after containerd is running, and the node boot process is complete. Since it doesn't keep state, no dependency on `storaged` is needed.

```yaml
exec: vmd --broker unix:///var/run/redis.sock
after:
  - boot
  - networkd
```
## Interface

```go
// VMModule defines the virtual machine module interface
type VMModule interface {
	Run(vm VM) error
	Inspect(name string) (VMInfo, error)
	Delete(name string) error
	Exists(name string) bool
	Logs(name string) (string, error)
	List() ([]string, error)
	Metrics() (MachineMetrics, error)

	// VM Log streams

	// StreamCreate creates a stream for vm `name`
	StreamCreate(name string, stream Stream) error
	// StreamDelete deletes a stream by stream id.
	StreamDelete(id string) error
}
```
<h1>API</h1>

<h2> Table of Contents </h2>

- [Introduction](#introduction)
- [Deployments](#deployments)
  - [Deploy](#deploy)
  - [Update](#update)
  - [Get](#get)
  - [Changes](#changes)
  - [Delete](#delete)
- [Statistics](#statistics)
- [Storage](#storage)
  - [List separate pools with capacity](#list-separate-pools-with-capacity)
- [Network](#network)
  - [List Wireguard Ports](#list-wireguard-ports)
  - [Supports IPV6](#supports-ipv6)
  - [List Public Interfaces](#list-public-interfaces)
  - [List Public IPs](#list-public-ips)
  - [Get Public Config](#get-public-config)
- [Admin](#admin)
  - [List Physical Interfaces](#list-physical-interfaces)
  - [Get Public Exit NIC](#get-public-exit-nic)
  - [Set Public Exit NIC](#set-public-exit-nic)
- [System](#system)
  - [Version](#version)
  - [DMI](#dmi)
  - [Hypervisor](#hypervisor)
- [GPUs](#gpus)
  - [List Gpus](#list-gpus)

***
## Introduction

This document lists all the actions available on the node public API, which is available over [RMB](https://github.com/threefoldtech/rmb-rs).

The node is always reachable over the node twin id, as per the node object on tfchain. Once the node twin is known, a [client](https://github.com/threefoldtech/zos/blob/main/client/node.go) can be initiated and used to talk to the node.

## Deployments

### Deploy

| command |body| return|
|---|---|---|
| `zos.deployment.deploy` | [Deployment](https://github.com/threefoldtech/zos/blob/main/pkg/gridtypes/deployment.go)|-|

The deployment needs to have a valid signature, and the contract must exist on chain with the same contract hash as the deployment.

### Update

| command |body| return|
|---|---|---|
| `zos.deployment.update` | [Deployment](https://github.com/threefoldtech/zos/blob/main/pkg/gridtypes/deployment.go)|-|

The update call will update (modify) an already existing deployment with a new definition. The deployment must already exist on the node, the contract must have the same hash as the provided deployment, plus valid versions.

> TODO: need more details on the deployment update calls and how to handle the version
### Get

| command |body| return|
|---|---|---|
| `zos.deployment.get` | `{contract_id: <id>}`|[Deployment](https://github.com/threefoldtech/zos/blob/main/pkg/gridtypes/deployment.go)|
### Changes

| command |body| return|
|---|---|---|
| `zos.deployment.changes` | `{contract_id: <id>}`| `[]Workloads` |

Where:

- [workload](https://github.com/threefoldtech/zos/blob/main/pkg/gridtypes/workload.go)

The list contains all deployment workload changes, which means a workload can (and will) appear
multiple times in this list, once for each time its state changed.

A workload first appears in the `init` state; each later entry records the transition (with its time) to the next state, which can be success or failure, and so on.
This happens for each workload in the deployment.
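Since each workload appears once per state change, a client that only wants the current state can keep the last entry per name. A minimal sketch, where `Change` is a trimmed illustrative stand-in for the real workload type (not a zos API type):

```go
package main

import "fmt"

// Change is a hypothetical trimmed-down view of one entry in the
// changes list; the real type is gridtypes.Workload with its Result.
type Change struct {
	Name  string
	State string // e.g. "init", "ok", "error"
}

// latest keeps only the most recent state per workload name, assuming
// the changes list is ordered oldest to newest as described above.
func latest(changes []Change) map[string]string {
	out := make(map[string]string)
	for _, c := range changes {
		out[c.Name] = c.State
	}
	return out
}

func main() {
	changes := []Change{
		{"net", "init"}, {"disk", "init"},
		{"net", "ok"}, {"disk", "error"},
	}
	st := latest(changes)
	fmt.Println(st["net"], st["disk"]) // ok error
}
```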
### Delete

> You probably never need to call this command yourself; the node deletes the deployment once the contract is cancelled on the chain.

| command |body| return|
|---|---|---|
| `zos.deployment.delete` | `{contract_id: <id>}`|-|
## Statistics

| command |body| return|
|---|---|---|
| `zos.statistics.get` | - |`{total: Capacity, used: Capacity, system: Capacity}`|

Where:

```json
Capacity {
  "cru": "uint64",
  "sru": "bytes",
  "hru": "bytes",
  "mru": "bytes",
  "ipv4u": "uint64"
}
```

> Note that `used` capacity equals the full capacity reserved by workloads PLUS the capacity reserved by the system,
so `used = user_used + system`, where `system` is only the amount of resources reserved by `zos` itself.
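The relation above can be sanity-checked with a short sketch. The `Capacity` struct below only mirrors the documented keys; the field and helper names are illustrative, not part of the zos client:

```go
package main

import "fmt"

// Capacity mirrors the JSON payload documented above (field names are
// assumptions based on the documented keys).
type Capacity struct {
	CRU   uint64 `json:"cru"`
	SRU   uint64 `json:"sru"`
	HRU   uint64 `json:"hru"`
	MRU   uint64 `json:"mru"`
	IPV4U uint64 `json:"ipv4u"`
}

// userUsed derives the capacity reserved by user workloads alone,
// per used = user_used + system.
func userUsed(used, system Capacity) Capacity {
	return Capacity{
		CRU:   used.CRU - system.CRU,
		SRU:   used.SRU - system.SRU,
		HRU:   used.HRU - system.HRU,
		MRU:   used.MRU - system.MRU,
		IPV4U: used.IPV4U - system.IPV4U,
	}
}

func main() {
	used := Capacity{CRU: 8, MRU: 16 << 30}
	system := Capacity{CRU: 2, MRU: 4 << 30}
	fmt.Println(userUsed(used, system).CRU) // 6
}
```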
## Storage

### List separate pools with capacity

| command |body| return|
|---|---|---|
| `zos.storage.pools` | - |`[]Pool`|

Lists all node pools with their type, size, and used space, where

```json
Pool {
  "name": "pool-id",
  "type": "(ssd|hdd)",
  "size": <size in bytes>,
  "used": <used in bytes>
}
```
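As a sketch of how a client might consume this payload, the following decodes a pool list and computes free space. The `Pool` struct and `parsePools` helper are assumptions based on the documented keys, not zos API types:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Pool mirrors the payload documented above.
type Pool struct {
	Name string `json:"name"`
	Type string `json:"type"` // "ssd" or "hdd"
	Size uint64 `json:"size"` // bytes
	Used uint64 `json:"used"` // bytes
}

// parsePools decodes the []Pool response body.
func parsePools(raw []byte) ([]Pool, error) {
	var pools []Pool
	if err := json.Unmarshal(raw, &pools); err != nil {
		return nil, err
	}
	return pools, nil
}

func main() {
	raw := `[{"name":"pool-1","type":"ssd","size":512110190592,"used":100000000000}]`
	pools, err := parsePools([]byte(raw))
	if err != nil {
		panic(err)
	}
	free := pools[0].Size - pools[0].Used
	fmt.Printf("%s free: %d bytes\n", pools[0].Name, free)
}
```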
## Network

### List Wireguard Ports

| command |body| return|
|---|---|---|
| `zos.network.list_wg_ports` | - |`[]uint16`|

Lists all `reserved` ports on the node that can't be used for a network's wireguard endpoint. A user then needs to pick a free port that is not in this list for his network.
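A minimal sketch of that port selection, given the reserved list returned by `zos.network.list_wg_ports`; the `freeWGPort` helper is illustrative, not part of the client:

```go
package main

import "fmt"

// freeWGPort returns the first port in [start, 65535] that is not in
// the reserved list, or false if none is available.
func freeWGPort(reserved []uint16, start uint16) (uint16, bool) {
	taken := make(map[uint16]bool, len(reserved))
	for _, p := range reserved {
		taken[p] = true
	}
	for p := start; p != 0; p++ { // uint16 wraps to 0 after 65535
		if !taken[p] {
			return p, true
		}
	}
	return 0, false
}

func main() {
	reserved := []uint16{3000, 3001, 3003}
	p, _ := freeWGPort(reserved, 3000)
	fmt.Println(p) // 3002
}
```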
### Supports IPV6

| command |body| return|
|---|---|---|
| `zos.network.has_ipv6` | - |`bool`|

### List Public Interfaces

| command |body| return|
|---|---|---|
| `zos.network.interfaces` | - |`map[string][]IP` |

Returns the list of node IPs. This is public information, mainly used to show the node yggdrasil IP and the `zos` interface.
### List Public IPs

| command |body| return|
|---|---|---|
| `zos.network.list_public_ips` | - |`[]IP` |

Lists all user-deployed public IPs that are served by this node.
### Get Public Config

| command |body| return|
|---|---|---|
| `zos.network.public_config_get` | - |`PublicConfig` |

Where

```json
PublicConfig {
  "type": "string", // always vlan
  "ipv4": "CIDR",
  "ipv6": "CIDR",
  "gw4": "IP",
  "gw6": "IP",
  "domain": "string"
}
```

Returns the node public config, or an error if it is not set. If a node has a public config
it can act as an access node to user private networks.
## Admin

The next set of commands can ONLY be called by the `farmer`.
### List Physical Interfaces

| command |body| return|
|---|---|---|
| `zos.network.admin.interfaces` | - |`map[string]Interface` |

Where

```json
Interface {
  "ips": ["ip"],
  "mac": "mac-address"
}
```

Lists ALL physical node interfaces.
Those interfaces can then be used as an input to `set_public_nic`.
### Get Public Exit NIC

| command |body| return|
|---|---|---|
| `zos.network.admin.get_public_nic` | - |`ExitDevice` |

Where

```json
ExitDevice {
  "is_single": "bool",
  "is_dual": "bool",
  "dual_interface": "name"
}
```

Returns the interface used for public traffic (for user workloads).
### Set Public Exit NIC

| command |body| return|
|---|---|---|
| `zos.network.admin.set_public_nic` | `name` |- |

`name` must be one of the (free) names returned by `zos.network.admin.interfaces`.
## System

### Version

| command |body| return|
|---|---|---|
| `zos.system.version` | - | `{zos: string, zinit: string}` |

### DMI

| command |body| return|
|---|---|---|
| `zos.system.dmi` | - | [DMI](https://github.com/threefoldtech/zos/blob/main/pkg/capacity/dmi/dmi.go) |

### Hypervisor

| command |body| return|
|---|---|---|
| `zos.system.hypervisor` | - | `string` |
## GPUs

### List Gpus

| command |body| return|
|---|---|---|
| `zos.gpu.list` | - | `[]GPU` |

Where

```json
GPU {
  "id": "string",
  "vendor": "string",
  "device": "string",
  "contract": "uint64"
}
```

Lists all available node GPUs, if any exist.
# `gateway-fqdn-proxy` type

This creates a proxy with the given FQDN to the given backends. In this case the user must configure his DNS provider (e.g. name.com) to point the FQDN to the correct node public IP.

Full fqdn-proxy workload data is defined [here](https://github.com/threefoldtech/zos/blob/main/pkg/gridtypes/zos/gw_fqdn.go)
# `gateway-name-proxy` type

This creates a proxy with the given name to the given backends. The `name` of the proxy must be owned by a name contract on the grid. The idea is that a user can reserve a name (e.g. `example`). Later, he can deploy a gateway workload with name `example` on any gateway node that points to the specified backends. The name is then prefixed to the gateway domain: for example, if the gateway domain is `gent0.freefarm.com` then your full FQDN is going to be `example.gent0.freefarm.com`.

Full name-proxy workload data is defined [here](https://github.com/threefoldtech/zos/blob/main/pkg/gridtypes/zos/gw_name.go)
# `ip` type

The IP workload type reserves an IP from the available contract IP list, which means that on contract creation the user must specify the number of public IPs it needs. The contract then allocates this number of IPs from the farm and keeps them on the contract.

When the user then adds IP workloads to the deployment associated with this contract, each IP workload picks and links to one IP from the contract.

In its minimal form, the `IP` workload does not require any data. But in reality it has 2 flags to pick which kind of public IP you want:

- `ipv4` (`bool`): pick one from the contract public IPv4 addresses
- `ipv6` (`bool`): pick an IPv6 over SLAAC. IPv6 addresses are not reserved with a contract. They are basically free, if the farm infrastructure allows IPv6 over SLAAC.

Full `IP` workload definition can be found [here](https://github.com/threefoldtech/zos/blob/main/pkg/gridtypes/zos/ipv4.go)
<h1> ZOS Manual</h1>

<h2> Table of Contents </h2>

- [Introduction](#introduction)
- [Farm? Network? What are these?](#farm-network-what-are-these)
- [Creating a farm](#creating-a-farm)
- [Interaction](#interaction)
- [Deployment](#deployment)
  - [Workload](#workload)
  - [Types](#types)
  - [API](#api)
- [Raid Controller Configuration](#raid-controller-configuration)

***
## Introduction

This document explains the usage of `ZOS`. `ZOS`, usually pronounced "zero OS", got its name from the idea of zero configuration: after the initial `minimal` configuration, which only includes which `farm` to join and which `network` to use (`development`, `testing`, or `production`), the owner of the node does not have to do anything more, and the node works fully autonomously.

The farmer himself cannot control the node, or access it by any means. The only way you can interact with a node is via its public API.
## Farm? Network? What are these?

`zos` is built to allow people to run `workloads` around the world. This is enabled by allowing 3rd-party data-centers to run `ZOS` on their hardware. A user can then find any nearby `farm` (which is what we call a cluster of nodes that belong to the same `farmer`) and choose to deploy capacity on that node/farm. A `farm` can consist of one or more nodes.

So what is a `network`? To allow developers to build `zos` itself, and to make it available during the early stages of development for testers and other enthusiastic people to try out, we created 3 `networks`:

- `development`: This is used mainly by developers to test their work. It is still available for users to deploy capacity on (for really, really cheap prices), but at the same time there is no guarantee that it's stable; data loss or corruption can happen, and the entire network can be reset with no heads-up.
- `testing`: Once new features are developed and well tested on the `development` network, they are released to the `testing` environment. This is also available for users, at a slightly higher price than the `development` network, but it's much more stable. In theory this network is stable and there should be no resets of the network; issues on this network are usually not fatal, but partial data loss can still occur.
- `production`: As the name indicates, this is the most stable network (also full price). Once new features are fully tested on the `testing` network, they are released on `production`.
## Creating a farm

While this is outside the scope of this document, here is a [link](https://library.threefold.me/info/manual/#/manual__create_farm).
## Interaction

`ZOS` provides a simple `API` that can be used to:

- Query node runtime information
  - Network information
    - Free `wireguard` ports
    - Get public configuration
  - System version
  - Other (check the client for details)
- Deployment management (more on that later)
  - Create
  - Update
  - Delete

Note that the `zos` API is available over the `rmb` protocol. `rmb`, which stands for `reliable message bus`, is a simple messaging protocol that enables peer-to-peer communication over the `yggdrasil` network. Please check [`rmb`](https://github.com/threefoldtech/rmb) for more information.
Simply put, `RMB` allows 2 entities to communicate securely knowing only their `id`; an id is linked to a public key on the blockchain, hence messages are verifiable via a signature.

To be able to contact the node directly you need to run:

- `yggdrasil`
- `rmb` (correctly configured)

Once you have those running you can contact the node over `rmb`. For a reference implementation (function names and parameters) please refer to the [RMB documentation](../../rmb/rmb_toc.md).

Here is a rough example of how the low-level creation of a deployment is done.
```go
cl, err := rmb.Default()
if err != nil {
	panic(err)
}
```

then create an instance of the node client

```go
node := client.NewNodeClient(NodeTwinID, cl)
```

define your deployment object

```go
dl := gridtypes.Deployment{
	Version: Version,
	TwinID:  Twin, //LocalTwin,
	// this contract id must match the one on substrate
	Workloads: []gridtypes.Workload{
		network(),  // network workload definition
		zmount(),   // zmount workload definition
		publicip(), // public ip definition
		zmachine(), // zmachine definition
	},
	SignatureRequirement: gridtypes.SignatureRequirement{
		WeightRequired: 1,
		Requests: []gridtypes.SignatureRequest{
			{
				TwinID: Twin,
				Weight: 1,
			},
		},
	},
}
```

compute the hash

```go
hash, err := dl.ChallengeHash()
if err != nil {
	panic("failed to create hash")
}
fmt.Printf("Hash: %x\n", hash)
```

Create the contract on `substrate` and get the `contract id`, then link the deployment to the contract and send it to the node.

```go
dl.ContractID = 11 // from substrate
ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
defer cancel()
err = node.DeploymentDeploy(ctx, dl)
if err != nil {
	panic(err)
}
```

Once the node receives the deployment, it fetches the contract (using the contract id) from the chain, recomputes the deployment hash, and compares it with the one set on the contract. If they match, the node proceeds to process the deployment.
## Deployment

A deployment is a set of workloads that are contextually related. Workloads in the same deployment can reference other workloads in the same deployment, but can't be referenced from another deployment. The exception is the network workload, which can be referenced from a different deployment as long as it belongs to the same user.

Workloads have unique IDs (per deployment) that are set by the user, so he can create multiple workloads and then reference them by their given IDs (`names`).

For example, a deployment can define:

- A private network with id `net`
- A disk with id `data`
- A public IP with id `ip`
- A container that uses them:
  - The container can mount the disk like `mount: {data: /mount/path}`.
  - The container can get the public IP assigned to itself by referencing the IP with id `ip`.
  - etc.
### Workload

Each workload has a type which is associated with some data. So the minimal definition of a workload contains:

- `name`: unique per deployment (id)
- `type`: workload type
- `data`: workload data appropriate for the selected type
```go
// Workload struct
type Workload struct {
	// Version is the version of the reservation object. On deployment creation, version must be 0,
	// then only workloads that need to be updated must match the version of the deployment object.
	// If a deployment update message is sent to a node it does the following:
	// - validate deployment version
	// - check the workloads list; if a version does not match the new deployment version, the workload is untouched
	// - if a workload version is the same as the deployment's, the workload is "updated"
	// - if a workload is removed, the workload is deleted
	Version uint32 `json:"version"`
	// Name is a unique workload name per deployment (required)
	Name Name `json:"name"`
	// Type of the reservation (container, zdb, vm, etc...)
	Type WorkloadType `json:"type"`
	// Data is the reservation type arguments.
	Data json.RawMessage `json:"data"`
	// Metadata is user-specific meta attached to the deployment, can be used to link this
	// deployment to other external systems for automation
	Metadata string `json:"metadata"`
	// Description is a human readable description of the workload
	Description string `json:"description"`
	// Result of the reservation, set by the node
	Result Result `json:"result"`
}
```
### Types

- Virtual machine related
  - [`network`](./workload_types.md#network-type)
  - [`ip`](./workload_types.md#ip-type)
  - [`zmount`](./workload_types.md#zmount-type)
  - [`zmachine`](./workload_types.md#zmachine-type)
  - [`zlogs`](./workload_types.md#zlogs-type)
- Storage related
  - [`zdb`](./workload_types.md#zdb-type)
  - [`qsfs`](./workload_types.md#qsfs-type)
- Gateway related
  - [`gateway-name-proxy`](./workload_types.md#gateway-name-proxy-type)
  - [`gateway-fqdn-proxy`](./workload_types.md#gateway-fqdn-proxy-type)
### API

The node is always connected to the RMB network with the node `twin`, meaning the node is always reachable over RMB with the node `twin-id` as an address.

The [node client](https://github.com/threefoldtech/zos/blob/main/client/node.go) should have a complete list of all available functions. Documentation of the API can be found [here](./api.md).

## Raid Controller Configuration

The goal of 0-OS is to expose raw capacity, so it is best to give it the most raw access possible to the disks. In the case of raid controllers, it is best to set them up in [JBOD](https://en.wikipedia.org/wiki/Non-RAID_drive_architectures#JBOD) mode if available.
# `network` type

A private network can span multiple nodes at the same time, which means workloads (`VMs`) that live on different nodes but are part of the same virtual network can still reach each other over this `private` network.

If one (or more) of the nodes are `public access nodes` you can also add your personal laptop to the network and be able to reach your `VMs` over the `wireguard` network.

In its simplest form, a network workload consists of:

- the network range
- the sub-range available on this node
- a private key
- a list of peers
  - each peer has a public key
  - and a sub-range

Full network definition can be found [here](https://github.com/threefoldtech/zos/blob/main/pkg/gridtypes/zos/network.go)
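To illustrate how per-node sub-ranges nest inside the network range, here is a hypothetical helper that carves a /24 out of a /16. This is only an illustration; real network workloads carry explicit subnets rather than deriving them this way:

```go
package main

import (
	"fmt"
	"net"
)

// nodeSubnet derives a /24 sub-range for a node from a /16 network
// range by plugging the node index into the third octet.
func nodeSubnet(network string, index byte) (string, error) {
	ip, ipnet, err := net.ParseCIDR(network)
	if err != nil {
		return "", err
	}
	if ones, _ := ipnet.Mask.Size(); ones != 16 {
		return "", fmt.Errorf("expected a /16 range, got %s", network)
	}
	v4 := ip.To4()
	if v4 == nil {
		return "", fmt.Errorf("not an IPv4 range: %s", network)
	}
	return fmt.Sprintf("%d.%d.%d.0/24", v4[0], v4[1], index), nil
}

func main() {
	sub, err := nodeSubnet("10.1.0.0/16", 3)
	if err != nil {
		panic(err)
	}
	fmt.Println(sub) // 10.1.3.0/24
}
```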
# `qsfs` type

`qsfs`, short for `quantum safe file system`, is a FUSE filesystem which aims to support unlimited local storage, with remote backends for offload and backup which cannot be broken, even by a quantum computer. Please read about it [here](https://github.com/threefoldtech/quantum-storage)

To create a `qsfs` workload you need to provide the workload data as defined [here](https://github.com/threefoldtech/zos/blob/main/pkg/qsfsd/qsfs.go)
<h1> Workload Types </h1>

<h2> Table of Contents </h2>

- [Introduction](#introduction)
- [Virtual Machine](#virtual-machine)
  - [`network` type](#network-type)
  - [`ip` type](#ip-type)
  - [`zmount` type](#zmount-type)
  - [`zmachine` type](#zmachine-type)
    - [Building your `flist`](#building-your-flist)
  - [`zlogs` type](#zlogs-type)
- [Storage](#storage)
  - [`zdb` type](#zdb-type)
  - [`qsfs` type](#qsfs-type)
- [Gateway](#gateway)
  - [`gateway-name-proxy` type](#gateway-name-proxy-type)
  - [`gateway-fqdn-proxy` type](#gateway-fqdn-proxy-type)

## Introduction

Each workload has a type which is associated with some data. We present here the different types of workload associated with Zero-OS.

## Virtual Machine
### `network` type

A private network can span multiple nodes at the same time, which means workloads (`VMs`) that live on different nodes but are part of the same virtual network can still reach each other over this `private` network.

If one (or more) of the nodes are `public access nodes` you can also add your personal laptop to the network and be able to reach your `VMs` over the `wireguard` network.

In its simplest form, a network workload consists of:

- the network range
- the sub-range available on this node
- a private key
- a list of peers
  - each peer has a public key
  - and a sub-range

Full network definition can be found [here](https://github.com/threefoldtech/zos/blob/main/pkg/gridtypes/zos/network.go)
### `ip` type

The IP workload type reserves an IP from the available contract IP list, which means that on contract creation the user must specify the number of public IPs it needs. The contract then allocates this number of IPs from the farm and keeps them on the contract.

When the user then adds IP workloads to the deployment associated with this contract, each IP workload picks and links to one IP from the contract.

In its minimal form, the `IP` workload does not require any data. But in reality it has 2 flags to pick which kind of public IP you want:

- `ipv4` (`bool`): pick one from the contract public IPv4 addresses
- `ipv6` (`bool`): pick an IPv6 over SLAAC. IPv6 addresses are not reserved with a contract. They are basically free, if the farm infrastructure allows IPv6 over SLAAC.

Full `IP` workload definition can be found [here](https://github.com/threefoldtech/zos/blob/main/pkg/gridtypes/zos/ipv4.go)
### `zmount` type

A `zmount` is a local disk that can be attached directly to a container or a virtual machine. `zmount` only requires `size` as input, as defined [here](https://github.com/threefoldtech/zos/blob/main/pkg/gridtypes/zos/zmount.go). This workload type is only utilized via the `zmachine` workload.

### `zmachine` type

`zmachine` is a unified container/virtual machine type. It can be used to start a virtual machine on a `zos` node given the following:

- an `flist`, which provides the base `vm` image or container image
  - the `flist` content is what determines the `zmachine` mode: an `flist` built from a docker image, or one that only has files and executable binaries, will run in container mode. `ZOS` injects its own `kernel+initramfs` to run the workload and kick-start the defined `flist` `entrypoint`
- a private network to join (with an assigned IP)
- optional public `ipv4` or `ipv6`
- optional disks; at least one disk is required when running `zmachine` in `vm` mode, as it is used to hold the `vm` root image

For more details on all parameters needed to run a `zmachine` please refer to [`zmachine` data](https://github.com/threefoldtech/zos/blob/main/pkg/gridtypes/zos/zmachine.go)

#### Building your `flist`

Please refer to [this document](./manual.md) about how to build a compatible `zmachine flist`.
### `zlogs` type

Zlogs is a utility workload that allows you to stream `zmachine` logs to a remote location.

The `zlogs` workload needs to know which `zmachine` to stream the logs of, and also the `target` location to stream the logs to. `zlogs` internally uses [`tailstream`](https://github.com/threefoldtech/tailstream), so it supports any streaming URL that is supported by this utility.

The `zlogs` workload runs inside the same private network as the `zmachine` instance, which means zlogs can stream logs to other `zmachines` running inside the same private network (possibly on different nodes).

For example, you can run [`logagg`](https://github.com/threefoldtech/logagg), which is a web-socket server that can work with the `tailstream` web-socket protocol.

Check the `zlogs` configuration [here](https://github.com/threefoldtech/zos/blob/main/pkg/gridtypes/zos/zlogs.go)
## Storage

### `zdb` type

`zdb` is a storage primitive that gives you a persisted key-value store over the RESP protocol. Please check the [`zdb` docs](https://github.com/threefoldtech/0-db)

Please check [here](https://github.com/threefoldtech/zos/blob/main/pkg/zdb/zdb.go) for the workload data.

### `qsfs` type

`qsfs`, short for `quantum safe file system`, is a FUSE filesystem which aims to support unlimited local storage, with remote backends for offload and backup which cannot be broken, even by a quantum computer. Please read about it [here](https://github.com/threefoldtech/quantum-storage)

To create a `qsfs` workload you need to provide the workload data as defined [here](https://github.com/threefoldtech/zos/blob/main/pkg/qsfsd/qsfs.go)
## Gateway

### `gateway-name-proxy` type

This creates a proxy with the given name to the given backends. The `name` of the proxy must be owned by a name contract on the grid. The idea is that a user can reserve a name (e.g. `example`). Later, he can deploy a gateway workload with name `example` on any gateway node that points to the specified backends. The name is then prefixed to the gateway domain: for example, if the gateway domain is `gent0.freefarm.com` then your full FQDN is going to be `example.gent0.freefarm.com`.

Full name-proxy workload data is defined [here](https://github.com/threefoldtech/zos/blob/main/pkg/gridtypes/zos/gw_name.go)

### `gateway-fqdn-proxy` type

This creates a proxy with the given FQDN to the given backends. In this case the user must configure his DNS provider (e.g. name.com) to point the FQDN to the correct node public IP.

Full fqdn-proxy workload data is defined [here](https://github.com/threefoldtech/zos/blob/main/pkg/gridtypes/zos/gw_fqdn.go)
# `zdb` type

`zdb` is a storage primitive that gives you a persisted key-value store over the RESP protocol. Please check the [`zdb` docs](https://github.com/threefoldtech/0-db)

Please check [here](https://github.com/threefoldtech/zos/blob/main/pkg/zdb/zdb.go) for the workload data.
# `zlogs` type

Zlogs is a utility workload that allows you to stream `zmachine` logs to a remote location.

The `zlogs` workload needs to know which `zmachine` to stream the logs of, and also the `target` location to stream the logs to. `zlogs` internally uses [`tailstream`](https://github.com/threefoldtech/tailstream), so it supports any streaming URL that is supported by this utility.

The `zlogs` workload runs inside the same private network as the `zmachine` instance, which means zlogs can stream logs to other `zmachines` running inside the same private network (possibly on different nodes).

For example, you can run [`logagg`](https://github.com/threefoldtech/logagg), which is a web-socket server that can work with the `tailstream` web-socket protocol.

Check the `zlogs` configuration [here](https://github.com/threefoldtech/zos/blob/main/pkg/gridtypes/zos/zlogs.go)
# Cloud console

- `cloud-console` is a tool to view machine logs and interact with a machine you have deployed.
- It always runs on the machine's network gateway IP, with a port number equal to `20000 + the last octet` of the machine's private IP.
  - For example, if the machine IP is `10.20.2.2/24`, `cloud-console` is running on `10.20.2.1:20002`.
- For the cloud-console to run, the cloud-hypervisor is started with the option `--serial pty` instead of `tty`; this allows another process (`cloud-console` in our case) to interact with the vm.
- To be able to connect to the web console you should first start wireguard to connect to the private network:

```bash
wg-quick up wireguard.conf
```

- Then go to your browser at the network router IP: `10.20.2.1:20002`.
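The port rule above can be captured in a small sketch; `consoleAddr` is a hypothetical helper, not part of any zos tooling:

```go
package main

import (
	"fmt"
	"net"
)

// consoleAddr computes where cloud-console listens for a machine:
// on the network gateway IP (.1), at port 20000 plus the last octet
// of the machine's private IP.
func consoleAddr(machineIP string) (string, error) {
	ip := net.ParseIP(machineIP).To4()
	if ip == nil {
		return "", fmt.Errorf("not an IPv4 address: %s", machineIP)
	}
	gw := net.IPv4(ip[0], ip[1], ip[2], 1)
	return fmt.Sprintf("%s:%d", gw, 20000+int(ip[3])), nil
}

func main() {
	addr, err := consoleAddr("10.20.2.2")
	if err != nil {
		panic(err)
	}
	fmt.Println(addr) // 10.20.2.1:20002
}
```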
# `zmachine` type

`zmachine` is a unified container/virtual machine type. It can be used to start a virtual machine on a `zos` node given the following:

- an `flist`, which provides the base `vm` image or container image
  - the `flist` content is what determines the `zmachine` mode: an `flist` built from a docker image, or one that only has files and executable binaries, will run in container mode. `ZOS` injects its own `kernel+initramfs` to run the workload and kick-start the defined `flist` `entrypoint`
- a private network to join (with an assigned IP)
- optional public `ipv4` or `ipv6`
- optional disks; at least one disk is required when running `zmachine` in `vm` mode, as it is used to hold the `vm` root image

For more details on all parameters needed to run a `zmachine` please refer to [`zmachine` data](https://github.com/threefoldtech/zos/blob/main/pkg/gridtypes/zos/zmachine.go)

## Building your `flist`

Please refer to [this document](../manual.md) about how to build a compatible `zmachine flist`.
# Zmachine

A `Zmachine` is an instance of virtual compute capacity. There are 2 kinds of Zmachines.
One is a `VM`, standard in cloud environments. The other is a `container`.
On the Zos level, both of these are implemented as virtual machines. Depending on
the context, it will be considered to be either a VM or a container. In either
scenario, the `Zmachine` is started from an `Flist`.

> Note, both VM and Container on ZOS are actually served as Virtual Machines. The
only difference is that if you are running in VM mode, you only need to provide a **raw**
disk image (image.raw) in your flist.
## Container

A container is meant to host a `microservice`. The `microservice` architecture generally
dictates that each service should run in its own container (therefore providing
a level of isolation), and communicate with other containers it depends on over the
network.

In Zos, similar to docker, a container is actually also run in a virtualized environment.
As with containers, some setup is done on behalf of the user; after this setup is done,
the user's `entrypoint` is started.

It should be noted that a container has no control over the kernel
used to run it; if this is required, a `VM` should be used instead. Furthermore,
a container should ideally only have 1 process running. A container can be a single
binary, or a complete filesystem. In general, the first should be preferred, and
if you need the latter, it might be an indication that you actually want a `VM`.

For containers, the network setup will be created for you. Your init process can
assume that it will be fully set up (according to the config you provided) by the
time it is started. Mountpoints will also be set up for you. The environment variables
passed will be available inside the container.
## VM

In container mode, ZOS provides a minimal kernel that is used to run a lightweight VM,
and then runs your app from your flist. If you need control over the kernel, you can
still provide it inside the flist as follows:

- /boot/vmlinuz
- /boot/initrd.img [optional]

**NOTE**: the vmlinuz MUST be an uncompressed ELF kernel image. If you are building your own kernel, you can use the [extract-vmlinux](https://github.com/torvalds/linux/blob/master/scripts/extract-vmlinux) script to extract it. To test whether your kernel is a valid ELF image, run

`readelf -n <path/to/vmlinuz>`
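The same check can be sketched without binutils by inspecting the ELF magic bytes directly; a minimal helper (the kernel path below is a placeholder):

```shell
# Returns success when the file starts with the ELF magic bytes
# (0x7f 'E' 'L' 'F'), i.e. it is an uncompressed ELF image as ZOS
# expects for /boot/vmlinuz.
is_elf_kernel() {
    [ "$(head -c 4 "$1" | od -An -tx1 | tr -d ' \n')" = "7f454c46" ]
}

if is_elf_kernel "/path/to/vmlinuz"; then
    echo "kernel looks like a valid ELF image"
else
    echo "kernel needs to be extracted with extract-vmlinux first"
fi
```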

Any of those files can be a symlink to another file in the flist.

If ZOS finds the `/boot/vmlinuz` file, it will use it together with `/boot/initrd.img` if that also exists; otherwise ZOS uses the built-in minimal kernel and runs in `container` mode.

### Building an ubuntu VM flist

This is a guide to help you build a working VM flist.

This guide is for ubuntu `jammy`.

Prepare the rootfs directory:

```bash
mkdir ubuntu:jammy
```

Bootstrap ubuntu:

```bash
sudo debootstrap jammy ubuntu:jammy http://archive.ubuntu.com/ubuntu
```

This will create and download the basic rootfs for ubuntu jammy in the directory `ubuntu:jammy`.
After it's done, we can chroot into this directory to continue installing the necessary packages and configure
a few things.

> I am using a script called `arch-chroot`, which is available by default on Arch but can also be installed on ubuntu, to continue
the following steps.

```bash
sudo arch-chroot ubuntu:jammy
```

> This script (similar to the `chroot` command) switches root to the given directory, but also takes care of mounting /dev, /sys, etc. for you,
and cleans them up on exit.

Next, remove the `/etc/resolv.conf` link and re-create the file with a valid nameserver to be able to continue:

```bash
# make sure to set the path correctly
export PATH=/usr/local/sbin/:/usr/local/bin/:/usr/sbin/:/usr/bin/:/sbin:/bin

rm /etc/resolv.conf
echo 'nameserver 1.1.1.1' > /etc/resolv.conf
```

Install cloud-init:

```bash
apt-get update
apt-get install cloud-init openssh-server curl
# to make sure we have a clean setup
cloud-init clean
```

It is also really important that we install a kernel:

```bash
apt-get install linux-modules-extra-5.15.0-25-generic
```

> I chose this package because it will also install extra modules for us, along with a generic kernel.

Next, make sure that virtiofs is part of the initramfs image:

```bash
echo 'fs-virtiofs' >> /etc/initramfs-tools/modules
update-initramfs -c -k all
```

Clean up the package cache:

```bash
apt-get clean
```

The last thing we do before we actually upload the flist
is to make sure the kernel is in the correct format.

This step does not require that we stay in the chroot, so hit `ctrl+d` or type `exit`;
you should be out of the arch-chroot now.

```bash
curl -O https://raw.githubusercontent.com/torvalds/linux/master/scripts/extract-vmlinux
chmod +x extract-vmlinux

sudo ./extract-vmlinux ubuntu:jammy/boot/vmlinuz | sudo tee ubuntu:jammy/boot/vmlinuz-5.15.0-25-generic.elf > /dev/null
# then replace the original kernel
sudo mv ubuntu:jammy/boot/vmlinuz-5.15.0-25-generic.elf ubuntu:jammy/boot/vmlinuz-5.15.0-25-generic
```

To verify, you can run:

```bash
ls -l ubuntu:jammy/boot
```

and it should show something like:

```bash
total 101476
-rw-r--r-- 1 root root   260489 Mar 30  2022 config-5.15.0-25-generic
drwxr-xr-x 1 root root       54 Jun 28 15:35 grub
lrwxrwxrwx 1 root root       28 Jun 28 15:35 initrd.img -> initrd.img-5.15.0-25-generic
-rw-r--r-- 1 root root 41392462 Jun 28 15:39 initrd.img-5.15.0-25-generic
lrwxrwxrwx 1 root root       28 Jun 28 15:35 initrd.img.old -> initrd.img-5.15.0-25-generic
-rw------- 1 root root  6246119 Mar 30  2022 System.map-5.15.0-25-generic
lrwxrwxrwx 1 root root       25 Jun 28 15:35 vmlinuz -> vmlinuz-5.15.0-25-generic
-rw-r--r-- 1 root root 55988436 Jun 28 15:50 vmlinuz-5.15.0-25-generic
lrwxrwxrwx 1 root root       25 Jun 28 15:35 vmlinuz.old -> vmlinuz-5.15.0-25-generic
```

Now package the tar for upload:

```bash
sudo rm -rf ubuntu:jammy/dev/*
sudo tar -czf ubuntu-jammy.tar.gz -C ubuntu:jammy .
```

Upload it to the hub, and use it to create a Zmachine.
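Before uploading, a quick sanity check that the kernel actually made it into the archive can save a failed deployment; a small sketch:

```shell
# Succeeds when the archive contains a boot/vmlinuz entry, i.e. the
# flist will boot as a VM with your own kernel rather than the
# built-in one.
check_flist_tar() {
    tar -tzf "$1" | grep -q 'boot/vmlinuz'
}

check_flist_tar ubuntu-jammy.tar.gz && echo "kernel present in archive"
```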

## VM Image [deprecated]

In VM image mode, you run your own operating system (for now only linux is supported).
The image provided must be:

- EFI bootable
- Cloud-init enabled

You can find later in this document how to create your own bootable image.

A VM reservation must also have at least 1 volume, as the boot image
will be copied to this volume. The size of the root disk will be the size of this
volume.

The image used to boot the VM must have cloud-init enabled on boot. Cloud-init
receives its config over the NoCloud source. This takes care of setting up networking, hostname,
and root authorized_keys.
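For local testing before upload, you can feed cloud-init a NoCloud seed by hand. A minimal sketch (the key and hostname below are placeholders; on ZOS this data is generated for you at deploy time):

```shell
# Minimal NoCloud seed data: cloud-init picks this up from a volume
# labelled "cidata" (which can be built from these two files).
mkdir -p seed
cat > seed/user-data <<'EOF'
#cloud-config
ssh_authorized_keys:
  - ssh-ed25519 AAAA... user@example
EOF
cat > seed/meta-data <<'EOF'
instance-id: test-vm-01
local-hostname: test-vm
EOF
```

A seed image can then be built with e.g. `cloud-localds seed.img seed/user-data seed/meta-data` (from the cloud-image-utils package) and attached to a local VM to verify that networking, hostname, and authorized_keys get applied.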

> This method of building a full VM from a raw image is not recommended and will be phased out in
the future. It is better to use the container method to run containerized apps. Another option
is to run your own kernel from an flist (explained above).

### Expected Flist structure

A `Zmachine` will be considered a `VM` if it contains an `/image.raw` file.

`/image.raw` is used as the "boot disk". This `/image.raw` is copied to the first attached
volume of the `VM`. Cloud-init will take care of resizing the filesystem on the image
to take up the full disk size allocated in the deployment.

Note that if the `image.raw` size is larger than the allocated disk, the workload for the VM
will fail.
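A quick way to catch this before deploying is to compare sizes locally; a sketch (the 10 GiB volume size is just an example value):

```shell
# Succeeds only when the raw image fits on a volume of the given size.
fits_on_volume() {
    image="$1"        # path to image.raw
    volume_bytes="$2" # size of the volume attached to the VM
    [ "$(stat -c%s "$image")" -le "$volume_bytes" ]
}

# e.g. check against a 10 GiB volume
if fits_on_volume image.raw $((10 * 1024 * 1024 * 1024)); then
    echo "image fits; the workload should not fail on size"
else
    echo "image.raw is larger than the volume; the workload will fail"
fi
```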

### Expected Flist structure

Any Flist will boot as a container, **UNLESS** it has an `/image.raw` file. There is
no need to specify a kernel yourself (it will be provided).

### Known issues

- We need to do proper performance testing for `virtio-fs`. There seems to be some
suboptimal performance right now.
- It's not currently possible to get container logs.
- TODO: more testing

## Creating VM image

This is a simple tutorial on how to create your own VM image.

> Note: Please consider checking the official VM images repo on the hub before building your own
image. This can save you a lot of time (and network traffic): <https://hub.grid.tf/tf-official-vms>

### Use one of the ubuntu cloud-images

If the ubuntu images in the official repo are not enough, you can simply upload one of the official cloud images as follows:

- Visit <https://cloud-images.ubuntu.com/>
- Select the version you want (let's assume bionic)
- Go to bionic, then click on current
- Download the amd64.img file, like this one: <https://cloud-images.ubuntu.com/bionic/current/bionic-server-cloudimg-amd64.img>
- This is a `Qcow2` image, which is not supported by ZOS, so we need to convert it to a raw disk image using the following command:

```bash
qemu-img convert -p -f qcow2 -O raw bionic-server-cloudimg-amd64.img image.raw
```

- Now that we have the raw image (image.raw), it's time to compress it and upload it to the hub:

```bash
tar -czf ubuntu-18.04-lts.tar.gz image.raw
```

- Now visit the hub <https://hub.grid.tf/>, log in or create your own account, then click on the "upload my file" button
- Select the newly created tar.gz file
- Now you should be able to use this flist to create Zmachine workloads

### Create an image from scratch

This is an advanced scenario. It requires some prior knowledge of how to create local VMs, how to prepare the installation medium,
and how to install your OS of choice.

Before we continue, you need to have some hypervisor that you can use locally. Libvirt/Qemu are good choices. We skip over what you need to do to install and configure your system correctly, and over how to create the VM itself.

#### VM Requirements

Create a VM with enough CPU and memory to handle the installation process. Note that this does not relate to what your choices for CPU and memory are going to be for the actual VM running on the grid.

We are going to install an arch linux image, so we will create a VM with:

- A disk of about 2GB (note this also is not related to the final VM running on the grid; on installation the OS image will eventually expand to use the entire allocated disk attached to the VM). The smaller the disk, the better; this can differ for each OS.
- The arch installation iso, or any other installation medium, attached

#### Boot the VM (locally)

Boot the VM to start the installation. The boot must support EFI, because ZOS only supports images with an esp partition. So make sure that both your hypervisor and your boot/installation medium support this.

For example, in Libvirt Manager make sure you are using the right firmware (UEFI).

#### Installation

We are going to follow the installation manual for Arch linux, but with slight tweaks:

- Make sure the VM is booted with UEFI; run the `efivar -l` command and see if you get any output. Otherwise the machine is probably booted in BIOS mode.
- With `parted`, create 2 partitions:
  - an esp (boot) partition of 100M
  - a root partition that spans the remainder of the disk

```bash
DISK=/dev/vda
# First, create a gpt partition table
parted $DISK mklabel gpt
# Secondly, create the esp partition of 100M
parted $DISK mkpart primary 1 100M
# Mark the first partition as esp
parted $DISK set 1 esp on
# Use the remaining part as root that takes the remaining
# space on disk
parted $DISK mkpart primary 100M 100%

# To verify everything is correct do
parted $DISK print

# This should show 2 partitions: the first one is slightly less than 100M and has flags (boot, esp); the second one takes the remaining space
```

We need to format the partitions as follows:

```bash
# this one has to be vfat with a fat size of 32, as follows
mkfs.vfat -F 32 /dev/vda1
# This one can be anything based on your preference, as long as it's supported by your OS kernel. We are going with ext4 in this tutorial
mkfs.ext4 -L cloud-root /dev/vda2
```

Note the label assigned to the /dev/vda2 (root) partition. This can be anything, but it's needed to configure the boot later when installing the boot loader. Alternatively, you can use the partition UUID.

Next, we need to mount the disks:

```bash
mount /dev/vda2 /mnt
mkdir /mnt/boot
mount /dev/vda1 /mnt/boot
```

After the disks are mounted as above, we start the installation:

```bash
pacstrap /mnt base linux linux-firmware vim openssh cloud-init cloud-guest-utils
```

This will install basic arch linux, but also includes cloud-init, cloud-guest-utils, openssh, and vim for convenience.

Following the installation guide, generate the fstab file:

```
genfstab -U /mnt >> /mnt/etc/fstab
```

Then arch-chroot into /mnt (`arch-chroot /mnt`) to continue the setup. Please follow all steps in the installation guide to set the timezone and locales as needed.

- You don't have to set the hostname; this will be set up later on ZOS, via cloud-init, when the VM is deployed.
- Let's drop the root password altogether, since login to the VM over ssh will require key authentication only. You can do this by running:

```bash
passwd -d root
```

We make sure the required services are enabled:

```bash
systemctl enable sshd
systemctl enable systemd-networkd
systemctl enable systemd-resolved
systemctl enable cloud-init
systemctl enable cloud-final

# make sure we are using systemd-resolved
rm /etc/resolv.conf
ln -s /run/systemd/resolve/stub-resolv.conf /etc/resolv.conf
```

Finally, install the boot loader as follows:

> Only grub2 has been tested and is known to work.

```bash
pacman -S grub
```

Then we need to install grub to the esp partition (mounted at /boot inside the chroot):

```
grub-install --target=x86_64-efi --efi-directory=/boot --removable
```

Change the default values as follows:

```
vim /etc/default/grub
```

Make sure to change `GRUB_CMDLINE_LINUX_DEFAULT` as follows:

```
GRUB_CMDLINE_LINUX_DEFAULT="loglevel=3 console=tty console=ttyS0"
```

> Note: we removed the `quiet` flag and added the console flags.

Also set `GRUB_TIMEOUT` to 0 for a faster boot:

```
GRUB_TIMEOUT=0
```

Then finally generate the config:

```
grub-mkconfig -o /boot/grub/grub.cfg
```

The last thing we need to do is clean up:

- the pacman cache, by running `rm -rf /var/cache/pacman/pkg`
- the cloud-init state, by running `cloud-init clean`

Press `Ctrl+D` to exit the chroot, then power off by running the `poweroff` command.

> NOTE: if you boot the machine again, you always need to run `cloud-init clean` again, as long as it's not yet deployed on ZOS. This makes sure the image has a clean state.

#### Converting the disk

Based on your hypervisor of choice, you might need to convert the disk to a `raw` image, the same way we did with the ubuntu image.

```bash
# this is an optional step in case you used a qcow2 disk for the installation. If the disk is already `raw` you can skip this
qemu-img convert -p -f qcow2 -O raw /path/to/vm/disk.img image.raw
```

Compress and tar the image.raw as before, and upload it to the hub:

```
tar -czf arch-linux.tar.gz image.raw
```

# `zmount` type

A `zmount` is a local disk that can be attached directly to a container or a virtual machine. A `zmount` only requires `size` as input, as defined [here](https://github.com/threefoldtech/zos/blob/main/pkg/gridtypes/zos/zmount.go). This workload type is only utilized via the `zmachine` workload.

<h1> CPUBenchmark </h1>

<h2>Table of Contents</h2>

- [Overview](#overview)
- [Configuration](#configuration)
- [Details](#details)
- [Result Sample](#result-sample)
- [Result Explanation](#result-explanation)

***

## Overview

The `CPUBenchmark` task is designed to measure the performance of the CPU. It utilizes the [cpu-benchmark-simple](https://github.com/threefoldtech/cpu-benchmark-simple) tool and includes a ZOS stub to gather the number of workloads running on the node.

## Configuration

- Name: `cpu-benchmark`
- Schedule: 4 times a day
- Jitter: 0

## Details

- The benchmark simply runs a `CRC64` computation task, calculates the time spent in the computation, and reports it in `seconds`.
- The computation is performed in both single-threaded and multi-threaded scenarios.
- Lower time = better performance: a lower execution time indicates better performance.
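As a rough illustration of the measurement only (not the actual benchmark code; `cksum` stands in here for the CRC64 routine):

```shell
# Time a fixed checksum workload and report the elapsed seconds,
# the same "lower is better" figure the task reports.
start=$(date +%s.%N)
head -c 10000000 /dev/zero | cksum > /dev/null
end=$(date +%s.%N)
elapsed=$(awk -v a="$start" -v b="$end" 'BEGIN{printf "%.3f", b - a}')
echo "single-thread time: ${elapsed}s"
```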

## Result Sample

```json
{
  "description": "Measures the performance of the node CPU by reporting the time spent of computing a task in seconds.",
  "name": "cpu-benchmark",
  "result": {
    "multi": 1.105,
    "single": 1.135,
    "threads": 1,
    "workloads": 0
  },
  "timestamp": 1700504403
}
```

## Result Explanation

The best way to know what is a good or bad value is by testing and comparing different hardware.
Here are some examples:

**1x Intel(R) Xeon(R) W-2145 CPU @ 3.70GHz** (Q3'2017)

```
Single thread score: 0.777
Multi threads score: 13.345 [16 threads]
```

**1x Intel(R) Pentium(R) CPU G4400 @ 3.30GHz** (Q3'2015)

```
Single thread score: 1.028
Multi threads score: 2.089 [2 threads]
```

**1x Intel(R) Core(TM) i5-3570 CPU @ 3.40GHz** (Q2'2012)

```
Single thread score: 2.943
Multi threads score: 12.956 [4 threads]
```

**2x Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz** (Q1'2012)

```
Single thread score: 1.298
Multi threads score: 44.090 [32 threads]
```

**2x Intel(R) Xeon(R) CPU L5640 @ 2.27GHz** (Q1'2010)

```
Single thread score: 2.504
Multi threads score: 72.452 [24 threads]
```

As you can see, the more recent the CPU, the faster it is; but for the same launch period, a Xeon performs far better than a regular/desktop CPU. You have to take into account the number of threads and the time per thread.

<h1> Health Check </h1>

<h2>Table of Contents</h2>

- [Overview](#overview)
- [Configuration](#configuration)
- [Details](#details)
- [Result Sample](#result-sample)

***

## Overview

The health check task executes checks over ZOS components to determine whether the node is in a usable state, and sets flags for the Power Daemon to stop uptime reports if the node is unusable.

## Configuration

- Name: `healthcheck`
- Schedule: Every 20 mins.

## Details

- Check whether the node cache disk is usable by trying to write some data to it. If that fails, the Readonly flag is set.
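The cache probe boils down to attempting a write; a minimal sketch (the real check targets the node cache path, not `/tmp`):

```shell
# Probe a directory by writing (and removing) a marker file; a
# failure here is what flips the node into read-only mode.
check_writable() {
    probe="$1/healthcheck.$$"
    if echo ok > "$probe" 2>/dev/null; then
        rm -f "$probe"
        return 0
    fi
    return 1
}

check_writable /tmp && echo "cache is writable" || echo "cache is read-only"
```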

## Result Sample

```json
{
  "description": "health check task runs multiple checks to ensure the node is in a usable state and set flags for the power daemon to stop reporting uptime if it is not usable",
  "name": "healthcheck",
  "result": {
    "cache": [
      "failed to write to cache: open /var/cache/healthcheck: operation not permitted"
    ]
  },
  "timestamp": 1701599580
}
```

<h1> IPerf </h1>

<h2>Table of Contents</h2>

- [Overview](#overview)
- [Configuration](#configuration)
- [Details](#details)
- [Result Sample](#result-sample)

***

## Overview

The `iperf` package is designed to facilitate network performance testing using the `iperf3` tool, with both UDP and TCP over IPv4 and IPv6.

## Configuration

- Name: `iperf`
- Schedule: 4 times a day
- Jitter: 20 min

## Details

- The package uses the iperf binary to examine network performance under different conditions.
- It fetches the PublicConfig data of randomly selected public nodes on the chain, plus all public nodes from the free farm. These nodes serve as the targets for the iperf tests.
- For each node, it runs the test 4 times: over UDP and TCP, using both node IPs (v4/v6).
- The result is a slice of reports for all public nodes (4 per node); each report includes:

```
UploadSpeed: Upload speed (in bits per second).
DownloadSpeed: Download speed (in bits per second).
NodeID: ID of the node where the test was conducted.
NodeIpv4: IPv4 address of the node.
TestType: Type of the test (TCP or UDP).
Error: Any error encountered during the test.
CpuReport: CPU utilization report (in percentage).
```

## Result Sample

```json
{
  "description": "Test public nodes network performance with both UDP and TCP over IPv4 and IPv6",
  "name": "iperf",
  "result": [
    {
      "cpu_report": {
        "host_system": 2.4433388913571044,
        "host_total": 3.542919199613454,
        "host_user": 1.0996094859359695,
        "remote_system": 0.24430594945859846,
        "remote_total": 0.3854457128784448,
        "remote_user": 0.14115962407747246
      },
      "download_speed": 1041274.4792242317,
      "error": "",
      "node_id": 124,
      "node_ip": "88.99.30.200",
      "test_type": "tcp",
      "upload_speed": 1048549.3668460822
    },
    {
      "cpu_report": {
        "host_system": 0,
        "host_total": 0,
        "host_user": 0,
        "remote_system": 0,
        "remote_total": 0,
        "remote_user": 0
      },
      "download_speed": 0,
      "error": "unable to connect to server - server may have stopped running or use a different port, firewall issue, etc.: Network unreachable",
      "node_id": 124,
      "node_ip": "2a01:4f8:10a:710::2",
      "test_type": "tcp",
      "upload_speed": 0
    }
  ],
  "timestamp": 1700507035
}
```

<h1> Performance Monitor Package </h1>

<h2>Table of Contents</h2>

- [Overview](#overview)
- [Flow](#flow)
- [Node Initialization Check](#node-initialization-check)
- [Scheduling](#scheduling)
- [RMB Commands](#rmb-commands)
- [Caching](#caching)
- [Registered Tests](#registered-tests)
- [Test Suite](#test-suite)

***

## Overview

The `perf` package is a performance monitor for `zos` nodes. It schedules tasks, caches their results, and allows retrieval of these results through `RMB` calls.

## Flow

1. The `perf` monitor is started by the `noded` service in ZOS.
2. Tasks are registered with a schedule in the new monitor.
3. A bus handler is opened to allow result retrieval.

## Node Initialization Check

To ensure that the node always has a test result available, a check is performed on node startup for all the registered tasks; if a task doesn't have any stored result, it runs immediately without waiting for the next scheduled time.

## Scheduling

- Tasks are scheduled using a 6-field cron format. This format provides the flexibility to define times precisely, allowing tasks to run periodically or at a specific time.

- Each task has a jitter, which is the maximum number of seconds the task may sleep before it runs. This prevents all tests from ending up running at exactly the same time. So, for example, if a task is scheduled to run at `06:00` and its jitter is `10`, it is expected to run anywhere between `06:00` and `06:10`.
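The jitter behaviour can be pictured as a random pre-run sleep; a sketch (using bash's `RANDOM`, with seconds as the unit per the description above):

```shell
# Before a scheduled run, sleep a random delay in [0, jitter].
jitter=10
delay=$((RANDOM % (jitter + 1)))
echo "task sleeps ${delay}s, then runs"
# sleep "$delay"   # commented out so the sketch returns immediately
```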

## RMB Commands

- `zos.perf.get`:

  - Payload: a payload type that contains the name of the test

    ```go
    type Payload struct {
        Name string
    }
    ```

    Possible values:

    - `"public-ip-validation"`
    - `"cpu-benchmark"`
    - `"iperf"`

  - Return: a single task result.

  - Possible Error: `ErrResultNotFound` if no result is stored for the given task.

- `zos.perf.get_all`:

  - Return: all stored results

The rmb direct client can be used to call these commands. Check the [example](https://github.com/threefoldtech/tfgrid-sdk-go/blob/development/rmb-sdk-go/examples/rpc_client/main.go).

## Caching

Results are stored in a Redis server running on the node.

The key in redis is the name of the task prefixed with the word `perf`.
The value is an instance of the `TaskResult` struct, which contains:

- The name of the task
- The timestamp when the task was run
- A brief description of what the task does
- The actual result returned by the task

Notes:

- Storing results by key ensures each new result overrides the old one, so there is always a single result per task.
- Storing results prefixed with `perf` eases retrieving all the results stored by this module.

## Registered Tests

- [Public IP Validation](./publicips.md)
- [CPUBenchmark](./cpubench.md)
- [IPerf](./iperf.md)
- [Health Check](./healthcheck.md)

## Test Suite

Go to [this link](https://app.testlodge.com/a/26076/projects/40893/suites/234919) for a test suite covering the test cases for the performance testing.

<h1> Public IPs Validation Task </h1>

<h2>Table of Contents</h2>

- [Introduction](#introduction)
- [Configuration](#configuration)
- [Task Details](#task-details)
- [Result](#result)
- [Result Sample](#result-sample)

***

## Introduction

The goal of the task is to make sure the public IPs assigned to a farm are valid and can be assigned to deployments.

## Configuration

- Name: `public-ip-validation`
- Schedule: 4 times a day
- Jitter: 10 min

## Task Details

- The task depends on `Networkd` ensuring the proper test network setup is in place, and will fail if it wasn't set up properly. The network setup consists of a test namespace and a MacVLAN inside it. All steps are done inside the test namespace.
- Decide whether this node or another one in the farm should run the task, based on the node ID. The node with the lowest ID whose power target is up should run it. The others log why they shouldn't run the task and return with no errors. This ensures only one node runs the task, to avoid problems like assigning the same IP twice.
- Get the public IPs set on the farm.
- Remove all IPs and routes added to the test MacVLAN, to ensure anything remaining from a previous task run is removed.
- Skip IPs that are assigned to a contract.
- Set the MacVLAN link up.
- Iterate over all public IPs and add each, with the provided gateway, to the MacVLAN.
- Validate each IP by querying an external source that returns the public IP of the node.
- If the public IP returned matches the IP added to the link, the IP is valid. Otherwise, it is invalid.
- Remove all IPs and routes between each IP, to make them available for other deployments.
- After iterating over all public IPs, set the link down.
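The comparison at the heart of the validation step can be sketched as follows (the external-source URL in the comment is an assumption for illustration only; ZOS uses its own source):

```shell
# An IP passes validation only when the externally observed address
# matches the one just configured on the MacVLAN.
validate_ip() {
    configured="$1"
    observed="$2"   # e.g. observed=$(curl -s https://api.ipify.org)
    [ "$configured" = "$observed" ]
}

validate_ip 185.206.122.29 185.206.122.29 && echo "IP is valid"
validate_ip 185.206.122.29 10.0.0.1 || echo "IP is invalid"
```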

## Result

The task returns a single map of String (IP) to IPReport. The report consists of the IP state (valid, invalid, or skipped) and the reason for that state.

### Result Sample

```json
{
  "description": "Runs on the least NodeID node in a farm to validate all its IPs.",
  "name": "public-ip-validation",
  "result": {
    "185.206.122.29/24": {
      "reason": "public ip or gateway data are not valid",
      "state": "invalid"
    }
  },
  "timestamp": 1700504421
}
```

<h1> Zero-OS </h1>

<h2> Table of Contents </h2>

- [Manual](./manual/manual.md)
- [Workload Types](./manual/workload_types.md)
- [Internal Modules](./internals/internals.md)
- [Identity](./internals/identity/index.md)
- [Node ID Generation](./internals/identity/identity.md)
- [Node Upgrade](./internals/identity/upgrade.md)
- [Node](./internals/node/index.md)
- [Storage](./internals/storage/index.md)
- [Network](./internals/network/index.md)
- [Introduction](./internals/network/introduction.md)
- [Definitions](./internals/network/definitions.md)
- [Mesh](./internals/network/mesh.md)
- [Setup](./internals/network/setup_farm_network.md)
- [Flist](./internals/flist/index.md)
- [Container](./internals/container/index.md)
- [VM](./internals/vmd/index.md)
- [Provision](./internals/provision/index.md)
- [Capacity](./internals/capacity.md)
- [Performance Monitor Package](./performance/performance.md)
- [Public IPs Validation Task](./performance/publicips.md)
- [CPUBenchmark](./performance/cpubench.md)
- [IPerf](./performance/iperf.md)
- [Health Check](./performance/healthcheck.md)
- [API](./manual/api.md)

# Releases of Zero-OS

We use a simple pipeline release workflow. Building and file distribution are done using GitHub Actions.
Usable files are available on the [Zero-OS Hub](https://hub.grid.tf/tf-zos).

This pipeline is made to match the 3 different running modes of 0-OS. For more information, head to the [upgrade documentation](../identity/upgrade.md).

## Development build

On a push to the main branch of the zos repository, a new development build is triggered. If the build succeeds,
binaries are packed into an flist and uploaded to the [tf-autobuilder](https://hub.grid.tf/tf-autobuilder) repository of the hub.

This flist is then promoted into the [tf-zos](https://hub.grid.tf/tf-zos) repository of the hub, and a symlink to this latest build is made (`tf-autobuilder/zos:development-3:latest.flist`).

## Releases

We create 3 types of releases:

- QA release: the version is suffixed by `qa<number>`, for example `v3.5.0-qa1`.
- RC release: the version is suffixed by `rc<number>`, for example `v3.5.0-rc2`.
- Main release: the version has no suffix, for example `v3.5.0`.

The release cycle goes like this:

- As mentioned before, devnet is updated the moment new code is available on the `main` branch. Since the `dev` release is auto-linked to the latest `flist` on the hub, nodes on devnet will auto-update to the latest available build.
- Creating a `qa` release will not trigger the same behavior on `qa` net; the same goes for both testnet and mainnet. Instead, a workflow must be triggered; this is only to make 100% sure that an update is needed.
- Once the build of the release is available, a [deploy](../../.github/workflows/grid-deploy.yaml) workflow needs to be triggered with the right version, to deploy on the proper network.
- All the workflow does is link the right version under the hub's [tf-zos](https://hub.grid.tf/tf-zos) repo.

> The `deploy` flow is rarely used; an on-chain update is also available. By setting the right version on tfchain, the link on the hub is auto-updated, and hence the deploy workflow doesn't need to be triggered. We keep it as a safety net in case something goes wrong (the chain is broken) and we need to force a specific version on ZOS.

- Development: https://playground.hub.grid.tf/tf-autobuilder/zos:development-3:latest.flist
- Testing: https://playground.hub.grid.tf/tf-zos/zos:testing-3:latest.flist
- Production: https://playground.hub.grid.tf/tf-zos/zos:production-3:latest.flist