diff --git a/collections/documentation/developers/developers.md b/collections/documentation/developers/developers.md new file mode 100644 index 0000000..3877320 --- /dev/null +++ b/collections/documentation/developers/developers.md @@ -0,0 +1,90 @@ +# ThreeFold Developers + +This section covers all practical tutorials on how to develop and build on the ThreeFold Grid. + +For complementary information on the technology developed by ThreeFold, refer to the [Technology](../../knowledge_base/technology/technology_toc.md) section. + +

Table of Contents

+ +- [Javascript Client](./javascript/grid3_javascript_readme.md) + - [Installation](./javascript/grid3_javascript_installation.md) + - [Loading Client](./javascript/grid3_javascript_loadclient.md) + - [Deploy a VM](./javascript/grid3_javascript_vm.md) + - [Capacity Planning](./javascript/grid3_javascript_capacity_planning.md) + - [Deploy Multiple VMs](./javascript/grid3_javascript_vms.md) + - [Deploy CapRover](./javascript/grid3_javascript_caprover.md) + - [Gateways](./javascript/grid3_javascript_vm_gateways.md) + - [Deploy a Kubernetes Cluster](./javascript/grid3_javascript_kubernetes.md) + - [Deploy a ZDB](./javascript/grid3_javascript_zdb.md) + - [Deploy ZDBs for QSFS](./javascript/grid3_javascript_qsfs_zdbs.md) + - [QSFS](./javascript/grid3_javascript_qsfs.md) + - [Key Value Store](./javascript/grid3_javascript_kvstore.md) + - [VM with Wireguard and Gateway](./javascript/grid3_wireguard_gateway.md) + - [GPU Support](./javascript/grid3_javascript_gpu_support.md) +- [Go Client](./go/grid3_go_readme.md) + - [Installation](./go/grid3_go_installation.md) + - [Loading Client](./go/grid3_go_load_client.md) + - [Deploy a VM](./go/grid3_go_vm.md) + - [Deploy Multiple VMs](./go/grid3_go_vms.md) + - [Deploy Gateways](./go/grid3_go_gateways.md) + - [Deploy Kubernetes](./go/grid3_go_kubernetes.md) + - [Deploy a QSFS](./go/grid3_go_qsfs.md) + - [GPU Support](./go/grid3_go_gpu.md) +- [TFCMD](./tfcmd/tfcmd.md) + - [Getting Started](./tfcmd/tfcmd_basics.md) + - [Deploy a VM](./tfcmd/tfcmd_vm.md) + - [Deploy Kubernetes](./tfcmd/tfcmd_kubernetes.md) + - [Deploy ZDB](./tfcmd/tfcmd_zdbs.md) + - [Gateway FQDN](./tfcmd/tfcmd_gateway_fqdn.md) + - [Gateway Name](./tfcmd/tfcmd_gateway_name.md) + - [Contracts](./tfcmd/tfcmd_contracts.md) +- [TFROBOT](./tfrobot/tfrobot.md) + - [Installation](./tfrobot/tfrobot_installation.md) + - [Configuration File](./tfrobot/tfrobot_config.md) + - [Deployment](./tfrobot/tfrobot_deploy.md) + - [Commands and Flags](./tfrobot/tfrobot_commands_flags.md) + 
- [Supported Configurations](./tfrobot/tfrobot_configurations.md) +- [ThreeFold Chain](./tfchain/tfchain.md) + - [Introduction](./tfchain/introduction.md) + - [Farming Policies](./tfchain/farming_policies.md) + - [External Service Contract](./tfchain/tfchain_external_service_contract.md) + - [Solution Provider](./tfchain/tfchain_solution_provider.md) +- [Grid Proxy](./proxy/proxy_readme.md) + - [Introducing Grid Proxy](./proxy/proxy.md) + - [Setup](./proxy/setup.md) + - [DB Testing](./proxy/db_testing.md) + - [Commands](./proxy/commands.md) + - [Contributions](./proxy/contributions.md) + - [Explorer](./proxy/explorer.md) + - [Database](./proxy/database.md) + - [Production](./proxy/production.md) + - [Release](./proxy/release.md) +- [Flist](./flist/flist.md) + - [ThreeFold Hub Intro](./flist/flist_hub/zos_hub.md) + - [Generate an API Token](./flist/flist_hub/api_token.md) + - [Convert Docker Image Into Flist](./flist/flist_hub/convert_docker_image.md) + - [Supported Flists](./flist/grid3_supported_flists.md) + - [Flist Case Studies](./flist/flist_case_studies/flist_case_studies.md) + - [Case Study: Debian 12](./flist/flist_case_studies/flist_debian_case_study.md) + - [Case Study: Nextcloud AIO](./flist/flist_case_studies/flist_nextcloud_case_study.md) +- [Internals](./internals/internals.md) + - [Reliable Message Bus (RMB)](./internals/rmb/rmb_toc.md) + - [Introduction to RMB](./internals/rmb/rmb_intro.md) + - [RMB Specs](./internals/rmb/rmb_specs.md) + - [RMB Peer](./internals/rmb/uml/peer.md) + - [RMB Relay](./internals/rmb/uml/relay.md) + - [ZOS](./internals/zos/index.md) + - [Manual](./internals/zos/manual/manual.md) + - [Workload Types](./internals/zos/manual/workload_types.md) + - [Internal Modules](./internals/zos/internals/internals.md) + - [Capacity](./internals/zos/internals/capacity.md) + - [Performance Monitor Package](./internals/zos/performance/performance.md) + - [Public IPs Validation Task](./internals/zos/performance/publicips.md) + - 
[CPUBenchmark](./internals/zos/performance/cpubench.md) + - [IPerf](./internals/zos/performance/iperf.md) + - [Health Check](./internals/zos/performance/healthcheck.md) + - [API](./internals/zos/manual/api.md) +- [Grid Deployment](./grid_deployment/grid_deployment.md) + - [TFGrid Stacks](./grid_deployment/tfgrid_stacks.md) + - [Full VM Grid Deployment](./grid_deployment/grid_deployment_full_vm.md) + - [Grid Snapshots](./grid_deployment/snapshots.md) \ No newline at end of file diff --git a/collections/documentation/developers/flist/flist.md b/collections/documentation/developers/flist/flist.md new file mode 100644 index 0000000..6c69e05 --- /dev/null +++ b/collections/documentation/developers/flist/flist.md @@ -0,0 +1,11 @@ +

Flist

+ +

Table of Contents

+ +- [Zero-OS Hub](./flist_hub/zos_hub.md) +- [Generate an API Token](./flist_hub/api_token.md) +- [Convert Docker Image Into Flist](./flist_hub/convert_docker_image.md) +- [Supported Flists](./grid3_supported_flists.md) +- [Flist Case Studies](./flist_case_studies/flist_case_studies.md) + - [Case Study: Debian 12](./flist_case_studies/flist_debian_case_study.md) + - [Case Study: Nextcloud AIO](./flist_case_studies/flist_nextcloud_case_study.md) \ No newline at end of file diff --git a/collections/documentation/developers/flist/flist_case_studies/flist_case_studies.md b/collections/documentation/developers/flist/flist_case_studies/flist_case_studies.md new file mode 100644 index 0000000..b258836 --- /dev/null +++ b/collections/documentation/developers/flist/flist_case_studies/flist_case_studies.md @@ -0,0 +1,6 @@ +

Flist Case Studies

+ +

Table of Contents

+ +- [Case Study: Debian 12](./flist_debian_case_study.md) +- [Case Study: Nextcloud AIO](./flist_nextcloud_case_study.md) \ No newline at end of file diff --git a/collections/documentation/developers/flist/flist_case_studies/flist_debian_case_study.md b/collections/documentation/developers/flist/flist_case_studies/flist_debian_case_study.md new file mode 100644 index 0000000..3777433 --- /dev/null +++ b/collections/documentation/developers/flist/flist_case_studies/flist_debian_case_study.md @@ -0,0 +1,300 @@ +

Flist Case Study: Debian 12

+ +

Table of Contents

+ +- [Introduction](#introduction) + - [You Said Flist?](#you-said-flist) + - [Case Study Objective](#case-study-objective) + - [The Overall Process](#the-overall-process) +- [Docker Image Creation](#docker-image-creation) + - [Dockerfile](#dockerfile) + - [Docker Image Script](#docker-image-script) + - [zinit Folder](#zinit-folder) + - [README.md File](#readmemd-file) + - [Putting it All Together](#putting-it-all-together) +- [Docker Publishing Steps](#docker-publishing-steps) + - [Create Account and Access Token](#create-account-and-access-token) + - [Build and Push the Docker Image](#build-and-push-the-docker-image) +- [Convert the Docker Image to an Flist](#convert-the-docker-image-to-an-flist) +- [Deploy the Flist on the TF Playground](#deploy-the-flist-on-the-tf-playground) +- [Conclusion](#conclusion) + +*** + +## Introduction + +For this tutorial, we will present a case study demonstrating how easy it is to create a new flist on the ThreeFold ecosystem. We will create a Debian flist, deploy a micro VM on the ThreeFold Playground and access our Debian deployment. + +To do all this, we will need to create a Docker Hub account, create a Dockerfile, a docker image and a docker container, then convert the docker image to a Zero-OS flist. After all this, we will be deploying our Debian workload on the ThreeFold Playground. You'll see, it's pretty straightforward and fun to do. + + + +### You Said Flist? + +First, let's recall what an flist actually is and does. In short, an flist is a very effective way to package and deliver software files, and the end result is fast deployment and high reliability. + +In an flist, we separate the metadata from the data. The metadata is a description of what files are in that particular image. It's the data providing information about the app/software. Thanks to the flist, the 3Node doesn't need to install a complete software program in order to run properly. Only the necessary files are installed. 
Zero-OS can read the metadata of a container and download and execute only the binaries and applications needed to run the workload, when they are needed. + +Sounds great? It really is great, and very effective! + +One amazing thing about the flist technology is that it is possible to convert any Docker image into an flist, thanks to the [ThreeFold Docker Hub Converter tool](https://hub.grid.tf/docker-convert). If this sounds complicated, fear not. It is very easy and we will show you how to proceed in this case study. + + + +### Case Study Objective + +The goal of this case study is to give you enough information and tools so that you can build your own flist projects and deploy them on the ThreeFold Grid. + +This case study is not meant to show you all the detailed steps of creating an flist from scratch. We will instead start with some file templates available in the ThreeFold repository [tf-images](https://github.com/threefoldtech/tf-images). This is one of the many advantages of working with open-source projects: we can easily get inspiration from the existing code of the many ThreeFold repositories and work our way up from there. + + + +### The Overall Process + +To give you a bird's-eye view of the whole project, here are the main steps: + +* Create the Docker image +* Push the Docker image to the Docker Hub +* Convert the Docker image to a Zero-OS flist +* Deploy a micro VM with the flist on the ThreeFold Playground + + + +## Docker Image Creation + +As we've said previously, we will not explore all the details of creating an flist from scratch. That would be the subject of a subsequent guide. For now, we want to take existing code and work our way from there. This is not only quicker, but it is a good way to get to know the ThreeFold ecosystem and repositories. + +We will be using the code available on the [ThreeFold Tech GitHub page](https://github.com/threefoldtech). 
In our case, we want to explore the repository [tf-images](https://github.com/threefoldtech/tf-images). + +If you go to the subsection [tfgrid3](https://github.com/threefoldtech/tf-images/tree/development/tfgrid3), you can see many different flists available. In our case, we want to deploy the Debian Linux distribution. It is thus logical to look for a similar Linux distribution to take inspiration from. + +For this case study, we draw inspiration from the [Ubuntu 22.04](https://github.com/threefoldtech/tf-images/tree/development/tfgrid3/ubuntu22.04) directory. + +If we look at the Ubuntu 22.04 directory tree, this is what we get: + +``` +. +├── Dockerfile +├── README.md +├── start.sh +└── zinit + ├── ssh-init.yaml + └── sshd.yaml +``` + +We will now explore each of those files to get a good look at the whole repository and try to understand how it all works together. + +### Dockerfile + +We recall that to make a Docker image, you need to create a Dockerfile. As per [Docker's documentation](https://docs.docker.com/engine/reference/builder/), a Dockerfile is "a text document that contains all the commands a user could call on the command line to assemble an image". + +The Ubuntu 22.04 Dockerfile is as follows: + +File: `Dockerfile` + +```Dockerfile +FROM ubuntu:22.04 + +RUN apt update && \ + apt -y install wget openssh-server + +RUN wget -O /sbin/zinit https://github.com/threefoldtech/zinit/releases/download/v0.2.5/zinit && \ + chmod +x /sbin/zinit + +COPY zinit /etc/zinit +COPY start.sh /start.sh + +RUN chmod +x /sbin/zinit && chmod +x /start.sh +ENTRYPOINT ["zinit", "init"] +``` + +We can see from the first line that the Dockerfile uses the docker image `ubuntu:22.04` as its base image. In our case, we want to use the Debian 12 docker image. The available Debian image tags are listed on the Docker Hub (see [Debian Docker Hub](https://hub.docker.com/_/debian)). + +We will thus need to change the line `FROM ubuntu:22.04` to the line `FROM debian:12`. 
It isn't more complicated than that! + +We now have the following Dockerfile for the Debian docker image: + +File: `Dockerfile` + +```Dockerfile +FROM debian:12 + +RUN apt update && \ + apt -y install wget openssh-server + +RUN wget -O /sbin/zinit https://github.com/threefoldtech/zinit/releases/download/v0.2.5/zinit && \ + chmod +x /sbin/zinit + +COPY zinit /etc/zinit +COPY start.sh /start.sh + +RUN chmod +x /sbin/zinit && chmod +x /start.sh +ENTRYPOINT ["zinit", "init"] +``` + +There is nothing more needed here. Pretty fun to start from some existing open-source code, right? + +### Docker Image Script + +The other important file we will be looking at is the `start.sh` file. This basic script prepares SSH access: it creates the directories that the SSH server needs and appends the public key passed in the `SSH_KEY` environment variable to `authorized_keys`. Thankfully, there is nothing to change in this file; we can leave it as is. As we will see later, this file will be executed by zinit when the container starts. + +File: `start.sh` + +```.sh +#!/bin/bash + +mkdir -p /var/run/sshd +mkdir -p /root/.ssh +touch /root/.ssh/authorized_keys + +chmod 700 /root/.ssh +chmod 600 /root/.ssh/authorized_keys + +echo "$SSH_KEY" >> /root/.ssh/authorized_keys +``` + +### zinit Folder + +Next, we want to take a look at the zinit folder. + +But first, what is zinit? In a nutshell, zinit is a process manager (pid 1) that knows how to launch, monitor and sort dependencies. It thus executes targets in the proper order. For more information on zinit, check the [zinit repository](https://github.com/threefoldtech/zinit). + +When we start the Docker container, zinit starts the services defined by the files in the zinit folder. + +If we take a look at the file `ssh-init.yaml`, we find the following: + +```.yaml +exec: bash /start.sh +log: stdout +oneshot: true +``` + +We can see that the first line calls the [bash](https://www.gnu.org/software/bash/) Unix shell and that it will run the file `start.sh` we've seen earlier. 
+ +In the zinit service file `ssh-init.yaml`, we define a service named `ssh-init`, where we tell zinit which command to execute (here `bash /start.sh`), where to log (here in `stdout`) and where `oneshot` is set to `true` (meaning that it should only be executed once). + +If we take a look at the file `sshd.yaml`, we find the following: + +```.yaml +exec: bash -c "/usr/sbin/sshd -D" +after: + - ssh-init +``` + +Here another service, `sshd`, is started only after the `ssh-init` service has run. + +### README.md File + +As every good programmer knows, good code is nothing without some good documentation to help others understand what's going on! This is where the `README.md` file comes into play. + +In this file, we can explain what our code is doing and offer steps to properly configure the whole deployment. Users who want to deploy the flist on the ThreeFold Playground need the flist URL and the basic steps to deploy a Micro VM on the TFGrid. We will thus add this information in the README.md file. This information can be seen in the [section below](#deploy-the-flist-on-the-tf-playground). To read the complete README.md file, go to [this link](https://github.com/threefoldtech/tf-images/tree/development/tfgrid3/debian). + +### Putting it All Together + +We've now gone through all the files available in the Ubuntu 22.04 directory on the tf-images repository. To build your own image, you would simply need to put all those files in a local folder on your computer and follow the steps presented in the next section, [Docker Publishing Steps](#docker-publishing-steps). + +To see the final result of the changes we made to the Ubuntu 22.04 version, have a look at this [Debian directory](https://github.com/threefoldtech/tf-images/tree/development/tfgrid3/debian) on the ThreeFold tf-images repository. 
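The layout described above can be reproduced with a short script. This is only a sketch: the folder name `debian12-flist` and the `BUILD_DIR` variable are assumptions made for illustration, while the file contents are the ones shown earlier in this case study.

```shell
#!/bin/sh
# Sketch: assemble the Debian build context in a local folder.
# "debian12-flist" and BUILD_DIR are illustrative names, not ThreeFold conventions.
set -e

build_dir="${BUILD_DIR:-./debian12-flist}"
mkdir -p "$build_dir/zinit"

# Dockerfile: the Ubuntu 22.04 version with only the base image changed.
cat > "$build_dir/Dockerfile" <<'EOF'
FROM debian:12

RUN apt update && \
  apt -y install wget openssh-server

RUN wget -O /sbin/zinit https://github.com/threefoldtech/zinit/releases/download/v0.2.5/zinit && \
  chmod +x /sbin/zinit

COPY zinit /etc/zinit
COPY start.sh /start.sh

RUN chmod +x /sbin/zinit && chmod +x /start.sh
ENTRYPOINT ["zinit", "init"]
EOF

# start.sh: prepares sshd and registers the key passed in SSH_KEY.
cat > "$build_dir/start.sh" <<'EOF'
#!/bin/bash

mkdir -p /var/run/sshd
mkdir -p /root/.ssh
touch /root/.ssh/authorized_keys

chmod 700 /root/.ssh
chmod 600 /root/.ssh/authorized_keys

echo "$SSH_KEY" >> /root/.ssh/authorized_keys
EOF
chmod +x "$build_dir/start.sh"

# zinit unit files: ssh-init runs once, sshd starts after it.
cat > "$build_dir/zinit/ssh-init.yaml" <<'EOF'
exec: bash /start.sh
log: stdout
oneshot: true
EOF

cat > "$build_dir/zinit/sshd.yaml" <<'EOF'
exec: bash -c "/usr/sbin/sshd -D"
after:
  - ssh-init
EOF

# List what was created so the tree can be compared with the one above.
find "$build_dir" -type f | sort
```

A `README.md` can be added to the same folder by hand. From inside that folder, the build and push commands of the next section can then be run as described.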
+ + + +## Docker Publishing Steps + +### Create Account and Access Token + +To be able to push Docker images to the Docker Hub, you need to create a Docker Hub account! This is very easy, and there is plenty of good Docker documentation online. If you're lost, make the most of your favorite search engine to find your way. + +Here are the steps to create an account and an access token. + +* Go to the [Docker Hub](https://hub.docker.com/) +* Click `Register` and follow the steps given by Docker +* On the top right corner, click on your account name and select `Account Settings` +* On the left menu, click on `Security` +* Click on `New Access Token` +* Choose an Access Token description that you will easily identify then click `Generate` + * Make sure to set the permissions `Read, Write, Delete` +* Follow the steps given to properly connect your local computer to the Docker Hub + * Run `docker login -u <docker_username>` + * When prompted for the password, enter the access token + +You now have access to the Docker Hub from your local computer. We can now build and push the Docker image. + +### Build and Push the Docker Image + +* Make sure the Docker Daemon is running +* Build the docker image + * Template: + * ``` + docker build -t <docker_username>/<docker_image_name> . + ``` + * Example: + * ``` + docker build -t username/debian12 . + ``` +* Push the docker image to the [Docker Hub](https://hub.docker.com/) + * Template: + * ``` + docker push <docker_username>/<docker_image_name> + ``` + * Example: + * ``` + docker push username/debian12 + ``` +* You should now see your docker image on the [Docker Hub](https://hub.docker.com/) when you go into the menu option `My Profile`. + * Note that you can access this link quickly with the following template: + * ``` + https://hub.docker.com/u/<docker_username> + ``` + + + +## Convert the Docker Image to an Flist + +We will now convert the Docker image into a Zero-OS flist. This part is so easy you will almost be wondering why you never heard about flists before! 
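Since the converter works from the image name pushed in the previous section, it can help to keep the build and push commands in one place. The sketch below only prints the commands instead of running them. The function names `image_ref` and `publish_commands` are hypothetical helpers introduced for illustration; they are not part of Docker. When no tag is given, Docker uses `latest`, which is why the helper defaults to it.

```shell
#!/bin/sh
# image_ref: compose the reference used for build, push and conversion.
# Arguments: Docker Hub username, image name, optional tag (defaults to latest).
image_ref() {
  printf '%s/%s:%s\n' "$1" "$2" "${3:-latest}"
}

# publish_commands: print (not run) the build and push commands for an image.
publish_commands() {
  ref="$(image_ref "$@")"
  printf 'docker build -t %s .\n' "$ref"
  printf 'docker push %s\n' "$ref"
}

# Example with the placeholder names used in this case study.
publish_commands username debian12
```

Once the printed commands look right, they can be run from the folder that contains the Dockerfile.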
+ +* Go to the [ThreeFold Hub](https://hub.grid.tf/). +* Sign in with the ThreeFold Connect app. +* Go to the [Docker Hub Converter](https://hub.grid.tf/docker-convert) section. +* Next to `Docker Image Name`, add the docker image repository and name, see the example below: + * Template: + * `<docker_username>/docker_image_name:tagname` + * Example: + * `username/debian12:latest` +* Click `Convert the docker image`. +* Once the conversion is done, the flist is available as a public link on the ThreeFold Hub. +* To get the flist URL, go to the [TF Hub main page](https://hub.grid.tf/), scroll down to your 3Bot ID and click on it. +* Under `Name`, you will see all your available flists. +* Right-click on the flist you want and select `Copy Clean Link`. This URL will be used when deploying on the ThreeFold Playground. We show below the template and an example of what the flist URL looks like. + * Template: + * ``` + https://hub.grid.tf/<3BOT_name.3bot>/<docker_username>-<docker_image_name>-<tagname>.flist + ``` + * Example: + * ``` + https://hub.grid.tf/idrnd.3bot/username-debian12-latest.flist + ``` + + + +## Deploy the Flist on the TF Playground + +* Go to the [ThreeFold Playground](https://play.grid.tf). +* Set your profile manager. +* Go to the [Micro VM](https://play.grid.tf/#/vm) page. +* Choose your parameters (name, VM specs, etc.). +* Under `flist`, paste the Debian flist URL you copied from the TF Hub previously. +* Make sure the entrypoint is as follows: + * ``` + /sbin/zinit init + ``` +* Choose a 3Node to deploy on +* Click `Deploy` + +That's it! You can now SSH into your Debian deployment and change the world one line of code at a time! + + + +## Conclusion + +In this case study, we've seen the overall process of creating a new flist to deploy a Debian workload on a Micro VM on the ThreeFold Playground. 
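To recap the naming convention seen in the conversion step, the flist URL can be derived from the 3Bot name and the Docker image reference. The helper `flist_url` below is hypothetical and simply mirrors the example URL shown above; it assumes the hub always replaces `/` and `:` with `-`, so the link copied from the ThreeFold Hub remains the authoritative value.

```shell
#!/bin/sh
# flist_url: guess the flist URL from a 3Bot name and a Docker image reference.
# Assumption: the hub names the flist <user>-<image>-<tag>.flist, as in the example above.
flist_url() {
  threebot="$1"   # e.g. idrnd.3bot
  image="$2"      # e.g. username/debian12:latest

  # Add the default tag when none is given.
  case "$image" in
    *:*) ;;
    *) image="$image:latest" ;;
  esac

  # Replace the "/" and ":" separators with dashes.
  name="$(printf '%s' "$image" | sed 's|[/:]|-|g')"
  printf 'https://hub.grid.tf/%s/%s.flist\n' "$threebot" "$name"
}

flist_url idrnd.3bot username/debian12:latest
```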
+ +If you have any questions or feedback, please let us know by either writing a post on the [ThreeFold Forum](https://forum.threefold.io/), or by chatting with us on the [TF Grid Tester Community](https://t.me/threefoldtesting) Telegram channel. diff --git a/collections/documentation/developers/flist/flist_case_studies/flist_nextcloud_case_study.md b/collections/documentation/developers/flist/flist_case_studies/flist_nextcloud_case_study.md new file mode 100644 index 0000000..4193317 --- /dev/null +++ b/collections/documentation/developers/flist/flist_case_studies/flist_nextcloud_case_study.md @@ -0,0 +1,858 @@ +

Flist Case Study: Nextcloud All-in-One

+ +

Table of Contents

+ +- [Introduction](#introduction) + - [Flist: What is It?](#flist-what-is-it) + - [Case Study Objective](#case-study-objective) + - [The Overall Process](#the-overall-process) +- [Docker Image Creation](#docker-image-creation) + - [Nextcloud Flist Directory Tree](#nextcloud-flist-directory-tree) + - [Caddyfile](#caddyfile) + - [Dockerfile](#dockerfile) + - [README.md File](#readmemd-file) + - [scripts Folder](#scripts-folder) + - [caddy.sh](#caddysh) + - [sshd\_init.sh](#sshd_initsh) + - [ufw\_init.sh](#ufw_initsh) + - [nextcloud.sh](#nextcloudsh) + - [nextcloud\_conf.sh](#nextcloud_confsh) + - [zinit Folder](#zinit-folder) + - [ssh-init.yaml and sshd.yaml](#ssh-inityaml-and-sshdyaml) + - [ufw-init.yaml and ufw.yaml](#ufw-inityaml-and-ufwyaml) + - [caddy.yaml](#caddyyaml) + - [dockerd.yaml](#dockerdyaml) + - [nextcloud.yaml](#nextcloudyaml) + - [nextcloud-conf.yaml](#nextcloud-confyaml) + - [Putting it All Together](#putting-it-all-together) +- [Docker Publishing Steps](#docker-publishing-steps) + - [Create Account and Access Token](#create-account-and-access-token) + - [Build and Push the Docker Image](#build-and-push-the-docker-image) +- [Convert the Docker Image to an Flist](#convert-the-docker-image-to-an-flist) +- [Deploy Nextcloud AIO on the TFGrid with Terraform](#deploy-nextcloud-aio-on-the-tfgrid-with-terraform) + - [Create the Terraform Files](#create-the-terraform-files) + - [Deploy Nextcloud with Terraform](#deploy-nextcloud-with-terraform) + - [Nextcloud Setup](#nextcloud-setup) +- [Conclusion](#conclusion) + +*** + +# Introduction + +In this case study, we explain how to create a new flist on the ThreeFold ecosystem. We will show the process of creating a Nextcloud All-in-One flist and we will deploy a micro VM on the ThreeFold Playground to access our Nextcloud instance. As a reference, the official Nextcloud flist is available [here](https://hub.grid.tf/tf-official-apps/threefoldtech-nextcloudaio-latest.flist.md). 
+ +To achieve all this, we will need to create a Docker Hub account, create a Dockerfile and its associated files, a docker image and a docker container, then convert the docker image to a Zero-OS flist. After all this, we will be deploying our Nextcloud instance on the ThreeFold Playground. + +As general advice, before creating an flist for a ThreeFold deployment, you should make sure that you are able to deploy your workload properly by using a micro VM or a full VM on the TFGrid. Once you know all the steps to deploy your workload, and after some thorough tests, you can take what you've learned and incorporate all this into an flist. + +## Flist: What is It? + +Before we go any further, let us recall what an flist is. In short, an flist is a technology for storing and efficiently sharing sets of files. While it has many great features, its purpose in this case is simply to deliver the image contents to Zero-OS for execution as a micro VM. It thus acts as a bundle of files like a normal archive. + +One convenient thing about the flist technology is that it is possible to convert any Docker image into an flist, thanks to the [ThreeFold Docker Hub Converter tool](https://hub.grid.tf/docker-convert). It is very easy to do and we will show you how to proceed in this case study. For a quick guide on converting Docker images into flists, read [this section](../flist_hub/convert_docker_image.md) of the ThreeFold Manual. + +## Case Study Objective + +The goal of this case study is to give you enough information and tools so that you can build your own flist projects and deploy them on the ThreeFold Grid. + +We will explore the different files needed to create the flist and explain the overall process. Instead of starting from scratch, we will analyze the Nextcloud flist directory in the [tf-images](https://github.com/threefoldtech/tf-images/tree/development/tfgrid3/nextcloud) ThreeFold Tech repository. 
As the project is already done, it will be easier to get an overview of the process and the different components so you can learn to create your own. + +## The Overall Process + +To give you a bird's-eye view of the whole project, here are the main steps: + +* Create the Docker image +* Push the Docker image to the Docker Hub +* Convert the Docker image to a Zero-OS flist +* Deploy a micro VM with the flist on the ThreeFold Playground with Terraform + +One important thing to keep in mind is that, when we create an flist, what we are doing is basically automating the required steps to deploy a given workload on the TFGrid. Usually, these steps would be done manually and step-by-step by an individual deploying on a micro or a full VM. + +Once we've successfully created an flist, we thus have a very quick way to deploy a specific workload while always obtaining the same result. This is why it is highly recommended to test a given deployment on a full or micro VM before building an flist. + +For example, in the case of building a Nextcloud All-in-One flist, the prerequisite would be to successfully deploy a Nextcloud AIO instance on a full VM by executing each step sequentially. This specific example is documented in the Terraform section [Nextcloud All-in-One Guide](../../../system_administrators/terraform/advanced/terraform_nextcloud_aio.md) of the System Administrators book. + +In our case, the flist we will be using has some specific configurations depending on the way we deploy Nextcloud (e.g. whether or not we use the gateway and a custom domain). The Terraform **main.tf** we will be sharing later on will thus take all this into account for a smooth deployment. + +# Docker Image Creation + +As we've said previously, we will explore the different components of the existing Nextcloud flist directory. We thus want to check the existing files and try to understand as much as possible how the different components work together. 
This is also a very good introduction to the ThreeFold ecosystem. + +We will be using the files available on the [ThreeFold Tech Github page](https://github.com/threefoldtech). In our case, we want to explore the repository [tf-images](https://github.com/threefoldtech/tf-images). + +If you go in the subsection [tfgrid3](https://github.com/threefoldtech/tf-images/tree/development/tfgrid3), you can see many different flists available. In our case, we want to deploy the [Nextcloud All-in-One Flist](https://github.com/threefoldtech/tf-images/tree/development/tfgrid3/nextcloud). + +## Nextcloud Flist Directory Tree + +The Nextcloud flist directory tree is the following: + +``` +tree tf-images/tfgrid3/nextcloud +. +├── Caddyfile +├── Dockerfile +├── README.md +├── scripts +│ ├── caddy.sh +│ ├── nextcloud_conf.sh +│ ├── nextcloud.sh +│ ├── sshd_init.sh +│ └── ufw_init.sh +└── zinit + ├── caddy.yaml + ├── dockerd.yaml + ├── nextcloud-conf.yaml + ├── nextcloud.yaml + ├── sshd.yaml + ├── ssh-init.yaml + ├── ufw-init.yaml + └── ufw.yaml +``` + +We can see that the directory is composed of a Caddyfile, a Dockerfile, a README.md and two directories, **scripts** and **zinit**. We will now explore each of those components to have a good grasp of the whole repository and to understand how it all works together. + +To get a big picture of this directory, we could say that the **README.md** file provides the necessary documentation for the users to understand the Nextcloud flist, how it is built and how it works, the **Caddyfile** provides the necessary requirements to run the reverse proxy, the **Dockerfile** specifies how the Docker image is built, installing things such as [openssh](https://www.openssh.com/) and the [ufw firewall](https://wiki.ubuntu.com/UncomplicatedFirewall) for secure remote connection, while the two folders, **scripts** and **zinit**, could be said to work hand-in-hand. + +Each `.yaml` file is a *unit file* for zinit. 
That means it specifies a single service for zinit to start. We'll learn more about these files later, but for now we can just note that each script file (ending with `.sh`) has an associated zinit file to make sure that the script is run. There are also some other files for running programs aside from our scripts. + +## Caddyfile + +For our Nextcloud deployment, we are using Caddy as a reverse proxy. A reverse proxy is an application that sits in front of back-end applications and forwards client requests to those applications. + +Since Nextcloud AIO actually includes two web applications, both Nextcloud itself and the AIO management interface, we use the reverse proxy to serve them both on a single domain. It also allows us to make some changes on the fly to the content of the AIO site to considerably enhance the user experience. Finally, we also use Caddy to provide SSL termination if the user reserves a public IP and no gateway, since otherwise SSL termination is provided by the gateway. + +File: `Caddyfile` +``` +{ + order replace after encode + servers { + trusted_proxies static 100.64.0.0/10 10.0.0.0/8 + } +} + + +{$DOMAIN}:{$PORT} { + handle_path /aio* { + replace { + href="/ href="/aio/ + src="/ src="/aio/ + action=" action="/aio + url(' url('/aio + `value="" placeholder="nextcloud.yourdomain.com"` `value="{$DOMAIN}"` + `"Submit domain"` `"Submit domain" id="domain-submit"` + {$REPLACEMENTS} + {$BODY} + } + + reverse_proxy localhost:8000 { + header_down Location "^/(.*)$" "/aio/$1" + header_down Refresh "^/(.*)$" "/aio/$1" + } + + } + + redir /api/auth/getlogin /aio{uri} + + reverse_proxy localhost:11000 + + handle_errors { + @502-aio expression {err.status_code} == 502 && path('/aio*') + handle @502-aio { + header Content-Type text/html + respond <<HTML + Nextcloud + Your Nextcloud management interface isn't ready. If you just deployed this instance, please wait a minute and refresh the page. 
+ + HTML 200 + } + + @502 expression {err.status_code} == 502 + handle @502 { + redir /* /aio + } + } +} +``` + +We can see in the first section (`trusted_proxies static`) that we set a range of IP addresses as trusted proxy addresses. These include the possible source addresses for gateway traffic, which we mark as trusted for compatibility with some Nextcloud features. + +After the global config at the top, the line `{$DOMAIN}:{$PORT}` defines the port that Caddy will listen to and the domain that we are using for our site. This is important, because in the case that port `443` is specified, Caddy will handle SSL certificates automatically. + +The following blocks define behavior for different URL paths that users might try to access. + +To begin, we have `/aio*`. This is how we place the AIO management app in a "subfolder" of our main domain. To accomplish that we need a few rules that rewrite the contents of the returned pages to correct the links. We also add some text replacements here to accomplish the enhancements mentioned earlier, like automatically filling the domain entry field. + +With the `reverse_proxy` line, we specify that requests to all URLs starting with `/aio` should be sent to the web server running on port `8000` of `localhost`. That's the port where the AIO server is listening, as we'll see below. There's also a couple of header rewrite rules here that correct the links for any redirects the AIO site makes. + +The `redir` line is needed to support a feature where users open the AIO interface from within Nextcloud. This redirects the original request to the correct equivalent within the `/aio` "subfolder". + +Then there's a second `reverse_proxy` line, which is the catch-all for any traffic that didn't get intercepted earlier. This handles the actual Nextcloud app and sends the traffic to its separate server running on port `11000`. 
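The routing rules described so far can be summarized with a small model. This is a simplified sketch, not how Caddy is implemented: the `route` function is a hypothetical helper that only looks at the request path and ignores query strings, headers and the response rewriting rules.

```shell
#!/bin/sh
# route: print where a request path would end up under the Caddyfile above.
route() {
  path="$1"
  case "$path" in
    /api/auth/getlogin)
      # redir /api/auth/getlogin /aio{uri}
      printf 'redirect /aio%s\n' "$path"
      ;;
    /aio*)
      # handle_path strips the matched /aio prefix before proxying to the AIO app.
      stripped="${path#/aio}"
      [ -n "$stripped" ] || stripped="/"
      printf 'proxy localhost:8000 %s\n' "$stripped"
      ;;
    *)
      # Catch-all: everything else goes to the Nextcloud app.
      printf 'proxy localhost:11000 %s\n' "$path"
      ;;
  esac
}

for p in /aio/containers /api/auth/getlogin /apps/files; do
  route "$p"
done
```

The prefix stripping is the reason the Caddyfile also rewrites links and `Location` headers: the AIO app itself is unaware that it is being served under `/aio`.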
+ +The section starting with `handle_errors` ensures that the user will receive an understandable error message when trying to access the Nextcloud deployment before it has fully started up. + +## Dockerfile + +We recall that to make a Docker image, you need to create a Dockerfile. As per the [Docker documentation](https://docs.docker.com/engine/reference/builder/), a Dockerfile is "a text document that contains all the commands a user could call on the command line to assemble an image". + +File: `Dockerfile` + +```Dockerfile +FROM ubuntu:22.04 + +RUN apt update && \ + apt -y install wget openssh-server curl sudo ufw inotify-tools iproute2 + +RUN wget -O /sbin/zinit https://github.com/threefoldtech/zinit/releases/download/v0.2.5/zinit && \ + chmod +x /sbin/zinit + +RUN wget -O /sbin/caddy 'https://caddyserver.com/api/download?os=linux&arch=amd64&p=github.com%2Fcaddyserver%2Freplace-response&idempotency=43631173212363' && \ + chmod +x /sbin/caddy + +RUN curl -fsSL https://get.docker.com -o /usr/local/bin/install-docker.sh && \ + chmod +x /usr/local/bin/install-docker.sh + +RUN sh /usr/local/bin/install-docker.sh + +COPY ./Caddyfile /etc/caddy/ +COPY ./scripts/ /scripts/ +COPY ./zinit/ /etc/zinit/ +RUN chmod +x /scripts/*.sh + +ENTRYPOINT ["/sbin/zinit", "init"] +``` + +We can see from the first line that this Dockerfile uses a base image of Ubuntu Linux version 22.04. + +With the first **RUN** command, we refresh the package lists, and then install **openssh**, **ufw** and other dependencies for our Nextcloud uses. Note that we also install **curl** so that we can quickly install **Docker**. + +With the second **RUN** command, we install **zinit** and we give it execution permission with the command `chmod +x`. More will be said about zinit in a section below. + +With the third **RUN** command, we install **caddy** and we give it execution permission with the command `chmod +x`. Caddy is an extensible, cross-platform, open-source web server written in Go. 
For more information on Caddy, check the [Caddy website](https://caddyserver.com/). + +With the fourth **RUN** command, we download the script `install-docker.sh` and give it execution permission. On a terminal, the common line to install Docker would be `curl -fsSL https://get.docker.com | sudo sh`. To understand what this script does, we can read it directly at [https://get.docker.com](https://get.docker.com). + +The fifth **RUN** command runs the `install-docker.sh` script to install Docker within the image. + +Once those commands are run, we copy the necessary folders `scripts` and `zinit`, as well as the Caddyfile, into our Docker image. Once this is done, we give execution permissions to all scripts in the scripts folder using `chmod +x`. + +Finally, we set an entrypoint in our Dockerfile. As per the [Docker documentation](https://docs.docker.com/engine/reference/builder/), an entrypoint "allows you to configure a container that will run as an executable". Since we are using zinit, we set the entrypoint to `/sbin/zinit init`. + +## README.md File + +The main goal of the **README.md** file is to explain clearly to the user how the Nextcloud directory and its associated flist work. In this file, we can explain what our code is doing and offer steps to properly configure the whole deployment. + +We also give the necessary steps to create the Docker image and convert it into an flist starting directly with the Nextcloud directory. This can be useful for users that want to create their own flist, instead of using the [official ThreeFold Nextcloud flist](https://hub.grid.tf/tf-official-apps/threefoldtech-nextcloudaio-latest.flist.md). + +To read the complete README.md file, go to [this link](https://github.com/threefoldtech/tf-images/blob/development/tfgrid3/nextcloud/README.md). + +## scripts Folder + +The **scripts** folder contains the scripts necessary to run the Nextcloud instance. 
+ +In the Nextcloud Flist case, there are five scripts: + +* **caddy.sh** +* **nextcloud.sh** +* **nextcloud_conf.sh** +* **sshd_init.sh** +* **ufw_init.sh** + +Let's take a look at each of them. + +### caddy.sh + +File: `caddy.sh` + +```bash +#!/bin/bash +export DOMAIN=$NEXTCLOUD_DOMAIN + +if $IPV4 && ! $GATEWAY; then + export PORT=443 +else + export PORT=80 +fi + +if $IPV4; then + export BODY="\`\`" + +else + export BODY="\`\`" + + export REPLACEMENTS=' `name="talk"` `name="talk" disabled` + `needs ports 3478/TCP and 3478/UDP open/forwarded in your firewall/router` `running the Talk container requires a public IP and this VM does not have one. It is still possible to use Talk in a limited capacity. Please consult the documentation for details`' +fi + +caddy run --config /etc/caddy/Caddyfile +``` + +The script **caddy.sh** sets the proper port depending on the network configuration (e.g. IPv4 or Gateway) in the first if/else section. In the second if/else section, the script also makes sure that the proper domain is given to Nextcloud All-in-One. This quickens the installation process as the user doesn't have to set the domain in Nextcloud AIO after deployment. We also disable a feature that's not relevant if the user didn't reserve an IPv4 address and we insert a note about that. + +### sshd_init.sh + +File: `sshd_init.sh` + +```bash +#!/bin/bash + +mkdir -p ~/.ssh +mkdir -p /var/run/sshd +chmod 600 ~/.ssh +chmod 600 /etc/ssh/* +echo $SSH_KEY >> ~/.ssh/authorized_keys +``` + +This file starts with a shebang (`#!`) that instructs the operating system to execute the following lines using the [Bash shell](https://www.gnu.org/software/bash/). In essence, it lets us write `./sshd_init.sh` with the same outcome as `bash ./sshd_init.sh`, assuming the file is executable. + +The goal of this script is to add the public key within the VM in order for the user to get a secure and remote connection to the VM. The two lines starting with `mkdir` create the necessary folders. 
The lines starting with `chmod` give the owner the permission to write and read the content within the folders. Finally, the line `echo` will write the public SSH key in a file within the VM. In the case that the flist is used as a weblet, the SSH key is set in the Playground profile manager and passed as an environment variable when we deploy the solution. + +### ufw_init.sh + +File: `ufw_init.sh` + +```bash +#!/bin/bash + +ufw default deny incoming +ufw default allow outgoing +ufw allow ssh +ufw allow http +ufw allow https +ufw allow 8443 +ufw allow 3478 +ufw limit ssh +``` + +The goal of the `ufw_init.sh` script is to set the correct firewall parameters to make sure that our deployment is secure while also providing the necessary access for the Nextcloud users. + +The first two lines starting with `ufw default` are self-explanatory. We want to restrain incoming traffic while making sure that outgoing traffic has no restraints. + +The lines starting with `ufw allow` open the ports necessary for our Nextcloud instance. We note that **ssh** is port 22, **http** is port 80 and **https** is port 443. This means, for example, that the line `ufw allow 22` is equivalent to the line `ufw allow ssh`. + +Port 8443 can be used to access the AIO interface, as an alternative to using the `/aio` "subfolder" on deployments with a public IPv4 address. Finally, the port 3478 is used for Nextcloud Talk. + +The line `ufw limit ssh` will provide additional security by denying connection from IP addresses that attempt to initiate 6 or more connections within a 30-second period. + +### nextcloud.sh + +File: `nextcloud.sh` + +```bash +#!/bin/bash + +export COMPOSE_HTTP_TIMEOUT=800 +while ! 
 docker info > /dev/null 2>&1; do + echo docker not ready + sleep 2 +done + +docker run \ +--init \ +--sig-proxy=false \ +--name nextcloud-aio-mastercontainer \ +--restart always \ +--publish 8000:8000 \ +--publish 8080:8080 \ +--env APACHE_PORT=11000 \ +--env APACHE_IP_BINDING=0.0.0.0 \ +--env SKIP_DOMAIN_VALIDATION=true \ +--volume nextcloud_aio_mastercontainer:/mnt/docker-aio-config \ +--volume /var/run/docker.sock:/var/run/docker.sock:ro \ +nextcloud/all-in-one:latest +``` + +The **nextcloud.sh** script is where the real action starts. This is where we run the Nextcloud All-in-One docker image. + +Before discussing the main part of this script, we note that the `while` loop is used to ensure that the `docker run` command starts only after the Docker daemon has properly started. + +The code section starting with `docker run` is taken from the [Nextcloud All-in-One repository on GitHub](https://github.com/nextcloud/all-in-one) with some slight modifications. The last line uses the image tagged `latest`, so a fresh deployment pulls the most recent published version of Nextcloud All-in-One. + +We note here that Nextcloud AIO is published on ports 8000 and 8080. We also note that we set restart to **always**. This is very important as it makes sure that the Nextcloud instance is restarted whenever the Docker daemon restarts. We take the opportunity to note that, given the way zinit configures micro VMs, the Docker daemon restarts automatically after a reboot. Combined with the line `--restart always`, this ensures that the Nextcloud instance will restart after a VM reboot. + +We also set **11000** as the Apache port with an IP binding of **0.0.0.0**. For our deployment, we want to skip the domain validation, thus it is set to **true**. + +As for the line `--sig-proxy=false`, when this command is run interactively, it prevents the user from accidentally killing the spawned AIO container. 
While it is not of great importance in our case, it means that zinit will not kill the container if the service is stopped. + +For more information on this, we invite the readers to consult the [Nextcloud documentation](https://github.com/nextcloud/all-in-one#how-to-use-this). + +### nextcloud_conf.sh + +File: `nextcloud_conf.sh` + +```bash +#!/bin/bash + +# Wait for the nextcloud container to become healthy. Note that we can set the +# richtext config parameters even before the app is installed + +nc_ready () { + until [[ "`docker inspect -f {{.State.Health.Status}} nextcloud-aio-nextcloud 2> /dev/null`" == "healthy" ]]; do + sleep 1; + done; +} + +# When a gateway is used, AIO sets the WOPI allow list to only include the +# gateway IP. Since requests don't originate from the gateway IP, they are +# blocked by default. Here we add the public IP of the VM, or of the router +# upstream of the node +# See: github.com/nextcloud/security-advisories/security/advisories/GHSA-24x8-h6m2-9jf2 + +if $IPV4; then + interface=$(ip route show default | cut -d " " -f 5) + ipv4_address=$(ip a show $interface | grep -Po 'inet \K[\d.]+') +fi + +if $GATEWAY; then + nc_ready + wopi_list=$(docker exec --user www-data nextcloud-aio-nextcloud php occ config:app:get richdocuments wopi_allowlist) + + if $IPV4; then + ip=$ipv4_address + else + ip=$(curl -fs https://ipinfo.io/ip) + fi + + if [[ $ip ]] && ! echo $wopi_list | grep -q $ip; then + docker exec --user www-data nextcloud-aio-nextcloud php occ config:app:set richdocuments wopi_allowlist --value=$ip + fi +fi + + +# If the VM has a gateway and a public IPv4, then AIO will set the STUN/TURN +# servers to the gateway domain which does not point to the public IP, so we +# use the IP instead. In this case, we must wait for the Talk app to be +# installed before changing the settings. 
 With inotifywait, we don't need +# a busy loop that could run indefinitely + +apps_dir=/mnt/data/docker/volumes/nextcloud_aio_nextcloud/_data/custom_apps/ + +if $GATEWAY && $IPV4; then + if [[ ! -d ${apps_dir}spreed ]]; then + inotifywait -qq -e create --include spreed $apps_dir + fi + nc_ready + + turn_list=$(docker exec --user www-data nextcloud-aio-nextcloud php occ talk:turn:list) + turn_secret=$(echo "$turn_list" | grep secret | cut -d " " -f 4) + turn_server=$(echo "$turn_list" | grep server | cut -d " " -f 4) + + if ! echo $turn_server | grep -q $ipv4_address; then + docker exec --user www-data nextcloud-aio-nextcloud php occ talk:turn:delete turn $turn_server udp,tcp + docker exec --user www-data nextcloud-aio-nextcloud php occ talk:turn:add turn $ipv4_address:3478 udp,tcp --secret=$turn_secret + fi + + stun_list=$(docker exec --user www-data nextcloud-aio-nextcloud php occ talk:stun:list) + stun_server=$(echo $stun_list | cut -d " " -f 2) + + if ! echo $stun_server | grep -q $ipv4_address; then + docker exec --user www-data nextcloud-aio-nextcloud php occ talk:stun:add $ipv4_address:3478 + docker exec --user www-data nextcloud-aio-nextcloud php occ talk:stun:delete $stun_server + fi +fi +``` + +The script **nextcloud_conf.sh** ensures that the network settings are properly configured. In the first section, we define a function called **nc_ready ()**. This function makes sure that the rest of the script only continues once the Nextcloud container is healthy. + +The comments in this script explain each step. In short, we want to configure the Nextcloud instance according to the user's choice of network. For example, the user can decide to deploy using a ThreeFold gateway or a standard IPv4 connection. If the VM has a gateway and a public IPv4, then Nextcloud All-in-One will set the STUN/TURN servers to the gateway domain which does not point to the public IP, so we use the IP instead. 
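The `nc_ready` function above polls once per second with no upper bound, which is fine for this flist but can hide a container that never becomes healthy. As a minimal sketch only, and not part of the Nextcloud flist, the same polling idea can be written with a timeout. The helper names `wait_for` and `nc_healthy` are hypothetical and introduced here for illustration.

```bash
#!/bin/bash

# Hypothetical helper (not part of the flist): retry a command once per
# second until it succeeds or the timeout (in seconds) has elapsed.
wait_for () {
  local timeout=$1
  shift
  local waited=0
  until "$@" > /dev/null 2>&1; do
    if [ "$waited" -ge "$timeout" ]; then
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
  return 0
}

# Hypothetical health check that mirrors the test used by nc_ready.
nc_healthy () {
  [ "$(docker inspect -f '{{.State.Health.Status}}' nextcloud-aio-nextcloud 2> /dev/null)" = "healthy" ]
}

# Example usage: wait up to 10 minutes for the Nextcloud container.
# wait_for 600 nc_healthy || echo "Nextcloud container is not healthy yet"
```

Whether a timeout is desirable depends on the service: for a oneshot configuration script, giving up with a clear message may be easier to debug than waiting forever.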
+ +## zinit Folder + +Next, we want to take a look at the zinit folder. + +But first, what is zinit? In a nutshell, zinit is a process manager (pid 1) that knows how to launch, monitor and sort dependencies. It thus executes targets in the proper order. For more information on zinit, check the [zinit repository](https://github.com/threefoldtech/zinit). + +When we start the Docker container, zinit will parse each unit file in the `/etc/zinit` folder and execute the contained command according to the specified parameters. + +In the Nextcloud Flist case, there are eight **.yaml** files: + +* **caddy.yaml** +* **dockerd.yaml** +* **nextcloud-conf.yaml** +* **nextcloud.yaml** +* **ssh-init.yaml** +* **sshd.yaml** +* **ufw-init.yaml** +* **ufw.yaml** + + +### ssh-init.yaml and sshd.yaml + +We start by taking a look at the **ssh-init.yaml** and **sshd.yaml** files. + +File: `ssh-init.yaml` + +```yaml +exec: /scripts/sshd_init.sh +oneshot: true +``` + +In this zinit service file, we define a service named `ssh-init.yaml`, where we tell zinit to execute the following command: `exec: /scripts/sshd_init.sh`. This unit file thus runs the script `sshd_init.sh` we covered in a previous section. + +We also note that `oneshot` is set to `true` and this means that it should only be executed once. This directive is often used for setup scripts that only need to run once. When it is not specified, the default value of `false` means that zinit will continue to start up a service if it ever dies. + +Now, we take a look at the file `sshd.yaml`: + +File: `sshd.yaml` + +```yaml +exec: bash -c "/usr/sbin/sshd -D" +after: + - ssh-init +``` + +We can see that this file executes a line from the Bash shell. It is important to note that, with zinit and .yaml files, you can easily order the executions of the files with the `after` directive. In this case, it means that the service `sshd` will only run after `ssh-init`. 
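The two directives seen so far, `oneshot` and `after`, are enough to express a common pattern: a one-time setup step followed by a long-running service. The sketch below writes two such unit files into a scratch directory so the format can be inspected without touching `/etc/zinit`. The service names `myapp-init` and `myapp` and the script paths are hypothetical placeholders, not files from the Nextcloud flist.

```bash
#!/bin/bash

# Hypothetical example: write two zinit unit files into a scratch directory.
unit_dir=$(mktemp -d)

# One-time setup step: run once, not restarted when it exits.
cat > "$unit_dir/myapp-init.yaml" <<'EOF'
exec: /scripts/myapp_init.sh
oneshot: true
EOF

# Long-running service: started only after myapp-init.
cat > "$unit_dir/myapp.yaml" <<'EOF'
exec: /scripts/myapp.sh
after:
  - myapp-init
EOF

# Show the generated unit files.
ls "$unit_dir"
```

Note that the name listed under `after` is the unit file name without the `.yaml` extension, as in the `sshd.yaml` file above.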
+ +### ufw-init.yaml and ufw.yaml + +Let's take a look at the files **ufw-init.yaml** and **ufw.yaml**. + +File: `ufw-init.yaml` + +```yaml +exec: /scripts/ufw_init.sh +oneshot: true +``` + +The file `ufw-init.yaml` is very similar to the previous file `ssh-init.yaml`. + +File: `ufw.yaml` + +```yaml +exec: ufw --force enable +oneshot: true +after: + - ufw-init +``` + +We can see that the file `ufw.yaml` will only run once and only after the file `ufw-init.yaml` has been run. This is important since the file `ufw-init.yaml` executes the script `ufw_init.sh`. We recall this script allows different ports in the firewall. Once those ports are defined, we can then run the command `ufw --force enable`. This will start the ufw firewall. + +### caddy.yaml + +```yaml +exec: /scripts/caddy.sh +oneshot: true +``` + +This is also very similar to previous files and just runs the Caddy script as a oneshot. + +### dockerd.yaml + +We now take a look at the file **dockerd.yaml**. + +File: `dockerd.yaml` + +```yaml +exec: /usr/bin/dockerd --data-root /mnt/data/docker +``` + +This file will run the [dockerd daemon](https://docs.docker.com/engine/reference/commandline/dockerd/) which is the persistent process that manages containers. We also note that it sets the data to be stored in the directory **/mnt/data/docker**, which is important because we will mount a virtual disk there that will provide better performance, especially for Docker's storage driver. + +### nextcloud.yaml + +File: `nextcloud.yaml` + +```yaml +exec: /scripts/nextcloud.sh +after: + - dockerd +``` + +The file `nextcloud.yaml` runs after dockerd. + +This file will execute the `nextcloud.sh` script we saw earlier. We recall that this script starts the Nextcloud All-in-One image. + +### nextcloud-conf.yaml + +File: `nextcloud-conf.yaml` + +```yaml +exec: /scripts/nextcloud_conf.sh +oneshot: true +after: + - nextcloud +``` + +Finally, the file `nextcloud-conf.yaml` runs after `nextcloud.yaml`. 
+ +This file will execute the `nextcloud_conf.sh` script we saw earlier. We recall that this script adjusts the network-related settings of the Nextcloud instance once the container is healthy. At this point, the deployment is complete. + +## Putting it All Together + +We've now gone through all the files in the Nextcloud flist directory. You should now have a proper understanding of the interplay between the zinit (.yaml) and the scripts (.sh) files as well as the basic steps to build a Dockerfile and to write clear documentation. + +To build your own Nextcloud docker image, you would simply need to clone this directory to your local computer and follow the steps presented in the next section [Docker Publishing Steps](#docker-publishing-steps). + +To have a look at the complete directory, you can always refer to the [Nextcloud flist directory](https://github.com/threefoldtech/tf-images/tree/development/tfgrid3/nextcloud) on the ThreeFold tf-images repository. + +# Docker Publishing Steps + +In this section, we show the necessary steps to publish the Docker image to the Docker Hub. + +To do so, we need to create an account and an access token. Then we will build the Docker image and push it to the Docker Hub. + +## Create Account and Access Token + +To be able to push Docker images to the Docker Hub, you need a Docker Hub account. Creating one is straightforward, and there are many tutorials online about Docker. 
+ +Here are the steps to create an account and an access token: + +* Go to the [Docker Hub](https://hub.docker.com/) +* Click `Register` and follow the steps given by Docker +* On the top right corner, click on your account name and select `Account Settings` +* On the left menu, click on `Security` +* Click on `New Access Token` +* Choose an Access Token description that you will easily identify then click `Generate` + * Make sure to set the permissions `Read, Write, Delete` +* On your local computer, make sure that the Docker daemon is running +* Write the following in the command line to connect to the Docker hub: + * Run `docker login -u ` + * Set the password + +You now have access to the Docker Hub from your local computer. We will then proceed to push the Docker image to the Docker Hub. + +## Build and Push the Docker Image + +* Make sure the Docker Daemon is running +* Build the docker container (note that, while the tag is optional, it can help to track different versions) + * Template: + * ``` + docker build -t /: . + ``` + * Example: + * ``` + docker build -t dockerhubuser/nextcloudaio . + ``` +* Push the docker container to the [Docker Hub](https://hub.docker.com/) + * Template: + * ``` + docker push / + ``` + * Example: + * ``` + docker push dockerhubuser/nextcloudaio + ``` +* You should now see your docker image on the [Docker Hub](https://hub.docker.com/) when you go into the menu option `My Profile`. + * Note that you can access this link quickly with the following template: + * ``` + https://hub.docker.com/u/ + ``` + +# Convert the Docker Image to an Flist + +We will now convert the Docker image into a Zero-OS flist. + +* Go to the [ThreeFold Hub](https://hub.grid.tf/). +* Sign in with the ThreeFold Connect app. +* Go to the [Docker Hub Converter](https://hub.grid.tf/docker-convert) section. 
+* Next to `Docker Image Name`, add the docker image repository and name, see the example below: + * Template: + * `/docker_image_name:tagname` + * Example: + * `dockerhubuser/nextcloudaio:latest` +* Click `Convert the docker image`. +* Once the conversion is done, the flist is available as a public link on the ThreeFold Hub. +* To get the flist URL, go to the [TF Hub main page](https://hub.grid.tf/), scroll down to your 3Bot ID and click on it. +* Under `Name`, you will see all your available flists. +* Right-click on the flist you want and select `Copy Clean Link`. This URL will be used when deploying on the ThreeFold Playground. We show below the template and an example of what the flist URL looks like. + * Template: + * ``` + https://hub.grid.tf/<3BOT_name.3bot>/--.flist + ``` + * Example: + * ``` + https://hub.grid.tf/tf-official-apps/threefoldtech-nextcloudaio-latest.flist + ``` + +# Deploy Nextcloud AIO on the TFGrid with Terraform + +We now proceed to deploy a Nextcloud All-in-One instance by using the Nextcloud flist we've just created. + +To do so, we will deploy a micro VM with the Nextcloud flist on the TFGrid using Terraform. + +## Create the Terraform Files + +For this guide, we use two files to deploy with Terraform. The first file contains the environment variables and the second file contains the parameters to deploy our workloads. + +To facilitate the deployment, only the environment variables file needs to be adjusted. The **main.tf** file reads its values from those variables (e.g. **var.size** for the disk size) and thus you do not need to change this file. Of course, you can adjust the deployment based on your preferences. That being said, the main.tf file should work as is. + +For this example, we will be deploying with a ThreeFold gateway as well as a gateway domain. + +* Copy the following content and save the file under the name `credentials.auto.tfvars`: + +``` +mnemonics = "..." 
+network = "main" +SSH_KEY = "..." + +size = "50" +cpu = "2" +memory = "4096" + +gateway_id = "50" +vm1_id = "5453" + +deployment_name = "nextcloudgateway" +nextcloud_flist = "https://hub.grid.tf/tf-official-apps/threefoldtech-nextcloudaio-latest.flist" +``` + +Make sure to add your own seed phrase and SSH public key. Simply replace the three dots by the content. Note that you can deploy on a different node than node 5453 for the **vm1** node. If you want to deploy on another node than node 5453 for the **gateway** node, make sure that you choose a gateway node. To find a gateway node, go on the [ThreeFold Dashboard](https://dashboard.grid.tf/) Nodes section of the Explorer and select **Gateways (Only)**. + +Obviously, you can decide to increase or modify the quantity in the variables `size`, `cpu` and `memory`. + +Note that in our case, we set the flist to be the official Nextcloud flist. Simply replace the URL with your newly created Nextcloud flist to test it! + +* Copy the following content and save the file under the name `main.tf`: + +``` +variable "mnemonics" { + type = string + default = "your mnemonics" +} + +variable "network" { + type = string + default = "main" +} + +variable "SSH_KEY" { + type = string + default = "your SSH pub key" +} + +variable "deployment_name" { + type = string +} + +variable "size" { + type = string +} + +variable "cpu" { + type = string +} + +variable "memory" { + type = string +} + +variable "nextcloud_flist" { + type = string +} + +variable "gateway_id" { + type = string +} + +variable "vm1_id" { + type = string +} + + +terraform { + required_providers { + grid = { + source = "threefoldtech/grid" + } + } +} + +provider "grid" { + mnemonics = var.mnemonics + network = var.network +} + +data "grid_gateway_domain" "domain" { + node = var.gateway_id + name = var.deployment_name +} + +resource "grid_network" "net" { + nodes = [var.gateway_id, var.vm1_id] + ip_range = "10.1.0.0/16" + name = "network" + description = "My network" + 
add_wg_access = true +} + +resource "grid_deployment" "d1" { + node = var.vm1_id + network_name = grid_network.net.name + + disks { + name = "data" + size = var.size + } + + vms { + name = "vm1" + flist = var.nextcloud_flist + cpu = var.cpu + memory = var.memory + rootfs_size = 15000 + entrypoint = "/sbin/zinit init" + env_vars = { + SSH_KEY = var.SSH_KEY + GATEWAY = "true" + IPV4 = "false" + NEXTCLOUD_DOMAIN = data.grid_gateway_domain.domain.fqdn + } + mounts { + disk_name = "data" + mount_point = "/mnt/data" + } + } +} + +resource "grid_name_proxy" "p1" { + node = var.gateway_id + name = data.grid_gateway_domain.domain.name + backends = [format("http://%s:80", grid_deployment.d1.vms[0].ip)] + network = grid_network.net.name + tls_passthrough = false +} + +output "wg_config" { + value = grid_network.net.access_wg_config +} + +output "vm1_ip" { + value = grid_deployment.d1.vms[0].ip +} +output "vm1_ygg_ip" { + value = grid_deployment.d1.vms[0].ygg_ip +} + +output "fqdn" { + value = data.grid_gateway_domain.domain.fqdn +} +``` + +## Deploy Nextcloud with Terraform + +We now deploy Nextcloud with Terraform. Make sure that you are in the correct folder containing the main and variables files. + +* Initialize Terraform: + * ``` + terraform init + ``` + +* Apply Terraform to deploy Nextcloud: + * ``` + terraform apply + ``` + +Note that, at any moment, if you want to see the information on your Terraform deployment, write the following: + * ``` + terraform show + ``` + +## Nextcloud Setup + +Once you've deployed Nextcloud, you can access the Nextcloud setup page by pasting the URL displayed on the line `fqdn = "..."` of the Terraform output. + +# Conclusion + +In this case study, we've seen the overall process of creating a new flist to deploy a Nextcloud instance on a Micro VM on the TFGrid with Terraform. 
+ +If you have any questions or feedback, please let us know by either writing a post on the [ThreeFold Forum](https://forum.threefold.io/), or by chatting with us on the [TF Grid Tester Community](https://t.me/threefoldtesting) Telegram channel. \ No newline at end of file diff --git a/collections/documentation/developers/flist/flist_case_studies/img/nextcloud_logo.jpeg b/collections/documentation/developers/flist/flist_case_studies/img/nextcloud_logo.jpeg new file mode 100644 index 0000000..2d85227 Binary files /dev/null and b/collections/documentation/developers/flist/flist_case_studies/img/nextcloud_logo.jpeg differ diff --git a/collections/documentation/developers/flist/flist_hub/api_token.md b/collections/documentation/developers/flist/flist_hub/api_token.md new file mode 100644 index 0000000..5ca968f --- /dev/null +++ b/collections/documentation/developers/flist/flist_hub/api_token.md @@ -0,0 +1,33 @@ +

TF Hub API Token

+ +

Table of Contents

+ +- [Generate an API Token](#generate-an-api-token) +- [Verify the Token Validity](#verify-the-token-validity) + +*** + +## Generate an API Token + +To generate an API Token on the TF Hub, follow those steps: + +* Go to the [ThreeFold Hub](https://hub.grid.tf/) +* Open the top right drop-down menu +* Click on `Generate API Token` +* Take note of the token and keep it somewhere safe + +## Verify the Token Validity + +To make sure the generated token is valid, in the terminal write the following with your own API Token: + +```bash +curl -H "Authorization: bearer " https://hub.grid.tf/api/flist/me +``` + +You should see the following line with your own 3BotID + +```bash +{"status": "success", "payload": {"username": "<3BotID>.3bot"}} +``` + +You can then use this API Token in the terminal to [get and update information through the API](./zos_hub.md#get-and-update-information-through-the-api). \ No newline at end of file diff --git a/collections/documentation/developers/flist/flist_hub/convert_docker_image.md b/collections/documentation/developers/flist/flist_hub/convert_docker_image.md new file mode 100644 index 0000000..e6fba85 --- /dev/null +++ b/collections/documentation/developers/flist/flist_hub/convert_docker_image.md @@ -0,0 +1,45 @@ +

Convert Docker Image to Flist

+ +

Table of Contents

+ +- [Introduction](#introduction) +- [Upload the Image](#upload-the-image) +- [Flist on the Hub](#flist-on-the-hub) + +*** + +## Introduction + +We show the steps to convert a docker image to an Flist. + +## Upload the Image + +1. Upload the Docker image to Docker Hub with the following command: + +```bash +docker push +``` + +2. Navigate to the docker converter link: https://hub.grid.tf/docker-convert + ![ ](./img/docker_convert.png) + +3. Copy the name of the uploaded Docker image to the Docker Image Name field. + +4. Then press the convert button. + +When the image is ready, some information will be displayed. + +![ ](./img/flist_ready.png) + +## Flist on the Hub + +To navigate to the created flist, you can search with the newly created file name in the search tab. + +![ ](./img/search.png) + +You can also navigate to your repository in the contributors section from the Zero-Os Hub and navigate to the newly created flist. + +Then press the preview button to display the flist's url and some other data. 
+ +![ ](./img/preview.png) + diff --git a/collections/documentation/developers/flist/flist_hub/img/docker_convert.png b/collections/documentation/developers/flist/flist_hub/img/docker_convert.png new file mode 100644 index 0000000..1be37a0 Binary files /dev/null and b/collections/documentation/developers/flist/flist_hub/img/docker_convert.png differ diff --git a/collections/documentation/developers/flist/flist_hub/img/flist_ready.png b/collections/documentation/developers/flist/flist_hub/img/flist_ready.png new file mode 100644 index 0000000..c8913d5 Binary files /dev/null and b/collections/documentation/developers/flist/flist_hub/img/flist_ready.png differ diff --git a/collections/documentation/developers/flist/flist_hub/img/hub_flist.png b/collections/documentation/developers/flist/flist_hub/img/hub_flist.png new file mode 100644 index 0000000..3e5331a Binary files /dev/null and b/collections/documentation/developers/flist/flist_hub/img/hub_flist.png differ diff --git a/collections/documentation/developers/flist/flist_hub/img/preview.png b/collections/documentation/developers/flist/flist_hub/img/preview.png new file mode 100644 index 0000000..bd555ca Binary files /dev/null and b/collections/documentation/developers/flist/flist_hub/img/preview.png differ diff --git a/collections/documentation/developers/flist/flist_hub/img/search.png b/collections/documentation/developers/flist/flist_hub/img/search.png new file mode 100644 index 0000000..a5128cd Binary files /dev/null and b/collections/documentation/developers/flist/flist_hub/img/search.png differ diff --git a/collections/documentation/developers/flist/flist_hub/zos_hub.md b/collections/documentation/developers/flist/flist_hub/zos_hub.md new file mode 100644 index 0000000..95bdbc8 --- /dev/null +++ b/collections/documentation/developers/flist/flist_hub/zos_hub.md @@ -0,0 +1,142 @@ +

Zero-OS Hub

+ +

Table of Contents

+ +- [Introduction](#introduction) +- [Upload Your Files](#upload-your-files) +- [Merge Multiple Flists](#merge-multiple-flists) +- [Convert Docker Images and Tar Files](#convert-docker-images-and-tar-files) +- [Upload Customize Flists](#upload-customize-flists) +- [Upload Homemade Flists](#upload-homemade-flists) +- [Upload your Existing Flist to Reduce Bandwidth](#upload-your-existing-flist-to-reduce-bandwidth) +- [Authenticate via 3Bot](#authenticate-via-3bot) +- [Get and Update Information Through the API](#get-and-update-information-through-the-api) + - [Public API Endpoints (No Authentication Required)](#public-api-endpoints-no-authentication-required) + - [Restricted API Endpoints (Authentication Required)](#restricted-api-endpoints-authentication-required) + - [API Request Templates and Examples](#api-request-templates-and-examples) + +*** + +## Introduction + +The [ThreeFold Zero-OS Hub](https://hub.grid.tf/) allows you to do multiple things and acts as a public, central repository of flists. + +The ZOS Hub is mainly there to give an easy way to distribute flist files, which are databases of metadata that you can use in any Zero-OS container or virtual machine. + +## Upload Your Files +In order to publish your files easily, you can upload a `.tar.gz` and the hub will convert it automatically to an flist +and store the contents in the hub backend. After that you can use your flist directly on a container. + +## Merge Multiple Flists +In order to reduce the maintenance of your images, products, etc., the hub allows you to keep your +different products and files separate and then merge them with another flist to make them usable without +having to keep the base system up to date yourself. + +Example: there is an official `ubuntu 16.04` flist image. You can make an flist which contains your application files +and then merge your flist with ubuntu, so the resulting flist is your product on the latest version of that Ubuntu image. 
+You don't need to take care of the base system yourself, just merge your flist with the one provided. + +## Convert Docker Images and Tar Files + +The ZOS Hub allows you to convert Docker Hub images and Tar files into flists thanks to the Docker Hub Converter. + +You can convert a Docker image (e.g. `busybox`, `ubuntu`, `fedora`, `couchdb`, ...) to an flist directly from the backend. This allows you to use your existing Docker image in our infrastructure out-of-the-box. Go to the [Docker Hub Converter](https://hub.grid.tf/docker-convert) to use this feature. For more information on the process, read the section [Convert Docker Image to flist](./convert_docker_image.md) of the TF Manual. + +You can also easily convert a Tar file into an flist via the [Upload section](https://hub.grid.tf/upload) of the ZOS Hub. + +## Upload Customize Flists + +The ZOS Hub also allows you to customize an flist via the [Customization section](https://hub.grid.tf/merge) of the ZOS Hub. Note that this is currently in beta. + +## Upload Homemade Flists + +The ZOS Hub allows you to upload flists that you've made yourself via the section [Upload a homemade flist](https://hub.grid.tf/upload-flist). + +## Upload your Existing Flist to Reduce Bandwidth +In addition, with the hub-client (a side product), you can efficiently upload file contents +to bring the backend up to date and then upload a self-made flist. This lets you do all the work yourself +and gives you full control of the chain. The only restriction is that the contents of the files referenced +by your flist need to exist on the backend, otherwise your flist will be rejected. + +## Authenticate via 3Bot +All operations on the ZOS Hub need to be done via `3Bot` (default) authentication. Only downloading an flist can be done anonymously. To authenticate requests via the API, you need to generate an API Token as shown in the section [ZOS Hub API Token](./api_token.md). 
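As a rough illustration of how a generated API Token might be attached to a request from Go, here is a minimal sketch. It assumes the token is sent in an `Authorization: bearer` header, as the API section of this page describes. The helper `newHubRequest` and the constant `hubBaseURL` are hypothetical names introduced for this example and are not part of any ThreeFold SDK. The token `abc12` is a placeholder.

```go
package main

import (
	"fmt"
	"net/http"
)

// hubBaseURL is the public ZOS Hub address referenced on this page.
const hubBaseURL = "https://hub.grid.tf"

// newHubRequest builds (but does not send) a request that carries the
// API Token in an "Authorization: bearer" header.
// Hypothetical helper: the name and signature are assumptions for this sketch.
func newHubRequest(method, path, token string) (*http.Request, error) {
	req, err := http.NewRequest(method, hubBaseURL+path, nil)
	if err != nil {
		return nil, err
	}
	req.Header.Set("Authorization", "bearer "+token)
	return req, nil
}

func main() {
	// "abc12" is a placeholder token, not a real credential.
	req, err := newHubRequest(http.MethodGet, "/api/flist/me", "abc12")
	if err != nil {
		fmt.Println("could not build request:", err)
		return
	}
	// Show what would be sent. Pass req to http.DefaultClient.Do to send it.
	fmt.Println(req.Method, req.URL.String())
	fmt.Println(req.Header.Get("Authorization"))
}
```

The request is only constructed here, so the sketch can be run without network access or a valid token.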
+ +## Get and Update Information Through the API +The hub hosts a basic REST API which gives you information about flists and lets you rename them, remove them, etc. + +To use authenticated endpoints, you need to provide a valid itsyou.online `jwt` via the `Authorization: bearer ` header. +This `jwt` can contain a special `memberof` claim that allows cross-repository actions. + +If your `jwt` contains `memberof`, you can choose which user you want to act as by setting the `active-user` cookie. +See example below. + + +### Public API Endpoints (No Authentication Required) +- `/api/flist` (**GET**) + - Returns a JSON array with all repositories/flists found +- `/api/repositories` (**GET**) + - Returns a JSON array with all repositories found +- `/api/fileslist` (**GET**) + - Returns a JSON array with all repositories and files found +- `/api/flist/` (**GET**) + - Returns a JSON array of the flists found inside the specified repository. + - Each entry contains `filename`, `size`, `updated` date and `type` (regular or symlink), and optionally `target` if it's a symbolic link. 
+- `/api/flist//` (**GET**) + - Returns a JSON object with the flist dump (full file list) + +### Restricted API Endpoints (Authentication Required) +- `/api/flist/me` (**GET**) + - Returns a JSON object with some basic information about yourself (the authenticated user) +- `/api/flist/me/` (**GET**, **DELETE**) + - **GET**: same as `/api/flist//` + - **DELETE**: removes that specific flist +- `/api/flist/me//link/` (**GET**) + - Creates a symbolic link `linkname` pointing to `source` +- `/api/flist/me//crosslink//` (**GET**) + - Creates a cross-repository symbolic link `linkname` pointing to `repository/sourcename` +- `/api/flist/me//rename/` (**GET**) + - Renames `source` to `destination` +- `/api/flist/me/promote///` (**GET**) + - Copies the cross-repository `sourcerepo/sourcefile` to your `[local-repository]/localname` file + - This is useful when you want to copy an flist from one repository to another, if your jwt allows it +- `/api/flist/me/upload` (**POST**) + - **POST**: uploads a `.tar.gz` archive and converts it to an flist + - Your file needs to be passed via the `file` form attribute +- `/api/flist/me/upload-flist` (**POST**) + - **POST**: uploads a `.flist` file and stores it + - Note: the flist is checked and its full contents are verified to exist on the backend. If some chunks are missing, the file will be discarded. 
+ - Your file needs to be passed via the `file` form attribute +- `/api/flist/me/merge/` (**POST**) + - **POST**: merges multiple flists together + - You need to pass a JSON array of flists (in the form `repository/file`) as the POST body +- `/api/flist/me/docker` (**POST**) + - **POST**: converts a Docker image to an flist + - You need to pass the `image` form argument with the Docker image name + - The resulting conversion will stay in your repository + +### API Request Templates and Examples + +The main template to request information from the API is the following: + +```bash +curl -H "Authorization: bearer " https://hub.grid.tf/api/flist/me/ -X +``` + +For example, if we take the command `DELETE` of the previous section and we want to delete the flist `example-latest.flist` with the API Token `abc12`, we would write the following line: + +```bash +curl -H "Authorization: bearer abc12" https://hub.grid.tf/api/flist/me/example-latest.flist -X DELETE +``` + +As another template example, if we wanted to rename the flist `current-name-latest.flist` to `new-name-latest.flist`, we would use the following template: + +```bash +curl -H "Authorization: bearer " https://hub.grid.tf/api/flist/me//rename/ -X GET +``` + +To upload a `.tar.gz` archive to the ZOS Hub and have it converted to an flist, you would use the following template: + +```bash +curl -H "Authorization: bearer " -X POST -F file=@my-local-archive.tar.gz \ + https://hub.grid.tf/api/flist/me/upload +``` \ No newline at end of file diff --git a/collections/documentation/developers/flist/grid3_supported_flists.md b/collections/documentation/developers/flist/grid3_supported_flists.md new file mode 100644 index 0000000..537d0b3 --- /dev/null +++ b/collections/documentation/developers/flist/grid3_supported_flists.md @@ -0,0 +1,26 @@ +

Supported Flists

+ +

Table of Contents

+ +- [Introduction](#introduction) +- [Flists and Parameters](#flists-and-parameters) +- [More Flists](#more-flists) + +*** + +## Introduction + +We provide basic information on the currently supported Flists. + +## Flists and Parameters + +|Flist|Entrypoint|Env vars| +|:--:|:--:|--| +|[Alpine](https://hub.grid.tf/tf-official-apps/threefoldtech-alpine-3.flist.md)|`/entrypoint.sh`|`SSH_KEY`| +|[Ubuntu](https://hub.grid.tf/tf-official-apps/threefoldtech-ubuntu-22.04.flist.md)|`/init.sh`|`SSH_KEY`| +|[CentOS](https://hub.grid.tf/tf-official-apps/threefoldtech-centos-8.flist.md)|`/entrypoint.sh`|`SSH_KEY`| +|[K3s](https://hub.grid.tf/tf-official-apps/threefoldtech-k3s-latest.flist.md)|`/sbin/zinit init`|`SSH_KEY`, `K3S_TOKEN`, `K3S_DATA_DIR`, `K3S_FLANNEL_IFACE`, `K3S_NODE_NAME`, `K3S_URL` (`https://${masterIp}:6443`)| + +## More Flists + +You can convert any docker image to an flist. Feel free to explore the different possibilities on the [ThreeFold Hub](https://hub.grid.tf/). \ No newline at end of file diff --git a/collections/documentation/developers/go/grid3_go_gateways.md b/collections/documentation/developers/go/grid3_go_gateways.md new file mode 100644 index 0000000..66f5f68 --- /dev/null +++ b/collections/documentation/developers/go/grid3_go_gateways.md @@ -0,0 +1,104 @@ +

Deploying Gateways

+ +

Table of Contents

+ +- [Introduction](#introduction) +- [Gateway Name](#gateway-name) +- [Example](#example) +- [Gateway FQDN](#gateway-fqdn) +- [Example](#example-1) + +*** + +## Introduction + +After [deploying a VM](./grid3_go_vm.md) you can deploy Gateways to further expose your VM. + +## Gateway Name + +This generates a FQDN for your VM. + +## Example + +```go +import ( + "fmt" + + "github.com/threefoldtech/tfgrid-sdk-go/grid-client/deployer" + "github.com/threefoldtech/tfgrid-sdk-go/grid-client/workloads" + "github.com/threefoldtech/tfgrid-sdk-go/grid-proxy/pkg/types" + "github.com/threefoldtech/zos/pkg/gridtypes/zos" +) + +func main() { + + // Create Threefold plugin client + tfPluginClient, err := deployer.NewTFPluginClient(mnemonics, "sr25519", network, "", "", true, false) + + // Get a free node to deploy + domain := true + status := "up" + filter := types.NodeFilter{ + Domain: &domain, + Status: &status, + } + nodeIDs, err := deployer.FilterNodes(tfPluginClient.GridProxyClient, filter) + nodeID := uint32(nodeIDs[0].NodeID) + + // Create gateway to deploy + gateway := workloads.GatewayNameProxy{ + NodeID: nodeID, + Name: "mydomain", + Backends: []zos.Backend{"http://[300:e9c4:9048:57cf:6d98:42c6:a7bf:2e3f]:8080"}, + TLSPassthrough: true, + } + err = tfPluginClient.GatewayNameDeployer.Deploy(ctx, &gateway) + + gatewayObj, err := tfPluginClient.State.LoadGatewayNameFromGrid(nodeID, gateway.Name, gateway.Name) + fmt.Println(gatewayObj.FQDN) +} + +``` + +This deploys a Gateway Name Proxy that forwards requests to your VM. You should see an output like this: + +```bash +mydomain.gent01.dev.grid.tf +``` + +## Gateway FQDN + +In case you have a FQDN already pointing to the node, you can expose your VM using Gateway FQDN. 
+ +## Example + +```go +import ( + "fmt" + + "github.com/threefoldtech/tfgrid-sdk-go/grid-client/deployer" + "github.com/threefoldtech/tfgrid-sdk-go/grid-client/workloads" + "github.com/threefoldtech/zos/pkg/gridtypes/zos" +) + +func main() { + + // Create Threefold plugin client + tfPluginClient, err := deployer.NewTFPluginClient(mnemonics, "sr25519", network, "", "", "", 0, true) + + // Create gateway to deploy + gateway := workloads.GatewayFQDNProxy{ + NodeID: 14, + Name: "mydomain", + Backends: []zos.Backend{"http://[300:e9c4:9048:57cf:6d98:42c6:a7bf:2e3f]:8080"}, + FQDN: "my.domain.com", + TLSPassthrough: true, + } + err = tfPluginClient.GatewayFQDNDeployer.Deploy(ctx, &gateway) + + // Load the deployed gateway from the node it was deployed on + gatewayObj, err := tfPluginClient.State.LoadGatewayFQDNFromGrid(gateway.NodeID, gateway.Name, gateway.Name) + fmt.Println(gatewayObj.FQDN) +} + +``` + +This deploys a Gateway FQDN Proxy that forwards requests from the public IP of node 14 to your VM. diff --git a/collections/documentation/developers/go/grid3_go_gpu.md b/collections/documentation/developers/go/grid3_go_gpu.md new file mode 100644 index 0000000..3ff0c94 --- /dev/null +++ b/collections/documentation/developers/go/grid3_go_gpu.md @@ -0,0 +1,6 @@ +

GPU and Go

+ +

Table of Contents

+ +- [GPU and Go Introduction](grid3_go_gpu_support.md) +- [Deploy a VM with GPU](grid3_go_vm_with_gpu.md) \ No newline at end of file diff --git a/collections/documentation/developers/go/grid3_go_gpu_support.md b/collections/documentation/developers/go/grid3_go_gpu_support.md new file mode 100644 index 0000000..9aadceb --- /dev/null +++ b/collections/documentation/developers/go/grid3_go_gpu_support.md @@ -0,0 +1,116 @@ +

GPU Support

+ +

Table of Contents

+ +- [Introduction](#introduction) +- [Example](#example) +- [More Information](#more-information) + +*** + +## Introduction + +We present here an example on how to deploy using the Go client. This is part of our integration tests. + + + +## Example + +```go +func TestVMWithGPUDeployment(t *testing.T) { + tfPluginClient, err := setup() + assert.NoError(t, err) + + ctx, cancel := context.WithTimeout(context.Background(), 5*time.Minute) + defer cancel() + + publicKey, privateKey, err := GenerateSSHKeyPair() + assert.NoError(t, err) + + twinID := uint64(tfPluginClient.TwinID) + nodeFilter := types.NodeFilter{ + Status: &statusUp, + FreeSRU: convertGBToBytes(20), + FreeMRU: convertGBToBytes(8), + RentedBy: &twinID, + HasGPU: &trueVal, + } + + nodes, err := deployer.FilterNodes(ctx, tfPluginClient, nodeFilter) + if err != nil { + t.Skip("no available nodes found") + } + nodeID := uint32(nodes[0].NodeID) + + nodeClient, err := tfPluginClient.NcPool.GetNodeClient(tfPluginClient.SubstrateConn, nodeID) + assert.NoError(t, err) + + gpus, err := nodeClient.GPUs(ctx) + assert.NoError(t, err) + + network := workloads.ZNet{ + Name: "gpuNetwork", + Description: "network for testing gpu", + Nodes: []uint32{nodeID}, + IPRange: gridtypes.NewIPNet(net.IPNet{ + IP: net.IPv4(10, 20, 0, 0), + Mask: net.CIDRMask(16, 32), + }), + AddWGAccess: false, + } + + disk := workloads.Disk{ + Name: "gpuDisk", + SizeGB: 20, + } + + vm := workloads.VM{ + Name: "gpu", + Flist: "https://hub.grid.tf/tf-official-vms/ubuntu-22.04.flist", + CPU: 4, + Planetary: true, + Memory: 1024 * 8, + GPUs: ConvertGPUsToStr(gpus), + Entrypoint: "/init.sh", + EnvVars: map[string]string{ + "SSH_KEY": publicKey, + }, + Mounts: []workloads.Mount{ + {DiskName: disk.Name, MountPoint: "/data"}, + }, + NetworkName: network.Name, + } + + err = tfPluginClient.NetworkDeployer.Deploy(ctx, &network) + assert.NoError(t, err) + + defer func() { + err = tfPluginClient.NetworkDeployer.Cancel(ctx, &network) + assert.NoError(t, err) + 
}() + + dl := workloads.NewDeployment("gpu", nodeID, "", nil, network.Name, []workloads.Disk{disk}, nil, []workloads.VM{vm}, nil) + err = tfPluginClient.DeploymentDeployer.Deploy(ctx, &dl) + assert.NoError(t, err) + + defer func() { + err = tfPluginClient.DeploymentDeployer.Cancel(ctx, &dl) + assert.NoError(t, err) + }() + + vm, err = tfPluginClient.State.LoadVMFromGrid(nodeID, vm.Name, dl.Name) + assert.NoError(t, err) + assert.Equal(t, vm.GPUs, ConvertGPUsToStr(gpus)) + + time.Sleep(30 * time.Second) + output, err := RemoteRun("root", vm.YggIP, "lspci -v", privateKey) + assert.NoError(t, err) + assert.Contains(t, string(output), gpus[0].Vendor) +} +``` + + + +## More Information + +For more information on this, you can check this [Client Pull Request](https://github.com/threefoldtech/tfgrid-sdk-go/pull/207/) on how to support the new calls to list GPUs and to deploy a machine with GPU. \ No newline at end of file diff --git a/collections/documentation/developers/go/grid3_go_installation.md b/collections/documentation/developers/go/grid3_go_installation.md new file mode 100644 index 0000000..c07008c --- /dev/null +++ b/collections/documentation/developers/go/grid3_go_installation.md @@ -0,0 +1,45 @@ +

Go Client Installation

+ +

Table of Contents

+ +- [Introduction](#introduction) +- [Requirements](#requirements) +- [Steps](#steps) +- [References](#references) + +*** + +## Introduction + +We present the general steps to install the ThreeFold Grid3 Go Client. + +## Requirements + +Make sure that you have at least Go 1.19 installed on your machine. + +- [Go](https://golang.org/doc/install) >= 1.19 + +## Steps + +* Create a new directory + * ```bash + mkdir tf_go_client + ``` +* Change into the new directory + * ```bash + cd tf_go_client + ``` +* Create a **go.mod** file to track the code's dependencies + * ```bash + go mod init main + ``` +* Install the Grid3 Go Client + * ```bash + go get github.com/threefoldtech/tfgrid-sdk-go/grid-client + ``` + +This makes the Grid3 Go Client packages available to your module. + +## References + +For more information, you can read the official [Go documentation](https://go.dev/doc/). \ No newline at end of file diff --git a/collections/documentation/developers/go/grid3_go_kubernetes.md b/collections/documentation/developers/go/grid3_go_kubernetes.md new file mode 100644 index 0000000..ba1bab8 --- /dev/null +++ b/collections/documentation/developers/go/grid3_go_kubernetes.md @@ -0,0 +1,120 @@ +

Deploying Kubernetes Clusters

+ +

Table of Contents

+ +- [Introduction](#introduction) +- [Example](#example) + +*** + +## Introduction + +We show how to deploy a Kubernetes cluster with the Go client. + +## Example + +```go +import ( + "fmt" + "net" + + "github.com/threefoldtech/tfgrid-sdk-go/grid-client/deployer" + "github.com/threefoldtech/tfgrid-sdk-go/grid-client/workloads" + "github.com/threefoldtech/tfgrid-sdk-go/grid-proxy/pkg/types" + "github.com/threefoldtech/zos/pkg/gridtypes" +) + +func main() { + + // Create Threefold plugin client + tfPluginClient, err := deployer.NewTFPluginClient(mnemonics, "sr25519", network, "", "", "", 0, true) + + // Get a free node to deploy + freeMRU := uint64(1) + freeSRU := uint64(1) + status := "up" + filter := types.NodeFilter{ + FreeMRU: &freeMRU, + FreeSRU: &freeSRU, + Status: &status, + } + nodeIDs, err := deployer.FilterNodes(tfPluginClient.GridProxyClient, filter) + masterNodeID := uint32(nodeIDs[0].NodeID) + workerNodeID1 := uint32(nodeIDs[1].NodeID) + workerNodeID2 := uint32(nodeIDs[2].NodeID) + + // Create a new network to deploy + network := workloads.ZNet{ + Name: "newNetwork", + Description: "A network to deploy", + Nodes: []uint32{masterNodeID, workerNodeID1, workerNodeID2}, + IPRange: gridtypes.NewIPNet(net.IPNet{ + IP: net.IPv4(10, 1, 0, 0), + Mask: net.CIDRMask(16, 32), + }), + AddWGAccess: true, + } + + // Create master and worker nodes to deploy + master := workloads.K8sNode{ + Name: "master", + Node: masterNodeID, + DiskSize: 1, + CPU: 2, + Memory: 1024, + Planetary: true, + Flist: "https://hub.grid.tf/tf-official-apps/threefoldtech-k3s-latest.flist", + } + + worker1 := workloads.K8sNode{ + Name: "worker1", + Node: workerNodeID1, + DiskSize: 1, + CPU: 2, + Memory: 1024, + Flist: "https://hub.grid.tf/tf-official-apps/threefoldtech-k3s-latest.flist", + } + + worker2 := workloads.K8sNode{ + Name: "worker2", + Node: workerNodeID2, + DiskSize: 1, + Flist: "https://hub.grid.tf/tf-official-apps/threefoldtech-k3s-latest.flist", + CPU: 2, + Memory: 1024, + } + + 
k8sCluster := workloads.K8sCluster{ + Master: &master, + Workers: []workloads.K8sNode{worker1, worker2}, + Token: "tokens", + SSHKey: publicKey, + NetworkName: network.Name, + } + + // Deploy the network first + err = tfPluginClient.NetworkDeployer.Deploy(ctx, &network) + + // Deploy the k8s cluster + err = tfPluginClient.K8sDeployer.Deploy(ctx, &k8sCluster) + + // Load the k8s cluster + k8sClusterObj, err := tfPluginClient.State.LoadK8sFromGrid([]uint32{masterNodeID, workerNodeID1, workerNodeID2}, master.Name) + + // Print master node Yggdrasil IP + fmt.Println(k8sClusterObj.Master.YggIP) + + // Cancel the k8s cluster deployment + err = tfPluginClient.K8sDeployer.Cancel(ctx, &k8sCluster) + + // Cancel the network deployment + err = tfPluginClient.NetworkDeployer.Cancel(ctx, &network) +} + +``` + +You should see the Yggdrasil IP of the master node as output, like this: + +```bash +300:e9c4:9048:57cf:6d98:42c6:a7bf:2e3f +``` diff --git a/collections/documentation/developers/go/grid3_go_load_client.md b/collections/documentation/developers/go/grid3_go_load_client.md new file mode 100644 index 0000000..fe61a6d --- /dev/null +++ b/collections/documentation/developers/go/grid3_go_load_client.md @@ -0,0 +1,35 @@ +

Load Client

+ +

Table of Contents

+ +- [Introduction](#introduction) +- [TFPluginClient Configuration](#tfpluginclient-configuration) +- [Creating Client](#creating-client) + +*** + +## Introduction + +We cover how to load the `TFPluginClient` using the Go client. + +## TFPluginClient Configuration + +The client is configured with the following values: + +- mnemonics +- keyType: can be `ed25519` or `sr25519` +- network: can be `dev`, `qa`, `test` or `main` + +## Creating Client + +Import the `deployer` package into your project: + +```go +import "github.com/threefoldtech/tfgrid-sdk-go/grid-client/deployer" +``` + +Create a new client: + +```go +func main() { + client, err := deployer.NewTFPluginClient(mnemonics, keyType, network, "", "", "", 0, true) +} +``` diff --git a/collections/documentation/developers/go/grid3_go_qsfs.md b/collections/documentation/developers/go/grid3_go_qsfs.md new file mode 100644 index 0000000..92a352a --- /dev/null +++ b/collections/documentation/developers/go/grid3_go_qsfs.md @@ -0,0 +1,186 @@ +

Deploying QSFS

+ +

Table of Contents

+ +- [Introduction](#introduction) +- [Example](#example) + +*** + +## Introduction + +We show how to deploy QSFS workloads with the Go client. + +## Example + +```go +import ( + "context" + "fmt" + "net" + + "github.com/threefoldtech/tfgrid-sdk-go/grid-client/deployer" + "github.com/threefoldtech/tfgrid-sdk-go/grid-client/workloads" + "github.com/threefoldtech/tfgrid-sdk-go/grid-proxy/pkg/types" + "github.com/threefoldtech/zos/pkg/gridtypes" +) + +func main() { + + // Create Threefold plugin client + tfPluginClient, err := deployer.NewTFPluginClient(mnemonics, "sr25519", network, "", "", "", 0, true) + + // Get a free node to deploy + freeMRU := uint64(2) + freeSRU := uint64(20) + status := "up" + filter := types.NodeFilter{ + FreeMRU: &freeMRU, + FreeSRU: &freeSRU, + Status: &status, + } + nodeIDs, err := deployer.FilterNodes(tfPluginClient.GridProxyClient, filter) + nodeID := uint32(nodeIDs[0].NodeID) + + // Create data and meta ZDBs + dataZDBs := []workloads.ZDB{} + metaZDBs := []workloads.ZDB{} + for i := 1; i <= DataZDBNum; i++ { + zdb := workloads.ZDB{ + Name: "qsfsDataZdb" + strconv.Itoa(i), + Password: "password", + Public: true, + Size: 1, + Description: "zdb for testing", + Mode: zos.ZDBModeSeq, + } + dataZDBs = append(dataZDBs, zdb) + } + + for i := 1; i <= MetaZDBNum; i++ { + zdb := workloads.ZDB{ + Name: "qsfsMetaZdb" + strconv.Itoa(i), + Password: "password", + Public: true, + Size: 1, + Description: "zdb for testing", + Mode: zos.ZDBModeUser, + } + metaZDBs = append(metaZDBs, zdb) + } + + // Deploy ZDBs + dl1 := workloads.NewDeployment("qsfs", nodeID, "", nil, "", nil, append(dataZDBs, metaZDBs...), nil, nil) + err = tfPluginClient.DeploymentDeployer.Deploy(ctx, &dl1) + + // result ZDBs + resDataZDBs := []workloads.ZDB{} + resMetaZDBs := []workloads.ZDB{} + for i := 1; i <= DataZDBNum; i++ { + res, err := tfPluginClient.State.LoadZdbFromGrid(nodeID, "qsfsDataZdb"+strconv.Itoa(i), dl1.Name) + resDataZDBs = append(resDataZDBs, res) + } + for i := 1; i 
<= MetaZDBNum; i++ { + res, err := tfPluginClient.State.LoadZdbFromGrid(nodeID, "qsfsMetaZdb"+strconv.Itoa(i), dl1.Name) + resMetaZDBs = append(resMetaZDBs, res) + } + + // backends + dataBackends := []workloads.Backend{} + metaBackends := []workloads.Backend{} + for i := 0; i < DataZDBNum; i++ { + dataBackends = append(dataBackends, workloads.Backend{ + Address: "[" + resDataZDBs[i].IPs[1] + "]" + ":" + fmt.Sprint(resDataZDBs[i].Port), + Namespace: resDataZDBs[i].Namespace, + Password: resDataZDBs[i].Password, + }) + } + for i := 0; i < MetaZDBNum; i++ { + metaBackends = append(metaBackends, workloads.Backend{ + Address: "[" + resMetaZDBs[i].IPs[1] + "]" + ":" + fmt.Sprint(resMetaZDBs[i].Port), + Namespace: resMetaZDBs[i].Namespace, + Password: resMetaZDBs[i].Password, + }) + } + + // Create a new qsfs to deploy + qsfs := workloads.QSFS{ + Name: "qsfs", + Description: "qsfs for testing", + Cache: 1024, + MinimalShards: 2, + ExpectedShards: 4, + RedundantGroups: 0, + RedundantNodes: 0, + MaxZDBDataDirSize: 512, + EncryptionAlgorithm: "AES", + EncryptionKey: "4d778ba3216e4da4231540c92a55f06157cabba802f9b68fb0f78375d2e825af", + CompressionAlgorithm: "snappy", + Groups: workloads.Groups{{Backends: dataBackends}}, + Metadata: workloads.Metadata{ + Type: "zdb", + Prefix: "test", + EncryptionAlgorithm: "AES", + EncryptionKey: "4d778ba3216e4da4231540c92a55f06157cabba802f9b68fb0f78375d2e825af", + Backends: metaBackends, + }, + } + + // Create a new network to deploy + network := workloads.ZNet{ + Name: "newNetwork", + Description: "A network to deploy", + Nodes: []uint32{nodeID}, + IPRange: gridtypes.NewIPNet(net.IPNet{ + IP: net.IPv4(10, 1, 0, 0), + Mask: net.CIDRMask(16, 32), + }), + AddWGAccess: true, + } + + vm := workloads.VM{ + Name: "vm", + Flist: "https://hub.grid.tf/tf-official-apps/base:latest.flist", + CPU: 2, + Planetary: true, + Memory: 1024, + Entrypoint: "/sbin/zinit init", + EnvVars: map[string]string{ + "SSH_KEY": publicKey, + }, + Mounts: 
[]workloads.Mount{ + {DiskName: qsfs.Name, MountPoint: "/qsfs"}, + }, + NetworkName: network.Name, + } + + // Deploy the network first + err = tfPluginClient.NetworkDeployer.Deploy(ctx, &network) + + // Deploy the VM/QSFS deployment + dl2 := workloads.NewDeployment("qsfs", nodeID, "", nil, network.Name, nil, append(dataZDBs, metaZDBs...), []workloads.VM{vm}, []workloads.QSFS{qsfs}) + err = tfPluginClient.DeploymentDeployer.Deploy(ctx, &dl2) + + // Load the QSFS using the state loader + qsfsObj, err := tfPluginClient.State.LoadQSFSFromGrid(nodeID, qsfs.Name, dl2.Name) + + // Load the VM using the state loader + vmObj, err := tfPluginClient.State.LoadVMFromGrid(nodeID, vm.Name, dl2.Name) + + // Print the VM Yggdrasil IP + fmt.Println(vmObj.YggIP) + + // Cancel the ZDB deployment (dl1) and the VM/QSFS deployment (dl2) + err = tfPluginClient.DeploymentDeployer.Cancel(ctx, &dl1) + err = tfPluginClient.DeploymentDeployer.Cancel(ctx, &dl2) + + // Cancel the network deployment + err = tfPluginClient.NetworkDeployer.Cancel(ctx, &network) +} +``` + +Running this code should deploy a VM with QSFS on an available node and print the VM's Yggdrasil IP, like this: + +```bash +300:e9c4:9048:57cf:6d98:42c6:a7bf:2e3f +``` diff --git a/collections/documentation/developers/go/grid3_go_readme.md b/collections/documentation/developers/go/grid3_go_readme.md new file mode 100644 index 0000000..cf24ada --- /dev/null +++ b/collections/documentation/developers/go/grid3_go_readme.md @@ -0,0 +1,17 @@ +# Grid Go Client + +Grid Go Client is a Go client for interacting with and developing on the ThreeFold Grid using the Go language. + +Please make sure to check the [basics](../../system_administrators/getstarted/tfgrid3_getstarted.md) before continuing. + +

Table of Contents

+ +- [Installation](../go/grid3_go_installation.md) +- [Loading Client](../go/grid3_go_load_client.md) +- [Deploy a VM](../go/grid3_go_vm.md) +- [Deploy a VM with GPU](../go/grid3_go_vm_with_gpu.md) +- [Deploy Multiple VMs](../go/grid3_go_vms.md) +- [Deploy Gateways](../go/grid3_go_gateways.md) +- [Deploy Kubernetes](../go/grid3_go_kubernetes.md) +- [Deploy a QSFS](../go/grid3_go_qsfs.md) +- [GPU Support](../go/grid3_go_gpu_support.md) diff --git a/collections/documentation/developers/go/grid3_go_vm.md b/collections/documentation/developers/go/grid3_go_vm.md new file mode 100644 index 0000000..c164114 --- /dev/null +++ b/collections/documentation/developers/go/grid3_go_vm.md @@ -0,0 +1,99 @@ +

Deploying a VM

+ +

Table of Contents

+ +- [Introduction](#introduction) +- [Example](#example) + +*** + +## Introduction + +We show how to deploy a VM with the Go client. + +## Example + +```go +import ( + "context" + "fmt" + "net" + + "github.com/threefoldtech/tfgrid-sdk-go/grid-client/deployer" + "github.com/threefoldtech/tfgrid-sdk-go/grid-client/workloads" + "github.com/threefoldtech/tfgrid-sdk-go/grid-proxy/pkg/types" + "github.com/threefoldtech/zos/pkg/gridtypes" +) + +func main() { + + // Create Threefold plugin client + tfPluginClient, err := deployer.NewTFPluginClient(mnemonics, keyType, network, "", "", "", 0, true) + + // Get a free node to deploy + freeMRU := uint64(2) + freeSRU := uint64(20) + status := "up" + filter := types.NodeFilter{ + FreeMRU: &freeMRU, + FreeSRU: &freeSRU, + Status: &status, + } + nodeIDs, err := deployer.FilterNodes(tfPluginClient.GridProxyClient, filter) + nodeID := uint32(nodeIDs[0].NodeID) + + // Create a new network to deploy + network := workloads.ZNet{ + Name: "newNetwork", + Description: "A network to deploy", + Nodes: []uint32{nodeID}, + IPRange: gridtypes.NewIPNet(net.IPNet{ + IP: net.IPv4(10, 1, 0, 0), + Mask: net.CIDRMask(16, 32), + }), + AddWGAccess: true, + } + + // Create a new VM to deploy + vm := workloads.VM{ + Name: "vm", + Flist: "https://hub.grid.tf/tf-official-apps/base:latest.flist", + CPU: 2, + PublicIP: true, + Planetary: true, + Memory: 1024, + RootfsSize: 20 * 1024, + Entrypoint: "/sbin/zinit init", + EnvVars: map[string]string{ + "SSH_KEY": publicKey, + }, + IP: "10.20.2.5", + NetworkName: network.Name, + } + + // Deploy the network first + err = tfPluginClient.NetworkDeployer.Deploy(ctx, &network) + + // Deploy the VM deployment + dl := workloads.NewDeployment("vm", nodeID, "", nil, network.Name, nil, nil, []workloads.VM{vm}, nil) + err = tfPluginClient.DeploymentDeployer.Deploy(ctx, &dl) + + // Load the VM using the state loader + vmObj, err := tfPluginClient.State.LoadVMFromGrid(nodeID, vm.Name, dl.Name) + + // Print the VM Yggdrasil 
IP + fmt.Println(vmObj.YggIP) + + // Cancel the VM deployment + err = tfPluginClient.DeploymentDeployer.Cancel(ctx, &dl) + + // Cancel the network deployment + err = tfPluginClient.NetworkDeployer.Cancel(ctx, &network) +} +``` + +Running this code should deploy a VM on an available node and print its Yggdrasil IP, like this: + +```bash +300:e9c4:9048:57cf:6d98:42c6:a7bf:2e3f +``` diff --git a/collections/documentation/developers/go/grid3_go_vm_with_gpu.md b/collections/documentation/developers/go/grid3_go_vm_with_gpu.md new file mode 100644 index 0000000..d635800 --- /dev/null +++ b/collections/documentation/developers/go/grid3_go_vm_with_gpu.md @@ -0,0 +1,121 @@ +

Deploy a VM with GPU

+ +

Table of Contents

+ +- [Introduction](#introduction) +- [Example](#example) + +*** + +## Introduction + +In this section, we explore how to deploy a virtual machine equipped with GPU. We deploy the VM using Go. The VM will be deployed on a 3Node with an available GPU. + + + +## Example + +```go +import ( + "context" + "fmt" + "net" + + "github.com/threefoldtech/tfgrid-sdk-go/grid-client/deployer" + "github.com/threefoldtech/tfgrid-sdk-go/grid-client/workloads" + "github.com/threefoldtech/tfgrid-sdk-go/grid-proxy/pkg/types" + "github.com/threefoldtech/zos/pkg/gridtypes" +) + +func main() { + + // Create Threefold plugin client + tfPluginClient, err := deployer.NewTFPluginClient(mnemonics, "sr25519", network, "", "", "", 0, true) + + // Get a free node to deploy + freeMRU := uint64(2) + freeSRU := uint64(20) + status := "up" + trueVal := true + + twinID := uint64(tfPluginClient.TwinID) + filter := types.NodeFilter{ + FreeMRU: &freeMRU, + FreeSRU: &freeSRU, + Status: &status, + RentedBy: &twinID, + HasGPU: &trueVal, + } + nodeIDs, err := deployer.FilterNodes(tfPluginClient.GridProxyClient, filter) + nodeID := uint32(nodeIDs[0].NodeID) + + // Get the available gpus on the node + nodeClient, err := tfPluginClient.NcPool.GetNodeClient(tfPluginClient.SubstrateConn, nodeID) + gpus, err := nodeClient.GPUs(ctx) + + // Create a new network to deploy + network := workloads.ZNet{ + Name: "newNetwork", + Description: "A network to deploy", + Nodes: []uint32{nodeID}, + IPRange: gridtypes.NewIPNet(net.IPNet{ + IP: net.IPv4(10, 1, 0, 0), + Mask: net.CIDRMask(16, 32), + }), + AddWGAccess: true, + } + + // Create a new disk to deploy + disk := workloads.Disk{ + Name: "gpuDisk", + SizeGB: 20, + } + + // Create a new VM to deploy + vm := workloads.VM{ + Name: "vm", + Flist: "https://hub.grid.tf/tf-official-apps/base:latest.flist", + CPU: 2, + PublicIP: true, + Planetary: true, + // Insert your GPUs' IDs here + GPUs: []zos.GPU{zos.GPU(gpus[0].ID)}, + Memory: 1024, + RootfsSize: 20 * 1024, + Entrypoint: 
"/sbin/zinit init", + EnvVars: map[string]string{ + "SSH_KEY": publicKey, + }, + Mounts: []workloads.Mount{ + {DiskName: disk.Name, MountPoint: "/data"}, + }, + IP: "10.20.2.5", + NetworkName: network.Name, + } + + // Deploy the network first + err = tfPluginClient.NetworkDeployer.Deploy(ctx, &network) + + // Deploy the VM deployment + dl := workloads.NewDeployment("gpu", nodeID, "", nil, network.Name, []workloads.Disk{disk}, nil, []workloads.VM{vm}, nil) + err = tfPluginClient.DeploymentDeployer.Deploy(ctx, &dl) + + // Load the VM using the state loader + vmObj, err := tfPluginClient.State.LoadVMFromGrid(nodeID, vm.Name, dl.Name) + + // Print the VM Yggdrasil IP + fmt.Println(vmObj.YggIP) + + // Cancel the VM deployment + err = tfPluginClient.DeploymentDeployer.Cancel(ctx, &dl) + + // Cancel the network deployment + err = tfPluginClient.NetworkDeployer.Cancel(ctx, &network) +} +``` + +Running this code should result in a VM with a GPU deployed on an available node. The output should look like this: + +```bash +Yggdrasil IP: 300:e9c4:9048:57cf:6d98:42c6:a7bf:2e3f +``` \ No newline at end of file diff --git a/collections/documentation/developers/go/grid3_go_vms.md b/collections/documentation/developers/go/grid3_go_vms.md new file mode 100644 index 0000000..f78aaae --- /dev/null +++ b/collections/documentation/developers/go/grid3_go_vms.md @@ -0,0 +1,125 @@ +

Deploying Multiple VMs

+ +

Table of Contents

+ +- [Introduction](#introduction) +- [Example](#example) + +*** + +## Introduction + +We show how to deploy multiple VMs with the Go client. + +## Example + +```go +import ( + "context" + "fmt" + "net" + + "github.com/threefoldtech/tfgrid-sdk-go/grid-client/deployer" + "github.com/threefoldtech/tfgrid-sdk-go/grid-client/workloads" + "github.com/threefoldtech/tfgrid-sdk-go/grid-proxy/pkg/types" + "github.com/threefoldtech/zos/pkg/gridtypes" +) + +func main() { + + // Create Threefold plugin client + tfPluginClient, err := deployer.NewTFPluginClient(mnemonics, "sr25519", network, "", "", "", 0, true) + + // Get a free node to deploy + freeMRU := uint64(2) + freeSRU := uint64(2) + status := "up" + filter := types.NodeFilter { + FreeMRU: &freeMRU, + FreeSRU: &freeSRU, + Status: &status, + } + nodeIDs, err := deployer.FilterNodes(tfPluginClient.GridProxyClient, filter) + nodeID1 := uint32(nodeIDs[0].NodeID) + nodeID2 := uint32(nodeIDs[1].NodeID) + + // Create a new network to deploy + network := workloads.ZNet{ + Name: "newNetwork", + Description: "A network to deploy", + Nodes: []uint32{nodeID1, nodeID2}, + IPRange: gridtypes.NewIPNet(net.IPNet{ + IP: net.IPv4(10, 1, 0, 0), + Mask: net.CIDRMask(16, 32), + }), + AddWGAccess: true, + } + + // Create new VMs to deploy + vm1 := workloads.VM{ + Name: "vm1", + Flist: "https://hub.grid.tf/tf-official-apps/base:latest.flist", + CPU: 2, + PublicIP: true, + Planetary: true, + Memory: 1024, + RootfsSize: 20 * 1024, + Entrypoint: "/sbin/zinit init", + EnvVars: map[string]string{ + "SSH_KEY": publicKey, + }, + IP: "10.20.2.5", + NetworkName: network.Name, + } + vm2 := workloads.VM{ + Name: "vm2", + Flist: "https://hub.grid.tf/tf-official-apps/base:latest.flist", + CPU: 2, + PublicIP: true, + Planetary: true, + Memory: 1024, + RootfsSize: 20 * 1024, + Entrypoint: "/sbin/zinit init", + EnvVars: map[string]string{ + "SSH_KEY": publicKey, + }, + IP: "10.20.2.6", + NetworkName: network.Name, + } + + // Deploy the network first + err = 
tfPluginClient.NetworkDeployer.Deploy(ctx, &network) + + // Load the network using the state loader + // this loader should load the deployment as json then convert it to a deployment go object with workloads inside it + networkObj, err := tfPluginClient.State.LoadNetworkFromGrid(network.Name) + + // Deploy the VM deployments + dl1 := workloads.NewDeployment("vm1", nodeID1, "", nil, network.Name, nil, nil, []workloads.VM{vm1}, nil) + dl2 := workloads.NewDeployment("vm2", nodeID2, "", nil, network.Name, nil, nil, []workloads.VM{vm2}, nil) + err = tfPluginClient.DeploymentDeployer.BatchDeploy(ctx, []*workloads.Deployment{&dl1, &dl2}) + + // Load the VMs using the state loader + vmObj1, err := tfPluginClient.State.LoadVMFromGrid(nodeID1, vm1.Name, dl1.Name) + vmObj2, err := tfPluginClient.State.LoadVMFromGrid(nodeID2, vm2.Name, dl2.Name) + + // Print the VMs Yggdrasil IP + fmt.Println(vmObj1.YggIP) + fmt.Println(vmObj2.YggIP) + + // Cancel the VM deployments + err = tfPluginClient.DeploymentDeployer.Cancel(ctx, &dl1) + err = tfPluginClient.DeploymentDeployer.Cancel(ctx, &dl2) + + // Cancel the network + err = tfPluginClient.NetworkDeployer.Cancel(ctx, &network) +} + +``` + +Running this code should result in two VMs deployed on two separate nodes while being on the same network and you should see an output like this: + +```bash +300:e9c4:9048:57cf:f4e0:2343:f891:6037 +300:e9c4:9048:57cf:6d98:42c6:a7bf:2e3f +``` diff --git a/collections/documentation/developers/grid_deployment/grid_deployment.md b/collections/documentation/developers/grid_deployment/grid_deployment.md new file mode 100644 index 0000000..14fa506 --- /dev/null +++ b/collections/documentation/developers/grid_deployment/grid_deployment.md @@ -0,0 +1,9 @@ +# Grid Deployment + +The TFGrid whole source code is open-source and instances of the grid can be deployed by anyone thanks to the distribution of daily grid snapshots of the complete ThreeFold Grid stacks. 
+ +## Table of Contents + +- [TFGrid Stacks](./tfgrid_stacks.md) +- [Full VM Grid Deployment](./grid_deployment_full_vm.md) +- [Grid Snapshots](./snapshots.md) \ No newline at end of file diff --git a/collections/documentation/developers/grid_deployment/grid_deployment_full_vm.md b/collections/documentation/developers/grid_deployment/grid_deployment_full_vm.md new file mode 100644 index 0000000..aee260a --- /dev/null +++ b/collections/documentation/developers/grid_deployment/grid_deployment_full_vm.md @@ -0,0 +1,152 @@ +

Grid Deployment on a Full VM

+

Table of Contents

+ + +- [Introduction](#introduction) +- [Prerequisites](#prerequisites) +- [DNS Settings](#dns-settings) + - [DNS Verification](#dns-verification) +- [Prepare the VM](#prepare-the-vm) +- [Set the Firewall](#set-the-firewall) +- [Launch the Script](#launch-the-script) +- [Access the Grid Services](#access-the-grid-services) +- [Manual Commands](#manual-commands) +- [Update the Deployment](#update-the-deployment) + +*** + +## Introduction + +We present the steps to deploy a network instance of the TFGrid on a full VM. + +For this guide, we will be deploying a mainnet instance. While the steps are similar for testnet and devnet, you will have to adjust your deployment depending on which network you use. + +## Prerequisites + +For this guide, you will need to deploy a full VM on the ThreeFold Grid with at least the following specs: + +- IPv4 +- IPv6 +- 32 GB of RAM +- 1000 GB of SSD +- 8 vcores + +After deploying the full VM, take note of the IPv4 and IPv6 addresses to properly set the DNS records and then SSH into the VM. + +## DNS Settings + +You need to set an A record for the IPv4 address and an AAAA record for the IPv6 address with a wildcard subdomain. + +The following table explicitly shows how to set the A and AAAA records for your domain. + +| Type | Host | Value | +| ---- | ---- | -------------- | +| A | \* | IPv4 address | +| AAAA | \* | IPv6 address | + + +### DNS Verification + +You can use tools such as [DNSChecker](https://dnschecker.org/) or [dig](https://linux.die.net/man/1/dig) on a terminal to check if the DNS propagation is complete.
+ +## Prepare the VM + +- Download the ThreeFold Tech `grid_deployment` repository + ``` + git clone https://github.com/threefoldtech/grid_deployment + cd grid_deployment/docker-compose/mainnet + ``` +- Generate a TFChain node key with `subkey` + ``` + echo .subkey_mainnet >> .gitignore + ../subkey generate-node-key > .nodekey_mainnet + cat .nodekey_mainnet + ``` +- Create and set the environment variables file + ``` + cp .secrets.env-example .secrets.env + ``` +- Adjust the environment file + ``` + nano .secrets.env + ``` +- To adjust the `.secrets.env` file, take into account the following: + - **DOMAIN**="example.com" + - Write your own domain + - **TFCHAIN_NODE_KEY**="abc123" + - Write the output of the command `cat .nodekey_mainnet` + - **ACTIVATION_SERVICE_MNEMONIC**="word1 word2 ... word24" + - Write the seed phrase of an account on mainnet with at least 10 TFT in the wallet + - **GRID_PROXY_MNEMONIC**="word1 word2 ... word24" + - Write the seed phrase of an account on mainnet with at least 10 TFT in the wallet and a registered twin ID\* + +> \*Note: If you've created an account using the ThreeFold Dashboard on mainnet, the twin ID is automatically registered. + +## Set the Firewall + +You can use UFW to set the firewall: + +``` +ufw allow 80/tcp +ufw allow 443/tcp +ufw allow 30333/tcp +ufw allow 22/tcp +ufw enable +ufw status +``` + +## Launch the Script + +Once you've prepared the VM, you can simply run the script to install the grid stack and deploy it online. + +``` +sh install_grid_bknd.sh +``` + +This will take some time since you are downloading the complete mainnet grid snapshots.
+ +## Access the Grid Services + +Once you've deployed the grid stack online, you can access the different grid services by using the usual subdomains: + +``` +dashboard.your.domain +metrics.your.domain +tfchain.your.domain +graphql.your.domain +relay.your.domain +gridproxy.your.domain +activation.your.domain +stats.your.domain +``` + +## Manual Commands + +Once you've run the install script, you can manually deploy the grid stack with the following command: + +``` +docker compose --env-file .secrets.env --env-file .env up -d +``` + +You can also check if the environment variables are properly set: + +``` +docker compose --env-file .secrets.env --env-file .env config +``` + +If you want to see the output during deployment, remove `-d` from the `up` command as follows: + +``` +docker compose --env-file .secrets.env --env-file .env up +``` + +This can be helpful to troubleshoot errors. + +## Update the Deployment + +Go into the folder of the proper network, e.g. mainnet, and run the following commands: + +``` +git pull -r +docker compose --env-file .secrets.env --env-file .env up -d +``` \ No newline at end of file diff --git a/collections/documentation/developers/grid_deployment/snapshots.md b/collections/documentation/developers/grid_deployment/snapshots.md new file mode 100644 index 0000000..2b7a7d4 --- /dev/null +++ b/collections/documentation/developers/grid_deployment/snapshots.md @@ -0,0 +1,196 @@ +

Snapshots for Grid Backend Services

+

Table of Contents

+ +- [Introduction](#introduction) +- [Services](#services) +- [ThreeFold Public Snapshots](#threefold-public-snapshots) +- [Deploy the Services with Scripts](#deploy-the-services-with-scripts) + - [Create the Snapshots](#create-the-snapshots) + - [Start All the Services](#start-all-the-services) + - [Stop All the Services](#stop-all-the-services) +- [Expose the Snapshots with Rsync](#expose-the-snapshots-with-rsync) + - [Create the Service Configuration File](#create-the-service-configuration-file) + - [Start the Service](#start-the-service) + +*** + +## Introduction + +To facilitate deploying grid backend services, we provide snapshots to significantly reduce sync time. This can be setup anywhere from scratch. Once all services are synced, one can use the scripts to create snapshots automatically. + +To learn how to deploy your own grid stack, read [this section](./grid_deployment_full_vm.md). + +## Services + +There are 3 grid backend services that collect enough data to justify creating snapshots: + +- ThreeFold blockchain - TFChain +- Graphql - Indexer +- Graphql - Processor + +## ThreeFold Public Snapshots + +ThreeFold hosts all available snapshots at: [https://bknd.snapshot.grid.tf/](https://bknd.snapshot.grid.tf/). Those snapshots can be downloaded with rsync: + +- Mainnet: + ``` + rsync -Lv --progress --partial rsync://bknd.snapshot.grid.tf:34873/gridsnapshots/tfchain-mainnet-latest.tar.gz . + rsync -Lv --progress --partial rsync://bknd.snapshot.grid.tf:34873/gridsnapshots/indexer-mainnet-latest.tar.gz . + rsync -Lv --progress --partial rsync://bknd.snapshot.grid.tf:34873/gridsnapshots/processor-mainnet-latest.tar.gz . + ``` +- Testnet: + ``` + rsync -Lv --progress --partial rsync://bknd.snapshot.grid.tf:34873/gridsnapshotstest/tfchain-testnet-latest.tar.gz . + rsync -Lv --progress --partial rsync://bknd.snapshot.grid.tf:34873/gridsnapshotstest/indexer-testnet-latest.tar.gz . 
+ rsync -Lv --progress --partial rsync://bknd.snapshot.grid.tf:34873/gridsnapshotstest/processor-testnet-latest.tar.gz . + ``` +- Devnet: + ``` + rsync -Lv --progress --partial rsync://bknd.snapshot.grid.tf:34873/gridsnapshotsdev/tfchain-devnet-latest.tar.gz . + rsync -Lv --progress --partial rsync://bknd.snapshot.grid.tf:34873/gridsnapshotsdev/indexer-devnet-latest.tar.gz . + rsync -Lv --progress --partial rsync://bknd.snapshot.grid.tf:34873/gridsnapshotsdev/processor-devnet-latest.tar.gz . + ``` + +## Deploy the Services with Scripts + +You can deploy the 3 individual services using known methods such as [Docker](../../system_administrators/computer_it_basics/docker_basics.md). To facilitate the process, scripts are provided that run the necessary docker commands. + +The first script creates the snapshots, while the second and third scripts serve to start and stop all services. + +You can use the start script to start all services and then set a cron job to execute periodically the snapshot creation script. This will ensure that you always have the latest version available on your server. + +### Create the Snapshots + +You can set a cron job to execute a script running rsync to create the snapshots and generate logs at a given interval. + +- First download the script. 
+ - Main net + ``` + wget https://github.com/threefoldtech/grid_deployment/blob/development/grid-snapshots/mainnet/create_snapshot.sh + ``` + - Test net + ``` + wget https://github.com/threefoldtech/grid_deployment/blob/development/grid-snapshots/testnet/create_snapshot.sh + ``` + - Dev net + ``` + wget https://github.com/threefoldtech/grid_deployment/blob/development/grid-snapshots/devnet/create_snapshot.sh + ``` +- Set the permissions of the script + ``` + chmod +x create_snapshot.sh + ``` +- Make sure to adjust the snapshot creation script for your specific deployment +- Set a cron job + ``` + crontab -e + ``` + - Here is an example of a cron job where we execute the script every day at 1 AM and send the logs to `/var/log/snapshots/snapshots-cron.log`. + ```sh + 0 1 * * * sh /opt/snapshots/create_snapshot.sh > /var/log/snapshots/snapshots-cron.log 2>&1 + ``` + +### Start All the Services + +You can start all services by running the provided scripts. + +- Download the script.
+ - Main net + ``` + wget https://github.com/threefoldtech/grid_deployment/blob/development/grid-snapshots/mainnet/stopall.sh + ``` + - Test net + ``` + wget https://github.com/threefoldtech/grid_deployment/blob/development/grid-snapshots/testnet/stopall.sh + ``` + - Dev net + ``` + wget https://github.com/threefoldtech/grid_deployment/blob/development/grid-snapshots/devnet/stopall.sh + ``` +- Set the permissions of the script + ``` + chmod +x stopall.sh + ``` +- Run the script to stop all services via docker engine. + ``` + ./stopall.sh + ``` + +## Expose the Snapshots with Rsync + +We use rsync with a systemd service to expose the snapshots to the community. + +### Create the Service Configuration File + +To setup a public rsync server, create and edit the following file: + +`/etc/rsyncd.conf` + +```sh +pid file = /var/run/rsyncd.pid +lock file = /var/run/rsync.lock +log file = /var/log/rsync.log +port = 34873 +max connections = 20 +exclude = lost+found/ +transfer logging = yes +use chroot = yes +reverse lookup = no + +[gridsnapshots] +path = /storage/rsync-public/mainnet +comment = THREEFOLD GRID MAINNET SNAPSHOTS +read only = true +timeout = 300 +list = false + +[gridsnapshotstest] +path = /storage/rsync-public/testnet +comment = THREEFOLD GRID TESTNET SNAPSHOTS +read only = true +timeout = 300 +list = false + +[gridsnapshotsdev] +path = /storage/rsync-public/devnet +comment = THREEFOLD GRID DEVNET SNAPSHOTS +read only = true +timeout = 300 +list = false +``` + +### Start the Service + +Start and enable via systemd: + +```sh +systemctl start rsync +systemctl enable rsync +systemctl status rsync +``` diff --git a/collections/documentation/developers/grid_deployment/snapshots_archive.md b/collections/documentation/developers/grid_deployment/snapshots_archive.md new file mode 100644 index 0000000..eac3ff7 --- /dev/null +++ b/collections/documentation/developers/grid_deployment/snapshots_archive.md @@ -0,0 +1,206 @@ +

Snapshots for Grid Backend Services

+

Table of Contents

+ +- [Introduction](#introduction) +- [Prerequisites](#prerequisites) +- [Services](#services) +- [ThreeFold Public Snapshots](#threefold-public-snapshots) +- [Deploy the Services with Scripts](#deploy-the-services-with-scripts) + - [Start All the Services](#start-all-the-services) + - [Stop All the Services](#stop-all-the-services) + - [Create the Snapshots](#create-the-snapshots) +- [Expose the Snapshots with Rsync](#expose-the-snapshots-with-rsync) + - [Create the Service Configuration File](#create-the-service-configuration-file) + - [Start the Service](#start-the-service) + +*** + +## Introduction + +To facilitate deploying grid backend services, we provide snapshots to significantly reduce sync time. This can be setup anywhere from scratch. Once all services are synced, one can use the scripts to create snapshots automatically. + +## Prerequisites + +There are a few prerequisites to properly run the ThreeFold services. + +- [Docker engine](../computer_it_basics/docker_basics.md#install-docker-desktop-and-docker-engine) +- [Rsync](../computer_it_basics/file_transfer.md#rsync) + +## Services + +There are 3 grid backend services that collect enough data to justify creating snapshots: + +- ThreeFold blockchain - TFChain +- Graphql - Indexer +- Graphql - Processor + +## ThreeFold Public Snapshots + +ThreeFold hosts all available snapshots at: [https://bknd.snapshot.grid.tf/](https://bknd.snapshot.grid.tf/). Those snapshots can be downloaded with rsync: + +- Mainnet: + ``` + rsync -Lv --progress --partial rsync://bknd.snapshot.grid.tf:34873/gridsnapshots/tfchain-mainnet-latest.tar.gz . + rsync -Lv --progress --partial rsync://bknd.snapshot.grid.tf:34873/gridsnapshots/indexer-mainnet-latest.tar.gz . + rsync -Lv --progress --partial rsync://bknd.snapshot.grid.tf:34873/gridsnapshots/processor-mainnet-latest.tar.gz . + ``` +- Testnet: + ``` + rsync -Lv --progress --partial rsync://bknd.snapshot.grid.tf:34873/gridsnapshotstest/tfchain-testnet-latest.tar.gz . 
+ rsync -Lv --progress --partial rsync://bknd.snapshot.grid.tf:34873/gridsnapshotstest/indexer-testnet-latest.tar.gz . + rsync -Lv --progress --partial rsync://bknd.snapshot.grid.tf:34873/gridsnapshotstest/processor-testnet-latest.tar.gz . + ``` +- Devnet: + ``` + rsync -Lv --progress --partial rsync://bknd.snapshot.grid.tf:34873/gridsnapshotsdev/tfchain-devnet-latest.tar.gz . + rsync -Lv --progress --partial rsync://bknd.snapshot.grid.tf:34873/gridsnapshotsdev/indexer-devnet-latest.tar.gz . + rsync -Lv --progress --partial rsync://bknd.snapshot.grid.tf:34873/gridsnapshotsdev/processor-devnet-latest.tar.gz . + ``` + +Let's now see how to use those snapshots to run the services via scripts. + +## Deploy the Services with Scripts + +You can deploy the 3 individual services using known methods such as [Docker](https://manual.grid.tf/computer_it_basics/docker_basics.html). To facilitate the process, scripts are provided that run the necessary docker commands. + +The first script creates the snapshots, while the second and third scripts serve to start and stop all services. + +You can use the start script to start all services and then set a cron job to execute periodically the snapshot creation script. This will ensure that you always have the latest version available on your server. + +### Start All the Services + +You can start all services by running the provided scripts. + +- Download the script. + - Main net + ``` + wget https://github.com/threefoldtech/grid_deployment/blob/development/grid-snapshots/mainnet/startall.sh + ``` + - Test net + ``` + wget https://github.com/threefoldtech/grid_deployment/blob/development/grid-snapshots/testnet/startall.sh + ``` + - Dev net + ``` + wget https://github.com/threefoldtech/grid_deployment/blob/development/grid-snapshots/devnet/startall.sh + ``` +- Set the permissions of the script + ``` + chmod +x startall.sh + ``` +- Run the script to start all services via docker engine. 
+ ``` + ./startall.sh + ``` + +### Stop All the Services + +You can stop all services by running the provided scripts. + +- Download the script. + - Main net + ``` + wget https://github.com/threefoldtech/grid_deployment/blob/development/grid-snapshots/mainnet/stopall.sh + ``` + - Test net + ``` + wget https://github.com/threefoldtech/grid_deployment/blob/development/grid-snapshots/testnet/stopall.sh + ``` + - Dev net + ``` + wget https://github.com/threefoldtech/grid_deployment/blob/development/grid-snapshots/devnet/stopall.sh + ``` +- Set the permissions of the script + ``` + chmod +x stopall.sh + ``` +- Run the script to stop all services via docker engine. + ``` + ./stopall.sh + ``` + +### Create the Snapshots + +You can set a cron job to execute a script running rsync to create the snapshots and generate logs at a given interval. + +- First download the script. + - Main net + ``` + wget https://github.com/threefoldtech/grid_deployment/blob/development/grid-snapshots/mainnet/create_snapshot.sh + ``` + - Test net + ``` + wget https://github.com/threefoldtech/grid_deployment/blob/development/grid-snapshots/testnet/create_snapshot.sh + ``` + - Dev net + ``` + wget https://github.com/threefoldtech/grid_deployment/blob/development/grid-snapshots/devnet/create_snapshot.sh + ``` +- Set the permissions of the script + ``` + chmod +x create_snapshot.sh + ``` +- Make sure to adjust the snapshot creation script for your specific deployment +- Set a cron job + ``` + crontab -e + ``` + - Here is an example of a cron job where we execute the script every day at 1 AM and send the logs to `/var/log/snapshots/snapshots-cron.log`. + ```sh + 0 1 * * * sh /opt/snapshots/create_snapshot.sh > /var/log/snapshots/snapshots-cron.log 2>&1 + ``` + +## Expose the Snapshots with Rsync + +We use rsync with a systemd service to expose the snapshots to the community.
+ +### Create the Service Configuration File + +To set up a public rsync server, create and edit the following file: + +`/etc/rsyncd.conf` + +```sh +pid file = /var/run/rsyncd.pid +lock file = /var/run/rsync.lock +log file = /var/log/rsync.log +port = 34873 +max connections = 20 +exclude = lost+found/ +transfer logging = yes +use chroot = yes +reverse lookup = no + +[gridsnapshots] +path = /storage/rsync-public/mainnet +comment = THREEFOLD GRID MAINNET SNAPSHOTS +read only = true +timeout = 300 +list = false + +[gridsnapshotstest] +path = /storage/rsync-public/testnet +comment = THREEFOLD GRID TESTNET SNAPSHOTS +read only = true +timeout = 300 +list = false + +[gridsnapshotsdev] +path = /storage/rsync-public/devnet +comment = THREEFOLD GRID DEVNET SNAPSHOTS +read only = true +timeout = 300 +list = false +``` + +### Start the Service + +Start and enable via systemd: + +```sh +systemctl start rsync +systemctl enable rsync +systemctl status rsync +``` + +If you're interested in hosting your own instance of the grid to strengthen the ThreeFold ecosystem, make sure to read the next section, [Guardians of the Grid](./tfgrid_guardians.md). \ No newline at end of file diff --git a/collections/documentation/developers/grid_deployment/tfgrid_stacks.md b/collections/documentation/developers/grid_deployment/tfgrid_stacks.md new file mode 100644 index 0000000..7845722 --- /dev/null +++ b/collections/documentation/developers/grid_deployment/tfgrid_stacks.md @@ -0,0 +1,32 @@ +

TFGrid Stacks

+

Table of Contents

+ + +- [Introduction](#introduction) +- [Advantages](#advantages) +- [Run Your Own Stack](#run-your-own-stack) + + +*** + +## Introduction + +ThreeFold is an open-source project and anyone can run the full stack of the TFGrid in a totally decentralized manner. In practice, this means that anyone can grab the docker compose file of the TFGrid stacks shared by ThreeFold and run an instance of the grid services on their own domain. + +This means that you could host your own instance of the ThreeFold Dashboard at `dashboard.yourdomain.com` that would serve your own instance of the complete TFGrid stack! Users could then access the ThreeFold Dashboard via your own domain. + +The process is actually very straightforward and we even provide a script to streamline it. + +## Advantages + +Setting up such instances of the TFGrid ensures resiliency and decentralization of the ThreeFold ecosystem. + +As a very concrete example, imagine that one instance of the Dashboard, `dashboard.grid.tf`, goes offline. Users could still access the Dashboard from another instance. The more users of the TFGrid deploy their own instance, the more resilient the grid becomes. + +The overall ThreeFold ecosystem becomes more resilient to failures of individual nodes. + +## Run Your Own Stack + +To set up your own instance of the TFGrid, you can download a snapshot of the grid and deploy the TFGrid services with Docker. We even provide scripts to speed up the whole process! + +Read more about snapshots in the [next section](./grid_deployment_full_vm.md). \ No newline at end of file diff --git a/collections/documentation/developers/internals/internals.md b/collections/documentation/developers/internals/internals.md new file mode 100644 index 0000000..af1c1f8 --- /dev/null +++ b/collections/documentation/developers/internals/internals.md @@ -0,0 +1,19 @@ +

Internals

+ +We present in this section of the developers book a partial list of system components. Content will be added progressively. + +

Table of Contents

+ +- [Reliable Message Bus (RMB)](rmb/rmb_toc.md) + - [Introduction to RMB](rmb/rmb_intro.md) + - [RMB Specs](rmb/rmb_specs.md) + - [RMB Peer](rmb/uml/peer.md) + - [RMB Relay](rmb/uml/relay.md) + +- [ZOS](zos/index.md) + - [Manual](./zos/manual/manual.md) + - [Workload Types](./zos/manual/workload_types.md) + - [Internal Modules](./zos/internals/internals.md) + - [Capacity](./zos/internals/capacity.md) + - [Performance Monitor Package](./zos/performance/performance.md) + - [API](./zos/manual/api.md) \ No newline at end of file diff --git a/collections/documentation/developers/internals/rmb/img/layout.png b/collections/documentation/developers/internals/rmb/img/layout.png new file mode 100644 index 0000000..a6fc981 Binary files /dev/null and b/collections/documentation/developers/internals/rmb/img/layout.png differ diff --git a/collections/documentation/developers/internals/rmb/img/peer.png b/collections/documentation/developers/internals/rmb/img/peer.png new file mode 100644 index 0000000..6d6a7d8 Binary files /dev/null and b/collections/documentation/developers/internals/rmb/img/peer.png differ diff --git a/collections/documentation/developers/internals/rmb/img/relay.png b/collections/documentation/developers/internals/rmb/img/relay.png new file mode 100644 index 0000000..6b8bf01 Binary files /dev/null and b/collections/documentation/developers/internals/rmb/img/relay.png differ diff --git a/collections/documentation/developers/internals/rmb/rmb_intro.md b/collections/documentation/developers/internals/rmb/rmb_intro.md new file mode 100644 index 0000000..bb08f99 --- /dev/null +++ b/collections/documentation/developers/internals/rmb/rmb_intro.md @@ -0,0 +1,107 @@ +

Introduction to Reliable Message Bus (RMB)

+ +

Table of Contents

+ +- [What is RMB](#what-is-rmb) +- [Why](#why) +- [Specifications](#specifications) +- [How to Use RMB](#how-to-use-rmb) +- [Libraries](#libraries) + - [Known Libraries](#known-libraries) + - [No Known Libraries](#no-known-libraries) +- [What is rmb-peer](#what-is-rmb-peer) +- [Download](#download) +- [Building](#building) +- [Running tests](#running-tests) + +*** + +## What is RMB + +Reliable message bus is a secure communication panel that allows `bots` to communicate together in a `chat`-like way. It makes it very easy to host a service or a set of functions to be used by anyone, even if your service is running behind NAT. + +Out of the box RMB provides the following: + +- Guaranteed authenticity of the messages. You can always be sure that a received message comes from whoever it claims to come from +- End-to-end encryption +- Support for 3rd party hosted relays. Anyone can host a relay and people can use it safely since there is no way messages can be inspected while using e2e. That's similar to `home` servers by `matrix` + +![layout](img/layout.png) +*** +## Why + +RMB is developed by ThreefoldTech to create a global network of nodes that are available to host capacity. Each node will act like a single bot where you can ask to host your capacity. This enforced a unique set of requirements: + +- Communication needed to be reliable + - Minimize, and ideally eliminate, message loss + - Reduce downtime +- Nodes need to authenticate and authorize calls + - Guarantee identity of the other peer so only owners of data can see it +- Fast request response time + +Starting from this we came up with more detailed requirements: + +- Users (or rather bots) need their identity maintained by `tfchain` (a blockchain), hence each bot needs an account on tfchain to be able to use `rmb` +- Each message can then be signed with the `bot` keys, which makes it easy to verify the identity of the sender of a message. This is done both ways.
+- To support federation (using 3rd party relays) we needed to add e2e encryption to make sure messages that travel over the public internet can't be sniffed +- e2e encryption is done by deriving an encryption key from the same identity seed and sharing the public key on `tfchain`, hence it's available for everyone to use +*** +## Specifications + +For details about the protocol itself please check the [specs](./rmb_specs.md). +*** +## How to Use RMB + +There are many ways to use `rmb` because it was built for `bots` and software to communicate. Hence, there is no mobile app for it, for example, but instead a set of libraries that you can use to connect to the network, exchange messages with other bots, then exit. + +Or you can keep the connection open indefinitely to answer requests from other bots if you are providing a service. +*** +## Libraries + +If there is a library in your preferred language, then you are in luck! Simply follow the library documentation to implement a service bot, or to make requests to other bots. + +### Known Libraries + +- Golang [rmb-sdk-go](https://github.com/threefoldtech/rmb-sdk-go) +- Typescript [rmb-sdk-ts](https://github.com/threefoldtech/rmb-sdk-ts) +*** +### No Known Libraries + +If there is no library in your preferred language, here's what you can do: + +- Implement a library in your preferred language +- If it's too much to do all the signing, verification and e2e encryption in your language, then use `rmb-peer` +*** +## What is rmb-peer + +Think of `rmb-peer` as a gateway that stands between you and the `relay`. `rmb-peer` uses your mnemonics (your identity secret key) to assume your identity and connects to the relay on your behalf. It maintains the connection indefinitely and takes care of + +- reconnecting if the connection was lost +- verifying received messages +- decrypting received messages +- sending requests on your behalf, taking care of all crypto heavy lifting. + +Then it provides a simple (plain-text) API over `redis`.
This means that to send messages (or handle requests) you just need to be able to push and pop messages from some redis queues. Messages are simple plain-text JSON. + +> More details can be found [here](./rmb_specs.md) + +*** +## Download + +Please check the latest [releases](https://github.com/threefoldtech/rmb-rs/releases). Normally you only need the `rmb-peer` binary, unless you want to host your own relay. +*** +## Building + +```bash +git clone git@github.com:threefoldtech/rmb-rs.git +cd rmb-rs +cargo build --release --target=x86_64-unknown-linux-musl +``` +*** +## Running tests + +While inside the repository, run: + +```bash +cargo test +``` diff --git a/collections/documentation/developers/internals/rmb/rmb_specs.md b/collections/documentation/developers/internals/rmb/rmb_specs.md new file mode 100644 index 0000000..2d28cbe --- /dev/null +++ b/collections/documentation/developers/internals/rmb/rmb_specs.md @@ -0,0 +1,258 @@ +

RMB Specs

+ +

Table of Contents

+ +- [Introduction](#introduction) +- [Overview of the Operation of RMB Relay](#overview-of-the-operation-of-rmb-relay) + - [Connections](#connections) + - [Peer](#peer) + - [Peer implementation](#peer-implementation) + - [Message Types](#message-types) + - [Output Requests](#output-requests) + - [Incoming Response](#incoming-response) + - [Incoming Request](#incoming-request) + - [Outgoing Response](#outgoing-response) +- [End2End Encryption](#end2end-encryption) +- [Rate Limiting](#rate-limiting) + +*** + +# Introduction + +RMB (Reliable Message Bus) is a set of protocols and tools (client and daemon) that aims to abstract inter-process communication between multiple processes running over multiple nodes. + +The point of using RMB is that clients do not need to know much about the other process or where it lives (the client doesn't know network addresses or identity). This is unlike HTTP(S) or gRPC, where the caller must know the exact address (or DNS name) and endpoints of the service it's trying to call. Instead, RMB requires you to know only: + +- The Twin ID (numeric ID) of the endpoint as defined by `tfchain` +- The command (string), which is simply the function to call +- The request "body", which is a binary blob that is passed to the command as is + - The implementation of the command then needs to interpret this data as intended (out of scope of RMB) + +Twins are stored on tfchain, hence the identity of a twin is guaranteed not to be spoofed or phished. When a twin is created, it needs to define 2 things: + +- Its RMB relay +- Its elliptic curve public key (we use the secp256k1 (K-256) elliptic curve) + +> This data is stored on tfchain forever, and only the twin can change it using its secure key. Hence phishing is impossible. A twin can decide later to change this encryption key or relay. + +All twins must first have their data set correctly on the chain.
Any 2 twins can then communicate with full end-to-end encryption as follows: + +- A twin establishes a WS connection to its relay of choice +- The twin creates an `envelope` as defined by the protobuf [schema](https://github.com/threefoldtech/rmb-rs/blob/main/proto/types.proto) +- The twin fills in all envelope information (more about this later) +- The twin pushes the envelope to the relay + - If the destination twin is using the same relay, the message is forwarded directly to this twin + - If federation is needed (the destination twin uses a different relay), the message is forwarded to the proper twin through that twin's relay. + +> NOTE: since a sender twin also needs to encrypt the message for the receiver twin, it queries `tfchain` for the receiver's twin information. Usually it caches this data locally for reuse, hence clients need to make sure this data is always up to date. + +The relay checks the federation information set on the envelope and then decides either to forward the message internally to one of its connected clients, or to forward it to the destination relay. Hence relays need to be publicly available. + +When the relay receives a message that is destined for a `local` connected client, it queues it for delivery. The relay can maintain a queue of messages per twin, up to a limit. If the twin does not come back online to consume the queued messages, the relay will start to drop messages for that specific twin client. + +Once a twin comes online and connects to its peer, the peer will receive all queued messages. The messages are pushed over the WebSocket as they are received. The client can then decide how to handle them (a message can be a request or a response). The type of a message can be inspected as defined by the schema. +*** +# Overview of the Operation of RMB Relay + +![relay](img/relay.png) + +## Connections + +By design, there can be only `ONE TWIN` with a specific ID. Hence there is only `ONE RELAY` set on tfchain per twin.
This forces a twin to always use this defined relay, even if it wishes to open multiple connections to its relay. In other words, once a twin sets a relay in its public information, it can only use that relay for all of its connections. If it decides to change the relay address, all connections must use the new relay; otherwise messages will get lost, as they will be delivered to the wrong relay. + +In an RPC system, the response to a request must be delivered to the requester. Hence, if a twin maintains multiple connections to its relay, it needs to `uniquely` identify each connection to allow the relay to route responses back to the right requester. We call this `id` a `session-id`. The `session-id` must be unique per twin. + +The relay can maintain **MULTIPLE** connections per peer, given that each connection has a unique **SID** (session id). But for each (twin-id, session-id) combo there can be only one connection. If a new connection with the same (twin-id, session-id) is created, the older connection is dropped. + +A received message always has the session-id as part of the source address. A reply message must then have its destination set to the source as is. This allows the relay to route the message back correctly without the need to maintain internal state. + +The `rmb-peer` process reserves the `None` sid. It connects with no session id, hence you can only run one `rmb-peer` per `twin` (identity). But the same twin (identity) can use other rmb clients (for example the rmb-sdk-go direct client) to establish more connections with unique session ids. + +## Peer + +Any language or code that can open a `WebSocket` connection to the relay can work as a peer. A peer needs to do the following: + +- Authenticate with the relay. This is done by providing a `JWT` that is signed by the twin key (more on that later) +- Handle received binary messages +- Send binary messages + +Each message is an object of type `Envelope` serialized with protobuf.
The type definition can be found under `proto/types.proto`. + +## Peer implementation + +This project already has a peer implementation that works as a local peer gateway. Running this peer instance allows you to +run multiple services (and clients) behind that gateway, and they appear to the world as a single twin. + +- The peer gateway (rmb-peer) starts and connects to the relay +- If requests are received, they are verified, decrypted, and pushed to a redis queue that is command specific (from the envelope) +- A service can then be waiting on this redis queue for new messages + - The service can process the command and push a response back to a specific redis queue for responses. +- The gateway can then pull ready responses from the responses queue, create a valid envelope, encrypt and sign it, and send it to the destination + +![peer](img/peer.png) + +### Message Types + +To make it easy for apps to work behind an `rmb-peer`, we use JSON messages for communication between the local process and the rmb-peer. The rmb-peer still +maintains fully binary communication with the relay.
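
As a rough illustration of the JSON side of this exchange, the sketch below hand-builds the text of an outgoing request using the short field names (`ver`, `ref`, `cmd`, `exp`, `dat`, `tag`, `dst`, `ret`, `shm`, `now`) given by the serde renames of `JsonOutgoingRequest` in this section. The helper names `json_str` and `outgoing_request_json`, and every sample value (protocol version `1`, the `calculator.add` command, twin `7`, the reply queue name, the `application/json` schema, an expiration expressed in seconds), are assumptions made for illustration and are not taken from rmb-peer. How `dat` must be encoded is not covered here; it is passed through as an opaque string.

```rust
// Minimal sketch, standard library only. All names and sample values are
// hypothetical; only the JSON key names come from the struct definitions.

/// Wrap `s` in double quotes, escaping quotes and backslashes.
/// Control characters are not handled; that is enough for the sample values.
fn json_str(s: &str) -> String {
    let mut out = String::with_capacity(s.len() + 2);
    out.push('"');
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            _ => out.push(c),
        }
    }
    out.push('"');
    out
}

/// Render an outgoing request as a single JSON object.
/// `ver` is fixed to 1 and `tag` to null as assumptions of this sketch.
fn outgoing_request_json(
    reference: &str,
    command: &str,
    destinations: &[u32],
    data: &str,
    reply_to: &str,
    now: u64,
    expiration: u64,
) -> String {
    // Twin IDs are numeric, so they are written without quotes.
    let dst: Vec<String> = destinations.iter().map(|d| d.to_string()).collect();
    format!(
        "{{\"ver\":1,\"ref\":{},\"cmd\":{},\"exp\":{},\"dat\":{},\"tag\":null,\
         \"dst\":[{}],\"ret\":{},\"shm\":{},\"now\":{}}}",
        json_str(reference),
        json_str(command),
        expiration,
        json_str(data),
        dst.join(","),
        json_str(reply_to),
        json_str("application/json"),
        now
    )
}

fn main() {
    let msg = outgoing_request_json(
        "req-1",
        "calculator.add",
        &[7],
        "payload",
        "reply.queue.1",
        1_700_000_000,
        30,
    );
    // A local process would push this text to the peer's redis request queue.
    println!("{}", msg);
}
```

A real client would more likely derive this text from the struct with a JSON library rather than formatting it by hand; the manual version is used here only to keep the sketch free of external dependencies.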
+ +A request message is defined as follows + +#### Output Requests + +This is created by a client who wants to make a request to a remote service + +> This message is pushed to `msgbus.system.local` to be picked up by the peer + +```rust +#[derive(Serialize, Deserialize, Clone, Debug)] +pub struct JsonOutgoingRequest { + #[serde(rename = "ver")] + pub version: usize, + #[serde(rename = "ref")] + pub reference: Option<String>, + #[serde(rename = "cmd")] + pub command: String, + #[serde(rename = "exp")] + pub expiration: u64, + #[serde(rename = "dat")] + pub data: String, + #[serde(rename = "tag")] + pub tags: Option<String>, + #[serde(rename = "dst")] + pub destinations: Vec<u32>, + #[serde(rename = "ret")] + pub reply_to: String, + #[serde(rename = "shm")] + pub schema: String, + #[serde(rename = "now")] + pub timestamp: u64, +} +``` + +#### Incoming Response + +This is what a client receives in response to its outgoing request. + +> This response is pushed to the `$ret` queue defined by the outgoing request, hence the client needs to wait on this queue until the response is received or it times out + +```rust +#[derive(Serialize, Deserialize, Clone, Debug)] +pub struct JsonError { + pub code: u32, + pub message: String, +} + +#[derive(Serialize, Deserialize, Clone, Debug)] +pub struct JsonIncomingResponse { + #[serde(rename = "ver")] + pub version: usize, + #[serde(rename = "ref")] + pub reference: Option<String>, + #[serde(rename = "dat")] + pub data: String, + #[serde(rename = "src")] + pub source: String, + #[serde(rename = "shm")] + pub schema: Option<String>, + #[serde(rename = "now")] + pub timestamp: u64, + #[serde(rename = "err")] + pub error: Option<JsonError>, +} +``` + +#### Incoming Request + +An incoming request is a modified version of the request, as received by a service running behind an RMB peer +> This request is received on `msgbus.${request.cmd}` (always prefixed with `msgbus`) + +```rust +#[derive(Serialize,
Deserialize, Clone, Debug)] +pub struct JsonIncomingRequest { + #[serde(rename = "ver")] + pub version: usize, + #[serde(rename = "ref")] + pub reference: Option<String>, + #[serde(rename = "src")] + pub source: String, + #[serde(rename = "cmd")] + pub command: String, + #[serde(rename = "exp")] + pub expiration: u64, + #[serde(rename = "dat")] + pub data: String, + #[serde(rename = "tag")] + pub tags: Option<String>, + #[serde(rename = "ret")] + pub reply_to: String, + #[serde(rename = "shm")] + pub schema: String, + #[serde(rename = "now")] + pub timestamp: u64, +} +``` + +Services that receive this need to make sure the `destination` of their responses has the same value as the `source` of the incoming request + +#### Outgoing Response + +This is what a service sends in response to an incoming request. + +Your bot (server) needs to make sure to set `destination` to the same value as the incoming request `source` + +> This response is pushed to `msgbus.system.reply` + +```rust +#[derive(Serialize, Deserialize, Clone, Debug)] +pub struct JsonOutgoingResponse { + #[serde(rename = "ver")] + pub version: usize, + #[serde(rename = "ref")] + pub reference: Option<String>, + #[serde(rename = "dat")] + pub data: String, + #[serde(rename = "dst")] + pub destination: String, + #[serde(rename = "shm")] + pub schema: Option<String>, + #[serde(rename = "now")] + pub timestamp: u64, + #[serde(rename = "err")] + pub error: Option<JsonError>, +} +``` +*** +# End2End Encryption + +Messages are totally opaque to the relay. Our implementation of the relay does not look into messages except for the routing attributes (source and destination addresses, and federation information). But since the relay is designed to be hosted by 3rd parties (hence federation), you should +not fully trust the relay or whoever is hosting it.
Hence e2e encryption was needed. + +As you may already understand, e2e is completely up to the peers to implement, and other implementations of the peers can even agree on a completely different encryption algorithm and key sharing algorithm (again, the relay does not care). But in our implementation of e2e (rmb-peer), things go like this: + +- Each twin has a `pk` field on tfchain. When rmb-peer starts, it generates a `secp256k1` key from the same seed as the user's tfchain mnemonics. Note that this does not make the encryption key and the signing key related in any way; they are just derived from the same seed. +- On start, if the key is not already set on the twin object, the key is updated. +- If peer A is trying to send a message to peer B, but peer B does not have its `pk` set, peer A will send the message in plain-text format (please check the protobuf envelope type for details) +- If peer B has its public key set, peer A will prefer e2e encryption and will do the following: + - derive a shared secret point with the `ecdh` algorithm; the key is the `sha256` of that point + - `shared = ecdh(A.sk, B.pk)` + - create a 12-byte random nonce + - encrypt data as `encrypted = aes-gcm.encrypt(shared-key, nonce, plain-data)` + - create cipher as `cipher = nonce + encrypted` + - fill `envelope.cipher = cipher` +- On receiving a message, peer B does the same in the opposite direction: + - split data and nonce (the nonce is always the first 12 bytes) + - derive the same shared key + - `shared = ecdh(B.sk, A.pk)` + - `plain-data = aes-gcm.decrypt(shared-key, nonce, encrypted)` +*** +# Rate Limiting + +To avoid abuse of the server and prevent DoS attacks on the relay, a rate limiter is used to limit the number of clients' requests.\ +It was decided that the rate limiter should only watch the websocket connections of users, since all other requests/connections with users consume few resources, and since the relay inherently handles the max number of users.\ +The limiter's configuration is passed as a command line argument
`--limit <count>, <size>`. `<count>` represents the number of messages a twin is allowed to send in each time window, and `<size>` represents the total size of messages in bytes a twin is allowed to send in each time window.\ +Currently there are two implementations of the rate limiter: + +- `NoLimit`, which imposes no limits on users. +- `FixedWindowLimiter`, which breaks the timeline into fixed time windows and allows a twin to send a fixed number of messages, with a fixed total size, in each time window. If a twin exceeds its limits in some time window, its message is dropped, an error message is sent back to the user, the relay logs the event for this twin, and the user gets to keep their connection with the relay. diff --git a/collections/documentation/developers/internals/rmb/rmb_toc.md b/collections/documentation/developers/internals/rmb/rmb_toc.md new file mode 100644 index 0000000..0801571 --- /dev/null +++ b/collections/documentation/developers/internals/rmb/rmb_toc.md @@ -0,0 +1,18 @@ +

Reliable Message Bus (RMB)

+ +Reliable message bus is a secure communication panel that allows bots to communicate together in a chat-like way. It makes it very easy to host a service or a set of functions to be used by anyone, even if your service is running behind NAT. + +Out of the box, RMB provides the following: + +- Guaranteed authenticity of messages. + - You can always be sure that a received message is from whoever it claims to be from. +- End-to-end encryption. +- Support for 3rd party hosted relays. + - Anyone can host a relay, and people can use it safely since there is no way messages can be inspected while using e2e. That's similar to homeservers in Matrix. + +

Table of Contents

+ +- [Introduction to RMB](rmb_intro.md) +- [RMB Specs](rmb_specs.md) +- [RMB Peer](uml/peer.md) +- [RMB Relay](uml/relay.md) \ No newline at end of file diff --git a/collections/documentation/developers/internals/rmb/uml/peer.md b/collections/documentation/developers/internals/rmb/uml/peer.md new file mode 100644 index 0000000..80c195a --- /dev/null +++ b/collections/documentation/developers/internals/rmb/uml/peer.md @@ -0,0 +1,44 @@ +

RMB Peer

+ +

Table of Contents

+ +- [Introduction](#introduction) +- [Example](#example) + +*** + +## Introduction + +We present an example of an RMB peer. Note that the extension for this kind of file is `.wsd`. + +## Example + +``` +@startuml RMB + +participant "Local Process" as ps +database "Local Redis" as redis +participant "Rmb Peer" as peer + +participant "Rmb Relay" as relay +note across: Handling Out Request +peer --> relay: Establish connection + +ps -> redis: PUSH message on \n(msgbus.system.local) +redis -> peer : POP message from \n(msgbus.system.local) + +peer -> relay: message pushed over the websocket to the relay +... +relay -> peer: received response +peer -> redis: PUSH over $msg.reply_to queue +... +note across: Handling In Request +relay --> peer: Received a request +peer -> redis: PUSH request to `msgbus.$cmd` +redis -> ps: POP new request msg +ps -> ps: Process message +ps -> redis: PUSH to (msgbus.system.reply) +redis -> peer: POP from (msgbus.system.reply) +peer -> relay: send response message +@enduml +``` \ No newline at end of file diff --git a/collections/documentation/developers/internals/rmb/uml/relay.md b/collections/documentation/developers/internals/rmb/uml/relay.md new file mode 100644 index 0000000..0618d7b --- /dev/null +++ b/collections/documentation/developers/internals/rmb/uml/relay.md @@ -0,0 +1,40 @@ +

RMB Relay

+ +

Table of Contents

+ +- [Introduction](#introduction) +- [Example](#example) + +*** + +## Introduction + +We present an example of RMB relay. Note that the extension for this kind of file is `.wsd`. + +## Example + + +``` +@startuml RMB +actor "Peer 1" as peer1 +participant "Relay 1" as relay1 +participant "Relay 2" as relay2 +actor "Peer 2" as peer2 +actor "Peer 3" as peer3 + +peer1 --> relay1: Establish WS connection +peer2 --> relay1: Establish WS connection +peer3 --> relay2: Establish WS connection + +peer1 -> relay1: Send message (Envelope)\n(destination "Peer 2") +relay1 -> peer2: Forward message directly + +peer1 -> relay1: Send message (Envelope)\n(destination "Peer 3") +note right +"Peer 3" does not live on "Relay 1" hence federation is +needed +end note +relay1 -> relay2: Federation of message for\n Peer 3 +relay2 -> peer3: Forward message directly +@enduml +``` \ No newline at end of file diff --git a/collections/documentation/developers/internals/zos/assets/.keep b/collections/documentation/developers/internals/zos/assets/.keep new file mode 100644 index 0000000..e69de29 diff --git a/collections/documentation/developers/internals/zos/assets/0-OS v2 architecture.xml b/collections/documentation/developers/internals/zos/assets/0-OS v2 architecture.xml new file mode 100644 index 0000000..263e7e3 --- /dev/null +++ b/collections/documentation/developers/internals/zos/assets/0-OS v2 architecture.xml @@ -0,0 +1 @@ 
+3Vlbc6JIFP41Pkp10/fHxNzciUkqJpPKvkyhNMoEaQfxll+/BwFFAWN2NTs1WhXgnL75fd85feg0SGu0uI6c8bBjXB00bOQuGuSiYduYKAyXxLJMLULR1DCIfDdrtDF0/XedGVFmnfqunmw1jI0JYn+8beybMNT9eMvmRJGZbzfzTLA969gZ6JKh23eCsvXFd+NhapUMbew32h8M85kxyjwjJ2+cGSZDxzXzgolcNkgrMiZO70aLlg4S8HJc0n5XNd71wiIdxod06LwFL4/3TaGmC7KInu7i29llU6SjzJxgmv3ghs0DGO/cMzAsrDpeZlDwX1OTO5qTFVFn0MCm48XGCXeD5Iqa911w3gAycAn0bCULJ+oP/RhomkY6nwcWnE6VdsywWs9qx3qR2IfxKAADhttJHJk33TKBicASmlAnq/KDYMfkBP4ghMc+AKTBfj7TUewDtWeZY+S7bjLN+TxZVXfs9JM55yBksEVmGro6wQ6tl1UEO8M/GVMvCqYM/GttRjqOltAk8xLC0i7LbYXPN7LCMrMNC5LKpeZkSh6sR96QDTcZ39Xcv0+dv9vu/Mf74/J1aI+uxY1eNnEF9zvgDwCFcc0PXcec08uboxqYajHBSFrbqIgyKpxVoGKjI8ByNls89cT928/OhedO23ejWfuiApZbP5yCxtE3HYWJjndA2igF12ipIN6iThs28VjyLYkaPHz1yeKtYE8/lZLcy/LHOs11ScsUMI4tySlFXAjFOcZlQvjn+Xie6Oi+9zNJ25AjnJ4O0q7d2ERJWoYkatwpwLwaOPDDt9TvOrEDyWeVuu0rINNudcZPwUXr2v9Fo3v75ZWddxhPu+3lmhaQyLRxJGJdR0uvX0lsX+qedwoCcy+2LSxsJgiWikHi+TDClLQwg50LM4oksRE/Cr3wWGB4L+N3Op6b6O0zjI9b8/nz2fMDX949dB9+hJft8eM245UbHj4d40xLl1YxLu0eWYXyqRiHALUI5VyKpADJK6w9fDMLU06EQpRySTH9ar5bkNMcP4Rd+ROMD0Of3l7Nb5+80ffX6K9Ji8jFAYzbJ2NcY+BcVDGuuCDOKRmHLcBimCAlkLApzqvsesoxwhDjjNsUEa6EUuJUnB9WeJLSLvu9syOGY22xnmf3KzOxy3ucnZIlypDFFSGC0qS4kx+RJCEPE8YUgfKGEKnUUTgqEXInNZFPj+8t8vCOWhPTPEPdJq2oBtMK3fVneXXegan8/iSZaVO+F/xVXUzow37uh4Pc2YtyX872rr1u8D9DFTXF1FfLoi43XwX+JD5u7VUpOPZH1l7WpoqiigDFv03tdVgaKB8IPLThL8LHjb7f6LVnHZXCkoJwhoXAinG1RVzVS5GEMtvmhEq4RZQcqYoq0eQJHPxiF8IffxuM3Nsb3X5rVmyfJX7Sd/cS9D2kieZV0CMtEexRxaMP/N8BrzsmqCfCVpYshkS5lrGRhSWFRCgpU5jLLwS+vE0+jweR4yZvq3Mn7g+TmvaIkXIAXWWC9krm44jYA7U4UUlSuWL1r0WOkHQQqkQNMbHy7OQXb/X538Vft38Uz2KwsqB+5wpLQphCxzmLOYyR/MyyuD1EZuZPfBOeKAA8xxVO5Wbeo5whengAqBqS9gbAadCGx83B/8pX+PcJufwHzZbbbqMwEIafBmlvUhmcQ/dyE9JGq22jNj1IexO54IBbYyPbOdCn3zGYAEn2KHWVG7A/hpnh98wID0+y3bUieXojY8q9AMU7D4deEPj9oA83S4qKjC5HFUgUi51RAxbsnTqIHF2zmOqOoZGSG5Z3YSSFoJHpMKKU3HbNVpJ3o+YkoUdgERF+TJ9ZbNKKXg5Qw2eUJWkd2UfuSUZqYwd0SmK5bSE89fBESWmqVbabUG7Fq3V5jHbJZpbN9dOX3djoaPn9ftmrnF39zSv7T1BUmH92fXuHk2/oDqvgHQ309eh29PWxF1SuN4SvnV5eMOQQZJxC1GFiVwsjlRU5QFAda9D
VWbyo2qImELx5rYZ5DR5Spq2uzgkqd4rqXArNXkpkpK00SmzeWwb6Qw2UH2HS8g0RyYyJxBoRQ+AmV3BBvfmilUJ+mMGaHxLOauL8rKSyV8602e+gIg1hgto1ETFcn25aYRofJ7wy/VaWkADhMntqAfoE52eYYVJ4wcQGYZzqQhuaVfuLi4s/9I564djmR3ISMVPY0uBEiFKYXxzNsUuAbW3KIjdF3TmG7spCMBkH4MNSGyXf6ERy0AeHQgqwHMN38AOkbWKQDQ4Hze5B5gB60Dp4vE2ZoQvgNtQWBg8wuaFqxcsGS1kcUwFMybWIqS1gtM8QzCCzg+b+TWf4+3aFOUdlRo2ysjkvfdfgRXe7bcYF/uxY2h4V2EHiRlSy99y0ISxcJ57uytclyeZqOnib3r8OJrOiGIXENXK7K1HvCkr8yhXowUE1Kvk/0bZ1hh8loj/sqoiDYxn94ISMw49S8Xi2hcdteY5S7qU7GynxiYIsZ1Ap5fkq2b/8f0rCtvkXKJ+1/qjw9Ac=nZRLc9owEIB/DTPtgRmDSUqP5dGmzUAOZkJuHWGtLSWy1pEFBn59V7aMMcykSS629Gm1u9pXL5xm+1+G5WKBHFRvGPB9L5z1hsPBaDiinyOHmoyDcQ1SI7kXakEkj+Bh4OlWcig6ghZRWZl3YYxaQ2w7jBmDZVcsQdW1mrMUrkAUM3VN15Jb4V9xE7T8DmQqGsuDwJ9krBH2oBCMY3mGwnkvnBpEW6+y/RSUC14Tl9fs7/z+9zJePa/5w7d+eHxdPPVrZT8/cuX0BAPaflp1Bs/s+EP8ye94+phE6/K+NP5KsGNq6+PVG94qMjIRZPU2dasl2BLNC4lRdWwprl5iYxqJhpDx9loD8washCxcXL2SoNoZKHLUhdxUyKKrNGDO71JS/KkGqkdY4Y517YnUKW0woU/Qf4je9Ce/ZFt1SZRsyIYVMnZ2mK1tJEg6g8eF80PzukotkxrMm0Zbjdc2jmDQSqegeU4VFKapEDOX388qxhz0joIWi3demC2jd0rO9Q4P5OWXKj8bqPMGzMQC+NcPekzwPAdVZ9lD064W9lX12UwRGNCysAZfYIqKchHONGqSnCRSqQtU5Cx2WQtnN+1uhTmBPvVrOCmFtBARd6ZKmnYuaDswiaq6WkjOQRMzuNUcXNcEJw9JjDy7mCj/acfBaUbQcAXMwBoXRK9l5Fvv0N2W7YwKv3smzufT2EPm52J60tz2Pi18+zfbdkpVZ2ezPpz/Aw==nZTbjtowEIafJlJ7sVIOsKW3pLRbCbSVoN3V3qxMMsQWju06DoF9+o4Tm5CAtmqvYn/5Z8aHfxwkaXn8pomiK5kDD+IwPwbJlyCOo0k8wY8lp47MwlkHCs1yJ+rBmr2Bg6GjNcuhGgiNlNwwNYSZFAIyM2BEa9kMZTvJh1UVKeAKrDPCr+kTyw11u5iGPX8AVlBfOQrdn5J4sQMVJblsLlCyCJJUS2m6UXlMgdvD8+eyme5f1a4U+2hJl2/Ln9ly9XzXJfv6LyHnLWgQ5r9T/1Y/tpvFcfUkXn89fyoeX2bw4ELCA+G1O68gvudYZE6x6n1hR6kUhjABGoXojxpP1mm22ms8wfJ9oIfKgw1llT1ZlyRsZxoqJUXFti0y0noNiF15w/AGWmd09at366oxy9lhjG4G3tDdQDUfE8480bXIWscIdFNpLykOPzym3z++W7ePv4DXZcYrbK1oTt7fBo7tdZmSI4hwWBkt95BKLjUSIQUq5zvG+QhVimRMFAim/WwjFYI7NHgybygzsEZuSzX4PCCTB9A73rYBZXkOwm5f1iIHa7PwvEKU4cpGLfgX/0bnpsLXCGQJRp8wzmWZOK+ehtOmb+rks2P0sqETB4l7SIpz5r5ZcOD6xU/7tm7/XTyOyeIPrZTBTuMwEIafJkekNCksXBtYWLEcVkUsN2TiaWxhe4LjNC1Pv+PEbppUAq3EKfbn8fzj0T9J8kLvbi2rxQNyUEmW8l2
SXydZtlhmS/p4sh/IZXo5gMpKHoJGsJYfEGAaaCs5NJNAh6icrKewRGOgdBPGrMVuGrZBNVWtWQUnYF0ydUr/Su5EeMV5OvI7kJWIyos0nGgWgwNoBOPYHaH8JskLi+iGld4VoHzzYl9+81/ZD/P+/PwOrHiBpsCr+7Mh2c//uXJ4ggXjvjd1NqTeMtWGfiXZhSKRlSDVi8qvnh4ogozRUkvD4auNh5GQ7ngjwjqCRyEb39KQJO13FpoaTSNfe+TQmwyYL7mT1HqqSlrX9kCzUkhDNvpMv54zLrdz1Ko5UTKSP6BbSnPfv1czQ87QvuGfaY63j+CpCMFJMb2d3D561MGub7nTisCClo2z+AYFKrREDBqKXG2kUjPU1KyUpiJwPu4esSZwRibNV52QDtbEvVRHI04Mt2A3qreykJyDIWaxNRy8VdJDhRRGlc3G6AsPLg6DQX8UQA3O7uleyLIMo7SfbrtxMPOrwMTxUOYBsvAzqA6ZR8PTIng+bsfR7M+OfnD5zT8= \ No newline at end of file diff --git a/collections/documentation/developers/internals/zos/assets/0-OS-upgrade.png b/collections/documentation/developers/internals/zos/assets/0-OS-upgrade.png new file mode 100644 index 0000000..f82c801 Binary files /dev/null and b/collections/documentation/developers/internals/zos/assets/0-OS-upgrade.png differ diff --git a/collections/documentation/developers/internals/zos/assets/0-OS-upgrade.wsd b/collections/documentation/developers/internals/zos/assets/0-OS-upgrade.wsd new file mode 100644 index 0000000..26ef653 --- /dev/null +++ b/collections/documentation/developers/internals/zos/assets/0-OS-upgrade.wsd @@ -0,0 +1,12 @@ +@startuml +start +:power on node; +repeat +:mount boot flist; +:copy files to node root; +:reconfigure services; +:restart services; +repeat while (new flist version?) 
is (yes) + -> power off; +stop +@enduml \ No newline at end of file diff --git a/collections/documentation/developers/internals/zos/assets/0-OS_v2_architecture.png b/collections/documentation/developers/internals/zos/assets/0-OS_v2_architecture.png new file mode 100644 index 0000000..d4fdd98 Binary files /dev/null and b/collections/documentation/developers/internals/zos/assets/0-OS_v2_architecture.png differ diff --git a/collections/documentation/developers/internals/zos/assets/Container_module_flow.png b/collections/documentation/developers/internals/zos/assets/Container_module_flow.png new file mode 100644 index 0000000..95175ae Binary files /dev/null and b/collections/documentation/developers/internals/zos/assets/Container_module_flow.png differ diff --git a/collections/documentation/developers/internals/zos/assets/boot_sequence.plantuml b/collections/documentation/developers/internals/zos/assets/boot_sequence.plantuml new file mode 100644 index 0000000..ac5a663 --- /dev/null +++ b/collections/documentation/developers/internals/zos/assets/boot_sequence.plantuml @@ -0,0 +1,50 @@ +@startuml + +package "node-ready"{ + [local-modprobe] + [udev-trigger] + [redis] + [haveged] + [cgroup] + [redis] +} + +package "boot" { + [storaged] + [internet] + [networkd] + [identityd] +} + +package "internal modules"{ + [flistd] + [containerd] + [contd] + [upgraded] + [provisiond] +} + +[local-modprobe]<-- [udev-trigger] +[udev-trigger] <-- [storaged] +[udev-trigger] <-- [internet] +[storaged] <-- [identityd] + +[identityd] <- [networkd] + +[internet] <-- [networkd] +[networkd] <-- [containerd] +[storaged] <-- [containerd] + +[containerd] <-- [contd] + +[storaged] <-- [flistd] +[networkd] <-- [flistd] + +[flistd] <-- [upgraded] +[networkd] <-- [upgraded] + +[networkd] <-- [provisiond] +[flistd] <-- [provisiond] +[contd] <-- [provisiond] + +@enduml diff --git a/collections/documentation/developers/internals/zos/assets/boot_sequence.png 
b/collections/documentation/developers/internals/zos/assets/boot_sequence.png new file mode 100644 index 0000000..9a1fc5e Binary files /dev/null and b/collections/documentation/developers/internals/zos/assets/boot_sequence.png differ diff --git a/collections/documentation/developers/internals/zos/assets/grid_provisioning.png b/collections/documentation/developers/internals/zos/assets/grid_provisioning.png new file mode 100644 index 0000000..3cbb45e Binary files /dev/null and b/collections/documentation/developers/internals/zos/assets/grid_provisioning.png differ diff --git a/collections/documentation/developers/internals/zos/assets/grid_provisioning.wsd b/collections/documentation/developers/internals/zos/assets/grid_provisioning.wsd new file mode 100644 index 0000000..48931fc --- /dev/null +++ b/collections/documentation/developers/internals/zos/assets/grid_provisioning.wsd @@ -0,0 +1,37 @@ +@startuml +title Provisioning of a resource space + +autonumber +actor User as user +' entity Farmer as farmer +entity Network as network +database Blockchain as bc +boundary Node as node +collections "Resource space" as rs + +== Resource research == +user -> network: Send resource request +activate network +network -> node: broadcast resource request +activate node +deactivate network +...broadcast to all nodes... +node -> user: Send offer +user -> user: inspect offer + +== Resource space negotiation == +user -> node: accept offer +user <-> node: key exchange +user -> bc: money is locked on blockchain +... 
+node -> rs: create resource space +activate rs +node -> user: notify space is created +node -> bc: notify it created the space +user -> rs: make sure it can access the space +user -> bc: validate it can access the space +bc -> node: money is released to the node +deactivate node +== Usage of the space == +user -> rs: deploy workload +@enduml \ No newline at end of file diff --git a/collections/documentation/developers/internals/zos/assets/grid_provisioning2.png b/collections/documentation/developers/internals/zos/assets/grid_provisioning2.png new file mode 100644 index 0000000..2150407 Binary files /dev/null and b/collections/documentation/developers/internals/zos/assets/grid_provisioning2.png differ diff --git a/collections/documentation/developers/internals/zos/assets/grid_provisioning2.wsd b/collections/documentation/developers/internals/zos/assets/grid_provisioning2.wsd new file mode 100644 index 0000000..76370a3 --- /dev/null +++ b/collections/documentation/developers/internals/zos/assets/grid_provisioning2.wsd @@ -0,0 +1,42 @@ +@startuml +title Provisioning a workload on the TFGrid + +autonumber +actor "User" as user +actor "Farmer" as farmer +database "TF Explorer" as explorer +database Blockchain as blockchain +boundary Node as node + +== Price definition == +farmer -> explorer: Farmer sets the price of its resource units +== Resource research == +activate explorer +user -> explorer: User looks for where to deploy the workload +user <- explorer: Gives details about the farmer owning the selected node +== Resource reservation == +user -> explorer: write description of the workload +explorer -> user: return a list of transactions to execute on the blockchain +== Reservation processing == +user -> blockchain: execute transactions +explorer <-> blockchain: verify transactions are done +explorer -> explorer: reservation status changed to `deploy` +== Resource provisioning == +node <-> explorer: read description of the workloads +node -> node: provision workload +alt
provision successful + node -> explorer: write result of the provisioning + explorer -> blockchain: forward token to the farmer + blockchain -> farmer: tokens are available to the farmer + user <- explorer: read the connection information to his workload +else provision error + node -> explorer: write result of the provisioning + explorer -> explorer: cancel reservation + node -> node: free up capacity + explorer -> blockchain: token refunded to user + blockchain <-> user: tokens are available to the user again +end +deactivate explorer +== Resource monitoring == +user <-> node: use / monitor workload +@enduml \ No newline at end of file diff --git a/collections/documentation/developers/internals/zos/assets/ipc.plantuml b/collections/documentation/developers/internals/zos/assets/ipc.plantuml new file mode 100644 index 0000000..20bb31d --- /dev/null +++ b/collections/documentation/developers/internals/zos/assets/ipc.plantuml @@ -0,0 +1,20 @@ +@startuml + +== Initialization == +Module -> MsgBroker: Announce Module +MsgBroker -> Module: create bi-directional channel + +== Utilisation == +loop + DSL -> MsgBroker: put RPC message + activate MsgBroker + Module <- MsgBroker: pull RPC message + activate Module + Module -> Module: execute method + Module -> MsgBroker: put response + deactivate Module + MsgBroker -> DSL : read response + deactivate MsgBroker +end + +@enduml \ No newline at end of file diff --git a/collections/documentation/developers/internals/zos/assets/ipc.png b/collections/documentation/developers/internals/zos/assets/ipc.png new file mode 100644 index 0000000..9cc940b Binary files /dev/null and b/collections/documentation/developers/internals/zos/assets/ipc.png differ diff --git a/collections/documentation/developers/internals/zos/assets/market.png b/collections/documentation/developers/internals/zos/assets/market.png new file mode 100644 index 0000000..801bdf6 Binary files /dev/null and b/collections/documentation/developers/internals/zos/assets/market.png
differ diff --git a/collections/documentation/developers/internals/zos/assets/market.wsd b/collections/documentation/developers/internals/zos/assets/market.wsd new file mode 100644 index 0000000..cd49bd7 --- /dev/null +++ b/collections/documentation/developers/internals/zos/assets/market.wsd @@ -0,0 +1,22 @@ +@startuml +actor User as user +box "To Be Defined" #LightBlue + participant Market +end box +entity Farmer as farmer +boundary Node as node + +user -> farmer: Request space +activate farmer +farmer -> node: reserve space +activate node +farmer -> user: confirmation +deactivate farmer +... +note over user, node: communication allows only owner of space +user -> node: deploy services +... +user -> farmer: destroy space +farmer -> node: delete space +deactivate node +@enduml \ No newline at end of file diff --git a/collections/documentation/developers/internals/zos/development/README.md b/collections/documentation/developers/internals/zos/development/README.md new file mode 100644 index 0000000..63b7034 --- /dev/null +++ b/collections/documentation/developers/internals/zos/development/README.md @@ -0,0 +1,6 @@ +Development +=========== + +* [Quick start](./quickstart.md) +* [Testing](./testing.md) +* [Binary packages](./packages.md) \ No newline at end of file diff --git a/collections/documentation/developers/internals/zos/development/net.sh b/collections/documentation/developers/internals/zos/development/net.sh new file mode 100755 index 0000000..ffca7f7 --- /dev/null +++ b/collections/documentation/developers/internals/zos/development/net.sh @@ -0,0 +1,30 @@ +#!/bin/bash + +# This is the same as the first case at qemu/README.md in a single script + +sudo ip link add zos0 type bridge +sudo ip link set zos0 up + +sudo ip addr add 192.168.123.1/24 dev zos0 +md5=$(echo $USER| md5sum ) +ULA=${md5:0:2}:${md5:2:4}:${md5:6:4} +sudo ip addr add fd${ULA}::1/64 dev zos0 +# you might want to add fe80::1/64 +sudo ip addr add fe80::1/64 dev zos0 + +sudo iptables -t nat -I 
POSTROUTING -s 192.168.123.0/24 -j MASQUERADE +sudo ip6tables -t nat -I POSTROUTING -s fd${ULA}::/64 -j MASQUERADE +sudo iptables -t filter -I FORWARD --source 192.168.123.0/24 -j ACCEPT +sudo iptables -t filter -I FORWARD --destination 192.168.123.0/24 -j ACCEPT +sudo sysctl -w net.ipv4.ip_forward=1 + +sudo dnsmasq --strict-order \ + --except-interface=lo \ + --interface=zos0 \ + --bind-interfaces \ + --dhcp-range=192.168.123.20,192.168.123.50 \ + --dhcp-range=::1000,::1fff,constructor:zos0,ra-stateless,12h \ + --conf-file="" \ + --pid-file=/var/run/qemu-dnsmasq-zos0.pid \ + --dhcp-leasefile=/var/run/qemu-dnsmasq-zos0.leases \ + --dhcp-no-override diff --git a/collections/documentation/developers/internals/zos/development/packages.md b/collections/documentation/developers/internals/zos/development/packages.md new file mode 100644 index 0000000..f7391f6 --- /dev/null +++ b/collections/documentation/developers/internals/zos/development/packages.md @@ -0,0 +1,61 @@ +# Adding a new package + +Binary packages are added via providing [a build script](../../bins/), then an automated workflow will build/publish an flist with this binary. + +For example, to add `rmb` binary, we need to provide a bash script with a `build_rmb` function: + + +```bash +RMB_VERSION="0.1.2" +RMB_CHECKSUM="4fefd664f261523b348fc48e9f1c980b" +RMB_LINK="https://github.com/threefoldtech/rmb-rs/releases/download/v${RMB_VERSION}/rmb" + +download_rmb() { + echo "download rmb" + download_file ${RMB_LINK} ${RMB_CHECKSUM} rmb +} + +prepare_rmb() { + echo "[+] prepare rmb" + github_name "rmb-${RMB_VERSION}" +} + +install_rmb() { + echo "[+] install rmb" + + mkdir -p "${ROOTDIR}/bin" + + cp ${DISTDIR}/rmb ${ROOTDIR}/bin/ + chmod +x ${ROOTDIR}/bin/* +} + +build_rmb() { + pushd "${DISTDIR}" + + download_rmb + popd + + prepare_rmb + install_rmb +} +``` + +Note that, you can just download a statically build binary instead of building it. 
+ + +The other step is to add it to workflow to be built automatically, in [bins workflow](../../.github/workflows/bins.yaml), add your binary's job: + +```yaml +jobs: + containerd: + ... + ... + rmb: + uses: ./.github/workflows/bin-package.yaml + with: + package: rmb + secrets: + token: ${{ secrets.HUB_JWT }} +``` + +Once e.g. a `devnet` release is published, your package will be built then pushed to an flist repository. After that, you can start your local zos node, wait for it to finish downloading, then you should find your binary available. \ No newline at end of file diff --git a/collections/documentation/developers/internals/zos/development/quickstart.md b/collections/documentation/developers/internals/zos/development/quickstart.md new file mode 100644 index 0000000..0563b61 --- /dev/null +++ b/collections/documentation/developers/internals/zos/development/quickstart.md @@ -0,0 +1,70 @@ +# Quick start + +- [Quick start](#quick-start) + - [Starting a local zos node](#starting-a-local-zos-node) + - [Accessing node](#accessing-node) + - [Development](#development) + +## Starting a local zos node + +* Make sure `qemu` and `dnsmasq` are installed +* [Create a farm](../manual/manual.md#creating-a-farm) +* [Download a zos image](https://bootstrap.grid.tf/kernel/zero-os-development-zos-v3-generic-7e587e499a.efi) +* Make sure `zos0` bridge is allowed by qemu, you can add `allow zos0` in `/etc/qemu/bridge.conf` (create the file if it's not there) +* Setup the network using this script [this script](./net.sh) + +Then, inside zos repository + +``` +make -C cmds +cd qemu +mv ./zos.efi +sudo ./vm.sh -n myzos-01 -c "farmer_id= printk.devmsg=on runmode=dev" +``` + +You should see the qemu console and boot logs, wait for awhile and you can [browse farms](https://dashboard.dev.grid.tf/explorer/farms) to see your node is added/detected automatically. + +To stop the machine you can do `Control + a` then `x`. 
+ +You can read more about setting up a qemu development environment and more network options [here](../../qemu/README.md). + +## Accessing node + +After booting up, the node should start downloading external packages. This takes some time depending on your internet connection. + +See [how to ssh into it.](../../qemu/README.md#to-ssh-into-the-machine) + +How to get the node IP? +Given the `dhcp-range` in the network script, it is usually one of `192.168.123.43`, `192.168.123.44` or `192.168.123.45`. + +Or you can simply install `arp-scan` then do something like: + +``` +✗ sudo arp-scan --interface=zos0 --localnet +Interface: zos0, type: EN10MB, MAC: de:26:45:e6:87:95, IPv4: 192.168.123.1 +Starting arp-scan 1.9.7 with 256 hosts (https://github.com/royhills/arp-scan) +192.168.123.44 54:43:83:1f:eb:81 (Unknown) +``` + +Now we know for sure it's `192.168.123.44`. + +To check logs and see if the downloading of packages is still in progress, you can simply do: + +``` +zinit log +``` + +## Development + +While the overlay enables you to boot with the binaries that have been built locally, sometimes you'll need to test changes to certain modules without restarting the node (or intending to do so, e.g. for testing a migration). 
+ +For example if we changed anything related to `noded`, we can do the following: + +Inside zos repository: + +* Build binaries locally + * `make -C cmds` +* Copy the binary inside the machine + * `scp bin/zos root@192.168.123.44:/bin/noded` +* SSH into the machine then use `zinit` to restart it: + * `zinit stop noded && zinit start noded` diff --git a/collections/documentation/developers/internals/zos/development/testing.md b/collections/documentation/developers/internals/zos/development/testing.md new file mode 100644 index 0000000..8f023ec --- /dev/null +++ b/collections/documentation/developers/internals/zos/development/testing.md @@ -0,0 +1,157 @@ +# Testing + +Beside unit testing, you might want to test your change in an integrated environment, the following are two options to do it. + +- [Testing](#testing) + - [Using grid/node client](#using-gridnode-client) + - [Using a test app](#using-a-test-app) + - [An example to talk to container and qsfs modules](#an-example-to-talk-to-container-and-qsfs-modules) + - [An example of directly using zinit package](#an-example-of-directly-using-zinit-package) + + +## Using grid/node client + +You can simply use any grid client to deploy a workload of any type, you should specify your node's twin ID (and make sure you are on the correct network). + +Inside the node, you can do `noded -id` and `noded -net` to get your current node ID and network. Also, [you can check your farm](https://dashboard.dev.grid.tf/explorer/farms) and get node information from there. + +Another option is the golang [node client](../manual/manual.md#interaction). + +While deploying on your local node, logs with `zinit log` would be helpful to see any possible errors and to debug your code. + +## Using a test app + +If you need to test a specific module or functionality, you can create a simple test app inside e.g. [tools directory](../../tools/). 
+ +Inside this simple test app, you can import any module or talk to another one using [zbus](../internals/internals.md#ipc). + +### An example to talk to container and qsfs modules + + +```go +// tools/del/main.go + +package main + +import ( + "context" + "flag" + "strings" + "time" + + "github.com/rs/zerolog" + "github.com/rs/zerolog/log" + + "github.com/threefoldtech/zbus" + "github.com/threefoldtech/zos/pkg" + "github.com/threefoldtech/zos/pkg/stubs" +) + +func main() { + zerolog.SetGlobalLevel(zerolog.DebugLevel) + + zbus, err := zbus.NewRedisClient("unix:///var/run/redis.sock") + if err != nil { + log.Err(err).Msg("cannot init zbus client") + return + } + + var workloadType, workloadID string + + flag.StringVar(&workloadType, "type", "", "workload type (qsfs or container)") + flag.StringVar(&workloadID, "id", "", "workload ID") + + flag.Parse() + + if workloadType == "" || workloadID == "" { + log.Error().Msg("you need to provide both type and id") + return + } + + ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second) + defer cancel() + + if workloadType == "qsfs" { + qsfsd := stubs.NewQSFSDStub(zbus) + err := qsfsd.SignalDelete(ctx, workloadID) + if err != nil { + log.Err(err).Msg("cannot delete qsfs workload") + } + } else if workloadType == "container" { + args := strings.Split(workloadID, ":") + if len(args) != 2 { + log.Error().Msg("container id must contain namespace, e.g. 
qsfs:wl129") + return + } + + containerd := stubs.NewContainerModuleStub(zbus) + err := containerd.SignalDelete(ctx, args[0], pkg.ContainerID(args[1])) + if err != nil { + log.Err(err).Msg("cannot delete container workload") + } + } + +} +``` + +Then we can simply build, upload and execute this in our node: + +``` +cd tools/del +go build +scp del root@192.168.123.44:/root/del +``` + +Then ssh into `192.168.123.44` and simply execute your test app: + +``` +./del +``` + +### An example of directly using zinit package + +```go +// tools/zinit_test +package main + +import ( + "encoding/json" + "fmt" + "regexp" + + "github.com/rs/zerolog" + "github.com/rs/zerolog/log" + + "github.com/threefoldtech/zos/pkg/zinit" +) + +func main() { + zerolog.SetGlobalLevel(zerolog.DebugLevel) + z := zinit.New("/var/run/zinit.sock") + + regex := fmt.Sprintf(`^ip netns exec %s %s`, "ndmz", "/sbin/udhcpc") + _, err := regexp.Compile(regex) + if err != nil { + log.Err(err).Msgf("cannot compile %s", regex) + return + } + + // try match + matched, err := z.Matches(zinit.WithExecRegex(regex)) + if err != nil { + log.Err(err).Msg("cannot filter services") + return + } + + matchedStr, err := json.Marshal(matched) + if err != nil { + log.Err(err).Msg("cannot convert matched map to json") + return + } + + log.Debug().Str("matched", string(matchedStr)).Msg("matched services") + + // // try destroy + // err = z.Destroy(10*time.Second, matched...) + // if err != nil { + // log.Err(err).Msg("cannot destroy matched services") + // } +} +``` \ No newline at end of file diff --git a/collections/documentation/developers/internals/zos/faq/readme.md b/collections/documentation/developers/internals/zos/faq/readme.md new file mode 100644 index 0000000..f241686 --- /dev/null +++ b/collections/documentation/developers/internals/zos/faq/readme.md @@ -0,0 +1,6 @@ +# FAQ + +This section consolidates the common questions we get about how 0-OS works and how to operate it. 
+ +- **Q**: What is the preferred configuration for my raid controller when running 0-OS? + **A**: The goal of 0-OS is to expose raw capacity, so it is best to give it the most direct (raw) access to the disks. In the case of raid controllers, the best option is to set them up in [JBOD](https://en.wikipedia.org/wiki/Non-RAID_drive_architectures#JBOD) mode if available. diff --git a/collections/documentation/developers/internals/zos/internals/boot.md b/collections/documentation/developers/internals/zos/internals/boot.md new file mode 100644 index 0000000..bb98c8a --- /dev/null +++ b/collections/documentation/developers/internals/zos/internals/boot.md @@ -0,0 +1,11 @@ +# Services Boot Sequence + +Here is the dependency graph of all the services started by 0-OS: + +![boot sequence](../assets/boot_sequence.png) + +## Pseudo boot steps + +Both `node-ready` and `boot` are not actual services; they are there to define a `boot stage`. For example, once the `node-ready` service is ready, it means all crucial system services defined by 0-initramfs are now running. + +The `boot` service is similar, but guarantees that some 0-OS services are running (for example `storaged`) before starting other services like `flistd`, which requires `storaged`. diff --git a/collections/documentation/developers/internals/zos/internals/capacity.md b/collections/documentation/developers/internals/zos/internals/capacity.md new file mode 100644 index 0000000..04d1137 --- /dev/null +++ b/collections/documentation/developers/internals/zos/internals/capacity.md @@ -0,0 +1,89 @@ +

# Capacity

+ +

## Table of Contents

+ +- [Introduction](#introduction) +- [System reserved capacity](#system-reserved-capacity) + - [Reserved Memory](#reserved-memory) + - [Reserved Storage](#reserved-storage) +- [User Capacity](#user-capacity) + - [Memory](#memory) + - [Storage](#storage) + +*** + +## Introduction + +This document describes how ZOS does the following tasks: + +- Reservation of system resources + - Memory + - Storage +- Calculation of free usable capacity for user workloads + +## System reserved capacity + +ZOS always reserves some amount of the available physical resources for its own operation. The system tries to be as protective +as possible of its critical services to make sure that the node is always reachable and usable even if it's under heavy load. + +ZOS makes sure it reserves memory and storage (but not CPU) as follows: + +### Reserved Memory + +ZOS reserves 10% of the available system memory for basic services and operation overhead. The operation overhead can happen as a side effect of running user workloads. For example, a user network does not consume any memory in theory, but in practice it does consume some (kernel buffers, etc.). The same goes for a VM: a user VM can be assigned, say, 5G, but the process running the VM will take a few extra megabytes to operate. + +This is why we decided to play on the safe side and reserve 10% of total system memory for system overhead, with a **minimum** reserved memory of 2GB: + +```python +reserved = max(total_in_gb * 0.1, 2) # in GB, never less than 2 GB +``` + +### Reserved Storage + +Although ZOS does not require installation, it needs to download and store many things to operate correctly. This includes the following: + +- Node identity. Information about the node id and keys +- The system binaries, which include everything ZOS needs to join the grid and operate as expected +- Workload flists. Those are the flists of the user workloads. Those are downloaded on demand so they don't always exist. +- State information. 
Tracking information maintained by ZOS to track the state of workloads, ownership, and more. + +This is why, on first start, the system allocates and reserves a part of the available SSD storage, called `zos-cache`. It is initially `5G` (it was 100G in older versions), but because of the dynamic nature of the cache we can't fix it at `5G`. + +The space the system needs to reserve can change dramatically based on the number of workloads running on the system. For example, if many users are running many different VMs, the system will need to download (and cache) different VM images, hence requiring more cache. + +This is why the system periodically checks the reserved storage and then dynamically expands or shrinks it to a more suitable value in increments of 5G. The expansion happens around 20% of the current cache size, and shrinking happens if it goes below 20%. + +## User Capacity + +All workloads require some resources to run, and that is what the user has to pay for. Any workload can consume resources of one or more of the following types: + +- CU (compute unit in vCPU) +- MU (memory unit in bytes) +- NU (network unit in bytes) +- SU (ssd storage in bytes) +- HU (hdd storage in bytes) + +A workload, based on its type, can consume one or more of those resource types. Some workloads have a well-known "size" on creation; others are dynamic and their consumption won't be known until later. + +For example, the SU consumption of a disk workload is known ahead of time, unlike the NU used by a network, which is only known after usage over a certain period of time. + +A single deployment can have multiple workloads, each requiring a certain amount of one or more capacity types (listed above). For each workload, ZOS computes the amount of resources needed and then checks whether it can provide this amount of capacity. 
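The per-workload check can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the actual ZOS code: the names `Capacity`, `reservedMemory`, `fits` and `admit` are hypothetical, only MU, SU and HU are checked up front (NU is only known after usage, and CPU is not reserved), and the memory floor follows the prose above (10% of total, never less than 2 GB).

```go
package main

import "fmt"

const gib = uint64(1) << 30

// Capacity is a hypothetical container for the resource types that can be
// checked before a workload starts (all values in bytes).
type Capacity struct {
	MU uint64 // memory
	SU uint64 // SSD storage
	HU uint64 // HDD storage
}

// reservedMemory returns 10% of total memory, with a floor of 2 GiB.
func reservedMemory(total uint64) uint64 {
	reserved := total / 10
	if reserved < 2*gib {
		reserved = 2 * gib
	}
	return reserved
}

// fits reports whether the request c can be served from free.
func (c Capacity) fits(free Capacity) bool {
	return c.MU <= free.MU && c.SU <= free.SU && c.HU <= free.HU
}

// admit walks the workloads of one deployment in order. Each workload is
// checked on its own, so some can be accepted while others are rejected.
func admit(free Capacity, workloads []Capacity) []bool {
	results := make([]bool, len(workloads))
	for i, w := range workloads {
		if !w.fits(free) {
			continue // rejected: not enough capacity left for this workload
		}
		free.MU -= w.MU
		free.SU -= w.SU
		free.HU -= w.HU
		results[i] = true
	}
	return results
}

func main() {
	total := 8 * gib
	free := Capacity{MU: total - reservedMemory(total), SU: 100 * gib}
	vms := []Capacity{
		{MU: 4 * gib, SU: 20 * gib},
		{MU: 4 * gib, SU: 20 * gib},
	}
	// 6 GiB are usable, so only the first 4 GiB VM is admitted.
	fmt.Println(admit(free, vms)) // prints [true false]
}
```

Checking each workload independently, rather than the deployment as a whole, is what allows one workload of a deployment to succeed while another fails.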
+ +> This means that a deployment that defines 2 VMs can partially succeed: one of the VMs can be deployed while the other fails if the amount of resources it requests is higher than what the node can provide. + +### Memory + +The system decides whether there is enough memory to run a workload that demands MU resources as follows: + +- Compute the "theoretically used" memory of all user workloads excluding `self`. This is the sum of the MU consumed by all active workloads (as defined by their corresponding deployments, not as actually used on the system). +- Add the system reserved memory to the theoretically used memory. +- Check the memory actually used on the system: `actual_used = memory.total - memory.available` +- Take a conservative estimate of used memory: `used = max(actual_used, theoretically_used)` +- Then `available = total - used` +- Finally, check that `available` memory is enough to hold the requested workload memory. + +### Storage + +Storage is much simpler to allocate than memory. It's left entirely to the storage subsystem to find out whether it can fit the requested storage on the available physical disks; if that is not possible, the workload is marked as error. + +Storage tries to find the requested space based on type (SU or HU), then finds the optimal way to fit it on the available disks, or spins up a new one if needed. diff --git a/collections/documentation/developers/internals/zos/internals/compatibility/readme.md b/collections/documentation/developers/internals/zos/internals/compatibility/readme.md new file mode 100644 index 0000000..1eb4394 --- /dev/null +++ b/collections/documentation/developers/internals/zos/internals/compatibility/readme.md @@ -0,0 +1,14 @@ +# Compatibility list + +This document tracks all the hardware that has been tested, the issues encountered and possible workarounds. 
+ +**Legend** +✅ : fully supported +⚠️ : supported with some tweaking +🛑 : not supported + + +| vendor | Hardware | Support | Issues | workaround | +| --- | --- | --- | --- | --- | +| Supermicro | SYS-5038ML-H8TRF | ✅ | | | +| Gigabyte Technology Co | AB350N-Gaming WIFI | ✅ | | | diff --git a/collections/documentation/developers/internals/zos/internals/container/readme.md b/collections/documentation/developers/internals/zos/internals/container/readme.md new file mode 100644 index 0000000..6d5d2e2 --- /dev/null +++ b/collections/documentation/developers/internals/zos/internals/container/readme.md @@ -0,0 +1,106 @@ +

# Container Module

+ +

## Table of Contents

+ +- [ZBus](#zbus) +- [Home Directory](#home-directory) +- [Introduction](#introduction) + - [zinit unit](#zinit-unit) +- [Interface](#interface) + +*** + +## ZBus + +Container module is available on zbus over the following channel + +| module | object | version | +|--------|--------|---------| +| container|[container](#interface)| 0.0.1| + +## Home Directory + +contd keeps some data in the following locations +| directory | path| +|----|---| +| root| `/var/cache/modules/containerd`| + +## Introduction + +The container module is a proxy to [containerd](https://github.com/containerd/containerd). The proxy provides integration with zbus. + +The implementation at the moment is straightforward: it prepares the OCI spec for the container and the tenant containerd namespace, +sets up proper capabilities, and finally creates the container instance on `containerd`. + +The module is fully stateless; all container information is queried during runtime from `containerd`. + +### zinit unit + +`contd` must run after containerd is running, and the node boot process is complete. 
Since it doesn't keep state, no dependency on `storaged` is needed + +```yaml +exec: contd -broker unix:///var/run/redis.sock -root /var/cache/modules/containerd +after: + - containerd + - boot +``` + +## Interface + +```go +package pkg + +// ContainerID type +type ContainerID string + +// NetworkInfo defines a network configuration for a container +type NetworkInfo struct { + // Currently a container can only join one (and only one) + // network namespace that has to be pre defined on the node + // for the container tenant + + // Containers don't need to know anything about bridges, + // IPs, wireguards since this is only known by the network + // resource which is out of the scope of this module + Namespace string +} + +// MountInfo defines a mount point +type MountInfo struct { + Source string // source of the mount point on the host + Target string // target of mount inside the container + Type string // mount type + Options []string // mount options +} + +// Container creation info +type Container struct { + // Name of container + Name string + // path to the rootfs of the container + RootFS string + // Env env variables to container in format {'KEY=VALUE', 'KEY2=VALUE2'} + Env []string + // Network network info for container + Network NetworkInfo + // Mounts extra mounts for container + Mounts []MountInfo + // Entrypoint the process to start inside the container + Entrypoint string + // Interactive enables Core X as PID 1 on the container + Interactive bool +} + +// ContainerModule defines rpc interface to containerd +type ContainerModule interface { + // Run creates and starts a container on the node. 
It also auto + // starts command defined by `entrypoint` inside the container + // ns: tenant namespace + // data: Container info + Run(ns string, data Container) (ContainerID, error) + + // Inspect, return information about the container, given its container id + Inspect(ns string, id ContainerID) (Container, error) + Delete(ns string, id ContainerID) error +} +``` diff --git a/collections/documentation/developers/internals/zos/internals/flist/readme.md b/collections/documentation/developers/internals/zos/internals/flist/readme.md new file mode 100644 index 0000000..46f1076 --- /dev/null +++ b/collections/documentation/developers/internals/zos/internals/flist/readme.md @@ -0,0 +1,74 @@ +

# Flist Module

+ +

## Table of Contents

+ +- [Zbus](#zbus) +- [Home Directory](#home-directory) +- [Introduction](#introduction) +- [Public interface ](#public-interface-) +- [zinit unit](#zinit-unit) + +*** + +## Zbus + +Flist module is available on zbus over the following channel: + +| module | object | version | +|--------|--------|---------| +|flist |[flist](#public-interface)| 0.0.1 + +## Home Directory +flist keeps some data in the following locations: +| directory | path| +|----|---| +| root| `/var/cache/modules/containerd`| + +## Introduction + +This module is responsible to "mount an flist" in the filesystem of the node. The mounted directory contains all the files required by containers or (in the future) VMs. + +The flist module interface is very simple. It does not expose any way to choose where to mount the flist or have any reference to containers or VM. The only functionality is to mount a given flist and receive the location where it is mounted. It is up to the above layer to do something useful with this information. + +The flist module itself doesn't contain the logic to understand the flist format or to run the fuse filesystem. It is just a wrapper that manages [0-fs](https://github.com/threefoldtech/0-fs) processes. + +Its only job is to download the flist, prepare the isolation of all the data and then start 0-fs with the proper arguments. + +## Public interface [![GoDoc](https://godoc.org/github.com/threefoldtech/zos/pkg/flist?status.svg)](https://godoc.org/github.com/threefoldtech/zos/pkg/flist) + +```go + +//Flister is the interface for the flist module +type Flister interface { + // Mount mounts an flist located at url using the 0-db located at storage + // in a RO mode. 
note that there is no way u can unmount a ro flist because + // it can be shared by many users, it's then up to system to decide if the + // mount is not needed anymore and clean it up + Mount(name, url string, opt MountOptions) (path string, err error) + + // UpdateMountSize change the mount size + UpdateMountSize(name string, limit gridtypes.Unit) (path string, err error) + + // Umount a RW mount. this only unmounts the RW layer and remove the assigned + // volume. + Unmount(name string) error + + // HashFromRootPath returns flist hash from a running g8ufs mounted with NamedMount + HashFromRootPath(name string) (string, error) + + // FlistHash returns md5 of flist if available (requesting the hub) + FlistHash(url string) (string, error) + + Exists(name string) (bool, error) +} + +``` + +## zinit unit + +The zinit unit file of the module specifies the command line, test command, and the order in which the services need to be booted. + +Flist module depends on the storage and network pkg. +This is because it needs connectivity to download flist and data and it needs storage to be able to cache the data once downloaded. + +Flist doesn't do anything special on the system except creating a bunch of directories it will use during its lifetime. 
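As a rough illustration of how an upper layer might consume this module, the sketch below mounts an flist, uses the returned path, and unmounts it again. It is not the zos implementation: the interface is a trimmed local copy of `Flister`, `MountOptions` is a hypothetical stand-in, and `fakeFlister`, `withFlist`, the `/mnt/` path and the example URL are made up so the snippet runs on its own.

```go
package main

import (
	"errors"
	"fmt"
)

// MountOptions is a hypothetical stand-in for the real options type.
type MountOptions struct {
	ReadOnly bool
}

// Flister is a trimmed copy of the interface above, kept local so the
// example is self-contained.
type Flister interface {
	Mount(name, url string, opt MountOptions) (string, error)
	Unmount(name string) error
	Exists(name string) (bool, error)
}

// fakeFlister is an in-memory fake; it only records which names are mounted.
type fakeFlister struct {
	mounts map[string]string
}

func newFakeFlister() *fakeFlister {
	return &fakeFlister{mounts: map[string]string{}}
}

func (f *fakeFlister) Mount(name, url string, opt MountOptions) (string, error) {
	if path, ok := f.mounts[name]; ok {
		return path, nil // mounting the same name twice returns the same path
	}
	path := "/mnt/" + name // made-up location, chosen by the module, not the caller
	f.mounts[name] = path
	return path, nil
}

func (f *fakeFlister) Unmount(name string) error {
	if _, ok := f.mounts[name]; !ok {
		return errors.New("mount not found: " + name)
	}
	delete(f.mounts, name)
	return nil
}

func (f *fakeFlister) Exists(name string) (bool, error) {
	_, ok := f.mounts[name]
	return ok, nil
}

// withFlist mounts an flist, passes the mount path to use, and unmounts
// afterwards regardless of what use returns.
func withFlist(fl Flister, name, url string, use func(root string) error) error {
	root, err := fl.Mount(name, url, MountOptions{})
	if err != nil {
		return fmt.Errorf("mount %s: %w", name, err)
	}
	defer fl.Unmount(name)
	return use(root)
}

func main() {
	fl := newFakeFlister()
	err := withFlist(fl, "wl129", "https://hub.example/demo.flist", func(root string) error {
		fmt.Println("flist mounted at", root)
		return nil
	})
	fmt.Println("error:", err)
}
```

The caller never chooses the mount location; it only receives it, which matches the description of the module above.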
diff --git a/collections/documentation/developers/internals/zos/internals/gateway/readme.md b/collections/documentation/developers/internals/zos/internals/gateway/readme.md new file mode 100644 index 0000000..45f5035 --- /dev/null +++ b/collections/documentation/developers/internals/zos/internals/gateway/readme.md @@ -0,0 +1,121 @@ +# Gateway Module + +## ZBus + +Gateway module is available on zbus over the following channel + +| module | object | version | +| ------- | --------------------- | ------- | +| gateway | [gateway](#interface) | 0.0.1 | + +## Home Directory + +gateway keeps some data in the following locations +| directory | path | +| --------- | ---------------------------- | +| root | `/var/cache/modules/gateway` | + +The directory `/var/cache/modules/gateway/proxy` contains the route information used by traefik to forward traffic. +## Introduction + +The gateway module is used to register traefik routes and services to act as a reverse proxy. It's the backend supporting two kinds of workloads: `gateway-fqdn-proxy` and `gateway-name-proxy`. + +For the FQDN type, it receives the domain and a list of backends in the form `http://ip:port` or `https://ip:port` and registers a route for this domain forwarding traffic to these backends. It's a requirement that the domain resolves to the gateway public ip. The `tls_passthrough` parameter determines whether the tls termination happens on the gateway or in the backends. When it's true, the backends must be in the form `https://ip:port`, and the backends must be https-enabled servers. + +The name type is the same as the FQDN type except that the `name` parameter is added as a prefix to the gateway domain to determine the fqdn. It's forbidden to use a FQDN type workload to reserve a domain managed by the gateway. + +The fqdn type is enabled only if there's a public config on the node. The name type works only if a domain exists in the public config. 
To make a full-fledged gateway node, these DNS records are required: +``` +gatewaydomain.com A ip.of.the.gateway +*.gatewaydomain.com CNAME gatewaydomain.com +__acme-challenge.gatewaydomain.com NS gatewaydomain.com +``` + +### zinit unit + +```yaml +exec: gateway --broker unix:///var/run/redis.sock --root /var/cache/modules/gateway +after: + - boot +``` +## Implementation details + +Traefik is used as the reverse proxy forwarding traffic to upstream servers. Every workload deployed on the node is associated with a domain that resolves to the node IP. In the name workload case, it's a subdomain of the gateway main domain. In the FQDN case, the user must create a DNS A record pointing it to the node IP. The node by default redirects all http traffic to https. + +When an https request reaches the node, it looks at the domain and determines the correct service that should handle the request. The service definitions are in `/var/cache/modules/gateway/proxy/` and are hot-reloaded by traefik every time a service is added or removed. Zos currently supports enabling `tls_passthrough`, in which case the https request is passed as is to the backend (at the TCP level). By default `tls_passthrough` is false, which means the node terminates the TLS traffic and then forwards the request as http to the backend. 
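The routing decision described above can be sketched as follows. This is an illustration under stated assumptions, not the gateway module's code: the helper names `nameFQDN` and `routerRule` are hypothetical, and the only details taken from this document are that a name workload is served on `name` plus the gateway domain, and that passthrough traffic is matched at the TCP level by `HostSNI` while terminated traffic is matched as HTTP by `Host`.

```go
package main

import "fmt"

// nameFQDN builds the domain of a name workload by prefixing the gateway
// domain with the workload name.
func nameFQDN(prefix, gatewayDomain string) string {
	return prefix + "." + gatewayDomain
}

// routerRule returns the traefik config section and the router rule for a
// domain, depending on whether TLS is passed through to the backend.
func routerRule(domain string, tlsPassthrough bool) (section, rule string) {
	if tlsPassthrough {
		// TLS is not terminated on the node: route on the SNI at the TCP level.
		return "tcp", fmt.Sprintf("HostSNI(`%s`)", domain)
	}
	// TLS is terminated on the node: route on the HTTP host.
	return "http", fmt.Sprintf("Host(`%s`)", domain)
}

func main() {
	domain := nameFQDN("workloadname", "gent01.dev.grid.tf")
	section, rule := routerRule(domain, false)
	fmt.Println(section, rule)

	section, rule = routerRule("remote.omar.grid.tf", true)
	fmt.Println(section, rule)
}
```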
+Example of a FQDN service definition with tls_passthrough enabled: +```yaml +tcp: + routers: + 37-2039-testname-route: + rule: HostSNI(`remote.omar.grid.tf`) + service: 37-2039-testname + tls: + passthrough: "true" + services: + 37-2039-testname: + loadbalancer: + servers: + - address: 137.184.106.152:443 +``` +Example of a "name" service definition with tls_passthrough disabled: +```yaml +http: + routers: + 37-1976-workloadname-route: + rule: Host(`workloadname.gent01.dev.grid.tf`) + service: 40-1976-workloadname + tls: + certResolver: dnsresolver + domains: + - sans: + - '*.gent01.dev.grid.tf' + services: + 40-1976-workloadname: + loadbalancer: + servers: + - url: http://[backendip]:9000 +``` + +The `certResolver` option has two valid values, `resolver` and `dnsresolver`. The `resolver` is an http resolver and is used in FQDN services with `tls_passthrough` disabled. It uses the http challenge to generate a single-domain certificate. The `dnsresolver` is used for name services with `tls_passthrough` disabled. The `dnsresolver` is responsible for generating a wildcard certificate to be used for all subdomains of the gateway domain. Its flow is described below. + +The CNAME record is used to make all subdomains (reserved or not) resolve to the ip of the gateway. Generating a wildcard certificate requires adding a TXT record at `__acme-challenge.gatewaydomain.com`. The NS record is used to delegate this specific subdomain to the node. So if someone did `dig TXT __acme-challenge.gatewaydomain.com`, the query is served by the node, not the DNS provider used for the gateway domain. + +Traefik has, as a config parameter, multiple dns [providers](https://doc.traefik.io/traefik/https/acme/#providers) to communicate with when it wants to add the required TXT record. For non-supported providers, a bash script can be provided to do the record generation and clean up (i.e. External program). 
The bash [script](https://github.com/threefoldtech/zos/blob/main/pkg/gateway/static/cert.sh) starts dnsmasq managing a dns zone for the `__acme-challenge` subdomain with the given TXT record. It then kills the dnsmasq process and removes the config file during cleanup. +## Interface + +```go +type Backend string + +// GatewayFQDNProxy definition. This proxies the given FQDN to the backends +type GatewayFQDNProxy struct { + // FQDN the fully qualified domain name to use (cannot be present with Name) + FQDN string `json:"fqdn"` + + // TLSPassthrough whether to pass tls traffic through to the backends or not + TLSPassthrough bool `json:"tls_passthrough"` + + // Backends is the list of backends + Backends []Backend `json:"backends"` +} + + +// GatewayNameProxy definition. This proxies the name, prefixed to the gateway domain, to the backends +type GatewayNameProxy struct { + // Name the prefix added to the gateway domain to form the FQDN + Name string `json:"name"` + + // TLSPassthrough whether to pass tls traffic through to the backends or not + TLSPassthrough bool `json:"tls_passthrough"` + + // Backends is the list of backends + Backends []Backend `json:"backends"` +} + +type Gateway interface { + SetNamedProxy(wlID string, prefix string, backends []string, TLSPassthrough bool) (string, error) + SetFQDNProxy(wlID string, fqdn string, backends []string, TLSPassthrough bool) error + DeleteNamedProxy(wlID string) error + Metrics() (GatewayMetrics, error) +} +``` diff --git a/collections/documentation/developers/internals/zos/internals/history/readme.md b/collections/documentation/developers/internals/zos/internals/history/readme.md new file mode 100644 index 0000000..e4c023a --- /dev/null +++ b/collections/documentation/developers/internals/zos/internals/history/readme.md @@ -0,0 +1,99 @@ +# 0-OS, a bit of history and introduction to Version 2 + +## Once upon a time +---- +A few years ago, we were trying to come up with some solutions to the problem of self-healing IT. 
+We boldly stated that the current model of cloud computing in huge data-centers is not going to be able to scale to fit the demand in IT capacity. + +The approach we took to solve this problem was to enable localized compute and storage units at the edge of the network, close to where it is needed. +That basically meant that if we were to deploy physical hardware to the edges, near the users, we would have to allow information providers to deploy their solutions on that edge network and hardware. That also means sharing hardware resources between users, where we would have to make damn sure no one can peek around in things that are not theirs. + +When we talk about sharing capacity in a secure environment, virtualization comes to mind. It's not a new technology and it has been around for quite some time. This solution comes with a cost though. Virtual machines, which emulate a full hardware platform on real hardware, are costly in terms of used resources and eat away at the already scarce resources we want to provide for our users. + +Containerizing technologies were starting to get some hype at the time. Containers provide basically the same level of isolation as Full Virtualisation, but are a lot less expensive in terms of resource utilization. + +With that in mind, we started designing the first version of 0-OS. The required features were: + +- be able to be fully in control of the hardware +- give the possibility to different users to share the same hardware +- deploy this capacity at the edge, close to where it is needed +- the System needs to self-heal. Because of their location and sheer scale, manual maintenance was not an option. Self-healing is a broad topic, and will require a lot of experience and fine-tuning, but it was meant to culminate at some point so that most of the actions that sysadmins execute would be automated. 
+- Have an attack surface that is as small as possible, both against remote types of attack and for protecting users from each other + +The result of that thought process was 0-OS v1: a Linux kernel with the minimal components on top needed to provide these features. + +In the first incarnation of 0-OS, the core framework was a single big binary that got started as the first process of the system (PID 1). All the management features were exposed through an API that was only accessible locally. + +The idea was to have an orchestration system running on top that was going to be responsible for deploying Virtual Machines and Containers on the system using that API. + +This API exposed 3 main primitives: + +- networking: zerotier, vlan, macvlan, bridge, openvswitch... +- storage: plain disk, 0-db, ... +- compute: VM, containers + +That was all great and it allowed us to learn a lot. But some limitations started to appear. Here is a non-exhaustive list of the limitations we had to face after a couple of years of utilization: + +- Difficulty to push new versions and fixes on the nodes. The fact that 0-OS was a single process running as PID 1 forced us to completely reboot the node every time we wanted to push an update. +- The API, while powerful, still required some logic on top to actually deploy usable solutions. +- We noticed that some features we implemented were never or extremely rarely used. This was just increasing the possible attack surface for no real benefit. +- The main networking solution we chose at the time, zerotier, was not scaling as well as we had hoped. +- We wrote a lot of code ourselves, instead of relying on already existing open source libraries that would have made that task a lot easier. These libraries were also a lot more mature and had had a lot more exposure for ironing out possible bugs and vulnerabilities than anything we could have created and tested ourselves with the little resources we had at hand. + +## Now what ? 
+With the knowledge and lessons gathered during these first years of usage, we +concluded that trying to fix the already existing codebase would be cumbersome +and we also wanted to avoid any technical debt that could haunt us for years +after. So we decided for a complete rewrite of that stack, taking a new and +fully modular approach, where every component could be easily replaced and +upgraded without the need for a reboot. + +Hence Version 2 saw the light of day. + +Instead of trial and error, and muddling along trying to fit new features in +that big monolithic codebase, we wanted to be sure that the components were +reduced to a more manageable size, having a clearly cut Domain Separation. + +Instead of creating solutions waiting for a problem, we started looking at things the other way around. Which is logical, as by now, we learned what the real puzzles to solve were, albeit sometimes by painful experience. + +## Tadaa! +---- +The [first commit](https://github.com/threefoldtech/zosv2/commit/7b783c888673d1e9bc400e4abbb17272e995f5a4) of the v2 repository took place the 11 of February 2019. +We are now 6 months in, and about to bake the first release of 0-OS v2. +Clocking in at almost 27KLoc, it was a very busy half-year. (admitted, there are the spec and docs too in that count ;-) ) + +Let's go over the main design decisions that were made and explain briefly each component. + +While this is just an introduction, we'll add more articles digging deeper in the technicalities and approaches of each component. + +## Solutions to puzzles (there are no problems) +---- +**UPDATES** + +One of the first puzzles we wanted to solve was the difficulty to push upgrades. +In order to solve that, we designed 0-OS components as completely stand-alone modules. Each subsystem, be it storage, networking, containers/VMs, is managed by it's own component (mostly a daemon), and communicate with each-other through a local bus. 
And as we said, each component can then be upgraded separately, together with the necessary data migrations that could be required. + +**WHAT API?** + +The second big change is our approach to the API, or better, lack thereof. +In V2 we dropped the idea to expose the primitives of the Node over an API. +Instead, all the required knowledge to deploy workloads is directly embedded in 0-OS. +So in order to have the node deploy a workload, we have created a blueprint like system where the user describes what his requirements in terms of compute power, storage and networking are, and the node applies that blueprint to make it reality. +That approach has a few advantages: + - It greatly reduces the attack surface of the node because there is no more direct interaction between a user and a node. + - And it also allows us to have a greater control over how things are organized in the node itself. The node being its own boss, can decide to re-organize itself whenever needed to optimize the capacity it can provide. + - Having a blueprint with requirements, gives the grid the possibility to verify that blueprint on multiple levels before applying it. That is: as well on top level as on node level a blueprint can be verified for validity and signatures before any other action will be executed. + +**PING** + +The last major change is how we want to handle networking. +The solution used during the lifetime of V1 exposed its limitations when we started scaling our networks to hundreds of nodes. +So here again we started from scratch and created our own overlay network solution. +That solution is based on the 'new kid on the block' in terms of VPN: [Wireguard](https://wireguard.io) and it's approach and usage will be fully explained in the next 0-OS article. 
+For the eager ones of you, there are some specifications and also some documentation [here](https://github.com/threefoldtech/zosv2/tree/master/docs/network) and [there](https://github.com/threefoldtech/zosv2/tree/master/specs/network). + +## That's All, Folks (for now) +So this little article as an intro to the brave new world of 0-OS. +The Zero-OS team engages itself to regularly keep you updated on it's progress, the new features that will surely be added, and for the so inclined, add a lot more content for techies on how to actually use that novel beast. + +[Till next time](https://youtu.be/b9434BoGkNQ) diff --git a/collections/documentation/developers/internals/zos/internals/identity/identity.md b/collections/documentation/developers/internals/zos/internals/identity/identity.md new file mode 100644 index 0000000..9ed7400 --- /dev/null +++ b/collections/documentation/developers/internals/zos/internals/identity/identity.md @@ -0,0 +1,143 @@ +

Node ID Generation

+ +

Table of Contents

+ +- [Introduction](#introduction) +- [ZBus](#zbus) +- [Home Directory](#home-directory) +- [Introduction](#introduction-1) +- [On Node Booting](#on-node-booting) +- [ID generation](#id-generation) +- [Cryptography](#cryptography) + - [zinit unit](#zinit-unit) +- [Interface](#interface) + +*** + +## Introduction + +We explain the node ID generation process. + +## ZBus + +Identity module is available on zbus over the following channel + +| module | object | version | +|--------|--------|---------| +| identity|[manager](#interface)| 0.0.1| + +## Home Directory + +identity keeps some data in the following locations + +| directory | path| +|----|---| +| root| `/var/cache/modules/identity`| + +## Introduction + +Identity manager is responsible for maintaining the node identity (public key). The manager make sure the node has one valid ID during the entire lifetime of the node. It also provide service to sign, encrypt and decrypt data using the node identity. + +On first boot, the identity manager will generate an ID and then persist this ID for life. + +Since the identity daemon is the only one that can access the node private key, it provides an interface to sign, verify and encrypt data. This methods are available for other modules on the local node to use. + +## On Node Booting + +- Check if node already has a seed generated +- If yes, load the node identity +- If not, generate a new ID +- Start the zbus daemon. + +## ID generation + +At this time of development the ID generated by identityd is the base58 encoded public key of a ed25519 key pair. + +The key pair itself is generated from a random seed of 32 bytes. It is this seed that is actually saved on the node. And during boot the key pair is re-generated from this seed if it exists. + +## Cryptography + +The signing and encryption capabilities of the identity module rely on this ed25519 key pair. + +For signing, it directly used the key pair. 
+For public key encryption, the ed25519 key pair is converted to its curve25519 equivalent, which is then used to encrypt the data. + +### zinit unit + +The zinit unit file of the module specifies the command line, the test command, and the order in which the services need to be booted. + +`identityd` requires `storaged` to make sure the seed is persisted over reboots, so that the node has the same ID during its full lifetime. +The identityd daemon is only considered running if the seed file exists. + +```yaml +exec: /bin/identityd +test: test -e /var/cache/modules/identity/seed.txt +after: + - storaged +``` + +## Interface + +For an up-to-date interface, please check the code [here](https://github.com/threefoldtech/zos/blob/main/pkg/identity.go) +```go +package pkg + +// Identifier is the interface that defines +// how an object can be used as an identity +type Identifier interface { + Identity() string +} + +// StrIdentifier is a helper type that implements the Identifier interface +// on top of simple string +type StrIdentifier string + +// Identity implements the Identifier interface +func (s StrIdentifier) Identity() string { + return string(s) +} + +// IdentityManager interface. +type IdentityManager interface { + // NodeID returns the node id (public key) + NodeID() StrIdentifier + + // NodeIDNumeric returns the node registered ID. + NodeIDNumeric() (uint32, error) + + // FarmID returns the farm id this node is part of. this is usually a configuration + // that the node is booted with. An error is returned if the farmer id is not configured + FarmID() (FarmID, error) + + // Farm returns name of the farm. Or error + Farm() (string, error) + + // FarmSecret gets the farm secret as defined in the boot params + FarmSecret() (string, error) + + // Sign signs the message with privateKey and returns a signature. + Sign(message []byte) ([]byte, error) + + // Verify reports whether sig is a valid signature of message by publicKey. 
+ Verify(message, sig []byte) error + + // Encrypt encrypts message with the public key of the node + Encrypt(message []byte) ([]byte, error) + + // Decrypt decrypts message with the private of the node + Decrypt(message []byte) ([]byte, error) + + // EncryptECDH aes encrypt msg using a shared key derived from private key of the node and public key of the other party using Elliptic curve Diffie Helman algorithm + // the nonce if prepended to the encrypted message + EncryptECDH(msg []byte, publicKey []byte) ([]byte, error) + + // DecryptECDH decrypt aes encrypted msg using a shared key derived from private key of the node and public key of the other party using Elliptic curve Diffie Helman algorithm + DecryptECDH(msg []byte, publicKey []byte) ([]byte, error) + + // PrivateKey sends the keypair + PrivateKey() []byte +} + +// FarmID is the identification of a farm +type FarmID uint32 +``` diff --git a/collections/documentation/developers/internals/zos/internals/identity/readme.md b/collections/documentation/developers/internals/zos/internals/identity/readme.md new file mode 100644 index 0000000..1bff097 --- /dev/null +++ b/collections/documentation/developers/internals/zos/internals/identity/readme.md @@ -0,0 +1,8 @@ +

Identity Module

+ +Identity daemon is responsible for two major operations that are crucial for the node operation. + +

Table of Contents

+ +- [Node ID Generation](identity.md) +- [Node Live Software Update](upgrade.md) diff --git a/collections/documentation/developers/internals/zos/internals/identity/upgrade.md b/collections/documentation/developers/internals/zos/internals/identity/upgrade.md new file mode 100644 index 0000000..e486a53 --- /dev/null +++ b/collections/documentation/developers/internals/zos/internals/identity/upgrade.md @@ -0,0 +1,98 @@ +

Node Upgrade

+ +

Table of Contents

+ +- [Introduction](#introduction) +- [Philosophy](#philosophy) +- [Booting a new node](#booting-a-new-node) +- [Runtime upgrade of a node](#runtime-upgrade-of-a-node) +- [Technical](#technical) + - [Flist layout](#flist-layout) + +*** + +## Introduction + +We provide information concerning node upgrade with ZOS. We also explain the philosophy behind ZOS. + +## Philosophy + +0-OS is meant to be a black box no one can access. While this provide some nice security features it also makes it harder to manage. Specially when it comes to update/upgrade. + +Hence, zos only trust few sources for upgrade packages. When the node boots up it checks the sources for the latest release and make sure all the local binaries are up-to-date before continuing the booting. The flist source must be rock-solid secured, that's another topic for different documentation. + +The run mode defines which flist the node is going to use to boot. Run mode can be specified by passing `runmode=` to the kernel boot params. Currently we have those different run modes. + +- dev: ephemeral network only setup to develop and test new features. Can be created and reset at anytime +- test: Mostly stable features that need to be tested at scale, allow preview and test of new features. Always the latest and greatest. This network can be reset sometimes, but should be relatively stable. +- prod: Released of stable version. Used to run the real grid with real money. Cannot be reset ever. Only stable and battle tested feature reach this level. + +## Booting a new node + +The base image for zos contains a very small subset of tools, plus the boot program. Standing alone, the image is not really useful. On boot and +after initial start of the system, the boot program kicks in and it does the following: + +- Detect the boot flist that the node must use to fully start. The default is hard-coded into zos, but this can be overridden by the `flist=` kernel param. 
The `flist=` kernel param can be deprecated without warning, since it's a development flag. +- The bootstrap will then mount this flist using 0-fs. This of course requires a working connection to the internet, hence bootstrap is configured to wait for the `internet` service. +- The flist information (name and version) is saved under `/tmp/flist.name` and `/tmp/flist.info`. +- The bootstrap makes sure to copy all files in the flist to the proper locations under the system rootfs. This includes `zinit` config files. +- Then zinit is asked to monitor the newly installed services. zinit takes care of those services and makes sure they are working properly at all times. +- Bootstrap unmounts the flist and cleans up before it exits. +- Boot process continues. + +## Runtime upgrade of a node + +Once the node is up and running, identityd takes over and does the following: + +- It loads the boot info files `/tmp/flist.name` and `/tmp/flist.info`. +- If the `flist.name` file does **not** exist, `identityd` will assume the node was booted by other means than an flist (for example overlay). In that case, identityd will log this and disable live upgrade of the node. +- If the `flist.name` file exists, the flist will be monitored on `https://hub.grid.tf` for changes. Any change in the version will initiate a live upgrade routine. +- Once the flist change is detected, identityd will mount the flist and make sure identityd is running the latest version. If not, identityd will update itself first before continuing. +- Services that need an update will be gracefully stopped. +- `identityd` will then make sure to update all services and config files from the flist, and restart the services properly. +- Services are started again after all binaries have been copied. + +## Technical + +0-OS is designed to provide maximum uptime for its workload. Rebooting a node should never be required to upgrade any of its components (except when we push a kernel upgrade). 
+ +![flow](../../assets/0-OS-upgrade.png) + +### Flist layout + +The files in the upgrade flist needs to be located in the filesystem tree at the same destination they would need to be in 0-OS. This allow the upgrade code to stays simple and only does a copy from the flist to the root filesystem of the node. + +Booting a new node, or updating a node uses the same flist. Hence, a boot flist must container all required services for node operation. + +Example: + +0-OS filesystem: + +``` +/etc/zinit/identityd.yaml +/etc/zinit/networkd.yaml +/etc/zinit/contd.yaml +/etc/zinit/init/node-ready.sh +/etc/zinit/init +/etc/zinit/redis.yaml +/etc/zinit/storaged.yaml +/etc/zinit/flistd.yaml +/etc/zinit/readme.md +/etc/zinit/internet.yaml +/etc/zinit/containerd.yaml +/etc/zinit/boot.yaml +/etc/zinit/provisiond.yaml +/etc/zinit/node-ready.yaml +/etc/zinit +/etc +/bin/zlf +/bin/provisiond +/bin/flistd +/bin/identityd +/bin/contd +/bin/capacityd +/bin/storaged +/bin/networkd +/bin/internet +/bin +``` diff --git a/collections/documentation/developers/internals/zos/internals/internals.md b/collections/documentation/developers/internals/zos/internals/internals.md new file mode 100644 index 0000000..ac7dfe8 --- /dev/null +++ b/collections/documentation/developers/internals/zos/internals/internals.md @@ -0,0 +1,88 @@ +

Internal Modules

+ +

Table of Contents

+ +- [Introduction](#introduction) +- [Booting](#booting) +- [Bootstrap](#bootstrap) +- [Zinit](#zinit) +- [Architecture](#architecture) + - [IPC](#ipc) +- [ZOS Processes (modules)](#zos-processes-modules) +- [Capacity](#capacity) + +*** + +## Introduction + +This document explains in a nutshell the internals of ZOS. This includes the boot process, architecture, the internal modules (and their responsibilities), and the inter-process communication. + +## Booting + +ZOS is a linux based operating system in the sense that we use the main-stream linux kernel with no modifications (but heavily customized). The base image of ZOS includes linux, busybox, [zinit](https://github.com/threefoldtech/zinit) and other required tools that are needed during the boot process. The base image is also shipped with a bootstrap utility that is self-updating on boot which kick starts everything. + +For more details about the ZOS base image please check [0-initramfs](https://github.com/threefoldtech/0-initramfs). + +`ZOS` uses zinit as its `init` or `PID 1` process. `zinit` acts as a process manager and it takes care of starting all required services in the right order. Using simple configuration that is available under `/etc/zinit`. + +The base `ZOS` image has a zinit config to start the basic services that are required for booting. These include (mainly) but are not limited to: + +- internet: A very basic service that tries to connect zos to the internet as fast (and as simple) as possible (over ethernet) using dhcp. This is needed so the system can continue the boot process. Once this one succeeds, it exits and leaves node network management to the more sophisticated ZOS module `networkd` which is yet to be downloaded and started by bootstrap. +- redis: This is required by all zos modules for its IPC (inter process communication). +- bootstrap: The bootstrap process which takes care of downloading all required zos binaries and modules. 
This one requires the `internet` service to actually succeed. + +## Bootstrap + +`bootstrap` is a utility that resides on the base image. It takes care of downloading and configuring all zos main services by doing the following: + +- It checks if there is a more recent version of itself available. If it exists, the process first updates itself before proceeding. +- It checks zos boot parameters (for example, which network you are booting into) as set by . +- Once the network is known, let's call it `${network}`. This can either be `production`, `testing`, or `development`. The proper release is downloaded as follows: + - All flists are downloaded from one of the [hub](https://hub.grid.tf/) `tf-zos-v3-bins.dev`, `tf-zos-v3-bins.test`, or `tf-zos-v3-bins` repos. Based on the network, only one of those repos is used to download all the support tools and binaries. Those are not included in the base image because they can be updated, added, or removed. + - The flist `https://hub.grid.tf/tf-zos/zos:${network}-3:latest.flist.md` is downloaded (note that ${network} is replaced with the actual value). This flist includes all zos services from this repository. More information about the zos modules are explained later. + - Once all binaries are downloaded, `bootstrap` finishes by asking zinit to start monitoring the newly installed services. The bootstrap exits and will never be started again as long as zos is running. + - If zos is restarted the entire bootstrap process happens again including downloading the binaries because ZOS is completely stateless (except for some cached runtime data that is preserved across reboots on a cache disk). + +## Zinit + +As mentioned earlier, `zinit` is the process manager of zos. Bootstrap makes sure it registers all zos services for zinit to monitor. This means that zinit will take care that those services are always running, and restart them if they have crashed for any reason. 
+ +## Architecture + +For `ZOS` to be able to run workloads of different types, it has split its functionality into smaller modules, where each module is responsible for providing a single functionality. For example, `storaged` manages the machine's storage, so it can provide low-level storage capacity to other services that need it. + +As an example, imagine that you want to start a `virtual machine`. For a `virtual machine` to be able to run, it requires a `rootfs` image or the image of the VM itself, which is normally provided via an `flist` (managed by `flistd`). You would then need actual persistent storage (managed by `storaged`), a virtual NIC (managed by `networkd`), and another service that can put everything together in the form of a VM (`vmd`). Finally, a service orchestrates all of this and translates the user request into an actual workload (`provisiond`). You get the picture. + +### IPC + +All modules running in zos need to be able to interact with each other, as the previous example shows. For example, the `provision` daemon needs to be able to ask the `storage` daemon to prepare a virtual disk. A new `inter-process communication` protocol and library was developed to enable this, with these extra features: + +- Modules do not need to know where other modules live. There are no ports or URLs that have to be known by all services. +- A single module can run multiple versions of an API. +- Ease of development. +- Auto-generated clients. + +For more details about the message bus please check [zbus](https://github.com/threefoldtech/zbus) + +`zbus` uses redis as a message bus, hence redis is started in the early stages of zos booting. + +`zbus` allows auto-generation of `stubs`, which are generated clients against a certain module interface. Hence a module X can interact with a module Y by importing the generated clients and then making function calls. + +## ZOS Processes (modules) + +Modules of zos are completely internal. 
There is no way for an external user to talk to them directly. The idea is the node exposes a public API over rmb, while internally this API can talk to internal modules over `zbus`. + +Here is a list of the major ZOS modules. + +- [Identity](identity/index.md) +- [Node](node/index.md) +- [Storage](storage/index.md) +- [Network](network/index.md) +- [Flist](flist/index.md) +- [Container](container/index.md) +- [VM](vmd/index.md) +- [Provision](provision/index.md) + +## Capacity + +In [this document](./capacity.md), you can find detail description of how ZOS does capacity planning. diff --git a/collections/documentation/developers/internals/zos/internals/macdev/readme.md b/collections/documentation/developers/internals/zos/internals/macdev/readme.md new file mode 100644 index 0000000..1a9d4b6 --- /dev/null +++ b/collections/documentation/developers/internals/zos/internals/macdev/readme.md @@ -0,0 +1,57 @@ +> Note: This is unmaintained, try on your own responsibility + +# MacOS Developer + +0-OS (v2) uses a Linux kernel and is really build with a linux environment in mind. +As a developer working from a MacOS environment you will have troubles running the 0-OS code. + +Using [Docker][docker] you can work from a Linux development environment, hosted from your MacOS Host machine. +In this README we'll do exactly that using the standard Ubuntu [Docker][docker] container as our base. + +## Setup + +0. Make sure to have Docker installed, and configured (also make sure you have your code folder path shared in your Docker preferences). +1. Start an _Ubuntu_ Docker container with your shared code directory mounted as a volume: +```bash +docker run -ti -v "$HOME/oss":/oss ubuntu /bin/bash +``` +2. Make sure your environment is updated and upgraded using `apt-get`. +3. 
Install Go (`1.13`) from src using the following link or the one you found on [the downloads page](https://golang.org/dl/): +```bash +wget https://dl.google.com/go/go1.13.3.linux-amd64.tar.gz +sudo tar -xvf go1.13.3.linux-amd64.tar.gz +sudo mv go /usr/local +``` +4. Add the following to your `$HOME/.bashrc` and `source` it: +```vim +export GOROOT=/usr/local/go +export GOPATH=$HOME/go +export PATH=$GOPATH/bin:$GOROOT/bin:$PATH +``` +5. Confirm you have Go installed correctly: +``` +go version && go env +``` +6. Go to your `zos` code `pkg` directory hosted from your MacOS development machine within your docker `/bin/bash`: +```bash +cd /oss/github.com/threefoldtech/zos/pkg +``` +7. Install the dependencies for testing: +```bash +make getdeps +``` +8. Run tests and verify all works as expected: +```bash +make test +``` +9. Build `zos`: +```bash +make build +``` + +If you can successfully do step (8) and step (9) you +can now contribute to `zos` as a MacOS developer. +Testing and compiling you'll do from within your container's shell, +coding you can do from your beloved IDE on your MacOS development environment. + +[docker]: https://www.docker.com diff --git a/collections/documentation/developers/internals/zos/internals/network/Deploy_Network-V2.md b/collections/documentation/developers/internals/zos/internals/network/Deploy_Network-V2.md new file mode 100644 index 0000000..59a04d9 --- /dev/null +++ b/collections/documentation/developers/internals/zos/internals/network/Deploy_Network-V2.md @@ -0,0 +1,74 @@ +# 0-OS v2 and it's network setup + +## Introduction + +0-OS nodes participating in the Threefold grid, need connectivity of course. 
They need to be able to communicate over +the Internet with each other in order to do various things: + +- download their OS modules +- perform OS module upgrades +- register themselves to the grid, and send regular updates about their status +- query the grid for tasks to execute +- build and run the Overlay Network +- download flists and the effective files to cache + +The nodes themselves can have connectivity in a few different ways: + +- Only have RFC1918 private addresses, connected to the Internet through NAT, NO IPv6. + Mostly, these are single-NIC (Network card) machines that can host some workloads through the Overlay Network, but + can't expose services directly. These are HIDDEN nodes, and are mostly booted with a USB stick from + bootstrap.grid.tf. +- Dual-stacked: having RFC1918 private IPv4 and public IPv6, where the IPv6 addresses are received from a home router, +but firewalled for outgoing traffic only. These nodes are effectively also HIDDEN. +- Nodes with 2 NICs: one NIC connected to a segment that has real public +addresses (IPv4 and/or IPv6) and one NIC that is used for booting and local +management (OOB), like in the drawing for farmer setup. + +For Farmers, we need Nodes to be reachable over IPv6, so that the nodes can: + +- expose services to be proxied into containers/vms +- act as aggregating nodes for Overlay Networks for HIDDEN Nodes + +Some Nodes in Farms should also have a publicly reachable IPv4, to make sure that clients that only have IPv4 can +effectively reach exposed services. + +But we need to stress the importance of IPv6 availability when you're running a multi-node farm in a datacentre: as the +grid is boldly claiming to be a new Internet, we should make sure we adhere to the new protocols that are future-proof. +Hence: IPv6 is the base, and IPv4 is just there to accommodate the transition. 
+ +Nowadays, RIPE can't even hand out consecutive /22 IPv4 blocks any more for new LIRs, so you'll be bound to the market to +get IPv4, mostly at rates of 10-15 Euro per IP. Things tend to get costly that way. + +So anyway, IPv6 is not an afterthought in 0-OS, we're starting with it. + +## Network setup for farmers + +This is a quick manual on what is needed for connecting a node with zero-OS V2.0 + +### Step 1. Testing for IPv6 availability in your location +As described above, the network in which the node is installed has to be IPv6 enabled. This is not an afterthought: as we are building a new internet, it has to be based on the new and forward-looking IP addressing scheme. This is something you have to investigate and negotiate with your connectivity provider. Many (but not all) home connectivity products and certainly most datacenters can provide you with IPv6. There are many sources of information on how to test and check whether your connection is IPv6 enabled, [here is a starting point](http://www.ipv6enabled.org/ipv6_enabled/ipv6_enable.php) + +### Step 2. Choosing your setup for connecting your nodes + +Once you have established that you have IPv6 enabled on the network you are about to deploy, you have to make sure that there is an IPv6 DHCP facility available. Zero-OS does not work with static IPv6 addresses (at this point in time). So you have to choose and create one of the following setups: + +#### 2.1 Home setup + +Use your (home) ISP router's IPv6 DHCP capabilities to provide (private) IPv6 addresses. The principle is the same as for IPv4 home connections: everything is enabled by Network Address Translation (just like anything else that uses internet connectivity). This should be relatively straightforward if you have established that your connection has IPv6 enabled. + +#### 2.2 Datacenter / Expert setup + +In this situation there are many options on how to set up your node. 
This requires you as the expert to make a few decisions on how to connect and what the best setup is that you can support for the operational time of your farm. The same basic principles apply: + - You have to have a block of (public) IPv6 routed to your router, or you have to have your router set up to provide Network Address Translation (NAT) + - You have to have a DHCP server in your network that manages and controls IPv6 address leases. Depending on your specific setup, you have this DHCP server manage a public IPv6 range, which makes all nodes directly connected to the public internet, or you have this DHCP server manage a private block of IPv6 addresses, which makes all your nodes connect to the internet through NAT. + +As a farmer you are in charge of selecting and creating the appropriate network setup for your farm. + +## General notes + +The above setup will allow your node(s) to appear in the explorer on the TF Grid and will allow you to earn farming tokens. As stated in the introduction, ThreeFold is creating next generation internet capacity and therefore has IPv6 as its base building block. Connecting to the current (dominant) IPv4 network happens for IT workloads through so-called webgateways. As the word says, these are gateways that provide connectivity between the currently leading IPv4 addressing scheme and IPv6. + +We have started a forum where people share their experiences and configurations. This will be a work in progress and forever growing. + +**IMPORTANT**: You as a farmer do not need access to IPv4 to be able to rent capacity for IT workloads that need to be visible on IPv4. This is something that can happen elsewhere on the TF Grid. 
+ diff --git a/collections/documentation/developers/internals/zos/internals/network/HIDDEN-PUBLIC.dia b/collections/documentation/developers/internals/zos/internals/network/HIDDEN-PUBLIC.dia new file mode 100644 index 0000000..139cffa Binary files /dev/null and b/collections/documentation/developers/internals/zos/internals/network/HIDDEN-PUBLIC.dia differ diff --git a/collections/documentation/developers/internals/zos/internals/network/HIDDEN-PUBLIC.png b/collections/documentation/developers/internals/zos/internals/network/HIDDEN-PUBLIC.png new file mode 100644 index 0000000..72fbe35 Binary files /dev/null and b/collections/documentation/developers/internals/zos/internals/network/HIDDEN-PUBLIC.png differ diff --git a/collections/documentation/developers/internals/zos/internals/network/NR_layout.dia b/collections/documentation/developers/internals/zos/internals/network/NR_layout.dia new file mode 100644 index 0000000..a9f59e2 Binary files /dev/null and b/collections/documentation/developers/internals/zos/internals/network/NR_layout.dia differ diff --git a/collections/documentation/developers/internals/zos/internals/network/NR_layout.png b/collections/documentation/developers/internals/zos/internals/network/NR_layout.png new file mode 100644 index 0000000..2336642 Binary files /dev/null and b/collections/documentation/developers/internals/zos/internals/network/NR_layout.png differ diff --git a/collections/documentation/developers/internals/zos/internals/network/Network-V2.md b/collections/documentation/developers/internals/zos/internals/network/Network-V2.md new file mode 100644 index 0000000..59e16ae --- /dev/null +++ b/collections/documentation/developers/internals/zos/internals/network/Network-V2.md @@ -0,0 +1,315 @@ +# 0-OS v2 and its network + +## Introduction + +0-OS nodes participating in the Threefold grid need connectivity, of course. 
They need to be able to communicate over +the Internet with each other in order to do various things: + +- download their OS modules +- perform OS module upgrades +- register themselves on the grid, and send regular updates about their status +- query the grid for tasks to execute +- build and run the Overlay Network +- download flists and the effective files to cache + +The nodes themselves can have connectivity in a few different ways: + +- Only have RFC1918 private addresses, connected to the Internet through NAT, NO IPv6 + Mostly, these are single-NIC (Network card) machines that can host some workloads through the Overlay Network, but + can't expose services directly. These are HIDDEN nodes, and are mostly booted with a USB stick from + bootstrap.grid.tf. +- Dual-stacked: having RFC1918 private IPv4 and public IPv6, where the IPv6 addresses are received from a home router, +but firewalled for outgoing traffic only. These nodes are effectively also HIDDEN +- Nodes with 2 NICs: one NIC connected to a segment that has real public +addresses (IPv4 and/or IPv6) and one NIC that is used for booting and local +out-of-band (OOB) management (like in the drawing for farmer setup) + +For Farmers, we need Nodes to be reachable over IPv6, so that the nodes can: + +- expose services to be proxied into containers/vms +- act as aggregating nodes for Overlay Networks for HIDDEN Nodes + +Some Nodes in Farms should also have a publicly reachable IPv4, to make sure that clients that only have IPv4 can +effectively reach exposed services. + +But we need to stress the importance of IPv6 availability when you're running a multi-node farm in a datacentre: as the +grid is boldly claiming to be a new Internet, we should make sure we adhere to the new protocols that are future-proof. +Hence: IPv6 is the base, and IPv4 is just there to accommodate the transition. 
+ +Nowadays, RIPE can't even hand out consecutive /22 IPv4 blocks any more for new LIRs, so you'll be bound to the market to +get IPv4, mostly at rates of 10-15 Euro per IP. Things tend to get costly that way. + +So anyway, IPv6 is not an afterthought in 0-OS, we're starting with it. + +## Physical setup for farmers + +```text + XXXXX XXX + XX XXX XXXXX XXX + X X XXX + X X + X INTERNET X + XXX X X + XXXXX XX XX XXXX + +X XXXX XX XXXXX + | + | + | + | + | + +------+--------+ + | FIREWALL/ | + | ROUTER | + +--+----------+-+ + | | + +-----------+----+ +-+--------------+ + | switch/ | | switch/ | + | vlan segment | | vlan segment | + +-+---------+----+ +---+------------+ + | | | ++-------+-------+ |OOB | PUBLIC +| PXE / dhcp | | | +| Server | | | ++---------------+ | | + | | + +-----+------------+----------+ + | | + | +--+ + | | | + | NODES | +--+ + +--+--------------------------+ | | + | | | + +--+--------------------------+ | + | | + +-----------------------------+ +``` + +The PXE/dhcp can also be done by the firewall, your mileage may vary. + +## Switch and firewall configs + +Single switch, multiple switches, it all boils down to the same: + +- one port is an access port on an OOB vlan/segment +- one port is connected to a public vlan/segment + +The farmer makes sure that every node properly receives an IPv4 address in the OOB segment by means of dhcp, so +that with a PXE config or USB, a node can effectively start its boot process: + +- Download kernel and initrd +- Download and mount the system flists so that the 0-OS daemons can start +- Register itself on the grid +- Query the grid for tasks to execute + +For the PUBLIC side of the Nodes, there are a few things to consider: + +- It's the farmer's job to inform the grid which node gets which IP address, be it IPv4 or IPv6. 
+- Nodes that don't receive an IPv4 address will connect to the IPv4 net through the NATed OOB network +- A farmer is responsible for providing an IPv6 prefix on at least one segment, and for having a Router Advertisement daemon +running to provide SLAAC addressing on that segment. +- That IPv6 Prefix on the public segment should not be firewalled, as it's impossible to know in your firewall what +ports will get exposed for the proxies. + +The Nodes themselves have nothing listening that points into the host OS itself, and are by themselves also firewalled. +In dev mode, there is an ssh server with a key-only login, accessible by a select few ;-) + +## DHCP/Radvd/RA/DHCP6 + +For home networks, there is not much to do: a Node will get an IPv4 Private (RFC1918) address, and most probably an +IPv6 address in a /64 prefix, but is not reachable over IPv6 unless the firewall is disabled for IPv6. As we can't +rely on the fact that that is possible, we assume these nodes to be HIDDEN. + +A normal self-respecting Firewall or IP-capable switch can hand out IP[46] addresses, some can +even bootp/tftp to get nodes booted over the network. +We are (full of hope) assuming that you would have such a beast to configure and split your network +into multiple segments. +A segment is a physical network separation. That can be port-based vlans, or even separate switches, whatever rocks your +boat, the keyword here is **separate**. + +On both segments you will need a way to hand out IPv4 addresses based on MAC addresses of the nodes. Yes, there is some +administration to do, but it's a one-off, and really necessary, because you really need to know which physical machine +has which IP. For lights-out management and location of machines that is a must. + +So you'll need a list of mac addresses to add to your dhcp server for IPv4, to make sure you know which machine has +received which IPv4 address. 
+That is necessary for 2 things: + +- locate the node if something is amiss, like being able to pinpoint a node's disk in case it broke (which it will) +- have the node be reachable all the time, without the need to update the grid and network configs every time the node +boots. + +## What happens under the hood (farmer) + +While we did our utmost best to keep IPv4 address needs to a strict minimum, at least one Node will need an IPv4 address for handling everything that is Overlay Networks. +For Containers to reach the Internet, any type of connectivity will do, be it NAT or through an Internal DMZ that has a +routable IPv4 address. + +Internally, a lot of things are being set up to have a node properly participate in the grid, as well as to be prepared to partake in the User's Overlay Networks. + +A node connects itself to 'the Internet' depending on a few states. + +1. It lives in a fully private network (like it would be connected directly to a port on a home router) + +``` + XX XXX + XXX XXXXXX + X Internet X + XXXXXXX XXXXX + XX XXX + XX X + X+X + | + | + +--------+-----------+ + | HOME / | + | SOHO router | + | | + +--------+-----------+ + | + | Private space IPv4 + | (192.168.1.0/24) + | ++---------+------------+ +| | +| NODE | +| | +| | +| | +| | +| | ++----------------------+ +``` + +2. It lives in a fully public network (like it is connected directly to an uplink and has a public IPv4 address) + +``` + XX XXX + XXX XXXXXX + X Internet X + XXXXXXX XXXXX + XX XXX + XX X + X+X + | + | fully public space ipv4/6 + | 185.69.166.0/24 + | 2a02:1802:5e:0:1000::abcd/64 + | ++---------+------------+ +| | +| NODE | +| | ++----------------------+ + +``` +The node is fully reachable. + +3. It lives in a datacentre, where a farmer manages the Network. 
+ +A little drawing: + +```text ++----------------------------------------------------+ +| switch | +| | +| | ++----------+-------------------------------------+---+ + | | + access | | + mgmt | +---------------+ + vlan | | access + | | public + | | vlan + | | + +-------+---------------------+------+ + | | + | nic1 nic2 | + | | + | | + | | + | NODE | + | | + | | + | | + +------------------------------------+ + +``` + +Or the more elaborate drawing on top that should be sufficient for a sysadmin to comprehend. + +Although: + +- we don't (yet) support nic bonding (next release) +- we don't (yet) support vlans, so your ports on switch/router need to be access ports to vlans to your router/firewall + + +## yeayea, but really ... what now ? + +Ok, what are the constraints? + +A little foreword: +ZosV2 uses IPv6 as its base for networking, where the oldie IPv4 is merely an afterthought. So for it to work properly in its current incarnation (we are working to get it to do IPv4-only too), for now, we need the node to live in a space that provides IPv6 __too__. +IPv4 and IPv6 are very different beasts, so any machine connected to the Internet will do both on the same network. So basically your computer talks 2 different languages when it comes to communicating. That is the same for ZOS, where right now, its mother tongue is IPv6. + +So your zos for V2 can start in different settings: +1) you are a farmer, your ISP can provide you with IPv6 +Ok, you're all set. Aside from a public IPv4 DHCP, you need to run a Stateless-Only SLAAC Router Advertiser (ZOS does NOT do DHCP6). + +1) you are a farmer, your ISP asks you what the hell IPv6 is +That is problematic right now, wait for the next release of ZosV2 + +1) you are a farmer, with only one node, at home, and on your PC https://ipv6.net tells you that you have IPv6. +That means your home router received an IPv6 allocation from the ISP. +You're all set, your node will boot, and register to the grid. 
If you know what you're doing, you can configure your router to allow all IPv6 traffic in forwarding mode to the specific mac address of your node. (we'll explain later) +1) you are a farmer, with a few nodes somewhere that are registered on the grid in V1, but you have no clue if IPv6 is supported where these nodes live +1) you have a ThreefoldToken node at home, and still do not have a clue + +Basically, it also boils down to a few other cases: + +1) the physical network where a node lives has: IPv6 and Private space IPv4 +1) the physical network where a node lives has: IPv6 and Public IPv4 +1) the physical network where a node lives has: only IPv4 + +But it boils down to: call your ISP, ask for IPv6. It's the future, and for your ISP, it's time. There is no way to circumvent it. No way. + + +OK, then, now what. + +1) you're a farmer with a bunch of nodes somewhere in a DC + + - your nodes are connected once (with one NIC) to a switch/router + Then your router will have: + - a segment that carries IPv4 __and__ IPv6: + + - for IPv4, there are 2 possibilities: + - it's RFC1918 (Private space) -> you NAT that subnet (e.g. 
192.168.1.0/24) towards the Public Internet + + - you __will__ have difficulty designating an IPv4 public entrypoint into your farm + - your workloads will only be reachable through the overlay + - your storage will not be reachable + + - you received a small IPv4 range you can utilise (small because IPv4 addresses are scarce, so your ISP will only give you a limited number of pricey IPv4 addresses) + + - things are better, the nodes can live in public IPv4 space, where they can be used as entrypoint + - standard configuration that works + + - for IPv6, your router is a Router Advertiser that provides SLAAC (Stateless, unmanaged) for that segment, working with a /64 prefix + + - the nodes will be reachable over IPv6 + - storage backend will be available for the full grid + - everything will just work + + Best solution for single NIC: + - an IPv6 prefix + - an IPv4 subnet (however small) + + - your nodes have 2 connections, and you want to separate management from user traffic + + - same applies as above, where the best outcome will be obtained with a real IPv6 prefix allocation and a small public subnet that is routable. + - the second NIC (typically 10GBit) will then carry everything public, and the first NIC will just be there for management, living in Private space for IPv4, mostly without IPv6 + - your switch needs to be configured to provide port-based vlans, so the segments are properly separated, and your router needs to reflect that vlan config so that separation is handled by the firewall in the router (iptables, pf, acl, ...) 
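
To make the SLAAC part above a bit more tangible, here is a small sketch of how a node could derive its own address from the /64 prefix announced by the Router Advertiser. It assumes the classic modified EUI-64 scheme (interface ID built from the MAC address); whether a given 0-OS release uses EUI-64 or another interface ID scheme is not stated here, so treat this as an illustration only. The function name is hypothetical, and the prefix and MAC address are sample values.

```python
import ipaddress


def eui64_slaac_address(prefix: str, mac: str) -> ipaddress.IPv6Address:
    """Combine a /64 prefix with the modified EUI-64 interface ID of a MAC."""
    network = ipaddress.IPv6Network(prefix)
    if network.prefixlen != 64:
        raise ValueError("SLAAC needs a /64 prefix")
    octets = bytearray(int(part, 16) for part in mac.split(":"))
    if len(octets) != 6:
        raise ValueError("expected a 48-bit MAC address")
    # Flip the universal/local bit of the first octet.
    octets[0] ^= 0x02
    # Insert ff:fe in the middle to stretch 48 bits to 64 bits.
    interface_id = bytes(octets[:3]) + b"\xff\xfe" + bytes(octets[3:])
    return network.network_address + int.from_bytes(interface_id, "big")


if __name__ == "__main__":
    print(eui64_slaac_address("2a02:1802:5e::/64", "0c:c4:7a:51:e3:6a"))
```

Because the interface ID follows from the MAC address, the same list of MAC addresses used for the IPv4 DHCP leases is enough to predict which IPv6 address each node would get under this scheme.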
+ + + + + diff --git a/collections/documentation/developers/internals/zos/internals/network/attic/exitpoints.md b/collections/documentation/developers/internals/zos/internals/network/attic/exitpoints.md new file mode 100644 index 0000000..efecf3c --- /dev/null +++ b/collections/documentation/developers/internals/zos/internals/network/attic/exitpoints.md @@ -0,0 +1,66 @@ +## Farmers providing transit for Tenant Networks (TN or Network) + +For networks of a user to be reachable, these networks need penultimate Network resources that act as exit nodes for the WireGuard mesh. + +For that, Users need to solicit a routable network from farmers that provide such a service. + +### Global registry for network resources. (`GRNR`?) + +Threefold, through BCDB, should keep a store where Farmers can also register a network service for Tenant Network (TN) reachability. + +In a network transaction, the first thing asked should be where a user wants to purchase their transit. That can be with a nearby (latency or geolocation) Exit Provider (can e.g. be a Farmer), or with an Exit Provider outside of the geolocation for easier routing towards the primary entrypoint. (VPN-like services coming to mind) + +With this, we could envision in a later stage having the Network Resources be IPv6 multihomed with policy-based routing. That adds the possibility to have multiple exit nodes for the same Network, with different IPv6 routes to them. + +### Datastructure + +A registered Farmer can also register his (dc-located?) network to be sold as transit space. For that he registers: + - the IPv4 addresses that can be allocated to exit nodes. + - the IPv6 prefix he obtained to be used in the Grid + - the nodes that will serve as exit nodes. 
+ These nodes need to have IPv[46] access to routable address space through: + - Physical access in an interface of the node + - Access on a public `vlan` or via `vxlan / mpls / gre` + +Together with the registered nodes that will be part of that Public segment, the TNoDB (BCDB) can verify a Network Object containing an ExitPoint for a Network and add it to the queue for ExitNodes to fetch and apply. + +Physically, Nodes can be connected in several ways: + - living directly on the Internet (with a routable IPv4 and/or IPv6 Address) without Provider-enforced firewalling (outgoing traffic only) + - having an IPv4 allocation --and-- an IPv6 allocation + - having a single IPv4 address --and-- a single IPv6 allocation (/64) or even (Oh God Why) a single IPv6 addr. + - living in a Farm that has Nodes only reachable through NAT for IPv4 and no IPv6 + - living in a Farm that has NAT IPv4 and routable IPv6 with an allocation + - living in a single segment having IPv4 RFC1918 and only one IPv6 /64 prefix (home Nodes mostly) + +#### A Network resource allocation. +We define a Network Resource (NR) as a routable IPv6 `/64` Prefix, so every time a new TNo is generated and validated, containing a new serial number and an added/removed NR, there has been a request to obtain a valid IPv6 Prefix (/64) to be added to the TNo. + +Basically it's just a list of the allocations in that prefix that are in use. Any free Prefix will do, as we do routing in the exit nodes with a `/64` granularity. + +The TNoDB (BCDB) then validates/updates the Tenant Network object with that new Network Resource and places it on a queue to be fetched by the interested Nodes. + +#### The Nodes responsible for ExitPoints + +A Node responsible for ExitPoints as well as a Public endpoint will know so because of how it's registered in the TNoDB (BCDB). That is: + - it is defined as an exit node + - the TNoDB hands out an Object that describes its public connectivity. i.e. 
+ : + - the public IPv4 address(es) it can use + - the IPv6 Prefix in the network segment that contains the penultimate default route + - optionally, a Private BGP AS number for announcing the `/64` Prefixes of a Tenant Network, and the BGP peer(s). + +With that information, a Node can then build the Network Namespace from which it builds the Wireguard Interfaces prior to sending them in the ExitPoint Namespace. + +So the TNoDB (BCDB) hands out + - Tenant Network Objects + - Public Interface Objects + +They are related: + - A Node can have Network Resources + - A Network Resource can have (1) Public Interface + - Both are part of a Tenant Network + +A TNo defines a Network where ONLY the ExitPoint is flagged as being one. No more. +When the Node (networkd) needs to set up a Public node, it will need to act differently. + - Verify if the Node is **really** public, if so use standard WG interface setup + - If not, verify if there is already a Public Exit Namespace defined, create WG interface there. + - If there is no Public Exit Namespace, request one, and set it up first. diff --git a/collections/documentation/developers/internals/zos/internals/network/attic/tools.md b/collections/documentation/developers/internals/zos/internals/network/attic/tools.md new file mode 100644 index 0000000..8897bca --- /dev/null +++ b/collections/documentation/developers/internals/zos/internals/network/attic/tools.md @@ -0,0 +1,264 @@ +# Network + +- [How does a farmer configure a node as exit node](#How-does-a-farmer-configure-a-node-as-exit-node) +- [How to create a user private network](#How-to-create-a-user-private-network) + +## How does a farmer configure a node as exit node + +For the network of the grid to work properly, some of the nodes in the grid need to be configured as "exit nodes". An "exit node" is a node that has a publicly accessible IP address and that is responsible for routing IPv6 traffic or proxying IPv4 traffic. 
+ +A farmer who wants to configure one of his nodes as an "exit node" needs to register it in the TNODB. The node will then automatically detect that it has been configured to be an exit node and do the necessary network configuration to start acting as one. + +At the current state of the development, we have a [TNODB mock](../../tools/tnodb_mock) server and a [tffarmer CLI](../../tools/tffarm) tool that can be used to do this configuration. + +Here is an example of how a farmer could register one of his nodes as an "exit node": + +1. Farmer needs to create his farm identity + +```bash +tffarmer register --seed myfarm.seed "mytestfarm" +Farm registered successfully +Name: mytestfarm +Identity: ZF6jtCblLhTgAqp2jvxKkOxBgSSIlrRh1mRGiZaRr7E= +``` + +2. Boot your nodes with your farm identity specified in the kernel parameters. + +Take the farm identity created in step 1 and boot your node with the kernel parameters `farmer_id=` + +for your test farm that would be `farmer_id=ZF6jtCblLhTgAqp2jvxKkOxBgSSIlrRh1mRGiZaRr7E=` + +Once the node is booted, it will automatically register itself as being part of your farm in the [TNODB](../../tools/tnodb_mock) server. + +You can verify that your node registered itself properly by listing all the nodes from the TNODB with a GET request on the `/nodes` endpoint: + +```bash +curl http://tnodb_addr/nodes +[{"node_id":"kV3u7GJKWA7Js32LmNA5+G3A0WWnUG9h+5gnL6kr6lA=","farm_id":"ZF6jtCblLhTgAqp2jvxKkOxBgSSIlrRh1mRGiZaRr7E=","Ifaces":[]}] +``` + +3. Farmer needs to specify his public allocation range to the TNODB + +```bash +tffarmer give-alloc 2a02:2788:0000::/32 --seed myfarm.seed +prefix registered successfully +``` + +4. Configure the public interface of the exit node if needed + +In this step the farmer will tell his node how it needs to connect to the public internet. This configuration depends on the farm network setup, which is why it is up to the farmer to provide the details on how the node needs to configure itself. 
+ +In a first phase, we create the internet access in 2 ways: + +- the node is fully public: you don't need to configure a public interface, you can skip this step +- the node has a management interface and a nic for public + then `configure-public` is required, and the farmer has the public interface connected to a specific public segment with a router to the internet in front. + +```bash +tffarmer configure-public --ip 172.20.0.2/24 --gw 172.20.0.1 --iface eth1 kV3u7GJKWA7Js32LmNA5+G3A0WWnUG9h+5gnL6kr6lA= +#public interface configured on node kV3u7GJKWA7Js32LmNA5+G3A0WWnUG9h+5gnL6kr6lA= +``` + +We still need to figure out a way to get the routes properly installed, we'll do static on the toplevel router for now to do a demo. + +The node is now configured to be used as an exit node. + +5. Mark a node as being an exit node + +The farmer then needs to select which node he agrees to use as an exit node for the grid + +```bash +tffarmer select-exit kV3u7GJKWA7Js32LmNA5+G3A0WWnUG9h+5gnL6kr6lA= +#Node kV3u7GJKWA7Js32LmNA5+G3A0WWnUG9h+5gnL6kr6lA= marked as exit node +``` + +## How to create a user private network + +1. Choose an exit node +2. Request a new allocation from the farm of the exit node + - a GET request on the tnodb_mock at `/allocations/{farm_id}` will give you a new allocation +3. Create the network schema + +Steps 1 and 2 are easy enough to be done even manually, but step 3 requires a deep knowledge of how networking works +as well as of the specific requirements of the 0-OS network system. +This is why we provide a tool that simplifies this process for you, [tfuser](../../tools/tfuser). 
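
To give an idea of what step 3 involves, here is a minimal sketch of one small part of it: carving a `/64` network resource prefix out of the allocation obtained in step 2. It assumes the allocation is a block larger than a `/64` (a `/48` in the example) and that each network resource is identified by an index inside it; how `tfuser` actually picks prefixes is not described here, so the function name and the indexing scheme are assumptions made for illustration.

```python
import ipaddress


def resource_prefix(allocation: str, index: int) -> ipaddress.IPv6Network:
    """Return the index-th /64 inside an allocation (hypothetical helper)."""
    alloc = ipaddress.IPv6Network(allocation)
    if alloc.prefixlen > 64:
        raise ValueError("allocation must be a /64 or larger block")
    available = 1 << (64 - alloc.prefixlen)
    if not 0 <= index < available:
        raise ValueError("index outside of the allocation")
    base = int(alloc.network_address)
    # Each /64 covers 2**64 addresses, so shift the index into the upper half.
    return ipaddress.IPv6Network((base + (index << 64), 64))


if __name__ == "__main__":
    # Sample values only: a /48 allocation and two resource indexes.
    for idx in (0x8AC6, 0xB744):
        print(resource_prefix("2001:b:a::/48", idx))
```

Every node that joins the network gets its own `/64` from the same allocation, which is what allows the exit node to route with `/64` granularity.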
+ +Using tfuser creating a network becomes trivial: + +```bash +# creates a new network with node DLFF6CAshvyhCrpyTHq1dMd6QP6kFyhrVGegTgudk6xk as exit node +# and output the result into network.json +tfuser generate --schema network.json network create --node DLFF6CAshvyhCrpyTHq1dMd6QP6kFyhrVGegTgudk6xk +``` + +network.json will now contains something like: + +```json +{ + "id": "", + "tenant": "", + "reply-to": "", + "type": "network", + "data": { + "network_id": "J1UHHAizuCU6s9jPax1i1TUhUEQzWkKiPhBA452RagEp", + "resources": [ + { + "node_id": { + "id": "DLFF6CAshvyhCrpyTHq1dMd6QP6kFyhrVGegTgudk6xk", + "farmer_id": "7koUE4nRbdsqEbtUVBhx3qvRqF58gfeHGMRGJxjqwfZi", + "reachability_v4": "public", + "reachability_v6": "public" + }, + "prefix": "2001:b:a:8ac6::/64", + "link_local": "fe80::8ac6/64", + "peers": [ + { + "type": "wireguard", + "prefix": "2001:b:a:8ac6::/64", + "Connection": { + "ip": "2a02:1802:5e::223", + "port": 1600, + "key": "PK1L7n+5Fo1znwD/Dt9lAupL19i7a6zzDopaEY7uOUE=", + "private_key": "9220e4e29f0acbf3bd7ef500645b78ae64b688399eb0e9e4e7e803afc4dd72418a1c5196208cb147308d7faf1212758042f19f06f64bad6ffe1f5ed707142dc8cc0a67130b9124db521e3a65e4aee18a0abf00b6f57dd59829f59662" + } + } + ], + "exit_point": true + } + ], + "prefix_zero": "2001:b:a::/64", + "exit_point": { + "ipv4_conf": null, + "ipv4_dnat": null, + "ipv6_conf": { + "addr": "fe80::8ac6/64", + "gateway": "fe80::1", + "metric": 0, + "iface": "public" + }, + "ipv6_allow": [] + }, + "allocation_nr": 0, + "version": 0 + } +} +``` + +Which is a valid network schema. This network only contains a single exit node though, so not really useful. 
+Let's add another node to the network: + +```bash +tfuser generate --schema network.json network add-node --node 4hpUjrbYS4YeFbvLoeSR8LGJKVkB97JyS83UEhFUU3S4 +``` + +result looks like: + +```json +{ + "id": "", + "tenant": "", + "reply-to": "", + "type": "network", + "data": { + "network_id": "J1UHHAizuCU6s9jPax1i1TUhUEQzWkKiPhBA452RagEp", + "resources": [ + { + "node_id": { + "id": "DLFF6CAshvyhCrpyTHq1dMd6QP6kFyhrVGegTgudk6xk", + "farmer_id": "7koUE4nRbdsqEbtUVBhx3qvRqF58gfeHGMRGJxjqwfZi", + "reachability_v4": "public", + "reachability_v6": "public" + }, + "prefix": "2001:b:a:8ac6::/64", + "link_local": "fe80::8ac6/64", + "peers": [ + { + "type": "wireguard", + "prefix": "2001:b:a:8ac6::/64", + "Connection": { + "ip": "2a02:1802:5e::223", + "port": 1600, + "key": "PK1L7n+5Fo1znwD/Dt9lAupL19i7a6zzDopaEY7uOUE=", + "private_key": "9220e4e29f0acbf3bd7ef500645b78ae64b688399eb0e9e4e7e803afc4dd72418a1c5196208cb147308d7faf1212758042f19f06f64bad6ffe1f5ed707142dc8cc0a67130b9124db521e3a65e4aee18a0abf00b6f57dd59829f59662" + } + }, + { + "type": "wireguard", + "prefix": "2001:b:a:b744::/64", + "Connection": { + "ip": "", + "port": 0, + "key": "3auHJw3XHFBiaI34C9pB/rmbomW3yQlItLD4YSzRvwc=", + "private_key": "96dc64ff11d05e8860272b91bf09d52d306b8ad71e5c010c0ccbcc8d8d8f602c57a30e786d0299731b86908382e4ea5a82f15b41ebe6ce09a61cfb8373d2024c55786be3ecad21fe0ee100339b5fa904961fbbbd25699198c1da86c5" + } + } + ], + "exit_point": true + }, + { + "node_id": { + "id": "4hpUjrbYS4YeFbvLoeSR8LGJKVkB97JyS83UEhFUU3S4", + "farmer_id": "7koUE4nRbdsqEbtUVBhx3qvRqF58gfeHGMRGJxjqwfZi", + "reachability_v4": "hidden", + "reachability_v6": "hidden" + }, + "prefix": "2001:b:a:b744::/64", + "link_local": "fe80::b744/64", + "peers": [ + { + "type": "wireguard", + "prefix": "2001:b:a:8ac6::/64", + "Connection": { + "ip": "2a02:1802:5e::223", + "port": 1600, + "key": "PK1L7n+5Fo1znwD/Dt9lAupL19i7a6zzDopaEY7uOUE=", + "private_key": 
"9220e4e29f0acbf3bd7ef500645b78ae64b688399eb0e9e4e7e803afc4dd72418a1c5196208cb147308d7faf1212758042f19f06f64bad6ffe1f5ed707142dc8cc0a67130b9124db521e3a65e4aee18a0abf00b6f57dd59829f59662" + } + }, + { + "type": "wireguard", + "prefix": "2001:b:a:b744::/64", + "Connection": { + "ip": "", + "port": 0, + "key": "3auHJw3XHFBiaI34C9pB/rmbomW3yQlItLD4YSzRvwc=", + "private_key": "96dc64ff11d05e8860272b91bf09d52d306b8ad71e5c010c0ccbcc8d8d8f602c57a30e786d0299731b86908382e4ea5a82f15b41ebe6ce09a61cfb8373d2024c55786be3ecad21fe0ee100339b5fa904961fbbbd25699198c1da86c5" + } + } + ], + "exit_point": false + } + ], + "prefix_zero": "2001:b:a::/64", + "exit_point": { + "ipv4_conf": null, + "ipv4_dnat": null, + "ipv6_conf": { + "addr": "fe80::8ac6/64", + "gateway": "fe80::1", + "metric": 0, + "iface": "public" + }, + "ipv6_allow": [] + }, + "allocation_nr": 0, + "version": 1 + } +} +``` + +Our network schema is now ready, but before we can provision it onto a node, we need to sign it and send it to the bcdb. +To be able to sign it we need to have a pair of key. 
You can use `tfuser id` command to create an identity: + +```bash +tfuser id --output user.seed +``` + +We can now provision the network on both nodes: + +```bash +tfuser provision --schema network.json \ +--node DLFF6CAshvyhCrpyTHq1dMd6QP6kFyhrVGegTgudk6xk \ +--node 4hpUjrbYS4YeFbvLoeSR8LGJKVkB97JyS83UEhFUU3S4 \ +--seed user.seed +``` diff --git a/collections/documentation/developers/internals/zos/internals/network/attic/zostst.dhcp b/collections/documentation/developers/internals/zos/internals/network/attic/zostst.dhcp new file mode 100644 index 0000000..0ac53be --- /dev/null +++ b/collections/documentation/developers/internals/zos/internals/network/attic/zostst.dhcp @@ -0,0 +1,54 @@ +#!/usr/bin/bash + +mgmtnic=( +0c:c4:7a:51:e3:6a +0c:c4:7a:51:e9:e6 +0c:c4:7a:51:ea:18 +0c:c4:7a:51:e3:78 +0c:c4:7a:51:e7:f8 +0c:c4:7a:51:e8:ba +0c:c4:7a:51:e8:0c +0c:c4:7a:51:e7:fa +) + +ipminic=( +0c:c4:7a:4c:f3:b6 +0c:c4:7a:4d:02:8c +0c:c4:7a:4d:02:91 +0c:c4:7a:4d:02:62 +0c:c4:7a:4c:f3:7e +0c:c4:7a:4d:02:98 +0c:c4:7a:4d:02:19 +0c:c4:7a:4c:f2:e0 +) +cnt=1 +for i in ${mgmtnic[*]} ; do +cat << EOF +config host + option name 'zosv2tst-${cnt}' + option dns '1' + option mac '${i}' + option ip '10.5.0.$((${cnt} + 10))' + +EOF +let cnt++ +done + + + +cnt=1 +for i in ${ipminic[*]} ; do +cat << EOF +config host + option name 'ipmiv2tst-${cnt}' + option dns '1' + option mac '${i}' + option ip '10.5.0.$((${cnt} + 100))' + +EOF +let cnt++ +done + +for i in ${mgmtnic[*]} ; do + echo ln -s zoststconf 01-$(echo $i | sed s/:/-/g) +done diff --git a/collections/documentation/developers/internals/zos/internals/network/definitions.md b/collections/documentation/developers/internals/zos/internals/network/definitions.md new file mode 100644 index 0000000..137fe0a --- /dev/null +++ b/collections/documentation/developers/internals/zos/internals/network/definitions.md @@ -0,0 +1,35 @@ +

# Definitions

+ +

## Table of Contents

+ +- [Introduction](#introduction) +- [Node](#node) +- [TNo : Tenant Network Object](#tno--tenant-network-object) +- [NR: Network Resource](#nr-network-resource) + +*** + +## Introduction + +We present definitions of words used throughout the documentation. + +## Node + + TL;DR: Computer. + A Node is a computer with CPU, Memory, Disks (or SSDs, NVMe) connected to _A_ network that has Internet access. (i.e. it can reach www.google.com, just like you on your phone, at home) + That Node will, once it has received an IP address (IPv4 or IPv6), register itself when it's new, or confirm its identity and its online-ness (for lack of a better word). + +## TNo : Tenant Network Object + + TL;DR: The Network Description. + We named it so, because it is a data structure that describes the __whole__ network a user can request (or set up). + That network is a virtualized overlay network. + Basically that means that transfer of data in that network is *always* encrypted, protected from prying eyes, and __resources in that network can only communicate with each other__ **unless** there is a special rule that allows access, be it by allowing access through firewall rules, *and/or* through a proxy (a service that forwards requests on behalf of the client, and ships replies back to it). + +## NR: Network Resource + + TL;DR: the Node-local part of a TNo. + The main building block of a TNo; i.e. each service of a user in a Node lives in an NR. + Each Node hosts User services, whatever type of service that is. Every service in that specific node will always be solely part of the Tenant's Network. (read that twice). + So: A Network Resource is the thing that interconnects all other network resources of the TN (Tenant Network), and provides routing/firewalling for these interconnects, including the default route to the BBI (Big Bad Internet), aka ExitPoint. 
+ All User services that run in a Node are in some way or another connected to the Network Resource (NR), which will provide ip packet forwarding and firewalling to all other network resources (including the Exitpoint) of the TN (Tenant Network) of the user. (read that three times, and the last time, read it slowly and out loud) \ No newline at end of file diff --git a/collections/documentation/developers/internals/zos/internals/network/introduction.md b/collections/documentation/developers/internals/zos/internals/network/introduction.md new file mode 100644 index 0000000..76660fa --- /dev/null +++ b/collections/documentation/developers/internals/zos/internals/network/introduction.md @@ -0,0 +1,87 @@ +

Introduction to Networkd

+ +

Table of Contents

+ +- [Introduction](#introduction) +- [Boot and initial setup](#boot-and-initial-setup) +- [Networkd functionality](#networkd-functionality) +- [Techie talk](#techie-talk) +- [Wireguard explanations](#wireguard-explanations) +- [Caveats](#caveats) + +*** + +## Introduction + +We provide an introduction to Networkd, the network manager of 0-OS. + +## Boot and initial setup + +At boot, be it from a USB stick or PXE, ZOS starts up the kernel, with a few necessary parameters like farm ID and/or possible network parameters, but basically once the kernel has started, [zinit](https://github.com/threefoldtech/zinit), among other things, starts the network initializer. + +In short, that process loops over the available network interfaces and tries to obtain an IP address that also provides for a default gateway. That means: it tries to get Internet connectivity. Without it, ZOS stops there: unable to register itself or start other processes, there would be no use in starting it anyway. + +Once it has obtained Internet connectivity, ZOS can then proceed to make itself known to the Grid, and acknowledge its existence. It will then regularly poll the Grid for tasks. + +Once initialized, with the network daemon running (a process that will handle all things related to networking), ZOS will set up some basic services so that workloads can themselves use that network. + +## Networkd functionality + +The network daemon is in itself responsible for a few tasks, and working together with the [provision daemon](../provision) it mainly sets up the local infrastructure to get the user network resources, together with the wireguard configurations for the user's mesh network. + +The Wireguard mesh is an overlay network. That means that traffic of that network is encrypted and encapsulated in a new traffic frame that then gets transferred over the underlay network, here in essence the network that has been set up during boot of the node.
+ +For users or workloads that run on top of the mesh, the mesh network looks and behaves like any other directly connected network, and as such a workload can reach other workloads or services in that mesh with the added advantage that the traffic is encrypted, protecting services and communications over that mesh from too curious eyes. + +That also means that traffic between workloads on nodes in a farmer's local network is protected even from the farmer himself, in essence protecting the user in case that farmer becomes too curious. + +As the nodes do not have any way to be accessed, be it over the underlying network or even the local console of the node, a user can be sure that his workload cannot be snooped upon. + +## Techie talk + +- **boot and initial setup** +For ZOS to work at all (the network is the computer), it needs an internet connection. That is: it needs to be able to communicate with the BCDB over the internet. +So ZOS starts with that: with the `internet` process, which tries to get the node to receive an IP address. That process will have set up a bridge (`zos`), connected to an interface that is on an Internet-capable network. That bridge will have an IP address that has Internet access. +Also, that bridge is there for future public interfaces into workloads. +Once ZOS can reach the Internet, the rest of the system can be started, where ultimately, the `networkd` daemon is started. + +- **networkd initial setup** +`networkd` starts by taking inventory of the available network interfaces, and registers them to the BCDB (grid database), so that farmers can specify non-standard configs, for instance for multi-nic machines. Once that is done, `networkd` registers itself to the zbus, so it can receive tasks to execute from the provisioning daemon (`provisiond`). +These tasks are mostly setting up network resources for users, where a network resource is a subnet in the user's wireguard mesh.
+ +- **multi-nic setups** + +When a farmer operates nodes in a datacentre, and those nodes have multiple NICs, it is advisable (though not necessary) to put OOB traffic (like initial boot setup) on a different NIC than user traffic (both the overlay network and the outgoing IPv4 NAT for nodes). With such a setup, the farmer will have to make sure the switches are properly configured, more in docs later. + +- **registering and configurations** + +Once a node has booted and properly initialized, registering and configuring the node to be able to accept workloads and their associated network configs is a two-step process. +First, the node registers its live network setup to the BCDB. That is: all NICs with their associated IP addresses and routes are registered, so a farm admin can in a second phase configure any separate NICs to handle different kinds of workloads. +In that second phase, a farm admin can then set up the NICs and their associated IPs manually, so that workloads can start using them. + +## Wireguard explanations + +- **wireguard as pointopoint links and what that means** +Wireguard is a special type of VPN, where every instance is both a server for multiple peers and a client towards multiple peers. That way you can create fanning-out connections as well as receive connections from multiple peers, effectively creating a mesh of connections, like this: ![like so](HIDDEN-PUBLIC.png) + +- **wireguard port management** +Every wireguard point (a network resource point) needs a destination/port combo when it's publicly reachable. The destination is a public IP, but the port is the differentiator. So we need to make sure every network wireguard listening port is unique in the node where it runs, and can be reapplied in case of a node's reboot. +ZOS registers the ports **already in use** to the BCDB, so a user can then pick a port that is not yet used.
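The port rule above can be sketched as follows. This is only an illustration under stated assumptions, not the ZOS implementation: the function name `pick_wireguard_port`, the port range, and the idea that the list of ports already in use on the node has been fetched beforehand are all hypothetical.

```python
def pick_wireguard_port(used_ports, low=1024, high=65535):
    """Return the lowest port in [low, high] that is not yet in use.

    `used_ports` stands in for the ports a node has registered as
    already in use (hypothetical input; how it is fetched is not shown).
    """
    used = set(used_ports)
    for port in range(low, high + 1):
        if port not in used:
            return port
    raise RuntimeError("no free WireGuard listening port left in range")


if __name__ == "__main__":
    # Ports 1024, 1025 and 1027 are taken, so 1026 is the first free one.
    print(pick_wireguard_port({1024, 1025, 1027}))
```

Picking the lowest free port keeps the choice deterministic, which fits the requirement that the same port can be reapplied after a reboot; a random choice would work too, as long as the result is stored with the network definition.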
+ +- **wireguard and hidden nodes** +Hidden nodes are nodes that are in essence hidden behind a firewall, and unreachable from the Internet, be it as an IPv4 NATed host or an IPv6 host that is firewalled in any way, where it's impossible to have connection initiations from the Internet to the node. +As such, these nodes can only partake in a network as client-only towards publicly reachable peers, and can only initiate the connections themselves. (ref previous drawing). +To make sure connectivity stays up, the clients (all) have a keepalive towards all their peers so that communications towards network resources in hidden nodes can be established. + +## Caveats + +- **hidden nodes** +Hidden nodes live (mostly) behind firewalls that keep state about connections and these states have a lifetime. We do our best to keep these communications going, but depending on the firewall your mileage may vary (YMMV ;-)) + +- **local underlay network reachability** +When multiple nodes live in the same hidden network, at the moment we don't try to have the nodes establish connectivity between themselves, so all nodes in that hidden network can only reach each other through the intermediary of a node that is publicly reachable. So to get some performance, a farmer will have to have real routable nodes available in the vicinity. +So for now, a farmer is better off having his nodes really reachable over a public network. + +- **IPv6 and IPv4 considerations** +While the mesh can work over IPv4 __and__ IPv6 at the same time, each peer can only be reached through one protocol at a time. That is, a peer is IPv4 __or__ IPv6, not both. Hence if a peer is reachable over IPv4, the client towards that peer needs to reach it over IPv4 too and thus needs an IPv4 address. +We strongly advise having all nodes properly set up on a routable unfirewalled IPv6 network, so that these problems have no reason to exist.
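The IPv4/IPv6 caveat boils down to an address-family match between a peer's endpoint and the addresses the client has. A minimal sketch, assuming a hypothetical helper `can_reach_peer` and ignoring everything else that affects reachability (firewalls, routing, address scope):

```python
import ipaddress


def can_reach_peer(peer_endpoint_ip, local_ips):
    """Return True if at least one local address has the same IP version
    (4 or 6) as the peer's endpoint address.

    Hypothetical helper for illustration; it only compares address
    families and says nothing about actual connectivity.
    """
    peer_version = ipaddress.ip_address(peer_endpoint_ip).version
    return any(
        ipaddress.ip_address(ip).version == peer_version for ip in local_ips
    )


if __name__ == "__main__":
    # An IPv6-only client cannot use a peer that is published over IPv4.
    print(can_reach_peer("185.69.167.130", ["2a02:1807:1100:10::5"]))
    # A dual-stack client can.
    print(can_reach_peer("185.69.167.130", ["2a02:1807:1100:10::5", "172.16.1.20"]))
```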
diff --git a/collections/documentation/developers/internals/zos/internals/network/mesh.md b/collections/documentation/developers/internals/zos/internals/network/mesh.md new file mode 100644 index 0000000..fd9bb85 --- /dev/null +++ b/collections/documentation/developers/internals/zos/internals/network/mesh.md @@ -0,0 +1,134 @@ +

Zero-Mesh

+ +

Table of Contents

+ +- [What It Is](#what-it-is) +- [Overlay Network](#overlay-network) +- [ZOS networkd](#zos-networkd) +- [Internet reachability per Network Resource](#internet-reachability-per-network-resource) +- [Interworkings](#interworkings) +- [Network Resource Internals](#network-resource-internals) + +*** + +## What It Is + +When a user wants to deploy a workload, whatever that may be, that workload needs connectivity. +If there is just one service to be run, things can be simple, but in general there is more than one service, and they need to interact to provide a full stack. Sometimes these services can live on one node, but mostly these services will be deployed over multiple nodes, in different containers. +The Mesh is created for that, where containers can communicate over an encrypted path, and that network can be specified in terms of IP addresses by the user. + +## Overlay Network + +Zero-Mesh is an overlay network. That requires nodes to have a properly working network with existing access to the Internet in the first place, be it full-blown public access, or behind a firewall/home router that provides for Private IP NAT to the internet. + +Right now Zero-Mesh has support for both, where nodes behind a firewall are HIDDEN nodes, and nodes that are directly connected, be it over IPv6 or IPv4, are 'normal' nodes. +Hidden nodes can thus only participate as client nodes for a specific user Mesh, and all publicly reachable nodes can act as aggregators for hidden clients in that user Mesh. + +Also, a Mesh is static: once it is configured, and thus during the lifetime of the network, there is one node containing the aggregator for Mesh clients that live on hidden nodes. So if an aggregator node has died or is not reachable any more, the mesh needs to be reapplied, with __some__ publicly reachable node as aggregator node.
+ +So it goes a bit like ![this](HIDDEN-PUBLIC.png) +The NR labeled Exit in that graph is the point that Network Resources in Hidden Nodes connect to. These Exit NRs are then the transfer nodes between Hidden NRs. + +## ZOS networkd + +The networkd daemon receives tasks from the provisioning daemon, so that it can create the necessary resources for a Mesh participant in the User Network (a Network Resource - NR). + +A network is defined as a whole by the User, using the tools in the 3bot to generate a proper configuration that can be used by the network daemon. + +What networkd takes care of is the establishment of the mesh itself, in accordance with the configuration a farmer has given to his nodes. What is configured on top of the Mesh is user defined, and applied as such by networkd. + +## Internet reachability per Network Resource + +Every node that participates in a User mesh will also provide Internet access for every network resource. +That means that every NR has the same Internet access as the node itself. It also means, in terms of security, that a firewall in the node takes care of blocking all types of entry to the NR, effectively being an Internet access diode, for outgoing and related traffic only. +In a later phase a user will be able to define some network resource as __sole__ outgoing Internet Access point, but for now that is not yet defined. + +## Interworkings + +So how is that set up? + +Every node participating in a User Network sets up a Network Resource. +Basically, it's a Linux Network Namespace (sort of a network virtual machine) that contains a wireguard interface with a list of other Network Resources it needs to route encrypted packets toward. + +A User Network has a user-defined range, typically a `/16` (like `10.1.0.0/16`). The User then picks a subnet from that range (e.g. `10.1.1.0/24`) for every new NR he wants to participate in that Network.
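The range/subnet relationship described above can be illustrated with Python's standard `ipaddress` module. This is a sketch under assumptions, not how the grid tooling does it: the helper name `next_free_subnet` and the policy of taking the first non-overlapping `/24` are hypothetical.

```python
import ipaddress


def next_free_subnet(network_range, used_subnets, prefixlen=24):
    """Return the first subnet of `network_range` with the given prefix
    length that does not overlap any subnet already assigned to an NR.

    Hypothetical helper: the selection policy (first free subnet) is an
    assumption made for this example.
    """
    network = ipaddress.ip_network(network_range)
    used = [ipaddress.ip_network(s) for s in used_subnets]
    for candidate in network.subnets(new_prefix=prefixlen):
        if not any(candidate.overlaps(u) for u in used):
            return candidate
    raise ValueError(f"no free /{prefixlen} left in {network}")


if __name__ == "__main__":
    # Two NRs already hold 10.1.0.0/24 and 10.1.1.0/24 in a 10.1.0.0/16 network.
    print(next_free_subnet("10.1.0.0/16", ["10.1.0.0/24", "10.1.1.0/24"]))
```

A `/16` range split into `/24` subnets leaves room for 256 Network Resources, each with up to 254 usable container addresses.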
+ +Workloads that are then provisioned are started in a newly created Container, and that container gets a User assigned IP __in__ that subnet of the Network Resource. + +The Network resource itself then handles the routing and firewalling for the containers that are connected to it. Also, the Network Resource takes care of internet connectivity, so that the container can reach out to other services on the Internet. + +![like this](NR_layout.png) + +Also in a later phase, a User will be able to add IPv6 prefixes to his Network Resources, so that containers are reachable over IPv6. + +Fully-routed IPv6 will then be available, where an Exit NR will be the entrypoint towards that network. + +## Network Resource Internals + +Each NR is basically a router for the User Network, but to allow NRs to access the Internet through the Node's local connection, there are some other internal routers to be added. + +Internally it looks like this : + +```text ++------------------------------------------------------------------------------+ +| |wg mesh | +| +-------------+ +-----+-------+ | +| | | | NR cust1 | 100.64.0.123/16 | +| | container +----------+ 10.3.1.0/24 +----------------------+ | +| | cust1 | veth| | public | | +| +-------------+ +-------------+ | | +| | | +| +-------------+ +-------------+ | | +| | | | NR cust200 | 100.64.4.200/16 | | +| | container +----------+ 10.3.1.0/24 +----------------------+ | +| | cust200 | veth| | public | | +| +-------------+ +------+------+ | | +| |wg mesh | | +| 10.101.123.34/16 | | +| +------------+ |tonrs | +| | | +------------------+ | +| | zos +------+ | 100.64.0.1/16 | | +| | | | 10.101.12.231/16| ndmz | | +| +---+--------+ NIC +-----------------------------+ | | +| | | public +------------------+ | +| +--------+------+ | +| | | +| | | ++------------------------------------------------------------------------------+ + | + | + | + | 10.101.0.0/16 10.101.0.1 + 
+------------------+------------------------------------------------------------ + + NAT + -------- + rules NR custA + nft add rule inet nat postrouting oifname public masquerade + nft add rule inet filter input iifname public ct state { established, related } accept + nft add rule inet filter input iifname public drop + + rules NR custB + nft add rule inet nat postrouting oifname public masquerade + nft add rule inet filter input iifname public ct state { established, related } accept + nft add rule inet filter input iifname public drop + + rules ndmz + nft add rule inet nat postrouting oifname public masquerade + nft add rule inet filter input iifname public ct state { established, related } accept + nft add rule inet filter input iifname public drop + + + Routing + + if NR only needs to get out: + ip route add default via 100.64.0.1 dev public + + if an NR wants to use another NR as exitpoint + ip route add default via destnr + with for AllowedIPs 0.0.0.0/0 on that wg peer + +``` + +During startup of the Node, the ndmz is put in place, following the configuration if it has a single internet connection , or that with a dual-nic setup, a separate nic is used for internet access. + +The ndmz network has the carrier-grade nat allocation assigned, so we don'tinterfere with RFC1918 private IPv4 address space, so users can use any of them (and not any of `100.64.0.0/10`, of course) diff --git a/collections/documentation/developers/internals/zos/internals/network/readme.md b/collections/documentation/developers/internals/zos/internals/network/readme.md new file mode 100644 index 0000000..3958cf6 --- /dev/null +++ b/collections/documentation/developers/internals/zos/internals/network/readme.md @@ -0,0 +1,8 @@ +

Zero-OS Networking

+ +

Table of Contents

+ +- [Introduction to networkd](./introduction.md) +- [Vocabulary Definitions](./definitions.md) +- [Wireguard Mesh Details](./mesh.md) +- [Farm Network Setup](./setup_farm_network.md) \ No newline at end of file diff --git a/collections/documentation/developers/internals/zos/internals/network/setup_farm_network.md b/collections/documentation/developers/internals/zos/internals/network/setup_farm_network.md new file mode 100644 index 0000000..23ecc84 --- /dev/null +++ b/collections/documentation/developers/internals/zos/internals/network/setup_farm_network.md @@ -0,0 +1,123 @@ +

Setup

+ +

Table of Contents

+ +- [Introduction](#introduction) +- [Running ZOS (v2) at home](#running-zos-v2-at-home) +- [Running ZOS (v2) in a multi-node farm in a DC](#running-zos-v2-in-a-multi-node-farm-in-a-dc) + - [Necessities](#necessities) + - [IPv6](#ipv6) + - [Routing/firewalling](#routingfirewalling) + - [Multi-NIC Nodes](#multi-nic-nodes) + - [Farmers and the grid](#farmers-and-the-grid) + +*** + +## Introduction + +We present ZOSv2 network considerations. + +Running ZOS on a node is just a matter of booting it with a USB stick, or with a dhcp/bootp/tftp server with the right configuration so that the node can start the OS. +Once it starts booting, the OS detects the NICs, and starts the network configuration. A Node can only complete its boot process once it has effectively received an IP address and a route to the Internet. Without that, the Node will retry indefinitely to obtain Internet access and not finish its startup. + +So a Node needs to be connected to a __wired__ network, providing a dhcp server and a default gateway to the Internet, be it NATed or plainly on the public network, where any route to the Internet, be it IPv4 or IPv6 or both, is sufficient. + +For a node to have the ability to host user networks, we **strongly** advise having a working IPv6 setup, as that is the primary IP stack we're using for the User Network's Mesh to function. + +## Running ZOS (v2) at home + +Running a ZOS Node at home is plain and simple. Connect it to your router, insert the preconfigured USB stick containing the bootloader and the `farmer_id`, and power it on. +You will then see it appear in the Cockpit (`https://cockpit.testnet.grid.tf/capacity`), under your farm.
+ +## Running ZOS (v2) in a multi-node farm in a DC + +Multi-Node Farms, where a farmer wants to host the nodes in a data centre, have basically the same simplicity, but the nodes can boot from a boot server that provides for DHCP, and also delivers the iPXE image to load, without the need for a USB stick in every Node. + +A boot server is not really necessary, but it helps ;-). That server has a list of the MAC addresses of the nodes, and delivers the bootloader over PXE. The farmer is responsible for setting up the network and configuring the boot server. + +### Necessities + +The Farmer needs to: + +- Obtain an IPv6 prefix allocation from the provider. A publicly reachable `/64` will do, but a `/48` is advisable if the farmer wants to provide IPv6 transit for User Networks +- If IPv6 is not an option, obtain an IPv4 subnet from the provider. At least one IPv4 address per node is needed, where all IP addresses are publicly reachable. +- Have the Nodes connected on that public network with a switch so that all Nodes are publicly reachable. +- In case of multiple NICs, also make sure his farm is properly registered in BCDB, so that the Node's public IP Addresses are registered. +- Properly list the MAC addresses of the Nodes, and configure the DHCP server to provide for an IP address, and in case of multiple NICs also provide for private IP addresses over DHCP per Node. +- Make sure that after first boot, the Nodes are reachable. + +### IPv6 + +IPv6, although already a real protocol since '98, has seen reluctant adoption over the time it has existed. That is mostly because ISPs and Carriers were reluctant to deploy it, not seeing the need since the advent of NAT and private IP space, which gave a false impression of security. +But this month (10/2019), RIPE sent a mail to all its LIRs that the last consecutive /22 in IPv4 has been allocated. Needless to say, that makes the transition to IPv6 in 2019 of utmost importance and necessity.
+Hence, ZOS starts with IPv6, and IPv4 is merely an afterthought ;-) +So in a nutshell: we greatly encourage Farmers to have IPv6 on the Node's network. + +### Routing/firewalling + +Basically, the Nodes are self-protecting, in the sense that they provide no means at all to be accessed through listening processes. No service is active on the node itself, and User Networks function solely on an overlay. +That also means that there is no need for a Farm admin to protect the Nodes from exterior access, albeit some DDoS protection might be a good idea. +In the first phase we will still allow the Host OS (ZOS) to reply to ICMP ping requests, but that 'feature' might as well be blocked in the future, as once a Node is able to register itself, there is no real need to ever want to try to reach it. + +### Multi-NIC Nodes + +Nodes that Farmers deploy are typically multi-NIC Nodes, where one NIC (typically a 1GBit NIC) can be used to reach a DHCP server from which the Nodes can boot, and another NIC (1Gbit or even 10GBit) is then used for transfers of User Data, so that there is a clean separation, and injection of bogus data is not possible. + +That means that there would be two networks, either by different physical switches, or by port-based VLANs in the switch (if there is only one). + +- Management NICs + The Management NIC will be used by ZOS to boot, and register itself to the GRID. Also, all communications from the Node to the Grid happen from there. +- Public NICs + +### Farmers and the grid + +A Node, being part of the Grid, has no concept of 'Farmer'. The only relationship for a Node with a Farmer is the fact that that is registered 'somewhere (TM)', and that as such workloads on a Node will be remunerated with Tokens. For the rest, a Node is a wholly stand-alone thing that participates in the Grid.
+ +```text + 172.16.1.0/24 + 2a02:1807:1100:10::/64 ++--------------------------------------+ +| +--------------+ | +-----------------------+ +| |Node ZOS | +-------+ | | +| | +-------------+1GBit +--------------------+ 1GBit switch | +| | | br-zos +-------+ | | +| | | | | | +| | | | | | +| | | | +------------------+----+ +| +--------------+ | | +-----------+ +| | OOB Network | | | +| | +----------+ ROUTER | +| | | | +| | | | +| | | | +| +------------+ | +----------+ | +| | Public | | | | | +| | container | | | +-----+-----+ +| | | | | | +| | | | | | +| +---+--------+ | +-------------------+--------+ | +| | | | 10GBit Switch | | +| br-pub| +-------+ | | | +| +-----+10GBit +-------------------+ | +----------> +| +-------+ | | Internet +| | | | +| | +----------------------------+ ++--------------------------------------+ + 185.69.167.128/26 Public network + 2a02:1807:1100:0::/64 + +``` + +Where the underlay part of the wireguard interfaces get instantiated in the Public container (namespace), and once created these wireguard interfaces get sent into the User Network (Network Resource), where a user can then configure the interface a he sees fit. + +The router of the farmer fulfills 2 roles: + +- NAT everything in the OOB network to the outside, so that nodes can start and register themselves, as well get tasks to execute from the BCDB. +- Route the assigned IPv4 subnet and IPv6 public prefix on the public segment, to which the public container is connected. + +As such, in case that the farmer wants to provide IPv4 public access for grid proxies, the node will need at least one (1) IPv4 address. It's free to the farmer to assign IPv4 addresses to only a part of the Nodes. +On the other hand, it is quite important to have a proper IPv6 setup, because things will work out better. + +It's the Farmer's task to set up the Router and the switches. 
+ +In a simpler setup (small number of nodes for instance), the farmer could setup a single switch and make 2 port-based VLANs to separate OOB and Public, or even wit single-nic nodes, just put them directly on the public segment, but then he will have to provide a DHCP server on the Public network. diff --git a/collections/documentation/developers/internals/zos/internals/network/topology/png/ndmz-dualstack.png b/collections/documentation/developers/internals/zos/internals/network/topology/png/ndmz-dualstack.png new file mode 100644 index 0000000..dc9d508 Binary files /dev/null and b/collections/documentation/developers/internals/zos/internals/network/topology/png/ndmz-dualstack.png differ diff --git a/collections/documentation/developers/internals/zos/internals/network/topology/png/ndmz-hidden.png b/collections/documentation/developers/internals/zos/internals/network/topology/png/ndmz-hidden.png new file mode 100644 index 0000000..d6a462c Binary files /dev/null and b/collections/documentation/developers/internals/zos/internals/network/topology/png/ndmz-hidden.png differ diff --git a/collections/documentation/developers/internals/zos/internals/network/topology/png/nr-join.png b/collections/documentation/developers/internals/zos/internals/network/topology/png/nr-join.png new file mode 100644 index 0000000..a8509b4 Binary files /dev/null and b/collections/documentation/developers/internals/zos/internals/network/topology/png/nr-join.png differ diff --git a/collections/documentation/developers/internals/zos/internals/network/topology/png/nr-step-1.png b/collections/documentation/developers/internals/zos/internals/network/topology/png/nr-step-1.png new file mode 100644 index 0000000..fb077d8 Binary files /dev/null and b/collections/documentation/developers/internals/zos/internals/network/topology/png/nr-step-1.png differ diff --git a/collections/documentation/developers/internals/zos/internals/network/topology/png/nr-step-2.png 
b/collections/documentation/developers/internals/zos/internals/network/topology/png/nr-step-2.png new file mode 100644 index 0000000..1d70324 Binary files /dev/null and b/collections/documentation/developers/internals/zos/internals/network/topology/png/nr-step-2.png differ diff --git a/collections/documentation/developers/internals/zos/internals/network/topology/png/public-namespace.png b/collections/documentation/developers/internals/zos/internals/network/topology/png/public-namespace.png new file mode 100644 index 0000000..9276012 Binary files /dev/null and b/collections/documentation/developers/internals/zos/internals/network/topology/png/public-namespace.png differ diff --git a/collections/documentation/developers/internals/zos/internals/network/topology/png/zos-bridge.png b/collections/documentation/developers/internals/zos/internals/network/topology/png/zos-bridge.png new file mode 100644 index 0000000..908ea0d Binary files /dev/null and b/collections/documentation/developers/internals/zos/internals/network/topology/png/zos-bridge.png differ diff --git a/collections/documentation/developers/internals/zos/internals/network/topology/readme.md b/collections/documentation/developers/internals/zos/internals/network/topology/readme.md new file mode 100644 index 0000000..2ad5937 --- /dev/null +++ b/collections/documentation/developers/internals/zos/internals/network/topology/readme.md @@ -0,0 +1,68 @@ +# On boot +> this is setup by `internet` daemon, which is part of the bootstrap process. + +the first basic network setup is done, the point of this setup is to connect the node to the internet, to be able to continue the rest of the boot process. 
+ +- Go over all **PLUGGED, and PHYSICAL** interfaces +- For each matching interface, the interface is tested if it can get both IPv4 and IPv6 +- If multiple interfaces have been found to receive ipv4 from dhcp, we find the `smallest` ip, with the private gateway IP, otherwise if no private gateway ip found, we only find the one with the smallest IP. +- Once the interface is found we do the following: (we will call this interface **eth**) + - Create a bridge named `zos` + - Disable IPv6 on this bridge, and ipv6 forwarding +- Run `udhcpc` on zos bridge +![zos-bridge](png/zos-bridge.png) + +Once this setup complete, the node now has access to the internet which allows it to download and run `networkd` which takes over the network stack and continue the process as follows. + +# Network Daemon +- Validate zos setup created by the `internet` on boot daemon +- Send information about all local nics to the explorer (?) + +## Setting up `ndmz` +First we need to find the master interface for ndmz, we have the following cases: +- master of `public_config` if set. Public Config is an external configuration that is set by the farmer on the node object. that information is retrieved by the node from the public explorer. +- otherwise (if public_config is not set) check if the public namespace is set (i think that's a dead branch because if this exist (or can exist) it means the master is always set. which means it will get used always. +- otherwise find first interface with ipv6 +- otherwise check if zos has global unicast ipv6 +- otherwise hidden node (still uses zos but in hidden node setup) + +### Hidden node ndmz +![ndmz-hidden](png/ndmz-hidden.png) + +### Dualstack ndmz +![ndmz-dualstack](png/ndmz-dualstack.png) + +## Setting up Public Config +this is an external configuration step that is configured by the farmer on the node object. The node then must have setup in the explorer. 
+ +![public-namespace](png/public-namespace.png) + +## Setting up Yggdrasil +- Get a list of all public peers with status `up` +- If hidden node: + - Find peers with IPv4 addresses +- If dual stack node: + - Filter out all peers with same prefix as the node, to avoid connecting locally only +- write down yggdrasil config, and start yggdrasil daemon via zinit +- yggdrasil runs inside the ndmz namespace +- add an ipv6 address to npub in the same prefix as yggdrasil. this way when npub6 is used as a gateway for this prefix, traffic +will be routed through yggdrasil. + +# Creating a network resource +A network resource (`NR` for short) as a user private network that lives on the node and can span multiple nodes over wireguard. When a network is deployed the node builds a user namespace as follows: +- A unique network id is generated by md5sum(user_id + network_name) then only take first 13 bytes. We will call this `net-id`. + +![nr-1](png/nr-step-1.png) + +## Create the wireguard interface +if the node has `public_config` so the `public` namespace exists. then the wireguard device is first created inside the `public` namespace then moved +to the network-resource namespace. + +Otherwise, the port is created on the host namespace and then moved to the network-resource namespace. 
The final result is + +![nr-2](png/nr-step-2.png) + +Finally the wireguard peer list is applied and configured, routing rules is also configured to route traffic to the wireguard interface + +# Member joining a user network (network resource) +![nr-join](png/nr-join.png) diff --git a/collections/documentation/developers/internals/zos/internals/network/topology/uml/ndmz-dualstack.wsd b/collections/documentation/developers/internals/zos/internals/network/topology/uml/ndmz-dualstack.wsd new file mode 100644 index 0000000..8b3d6eb --- /dev/null +++ b/collections/documentation/developers/internals/zos/internals/network/topology/uml/ndmz-dualstack.wsd @@ -0,0 +1,57 @@ +@startuml +[zos\nbridge] as zos +[br-pub\nbridge] as brpub +[br-ndmz\nbridge] as brndmz +note top of brndmz +disable ipv6 +- net.ipv6.conf.br-ndmz.disable_ipv6 = 1 +end note +' brpub -left- zos : veth pair\n(tozos) +brpub -down- master +note right of master +master is found as described +in the readme (this can be zos bridge) +in case of a single node machine +end note + +package "ndmz namespace" { + [tonrs\nmacvlan] as tonrs + note bottom of tonrs + - net.ipv4.conf.tonrs.proxy_arp = 0 + - net.ipv6.conf.tonrs.disable_ipv6 = 0 + + Addresses: + 100.127.0.1/16 + fe80::1/64 + fd00::1 + end note + tonrs - brndmz: macvlan + + [npub6\nmacvlan] as npub6 + npub6 -down- brpub: macvlan + + [npub4\nmacvlan] as npub4 + npub4 -down- zos: macvlan + + note as MAC + gets static mac address generated + from node id. to make sure it receives + same ip address. + end note + + MAC .. npub4 + MAC .. 
npub6 + + note as setup + - net.ipv6.conf.all.forwarding = 1 + end note + + [ygg0] + note bottom of ygg0 + this will be added by yggdrasil setup + in the next step + end note +} + +footer (hidden node) no master with global unicast ipv6 found +@enduml diff --git a/collections/documentation/developers/internals/zos/internals/network/topology/uml/ndmz-hidden.wsd b/collections/documentation/developers/internals/zos/internals/network/topology/uml/ndmz-hidden.wsd new file mode 100644 index 0000000..05304dc --- /dev/null +++ b/collections/documentation/developers/internals/zos/internals/network/topology/uml/ndmz-hidden.wsd @@ -0,0 +1,55 @@ +@startuml +[zos\nbridge] as zos +note left of zos +current select master +for hiddent ndmz setup +end note +[br-pub\nbridge] as brpub +[br-ndmz\nbridge] as brndmz +note top of brndmz +disable ipv6 +- net.ipv6.conf.br-ndmz.disable_ipv6 = 1 +end note +brpub -left- zos : veth pair\n(tozos) + +package "ndmz namespace" { + [tonrs\nmacvlan] as tonrs + note bottom of tonrs + - net.ipv4.conf.tonrs.proxy_arp = 0 + - net.ipv6.conf.tonrs.disable_ipv6 = 0 + + Addresses: + 100.127.0.1/16 + fe80::1/64 + fd00::1 + end note + tonrs - brndmz: macvlan + + [npub6\nmacvlan] as npub6 + npub6 -right- brpub: macvlan + + [npub4\nmacvlan] as npub4 + npub4 -down- zos: macvlan + + note as MAC + gets static mac address generated + from node id. to make sure it receives + same ip address. + end note + + MAC .. npub4 + MAC .. 
npub6 + + note as setup + - net.ipv6.conf.all.forwarding = 1 + end note + + [ygg0] + note bottom of ygg0 + this will be added by yggdrasil setup + in the next step + end note +} + +footer (hidden node) no master with global unicast ipv6 found +@enduml diff --git a/collections/documentation/developers/internals/zos/internals/network/topology/uml/nr-join.wsd b/collections/documentation/developers/internals/zos/internals/network/topology/uml/nr-join.wsd new file mode 100644 index 0000000..0c54b18 --- /dev/null +++ b/collections/documentation/developers/internals/zos/internals/network/topology/uml/nr-join.wsd @@ -0,0 +1,23 @@ +@startuml + +component "br-pub" as public +component "b-\nbridge" as bridge +package " namespace" { + component eth0 as eth + note right of eth + set ip as configured in the reservation + it must be in the subnet assinged to n- + in the user resource above. + - set default route through n- + end note + eth .. bridge: veth + + component [pub\nmacvlan] as pub + pub .. public + + note right of pub + only if public ipv6 is requests + also gets a consistent MAC address + end note +} +@enduml diff --git a/collections/documentation/developers/internals/zos/internals/network/topology/uml/nr-step-1.wsd b/collections/documentation/developers/internals/zos/internals/network/topology/uml/nr-step-1.wsd new file mode 100644 index 0000000..a739c60 --- /dev/null +++ b/collections/documentation/developers/internals/zos/internals/network/topology/uml/nr-step-1.wsd @@ -0,0 +1,31 @@ +@startuml +component [b-] as bridge +note left of bridge +- net.ipv6.conf.b-.disable_ipv6 = 1 +end note + +package "n- namespace" { + component [n-\nmacvlan] as nic + bridge .. nic: macvlan + + note bottom of nic + - nic gets the first ip ".1" in the assigned + user subnet. 
+ - an ipv6 driven from ipv4 that is driven from the assigned ipv4 + - fe80::1/64 + end note + component [public\nmacvlan] as public + note bottom of public + - gets an ipv4 in 100.127.0.9/16 range + - get an ipv6 in the fd00::/64 prefix + - route over 100.127.0.1 + - route over fe80::1/64 + end note + note as G + - net.ipv6.conf.all.forwarding = 1 + end note +} + +component [br-ndmz] as brndmz +brndmz .. public: macvlan +@enduml diff --git a/collections/documentation/developers/internals/zos/internals/network/topology/uml/nr-step-2.wsd b/collections/documentation/developers/internals/zos/internals/network/topology/uml/nr-step-2.wsd new file mode 100644 index 0000000..6cfcb68 --- /dev/null +++ b/collections/documentation/developers/internals/zos/internals/network/topology/uml/nr-step-2.wsd @@ -0,0 +1,33 @@ +@startuml +component [b-] as bridge +note left of bridge +- net.ipv6.conf.b-.disable_ipv6 = 1 +end note + +package "n- namespace" { + component [n-\nmacvlan] as nic + bridge .. nic: macvlan + + note bottom of nic + - nic gets the first ip ".1" in the assigned + user subnet. + - an ipv6 driven from ipv4 that is driven from the assigned ipv4 + - fe80::1/64 + end note + component [public\nmacvlan] as public + note bottom of public + - gets an ipv4 in 100.127.0.9/16 range + - get an ipv6 in the fd00::/64 prefix + - route over 100.127.0.1 + - route over fe80::1/64 + end note + note as G + - net.ipv6.conf.all.forwarding = 1 + end note + component [w-\nwireguard] +} + + +component [br-ndmz] as brndmz +brndmz .. 
public: macvlan +@enduml diff --git a/collections/documentation/developers/internals/zos/internals/network/topology/uml/public-namespace.wsd b/collections/documentation/developers/internals/zos/internals/network/topology/uml/public-namespace.wsd new file mode 100644 index 0000000..215152c --- /dev/null +++ b/collections/documentation/developers/internals/zos/internals/network/topology/uml/public-namespace.wsd @@ -0,0 +1,29 @@ +@startuml + +() "br-pub (Public Bridge)" as brpub + +note bottom of brpub +This bridge is always created on boot, and is either +connected to the zos bridge (in single nic setup). +or to the seond nic with public IPv6 (in dual nic setup) +end note + + +package "public namespace" { + + [public\nmacvlan] as public + public -down- brpub: macvlan + note right of public + - have a static mac generated from node id + - set the ips as configured + - set the default gateways as configured + end note + + note as global + inside namespace + - net.ipv6.conf.all.accept_ra = 2 + - net.ipv6.conf.all.accept_ra_defrtr = 1 + end note +} + +@enduml diff --git a/collections/documentation/developers/internals/zos/internals/network/topology/uml/zos-bridge.wsd b/collections/documentation/developers/internals/zos/internals/network/topology/uml/zos-bridge.wsd new file mode 100644 index 0000000..2328f00 --- /dev/null +++ b/collections/documentation/developers/internals/zos/internals/network/topology/uml/zos-bridge.wsd @@ -0,0 +1,16 @@ +@startuml +() eth +[zos] +eth -up- zos +note left of zos +bridge takes same mac address as eth +(ipv6 is enabled on the bridge) +- net.ipv6.conf.zos.disable_ipv6 = 0 +end note +note left of eth +disable ipv6 on interface: +(ipv6 is disabled on the nic) +- net.ipv6.conf..disable_ipv6 = 1 +- net.ipv6.conf.all.forwarding = 0 +end note +@enduml diff --git a/collections/documentation/developers/internals/zos/internals/network/yggdrasil.md b/collections/documentation/developers/internals/zos/internals/network/yggdrasil.md new file mode 
100644 index 0000000..7b3c189 --- /dev/null +++ b/collections/documentation/developers/internals/zos/internals/network/yggdrasil.md @@ -0,0 +1,25 @@ +# Yggdrasil integration in 0-OS + +Since day one, 0-OS v2 networking has been designed around IPv6. The goal was to avoid having to deal with exhausted IPv4 addresses and to be ready for the future. + +While this decision makes sense in the long term, it poses trouble in the short term for farmers that only have access to IPv4 and are unable to ask their ISP for an upgrade. + +In order to allow these IPv4-only nodes to join the grid, another overlay network has to be created between all the nodes. To achieve this, Yggdrasil has been selected. + +## Yggdrasil + +The [Yggdrasil network project](https://yggdrasil-network.github.io/) has been selected to be integrated into 0-OS. All 0-OS nodes run a yggdrasil daemon, which means all 0-OS nodes can now communicate over the yggdrasil network. The yggdrasil integration is an experiment planned in multiple phases: + +Phase 1: Allow 0-DB containers to be exposed over the yggdrasil network. Implemented in v0.3.5 +Phase 2: Allow containers to request an interface with a yggdrasil IP address. + +## networkd bootstrap + +When booting, networkd will wait for 2 minutes to receive an IPv6 address through router advertisement for its `npub6` interface in the ndmz network namespace. +If no IPv6 address is received after 2 minutes, networkd will consider the node to be an IPv4-only node, switch to this mode and continue booting. + +### 0-DB containers + +For IPv4-only nodes, the 0-DB container will be exposed on top of a yggdrasil IPv6 address. Since all 0-OS nodes also run yggdrasil, these 0-DB containers will always be reachable from any container in the grid. + +For dual stack nodes, the 0-DB container will also get a yggdrasil IP in addition to the already present public IPv6. 
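
The bootstrap decision described under "networkd bootstrap" can be sketched as a timed wait. This is a minimal illustration, not the actual networkd code: the names `Mode`, `DetectMode` and `feed` are hypothetical, and the stream of addresses is assumed to come from whatever watches the `npub6` interface.

```go
package main

import (
	"fmt"
	"net"
	"time"
)

// Mode is a hypothetical label for the outcome of the bootstrap check.
type Mode string

const (
	DualStack Mode = "dual-stack"
	IPv4Only  Mode = "ipv4-only"
)

// DetectMode waits up to wait for a global IPv6 address to arrive on addrs.
// It returns DualStack as soon as one is seen, and IPv4Only once the wait
// expires or the channel is closed without one.
func DetectMode(addrs <-chan net.IP, wait time.Duration) Mode {
	timer := time.NewTimer(wait)
	defer timer.Stop()
	for {
		select {
		case ip, ok := <-addrs:
			if !ok {
				return IPv4Only
			}
			// To4() is nil for IPv6 addresses; link-local ones are skipped.
			if ip.To4() == nil && ip.IsGlobalUnicast() {
				return DualStack
			}
		case <-timer.C:
			return IPv4Only
		}
	}
}

// feed builds a pre-filled, closed channel from textual addresses (demo helper).
func feed(addrs ...string) <-chan net.IP {
	ch := make(chan net.IP, len(addrs))
	for _, a := range addrs {
		ch <- net.ParseIP(a)
	}
	close(ch)
	return ch
}

func main() {
	fmt.Println(DetectMode(feed("fe80::1", "2001:db8::1"), 2*time.Minute))
	fmt.Println(DetectMode(feed("192.0.2.10"), 2*time.Minute))
}
```

The 2 minute value mirrors the text above; everything else in the sketch is an assumption made for illustration.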
\ No newline at end of file diff --git a/collections/documentation/developers/internals/zos/internals/network/zbus.md b/collections/documentation/developers/internals/zos/internals/network/zbus.md new file mode 100644 index 0000000..c2b7a2a --- /dev/null +++ b/collections/documentation/developers/internals/zos/internals/network/zbus.md @@ -0,0 +1,46 @@ +# Network module + +## ZBus + +Network module is available on zbus over the following channel + +| module | object | version | +|--------|--------|---------| +| network|[network](#interface)| 0.0.1| + +## Home Directory + +network keeps some data in the following locations +| directory | path| +|----|---| +| root| `/var/cache/modules/network`| + + +## Interface + +```go +//Networker is the interface for the network module +type Networker interface { + // Create a new network resource + CreateNR(Network) (string, error) + // Delete a network resource + DeleteNR(Network) error + + // Join a network (with network id) will create a new isolated namespace + // that is hooked to the network bridge with a veth pair, and assign it a + // new IP from the network resource range. The method return the new namespace + // name. 
+ // The member name specifies the name of the member, and must be unique + // The NetID is the network id to join + Join(networkdID NetID, containerID string, addrs []string) (join Member, err error) + + // ZDBPrepare creates a network namespace with a macvlan interface in it + // to allow the 0-db container to be publicly accessible + // it returns the name of the network namespace created + ZDBPrepare() (string, error) + + // Addrs returns the IP addresses of the interface + // if the interface is in a network namespace, netns must not be empty + Addrs(iface string, netns string) ([]net.IP, error) +} +``` \ No newline at end of file diff --git a/collections/documentation/developers/internals/zos/internals/node/readme.md b/collections/documentation/developers/internals/zos/internals/node/readme.md new file mode 100644 index 0000000..0679bd7 --- /dev/null +++ b/collections/documentation/developers/internals/zos/internals/node/readme.md @@ -0,0 +1,50 @@ +

Node Module

+ +

Table of Contents

+ +- [Introduction](#introduction) +- [Zbus](#zbus) +- [Example](#example) + +*** + +## Introduction + +This module is responsible for registering the node on the grid and for handling grid events. The node daemon broadcasts the intended events on zbus for other modules that are interested in those events. + +The node also provides zbus interfaces to query some of the node information. + +## Zbus + +Node module is available on [zbus](https://github.com/threefoldtech/zbus) over the following channel + +| module | object | version | +|--------|--------|---------| +|host |host| 0.0.1| +|system |system| 0.0.1| +|events |events| 0.0.1| + +## Example + +```go + +//SystemMonitor interface (provided by noded) +type SystemMonitor interface { + NodeID() uint32 + Memory(ctx context.Context) <-chan VirtualMemoryStat + CPU(ctx context.Context) <-chan TimesStat + Disks(ctx context.Context) <-chan DisksIOCountersStat + Nics(ctx context.Context) <-chan NicsIOCounterStat +} + +// HostMonitor interface (provided by noded) +type HostMonitor interface { + Uptime(ctx context.Context) <-chan time.Duration +} + +// Events interface +type Events interface { + PublicConfigEvent(ctx context.Context) <-chan PublicConfigEvent + ContractCancelledEvent(ctx context.Context) <-chan ContractCancelledEvent +} +``` diff --git a/collections/documentation/developers/internals/zos/internals/provision/readme.md b/collections/documentation/developers/internals/zos/internals/provision/readme.md new file mode 100644 index 0000000..c82be5c --- /dev/null +++ b/collections/documentation/developers/internals/zos/internals/provision/readme.md @@ -0,0 +1,35 @@ +

Provision Module

+ +

Table of Contents

+ +- [ZBus](#zbus) +- [Introduction](#introduction) +- [Supported workload](#supported-workload) + + +*** + +## ZBus + +This module is an autonomous module and is not reachable over `zbus`. + +## Introduction + +This module is responsible for provisioning and decommissioning workloads on the node. + +It accepts new deployments over `rmb` and tries to bring them to reality by running a series of provisioning workflows based on the workload `type`. + +`provisiond` knows about all available daemons and contacts them over `zbus` to ask for the needed services. It then pulls everything together and updates the deployment with the workload state. + +If the node was restarted, `provisiond` tries to bring all active workloads back to their original state. +## Supported workload + +0-OS currently supports 8 types of workloads: +- network +- `zmachine` (virtual machine) +- `zmount` (disk): usable only by a `zmachine` +- `public-ip` (v4 and/or v6): usable only by a `zmachine` +- [`zdb`](https://github.com/threefoldtech/0-DB) `namespace` +- [`qsfs`](https://github.com/threefoldtech/quantum-storage) +- `zlogs` +- `gateway` diff --git a/collections/documentation/developers/internals/zos/internals/storage/readme.md b/collections/documentation/developers/internals/zos/internals/storage/readme.md new file mode 100644 index 0000000..8114ec1 --- /dev/null +++ b/collections/documentation/developers/internals/zos/internals/storage/readme.md @@ -0,0 +1,153 @@ +

Storage Module

+ +

Table of Contents

+ +- [Introduction](#introduction) +- [ZBus](#zbus) +- [Overview](#overview) +- [List of sub-modules](#list-of-sub-modules) +- [On Node Booting](#on-node-booting) + - [zinit unit](#zinit-unit) + - [Interface](#interface) + +*** + +## Introduction + +This module is responsible for managing everything related to storage. + +## ZBus + +Storage module is available on zbus over the following channel + +| module | object | version | +|--------|--------|---------| +| storage|[storage](#interface)| 0.0.1| + + +## Overview + +On start, storaged takes ownership of all node disks and separates them into 2 different sets: + +- SSD Storage: For each SSD disk available, a storage pool of type SSD is created +- HDD Storage: For each HDD disk available, a storage pool of type HDD is created + + +Then `storaged` can provide the following storage primitives: +- `subvolume`: (with quota). The btrfs subvolume can be used by `flistd` to support read-write operations on flists. Hence it can be used as rootfs for containers and VMs. This storage primitive is only supported on `ssd` pools. + - On boot, storaged will always create a permanent subvolume with id `zos-cache` (of 100G) which will be used by the system to persist state and to hold a cache of downloaded files. +- `vdisk`: Virtual disk that can be attached to virtual machines. This is only possible on `ssd` pools. +- `device`: A full disk that gets allocated and used by a single `0-db` service. Note that a single 0-db instance can serve multiple zdb namespaces for multiple users. This is only possible on `hdd` pools. + +It follows that ZOS can work fine with no HDD (although it will not be able to serve zdb workloads), but not without SSD. Hence a ZOS node with no SSD will never register on the grid. 
+ +## List of sub-modules + +- disks +- 0-db +- booting + +## On Node Booting + +When the module boots: + +- Make sure to mount all available pools +- Scan available disks that are not used by any pool and create new pools on those disks. (all pools now are created with `RaidSingle` policy) +- Try to find and mount a cache sub-volume under /var/cache. +- If no cache sub-volume is available a new one is created and then mounted. + +### zinit unit + +The zinit unit file of the module specifies the command line, the test command, and the order in which the services need to be booted. + +The storage module is a dependency for almost all other system modules, hence it has high boot precedence (calculated on boot by zinit based on the configuration). + +The storage module is considered running if (and only if) /var/cache is ready + +```yaml +exec: storaged +test: mountpoint /var/cache +``` + +### Interface + +```go + +// StorageModule is the storage subsystem interface +// this should allow you to work with the following types of storage medium +// - full disks (device) (these are used by zdb) +// - subvolumes these are used as a read-write layers for 0-fs mounts +// - vdisks are used by zmachines +// this works as following: +// a storage module maintains a list of ALL disks on the system +// separated in 2 sets of pools (SSDs, and HDDs) +// ssd pools can only be used for +// - subvolumes +// - vdisks +// hdd pools are only used by zdb as one disk +type StorageModule interface { + // Cache method return information about zos cache volume + Cache() (Volume, error) + + // Total gives the total amount of storage available for a device type + Total(kind DeviceType) (uint64, error) + // BrokenPools lists the broken storage pools that have been detected + BrokenPools() []BrokenPool + // BrokenDevices lists the broken devices that have been detected + BrokenDevices() []BrokenDevice + //Monitor returns stats stream about pools + Monitor(ctx context.Context) <-chan PoolsStats + + 
// Volume management + + // VolumeCreate creates a new volume + VolumeCreate(name string, size gridtypes.Unit) (Volume, error) + + // VolumeUpdate updates the size of an existing volume + VolumeUpdate(name string, size gridtypes.Unit) error + + // VolumeLookup return volume information for given name + VolumeLookup(name string) (Volume, error) + + // VolumeDelete deletes a volume by name + VolumeDelete(name string) error + + // VolumeList list all volumes + VolumeList() ([]Volume, error) + + // Virtual disk management + + // DiskCreate creates a virtual disk given name and size + DiskCreate(name string, size gridtypes.Unit) (VDisk, error) + + // DiskResize resizes the disk to given size + DiskResize(name string, size gridtypes.Unit) (VDisk, error) + + // DiskWrite writes the given raw image to disk + DiskWrite(name string, image string) error + + // DiskFormat makes sure disk has filesystem, if it already formatted nothing happens + DiskFormat(name string) error + + // DiskLookup looks up vdisk by name + DiskLookup(name string) (VDisk, error) + + // DiskExists checks if disk exists + DiskExists(name string) bool + + // DiskDelete deletes a disk + DiskDelete(name string) error + + DiskList() ([]VDisk, error) + // Device management + + //Devices list all "allocated" devices + Devices() ([]Device, error) + + // DeviceAllocate allocates a new device (formats and give a new ID) + DeviceAllocate(min gridtypes.Unit) (Device, error) + + // DeviceLookup inspects a previously allocated device + DeviceLookup(name string) (Device, error) +} +``` diff --git a/collections/documentation/developers/internals/zos/internals/vmd/readme.md b/collections/documentation/developers/internals/zos/internals/vmd/readme.md new file mode 100644 index 0000000..d30fa3e --- /dev/null +++ b/collections/documentation/developers/internals/zos/internals/vmd/readme.md @@ -0,0 +1,66 @@ +

VMD Module

+ +

Table of Contents

+ +- [ZBus](#zbus) +- [Home Directory](#home-directory) +- [Introduction](#introduction) + - [zinit unit](#zinit-unit) +- [Interface](#interface) + +*** + +## ZBus + +VMD module is available on zbus over the following channel + +| module | object | version | +|--------|--------|---------| +| vmd|[vmd](#interface)| 0.0.1| + +## Home Directory + +contd keeps some data in the following locations +| directory | path| +|----|---| +| root| `/var/cache/modules/containerd`| + +## Introduction + +The vmd module manages all virtual machine processes. It provides the interface to create, inspect, and delete virtual machines. It also monitors the VMs to make sure they are re-spawned if they crash. Internally it uses `cloud-hypervisor` to start the VM processes. + +It also provides the interface to configure VM log streamers. + +### zinit unit + +`contd` must run after containerd is running, and the node boot process is complete. Since it doesn't keep state, no dependency on `storaged` is needed + +```yaml +exec: vmd --broker unix:///var/run/redis.sock +after: + - boot + - networkd +``` + +## Interface + +```go + +// VMModule defines the virtual machine module interface +type VMModule interface { + Run(vm VM) error + Inspect(name string) (VMInfo, error) + Delete(name string) error + Exists(name string) bool + Logs(name string) (string, error) + List() ([]string, error) + Metrics() (MachineMetrics, error) + + // VM Log streams + + // StreamCreate creates a stream for vm `name` + StreamCreate(name string, stream Stream) error + // delete stream by stream id. + StreamDelete(id string) error +} +``` diff --git a/collections/documentation/developers/internals/zos/manual/api.md b/collections/documentation/developers/internals/zos/manual/api.md new file mode 100644 index 0000000..1ffdad4 --- /dev/null +++ b/collections/documentation/developers/internals/zos/manual/api.md @@ -0,0 +1,273 @@ +

API

+ +

Table of Contents

+ +- [Introduction](#introduction) +- [Deployments](#deployments) + - [Deploy](#deploy) + - [Update](#update) + - [Get](#get) + - [Changes](#changes) + - [Delete](#delete) +- [Statistics](#statistics) +- [Storage](#storage) + - [List separate pools with capacity](#list-separate-pools-with-capacity) +- [Network](#network) + - [List Wireguard Ports](#list-wireguard-ports) + - [Supports IPV6](#supports-ipv6) + - [List Public Interfaces](#list-public-interfaces) + - [List Public IPs](#list-public-ips) + - [Get Public Config](#get-public-config) +- [Admin](#admin) + - [List Physical Interfaces](#list-physical-interfaces) + - [Get Public Exit NIC](#get-public-exit-nic) + - [Set Public Exit NIC](#set-public-exit-nic) +- [System](#system) + - [Version](#version) + - [DMI](#dmi) + - [Hypervisor](#hypervisor) +- [GPUs](#gpus) + - [List Gpus](#list-gpus) + + +*** + +## Introduction + +This document lists all the actions available on the node public API, which is available over [RMB](https://github.com/threefoldtech/rmb-rs). + +The node is always reachable over the node twin id as per the node object on tfchain. Once the node twin is known, a [client](https://github.com/threefoldtech/zos/blob/main/client/node.go) can be initiated and used to talk to the node. + +## Deployments + +### Deploy + +| command |body| return| +|---|---|---| +| `zos.deployment.deploy` | [Deployment](https://github.com/threefoldtech/zos/blob/main/pkg/gridtypes/deployment.go)|-| + +The deployment needs to have a valid signature, and the contract must exist on chain with the same contract hash as the deployment. + +### Update + +| command |body| return| +|---|---|---| +| `zos.deployment.update` | [Deployment](https://github.com/threefoldtech/zos/blob/main/pkg/gridtypes/deployment.go)|-| + +The update call updates (modifies) an already existing deployment with a new definition. The deployment must already exist on the node, and the contract must have the same new hash as the provided deployment, plus valid versions. 
+ +> TODO: need more details over the deployment update calls how to handle the version + +### Get + +| command |body| return| +|---|---|---| +| `zos.deployment.get` | `{contract_id: }`|[Deployment](https://github.com/threefoldtech/zos/blob/main/pkg/gridtypes/deployment.go)| + +### Changes + +| command |body| return| +|---|---|---| +| `zos.deployment.changes` | `{contract_id: }`| `[]Workloads` | + +Where: + +- [workload](https://github.com/threefoldtech/zos/blob/main/pkg/gridtypes/workload.go) + +The list contains all deployment workload changes, which means a workload can (and will) appear +multiple times in this list, once for each time its state changes. + +A workload first appears in the `init` state. The next entry shows the state change (with time) to the next state, which can be success or failure, and so on. +This happens for each workload in the deployment. + +### Delete +> +> You probably never need to call this command yourself. The node will delete the deployment once the contract is cancelled on the chain. 
+ +| command |body| return| +|---|---|---| +| `zos.deployment.delete` | `{contract_id: }`|-| + +## Statistics + +| command |body| return| +|---|---|---| +| `zos.statistics.get` | - |`{total: Capacity, used: Capacity, system: Capacity}`| + +Where: + +```json +Capacity { + "cru": "uint64", + "sru": "bytes", + "hru": "bytes", + "mru": "bytes", + "ipv4u": "uint64", +} +``` + +> Note that `used` capacity equals the full workload reserved capacity PLUS the system reserved capacity, +so `used = user_used + system`, while `system` is only the amount of resources reserved by `zos` itself + +## Storage + +### List separate pools with capacity + +| command |body| return| +|---|---|---| +| `zos.storage.pools` | - |`[]Pool`| + +Lists all node pools with their types, size and used space, +where + +```json +Pool { + "name": "pool-id", + "type": "(ssd|hdd)", + "size": , + "used": +} +``` + +## Network + +### List Wireguard Ports + +| command |body| return| +|---|---|---| +| `zos.network.list_wg_ports` | - |`[]uint16`| + +Lists all `reserved` ports on the node that can't be used for network wireguard. A user then needs to find a free port that is not in this list to use for their network. + +### Supports IPV6 + +| command |body| return| +|---|---|---| +| `zos.network.has_ipv6` | - |`bool`| + +### List Public Interfaces + +| command |body| return| +|---|---|---| +| `zos.network.interfaces` | - |`map[string][]IP` | + +Lists the node IPs. This is public information, mainly used to show the node yggdrasil IP and the `zos` interface. + +### List Public IPs + +| command |body| return| +|---|---|---| +| `zos.network.list_public_ips` | - |`[]IP` | + +Lists all user deployed public IPs that are served by this node. 
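
As an illustration of how the result of `zos.network.list_wg_ports` might be used, here is a minimal sketch that picks a free wireguard port outside the reserved list. The function name `PickFreePort` and the port range are assumptions made for this example; they are not part of the node API.

```go
package main

import (
	"errors"
	"fmt"
)

// PickFreePort returns the first port in [lo, hi] that is not in reserved.
// reserved is assumed to be the []uint16 returned by zos.network.list_wg_ports.
func PickFreePort(reserved []uint16, lo, hi uint16) (uint16, error) {
	taken := make(map[uint16]struct{}, len(reserved))
	for _, p := range reserved {
		taken[p] = struct{}{}
	}
	// Iterate with a wider type so that hi == 65535 does not overflow.
	for p := uint32(lo); p <= uint32(hi); p++ {
		if _, used := taken[uint16(p)]; !used {
			return uint16(p), nil
		}
	}
	return 0, errors.New("no free wireguard port in range")
}

func main() {
	reserved := []uint16{2000, 2001, 2003} // example data, not real node output
	port, err := PickFreePort(reserved, 2000, 2010)
	if err != nil {
		panic(err)
	}
	fmt.Println(port) // 2002
}
```

Another deployment may claim the chosen port before the network workload is applied, so a caller would probably want to retry with a fresh list on failure.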
+ +### Get Public Config + +| command |body| return| +|---|---|---| +| `zos.network.public_config_get` | - |`PublicConfig` | + +Where + +```json +PublicConfig { + "type": "string", // always vlan + "ipv4": "CIDR", + "ipv6": "CIDR", + "gw4": "IP", + "gw6": "IP", + "domain": "string", +} +``` + +Returns the node public config, or an error if it is not set. If a node has a public config, +it can act as an access node to user private networks. + +## Admin + +The next set of commands can ONLY be called by the `farmer`. + +### List Physical Interfaces + +| command |body| return| +|---|---|---| +| `zos.network.admin.interfaces` | - |`map[string]Interface` | + +Where + +```json +Interface { + "ips": ["ip"], + "mac": "mac-address", +} +``` + +Lists ALL node physical interfaces. +Those interfaces can then be used as an input to `set_public_nic` + +### Get Public Exit NIC + +| command |body| return| +|---|---|---| +| `zos.network.admin.get_public_nic` | - |`ExitDevice` | + +Where + +```json +ExitDevice { + "is_single": "bool", + "is_dual": "bool", + "dual_interface": "name", +} +``` + +Returns the interface used by public traffic (for user workloads) + +### Set Public Exit NIC + +| command |body| return| +|---|---|---| +| `zos.network.admin.set_public_nic` | `name` |- | + +`name` must be one of the (free) names returned by `zos.network.admin.interfaces` + +## System + +### Version + +| command |body| return| +|---|---|---| +| `zos.system.version` | - | `{zos: string, zinit: string}` | + +### DMI + +| command |body| return| +|---|---|---| +| `zos.system.dmi` | - | [DMI](https://github.com/threefoldtech/zos/blob/main/pkg/capacity/dmi/dmi.go) | + +### Hypervisor + +| command |body| return| +|---|---|---| +| `zos.system.hypervisor` | - | `string` | + +## GPUs + +### List Gpus + +| command |body| return| +|---|---|---| +| `zos.gpu.list` | - | `[]GPU` | + +Where + +```json +GPU { + "id": "string", + "vendor": "string", + "device": "string", + "contract": "uint64", +} +``` 
+ +Lists all available node GPUs, if any exist diff --git a/collections/documentation/developers/internals/zos/manual/gateway/fqdn-proxy.md b/collections/documentation/developers/internals/zos/manual/gateway/fqdn-proxy.md new file mode 100644 index 0000000..07a2f8b --- /dev/null +++ b/collections/documentation/developers/internals/zos/manual/gateway/fqdn-proxy.md @@ -0,0 +1,5 @@ +# `gateway-fqdn-proxy` type + +This creates a proxy with the given fqdn to the given backends. In this case the user must then configure their DNS server (e.g. name.com) to point to the correct node public IP. + +Full fqdn-proxy workload data is defined [here](https://github.com/threefoldtech/zos/blob/main/pkg/gridtypes/zos/gw_fqdn.go) diff --git a/collections/documentation/developers/internals/zos/manual/gateway/name-proxy.md b/collections/documentation/developers/internals/zos/manual/gateway/name-proxy.md new file mode 100644 index 0000000..2ce40f3 --- /dev/null +++ b/collections/documentation/developers/internals/zos/manual/gateway/name-proxy.md @@ -0,0 +1,5 @@ +# `gateway-name-proxy` type + +This creates a proxy with the given name to the given backends. The `name` of the proxy must be owned by a name contract on the grid. The idea is that a user can reserve a name (e.g. `example`). Later they can deploy a gateway workload with name `example` on any gateway node that points to the specified backends. The name is then prefixed to the gateway domain. 
For example, if the gateway domain is `gent0.freefarm.com` then your full FQDN is going to be `example.gent0.freefarm.com` + +Full name-proxy workload data is defined [here](https://github.com/threefoldtech/zos/blob/main/pkg/gridtypes/zos/gw_name.go) diff --git a/collections/documentation/developers/internals/zos/manual/ip/readme.md b/collections/documentation/developers/internals/zos/manual/ip/readme.md new file mode 100644 index 0000000..0e6068f --- /dev/null +++ b/collections/documentation/developers/internals/zos/manual/ip/readme.md @@ -0,0 +1,11 @@ +# `ip` type +The IP workload type reserves an IP from the available contract IPs list. This means that on contract creation the user must specify the number of public IPs they need to use. The contract will then allocate this number of IPs from the farm and keep them on the contract. + +When the user then adds the IP workload to the deployment associated with this contract, each IP workload will pick and link to one IP from the contract. + +In its minimal form, the `IP` workload does not require any data. But in reality it has 2 flags to pick which kind of public IP you want + +- `ipv4` (`bool`): pick one from the contract public IPv4 addresses +- `ipv6` (`bool`): pick an IPv6 over SLAAC. IPv6 addresses are not reserved with a contract. They are basically free if the farm infrastructure allows IPv6 over SLAAC. + +Full `IP` workload definition can be found [here](https://github.com/threefoldtech/zos/blob/main/pkg/gridtypes/zos/ipv4.go) diff --git a/collections/documentation/developers/internals/zos/manual/manual.md b/collections/documentation/developers/internals/zos/manual/manual.md new file mode 100644 index 0000000..fb926c9 --- /dev/null +++ b/collections/documentation/developers/internals/zos/manual/manual.md @@ -0,0 +1,187 @@ +

ZOS Manual

+ +

Table of Contents

+ +- [Introduction](#introduction) +- [Farm? Network? What are these?](#farm-network-what-are-these) +- [Creating a farm](#creating-a-farm) +- [Interaction](#interaction) +- [Deployment](#deployment) + - [Workload](#workload) + - [Types](#types) + - [API](#api) +- [Raid Controller Configuration](#raid-controller-configuration) + +*** + +## Introduction + +This document explains the usage of `ZOS`. `ZOS`, usually pronounced "zero OS", got its name from the idea of zero configuration: after the initial `minimal` configuration, which only includes which `farm` to join and what `network` (`development`, `testing`, or `production`) to use, the owner of the node does not have to do anything more, and the node works fully autonomously. + +The farmer cannot control the node or access it by any means. The only way to interact with a node is via its public API. + +## Farm? Network? What are these? + +Well, `zos` is built to allow people to run `workloads` around the world. This is enabled by allowing 3rd party data-centers to run `ZOS` on their hardware. A user can then find any nearby `farm` (which is what we call a cluster of nodes that belong to the same `farmer`) and choose to deploy capacity on that node/farm. A `farm` can consist of one or more nodes. + +So what is a `network`? Well, developers need to build `zos` itself and make it available during the early stages of development for testers and other enthusiastic people to try out. To allow this we created 3 `networks`: +- `development`: This is used mainly by developers to test their work. It is still available for users to deploy their capacity on (for really cheap prices), but at the same time there is no guarantee that it is stable or that data loss or corruption will not happen. Also, the entire network can be reset with no heads up. +- `testing`: Once new features are developed and well tested on the `development` network they are released to the `testing` environment. 
This is also available for users, at a slightly higher price than the `development` network, but it is much more stable. In theory there should be no resets of this network, and issues on it are usually not fatal, but partial data loss can still occur. +- `production`: As the name indicates, this is the most stable network (also full price). Once new features are fully tested on the `testing` network, they are released on `production`. + +## Creating a farm + +While this is outside the scope of this document, here is a [link](https://library.threefold.me/info/manual/#/manual__create_farm). + +## Interaction + +`ZOS` provides a simple `API` that can be used to: +- Query node runtime information + - Network information + - Free `wireguard` ports + - Get public configuration + - System version + - Other (check client for details) +- Deployment management (more on that later) + - Create + - Update + - Delete + +Note that the `zos` API is available over the `rmb` protocol. `rmb`, which stands for `reliable message bus`, is a simple messaging protocol that enables peer-to-peer communication over the `yggdrasil` network. Please check [`rmb`](https://github.com/threefoldtech/rmb) for more information. + +Simply put, `RMB` allows 2 entities to communicate securely knowing only their `id`. An id is linked to a public key on the blockchain, hence messages are verifiable via a signature. + +To be able to contact the node directly you need to run: +- `yggdrasil` +- `rmb` (correctly configured) + +Once you have those running, you can contact the node over `rmb`. For a reference implementation (function names and parameters) please refer to the [RMB documentation](../../rmb/rmb_toc.md). + +Here is a rough example of how low-level creation of a deployment is done. 
+ +```go +cl, err := rmb.Default() +if err != nil { + panic(err) +} +``` +Then create an instance of the node client: +```go +node := client.NewNodeClient(NodeTwinID, cl) +``` +Define your deployment object: +```go +dl := gridtypes.Deployment{ + Version: Version, + TwinID: Twin, //LocalTwin, + // this contract id must match the one on substrate + Workloads: []gridtypes.Workload{ + network(), // network workload definition + zmount(), // zmount workload definition + publicip(), // public ip definition + zmachine(), // zmachine definition + }, + SignatureRequirement: gridtypes.SignatureRequirement{ + WeightRequired: 1, + Requests: []gridtypes.SignatureRequest{ + { + TwinID: Twin, + Weight: 1, + }, + }, + }, +} +``` +Compute the hash: +```go +hash, err := dl.ChallengeHash() +if err != nil { + panic("failed to create hash") +} +fmt.Printf("Hash: %x\n", hash) +``` +Create the contract on `substrate` and get the `contract id`. You can then link the deployment to the contract and send it to the node. + +```go +dl.ContractID = 11 // from substrate +ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second) +defer cancel() +err = node.DeploymentDeploy(ctx, dl) +if err != nil { + panic(err) +} +``` + +Once the node receives the deployment, it fetches the contract (using the contract id) from `substrate`, recomputes the deployment hash, and compares it with the one set on the contract. If they match, the node proceeds to process the deployment. + +## Deployment + +A deployment is a set of workloads that are contextually related. Workloads in the same deployment can reference other workloads in the same deployment, but cannot be referenced from another deployment. The exception is the network workload, which can be referenced from a different deployment as long as it belongs to the same user. 
+ +Workloads have unique IDs (per deployment) that are set by the user, so the user can create multiple workloads and then reference them by the given IDs (`names`). + +For example, a deployment can define: +- A private network with id `net` +- A disk with id `data` +- A public IP with id `ip` +- A container that uses: + - The container can mount the disk like `mount: {data: /mount/path}`. + - The container can assign the public IP to itself by referencing the IP with id `ip`. + - etc. + +### Workload +Each workload has a type which is associated with some data. So the minimal definition of a workload contains: +- `name`: unique per deployment (id) +- `type`: workload type +- `data`: workload data that is proper for the selected type. + +```go + +// Workload struct +type Workload struct { + // Version is version of reservation object. On deployment creation, version must be 0 + // then only workloads that need to be updated must match the version of the deployment object. + // if a deployment update message is sent to a node it does the following: + // - validate deployment version + // - check workloads list, if a version is not matching the new deployment version, the workload is untouched + // - if a workload version is same as deployment, the workload is "updated" + // - if a workload is removed, the workload is deleted. + Version uint32 `json:"version"` + //Name is unique workload name per deployment (required) + Name Name `json:"name"` + // Type of the reservation (container, zdb, vm, etc...) + Type WorkloadType `json:"type"` + // Data is the reservation type arguments. 
+ Data json.RawMessage `json:"data"` + // Metadata is user specific meta attached to deployment, can be used to link this + // deployment to other external systems for automation + Metadata string `json:"metadata"` + //Description human readable description of the workload + Description string `json:"description"` + // Result of reservation, set by the node + Result Result `json:"result"` +} +``` + +### Types +- Virtual machine related + - [`network`](./workload_types.md#network-type) + - [`ip`](./workload_types.md#ip-type) + - [`zmount`](./workload_types.md#zmount-type) + - [`zmachine`](./workload_types.md#zmachine-type) + - [`zlogs`](./workload_types.md#zlogs-type) +- Storage related + - [`zdb`](./workload_types.md#zdb-type) + - [`qsfs`](./workload_types.md#qsfs-type) +- Gateway related + - [`gateway-name-proxy`](./workload_types.md#gateway-name-proxy-type) + - [`gateway-fqdn-proxy`](./workload_types.md#gateway-fqdn-proxy-type) + +### API +The node is always connected to the RMB network with the node `twin`. This means the node is always reachable over RMB with the node `twin-id` as an address. + +The [node client](https://github.com/threefoldtech/zos/blob/main/client/node.go) should have a complete list of all available functions. Documentation of the API can be found [here](./api.md). + +## Raid Controller Configuration + +The goal of 0-OS is to expose raw capacity, so it is best to always give it the most direct access to the disks. In the case of raid controllers, it is best to set them up in [JBOD](https://en.wikipedia.org/wiki/Non-RAID_drive_architectures#JBOD) mode if available. 
\ No newline at end of file diff --git a/collections/documentation/developers/internals/zos/manual/network/readme.md b/collections/documentation/developers/internals/zos/manual/network/readme.md new file mode 100644 index 0000000..3cbc23b --- /dev/null +++ b/collections/documentation/developers/internals/zos/manual/network/readme.md @@ -0,0 +1,14 @@ +# `network` type +A private network can span multiple nodes at the same time. This means workloads (`VMs`) that live on different nodes but are part of the same virtual network can still reach each other over this `private` network. + +If one (or more) of the nodes are `public access nodes`, you can also add your personal laptop to the network and reach your `VMs` over the `wireguard` network. + +In the simplest form a network workload consists of: +- network range +- sub-range available on this node +- private key +- list of peers + - each peer has public key + - sub-range + +Full network definition can be found [here](https://github.com/threefoldtech/zos/blob/main/pkg/gridtypes/zos/network.go) diff --git a/collections/documentation/developers/internals/zos/manual/qsfs/readme.md b/collections/documentation/developers/internals/zos/manual/qsfs/readme.md new file mode 100644 index 0000000..1049cdf --- /dev/null +++ b/collections/documentation/developers/internals/zos/manual/qsfs/readme.md @@ -0,0 +1,5 @@ +# `qsfs` type + +`qsfs`, short for `quantum safe file system`, is a FUSE filesystem which aims to support unlimited local storage with a remote backend for offload and backup which cannot be broken even by a quantum computer. 
Please read about it [here](https://github.com/threefoldtech/quantum-storage) + +To create a `qsfs` workload you need to provide the workload type as [here](https://github.com/threefoldtech/zos/blob/main/pkg/qsfsd/qsfs.go) diff --git a/collections/documentation/developers/internals/zos/manual/workload_types.md b/collections/documentation/developers/internals/zos/manual/workload_types.md new file mode 100644 index 0000000..a9e2d85 --- /dev/null +++ b/collections/documentation/developers/internals/zos/manual/workload_types.md @@ -0,0 +1,108 @@ +

Workload Types

+ +

Table of Contents

+ +- [Introduction](#introduction) +- [Virtual Machine](#virtual-machine) + - [`network` type](#network-type) + - [`ip` type](#ip-type) + - [`zmount` type](#zmount-type) + - [`zmachine` type](#zmachine-type) + - [Building your `flist`](#building-your-flist) + - [`zlogs` type](#zlogs-type) +- [Storage](#storage) + - [`zdb` type](#zdb-type) + - [`qsfs` type](#qsfs-type) +- [Gateway](#gateway) + - [`gateway-name-proxy` type](#gateway-name-proxy-type) + - [`gateway-fqdn-proxy` type](#gateway-fqdn-proxy-type) + +## Introduction + +Each workload has a type which is associated with some data. We present here the different types of workload associated with Zero-OS. + +## Virtual Machine + +### `network` type +A private network can span multiple nodes at the same time. This means workloads (`VMs`) that live on different nodes but are part of the same virtual network can still reach each other over this `private` network. + +If one (or more) of the nodes are `public access nodes`, you can also add your personal laptop to the network and reach your `VMs` over the `wireguard` network. + +In the simplest form a network workload consists of: +- network range +- sub-range available on this node +- private key +- list of peers + - each peer has public key + - sub-range + +Full network definition can be found [here](https://github.com/threefoldtech/zos/blob/main/pkg/gridtypes/zos/network.go) + +### `ip` type +The IP workload type reserves an IP from the available contract IPs list. This means that on contract creation the user must specify the number of public IPs needed. The contract then allocates this number of IPs from the farm, and they are kept on the contract. + +When the user then adds the IP workload to the deployment associated with this contract, each IP workload picks and links to one IP from the contract. + +In minimal form, the `IP` workload does not require any data. 
But in reality it has 2 flags to pick which kind of public IP you want: + +- `ipv4` (`bool`): pick one from the contract public IPv4 addresses +- `ipv6` (`bool`): pick an IPv6 over SLAAC. IPv6 addresses are not reserved with a contract. They are basically free if the farm infrastructure allows IPv6 over SLAAC. + +Full `IP` workload definition can be found [here](https://github.com/threefoldtech/zos/blob/main/pkg/gridtypes/zos/ipv4.go) + +### `zmount` type +A `zmount` is a local disk that can be attached directly to a container or a virtual machine. `zmount` only requires `size` as input, as defined [here](https://github.com/threefoldtech/zos/blob/main/pkg/gridtypes/zos/zmount.go). This workload type is only utilized via the `zmachine` workload. + +### `zmachine` type + +`zmachine` is a unified container/virtual machine type. This can be used to start a virtual machine on a `zos` node given the following: +- `flist`, which provides the base `vm` image or container image. + - the `flist` content is what changes the `zmachine` mode. An `flist` built from a docker image, or one that contains files or executable binaries, will run in container mode. `ZOS` will inject its own `kernel+initramfs` to run the workload and kick start the defined `flist` `entrypoint` +- private network to join (with assigned IP) +- optional public `ipv4` or `ipv6` +- optional disks. At least one disk is required when running `zmachine` in `vm` mode, where it is used to hold the `vm` root image. + +For more details on all parameters needed to run a `zmachine` please refer to [`zmachine` data](https://github.com/threefoldtech/zos/blob/main/pkg/gridtypes/zos/zmachine.go) + +#### Building your `flist` + +Please refer to [this document](./manual.md) about how to build a compatible `zmachine flist` + +### `zlogs` type + +Zlogs is a utility workload that allows you to stream `zmachine` logs to a remote location. 
+ +The `zlogs` workload needs to know which `zmachine` to stream logs from and the `target` location to stream the logs to. `zlogs` internally uses [`tailstream`](https://github.com/threefoldtech/tailstream), so it supports any streaming url that is supported by this utility. + +The `zlogs` workload runs inside the same private network as the `zmachine` instance. This means zlogs can stream logs to other `zmachines` that are running inside the same private network (possibly on different nodes). + +For example, you can run [`logagg`](https://github.com/threefoldtech/logagg), which is a web-socket server that can work with the `tailstream` web-socket protocol. + +Check `zlogs` configuration [here](https://github.com/threefoldtech/zos/blob/main/pkg/gridtypes/zos/zlogs.go) + +## Storage + +### `zdb` type +`zdb` is a storage primitive that gives you a persisted key value store over the RESP protocol. Please check [`zdb` docs](https://github.com/threefoldtech/0-db) + +Please check [here](https://github.com/threefoldtech/zos/blob/main/pkg/zdb/zdb.go) for workload data. + +### `qsfs` type + +`qsfs`, short for `quantum safe file system`, is a FUSE filesystem which aims to support unlimited local storage with a remote backend for offload and backup which cannot be broken even by a quantum computer. Please read about it [here](https://github.com/threefoldtech/quantum-storage) + +To create a `qsfs` workload you need to provide the workload type as [here](https://github.com/threefoldtech/zos/blob/main/pkg/qsfsd/qsfs.go) + +## Gateway + +### `gateway-name-proxy` type + +This creates a proxy with the given name to the given backends. The `name` of the proxy must be owned by a name contract on the grid. The idea is that a user can reserve a name (e.g. `example`). Later the user can deploy a gateway workload with name `example` on any gateway node that points to the specified backends. The name is then prepended to the gateway domain. 
For example, if the gateway domain is `gent0.freefarm.com` then your full FQDN is going to be `example.gent0.freefarm.com` + +Full name-proxy workload data is defined [here](https://github.com/threefoldtech/zos/blob/main/pkg/gridtypes/zos/gw_name.go) + +### `gateway-fqdn-proxy` type + +This creates a proxy with the given fqdn to the given backends. In this case the user must configure their dns server (e.g. name.com) to point to the correct node public IP. + +Full fqdn-proxy workload data is defined [here](https://github.com/threefoldtech/zos/blob/main/pkg/gridtypes/zos/gw_fqdn.go) diff --git a/collections/documentation/developers/internals/zos/manual/zdb/readme.md b/collections/documentation/developers/internals/zos/manual/zdb/readme.md new file mode 100644 index 0000000..45f88f4 --- /dev/null +++ b/collections/documentation/developers/internals/zos/manual/zdb/readme.md @@ -0,0 +1,4 @@ +# `zdb` type +`zdb` is a storage primitive that gives you a persisted key value store over the RESP protocol. Please check [`zdb` docs](https://github.com/threefoldtech/0-db) + +Please check [here](https://github.com/threefoldtech/zos/blob/main/pkg/zdb/zdb.go) for workload data. diff --git a/collections/documentation/developers/internals/zos/manual/zlogs/readme.md b/collections/documentation/developers/internals/zos/manual/zlogs/readme.md new file mode 100644 index 0000000..b77ade4 --- /dev/null +++ b/collections/documentation/developers/internals/zos/manual/zlogs/readme.md @@ -0,0 +1,11 @@ +# `zlogs` type + +Zlogs is a utility workload that allows you to stream `zmachine` logs to a remote location. + +The `zlogs` workload needs to know which `zmachine` to stream logs from and the `target` location to stream the logs to. `zlogs` internally uses [`tailstream`](https://github.com/threefoldtech/tailstream), so it supports any streaming url that is supported by this utility. + +`zlogs` workload runs inside the same private network as the `zmachine` instance. 
Which means zlogs can stream logs to other `zmachines` that are running inside the same private network (possibly on different nodes). + +For example, you can run [`logagg`](https://github.com/threefoldtech/logagg), which is a web-socket server that can work with the `tailstream` web-socket protocol. + +Check `zlogs` configuration [here](https://github.com/threefoldtech/zos/blob/main/pkg/gridtypes/zos/zlogs.go) diff --git a/collections/documentation/developers/internals/zos/manual/zmachine/cloud-console.md b/collections/documentation/developers/internals/zos/manual/zmachine/cloud-console.md new file mode 100644 index 0000000..f1e0324 --- /dev/null +++ b/collections/documentation/developers/internals/zos/manual/zmachine/cloud-console.md @@ -0,0 +1,14 @@ +# Cloud console + +- `cloud-console` is a tool to view machine logging and interact with the machine you have deployed +- It always runs on the private network router IP, on a port number equal to `20000 + last octet` of the machine private IP +- For example if the machine ip is `10.20.2.2/24` this means + - `cloud-console` is running on `10.20.2.1:20002` +- For the cloud-console to run, the cloud-hypervisor must be started with the option "--serial pty" instead of tty. This allows another process (`cloud-console` in our case) to interact with the vm +- To be able to connect to the web console you should first start wireguard to connect to the private network + +``` +wg-quick up wireguard.conf +``` + +- Then go to your browser with the network router IP `10.20.2.1:20002` diff --git a/collections/documentation/developers/internals/zos/manual/zmachine/readme.md b/collections/documentation/developers/internals/zos/manual/zmachine/readme.md new file mode 100644 index 0000000..e94a0e7 --- /dev/null +++ b/collections/documentation/developers/internals/zos/manual/zmachine/readme.md @@ -0,0 +1,13 @@ +# `zmachine` type + +`zmachine` is a unified container/virtual machine type. 
This can be used to start a virtual machine on a `zos` node given the following: +- `flist`, which provides the base `vm` image or container image. + - the `flist` content is what changes the `zmachine` mode. An `flist` built from a docker image, or one that contains files or executable binaries, will run in container mode. `ZOS` will inject its own `kernel+initramfs` to run the workload and kick start the defined `flist` `entrypoint` +- private network to join (with assigned IP) +- optional public `ipv4` or `ipv6` +- optional disks. At least one disk is required when running `zmachine` in `vm` mode, where it is used to hold the `vm` root image. + +For more details on all parameters needed to run a `zmachine` please refer to [`zmachine` data](https://github.com/threefoldtech/zos/blob/main/pkg/gridtypes/zos/zmachine.go) + +# Building your `flist`. +Please refer to [this document](../manual.md) about how to build a compatible `zmachine flist` diff --git a/collections/documentation/developers/internals/zos/manual/zmachine/zmachine.md b/collections/documentation/developers/internals/zos/manual/zmachine/zmachine.md new file mode 100644 index 0000000..ddbd102 --- /dev/null +++ b/collections/documentation/developers/internals/zos/manual/zmachine/zmachine.md @@ -0,0 +1,410 @@ +# Zmachine + +A `Zmachine` is an instance of virtual compute capacity. There are 2 kinds of Zmachines. +One is a `VM`, standard in cloud environments. The other is a `container`. +On the Zos level, both of these are implemented as virtual machines. Depending on +the context, it will be considered to be either a VM or a container. In either +scenario, the `Zmachine` is started from an `Flist`. + +> Note, both VM and Container on ZOS are actually served as Virtual Machines. The +only difference is that if you are running in VM mode, you only need to provide a **raw** +disk image (image.raw) in your flist. + +## Container + +A container is meant to host a `microservice`. 
The `microservice` architecture generally +dictates that each service should be run in its own container (therefore providing +a level of isolation), and communicate with other containers it depends on over the +network. + +Similar to Docker, in Zos a container is actually also run in a virtualized environment. +As with Docker containers, some setup is done on behalf of the user. After this setup is done, +the user's `entrypoint` is started. + +It should be noted that a container has no control over the kernel +used to run it. If this is required, a `VM` should be used instead. Furthermore, +a container should ideally only have 1 process running. A container can be a single +binary, or a complete filesystem. In general, the first should be preferred, and +if you need the latter, it might be an indication that you actually want a `VM`. + +For containers, the network setup will be created for you. Your init process can +assume that it will be fully set up (according to the config you provided) by the +time it is started. Mountpoints will also be set up for you. The environment variables +passed will be available inside the container. + +## VM + +In container mode, zos provides a minimal kernel that is used to run a lightweight VM +and then run your app from your flist. If you need control over the kernel you can +still provide it inside the flist as follows: + +- /boot/vmlinuz +- /boot/initrd.img [optional] + +**NOTE**: if you are building your own kernel, the vmlinuz MUST be an ELF kernel (not compressed). Alternatively, you can use the [extract-vmlinux](https://github.com/torvalds/linux/blob/master/scripts/extract-vmlinux) script to extract the ELF kernel. To test if your kernel is a valid ELF kernel run the command +`readelf -n ` + +Any of those files can be a symlink to another file in the flist. + +If ZOS finds the `/boot/vmlinuz` file, it will use it, together with `initrd.img` if that also exists. 
Otherwise zos will use the built-in minimal kernel and run in `container` mode. 
+ +### Building an ubuntu VM flist + +This is a guide to help you build a working VM flist. + +This guide is for ubuntu `jammy` + +Prepare the rootfs + +```bash +mkdir ubuntu:jammy +``` + +Bootstrap ubuntu + +```bash +sudo debootstrap jammy ubuntu:jammy http://archive.ubuntu.com/ubuntu +``` + +This will download and create the basic rootfs for ubuntu jammy in the directory `ubuntu:jammy`. +After it's done we can chroot into this directory to continue installing the necessary packages and configure +a few things. + +> I am using a script called `arch-chroot`, which is available by default on arch but can also be installed on ubuntu, to continue +the following steps + +```bash +sudo arch-chroot ubuntu:jammy +``` + +> This script (similar to the `chroot` command) switches root to the given directory but also takes care of mounting /dev, /sys, etc. for you +and cleans them up on exit. + +Next, remove the `/etc/resolv.conf` link and re-create the file with a valid nameserver to be able to continue + +```bash +# make sure to set the path correctly +export PATH=/usr/local/sbin/:/usr/local/bin/:/usr/sbin/:/usr/bin/:/sbin:/bin + +rm /etc/resolv.conf +echo 'nameserver 1.1.1.1' > /etc/resolv.conf +``` + +Install cloud-init + +```bash +apt-get update +apt-get install cloud-init openssh-server curl +# to make sure we have clean setup +cloud-init clean +``` + +It is also really important that we install a kernel + +```bash +apt-get install linux-modules-extra-5.15.0-25-generic +``` + +> I chose this package because it will also install extra modules for us and a generic kernel + +Next make sure that virtiofs is part of the initramfs image + +```bash +echo 'fs-virtiofs' >> /etc/initramfs-tools/modules +update-initramfs -c -k all +``` + +Clean up the cache + +```bash +apt-get clean +``` + +The last thing to do before we actually upload the flist +is to make sure the kernel is in the correct format + +This step does not require that we stay in the chroot, so hit `ctrl+d` or type `exit` + 
+you should 
be out of the arch-chroot now + +```bash +curl -O https://raw.githubusercontent.com/torvalds/linux/master/scripts/extract-vmlinux +chmod +x extract-vmlinux + +sudo ./extract-vmlinux ubuntu:jammy/boot/vmlinuz | sudo tee ubuntu:jammy/boot/vmlinuz-5.15.0-25-generic.elf > /dev/null +# then replace original kernel +sudo mv ubuntu:jammy/boot/vmlinuz-5.15.0-25-generic.elf ubuntu:jammy/boot/vmlinuz-5.15.0-25-generic +``` + +To verify you can do this: + +```bash +ls -l ubuntu:jammy/boot +``` + +and it should show something like + +```bash +total 101476 +-rw-r--r-- 1 root root 260489 Mar 30 2022 config-5.15.0-25-generic +drwxr-xr-x 1 root root 54 Jun 28 15:35 grub +lrwxrwxrwx 1 root root 28 Jun 28 15:35 initrd.img -> initrd.img-5.15.0-25-generic +-rw-r--r-- 1 root root 41392462 Jun 28 15:39 initrd.img-5.15.0-25-generic +lrwxrwxrwx 1 root root 28 Jun 28 15:35 initrd.img.old -> initrd.img-5.15.0-25-generic +-rw------- 1 root root 6246119 Mar 30 2022 System.map-5.15.0-25-generic +lrwxrwxrwx 1 root root 25 Jun 28 15:35 vmlinuz -> vmlinuz-5.15.0-25-generic +-rw-r--r-- 1 root root 55988436 Jun 28 15:50 vmlinuz-5.15.0-25-generic +lrwxrwxrwx 1 root root 25 Jun 28 15:35 vmlinuz.old -> vmlinuz-5.15.0-25-generic +``` + +Now package the tar for upload + +```bash +sudo rm -rf ubuntu:jammy/dev/* +sudo tar -czf ubuntu-jammy.tar.gz -C ubuntu:jammy . +``` + +Upload to the hub, and use it to create a Zmachine + +## VM Image [deprecated] + +In VM image mode, you run your own operating system (for now only linux is supported). +The image provided must be + +- EFI bootable +- Cloud-init enabled. + +You can find later in this document how to create your own bootable image. + +A VM reservation must also have at least 1 volume, as the boot image +will be copied to this volume. The size of the root disk will be the size of this +volume. + +The image used to boot the VM must have cloud-init enabled on boot. Cloud-init +receives its config over the NoCloud source. 
This takes care of setting up networking, hostname +, root authorized_keys. + +> This method of building a full VM from a raw image is not recommended and will get phased out in +the future. It's better to use the container method to run containerized Apps. Another option +is to run your own kernel from an flist (explained above) + +### Expected Flist structure + +A `Zmachine` will be considered a `VM` if it contains an `/image.raw` file. + +`/image.raw` is used as "boot disk". This `/image.raw` is copied to the first attached +volume of the `VM`. Cloud-init will take care of resizing the filesystem on the image +to take the full disk size allocated in the deployment. + +Note that if the `image.raw` size is larger than the allocated disk, the workload for the VM +will fail. + +### Expected Flist structure + +Any Flist will boot as a container, **UNLESS** it has a `/image.raw` file. There is +no need to specify a kernel yourself (it will be provided). + +### Known issues + +- We need to do proper performance testing for `virtio-fs`. There seems to be some + suboptimal performance right now. +- It's not currently possible to get container logs. +- TODO: more testing + +## Creating VM image + +This is a simple tutorial on how to create your own VM image +> Note: Please consider checking the official vm images repo on the hub before building your own +image. This can save you a lot of time (and network traffic) here + +### Use one of ubuntu cloud-images + +If the ubuntu images in the official repo are not enough, you can simply upload one of the official images as follows + +- Visit +- Select the version you want (let's assume bionic) +- Go to bionic, then click on current +- Download the amd64.img file like this one +- This is a `Qcow2` image, which is not supported by zos. 
So we need to convert this to a raw disk image using the following command + +```bash +qemu-img convert -p -f qcow2 -O raw bionic-server-cloudimg-amd64.img image.raw +``` + +- Now that we have the raw image (image.raw), it is time to compress it and upload it to the hub + +```bash +tar -czf ubuntu-18.04-lts.tar.gz image.raw +``` + +- Now visit the hub and log in or create your own account, then click on the upload my file button +- Select the newly created tar.gz file +- Now you should be able to use this flist to create Zmachine workloads + +### Create an image from scratch + +This is an advanced scenario and requires some prior knowledge of how to create local VMs, how to prepare the installation medium, +and how to install your OS of choice. + +Before we continue you need to have a hypervisor that you can use locally. Libvirt/Qemu are good choices. We skip what you need to do to install and configure your system correctly, as well as how to create the VM + +#### VM Requirements + +Create a VM with enough CPU and Memory to handle the installation process. Note that this is not related to your choices of CPU and Memory for the actual VM running on the grid. + +We are going to install an arch linux image, so we will have to create a VM with + +- A disk of about 2GB (note this also is not related to the final VM running on the grid; once deployed, the OS image will eventually expand to use the entire allocated disk attached to the VM). The smaller the disk the better. The minimum size can be different for each OS. +- The arch installation iso or any other installation medium + +#### Boot the VM (locally) + +Boot the VM to start the installation. The boot must support EFI booting because ZOS only supports images with an esp partition. So make sure that both your hypervisor and your boot/installation medium support this. 
+ +For example in Libvirt Manager make sure you are using the right firmware (UEFI) + +#### Installation + +We are going to follow the installation manual for Arch linux but with slight tweaks: + +- Make sure the VM is booted with UEFI: run the `efivar -l` command and see if you get any output. Otherwise the machine is probably booted in BIOS mode. +- With `parted` create 2 partitions + - an esp (boot) partition of 100M + - a root partition that spans the remainder of the disk + +```bash +DISK=/dev/vda +# First, create a gpt partition table +parted $DISK mklabel gpt +# Secondly, create the esp partition of 100M +parted $DISK mkpart primary 1 100M +# Mark first part as esp +parted $DISK set 1 esp on +# Use the remaining part as root that takes the remaining +# space on disk +parted $DISK mkpart primary 100M 100% + +# To verify everything is correct do +parted $DISK print + +# this should show 2 partitions: the first one is slightly less than 100M and has flags (boot, esp), the second one takes the remaining space +``` + +We need to format the partitions as follows: + +```bash +# this one has to be vfat with FAT size 32 as follows +mkfs.vfat -F 32 /dev/vda1 +# This one can be anything based on your preference as long as it's supported by your OS kernel. We are going with ext4 in this tutorial +mkfs.ext4 -L cloud-root /dev/vda2 +``` + +Note the label assigned to the /dev/vda2 (root) partition. It can be anything, but it is needed to configure the boot later when installing the boot loader. Otherwise you can use the partition UUID. + +Next, we need to mount the disks + +```bash +mount /dev/vda2 /mnt +mkdir /mnt/boot +mount /dev/vda1 /mnt/boot +``` + +After the disks are mounted as above, we need to start the installation + +```bash +pacstrap /mnt base linux linux-firmware vim openssh cloud-init cloud-guest-utils +``` + +This will install basic arch linux but will also include cloud-init, cloud-guest-utils, openssh, and vim for convenience. 
+ +Following the installation guide, generate the fstab file: + +``` +genfstab -U /mnt >> /mnt/etc/fstab +``` + +Then chroot into /mnt with `arch-chroot /mnt` to continue the setup. Please follow all steps in the installation guide to set the timezone and locales as needed. + +- You don't have to set the hostname; it will be set up later by cloud-init when the VM is deployed on ZOS +- Let's drop the root password altogether, since login to the VM over ssh will require key authentication only. You can do this by running + +```bash +passwd -d root +``` + +Make sure the required services are enabled: + +```bash +systemctl enable sshd +systemctl enable systemd-networkd +systemctl enable systemd-resolved +systemctl enable cloud-init +systemctl enable cloud-final + +# make sure we are using resolved +rm /etc/resolv.conf +ln -s /run/systemd/resolve/stub-resolv.conf /etc/resolv.conf +``` + +Finally, install the boot loader as follows: +> Only grub2 has been tested and is known to work. + +```bash +pacman -S grub +``` + +Then we need to install grub. The esp partition is mounted at `/boot` inside the chroot, so that is the EFI directory: + +``` +grub-install --target=x86_64-efi --efi-directory=/boot --removable +``` + +Change the default values as follows: + +``` +vim /etc/default/grub +``` + +And make sure to change `GRUB_CMDLINE_LINUX_DEFAULT` as follows: + +``` +GRUB_CMDLINE_LINUX_DEFAULT="loglevel=3 console=tty console=ttyS0" +``` + +> Note: we removed the `quiet` flag and added the console flags. + +Also set `GRUB_TIMEOUT` to 0 for a faster boot: + +``` +GRUB_TIMEOUT=0 +``` + +Then, finally, generate the config: + +``` +grub-mkconfig -o /boot/grub/grub.cfg +``` + +The last thing we need to do is clean up: + +- the pacman cache, by running `rm -rf /var/cache/pacman/pkg` +- the cloud-init state, by running `cloud-init clean` + +Press `Ctrl+D` to exit the chroot, then power off by running the `poweroff` command. 
+ +> NOTE: if you boot the machine again before it is deployed on ZOS, you always need to run `cloud-init clean` again to make sure the image has a clean state +> +#### Converting the disk + +Depending on your hypervisor of choice, you might need to convert the disk to a `raw` image, the same way we did with the Ubuntu image. + +```bash +# This is an optional step in case you used a qcow2 disk for the installation. If the disk is already `raw` you can skip this +qemu-img convert -p -f qcow2 -O raw /path/to/vm/disk.img image.raw +``` + +Compress and tar the image.raw as before, and upload it to the hub. + +``` +tar -czf arch-linux.tar.gz image.raw +``` diff --git a/collections/documentation/developers/internals/zos/manual/zmount/readme.md b/collections/documentation/developers/internals/zos/manual/zmount/readme.md new file mode 100644 index 0000000..e7de260 --- /dev/null +++ b/collections/documentation/developers/internals/zos/manual/zmount/readme.md @@ -0,0 +1,2 @@ +# `zmount` type +A `zmount` is a local disk that can be attached directly to a container or a virtual machine. `zmount` only requires `size` as input, as defined [here](https://github.com/threefoldtech/zos/blob/main/pkg/gridtypes/zos/zmount.go). This workload type is only used via the `zmachine` workload. diff --git a/collections/documentation/developers/internals/zos/performance/cpubench.md b/collections/documentation/developers/internals/zos/performance/cpubench.md new file mode 100644 index 0000000..2d3f8a7 --- /dev/null +++ b/collections/documentation/developers/internals/zos/performance/cpubench.md @@ -0,0 +1,85 @@ +

CPUBenchmark

+ +

Table of Contents

+ +- [Overview](#overview) +- [Configuration](#configuration) +- [Details](#details) +- [Result Sample](#result-sample) +- [Result Explanation](#result-explanation) + +*** + +## Overview + +The `CPUBenchmark` task is designed to measure the performance of the CPU. It uses the [cpu-benchmark-simple](https://github.com/threefoldtech/cpu-benchmark-simple) tool and includes a zos stub to gather the number of workloads running on the node. + +## Configuration + +- Name: `cpu-benchmark` +- Schedule: 4 times a day +- Jitter: 0 + +## Details + +- The benchmark simply runs a `CRC64` computation task, calculates the time spent in the computation and reports it in `seconds`. +- The computation is performed in both single-threaded and multi-threaded scenarios. +- Lower time = better performance: for a single-threaded benchmark, a lower execution time indicates better performance. + +## Result Sample + +```json +{ + "description": "Measures the performance of the node CPU by reporting the time spent of computing a task in seconds.", + "name": "cpu-benchmark", + "result": { + "multi": 1.105, + "single": 1.135, + "threads": 1, + "workloads": 0 + }, + "timestamp": 1700504403 +} +``` + +## Result Explanation + +The best way to know what's a good or bad value is by testing and comparing different hardware. 
+Here are some examples: + +**1x Intel(R) Xeon(R) W-2145 CPU @ 3.70GHz** (Q3'2017) + +``` +Single thread score: 0.777 +Multi threads score: 13.345 [16 threads] +``` + +**1x Intel(R) Pentium(R) CPU G4400 @ 3.30GHz** (Q3'2015) + +``` +Single thread score: 1.028 +Multi threads score: 2.089 [2 threads] +``` + +**1x Intel(R) Core(TM) i5-3570 CPU @ 3.40GHz** (Q2'2012) + +``` +Single thread score: 2.943 +Multi threads score: 12.956 [4 threads] +``` + +**2x Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz** (Q1'2012) + +``` +Single thread score: 1.298 +Multi threads score: 44.090 [32 threads] +``` + +**2x Intel(R) Xeon(R) CPU L5640 @ 2.27GHz** (Q1'2010) + +``` +Single thread score: 2.504 +Multi threads score: 72.452 [24 threads] +``` + +As you can see, more recent CPUs are generally faster, and within the same launch period Xeon CPUs perform much better than regular desktop CPUs. When comparing multi-thread scores, take into account the number of threads and the time per thread. \ No newline at end of file diff --git a/collections/documentation/developers/internals/zos/performance/healthcheck.md b/collections/documentation/developers/internals/zos/performance/healthcheck.md new file mode 100644 index 0000000..a41c059 --- /dev/null +++ b/collections/documentation/developers/internals/zos/performance/healthcheck.md @@ -0,0 +1,38 @@ +

Health Check

+ +

Table of Contents

+ +- [Overview](#overview) +- [Configuration](#configuration) +- [Details](#details) +- [Result Sample](#result-sample) + +*** + +## Overview + +The health check task runs checks over ZOS components to determine whether the node is in a usable state, and sets flags for the Power Daemon to stop uptime reports if the node is unusable. + +## Configuration + +- Name: `healthcheck` +- Schedule: Every 20 mins. + +## Details + +- Check whether the node cache disk is usable by trying to write some data to it. If the write fails, the task sets the Readonly flag. + +## Result Sample + +```json +{ + "description": "health check task runs multiple checks to ensure the node is in a usable state and set flags for the power daemon to stop reporting uptime if it is not usable", + "name": "healthcheck", + "result": { + "cache": [ + "failed to write to cache: open /var/cache/healthcheck: operation not permitted" + ] + }, + "timestamp": 1701599580 +} +``` \ No newline at end of file diff --git a/collections/documentation/developers/internals/zos/performance/iperf.md b/collections/documentation/developers/internals/zos/performance/iperf.md new file mode 100644 index 0000000..d7a36dc --- /dev/null +++ b/collections/documentation/developers/internals/zos/performance/iperf.md @@ -0,0 +1,80 @@ +

IPerf

+ +

Table of Contents

+ +- [Overview](#overview) +- [Configuration](#configuration) +- [Details](#details) +- [Result Sample](#result-sample) + +*** + +## Overview + +The `iperf` package is designed to facilitate network performance testing using the `iperf3` tool, with both UDP and TCP over IPv4 and IPv6. + +## Configuration + +- Name: `iperf` +- Schedule: 4 times a day +- Jitter: 20 min + +## Details + +- The package uses the iperf binary to examine network performance under different conditions. +- It fetches PublicConfig data for a random selection of public nodes on the chain, plus all public nodes from the free farm. These nodes serve as the targets for the iperf tests. +- For each node, it runs the test 4 times: over UDP and TCP, using each of the node IPs (v4 and v6). +- The result is a slice of reports for all public nodes (4 for each). Each report includes: + ``` + UploadSpeed: Upload speed (in bits per second). + DownloadSpeed: Download speed (in bits per second). + NodeID: ID of the node where the test was conducted. + NodeIpv4: IPv4 address of the node. + TestType: Type of the test (TCP or UDP). + Error: Any error encountered during the test. + CpuReport: CPU utilization report (in percentage). 
+ ``` + +## Result Sample + +```json +{ + "description": "Test public nodes network performance with both UDP and TCP over IPv4 and IPv6", + "name": "iperf", + "result": [ + { + "cpu_report": { + "host_system": 2.4433388913571044, + "host_total": 3.542919199613454, + "host_user": 1.0996094859359695, + "remote_system": 0.24430594945859846, + "remote_total": 0.3854457128784448, + "remote_user": 0.14115962407747246 + }, + "download_speed": 1041274.4792242317, + "error": "", + "node_id": 124, + "node_ip": "88.99.30.200", + "test_type": "tcp", + "upload_speed": 1048549.3668460822 + }, + { + "cpu_report": { + "host_system": 0, + "host_total": 0, + "host_user": 0, + "remote_system": 0, + "remote_total": 0, + "remote_user": 0 + }, + "download_speed": 0, + "error": "unable to connect to server - server may have stopped running or use a different port, firewall issue, etc.: Network unreachable", + "node_id": 124, + "node_ip": "2a01:4f8:10a:710::2", + "test_type": "tcp", + "upload_speed": 0 + } + ], + "timestamp": 1700507035 +} +``` \ No newline at end of file diff --git a/collections/documentation/developers/internals/zos/performance/performance.md b/collections/documentation/developers/internals/zos/performance/performance.md new file mode 100644 index 0000000..7f3ea76 --- /dev/null +++ b/collections/documentation/developers/internals/zos/performance/performance.md @@ -0,0 +1,90 @@ +

Performance Monitor Package

+ +

Table of Contents

+ +- [Overview](#overview) +- [Flow](#flow) +- [Node Initialization Check](#node-initialization-check) +- [Scheduling](#scheduling) +- [RMB Commands](#rmb-commands) +- [Caching](#caching) +- [Registered Tests](#registered-tests) +- [Test Suite](#test-suite) + +*** + +## Overview + +The `perf` package is a performance monitor in `zos` nodes. It schedules tasks, caches their results and allows retrieval of these results through `RMB` calls. + +## Flow + +1. The `perf` monitor is started by the `noded` service in zos. +2. Tasks are registered with a schedule in the new monitor. +3. A bus handler is opened to allow result retrieval. + +## Node Initialization Check + +To ensure that the node always has a test result available, a check is performed on node startup for all the registered tasks. If a task doesn't have any stored result, it runs immediately without waiting for the next scheduled time. + +## Scheduling + +- Tasks are scheduled using a 6-field cron format. This format provides flexibility in defining time, allowing tasks to run periodically or at a specific time. + +- Each task has a jitter, which is the maximum amount of time the task may sleep before it runs. This prevents all tests from running at exactly the same time. For example, if a task is scheduled to run at `06:00` and its jitter is 10 minutes, it is expected to run anywhere between `06:00` and `06:10`. + +## RMB Commands + +- `zos.perf.get`: + + - Payload: a payload type that contains the name of the test + + ```go + type Payload struct { + Name string + } + ``` + + Possible values: + + - `"public-ip-validation"` + - `"cpu-benchmark"` + - `"iperf"` + + - Return: a single task result. + + - Possible Error: `ErrResultNotFound` if no result is stored for the given task. + +- `zos.perf.get_all`: + + - Return: all stored results + +The rmb direct client can be used to call these commands. 
Check the [example](https://github.com/threefoldtech/tfgrid-sdk-go/blob/development/rmb-sdk-go/examples/rpc_client/main.go) + +## Caching + +Results are stored in a Redis server running on the node. + +The key in Redis is the name of the task prefixed with the word `perf`. +The value is an instance of the `TaskResult` struct, which contains: + +- Name of the task +- Timestamp when the task was run +- A brief description of what the task does +- The actual returned result from the task + +Notes: + +- Storing results by a key ensures each new result overrides the old one, so there is always a single result for each task. +- Storing results prefixed with `perf` eases retrieving all the results stored by this module. + +## Registered Tests + +- [Public IP Validation](./publicips.md) +- [CPUBenchmark](./cpubench.md) +- [IPerf](./iperf.md) +- [Health Check](./healthcheck.md) + +## Test Suite + +Go to [this link](https://app.testlodge.com/a/26076/projects/40893/suites/234919) for a test suite covering the test cases for the performance testing. \ No newline at end of file diff --git a/collections/documentation/developers/internals/zos/performance/publicips.md b/collections/documentation/developers/internals/zos/performance/publicips.md new file mode 100644 index 0000000..0549512 --- /dev/null +++ b/collections/documentation/developers/internals/zos/performance/publicips.md @@ -0,0 +1,55 @@ +

Public IPs Validation Task

+ +

Table of Contents

+ +- [Introduction](#introduction) +- [Configuration](#configuration) +- [Task Details](#task-details) +- [Result](#result) + - [Result Sample](#result-sample) + +*** + +## Introduction + +The goal of the task is to make sure public IPs assigned to a farm are valid and can be assigned to deployments. + +## Configuration + +- Name: `public-ip-validation` +- Schedule: 4 times a day +- Jitter: 10 min + +## Task Details + +- The task depends on `Networkd` having set up the test network correctly and fails if it was not set up properly. The network setup consists of a test Namespace and a MacVLAN as part of it. All steps are done inside the test Namespace. +- Decide whether this node or another one in the farm should run the task, based on the node ID. The node with the lowest ID and with its power target set to up should run it. The other nodes log why they shouldn't run the task and return with no errors. This ensures only one node runs the task, which avoids problems like assigning the same IP twice. +- Get the public IPs set on the farm. +- Remove all IPs and routes added to the test MacVLAN to ensure any leftovers from a previous task run are removed. +- Skip IPs that are assigned to a contract. +- Set the MacVLAN link up. +- Iterate over all public IPs and add them with the provided gateway to the MacVLAN. +- Validate the IP by querying an external source that returns the public IP for the node. +- If the public IP returned matches the IP added to the link, then the IP is valid. Otherwise, it is invalid. +- Remove all IPs and routes between each IP to make them available for other deployments. +- After iterating over all public IPs, set the link down. + +## Result + +The task returns a single map of String (IP) to IPReport. The report consists of the IP state (valid, invalid or skipped) and the reason for the state. 
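As a minimal sketch of how a consumer might read this map, the TypeScript below models the result and lists the IPs that failed validation. The names `IPState`, `IPReport`, `PublicIPResult` and `invalidIPs` are illustrative assumptions and are not taken from the zos code base; only the `state` and `reason` fields and the three state values come from the description above.

```typescript
// Hypothetical types mirroring the JSON shape of the task result.
type IPState = "valid" | "invalid" | "skipped";

interface IPReport {
  state: IPState;
  reason: string;
}

// Map of public IP (in CIDR notation) to its report.
type PublicIPResult = Record<string, IPReport>;

// Return the IPs whose state is "invalid", sorted for stable output.
function invalidIPs(result: PublicIPResult): string[] {
  return Object.entries(result)
    .filter(([, report]) => report.state === "invalid")
    .map(([ip]) => ip)
    .sort();
}

// Example input shaped like the `result` field of a stored task result.
const sample: PublicIPResult = {
  "185.206.122.29/24": {
    state: "invalid",
    reason: "public ip or gateway data are not valid",
  },
};

console.log(invalidIPs(sample));
```

Modelling the state as a string union rather than a plain `string` lets the compiler flag any comparison against a state the task does not report.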
+ +### Result Sample + +```json +{ + "description": "Runs on the least NodeID node in a farm to validate all its IPs.", + "name": "public-ip-validation", + "result": { + "185.206.122.29/24": { + "reason": "public ip or gateway data are not valid", + "state": "invalid" + } + }, + "timestamp": 1700504421 +} +``` \ No newline at end of file diff --git a/collections/documentation/developers/internals/zos/readme.md b/collections/documentation/developers/internals/zos/readme.md new file mode 100644 index 0000000..33d16df --- /dev/null +++ b/collections/documentation/developers/internals/zos/readme.md @@ -0,0 +1,28 @@ +

Zero-OS

+ +

Table of Contents

+ +- [Manual](./manual/manual.md) +- [Workload Types](./manual/workload_types.md) +- [Internal Modules](./internals/internals.md) + - [Identity](./internals/identity/index.md) + - [Node ID Generation](./internals/identity/identity.md) + - [Node Upgrade](./internals/identity/upgrade.md) + - [Node](./internals/node/index.md) + - [Storage](./internals/storage/index.md) + - [Network](./internals/network/index.md) + - [Introduction](./internals/network/introduction.md) + - [Definitions](./internals/network/definitions.md) + - [Mesh](./internals/network/mesh.md) + - [Setup](./internals/network/setup_farm_network.md) + - [Flist](./internals/flist/index.md) + - [Container](./internals/container/index.md) + - [VM](./internals/vmd/index.md) + - [Provision](./internals/provision/index.md) +- [Capacity](./internals/capacity.md) +- [Performance Monitor Package](./performance/performance.md) + - [Public IPs Validation Task](./performance/publicips.md) + - [CPUBenchmark](./performance/cpubench.md) + - [IPerf](./performance/iperf.md) + - [Health Check](./performance/healthcheck.md) +- [API](./manual/api.md) diff --git a/collections/documentation/developers/internals/zos/release/readme.md b/collections/documentation/developers/internals/zos/release/readme.md new file mode 100644 index 0000000..6af1a51 --- /dev/null +++ b/collections/documentation/developers/internals/zos/release/readme.md @@ -0,0 +1,31 @@ +# Releases of Zero-OS + +We use a simple pipeline release workflow. Building and file distribution are made using GitHub Actions. +Usable files are available on the [Zero-OS Hub](https://hub.grid.tf/tf-zos). + +This pipeline is made to match the 3 different type of running mode of 0-OS. For more information head to the [upgrade documentation](../identity/upgrade.md). + +## Development build + +On a push to main branch on the zos repository, a new development build is triggered. 
If the build succeeds, +binaries are packed into an flist and uploaded to the [tf-autobuilder](https://hub.grid.tf/tf-autobuilder) repository of the hub. + +This flist is then promoted into the [tf-zos](https://hub.grid.tf/tf-zos) repository of the hub and a symlink to this latest build is made (`tf-autobuilder/zos:development-3:latest.flist`) + +## Releases +We create 3 types of releases: +- QA release: the version is suffixed by `qa`, for example `v3.5.0-qa1`. +- RC release: the version is suffixed by `rc`, for example `v3.5.0-rc2`. +- Main release: the version has no suffix, for example `v3.5.0`. + +The release cycle goes like this: +- As mentioned before, devnet is updated the moment new code is available on the `main` branch. Since the `dev` release is automatically linked to the latest `flist` on the hub, nodes on devnet will auto update to the latest available build. +- Creating a `qa` release will not trigger the same behavior on `qa` net; the same applies to testnet and mainnet. Instead, a workflow must be triggered manually. This is only to make 100% sure that an update is needed. +- Once the build of the release is available, a [deploy](../../.github/workflows/grid-deploy.yaml) workflow needs to be triggered with the right version to deploy on the proper network. + - All this workflow does is link the right version under the hub [tf-zos](https://hub.grid.tf/tf-zos) repo + +> The `deploy` flow is rarely used, since the on-chain update is also available. By setting the right version on tfchain, the link on the hub is auto-updated, so the deploy workflow does not need to be triggered. We keep it as a safety net in case something goes wrong (chain is broken) and we need to force a specific version on ZOS. 
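The version naming described under Releases can be sketched in TypeScript. This is only an illustration and not part of the zos tooling: the `ReleaseKind` type, the `releaseKind` helper and the assumed `vMAJOR.MINOR.PATCH` pattern with a numbered `-qa` or `-rc` suffix are hypothetical, inferred solely from the example versions listed there.

```typescript
// Hypothetical classification of a release tag by its suffix.
type ReleaseKind = "qa" | "rc" | "main";

// Assumed tag shape: vMAJOR.MINOR.PATCH, optionally followed by -qaN or -rcN.
const TAG_PATTERN = /^v\d+\.\d+\.\d+(?:-(qa|rc)\d+)?$/;

// Returns the release kind, or null when the tag does not match the assumed shape.
function releaseKind(tag: string): ReleaseKind | null {
  const match = TAG_PATTERN.exec(tag);
  if (match === null) {
    return null;
  }
  const suffix = match[1];
  if (suffix === "qa" || suffix === "rc") {
    return suffix;
  }
  // No suffix captured: this is a main release.
  return "main";
}

for (const tag of ["v3.5.0-qa1", "v3.5.0-rc2", "v3.5.0"]) {
  console.log(tag, releaseKind(tag));
}
```

Returning `null` for unrecognised tags keeps a malformed version from being treated as a main release by accident.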
+ +- Development: https://playground.hub.grid.tf/tf-autobuilder/zos:development-3:latest.flist +- Testing: https://playground.hub.grid.tf/tf-zos/zos:testing-3:latest.flist +- Production: https://playground.hub.grid.tf/tf-zos/zos:production-3:latest.flist diff --git a/collections/documentation/developers/javascript/grid3_javascript_capacity_planning.md b/collections/documentation/developers/javascript/grid3_javascript_capacity_planning.md new file mode 100644 index 0000000..f1db3cc --- /dev/null +++ b/collections/documentation/developers/javascript/grid3_javascript_capacity_planning.md @@ -0,0 +1,110 @@ +

Capacity Planning

+ +

Table of Contents

+ +- [Introduction](#introduction) +- [Example](#example) + +*** + +## Introduction + +This is almost the same as [deploying a single VM](../javascript/grid3_javascript_vm.md). The only difference is that you can automate the choice of the node to deploy on using code. We now support `FilterOptions` to filter nodes based on specific criteria, e.g. the node resources (CRU, SRU, HRU, MRU), being part of a specific farm, being located in some country, or being a gateway or not. + +## Example + +```ts +FilterOptions: { accessNodeV4?: boolean; accessNodeV6?: boolean; city?: string; country?: string; cru?: number; hru?: number; mru?: number; sru?: number; farmId?: number; farmName?: string; gateway?: boolean; publicIPs?: boolean; certified?: boolean; dedicated?: boolean; availableFor?: number; page?: number;} +``` + +```ts +import { DiskModel, FilterOptions, MachineModel, MachinesModel, NetworkModel } from "../src"; +import { config, getClient } from "./client_loader"; +import { log } from "./utils"; + +async function main() { + const grid3 = await getClient(); + + // create network Object + const n = new NetworkModel(); + n.name = "dynamictest"; + n.ip_range = "10.249.0.0/16"; + + // create disk Object + const disk = new DiskModel(); + disk.name = "dynamicDisk"; + disk.size = 8; + disk.mountpoint = "/testdisk"; + + const vmQueryOptions: FilterOptions = { + cru: 1, + mru: 2, // GB + sru: 9, + country: "Belgium", + availableFor: grid3.twinId, + }; + + // create vm node Object + const vm = new MachineModel(); + vm.name = "testvm"; + vm.node_id = +(await grid3.capacity.filterNodes(vmQueryOptions))[0].nodeId; // TODO: allow random choice + vm.disks = [disk]; + vm.public_ip = false; + vm.planetary = true; + vm.cpu = 1; + vm.memory = 1024 * 2; + vm.rootfs_size = 0; + vm.flist = "https://hub.grid.tf/tf-official-apps/base:latest.flist"; + vm.entrypoint = "/sbin/zinit init"; + vm.env = { + SSH_KEY: config.ssh_key, + }; + + // create VMs Object + const vms = new MachinesModel(); + vms.name = 
"dynamicVMS"; + vms.network = n; + vms.machines = [vm]; + vms.metadata = "{'testVMs': true}"; + vms.description = "test deploying VMs via ts grid3 client"; + + // deploy vms + const res = await grid3.machines.deploy(vms); + log(res); + + // get the deployment + const l = await grid3.machines.getObj(vms.name); + log(l); + + // // delete + // const d = await grid3.machines.delete({ name: vms.name }); + // log(d); + + await grid3.disconnect(); +} + +main(); +``` + +In this example you can notice the criteria for `server1` + +```typescript +const server1_options: FilterOptions = { + cru: 1, + mru: 2, // GB + sru: 9, + country: "Belgium", + availableFor: grid3.twinId, +}; + +``` + +Here we want all the nodes with `CRU:1`, `MRU:2`, `SRU:9`, located in `Belgium` and available for me (not rented for someone else). + +> Note some libraries allow reverse lookup of countries codes by name e.g [i18n-iso-countries](https://www.npmjs.com/package/i18n-iso-countries) + +and then in the MachineModel, we specified the `node_id` to be the first value of our filteration + +```typescript +vm.node_id = +(await nodes.filterNodes(server1_options))[0].nodeId; +``` diff --git a/collections/documentation/developers/javascript/grid3_javascript_caprover.md b/collections/documentation/developers/javascript/grid3_javascript_caprover.md new file mode 100644 index 0000000..1b1e1e3 --- /dev/null +++ b/collections/documentation/developers/javascript/grid3_javascript_caprover.md @@ -0,0 +1,232 @@ +

Deploy CapRover

+ +

Table of Contents

+ +- [Introduction](#introduction) +- [Leader Node](#leader-node) + - [Code Example](#code-example) + - [Environment Variables](#environment-variables) +- [Worker Node](#worker-node) + - [Code Example](#code-example-1) + - [Environment Variables](#environment-variables-1) +- [Questions and Feedback](#questions-and-feedback) + +*** + +## Introduction + +In this section, we show how to deploy CapRover with the Javascript client. + +This deployment is very similar to what we have in the section [Deploy a VM](./grid3_javascript_vm.md), but the environment variables are different. + +## Leader Node + +We present here a code example and the environment variables to deploy a CapRover Leader node. + +For further details about the Leader node deployment, [read this documentation](https://github.com/freeflowuniverse/freeflow_caprover#a-leader-node-deploymentsetup). + +### Code Example + +```ts +import { + DiskModel, + FilterOptions, + MachineModel, + MachinesModel, + NetworkModel, +} from "../src"; +import { config, getClient } from "./client_loader"; +import { log } from "./utils"; + +async function main() { + const grid3 = await getClient(); + + const vmQueryOptions: FilterOptions = { + cru: 4, + mru: 4, // GB + sru: 10, + farmId: 1, + }; + + const CAPROVER_FLIST = + "https://hub.grid.tf/tf-official-apps/tf-caprover-latest.flist"; + // create network Object + const n = new NetworkModel(); + n.name = "wedtest"; + n.ip_range = "10.249.0.0/16"; + + // create disk Object + const disk = new DiskModel(); + disk.name = "wedDisk"; + disk.size = 10; + disk.mountpoint = "/var/lib/docker"; + + // create vm node Object + const vm = new MachineModel(); + vm.name = "testvm"; + vm.node_id = +(await grid3.capacity.filterNodes(vmQueryOptions))[0].nodeId; + vm.disks = [disk]; + vm.public_ip = true; + vm.planetary = false; + vm.cpu = 4; + vm.memory = 1024 * 4; + vm.rootfs_size = 0; + vm.flist = CAPROVER_FLIST; + vm.entrypoint = "/sbin/zinit init"; + vm.env = { + PUBLIC_KEY: config.ssh_key, + 
SWM_NODE_MODE: "leader", + CAPROVER_ROOT_DOMAIN: "rafy.grid.tf", // update me + DEFAULT_PASSWORD: "captain42", + CAPTAIN_IMAGE_VERSION: "latest", + }; + + // create VMs Object + const vms = new MachinesModel(); + vms.name = "newVMS5"; + vms.network = n; + vms.machines = [vm]; + vms.metadata = "{'testVMs': true}"; + vms.description = "caprover leader machine/node"; + + // deploy vms + const res = await grid3.machines.deploy(vms); + log(res); + + // get the deployment + const l = await grid3.machines.getObj(vms.name); + log(l); + + log( + `You can access Caprover via the browser using: https://captain.${vm.env.CAPROVER_ROOT_DOMAIN}` + ); + + // // delete + // const d = await grid3.machines.delete({ name: vms.name }); + // log(d); + + await grid3.disconnect(); +} + +main(); +``` + + + +### Environment Variables + +- PUBLIC_KEY: Your public SSH key, used to access the VM. +- SWM_NODE_MODE: CapRover node type, which must be `leader` as we are deploying a leader node. +- CAPROVER_ROOT_DOMAIN: The domain that will be bound to the deployed VM. +- DEFAULT_PASSWORD: The default CapRover password you want to deploy with. + + + +## Worker Node + +We present here a code example and the environment variables to deploy a CapRover Worker node. + +Note that before deploying the Worker node, you should check the following: + +- Get the Leader node public IP address. +- The Worker node should join the cluster from the UI by adding its public IP address and the private SSH key. + +For further information, [read this documentation](https://github.com/freeflowuniverse/freeflow_caprover#step-4-access-the-captain-dashboard). 
+ +### Code Example + +```ts +import { + DiskModel, + FilterOptions, + MachineModel, + MachinesModel, + NetworkModel, +} from "../src"; +import { config, getClient } from "./client_loader"; +import { log } from "./utils"; + +async function main() { + const grid3 = await getClient(); + + const vmQueryOptions: FilterOptions = { + cru: 4, + mru: 4, // GB + sru: 10, + farmId: 1, + }; + + const CAPROVER_FLIST = + "https://hub.grid.tf/tf-official-apps/tf-caprover-latest.flist"; + // create network Object + const n = new NetworkModel(); + n.name = "wedtest"; + n.ip_range = "10.249.0.0/16"; + + // create disk Object + const disk = new DiskModel(); + disk.name = "wedDisk"; + disk.size = 10; + disk.mountpoint = "/var/lib/docker"; + + // create vm node Object + const vm = new MachineModel(); + vm.name = "capworker1"; + vm.node_id = +(await grid3.capacity.filterNodes(vmQueryOptions))[0].nodeId; + vm.disks = [disk]; + vm.public_ip = true; + vm.planetary = false; + vm.cpu = 4; + vm.memory = 1024 * 4; + vm.rootfs_size = 0; + vm.flist = CAPROVER_FLIST; + vm.entrypoint = "/sbin/zinit init"; + vm.env = { + // These env. vars needed to be changed based on the leader node. 
+ PUBLIC_KEY: config.ssh_key, + SWM_NODE_MODE: "worker", + LEADER_PUBLIC_IP: "185.206.122.157", + CAPTAIN_IMAGE_VERSION: "latest", + }; + + // create VMs Object + const vms = new MachinesModel(); + vms.name = "newVMS6"; + vms.network = n; + vms.machines = [vm]; + vms.metadata = "{'testVMs': true}"; + vms.description = "caprover worker machine/node"; + + // deploy vms + const res = await grid3.machines.deploy(vms); + log(res); + + // get the deployment + const l = await grid3.machines.getObj(vms.name); + log(l); + + // // delete + // const d = await grid3.machines.delete({ name: vms.name }); + // log(d); + + await grid3.disconnect(); +} + +main(); +``` + + + +### Environment Variables + +The deployment of the Worker node is similar to the deployment of the Leader node, except for the environment variables, which differ slightly. + +- PUBLIC_KEY: Your public SSH key, used to access the VM. +- SWM_NODE_MODE: CapRover node type, which must be `worker` as we are deploying a worker node. +- LEADER_PUBLIC_IP: The public IP of the Leader node. + + + +## Questions and Feedback + +If you have any questions, you can ask the ThreeFold community for help on the [ThreeFold Forum](http://forum.threefold.io/) or on the [ThreeFold Grid Tester Community](https://t.me/threefoldtesting) on Telegram. \ No newline at end of file diff --git a/collections/documentation/developers/javascript/grid3_javascript_gpu_support.md b/collections/documentation/developers/javascript/grid3_javascript_gpu_support.md new file mode 100644 index 0000000..4c56ca6 --- /dev/null +++ b/collections/documentation/developers/javascript/grid3_javascript_gpu_support.md @@ -0,0 +1,91 @@ +

GPU Support and JavaScript

+ +

Table of Contents

+ +- [Introduction](#introduction) +- [Example](#example) + +*** + +## Introduction + +We present here a quick introduction to GPU support with JavaScript. + +There are a couple of updates regarding finding nodes with GPU, querying node for GPU information and deploying with support of GPU. + +This is an ongoing development and this section will be updated as new information comes in. + +## Example + +Here is an example script to deploy with GPU support: + +```ts +import { DiskModel, FilterOptions, MachineModel, MachinesModel, NetworkModel } from "../src"; +import { config, getClient } from "./client_loader"; +import { log } from "./utils"; + +async function main() { + const grid3 = await getClient(); + + // create network Object + const n = new NetworkModel(); + n.name = "vmgpuNetwork"; + n.ip_range = "10.249.0.0/16"; + + // create disk Object + const disk = new DiskModel(); + disk.name = "vmgpuDisk"; + disk.size = 100; + disk.mountpoint = "/testdisk"; + + const vmQueryOptions: FilterOptions = { + cru: 8, + mru: 16, // GB + sru: 100, + availableFor: grid3.twinId, + hasGPU: true, + rentedBy: grid3.twinId, + }; + + // create vm node Object + const vm = new MachineModel(); + vm.name = "vmgpu"; + vm.node_id = +(await grid3.capacity.filterNodes(vmQueryOptions))[0].nodeId; // TODO: allow random choice + vm.disks = [disk]; + vm.public_ip = false; + vm.planetary = true; + vm.cpu = 8; + vm.memory = 1024 * 16; + vm.rootfs_size = 0; + vm.flist = "https://hub.grid.tf/tf-official-vms/ubuntu-22.04.flist"; + vm.entrypoint = "/"; + vm.env = { + SSH_KEY: config.ssh_key, + }; + vm.gpu = ["0000:0e:00.0/1002/744c"]; // gpu card's id, you can check the available gpu from the dashboard + + // create VMs Object + const vms = new MachinesModel(); + vms.name = "vmgpu"; + vms.network = n; + vms.machines = [vm]; + vms.metadata = ""; + vms.description = "test deploying VM with GPU via ts grid3 client"; + + // deploy vms + const res = await grid3.machines.deploy(vms); + log(res); + + // get 
the deployment + const l = await grid3.machines.getObj(vms.name); + log(l); + + // delete + const d = await grid3.machines.delete({ name: vms.name }); + log(d); + + await grid3.disconnect(); +} + +main(); +``` \ No newline at end of file diff --git a/collections/documentation/developers/javascript/grid3_javascript_installation.md b/collections/documentation/developers/javascript/grid3_javascript_installation.md new file mode 100644 index 0000000..3040880 --- /dev/null +++ b/collections/documentation/developers/javascript/grid3_javascript_installation.md @@ -0,0 +1,124 @@ +

Installation

+ +

Table of Contents

+ +- [Introduction](#introduction) +- [Prerequisites](#prerequisites) +- [Installation](#installation) + - [External Package](#external-package) + - [Local Usage](#local-usage) +- [Getting Started](#getting-started) + - [Client Configuration](#client-configuration) +- [Generate the Documentation](#generate-the-documentation) +- [How to Run the Scripts](#how-to-run-the-scripts) +- [Reference API](#reference-api) + +*** + +## Introduction + +We present here the general steps required to install and use the ThreeFold Grid Client. + +The [Grid Client](https://github.com/threefoldtech/tfgrid-sdk-ts/tree/development/packages/grid_client) is written using [TypeScript](https://www.typescriptlang.org/) to provide more convenience and type-checked code. It is used to deploy workloads like virtual machines, kubernetes clusters, quantum storage, and more. + +## Prerequisites + +To install the Grid Client, you will need the following on your machine: + +- [Node.js](https://nodejs.org/en) ^18 +- npm 8.2.0 or higher +- may need to install libtool (**apt-get install libtool**) + +> Note: [nvm](https://nvm.sh/) is the recommended way for installing node. + +To use the Grid Client, you will need the following on the TFGrid: + +- A TFChain account +- TFT in your wallet + +If it is not the case, please visit the [Get started section](../../system_administrators/getstarted/tfgrid3_getstarted.md). + +## Installation + +### External Package + +To install the external package, simply run the following command: + +```bash +yarn add @threefold/grid_client +``` + +> Note: For the **qa**, **test** and **main** networks, please use @2.1.1 version. 
+ +### Local Usage + +To use the Grid Client locally, clone the repository, then install the Grid Client: + +- Clone the repository + - ```bash + git clone https://github.com/threefoldtech/tfgrid-sdk-ts + ``` +- Install the Grid Client + - With yarn + - ```bash + yarn install + ``` + - With npm + - ```bash + npm install + ``` + +> Note: In the directory **grid_client/scripts**, we provide a set of scripts to test the Grid Client. + +## Getting Started + +You will need to set the client configuration either by editing the JSON file manually (**scripts/config.json**) or by using the provided script (**scripts/client_loader.ts**). + +### Client Configuration + +Make sure to set the client configuration properly before using the Grid Client. + +- **network**: The network environment (**dev**, **qa**, **test** or **main**). + +- **mnemonic**: The 12-word mnemonic for your account. + - Learn how to create one [here](../../dashboard/wallet_connector.md). + +- **storeSecret**: This is any word that will be used for encrypting/decrypting the keys on the ThreeFold key-value store. + +- **ssh_key**: The public SSH key set on your machine. + +> Note: Networks are the only resources that can't be isolated: all projects can see the same network. + +## Generate the Documentation + +The easiest way to test the installation is to run the following command with either yarn or npm to generate the Grid Client documentation: + +* With yarn + * ``` + yarn run serve-docs + ``` +* With npm + * ``` + npm run serve-docs + ``` + +> Note: You can also use the command **yarn run** to see all available options. + +## How to Run the Scripts + +You can explore the Grid Client by testing the different scripts provided in **grid_client/scripts**. 
+ +- Update your customized deployments specs if needed +- Run using [ts-node](https://www.npmjs.com/ts-node) + - With yarn + - ```bash + yarn run ts-node --project tsconfig-node.json scripts/zdb.ts + ``` + - With npx + - ```bash + npx ts-node --project tsconfig-node.json scripts/zdb.ts + ``` + +## Reference API + +While this is still a work in progress, you can have a look [here](https://threefoldtech.github.io/tfgrid-sdk-ts/packages/grid_client/docs/api/index.html). diff --git a/collections/documentation/developers/javascript/grid3_javascript_kubernetes.md b/collections/documentation/developers/javascript/grid3_javascript_kubernetes.md new file mode 100644 index 0000000..645f3e2 --- /dev/null +++ b/collections/documentation/developers/javascript/grid3_javascript_kubernetes.md @@ -0,0 +1,186 @@ +

Deploying a Kubernetes Cluster

+ +

Table of Contents

+ +- [Introduction](#introduction) +- [Prerequisites](#prerequisites) +- [Example code](#example-code) +- [Detailed explanation](#detailed-explanation) + - [Building network](#building-network) + - [Building nodes](#building-nodes) + - [Building cluster](#building-cluster) + - [Deploying](#deploying) + - [Getting deployment information](#getting-deployment-information) + - [Deleting deployment](#deleting-deployment) + +*** + +## Introduction + +We show how to deploy a Kubernetes cluster on the TFGrid with the Javascript client. + +## Prerequisites + +- Make sure you have your [client](./grid3_javascript_loadclient.md) prepared + +## Example code + +```ts +import { FilterOptions, K8SModel, KubernetesNodeModel, NetworkModel } from "../src"; +import { config, getClient } from "./client_loader"; +import { log } from "./utils"; + +async function main() { + const grid3 = await getClient(); + + // create network Object + const n = new NetworkModel(); + n.name = "monNetwork"; + n.ip_range = "10.238.0.0/16"; + n.addAccess = true; + + const masterQueryOptions: FilterOptions = { + cru: 2, + mru: 2, // GB + sru: 2, + availableFor: grid3.twinId, + farmId: 1, + }; + + const workerQueryOptions: FilterOptions = { + cru: 1, + mru: 1, // GB + sru: 1, + availableFor: grid3.twinId, + farmId: 1, + }; + + // create k8s node Object + const master = new KubernetesNodeModel(); + master.name = "master"; + master.node_id = +(await grid3.capacity.filterNodes(masterQueryOptions))[0].nodeId; + master.cpu = 1; + master.memory = 1024; + master.rootfs_size = 0; + master.disk_size = 1; + master.public_ip = false; + master.planetary = true; + + // create k8s node Object + const worker = new KubernetesNodeModel(); + worker.name = "worker"; + worker.node_id = +(await grid3.capacity.filterNodes(workerQueryOptions))[0].nodeId; + worker.cpu = 1; + worker.memory = 1024; + worker.rootfs_size = 0; + worker.disk_size = 1; + worker.public_ip = false; + worker.planetary = true; + + // create k8s Object + const 
k = new K8SModel(); + k.name = "testk8s"; + k.secret = "secret"; + k.network = n; + k.masters = [master]; + k.workers = [worker]; + k.metadata = "{'testk8s': true}"; + k.description = "test deploying k8s via ts grid3 client"; + k.ssh_key = config.ssh_key; + + // deploy + const res = await grid3.k8s.deploy(k); + log(res); + + // get the deployment + const l = await grid3.k8s.getObj(k.name); + log(l); + + // // delete + // const d = await grid3.k8s.delete({ name: k.name }); + // log(d); + + await grid3.disconnect(); +} + +main(); +``` + +## Detailed explanation + +### Building network + +```typescript +// create network Object +const n = new NetworkModel(); +n.name = "monNetwork"; +n.ip_range = "10.238.0.0/16"; + +``` + +### Building nodes + +```typescript +// create k8s node Object +const master = new KubernetesNodeModel(); +master.name = "master"; +master.node_id = +(await grid3.capacity.filterNodes(masterQueryOptions))[0].nodeId; +master.cpu = 1; +master.memory = 1024; +master.rootfs_size = 0; +master.disk_size = 1; +master.public_ip = false; +master.planetary = true; + + // create k8s node Object +const worker = new KubernetesNodeModel(); +worker.name = "worker"; +worker.node_id = +(await grid3.capacity.filterNodes(workerQueryOptions))[0].nodeId; +worker.cpu = 1; +worker.memory = 1024; +worker.rootfs_size = 0; +worker.disk_size = 1; +worker.public_ip = false; +worker.planetary = true; + +``` + +### Building cluster + +Here we specify the cluster project name, cluster secret, network model to be used, master and workers nodes and sshkey to access them + +```ts +// create k8s Object +const k = new K8SModel(); +k.name = "testk8s"; +k.secret = "secret"; +k.network = n; +k.masters = [master]; +k.workers = [worker]; +k.metadata = "{'testk8s': true}"; +k.description = "test deploying k8s via ts grid3 client"; +k.ssh_key = config.ssh_key; +``` + +### Deploying + +use `deploy` function to deploy the kubernetes project + +```ts +const res = await grid3.k8s.deploy(k); 
+log(res); +``` + +### Getting deployment information + +```ts +const l = await grid3.k8s.getObj(k.name); +log(l); +``` + +### Deleting deployment + +```ts +const d = await grid3.k8s.delete({ name: k.name }); +log(d); +``` diff --git a/collections/documentation/developers/javascript/grid3_javascript_kvstore.md b/collections/documentation/developers/javascript/grid3_javascript_kvstore.md new file mode 100644 index 0000000..5075086 --- /dev/null +++ b/collections/documentation/developers/javascript/grid3_javascript_kvstore.md @@ -0,0 +1,101 @@ +

Using TFChain KVStore

+ +

Table of Contents

+ +- [Introduction](#introduction) +- [Prerequisites](#prerequisites) +- [Example code](#example-code) + - [setting values](#setting-values) + - [getting key](#getting-key) + - [listing keys](#listing-keys) + - [deleting key](#deleting-key) + +*** + +## Introduction + +As part of TFChain, we support a key-value store module that can hold any value up to `2KB`. In practice, it is used to save the user's configuration state, so that it can be rebuilt on any machine, provided the same mnemonic and the same secret are used. + +## Prerequisites + +- Make sure you have your [client](./grid3_javascript_loadclient.md) prepared + +## Example code + +```ts +import { getClient } from "./client_loader"; +import { log } from "./utils"; + +/* +KVStore example usage: +*/ +async function main() { + // To create a grid3 client with KVStore, you need to specify the KVStore storage type in the params: + + const gridClient = await getClient(); + + // Then every module will use the KVStore to save its configuration and restore it. + + // You can also use it directly like this: + const db = gridClient.kvstore; + + // key and value to store + const key = "hamada"; + const exampleObj = { + key1: "value1", + key2: 2, + }; + // set key + await db.set({ key, value: JSON.stringify(exampleObj) }); + + // list all the keys + const keys = await db.list(); + log(keys); + + // get the key + const data = await db.get({ key }); + log(JSON.parse(data)); + + // remove the key + await db.remove({ key }); + + await gridClient.disconnect(); +} + +main(); + +``` + +### setting values + +`db.set` is used to set a key to any value serialized as a string. + +```ts +await db.set({ key, value: JSON.stringify(exampleObj) }); +``` + +### getting key + +`db.get` is used to get the value of a specific key. + +```ts +const data = await db.get({ key }); +log(JSON.parse(data)); +``` + +### listing keys + +`db.list` is used to list all the keys. 
+ +```ts +const keys = await db.list(); +log(keys); +``` + +### deleting key + +`db.remove` is used to delete a specific key. + +```ts +await db.remove({ key }); +``` diff --git a/collections/documentation/developers/javascript/grid3_javascript_loadclient.md b/collections/documentation/developers/javascript/grid3_javascript_loadclient.md new file mode 100644 index 0000000..fc7c025 --- /dev/null +++ b/collections/documentation/developers/javascript/grid3_javascript_loadclient.md @@ -0,0 +1,68 @@ +

Grid3 Client

+ +

Table of Contents

+ +- [Introduction](#introduction) +- [Client Configurations](#client-configurations) +- [Creating/Initializing The Grid3 Client](#creatinginitializing-the-grid3-client) +- [What is `rmb-rs` | Reliable Message Bus --rust](#what-is-rmb-rs--reliable-message-bus---rust) +- [Grid3 Client Options](#grid3-client-options) + +## Introduction + +Grid3 Client is a client used for deploying workloads (VMs, ZDBs, k8s, etc.) on the TFGrid. + +## Client Configurations + +Set up your configuration file as follows: + +```json +{ + "network": "dev", + "mnemonic": "", + "storeSecret": "secret", + "ssh_key": "" +} +``` + +## Creating/Initializing The Grid3 Client + +```ts +async function getClient(): Promise<GridClient> { + const gridClient = new GridClient({ + network: "dev", // can be dev, qa, test, main, or custom + mnemonic: "", + }); + await gridClient.connect(); + + return gridClient; +} +``` + +The grid client uses the `rmb-rs` tool to exchange messages with nodes. + +## What is `rmb-rs` | Reliable Message Bus --rust + +Reliable message bus is a secure communication panel that allows bots to communicate together in a chat-like way. It makes it very easy to host a service or a set of functions to be used by anyone, even if your service is running behind NAT. + +Out of the box RMB provides the following: + +- Guaranteed authenticity of the messages. You are always sure that a received message really comes from its claimed sender +- End-to-end encryption +- Support for third-party hosted relays. Anyone can host a relay and people can use it safely since there is no way messages can be inspected while +using e2e encryption. This is similar to Matrix home servers + +## Grid3 Client Options + +- network: the network to connect to, e.g. `dev` for devnet or `test` for testnet (`qa`, `main` and `custom` are also accepted) +- mnemonic: used for signing the requests. +- storeSecret: used to encrypt data while storing in backend. It's any word that will be used for encrypting/decrypting the keys on the ThreeFold key-value store. 
If left empty, the Grid client will use the mnemonic as the storeSecret. +- BackendStorage: can be `auto`, which will automatically adapt to the environment: the `filesystem backend` when running in Node.js, or the `localstorage backend` when running in the browser. You can also set it to `kvstore` to use the TFChain key-value store module. +- keypairType: defaults to `sr25519`; most likely you will never need to change it. `ed25519` is supported too. + +For more details, check [client options](https://github.com/threefoldtech/tfgrid-sdk-ts/blob/development/packages/grid_client/docs/client_configuration.md) + +> Note: The choice of the node is completely up to the user at this point. They need to do the capacity planning. Check [Node Finder](../../dashboard/deploy/node_finder.md) to find out which nodes fit your deployment criteria. + +Check the document for [capacity planning using code](../javascript/grid3_javascript_capacity_planning.md) if you want to automate it. +> Note: this feature is still experimental diff --git a/collections/documentation/developers/javascript/grid3_javascript_qsfs.md b/collections/documentation/developers/javascript/grid3_javascript_qsfs.md new file mode 100644 index 0000000..df6b02a --- /dev/null +++ b/collections/documentation/developers/javascript/grid3_javascript_qsfs.md @@ -0,0 +1,297 @@ +

Deploying a VM with QSFS

+ +

Table of Contents

+ +- [Prerequisites](#prerequisites) +- [Code Example](#code-example) +- [Detailed Explanation](#detailed-explanation) + - [Getting the Client](#getting-the-client) + - [Preparing QSFS](#preparing-qsfs) + - [Deploying a VM with QSFS](#deploying-a-vm-with-qsfs) + - [Getting the Deployment Information](#getting-the-deployment-information) + - [Deleting a Deployment](#deleting-a-deployment) + +*** + +## Prerequisites + +First, make sure that you have your [client](./grid3_javascript_loadclient.md) prepared. + +## Code Example + +```ts +import { FilterOptions, MachinesModel, QSFSZDBSModel } from "../src"; +import { config, getClient } from "./client_loader"; +import { log } from "./utils"; + +async function main() { + const grid3 = await getClient(); + + const qsfs_name = "wed2710q1"; + const machines_name = "wed2710t1"; + + const vmQueryOptions: FilterOptions = { + cru: 1, + mru: 1, // GB + sru: 1, + availableFor: grid3.twinId, + farmId: 1, + }; + + const qsfsQueryOptions: FilterOptions = { + hru: 6, + availableFor: grid3.twinId, + farmId: 1, + }; + + const qsfsNodes = []; + + const allNodes = await grid3.capacity.filterNodes(qsfsQueryOptions); + if (allNodes.length >= 2) { + qsfsNodes.push(+allNodes[0].nodeId, +allNodes[1].nodeId); + } else { + throw Error("Couldn't find nodes for qsfs"); + } + + const vmNode = +(await grid3.capacity.filterNodes(vmQueryOptions))[0].nodeId; + + const qsfs: QSFSZDBSModel = { + name: qsfs_name, + count: 8, + node_ids: qsfsNodes, + password: "mypassword", + disk_size: 1, + description: "my qsfs test", + metadata: "", + }; + + const vms: MachinesModel = { + name: machines_name, + network: { + name: "wed2710n1", + ip_range: "10.201.0.0/16", + }, + machines: [ + { + name: "wed2710v1", + node_id: vmNode, + disks: [ + { + name: "wed2710d1", + size: 1, + mountpoint: "/mydisk", + }, + ], + qsfs_disks: [ + { + qsfs_zdbs_name: qsfs_name, + name: "wed2710d2", + minimal_shards: 2, + expected_shards: 4, + encryption_key: "hamada", + prefix: 
"hamada", + cache: 1, + mountpoint: "/myqsfsdisk", + }, + ], + public_ip: false, + public_ip6: false, + planetary: true, + cpu: 1, + memory: 1024, + rootfs_size: 0, + flist: "https://hub.grid.tf/tf-official-apps/base:latest.flist", + entrypoint: "/sbin/zinit init", + env: { + SSH_KEY: config.ssh_key, + }, + }, + ], + metadata: "{'testVMs': true}", + description: "test deploying VMs via ts grid3 client", + }; + + async function cancel(grid3) { + // delete + const d = await grid3.machines.delete({ name: machines_name }); + log(d); + const r = await grid3.qsfs_zdbs.delete({ name: qsfs_name }); + log(r); + } + //deploy qsfs + const res = await grid3.qsfs_zdbs.deploy(qsfs); + log(">>>>>>>>>>>>>>>QSFS backend has been created<<<<<<<<<<<<<<<"); + log(res); + + const vm_res = await grid3.machines.deploy(vms); + log(">>>>>>>>>>>>>>>vm has been created<<<<<<<<<<<<<<<"); + log(vm_res); + + // get the deployment + const l = await grid3.machines.getObj(vms.name); + log(">>>>>>>>>>>>>>>Deployment result<<<<<<<<<<<<<<<"); + log(l); + + // await cancel(grid3); + + await grid3.disconnect(); +} + +main(); +``` + +## Detailed Explanation + +We present a detailed explanation of the example shown above. 
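
The example above selects nodes by indexing into the result of `filterNodes`, so an empty result surfaces as an unhelpful `TypeError`. The following is a minimal sketch of a guard around that step. It assumes only that each result exposes a `nodeId` field, as in the example; `pickNodeIds` and `NodeLike` are hypothetical names for illustration and are not part of the Grid Client.

```typescript
// Hypothetical helper (not part of the Grid Client): return the first `count`
// node IDs from a filter result, or fail with a descriptive error.
interface NodeLike {
  nodeId: number | string;
}

function pickNodeIds(nodes: NodeLike[], count: number): number[] {
  if (nodes.length < count) {
    throw new Error(`Needed ${count} node(s) but the filter returned ${nodes.length}`);
  }
  // Unary plus mirrors the `+allNodes[0].nodeId` conversion used in the example.
  return nodes.slice(0, count).map(node => +node.nodeId);
}
```

Under these assumptions, `qsfsNodes` could be built as `pickNodeIds(allNodes, 2)`, and `vmNode` as the first element of `pickNodeIds(await grid3.capacity.filterNodes(vmQueryOptions), 1)`.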
+ +### Getting the Client + +```ts +const grid3 = await getClient(); +``` + +### Preparing QSFS + +```ts +const qsfs_name = "wed2710q1"; +const machines_name = "wed2710t1"; +``` + +We prepare here some names to use across the client for the QSFS and the machines project. + +```ts +const qsfsQueryOptions: FilterOptions = { + hru: 6, + availableFor: grid3.twinId, + farmId: 1, +}; +const qsfsNodes = []; + +const allNodes = await grid3.capacity.filterNodes(qsfsQueryOptions); +if (allNodes.length >= 2) { + qsfsNodes.push(+allNodes[0].nodeId, +allNodes[1].nodeId); +} else { + throw Error("Couldn't find nodes for qsfs"); +} + +const qsfs: QSFSZDBSModel = { + name: qsfs_name, + count: 8, + node_ids: qsfsNodes, + password: "mypassword", + disk_size: 1, + description: "my qsfs test", + metadata: "", +}; + +const res = await grid3.qsfs_zdbs.deploy(qsfs); +log(">>>>>>>>>>>>>>>QSFS backend has been created<<<<<<<<<<<<<<<"); +log(res); +``` + +Here we deploy `8` ZDBs on the two nodes in `qsfsNodes` with password `mypassword`, each with a disk size of `1GB`. + +### Deploying a VM with QSFS + +```ts +const vmQueryOptions: FilterOptions = { + cru: 1, + mru: 1, // GB + sru: 1, + availableFor: grid3.twinId, + farmId: 1, +}; + +const vmNode = +(await grid3.capacity.filterNodes(vmQueryOptions))[0].nodeId; + +// deploy vms +const vms: MachinesModel = { + name: machines_name, + network: { + name: "wed2710n1", + ip_range: "10.201.0.0/16", + }, + machines: [ + { + name: "wed2710v1", + node_id: vmNode, + disks: [ + { + name: "wed2710d1", + size: 1, + mountpoint: "/mydisk", + }, + ], + qsfs_disks: [ + { + qsfs_zdbs_name: qsfs_name, + name: "wed2710d2", + minimal_shards: 2, + expected_shards: 4, + encryption_key: "hamada", + prefix: "hamada", + cache: 1, + mountpoint: "/myqsfsdisk", + }, + ], + public_ip: false, + public_ip6: false, + planetary: true, + cpu: 1, + memory: 1024, + rootfs_size: 0, + flist: 
"https://hub.grid.tf/tf-official-apps/base:latest.flist", + entrypoint: "/sbin/zinit init", + env: { + SSH_KEY: config.ssh_key, + }, + }, + ], + metadata: "{'testVMs': true}", + description: "test deploying VMs via ts grid3 client", +}; +const vm_res = await grid3.machines.deploy(vms); +log(">>>>>>>>>>>>>>>vm has been created<<<<<<<<<<<<<<<"); +log(vm_res); +``` + +This deployment is very similar to what we have in the [vm deployment section](./grid3_javascript_vm.md). The only new section is `qsfs_disks`: + +```ts + qsfs_disks: [{ + qsfs_zdbs_name: qsfs_name, + name: "wed2710d2", + minimal_shards: 2, + expected_shards: 4, + encryption_key: "hamada", + prefix: "hamada", + cache: 1, + mountpoint: "/myqsfsdisk" + }], +``` + +`qsfs_disks` is a list, representing all of the QSFS disks used within that VM. + +- `qsfs_zdbs_name`: the name of the backend ZDBs we defined at the beginning +- `expected_shards`: how many ZDBs the QSFS should be working with +- `minimal_shards`: the minimum number of ZDBs needed to recover the data when disks are lost, e.g. due to failure +- `mountpoint`: where it will be mounted on the VM, here `/myqsfsdisk` + +### Getting the Deployment Information + +```ts +const l = await grid3.machines.getObj(vms.name); +log(l); +``` + +### Deleting a Deployment + +```ts +// delete +const d = await grid3.machines.delete({ name: machines_name }); +log(d); +const r = await grid3.qsfs_zdbs.delete({ name: qsfs_name }); +log(r); +``` diff --git a/collections/documentation/developers/javascript/grid3_javascript_qsfs_zdbs.md b/collections/documentation/developers/javascript/grid3_javascript_qsfs_zdbs.md new file mode 100644 index 0000000..9c3a3f2 --- /dev/null +++ b/collections/documentation/developers/javascript/grid3_javascript_qsfs_zdbs.md @@ -0,0 +1,142 @@ +

Deploying ZDBs for QSFS

+ +

Table of Contents

+ +- [Introduction](#introduction) +- [Prerequisites](#prerequisites) +- [Example code](#example-code) +- [Detailed explanation](#detailed-explanation) + - [Getting the client](#getting-the-client) + - [Preparing the nodes](#preparing-the-nodes) + - [Preparing ZDBs](#preparing-zdbs) + - [Deploying the ZDBs](#deploying-the-zdbs) + - [Getting deployment information](#getting-deployment-information) + - [Deleting a deployment](#deleting-a-deployment) + +*** + +## Introduction + +We show how to deploy ZDBs for QSFS on the TFGrid with the Javascript client. + +## Prerequisites + +- Make sure you have your [client](./grid3_javascript_loadclient.md) prepared + +## Example code + +````typescript +import { FilterOptions, QSFSZDBSModel } from "../src"; +import { getClient } from "./client_loader"; +import { log } from "./utils"; + +async function main() { + const grid3 = await getClient(); + const qsfs_name = "zdbsQsfsDemo"; + const qsfsQueryOptions: FilterOptions = { + hru: 8, + availableFor: grid3.twinId, + farmId: 1, + }; + const qsfsNodes = []; + + const allNodes = await grid3.capacity.filterNodes(qsfsQueryOptions); + + if (allNodes.length >= 2) { + qsfsNodes.push(+allNodes[0].nodeId, +allNodes[1].nodeId); + } else { + throw Error("Couldn't find nodes for qsfs"); + } + + const qsfs: QSFSZDBSModel = { + name: qsfs_name, + count: 12, + node_ids: qsfsNodes, + password: "mypassword", + disk_size: 1, + description: "my zdbs test", + metadata: "", + }; + const deploy_res = await grid3.qsfs_zdbs.deploy(qsfs); + log(deploy_res); + + const zdbs_data = await grid3.qsfs_zdbs.get({ name: qsfs_name }); + log(zdbs_data); + + + await grid3.disconnect(); +} +main(); + +```` + +## Detailed explanation + +### Getting the client + +```typescript +const grid3 = await getClient(); +``` + +### Preparing the nodes + +We need to deploy the ZDBs on two different nodes, so we set up the filters here to retrieve the available nodes. 
+ +````typescript +const qsfsQueryOptions: FilterOptions = { + hru: 8, + availableFor: grid3.twinId, + farmId: 1, +}; +const qsfsNodes = []; + +const allNodes = await grid3.capacity.filterNodes(qsfsQueryOptions); + +if (allNodes.length >= 2) { + qsfsNodes.push(+allNodes[0].nodeId, +allNodes[1].nodeId); +} else { + throw Error("Couldn't find nodes for qsfs"); +} +```` + +Now we have two nodes in `qsfsNodes`. + +### Preparing ZDBs + +````typescript +const qsfs_name = "zdbsQsfsDemo"; +```` + +We prepare here a name to use across the client for the QSFS ZDBs. + +### Deploying the ZDBs + +````typescript +const qsfs: QSFSZDBSModel = { + name: qsfs_name, + count: 12, + node_ids: qsfsNodes, + password: "mypassword", + disk_size: 1, + description: "my zdbs test", + metadata: "", +}; +const deploy_res = await grid3.qsfs_zdbs.deploy(qsfs); +log(deploy_res); +```` + +Here we deploy `12` ZDBs on the nodes in `qsfsNodes` with password `mypassword`, each with a disk size of `1GB`. The client already adds 4 ZDBs for metadata. + +### Getting deployment information + +````typescript +const zdbs_data = await grid3.qsfs_zdbs.get({ name: qsfs_name }); +log(zdbs_data); +```` + +### Deleting a deployment + +````typescript +const delete_response = await grid3.qsfs_zdbs.delete({ name: qsfs_name }); +log(delete_response); +```` diff --git a/collections/documentation/developers/javascript/grid3_javascript_readme.md b/collections/documentation/developers/javascript/grid3_javascript_readme.md new file mode 100644 index 0000000..7072d19 --- /dev/null +++ b/collections/documentation/developers/javascript/grid3_javascript_readme.md @@ -0,0 +1,24 @@ +

Javascript Client

+ +This section covers developing projects on top of the ThreeFold Grid using the Javascript language. + +Javascript has a huge ecosystem and is a first-class citizen when it comes to blockchain technologies like Substrate. That was one of the reasons for it to become one of the very first supported languages on the grid. + +Please make sure to check the [basics](../../system_administrators/getstarted/tfgrid3_getstarted.md) before continuing. + +

Table of Contents

+ +- [Installation](./grid3_javascript_installation.md) +- [Loading Client](./grid3_javascript_loadclient.md) +- [Deploy a VM](./grid3_javascript_vm.md) +- [Capacity Planning](./grid3_javascript_capacity_planning.md) +- [Deploy Multiple VMs](./grid3_javascript_vms.md) +- [Deploy CapRover](./grid3_javascript_caprover.md) +- [Gateways](./grid3_javascript_vm_gateways.md) +- [Deploy a Kubernetes Cluster](./grid3_javascript_kubernetes.md) +- [Deploy a ZDB](./grid3_javascript_zdb.md) +- [Deploy ZDBs for QSFS](./grid3_javascript_qsfs_zdbs.md) +- [QSFS](./grid3_javascript_qsfs.md) +- [Key Value Store](./grid3_javascript_kvstore.md) +- [VM with Wireguard and Gateway](./grid3_wireguard_gateway.md) +- [GPU Support](./grid3_javascript_gpu_support.md) \ No newline at end of file diff --git a/collections/documentation/developers/javascript/grid3_javascript_run_scripts.md b/collections/documentation/developers/javascript/grid3_javascript_run_scripts.md new file mode 100644 index 0000000..7c3a2d5 --- /dev/null +++ b/collections/documentation/developers/javascript/grid3_javascript_run_scripts.md @@ -0,0 +1,15 @@ +## How to run the scripts + +- Set your grid3 client configuration in `scripts/client_loader.ts`, or simply use `config.json` +- Update your customized deployment specs +- Run using [ts-node](https://www.npmjs.com/ts-node) + +```bash +npx ts-node --project tsconfig-node.json scripts/zdb.ts +``` + +or + +```bash +yarn run ts-node --project tsconfig-node.json scripts/zdb.ts +``` diff --git a/collections/documentation/developers/javascript/grid3_javascript_vm.md b/collections/documentation/developers/javascript/grid3_javascript_vm.md new file mode 100644 index 0000000..5c9069e --- /dev/null +++ b/collections/documentation/developers/javascript/grid3_javascript_vm.md @@ -0,0 +1,194 @@ + +

Deploying a VM

+ +

Table of Contents

+ +- [Introduction](#introduction) +- [Example](#example) +- [Detailed Explanation](#detailed-explanation) + - [Building Network](#building-network) +- [Building the Disk Model](#building-the-disk-model) +- [Building the VM](#building-the-vm) +- [Building VMs Collection](#building-vms-collection) +- [deployment](#deployment) +- [Getting Deployment Information](#getting-deployment-information) +- [Deleting a Deployment](#deleting-a-deployment) + +*** + +## Introduction + +We present information on how to deploy a VM with the Javascript client with concrete examples. + +## Example + +```ts +import { DiskModel, FilterOptions, MachineModel, MachinesModel, NetworkModel } from "../src"; +import { config, getClient } from "./client_loader"; +import { log } from "./utils"; + +async function main() { + const grid3 = await getClient(); + + // create network Object + const n = new NetworkModel(); + n.name = "dynamictest"; + n.ip_range = "10.249.0.0/16"; + + // create disk Object + const disk = new DiskModel(); + disk.name = "dynamicDisk"; + disk.size = 8; + disk.mountpoint = "/testdisk"; + + const vmQueryOptions: FilterOptions = { + cru: 1, + mru: 1, // GB + sru: 1, + availableFor: grid3.twinId, + country: "Belgium", + }; + + // create vm node Object + const vm = new MachineModel(); + vm.name = "testvm"; + vm.node_id = +(await grid3.capacity.filterNodes(vmQueryOptions))[0].nodeId; // TODO: allow random choice + vm.disks = [disk]; + vm.public_ip = false; + vm.planetary = true; + vm.cpu = 1; + vm.memory = 1024; + vm.rootfs_size = 0; + vm.flist = "https://hub.grid.tf/tf-official-apps/base:latest.flist"; + vm.entrypoint = "/sbin/zinit init"; + vm.env = { + SSH_KEY: config.ssh_key, + }; + + // create VMs Object + const vms = new MachinesModel(); + vms.name = "dynamicVMS"; + vms.network = n; + vms.machines = [vm]; + vms.metadata = "{'testVMs': true}"; + vms.description = "test deploying VMs via ts grid3 client"; + + // deploy vms + const res = await grid3.machines.deploy(vms); + 
log(res); + + // get the deployment + const l = await grid3.machines.getObj(vms.name); + log(l); + + // // delete + // const d = await grid3.machines.delete({ name: vms.name }); + // log(d); + + await grid3.disconnect(); +} + +main(); +``` + +## Detailed Explanation + +### Building Network + +```ts +// create network Object +const n = new NetworkModel(); +n.name = "dynamictest"; +n.ip_range = "10.249.0.0/16"; +``` + +Here we prepare the network model that is going to be used by specifying a name to our network and the range it will be spanning over + +## Building the Disk Model + +```ts +// create disk Object +const disk = new DiskModel(); +disk.name = "dynamicDisk"; +disk.size = 8; +disk.mountpoint = "/testdisk"; +``` + +here we create the disk model specifying its name, size in GB and where it will be mounted eventually + +## Building the VM + +```ts +// create vm node Object +const vm = new MachineModel(); +vm.name = "testvm"; +vm.node_id = +(await grid3.capacity.filterNodes(vmQueryOptions))[0].nodeId; // TODO: allow random choice +vm.disks = [disk]; +vm.public_ip = false; +vm.planetary = true; +vm.cpu = 1; +vm.memory = 1024; +vm.rootfs_size = 0; +vm.flist = "https://hub.grid.tf/tf-official-apps/base:latest.flist"; +vm.entrypoint = "/sbin/zinit init"; +vm.env = { + SSH_KEY: config.ssh_key, +}; +``` + +Now we go to the VM model, that will be used to build our `zmachine` object + +We need to specify its + +- name +- node_id: where it will get deployed +- disks: disks model collection +- memory +- root filesystem size +- flist: the image it is going to start from. 
Check the [supported flists](../flist/grid3_supported_flists.md) +- entry point: entrypoint command / script to execute +- env: the environment variables needed, e.g. the SSH keys used +- public ip: whether we want a public IP attached to the VM +- planetary: to enable the planetary network on the VM + +## Building VMs Collection + +```ts +// create VMs Object +const vms = new MachinesModel(); +vms.name = "dynamicVMS"; +vms.network = n; +vms.machines = [vm]; +vms.metadata = "{'testVMs': true}"; +vms.description = "test deploying VMs via ts grid3 client"; +``` + +This part is quite simple: we can add one or more VMs to the `machines` property to have them deployed as part of our project. + +## deployment + +```ts +// deploy vms +const res = await grid3.machines.deploy(vms); +log(res); +``` + +## Getting Deployment Information + +You can retrieve the deployment based on the name you gave to the `vms` collection. + +```ts +// get the deployment +const l = await grid3.machines.getObj(vms.name); +log(l); +``` + +## Deleting a Deployment + +```ts +// delete +const d = await grid3.machines.delete({ name: vms.name }); +log(d); +``` + +In the underlying layer, we cancel the contracts that were created on the chain, and as a result all of the workloads tied to this project get deleted. diff --git a/collections/documentation/developers/javascript/grid3_javascript_vm_gateways.md b/collections/documentation/developers/javascript/grid3_javascript_vm_gateways.md new file mode 100644 index 0000000..052d9f3 --- /dev/null +++ b/collections/documentation/developers/javascript/grid3_javascript_vm_gateways.md @@ -0,0 +1,189 @@ +

Deploying a VM and exposing it over a Gateway Prefix

+ +

Table of Contents

+ +- [Introduction](#introduction) +- [Example code](#example-code) +- [Detailed explanation](#detailed-explanation) + - [deploying](#deploying) + - [getting deployment object](#getting-deployment-object) + - [deletion](#deletion) +- [Deploying a VM and exposing it over a Gateway using a Full domain](#deploying-a-vm-and-exposing-it-over-a-gateway-using-a-full-domain) +- [Example code](#example-code-1) +- [Detailed explanation](#detailed-explanation-1) + - [deploying](#deploying-1) + - [get deployment object](#get-deployment-object) + - [deletion](#deletion-1) + +*** + +## Introduction + +After the [deployment of a VM](./grid3_javascript_vm.md), now it's time to expose it to the world + +## Example code + +```ts +import { FilterOptions, GatewayNameModel } from "../src"; +import { getClient } from "./client_loader"; +import { log } from "./utils"; + +// read more about the gateway types in this doc: https://github.com/threefoldtech/zos/tree/main/docs/gateway +async function main() { + const grid3 = await getClient(); + + const gatewayQueryOptions: FilterOptions = { + gateway: true, + farmId: 1, + }; + + const gw = new GatewayNameModel(); + gw.name = "test"; + gw.node_id = +(await grid3.capacity.filterNodes(gatewayQueryOptions))[0].nodeId; + gw.tls_passthrough = false; + // the backends have to be in this format `http://ip:port` or `https://ip:port`, and the `ip` pingable from the node so using the ygg ip or public ip if available. 
+ gw.backends = ["http://185.206.122.35:8000"]; + + // deploy + const res = await grid3.gateway.deploy_name(gw); + log(res); + + // get the deployment + const l = await grid3.gateway.getObj(gw.name); + log(l); + + // // delete + // const d = await grid3.gateway.delete_name({ name: gw.name }); + // log(d); + + await grid3.disconnect(); +} + +main(); + +``` + +## Detailed explanation + +```ts +const gw = new GatewayNameModel(); +gw.name = "test"; +gw.node_id = +(await grid3.capacity.filterNodes(gatewayQueryOptions))[0].nodeId; +gw.tls_passthrough = false; +gw.backends = ["http://185.206.122.35:8000"]; +``` + +- We create a gateway name model and give it the `name` `test` (hence `GatewayName`). Deployed on a gateway node, it ends up with a domain such as `test.gent01.devnet.grid.tf`. +- The gateway proxies the traffic coming to `test.gent01.devnet.grid.tf` to the backend `http://185.206.122.35:8000`. Setting `tls_passthrough` to `false` lets the gateway terminate the TLS traffic; if you set it to `true`, your backend service needs to be able to do the TLS termination itself. + +### deploying + +```ts +// deploy +const res = await grid3.gateway.deploy_name(gw); +log(res); +``` + +This deploys the `GatewayName` on the grid. + +### getting deployment object + +```ts +const l = await grid3.gateway.getObj(gw.name); +log(l); +``` + +Getting the deployment information can be done using `getObj`. + +### deletion + +```ts +const d = await grid3.gateway.delete_name({ name: gw.name }); +log(d); +``` + +## Deploying a VM and exposing it over a Gateway using a Full domain + +After the [deployment of a VM](./grid3_javascript_vm.md), it's now time to expose it to the world. + +## Example code + +```ts +import { FilterOptions, GatewayFQDNModel } from "../src"; +import { getClient } from "./client_loader"; +import { log } from "./utils"; + +// read more about the gateway types in this doc: https://github.com/threefoldtech/zos/tree/main/docs/gateway +async function main() { + const 
grid3 = await getClient(); + + const gatewayQueryOptions: FilterOptions = { + gateway: true, + farmId: 1, + }; + const gw = new GatewayFQDNModel(); + gw.name = "applyFQDN"; + gw.node_id = +(await grid3.capacity.filterNodes(gatewayQueryOptions))[0].nodeId; + gw.fqdn = "test.hamada.grid.tf"; + gw.tls_passthrough = false; + // the backends have to be in the format `http://ip:port` or `https://ip:port`, and the `ip` must be pingable from the node, so use the ygg ip or the public ip if available. + gw.backends = ["http://185.206.122.35:8000"]; + + // deploy + const res = await grid3.gateway.deploy_fqdn(gw); + log(res); + + // get the deployment + const l = await grid3.gateway.getObj(gw.name); + log(l); + + // // delete + // const d = await grid3.gateway.delete_fqdn({ name: gw.name }); + // log(d); + + await grid3.disconnect(); +} + +main(); +``` + +## Detailed explanation + +```ts +const gw = new GatewayFQDNModel(); +gw.name = "applyFQDN"; +gw.node_id = 1; +gw.fqdn = "test.hamada.grid.tf"; +gw.tls_passthrough = false; +gw.backends = ["my yggdrasil IP"]; +``` + +- We create a `GatewayFQDNModel`, give it the name `applyFQDN`, deploy it on gateway node `1`, and set the fully qualified domain `fqdn` to a domain we own, `test.hamada.grid.tf`. +- We create a record on our name provider for `test.hamada.grid.tf` that points to the IP of gateway node `1`. +- We specify a Yggdrasil IP as the backend. Once this is deployed, a request to `test.hamada.grid.tf` reaches the gateway server, and from there our traffic goes to the backend. 
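
The backend format described above can be sketched as a small helper. This is a minimal sketch, not part of the grid3 client: `formatBackend` and `Scheme` are hypothetical names, and it assumes the gateway accepts standard URL syntax, in which IPv6 literals such as Yggdrasil addresses are wrapped in square brackets.

```typescript
// Hypothetical helper (not part of the grid3 client): builds a backend URL
// in the `http://ip:port` or `https://ip:port` form used for `gw.backends`.
type Scheme = "http" | "https";

function formatBackend(ip: string, port: number, scheme: Scheme = "http"): string {
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new Error(`invalid port: ${port}`);
  }
  // IPv6 literals (for example Yggdrasil addresses) contain colons, so in
  // standard URL syntax they are wrapped in square brackets.
  // Assumption: the gateway accepts this bracketed form.
  const host = ip.includes(":") ? `[${ip}]` : ip;
  return `${scheme}://${host}:${port}`;
}

// Public IPv4 backend, as in the example code.
console.log(formatBackend("185.206.122.35", 8000));
// Yggdrasil (IPv6) backend.
console.log(formatBackend("302:9e63:7d43:b742:3582:a831:cd41:3f19", 8000));
```

The result can then be assigned as `gw.backends = [formatBackend(ip, 8000)]`, which keeps the format check in one place instead of spreading string concatenation across deployments.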
+ +### deploying + +```ts +// deploy +const res = await grid3.gateway.deploy_fqdn(gw); +log(res); +``` + +This deploys the `GatewayFQDN` on the grid. + +### get deployment object + +```ts +const l = await grid3.gateway.getObj(gw.name); +log(l); +``` + +Getting the deployment information can be done using `getObj`. + +### deletion + +```ts +const d = await grid3.gateway.delete_fqdn({ name: gw.name }); +log(d); +``` diff --git a/collections/documentation/developers/javascript/grid3_javascript_vms.md b/collections/documentation/developers/javascript/grid3_javascript_vms.md new file mode 100644 index 0000000..b928007 --- /dev/null +++ b/collections/documentation/developers/javascript/grid3_javascript_vms.md @@ -0,0 +1,108 @@ + +

Deploying multiple VMs

+ +

Table of Contents

+ +- [Introduction](#introduction) +- [Example code](#example-code) + +*** + +## Introduction + +It is possible to deploy multiple VMs with the Javascript client. + +## Example code + +```ts +import { DiskModel, FilterOptions, MachineModel, MachinesModel, NetworkModel } from "../src"; +import { config, getClient } from "./client_loader"; +import { log } from "./utils"; + +async function main() { + const grid3 = await getClient(); + + // create network Object + const n = new NetworkModel(); + n.name = "monNetwork"; + n.ip_range = "10.238.0.0/16"; + + // create disk Object + const disk1 = new DiskModel(); + disk1.name = "newDisk1"; + disk1.size = 1; + disk1.mountpoint = "/newDisk1"; + + const vmQueryOptions: FilterOptions = { + cru: 1, + mru: 1, // GB + sru: 1, + availableFor: grid3.twinId, + farmId: 1, + }; + + // create vm node Object + const vm1 = new MachineModel(); + vm1.name = "testvm1"; + vm1.node_id = +(await grid3.capacity.filterNodes(vmQueryOptions))[0].nodeId; + vm1.disks = [disk1]; + vm1.public_ip = false; + vm1.planetary = true; + vm1.cpu = 1; + vm1.memory = 1024; + vm1.rootfs_size = 0; + vm1.flist = "https://hub.grid.tf/tf-official-apps/base:latest.flist"; + vm1.entrypoint = "/sbin/zinit init"; + vm1.env = { + SSH_KEY: config.ssh_key, + }; + + // create disk Object + const disk2 = new DiskModel(); + disk2.name = "newDisk2"; + disk2.size = 1; + disk2.mountpoint = "/newDisk2"; + + // create another vm node Object + const vm2 = new MachineModel(); + vm2.name = "testvm2"; + vm2.node_id = +(await grid3.capacity.filterNodes(vmQueryOptions))[1].nodeId; + vm2.disks = [disk2]; + vm2.public_ip = false; + vm2.planetary = true; + vm2.cpu = 1; + vm2.memory = 1024; + vm2.rootfs_size = 0; + vm2.flist = "https://hub.grid.tf/tf-official-apps/base:latest.flist"; + vm2.entrypoint = "/sbin/zinit init"; + vm2.env = { + SSH_KEY: config.ssh_key, + }; + + // create VMs Object + const vms = new MachinesModel(); + vms.name = "monVMS"; + vms.network = n; + vms.machines = [vm1, 
vm2]; + vms.metadata = "{'testVMs': true}"; + vms.description = "test deploying VMs via ts grid3 client"; + + // deploy vms + const res = await grid3.machines.deploy(vms); + log(res); + + // get the deployment + const l = await grid3.machines.getObj(vms.name); + log(l); + + // // delete + // const d = await grid3.machines.delete({ name: vms.name }); + // log(d); + + await grid3.disconnect(); +} + +main(); +``` + +This is similar to the previous section on [deploying a single VM](../javascript/grid3_javascript_vm.md), but it adds more VM objects to the `vms` collection. Note that `vm1` and `vm2` take the first and second results of `filterNodes`, so the query must return at least two nodes. diff --git a/collections/documentation/developers/javascript/grid3_javascript_zdb.md b/collections/documentation/developers/javascript/grid3_javascript_zdb.md new file mode 100644 index 0000000..d773269 --- /dev/null +++ b/collections/documentation/developers/javascript/grid3_javascript_zdb.md @@ -0,0 +1,143 @@ +

Deploying ZDB

+ +

Table of Contents

+ +- [Introduction](#introduction) +- [Prerequisites](#prerequisites) +- [Example code](#example-code) +- [Detailed explanation](#detailed-explanation) + - [Getting the client](#getting-the-client) + - [Building the model](#building-the-model) + - [preparing ZDBs collection](#preparing-zdbs-collection) + - [Deployment](#deployment) + - [Getting Deployment information](#getting-deployment-information) + - [Deleting a deployment](#deleting-a-deployment) + +*** + +## Introduction + +We show how to deploy a ZDB on the TFGrid with the Javascript client. + +## Prerequisites + +- Make sure you have your [client](./grid3_javascript_loadclient.md) prepared + +## Example code + +```ts +import { FilterOptions, ZDBModel, ZdbModes, ZDBSModel } from "../src"; +import { getClient } from "./client_loader"; +import { log } from "./utils"; + +async function main() { + const grid3 = await getClient(); + + const zdbQueryOptions: FilterOptions = { + sru: 1, + hru: 1, + availableFor: grid3.twinId, + farmId: 1, + }; + + // create zdb object + const zdb = new ZDBModel(); + zdb.name = "hamada"; + zdb.node_id = +(await grid3.capacity.filterNodes(zdbQueryOptions))[0].nodeId; + zdb.mode = ZdbModes.user; + zdb.disk_size = 1; + zdb.publicNamespace = false; + zdb.password = "testzdb"; + + // create zdbs object + const zdbs = new ZDBSModel(); + zdbs.name = "tttzdbs"; + zdbs.zdbs = [zdb]; + zdbs.metadata = '{"test": "test"}'; + + // deploy zdb + const res = await grid3.zdbs.deploy(zdbs); + log(res); + + // get the deployment + const l = await grid3.zdbs.getObj(zdbs.name); + log(l); + + // // delete + // const d = await grid3.zdbs.delete({ name: zdbs.name }); + // log(d); + + await grid3.disconnect(); +} + +main(); +``` + +## Detailed explanation + +### Getting the client + +```ts +const grid3 = await getClient(); +``` + +### Building the model + +```ts +// create zdb object +const zdb = new ZDBModel(); +zdb.name = "hamada"; +zdb.node_id = +(await grid3.capacity.filterNodes(zdbQueryOptions))[0].nodeId; 
+zdb.mode = ZdbModes.user; +zdb.disk_size = 1; +zdb.publicNamespace = false; +zdb.password = "testzdb"; +``` + +Here we define a `ZDB model` and set the relevant properties, e.g. + +- name +- node_id: the node to deploy on +- mode: `user` or `seq` +- disk_size: disk size in GB +- publicNamespace: whether the namespace is public; a public namespace can be read-only if a password is set +- password: namespace password + +### preparing ZDBs collection + +```ts +// create zdbs object +const zdbs = new ZDBSModel(); +zdbs.name = "tttzdbs"; +zdbs.zdbs = [zdb]; +zdbs.metadata = '{"test": "test"}'; +``` + +You can attach multiple ZDBs to the collection and send it for deployment. + +### Deployment + +```ts +const res = await grid3.zdbs.deploy(zdbs); +log(res); +``` + +### Getting Deployment information + +`getObj` gives detailed information about the workload. + +```ts +// get the deployment +const l = await grid3.zdbs.getObj(zdbs.name); +log(l); +``` + +### Deleting a deployment + +The `.delete` method cancels the contracts related to that ZDBs deployment. + +```ts +// delete +const d = await grid3.zdbs.delete({ name: zdbs.name }); +log(d); +``` diff --git a/collections/documentation/developers/javascript/grid3_wireguard_gateway.md b/collections/documentation/developers/javascript/grid3_wireguard_gateway.md new file mode 100644 index 0000000..2cb1b89 --- /dev/null +++ b/collections/documentation/developers/javascript/grid3_wireguard_gateway.md @@ -0,0 +1,302 @@ +

Deploying a VM with Wireguard and Gateway

+ +

Table of Contents

+ +- [Introduction](#introduction) +- [Client Configurations](#client-configurations) +- [Code Example](#code-example) +- [Detailed Explanation](#detailed-explanation) + - [Get the Client](#get-the-client) + - [Get the Nodes](#get-the-nodes) + - [Deploy the VM](#deploy-the-vm) + - [Deploy the Gateway](#deploy-the-gateway) + - [Get the Deployments Information](#get-the-deployments-information) + - [Disconnect the Client](#disconnect-the-client) + - [Delete the Deployments](#delete-the-deployments) +- [Conclusion](#conclusion) + +*** + +## Introduction + +We present here the relevant information when it comes to deploying a virtual machine with Wireguard and a gateway. + + + + +## Client Configurations + +To configure the client, have a look at [this section](./grid3_javascript_loadclient.md). + + + +## Code Example + +```ts +import { FilterOptions, GatewayNameModel, GridClient, MachineModel, MachinesModel, NetworkModel } from "../src"; +import { config, getClient } from "./client_loader"; +import { log } from "./utils"; + +function createNetworkModel(gwNode: number, name: string): NetworkModel { + return { + name, + addAccess: true, + accessNodeId: gwNode, + ip_range: "10.238.0.0/16", + } as NetworkModel; +} +function createMachineModel(node: number) { + return { + name: "testvm1", + node_id: node, + public_ip: false, + planetary: true, + cpu: 1, + memory: 1024 * 2, + rootfs_size: 0, + disks: [], + flist: "https://hub.grid.tf/tf-official-apps/threefoldtech-ubuntu-22.04.flist", + entrypoint: "/usr/bin/python3 -m http.server --bind ::", + env: { + SSH_KEY: config.ssh_key, + }, + } as MachineModel; +} +function createMachinesModel(vm: MachineModel, network: NetworkModel): MachinesModel { + return { + name: "newVMs", + network, + machines: [vm], + metadata: "", + description: "test deploying VMs with wireguard via ts grid3 client", + } as MachinesModel; +} +function createGwModel(node_id: number, ip: string, networkName: string, name: string, port: number) { + return { 
+ name, + node_id, + tls_passthrough: false, + backends: [`http://${ip}:${port}`], + network: networkName, + } as GatewayNameModel; +} + +async function main() { + const grid3 = await getClient(); + + const gwNode = +(await grid3.capacity.filterNodes({ gateway: true }))[0].nodeId; + + const vmQueryOptions: FilterOptions = { + cru: 1, + mru: 2, // GB + availableFor: grid3.twinId, + farmId: 1, + }; + const vmNode = +(await grid3.capacity.filterNodes(vmQueryOptions))[0].nodeId; + + const network = createNetworkModel(gwNode, "monNetwork"); + const vm = createMachineModel(vmNode); + const machines = createMachinesModel(vm, network); + log(`Deploying vm on node: ${vmNode}, with network node: ${gwNode}`); + + // deploy the vm + const vmResult = await grid3.machines.deploy(machines); + log(vmResult); + + const deployedVm = await grid3.machines.getObj(machines.name); + log("+++ deployed vm +++"); + log(deployedVm); + + // deploy the gateway + const vmPrivateIP = (deployedVm as { interfaces: { ip: string }[] }[])[0].interfaces[0].ip; + const gateway = createGwModel(gwNode, vmPrivateIP, network.name, "pyserver", 8000); + log(`deploying gateway ${gateway.name} on node ${gwNode}`); + + const gatewayResult = await grid3.gateway.deploy_name(gateway); + log(gatewayResult); + + log("+++ Deployed gateway +++"); + + const deployedGw = await grid3.gateway.getObj(gateway.name); + log(deployedGw); + + await grid3.disconnect(); +} + +main(); + +``` + + +## Detailed Explanation + +The goal of this code is to deploy a name gateway with the WireGuard IP as the backend. This allows accessing a server inside the VM through the gateway, over the private (WireGuard) network. + +This will be done through the following steps: + +### Get the Client + +```ts +const grid3 = await getClient(); +``` + +### Get the Nodes + +Determine the nodes on which to deploy the VM, the network and the gateway. 
+ +- Gateway and network access node + + ```ts + const gwNode = +(await grid3.capacity.filterNodes({ gateway: true }))[0].nodeId; + ``` + + Using the `filterNodes` method, we get the first gateway node ID. We deploy the gateway on this node and also use it as our network access node. + + > The gateway node must be the same as the network access node. +- VM node + + We first need to set the filter options. In this example, we deploy the VM with 1 CPU and 2 GB of memory. + We then create a `FilterOptions` object with these specs and take the first node ID of the result. + + ```ts + const vmQueryOptions: FilterOptions = { + cru: 1, + mru: 2, // GB + availableFor: grid3.twinId, + farmId: 1, + }; + const vmNode = +(await grid3.capacity.filterNodes(vmQueryOptions))[0].nodeId; + ``` + +### Deploy the VM + +We need to create the network and machine models, then deploy the VM. + +```ts +const network = createNetworkModel(gwNode, "monNetwork"); +const vm = createMachineModel(vmNode); +const machines = createMachinesModel(vm, network); +log(`Deploying vm on node: ${vmNode}, with network node: ${gwNode}`); + +// deploy the vm +const vmResult = await grid3.machines.deploy(machines); +log(vmResult); +``` + +- `createNetworkModel`: + we create a network, set its access node ID to `gwNode` and its name to `monNetwork`. Inside the function, we set `addAccess: true` to add __WireGuard__ access. + +- `createMachineModel` and `createMachinesModel` are similar to the previous section on [deploying a single VM](../javascript/grid3_javascript_vm.md), but we pass the created `NetworkModel` to the machines model, and the entry point here runs a simple Python server. + +### Deploy the Gateway + +Now that our VM is deployed with its network, we need to create the gateway on the same node and the same network, pointing to the VM's private IP address. 
+ +- Get the VM's private IP address: + + ```ts + const vmPrivateIP = (deployedVm as { interfaces: { ip: string }[] }[])[0].interfaces[0].ip; + ``` + +- Create the Gateway name model: + + ```ts + const gateway = createGwModel(gwNode, vmPrivateIP, network.name, "pyserver", 8000); + ``` + + This will create a `GatewayNameModel` with the following properties: + + - `name`: the subdomain name + - `node_id`: the gateway node ID + - `tls_passthrough: false` + - `backends`: ``[`http://${ip}:${port}`]``, the private IP address and the port number of our machine + - `network: networkName`: the name of the network we created earlier + +### Get the Deployments Information + + ```ts + const deployedVm = await grid3.machines.getObj(machines.name); + log("+++ deployed vm +++"); + log(deployedVm); + + log("+++ Deployed gateway +++"); + const deployedGw = await grid3.gateway.getObj(gateway.name); + log(deployedGw); + ``` + +- `deployedVm`: an array of one object that contains the details of the VM deployment. + + ```ts + [ + { + version: 0, + contractId: 30658, + nodeId: 11, + name: 'testvm1', + created: 1686225126, + status: 'ok', + message: '', + flist: 'https://hub.grid.tf/tf-official-apps/threefoldtech-ubuntu-22.04.flist', + publicIP: null, + planetary: '302:9e63:7d43:b742:3582:a831:cd41:3f19', + interfaces: [ { network: 'monNetwork', ip: '10.238.2.2' } ], + capacity: { cpu: 1, memory: 2048 }, + mounts: [], + env: { + SSH_KEY: 'ssh' + }, + entrypoint: '/usr/bin/python3 -m http.server --bind ::', + metadata: '{"type":"vm","name":"newVMs","projectName":""}', + description: 'test deploying VMs with wireguard via ts grid3 client', + rootfs_size: 0, + corex: false + } + ] + ``` + +- `deployedGw`: an array of one object that contains the details of the gateway name. 
+ + ```ts + [ + { + version: 0, + contractId: 30659, + name: 'pyserver1', + created: 1686225139, + status: 'ok', + message: '', + type: 'gateway-name-proxy', + domain: 'pyserver1.gent02.dev.grid.tf', + tls_passthrough: false, + backends: [ 'http://10.238.2.2:8000' ], + metadata: '{"type":"gateway","name":"pyserver1","projectName":""}', + description: '' + } + ] + ``` + + Now we can access the VM using the `domain` returned in the object. + +### Disconnect the Client + +Finally, we need to disconnect the client using `await grid3.disconnect();`. + +### Delete the Deployments + +If we want to delete the deployments, we can do the following: + +```ts + const deletedMachines = await grid3.machines.delete({ name: machines.name}); + log(deletedMachines); + + const deletedGW = await grid3.gateway.delete_name({ name: gateway.name}); + log(deletedGW); +``` + + + +## Conclusion + +This section presented a detailed description of how to create a virtual machine with a private IP using WireGuard and use it as a backend for a name gateway. + +If you have any questions, you can ask the ThreeFold community for help on the [ThreeFold Forum](http://forum.threefold.io/) or on the [ThreeFold Grid Tester Community](https://t.me/threefoldtesting) on Telegram. 
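
As a supplement to the "Deploy the Gateway" step, the private IP lookup can be sketched as a standalone helper that selects the interface by network name rather than by array position. This is a sketch under stated assumptions: `privateIpOf`, `DeployedMachine` and `DeployedInterface` are hypothetical names, and the types only model the fields shown in the `deployedVm` output of this section.

```typescript
// Hypothetical types, modeled only on the fields shown in the `deployedVm`
// output of this section (not the client's own type definitions).
interface DeployedInterface {
  network: string;
  ip: string;
}

interface DeployedMachine {
  name: string;
  interfaces: DeployedInterface[];
}

// Returns the IP of the first interface attached to `networkName`,
// or throws if no machine in the deployment is on that network.
function privateIpOf(deployment: DeployedMachine[], networkName: string): string {
  for (const machine of deployment) {
    const iface = machine.interfaces.find((i) => i.network === networkName);
    if (iface) {
      return iface.ip;
    }
  }
  throw new Error(`no interface found on network "${networkName}"`);
}

// Sample data copied from the deployment output shown in this section.
const sample: DeployedMachine[] = [
  { name: "testvm1", interfaces: [{ network: "monNetwork", ip: "10.238.2.2" }] },
];

console.log(`http://${privateIpOf(sample, "monNetwork")}:8000`);
```

Compared with indexing `[0].interfaces[0]` directly, this fails with an explicit error when the deployment is empty or the VM is not attached to the expected network.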
\ No newline at end of file diff --git a/collections/documentation/developers/javascript/sidebar.md b/collections/documentation/developers/javascript/sidebar.md new file mode 100644 index 0000000..421992b --- /dev/null +++ b/collections/documentation/developers/javascript/sidebar.md @@ -0,0 +1,11 @@ +- [Installation](@grid3_javascript_installation) +- [Loading client](@grid3_javascript_loadclient) +- [Deploy a VM](@grid3_javascript_vm) +- [Capacity planning](@grid3_javascript_capacity_planning) +- [Deploy multiple VMs](@grid3_javascript_vms) +- [Deploy CapRover](@grid3_javascript_caprover) +- [Gateways](@grid3_javascript_vm_gateways) +- [Deploy a Kubernetes cluster](@grid3_javascript_kubernetes) +- [Deploy a ZDB](@grid3_javascript_zdb) +- [QSFS](@grid3_javascript_qsfs) +- [Key Value Store](@grid3_javascript_kvstore) diff --git a/collections/documentation/developers/proxy/commands.md b/collections/documentation/developers/proxy/commands.md new file mode 100644 index 0000000..94e78ad --- /dev/null +++ b/collections/documentation/developers/proxy/commands.md @@ -0,0 +1,127 @@ +

Commands

+ +

Table of Contents

+ +- [Introduction](#introduction) +- [Work on Docs](#work-on-docs) +- [To start the GridProxy server](#to-start-the-gridproxy-server) +- [Run tests](#run-tests) + +*** + +## Introduction + +The Makefile makes it easier to run most of the commands frequently needed to work on the project. + +## Work on Docs + +We are using [swaggo/swag](https://github.com/swaggo/swag) to generate swagger docs based on the annotations inside the code. + +- install the swag executable binary + + ```bash + go install github.com/swaggo/swag/cmd/swag@latest + ``` + +- now if you check the binary directory inside the go directory, you will find the executable file. + + ```bash + ls $(go env GOPATH)/bin + ``` + +- to run swag, you can either use the full path `$(go env GOPATH)/bin/swag` or add the go binary directory to `$PATH` + + ```bash + export PATH=$PATH:$(go env GOPATH)/bin + ``` + +- use swag to format code comments. + + ```bash + swag fmt + ``` + +- update the docs + + ```bash + swag init + ``` + +- to parse external types from vendor + + ```bash + swag init --parseVendor + ``` + +- for a full generate docs command + + ```bash + make docs + ``` + +## To start the GridProxy server + +After preparing the postgres database, you can `go run` the main file in `cmds/proxy_server/main.go`, which is responsible for starting all the needed servers and clients. 
+ +The server options: + +| Option | Description | +| ------------------ | ----------------------------------------------------------------------------------------------------------------------- | +| -address | Server ip address (default `":443"`) | +| -ca | certificate authority used to generate certificate (default `"https://acme-staging-v02.api.letsencrypt.org/directory"`) | +| -cert-cache-dir | path to store generated certs in (default `"/tmp/certs"`) | +| -domain | domain on which the server will be served | +| -email | email address to generate certificate with | +| -log-level | log level `[debug\|info\|warn\|error\|fatal\|panic]` (default `"info"`) | +| -no-cert | start the server without certificate | +| -postgres-db | postgres database | +| -postgres-host | postgres host | +| -postgres-password | postgres password | +| -postgres-port | postgres port (default 5432) | +| -postgres-user | postgres username | +| -tfchain-url | TFChain url (default `"wss://tfchain.dev.grid.tf/ws"`) | +| -relay-url | RMB relay url (default `"wss://relay.dev.grid.tf"`) | +| -mnemonics | Dummy user mnemonics for relay calls | +| -v | shows the package version | + +For a full server setup: + +```bash +make restart +``` + +## Run tests + +There are two types of tests in the project: + +- Unit Tests + - Found in `pkg/client/*_test.go` + - Run with `go test -v ./pkg/client` +- Integration Tests + - Found in `tests/queries/` + - Run with: + + ```bash + go test -v \ + --seed 13 \ + --postgres-host \ + --postgres-db tfgrid-graphql \ + --postgres-password postgres \ + --postgres-user postgres \ + --endpoint \ + --mnemonics + ``` + + - Or, to run a specific test, you can append the following to the previous command: + + ```bash + -run + ``` + + You can find the TestName in the `tests/queries/*_test.go` files. 
+ +To run all the tests use + +```bash +make test-all +``` diff --git a/collections/documentation/developers/proxy/contributions.md b/collections/documentation/developers/proxy/contributions.md new file mode 100644 index 0000000..3960676 --- /dev/null +++ b/collections/documentation/developers/proxy/contributions.md @@ -0,0 +1,55 @@ +

Contributions Guide

+ +

Table of Contents

+ +- [Introduction](#introduction) +- [Project structure](#project-structure) + - [Internal](#internal) + - [Pkg](#pkg) +- [Writing tests](#writing-tests) + +*** + +## Introduction + +This is a quick guide on how to contribute to the project. + +## Project structure + +The main structure of the code base is as follows: + +- `charts`: helm chart +- `cmds`: includes the project Golang entrypoints +- `docs`: project documentation +- `internal`: contains the explorer API logic and the cert manager implementation; this is where most of the feature work will be done +- `pkg`: contains client implementation and shared libs +- `tests`: integration tests +- `tools`: DB tools to prepare the Postgres DB for testing and development +- `rootfs`: ZOS root endpoint that will be mounted in the docker image + +### Internal + +- `explorer`: contains the explorer server logic: + - `db`: the db connection and operations + - `mw`: defines the generic action mount that will be used as http handler +- `certmanager`: logic to ensure certificates are available and up to date + +`server.go` includes the logic for all the API operations. + +### Pkg + +- `client`: client implementation +- `types`: defines all the API objects + +## Writing tests + +Adding a new endpoint should be accompanied by a corresponding test. Ideally, every change or bug fix should include a test to ensure the new behavior or fix is working as intended. + +Since these are integration tests, you need to first make sure that your local db is already seeded with the necessary data. See the tools [doc](./db_testing.md) for more information about how to prepare your db. + +The testing tools offer two clients that are the basis of most tests: + +- `local`: this client connects to the local db +- `proxy client`: this client connects to the running local instance + +You need to start an instance of the server before running the tests. Check [here](./commands.md) for how to start it. 
diff --git a/collections/documentation/developers/proxy/database.md b/collections/documentation/developers/proxy/database.md new file mode 100644 index 0000000..58c327a --- /dev/null +++ b/collections/documentation/developers/proxy/database.md @@ -0,0 +1,21 @@ +

Database

+ +

Table of Contents

+ +- [Introduction](#introduction) +- [Max Open Connections](#max-open-connections) + +*** + +## Introduction + +The grid proxy has access to a postgres database containing information about the tfgrid, specifically information about grid nodes, farms, twins, and contracts.\ +The database is filled/updated by this [indexer](https://github.com/threefoldtech/tfchain_graphql). +The grid proxy mainly retrieves information from the db with a few modifications for efficient retrieval (e.g. adding indices, caching node gpus, etc..). + +## Max Open Connections + +The postgres database can handle 100 open connections concurrently (that is the default value set by postgres), this number can be increased, depending on the infrastructure, by modifying it in the postgres.conf file where the db is deployed, or by executing the following query `ALTER system SET max_connections=size-of-connection`, but this requires a db restart to take effect.\ +The explorer creates a connection pool to the postgres db, with a max open pool connections set to a specific number (currently 80).\ +It's important to distinguish between the database max connections, and the max pool open connections, because if the pool did not have any constraints, it would try to open as many connections as it wanted, without any notion of the maximum connections the database accepts. It's the database responsibility then to accept or deny the connection.\ +This is why the max number of open pool connections is set to 80: It's below the max connections the database could handle (100), and it gives room for other actors outside of the explorer to open connections with the database.\ diff --git a/collections/documentation/developers/proxy/db_testing.md b/collections/documentation/developers/proxy/db_testing.md new file mode 100644 index 0000000..60bffed --- /dev/null +++ b/collections/documentation/developers/proxy/db_testing.md @@ -0,0 +1,45 @@ +

DB for testing

+ +

Table of Contents

+ +- [Introduction](#introduction) +- [Run postgresql container](#run-postgresql-container) +- [Create the DB](#create-the-db) + - [Method 1: Generate a db with relevant schema using the db helper tool:](#method-1-generate-a-db-with-relevant-schema-using-the-db-helper-tool) + - [Method 2: Fill the DB from a Production db dump file, for example if you have `dump.sql` file, you can run:](#method-2-fill-the-db-from-a-production-db-dump-file-for-example-if-you-have-dumpsql-file-you-can-run) + +*** + +## Introduction + +We show how to use a database for testing. + +## Run postgresql container + + ```bash + docker run --rm --name postgres \ + -e POSTGRES_USER=postgres \ + -e POSTGRES_PASSWORD=postgres \ + -e POSTGRES_DB=tfgrid-graphql \ + -p 5432:5432 -d postgres + ``` + +## Create the DB +You can either generate a db with the relevant schema to quickly test things locally, or load a previously taken DB dump file: + +### Method 1: Generate a db with relevant schema using the db helper tool: + + ```bash + cd tools/db/ && go run . \ + --postgres-host 127.0.0.1 \ + --postgres-db tfgrid-graphql \ + --postgres-password postgres \ + --postgres-user postgres \ + --reset + ``` + +### Method 2: Fill the DB from a Production db dump file, for example if you have `dump.sql` file, you can run: + + ```bash + psql -h 127.0.0.1 -U postgres -d tfgrid-graphql < dump.sql + ``` diff --git a/collections/documentation/developers/proxy/explorer.md b/collections/documentation/developers/proxy/explorer.md new file mode 100644 index 0000000..2651cae --- /dev/null +++ b/collections/documentation/developers/proxy/explorer.md @@ -0,0 +1,38 @@ +

The Grid Explorer

+ +

Table of Contents

+ +- [Introduction](#introduction) +- [Explorer Overview](#explorer-overview) +- [Explorer Endpoints](#explorer-endpoints) + +*** + +## Introduction + +The Grid Explorer is a REST API used to index various information from the TFChain. + +## Explorer Overview + +- Due to limitations on indexing information from the blockchain, complex inter-table queries and limits can't be applied directly on the chain. +- This is where the TFGridDB comes in: a shadow database that contains all the data on the chain and is updated every 2 hours. +- The explorer can then apply raw SQL queries on the database, with all the limits and filters needed. +- The technology used to extract the info from the blockchain is Subsquid; check the [repo](https://github.com/threefoldtech/tfchain_graphql). + +## Explorer Endpoints + +| HTTP Verb | Endpoint | Description | +| --------- | --------------------------- | ---------------------------------- | +| GET | `/contracts` | Show all contracts on the chain | +| GET | `/farms` | Show all farms on the chain | +| GET | `/gateways` | Show all gateway nodes on the grid | +| GET | `/gateways/:node_id` | Get a single gateway node details | +| GET | `/gateways/:node_id/status` | Get a single node status | +| GET | `/nodes` | Show all nodes on the grid | +| GET | `/nodes/:node_id` | Get a single node details | +| GET | `/nodes/:node_id/status` | Get a single node status | +| GET | `/stats` | Show the grid statistics | +| GET | `/twins` | Show all the twins on the chain | +| GET | `/nodes/:node_id/statistics`| Get a single node ZOS statistics | + +For the available filters on each endpoint, check the `/swagger/index.html` endpoint on the running instance. diff --git a/collections/documentation/developers/proxy/production.md b/collections/documentation/developers/proxy/production.md new file mode 100644 index 0000000..fe4e108 --- /dev/null +++ b/collections/documentation/developers/proxy/production.md @@ -0,0 +1,117 @@ +

Running Proxy in Production

+ +

Table of Contents

+ +- [Introduction](#introduction) +- [Production Run](#production-run) +- [To upgrade the machine](#to-upgrade-the-machine) +- [Dockerfile](#dockerfile) +- [Update helm package](#update-helm-package) +- [Install the chart using helm package](#install-the-chart-using-helm-package) + +*** + +## Introduction + +We show how to run grid proxy in production. + +## Production Run + +- Download the latest binary [here](https://github.com/threefoldtech/tfgrid-sdk-go/tree/development/grid-client) +- Add the execution permission to the binary and move it to the bin directory + + ```bash + chmod +x ./gridproxy-server + mv ./gridproxy-server /usr/local/bin/gridproxy-server + ``` + +- Add a new systemd service + +```bash +cat << EOF > /etc/systemd/system/gridproxy-server.service +[Unit] +Description=grid proxy server +After=network.target + +[Service] +ExecStart=gridproxy-server --domain gridproxy.dev.grid.tf --email omar.elawady.alternative@gmail.com -ca https://acme-v02.api.letsencrypt.org/directory --postgres-host 127.0.0.1 --postgres-db db --postgres-password password --postgres-user postgres --mnemonics +Type=simple +Restart=always +User=root +Group=root + +[Install] +WantedBy=multi-user.target +Alias=gridproxy.service +EOF +``` + +- Enable the service + + ```bash + systemctl enable gridproxy.service + ``` + +- Start the service + + ```bash + systemctl start gridproxy.service + ``` + +- Check the status + + ```bash + systemctl status gridproxy.service + ``` + +- The command options: + - domain: the host domain for which the SSL certificate is generated. + - email: the email address used to generate the SSL certificate. + - ca: certificate authority server URL, e.g. + - Let's Encrypt staging: `https://acme-staging-v02.api.letsencrypt.org/directory` + - Let's Encrypt production: `https://acme-v02.api.letsencrypt.org/directory` + - postgres-\*: Postgres connection info. 
+ +## To upgrade the machine + +- Replace the binary with the new one and apply + +```bash +systemctl restart gridproxy-server.service +``` + +- If you have changed `/etc/systemd/system/gridproxy-server.service`, you have to run this command first + +```bash +systemctl daemon-reload +``` + +## Dockerfile + +To build and run the Dockerfile: + +```bash +docker build -t threefoldtech/gridproxy . +docker run --name gridproxy -e POSTGRES_HOST="127.0.0.1" -e POSTGRES_PORT="5432" -e POSTGRES_DB="db" -e POSTGRES_USER="postgres" -e POSTGRES_PASSWORD="password" -e MNEMONICS="" threefoldtech/gridproxy +``` + +## Update helm package + +- Do `helm lint charts/gridproxy` +- Regenerate the packages `helm package -u charts/gridproxy` +- Regenerate index.yaml `helm repo index --url https://threefoldtech.github.io/tfgridclient_proxy/ .` +- Push your changes + +## Install the chart using helm package + +- Add the repo to your helm + + ```bash + helm repo add gridproxy https://threefoldtech.github.io/tfgridclient_proxy/ + ``` + +- Install a chart + + ```bash + helm install gridproxy/gridproxy + ``` diff --git a/collections/documentation/developers/proxy/proxy.md b/collections/documentation/developers/proxy/proxy.md new file mode 100644 index 0000000..7dc936c --- /dev/null +++ b/collections/documentation/developers/proxy/proxy.md @@ -0,0 +1,149 @@ +

Introducing Grid Proxy

+ +

Table of Contents

+ +- [About](#about) +- [How to Use the Project](#how-to-use-the-project) +- [Used Technologies \& Prerequisites](#used-technologies--prerequisites) +- [Start for Development](#start-for-development) +- [Setup for Production](#setup-for-production) +- [Get and Install the Binary](#get-and-install-the-binary) +- [Add as a Systemd Service](#add-as-a-systemd-service) + +*** + + + +## About + +The TFGrid client Proxy acts as an interface to access information about the grid. It supports features such as filtering, limitation, and pagination to query the various entities on the grid like nodes, contracts and farms. Additionally, the proxy can contact the required twin ID to retrieve stats about the relevant objects and perform ZOS calls. + +The proxy is used as the backend of several ThreeFold projects like: + +- [Dashboard](../../dashboard/dashboard.md) + + + +## How to Use the Project + +If you don't want to set up your own instance, you can use one of the live instances. Each works against a different TFChain network. + +- Dev network: + - Swagger: +- Qa network: + - Swagger: +- Test network: + - Swagger: +- Main network: + - Swagger: + +Or follow the [development guide](#start-for-development) to run yours. +By default, the instance runs against devnet. To change that, configure the network when running the server. + +> Note: You may face some differences between each instance and the others. That is normal because each network is in a different stage of development and works correctly with other parts of the Grid on the same network. + + +## Used Technologies & Prerequisites + +1. **GoLang**: The two main parts of the project are written in `Go 1.17`. Alternatively, you can download the compiled binaries from the GitHub [releases](https://github.com/threefoldtech/tfgrid-sdk-go/releases) +2. **Postgresql**: Used to load the TFGrid DB +3. **Docker**: Containerize the running services such as Postgres and Redis. +4. 
**Mnemonics**: Secret seeds for a dummy identity to use for the relay client. + +For more about the prerequisites and how to set up and configure them, follow the [Setup guide](./setup.md) + + + +## Start for Development + +To start the services for development or testing, first make sure you have all the [Prerequisites](#used-technologies--prerequisites). + +- Clone this repo + + ```bash + git clone https://github.com/threefoldtech/tfgrid-sdk-go.git + cd tfgrid-sdk-go/grid-proxy + ``` + +- The `Makefile` has all that you need to deal with Db, Explorer, Tests, and Docs. + + ```bash + make help # list all the available subcommands. + ``` + +- To quickly start a test explorer server: + + ```bash + make all-start e= + ``` + + Now you can access the server at `http://localhost:8080` +- Run the tests + + ```bash + make test-all + ``` + +- Generate docs. + + ```bash + make docs + ``` + +To run in a development environment, see [here](./db_testing.md) how to generate a test DB or load a DB dump, then use: + +```sh +go run cmds/proxy_server/main.go --address :8080 --log-level debug -no-cert --postgres-host 127.0.0.1 --postgres-db tfgrid-graphql --postgres-password postgres --postgres-user postgres --mnemonics +``` + +Then visit `http://localhost:8080/` + +For more illustrations about the commands needed to work on the project, see the section [Commands](./commands.md). For more info about the project structure and contribution guidelines, check the section [Contributions](./contributions.md). + + + +## Setup for Production + +## Get and Install the Binary + +- You can either build the project: + + ```bash + make build + chmod +x cmd/proxy_server/server \ + && mv cmd/proxy_server/server /usr/local/bin/gridproxy-server + ``` + +- Or download a release: + Check the [releases](https://github.com/threefoldtech/tfgrid-sdk-go/releases) page and edit the next command with the chosen version. 
+ + ```bash + wget https://github.com/threefoldtech/tfgrid-sdk-go/releases/download/v1.6.7-rc2/tfgridclient_proxy_1.6.7-rc2_linux_amd64.tar.gz \ + && tar -xzf tfgridclient_proxy_1.6.7-rc2_linux_amd64.tar.gz \ + && chmod +x server \ + && mv server /usr/local/bin/gridproxy-server + ``` + +## Add as a Systemd Service + +- Create the service file + + ```bash + cat << EOF > /etc/systemd/system/gridproxy-server.service + [Unit] + Description=grid proxy server + After=network.target + + [Service] + ExecStart=gridproxy-server --domain gridproxy.dev.grid.tf --email omar.elawady.alternative@gmail.com -ca https://acme-v02.api.letsencrypt.org/directory --substrate wss://tfchain.dev.grid.tf/ws --postgres-host 127.0.0.1 --postgres-db db --postgres-password password --postgres-user postgres --mnemonics + Type=simple + Restart=always + User=root + Group=root + + [Install] + WantedBy=multi-user.target + Alias=gridproxy.service + EOF + ``` + diff --git a/collections/documentation/developers/proxy/proxy_readme.md b/collections/documentation/developers/proxy/proxy_readme.md new file mode 100644 index 0000000..aaf4266 --- /dev/null +++ b/collections/documentation/developers/proxy/proxy_readme.md @@ -0,0 +1,25 @@ +

Grid Proxy

+ +Welcome to the *Grid Proxy* section of the TFGrid Manual! + +In this comprehensive guide, we delve into the intricacies of the ThreeFold Grid Proxy, a fundamental component that empowers the ThreeFold Grid ecosystem. + +This section is designed to provide users, administrators, and developers with a detailed understanding of the TFGrid Proxy, offering step-by-step instructions for its setup, essential commands, and insights into its various functionalities. + +The Grid Proxy plays a pivotal role in facilitating secure and efficient communication between nodes within the ThreeFold Grid, contributing to the decentralized and autonomous nature of the network. + +Whether you are a seasoned ThreeFold enthusiast or a newcomer exploring the decentralized web, this manual aims to be your go-to resource for navigating the ThreeFold Grid Proxy landscape. + +To assist you on your journey, we have organized the content into distinct chapters below, covering everything from initial setup procedures and database testing to practical commands, contributions, and insights into the ThreeFold Explorer and the Grid Proxy Database functionalities. + +

Table of Contents

+ +- [Introducing Grid Proxy](./proxy.md) +- [Setup](./setup.md) +- [DB Testing](./db_testing.md) +- [Commands](./commands.md) +- [Contributions](./contributions.md) +- [Explorer](./explorer.md) +- [Database](./database.md) +- [Production](./production.md) +- [Release](./release.md) \ No newline at end of file diff --git a/collections/documentation/developers/proxy/release.md b/collections/documentation/developers/proxy/release.md new file mode 100644 index 0000000..5f5fe84 --- /dev/null +++ b/collections/documentation/developers/proxy/release.md @@ -0,0 +1,32 @@ +

Release Grid-Proxy

+ +

Table of Contents

+ +- [Introduction](#introduction) +- [Steps](#steps) +- [Debugging](#debugging) + +*** + +## Introduction + +We show the steps to release a new version of the Grid Proxy. + +## Steps + +To release a new version of the Grid-Proxy component, follow these steps: + +Update the `appVersion` field in the `charts/Chart.yaml` file. This field should reflect the new version number of the release. + +The release process includes generating and pushing a Docker image with the latest GitHub tag. This step is automated through the `gridproxy-release.yml` workflow. + +Trigger the `gridproxy-release.yml` workflow by pushing the desired tag to the repository. This will initiate the workflow, which will generate the Docker image based on the tag and push it to the appropriate registry. + +## Debugging +In the event that the workflow does not run automatically after pushing the tag and making the release, you can manually execute it using the GitHub Actions interface. Follow these steps: + +Go to the [GitHub Actions page](https://github.com/threefoldtech/tfgrid-sdk-go/actions/workflows/gridproxy-release.yml) for the Grid-Proxy repository. + +Locate the workflow named gridproxy-release.yml. + +Trigger the workflow manually by selecting the "Run workflow" option. \ No newline at end of file diff --git a/collections/documentation/developers/proxy/setup.md b/collections/documentation/developers/proxy/setup.md new file mode 100644 index 0000000..fa8d07f --- /dev/null +++ b/collections/documentation/developers/proxy/setup.md @@ -0,0 +1,50 @@ +

Setup

+ +

Table of Contents

+ +- [Introduction](#introduction) +- [Install Golang](#install-golang) +- [Docker](#docker) +- [Postgres](#postgres) +- [Get Mnemonics](#get-mnemonics) + +*** + +## Introduction + +We show how to set up grid proxy. + +## Install Golang + +To install Golang, you can follow the official [guide](https://go.dev/doc/install). + +## Docker + +Docker is useful for running the TFGridDb in a container environment. Read this to [install Docker engine](../../system_administrators/computer_it_basics/docker_basics.md#install-docker-desktop-and-docker-engine). + +Note: to run Docker without sudo, follow step #2 in the previous article. If you want to skip that step, edit the docker commands in the `Makefile` and add sudo. + +## Postgres + +If you have Docker installed, you can run Postgres in a container with: + +```bash +make db-start +``` + +Then you can either load a dump of the database if you have one: + +```bash +make db-dump p=~/dump.sql +``` + +Or, more easily, you can fill the database tables with randomly generated data using the script `tools/db/generate.go`. To do that, run: + +```bash +make db-fill +``` + +## Get Mnemonics + +1. Install [polkadot extension](https://github.com/polkadot-js/extension) on your browser. +2. Create a new account from the extension. It is important to save the seeds. diff --git a/collections/documentation/developers/tfchain/farming_policies.md b/collections/documentation/developers/tfchain/farming_policies.md new file mode 100644 index 0000000..e997b8f --- /dev/null +++ b/collections/documentation/developers/tfchain/farming_policies.md @@ -0,0 +1,94 @@ +

Farming Policies

+ +

Table of Contents

+ +- [Introduction](#introduction) +- [Farming Policy Fields](#farming-policy-fields) +- [Limits on linked policy](#limits-on-linked-policy) +- [Creating a Policy](#creating-a-policy) +- [Linking a policy to a Farm](#linking-a-policy-to-a-farm) + +*** + +## Introduction + +A farming policy defines how farming rewards are handed out for nodes. Every node has a farming policy attached. A farming policy can be linked to a farm, in which case new nodes are given the farming policy of the farm they are in once they register themselves. Alternatively, a farming policy can be a "default". Default policies are not attached to a farm; instead they are used for nodes registered in farms which don't have a farming policy. Multiple defaults can exist at the same time, and the most fitting should be chosen. + +## Farming Policy Fields + +A farming policy has the following fields: + +- id (used to link policies) +- name +- Default. This indicates if the policy can be used by any new node (if the parent farm does not have a dedicated attached policy). Essentially, a `Default` policy serves as a base which can be overridden per farm by linking a non-default policy to said farm. +- Reward tft per CU, SU and NU, IPV4 +- Minimal uptime needed in integer format (example 995) +- Policy end (After this block number the policy can not be linked to new farms any more) +- If this policy is immutable or not. Immutable policies can never be changed again + +Additionally, we also use the following fields, though those are only useful for `Default` farming policies: + +- Node needs to be certified +- Farm needs to be certified (with certification level, which will be changed to an enum). + +In case a farming policy is not attached to a farm, new nodes will pick the most appropriate farming policy from the default ones. To decide which one to pick, they should be considered in order with most restrictive first until one matches. 
That means: + +- First check for the policy with highest farming certification (in the current case gold) and certified nodes +- Then check for a policy with highest farming certification (in the current case gold) and non-certified nodes +- Check for a policy without farming certification but certified nodes +- Last check for a policy without any kind of certification + +Note that certification of a node only happens after it comes live for the first time. As such, when a node gets certified, farming certification needs to be re-evaluated, but only if the currently attached farming policy on the node is a `Default` policy (as specifically linked policies have priority over default ones). When evaluating again, we first consider if we are eligible for the farming policy linked to the farm, if any. + +## Limits on linked policy + +When a council member attaches a policy to a farm, limits can be set. These limits define how much a policy can be used for nodes, before it becomes unusable and gets removed. The limits currently are: + +- Farming Policy ID: the ID of the farming policy which we want to limit to a farm. +- CU. Every time a node is added in the farm, its CU is calculated and deducted from this amount. If the amount drops below 0, the maximum amount of CU that can be attached to this policy is reached. +- SU. Every time a node is added in the farm, its SU is calculated and deducted from this amount. If the amount drops below 0, the maximum amount of SU that can be attached to this policy is reached. +- End date. After this date the policy is not effective anymore and can't be used. It is removed from the farm and a default policy is used. +- Certification. If set, only certified nodes can get this policy. Non-certified nodes get a default policy. + +Once a limit is reached, the farming policy is removed from the farm, so new nodes will get one of the default policies until a new policy is attached to the farm. 
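
When no policy is linked to the farm, or the linked policy has been removed, a node falls back to a default policy using the most-restrictive-first order described under Farming Policy Fields. The Go sketch below illustrates that order only. It is not TFChain's implementation: the `Policy` struct, its field names and `pickDefault` are hypothetical, and farm certification is simplified to a boolean (gold or not).

```go
package main

import "fmt"

// Policy is a hypothetical, simplified view of a default farming policy.
// The field names are assumptions made for this sketch.
type Policy struct {
	ID            uint32
	Name          string
	FarmCertified bool // policy requires a certified (e.g. gold) farm
	NodeCertified bool // policy requires a certified node
}

// pickDefault walks the four tiers from the text, most restrictive first,
// and returns the first default policy the farm/node pair is eligible for.
func pickDefault(policies []Policy, farmCertified, nodeCertified bool) (Policy, bool) {
	tiers := []struct{ farm, node bool }{
		{true, true},   // certified farm and certified node
		{true, false},  // certified farm, node certification not required
		{false, true},  // no farm certification, certified node
		{false, false}, // no certification of any kind
	}
	for _, tier := range tiers {
		// Skip tiers whose requirements this farm or node does not meet.
		if (tier.farm && !farmCertified) || (tier.node && !nodeCertified) {
			continue
		}
		for _, p := range policies {
			if p.FarmCertified == tier.farm && p.NodeCertified == tier.node {
				return p, true
			}
		}
	}
	return Policy{}, false
}

func main() {
	defaults := []Policy{
		{ID: 1, Name: "default"},
		{ID: 2, Name: "certified-node", NodeCertified: true},
		{ID: 3, Name: "gold-farm", FarmCertified: true},
	}
	// A certified node in a gold farm: no policy requires both,
	// so the second tier (gold farm only) matches.
	if p, ok := pickDefault(defaults, true, true); ok {
		fmt.Println(p.Name)
	}
}
```

Keeping the tiers in an ordered slice makes the priority explicit, so adding a certification level would mean adding a tier rather than rewriting nested conditions.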
+ +## Creating a Policy + +A council member can create a Farming Policy (DAO) in the following way: + +1: Open [PolkadotJS](https://polkadot.js.org/apps/?rpc=wss%3A%2F%2Ftfchain.grid.tf#/extrinsics) apps on the corresponding network and go to `Extrinsics` +2: Now select the account to propose from (should be an account that's a council member). +3: Select as action `dao` -> `propose` +4: Set a `threshold` (amount of farmers to vote) +5: Select an action `tfgridModule` -> `createFarmingPolicy` and fill in all the fields. +6: Create a forum post with the details of the farming policy and fill in the link of that post in the `link` field +7: Give it a good `description`. +8: Duration is optional (by default it's 7 days). A proposal cannot be closed before the duration is "expired". If you wish to set a duration, the duration should be expressed in number of blocks from `now`. For example, 2 hours is equal to 1200 blocks (blocktime is 6 seconds); in this case, the duration should be filled in as `1200`. +9: If all the fields are filled in, click `Propose`. Farmers can now vote. A proposal can be closed manually once there are enough votes AND the proposal is expired. To close, go to extrinsics -> `dao` -> `close` -> fill in proposal hash and index (both can be found in chainstate). + +All (su, cu, nu, ipv4) values should be expressed in USD. Minimal uptime should be expressed as an integer that represents a percentage (example: `95`). + +Policy end is optional (0 or some block number in the future). This is used for expiration. + +For reference: + +![image](./img/create_policy.png) + +## Linking a policy to a Farm + +First identify the policy ID to link to a farm. 
You can check for farming policies in [chainstate](https://polkadot.js.org/apps/?rpc=wss%3A%2F%2Ftfchain.grid.tf#/chainstate) -> `tfgridModule` -> `farmingPolciesMap`. Start with ID 1 and increment by 1 until you find the farming policy which was created when the proposal was expired and closed. 
+ +1: Open [PolkadotJS](https://polkadot.js.org/apps/?rpc=wss%3A%2F%2Ftfchain.grid.tf#/extrinsics) apps on the corresponding network and go to `Extrinsics` +2: Now select the account to propose from (should be an account that's a council member). +3: Select as proposal `dao` -> `propose` +4: Set a `threshold` (amount of farmers to vote) +5: Select an action `tfgridModule` -> `attachPolicyToFarm` and fill in all the fields (FarmID and Limits). +6: Limits contains a `farming_policy_id` (Required) and cu, su, end, node count (which are all optional). It also contains `node_certification`; if this is set to true, only certified nodes can have this policy. +7: Create a forum post with the details of why we want to link that farm to that policy and fill in the link of that post in the `link` field +8: Give it a good `description`. +9: Duration is optional (by default it's 7 days). A proposal cannot be closed before the duration is "expired". If you wish to set a duration, the duration should be expressed in number of blocks from `now`. For example, 2 hours is equal to 1200 blocks (blocktime is 6 seconds); in this case, the duration should be filled in as `1200`. +10: If all the fields are filled in, click `Propose`. Farmers can now vote. A proposal can be closed manually once there are enough votes AND the proposal is expired. To close, go to extrinsics -> `dao` -> `close` -> fill in proposal hash and index (both can be found in chainstate). 
+ +For reference: + +![image](./img/attach.png) diff --git a/collections/documentation/developers/tfchain/img/TF.png b/collections/documentation/developers/tfchain/img/TF.png new file mode 100644 index 0000000..528b5d9 Binary files /dev/null and b/collections/documentation/developers/tfchain/img/TF.png differ diff --git a/collections/documentation/developers/tfchain/img/attach.png b/collections/documentation/developers/tfchain/img/attach.png new file mode 100644 index 0000000..96e3c5f Binary files /dev/null and b/collections/documentation/developers/tfchain/img/attach.png differ diff --git a/collections/documentation/developers/tfchain/img/close_proposal.png b/collections/documentation/developers/tfchain/img/close_proposal.png new file mode 100644 index 0000000..07e66a2 Binary files /dev/null and b/collections/documentation/developers/tfchain/img/close_proposal.png differ diff --git a/collections/documentation/developers/tfchain/img/create_contract.png b/collections/documentation/developers/tfchain/img/create_contract.png new file mode 100644 index 0000000..f082e80 Binary files /dev/null and b/collections/documentation/developers/tfchain/img/create_contract.png differ diff --git a/collections/documentation/developers/tfchain/img/create_policy.png b/collections/documentation/developers/tfchain/img/create_policy.png new file mode 100644 index 0000000..fa344e7 Binary files /dev/null and b/collections/documentation/developers/tfchain/img/create_policy.png differ diff --git a/collections/documentation/developers/tfchain/img/create_provider.png b/collections/documentation/developers/tfchain/img/create_provider.png new file mode 100644 index 0000000..e8668a2 Binary files /dev/null and b/collections/documentation/developers/tfchain/img/create_provider.png differ diff --git a/collections/documentation/developers/tfchain/img/propose_approve.png b/collections/documentation/developers/tfchain/img/propose_approve.png new file mode 100644 index 0000000..667f66f Binary files 
/dev/null and b/collections/documentation/developers/tfchain/img/propose_approve.png differ diff --git a/collections/documentation/developers/tfchain/img/proposed_approve.png b/collections/documentation/developers/tfchain/img/proposed_approve.png new file mode 100644 index 0000000..5202c4c Binary files /dev/null and b/collections/documentation/developers/tfchain/img/proposed_approve.png differ diff --git a/collections/documentation/developers/tfchain/img/query_provider.png b/collections/documentation/developers/tfchain/img/query_provider.png new file mode 100644 index 0000000..de66d4c Binary files /dev/null and b/collections/documentation/developers/tfchain/img/query_provider.png differ diff --git a/collections/documentation/developers/tfchain/img/service_contract_approve.png b/collections/documentation/developers/tfchain/img/service_contract_approve.png new file mode 100644 index 0000000..1e0d034 Binary files /dev/null and b/collections/documentation/developers/tfchain/img/service_contract_approve.png differ diff --git a/collections/documentation/developers/tfchain/img/service_contract_bill.png b/collections/documentation/developers/tfchain/img/service_contract_bill.png new file mode 100644 index 0000000..55e84fe Binary files /dev/null and b/collections/documentation/developers/tfchain/img/service_contract_bill.png differ diff --git a/collections/documentation/developers/tfchain/img/service_contract_cancel.png b/collections/documentation/developers/tfchain/img/service_contract_cancel.png new file mode 100644 index 0000000..7669510 Binary files /dev/null and b/collections/documentation/developers/tfchain/img/service_contract_cancel.png differ diff --git a/collections/documentation/developers/tfchain/img/service_contract_create.png b/collections/documentation/developers/tfchain/img/service_contract_create.png new file mode 100644 index 0000000..69ec62a Binary files /dev/null and b/collections/documentation/developers/tfchain/img/service_contract_create.png differ 
diff --git a/collections/documentation/developers/tfchain/img/service_contract_id.png b/collections/documentation/developers/tfchain/img/service_contract_id.png new file mode 100644 index 0000000..e49c396 Binary files /dev/null and b/collections/documentation/developers/tfchain/img/service_contract_id.png differ diff --git a/collections/documentation/developers/tfchain/img/service_contract_reject.png b/collections/documentation/developers/tfchain/img/service_contract_reject.png new file mode 100644 index 0000000..6235530 Binary files /dev/null and b/collections/documentation/developers/tfchain/img/service_contract_reject.png differ diff --git a/collections/documentation/developers/tfchain/img/service_contract_set_fees.png b/collections/documentation/developers/tfchain/img/service_contract_set_fees.png new file mode 100644 index 0000000..6cfa91a Binary files /dev/null and b/collections/documentation/developers/tfchain/img/service_contract_set_fees.png differ diff --git a/collections/documentation/developers/tfchain/img/service_contract_set_metadata.png b/collections/documentation/developers/tfchain/img/service_contract_set_metadata.png new file mode 100644 index 0000000..e472145 Binary files /dev/null and b/collections/documentation/developers/tfchain/img/service_contract_set_metadata.png differ diff --git a/collections/documentation/developers/tfchain/img/service_contract_state.png b/collections/documentation/developers/tfchain/img/service_contract_state.png new file mode 100644 index 0000000..e824552 Binary files /dev/null and b/collections/documentation/developers/tfchain/img/service_contract_state.png differ diff --git a/collections/documentation/developers/tfchain/img/service_contract_twin_from_account.png b/collections/documentation/developers/tfchain/img/service_contract_twin_from_account.png new file mode 100644 index 0000000..293bad2 Binary files /dev/null and b/collections/documentation/developers/tfchain/img/service_contract_twin_from_account.png differ 
diff --git a/collections/documentation/developers/tfchain/img/vote_proposal.png b/collections/documentation/developers/tfchain/img/vote_proposal.png new file mode 100644 index 0000000..16111a0 Binary files /dev/null and b/collections/documentation/developers/tfchain/img/vote_proposal.png differ diff --git a/collections/documentation/developers/tfchain/introduction.md b/collections/documentation/developers/tfchain/introduction.md new file mode 100644 index 0000000..a983b68 --- /dev/null +++ b/collections/documentation/developers/tfchain/introduction.md @@ -0,0 +1,57 @@ +

ThreeFold Chain

+ +

Table of Contents

+ +- [Introduction](#introduction) +- [Deployed instances](#deployed-instances) +- [Create a TFChain twin](#create-a-tfchain-twin) +- [Get your twin ID](#get-your-twin-id) + +*** + +## Introduction + +ThreeFold blockchain (aka TFChain) serves as a registry for Nodes, Farms, Digital Twins and Smart Contracts. +It is the backbone of [ZOS](https://github.com/threefoldtech/zos) and other components. + +## Deployed instances + +- Development network (Devnet): + + - Polkadot UI: [https://polkadot.js.org/apps/?rpc=wss%3A%2F%2F/tfchain.dev.grid.tf#/explorer](https://polkadot.js.org/apps/?rpc=wss%3A%2F%2F/tfchain.dev.grid.tf#/explorer) + - Websocket url: `wss://tfchain.dev.grid.tf` + - GraphQL UI: [https://graphql.dev.grid.tf/graphql](https://graphql.dev.grid.tf/graphql) + +- QA testing network (QAnet): + + - Polkadot UI: [https://polkadot.js.org/apps/?rpc=wss%3A%2F%2F/tfchain.qa.grid.tf#/explorer](https://polkadot.js.org/apps/?rpc=wss%3A%2F%2F/tfchain.qa.grid.tf#/explorer) + - Websocket url: `wss://tfchain.qa.grid.tf` + - GraphQL UI: [https://graphql.qa.grid.tf/graphql](https://graphql.qa.grid.tf/graphql) + +- Test network (Testnet): + + - Polkadot UI: [https://polkadot.js.org/apps/?rpc=wss%3A%2F%2F/tfchain.test.grid.tf#/explorer](https://polkadot.js.org/apps/?rpc=wss%3A%2F%2F/tfchain.test.grid.tf#/explorer) + - Websocket url: `wss://tfchain.test.grid.tf` + - GraphQL UI: [https://graphql.test.grid.tf/graphql](https://graphql.test.grid.tf/graphql) + +- Production network (Mainnet): + + - Polkadot UI: [https://polkadot.js.org/apps/?rpc=wss%3A%2F%2F/tfchain.grid.tf#/explorer](https://polkadot.js.org/apps/?rpc=wss%3A%2F%2F/tfchain.grid.tf#/explorer) + - Websocket url: `wss://tfchain.grid.tf` + - GraphQL UI: [https://graphql.grid.tf/graphql](https://graphql.grid.tf/graphql) + +## Create a TFChain twin + +A twin is a unique identifier linked to a specific account on a given TFChain network. 
+Currently there are two ways to create a twin: + +- With the [Dashboard](../../dashboard/wallet_connector.md) + - a twin is automatically generated while creating a TFChain account +- With the TFConnect app + - a twin is automatically generated while creating a farm (in this case the twin will be created on mainnet) + +## Get your twin ID + +You can retrieve the twin ID associated with your account by going to `Developer` -> `Chain state` -> `tfgridModule` -> `twinIdByAccountID()`. + +![service_contract_twin_from_account](img/service_contract_twin_from_account.png) diff --git a/collections/documentation/developers/tfchain/tfchain.md b/collections/documentation/developers/tfchain/tfchain.md new file mode 100644 index 0000000..a575535 --- /dev/null +++ b/collections/documentation/developers/tfchain/tfchain.md @@ -0,0 +1,95 @@ +

ThreeFold Chain

+ +

Table of Contents

+ +- [Introduction](#introduction) +- [Twins](#twins) +- [Farms](#farms) +- [Nodes](#nodes) +- [Node Contract](#node-contract) +- [Rent Contract](#rent-contract) +- [Name Contract](#name-contract) +- [Contract billing](#contract-billing) +- [Contract locking](#contract-locking) +- [Contract grace period](#contract-grace-period) +- [DAO](#dao) +- [Farming Policies](#farming-policies) +- [Node Connection price](#node-connection-price) +- [Node Certifiers](#node-certifiers) + +*** + +## Introduction + +ThreeFold Chain (TFChain) is the base layer for everything that interacts with the grid. Nodes, farms and users are registered on the chain. It plays the central role in achieving decentralised consensus between a user and a Node to deploy a certain workload. A contract can be created on the chain that is essentially an agreement between a node and a user. + +## Twins + +A twin is the central Identity object that is used for every entity that lives on the grid. A twin optionally has an IPV6 planetary network address which can be used for communication between twins regardless of their location. A twin is coupled to a private/public keypair on chain. This keypair can hold TFT on TF Chain. + +## Farms + +A farm must be created before a Node can be booted. Every farm needs to have a unique name and is linked to the Twin that creates the farm. Once a farm is created, a unique ID is generated. This ID can be provided to the boot image of a Node. + +## Nodes + +When a node is booted for the first time, it registers itself on the chain and a unique identity is generated for this Node. + +## Node Contract + +A node contract is a contract between a user and a Node to deploy a certain workload. 
The contract is specified as follows: + +``` +{ + "contract_id": auto generated, + "node_id": unique id of the node, + "deployment_data": some additional deployment data, + "deployment_hash": hash of the deployment definition signed by the user, + "public_ips": number of public ips to attach to the deployment contract +} +``` + +We don't save the raw workload definition on the chain but only a hash of the definition. After the contract is created, the user must send the raw deployment to the specified node in the contract. They can find where to send this data by looking up the Node's twin and contacting that twin over the planetary network. + +## Rent Contract + +A rent contract is also a contract between a user and a Node, but instead of being able to reserve a part of the node's capacity, the full capacity is rented. Once a rent contract is created on a Node by a user, only this user can deploy node contracts on this specific node. A discount of 50% is given if the user wishes to rent the full capacity of a node by creating a rent contract. All node contracts deployed on a node where a user has a rent contract are free to use except for the public IPs which can be added on a node contract. + +## Name Contract + +A name contract is a contract that specifies a unique name to be used on the grid's webgateways. Once a name contract is created, this name can be used as an entrypoint for an application on the grid. + +## Contract billing + +Every contract is billed every hour on the chain; the amount that is due is deducted from the user's wallet every 24 hours or when the user cancels the contract. 
The total amount accrued in those 24 hours is sent to the following destinations: + +- 10% goes to the ThreeFold Foundation +- 5% goes to the staking pool wallet (to be implemented in a later phase) +- 50% goes to the certified sales channel +- 35% of the TFT gets burned + +See [pricing](../../../knowledge_base/cloud/pricing/pricing.md) for more information on how the cost for a contract is calculated. + +## Contract locking + +To avoid overloading the chain with transfer events, the amount due for a contract is locked every hour; after 24 hours the amount is unlocked and deducted in one go. This lock is saved on the user's account. If the user has multiple contracts, the locked amounts are stacked. + +## Contract grace period + +When the owner of a contract runs out of funds in their wallet to pay for the deployment, the contract goes into a Grace Period state. The deployment, whatever that might be, will be inaccessible to the user during this period. When the wallet is funded with TFT again, the contract goes back to a normal operating state. If the grace period runs out (by default 2 weeks), the user's deployment and data will be deleted from the node. + +## DAO + +See [DAO](../../dashboard/tfchain/tf_dao.md) for more information on the DAO on TF Chain. + +## Farming Policies + +See [farming_policies](farming_policies.md) for more information on the farming policies on TF Chain. + +## Node Connection price + +A connection price is set for every new Node that boots on the Grid. This connection price influences the amount of TFT farmed in a period. The connection price set on a node is permanent. The DAO can propose an increase or decrease of the connection price. At the time of writing the connection price is set to $0.08. When the DAO proposes a connection price and the vote is passed, new nodes will attach to the new connection price. + +## Node Certifiers + +Node certifiers are entities who are allowed to set a node's certification level to `Certified`. 
The DAO can propose to add / remove entities that can certify nodes. This is useful for allowing approved resellers of ThreeFold nodes to mark nodes as Certified. A certified node farms 25% more tokens than a `Diy` node. \ No newline at end of file diff --git a/collections/documentation/developers/tfchain/tfchain_external_service_contract.md b/collections/documentation/developers/tfchain/tfchain_external_service_contract.md new file mode 100644 index 0000000..992186a --- /dev/null +++ b/collections/documentation/developers/tfchain/tfchain_external_service_contract.md @@ -0,0 +1,142 @@ +

External Service Contract: How to set and execute

+

Table of Contents

+ +- [Introduction](#introduction) +- [Step 1: Create the contract and get its unique ID](#step-1-create-contract--get-unique-id) +- [Step 2: Fill contract](#step-2-fill-contract) +- [Step 3: Both parties approve contract](#step-3-both-parties-approve-contract) +- [Step 4: Bill for the service](#step-4-bill-for-the-service) +- [Step 5: Cancel the contract](#step-5-cancel-the-contract) + +*** + + +# Introduction + +It is now possible to create a generic contract between two TFChain users (without restriction of account type) for some external service and bill for it. + +The initial scenario is when two parties, a service provider and a consumer of the service, want to use TFChain to automatically handle the billing/payment process for an agreement (in TFT) they want to make for a service which is external to the grid. +This is a more direct and generic feature compared to the initial rewarding model, where a service provider (or solution provider) receives TFT from a rewards distribution process, linked to a node contract and based on cloud capacity consumption, which follows specific billing rules. + +The initial requirements are: +- Both service and consumer need to have their respective twin created on TFChain (if not, see [here](tfchain.md#create-a-tfchain-twin) how to do it) +- The consumer account needs to be funded (a lack of funds will simply result in the cancellation of the contract when it is billed) + +In the following steps we detail the sequence of extrinsics that need to be called in the TFChain Polkadot portal for setting up and executing such a contract. + +Make sure to use the right [links](tfchain.md#deployed-instances) depending on the targeted network. + + +# Step 1: Create contract / Get unique ID + +## Create service contract + +The contract creation can be initiated by either the service or the consumer. 
+In the TFChain Polkadot portal, the party who initiates the contract should go to `Developer` -> `Extrinsics` -> `smartContractModule` -> `serviceContractCreate()`, using the account they intend to use in the contract, and select the corresponding service and consumer accounts before submitting the transaction. + +![service_contract_create](img/service_contract_create.png) + +Once executed, the service contract is `Created` between the two parties and a unique ID is generated. + +## Last service contract ID + +To get the last generated service contract ID, go to `Developer` -> `Chain state` -> `smartContractModule` -> `serviceContractID()`. + +![service_contract_id](img/service_contract_id.png) + +## Parse service contract + +To get the corresponding contract details, go to `Developer` -> `Chain state` -> `smartContractModule` -> `serviceContracts()` and provide the contract ID. +You should see the following details: + +![service_contract_state](img/service_contract_state.png) + +Check that the contract fields are correct, especially the twin IDs of both service and consumer, to be sure you have the right contract ID, referenced as `serviceContractId`. + +## Wrong contract ID ? + +If the twin IDs in the service contract fields are wrong ([how to get my twin ID?](tfchain.md#get-your-twin-id)), the contract does not correspond to the last created contract. +In this case, parse the most recent contracts by decreasing `serviceContractId` and try to identify the right one. If none matches, the contract was simply not created, so you should repeat the creation process and check the error log. + + +# Step 2: Fill contract + +Once created, the service contract must be filled with its relative `per hour` fees: +- `baseFee` is the constant "per hour" price (in TFT) for the service. +- `variableFee` is the maximum "per hour" amount (in TFT) that can be billed extra. 
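
To make the role of the two fees concrete, the following Go sketch applies the billing rules given later in this guide under *Amount due calculation* and *Protection against draining*. It is only an illustration under stated assumptions: the helper name `amountDue`, the use of `float64` for TFT amounts, and the error message are made up for this example and are not part of TFChain.

```go
package main

import (
	"errors"
	"fmt"
)

// amountDue mirrors the formula documented in this guide:
//
//	amount = baseFee * T / 3600 + variableAmount
//
// where T is the elapsed time in seconds since the last bill, capped at 3600.
// baseFee and variableFee are "per hour" values in TFT.
// Hypothetical helper for illustration only, not TFChain code.
func amountDue(baseFee, variableFee, variableAmount float64, elapsedSeconds uint64) (float64, error) {
	t := float64(elapsedSeconds)
	if t > 3600 {
		t = 3600 // time beyond one hour is not billed
	}
	// Draining protection: the extra amount may not exceed variableFee pro rata.
	if variableAmount > variableFee*t/3600 {
		return 0, errors.New("variableAmount exceeds variableFee for the elapsed time")
	}
	return baseFee*t/3600 + variableAmount, nil
}

func main() {
	// 30 minutes elapsed, baseFee 10 TFT/h, variableFee 4 TFT/h, 1 TFT billed extra.
	amount, err := amountDue(10, 4, 1, 1800)
	fmt.Println(amount, err) // prints: 6 <nil>
}
```

In this example the service could bill at most 2 TFT extra after 30 minutes (half of the hourly `variableFee`); billing 1 TFT extra gives 5 TFT of base fee plus 1 TFT, so 6 TFT in total.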
+ +To provide these fees (only the service can set fees), go to `Developer` -> `Extrinsics` -> `smartContractModule` -> `serviceContractSetFees()` specifying `serviceContractId`. + +![service_contract_set_fees](img/service_contract_set_fees.png) + +Some metadata (the description of the service for example) must be filled in a similar way (`Developer` -> `Extrinsics` -> `smartContractModule` -> `serviceContractSetMetadata()`). +In this case either the service or the consumer can set the metadata. + +![service_contract_set_metadata](img/service_contract_set_metadata.png) + +The agreement will be automatically considered `Ready` when both metadata and fees are set (`metadata` not empty and `baseFee` greater than zero). +Note that as long as this condition is not met, both extrinsics can still be called to modify the agreement. +You can check the contract status at each step of the flow by parsing it as shown [here](#parse-service-contract). + + +# Step 3: Both parties approve contract + +Now that the agreement is ready, the contract can be submitted for approval. +To approve the agreement, go to `Developer` -> `Extrinsics` -> `smartContractModule` -> `serviceContractApprove()` specifying `serviceContractId`. + +![service_contract_approve](img/service_contract_approve.png) + +To reject the agreement, go to `Developer` -> `Extrinsics` -> `smartContractModule` -> `serviceContractReject()` specifying `serviceContractId`. + +![service_contract_reject](img/service_contract_reject.png) + +The contract needs to be explicitly `Approved` by both service and consumer to be ready for billing. +Before reaching this state, if one of the parties decides to call the rejection extrinsic, it will instantly lead to the cancelation of the contract (and its permanent removal). + + +# Step 4: Bill for the service + +Once the contract is accepted by both parties it can be billed. 
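
The flow of Steps 1 to 3 can be summarised as a small state machine. The Go sketch below is a simplified model, not the on-chain implementation: the `Contract` struct, its field names, and the `Status` and `CanBill` helpers are hypothetical, and only the state names `Created`, `Ready` and `Approved` come from this guide.

```go
package main

import "fmt"

// State names follow the wording of this guide; the on-chain type may differ.
type State string

const (
	Created  State = "Created"
	Ready    State = "Ready"
	Approved State = "Approved"
)

// Contract is a hypothetical, simplified model of a service contract.
type Contract struct {
	BaseFee          uint64 // "per hour" base fee, must be greater than zero
	Metadata         string // must not be empty
	ApprovedService  bool
	ApprovedConsumer bool
}

// Status derives the contract state from its fields, following Steps 2 and 3:
// Ready once fees and metadata are set, Approved once both parties approved.
func (c Contract) Status() State {
	ready := c.Metadata != "" && c.BaseFee > 0
	switch {
	case ready && c.ApprovedService && c.ApprovedConsumer:
		return Approved
	case ready:
		return Ready
	default:
		return Created
	}
}

// CanBill reports whether the service may send a bill for this contract.
func (c Contract) CanBill() bool { return c.Status() == Approved }

func main() {
	c := Contract{}
	fmt.Println(c.Status(), c.CanBill()) // Created false

	c.BaseFee, c.Metadata = 5, "hosting service"
	fmt.Println(c.Status(), c.CanBill()) // Ready false

	c.ApprovedService, c.ApprovedConsumer = true, true
	fmt.Println(c.Status(), c.CanBill()) // Approved true
}
```

Rejection and cancellation are left out of the model because, as described in this guide, both remove the contract permanently rather than moving it to another state.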
+ +## Send bill to consumer + +Only the service can bill the consumer, by going to `Developer` -> `Extrinsics` -> `smartContractModule` -> `serviceContractBill()` specifying `serviceContractId` and billing information such as `variableAmount` and some `metadata`. + +![service_contract_bill](img/service_contract_bill.png) + +## Billing frequency + +⚠️ Important: because a service should not charge the user if it doesn't work, bills must be sent at intervals of less than 1 hour. +Any longer interval will result in a bill bounded to 1 hour (in other words, extra time will not be billed). +It is the service's responsibility to bill at the right frequency! + +## Amount due calculation + +When the bill is received, the chain calculates the bill amount based on the agreement values as follows: + +~~~ +amount = baseFee * T / 3600 + variableAmount +~~~ + +where `T` is the elapsed time in seconds, bounded by 3600 (see [above](#billing-frequency)), since the last effective billing operation occurred. + +## Protection against draining + +Note that if `variableAmount` is too high (i.e. `variableAmount > variableFee * T / 3600`) the billing extrinsic will fail. +The `variableFee` value in the contract is interpreted as being "per hour" and acts as a protection mechanism to avoid draining the consumer. +Indeed, although it is technically possible for the service to send a bill every second, there would be no gain in doing so (other than needlessly overloading the chain). +So it is also the service's responsibility to set a suitable `variableAmount` according to the billing frequency! + +## Billing considerations + +Then, if all goes well and no error is dispatched after submitting the transaction, the consumer pays the amount due calculated from the bill (see calculation detail [above](#amount-due-calculation)). +In practice the amount is transferred from the consumer twin account to the service twin account. 
+Be aware that if the consumer is out of funds the billing will fail AND the contract will automatically be canceled. + + +# Step 5: Cancel the contract + +At every moment of the flow since the contract is created it can be canceled (and definitively removed). +Only the service or the consumer can do it going to `Developer` -> `Extrinsics` -> `smartContractModule` -> `serviceContractCancel()` specifying `serviceContractId`. + +![service_contract_cancel](img/service_contract_cancel.png) diff --git a/collections/documentation/developers/tfchain/tfchain_solution_provider.md b/collections/documentation/developers/tfchain/tfchain_solution_provider.md new file mode 100644 index 0000000..c027d16 --- /dev/null +++ b/collections/documentation/developers/tfchain/tfchain_solution_provider.md @@ -0,0 +1,81 @@ +

Solution Provider

+ +

Table of Contents

+ +- [Introduction](#introduction) +- [Changes to Contract Creation](#changes-to-contract-creation) +- [Creating a Provider](#creating-a-provider) +- [Council needs to approve a provider before it can be used](#council-needs-to-approve-a-provider-before-it-can-be-used) + +*** + +## Introduction + +> Note: While the solution provider program is still active, the plan is to discontinue the program in the near future. We will update the manual as we get more information. We currently do not accept new solution providers. + +A "solution" is something running on the grid, created by a community member. This can be brought forward to the council, who can vote on it to recognize it as a solution. On contract creation, a recognized solution can be referenced, in which case part of the payment goes toward the address coupled to the solution. On chain a solution looks as follows: + +- Description (should be some text, limited in length. The limit should be rather low; if a longer description is desired, a link can be inserted. 160 characters should be enough). +- Up to 5 payout addresses, each with a payout percentage. This is the percentage of the payout received by the associated address. The amount is deducted from the payout to the treasury and specified as a percentage of the total contract cost. As such, the sum of these percentages can never exceed 50%. If this value is not 50%, the remainder is paid to the treasury. Example: 10% payout percentage to address 1, 5% payout to address 2. This means 15% goes to the 2 listed addresses combined and 35% goes to the treasury (instead of the usual 50%). Rest remains as is. If the cost is 10 TFT, 1 TFT goes to address 1, 0.5 TFT goes to address 2, and 3.5 TFT goes to the treasury, instead of the default 5 TFT to the treasury. +- A unique code. This code is used to link a solution to the contract (numeric ID). + +This means contracts need to carry an optional solution code. 
If the code is not specified (default), the 50% goes entirely to the treasury (as is always the case today). + +A solution can be created by calling the extrinsic `smartContractModule` -> `createSolutionProvider` with parameters: + +- description +- link (to website) +- list of providers + +Provider: + +- who (account id) +- take (the share this account should get), specified as an integer of at most 50. Example: 25 + +A forum post should be created with the details of the created solution provider, and the DAO can vote to approve it or not. If the solution provider gets approved, it can be referenced on contract creation. + +Note that a solution can be deleted. In this case, existing contracts should fall back to the default behavior (i.e. if code not found -> default). + +## Changes to Contract Creation + +When creating a contract, a `solution_provider_id` can be passed. An error will be returned if an invalid or non-approved solution provider id is passed. + +## Creating a Provider + +Creating a provider is as easy as going to the [polkadotJS UI](https://polkadot.js.org/apps/?rpc=wss%3A%2F%2Ftfchain.dev.grid.tf#/extrinsics) (currently only on devnet). + +Select module `SmartContractModule` -> `createSolutionProvider(..)` + +Fill in all the details. You can specify up to 5 target accounts which can have a take of the TFT generated from being a provider, up to a total maximum of 50%. `Take` should be specified as an integer, for example `25`. 
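
The payout rules from the Introduction (at most 5 payout addresses, takes summing to at most 50%, remainder to the treasury) can be sketched in Go as follows. This is an illustration only: `Provider`, `splitPayout` and the use of `float64` for TFT amounts are assumptions of this example, not the chain's actual types.

```go
package main

import (
	"errors"
	"fmt"
)

// Provider pairs an account with its take, an integer percentage of the
// total contract cost. Hypothetical type for illustration only.
type Provider struct {
	Who  string
	Take uint8
}

// splitPayout returns the TFT amount per provider account and the treasury
// share, following the rules described in this guide.
func splitPayout(cost float64, providers []Provider) (map[string]float64, float64, error) {
	if len(providers) > 5 {
		return nil, 0, errors.New("at most 5 payout addresses are allowed")
	}
	total := 0
	for _, p := range providers {
		total += int(p.Take)
	}
	if total > 50 {
		return nil, 0, errors.New("takes must not sum to more than 50%")
	}
	payouts := make(map[string]float64, len(providers))
	for _, p := range providers {
		payouts[p.Who] += cost * float64(p.Take) / 100
	}
	// Whatever is left of the 50% goes to the treasury.
	treasury := cost * float64(50-total) / 100
	return payouts, treasury, nil
}

func main() {
	payouts, treasury, err := splitPayout(10, []Provider{{"addr1", 10}, {"addr2", 5}})
	fmt.Println(payouts["addr1"], payouts["addr2"], treasury, err) // prints: 1 0.5 3.5 <nil>
}
```

The call in `main` reproduces the documented example: for a cost of 10 TFT, 1 TFT goes to the first address, 0.5 TFT to the second, and 3.5 TFT to the treasury.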
+ +Once this object is created, a forum post should be created here: + +![create](./img/create_provider.png) + +## Council needs to approve a provider before it can be used + +First propose the solution to be approved: + +![propose_approve](./img/propose_approve.png) + +After submission it should look like this: + +![proposed_approved](./img/proposed_approve.png) + +Now another member of the council needs to vote: + +![vote](./img/vote_proposal.png) + +After enough votes are reached, it can be closed: + +![close](./img/close_proposal.png) + +If the close was executed without error, the solution should be approved and ready to be used. + +Query the solution: `chainstate` -> `SmartContractModule` -> `solutionProviders` + +![query](./img/query_provider.png) + +Now the solution provider can be referenced on contract creation: + +![create](./img/create_contract.png) diff --git a/collections/documentation/developers/tfcmd/tfcmd.md b/collections/documentation/developers/tfcmd/tfcmd.md new file mode 100644 index 0000000..daa502a --- /dev/null +++ b/collections/documentation/developers/tfcmd/tfcmd.md @@ -0,0 +1,15 @@ +

TFCMD

+ +TFCMD (`tfcmd`) is a command line interface to interact and develop on Threefold Grid using command line. + +Consult the [ThreeFoldTech TFCMD repository](https://github.com/threefoldtech/tfgrid-sdk-go/tree/development/grid-cli) for the latest updates. Make sure to read the [basics](../../system_administrators/getstarted/tfgrid3_getstarted.md). + +

Table of Contents

+ +- [Getting Started](./tfcmd_basics.md) +- [Deploy a VM](./tfcmd_vm.md) +- [Deploy Kubernetes](./tfcmd_kubernetes.md) +- [Deploy ZDB](./tfcmd_zdbs.md) +- [Gateway FQDN](./tfcmd_gateway_fqdn.md) +- [Gateway Name](./tfcmd_gateway_name.md) +- [Contracts](./tfcmd_contracts.md) \ No newline at end of file diff --git a/collections/documentation/developers/tfcmd/tfcmd_basics.md b/collections/documentation/developers/tfcmd/tfcmd_basics.md new file mode 100644 index 0000000..8816eea --- /dev/null +++ b/collections/documentation/developers/tfcmd/tfcmd_basics.md @@ -0,0 +1,67 @@ +

TFCMD Getting Started

+ +

Table of Contents

+ +- [Introduction](#introduction) +- [Installation](#installation) +- [Login](#login) +- [Commands](#commands) +- [Using TFCMD](#using-tfcmd) + +*** + +## Introduction + +This section covers the basics on how to set up and use TFCMD (`tfcmd`). + +TFCMD is available as binaries. Make sure to download the latest release and to stay up to date with new releases. + +## Installation + +An easy way to use TFCMD is to download and extract the TFCMD binaries to your path. + +- Download the latest release from [releases](https://github.com/threefoldtech/tfgrid-sdk-go/releases) + - ``` + wget + ``` +- Extract the binaries + - ``` + tar -xvf + ``` +- Move `tfcmd` to any `$PATH` directory: + ```bash + mv tfcmd /usr/local/bin + ``` + +## Login + +Before interacting with the ThreeFold Grid with `tfcmd`, you should log in with your mnemonics and specify the grid network: + +```console +$ tfcmd login +Please enter your mnemonics: +Please enter grid network (main,test): +``` + +This validates your mnemonics and stores them, along with the network, in your default configuration directory. +Check [UserConfigDir()](https://pkg.go.dev/os#UserConfigDir) for your default configuration directory. + +## Commands + +You can run the command `tfcmd help` at any time to access the help section. This will also display the available commands. + +| Command | Description | +| ---------- | ---------------------------------------------------------- | +| cancel | Cancel resources on Threefold grid | +| completion | Generate the autocompletion script for the specified shell | +| deploy | Deploy resources to Threefold grid | +| get | Get a deployed resource from Threefold grid | +| help | Help about any command | +| login | Login with mnemonics to a grid network | +| version | Get latest build tag | + +## Using TFCMD + +Once you've logged in, you can use commands to deploy workloads on the TFGrid. Read the next sections for more information on different types of workloads available with TFCMD. 
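
As mentioned in the Login section, `tfcmd` stores its configuration under the directory reported by Go's `os.UserConfigDir()`. The following Go sketch prints that directory on your machine. The wrapper `configDir` is a name chosen for this example, and the file that `tfcmd` writes inside the directory is not covered here.

```go
package main

import (
	"fmt"
	"os"
)

// configDir returns the per-user configuration root reported by the Go
// standard library (for example $XDG_CONFIG_HOME or $HOME/.config on Linux).
func configDir() (string, error) {
	return os.UserConfigDir()
}

func main() {
	dir, err := configDir()
	if err != nil {
		fmt.Println("could not determine the configuration directory:", err)
		return
	}
	fmt.Println("default configuration directory:", dir)
}
```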
+ + diff --git a/collections/documentation/developers/tfcmd/tfcmd_contracts.md b/collections/documentation/developers/tfcmd/tfcmd_contracts.md new file mode 100644 index 0000000..bb14c5d --- /dev/null +++ b/collections/documentation/developers/tfcmd/tfcmd_contracts.md @@ -0,0 +1,99 @@ +

Contracts

+ +

Table of Contents

+ +- [Introduction](#introduction) +- [Get](#get) + - [Get Contracts](#get-contracts) + - [Get Contract](#get-contract) +- [Cancel](#cancel) + - [Optional Flags](#optional-flags) + +*** + +## Introduction + +We explain how to handle contracts on the TFGrid with `tfcmd`. + +## Get + +### Get Contracts + +Get all contracts + +```bash +tfcmd get contracts +``` + +Example: + +```console +$ tfcmd get contracts +5:13PM INF starting peer session=tf-1184566 twin=81 +Node contracts: +ID Node ID Type Name Project Name +50977 21 network vm1network vm1 +50978 21 vm vm1 vm1 +50980 14 Gateway Name gatewaytest gatewaytest + +Name contracts: +ID Name +50979 gatewaytest +``` + +### Get Contract + +Get specific contract + +```bash +tfcmd get contract +``` + +Example: + +```console +$ tfcmd get contract 50977 +5:14PM INF starting peer session=tf-1185180 twin=81 +5:14PM INF contract: +{ + "contract_id": 50977, + "twin_id": 81, + "state": "Created", + "created_at": 1702480020, + "type": "node", + "details": { + "nodeId": 21, + "deployment_data": "{\"type\":\"network\",\"name\":\"vm1network\",\"projectName\":\"vm1\"}", + "deployment_hash": "21adc91ef6cdc915d5580b3f12732ac9", + "number_of_public_ips": 0 + } +} +``` + +## Cancel + +Cancel specified contracts or all contracts. + +```bash +tfcmd cancel contracts ... [Flags] +``` + +Example: + +```console +$ tfcmd cancel contracts 50856 50857 +5:17PM INF starting peer session=tf-1185964 twin=81 +5:17PM INF contracts canceled successfully +``` + +### Optional Flags + +- all: cancel all twin's contracts. 
+ +Example: + +```console +$ tfcmd cancel contracts --all +5:17PM INF starting peer session=tf-1185964 twin=81 +5:17PM INF contracts canceled successfully +``` \ No newline at end of file diff --git a/collections/documentation/developers/tfcmd/tfcmd_gateway_fqdn.md b/collections/documentation/developers/tfcmd/tfcmd_gateway_fqdn.md new file mode 100644 index 0000000..538438f --- /dev/null +++ b/collections/documentation/developers/tfcmd/tfcmd_gateway_fqdn.md @@ -0,0 +1,87 @@ +

Gateway FQDN

+ +

Table of Contents

+ +- [Introduction](#introduction) +- [Deploy](#deploy) + - [Required Flags](#required-flags) + - [Optional Flags](#optional-flags) +- [Get](#get) +- [Cancel](#cancel) + +*** + +## Introduction + +We explain how to use gateway fully qualified domain names on the TFGrid using `tfcmd`. + +## Deploy + +```bash +tfcmd deploy gateway fqdn [flags] +``` + +### Required Flags + +- name: name for the gateway deployment, also used for canceling the deployment. Must be unique. +- node: node id to deploy gateway on. +- backends: list of backends the gateway will forward requests to. +- fqdn: FQDN pointing to the specified node. + +### Optional Flags + +- tls: add TLS passthrough option (default false). + +Example: + +```console +$ tfcmd deploy gateway fqdn -n gatewaytest --node 14 --backends http://93.184.216.34:80 --fqdn example.com +3:34PM INF deploying gateway fqdn +3:34PM INF gateway fqdn deployed +``` + +## Get + +```bash +tfcmd get gateway fqdn +``` + +gateway is the name used when deploying gateway-fqdn using tfcmd. + +Example: + +```console +$ tfcmd get gateway fqdn gatewaytest +2:05PM INF gateway fqdn: +{ + "NodeID": 14, + "Backends": [ + "http://93.184.216.34:80" + ], + "FQDN": "awady.gridtesting.xyz", + "Name": "gatewaytest", + "TLSPassthrough": false, + "Description": "", + "NodeDeploymentID": { + "14": 19653 + }, + "SolutionType": "gatewaytest", + "ContractID": 19653 +} +``` + +## Cancel + +```bash +tfcmd cancel +``` + +deployment-name is the name of the deployment specified while deploying using tfcmd. + +Example: + +```console +$ tfcmd cancel gatewaytest +3:37PM INF canceling contracts for project gatewaytest +3:37PM INF gatewaytest canceled +``` \ No newline at end of file diff --git a/collections/documentation/developers/tfcmd/tfcmd_gateway_name.md b/collections/documentation/developers/tfcmd/tfcmd_gateway_name.md new file mode 100644 index 0000000..a4c8191 --- /dev/null +++ b/collections/documentation/developers/tfcmd/tfcmd_gateway_name.md @@ -0,0 +1,88 @@ +

Gateway Name

+ +

Table of Contents

+ +- [Introduction](#introduction) +- [Deploy](#deploy) + - [Required Flags](#required-flags) + - [Optional Flags](#optional-flags) +- [Get](#get) +- [Cancel](#cancel) + +*** + +## Introduction + +We explain how to use gateway names on the TFGrid using `tfcmd`. + +## Deploy + +```bash +tfcmd deploy gateway name [flags] +``` + +### Required Flags + +- name: name for the gateway deployment, also used for canceling the deployment. Must be unique. +- backends: list of backends the gateway will forward requests to. + +### Optional Flags + +- node: node id gateway should be deployed on. +- farm: farm id gateway should be deployed on, if set choose available node from farm that fits vm specs (default 1). Note: node and farm flags cannot both be set. +- tls: add TLS passthrough option (default false). + +Example: + +```console +$ tfcmd deploy gateway name -n gatewaytest --node 14 --backends http://93.184.216.34:80 +3:34PM INF deploying gateway name +3:34PM INF fqdn: gatewaytest.gent01.dev.grid.tf +``` + +## Get + +```bash +tfcmd get gateway name +``` + +gateway is the name used when deploying gateway-name using tfcmd. + +Example: + +```console +$ tfcmd get gateway name gatewaytest +1:56PM INF gateway name: +{ + "NodeID": 14, + "Name": "gatewaytest", + "Backends": [ + "http://93.184.216.34:80" + ], + "TLSPassthrough": false, + "Description": "", + "SolutionType": "gatewaytest", + "NodeDeploymentID": { + "14": 19644 + }, + "FQDN": "gatewaytest.gent01.dev.grid.tf", + "NameContractID": 19643, + "ContractID": 19644 +} +``` + +## Cancel + +```bash +tfcmd cancel +``` + +deployment-name is the name of the deployment specified while deploying using tfcmd. 
+ +Example: + +```console +$ tfcmd cancel gatewaytest +3:37PM INF canceling contracts for project gatewaytest +3:37PM INF gatewaytest canceled +``` \ No newline at end of file diff --git a/collections/documentation/developers/tfcmd/tfcmd_kubernetes.md b/collections/documentation/developers/tfcmd/tfcmd_kubernetes.md new file mode 100644 index 0000000..9a7c2b1 --- /dev/null +++ b/collections/documentation/developers/tfcmd/tfcmd_kubernetes.md @@ -0,0 +1,147 @@ +

Kubernetes

+ +

Table of Contents

+ +- [Introduction](#introduction) +- [Deploy](#deploy) + - [Required Flags](#required-flags) + - [Optional Flags](#optional-flags) +- [Get](#get) +- [Cancel](#cancel) + +*** + +## Introduction + +In this section, we explain how to deploy Kubernetes workloads on the TFGrid using `tfcmd`. + +## Deploy + +```bash +tfcmd deploy kubernetes [flags] +``` + +### Required Flags + +- name: name for the master node deployment also used for canceling the cluster deployment. must be unique. +- ssh: path to public ssh key to set in the master node. + +### Optional Flags + +- master-node: node id master should be deployed on. +- master-farm: farm id master should be deployed on, if set choose available node from farm that fits master specs (default 1). note: master-node and master-farm flags cannot be set both. +- workers-node: node id workers should be deployed on. +- workers-farm: farm id workers should be deployed on, if set choose available node from farm that fits master specs (default 1). note: workers-node and workers-farm flags cannot be set both. +- ipv4: assign public ipv4 for master node (default false). +- ipv6: assign public ipv6 for master node (default false). +- ygg: assign yggdrasil ip for master node (default true). +- master-cpu: number of cpu units for master node (default 1). +- master-memory: master node memory size in GB (default 1). +- master-disk: master node disk size in GB (default 2). +- workers-number: number of workers nodes (default 0). +- workers-ipv4: assign public ipv4 for each worker node (default false) +- workers-ipv6: assign public ipv6 for each worker node (default false) +- workers-ygg: assign yggdrasil ip for each worker node (default true) +- workers-cpu: number of cpu units for each worker node (default 1). +- workers-memory: memory size for each worker node in GB (default 1). +- workers-disk: disk size in GB for each worker node (default 2). 
+ +Example: + +```console +$ tfcmd deploy kubernetes -n kube --ssh ~/.ssh/id_rsa.pub --master-node 14 --workers-number 2 --workers-node 14 +4:21PM INF deploying network +4:22PM INF deploying cluster +4:22PM INF master yggdrasil ip: 300:e9c4:9048:57cf:504f:c86c:9014:d02d +``` + +## Get + +```bash +tfcmd get kubernetes +``` + +kubernetes is the name used when deploying kubernetes cluster using tfcmd. + +Example: + +```console +$ tfcmd get kubernetes examplevm +3:14PM INF k8s cluster: +{ + "Master": { + "Name": "kube", + "Node": 14, + "DiskSize": 2, + "PublicIP": false, + "PublicIP6": false, + "Planetary": true, + "Flist": "https://hub.grid.tf/tf-official-apps/threefoldtech-k3s-latest.flist", + "FlistChecksum": "c87cf57e1067d21a3e74332a64ef9723", + "ComputedIP": "", + "ComputedIP6": "", + "YggIP": "300:e9c4:9048:57cf:e8a0:662b:4e66:8faa", + "IP": "10.20.2.2", + "CPU": 1, + "Memory": 1024 + }, + "Workers": [ + { + "Name": "worker1", + "Node": 14, + "DiskSize": 2, + "PublicIP": false, + "PublicIP6": false, + "Planetary": true, + "Flist": "https://hub.grid.tf/tf-official-apps/threefoldtech-k3s-latest.flist", + "FlistChecksum": "c87cf57e1067d21a3e74332a64ef9723", + "ComputedIP": "", + "ComputedIP6": "", + "YggIP": "300:e9c4:9048:57cf:66d0:3ee4:294e:d134", + "IP": "10.20.2.2", + "CPU": 1, + "Memory": 1024 + }, + { + "Name": "worker0", + "Node": 14, + "DiskSize": 2, + "PublicIP": false, + "PublicIP6": false, + "Planetary": true, + "Flist": "https://hub.grid.tf/tf-official-apps/threefoldtech-k3s-latest.flist", + "FlistChecksum": "c87cf57e1067d21a3e74332a64ef9723", + "ComputedIP": "", + "ComputedIP6": "", + "YggIP": "300:e9c4:9048:57cf:1ae5:cc51:3ffc:81e", + "IP": "10.20.2.2", + "CPU": 1, + "Memory": 1024 + } + ], + "Token": "", + "NetworkName": "", + "SolutionType": "kube", + "SSHKey": "", + "NodesIPRange": null, + "NodeDeploymentID": { + "14": 22743 + } +} +``` + +## Cancel + +```bash +tfcmd cancel +``` + +deployment-name is the name of the deployment specified in while 
deploying using tfcmd. + +Example: + +```console +$ tfcmd cancel kube +3:37PM INF canceling contracts for project kube +3:37PM INF kube canceled +``` \ No newline at end of file diff --git a/collections/documentation/developers/tfcmd/tfcmd_vm.md b/collections/documentation/developers/tfcmd/tfcmd_vm.md new file mode 100644 index 0000000..21e1471 --- /dev/null +++ b/collections/documentation/developers/tfcmd/tfcmd_vm.md @@ -0,0 +1,171 @@ + +

Deploy a VM

+ +

Table of Contents

+ +- [Introduction](#introduction) +- [Deploy](#deploy) + - [Flags](#flags) + - [Required Flags](#required-flags) + - [Optional Flags](#optional-flags) + - [Examples](#examples) + - [Deploy a VM without GPU](#deploy-a-vm-without-gpu) + - [Deploy a VM with GPU](#deploy-a-vm-with-gpu) +- [Get](#get) + - [Get Example](#get-example) +- [Cancel](#cancel) + - [Cancel Example](#cancel-example) +- [Questions and Feedback](#questions-and-feedback) + +*** + +# Introduction + +In this section, we explore how to deploy a virtual machine (VM) on the ThreeFold Grid using `tfcmd`. + +# Deploy + +You can deploy a VM with `tfcmd` using the following template accompanied by required and optional flags: + +```bash +tfcmd deploy vm [flags] +``` + +## Flags + +When you use `tfcmd`, there are two required flags (`name` and `ssh`), while the remaining flags are optional. The optional flags can be used, for example, to deploy a VM with a GPU or to assign a public IPv6 address. + +### Required Flags + +- **name**: name for the VM deployment, also used for canceling the deployment. The name must be unique. +- **ssh**: path to public ssh key to set in the VM. + +### Optional Flags + +- **node**: node ID the VM should be deployed on. +- **farm**: farm ID the VM should be deployed on, if set choose available node from farm that fits vm specs (default `1`). Note: node and farm flags cannot both be set. +- **cpu**: number of cpu units (default `1`). +- **disk**: size of disk in GB mounted on `/data`. If not set, no disk workload is made. +- **entrypoint**: entrypoint for the VM FList (default `/sbin/zinit init`). Note: setting this without the flist option will fail. +- **flist**: FList used in the VM (default `https://hub.grid.tf/tf-official-apps/threefoldtech-ubuntu-22.04.flist`). Note: setting this without the entrypoint option will fail. +- **ipv4**: assign public ipv4 for the VM (default `false`). +- **ipv6**: assign public ipv6 for the VM (default `false`). 
+- **memory**: memory size in GB (default `1`). +- **rootfs**: root filesystem size in GB (default `2`). +- **ygg**: assign yggdrasil ip for the VM (default `true`). +- **gpus**: assign a list of gpus' IDs to the VM. Note: setting this without the node option will fail. + +## Examples + +We present simple examples on how to deploy a virtual machine with or without a GPU using `tfcmd`. + +### Deploy a VM without GPU + +```console +$ tfcmd deploy vm --name examplevm --ssh ~/.ssh/id_rsa.pub --cpu 2 --memory 4 --disk 10 +12:06PM INF deploying network +12:06PM INF deploying vm +12:07PM INF vm yggdrasil ip: 300:e9c4:9048:57cf:7da2:ac99:99db:8821 +``` +### Deploy a VM with GPU + +```console +$ tfcmd deploy vm --name examplevm --ssh ~/.ssh/id_rsa.pub --cpu 2 --memory 4 --disk 10 --gpus '0000:0e:00.0/1882/543f' --gpus '0000:0e:00.0/1887/593f' --node 12 +12:06PM INF deploying network +12:06PM INF deploying vm +12:07PM INF vm yggdrasil ip: 300:e9c4:9048:57cf:7da2:ac99:99db:8821 +``` + +# Get + +To get the VM, use the following template: + +```bash +tfcmd get vm +``` + +Make sure to replace `` with the name of the VM specified using `tfcmd`. + +## Get Example + +In the following example, the name of the deployment to get is `examplevm`. 
+ +```console +$ tfcmd get vm examplevm +3:20PM INF vm: +{ + "Name": "examplevm", + "NodeID": 15, + "SolutionType": "examplevm", + "SolutionProvider": null, + "NetworkName": "examplevmnetwork", + "Disks": [ + { + "Name": "examplevmdisk", + "SizeGB": 10, + "Description": "" + } + ], + "Zdbs": [], + "Vms": [ + { + "Name": "examplevm", + "Flist": "https://hub.grid.tf/tf-official-apps/threefoldtech-ubuntu-22.04.flist", + "FlistChecksum": "", + "PublicIP": false, + "PublicIP6": false, + "Planetary": true, + "Corex": false, + "ComputedIP": "", + "ComputedIP6": "", + "YggIP": "301:ad3a:9c52:98d1:cd05:1595:9abb:e2f1", + "IP": "10.20.2.2", + "Description": "", + "CPU": 2, + "Memory": 4096, + "RootfsSize": 2048, + "Entrypoint": "/sbin/zinit init", + "Mounts": [ + { + "DiskName": "examplevmdisk", + "MountPoint": "/data" + } + ], + "Zlogs": null, + "EnvVars": { + "SSH_KEY": "ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABgQDcGrS1RT36rHAGLK3/4FMazGXjIYgWVnZ4bCvxxg8KosEEbs/DeUKT2T2LYV91jUq3yibTWwK0nc6O+K5kdShV4qsQlPmIbdur6x2zWHPeaGXqejbbACEJcQMCj8szSbG8aKwH8Nbi8BNytgzJ20Ysaaj2QpjObCZ4Ncp+89pFahzDEIJx2HjXe6njbp6eCduoA+IE2H9vgwbIDVMQz6y/TzjdQjgbMOJRTlP+CzfbDBb6Ux+ed8F184bMPwkFrpHs9MSfQVbqfIz8wuq/wjewcnb3wK9dmIot6CxV2f2xuOZHgNQmVGratK8TyBnOd5x4oZKLIh3qM9Bi7r81xCkXyxAZbWYu3gGdvo3h85zeCPGK8OEPdYWMmIAIiANE42xPmY9HslPz8PAYq6v0WwdkBlDWrG3DD3GX6qTt9lbSHEgpUP2UOnqGL4O1+g5Rm9x16HWefZWMjJsP6OV70PnMjo9MPnH+yrBkXISw4CGEEXryTvupfaO5sL01mn+UOyE= abdulrahman@AElawady-PC\n" + }, + "NetworkName": "examplevmnetwork" + } + ], + "QSFS": [], + "NodeDeploymentID": { + "15": 22748 + }, + "ContractID": 22748 +} +``` + +# Cancel + +To cancel your VM deployment, use the following template: + +```bash +tfcmd cancel +``` + +Make sure to replace `` with the name of the deployment specified using `tfcmd`. + +## Cancel Example + +In the following example, the name of the deployment to cancel is `examplevm`. 
+ +```console +$ tfcmd cancel examplevm +3:37PM INF canceling contracts for project examplevm +3:37PM INF examplevm canceled +``` + +# Questions and Feedback + +If you have any questions or feedback, you can ask the ThreeFold community for help on the [ThreeFold Forum](http://forum.threefold.io/) or on the [ThreeFold Grid Tester Community](https://t.me/threefoldtesting) on Telegram. \ No newline at end of file diff --git a/collections/documentation/developers/tfcmd/tfcmd_zdbs.md b/collections/documentation/developers/tfcmd/tfcmd_zdbs.md new file mode 100644 index 0000000..b9c01d7 --- /dev/null +++ b/collections/documentation/developers/tfcmd/tfcmd_zdbs.md @@ -0,0 +1,125 @@ +

ZDBs

+ +

Table of Contents

+ +- [Introduction](#introduction) +- [Deploy](#deploy) + - [Required Flags](#required-flags) + - [Optional Flags](#optional-flags) +- [Get](#get) +- [Cancel](#cancel) + +*** + +## Introduction + +In this section, we explore how to use the ZDB-related commands of `tfcmd` to interact with the TFGrid. + +## Deploy + +```bash +tfcmd deploy zdb [flags] +``` + +### Required Flags + +- project_name: project name for the ZDBs deployment, also used to cancel the deployment. The name must be unique. +- size: HDD size of the ZDB in GB. + +### Optional Flags + +- node: node ID the ZDBs should be deployed on. +- farm: farm ID the ZDBs should be deployed on. If set, an available node that fits the ZDBs deployment specs is chosen from the farm (default 1). Note: node and farm flags cannot both be set. +- count: number of ZDBs to deploy (default 1). +- names: a list of names for the ZDBs to deploy. +- password: password for the deployed ZDBs. +- description: optional description for your ZDBs. +- mode: the mode 0-db operates in (default user). +- public: whether the ZDB gets a public IPv6 address (default false). + +Example: + +- Deploying ZDBs + +```console +$ tfcmd deploy zdb --project_name examplezdb --size=10 --count=2 --password=password +12:06PM INF deploying zdbs +12:06PM INF zdb 'examplezdb0' is deployed +12:06PM INF zdb 'examplezdb1' is deployed +``` + +## Get + +```bash +tfcmd get zdb <zdb-project-name> +``` + +`zdb-project-name` is the name of the deployment, as specified when deploying with `tfcmd`. 
+ +Example: + +```console +$ tfcmd get zdb examplezdb +3:20PM INF zdb: +{ + "Name": "examplezdb", + "NodeID": 11, + "SolutionType": "examplezdb", + "SolutionProvider": null, + "NetworkName": "", + "Disks": [], + "Zdbs": [ + { + "name": "examplezdb1", + "password": "password", + "public": false, + "size": 10, + "description": "", + "mode": "user", + "ips": [ + "2a10:b600:1:0:c4be:94ff:feb1:8b3f", + "302:9e63:7d43:b742:469d:3ec2:ab15:f75e" + ], + "port": 9900, + "namespace": "81-36155-examplezdb1" + }, + { + "name": "examplezdb0", + "password": "password", + "public": false, + "size": 10, + "description": "", + "mode": "user", + "ips": [ + "2a10:b600:1:0:c4be:94ff:feb1:8b3f", + "302:9e63:7d43:b742:469d:3ec2:ab15:f75e" + ], + "port": 9900, + "namespace": "81-36155-examplezdb0" + } + ], + "Vms": [], + "QSFS": [], + "NodeDeploymentID": { + "11": 36155 + }, + "ContractID": 36155, + "IPrange": "" +} +``` + +## Cancel + +```bash +tfcmd cancel <zdb-project-name> +``` + +`zdb-project-name` is the name of the deployment, as specified when deploying with `tfcmd`. + +Example: + +```console +$ tfcmd cancel examplezdb +3:37PM INF canceling contracts for project examplezdb +3:37PM INF examplezdb canceled +``` \ No newline at end of file diff --git a/collections/documentation/developers/tfrobot/tfrobot.md b/collections/documentation/developers/tfrobot/tfrobot.md new file mode 100644 index 0000000..c8b2d5f --- /dev/null +++ b/collections/documentation/developers/tfrobot/tfrobot.md @@ -0,0 +1,13 @@ +

TFROBOT

+ +TFROBOT (`tfrobot`) is a command line interface tool that offers simultaneous mass deployment of groups of VMs on the ThreeFold Grid, with support of multiple retries for failed deployments, and customizable configurations, where you can define node groups, VMs groups and other configurations through a YAML or a JSON file. + +Consult the [ThreeFoldTech TFROBOT repository](https://github.com/threefoldtech/tfgrid-sdk-go/tree/development/tfrobot) for the latest updates and read the [basics](../../system_administrators/getstarted/tfgrid3_getstarted.md) to get up to speed if needed. + +

Table of Contents

+ +- [Installation](./tfrobot_installation.md) +- [Configuration File](./tfrobot_config.md) +- [Deployment](./tfrobot_deploy.md) +- [Commands and Flags](./tfrobot_commands_flags.md) +- [Supported Configurations](./tfrobot_configurations.md) \ No newline at end of file diff --git a/collections/documentation/developers/tfrobot/tfrobot_commands_flags.md b/collections/documentation/developers/tfrobot/tfrobot_commands_flags.md new file mode 100644 index 0000000..f33c59d --- /dev/null +++ b/collections/documentation/developers/tfrobot/tfrobot_commands_flags.md @@ -0,0 +1,57 @@ +

Commands and Flags

+ +

Table of Contents

+ +- [Introduction](#introduction) +- [Commands](#commands) +- [Subcommands](#subcommands) +- [Flags](#flags) + +*** + +## Introduction + +We present the various commands, subcommands and flags available with TFROBOT. + + +## Commands + +You can run the command `tfrobot help` at any time to access the help section. This will also display the available commands. + +| Command | Description | +| ---------- | ---------------------------------------------------------- | +| completion | Generate the autocompletion script for the specified shell | +| help | Help about any command | +| version | Get latest build tag | + +Use `tfrobot [command] --help` for more information about a command. + +## Subcommands + +You can use subcommands to deploy, cancel and load workloads on the TFGrid. + +- **deploy:** used to mass deploy groups of vms with specific configurations + ```bash + tfrobot deploy -c path/to/your/config.yaml + ``` +- **cancel:** used to cancel all vms deployed using specific configurations + ```bash + tfrobot cancel -c path/to/your/config.yaml + ``` +- **load:** used to load all vms deployed using specific configurations + ```bash + tfrobot load -c path/to/your/config.yaml + ``` + +## Flags + +You can use different flags to configure your deployment. + +| Flag | Usage | +| :---: | :---: | +| -c | used to specify path to configuration file | +| -o | used to specify path to output file to store the output info in | +| -d | allow debug logs to appear in the output logs | +| -h | help | + +> **Note:** Use each flag only once. If a flag is repeated, only its last value is used. \ No newline at end of file diff --git a/collections/documentation/developers/tfrobot/tfrobot_config.md b/collections/documentation/developers/tfrobot/tfrobot_config.md new file mode 100644 index 0000000..55c2850 --- /dev/null +++ b/collections/documentation/developers/tfrobot/tfrobot_config.md @@ -0,0 +1,131 @@ +

Configuration File

+ +

Table of Contents

+ +- [Introduction](#introduction) +- [Examples](#examples) + - [YAML Example](#yaml-example) + - [JSON Example](#json-example) +- [Create a Configuration File](#create-a-configuration-file) + +*** + +## Introduction + +To use TFROBOT, you need to create a YAML or JSON configuration file that contains the mass deployment information: the node groups, the number of VMs to deploy, the compute, storage and network resources needed, as well as the user's credentials and settings, such as the SSH public key, the network (main, test, dev, qa) and the TFChain mnemonics. + +## Examples + +The following configuration file example selects a group of 3 nodes, each with 2 vcores, 16GB of RAM, 100GB of SSD, 50GB of HDD and an IPv4 address available, and deploys 5 VMs on that group. The same deployment is shown with a YAML file and with a JSON file. Parsing is based on the file extension: TFROBOT uses the JSON format if the file has a JSON extension and the YAML format otherwise. + +You can use this example for guidance. Make sure to replace the placeholders and adapt the groups to your actual project details. At a minimum, `ssh_key1` should be replaced by the user SSH public key and `example-mnemonic` should be replaced by the user mnemonics. + +Note that if no public IP (IPv4 or IPv6) is set to true, a Yggdrasil IP address will automatically be assigned to the VM, as at least one IP must be set to allow an SSH connection to the VM. 
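The extension-based parsing rule described above can be illustrated with a minimal Python sketch. This is not TFROBOT's actual code: the function name `config_format` is hypothetical, and comparing the extension case-insensitively is an assumption.

```python
from pathlib import Path


def config_format(path):
    """Pick the parser from the file extension: JSON for .json, YAML for anything else."""
    # Assumption: the extension is compared case-insensitively.
    return "json" if Path(path).suffix.lower() == ".json" else "yaml"


print(config_format("config.json"))
print(config_format("config.yaml"))
print(config_format("config"))
```

Under this rule, a file without any extension is treated as YAML, so naming the file `config.json` is what selects the JSON parser.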
+ +### YAML Example + +``` +node_groups: + - name: group_a + nodes_count: 3 + free_cpu: 2 + free_mru: 16 + free_ssd: 100 + free_hdd: 50 + dedicated: false + public_ip4: true + public_ip6: false + certified: false + region: europe +vms: + - name: examplevm123 + vms_count: 5 + node_group: group_a + cpu: 1 + mem: 0.25 + public_ip4: true + public_ip6: false + ssd: + - size: 15 + mount_point: /mnt/ssd + flist: https://hub.grid.tf/tf-official-apps/base:latest.flist + entry_point: /sbin/zinit init + root_size: 0 + ssh_key: example1 + env_vars: + user: user1 + pwd: 1234 +ssh_keys: + example1: ssh_key1 +mnemonic: example-mnemonic +network: dev +max_retries: 5 +``` + +### JSON Example + +``` +{ + "node_groups": [ + { + "name": "group_a", + "nodes_count": 3, + "free_cpu": 2, + "free_mru": 16, + "free_ssd": 100, + "free_hdd": 50, + "dedicated": false, + "public_ip4": true, + "public_ip6": false, + "certified": false, + "region": "europe" + } + ], + "vms": [ + { + "name": "examplevm123", + "vms_count": 5, + "node_group": "group_a", + "cpu": 1, + "mem": 0.25, + "public_ip4": true, + "public_ip6": false, + "ssd": [ + { + "size": 15, + "mount_point": "/mnt/ssd" + } + ], + "flist": "https://hub.grid.tf/tf-official-apps/base:latest.flist", + "entry_point": "/sbin/zinit init", + "root_size": 0, + "ssh_key": "example1", + "env_vars": { + "user": "user1", + "pwd": "1234" + } + } + ], + "ssh_keys": { + "example1": "ssh_key1" + }, + "mnemonic": "example-mnemonic", + "network": "dev", + "max_retries": 5 +} +``` + +## Create a Configuration File + +You can start with the example above and adjust it to your specific deployment needs. + +- Create a directory + ``` + mkdir tfrobot_deployments && cd $_ + ``` +- Create a configuration file and adapt the example provided above + ``` + nano config.yaml + ``` + +Once you've set your configuration file, all that's left is to deploy on the TFGrid. Read the next section for more information on how to deploy with TFROBOT. 
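Before deploying, it can help to sanity-check a configuration file. The following Python sketch is only an illustration under stated assumptions: `check_config` is a hypothetical helper, not part of TFROBOT, and it covers just a few of the rules listed in the Supported Configurations section (valid network, unique node group names, VMs referencing a defined node group and SSH key, minimum disk size of 15 GB).

```python
import json

# Networks listed as supported in the documentation.
VALID_NETWORKS = {"main", "test", "qa", "dev"}


def check_config(cfg):
    """Return a list of problems found in a TFROBOT-style config dict (hypothetical helper)."""
    problems = []

    if cfg.get("network") not in VALID_NETWORKS:
        problems.append(f"network must be one of {sorted(VALID_NETWORKS)}")

    group_names = [g.get("name") for g in cfg.get("node_groups", [])]
    if len(group_names) != len(set(group_names)):
        problems.append("node group names must be unique")

    ssh_keys = cfg.get("ssh_keys", {})
    for vm in cfg.get("vms", []):
        name = vm.get("name")
        if vm.get("node_group") not in group_names:
            problems.append(f"vm {name!r}: node_group is not defined in node_groups")
        if vm.get("ssh_key") not in ssh_keys:
            problems.append(f"vm {name!r}: ssh_key is not defined in ssh_keys")
        for disk in vm.get("ssd", []):
            if disk.get("size", 0) < 15:
                problems.append(f"vm {name!r}: disk size must be at least 15 GB")

    return problems


# A trimmed-down version of the JSON example, used only to exercise the checks.
example = json.loads("""
{
  "node_groups": [{"name": "group_a", "nodes_count": 3}],
  "vms": [{
    "name": "examplevm123",
    "node_group": "group_a",
    "ssh_key": "example1",
    "ssd": [{"size": 15, "mount_point": "/mnt/ssd"}]
  }],
  "ssh_keys": {"example1": "ssh_key1"},
  "network": "dev"
}
""")

print(check_config(example))  # prints []
```

Returning a list of messages rather than raising on the first error lets you see every problem in one pass. TFROBOT performs its own validation when it runs, so a check like this is a convenience, not a replacement.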
\ No newline at end of file diff --git a/collections/documentation/developers/tfrobot/tfrobot_configurations.md b/collections/documentation/developers/tfrobot/tfrobot_configurations.md new file mode 100644 index 0000000..7ceb867 --- /dev/null +++ b/collections/documentation/developers/tfrobot/tfrobot_configurations.md @@ -0,0 +1,68 @@ +

Supported Configurations

+ +

Table of Contents

+ +- [Introduction](#introduction) +- [Config File](#config-file) +- [Node Group](#node-group) +- [Vms Groups](#vms-groups) +- [Disk](#disk) + +*** + +## Introduction + +When deploying with TFROBOT, you can set different configurations allowing for personalized deployments. + +## Config File + +| Field | Description| Supported Values| +| :---: | :---: | :---: | +| [node_groups](#node-group) | description of all resources needed for each node_group | list of structs of type node_group | +| [vms](#vms-groups) | description of resources needed for deploying groups of vms that belong to a node_group | list of structs of type vms | +| ssh_keys | map of ssh keys with key=name and value=the actual ssh key | map of string to string | +| mnemonic | mnemonic of the user | should be valid mnemonic | +| network | one of the ThreeFold Grid networks | main, test, qa, dev | +| max_retries | number of retries for failed node groups | positive integer | + +## Node Group + +| Field | Description| Supported Values| +| :---: | :---: | :---: | +| name | name of node_group | node group name should be unique | +| nodes_count | number of nodes in node group| nonzero positive integer | +| free_cpu | number of cpu of node | nonzero positive integer max = 32 | +| free_mru | free memory in the node in GB | min = 0.25, max = 256 | +| free_ssd | free ssd storage in the node in GB | positive integer value | +| free_hdd | free hdd storage in the node in GB | positive integer value | +| dedicated | are nodes dedicated | `true` or `false` | +| public_ip4 | should the nodes have free ip v4 | `true` or `false` | +| public_ip6 | should the nodes have free ip v6 | `true` or `false` | +| certified | should the nodes be certified (if false, the nodes could be certified or DIY) | `true` or `false` | +| region | name of the continent the nodes are located in | africa, americas, antarctic, antarctic ocean, asia, europe, oceania, polar | + +## Vms Groups + +| Field | Description| Supported
Values| +| :---: | :---: | :---: | +| name | name of vm group | string value with no special characters | +| vms_count | number of vms in vm group| nonzero positive integer | +| node_group | name of node_group the vm belongs to | should be defined in node_groups | +| cpu | number of cpu for vm | nonzero positive integer max = 32 | +| mem | free memory in the vm in GB | min = 0.25, max 256 | +| planetary | should the vm have yggdrasil ip | `true` or `false` | +| public_ip4 | should the vm have free ip v4 | `true` or `false` | +| public_ip6 | should the vm have free ip v6 | `true` or `false` | +| flist | should be a link to valid flist | valid flist url with `.flist` or `.fl` extension | +| entry_point | entry point of the flist | path to the entry point in the flist | +| ssh_key | key of ssh key defined in the ssh_keys map | should be valid ssh_key defined in the ssh_keys map | +| env_vars | map of env vars | map of type string to string | +| ssd | list of disks | should be of type disk| +| root_size | root size in GB | 0 for default root size, max 10TB | + +## Disk + +| Field | Description| Supported Values| +| :---: | :---: | :---: | +| size | disk size in GB| positive integer min = 15 | +| mount_point | disk mount point | path to mountpoint | diff --git a/collections/documentation/developers/tfrobot/tfrobot_deploy.md b/collections/documentation/developers/tfrobot/tfrobot_deploy.md new file mode 100644 index 0000000..7e16d12 --- /dev/null +++ b/collections/documentation/developers/tfrobot/tfrobot_deploy.md @@ -0,0 +1,59 @@ + + +

Deployment

+ +

Table of Contents

+ +- [Introduction](#introduction) +- [Prerequisites](#prerequisites) +- [Deploy Workloads](#deploy-workloads) +- [Delete Workloads](#delete-workloads) +- [Logs](#logs) +- [Using TFCMD with TFROBOT](#using-tfcmd-with-tfrobot) + - [Get Contracts](#get-contracts) + +*** + +## Introduction + +We present how to deploy workloads on the ThreeFold Grid using TFROBOT. + +## Prerequisites + +To deploy workloads on the TFGrid with TFROBOT, you first need to [install TFROBOT](./tfrobot_installation.md) on your machine and create a [configuration file](./tfrobot_config.md). + +## Deploy Workloads + +Once you've installed TFROBOT and created a configuration file, you can deploy on the TFGrid with the following command. Make sure to indicate the path to your configuration file. + +```bash +tfrobot deploy -c ./config.yaml +``` + +## Delete Workloads + +To cancel the contracts of your deployment, use the following command. Make sure to indicate the path to your configuration file. + +```bash +tfrobot cancel -c ./config.yaml +``` + +## Logs + +To ensure a complete log history, append `2>&1 | tee path/to/log/file` to the command being executed. + +```bash +tfrobot deploy -c ./config.yaml 2>&1 | tee path/to/log/file +``` + +## Using TFCMD with TFROBOT + +### Get Contracts + +The TFCMD tool works well with TFROBOT, as it can be used to query the TFGrid. For example, you can see the contracts created by TFROBOT by running the following TFCMD command, provided that you use the same mnemonics and the same network: + +```bash +tfcmd get contracts +``` + +For more information on TFCMD, [read the documentation](../tfcmd/tfcmd.md). \ No newline at end of file diff --git a/collections/documentation/developers/tfrobot/tfrobot_installation.md b/collections/documentation/developers/tfrobot/tfrobot_installation.md new file mode 100644 index 0000000..deec2b8 --- /dev/null +++ b/collections/documentation/developers/tfrobot/tfrobot_installation.md @@ -0,0 +1,36 @@ +

Installation

+ +

Table of Contents

+ +- [Introduction](#introduction) +- [Installation](#installation) + +*** + +## Introduction + +This section covers the basics on how to install TFROBOT (`tfrobot`). + +TFROBOT is available as binaries. Make sure to download the latest release and to stay up to date with new releases. + +## Installation + +To install TFROBOT, simply download and extract the TFROBOT binaries to your path. + +- Create a new directory for `tfgrid-sdk-go` + ``` + mkdir tfgrid-sdk-go + cd tfgrid-sdk-go + ``` +- Download latest release from [releases](https://github.com/threefoldtech/tfgrid-sdk-go/releases) + - ``` + wget https://github.com/threefoldtech/tfgrid-sdk-go/releases/download/v0.14.4/tfgrid-sdk-go_Linux_x86_64.tar.gz + ``` +- Extract the binaries + - ``` + tar -xvf tfgrid-sdk-go_Linux_x86_64.tar.gz + ``` +- Move `tfrobot` to any `$PATH` directory: + ```bash + mv tfrobot /usr/local/bin + ``` \ No newline at end of file
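After moving the binary, you may want to confirm that `tfrobot` is actually reachable from your `$PATH`. The sketch below is one way to check this with Python's standard library. The helper name `find_binary` is hypothetical and the script is not part of the TFROBOT tooling.

```python
import shutil


def find_binary(name):
    """Return the full path of an executable found on PATH, or None if it is missing."""
    return shutil.which(name)


path = find_binary("tfrobot")
print(path if path else "tfrobot not found on PATH")
```

If the script reports that the binary is missing, check that the directory you moved `tfrobot` to is listed in `$PATH` and that the file is executable.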