v4 infra: zos, registrar, hub, bootstrap #158

Open
opened 2025-01-02 14:52:33 +00:00 by thabeta · 14 comments
Owner

Description

Set up zos4 (dev mode) in an isolated environment (box) with access restricted to our team, to be able to ship to farmers.

Requirements

  • Isolated Environment: The box should be empty and isolated from other systems.
  • Restricted Access: Only our team should have access to the box.
  • No external users should be able to deploy
  • Duration: This restriction should remain in place for 6 months.

Deliverables

  • [x] Bootstrap (building images) #160
  • [x] registrar to track these nodes #161
  • [ ] adjust the statistics websites to reflect them? Or should we have a new version of these?
  • [x] track their uptime for rewarding (handled in the registrar)

UPDATES

The registrar needed more work to track uptime for the rewards and to authenticate using httpsig (both are done).

A dev instance of the registrar is deployed: https://github.com/threefoldtech/tf_operations/issues/3083
However, the full flow isn't tested yet.

The upgrade path wasn't thought through properly, so we are going to monitor the flist symlink to decide whether the system should upgrade.
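A minimal sketch of the symlink check described above, assuming the "latest" flist is exposed as a symlink whose target names the current release. A local temporary directory stands in for the hub here, and the flist names, `symlink_target`, and `should_upgrade` are hypothetical, not taken from zos4 or the hub.

```python
import os
import tempfile


def symlink_target(link_path: str) -> str:
    """Return the name the symlink points at, without resolving the file itself."""
    return os.path.basename(os.readlink(link_path))


def should_upgrade(link_path: str, running_flist: str) -> bool:
    """Upgrade only when the symlink points at a different flist than the running one."""
    return symlink_target(link_path) != running_flist


# Demo: a throwaway directory stands in for the hub (assumption, POSIX symlinks).
with tempfile.TemporaryDirectory() as hub:
    link = os.path.join(hub, "zos-latest.flist")  # hypothetical link name

    # The link points at the flist the node is already running: no upgrade.
    os.symlink("zos-v4.0.1.flist", link)
    unchanged = should_upgrade(link, "zos-v4.0.1.flist")

    # The link is moved to a newer flist: the node should upgrade.
    os.remove(link)
    os.symlink("zos-v4.0.2.flist", link)
    changed = should_upgrade(link, "zos-v4.0.1.flist")
```

With this approach every node upgrades as soon as the link moves, so the decision of when to upgrade sits entirely with whoever updates the symlink.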

Still in progress:

  • Still working on the changes for the zos nodes with the registrar
  • Bootstrap and the hub don't have the changes needed yet. Maxime is working on that.

thabeta added the Story label 2025-01-02 14:53:02 +00:00
thabeta added this to the tfgrid_3_16 project 2025-01-02 14:53:07 +00:00
thabeta self-assigned this 2025-01-02 14:53:11 +00:00
Author
Owner

Should this be in the 3.16 project, given that it's completely isolated?

thabeta added the due date 2025-01-15 2025-01-02 14:54:34 +00:00
thabeta modified the project from tfgrid_3_16 to tfgrid_4_0 2025-01-09 13:11:02 +00:00
Author
Owner

https://github.com/threefoldtech/zosbase is a base repo for all zos-related packages (refactoring zos3 and zos3light to use it will be done later).
The repo for zos4 is https://github.com/threefoldtech/zos4

Today and tomorrow the team will be working on replacing the tfchain-related code with the registrar.

Author
Owner

The registrar needed more work to track uptime for the rewards and to authenticate using httpsig (both are done).

A dev instance of the registrar is deployed: https://github.com/threefoldtech/tf_operations/issues/3083
However, the full flow isn't tested yet.

The upgrade path wasn't thought through properly, so we are going to monitor the flist symlink to decide whether the system should upgrade.

Still in progress:

  • Still working on the changes for the zos nodes with the registrar
  • Bootstrap and the hub don't have the changes needed yet. Maxime is working on that.
thabeta changed title from Zos4 devmode box to v4 infra: zos, registrar, hub, bootstrap 2025-01-27 12:48:31 +00:00
Author
Owner

Upgrading by following the symlinks gives up the controlled upgrades that we have right now. The current suggestion is to move the version into the registrar.
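To illustrate the suggestion, here is a rough sketch of a node deciding whether to upgrade from a version record held by the registrar rather than from a symlink. The record shape (`VersionRecord`, `safe_to_upgrade`) and the function `next_version` are hypothetical, chosen only to show how a registrar-side record keeps the rollout controlled; the actual registrar API may look different.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class VersionRecord:
    """Hypothetical record a registrar could publish for zos upgrades."""
    version: str           # version the operators want nodes to run
    safe_to_upgrade: bool  # operators can pause the rollout by clearing this


def next_version(running: str, record: VersionRecord) -> Optional[str]:
    """Return the version to upgrade to, or None when the node should stay put."""
    if not record.safe_to_upgrade:
        return None  # rollout paused: keep the running version
    if record.version == running:
        return None  # already on the requested version
    return record.version


# Rollout enabled and a newer version published: upgrade.
upgrade_to = next_version("v4.0.1", VersionRecord("v4.0.2", True))
# Rollout paused by the operators: stay on the running version.
paused = next_version("v4.0.1", VersionRecord("v4.0.2", False))
```

Compared with following a symlink, the node only moves when the record says so, which is what preserves the controlled upgrades.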

Author
Owner

Still making changes within zos. It's tightly coupled with tfchain, but we are progressing.

Author
Owner

Node booted with localsetup (image attached).

Registered on the registrar. What is pending is fixing the flow for updating the node, and updating the code of the upgrader module to handle the self-update of zos4.

Aside from that, this has all of the current feature set.

Author
Owner

Full flow using bootstrap, hub, zos4 and the registrar on devnet

image

Author
Owner

Right now working on testing the upgrade procedure; everything seems to be in place.

One more note: some of the flists used in the zos image come from the v3 hub, e.g. traefik. Not a deal breaker for now.

Author
Owner

upgrade seems to work fine and the nodes are catching the updates

  • updating the hub worker for symlinking to latest (no issue)

now checking if we can boot a VM in that node (beyond the issue scope)

Author
Owner

Also, how are people going to create farms (to use as a kernel param while booting)? A small website or a desktop app? Or from the phone, with a new phone app?

Author
Owner

Didn't get an answer from kds yet on how to create the farms.

Also, today ops tried to boot a new node on qa, and it's not working. Maxime and Ashraf are needed to debug it (image attached).

thabeta modified the due date from 2025-01-15 to 2025-01-21 2025-02-13 14:40:07 +00:00
thabeta modified the due date from 2025-01-21 to 2025-02-18 2025-02-13 14:40:28 +00:00
Owner

OK good to know. So I'll tell vendors they can't test yet. Please lmk when it's possible for them to test.

Author
Owner

Well, the upgrade worked fine (image attached).

There are small things that still need to happen https://github.com/threefoldtech/tf_operations/issues/3138#issuecomment-2662660245 but Maxime's garage caught fire.

Author
Owner

Booted a couple of nodes on dev and qa, and booting more on test and main.

Remaining: tooling to show the number of nodes and farms, and to create accounts and farms.

Will provide the docs for using some CLI tools for it.

Due Date
2025-02-18

Dependencies

No dependencies set.

Reference: tfgrid/circle_engineering#158