stronger health check for all ZOS Nodes #7

Open
opened 2024-03-05 04:21:28 +00:00 by despiegk · 8 comments
Owner

A separate component (https://github.com/threefoldtech/guardians_healthchecker), to be run by the guardians, that executes health checks and benchmarks on a VM alongside the ZOS checks on the node.

Spawner

Another component that launches (deploys) these health-check VMs, either on predefined farms or on randomly chosen farms. Within a selected farm, a VM must be deployed on every node.
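The selection rule above could be sketched as follows. This is only an illustration: `plan_deployments`, the `farms` mapping (farm ID to node IDs), and the `random_count` parameter are hypothetical names, and the actual deployment call is left out entirely.

```python
import random
from typing import Dict, List, Optional, Sequence, Tuple


def plan_deployments(
    farms: Dict[int, List[int]],
    predefined: Optional[Sequence[int]] = None,
    random_count: int = 1,
    rng: Optional[random.Random] = None,
) -> List[Tuple[int, int]]:
    """Return (farm_id, node_id) pairs that should each get a health-check VM.

    `farms` is a hypothetical mapping of farm ID to the node IDs in that farm.
    If `predefined` is given, exactly those farms are used; otherwise
    `random_count` farms are picked at random.
    """
    rng = rng or random.Random()
    if predefined is not None:
        chosen = [farm_id for farm_id in predefined if farm_id in farms]
    else:
        count = min(random_count, len(farms))
        chosen = rng.sample(sorted(farms), count)
    # Every node of a chosen farm is covered, as the description requires.
    return [(farm_id, node_id) for farm_id in chosen for node_id in farms[farm_id]]
```

For example, `plan_deployments({1: [10, 11], 2: [20]}, predefined=[1])` yields one entry per node of farm 1.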

Healthchecker VM

Should execute benchmark tests for:
1- CPU
2- Disk
3- Network

And expose the results on an endpoint, or otherwise notify some other component.
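A minimal sketch of what such a VM could run, using only the Python standard library. The benchmark methods (SHA-256 hashing, a sequential write with fsync, TCP connect latency), the JSON field names, and port 8080 are all assumptions chosen for illustration, not the benchmarks used by the actual healthchecker.

```python
import hashlib
import json
import os
import socket
import tempfile
import time
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Optional, Tuple


def cpu_benchmark(rounds: int = 50) -> float:
    """Seconds taken to SHA-256 hash a 1 MiB buffer `rounds` times."""
    data = b"\0" * (1 << 20)
    start = time.perf_counter()
    for _ in range(rounds):
        hashlib.sha256(data).digest()
    return time.perf_counter() - start


def disk_benchmark(size_mib: int = 16) -> float:
    """Sequential write throughput in MiB/s, including the final fsync."""
    chunk = os.urandom(1 << 20)
    with tempfile.NamedTemporaryFile() as f:
        start = time.perf_counter()
        for _ in range(size_mib):
            f.write(chunk)
        f.flush()
        os.fsync(f.fileno())
        elapsed = time.perf_counter() - start
    return size_mib / max(elapsed, 1e-9)


def network_benchmark(target: Tuple[str, int], timeout: float = 3.0) -> Optional[float]:
    """TCP connect latency to `target` in milliseconds, or None if unreachable."""
    start = time.perf_counter()
    try:
        with socket.create_connection(target, timeout=timeout):
            return (time.perf_counter() - start) * 1000
    except OSError:
        return None


def build_report(network_target: Optional[Tuple[str, int]] = None) -> dict:
    """Run the benchmarks and return the results as a JSON-serialisable dict."""
    report = {
        "timestamp": int(time.time()),
        "cpu_seconds": cpu_benchmark(),
        "disk_write_mib_s": disk_benchmark(),
    }
    if network_target is not None:
        report["network_connect_ms"] = network_benchmark(network_target)
    return report


class ReportHandler(BaseHTTPRequestHandler):
    """Serves a fresh report on every GET request."""

    def do_GET(self) -> None:
        body = json.dumps(build_report()).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 8080) -> None:
    """Expose the report endpoint; blocks until interrupted."""
    HTTPServer(("", port), ReportHandler).serve_forever()
```

Calling `serve()` inside the VM would make the results available to any component that polls the endpoint with a plain GET.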

Aggregator/Collector

Should collect test results from the deployed VMs, either by pulling (polling) or via webhook.
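The pulling variant could look roughly like the sketch below. The `endpoints` mapping (node ID to report URL) and the function names are hypothetical; the fetch function is injectable so the loop can be exercised without a network.

```python
import json
import urllib.request
from typing import Callable, Dict, Optional


def fetch_json(url: str, timeout: float = 5.0) -> dict:
    """GET `url` and decode the response body as JSON."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def collect(
    endpoints: Dict[int, str],
    fetch: Callable[[str], dict] = fetch_json,
) -> Dict[int, Optional[dict]]:
    """Poll every VM endpoint once; unreachable or invalid ones map to None."""
    results: Dict[int, Optional[dict]] = {}
    for node_id, url in endpoints.items():
        try:
            results[node_id] = fetch(url)
        except (OSError, ValueError):
            # URLError and timeouts are OSError; bad JSON is ValueError.
            results[node_id] = None
    return results
```

Recording `None` for a failed poll keeps an unreachable VM visible in the results instead of silently dropping it.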

Syncing the results

The information gathered by each guardian should be propagated to the other guardians, possibly using etcd or some other Raft-based component. It is not clear that introducing a Tendermint cluster would be useful, given that these results are collected every 3 minutes.
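To put the 3-minute interval in perspective, the write rate the sync layer would have to absorb is simply the number of reporting nodes divided by the interval. The node count below is a made-up figure for illustration only.

```python
def writes_per_second(node_count: int, interval_seconds: int = 180) -> float:
    """Average result writes per second if every node reports once per interval."""
    return node_count / interval_seconds


# Hypothetical example: 1800 nodes reporting every 3 minutes.
print(writes_per_second(1800))  # 10.0
```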

despiegk added this to the (deleted) project 2024-03-05 04:21:28 +00:00
despiegk added the Story label 2024-03-05 04:21:33 +00:00
Member

Agreed not to do any POST requests; see https://github.com/threefoldtech/tf_operations/issues/2353
Also, the nodes will expose their last deployment timestamp.

Member
https://github.com/threefoldtech/tfgrid-sdk-go/issues/891 is waiting for https://github.com/threefoldtech/zos/pull/2247 to be merged.
Member

30th of April

Author
Owner

Kristof needs to get more info; then we can create better specs.

Member

Until things are cleared up, we are still proceeding with the obvious ones:

https://github.com/threefoldtech/home/issues/1535

despiegk modified the project from (deleted) to tfgrid_3_14 2024-05-22 07:56:25 +00:00
despiegk modified the project from tfgrid_3_14 to tfgrid_3_17 2024-06-02 05:37:24 +00:00
Author
Owner

It is OK to move this to 3.15. We do need better specs, and the story needs an update.

ashraf was assigned by thabeta 2024-06-02 06:59:15 +00:00
despiegk changed title from health check to stronger health check for all ZOS Nodes 2024-07-03 12:55:51 +00:00
Member

Suggested specs (simple and pragmatic): https://github.com/threefoldtech/home/issues/1535

thabeta added the due date 2024-08-15 2024-08-01 11:36:46 +00:00
thabeta self-assigned this 2024-08-01 12:54:45 +00:00
Member
FYI: everything in the suggested specs is implemented: https://github.com/threefoldtech/tf-images/pull/272 and https://github.com/threefoldtech/guardians_healthchecker/tree/development/spawner. Let me know if that's sufficient, @despiegk.
Reference: tfgrid/circle_engineering#7