stronger health check for all ZOS Nodes #7

opened 2024-03-05 04:21:28 +00:00 by despiegk · 8 comments

A separate component, to be run by the guardians, that executes health checks and benchmarks on a VM, along with ZOS checks on the node.


Spawner

Another component that launches (deploys) these health-check VMs on predefined farms, or even on randomly chosen farms (this needs to happen on all nodes in the farm).

Healthchecker VM

Should execute benchmark tests for
1- CPU
2- Disk
3- Network

It should expose the results on an endpoint, or provide a way to notify some other component.
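As a rough illustration of the healthchecker described above, here is a minimal sketch that runs one CPU, one disk and one network measurement and serves the result as JSON. Everything named here is an assumption made for the example and not part of the spec: the `/results` path, the metric names, the choice of SHA-256 hashing and sequential writes as benchmarks, and the use of TCP connect latency as the network measurement.

```python
import hashlib
import json
import os
import socket
import tempfile
import time
from http.server import BaseHTTPRequestHandler, HTTPServer


def cpu_benchmark(rounds: int = 20000) -> float:
    """Return SHA-256 hashes per second over a 1 KiB payload (illustrative metric)."""
    payload = b"x" * 1024
    start = time.perf_counter()
    for _ in range(rounds):
        hashlib.sha256(payload).digest()
    return rounds / (time.perf_counter() - start)


def disk_benchmark(size_mb: int = 8) -> float:
    """Return sequential write throughput in MB/s, including the final fsync."""
    chunk = os.urandom(1024 * 1024)
    with tempfile.NamedTemporaryFile() as f:
        start = time.perf_counter()
        for _ in range(size_mb):
            f.write(chunk)
        f.flush()
        os.fsync(f.fileno())
        elapsed = time.perf_counter() - start
    return size_mb / elapsed


def network_benchmark(host: str, port: int, timeout: float = 2.0):
    """Return TCP connect latency in milliseconds, or None if the target is unreachable."""
    start = time.perf_counter()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            pass
    except OSError:
        return None
    return (time.perf_counter() - start) * 1000


def build_report(net_target=None) -> dict:
    """Run all benchmarks; net_target is an optional (host, port) tuple."""
    return {
        "timestamp": time.time(),
        "cpu_hashes_per_sec": cpu_benchmark(),
        "disk_write_mb_per_sec": disk_benchmark(),
        "network_connect_ms": network_benchmark(*net_target) if net_target else None,
    }


class ResultsHandler(BaseHTTPRequestHandler):
    """Serves a fresh report on GET /results (hypothetical path)."""

    def do_GET(self):
        if self.path != "/results":
            self.send_error(404)
            return
        body = json.dumps(build_report()).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def make_server(port: int = 8080) -> HTTPServer:
    """Build (but do not start) the HTTP server exposing the results."""
    return HTTPServer(("0.0.0.0", port), ResultsHandler)
```

Calling `make_server(8080).serve_forever()` inside the VM would make the report available to whatever component pulls it. A real healthchecker would likely use dedicated benchmark tools and cache results instead of re-running them on every request.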


Aggregator/Collector

Should be able to collect test results from the deployed VMs, either by pulling them or via a webhook.
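The pulling variant of the collector could look roughly like the sketch below. It assumes each VM serves a JSON report over HTTP; the URL layout, the node-ID keys and the function names are hypothetical. The `fetch` argument is injectable so the collection loop can be exercised without real VMs.

```python
import json
import urllib.request
from typing import Callable, Dict, Optional


def http_fetch(url: str, timeout: float = 5.0) -> dict:
    """Fetch and decode one JSON report from a VM endpoint."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode())


def collect(
    endpoints: Dict[int, str],
    fetch: Callable[[str], dict] = http_fetch,
) -> Dict[int, Optional[dict]]:
    """Pull a report from every endpoint; a node that fails maps to None."""
    results: Dict[int, Optional[dict]] = {}
    for node_id, url in endpoints.items():
        try:
            results[node_id] = fetch(url)
        except (OSError, ValueError):
            # OSError covers connection errors and timeouts,
            # ValueError covers malformed JSON.
            results[node_id] = None
    return results
```

Recording a failed pull as `None` rather than dropping the node keeps "no answer" visible as a health signal in its own right.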

Syncing the results

The information from each guardian should be propagated to the other guardians. This could be done using etcd or some other Raft-based component; it is not clear that introducing a Tendermint cluster would be useful, given that these results are collected every 3 minutes.
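Whichever store ends up carrying the data, the guardians need a rule for reconciling two views of the same node. The sketch below shows one possible rule, newest report wins per node, purely as an illustration. It is not etcd or Raft, and the `timestamp` field and node-ID keys are assumptions made for the example.

```python
from typing import Dict

# A report is a flat dict of metrics that must include a "timestamp" key.
Report = Dict[str, float]


def merge_reports(
    local: Dict[int, Report],
    remote: Dict[int, Report],
) -> Dict[int, Report]:
    """Merge two guardians' views, keeping the newest report for each node."""
    merged = dict(local)
    for node_id, report in remote.items():
        current = merged.get(node_id)
        if current is None or report["timestamp"] > current["timestamp"]:
            merged[node_id] = report
    return merged
```

With a 3-minute collection interval, a per-node "latest report" key in a replicated key-value store would be overwritten at most once per interval per guardian, which is the kind of low write rate the note above has in mind.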

despiegk added this to the (deleted) project 2024-03-05 04:21:28 +00:00
despiegk added the label 2024-03-05 04:21:33 +00:00

Agreed not to do any POST requests.
Also, the nodes will expose a lastdeployment timestamp.

Owner waiting for this to be merged

30th of April

kristof needs to get more info, and then we can create better specs

until things are cleared up we are still proceeding with the obvious ones

despiegk modified the project from (deleted) to tfgrid_3_14 2024-05-22 07:56:25 +00:00
despiegk modified the project from tfgrid_3_14 to tfgrid_3_17 2024-06-02 05:37:24 +00:00

It is OK to move to 3.15; we do need better specs, and the story needs an update.

ashraf was assigned by thabeta 2024-06-02 06:59:15 +00:00
despiegk changed title from health check to stronger health check for all ZOS Nodes 2024-07-03 12:55:51 +00:00

suggested specs (simple and pragmatic)

thabeta added the due date 2024-08-15 2024-08-01 11:36:46 +00:00
thabeta self-assigned this 2024-08-01 12:54:45 +00:00
FYI: everything in the suggested specs is implemented; let me know if that's sufficient @despiegk
Reference: tfgrid/circle_product_management#7