updated smaller collections for manual
collections/developers/internals/zos/performance/cpubench.md
<h1> CPUBenchmark </h1>

<h2>Table of Contents</h2>

- [Overview](#overview)
- [Configuration](#configuration)
- [Details](#details)
- [Result Sample](#result-sample)
- [Result Explanation](#result-explanation)

***

## Overview

The `CPUBenchmark` task measures the performance of the CPU. It uses the [cpu-benchmark-simple](https://github.com/threefoldtech/cpu-benchmark-simple) tool and includes a ZOS stub to gather the number of workloads running on the node.
## Configuration

- Name: `cpu-benchmark`
- Schedule: 4 times a day
- Jitter: 0

## Details

- The benchmark runs a `CRC64` computation task, measures the time spent on the computation, and reports it in `seconds`.
- The computation is performed in both single-threaded and multi-threaded scenarios.
- Lower time means better performance: for a single-threaded benchmark, a lower execution time indicates better performance.
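The timing approach described above can be sketched in Go. This is an illustration only, not the actual `cpu-benchmark-simple` implementation; the buffer size and iteration count are arbitrary choices here.

```go
package main

import (
	"fmt"
	"hash/crc64"
	"time"
)

// singleThreadScore runs repeated CRC64 computations over a fixed buffer
// and returns the elapsed wall-clock time in seconds (lower is better).
func singleThreadScore(iterations int) float64 {
	table := crc64.MakeTable(crc64.ISO)
	buf := make([]byte, 1<<20) // 1 MiB of zeroed data

	start := time.Now()
	var sum uint64
	for i := 0; i < iterations; i++ {
		sum = crc64.Update(sum, table, buf)
	}
	_ = sum // keep the result alive so the loop isn't optimized away
	return time.Since(start).Seconds()
}

func main() {
	fmt.Printf("Single thread score: %.3f\n", singleThreadScore(100))
}
```

The multi-threaded variant would run the same loop on one goroutine per CPU core and report the combined time.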
## Result Sample

```json
{
  "description": "Measures the performance of the node CPU by reporting the time spent of computing a task in seconds.",
  "name": "cpu-benchmark",
  "result": {
    "multi": 1.105,
    "single": 1.135,
    "threads": 1,
    "workloads": 0
  },
  "timestamp": 1700504403
}
```
## Result Explanation

The best way to know what is a good or bad value is to test and compare different hardware. Here are some examples:

**1x Intel(R) Xeon(R) W-2145 CPU @ 3.70GHz** (Q3'2017)

```
Single thread score: 0.777
Multi threads score: 13.345 [16 threads]
```

**1x Intel(R) Pentium(R) CPU G4400 @ 3.30GHz** (Q3'2015)

```
Single thread score: 1.028
Multi threads score: 2.089 [2 threads]
```

**1x Intel(R) Core(TM) i5-3570 CPU @ 3.40GHz** (Q2'2012)

```
Single thread score: 2.943
Multi threads score: 12.956 [4 threads]
```

**2x Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz** (Q1'2012)

```
Single thread score: 1.298
Multi threads score: 44.090 [32 threads]
```

**2x Intel(R) Xeon(R) CPU L5640 @ 2.27GHz** (Q1'2010)

```
Single thread score: 2.504
Multi threads score: 72.452 [24 threads]
```

As you can see, the more recent the CPU, the faster it is; but for the same launch period, Xeon processors score far better than regular desktop CPUs. You have to take into account both the number of threads and the time per thread.
collections/developers/internals/zos/performance/healthcheck.md

<h1> Health Check </h1>

<h2>Table of Contents</h2>

- [Overview](#overview)
- [Configuration](#configuration)
- [Details](#details)
- [Result Sample](#result-sample)

***
## Overview

The health check task runs a set of checks over ZOS components to determine whether the node is in a usable state, and sets flags for the Power Daemon to stop uptime reports if the node is unusable.

## Configuration

- Name: `healthcheck`
- Schedule: Every 20 minutes

## Details

- Checks whether the node cache disk is usable by trying to write some data to it. If the write fails, the Readonly flag is set.
## Result Sample

```json
{
  "description": "health check task runs multiple checks to ensure the node is in a usable state and set flags for the power daemon to stop reporting uptime if it is not usable",
  "name": "healthcheck",
  "result": {
    "cache": [
      "failed to write to cache: open /var/cache/healthcheck: operation not permitted"
    ]
  },
  "timestamp": 1701599580
}
```
collections/developers/internals/zos/performance/iperf.md
<h1> IPerf </h1>

<h2>Table of Contents</h2>

- [Overview](#overview)
- [Configuration](#configuration)
- [Details](#details)
- [Result Sample](#result-sample)

***

## Overview

The `iperf` package facilitates network performance testing using the `iperf3` tool, with both UDP and TCP over IPv4 and IPv6.

## Configuration

- Name: `iperf`
- Schedule: 4 times a day
- Jitter: 20 min
## Details

- The package uses the `iperf` binary to examine network performance under different conditions.
- It fetches the public configuration of randomly selected public nodes on the chain, plus all public nodes from the free farm. These nodes serve as the targets for the iperf tests.
- For each node, the test runs 4 times: over both UDP and TCP, using both of the node's IPs (v4 and v6).
- The result is a slice of reports for all public nodes (4 per node), each of which includes:

```
UploadSpeed: Upload speed (in bits per second).
DownloadSpeed: Download speed (in bits per second).
NodeID: ID of the node where the test was conducted.
NodeIpv4: IPv4 address of the node.
TestType: Type of the test (TCP or UDP).
Error: Any error encountered during the test.
CpuReport: CPU utilization report (in percentage).
```
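The four runs per node can be sketched as follows. The `node` type and `testMatrix` helper are hypothetical; only the combination logic (TCP/UDP over IPv4/IPv6) comes from the description above.

```go
package main

import "fmt"

// node is a minimal stand-in for a public node's network configuration.
type node struct {
	ID   uint32
	IPv4 string
	IPv6 string
}

// testMatrix returns the four (protocol, target IP) combinations that
// are run against each public node.
func testMatrix(n node) [][2]string {
	return [][2]string{
		{"tcp", n.IPv4},
		{"tcp", n.IPv6},
		{"udp", n.IPv4},
		{"udp", n.IPv6},
	}
}

func main() {
	n := node{ID: 124, IPv4: "88.99.30.200", IPv6: "2a01:4f8:10a:710::2"}
	for _, tc := range testMatrix(n) {
		fmt.Printf("node %d: %s test against %s\n", n.ID, tc[0], tc[1])
	}
}
```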
## Result Sample

```json
{
  "description": "Test public nodes network performance with both UDP and TCP over IPv4 and IPv6",
  "name": "iperf",
  "result": [
    {
      "cpu_report": {
        "host_system": 2.4433388913571044,
        "host_total": 3.542919199613454,
        "host_user": 1.0996094859359695,
        "remote_system": 0.24430594945859846,
        "remote_total": 0.3854457128784448,
        "remote_user": 0.14115962407747246
      },
      "download_speed": 1041274.4792242317,
      "error": "",
      "node_id": 124,
      "node_ip": "88.99.30.200",
      "test_type": "tcp",
      "upload_speed": 1048549.3668460822
    },
    {
      "cpu_report": {
        "host_system": 0,
        "host_total": 0,
        "host_user": 0,
        "remote_system": 0,
        "remote_total": 0,
        "remote_user": 0
      },
      "download_speed": 0,
      "error": "unable to connect to server - server may have stopped running or use a different port, firewall issue, etc.: Network unreachable",
      "node_id": 124,
      "node_ip": "2a01:4f8:10a:710::2",
      "test_type": "tcp",
      "upload_speed": 0
    }
  ],
  "timestamp": 1700507035
}
```
<h1> Performance Monitor Package </h1>

<h2>Table of Contents</h2>

- [Overview](#overview)
- [Flow](#flow)
- [Node Initialization Check](#node-initialization-check)
- [Scheduling](#scheduling)
- [RMB Commands](#rmb-commands)
- [Caching](#caching)
- [Registered Tests](#registered-tests)
- [Test Suite](#test-suite)

***

## Overview

The `perf` package is a performance monitor for `zos` nodes. It schedules tasks, caches their results, and allows retrieval of those results through `RMB` calls.
## Flow

1. The `perf` monitor is started by the `noded` service in zos.
2. Tasks are registered with a schedule in the new monitor.
3. A bus handler is opened to allow result retrieval.

## Node Initialization Check

To ensure that the node always has a test result available, a check is performed on node startup for all registered tasks: if a task has no stored result, it runs immediately without waiting for the next scheduled time.
## Scheduling

- Tasks are scheduled using a 6-field cron format. This format provides the flexibility to run tasks periodically or at specific times.

- Each task has a jitter, which is the maximum number of seconds the task may sleep before it runs; this prevents all tests from ending up running at exactly the same time. For example, if a task is scheduled to run at `06:00` and its jitter is `10`, it is expected to run anywhere between `06:00` and `06:10`.
## RMB Commands

- `zos.perf.get`:

  - Payload: a payload type that contains the name of the test

    ```go
    type Payload struct {
    	Name string
    }
    ```

    Possible values:

    - `"public-ip-validation"`
    - `"cpu-benchmark"`
    - `"iperf"`

  - Return: a single task result.

  - Possible Error: `ErrResultNotFound` if no result is stored for the given task.

- `zos.perf.get_all`:

  - Return: all stored results.

The RMB direct client can be used to call these commands. Check the [example](https://github.com/threefoldtech/tfgrid-sdk-go/blob/development/rmb-sdk-go/examples/rpc_client/main.go).
## Caching

Results are stored in a Redis server running on the node.

The key in Redis is the name of the task prefixed with the word `perf`.
The value is an instance of the `TaskResult` struct, which contains:

- The name of the task
- The timestamp when the task was run
- A brief description of what the task does
- The actual result returned by the task

Notes:

- Storing results by key ensures each new result overrides the old one, so there is always a single result for each task.
- Storing results prefixed with `perf` eases retrieving all the results stored by this module.
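A sketch of the caching scheme described above. The exact key separator and the `TaskResult` field names here are assumptions; only the "task name prefixed with `perf`" rule and the four listed fields come from the text.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// TaskResult mirrors the fields described above (names assumed).
type TaskResult struct {
	Name        string      `json:"name"`
	Timestamp   uint64      `json:"timestamp"`
	Description string      `json:"description"`
	Result      interface{} `json:"result"`
}

// cacheKey prefixes the task name with "perf" so all results stored by
// this module can be fetched with a single pattern match (e.g. SCAN perf.*).
func cacheKey(taskName string) string {
	return "perf." + taskName
}

func main() {
	res := TaskResult{
		Name:        "cpu-benchmark",
		Timestamp:   1700504403,
		Description: "Measures the performance of the node CPU",
		Result:      map[string]float64{"single": 1.135, "multi": 1.105},
	}
	b, _ := json.Marshal(res)
	// A SET on this key overwrites any previous result for the same task.
	fmt.Println(cacheKey(res.Name), "->", string(b))
}
```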
## Registered Tests

- [Public IP Validation](./publicips.md)
- [CPUBenchmark](./cpubench.md)
- [IPerf](./iperf.md)
- [Health Check](./healthcheck.md)

## Test Suite

Go to [this link](https://app.testlodge.com/a/26076/projects/40893/suites/234919) for a test suite covering the test cases for the performance testing.
collections/developers/internals/zos/performance/publicips.md

<h1> Public IPs Validation Task </h1>

<h2>Table of Contents</h2>

- [Introduction](#introduction)
- [Configuration](#configuration)
- [Task Details](#task-details)
- [Result](#result)
- [Result Sample](#result-sample)

***
## Introduction

The goal of this task is to make sure that the public IPs assigned to a farm are valid and can be assigned to deployments.

## Configuration

- Name: `public-ip-validation`
- Schedule: 4 times a day
- Jitter: 10 min
## Task Details

- The task depends on `Networkd` to ensure the proper test network setup is in place, and it fails if the setup is incorrect. The setup consists of a test namespace containing a MacVLAN interface; all steps are performed inside the test namespace.
- Decide whether this node or another node in the farm should run the task, based on the node ID. The node with the smallest ID whose power target is up runs the task; the others log why they should not run it and return with no errors. This ensures only one node runs the task, avoiding problems such as assigning the same IP twice.
- Get the public IPs set on the farm.
- Remove all IPs and routes added to the test MacVLAN, to ensure anything remaining from a previous run is removed.
- Skip IPs that are assigned to a contract.
- Set the MacVLAN link up.
- Iterate over all public IPs and add each one, with its provided gateway, to the MacVLAN.
- Validate each IP by querying an external source that returns the node's public IP.
- If the returned public IP matches the IP added to the link, the IP is valid; otherwise, it is invalid.
- Remove all IPs and routes between each IP check, to make them available for other deployments.
- After iterating over all public IPs, set the link down.
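The comparison step above can be sketched as follows. The `validateIP` helper is hypothetical; it only shows the "observed IP must match the assigned IP" check, with the external query left out.

```go
package main

import (
	"fmt"
	"net"
)

// validateIP reports whether the IP observed by an external service matches
// the public IP (given in CIDR form) that was assigned to the MacVLAN link.
func validateIP(assignedCIDR, observed string) (bool, error) {
	ip, _, err := net.ParseCIDR(assignedCIDR)
	if err != nil {
		return false, fmt.Errorf("public ip data is not valid: %w", err)
	}
	return ip.String() == observed, nil
}

func main() {
	ok, err := validateIP("185.206.122.29/24", "185.206.122.29")
	if err != nil {
		panic(err)
	}
	fmt.Println("valid:", ok) // valid: true
}
```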
## Result

The task returns a single map of string (IP) to `IPReport`. The report consists of the IP state (valid, invalid, or skipped) and the reason for that state.

### Result Sample

```json
{
  "description": "Runs on the least NodeID node in a farm to validate all its IPs.",
  "name": "public-ip-validation",
  "result": {
    "185.206.122.29/24": {
      "reason": "public ip or gateway data are not valid",
      "state": "invalid"
    }
  },
  "timestamp": 1700504421
}
```