# QSFS getting started on Ubuntu

## Get components

The following steps can be followed to set up a qsfs instance on a fresh
Ubuntu instance.

- Install the fuse kernel module (`apt-get update && apt-get install fuse3`)
- Install the individual components by downloading the latest release from the
  respective release pages:
  - 0-db-fs: https://github.com/threefoldtech/0-db-fs/releases
  - 0-db: https://github.com/threefoldtech/0-db; if multiple binaries
    are available in the assets, choose the one ending in `static`
  - 0-stor: https://github.com/threefoldtech/0-stor_v2/releases; if
    multiple binaries are available in the assets, choose the one
    ending in `musl`
- Make sure all binaries are executable (`chmod +x $binary`); a sketch of
  these steps is shown below.
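
The following is only a sketch of the download steps: the release tags and asset file names change between versions, so the URLs below are placeholders that should be replaced with the latest assets from the release pages above. The target paths match the ones used in the rest of this guide.

```bash
# Sketch only: <tag> and <asset> are placeholders, not real file names.
ZDBFS_ASSET_URL="https://github.com/threefoldtech/0-db-fs/releases/download/<tag>/<asset>"
ZDB_ASSET_URL="https://github.com/threefoldtech/0-db/releases/download/<tag>/<asset ending in static>"
ZSTOR_ASSET_URL="https://github.com/threefoldtech/0-stor_v2/releases/download/<tag>/<asset ending in musl>"

wget -O /tmp/0-db-fs "$ZDBFS_ASSET_URL"
wget -O /tmp/0-db    "$ZDB_ASSET_URL"
wget -O /tmp/zstor   "$ZSTOR_ASSET_URL"

# The rest of this guide assumes the binaries live at these paths and are executable.
chmod +x /tmp/0-db-fs /tmp/0-db /tmp/zstor
```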
## Setup and run 0-stor

There are instructions below for a local 0-stor configuration. You can also deploy an eVDC and use the [provided 0-stor configuration](evdc_storage) for a simple cloud hosted solution.

We will run 7 0-db instances as backends for 0-stor: 4 are used for the
metadata, and 3 are used for the actual data. The metadata always consists
of 4 nodes, while the number of data backends can be increased. You can choose to either
run 7 separate 0-db processes, or a single process with 7 namespaces.
For the purpose of this setup, we will start 7 separate processes, as
follows:

> This assumes you have moved the downloaded 0-db binary to `/tmp/0-db`

```bash
/tmp/0-db --background --mode user --port 9990 --data /tmp/zdb-meta/zdb0/data --index /tmp/zdb-meta/zdb0/index
/tmp/0-db --background --mode user --port 9991 --data /tmp/zdb-meta/zdb1/data --index /tmp/zdb-meta/zdb1/index
/tmp/0-db --background --mode user --port 9992 --data /tmp/zdb-meta/zdb2/data --index /tmp/zdb-meta/zdb2/index
/tmp/0-db --background --mode user --port 9993 --data /tmp/zdb-meta/zdb3/data --index /tmp/zdb-meta/zdb3/index

/tmp/0-db --background --mode seq --port 9980 --data /tmp/zdb-data/zdb0/data --index /tmp/zdb-data/zdb0/index
/tmp/0-db --background --mode seq --port 9981 --data /tmp/zdb-data/zdb1/data --index /tmp/zdb-data/zdb1/index
/tmp/0-db --background --mode seq --port 9982 --data /tmp/zdb-data/zdb2/data --index /tmp/zdb-data/zdb2/index
```
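If you want to confirm that all seven instances came up, you can connect to them with any Redis client, since 0-db speaks (a subset of) the Redis protocol. The check below is an optional sketch and assumes `redis-cli` (from the `redis-tools` package) is installed.

```bash
# Optional sanity check: every 0-db instance should answer PING with PONG.
for port in 9990 9991 9992 9993 9980 9981 9982; do
    redis-cli -p "$port" ping
done
```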
Now that the data storage is running, we can create the config file for
0-stor. The (minimal) config for this example setup will look as follows:

```toml
minimal_shards = 2
expected_shards = 3
redundant_groups = 0
redundant_nodes = 0
socket = "/tmp/zstor.sock"
prometheus_port = 9100
zdb_data_dir_path = "/tmp/zdbfs/data/zdbfs-data"
max_zdb_data_dir_size = 25600

[encryption]
algorithm = "AES"
key = "000001200000000001000300000004000a000f00b00000000000000000000000"

[compression]
algorithm = "snappy"

[meta]
type = "zdb"

[meta.config]
prefix = "someprefix"

[meta.config.encryption]
algorithm = "AES"
key = "0101010101010101010101010101010101010101010101010101010101010101"

[[meta.config.backends]]
address = "[::1]:9990"

[[meta.config.backends]]
address = "[::1]:9991"

[[meta.config.backends]]
address = "[::1]:9992"

[[meta.config.backends]]
address = "[::1]:9993"

[[groups]]
[[groups.backends]]
address = "[::1]:9980"

[[groups.backends]]
address = "[::1]:9981"

[[groups.backends]]
address = "[::1]:9982"
```
> A full explanation of all options can be found in the 0-stor readme:
> https://github.com/threefoldtech/0-stor_v2/#config-file-explanation

This guide assumes the config file is saved as `/tmp/zstor_config.toml`.

Now `zstor` can be started. Assuming the downloaded binary was saved as
`/tmp/zstor`:

`/tmp/zstor -c /tmp/zstor_config.toml monitor`. If you don't want the
process to block your terminal, you can start it in the background:
`nohup /tmp/zstor -c /tmp/zstor_config.toml monitor &`.
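
With the monitor running, you can optionally check that zstor can reach all configured backends. The `test` subcommand is the same check the hook script in the next section relies on; this snippet is just a suggested sanity check, assuming a zero exit code means all backends are reachable.

```bash
# Optional sanity check (assumes the monitor from the previous step is running).
if /tmp/zstor -c /tmp/zstor_config.toml test; then
    echo "zstor can reach its metadata and data backends"
else
    echo "zstor test failed; check the 0-db instances and the config file" >&2
fi
```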
## Setup and run 0-db

First we will get the hook script. The hook script can be found in the
[quantum_storage repo on github](https://github.com/threefoldtech/quantum-storage).
A slightly modified version is found here:

```bash
#!/usr/bin/env bash
set -ex

action="$1"
instance="$2"
zstorconf="/tmp/zstor_config.toml"
zstorbin="/tmp/zstor"

# report whether zstor and its backends are reachable
if [ "$action" == "ready" ]; then
    ${zstorbin} -c ${zstorconf} test
    exit $?
fi

if [ "$action" == "jump-index" ]; then
    namespace=$(basename $(dirname $3))
    if [ "${namespace}" == "zdbfs-temp" ]; then
        # skipping temporary namespace
        exit 0
    fi

    tmpdir=$(mktemp -p /tmp -d zdb.hook.XXXXXXXX.tmp)
    dirbase=$(dirname $3)

    # upload dirty index files (listed in $5) from the directory of $3
    for dirty in $5; do
        file=$(printf "i%d" $dirty)
        cp ${dirbase}/${file} ${tmpdir}/
    done

    ${zstorbin} -c ${zstorconf} store -s -d -f ${tmpdir} -k ${dirbase} &

    exit 0
fi

if [ "$action" == "jump-data" ]; then
    namespace=$(basename $(dirname $3))
    if [ "${namespace}" == "zdbfs-temp" ]; then
        # skipping temporary namespace
        exit 0
    fi

    # backup data file
    ${zstorbin} -c ${zstorconf} store -s --file "$3"

    exit 0
fi

if [ "$action" == "missing-data" ]; then
    # restore missing data file
    ${zstorbin} -c ${zstorconf} retrieve --file "$3"
    exit $?
fi

# unknown action
exit 1
```

> This guide assumes the file is saved as `/tmp/zdbfs/zdb-hook.sh`. Make sure the
> file is executable, i.e. `chmod +x /tmp/zdbfs/zdb-hook.sh`.
The local 0-db which is used by 0-db-fs can be started as follows:

```bash
/tmp/0-db \
    --index /tmp/zdbfs/index \
    --data /tmp/zdbfs/data \
    --datasize 67108864 \
    --mode seq \
    --hook /tmp/zdbfs/zdb-hook.sh \
    --background
```
## Setup and run 0-db-fs

Finally, we will start 0-db-fs. This guide opts to mount the fuse
filesystem in `/mnt`. Again, assuming the 0-db-fs binary was saved as
`/tmp/0-db-fs`:

```bash
/tmp/0-db-fs /mnt -o autons -o background
```
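To confirm the mount is live, you can write and read back a small test file. This smoke test is only a suggestion and not part of the original setup.

```bash
# Quick smoke test of the mounted filesystem (optional).
mount | grep /mnt            # should show a fuse/zdbfs mount on /mnt
echo "hello qsfs" > /mnt/hello.txt
cat /mnt/hello.txt           # should print: hello qsfs
```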
You should now have the qsfs filesystem mounted at `/mnt`. As you write
data, it is saved in the local 0-db, and its data containers are
periodically encoded and uploaded to the backend data storage 0-db's.
The data files in the local 0-db will never occupy more than 25GiB of
space (as configured in the 0-stor config file). If a data container is
removed due to space constraints, and data inside of it needs to be
accessed by the filesystem (e.g. a file is being read), then the data
container is recovered from the backend storage 0-db's by 0-stor, and
0-db can subsequently serve this data to 0-db-fs.

### 0-db-fs limitations

Any workload should be supported on this filesystem, with some exceptions:

- Opening a file in 'always append mode' will not have the expected behavior
- The fuse layer does not support O_TMPFILE, which is a feature required by
  overlayfs, so overlayfs (used by Docker, for example) will not work on top of qsfs
## Docker setup

It is possible to run zstor in a Docker container. First, create a data directory
on your host. Then, save the config file in the data directory as `zstor.toml`. Ensure
the storage 0-db's are running as described above. Then, run the Docker container
as follows:

```bash
docker run -ti --privileged --rm --network host --name fstest -v /path/to/data:/data -v /mnt:/mnt:shared azmy/qsfs
```

The filesystem is now available in `/mnt`.
## Autorepair

Autorepair automatically repairs objects stored in the backend when one or more shards
are no longer reachable. It does this by periodically checking that all the backends
are still reachable. If it detects that one or more of the backends used by an encoded
object are not reachable, the healthy shards are downloaded, the object is restored
and encoded again (possibly with a new config, if it has since changed), and uploaded
again.

Autorepair does not validate the integrity of individual shards. This is protected
against by having multiple spare (redundant) shards for an object. Corrupt shards
are detected when the object is rebuilt, and removed before attempting the rebuild.
Autorepair also does not repair the metadata of objects.
## Monitoring, alerting and statistics

0-stor collects metrics about the system. It can be configured with a 0-db-fs mountpoint,
which will trigger 0-stor to collect 0-db-fs statistics, in addition to the 0-db statistics
which are always collected. If the `prometheus_port` config option is set, 0-stor
will serve metrics on this port for scraping by Prometheus. You can then set up
graphs and alerts in Grafana. Some examples include: disk space used vs available
per 0-db backend, total entries in 0-db backends, which backends are tracked, and so on.
When 0-db-fs monitoring is enabled, statistics are also exported about the filesystem
itself, such as read/write speeds, syscalls, and internal metrics.

For a full overview of all available stats, you can point a Prometheus scraper at
a running instance and use PromQL to explore everything that is available.
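
As a quick example, with the `prometheus_port = 9100` setting from the config above, you can inspect the raw metrics with curl before wiring up Prometheus and Grafana. This assumes the metrics are exposed on the usual `/metrics` path; the exact metric names may differ between zstor versions.

```bash
# List the metric names currently exposed by the running zstor instance.
curl -s http://localhost:9100/metrics | grep -v '^#' | cut -d'{' -f1 | sort -u | head
```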
## Data safety

As explained in the autorepair section, data is periodically checked and rebuilt if
0-db backends become unreachable. This ensures that data, once stored, remains available,
as long as the metadata is still present. When needed, the system can be expanded with more
0-db backends, and the encoding config can be changed if needed (e.g. to change encryption keys).
## Performance

Qsfs is not a high speed filesystem, nor is it a distributed filesystem. It is intended to
be used for archive purposes. For this reason, the qsfs stack focuses on data safety first.
Where needed, reliability is chosen over availability (i.e. we won't write data if we can't
guarantee that all the conditions of the required storage profile are met).

With that being said, there are currently 2 limiting factors in the setup:

- the speed of the disk on which the local 0-db is running
- the network
The first is the speed of the disk backing the local 0-db. This imposes a hard limit on
the throughput of the filesystem. Performance testing has shown that write speeds
on the filesystem reach roughly one third of the raw write performance of the
disk, and roughly half of its raw read performance. Note that in the case of _very_
fast disks (mostly NVMe SSD's), an old CPU with a low clock speed might become the
bottleneck, though in practice this is rarely a problem.

The network is more of a soft cap. All 0-db data files will be encoded and distributed
over the network. This means that the upload speed of the node needs to be able to
handle this data throughput. In the case of random data (which is not compressible),
the required upload speed is the write speed of the 0-db-fs, increased by the
overhead generated by the storage policy. There is no feedback to 0-db-fs if the upload
of data is lagging behind. This means that in cases where a sustained high speed write
load is applied, the local 0-db might eventually grow bigger than the configured size limit
until the upload manages to catch up. If this happens for prolonged periods of time, it
is technically possible to run out of space on the disk. For this reason, you should
always have some extra space available on the disk to account for temporary cache
excess.
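
As a rough illustration of that overhead, assuming the encoding stores roughly `expected_shards / minimal_shards` times the original data (as with the example config above), the required sustained upload capacity can be estimated as below; the 100 MB/s write speed is a made-up number.

```bash
# Back-of-the-envelope estimate of the upload bandwidth needed to keep up
# with a sustained write load (sketch; numbers are assumptions).
minimal_shards=2      # from the example zstor config
expected_shards=3     # from the example zstor config
write_speed=100       # sustained 0-db-fs write speed in MB/s (assumption)

awk -v w="$write_speed" -v e="$expected_shards" -v m="$minimal_shards" \
    'BEGIN { printf "required upload capacity: ~%.0f MB/s\n", w * e / m }'
# -> required upload capacity: ~150 MB/s
```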
When encoded data needs to be recovered from the backend nodes (i.e. it is no longer in the local cache),
the read speed will be limited by the connection speed of the slowest backend, as all
required shards are fetched before the data is rebuilt. This means that recovery of historical
data will generally be a slow process. Since we primarily focus on archive storage,
we do not consider this a priority.