# Quantum Safe Storage Algorithm
The Quantum Safe Storage Algorithm is the heart of the storage engine. The storage engine takes the original data objects and creates data part descriptions, which it stores over many virtual storage devices (ZDBs).
Data is stored over multiple ZDBs in such a way that it can never be lost.

Unique features:
- data is always appended and can never be lost
- even a quantum computer cannot decrypt the data
- data is spread over multiple sites; sites can be lost and the data will still be available
- protects against datarot.
## Why
Today we produce more data than ever before. We cannot continue to make full copies of data to make sure it is stored reliably. This simply does not scale. We need to move from securing the whole dataset to securing all the objects that make up a dataset.
ThreeFold is using space technology to store data fragments over multiple devices (physical storage devices in TFNodes). The solution does not distribute and store parts of an object (file, photo, movie...) but stores descriptions of those parts. This can be visualized by thinking of the descriptions as equations.
## How is it done today
![alt text](storage_today.png)
In most distributed systems used on the Internet or in blockchain land today, the data gets replicated (sometimes after sharding, which means it is split up based on the content of the file and spread out over the world).
This leads to a lot of overhead and minimal control over where the data is.
In well-optimized systems the overhead will be 400%, but in some it can be orders of magnitude higher to reach a reasonable redundancy level.
## The Quantum Safe Storage System Works Differently
![alt text](qsss_overview.png)
ThreeFold has developed a new storage algorithm which is more efficient, ultra reliable and gives you full control over where your data is stored.
ThreeFold's approach is different. Let's try to visualize it by means of a simple analogy with equations.
Let a, b, c, d... be the parts of the original object. You could create endless unique equations using these parts. A simple example: let's assume the original object has 3 parts with the following values:
```
a=1
b=2
c=3
```
(For reference: a part of a real-world object is not a simple number like `1` but a unique digital number describing the part, such as its binary code `110101011101011101010111101110111100001010101111011.....`.)
With these numbers we could create an endless number of equations:
```
1: a+b+c=6
2: c-b-a=0
3: b-c+a=0
4: 2b+a-c=2
5: 5c-b-a=12
......
```
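
The way such equations are produced can be sketched in a few lines of Python. This is only an illustration of the analogy: the helper names `rhs_of` and `make_equation` and the coefficient range are hypothetical, and a real system would choose its coefficients so that any sufficient subset of equations is solvable rather than drawing them at random.

```python
import random

# Original fragment values from the example above.
parts = {"a": 1, "b": 2, "c": 3}

def rhs_of(coeffs, parts):
    """Evaluate the left-hand side of an equation for the given part values."""
    return sum(coeffs[name] * parts[name] for name in parts)

def make_equation(parts, rng):
    """Pick random integer coefficients (hypothetical range -5..5) and compute the result."""
    coeffs = {name: rng.randint(-5, 5) for name in parts}
    return coeffs, rhs_of(coeffs, parts)

rng = random.Random(42)  # fixed seed so repeated runs give the same equations
equations = [make_equation(parts, rng) for _ in range(5)]
for coeffs, rhs in equations:
    print(coeffs, "=", rhs)
```

Each printed line is one equation: a set of coefficients for a, b and c together with the value they add up to. Only the coefficients and the result need to be stored, not the parts themselves.
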
Mathematically we only need 3 equations to describe the content (= value) of the fragments, but creating more adds reliability. Now store those equations in a distributed way (one equation per physical storage device) and forget the original object. We no longer have access to the values of a, b and c; we just remember the locations of all the equations created from the original data fragments.
Mathematically we need three equations (any 3 of the total, as long as they are independent of each other; equations 2 and 3 above, for instance, say the same thing) to recover the original values of a, b and c. So we send a request to retrieve 3 of the many equations, and the first 3 to arrive are good enough to recalculate the original values. Three randomly retrieved equations are:
```
5c-b-a=12
b-c+a=0
2b+a-c=2
```
And this is a mathematical system we could solve:
- First: `b-c+a=0 -> b=c-a`
- Second: `2b+a-c=2 -> c=2b+a-2 -> c=2(c-a)+a-2 -> c=2c-2a+a-2 -> c=a+2`
- Third: `5c-b-a=12 -> 5(a+2)-(c-a)-a=12 -> 5a+10-(a+2)+a-a=12 -> 5a-a-2=2 -> 4a=4 -> a=1`
Now that we know `a=1` we can solve the rest: `c=a+2=3` and `b=c-a=2`. From 3 random equations we have regenerated the original fragments and can now recreate the original object.
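
The same substitution can be done mechanically. The sketch below solves the three retrieved equations with Gauss-Jordan elimination on exact fractions; the function name `solve` and the row layout (coefficients of a, b, c followed by the result) are choices made for this illustration, not part of the storage system.

```python
from fractions import Fraction

def solve(rows):
    """Solve a square linear system given as rows [coef..., result]."""
    n = len(rows)
    m = [[Fraction(x) for x in row] for row in rows]
    for col in range(n):
        # Find a row with a non-zero entry in this column and move it up.
        pivot = next(r for r in range(col, n) if m[r][col] != 0)
        m[col], m[pivot] = m[pivot], m[col]
        # Scale the pivot row so that the pivot becomes 1.
        p = m[col][col]
        m[col] = [x / p for x in m[col]]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col and m[r][col] != 0:
                f = m[r][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [row[-1] for row in m]

# Columns are the coefficients of a, b, c, then the result.
system = [
    [-1, -1, 5, 12],  # 5c - b - a = 12
    [1, 1, -1, 0],    # b - c + a = 0
    [1, 2, -1, 2],    # 2b + a - c = 2
]
a, b, c = solve(system)
print(a, b, c)  # prints: 1 2 3
```

If the chosen equations are not independent, no pivot can be found and `next` raises `StopIteration`; a fuller implementation would then fetch another equation.
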
The redundancy and reliability in such a system come from creating more equations than needed and storing them. As shown, a sufficient set of these equations, retrieved in any order, can recreate the original fragments, so redundancy comes at a much lower overhead.
In our system we don't do this with 3 parts but with thousands.
### Example of 16/4
![](img/quantumsafe_storage_algo.jpg)
Each object is fragmented into 16 parts. So we have 16 original fragments, for which we need 16 equations to describe them mathematically. Now let's make 20 equations and store them dispersed over 20 devices. To recreate the original object we only need 16 equations: the first 16 that we find and collect allow us to recover the fragments and, in the end, the original object. We could lose any 4 of those 20 equations.
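
A minimal sketch of the 16/4 idea follows. It is not ThreeFold's actual encoding: as an assumption for illustration, it builds each equation from powers of a distinct number (a Vandermonde-style construction) and does all arithmetic modulo a small prime `P`, which guarantees that any 16 of the 20 equations can be solved. The names `encode` and `decode` are hypothetical.

```python
import random

P = 257          # small prime; all arithmetic is modulo P (assumption of this sketch)
K, N = 16, 20    # 16 fragments, 20 stored equations

def encode(fragments):
    """Equation number x uses the coefficients x^0, x^1, ..., x^(K-1)."""
    out = []
    for x in range(1, N + 1):
        rhs = sum(f * pow(x, j, P) for j, f in enumerate(fragments)) % P
        out.append((x, rhs))
    return out

def decode(equations):
    """Recover the K fragments from K equations by Gauss-Jordan elimination mod P."""
    rows = [[pow(x, j, P) for j in range(K)] + [rhs] for x, rhs in equations[:K]]
    for col in range(K):
        pivot = next(r for r in range(col, K) if rows[r][col] != 0)
        rows[col], rows[pivot] = rows[pivot], rows[col]
        inv = pow(rows[col][col], -1, P)  # modular inverse (Python 3.8+)
        rows[col] = [v * inv % P for v in rows[col]]
        for r in range(K):
            if r != col and rows[r][col]:
                f = rows[r][col]
                rows[r] = [(v - f * w) % P for v, w in zip(rows[r], rows[col])]
    return [row[-1] for row in rows]

rng = random.Random(1)
fragments = [rng.randrange(256) for _ in range(K)]  # 16 byte-sized fragments
stored = encode(fragments)            # one equation per storage device
survivors = rng.sample(stored, K)     # 4 of the 20 equations are lost
assert decode(survivors) == fragments
```

Whichever 4 equations go missing, the remaining 16 still determine the fragments, because equations built from distinct numbers in this way are always independent.
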
The likelihood of losing 4 independent, dispersed storage devices at the same time is very low. Since all stored equations are continuously monitored, additional equations can be created immediately when one of them goes missing, which gives automatic regeneration of lost data and a self-repairing storage system.
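
How low that likelihood is can be estimated with a binomial calculation. The sketch below assumes that devices fail independently and uses a purely hypothetical failure probability of 1% per device within one repair window; data only becomes unrecoverable when more than 4 of the 20 devices fail before repair.

```python
from math import comb

def loss_probability(n, tolerated, p):
    """Probability that more than `tolerated` of `n` devices fail,
    each independently with probability p."""
    return sum(
        comb(n, i) * p**i * (1 - p) ** (n - i)
        for i in range(tolerated + 1, n + 1)
    )

# 20 devices, up to 4 losses tolerated, hypothetical 1% failure chance each.
print(loss_probability(20, 4, 0.01))
```

With these assumed numbers the result is on the order of one in a million, and automatic repair shrinks the window in which such failures have to coincide.
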
> The overhead in this example is 4 extra equations on top of the 16 that are needed: a mere **20%** of everything stored (25% on top of the original data), instead of **400%**.
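
The arithmetic behind that figure can be written down directly. The two helper functions below are hypothetical names for this sketch; they express the overhead of a policy with `k` needed equations and `m` spare ones in the two ways it is commonly quoted.

```python
def overhead(k, m):
    """Extra storage relative to the original data (k needed, m spare)."""
    return m / k

def stored_share(k, m):
    """Share of everything stored that is redundancy."""
    return m / (k + m)

print(f"{overhead(16, 4):.0%}")      # prints: 25%
print(f"{stored_share(16, 4):.0%}")  # prints: 20%
```
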
## Can be used for content delivery
This system can be used as a backend for content delivery networks.
For example, the content distribution policy could be a 10/50 distribution, which means the content of a movie would be distributed over 60 locations, of which we can lose 50 at the same time.
If someone now wants to download the data, the 10 locations that answer fastest will provide enough of the data parts to allow the data to be rebuilt.
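
The selection of the fastest locations can be sketched as follows. The site names and response times are made up for illustration, and a real client would issue the requests concurrently and simply stop after the first 10 answers instead of knowing the latencies in advance.

```python
import heapq
import random

def fastest(latencies, k):
    """Return the ids of the k locations with the lowest response time."""
    ranked = heapq.nsmallest(k, ((t, loc) for loc, t in latencies.items()))
    return [loc for _, loc in ranked]

rng = random.Random(7)
# Hypothetical response times in milliseconds for 60 locations.
latencies = {f"site-{i}": rng.uniform(10, 300) for i in range(60)}
chosen = fastest(latencies, 10)
print(chosen)
```
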
The overhead here is higher than in the previous example, but still an order of magnitude lower than in other CDN systems.
## The Quantum Safe Storage System is capable of avoiding datarot
Datarot is the gradual degradation of stored data over time, for example on a hard disk, until the data becomes unreadable.
The storage system provided by ThreeFold intercepts this silent data corruption, so that it cannot pass by unnoticed.
> see also https://en.wikipedia.org/wiki/Data_degradation
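
One common way to notice silent corruption is to keep a checksum next to each stored piece and compare it on every read. The text above does not say which mechanism the storage system uses, so the sketch below is only an assumption for illustration, with hypothetical helper names and SHA-256 as the checksum.

```python
import hashlib

def fingerprint(data: bytes) -> str:
    """Checksum recorded when the data is written."""
    return hashlib.sha256(data).hexdigest()

def is_intact(data: bytes, expected: str) -> bool:
    """True if the data still matches the checksum recorded at write time."""
    return fingerprint(data) == expected

stored = bytearray(b"equation 5: 5c-b-a=12")
expected = fingerprint(bytes(stored))

stored[3] ^= 0x01  # simulate datarot: a single bit flips on disk
print(is_intact(bytes(stored), expected))  # prints: False
```

Once a damaged piece has been detected it can be treated like a lost one and regenerated from the remaining equations.
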