Doing a garbage collection of the Sha256-based databases means we remove
all the records that have been deleted from our file.
We also sort the file to put all the jump-tables at the end, which gives
much better memory locality when looking up items in the DB (or when
concluding that they are absent).
The downsides are that this prune step takes time, writes dozens of MBs
and loses checkpoints. The latter means we can no longer roll back
to a safe position, simply because we flushed those records.
So we want to do this often enough to avoid fragmentation, but not so
often that it creates a greater risk to data consistency.
This algorithm checks the actual data and calculates the fragmentation of
the jump-tables to decide whether to start a GC.
When we do a GC, we process as many files as makes sense, so that we
can wait quite a long time before the next GC is needed.
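A minimal sketch of such a decision, assuming "fragmentation" means the fraction of jump-table bytes that have been superseded by newer jump-tables. The names `DbFileStats`, `fragmentation` and `planGc`, and both thresholds, are hypothetical and not taken from Flowee. One badly fragmented file triggers the GC, and every file above a lower threshold is then included so the work is batched.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical per-file statistics; the real UTXO code keeps its own bookkeeping.
struct DbFileStats {
    std::size_t jumpTableBytes = 0;      // total bytes occupied by jump-tables
    std::size_t staleJumpTableBytes = 0; // bytes in jump-tables superseded by newer ones
};

// Fraction of jump-table bytes that are stale (0.0 means perfectly packed).
double fragmentation(const DbFileStats &s)
{
    if (s.jumpTableBytes == 0)
        return 0.0;
    return static_cast<double>(s.staleJumpTableBytes) / s.jumpTableBytes;
}

// Returns the indices of the files to GC, or an empty list if no GC is needed.
// A GC only starts when at least one file crosses startThreshold; once started,
// every file above includeThreshold is added so the next GC is postponed longer.
std::vector<std::size_t> planGc(const std::vector<DbFileStats> &files,
                                double startThreshold = 0.5,
                                double includeThreshold = 0.2)
{
    std::vector<std::size_t> selected;
    bool trigger = false;
    for (const auto &f : files) {
        if (fragmentation(f) >= startThreshold) {
            trigger = true;
            break;
        }
    }
    if (!trigger)
        return selected;
    for (std::size_t i = 0; i < files.size(); ++i) {
        if (fragmentation(files[i]) >= includeThreshold)
            selected.push_back(i);
    }
    return selected;
}
```

The two-threshold design reflects the trade-off above: a single trigger avoids needless prunes, while the lower inclusion bar avoids coming back soon for a file that was almost as fragmented.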
Boost throws an exception when the resize fails, which would cause
a total shutdown of the client. So we make sure to catch it in the
scheduled task to avoid this problem.
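A sketch of the guard, assuming the scheduled work can be wrapped in a callable. `runScheduledTask` is a hypothetical name. It relies on the fact that most Boost exceptions derive from `std::exception`, and adds a catch-all as a fallback so nothing escapes into the scheduler thread.

```cpp
#include <exception>
#include <functional>
#include <iostream>
#include <stdexcept>

// Hypothetical wrapper around one scheduled job (for instance a file resize).
// Returns true on success; any exception is logged and swallowed so it cannot
// propagate out of the scheduler and terminate the process.
bool runScheduledTask(const std::function<void()> &task)
{
    try {
        task();
        return true;
    } catch (const std::exception &e) {
        std::cerr << "Scheduled task failed: " << e.what() << '\n';
    } catch (...) {
        std::cerr << "Scheduled task failed with an unknown exception\n";
    }
    return false;
}
```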
In some unit tests I noticed that we wrote a block that had just been loaded
from disk; this check avoids that overhead.
It is unclear how relevant this is for normal operations.
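One way such a check could look, as a sketch only: the store remembers the hash and position of the last block it loaded, and a write request for that same block returns the existing position instead of appending a copy. `BlockStoreSketch`, `BlockPos` and the string hash are hypothetical simplifications, not Flowee's block storage API.

```cpp
#include <cstddef>
#include <string>

// Hypothetical on-disk location of a block.
struct BlockPos {
    int file = -1;
    std::size_t offset = 0;
};

// Hypothetical block store that skips writing back the block it just loaded.
class BlockStoreSketch {
public:
    // Called after a block has been read from disk.
    void noteLoaded(const std::string &hash, BlockPos pos)
    {
        m_lastLoadedHash = hash;
        m_lastLoadedPos = pos;
    }

    // Appends the block unless it is the one we just loaded from disk.
    BlockPos write(const std::string &hash, std::size_t size)
    {
        if (!m_lastLoadedHash.empty() && hash == m_lastLoadedHash)
            return m_lastLoadedPos; // already on disk, skip the write
        BlockPos pos{0, m_nextOffset};
        m_nextOffset += size; // stands in for actually appending `size` bytes
        ++m_writeCount;
        return pos;
    }

    int writeCount() const { return m_writeCount; }

private:
    std::string m_lastLoadedHash;
    BlockPos m_lastLoadedPos;
    std::size_t m_nextOffset = 0;
    int m_writeCount = 0;
};
```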
Orphan recovery was recursive, which meant the length of a header chain
we could process across a gap was capped by the normal stack depth
available for recursion (approx 50k).
As headers are provided to us by external peers, this could be a
DoS vector.
This implementation avoids the problem by not being recursive.
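The usual way to remove such recursion is an explicit work list on the heap. The sketch below assumes orphans are indexed by the hash of the parent they wait for; `Header`, `OrphanMap` and `recoverOrphans` are hypothetical names, not the actual Flowee code.

```cpp
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical header: only its own hash and its parent's hash matter here.
struct Header {
    std::string hash;
    std::string prevHash;
};

// Orphan headers, keyed by the parent hash they are waiting for.
using OrphanMap = std::unordered_multimap<std::string, Header>;

// After `connectedHash` joined the chain, connect every orphan that descends
// from it. An explicit stack replaces recursion, so the chain length is bounded
// by heap memory instead of stack depth.
std::vector<std::string> recoverOrphans(const std::string &connectedHash, OrphanMap &orphans)
{
    std::vector<std::string> connected;
    std::vector<std::string> work{connectedHash};
    while (!work.empty()) {
        const std::string parent = work.back();
        work.pop_back();
        auto range = orphans.equal_range(parent);
        for (auto it = range.first; it != range.second; ++it) {
            connected.push_back(it->second.hash);
            work.push_back(it->second.hash); // its own children are handled later
        }
        orphans.erase(range.first, range.second);
    }
    return connected;
}
```

A chain of 100k orphans, well beyond the approx 50k recursion limit mentioned above, is processed in one loop without growing the call stack.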
When, on loading, the blockindex and the UTXO don't agree, try to find an older UTXO
state where they do agree.
The most typical issue is a block that is stored in the blocksdb but is not available in
the index due to corruption or similar.
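A sketch of the search, assuming each saved UTXO checkpoint records the hash of the block it corresponds to and that the block index can be queried for that hash. `UtxoCheckpoint`, `findAgreeingCheckpoint` and the callback are hypothetical. Checkpoints are walked from newest to oldest, so the most recent usable state wins.

```cpp
#include <functional>
#include <string>
#include <vector>

// Hypothetical record of a saved UTXO state.
struct UtxoCheckpoint {
    int height;
    std::string blockHash; // tip block this UTXO state corresponds to
};

// Returns the newest checkpoint whose tip block the index knows about, or
// nullptr if none agree. `newestFirst` must be ordered from newest to oldest.
const UtxoCheckpoint *findAgreeingCheckpoint(
        const std::vector<UtxoCheckpoint> &newestFirst,
        const std::function<bool(const std::string &)> &indexHasBlock)
{
    for (const auto &cp : newestFirst) {
        if (indexHasBlock(cp.blockHash))
            return &cp;
    }
    return nullptr;
}
```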
When reindexing we now always first try to open a file read-write, because
the algorithm that only opens the last file in a sequence as read-write failed
in situations where we don't yet know the number of files.
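The "try read-write first, fall back to read-only" pattern can be sketched with standard streams. `openBlockFile` and `OpenMode` are hypothetical names, and plain `std::fstream` stands in for whatever file abstraction the node actually uses.

```cpp
#include <fstream>
#include <string>

// Which mode we actually managed to obtain.
enum class OpenMode { Failed, ReadOnly, ReadWrite };

// Try read-write first, since during a reindex we can't tell which file is the
// last one; fall back to read-only if that is refused (e.g. no write permission).
OpenMode openBlockFile(const std::string &path, std::fstream &file)
{
    file.open(path, std::ios::in | std::ios::out | std::ios::binary);
    if (file.is_open())
        return OpenMode::ReadWrite;
    file.clear(); // reset failbit before retrying
    file.open(path, std::ios::in | std::ios::binary);
    if (file.is_open())
        return OpenMode::ReadOnly;
    return OpenMode::Failed;
}
```

Note that `in | out` never creates a file, so a missing file is reported as `Failed` rather than silently created.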
To allow the UTXO to actually use the power of checkpoints, we need to
make sure that the block-validation state is not stored separately from
it.
The goal is that when we have some corruption we can just go back to an
earlier state of the UTXO and re-validate the blocks to get to the
current tip. A commonly seen problem is that corruption instead
leaves the block-index (leveldb) in an incorrect state, so the replay
fails.
This change solves that by no longer reading the block-validation-state
and no longer writing it on a state change.
The UTXO keeps outdated records around in an append-only file, which
means we need to do a garbage collection regularly.
This new algorithm uses the cumulative number of changes since the last GC
(aka prune) as an indicator to plan a new one.
The effect should be much smaller files to keep in memory, and data and
jump-tables that are much more localized, which should result in higher
throughput.
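A minimal sketch of a cumulative-change trigger, assuming inserts and deletes are counted equally and that a fixed threshold is used. `PruneScheduler` and its threshold are hypothetical; the actual weighting and limit in the UTXO code may differ.

```cpp
#include <cstdint>

// Hypothetical tracker: counts changes since the last prune and reports when
// enough have accumulated to plan a new GC.
class PruneScheduler {
public:
    explicit PruneScheduler(std::uint64_t threshold) : m_threshold(threshold) {}

    // Record one batch of changes (e.g. from one block).
    // Returns true when a prune should be planned.
    bool recordChanges(std::uint64_t inserted, std::uint64_t deleted)
    {
        m_changesSinceGc += inserted + deleted;
        return m_changesSinceGc >= m_threshold;
    }

    // Call once the prune completed; starts counting from zero again.
    void pruneFinished() { m_changesSinceGc = 0; }

    std::uint64_t changesSinceGc() const { return m_changesSinceGc; }

private:
    std::uint64_t m_threshold;
    std::uint64_t m_changesSinceGc = 0;
};
```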
The builder now allows you to create a message with the
serviceId, the messageId and the requestId pre-set.
This benefits code that takes the output from the
builder and calls 'send' on it immediately, saving several
lines of code.
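To illustrate the idea only: a builder whose finishing call also takes the routing ids, so the result can go straight to `send`. `SketchBuilder` and `Message` are hypothetical stand-ins and do not reproduce Flowee's real MessageBuilder API; the default of -1 for an unset requestId is likewise an assumption.

```cpp
#include <string>

// Hypothetical message with routing ids and a body.
struct Message {
    int serviceId = -1;
    int messageId = -1;
    int requestId = -1; // -1 means "not set" in this sketch
    std::string body;
};

// Hypothetical builder: collects the body, then stamps the ids in one call.
class SketchBuilder {
public:
    void add(const std::string &text) { m_body += text; }

    // Returns a message with serviceId, messageId and (optionally) requestId
    // already set, ready to be passed to a send() call.
    Message message(int serviceId, int messageId, int requestId = -1) const
    {
        Message m;
        m.serviceId = serviceId;
        m.messageId = messageId;
        m.requestId = requestId;
        m.body = m_body;
        return m;
    }

private:
    std::string m_body;
};
```

Without the extra parameters, each caller would have to set the three ids on the returned message by hand before sending it.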
Ship unit-testing functionality in the release builds too;
external users may want to compile their apps in debug mode while
Flowee itself is a release build.
This improves the double-spend-proof orphans code.
Also add a DSProof log category and lots of log lines to make
reading the output of a debug build much more fun.
From now on, make sure that the minor version always has at least 2 digits,
which allows us to use string comparison of versions even if we have
more than 9 releases in a year.
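A sketch of why the zero padding matters, assuming a "year.minor" version string. `formatVersion` is a hypothetical helper. Without padding, "2020.9" sorts after "2020.10" because the character '9' is greater than '1'; with padding, "2020.09" sorts before "2020.10".

```cpp
#include <cstdio>
#include <string>

// Formats e.g. (2020, 3) as "2020.03" so that plain string comparison
// orders versions correctly for up to 99 releases per year.
std::string formatVersion(int major, int minor)
{
    char buf[32];
    std::snprintf(buf, sizeof(buf), "%d.%02d", major, minor);
    return buf;
}
```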
The FloweeJS component uses this class but requires the use
of the NodeJS 'main' thread, which needs safe access to the jobs
list at the same time as the Flowee workers.
Simple solution: add a mutex.
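A minimal sketch of a mutex-guarded jobs list, assuming jobs are plain callables. `JobQueueSketch`, `addJob` and `takeJob` are hypothetical names. Every access to the list, from the NodeJS thread or from a worker, holds the same lock.

```cpp
#include <deque>
#include <functional>
#include <mutex>
#include <utility>

// Hypothetical jobs list shared between the NodeJS main thread and the workers.
class JobQueueSketch {
public:
    // Safe to call from any thread.
    void addJob(std::function<void()> job)
    {
        std::lock_guard<std::mutex> lock(m_mutex);
        m_jobs.push_back(std::move(job));
    }

    // Moves the oldest job into `out`; returns false if the list is empty.
    bool takeJob(std::function<void()> &out)
    {
        std::lock_guard<std::mutex> lock(m_mutex);
        if (m_jobs.empty())
            return false;
        out = std::move(m_jobs.front());
        m_jobs.pop_front();
        return true;
    }

private:
    std::mutex m_mutex;
    std::deque<std::function<void()>> m_jobs;
};
```

The job is run after `takeJob` returns, outside the lock, so a long-running job never blocks other threads from adding work.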