Now multiple different targets use this database, the messages
were confusing. Next to that "pruning" is not really correct
and it lead to confusing. More correct is the term GC.
In the rare case when we already created a new DB file, but never
managed to write a .info file, the constructor would end up rejecting
the whole DB.
This just drops off the last useless file and goes from there.
Fix a bug that could make the wait between saves too long and now aim to
make the saving more often because this helps us determine in time that
the file is full so we don't go over the end.
Write a changesSincePruning integer which has the following effects;
after a restart;
* We will do a prune faster and/or more often for nodes that are not
running very long stretches of time.
* We will skip pruning files that don't need it.
Context; pruning of the UTXO is throwing away already spent outputs
that are still in our dirty files.
This avoids an internal error from failing one block due
to a perceived missing input while the block may be fine
but other issues need looking at (for instance disk errors).
Checking if my database file is full can not be assumed to only happen
on end-of-block if blocks get really large.
I'm taking a small performance hit to check this on insert and
flushSomeNodesToDisk on a more regular basis.
The 'server' library has always been a catch-all and
ideally only the hub links it in (far future goal).
In line with this I move a list of files out of server
into the utils lib.
I choose 'utils' because all these are plain old data
objects that many crypto apps will find useful.
now in utils/primitives/
* CScript
* CPubKey
* CTransaction
* CBlock
* FastTransaction
* FastBlock
* CScript
streams.h is now in utils/streaming/
hash.h is now in utils/
This is really only relevant for people using btrfs on Linux, we will
set the utxo dir to be no-copy-on-write which causes all new databases
to follow this flag and be faster due to lack of copying on write.
The reason there are no standard library versions of lock-free
containers is because you want to always take full advantage of
the details in question.
In this case (read millions of times for each modification) it makes
no sense to use anything other than a standard container, but put in
a copy-on-write block. Simple and easy.
Replace the m_buckets unsorted map with a lock-free version
based on atomic pointers. (BucketMap)
remove the m_leafs and move those into the bucket struct.
Make the access to the jumptable transactional to avoid one big lock
over all datastructures.
On my threadripper 2990WX the entire 150GB BCH blockchain was
parsed and imported in under 3 hours.
At least 50% of the UTXO entries reuse the posOnDisk
because they are outputs on the same tx. This changes
such entries from using 2 bytes to 1 byte only when saved
in a bucket.
On low core-count machines the request to save could end up being
delayed since all threads are busy doing things like validating
transactions.
So when a really big block came in I ended up throttling while there
was no save method running in parallel at all.
This fixes this and also makes throtteling be a per-thread thing
instead of doing it inside the mutex.
The current system writes based on the amount of changes, which is a bit
risky as long as we have small blocks as the amount of changes may for
hours be below the "lets write" limit.
So also write every 5 minutes, just to make sure we won't keep data in
memory longer than needed. Also this allows pruning to happen for people
that don't run their node all the time.
we only save a subsection of stuff in memory on each round of saving,
but I set the counter to zero on how long to wait for the next save.
This makes us save more often till we actually saved most stuff, avoiding
buildups of caches until we do the periodic big flush.
This change makes the limits be static variables, which means they are
only settable once for a process. The only usecase so far is to use much
smaller limits in testing situations.
This allows queries like RPC and mempool-accept to continue
even during a prune (we just use the 'old' DB) and when its
done the last active query will delete the old DB.
This assumes a sane filesystem where you can rename a file that
is in use.
When pruning we sort leafs by txid / output and refrain from
writing the txid repeatedly for outputs of the same tx.
Additionally, use the fact that we only ever get to a leaf via
a bucket and since the first 64 bits of a txid is there, skip
repeating them when writing to disk.
Last, make pruning have different strategies.
This should shrink the utxo by about 40%