As Bitcoin grows in usage the amount of bandwidth needed to download blocks and transaction broadcasts increases.
Clients implementing ''simplified payment verification'' do not attempt to fully verify the block chain, instead just checking that block headers connect together correctly and trusting that the transactions in a chain of high difficulty are in fact valid.
See the Bitcoin paper for more detail on this mode.
Today, [clients](https://bitcoin.org/en/developer-guide#simplified-payment-verification-spv) have to download the entire contents of blocks and all broadcast transactions, only to throw away the vast majority of the transactions that are not relevant to their wallets.
This slows down their synchronization process, wastes users bandwidth (which on phones is often metered) and increases memory usage.
All three problems are triggering real user complaints for the Android "Bitcoin Wallet" app which implements SPV mode.
In order to make chain synchronization fast, cheap and able to run on older phones with limited memory we want to have remote peers throw away irrelevant transactions before sending them across the network.
* Privacy: Because Bloom filters are probabilistic, with the false positive rate chosen by the client, nodes can trade off precision vs bandwidth usage.
A node with access to lots of bandwidth may choose to have a high FP rate, meaning the remote peer cannot accurately know which transactions belong to the client and which don't.
A node with very little bandwidth may choose to use a very accurate filter meaning that they only get sent transactions actually relevant to their wallet, but remote peers may be able to correlate transactions with IP addresses (and each other).
* Bloom filters are compact and testing membership in them is fast.
This results in satisfying performance characteristics with minimal risk of opening up potential for DoS attacks.
Upon receiving a <code>filterload</code> command, the remote peer will immediately restrict the broadcast transactions it announces (in `inv` packets) to transactions matching the filter, where the matching algorithm is specified below.
The flags control the update behaviour of the matching algorithm.
The given data element will be added to the Bloom filter.
A filter must have been previously provided using <code>filterload</code>.
This command is useful if a new key or script is added to a clients wallet whilst it has connections to the network open, it avoids the need to re-calculate and send an entirely new filter to every peer (though doing so is usually advisable to maintain anonymity).
Thus, a <code>merkleblock</code> message is a block header, plus a part of a merkle tree which can be used to extract identifying information for transactions that matched the filter and prove that the matching transaction data really did appear in the solved block.
Clients can use this data to be sure that the remote node is not feeding them fake transactions that never appeared in a real block, although lying through omission is still possible.
| 1 byte | fRelay | bool | If false then broadcast transactions will not be announced until a filter{load,add,clear} command is received. If missing or true, no change in protocol behaviour occurs. |
SPV clients that wish to use Bloom filtering would normally set fRelay to false in the version message, then set a filter based on their wallet (or a subset of it, if they are overlapping different peers).
Being able to opt-out of inv messages until the filter is set prevents a client being flooded with traffic in the brief window of time between finishing version handshaking and setting the filter.
The <code>getdata</code> command is extended to allow a new type in the <code>inv</code> submessage.
The type field can now be <code>MSG_FILTERED_BLOCK (== 3)</code> rather than <code>MSG_BLOCK</code>.
If no filter has been set on the connection, a request for filtered blocks is ignored. If a filter has been set, a <code>merkleblock</code> message is returned for the requested block hash.
In addition, because a <code>merkleblock</code> message contains only a list of transaction hashes, transactions matching the filter should also be sent in separate tx messages after the merkleblock is sent.
This avoids a slow roundtrip that would otherwise be required (receive hashes, didn't see some of these transactions yet, ask for them).
Note that because there is currently no way to request transactions which are already in a block from a node (aside from requesting the full block), the set of matching transactions that the requesting node hasn't either received or announced with an inv must be sent and any additional transactions which match the filter may also be sent.
This allows for clients (such as the reference client) to limit the number of invs it must remember a given node to have announced while still providing nodes with, at a minimum, all the transactions it needs.
2. For each output, test each data element of the output script.
This means each hash and key in the output script is tested independently.
**Important:** if an output matches whilst testing a transaction, the node might need to update the filter by inserting the serialized COutPoint structure.
In this way addresses, keys and script hashes (for P2SH outputs) can all be added to the filter.
You can also match against classes of transactions that are marked with well known data elements in either inputs or outputs, for example, to implement various forms of [Smart property](https://en.bitcoin.it/wiki/Smart_Property).
The test for outpoints is there to ensure you can find transactions spending outputs in your wallet, even though you don't know anything about their form.
As you can see, once set on a connection the filter is **not static** and can change throughout the connections lifetime.
This is done to avoid the following race condition:
By updating the bloom filter atomically in step 2 with the discovered outpoint, the filter will match against TX 2 in step 3 and the client will learn about all relevant transactions, despite that there is no pause between the node processing the first and second blocks.
The nFlags field of the filter controls the nodes precise update behaviour and is a bit field.
* <code>BLOOM_UPDATE_NONE (0)</code> means the filter is not adjusted when a match is found.
* <code>BLOOM_UPDATE_ALL (1)</code> means if the filter matches any data element in a scriptPubKey the outpoint is serialized and inserted into the filter.
* <code>BLOOM_UPDATE_P2PUBKEY_ONLY (2)</code> means the outpoint is inserted into the filter only if a data element in the scriptPubKey is matched, and that script is of the standard "pay to pubkey" or "pay to multisig" forms.
These distinctions are useful to avoid too-rapid degradation of the filter due to an increasing false positive rate.
We can observe that a wallet which expects to receive only payments of the standard pay-to-address form doesn't need automatic filter updates because any transaction that spends one of its own outputs has a predictable data element in the input (the pubkey that hashes to the address).
If a wallet might receive pay-to-address outputs and also pay-to-pubkey or pay-to-multisig outputs then `BLOOM_UPDATE_P2PUBKEY_ONLY` is appropriate, as it avoids unnecessary expansions of the filter for the most common types of output but still ensures correct behaviour with payments that explicitly specify keys.
A ''Merkle tree'' is a way of arranging a set of items as leaf nodes of tree in which the interior nodes are hashes of the concatenations of their child hashes.
The root node is called the ''Merkle root''.
Every Bitcoin block contains a Merkle root of the tree formed from the blocks transactions.
By providing some elements of the trees interior nodes (called a ''Merkle branch'') a proof is formed that the given transaction was indeed in the block when it was being mined, but the size of the proof is much smaller than the size of the original block.
i.e. if the filter is initialized with 4 hash functions and a tweak of 0x00000005, when the second function (index 1) is needed h1 would be equal to 4221880218.
Let N be the number of elements you wish to insert into the set and P be the probability of a false positive, where 1.0 is "match everything" and zero is unachievable.
The size S of the filter in bytes is given by <code>(-1 / pow(log(2), 2) * N * log(P)) / 8</code>.
Of course you must ensure it does not go over the maximum size (36,000: selected as it represents a filter of 20,000 items with false positive rate of < 0.1% or 10,000 items and a false positive rate of < 0.0001%).