Optimizations

Interior nodes are not assumed to be stored anywhere. The tree structure is determined by the function which computes root-hashes for arbitrary leaf ranges on the fly. The performance of the tree depends highly on the efficiency of this operation. The recursive version of this function (e.g., RFC 9162, Section 2) is slow, affecting significantly state computation and generation of proofs.

Subroots

The above operation can be made iterative by accumulatively hashing together the root-hashes for ranges whose size is a power of two (“subroots”) and can as such be computed efficiently. Subroot computation has significant impact on performance (>500% speedup) while keeping peak memory usage reasonably low (e.g., 200 MiB for a tree with several tens of millions of entries) and linear with respect to tree size.

Note

For, say, comparison purposes, you can disable this feature by passing disable_optimizations=True when initializing the BaseMerkleTree superclass.

Effect of I/O operation

Subroot computation is CPU-bound except for loading leaf hashes to memory. This operation is implementation specific, since it depends on the particular storage backend which the tree operates upon (see _get_leaves in this section). The effect of this operation (usually I/O) can be significant. Take care to implement it in the most efficient way facilitated by your working framework (e.g., bulk fetching the dataset).

Caching

In view of the above technique, subroot computation is the only massively repeated and relatively costly operation. It thus makes sense to apply memoization for ranges whose size exceeds a certain threshold (128 leaves by default). For example, after sufficiently many cache hits (e.g. 2MiB cache memory), proof generation becomes at least 5 times faster for a tree with several tens of million of entries. Practically, a pretty big tree with sufficiently long uptime will respond instantly with negligible penalty in memory usage.

Cache capacity is controlled in bytes via the capacity parameter, which is passed to BaseMerkleTree and defaults to 1GiB (this should be overabundant for any imaginable use case). The minimum size of leaf ranges with cacheable root-hash is controlled via the threshold parameter, which is similarly passed to BaseMerkleTree and defaults to 128.

Note

For, say, comparison purposes, you can disable this feature by passing disable_cache=True when initializing the BaseMerkleTree superclass.