Show HN: ToplingDB - A Persistent Key-Value Store for External Storage
ToplingDB is forked from RocksDB, and we have replaced almost all components with more efficient alternatives (db_bench shows ToplingDB is about 8x faster than RocksDB):
* MemTable: SkipList is replaced by CSPP (Crash Safe Parallel Patricia trie), which is 8x faster.
* SST: BlockBasedTable is replaced by ToplingZipTable, which is built on searchable compression algorithms; it is very small and fast, typically less than 1μs per lookup:
  * Keys/indexes are compressed using NestLoudsTrie (a multi-layer nested LOUDS succinct trie).
  * Values in an SST are compressed together, with a better compression ratio than zstd, and a single value can be decompressed on its own at 1GB/sec.
  * BlockCache is no longer needed, so double caching (BlockCache & PageCache) is avoided.
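To give a feel for what a LOUDS succinct trie is, here is a minimal single-layer sketch in Python. It is not NestLoudsTrie: there is no nesting and no compression of the labels, and rank/select are done with naive linear scans where a real implementation would use small auxiliary indexes. All names (`build_louds`, `contains`, and so on) are made up for this illustration.

```python
from collections import deque


class _Node:
    """Ordinary pointer-based trie node, used only while building."""

    def __init__(self):
        self.children = {}
        self.terminal = False


def build_louds(keys):
    """Encode a set of byte-string keys as (bits, labels, terminal).

    bits:     for each node in BFS order, one 1 per child followed by a 0
    labels:   edge byte leading into node i+1 (the root, node 0, has none)
    terminal: terminal[i] is True if node i ends a key
    """
    root = _Node()
    for key in keys:
        node = root
        for byte in key:
            node = node.children.setdefault(byte, _Node())
        node.terminal = True

    bits, labels, terminal = [], [], []
    queue = deque([root])
    while queue:
        node = queue.popleft()
        terminal.append(node.terminal)
        for byte in sorted(node.children):
            bits.append(1)
            labels.append(byte)
            queue.append(node.children[byte])
        bits.append(0)
    return bits, labels, terminal


def rank1(bits, pos):
    """Number of 1 bits strictly before position pos (naive O(n))."""
    return sum(bits[:pos])


def select0(bits, k):
    """Position of the k-th 0 bit, counting from k = 0 (naive O(n))."""
    seen = -1
    for pos, bit in enumerate(bits):
        if bit == 0:
            seen += 1
            if seen == k:
                return pos
    raise IndexError("not enough zero bits")


def contains(louds, key):
    """Walk the bit sequence without any pointers to test membership."""
    bits, labels, terminal = louds
    node = 0
    for byte in key:
        # The child list of `node` starts right after the (node-1)-th zero.
        pos = 0 if node == 0 else select0(bits, node - 1) + 1
        while bits[pos] == 1:
            child = rank1(bits, pos) + 1  # the k-th 1 bit is node k+1
            if labels[child - 1] == byte:
                node = child
                break
            pos += 1
        else:
            return False  # ran off the child list without a match
    return terminal[node]


louds = build_louds([b"tea", b"ten", b"to", b"a"])
print(contains(louds, b"ten"), contains(louds, b"te"))
```

The whole tree shape costs about two bits per node plus one label byte per edge, which is where the space saving over a pointer-based index comes from.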
Other hotspots are also improved:
* Flushing the MemTable to L0 is omitted, which greatly reduces write amplification and is very friendly to large (GB-sized) MemTables:
  * The MemTable serves as an index from key to "value position in the WAL".
  * Since WAL file content is almost always in the page cache, value content can be accessed efficiently through mmap.
  * When a flush happens, the MemTable is dumped as an SST and the WAL is treated as a blob file.
  * CSPP MemTable uses integer indexes instead of physical pointers, so the in-memory format is exactly the same as the in-file format.
* Prefix cache for searching candidate SSTs and prefix cache for scanning by iterators:
  * Fixed-length key prefixes are cached in an array, which is binary searched as a uint array.
* Distributed compaction (a superior replacement for RocksDB remote compaction):
  * Gracefully supports MergeOperator, CompactionFilter, PropertiesCollector...
  * Works out of the box, so development effort is significantly reduced.
  * Makes it very easy to share a compaction service on spot instances among many DB nodes.
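The "MemTable as an index into the WAL" idea from the list above can be sketched in a few lines of Python. This is a toy model under stated assumptions, not ToplingDB code: the record layout (two little-endian 32-bit lengths, then key, then value) is invented for the example, a plain `dict` stands in for the CSPP trie, and the class and method names are hypothetical.

```python
import mmap
import os
import struct
import tempfile

# Hypothetical record header: key length and value length, both uint32.
_HEADER = struct.Struct("<II")


class WalIndexedMemTable:
    """Values are written once, to the WAL; the in-memory index only
    remembers where each value sits inside that file."""

    def __init__(self, wal_path):
        self._wal = open(wal_path, "w+b")
        self._end = 0     # current WAL size in bytes
        self._index = {}  # key -> (value offset, value length)

    def put(self, key, value):
        record = _HEADER.pack(len(key), len(value)) + key + value
        self._wal.write(record)
        self._wal.flush()  # push bytes to the OS so mmap readers see them
        value_offset = self._end + _HEADER.size + len(key)
        self._index[key] = (value_offset, len(value))
        self._end += len(record)

    def get(self, key):
        entry = self._index.get(key)
        if entry is None:
            return None
        offset, length = entry
        # Remapping on every read keeps the sketch short; a real store
        # would keep one mapping alive and extend it as the WAL grows.
        with mmap.mmap(self._wal.fileno(), 0, access=mmap.ACCESS_READ) as view:
            return view[offset:offset + length]

    def close(self):
        self._wal.close()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        table = WalIndexedMemTable(os.path.join(tmp, "wal.log"))
        table.put(b"user:1", b"alice")
        table.put(b"user:1", b"alice v2")  # newer record shadows the old one
        print(table.get(b"user:1"))
        table.close()
```

In this model a "flush" never has to copy values: the index can be written out on its own and keep pointing at the same offsets, which is the sense in which the WAL can be treated as a blob file.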
Useful bonus features:
* Configuration by JSON/YAML: almost all features can be configured.
* Optional embedded WebView: shows DB structures in a web browser, with pages refreshing like an animation.
* Online update of DB configs over HTTP.
MySQL integration: ToplingDB has been integrated into MySQL as MyTopling, which is forked from MyRocks with major improvements, much like ToplingDB's improvements over RocksDB:
* WBWI (WriteBatchWithIndex): as with the MemTable, SkipList is replaced with CSPP, which is 20x faster (a larger speedup than for the MemTable).
* LockManager & LockTracker: 10x faster
* Encoding & Decoding: 5x faster
* Others ....
MyRocks has many disadvantages compared to InnoDB, while MyTopling outperforms InnoDB in almost all aspects, excluding feature differences.
We have created ~100 PRs for RocksDB, of which ~40 were accepted. Our PRs are mostly "small" changes, since big changes are unlikely to be accepted.
ToplingDB has been deployed in numerous production environments.
Everyone is welcome to use ToplingDB & MyTopling and to join the discussion at https://github.com/topling/toplingdb/discussions