Skip to content

Commit

Permalink
Generate README.md from README.adoc
Browse files Browse the repository at this point in the history
  • Loading branch information
anacrolix committed Feb 23, 2024
1 parent 39063c9 commit bf0bce6
Show file tree
Hide file tree
Showing 3 changed files with 39 additions and 0 deletions.
5 changes: 5 additions & 0 deletions FILES
Original file line number Diff line number Diff line change
@@ -0,0 +1,5 @@
README.adoc:
The canonical asciidoc formatted README. Bite me. I don't like Markdown anymore.

README.md:
Created from README.adoc for crates.io.
3 changes: 3 additions & 0 deletions Makefile
Original file line number Diff line number Diff line change
Expand Up @@ -5,3 +5,6 @@ go_possum_h := go/cpossum/possum.h
.PHONY: $(go_possum_h)
$(go_possum_h):
cbindgen -q --output $@

README.md: README.adoc
npx downdoc $<
31 changes: 31 additions & 0 deletions README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,31 @@
# Possum

## What is it?

Possum is a key-value cache stored directly on disk. It supports concurrent access from multiple processes without requiring interprocess communication or sidecar processes. It efficiently evicts data by using hole punching and sparse files. It supports read snapshots by using file cloning, a feature available on [filesystems](https://www.ctrl.blog/entry/file-cloning.html) such as Btrfs, XFS, ZFS, APFS and ReFSv2.

Value reads and writes occur directly on files, allowing memory-mapping, zero-copy/splicing, vector I/O, polling and more.

## Why?

I couldn’t find cache implementations that supported storing directly on disk, and concurrent access from multiple processes without IPC, but also maintaining disk usage limits. They always seem to be in-memory with snapshots to disk, or single-process with the disk management (including all keys) maintained in memory too. There are plenty of single-process, and fewer concurrent key-value disk stores, but they have no regard for space limits and don’t do user-level caching well. So while in-memory key-value caches are well-supported, there is an unfilled niche for disk-caches, particularly one that can be shared by multiple applications. There are plenty of systems out there that have huge amounts of disk space that could be shared by package managers, HTTP proxies and reverse proxies, thumbnail generation, API caches and more where main memory latency isn’t needed.

## Technical Details

Possum maintains a pool of sparse files and a manifest file to synchronize writes and reads.

Writing is done by starting a batch. Possum records keys and exposes a location to write values concurrently. Value writes can be done using file handles to allow splicing and efficient system calls to offload to disk, and by copying or renaming files directly into the cache. When writes are completed, the batch keys are committed, and the appended and new files are released back into the cache for future reads and writes. Batches can be prepared concurrently. Writing does not block reading.

Reading is done by locking the cache, and recording the keys to be read. When all the keys to be read have been submitted, the regions of files in the cache containing the value data are cloned to temporary files (a very efficient, constant-time system operation) and made available for read for the lifetime of the snapshot.

The committing of keys when a write batch is finalized, and the creation of a read snapshot are the only serial operations. Write batch preparation, and read snapshots can coexist in any arrangement and quantity, including concurrently with the aforementioned serial operations.

Efficiently removing data, and creating read snapshots requires hole punching, and sparse files respectively. Both features are available on Windows, macOS, Solaris, BSDs and Linux, depending on the filesystem in use.

## Supported systems

macOS and Linux are supported. BSD should work. Solaris requires a small amount of work to complete the implementation. Windows is not supported: It has the necessary features, but [only ReFS implements block cloning](https://learn.microsoft.com/en-us/windows/win32/fileio/block-cloning).

## anacrolix/squirrel

I previously wrote [anacrolix/squirrel](https://github.com/anacrolix/squirrel), which can fit the role of both in-memory or disk caching due to SQLite’s VFS. However, it’s written in Go and so is less accessible from other languages. Using SQLite for value storage means large streaming writes are exclusive, even within a single transaction. Value data is serialized to the blob format, potentially across multiple linked pages, and copied multiple times as items are evicted and moved around. Direct file I/O isn’t available with SQLite, and the size of values must be known before they can be written, which can mean copying streams to temporary files before they can be written into the SQLite file.

0 comments on commit bf0bce6

Please sign in to comment.