Database Schema

This document describes the logical data model used by VFS. The model is backend-agnostic, with specific implementations for each storage backend.

Logical Data Model

VFS organizes data into collections (similar to tables or trees). Each collection stores key-value pairs with specific semantics.

Collections Overview

Collection   Purpose                    Key Format              Value Type
files        File/directory metadata    file_id (u64)           FileEntry
paths        Path-to-ID mapping         path (string)           file_id (u64)
contents     Content blobs (CAS)        hash (32 bytes)         ContentBlob
versions     Version history            file_id + version_num   VersionEntry
tags         Tag definitions            tag_id (u64)            TagInfo
tag_names    Tag name lookup            tag_name (string)       tag_id (u64)
file_tags    File-tag associations      file_id + tag_id        timestamp
file_meta    Custom metadata            file_id + key           value (string)
settings     Vault configuration       key (string)            value (string)

Data Structures

FileEntry

Represents a file or directory in the virtual filesystem.

struct FileEntry {
    id: u64,
    parent_id: Option<u64>,     // None for root
    name: String,
    file_type: FileType,        // File or Directory
    content_hash: Option<Hash>, // None for directories
    size: u64,
    created_at: Timestamp,
    modified_at: Timestamp,
}

enum FileType {
    File,
    Directory,
}

Serialized format (bincode/MessagePack):

[id: 8 bytes][parent_id: 9 bytes][name: var][type: 1 byte]
[hash: 33 bytes][size: 8 bytes][created: 8 bytes][modified: 8 bytes]
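A minimal sketch of how the fixed-width option fields in this layout come about (illustrative only; the exact bincode wire format may differ from this hand-rolled encoding):

```rust
// Hand-rolled sketch of the fixed-width layout above (illustrative; not
// bincode's actual wire format). An Option<u64> field takes 9 bytes:
// one tag byte (0 = None, 1 = Some) followed by 8 payload bytes.
fn encode_opt_u64(v: Option<u64>) -> [u8; 9] {
    let mut out = [0u8; 9];
    if let Some(x) = v {
        out[0] = 1;
        out[1..9].copy_from_slice(&x.to_be_bytes());
    }
    out
}

fn main() {
    // None still occupies 9 bytes, which is why parent_id is listed as 9.
    assert_eq!(encode_opt_u64(None), [0u8; 9]);
    let some = encode_opt_u64(Some(1));
    assert_eq!(some[0], 1);
    assert_eq!(&some[1..9], &1u64.to_be_bytes());
}
```

The same tag-plus-payload idea explains the 33-byte hash field: one option tag byte plus the 32-byte digest.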

ContentBlob

Stores actual file content using content-addressable storage.

struct ContentBlob {
    hash: [u8; 32],      // SHA-256
    data: Vec<u8>,       // Raw content
    size: u64,
    ref_count: u32,      // Reference counting for GC
    created_at: Timestamp,
}

VersionEntry

Records a point-in-time snapshot of a file.

struct VersionEntry {
    file_id: u64,
    version_num: u32,
    content_hash: Hash,
    size: u64,
    created_at: Timestamp,
}

Key format: file_id (8 bytes) + version_num (4 bytes)

TagInfo

Defines a tag that can be applied to files.

struct TagInfo {
    id: u64,
    name: String,
    color: Option<String>,  // Hex color code
    created_at: Timestamp,
}

Key Encoding

Keys are encoded consistently across backends:

Numeric Keys

  • u64 values are encoded as big-endian bytes for proper ordering
  • Example: 1000 → [0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x03, 0xE8]
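A quick check of the ordering property, using only the standard library:

```rust
// Big-endian bytes compare lexicographically in the same order as the
// integers themselves, which keeps range scans over numeric keys correct.
fn encode_u64_key(v: u64) -> [u8; 8] {
    v.to_be_bytes()
}

fn main() {
    assert_eq!(
        encode_u64_key(1000),
        [0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x03, 0xE8]
    );
    // 255 < 1000 numerically, and the encoded keys sort the same way.
    // (Little-endian encoding would break this: 255 would sort after 1000.)
    assert!(encode_u64_key(255) < encode_u64_key(1000));
}
```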

String Keys

  • UTF-8 encoded
  • Paths use forward slashes, no trailing slash (except root)

Composite Keys

  • Components are length-prefixed or use fixed sizes
  • Example file_tag key: file_id (8 bytes) + tag_id (8 bytes)

fn encode_file_tag_key(file_id: u64, tag_id: u64) -> [u8; 16] {
    let mut key = [0u8; 16];
    key[0..8].copy_from_slice(&file_id.to_be_bytes());
    key[8..16].copy_from_slice(&tag_id.to_be_bytes());
    key
}
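A matching decoder (a sketch; the codebase may name or structure this differently) round-trips the composite key:

```rust
fn encode_file_tag_key(file_id: u64, tag_id: u64) -> [u8; 16] {
    let mut key = [0u8; 16];
    key[0..8].copy_from_slice(&file_id.to_be_bytes());
    key[8..16].copy_from_slice(&tag_id.to_be_bytes());
    key
}

// Inverse of the encoder: split the 16-byte key back into its two
// big-endian u64 components.
fn decode_file_tag_key(key: &[u8; 16]) -> (u64, u64) {
    let file_id = u64::from_be_bytes(key[0..8].try_into().unwrap());
    let tag_id = u64::from_be_bytes(key[8..16].try_into().unwrap());
    (file_id, tag_id)
}

fn main() {
    let key = encode_file_tag_key(42, 7);
    assert_eq!(decode_file_tag_key(&key), (42, 7)); // round-trips
}
```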

Value Serialization

Values are serialized using a compact binary format:

Primary: bincode

  • Fast, compact, Rust-native
  • Used for internal structures

Alternative: MessagePack

  • More portable
  • Better for potential cross-language access

Configuration

[storage]
serialization = "bincode"  # or "msgpack", "cbor"

Backend-Specific Implementations

SQLite Schema

When using SQLite, collections map to tables:

-- File metadata
CREATE TABLE files (
    id              INTEGER PRIMARY KEY,
    parent_id       INTEGER REFERENCES files(id) ON DELETE CASCADE,
    name            TEXT NOT NULL,
    file_type       INTEGER NOT NULL,  -- 0=file, 1=directory
    content_hash    BLOB,
    size            INTEGER NOT NULL DEFAULT 0,
    created_at      INTEGER NOT NULL,  -- Unix timestamp
    modified_at     INTEGER NOT NULL,
    UNIQUE(parent_id, name)
);

CREATE INDEX idx_files_parent ON files(parent_id);
CREATE INDEX idx_files_hash ON files(content_hash);

-- Path lookup (denormalized for performance)
CREATE TABLE paths (
    path    TEXT PRIMARY KEY,
    file_id INTEGER NOT NULL REFERENCES files(id) ON DELETE CASCADE
);

-- Content-addressable storage
CREATE TABLE contents (
    hash        BLOB PRIMARY KEY,  -- 32 bytes SHA-256
    data        BLOB NOT NULL,
    size        INTEGER NOT NULL,
    ref_count   INTEGER NOT NULL DEFAULT 1,
    created_at  INTEGER NOT NULL
);

-- Version history
CREATE TABLE versions (
    file_id     INTEGER NOT NULL,
    version_num INTEGER NOT NULL,
    content_hash BLOB NOT NULL,
    size        INTEGER NOT NULL,
    created_at  INTEGER NOT NULL,
    PRIMARY KEY (file_id, version_num)
);

CREATE INDEX idx_versions_created ON versions(created_at);

-- Tags
CREATE TABLE tags (
    id          INTEGER PRIMARY KEY,
    name        TEXT NOT NULL UNIQUE,
    color       TEXT,
    created_at  INTEGER NOT NULL
);

CREATE TABLE file_tags (
    file_id     INTEGER NOT NULL,
    tag_id      INTEGER NOT NULL,
    created_at  INTEGER NOT NULL,
    PRIMARY KEY (file_id, tag_id)
);

CREATE INDEX idx_file_tags_tag ON file_tags(tag_id);

-- Custom metadata
CREATE TABLE file_meta (
    file_id     INTEGER NOT NULL,
    key         TEXT NOT NULL,
    value       TEXT NOT NULL,
    PRIMARY KEY (file_id, key)
);

-- Full-text search (FTS5)
CREATE VIRTUAL TABLE fts_content USING fts5(
    path,
    content,
    content_rowid='file_id'
);

-- Vault settings
CREATE TABLE settings (
    key     TEXT PRIMARY KEY,
    value   TEXT NOT NULL
);

Sled/LMDB/RocksDB Schema

For key-value backends, collections map to separate trees/databases:

Tree: files
  Key: u64 (big-endian)
  Value: bincode(FileEntry)

Tree: paths
  Key: UTF-8 string
  Value: u64 (big-endian)

Tree: contents
  Key: 32 bytes (SHA-256)
  Value: raw bytes (content data)

Tree: contents_meta
  Key: 32 bytes (SHA-256)
  Value: bincode(ContentMeta { size, ref_count, created_at })

Tree: versions
  Key: file_id (8 bytes) + version_num (4 bytes)
  Value: bincode(VersionEntry)

Tree: tags
  Key: u64 (big-endian)
  Value: bincode(TagInfo)

Tree: tag_names
  Key: UTF-8 string
  Value: u64 (big-endian)

Tree: file_tags
  Key: file_id (8 bytes) + tag_id (8 bytes)
  Value: timestamp (8 bytes)

Tree: file_meta
  Key: file_id (8 bytes) + key_len (2 bytes) + key (UTF-8)
  Value: UTF-8 string

Tree: settings
  Key: UTF-8 string
  Value: UTF-8 string

Tree: _meta
  Key: "version" | "backend" | "created_at"
  Value: varies

Common Queries

List directory contents

// Find all files where parent_id = directory_id
fn list_directory(storage: &dyn StorageBackend, dir_id: u64) -> Result<Vec<FileEntry>> {
    // Scan files collection, filter by parent_id
    // (SQLite can use index, KV stores scan and filter)
}
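For the KV backends, the scan-and-filter strategy can be sketched against a trimmed-down FileEntry (toy types and an in-memory slice, not the actual StorageBackend API):

```rust
// Trimmed FileEntry for illustration; only the fields the scan needs.
#[derive(Clone, Debug, PartialEq)]
struct FileEntry {
    id: u64,
    parent_id: Option<u64>,
    name: String,
}

// KV-store strategy: scan all entries and keep those whose parent matches.
// (SQLite avoids the full scan by using the idx_files_parent index.)
fn list_directory(files: &[FileEntry], dir_id: u64) -> Vec<FileEntry> {
    let mut out: Vec<FileEntry> = files
        .iter()
        .filter(|f| f.parent_id == Some(dir_id))
        .cloned()
        .collect();
    out.sort_by(|a, b| a.name.cmp(&b.name));
    out
}

fn main() {
    let files = vec![
        FileEntry { id: 1, parent_id: None, name: "/".into() },
        FileEntry { id: 2, parent_id: Some(1), name: "b.txt".into() },
        FileEntry { id: 3, parent_id: Some(1), name: "a.txt".into() },
    ];
    let names: Vec<_> = list_directory(&files, 1)
        .into_iter()
        .map(|f| f.name)
        .collect();
    assert_eq!(names, ["a.txt", "b.txt"]); // sorted by name
}
```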

SQLite:

SELECT * FROM files WHERE parent_id = ? ORDER BY file_type DESC, name;

Get file by path

fn get_file_by_path(storage: &dyn StorageBackend, path: &str) -> Result<Option<FileEntry>> {
    // 1. Look up file_id in paths collection
    // 2. Get FileEntry from files collection
}

SQLite:

SELECT f.* FROM files f
JOIN paths p ON f.id = p.file_id
WHERE p.path = ?;

Get file content

fn get_content(storage: &dyn StorageBackend, file_id: u64) -> Result<Vec<u8>> {
    // 1. Get FileEntry, extract content_hash
    // 2. Get content from contents collection
}

Version history

fn get_versions(storage: &dyn StorageBackend, file_id: u64) -> Result<Vec<VersionEntry>> {
    // Scan versions with prefix = file_id bytes
}
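For an ordered KV backend, the prefix scan can be sketched with a BTreeMap standing in for the tree API, using the versions key layout described above:

```rust
use std::collections::BTreeMap;
use std::ops::Bound::Included;

// Toy ordered store standing in for a Sled/LMDB/RocksDB tree; keys follow
// the versions layout: file_id (8 bytes, big-endian) + version_num (4 bytes).
fn versions_for(tree: &BTreeMap<Vec<u8>, Vec<u8>>, file_id: u64) -> Vec<u32> {
    let lo = file_id.to_be_bytes().to_vec(); // shortest key at this prefix
    let mut hi = lo.clone();
    hi.extend_from_slice(&u32::MAX.to_be_bytes()); // largest key with this prefix
    tree.range((Included(lo), Included(hi)))
        .map(|(k, _)| u32::from_be_bytes(k[8..12].try_into().unwrap()))
        .collect()
}

fn main() {
    let mut tree = BTreeMap::new();
    for (fid, ver) in [(1u64, 1u32), (1, 2), (2, 1)] {
        let mut key = fid.to_be_bytes().to_vec();
        key.extend_from_slice(&ver.to_be_bytes());
        tree.insert(key, Vec::new());
    }
    // Only file 1's versions come back, already in version order.
    assert_eq!(versions_for(&tree, 1), [1, 2]);
}
```

Because the keys sort lexicographically, all versions of one file are contiguous, so the scan touches only the matching range.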

SQLite:

SELECT * FROM versions WHERE file_id = ? ORDER BY version_num DESC;

Files by tag

fn files_with_tag(storage: &dyn StorageBackend, tag_name: &str) -> Result<Vec<FileEntry>> {
    // 1. Get tag_id from tag_names
    // 2. Scan file_tags, matching on the tag_id key suffix
    //    (full scan on KV stores; SQLite uses idx_file_tags_tag instead)
    // 3. Get FileEntry for each file_id
}

SQLite:

SELECT f.* FROM files f
JOIN file_tags ft ON f.id = ft.file_id
JOIN tags t ON ft.tag_id = t.id
WHERE t.name = ?;

SQLite (FTS5)

Built-in full-text search:

-- Search
SELECT path, snippet(fts_content, 1, '<b>', '</b>', '...', 32) as snippet
FROM fts_content
WHERE fts_content MATCH ?
ORDER BY rank;

-- Index on insert
INSERT INTO fts_content(rowid, path, content)
VALUES (?, ?, ?);

-- Update on content change
DELETE FROM fts_content WHERE rowid = ?;
INSERT INTO fts_content(rowid, path, content) VALUES (?, ?, ?);

Tantivy (Sled/LMDB/RocksDB)

Separate search index using tantivy:

// Index structure
let schema = Schema::builder()
    .add_text_field("path", TEXT | STORED)
    .add_text_field("content", TEXT)
    .add_u64_field("file_id", INDEXED | STORED)
    .build();

// Search
let query_parser = QueryParser::for_index(&index, vec![content_field]);
let query = query_parser.parse_query(query_str)?;
let results = searcher.search(&query, &TopDocs::with_limit(limit))?;

ID Generation

Auto-increment (SQLite)

SQLite assigns IDs automatically through the INTEGER PRIMARY KEY rowid alias used in the schema above; AUTOINCREMENT is only needed if deleted IDs must never be reused.

Monotonic IDs (KV stores)

For key-value backends, use a counter in the settings collection:

fn next_id(storage: &dyn StorageBackend, collection: &str) -> Result<u64> {
    storage.transaction(|txn| {
        let key = format!("_next_id_{}", collection);
        let current: u64 = txn.get("settings", key.as_bytes())?
            .map(|v| u64::from_be_bytes(v.try_into().unwrap()))
            .unwrap_or(0);
        let next = current + 1;
        txn.put("settings", key.as_bytes(), &next.to_be_bytes())?;
        Ok(next)
    })
}

Maintenance Operations

Garbage Collection

Remove unreferenced content blobs:

fn garbage_collect(storage: &dyn StorageBackend) -> Result<GcStats> {
    // 1. Scan contents_meta for ref_count = 0
    // 2. Delete from contents and contents_meta
}
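The two steps can be sketched with toy in-memory trees (Vec<u8> hashes standing in for the real 32-byte digests; not the actual backend API):

```rust
use std::collections::BTreeMap;

// Toy sketch of the GC pass: `meta` maps content hash -> ref_count and
// `blobs` maps content hash -> data.
fn garbage_collect(
    meta: &mut BTreeMap<Vec<u8>, u32>,
    blobs: &mut BTreeMap<Vec<u8>, Vec<u8>>,
) -> usize {
    // 1. Collect hashes whose ref_count has dropped to zero.
    let dead: Vec<Vec<u8>> = meta
        .iter()
        .filter(|(_, &rc)| rc == 0)
        .map(|(h, _)| h.clone())
        .collect();
    // 2. Delete them from both trees.
    for h in &dead {
        meta.remove(h);
        blobs.remove(h);
    }
    dead.len()
}

fn main() {
    let mut meta = BTreeMap::new();
    let mut blobs = BTreeMap::new();
    meta.insert(vec![1], 0u32); // unreferenced
    meta.insert(vec![2], 3u32); // still referenced
    blobs.insert(vec![1], b"old".to_vec());
    blobs.insert(vec![2], b"live".to_vec());
    assert_eq!(garbage_collect(&mut meta, &mut blobs), 1);
    assert!(blobs.contains_key(&vec![2]) && !blobs.contains_key(&vec![1]));
}
```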

SQLite:

DELETE FROM contents WHERE ref_count = 0;

Version Pruning

Keep last N versions:

fn prune_versions(storage: &dyn StorageBackend, keep: u32) -> Result<PruneStats> {
    // For each file_id in versions:
    //   Get version count
    //   If > keep, delete oldest (version_num < max - keep)
    //   Decrement ref_count on deleted content hashes
}

SQLite:

DELETE FROM versions
WHERE (file_id, version_num) NOT IN (
    SELECT file_id, version_num FROM versions v2
    WHERE v2.file_id = versions.file_id
    ORDER BY version_num DESC
    LIMIT ?
);

Compaction

Reclaim disk space:

Backend   Method
SQLite    VACUUM
Sled      Automatic compaction
LMDB      Copy to a new database
RocksDB   compact_range()

Migration

When migrating between backends, the migration tool:

  1. Opens source with old backend
  2. Creates destination with new backend
  3. Iterates all collections, copying key-value pairs
  4. Rebuilds search index
  5. Verifies integrity (checksums)

Data format (bincode) is the same across backends, so no transformation is needed.
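The copy loop in step 3 can be sketched with toy in-memory stores (the `Store` alias and `migrate` name are assumptions for illustration; the real tool works through the StorageBackend trait):

```rust
use std::collections::BTreeMap;

// Toy stand-in for step 3: named trees of raw key/value bytes.
type Store = BTreeMap<String, BTreeMap<Vec<u8>, Vec<u8>>>;

fn migrate(src: &Store, dst: &mut Store) -> usize {
    let mut copied = 0;
    for (tree, pairs) in src {
        let out = dst.entry(tree.clone()).or_default();
        for (k, v) in pairs {
            // Values are already bincode-encoded, so they copy byte-for-byte.
            out.insert(k.clone(), v.clone());
            copied += 1;
        }
    }
    copied
}

fn main() {
    let mut src: Store = BTreeMap::new();
    src.entry("files".into())
        .or_default()
        .insert(1u64.to_be_bytes().to_vec(), b"entry".to_vec());
    src.entry("settings".into())
        .or_default()
        .insert(b"version".to_vec(), b"1".to_vec());
    let mut dst: Store = BTreeMap::new();
    assert_eq!(migrate(&src, &mut dst), 2);
    assert_eq!(src, dst); // byte-identical copy
}
```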