Database Schema¶
This document describes the logical data model used by VFS. The model is backend-agnostic, with specific implementations for each storage backend.
Logical Data Model¶
VFS organizes data into collections (similar to tables or trees). Each collection stores key-value pairs with specific semantics.
Collections Overview¶
| Collection | Purpose | Key Format | Value Type |
|---|---|---|---|
| files | File/directory metadata | file_id (u64) | FileEntry |
| paths | Path to ID mapping | path (string) | file_id (u64) |
| contents | Content blobs (CAS) | hash (32 bytes) | ContentBlob |
| versions | Version history | file_id + version_num | VersionEntry |
| tags | Tag definitions | tag_id (u64) | TagInfo |
| tag_names | Tag name lookup | tag_name (string) | tag_id (u64) |
| file_tags | File-tag associations | file_id + tag_id | timestamp |
| file_meta | Custom metadata | file_id + key | value (string) |
| settings | Vault configuration | key (string) | value (string) |
Data Structures¶
FileEntry¶
Represents a file or directory in the virtual filesystem.
struct FileEntry {
id: u64,
parent_id: Option<u64>, // None for root
name: String,
file_type: FileType, // File or Directory
content_hash: Option<Hash>, // None for directories
size: u64,
created_at: Timestamp,
modified_at: Timestamp,
}
enum FileType {
File,
Directory,
}
Serialized format (bincode/MessagePack; the 9-byte parent_id and 33-byte hash are Option fields, a 1-byte presence tag plus the payload):
[id: 8 bytes][parent_id: 9 bytes][name: var][type: 1 byte]
[hash: 33 bytes][size: 8 bytes][created: 8 bytes][modified: 8 bytes]
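A minimal encode/decode round trip, assuming FileEntry derives serde's Serialize/Deserialize and the bincode 1.x API:
use serde::{Deserialize, Serialize};

// FileEntry derives Serialize/Deserialize, so encoding is a single call
fn encode_entry(entry: &FileEntry) -> Result<Vec<u8>, bincode::Error> {
    bincode::serialize(entry)
}

fn decode_entry(bytes: &[u8]) -> Result<FileEntry, bincode::Error> {
    bincode::deserialize(bytes)
}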
ContentBlob¶
Stores actual file content using content-addressable storage.
struct ContentBlob {
hash: [u8; 32], // SHA-256
data: Vec<u8>, // Raw content
size: u64,
ref_count: u32, // Reference counting for GC
created_at: Timestamp,
}
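The hash is the SHA-256 digest of the content itself, so identical data always maps to the same key. A minimal sketch, assuming the sha2 crate:
use sha2::{Digest, Sha256};

// Compute the content-addressable key for a blob
fn content_hash(data: &[u8]) -> [u8; 32] {
    Sha256::digest(data).into()
}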
VersionEntry¶
Records a point-in-time snapshot of a file.
struct VersionEntry {
file_id: u64,
version_num: u32,
content_hash: Hash,
size: u64,
created_at: Timestamp,
}
Key format: file_id (8 bytes) + version_num (4 bytes)
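A helper in the same spirit as the encode_file_tag_key shown under Key Encoding below (the function name is illustrative):
fn encode_version_key(file_id: u64, version_num: u32) -> [u8; 12] {
    let mut key = [0u8; 12];
    key[0..8].copy_from_slice(&file_id.to_be_bytes());
    key[8..12].copy_from_slice(&version_num.to_be_bytes());
    key
}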
TagInfo¶
Defines a tag that can be applied to files.
struct TagInfo {
id: u64,
name: String,
color: Option<String>, // Hex color code
created_at: Timestamp,
}
Key Encoding¶
Keys are encoded consistently across backends:
Numeric Keys¶
- u64 values are encoded as big-endian bytes for proper ordering
- Example:
1000 → [0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x03, 0xE8]
String Keys¶
- UTF-8 encoded
- Paths use forward slashes, no trailing slash (except root)
Composite Keys¶
- Components are length-prefixed or use fixed sizes
- Example file_tag key:
file_id (8 bytes) + tag_id (8 bytes)
fn encode_file_tag_key(file_id: u64, tag_id: u64) -> [u8; 16] {
let mut key = [0u8; 16];
key[0..8].copy_from_slice(&file_id.to_be_bytes());
key[8..16].copy_from_slice(&tag_id.to_be_bytes());
key
}
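For composite keys with a variable-length component, the length prefix keeps the encoding unambiguous. A sketch matching the file_meta key layout listed later (file_id + key_len + key; the helper name is illustrative):
fn encode_file_meta_key(file_id: u64, meta_key: &str) -> Vec<u8> {
    let bytes = meta_key.as_bytes();
    let mut key = Vec::with_capacity(8 + 2 + bytes.len());
    key.extend_from_slice(&file_id.to_be_bytes());
    key.extend_from_slice(&(bytes.len() as u16).to_be_bytes());
    key.extend_from_slice(bytes);
    key
}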
Value Serialization¶
Values are serialized using a compact binary format:
Primary: bincode¶
- Fast, compact, Rust-native
- Used for internal structures
Alternative: MessagePack¶
- More portable
- Better for potential cross-language access
Configuration¶
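The format is chosen per vault and persisted so readers know how to decode values. A hypothetical sketch; the enum and the settings key name are illustrative, not taken from the codebase:
// Recorded once at vault creation (e.g. under a settings key such as "serialization_format")
enum SerializationFormat {
    Bincode,     // default: fast, compact, Rust-native
    MessagePack, // portable, better for cross-language access
}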
Backend-Specific Implementations¶
SQLite Schema¶
When using SQLite, collections map to tables:
-- File metadata
CREATE TABLE files (
id INTEGER PRIMARY KEY,
parent_id INTEGER REFERENCES files(id) ON DELETE CASCADE,
name TEXT NOT NULL,
file_type INTEGER NOT NULL, -- 0=file, 1=directory
content_hash BLOB,
size INTEGER NOT NULL DEFAULT 0,
created_at INTEGER NOT NULL, -- Unix timestamp
modified_at INTEGER NOT NULL,
UNIQUE(parent_id, name)
);
CREATE INDEX idx_files_parent ON files(parent_id);
CREATE INDEX idx_files_hash ON files(content_hash);
-- Path lookup (denormalized for performance)
CREATE TABLE paths (
path TEXT PRIMARY KEY,
file_id INTEGER NOT NULL REFERENCES files(id) ON DELETE CASCADE
);
-- Content-addressable storage
CREATE TABLE contents (
hash BLOB PRIMARY KEY, -- 32 bytes SHA-256
data BLOB NOT NULL,
size INTEGER NOT NULL,
ref_count INTEGER NOT NULL DEFAULT 1,
created_at INTEGER NOT NULL
);
-- Version history
CREATE TABLE versions (
file_id INTEGER NOT NULL,
version_num INTEGER NOT NULL,
content_hash BLOB NOT NULL,
size INTEGER NOT NULL,
created_at INTEGER NOT NULL,
PRIMARY KEY (file_id, version_num)
);
CREATE INDEX idx_versions_created ON versions(created_at);
-- Tags
CREATE TABLE tags (
id INTEGER PRIMARY KEY,
name TEXT NOT NULL UNIQUE,
color TEXT,
created_at INTEGER NOT NULL
);
CREATE TABLE file_tags (
file_id INTEGER NOT NULL,
tag_id INTEGER NOT NULL,
created_at INTEGER NOT NULL,
PRIMARY KEY (file_id, tag_id)
);
CREATE INDEX idx_file_tags_tag ON file_tags(tag_id);
-- Custom metadata
CREATE TABLE file_meta (
file_id INTEGER NOT NULL,
key TEXT NOT NULL,
value TEXT NOT NULL,
PRIMARY KEY (file_id, key)
);
-- Full-text search (FTS5); the rowid is set to the file_id on insert
CREATE VIRTUAL TABLE fts_content USING fts5(
    path,
    content
);
-- Vault settings
CREATE TABLE settings (
key TEXT PRIMARY KEY,
value TEXT NOT NULL
);
Sled/LMDB/RocksDB Schema¶
For key-value backends, collections map to separate trees/databases:
Tree: files
Key: u64 (big-endian)
Value: bincode(FileEntry)
Tree: paths
Key: UTF-8 string
Value: u64 (big-endian)
Tree: contents
Key: 32 bytes (SHA-256)
Value: raw bytes (content data)
Tree: contents_meta
Key: 32 bytes (SHA-256)
Value: bincode(ContentMeta { size, ref_count, created_at })
Tree: versions
Key: file_id (8 bytes) + version_num (4 bytes)
Value: bincode(VersionEntry)
Tree: tags
Key: u64 (big-endian)
Value: bincode(TagInfo)
Tree: tag_names
Key: UTF-8 string
Value: u64 (big-endian)
Tree: file_tags
Key: file_id (8 bytes) + tag_id (8 bytes)
Value: timestamp (8 bytes)
Tree: file_meta
Key: file_id (8 bytes) + key_len (2 bytes) + key (UTF-8)
Value: UTF-8 string
Tree: settings
Key: UTF-8 string
Value: UTF-8 string
Tree: _meta
Key: "version" | "backend" | "created_at"
Value: varies
Common Queries¶
List directory contents¶
// Find all files where parent_id = directory_id
fn list_directory(storage: &dyn StorageBackend, dir_id: u64) -> Result<Vec<FileEntry>> {
// Scan files collection, filter by parent_id
// (SQLite can use an index; KV stores scan and filter)
}
SQLite:
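-- Representative query (uses idx_files_parent)
SELECT * FROM files WHERE parent_id = ?;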
Get file by path¶
fn get_file_by_path(storage: &dyn StorageBackend, path: &str) -> Result<Option<FileEntry>> {
// 1. Look up file_id in paths collection
// 2. Get FileEntry from files collection
}
SQLite:
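-- Representative query against the denormalized paths table
SELECT f.* FROM files f
JOIN paths p ON p.file_id = f.id
WHERE p.path = ?;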
Get file content¶
fn get_content(storage: &dyn StorageBackend, file_id: u64) -> Result<Vec<u8>> {
// 1. Get FileEntry, extract content_hash
// 2. Get content from contents collection
}
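SQLite (a representative query; the two lookups collapse into one join):
SELECT c.data FROM contents c
JOIN files f ON f.content_hash = c.hash
WHERE f.id = ?;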
Version history¶
fn get_versions(storage: &dyn StorageBackend, file_id: u64) -> Result<Vec<VersionEntry>> {
// Scan versions with prefix = file_id bytes
}
SQLite:
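-- Representative query
SELECT * FROM versions WHERE file_id = ? ORDER BY version_num DESC;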
Files by tag¶
fn files_with_tag(storage: &dyn StorageBackend, tag_name: &str) -> Result<Vec<FileEntry>> {
// 1. Get tag_id from tag_names
// 2. Scan file_tags and keep entries whose key ends with tag_id
// 3. Get FileEntry for each file_id
}
SQLite:
SELECT f.* FROM files f
JOIN file_tags ft ON f.id = ft.file_id
JOIN tags t ON ft.tag_id = t.id
WHERE t.name = ?;
Full-Text Search¶
SQLite (FTS5)¶
Built-in full-text search:
-- Search
SELECT path, snippet(fts_content, 1, '<b>', '</b>', '...', 32) as snippet
FROM fts_content
WHERE fts_content MATCH ?
ORDER BY rank;
-- Index on insert
INSERT INTO fts_content(rowid, path, content)
VALUES (?, ?, ?);
-- Update on content change
DELETE FROM fts_content WHERE rowid = ?;
INSERT INTO fts_content(rowid, path, content) VALUES (?, ?, ?);
Tantivy (Sled/LMDB/RocksDB)¶
Separate search index using tantivy:
// Index structure (SchemaBuilder methods return the new Field, so calls are not chained)
let mut schema_builder = Schema::builder();
let path_field = schema_builder.add_text_field("path", TEXT | STORED);
let content_field = schema_builder.add_text_field("content", TEXT);
let file_id_field = schema_builder.add_u64_field("file_id", INDEXED | STORED);
let schema = schema_builder.build();
// Search
let query_parser = QueryParser::for_index(&index, vec![content_field]);
let query = query_parser.parse_query(query_str)?;
let results = searcher.search(&query, &TopDocs::with_limit(limit))?;
ID Generation¶
Auto-increment (SQLite)¶
SQLite assigns IDs automatically for INTEGER PRIMARY KEY columns (rowid aliasing), so no explicit counter is needed.
Monotonic IDs (KV stores)¶
For key-value backends, use a counter in the settings collection:
fn next_id(storage: &dyn StorageBackend, collection: &str) -> Result<u64> {
storage.transaction(|txn| {
let key = format!("_next_id_{}", collection);
let current: u64 = txn.get("settings", key.as_bytes())?
.map(|v| u64::from_be_bytes(v.try_into().unwrap()))
.unwrap_or(0);
let next = current + 1;
txn.put("settings", key.as_bytes(), &next.to_be_bytes())?;
Ok(next)
})
}
Maintenance Operations¶
Garbage Collection¶
Remove unreferenced content blobs:
fn garbage_collect(storage: &dyn StorageBackend) -> Result<GcStats> {
// 1. Scan contents_meta for ref_count = 0
// 2. Delete from contents and contents_meta
}
SQLite:
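-- Representative query using the ref_count column
DELETE FROM contents WHERE ref_count = 0;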
Version Pruning¶
Keep last N versions:
fn prune_versions(storage: &dyn StorageBackend, keep: u32) -> Result<PruneStats> {
// For each file_id in versions:
// Get version count
// If > keep, delete the oldest (version_num <= max - keep)
// Decrement ref_count on deleted content hashes
}
SQLite:
DELETE FROM versions
WHERE (file_id, version_num) NOT IN (
SELECT file_id, version_num FROM versions v2
WHERE v2.file_id = versions.file_id
ORDER BY version_num DESC
LIMIT ?
);
Compaction¶
Reclaim disk space:
| Backend | Method |
|---|---|
| SQLite | VACUUM |
| Sled | Automatic compaction |
| LMDB | Copy to new database |
| RocksDB | compact_range() |
Migration¶
When migrating between backends, the migration tool:
- Opens source with old backend
- Creates destination with new backend
- Iterates all collections, copying key-value pairs
- Rebuilds search index
- Verifies integrity (checksums)
Data format (bincode) is the same across backends, so no transformation is needed.
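A sketch of the copy loop, assuming the backend trait exposes an iterator over a collection's key-value pairs (the scan and put methods and the collection-name constant are illustrative):
// Collection names mirror the trees listed above
const COLLECTIONS: &[&str] = &[
    "files", "paths", "contents", "contents_meta", "versions",
    "tags", "tag_names", "file_tags", "file_meta", "settings",
];

fn migrate(src: &dyn StorageBackend, dst: &dyn StorageBackend) -> Result<()> {
    for &collection in COLLECTIONS {
        // Keys and values are copied verbatim; the bincode format is backend-agnostic
        for (key, value) in src.scan(collection)? {
            dst.put(collection, &key, &value)?;
        }
    }
    // Rebuilding the search index and verifying checksums follow (not shown)
    Ok(())
}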