orama-vault/docs/SECURITY_MODEL.md

Orama Vault -- Security Model

Threat Model

| Threat | Severity | Mitigation | Status |
|---|---|---|---|
| Single node compromise | Medium | Shamir SSS: a single share reveals zero information about the secret. An attacker gets one share out of N and needs K to reconstruct. | Implemented |
| K-1 node collusion | High | Information-theoretic security: K-1 shares provide exactly zero bits of information about the secret. This is not computational -- it is mathematically proven. | Implemented |
| All N nodes collude | Critical | Not defended. If all N guardians collude, they can reconstruct the secret. Mitigated by: (1) nodes operated by different parties, (2) geographic distribution, (3) proactive re-sharing invalidates old shares. | By design |
| Quantum adversary | Future | Post-quantum KEM (ML-KEM-768) and signature (ML-DSA-65) interfaces are defined. Hybrid key exchange (X25519 + ML-KEM-768) is implemented. | Stub (Phase 2) |
| Replay attack on push | Medium | Monotonic version counter. Each push must have a version strictly greater than the stored version. Replaying an old push is rejected. | Implemented |
| Rollback attack | Medium | Anti-rollback via a monotonic version file per identity. An attacker cannot downgrade a share to an older version. | Implemented |
| Disk corruption | Medium | HMAC-SHA256 checksum per share file. On read, the checksum is verified before returning data. Corruption is detected and surfaced as an error. | Implemented |
| Disk tampering | Medium | Same HMAC integrity check. An attacker who modifies share.bin on disk cannot forge a valid checksum without the integrity key. | Implemented |
| Network eavesdropping | High | All inter-node traffic uses WireGuard (encrypted tunnel). Client-to-guardian traffic will use TLS in Phase 3. | Partial (WireGuard: yes, TLS: Phase 3) |
| Timing side-channels | Low | All HMAC verifications and auth token checks use constant-time comparison (XOR-accumulator pattern). | Implemented |
| Memory disclosure | Low | Secure memory: secureZero (volatile zero-fill that cannot be optimized away), mlock (prevents swap to disk), SecureBuffer RAII wrapper. The server secret is zeroed on Guardian deinit. | Implemented |
| Resource exhaustion | Medium | Request body size limits (1 MiB push, 4 KiB pull), share size limit (512 KiB), peer protocol max payload (1 MiB). Systemd MemoryMax=512M. | Implemented |
| Man-in-the-middle (peer) | High | WireGuard provides authenticated encryption between peers. Only nodes with valid WireGuard keys can communicate on port 7501. | Implemented (via WireGuard) |
| Man-in-the-middle (client) | High | TLS termination is planned for Phase 3. Currently plain TCP on port 7500; in production, the Orama gateway provides TLS. | Gateway-level |
| Unauthorized push/pull | Medium | A challenge-response auth module exists with HMAC-based tokens. Not yet wired to HTTP handlers. | Phase 2 |
| Share epoch mixing | High | After proactive re-sharing, old and new shares are algebraically independent. Mixing shares from different epochs does NOT reconstruct the secret. Tested and verified. | Implemented |

Shamir Secret Sharing Security

Information-Theoretic Security

Shamir's Secret Sharing provides perfect secrecy -- this is the strongest possible security guarantee:

  • K shares can reconstruct the secret (Lagrange interpolation at x=0).
  • K-1 shares provide exactly zero information about the secret.
  • This is not a computational assumption. It holds against adversaries with unlimited computing power, including quantum computers.

Proof sketch: A polynomial of degree K-1 is uniquely determined by K points. With only K-1 points, there are exactly 256 (= |GF(2^8)|) distinct polynomials passing through those points, one for each possible value of the constant term (the secret byte). Each is equally likely. Therefore, the conditional probability distribution of the secret given K-1 shares is uniform over GF(2^8) -- identical to the prior distribution. No information is gained.

For a multi-byte secret of length L, this applies independently to each byte position, since each byte uses an independent random polynomial.
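A tiny brute-force illustration of the argument (ours, not from the implementation), for the smallest case K = 2: a single share at x = 1 equals s XOR c1, since multiplication by 1 is the identity and addition is XOR in GF(2^8), so every candidate secret byte is consistent with exactly one polynomial:

```python
# Perfect secrecy for K = 2: observing one share narrows nothing down,
# because each of the 256 candidate secrets pairs with a unique coefficient.
y1 = 0xA7                                     # an observed share byte (arbitrary)
consistent = {s: y1 ^ s for s in range(256)}  # the unique c1 for each candidate s
assert len(set(consistent.values())) == 256   # one distinct polynomial per secret
```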

Threshold and Share Count

The system uses an adaptive threshold:

K = max(3, floor(N/3))

Where N is the number of alive guardians. This means:

| Alive Nodes (N) | Threshold (K) | Fault Tolerance (N-K) |
|---|---|---|
| 3 | 3 | 0 |
| 5 | 3 | 2 |
| 9 | 3 | 6 |
| 10 | 3 | 7 |
| 14 | 4 | 10 |
| 50 | 16 | 34 |
| 100 | 33 | 67 |

The minimum threshold of 3 ensures that at least 3 guardians must cooperate to reconstruct, even in small clusters. This prevents trivial collusion.
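The formula above can be sketched as a one-liner (the function name is illustrative, not from the codebase):

```python
def threshold(alive_nodes: int) -> int:
    """K = max(3, floor(N/3)): at least 3 guardians must always cooperate."""
    return max(3, alive_nodes // 3)

# Reproduces the table above.
assert [threshold(n) for n in (3, 5, 9, 10, 14, 50, 100)] == [3, 3, 3, 3, 4, 16, 33]
```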


GF(2^8) Choice Rationale

The finite field GF(2^8) = GF(256) was chosen for Shamir arithmetic:

  1. Same field as AES. The irreducible polynomial x^8 + x^4 + x^3 + x + 1 (0x11B) is the AES field polynomial. This is the most studied and battle-tested GF(2^8) instantiation in cryptography.

  2. Byte-aligned. Each field element is exactly one byte. No encoding overhead, no multi-precision arithmetic, no serialization complexity.

  3. O(1) arithmetic. Precomputed exp/log tables (512 + 256 = 768 bytes total, generated at Zig comptime) give constant-time multiplication, inversion, and division via table lookups. The generator element is 3 (0x03), a primitive element of the multiplicative group of order 255.

  4. 255 distinct evaluation points. Shares are evaluated at x = 1, 2, ..., N (never x=0, which would reveal the secret). This supports up to 255 shares per secret, far exceeding the Orama network size.

  5. Exhaustively verified. The implementation includes tests that verify:

    • All 256x256 multiplication pairs produce valid results.
    • Multiplicative identity: 1 * a = a for all a.
    • Multiplicative inverse: a * inv(a) = 1 for all nonzero a.
    • Commutativity, associativity, and distributivity (sampled).
    • The exp table generates all 255 nonzero elements exactly once (confirming 3 is a primitive element).
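To illustrate these properties, here is a Python sketch (ours, not the implementation's) that rebuilds the exp/log tables from the AES polynomial 0x11B and generator 3, then checks the primitivity and inverse claims above:

```python
def gf_mul_slow(a: int, b: int) -> int:
    """Carry-less multiply modulo x^8 + x^4 + x^3 + x + 1 (0x11B)."""
    result = 0
    for _ in range(8):
        if b & 1:
            result ^= a
        b >>= 1
        carry = a & 0x80
        a = (a << 1) & 0xFF
        if carry:
            a ^= 0x1B  # reduce by the low byte of 0x11B
    return result

# Build exp/log tables by repeated multiplication with the generator 3.
exp = [0] * 256
log = [0] * 256
x = 1
for i in range(255):
    exp[i] = x
    log[x] = i
    x = gf_mul_slow(x, 3)

def gf_mul(a: int, b: int) -> int:
    """O(1) multiplication: a*b = exp[(log a + log b) mod 255]."""
    if a == 0 or b == 0:
        return 0
    return exp[(log[a] + log[b]) % 255]

def gf_inv(a: int) -> int:
    """Multiplicative inverse of nonzero a: exp[(255 - log a) mod 255]."""
    return exp[(255 - log[a]) % 255]

# 3 is primitive: the exp table hits every nonzero element exactly once.
assert sorted(exp[:255]) == list(range(1, 256))
assert all(gf_mul(a, gf_inv(a)) == 1 for a in range(1, 256))
```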

Key Wrapping (Planned Architecture)

Status: The key wrapping scheme is designed but not yet fully implemented. The crypto primitives (AES-256-GCM, HKDF-SHA256) are implemented and tested.

The planned key hierarchy:

User Secret (root seed / mnemonic)
    |
    +-- DEK (Data Encryption Key) -- random 256-bit AES key
    |     |
    |     +-- Encrypts the secret via AES-256-GCM
    |
    +-- KEK1 (Key Encryption Key 1) -- derived from mnemonic via HKDF
    |     |
    |     +-- Wraps DEK (AES-256-GCM)
    |     +-- Stored alongside the encrypted secret
    |
    +-- KEK2 (Key Encryption Key 2) -- derived from username+passphrase via HKDF
          |
          +-- Wraps DEK (AES-256-GCM)
          +-- Stored alongside the encrypted secret

Recovery Path A (Mnemonic):

  1. User provides mnemonic.
  2. Derive KEK1 = HKDF(mnemonic, "orama-kek1-v1").
  3. Unwrap DEK from wrapped_dek1.bin.
  4. Decrypt secret with DEK.

Recovery Path B (Username + Passphrase):

  1. User provides username + passphrase.
  2. Derive identity = SHA-256(username).
  3. Pull K shares from guardians.
  4. Reconstruct encrypted blob via Lagrange interpolation.
  5. Derive KEK2 = HKDF(passphrase, "orama-kek2-v1").
  6. Unwrap DEK from wrapped_dek2.bin.
  7. Decrypt secret with DEK.
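The two KEK derivations can be sketched with a stdlib-only HKDF-SHA256 (RFC 5869). Only the info strings come from the recovery paths above; the salt, input encodings, and helper names are illustrative assumptions:

```python
import hashlib
import hmac

def hkdf_sha256(ikm: bytes, info: bytes, length: int = 32,
                salt: bytes = b"\x00" * 32) -> bytes:
    """RFC 5869 HKDF-SHA256: extract, then expand to `length` bytes."""
    prk = hmac.new(salt, ikm, hashlib.sha256).digest()   # extract
    okm, t, counter = b"", b"", 1
    while len(okm) < length:                             # expand
        t = hmac.new(prk, t + info + bytes([counter]), hashlib.sha256).digest()
        okm += t
        counter += 1
    return okm[:length]

mnemonic = b"example mnemonic words"        # placeholder inputs
passphrase = b"example passphrase"
kek1 = hkdf_sha256(mnemonic, b"orama-kek1-v1")
kek2 = hkdf_sha256(passphrase, b"orama-kek2-v1")
# Either KEK would unwrap the same random DEK, giving two independent recovery paths.
assert len(kek1) == len(kek2) == 32 and kek1 != kek2
```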

HMAC Integrity

Every stored share has an associated HMAC-SHA256 checksum:

checksum = HMAC-SHA256(integrity_key, share_data)

On read, the checksum is recomputed and compared in constant time:

fn constantTimeEqual(a: []const u8, b: []const u8) bool {
    if (a.len != b.len) return false;
    var diff: u8 = 0;
    for (a, b) |x, y| {
        diff |= x ^ y;
    }
    return diff == 0;
}

This detects:

  • Accidental disk corruption (bit flips, sector failures).
  • Intentional tampering by an attacker with disk access.
  • Partial writes (if the share was updated but checksum wasn't, or vice versa).
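A Python sketch of the checksum scheme, using the stdlib's constant-time hmac.compare_digest in place of the Zig helper (key and data below are placeholders):

```python
import hashlib
import hmac

def checksum(integrity_key: bytes, share_data: bytes) -> bytes:
    """checksum = HMAC-SHA256(integrity_key, share_data)."""
    return hmac.new(integrity_key, share_data, hashlib.sha256).digest()

def verify(integrity_key: bytes, share_data: bytes, stored: bytes) -> bool:
    """Recompute and compare in constant time."""
    return hmac.compare_digest(checksum(integrity_key, share_data), stored)

key = b"\x01" * 32
share = b"example share bytes"
tag = checksum(key, share)
assert verify(key, share, tag)
assert not verify(key, share + b"\x00", tag)   # corruption or tampering detected
```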

Anti-Rollback Protection

Each identity has a monotonic version counter stored in a separate file (version). On push:

  1. Read current version from <data_dir>/shares/<identity>/version.
  2. If the file exists and the new version is <= the stored version, reject with 400.
  3. If the new version is strictly greater, proceed with the write.
  4. Write the new version atomically (temp + rename).

This prevents an attacker from replacing a current share with an older version, which could be part of an attack to force reconstruction with a known set of shares.
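The four steps above can be sketched as follows (paths and the helper name are illustrative; the real handler rejects with HTTP 400 rather than returning a bool):

```python
import os
import tempfile

def push(identity_dir: str, new_version: int) -> bool:
    """Accept a push only if new_version is strictly greater than the stored one."""
    path = os.path.join(identity_dir, "version")
    if os.path.exists(path):
        with open(path) as f:
            if new_version <= int(f.read()):
                return False  # replay or rollback: rejected
    # Atomic write: temp file in the same directory, then rename over the old file.
    fd, tmp = tempfile.mkstemp(dir=identity_dir)
    with os.fdopen(fd, "w") as f:
        f.write(str(new_version))
    os.replace(tmp, path)
    return True

with tempfile.TemporaryDirectory() as d:
    assert push(d, 1)
    assert not push(d, 1)   # replay rejected
    assert not push(d, 0)   # rollback rejected
    assert push(d, 2)
```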


Timing Attack Prevention

All security-sensitive comparisons use constant-time operations:

  1. HMAC verification (src/crypto/hmac.zig): constantTimeEqual with XOR accumulator.
  2. Challenge verification (src/auth/challenge.zig): timingSafeEqual with same pattern.
  3. Session token verification (src/auth/session.zig): timingSafeEqual with same pattern.

The pattern:

var diff: u8 = 0;
for (a, b) |x, y| {
    diff |= x ^ y;
}
return diff == 0;

This ensures the comparison takes the same time regardless of where (or whether) the bytes differ. An attacker cannot learn partial information about expected values by measuring response times.


Secure Memory

The src/crypto/secure_mem.zig module provides:

secureZero

pub fn secureZero(buf: []u8) void {
    std.crypto.secureZero(u8, @as([]volatile u8, @volatileCast(buf)));
}

Uses volatile semantics to prevent the compiler from optimizing away the zero-fill. This is critical for erasing keys, secrets, and intermediate cryptographic material from memory.

mlock / munlock

pub fn mlock(ptr: [*]const u8, len: usize) void {
    if (builtin.os.tag == .linux) {
        std.posix.mlock(ptr[0..len]) catch {}; // non-fatal on failure
    }
}

Locks memory pages so they are never written to swap. This prevents key material from being persisted to disk in a swap partition. Requires either CAP_IPC_LOCK capability or sufficient RLIMIT_MEMLOCK.

The systemd service file sets LimitMEMLOCK=67108864 (64 MiB) to allow mlock.

SecureBuffer

RAII wrapper that combines allocation, mlock, and automatic zeroing:

pub const SecureBuffer = struct {
    data: []u8,
    allocator: std.mem.Allocator,

    pub fn deinit(self: *SecureBuffer) void {
        secureZero(self.data);       // volatile zero
        munlock(self.data.ptr, ...); // unlock pages
        self.allocator.free(self.data);
    }
};

Used for all key material that has a defined lifetime.

Server Secret Zeroing

The Guardian.deinit() method zeroes the 32-byte server secret:

pub fn deinit(self: *Guardian) void {
    self.nodes.deinit();
    @memset(&self.server_secret, 0);
}

Share Zeroing

All Share.deinit() calls zero the share data before freeing:

pub fn deinit(self: Share, allocator: std.mem.Allocator) void {
    const mutable: []u8 = @constCast(self.y);
    @memset(mutable, 0);
    allocator.free(mutable);
}

Similarly, the split operation zeros the coefficient buffer (which contains the secret as coeffs[0]) on cleanup.


Post-Quantum Roadmap

Current State: Stubs

The post-quantum modules exist with correct interfaces but provide zero security:

  • ML-KEM-768 (src/crypto/pq_kem.zig): keygen() returns random bytes. encaps() returns random shared secret. decaps() returns random shared secret. They do NOT perform real lattice operations.

  • ML-DSA-65 (src/crypto/pq_sig.zig): keygen() returns random bytes. sign() returns SHA-256 hash as placeholder. verify() ALWAYS SUCCEEDS -- provides zero signature verification.

Both modules log a one-time warning when first used:

pq_kem: STUB implementation in use -- provides ZERO post-quantum security
pq_sig: STUB implementation in use -- provides ZERO post-quantum security, verify() ALWAYS succeeds

Planned Implementation (Phase 2)

Replace stubs with liboqs-backed implementations:

| Algorithm | Standard | Security Level | Key Sizes |
|---|---|---|---|
| ML-KEM-768 | FIPS 203 | ~192-bit post-quantum | PK: 1184, SK: 2400, CT: 1088, SS: 32 |
| ML-DSA-65 | FIPS 204 | ~192-bit post-quantum | PK: 1952, SK: 4032, Sig: 3309 max |

Integration plan:

  1. Link liboqs as a C dependency via Zig's @cImport.
  2. Replace random byte generation with actual OQS_KEM_ml_kem_768_* and OQS_SIG_ml_dsa_65_* calls.
  3. The hybrid module (src/crypto/hybrid.zig) already combines X25519 + ML-KEM correctly -- once the ML-KEM stub is replaced, hybrid key exchange will provide real post-quantum protection.

Hybrid Key Exchange

The hybrid module (src/crypto/hybrid.zig) implements X25519 + ML-KEM-768:

shared_secret = HKDF-SHA256(X25519_SS || ML-KEM_SS, salt=0^32, info="orama-hybrid-v1")

This ensures:

  • If X25519 is broken (quantum computer), ML-KEM still protects.
  • If ML-KEM is broken (unknown classical attack), X25519 still protects.
  • Both must be broken simultaneously to compromise the shared secret.

The X25519 portion is fully functional using Zig's std.crypto.dh.X25519. Only the ML-KEM portion is currently a stub.
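The combiner formula can be sketched in Python (stdlib-only HKDF-SHA256; both input secrets below are random placeholders standing in for the real X25519 and ML-KEM outputs):

```python
import hashlib
import hmac
import os

def hkdf_sha256(ikm: bytes, salt: bytes, info: bytes, length: int = 32) -> bytes:
    """RFC 5869 HKDF-SHA256; a single expand block suffices for 32 bytes."""
    prk = hmac.new(salt, ikm, hashlib.sha256).digest()
    return hmac.new(prk, info + b"\x01", hashlib.sha256).digest()[:length]

x25519_ss = os.urandom(32)  # would come from std.crypto.dh.X25519
mlkem_ss = os.urandom(32)   # currently produced by the pq_kem.zig stub
shared = hkdf_sha256(x25519_ss + mlkem_ss, b"\x00" * 32, b"orama-hybrid-v1")
assert len(shared) == 32
```

Because the two secrets are concatenated before the KDF, an attacker must recover both inputs to predict the output.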


WireGuard Transport Security

All guardian-to-guardian communication (port 7501) is restricted to the WireGuard overlay network (10.0.0.x addresses). WireGuard provides:

  1. Authenticated encryption: ChaCha20-Poly1305 with per-peer keys derived from Noise IK handshake.
  2. Perfect forward secrecy: new ephemeral session keys every 2 minutes or 2^60 messages, whichever comes first.
  3. Mutual authentication: Only nodes with authorized public keys can join the overlay.
  4. Replay protection: Built-in counter-based replay rejection.

An attacker who does not have a valid WireGuard private key cannot:

  • Connect to port 7501 on any guardian.
  • Observe peer-to-peer traffic contents.
  • Inject or replay messages.

This is defense-in-depth: even if the binary peer protocol had vulnerabilities, the WireGuard layer prevents exploitation from outside the cluster.


Proactive Re-sharing Security

The Herzberg-Jarecki-Krawczyk-Yung re-sharing protocol ensures:

  1. Forward secrecy for shares. After re-sharing, old shares are algebraically independent from new shares. An attacker who compromises old shares (before re-sharing) and new shares (after re-sharing) from different guardians cannot combine them.

  2. Secret preservation. The secret itself does not change during re-sharing. Only the polynomial representation changes. sum(q_i(0)) = 0 ensures the constant term (secret) is preserved.

  3. Epoch isolation. Tested and verified: mixing one new share with K-1 old shares does NOT reconstruct the original secret. The test in src/sss/reshare.zig confirms this with high probability.

  4. No secret reconstruction. At no point during re-sharing does any single party learn the secret. Each guardian only processes deltas and updates its own share.
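To make the update step concrete, here is a hedged sketch of HJKY-style re-sharing. For brevity it works over the prime field GF(251) instead of the GF(2^8) used by the implementation (the algebra is identical); all names are illustrative:

```python
import random

P = 251          # small prime field, for illustration only
K, N = 3, 5      # threshold and share count

def poly_eval(coeffs, x):
    """Evaluate a polynomial (coeffs[0] is the constant term) at x, mod P."""
    acc = 0
    for c in reversed(coeffs):
        acc = (acc * x + c) % P
    return acc

def lagrange_at_zero(points):
    """Reconstruct f(0) from K (x, y) points via Lagrange interpolation."""
    secret = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = (num * -xj) % P
                den = (den * (xi - xj)) % P
        secret = (secret + yi * num * pow(den, -1, P)) % P
    return secret

random.seed(1)
secret = 42
f = [secret] + [random.randrange(P) for _ in range(K - 1)]
old = {x: poly_eval(f, x) for x in range(1, N + 1)}

# Each guardian broadcasts deltas from a random polynomial q_j with q_j(0) = 0,
# so the summed update preserves the constant term (the secret) while
# randomizing every other coefficient.
new = dict(old)
for _ in range(N):
    q = [0] + [random.randrange(P) for _ in range(K - 1)]
    for x in new:
        new[x] = (new[x] + poly_eval(q, x)) % P

assert lagrange_at_zero([(x, new[x]) for x in (1, 2, 3)]) == secret
# Epoch mixing: one new share with K-1 old shares interpolates a point on
# neither polynomial, so it almost never yields the secret.
mixed = lagrange_at_zero([(1, new[1]), (2, old[2]), (3, old[3])])
```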

Re-sharing is triggered:

  • On node topology changes (join/leave detected by discovery module).
  • Periodically every 24 hours.
  • When alive count drops below the safety threshold (K+1).

Resource Limits

| Resource | Limit | Where Enforced |
|---|---|---|
| Process memory | 512 MiB | systemd MemoryMax=512M |
| mlock memory | 64 MiB | systemd LimitMEMLOCK=67108864 |
| Push request body | 1 MiB | handler_push.zig MAX_BODY_SIZE |
| Pull request body | 4 KiB | handler_pull.zig MAX_BODY_SIZE |
| Decoded share size | 512 KiB | handler_push.zig MAX_SHARE_SIZE |
| Peer protocol payload | 1 MiB | protocol.zig MAX_PAYLOAD_SIZE |
| HTTP read buffer | 64 KiB | listener.zig READ_BUF_SIZE |
| Share file read | 1 MiB / 10 MiB | handler_pull.zig / file_store.zig |

Systemd Security Hardening

The systemd service file applies defense-in-depth:

PrivateTmp=yes              # Isolated /tmp
ProtectSystem=strict        # Read-only filesystem except explicit paths
ReadWritePaths=/opt/orama/.orama/data/vault  # Only data dir is writable
NoNewPrivileges=yes         # Cannot gain new privileges (no setuid, no capabilities)
LimitMEMLOCK=67108864       # Allow mlock for secure memory
MemoryMax=512M              # Hard memory limit

This means even if the guardian process is compromised, the attacker:

  • Cannot write to the filesystem outside the data directory.
  • Cannot escalate privileges.
  • Cannot consume unbounded memory.
  • Has isolated temporary file access.