orama/docs/SECURITY.md
anonpenguin23 e2b6f7d721 docs: add security hardening and OramaOS deployment docs
- Document WireGuard IPv6 disable, service auth, token security, process isolation
- Introduce OramaOS architecture, enrollment flow, and management via Gateway API
- Add troubleshooting for RQLite/Olric auth, OramaOS LUKS/enrollment issues
2026-02-28 15:41:04 +02:00


Security Hardening

This document describes all security measures applied to the Orama Network, covering both Phase 1 (service hardening on existing Ubuntu nodes) and Phase 2 (OramaOS locked-down image).

Phase 1: Service Hardening

These measures apply to all nodes (Ubuntu and OramaOS).

Network Isolation

CIDR Validation (Step 1.1)

  • WireGuard subnet restricted to 10.0.0.0/24 across all components: firewall rules, rate limiter, auth module, and WireGuard PostUp/PostDown iptables rules
  • Prevents other tenants on shared VPS providers from bypassing the firewall via overlapping 10.x.x.x ranges
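One hedged way to express the restriction at the WireGuard layer (interface name, address, and port here are assumptions; the actual PostUp/PostDown rules may differ):

```ini
# /etc/wireguard/wg0.conf (illustrative excerpt, not the real config)
[Interface]
Address    = 10.0.0.1/24
ListenPort = 51820
# Drop anything arriving on the tunnel from outside the validated CIDR;
# %i is wg-quick's placeholder for the interface name
PostUp   = iptables -A INPUT -i %i ! -s 10.0.0.0/24 -j DROP
PostDown = iptables -D INPUT -i %i ! -s 10.0.0.0/24 -j DROP
```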

IPv6 Disabled (Step 1.2)

  • IPv6 disabled system-wide via sysctl: net.ipv6.conf.all.disable_ipv6=1
  • Prevents services bound to 0.0.0.0 from being reachable via IPv6 (which had no firewall rules)
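A persistent sysctl drop-in for this might look as follows (the file path and the additional `default` key are assumptions; the source only names the `all` key):

```ini
# /etc/sysctl.d/99-orama-disable-ipv6.conf (hypothetical path)
net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
```

Applied with `sysctl --system` or automatically at boot.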

Authentication

Internal Endpoint Auth (Step 1.3)

  • /v1/internal/wg/peers and /v1/internal/wg/peer/remove now require cluster secret validation
  • Peer removal additionally validates the request originates from a WireGuard subnet IP
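A minimal sketch of the two checks, assuming a constant-time secret comparison and a subnet test on the request's remote address (function and variable names are illustrative, not the real API):

```go
package main

import (
	"crypto/subtle"
	"fmt"
	"net"
	"net/netip"
)

// wgSubnet mirrors the validated WireGuard CIDR from Step 1.1.
var wgSubnet = netip.MustParsePrefix("10.0.0.0/24")

// authorizePeerRemoval checks the cluster secret in constant time,
// then verifies the request came from a WireGuard subnet IP.
func authorizePeerRemoval(gotSecret, wantSecret, remoteAddr string) error {
	if subtle.ConstantTimeCompare([]byte(gotSecret), []byte(wantSecret)) != 1 {
		return fmt.Errorf("invalid cluster secret")
	}
	host, _, err := net.SplitHostPort(remoteAddr)
	if err != nil {
		host = remoteAddr // remoteAddr had no port component
	}
	ip, err := netip.ParseAddr(host)
	if err != nil || !wgSubnet.Contains(ip) {
		return fmt.Errorf("request not from WireGuard subnet")
	}
	return nil
}

func main() {
	fmt.Println(authorizePeerRemoval("s3cret", "s3cret", "10.0.0.7:41000"))
	fmt.Println(authorizePeerRemoval("s3cret", "s3cret", "203.0.113.9:41000"))
}
```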

RQLite Authentication (Step 1.7)

  • RQLite runs with -auth flag pointing to a credentials file
  • All RQLite HTTP requests include Authorization: Basic <base64> headers
  • Credentials generated at cluster genesis, distributed to joining nodes via join response
  • Both the central RQLite client wrapper and the standalone CoreDNS RQLite client send auth
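A sketch of how each RQLite HTTP call can attach the Basic credentials (URL, username, and endpoint here are placeholders, not the real values):

```go
package main

import (
	"fmt"
	"net/http"
)

// newRQLiteRequest builds a request carrying the Basic credentials
// read from the genesis-generated credentials file.
func newRQLiteRequest(method, url, user, pass string) (*http.Request, error) {
	req, err := http.NewRequest(method, url, nil)
	if err != nil {
		return nil, err
	}
	// Emits "Authorization: Basic <base64(user:pass)>"
	req.SetBasicAuth(user, pass)
	return req, nil
}

func main() {
	req, _ := newRQLiteRequest("GET", "http://10.0.0.1:4001/status", "orama", "secret")
	fmt.Println(req.Header.Get("Authorization") != "")
}
```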

Olric Gossip Encryption (Step 1.8)

  • Olric memberlist uses a 32-byte encryption key for all gossip traffic
  • Key generated at genesis, distributed via join response
  • Prevents rogue nodes from joining the gossip ring and poisoning caches
  • Note: encryption is all-or-nothing (coordinated restart required when enabling)
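Genesis-time key generation can be sketched as below (the function name is hypothetical; 32 bytes selects AES-256 when handed to memberlist's `SecretKey`):

```go
package main

import (
	"crypto/rand"
	"fmt"
)

// newGossipKey generates the 32-byte memberlist encryption key
// that is later distributed to joining nodes via the join response.
func newGossipKey() ([]byte, error) {
	key := make([]byte, 32)
	if _, err := rand.Read(key); err != nil {
		return nil, err
	}
	return key, nil
}

func main() {
	key, err := newGossipKey()
	if err != nil {
		panic(err)
	}
	fmt.Println(len(key)) // 32
}
```

hashicorp/memberlist accepts 16-, 24-, or 32-byte keys (AES-128/192/256); since every node must share the key for gossip to decrypt, enabling it requires the coordinated restart noted above.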

IPFS Cluster TrustedPeers (Step 1.9)

  • IPFS Cluster TrustedPeers populated with actual cluster peer IDs (was ["*"])
  • New peers added to TrustedPeers on all existing nodes during join
  • Prevents unauthorized peers from controlling IPFS pinning
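In ipfs-cluster's `service.json` the resulting configuration would resemble the excerpt below (peer IDs are placeholders; the exact nesting assumes the CRDT consensus layout):

```json
{
  "consensus": {
    "crdt": {
      "trusted_peers": [
        "12D3KooWExamplePeerA",
        "12D3KooWExamplePeerB"
      ]
    }
  }
}
```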

Vault V1 Auth Enforcement (Step 1.14)

  • V1 push/pull endpoints require a valid session token when vault-guardian is configured
  • Previously, auth was optional for backward compatibility — any WG peer could read/overwrite Shamir shares

Token & Key Storage

Refresh Token Hashing (Step 1.5)

  • Refresh tokens stored as SHA-256 hashes in RQLite (never plaintext)
  • On lookup: hash the incoming token, query by hash
  • On revocation: hash before revoking (both single-token and by-subject)
  • Existing tokens invalidated on upgrade (users re-authenticate)
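The store/lookup transform above can be sketched in a few lines; only the hex digest ever reaches RQLite (the token value shown is a made-up placeholder):

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// hashRefreshToken returns the SHA-256 hex digest stored in RQLite.
// Lookup and revocation apply the same transform to the incoming token
// and query by hash.
func hashRefreshToken(token string) string {
	sum := sha256.Sum256([]byte(token))
	return hex.EncodeToString(sum[:])
}

func main() {
	fmt.Println(hashRefreshToken("rt_example_token"))
}
```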

API Key Hashing (Step 1.6)

  • API keys stored as HMAC-SHA256 hashes using a server-side secret
  • HMAC secret generated at cluster genesis, stored in ~/.orama/secrets/api-key-hmac-secret
  • On lookup: compute HMAC, query by hash — fast enough for every request (unlike bcrypt)
  • In-memory cache uses raw key as cache key (never persisted)
  • During rolling upgrade: dual lookup (HMAC first, then raw as fallback) until all nodes upgraded
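A sketch of the keyed hash (secret and key values are placeholders): unlike plain SHA-256, a leaked table of HMAC digests cannot be brute-forced offline without the server-side secret, yet a single HMAC per request stays cheap.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// hashAPIKey computes HMAC-SHA256(secret, apiKey) as a hex string,
// the form stored in RQLite and used for lookups.
func hashAPIKey(secret []byte, apiKey string) string {
	mac := hmac.New(sha256.New, secret)
	mac.Write([]byte(apiKey))
	return hex.EncodeToString(mac.Sum(nil))
}

func main() {
	secret := []byte("contents of api-key-hmac-secret") // placeholder
	fmt.Println(hashAPIKey(secret, "ok_live_example"))
}
```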

TURN Secret Encryption (Step 1.15)

  • TURN shared secrets encrypted at rest in RQLite using AES-256-GCM
  • Encryption key derived via HKDF from the cluster secret with purpose string "turn-encryption"
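The derive-then-seal flow can be sketched as below. The key derivation is a minimal single-block HKDF-SHA256 (RFC 5869: PRK = HMAC(salt, secret), OKM = HMAC(PRK, info || 0x01)); the real code likely uses a full HKDF implementation, and the secret values are placeholders.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/hmac"
	"crypto/rand"
	"crypto/sha256"
	"fmt"
)

// deriveKey: one block of SHA-256 output is exactly the 32 bytes
// AES-256 needs, so a single extract+expand round suffices here.
func deriveKey(clusterSecret, salt []byte, purpose string) []byte {
	ext := hmac.New(sha256.New, salt)
	ext.Write(clusterSecret)
	prk := ext.Sum(nil)
	exp := hmac.New(sha256.New, prk)
	exp.Write([]byte(purpose))
	exp.Write([]byte{0x01})
	return exp.Sum(nil)
}

// sealTURNSecret encrypts with AES-256-GCM, prepending the random nonce.
func sealTURNSecret(key, plaintext []byte) ([]byte, error) {
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	gcm, err := cipher.NewGCM(block)
	if err != nil {
		return nil, err
	}
	nonce := make([]byte, gcm.NonceSize())
	if _, err := rand.Read(nonce); err != nil {
		return nil, err
	}
	return gcm.Seal(nonce, nonce, plaintext, nil), nil
}

// openTURNSecret splits off the nonce and decrypts.
func openTURNSecret(key, sealed []byte) ([]byte, error) {
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	gcm, err := cipher.NewGCM(block)
	if err != nil {
		return nil, err
	}
	n := gcm.NonceSize()
	if len(sealed) < n {
		return nil, fmt.Errorf("ciphertext too short")
	}
	return gcm.Open(nil, sealed[:n], sealed[n:], nil)
}

func main() {
	key := deriveKey([]byte("cluster-secret"), nil, "turn-encryption")
	sealed, _ := sealTURNSecret(key, []byte("turn-shared-secret"))
	pt, err := openTURNSecret(key, sealed)
	fmt.Println(string(pt), err)
}
```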

TLS & Transport

InsecureSkipVerify Fix (Step 1.10)

  • During node join, TLS verification uses TOFU (Trust On First Use)
  • Invite token output includes the CA certificate fingerprint (SHA-256)
  • Joining node verifies the server cert fingerprint matches before proceeding
  • After join: CA cert stored locally for future connections

WebSocket Origin Validation (Step 1.4)

  • All WebSocket upgraders validate the Origin header against the node's configured domain
  • Non-browser clients (no Origin header) are still allowed
  • Prevents cross-site WebSocket hijacking attacks

Process Isolation

Dedicated User (Step 1.11)

  • All services run as the orama user (not root)
  • Caddy and CoreDNS get AmbientCapabilities=CAP_NET_BIND_SERVICE for ports 80/443 and 53
  • WireGuard stays as root (kernel netlink requires it)
  • vault-guardian already had proper hardening

systemd Hardening (Step 1.12)

  • All service units include:
    ProtectSystem=strict
    ProtectHome=yes
    NoNewPrivileges=yes
    PrivateDevices=yes
    ProtectKernelTunables=yes
    ProtectKernelModules=yes
    RestrictNamespaces=yes
    ReadWritePaths=/opt/orama/.orama
    
  • Applied to both template files (pkg/environments/templates/) and hardcoded unit generators (pkg/environments/production/services.go)

Supply Chain

Binary Signing (Step 1.13)

  • Build archives include manifest.sig — a rootwallet EVM signature of the manifest hash
  • During install, the signature is verified against the embedded Orama public key
  • Unsigned or tampered archives are rejected

Phase 2: OramaOS

These measures apply only to OramaOS nodes (mainnet, devnet, testnet).

Immutable OS

  • Read-only rootfs — SquashFS with dm-verity integrity verification
  • No shell — /bin/sh symlinked to /bin/false, no bash/ash/ssh
  • No SSH — OpenSSH not included in the image
  • Minimal packages — only what's needed for systemd, cryptsetup, and the agent

Full-Disk Encryption

  • LUKS2 with AES-XTS-Plain64 on the data partition
  • Shamir's Secret Sharing over GF(256) — LUKS key split across peer vault-guardians
  • Adaptive threshold — K = max(3, N/3) where N is the number of peers
  • Key zeroing — LUKS key wiped from memory immediately after use
  • Malicious share detection — fetch K+1 shares when possible, verify consistency
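The adaptive threshold above is a one-liner; a sketch assuming integer division (the real rounding rule may differ):

```go
package main

import "fmt"

// shareThreshold computes K = max(3, N/3): at least 3 shares are
// always required, and larger clusters demand proportionally more.
func shareThreshold(n int) int {
	k := n / 3
	if k < 3 {
		k = 3
	}
	return k
}

func main() {
	for _, n := range []int{4, 9, 12, 30} {
		fmt.Println(n, shareThreshold(n))
	}
}
```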

Service Sandboxing

Each service runs in isolated Linux namespaces:

  • CLONE_NEWNS — mount namespace (filesystem isolation)
  • CLONE_NEWUTS — hostname namespace
  • Dedicated UID/GID — each service has its own user
  • Seccomp filtering — per-service syscall allowlist

Note: CLONE_NEWPID is intentionally omitted — it makes services PID 1 in their namespace, which changes signal semantics (SIGTERM ignored by default for PID 1).

Signed Updates

  • A/B partition scheme with systemd-boot and boot counting (tries_left=3)
  • All updates signed with rootwallet EVM signature (secp256k1 + keccak256)
  • Signer address: 0xb5d8a496c8b2412990d7D467E17727fdF5954afC
  • P2P distribution over WireGuard between nodes
  • Automatic rollback on 3 consecutive boot failures

Zero Operator Access

  • Operators cannot read data on the machine (LUKS encrypted, no shell)
  • Management only through Gateway API → agent over WireGuard
  • All commands are logged and auditable
  • No root access, no console access, no file system access

Rollout Strategy

Phase 1 Batches

Batch 1 (zero-risk, no restart):
  - CIDR fix
  - IPv6 disable
  - Internal endpoint auth
  - WebSocket origin check

Batch 2 (medium-risk, restart needed):
  - Hash refresh tokens
  - Hash API keys
  - Binary signing
  - Vault V1 auth enforcement
  - TURN secret encryption

Batch 3 (high-risk, coordinated rollout):
  - RQLite auth (followers first, leader last)
  - Olric encryption (simultaneous restart)
  - IPFS Cluster TrustedPeers

Batch 4 (infrastructure changes):
  - InsecureSkipVerify fix
  - Dedicated user
  - systemd hardening

Phase 2

  1. Build and test OramaOS image in QEMU
  2. Deploy to sandbox cluster alongside Ubuntu nodes
  3. Verify interop and stability
  4. Gradual migration: testnet → devnet → mainnet (one node at a time, maintaining Raft quorum)

Verification

All changes verified on sandbox cluster before production deployment:

  • make test — all unit tests pass
  • orama monitor report --env sandbox — full cluster health
  • Manual endpoint testing (e.g., curl without auth → 401)
  • Security-specific checks (IPv6 listeners, RQLite auth, binary signatures)