Development Guide
Prerequisites
- Go 1.21+
- Node.js 18+ (for anyone-client in dev mode)
- macOS or Linux
Building
# Build all binaries
make build
# Outputs:
# bin/orama-node — the node binary
# bin/orama — the CLI
# bin/gateway — standalone gateway (optional)
# bin/identity — identity tool
# bin/rqlite-mcp — RQLite MCP server
Running Tests
make test
Deploying to VPS
Source is always deployed via SCP (no git on VPS). The CLI is the only binary cross-compiled locally; everything else is built from source on the VPS.
Deploy Workflow
# 1. Cross-compile the CLI for Linux
make build-linux
# 2. Generate a source archive (includes CLI binary + full source)
./scripts/generate-source-archive.sh
# Creates: /tmp/network-source.tar.gz
# 3. Install on a new VPS (handles SCP, extract, and remote install automatically)
./bin/orama install --vps-ip <ip> --nameserver --domain <domain> --base-domain <domain>
# Or upgrade an existing VPS
./bin/orama upgrade --restart
The orama install command automatically:
- Uploads the source archive via SCP
- Extracts source to /home/orama/src and installs the CLI to /usr/local/bin/orama
- Runs orama install on the VPS, which builds all binaries from source (Go, CoreDNS, Caddy, Olric, etc.)
Upgrading a Multi-Node Cluster (CRITICAL)
NEVER restart all nodes simultaneously. RQLite uses Raft consensus and requires a majority (quorum) to function. Restarting all nodes at once can cause cluster splits where nodes elect different leaders or form isolated clusters.
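The majority requirement is simple arithmetic: a Raft cluster with N voting nodes needs floor(N/2) + 1 of them reachable, and tolerates the remainder being down. A minimal sketch of that calculation follows; the function names are illustrative and not part of the orama CLI.

```shell
# Illustrative helpers (not part of the orama CLI): Raft majority arithmetic.

# Smallest number of voters that forms a majority of N voters.
quorum_size() {
  echo $(( $1 / 2 + 1 ))
}

# How many voters can be down while the cluster still has a quorum.
tolerated_failures() {
  echo $(( $1 - ($1 / 2 + 1) ))
}

# Example: a 6-node cluster needs 4 nodes up and survives 2 being down.
#   quorum_size 6          -> 4
#   tolerated_failures 6   -> 2
```

A rolling restart takes one node down at a time, which stays inside that budget for any cluster of three or more nodes; restarting everything at once does not.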
Safe Upgrade Procedure (Rolling Restart)
Always upgrade nodes one at a time, waiting for each to rejoin before proceeding:
# 1. Build CLI + generate archive
make build-linux
./scripts/generate-source-archive.sh
# Creates: /tmp/network-source.tar.gz
# 2. Upload to ONE node first (the "hub" node)
sshpass -p '<password>' scp /tmp/network-source.tar.gz ubuntu@<hub-ip>:/tmp/
# 3. Fan out from hub to all other nodes (server-to-server is faster)
ssh ubuntu@<hub-ip>
for ip in <ip2> <ip3> <ip4> <ip5> <ip6>; do
scp /tmp/network-source.tar.gz ubuntu@$ip:/tmp/
done
exit
# 4. Extract on ALL nodes (can be done in parallel, no restart yet)
for ip in <ip1> <ip2> <ip3> <ip4> <ip5> <ip6>; do
ssh ubuntu@$ip 'sudo bash -s' < scripts/extract-deploy.sh
done
# 5. Find the RQLite leader (upgrade this one LAST)
ssh ubuntu@<any-node> 'curl -s http://localhost:5001/status | jq -r .store.raft.state'
# 6. Upgrade FOLLOWER nodes one at a time
ssh ubuntu@<follower-ip> 'sudo orama prod stop && sudo orama upgrade --restart'
# Wait for rejoin before proceeding to next node
ssh ubuntu@<leader-ip> 'curl -s http://localhost:5001/status | jq -r .store.raft.num_peers'
# Should show expected number of peers (N-1)
# Repeat for each follower...
# 7. Upgrade the LEADER node last
ssh ubuntu@<leader-ip> 'sudo orama prod stop && sudo orama upgrade --restart'
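The "wait for rejoin" check in step 6 can be automated rather than run by hand. The sketch below polls the leader until it reports the expected peer count. The function name, retry count, and delay are hypothetical choices; the status URL and jq path are the ones used in the procedure above.

```shell
# Hypothetical helper (not part of the orama CLI): poll the leader until it
# reports the expected number of peers, or give up after a few attempts.
wait_for_peers() (
  leader_ip=$1        # node to query (the current leader)
  expected=$2         # N-1 for an N-node cluster
  attempts=${3:-30}   # polls before giving up (assumed default)
  delay=${4:-10}      # seconds between polls (assumed default)
  peers=""
  i=0
  while [ "$i" -lt "$attempts" ]; do
    # Same query as step 6; a failed ssh is treated as "not ready yet".
    peers=$(ssh "ubuntu@$leader_ip" \
      'curl -s http://localhost:5001/status | jq -r .store.raft.num_peers' \
      2>/dev/null || true)
    if [ "$peers" = "$expected" ]; then
      echo "cluster healthy: $peers peers"
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  echo "timed out waiting for $expected peers (last seen: ${peers:-none})" >&2
  return 1
)

# Usage between follower upgrades in a 6-node cluster:
#   wait_for_peers <leader-ip> 5 || exit 1
```

Stopping the loop on failure matters more than the exact timings: moving on to the next follower while the previous one is still rejoining is how quorum gets lost.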
What NOT to Do
- DON'T stop all nodes, replace binaries, then start all nodes
- DON'T run orama upgrade --restart on multiple nodes in parallel
- DON'T clear RQLite data directories unless doing a full cluster rebuild
- DON'T use systemctl stop orama-node on multiple nodes simultaneously
Recovery from Cluster Split
If nodes get stuck in "Candidate" state or show "leader not found" errors:
- Identify which node has the most recent data (usually the old leader)
- Keep that node running as the new leader
- On each other node, clear RQLite data and restart:
sudo orama prod stop
sudo rm -rf /home/orama/.orama/data/rqlite
sudo systemctl start orama-node
- The node should automatically rejoin using its configured rqlite_join_address
If automatic rejoin fails, the node may have started without the -join flag. Check:
ps aux | grep rqlited
# Should include: -join 10.0.0.1:7001 (or similar)
If -join is missing, the node bootstrapped standalone. You'll need to either:
- Restart orama-node (it should detect empty data and use join)
- Or do a full cluster rebuild from CLEAN_NODE.md
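The ps check above can be wrapped in a small filter so the result is unambiguous when checking several nodes. This is a sketch only: the function name and its three output labels are hypothetical, and it assumes the flag appears on the command line as -join followed by a space or an equals sign.

```shell
# Hypothetical helper (not part of the orama CLI). Reads `ps aux` output on
# stdin and reports how rqlited was started.
rqlite_join_status() {
  # Keep only rqlited process lines, dropping any "grep rqlited" line.
  procs=$(grep 'rqlited' | grep -v 'grep' || true)
  if [ -z "$procs" ]; then
    echo "not-running"
  elif printf '%s\n' "$procs" | grep -q -e ' -join[ =]'; then
    echo "joined"        # started with -join: should rejoin the cluster
  else
    echo "standalone"    # no -join: the node bootstrapped on its own
  fi
}

# Usage on a node:
#   ps aux | rqlite_join_status
```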
Deploying to Multiple Nodes
To deploy to all nodes, repeat step 3 of the Deploy Workflow (install or upgrade) for each VPS IP.
Important: When using --restart, do nodes one at a time (see "Upgrading a Multi-Node Cluster" above).
CLI Flags Reference
orama install
| Flag | Description |
|---|---|
| --vps-ip <ip> | VPS public IP address (required) |
| --domain <domain> | Domain for HTTPS certificates. Nameserver nodes use the base domain (e.g., example.com); non-nameserver nodes use a subdomain (e.g., node-4.example.com) |
| --base-domain <domain> | Base domain for deployment routing (e.g., example.com) |
| --nameserver | Configure this node as a nameserver (CoreDNS + Caddy) |
| --join <url> | Join existing cluster via HTTPS URL (e.g., https://node1.example.com) |
| --token <token> | Invite token for joining (from orama invite on existing node) |
| --force | Force reconfiguration even if already installed |
| --skip-firewall | Skip UFW firewall setup |
| --skip-checks | Skip minimum resource checks (RAM/CPU) |
| --anyone-relay | Install and configure an Anyone relay on this node |
| --anyone-migrate | Migrate existing Anyone relay installation (preserves keys/fingerprint) |
| --anyone-nickname <name> | Relay nickname (required for relay mode) |
| --anyone-wallet <addr> | Ethereum wallet for relay rewards (required for relay mode) |
| --anyone-contact <info> | Contact info for relay (required for relay mode) |
| --anyone-family <fps> | Comma-separated fingerprints of related relays (MyFamily) |
| --anyone-orport <port> | ORPort for relay (default: 9001) |
| --anyone-exit | Configure as an exit relay (default: non-exit) |
| --anyone-bandwidth <pct> | Limit relay to N% of VPS bandwidth (default: 30, 0=unlimited). Runs a speedtest during install to measure available bandwidth |
| --anyone-accounting <GB> | Monthly data cap for relay in GB (0=unlimited) |
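To make the --anyone-bandwidth percentage concrete, the sketch below works through the arithmetic for a given speedtest result. It is illustrative only: the function name is hypothetical, and how the installer actually derives the limit from its speedtest is an assumption.

```shell
# Illustrative arithmetic only; the installer's real calculation is assumed,
# not confirmed, to be a straight percentage of the measured bandwidth.
relay_bandwidth_limit() {
  measured_mbps=$1   # bandwidth reported by the speedtest, in Mbps
  pct=$2             # value passed to --anyone-bandwidth
  if [ "$pct" -eq 0 ]; then
    echo "unlimited"
  else
    echo "$(( measured_mbps * pct / 100 )) Mbps"
  fi
}

# Example: the default 30% on a VPS measured at 1000 Mbps.
#   relay_bandwidth_limit 1000 30   -> 300 Mbps
```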
orama invite
| Flag | Description |
|---|---|
| --expiry <duration> | Token expiry duration (default: 1h, e.g. --expiry 24h) |
Important notes about invite tokens:
- Tokens are single-use. Once a node consumes a token during the join handshake, it cannot be reused. Generate a separate token for each node you want to join.
- Expiry is checked in UTC. RQLite uses datetime('now'), which is always UTC. If your local timezone differs, account for the offset when choosing expiry durations.
- Use longer expiry for multi-node deployments. When deploying multiple nodes, use --expiry 24h to avoid tokens expiring mid-deployment.
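Because tokens are single-use, a multi-node deployment needs one invite per joining node. A minimal sketch of generating them in one pass follows. The function name is hypothetical, and it assumes orama invite prints the ready-to-run install command on stdout, as in the Node Join Flow example.

```shell
# Hypothetical helper: print one freshly generated invite per joining node.
# Run on an existing node where `orama invite` is available.
generate_invites() {
  for node_ip in "$@"; do
    # Record the generation time in UTC, the clock the expiry is checked against.
    echo "# invite for $node_ip (generated $(date -u +%Y-%m-%dT%H:%M:%SZ), valid 24h)"
    orama invite --expiry 24h
  done
}

# Usage:
#   generate_invites 5.6.7.8 5.6.7.9 > invites.txt
```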
orama upgrade
| Flag | Description |
|---|---|
| --restart | Restart all services after upgrade |
| --anyone-relay | Enable Anyone relay (same flags as install) |
| --anyone-bandwidth <pct> | Limit relay to N% of VPS bandwidth (default: 30, 0=unlimited) |
| --anyone-accounting <GB> | Monthly data cap for relay in GB (0=unlimited) |
orama prod (Service Management)
Use these commands to manage services on production nodes:
# Stop all services (orama-node, coredns, caddy)
sudo orama prod stop
# Start all services
sudo orama prod start
# Restart all services
sudo orama prod restart
# Check service status
sudo orama prod status
Note: Always use orama prod stop instead of manually running systemctl stop. The CLI ensures all related services (including CoreDNS and Caddy on nameserver nodes) are handled correctly.
Node Join Flow
# 1. Genesis node (first node, creates cluster)
# Nameserver nodes use the base domain as --domain
sudo orama install --vps-ip 1.2.3.4 --domain example.com \
--base-domain example.com --nameserver
# 2. On genesis node, generate an invite
orama invite
# Output: sudo orama install --join https://example.com --token <TOKEN> --vps-ip <IP>
# 3. On the new node, run the printed command
# Nameserver nodes use the base domain; non-nameserver nodes use subdomains (e.g., node-4.example.com)
sudo orama install --join https://example.com --token abc123... \
--vps-ip 5.6.7.8 --domain example.com --base-domain example.com --nameserver
The join flow establishes a WireGuard VPN tunnel before starting cluster services. All inter-node communication (RQLite, IPFS, Olric) uses WireGuard IPs (10.0.0.x). No cluster ports are ever exposed publicly.
DNS Prerequisite
The --join URL should use the HTTPS domain of the genesis node (e.g., https://node1.example.com).
For this to work, the domain registrar for example.com must have NS records pointing to the genesis
node's IP so that node1.example.com resolves publicly.
If DNS is not yet configured, you can use the genesis node's public IP with HTTP as a fallback:
sudo orama install --join http://1.2.3.4 --vps-ip 5.6.7.8 --token abc123... --nameserver
This works because Caddy's :80 block proxies all HTTP traffic to the gateway. However, once DNS
is properly configured, always use the HTTPS domain URL.
Important: Never use http://<ip>:6001 — port 6001 is the internal gateway and is blocked by
UFW from external access. The join request goes through Caddy on port 80 (HTTP) or 443 (HTTPS),
which proxies to the gateway internally.
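The choice between the HTTPS domain and the HTTP fallback can be scripted. The sketch below picks the --join URL based on whether the genesis domain currently resolves. The function name is hypothetical, and it assumes dig is installed; any resolver check would do. It only checks that the name resolves, not that it resolves to the genesis node.

```shell
# Hypothetical helper (not part of the orama CLI): choose the --join URL.
pick_join_url() (
  domain=$1       # genesis node domain, e.g. node1.example.com
  genesis_ip=$2   # genesis node public IP, used only as a fallback
  if [ -n "$(dig +short "$domain" 2>/dev/null || true)" ]; then
    echo "https://$domain"      # DNS resolves: go through Caddy on 443
  else
    echo "http://$genesis_ip"   # no DNS yet: Caddy's :80 block, never :6001
  fi
)

# Usage:
#   sudo orama install --join "$(pick_join_url node1.example.com 1.2.3.4)" ...
```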
Pre-Install Checklist
Before running orama install on a VPS, ensure:
- Stop Docker if running. Docker commonly binds ports 4001 and 8080, which conflict with IPFS. The installer checks for port conflicts and shows which process is using each port, but it's easier to stop Docker first:
sudo systemctl stop docker docker.socket
sudo systemctl disable docker docker.socket
- Stop any existing IPFS instance.
sudo systemctl stop ipfs
- Stop any service on port 53 (for nameserver nodes). The installer handles systemd-resolved automatically, but other DNS services (like bind9 or dnsmasq) must be stopped manually.
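The ports named in this checklist can be checked by hand before running the installer. The sketch below filters the output of ss -ltn for a list of ports. The function name is hypothetical, it covers TCP listeners only (a DNS server bound to UDP 53 alone would not show up), and it does not replace the installer's own conflict check.

```shell
# Hypothetical helper: read `ss -ltn` output on stdin and print each of the
# given ports that already has a TCP listener.
busy_ports() {
  listing=$(cat)
  for port in "$@"; do
    # Column 4 of `ss -ltn` is "Local Address:Port", e.g. 0.0.0.0:4001 or [::]:53.
    if printf '%s\n' "$listing" |
        awk -v p="$port" '$4 ~ (":" p "$") { found = 1 } END { exit !found }'; then
      echo "$port"
    fi
  done
}

# Usage on the VPS (4001 and 8080 for IPFS, 53 for nameserver nodes):
#   ss -ltn | busy_ports 4001 8080 53
```

No output means none of the listed ports has a TCP listener.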
Recovering from Failed Joins
If a node partially joins the cluster (registers in RQLite's Raft but then fails or gets cleaned), the remaining cluster can lose quorum permanently. This happens because RQLite thinks there are N voters but only N-1 are reachable.
Symptoms: RQLite stuck in "Candidate" state, no leader elected, all writes fail.
Solution: Do a full clean reinstall of all affected nodes. Use CLEAN_NODE.md to reset each node, then reinstall starting from the genesis node.
Prevention: Always ensure a joining node can complete the full installation before it joins. The installer validates port availability upfront to catch conflicts early.
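Whether one missing voter is fatal depends on cluster size, and the most exposed case is a second node joining a single genesis node. The sketch below makes the check explicit using the standard Raft majority rule; the function name is hypothetical.

```shell
# Hypothetical check (not part of the orama CLI): can a leader be elected?
has_quorum() {
  voters=$1      # nodes registered in the Raft configuration
  reachable=$2   # voters that are actually up and reachable
  [ "$reachable" -ge $(( voters / 2 + 1 )) ]
}

# A genesis node plus one half-joined node: 2 voters, 1 reachable.
#   has_quorum 2 1   -> fails (needs 2 of 2), so no leader and no writes
# A healthy 3-node cluster with one failed joiner removed from service:
#   has_quorum 3 2   -> succeeds (needs 2 of 3)
```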
Debugging Production Issues
Always follow the local-first approach:
- Reproduce locally: set up the same conditions on your machine
- Find the root cause: understand why it's happening
- Fix in the codebase: make changes to the source code
- Test locally: run make test and verify
- Deploy: only then deploy the fix to production
Never fix issues directly on the server — those fixes are lost on next deployment.
Trusting the Self-Signed TLS Certificate
When Let's Encrypt is rate-limited, Caddy falls back to its internal CA (self-signed certificates). Browsers will show security warnings unless you install the root CA certificate.
Downloading the Root CA Certificate
From VPS 1 (or any node), copy the certificate:
# Copy the cert to an accessible location on the VPS
ssh ubuntu@<VPS_IP> "sudo cp /var/lib/caddy/.local/share/caddy/pki/authorities/local/root.crt /tmp/caddy-root-ca.crt && sudo chmod 644 /tmp/caddy-root-ca.crt"
# Download to your local machine
scp ubuntu@<VPS_IP>:/tmp/caddy-root-ca.crt ~/Downloads/caddy-root-ca.crt
macOS
sudo security add-trusted-cert -d -r trustRoot -k /Library/Keychains/System.keychain ~/Downloads/caddy-root-ca.crt
This adds the cert system-wide. All browsers (Safari, Chrome, Arc, etc.) will trust it immediately. Firefox uses its own certificate store — go to Settings > Privacy & Security > Certificates > View Certificates > Import and import the .crt file there.
To remove it later:
sudo security remove-trusted-cert -d ~/Downloads/caddy-root-ca.crt
iOS (iPhone/iPad)
- Transfer caddy-root-ca.crt to your device (AirDrop, email attachment, or host it on a URL)
- Open the file; iOS will show "Profile Downloaded"
- Go to Settings > General > VPN & Device Management (or "Profiles" on older iOS)
- Tap the "Caddy Local Authority" profile and tap Install
- Go to Settings > General > About > Certificate Trust Settings
- Enable full trust for "Caddy Local Authority - 2026 ECC Root"
Android
- Transfer caddy-root-ca.crt to your device
- Go to Settings > Security > Encryption & Credentials > Install a certificate > CA certificate
- Select the caddy-root-ca.crt file
- Confirm the installation
Note: On Android 7+, user-installed CA certificates are only trusted by apps that explicitly opt in. Chrome will trust it, but some apps may not.
Windows
certutil -addstore -f "ROOT" caddy-root-ca.crt
Or double-click the .crt file > Install Certificate > Local Machine > Place in "Trusted Root Certification Authorities".
Linux
sudo cp caddy-root-ca.crt /usr/local/share/ca-certificates/caddy-root-ca.crt
sudo update-ca-certificates
Project Structure
See ARCHITECTURE.md for the full architecture overview.
Key directories:
cmd/
  cli/ - CLI entry point (orama command)
  node/ - Node entry point (orama-node)
  gateway/ - Standalone gateway entry point
pkg/
  cli/ - CLI command implementations
  gateway/ - HTTP gateway, routes, middleware
  deployments/ - Deployment types, service, storage
  environments/ - Production (systemd) and development (direct) modes
  rqlite/ - Distributed SQLite via RQLite