- Add `ntfyFanoutResolver` to distribute push notifications across all active cluster nodes, ensuring delivery when nodes lack shared state.
- Refactor secrets encryption key derivation to use cluster-wide secrets via HKDF, replacing ephemeral per-node keys to fix cross-node decryption issues.
- Add unit tests for fan-out resolution logic and caching behavior.
Root cause of the recurring "turn.credentials → namespace_not_configured" on a
distant node: at converge the gateway resolves its TURN secret from the
namespace rqlite, and on a slow/just-restarted node that read fails ONCE, so
the gateway is written with TURN disabled. Removing the node is not a fix — the
software must tolerate a slow read.
Two-part fix (complements e7ed718's "don't blank a warm config"):
- RETRY the secret read (5×2s) at converge so a node whose rqlite is still
syncing waits for it to land instead of writing an empty block once. A
genuine decrypt failure still exhausts the retries → unresolved → the
running config is preserved.
- CACHE the resolved secret into the node's own cluster-state.json
(applyResolvedWebRTCToState), so the NEXT cold start reads it from disk —
chooseRestoreWebRTC is state-first and short-circuits before the DB. The
state struct already had TURNSharedSecret "for cold start" but nothing
populated it; now it's filled on every successful resolve (only rewritten
on change). Each node self-heals its own cache; nothing new is sent
cross-node.
cluster-state.json now carries the TURN secret, so both writers (local
saveLocalState and the remote SaveClusterState) are tightened to 0600 + chmod.
Stale-secret self-heals: disable/enable webrtc re-pushes every node's config
and the next converge re-caches the new value.
Dual-reviewed: code-quality APPROVED; security SECURE after the remote-write
0600 fix. Tests: cache populate + short-circuit, no-change, turn-only node.
A freshly-joined namespace node could come up with TURN silently disabled
(turn.credentials -> namespace_not_configured) when GetWebRTCConfig errored:
the stored TURN shared secret was encrypted with a pre-rotation
cluster-secret-derived key and failed to decrypt, and the converge swallowed
that error into "WebRTC disabled", writing a TURN-disabled gateway config.
Distinguish "resolved but not enabled" (genuinely disabled, fine to write the
empty block) from "unresolved" (DB/decrypt error). chooseRestoreWebRTC's
dbFetch callback now returns a `resolved` bool; an unresolved lookup forces
enabled=false AND sets restoreWebRTC.unresolved. The converge then:
- logs the decrypt/read error loudly with the exact remediation
(`orama namespace disable webrtc` then `enable webrtc`) instead of
swallowing it;
- on the warm path, SKIPS ReconcileGateway so it doesn't rewrite a running
gateway's working WebRTC block to empty (preserves TURN);
- on the cold path, still spawns the gateway (the namespace needs one) but
warns loudly that TURN is degraded until the secret is regenerated.
Healthy nodes are unaffected: a node whose state file holds the secret
short-circuits before dbFetch, so a flaky/rotated DB cannot disable it.
Dual-reviewed (code-quality APPROVED, security SECURE — no secret material is
logged; operator disable still resolves to disabled, not unresolved).
Pure-function coverage in restore_webrtc_test.go: unresolved marking,
resolved-empty-is-disabled, and state-secret-wins-over-db-error.
- Add `turn_stealth_domain` to gateway config for stealth TURN support
- Introduce `turn_discovery` in `sni-router` to auto-discover per-namespace routes
- Add database migration to enable stealth TURN per namespace
- Document ephemeral state API in `SERVERLESS.md`
- add `secrets_encryption_key` to gateway config for serverless secrets
- implement durable TURN secret persistence to prevent config regen outages
- add regression test for gateway config loading and field mapping
- split webrtc route gating into `webrtcServeTURNCredentials` and `webrtcServeSFURoutes` to allow non-SFU gateways to mint TURN credentials
- update `chooseRestoreWebRTC` to correctly resolve configurations for nodes without local SFU ports
- add unit tests to verify independent route registration logic (bugboard #25)
- Add logic to reconcile gateway configuration drift for running instances
- Prevent unnecessary restart loops by verifying on-disk config state
- Add unit tests to validate synchronization logic and prevent regressions
- Add internal WebRTC management endpoints to public path exemption list
- Implement DB fallback for WebRTC configuration during cluster restore
- Add unit tests to verify WebRTC config precedence and state self-healing
- Add `namespace_push_config` table for per-namespace provider settings
- Introduce `cluster_secret_path` to enable deterministic JWT signing and
AES-256-GCM encryption for push credentials
- Update gateway config to support per-namespace overrides of push
notification providers (ntfy/Expo)
- Bump version to 0.122.3
- Replace CLI-based rootwallet calls with agent-based communication
- Update production provisioner to support sudo-based service management
- Add API key-to-wallet resolution for gateway operator handlers