Compare commits

...

560 Commits

Author SHA1 Message Date
anonpenguin
8b8f0a4251
Merge pull request #93 from DeBrosDAO/nightly
release: 0.122.47 — nightly → main
2026-06-11 17:37:20 +03:00
anonpenguin23
8472861ed3 merge main into nightly — keep nightly (staging) on all conflicts
Reconciles the divergence created by the May-13 nightly history rewrite
(main absorbed pre-rewrite SHAs via PRs #90-92). Content conflicts all
resolve in nightly's favor; nightly is the deployed, verified branch
(v0.122.47 live on devnet).
2026-06-11 17:32:37 +03:00
anonpenguin23
cd8c717363 chore(version): bump to 0.122.47
- refactor(turn): extract decodeTURNConfig for testability
- feat(turn): add stealth domain fields to config
- fix(apns): nest custom data under "body" for expo-notifications compatibility
2026-06-11 11:45:12 +03:00
anonpenguin23
f4c58db710 release: 0.122.46 2026-06-11 10:06:19 +03:00
anonpenguin23
8375d92109 feat(namespace): reuse caddy wildcard certificate for stealth turns
- Implement `resolveStealthCert` to use existing `*.<baseDomain>` wildcard certificates instead of dynamic Caddyfile provisioning.
- Avoids EROFS errors caused by `ProtectSystem=strict` on the orama-node service.
- Add strict validation to ensure stealth hosts are single-label subdomains covered by the wildcard.
2026-06-11 10:04:45 +03:00
anonpenguin23
37daf28b5a release: 0.122.45 2026-06-11 08:00:31 +03:00
anonpenguin23
b425f80efb fix(config): add sni_router to root Config — prevents feat-124 boot crash
b9d5f54 (stealth TURN discovery) emits a top-level `sni_router:` block
into node.yaml unconditionally, but only added a lenient ad-hoc parse
in the carry-forward logic — not the field on config.Config that
orama-node strict-decodes (KnownFields(true)) at boot. Identical
failure mode to the v0.122.42 secrets_encryption_key incident: the
unknown key fails the whole node.yaml parse and orama-node crash-loops.

Caught pre-deploy this time by the strict-decode gate check; devnet
never saw it. Regression test added alongside the v0.122.42 one in
decode_test.go.
2026-06-11 08:00:31 +03:00
anonpenguin23
b9d5f542e1 feat(gateway): implement stealth TURN discovery and configuration
- Add `turn_stealth_domain` to gateway config for stealth TURN support
- Introduce `turn_discovery` in `sni-router` to auto-discover per-namespace routes
- Add database migration to enable stealth TURN per namespace
- Document ephemeral state API in `SERVERLESS.md`
2026-06-11 07:04:50 +03:00
anonpenguin23
f192cd0b84 release: 0.122.44 2026-06-10 12:13:25 +03:00
anonpenguin23
ff3e273da8 feat(gateway): implement persistent secrets and webrtc configuration
- add `secrets_encryption_key` to gateway config for serverless secrets
- implement durable TURN secret persistence to prevent config regen outages
- add regression test for gateway config loading and field mapping
2026-06-10 12:10:52 +03:00
anonpenguin23
4c631243b3 release: 0.122.43 2026-06-09 15:57:33 +03:00
anonpenguin23
e685c864fc fix(config): add secrets_encryption_key to HTTPGatewayConfig — fixes orama-node boot crash
v0.122.42 (f412425, secrets encryption) shipped the template emission,
the per-cluster secret generator, and the gateway.Config consumer — but
NOT the parse field on config.HTTPGatewayConfig. Phase 4 writes
`secrets_encryption_key` into node.yaml under the http_gateway section,
and pkg/config/yaml.go decodes with KnownFields(true) (strict). The
unknown field made every node.yaml parse fail, so orama-node exited 1
on every start and systemd crash-looped it (restart counter hit 380+ on
the first upgraded devnet node before the rolling controller halted).

Root cause: a generated-config field with no matching struct field under
strict unmarshal. Fix is the missing field. The runtime key itself is
still consumed from ~/.orama/secrets/secrets-encryption-key (pkg/node/
gateway.go), which already worked — so this one-field addition fully
restores boot AND the feature.

The standalone gateway (cmd/gateway/config.go) uses lenient parsing and
was unaffected.

Regression test in pkg/config/decode_test.go decodes a node.yaml
carrying secrets_encryption_key under strict mode.
2026-06-09 15:57:32 +03:00
anonpenguin23
b6b518e005 release: 0.122.42 2026-06-09 13:01:38 +03:00
anonpenguin23
f41242538e feat(serverless): add raw http response mode and secrets encryption
- Add `raw_http_response` configuration to functions to allow verbatim HTTP responses
- Implement cluster-wide secrets encryption key generation and distribution for serverless functions
- Update documentation with UnifiedPush support for ntfy on Android/GrapheneOS
2026-06-09 13:01:02 +03:00
anonpenguin23
aa04ab5f50 release: 0.122.41 2026-06-09 09:24:59 +03:00
anonpenguin23
f8de4af704 feat(sni-router): implement hot-reloading for route configuration
- Add `FileRouteReloader` to watch and atomically update routes from disk
- Refactor `main` to support seamless configuration updates without restarts
- Ensure existing routes are preserved if a reload encounters an error
2026-06-09 09:23:54 +03:00
anonpenguin23
32f7b3824e release: 0.122.40 2026-06-04 10:08:59 +03:00
anonpenguin23
eade6e1742 feat(pubsub): remove mesh formation wait and add publish rate limiting
- Remove the 2-second polling wait for gossipsub mesh formation in `Publish`
  to eliminate unnecessary latency, relying on `FloodPublish` for delivery.
- Introduce a per-invocation publish budget (1000 messages) to prevent
  potential flooding of the shared gossipsub router by WASM functions.
- Add regression tests to ensure `Publish` remains non-blocking and that
  the publish budget is strictly enforced.
2026-06-04 10:08:10 +03:00
anonpenguin23
f3875d5157 release: 0.122.39 2026-06-02 15:06:48 +03:00
anonpenguin23
9373c2ad92 feat(rqlite,serverless): add local read consistency and async invocation
- Introduce `BatchQueryConsistency` with `ReadConsistencyNone` to allow
  local SQLite reads, bypassing leader round-trips for performance.
- Add `function_invoke_async` host function to support non-blocking
  fire-and-forget function execution.
2026-06-01 19:59:30 +03:00
anonpenguin23
b2a3bff88c release: 0.122.38 2026-06-01 10:13:15 +03:00
anonpenguin23
ca4ccbfcd4 feat(gateway): decouple turn credentials and sfu route registration
- split webrtc route gating into `webrtcServeTURNCredentials` and `webrtcServeSFURoutes` to allow non-SFU gateways to mint TURN credentials
- update `chooseRestoreWebRTC` to correctly resolve configurations for nodes without local SFU ports
- add unit tests to verify independent route registration logic (bugboard #25)
2026-06-01 10:12:07 +03:00
anonpenguin23
a3cf8384e9 release: 0.122.37 2026-05-30 19:27:25 +03:00
anonpenguin23
bf0d5f9f9f feat(namespace): implement warm reconciliation for gateway webrtc config
- Add logic to reconcile gateway configuration drift for running instances
- Prevent unnecessary restart loops by verifying on-disk config state
- Add unit tests to validate synchronization logic and prevent regressions
2026-05-30 19:26:26 +03:00
anonpenguin23
3987ad0cf3 release: 0.122.36 2026-05-30 14:41:51 +03:00
anonpenguin23
4fc975216f feat(gateway): fix WebRTC config persistence and endpoint access
- Add internal WebRTC management endpoints to public path exemption list
- Implement DB fallback for WebRTC configuration during cluster restore
- Add unit tests to verify WebRTC config precedence and state self-healing
2026-05-30 14:39:39 +03:00
anonpenguin23
9bace7bbf4 release: 0.122.35 2026-05-29 12:09:12 +03:00
anonpenguin23
325a2471c7 Changes 2026-05-29 11:46:20 +03:00
anonpenguin23
0d352d0b42 release: 0.122.34 2026-05-28 09:55:42 +03:00
anonpenguin23
cfff08d91e feat(serverless): add turn_credentials host function and slow invocation diagnostics
- Implement `turn_credentials` host function to provide TURN configuration to WASM modules.
- Add structured logging for slow serverless invocations exceeding 5s, providing per-phase timing (rate-limit, module-load, execution) to identify performance bottlenecks.
- Enhance WebSocket handler logging to capture request context when 30s timeouts occur.
2026-05-28 09:54:24 +03:00
anonpenguin23
8fbc4485c1 fix(serverless): enable system clocks for wasm modules
- opt into `WithSysWalltime` and `WithSysNanotime` to prevent wazero from using a frozen sentinel clock
- add regression tests to verify real-time clock behavior in wasm execution
- ensure serverless functions receive accurate timestamps for audit and cursor logic
2026-05-26 10:53:07 +03:00
anonpenguin23
1399b22676 release: 0.122.33 2026-05-25 10:25:51 +03:00
anonpenguin23
1faf04e2a3 feat(cli): add function enable/disable and fix upgrade re-exec
- Add `enable` and `disable` commands to manage function status
- Implement process re-exec in the upgrade orchestrator to ensure
  Phase 4 config generation uses the newly-installed binary version
  (fixes bugboard #15)
2026-05-25 10:25:04 +03:00
anonpenguin23
bc2c25ff16 release: 0.122.32 2026-05-25 09:35:16 +03:00
anonpenguin23
b2d35bbde1 feat(gateway): enable local wildcard triggers for pubsub
- wire PubSubDispatcher to host functions to support local wildcard
  triggers for WASM-published topics
- implement batch deduplication by topic to prevent redundant trigger
  invocations and bound fan-out
- propagate trigger depth through function invocations to maintain
  recursion limits during local dispatch
2026-05-25 09:34:01 +03:00
anonpenguin23
0463f37c0d release: 0.122.31 2026-05-24 20:57:52 +03:00
anonpenguin23
98dad46a81 fix(gateway): decouple webrtc route registration from legacy flag
- Update route registration logic to rely solely on SFUPort > 0, resolving a silent 404 issue where gateways with valid SFU configurations were incorrectly disabled.
- Retain WebRTCEnabled in config for backward compatibility with existing operator YAML and request schemas.
- Add unit tests to pin registration behavior and prevent future regressions.
2026-05-24 20:56:08 +03:00
anonpenguin23
877563b86f release: 0.122.30 2026-05-24 19:39:39 +03:00
anonpenguin23
62e4d1963b feat(gateway): add apns_voip provider support
- register "apns_voip" provider to handle PushKit/CallKit signals
- implement target provider filtering in dispatcher to prevent cross-talk
  between alert and VoIP push paths
- add comprehensive tests to ensure backward compatibility for fan-out
  and correct filtering behavior
2026-05-24 19:38:38 +03:00
anonpenguin23
57eb9f4f66 release: 0.122.29 2026-05-23 12:49:53 +03:00
anonpenguin23
ccbcea0f3f fix(serverless): prevent invocation context race condition
- Attach InvocationContext to the execution context in Engine.Execute to
  ensure host functions resolve identity from the request context.
- Fixes a race condition where concurrent stateless invocations would
  overwrite the global singleton, causing cross-tenant leaks or nil
  namespace errors.
- Added a regression test to verify per-invocation isolation under load.
2026-05-23 12:48:45 +03:00
anonpenguin23
370c48d575 release: 0.122.28 2026-05-21 15:53:58 +03:00
anonpenguin23
e2bc9577ff feat(serverless): isolate invocation logs and enforce cron poll interval
- Fix log cross-contamination by introducing per-invocation LogBuffers
  (bugboard #108)
- Enforce a 100ms minimum for CronPollInterval to prevent scheduler
  starvation (bugboard #109)
- Add comprehensive validation tests for cron interval constraints
2026-05-21 15:52:46 +03:00
anonpenguin23
3b8139802c feat: APNs silent-drop guard + persistent-WS mid-session JWT refresh
#348 - APNs silent-drop guard
Apple's APNs silently returns HTTP 200 for pushes with no visible
content (no title, no body, no badge, no sound, no
content-available=1) and then drops them — which looked to the WASM
caller like a successful delivery. Now rejected up-front with the new
push.ErrEmptyContent sentinel, and the APNs provider returns the
structured push.PushError shape (HTTPStatus, Reason, Unregistered,
Wrapped) so the dispatcher can branch on Unregistered to remove dead
tokens automatically. Legacy ErrDeviceUnregistered sentinel is
preserved for errors.Is compatibility (wrapped inside PushError).

Always logs APNs HTTP response (status, reason, apns_id, token prefix)
so future silent-drop classes show up in operator logs.

content-available is also now correctly mapped from snake_case
Data["content_available"] (any truthy variant) into Apple's
canonical "content-available": 1 inside the aps dictionary.

#321 - mid-session JWT refresh on persistent WS
Long-lived persistent WS connections used to have to close+reconnect
when the JWT rolled — losing per-instance state, message queues, and
subscriptions. The handler now accepts an "auth.refresh" control
frame: client sends the new token, the gateway re-verifies it via
the new JWTVerifier interface, updates the per-instance invCtx
in-place (persistent.Instance.UpdateInvCtx), and acks. No close, no
state loss.

JWTVerifier is optional — handlers set it via SetJWTVerifier at
gateway init. When unwired the handler nack's with a "not supported
on this gateway" response and clients fall back to the old
close+reconnect path, so older deploys don't break.

Other:
- push/dispatcher.go: SendToUserDetailed returns per-device PushError
  shape so callers can act on Unregistered / HTTPStatus / Reason.
- serverless/hostfunctions/push.go: WASM host functions for the new
  detailed-error shape.
- serverless/persistent/instance.go: UpdateInvCtx mid-session.

Tests:
- ws_persistent_control_test.go: auth.refresh ack/nack paths.
- apns_test.go: empty-content rejection, PushError shape on 410 +
  generic non-200, content-available mapping.
- dispatcher_detailed_test.go: SendToUserDetailed result shape.
- instance_update_invctx_test.go: invCtx update is per-instance, not
  cross-tenant.

VERSION bumped to 0.122.27.
2026-05-19 18:19:21 +03:00
anonpenguin23
94f10c66c5 release: 0.122.26 2026-05-18 10:42:50 +03:00
anonpenguin23
ebc9d51167 feat(gateway): implement pubsub dispatcher and batch query support
- Integrate PubSubDispatcher to enable libp2p subscription for trigger patterns
- Add BatchQuery to rqlite client to reduce round-trips for multi-query operations
- Implement lifecycle management for dispatcher and add safety limits for batch queries
2026-05-17 16:27:05 +03:00
anonpenguin23
17b06d38e4 fix(gateway,serverless): libp2p mesh peer-port + system-trigger auth bypass
Two serious bugs found via cross-node behavior observation:

1. libp2p peer-discovery published wrong port
   PeerDiscovery's multiaddr was using the gateway's HTTP API port (e.g.
   10004), not the actual libp2p TCP port. Remote gateways dialed that
   port, hit the HTTP server, received 400, and failed the libp2p
   multistream handshake ("message did not have trailing newline").
   Result: cluster-wide cross-node libp2p mesh had 0 connected peers
   and cross-node pubsub silently dropped 100% of messages.

   The libp2p port is OS-assigned at startup (client.go uses
   /ip4/0.0.0.0/tcp/0). It's not anywhere in cfg — it's only on
   host.Addrs(). Fix: drop the listenPort field from PeerDiscovery
   entirely and derive the port live from host.Addrs() via
   extractLibp2pTCPPort. WG IP still comes from getWireGuardIP
   (libp2p filters its own enumeration so WG IPs don't appear in
   host.Addrs(), but the listener is bound 0.0.0.0 so the port is
   reachable on the WG interface).

2. System triggers silently blocked by CanInvoke (#264)
   Cron, pubsub, database, timer, and job triggers all fire from
   gateway-internal state with no caller identity. Invoke() ran every
   request through CanInvoke(callerWallet) which returned false for
   the empty wallet — every fire returned ErrUnauthorized. Reported as
   a cron firing every minute with "unauthorized" for 19+ hours.

   Auth boundary for system triggers belongs at REGISTRATION time
   (POST /v1/functions/{name}/triggers, deploy-time auto-register
   from function.yaml). Skip the per-invocation check for system
   trigger types; user-driven triggers (HTTP, WebSocket) still gate
   on caller identity as before.

Tests:
- gateway/peer_discovery_test.go covers extractLibp2pTCPPort.
- serverless/invoke_system_trigger_test.go covers the bypass and the
  user-trigger gate.

VERSION bumped to 0.122.25.
2026-05-16 15:43:18 +03:00
anonpenguin23
251630a5c7 fix(serverless): per-call invCtx propagation prevents cross-tenant identity leak in persistent WS
HostFunctions is a process-wide singleton (one per gateway engine).
Its `invCtx` field is shared across all WASM instances. For STATELESS
execution the executor sets/clears it per-call but the lock is
released before WASM runs — two concurrent invocations can race on
the field and one's host call can read the other's identity. Window
is microseconds.

For PERSISTENT WS the bug was much worse: invCtx used to be bound
ONCE at instantiation and reused for the connection's lifetime. Two
simultaneous persistent WS connections from different namespaces /
wallets overwrote each other's invCtx, and EVERY subsequent
function_invoke / GetCallerJWTSubject / GetCallerWallet / GetSecret
call from inside the WASM read whatever was bound LAST. Result:
silent identity leak across tenants for as long as the connections
overlapped.

Fix: per-call invCtx propagation through Go's context.Context.
wazero passes the ctx given to api.Function.Call through to host
function callbacks, so every WASM-host hop carries its own invCtx.

- pkg/serverless/invocation_context.go (new): WithInvocationContext +
  InvocationContextFromCtx helpers using an unexported invCtxKey.
- pkg/serverless/hostfunctions/invocation_context.go (new):
  currentInvocationContext(ctx) — ctx-attached invCtx wins over the
  singleton field.
- All host accessors (FunctionInvoke, GetEnv, GetSecret, GetRequestID,
  GetCallerWallet, GetWSClientID, GetCallerClaim, GetCallerJWTSubject)
  now route through currentInvocationContext(ctx).
- pkg/serverless/persistent/instance.go: every export call's ctx is
  wrapped with the per-instance invCtx before being passed to wazero.
- pkg/gateway/handlers/serverless/ws_persistent_handler.go: invCtx is
  built per-frame and attached to ctx, not stored on a shared field.
- pkg/serverless/engine.go: removed the SetInvocationContext call at
  InstantiatePersistent (no longer needed; ctx carries it).

Stateless still uses the singleton field — its race is latent since
the host-functions split and migrating it is a separate scoped
change.

Tests:
- hostfunctions/invocation_context_test.go covers ctx-wins-over-singleton.
- gateway/handlers/serverless/ws_persistent_handler_test.go covers the
  per-frame ctx wiring.
- cli/functions/build_test.go is new coverage for the build path
  touched in this change.

VERSION bumped to 0.122.24.
2026-05-15 13:36:35 +03:00
anonpenguin23
80b466af68 fix(serverless): override WASI proc_exit so command-mode persistent WS stays alive
The previous fix (v0.122.22) made `InstantiatePersistent` call `_start`
to bootstrap TinyGo's runtime, then catch the resulting ExitError(0).
That got past init, but the module STILL died — wazero's stock
`proc_exit` implementation calls `mod.CloseWithExitCode(exitCode)`
before panicking, which invalidates the module regardless of what
the caller does with the panic. Every subsequent call to ws_open /
ws_frame / ws_close / orama_alloc returned ExitError(0) ("module
already closed").

Wazero exposes no flag for this — the close is hard-coded. The only
intercept point is to override `proc_exit` at the WASI host-module
boundary. Documented pattern at imports/wasi_snapshot_preview1/wasi.go
lines 111-127.

Fix: build the WASI host module manually so we can override
`proc_exit`:

  - exit code 0 → panic ExitError(0) BUT do NOT close the module.
    This is TinyGo's "_start completed cleanly" signal; the module's
    other exports must stay callable for the persistent lifecycle.
  - exit code != 0 → preserve standard WASI behavior (close + panic).
    A non-zero exit is a genuine app-signaled failure; we want
    `proc_exit(N != 0)` to behave exactly as upstream does.

The InstantiatePersistent caller already distinguishes the two cases
via errors.As + ExitCode() check — added in v0.122.22, no change here.

Safe for stateless functions on the same runtime: the stateless
execution path closes its own module after each invocation, so the
"module stays alive on exit 0" override has no effect on that path.

VERSION bumped to 0.122.23.
2026-05-15 11:56:29 +03:00
anonpenguin23
6a0043a244 fix(serverless): bootstrap TinyGo runtime in persistent WS instances (#240/#249)
InstantiatePersistent passed WithStartFunctions() with no args,
explicitly disabling both wasi entry points. The intent was to skip
main(); the side effect was leaving the TinyGo runtime
uninitialized. The first call to any export traps via
wasmExportCheckRun and managed-memory ops panic. Every persistent WS
function was effectively dead since plan #06 landed.

Earlier patch in this thread restored the call but only handled
wasi-reactor builds (_initialize). AnChat's rpc-router is a wasi
command build (`_start` export only, no `_initialize`) — wasm-objdump
confirms — so the reactor-only fix still left it broken.

This fix tries `_initialize` first, falls back to `_start`, and
bounds whichever runs with a 5s timeout so a buggy main() can't hang
instantiation forever. Logs the chosen hook at Debug, warns when
neither is exported.

Still pass WithStartFunctions() (no args) so wazero doesn't
auto-call `_start` during InstantiateModule — we want full control
over which hook runs and the timeout that bounds it.

VERSION bumped to 0.122.22.
2026-05-15 10:40:27 +03:00
anonpenguin23
62a8fbf2df fix(serverless): registry read paths now load WS persistent metadata (#240/#249)
Register() writes the four ws_* columns (ws_persistent,
ws_idle_timeout_sec, ws_max_frame_bytes, ws_max_inflight_per_conn) to
the functions table, but every read path — Get, List, GetByID,
GetByNameInternal — silently dropped them from the SELECT. functionRow
had no fields for them either. Result: fn.WSPersistent was always the
zero value (false) at runtime, no matter what the DB row said. Every
WS function ran in per-frame stateless mode regardless of its
`ws_persistent: true` config.

AnChat's rpc-router was the canary: it relies on per-connection
instance state (request_id ↔ reply correlation, subscription
bookkeeping) that the stateless model destroys every frame. The
gateway telemetry envelope still reached the client
({request_id, status, duration_ms}) so the failure looked like
"function works, frames don't" — every RPC timed out at 15 s.

Fix: include the four columns in every SELECT, add the matching
functionRow fields, and copy them into Function in rowToFunction.
No schema change (columns have been in migration 011 from the start).

Regression tests in registry_ws_columns_test.go cover the Get / List
paths against an in-memory SQLite that mirrors the production DDL.

VERSION bumped to 0.122.21.
2026-05-15 09:01:42 +03:00
anonpenguin23
a0a1decd06 fix(ws): prefer X-Forwarded-Host in Origin check — root cause #240/#249
handleNamespaceGatewayRequest rewrites r.Host to the backend target
IP:port (e.g. "10.0.0.6:10004") before forwarding. The original
public host (e.g. "ns-anchat-test.orama-devnet.network") is preserved
in X-Forwarded-Host. checkWSOrigin in both pubsub/ws_client.go and
serverless/ws_handler.go was comparing the client's Origin against
the proxied r.Host only — so every browser / RN-iOS WS upgrade was
rejected 403 because their Origin's public hostname can never match
10.0.0.6.

curl probes don't send Origin, so curl returned true unconditionally
and the bug was invisible to operator smoke tests. AnChat's iPhone
WS clients hit `code=1006 reason="Received bad response code from
server: 403"` for ~24h.

Fix: prefer X-Forwarded-Host (the original public host) when present,
fall back to r.Host for direct (non-proxied) connections. Applied
identically to both WS handlers. Regression test in
serverless/ws_origin_test.go covers the proxy-hop case, no-Origin
case, and direct-connection case.

This is the real fix; v0.122.19 only closed a separate silent-forward
auth hole that produced opaque 401s on a different code path.

VERSION bumped to 0.122.20.
2026-05-15 07:03:28 +03:00
anonpenguin23
872c553d1c fix(gateway): namespace-proxy rejects unauthed requests at main, logs WS audit
Root-cause hardening for bug #240 and #249's "intermittent 401 over WS"
reports. handleNamespaceGatewayRequest previously had a third code
path beyond "auth ok" and "auth error": when validateAuthForNamespaceProxy
returned empty namespace AND empty error (i.e. "no credentials found"),
the request fell through to a silent forward to the namespace gateway
WITHOUT internal-auth headers. The namespace gateway then rejected
with 401 "missing API key" in ~60µs.

From the client's perspective: opaque 401.
From our side: only the namespace gateway logged it, and that tier
can't validate API keys (they live in the main cluster RQLite), so
the operator had no signal that the main gateway had even seen the
request. AnChat's intermittent 401-on-WS reports went unsolved for
this exact reason.

Fix:
- Explicit reject at main when no credentials extracted AND path
  isn't public. Returns 401 with WWW-Authenticate: Bearer realm and a
  clear message naming the three accepted credential sources.
- Rich structured logging on every WS upgrade auth outcome: presence
  of api_key/token/jwt query params, Authorization + X-API-Key
  headers, Connection/Upgrade headers, Origin, User-Agent, client IP,
  raw query length. Steady-state stays low-noise: success path logs at
  debug, reject paths log at warn.
- Namespace-mismatch reject (existing branch) now also logs.

VERSION bumped to 0.122.19.
2026-05-14 17:53:38 +03:00
anonpenguin23
5c1404849b fix(#72): correct ntfy upstream checksum URL
Upstream publishes the checksums asset as a plain "checksums.txt" at
the release root, not "ntfy_<VER>_checksums.txt". The version-prefixed
URL we were constructing 404'd, so InstallNtfy bailed in the
download-binary step and ntfy never landed even after we wired
InstallNtfy into the pre-built install path.

Verified against the v2.11.0 release assets list. If a future version
changes the naming convention, the install will 404 loud and this URL
gets bumped in the same PR as ntfyVersion.

VERSION bumped to 0.122.18.
2026-05-14 14:29:24 +03:00
anonpenguin23
7e47f42f91 fix(#72): install ntfy in pre-built path too — devnet path was missing it
Phase 2b auto-detects pre-built archive mode and routes to
installFromPreBuilt(). That path copies bundled binaries (caddy, orama,
gateway, …) into place but never called InstallNtfy() — because ntfy
is downloaded from upstream github, not bundled. Result: on devnet
(which always uses pre-built mode), ntfy never installed even though
the always-on code path in installFromSource() was correctly wired up.

Fix: add InstallNtfy() call to installFromPreBuilt right after the
binary deploy + setCapabilities steps, before disableResolvedStub
runs. Ordering matters because Phase 4's ConfigureNtfy chowns
/etc/ntfy/server.yml to the ntfy user, which needs to exist.

VERSION bumped to 0.122.17.
2026-05-14 12:16:28 +03:00
DeBros
85e6dd8a9c
Merge pull request #92 from DeBrosDAO/nightly
release: 0.122.10
2026-05-12 10:18:52 +03:00
anonpenguin23
55b86cdcfc release: 0.122.10
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-12 10:14:53 +03:00
anonpenguin
b13948d15a
Merge pull request #91 from DeBrosDAO/nightly
ci(publish-sdk): allow same-version on npm bump
2026-05-12 10:13:50 +03:00
anonpenguin23
cd690bd28c ci(publish-sdk): allow same-version on npm bump
When /VERSION is bumped on the source branch before tagging, the SDK
package.json already matches the release tag. Without --allow-same-version,
npm version errors as 'Version not changed' and the workflow fails on
stable main releases (nightly worked by accident because the bump was
done in the previous commit cycle and package.json was a step behind).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-12 10:07:34 +03:00
anonpenguin
c172f1355f
Merge pull request #90 from DeBrosDAO/nightly
Nightly
2026-05-12 10:01:25 +03:00
anonpenguin23
fbe05d94b3 ci: goreleaser v2 hooks need string form, bump to 0.122.9
GoReleaser v2.15.4 rejects the {cmd: ..., dir: ...} map syntax for
before.hooks even though v2 docs show it. Reverting to the simple
string form `go -C core mod tidy` that worked in v1.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-12 09:54:58 +03:00
anonpenguin23
fb609d66d9 ci: single VERSION file, version guards, goreleaser v2, CI on push
Workflow hardening based on the four-cycle release-debugging session:

Centralized versioning
- Add /VERSION at repo root as single source of truth.
- core/Makefile reads VERSION via `$(shell cat ../VERSION)`.
- Add `make bump VER=X.Y.Z` target that updates /VERSION and syncs
  sdk/package.json in one shot.

Version mismatch guards
- All three release workflows (release.yaml, release-apt.yml,
  publish-sdk.yml) now verify the release tag matches /VERSION at the
  very first step. Stale-VERSION releases fail fast with a clear hint
  to run `make bump`.

GoReleaser v2 migration
- Upgrade goreleaser-action v5 -> v6 (pinned `~> v2`).
- Add `version: 2` to .goreleaser.yaml.
- Migrate to v2 syntax: `archives.format` -> `formats: [...]`,
  `brews.folder` -> `directory`, `snapshot.name_template` ->
  `version_template`, `builds`-style references replaced with `ids:`.
- `before.hooks` can use map syntax again (v2 supports it).

Homebrew tap on stable only
- `brews.skip_upload` is now `'{{ if .Prerelease }}true{{ else }}false{{ end }}'`.
- Stops nightly releases from polluting the tap and from hitting 401
  on stale HOMEBREW_TAP_TOKEN. Stable main releases still publish.

CI on every push
- New ci.yml runs `go vet` + `go test -race` on the core module and
  typecheck/build/unit-tests on the SDK for every push to main/nightly
  and every PR. version-sanity job warns when /VERSION and
  sdk/package.json drift.

Version bump for next pipeline test
- /VERSION: 0.122.8
- sdk/package.json: 0.122.8

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-12 09:49:33 +03:00
anonpenguin23
10056fe5db ci: point goreleaser at renamed DeBrosDAO/orama repo, bump to 0.122.7
The repo moved from DeBrosOfficial/network to DeBrosDAO/orama.
GoReleaser was uploading artifacts to the old URL and getting 307
redirects, then retrying until secondary rate limits kicked in.

- release.github.owner/name: DeBrosOfficial/network -> DeBrosDAO/orama
- brews.repository.owner: DeBrosOfficial -> DeBrosDAO
- all homepage URLs updated
- bump VERSION to 0.122.7 for fourth pipeline test

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-12 09:42:51 +03:00
anonpenguin23
daa96a5c2f ci: fix goreleaser nfpm README path and bump to 0.122.6
- goreleaser nfpm contents pointed to ./core/README.md which doesn't
  exist; switched to repo-root ./README.md
- bump VERSION to 0.122.6 for third pipeline test

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-12 09:37:32 +03:00
anonpenguin23
6c0ada087e ci: fix remaining release pipeline bugs and bump to 0.122.5
- goreleaser: 'cd core && go mod tidy' didn't work — hooks run via exec,
  not shell. Switched to 'go -C core mod tidy' which is shell-free.
- release-apt: remove redundant mv that renamed file to itself
  (PKG_NAME and target name were identical, dpkg-deb already named it
  correctly)
- bump VERSION to 0.122.5 for second pipeline test

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-12 09:32:59 +03:00
anonpenguin23
fd45d26d51 ci: fix release workflows and bump version to 0.122.4
- goreleaser: switch hooks to v1 string syntax (was map syntax, caused
  yaml unmarshal error on action v1.26.2)
- release-apt: build ./cmd/cli and ./cmd/node as packages (was building
  single .go files, missed sibling files → undefined: runCLI)
- publish-sdk: remove main-only guard; nightly releases now publish to
  npm with --tag nightly (stable releases still go to @latest)
- bump VERSION to 0.122.4 for first end-to-end release pipeline test

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-12 09:30:13 +03:00
anonpenguin23
887669ae36 chore(sdk): bump version to 0.122.3 2026-05-12 09:19:50 +03:00
anonpenguin23
59b4df976b ci(workflow): restrict sdk publish to manual triggers or main branch releases 2026-05-12 09:19:15 +03:00
anonpenguin23
7c76790ad3 ci(publish-sdk): trigger workflow on release
- add release event trigger to automate publishing
- update versioning logic to support release tags
- conditionally skip git tagging when triggered by release
2026-05-12 09:17:49 +03:00
anonpenguin23
5ccacb91d6 fix(gateway): update rqlite consistency level and improve column mapping
- Change RQLite consistency level from `none` to `weak` to ensure reads
  route to the leader and prevent stale data reads (fixes #235)
- Add `normalizeColumnKey` to allow snake_case SQL columns to map to
  CamelCase Go struct fields automatically (fixes #65)
- Add comprehensive unit tests for DSN generation and column mapping
2026-05-12 09:13:03 +03:00
anonpenguin23
6d9822dc35 feat(gateway): implement self-service tenant push notifications
- Add `namespace_push_config` table for per-namespace provider settings
- Introduce `cluster_secret_path` to enable deterministic JWT signing and
  AES-256-GCM encryption for push credentials
- Update gateway config to support per-namespace overrides of push
  notification providers (ntfy/Expo)
- Bump version to 0.122.3
2026-05-08 11:23:53 +03:00
anonpenguin23
333b7233c1 docs: update deployment and serverless documentation
- bump version to 0.122.2
- document schema migration invariants and push notification configuration
- add serverless host function aliases and v2 database API documentation
- introduce schema roundtrip test to prevent migration drift
2026-05-07 07:33:52 +03:00
anonpenguin23
c2556a14d7 feat(serverless): register host module under "orama" alias
- Add "orama" to the list of host module registration names to support
  common developer intuition and prevent instantiation errors.
- Add comprehensive regression tests to ensure all aliases ("env",
  "host", "orama") remain registered.
- Update SDK documentation to clarify import conventions and alias
  support.
2026-05-06 15:43:11 +03:00
anonpenguin23
96029e5ded Update version 2026-05-06 09:12:24 +03:00
anonpenguin23
f865b174ed feat(migrations): implement schema version contract enforcement
- Add `contract.go` to manage and validate embedded SQL migrations
- Introduce `AssertSchema` to verify database version at startup
- Include `SchemaMismatchError` with actionable recovery instructions
- Add comprehensive unit tests for version parsing and validation
2026-05-06 08:23:13 +03:00
anonpenguin
46d511eb5e
Merge pull request #89 from DeBrosDAO/blockchain
Blockchain
2026-05-05 12:30:45 +03:00
anonpenguin23
c2f6b19493 Updated Version 2026-05-05 12:30:15 +03:00
8fdf03b143 chore: untrack bin/ and add to .gitignore
Compiled binaries should not be in version control.
2026-05-05 12:20:04 +03:00
7d322dc84f etc 2026-05-05 11:35:35 +03:00
abd0b2651d Serverless Engine Patch 2026-05-05 07:40:16 +03:00
anonpenguin23
d10f8c35bb feat(gateway): implement persistent webhooks and namespace sequencing
- Add migrations for per-namespace publish sequences and persistent WebSocket function settings
- Integrate PersistentWSManager and WSBridge into the gateway dependency graph
- Upgrade serverless engine to use a multi-tier rate limiter
- Update JWT claims to support custom application-defined fields
2026-05-04 11:38:19 +03:00
anonpenguin23
0379dc39f1 feat(core): implement sni-router for stealth turn
- add `orama-sni-router` binary to build process
- introduce `cmd/sni-router` for TLS-level SNI routing
- add documentation for stealth turn deployment architecture
2026-05-03 18:20:21 +03:00
anonpenguin23
54852076f9 feat(ssh): allow running remote commands via ssh
- update ssh command to accept optional remote command argument
- modify sshInto to execute commands non-interactively when provided
- comment out unreachable node in nodes.conf
2026-05-03 14:55:43 +03:00
anonpenguin23
69c7ed5e5a Fix 2026-04-02 15:22:06 +03:00
anonpenguin23
7284cb4578 feat(cli): add fanout push strategy and improve website responsiveness
- Add --fanout flag to push command for server-to-server deployment
- Implement agent forwarding for efficient multi-node distribution
- Update landing page scene heights and section padding for mobile devices
2026-03-28 15:27:54 +02:00
anonpenguin23
b9b6e19bb8 Changes 2026-03-28 14:58:25 +02:00
anonpenguin23
f3f4a84762 feat(cli): add push command and improve node setup
- Add `orama push` command to upload and extract binary archives to nodes
- Update `node setup` to pass operator metadata and auto-configure environments
- Improve SSH configuration and node registration logic
2026-03-28 14:30:55 +02:00
anonpenguin23
ab1be4105c feat(cli): add node setup command
- implement automated VPS bootstrapping for Orama nodes
- add SSH key management via rootwallet
- support genesis node creation and cluster joining via invite tokens
2026-03-28 10:24:48 +02:00
anonpenguin23
c27faa02fa feat(auth): integrate rootwallet agent and update service hardening
- Replace CLI-based rootwallet calls with agent-based communication
- Update production provisioner to support sudo-based service management
- Add API key-to-wallet resolution for gateway operator handlers
2026-03-28 08:59:11 +02:00
anonpenguin23
8d7d1c6621 feat(cli): add node management and rollout commands
- implement `nodes`, `rollout`, `ssh`, and `status` commands
- add `migrate-conf` utility to register existing nodes with the gateway
- update database schema to support operator wallet tracking for nodes
2026-03-27 16:25:32 +02:00
anonpenguin23
2017fcb432 feat(monitor): add vault health checks and reporting
- integrate vault into node alerts (service, responsive, status, restarts)
- add vault report collection (systemd, logs, HTTP status)
- update production CLI (clean, restart, stop, services)
- add comprehensive unit tests for vault alerts
2026-03-27 14:52:41 +02:00
anonpenguin23
318eea33ae refactor(cli): extract AddEnvironment/RemoveEnvironment functions
- support upsert in AddEnvironment, no-op RemoveEnvironment if absent
- fallback active env to devnet on remove, add tests
- integrate with sandbox create/destroy, ignore core/plans/
2026-03-27 14:16:51 +02:00
anonpenguin23
fd59131ff4 docs(landing): update operator sections for testnet launch 2026-03-27 12:13:38 +02:00
anonpenguin23
59d881afe6 no index no follow 2026-03-26 19:07:25 +02:00
anonpenguin23
81083402a1 feat(website): add deployment script and SEO enhancements
- add deploy.sh for rsync-based remote deployment
- add SEO/Open Graph/Twitter meta tags and og-image
- ignore remote.conf and update pnpm build deps
2026-03-26 19:05:19 +02:00
DeBros
82c477266d
Merge pull request #88 from DeBrosDAO/nightly
ci: upgrade go to 1.24, add caching, run unit tests
2026-03-26 18:49:00 +02:00
anonpenguin23
169be97026 ci: upgrade go to 1.24, add caching, run unit tests
- upgrade go to 1.24 in release-apt.yml
- add go.sum dependency caching to release workflows
- update publish-sdk.yml to run vitest unit tests
2026-03-26 18:48:07 +02:00
anonpenguin
4b7c342c77
Merge pull request #87 from DeBrosDAO/nightly
Nightly
2026-03-26 18:44:22 +02:00
anonpenguin23
7d5ccc0678 Added orama sdk 2026-03-26 18:40:20 +02:00
anonpenguin
1ca779880b
Merge pull request #86 from DeBrosDAO/0.120.0
0.120.0
2026-03-26 18:27:14 +02:00
anonpenguin23
3b779cd5a0 Updated version 2026-03-26 18:26:09 +02:00
anonpenguin23
b94fd1efcd Fixed pre push on core 2026-03-26 18:24:47 +02:00
anonpenguin23
abcc23c4f3 refactor(monorepo): restructure repo with core, website, vault, os packages
- add monorepo Makefile delegating to sub-projects
- update CI workflows, GoReleaser, gitignore for new structure
- revise README, CONTRIBUTING.md for monorepo overview
- bump Go to 1.24
2026-03-26 18:21:55 +02:00
anonpenguin23
ebaf37e9d0 Merge commit '7c165b9579ac02394438d93ca22e09443bea7131' as 'vault' 2026-03-26 18:15:11 +02:00
anonpenguin23
c536e45d0f Merge commit '655bd921784bd5aaa339cffc6b72a37879fb6534' as 'website' 2026-03-26 18:14:59 +02:00
anonpenguin23
211c0275d3 refactor: move Go project into core/ for monorepo structure 2026-03-26 18:14:52 +02:00
anonpenguin23
5456d57aeb feat(sandbox): add preflight checks and auto-build archive to create
- validate agent, API token, archive before provisioning
- auto-build archive via `make build-archive` if missing
- add tests and Makefile install target
2026-03-26 17:33:19 +02:00
anonpenguin
8ea4499052
Merge pull request #85 from DeBrosDAO/0.115.0
0.115.0
2026-03-20 07:25:50 +02:00
anonpenguin23
6657c90e36 Merge branch '0.115.0' of github-debros:DeBrosOfficial/network into 0.115.0 2026-03-20 07:23:16 +02:00
anonpenguin23
0764ac287e refactor(remotessh): use rwagent directly instead of rw CLI subprocesses
- replace `rw vault ssh` calls with `rwagent.Client` in PrepareNodeKeys,
  LoadAgentKeys, EnsureVaultEntry, ResolveVaultPublicKey
- add vaultClient interface, newClient func, and wrapAgentError for
  testability and improved error messages
- prefer pre-built systemd dir in installNamespaceTemplates
2026-03-20 07:23:10 +02:00
anonpenguin
c4fd1878a7
Merge pull request #83 from DeBrosDAO/0.115.0
0.115.0
2026-03-20 07:22:43 +02:00
anonpenguin
3d70f92ed5
Merge branch 'nightly' into 0.115.0 2026-03-20 07:22:31 +02:00
anonpenguin23
fa826f0d00 refactor(sandbox): integrate rootwallet SSH keys
- replace standalone sandbox keys with "sandbox/root" vault entry
- update inspector config to use vault targets (no passwords/keys)
- make sandbox default active environment
- add vault helpers and tests for remotessh
2026-03-10 05:25:41 +02:00
anonpenguin23
733b059681 feat(sandbox): add --anyone-client flag to rollout
- propagate `--anyone-client` to `orama node upgrade` on all nodes
- prioritize explicit `--anyone-client` over prefs/auto-detect in production
- ensure mutual exclusivity between relay/client modes in prefs
2026-03-09 10:59:15 +02:00
anonpenguin23
78d876e71b feat(monitor): add sandbox environment support
- load nodes from active sandbox state for env=sandbox
- extract fanoutArchive for efficient server-to-server distribution
2026-03-09 10:19:40 +02:00
anonpenguin23
6468019136 feat(sandbox): optimize archive upload via server-to-server fanout
- add WithNoHostKeyCheck option for ephemeral server IPs
- upload binary to genesis then distribute to other nodes (faster)
- improve provisioning error handling for cleanup on partial failure
2026-03-07 14:27:09 +02:00
anonpenguin23
e2b6f7d721 docs: add security hardening and OramaOS deployment docs
- Document WireGuard IPv6 disable, service auth, token security, process isolation
- Introduce OramaOS architecture, enrollment flow, and management via Gateway API
- Add troubleshooting for RQLite/Olric auth, OramaOS LUKS/enrollment issues
2026-02-28 15:41:04 +02:00
anonpenguin23
fd87eec476 feat(security): add manifest signing, TLS TOFU, refresh token migration
- Invalidate plaintext refresh tokens (migration 019)
- Add `--sign` flag to `orama build` for rootwallet manifest signing
- Add `--ca-fingerprint` TOFU verification for production joins/invites
- Save cluster secrets from join (RQLite auth, Olric key, IPFS peers)
- Add RQLite auth config fields
2026-02-28 15:40:43 +02:00
anonpenguin23
a0468461ab feat(sandbox): add reset command and interactive setup
- new `orama sandbox reset` deletes Hetzner resources (IPs, firewall, SSH key) and local config
- interactive location/server type selection during `setup`
- add Hetzner API methods for listing locations/types, deleting resources
- update defaults to nbg1/cx23
2026-02-28 10:14:02 +02:00
anonpenguin23
2f5718146a Fixed builder bug 2026-02-27 15:56:22 +02:00
anonpenguin23
f26676db2c feat: add sandbox command and vault guardian build
- integrate Zig-built vault-guardian into cross-compile process
- add `orama sandbox` for ephemeral Hetzner Cloud clusters
- update docs for `orama node` subcommands and new guides
2026-02-27 15:22:51 +02:00
anonpenguin23
fade8f89ed Added hatzhner support for clustering cli orama to spin up clusters 2026-02-25 15:13:18 +02:00
anonpenguin23
ed4e490463 Fixed bug on rqlite array overflow buffer 2026-02-25 08:37:55 +02:00
anonpenguin23
6898f47e2e Replace sshpass password auth with RootWallet SSH keys
Replaces plaintext password-based SSH authentication (sshpass) across
the entire Go CLI with wallet-derived ed25519 keys via RootWallet.

- Add `rw vault ssh agent-load` command to RootWallet CLI for SSH
  agent forwarding in push fanout
- Create wallet.go bridge: PrepareNodeKeys resolves keys from `rw
  vault ssh get --priv`, writes temp PEMs (0600), zero-overwrites
  on cleanup
- Remove Password field from Node struct, update config parser to
  new 3-field format (env|user@host|role)
- Remove all sshpass branches from inspector/ssh.go and
  remotessh/ssh.go, require SSHKey on all SSH paths
- Add WithAgentForward() option to RunSSHStreaming for hub fanout
- Add PrepareNodeKeys + defer cleanup to all 7 entry points:
  inspect, monitor, push, upgrade, clean, recover, install
- Update push fanout to use SSH agent forwarding instead of sshpass
  on hub
- Delete install/ssh.go duplicate, replace with remotessh calls
- Create nodes.conf from remote-nodes.conf (topology only, no
  secrets)
- Update all config defaults and help text from remote-nodes.conf
  to nodes.conf
- Use StrictHostKeyChecking=accept-new consistently everywhere
2026-02-24 17:24:16 +02:00
anonpenguin23
f0d2621199 Removed extract deploy script 2026-02-24 14:28:11 +02:00
anonpenguin23
c6998b6ac2 Remove legacy deployment and upgrade scripts
- Deleted redeploy.sh, which handled redeployment to nodes in devnet/testnet environments.
- Removed upgrade-nodes.sh, responsible for rolling upgrades of nodes.
- Eliminated upload-source-fanout.sh, which uploaded source archives to nodes in parallel.
- Removed upload-source.sh, used for uploading and extracting source archives to VPS nodes.
2026-02-24 14:24:25 +02:00
anonpenguin23
45a8285ae8 updated version 2026-02-24 10:16:41 +02:00
anonpenguin23
80e26f33fb feat: add sourceUserID to publishedTrack and update TrackAddedData and TrackRemovedData structures 2026-02-24 10:16:27 +02:00
anonpenguin23
25495448ed fix: update test to verify is_active value as 1 instead of TRUE 2026-02-24 09:15:00 +02:00
anonpenguin23
1882876922 chore: update version to 0.112.6 and add Secure field to yamlTURNServer struct 2026-02-24 08:17:43 +02:00
anonpenguin23
7227e5ceb9 updated version 2026-02-24 08:00:21 +02:00
anonpenguin23
7f1c592235 refactor: remove UUID generation from DNS record creation for cleaner inserts 2026-02-24 08:00:15 +02:00
anonpenguin23
72fb5f1a5a feat: add secrets and triggers management to function commands
- Introduced `secrets` command for managing function secrets, including set, list, and delete operations.
- Added `triggers` command for managing PubSub triggers associated with functions, allowing addition, listing, and deletion of triggers.
- Implemented API handlers for secrets management, including setting, listing, and deleting secrets.
- Updated serverless handlers to support new secrets and triggers functionalities.
- Enhanced tests for the new features, ensuring proper functionality and error handling.
2026-02-23 19:18:39 +02:00
anonpenguin23
2fecebc0c2 feat: add upload source scripts for devnet and testnet environments 2026-02-23 17:26:24 +02:00
anonpenguin23
85eb98ed34 feat: add TURN domain configuration and certificate provisioning via Caddy 2026-02-23 16:57:29 +02:00
anonpenguin23
714a986a78 Bump version to 0.112.2 and update TURN server configuration
- Updated version in Makefile to 0.112.2.
- Enhanced SFU server error handling to ignore http.ErrServerClosed.
- Added TURNS (TURN over TLS) configuration options in TURN server and related components.
- Updated firewall rules to include TURNS ports and modified related tests.
- Implemented self-signed certificate generation for TURNS.
- Adjusted TURN server to support both UDP and TCP listeners.
- Updated WebRTC and SFU components to accommodate new TURNS configurations.
2026-02-23 16:32:32 +02:00
anonpenguin23
bcfdabb32d feat: update WebRTC handlers to support dynamic SFU host configuration and add monitoring script 2026-02-23 06:30:57 +02:00
anonpenguin23
3597c61cfc feat: add WebRTC configuration support for gateway instances 2026-02-22 13:11:16 +02:00
anonpenguin23
552fde428e Updated gitignore 2026-02-22 11:42:25 +02:00
anonpenguin23
ca86becf85 Updated version 2026-02-22 11:40:37 +02:00
anonpenguin23
bfff2a241b Updated gitignore 2026-02-22 11:40:32 +02:00
anonpenguin23
3e9ef5ac6c feat: enhance WebRTC port allocation with existing allocation checks and increase upload timeout 2026-02-22 11:39:59 +02:00
anonpenguin
f1dc3014fc
Merge pull request #82 from DeBrosOfficial/nightly-webrtc
feat: implement SFU and TURN server functionality
2026-02-21 18:30:51 +02:00
anonpenguin23
19463b8621 feat: disable HTTP/3 in Caddy to free UDP 443 for TURN server and add patch script 2026-02-21 18:30:28 +02:00
anonpenguin23
a79ae41dd5 feat: implement PubSub trigger management with API endpoints for adding, listing, and removing triggers 2026-02-21 16:26:36 +02:00
anonpenguin23
e4d51676cc feat: add WebRTC feature management commands and public API endpoints for enabling, disabling, and checking status 2026-02-21 13:14:46 +02:00
anonpenguin23
e6f828d6f1 feat: add WebRTC support with SFU and TURN server integration, including configuration, monitoring, and API endpoints 2026-02-21 11:31:20 +02:00
anonpenguin23
8ee606bfb1 feat: implement SFU and TURN server functionality
- Add signaling package with message types and structures for SFU communication.
- Implement client and server message serialization/deserialization tests.
- Enhance systemd manager to handle SFU and TURN services, including start/stop logic.
- Create TURN server configuration and main server logic with HMAC-SHA1 authentication.
- Add tests for TURN server credential generation and validation.
- Define systemd service files for SFU and TURN services.
2026-02-21 11:17:13 +02:00
anonpenguin23
58ea896cb0 Updated makefile 2026-02-21 08:46:41 +02:00
anonpenguin23
d256a83fb7 feat: enhance namespace management with cluster state deletion and improved deprovisioning process 2026-02-21 06:41:19 +02:00
anonpenguin23
c731486454 feat: update version to 0.110.0 and enhance dead node recovery handling with deployment status updates 2026-02-20 10:08:17 +02:00
anonpenguin23
8cabe48f7d feat: enhance WASM upload functionality with replication and error handling 2026-02-20 09:44:13 +02:00
anonpenguin23
c499b2d76e Enhance health checker and deployment handling
- Added support for "degraded" deployment status in types.
- Updated health checker initialization to include process manager and node ID.
- Refactored health checker tests to accommodate new process manager functionality.
- Implemented logic to handle unhealthy deployments, including restart and failure marking.
- Enhanced deployment reconciliation to manage under-replicated scenarios.
- Updated gateway handlers and middleware to consider "degraded" status in deployment queries.
2026-02-20 09:44:07 +02:00
anonpenguin23
4ebf558719 feat: implement suspect node handling with callbacks for DNS record management 2026-02-20 09:27:35 +02:00
anonpenguin23
2b0bfaaa12 feat: enhance deployment scripts with improved error handling and add rolling upgrade functionality 2026-02-20 08:18:53 +02:00
anonpenguin23
a71b979036 feat: add RQLite export/import functionality and related handlers + Deployments DNS Fixes 2026-02-20 07:29:18 +02:00
anonpenguin23
106c2df4d2 feat: implement wallet-based SSH authentication using Ed25519 keys
- Added documentation for wallet-based SSH authentication in WALLET_SSH_AUTH.md.
- Introduced SSH key derivation and management in rootwallet core and CLI.
- Created commands for generating, loading, and unloading SSH keys in the CLI.
- Updated Orama network to support SSH key authentication.
- Added migration steps for nodes to transition from password-based to key-based authentication.

feat: add serverless function management commands

- Implemented function command structure in CLI for managing serverless functions.
- Added commands for initializing, building, deploying, invoking, deleting, and listing functions.
- Created helper functions for handling function configuration and API requests.
- Integrated TinyGo for building functions to WASM.
- Added logging and version management for deployed functions.
2026-02-19 10:51:03 +02:00
anonpenguin23
40600c3557 Enhance PID collection by adding namespace service support and extending timeout for systemd queries 2026-02-19 10:24:49 +02:00
anonpenguin23
aa2da83969 Bump version to 0.109.0 and add tests for TCP port waiting and Olric config YAML parsing 2026-02-19 08:57:32 +02:00
anonpenguin23
bb98418ac9 Unmask and re-enable services before restarting to ensure proper startup 2026-02-19 06:43:28 +02:00
anonpenguin23
b58e1d80ee Implement WireGuard peer authentication and enhance internal request validation 2026-02-19 06:43:06 +02:00
anonpenguin23
4f1709e136 Bug fixing 2026-02-18 11:20:16 +02:00
anonpenguin23
83804422c4 Namespace bug fix and fixing bugs on serverless and deployments 2026-02-17 15:57:32 +02:00
anonpenguin23
8aef779fcd Updated docs and bug fixing 2026-02-17 08:49:16 +02:00
anonpenguin23
0b5b6e68e3 A ton of updates on the monitoring mostly bug fixes 2026-02-16 16:35:29 +02:00
anonpenguin23
f889c2e358 Added some new alerts on monitoring 2026-02-16 11:47:18 +02:00
anonpenguin23
1e38fc2861 Created new monitoring cli interface for nodes, namespaces and cluster 2026-02-16 11:16:02 +02:00
anonpenguin23
88ba08fcba updated makefile 2026-02-16 10:31:37 +02:00
anonpenguin23
865a4f3434 Refactored cli to make things more clear and easy to understand for developers 2026-02-16 10:31:17 +02:00
anonpenguin23
7163aad850 Refactor installation scripts and improve security measures
- Updated `clean-testnet.sh` to stop and disable legacy services.
- Added `upload-source.sh` for streamlined source archive uploads.
- Enhanced password input handling in `ssh.go` for better security.
- Adjusted directory permissions in `validator.go` for improved security.
- Simplified node configuration logic in `config.go`.
- Removed unnecessary commands from `gateway.go` to streamline installation.
2026-02-16 10:01:35 +02:00
anonpenguin23
25a167f9b4 removed script 2026-02-14 14:33:47 +02:00
anonpenguin23
bc9cbb3627 Moved everything to root user and to /opt/orama from home/orama/.orama 2026-02-14 14:33:38 +02:00
anonpenguin23
ef8002bf13 Removed rqlite mcp 2026-02-14 14:14:16 +02:00
anonpenguin23
29d255676f Renamed debros to orama 2026-02-14 14:14:04 +02:00
anonpenguin23
ba4e2688e4 Updated installation process simplified it 2026-02-14 14:06:14 +02:00
anonpenguin23
749d5ed5e7 Bro i did so many things to fix the problematic discovery and redeployment and i dont even remember what i did 2026-02-14 10:56:26 +02:00
anonpenguin23
afbb7d4ede Updated makefile 2026-02-13 16:18:43 +02:00
anonpenguin23
2986e64162 Writing more tests and fixed bug on rqlite address 2026-02-13 16:18:22 +02:00
anonpenguin23
1ab63857d3 Security audit, code cleanup 2026-02-13 14:33:11 +02:00
anonpenguin23
61ccad952a Improved health check , plus bug fixing 2026-02-13 13:40:33 +02:00
anonpenguin23
85a556d0a0 Did a lot of cleanup and bug fixing 2026-02-13 12:47:02 +02:00
anonpenguin23
ed82c8ca6b Cleaned up development environment 2026-02-13 10:33:11 +02:00
anonpenguin23
266507ef09 Auto node recovery and failover if node is dead, added wallet authentication with phantom and root wallet + cluster repair logic 2026-02-13 08:16:01 +02:00
anonpenguin23
5fed8a6c88 Fixed firewall problem with anyone rellay and added authentication with root wallet 2026-02-13 07:38:54 +02:00
anonpenguin23
1d186706f6 Improved collector added anyone on health check 2026-02-12 09:36:07 +02:00
anonpenguin23
83bd495f0f updated makefile 2026-02-12 07:41:01 +02:00
anonpenguin23
5c73330be6 created patch and fixed authentication issue for ipfs bug and cross node communication 2026-02-12 07:40:43 +02:00
anonpenguin23
ee0f035948 Updated makefile 2026-02-11 15:10:54 +02:00
anonpenguin23
b5dfcab1d6 Collector updates and bug fixes + patches scripts created 2026-02-11 15:08:20 +02:00
anonpenguin23
4356f5544a Improved how we present results on inspector and added anyone tests 2026-02-11 11:07:56 +02:00
anonpenguin23
ebdd08f71c updated makefile 2026-02-11 10:36:57 +02:00
anonpenguin23
35ad8bdb16 Added system prompt on inspector 2026-02-11 10:36:40 +02:00
anonpenguin
051c002ec8
Merge pull request #81 from DeBrosOfficial/cleanup/dead-code
Cleanup/dead code
2026-02-11 09:57:34 +02:00
anonpenguin23
fbeecb617a Updated version 2026-02-11 09:57:17 +02:00
anonpenguin23
f3f0716715 added docs about inspector 2026-02-11 09:56:16 +02:00
anonpenguin23
7dc6fecac2 Created inspector , it is hot as fuck 2026-02-11 09:53:46 +02:00
anonpenguin23
eddf0553b7 Did a small refactor, code clean, remove dead code, legacy etc 2026-02-11 07:43:33 +02:00
anonpenguin23
888df0385e Removed dead code 2026-02-11 07:24:27 +02:00
anonpenguin23
b305f562d7 Updated git ignore 2026-02-11 07:12:25 +02:00
anonpenguin23
1fb6f9a13e Improved scripts and updated makefile 2026-02-11 07:09:13 +02:00
anonpenguin23
490c4f66da Moved examples inside docs and added common problems doc 2026-02-11 07:06:52 +02:00
anonpenguin23
91ac56c50a Fixed bug on spawn handler and cluster manager 2026-02-11 06:44:07 +02:00
anonpenguin23
f7db698273 Bug fix on production orchestrator on start and stop 2026-02-10 19:32:08 +02:00
anonpenguin23
a78e09d2b9 Added failover for namespaces 2026-02-10 17:32:29 +02:00
anonpenguin23
359fb5ae04 Updated health check 2026-02-10 16:40:01 +02:00
anonpenguin23
21e82abb65 Fixed WG port issues and production firewall fixes 2026-02-10 09:39:26 +02:00
anonpenguin23
a02c63a7ee updated make file 2026-02-09 15:55:57 +02:00
anonpenguin23
cbdde6ab66 updated makefile 2026-02-09 15:39:18 +02:00
anonpenguin23
af5250ccad Updated goreleaser 2026-02-09 15:39:00 +02:00
anonpenguin23
7a7553f0eb Updated github action 2026-02-09 15:29:22 +02:00
anonpenguin23
d5cfb12435 Updated makefile 2026-02-09 15:23:20 +02:00
anonpenguin23
a297a14b44 Updated docs and bug fixes and updated redeploy script 2026-02-09 15:23:02 +02:00
anonpenguin23
e2b38c409a Fixed bug on limiting to 10 nodes on cluster because of WG 2026-02-09 09:12:08 +02:00
anonpenguin
5d543b2662
Merge pull request #79 from DeBrosOfficial/0.100.0
0.100.0
2026-02-06 14:26:10 +02:00
anonpenguin23
b382350f76 Rate limit fixes 2026-02-06 11:09:34 +02:00
anonpenguin23
7690b22c0a Improved performance on request journey with cache and some tricks 2026-02-06 08:30:11 +02:00
anonpenguin23
3999253685 Create redeploy script with creds 2026-02-06 07:23:28 +02:00
anonpenguin23
854523c3a9 Fixed bugs on pubsub and ipfs 2026-02-06 07:21:26 +02:00
anonpenguin23
02b5c095d0 More bug fixing 2026-02-05 16:12:52 +02:00
anonpenguin23
a7f100038d Fixed system service sudoer error on debros user 2026-02-05 13:32:06 +02:00
anonpenguin23
c855b790f8 Updated the way we spawn services on namespace added systemd 2026-02-04 17:17:01 +02:00
anonpenguin23
f972358e78 Bored of fixing bugs 2026-02-04 16:14:49 +02:00
anonpenguin23
0c4af88388 Updated docs 2026-02-03 17:34:07 +02:00
anonpenguin23
d85ed032f8 Bug fixing 2026-02-03 17:27:36 +02:00
anonpenguin23
156de7eb19 Bug fixing 2026-02-03 13:59:03 +02:00
anonpenguin23
65ffd28151 DNS Bug Fixing Filtering out private WG ip's 2026-02-03 07:01:54 +02:00
anonpenguin23
11d5c1b19a Bug fixing 2026-02-02 16:18:13 +02:00
anonpenguin23
859c30fcd9 Bug fixing 2026-02-02 14:55:29 +02:00
anonpenguin23
79a489d650 Fix ensure only nameservers nodes added on schema for caddy load balancing 2026-02-02 11:17:54 +02:00
anonpenguin23
e95ecfb12a Fixed filter our prviate ips on dns register 2026-02-02 09:31:47 +02:00
anonpenguin23
b43e6d77b7 Fixed zombie rqlite on upgrade 2026-02-02 09:19:02 +02:00
anonpenguin23
e3dd359e55 Bug fixing 2026-02-02 08:39:42 +02:00
anonpenguin23
765ce46ea7 fixed ipfs problem forming cluster 2026-02-01 18:19:43 +02:00
anonpenguin23
3343ade433 New check node health script 2026-02-01 17:36:03 +02:00
anonpenguin23
c7036cb931 Fixed ipfs blocking WG 2026-02-01 17:09:58 +02:00
anonpenguin23
9c52287af9 fixed upload scripts 2026-02-01 17:01:32 +02:00
anonpenguin23
af5f5f9893 WG Fix for IPFS 2026-02-01 16:17:36 +02:00
anonpenguin23
683ce50106 Made building faster 2026-02-01 16:15:37 +02:00
anonpenguin23
c401fdcd74 fixed more bugs and updated docs 2026-02-01 15:58:28 +02:00
anonpenguin23
73dfe22438 fixes 2026-02-01 14:26:36 +02:00
anonpenguin23
4b3b7b3458 updated docs 2026-02-01 12:06:20 +02:00
anonpenguin23
9282fe64ee Deployement updates 2026-02-01 12:01:31 +02:00
anonpenguin23
b5109f1ee8 Added delete namespace handler 2026-01-31 13:13:09 +02:00
anonpenguin23
16eaf9a129 Fixed olric bug 2026-01-31 13:11:26 +02:00
anonpenguin23
8c392194bb Fixed olric cluster problem 2026-01-31 12:14:49 +02:00
anonpenguin23
51371e199d Added self signed cert fallback, fixed dns bugs 2026-01-31 10:07:15 +02:00
anonpenguin23
04f345f9ee Added flags on cli auth login 2026-01-31 07:27:55 +02:00
anonpenguin23
810094771d Updated docs and fixed WG bugs and ip's bugs 2026-01-31 07:09:09 +02:00
anonpenguin23
4acea72467 Added wireguard and updated installation process and added more tests 2026-01-30 15:30:18 +02:00
anonpenguin23
dcaf695fbc Fixed more tests, fixed gateway ip to use domain 2026-01-30 06:30:04 +02:00
anonpenguin23
9a8fba3f47 Fixed some broken tests 2026-01-30 05:35:50 +02:00
anonpenguin23
46aa2f2869 fixed some e2e tests 2026-01-29 15:05:50 +02:00
anonpenguin23
7b12dde469 Fixed dns failover middleware 2026-01-29 13:07:05 +02:00
anonpenguin23
82963c960e Updated docs and added replication and load balancing for deployments 2026-01-29 11:44:50 +02:00
anonpenguin23
d6106bcbb8 Added nyx auto install with anyone relay 2026-01-29 10:23:40 +02:00
anonpenguin23
15ecf366d5 Added Stats for deployments on CLI 2026-01-29 10:13:29 +02:00
anonpenguin23
e706ed3397 Fixed failing unit test on handlers 2026-01-29 09:54:26 +02:00
anonpenguin23
42c0c61d19 Round Robin DNS fix for deployments (update, rollback etc) 2026-01-29 09:53:11 +02:00
anonpenguin23
cd4189f64b Updated testdata for e2e tests 2026-01-29 09:29:41 +02:00
anonpenguin23
d8c93f6ee9 Fixed services on caddy 2026-01-29 08:56:28 +02:00
anonpenguin23
571f8babb4 Fixed IPFS systemd service and deploy issue on nextjs 2026-01-29 08:38:33 +02:00
anonpenguin23
4b24b0aa6c Fixed swarm problem flag and install TUI problem 2026-01-29 08:03:06 +02:00
anonpenguin23
6397efde25 updated cli env to use https 2026-01-29 07:52:59 +02:00
anonpenguin23
29581bec51 refactored all e2e tests 2026-01-29 07:50:40 +02:00
anonpenguin23
81414722cd core dns https issue 2026-01-29 07:45:40 +02:00
anonpenguin23
c3d6500785 fixed cert issue 2026-01-29 07:45:15 +02:00
anonpenguin23
5ec292a4f2 fixed bugs on dns for deployment 2026-01-29 07:22:32 +02:00
anonpenguin23
d4f5f3b999 added more tests 2026-01-28 14:30:28 +02:00
anonpenguin23
c3f87aede7 bug fixing on namespaces 2026-01-28 13:33:15 +02:00
anonpenguin23
7ded21939b fixed test issues 2026-01-28 11:52:58 +02:00
anonpenguin23
edd9c1f3dc namespaces on gateway, load balancer and rqlite and olric namespaces 2026-01-28 11:24:21 +02:00
anonpenguin23
468ca06398 added support for anyone relay with rewards 2026-01-28 08:36:57 +02:00
anonpenguin23
c827651245 fixed mobile not running e2e tests and process update 2026-01-26 15:55:40 +02:00
anonpenguin23
2c374b2156 fixed some bugs on tests and nextjs and nodejs 2026-01-26 15:19:00 +02:00
anonpenguin23
039c246d47 fixed nextjs problem 2026-01-26 15:12:51 +02:00
anonpenguin23
380b10add3 fixing bugs on tests and on codebase 2026-01-26 14:41:26 +02:00
anonpenguin23
1a717537e5 enchanced e2e tests, fixed rqlite issue 2026-01-26 10:04:30 +02:00
anonpenguin23
e94da3a639 Fixed problem on ipfs 2026-01-26 08:52:52 +02:00
anonpenguin23
6c3d16c332 fixed bug on nextjs deployment 2026-01-26 08:10:19 +02:00
anonpenguin23
ec664466c0 Extra tests and a lot of bug fixing 2026-01-26 07:53:35 +02:00
anonpenguin23
6101455f4a bug fixing 2026-01-24 17:37:52 +02:00
anonpenguin23
3d3b0d2ee6 update install and upgrade 2026-01-24 16:42:58 +02:00
anonpenguin23
2281899784 removed executable 2026-01-24 16:03:19 +02:00
anonpenguin23
fb229af2a0 pushed more changes 2026-01-24 16:00:28 +02:00
anonpenguin23
00c9792780 updated docs 2026-01-24 13:18:14 +02:00
anonpenguin23
fc0b958b1e fixed deployments 2026-01-24 12:55:17 +02:00
anonpenguin23
84c9b9ab9b fixes 2026-01-24 11:12:00 +02:00
anonpenguin23
da8c9822f4 Changed from dbn to orama 2026-01-24 09:40:42 +02:00
anonpenguin23
b1011c29b5 added support for different domain except orama.network 2026-01-24 09:26:31 +02:00
anonpenguin23
ec66213e2e updated docs 2026-01-22 18:55:00 +02:00
anonpenguin23
5547c8ccb5 some more fixes 2026-01-22 18:23:37 +02:00
anonpenguin23
1c2bde2d81 fixed ore tests 2026-01-22 18:00:24 +02:00
anonpenguin23
b33da4282b more test fixes 2026-01-22 17:49:10 +02:00
anonpenguin23
903bef14a3 fixed some more tests 2026-01-22 17:13:08 +02:00
anonpenguin23
0a7e3ba3c7 did some fixes 2026-01-22 16:05:03 +02:00
anonpenguin23
c2071586f8 fixed more tests 2026-01-22 15:42:54 +02:00
anonpenguin23
1338b32a0e fixing tests 2026-01-22 15:21:46 +02:00
anonpenguin23
76bbf23f25 fixed test problems 2026-01-22 14:52:50 +02:00
anonpenguin23
0dcde29f7c added some tests 2026-01-22 14:39:50 +02:00
anonpenguin23
9fc9bbb8e5 a lot of changes 2026-01-22 13:04:52 +02:00
anonpenguin
ade6241357
Merge pull request #78 from DeBrosOfficial/nightly
Nightly version 0.9
2026-01-20 10:19:01 +02:00
anonpenguin
d3d1bb98ba
Merge pull request #77 from DeBrosOfficial/big-cleanup
Big cleanup
2026-01-20 10:13:50 +02:00
anonpenguin23
ccee66d525 Merge branch 'big-cleanup' of github-debros:DeBrosOfficial/network into big-cleanup 2026-01-20 10:13:21 +02:00
anonpenguin23
acc38d584a Fixed issue on wallet handler 2026-01-20 10:12:33 +02:00
anonpenguin
c20f6e9a25
Merge branch 'main' into big-cleanup 2026-01-20 10:06:55 +02:00
anonpenguin23
b0bc0a232e Refactored the whole codebase to be much cleaner 2026-01-20 10:03:55 +02:00
anonpenguin
86f73a1d8e
Merge pull request #76 from DeBrosOfficial/0.80.0
0.80.0
2026-01-05 20:00:41 +02:00
anonpenguin23
8c82124e05 Updated cursor rule 2026-01-05 20:00:20 +02:00
anonpenguin23
6f4f55f669 feat: disable debug logging in Rqlite MCP server to reduce disk writes
- Commented out debug logging statements in the Rqlite MCP server to prevent excessive disk writes during operation.
- Added a new PubSubAdapter method in the client for direct access to the pubsub.ClientAdapter, bypassing authentication checks for serverless functions.
- Integrated the pubsub adapter into the gateway for serverless function support.
- Implemented a new pubsub_publish host function in the serverless engine for publishing messages to topics.
2026-01-05 10:25:03 +02:00
anonpenguin23
fff665374f feat: disable debug logging in Rqlite MCP server to reduce disk writes
- Commented out debug logging statements in the Rqlite MCP server to prevent excessive disk writes during operation.
- Added a new PubSubAdapter method in the client for direct access to the pubsub.ClientAdapter, bypassing authentication checks for serverless functions.
- Integrated the pubsub adapter into the gateway for serverless function support.
- Implemented a new pubsub_publish host function in the serverless engine for publishing messages to topics.
2026-01-05 10:22:55 +02:00
anonpenguin23
2b3e6874c8 feat: disable debug logging in Rqlite MCP server to reduce disk writes
- Commented out debug logging statements in the Rqlite MCP server to prevent excessive disk writes during operation.
- Added a new PubSubAdapter method in the client for direct access to the pubsub.ClientAdapter, bypassing authentication checks for serverless functions.
- Integrated the pubsub adapter into the gateway for serverless function support.
- Implemented a new pubsub_publish host function in the serverless engine for publishing messages to topics.
2026-01-03 21:02:35 +02:00
anonpenguin23
cbbf72092d feat: add Rqlite MCP server and presence functionality
- Introduced a new Rqlite MCP server implementation in `cmd/rqlite-mcp`, enabling JSON-RPC communication for database operations.
- Updated the Makefile to include the build command for the Rqlite MCP server.
- Enhanced the WebSocket PubSub client with presence capabilities, allowing members to join and leave topics with notifications.
- Implemented presence management in the gateway, including endpoints for querying current members in a topic.
- Added end-to-end tests for presence functionality, ensuring correct behavior during member join and leave events.
2026-01-03 14:25:13 +02:00
anonpenguin23
9ddbe945fd feat: update mockFunctionRegistry methods for serverless function handling
- Modified the Register method to return a function instance and an error, enhancing its functionality.
- Added a new GetLogs method to the mockFunctionRegistry for retrieving log entries, improving test coverage for serverless function logging.
2026-01-02 08:41:54 +02:00
anonpenguin23
4f893e08d1 feat: enhance serverless function management and logging
- Updated the serverless functions table schema to remove the version constraint for uniqueness, allowing for more flexible function definitions.
- Enhanced the serverless engine to support HTTP fetch functionality, enabling external API calls from serverless functions.
- Implemented logging capabilities for function invocations, capturing detailed logs for better debugging and monitoring.
- Improved the authentication middleware to handle public endpoints more effectively, ensuring seamless access to serverless functions.
- Added new configuration options for serverless functions, including memory limits, timeout settings, and retry parameters, to optimize performance and reliability.
2026-01-02 08:40:28 +02:00
anonpenguin23
df5b11b175 feat: add API examples for Orama Network Gateway
- Introduced a new `example.http` file containing comprehensive API examples for the Orama Network Gateway, demonstrating various functionalities including health checks, distributed cache operations, decentralized storage interactions, real-time pub/sub messaging, and serverless function management.
- Updated the README to include a section on serverless functions using WebAssembly (WASM), detailing the build, deployment, invocation, and management processes for serverless functions.
- Removed outdated debug configuration file to streamline project structure.
2026-01-01 18:53:51 +02:00
anonpenguin23
a9844a1451 feat: add unit tests for gateway authentication and RQLite utilities
- Introduced comprehensive unit tests for the authentication service in the gateway, covering JWT generation, Base58 decoding, and signature verification for Ethereum and Solana.
- Added tests for RQLite cluster discovery functions, including host replacement logic and public IP validation.
- Implemented tests for RQLite utility functions, focusing on exponential backoff and data directory path resolution.
- Enhanced serverless engine tests to validate timeout handling and memory limits for WASM functions.
2025-12-31 12:26:31 +02:00
anonpenguin23
4ee76588ed feat: refactor API gateway and CLI utilities for improved functionality
- Updated the API gateway documentation to reflect changes in architecture and functionality, emphasizing its role as a multi-functional entry point for decentralized services.
- Refactored CLI commands to utilize utility functions for better code organization and maintainability.
- Introduced new utility functions for handling peer normalization, service management, and port validation, enhancing the overall CLI experience.
- Added a new production installation script to streamline the setup process for users, including detailed dry-run summaries for better visibility.
- Enhanced validation mechanisms for configuration files and swarm keys, ensuring robust error handling and user feedback during setup.
2025-12-31 10:48:15 +02:00
anonpenguin23
b3b1905fb2 feat: refactor API gateway and CLI utilities for improved functionality
- Updated the API gateway documentation to reflect changes in architecture and functionality, emphasizing its role as a multi-functional entry point for decentralized services.
- Refactored CLI commands to utilize utility functions for better code organization and maintainability.
- Introduced new utility functions for handling peer normalization, service management, and port validation, enhancing the overall CLI experience.
- Added a new production installation script to streamline the setup process for users, including detailed dry-run summaries for better visibility.
- Enhanced validation mechanisms for configuration files and swarm keys, ensuring robust error handling and user feedback during setup.
2025-12-31 10:16:26 +02:00
anonpenguin23
54aab4841d feat: add network MCP rules and documentation
- Introduced a new `network.mdc` file containing comprehensive guidelines for utilizing the network Model Context Protocol (MCP).
- Documented available MCP tools for code understanding, skill learning, and recommended workflows to enhance developer efficiency.
- Provided detailed instructions on the collaborative skill learning process and user override commands for better interaction with the MCP.
2025-12-29 14:09:48 +02:00
anonpenguin23
ee80be15d8 feat: add network MCP rules and documentation
- Introduced a new `network.mdc` file containing comprehensive guidelines for utilizing the network Model Context Protocol (MCP).
- Documented available MCP tools for code understanding, skill learning, and recommended workflows to enhance developer efficiency.
- Provided detailed instructions on the collaborative skill learning process and user override commands for better interaction with the MCP.
2025-12-29 14:08:58 +02:00
anonpenguin
6740e67d40
Merge pull request #75 from DeBrosOfficial/nightly
chore: update README and configuration for improved clarity and funct…
2025-12-15 14:59:02 +02:00
anonpenguin23
670c3f99df chore: update README and configuration for improved clarity and functionality
- Removed outdated feature list from README for a more concise overview.
- Updated health check instructions and command references in the README.
- Changed `make down` to `make stop` for consistency in stopping the development environment.
- Enhanced the configuration in `config.go` to include additional RQLite and Raft addresses for better node communication.
- Adjusted the build process in the release workflow to ensure all necessary gateway files are included.
2025-12-09 07:23:24 +02:00
DeBros
9f43cea907
Merge pull request #74 from DeBrosOfficial/JohnySigma-patch-1
Update README.md
2025-12-03 12:27:57 +02:00
65286df31e
Update README.md 2025-12-03 12:26:04 +02:00
anonpenguin
b91b7c27ea
Merge pull request #73 from DeBrosOfficial/nightly
Nightly
2025-11-28 22:30:03 +02:00
anonpenguin
432952ed69
Merge pull request #72 from DeBrosOfficial/super
Super
2025-11-28 22:27:52 +02:00
anonpenguin23
9193f088a3 feat: update node and gateway commands to use Orama naming convention
- Renamed the node executable from `node` to `orama-node` in the Makefile and various scripts to reflect the new naming convention.
- Updated the gateway command to `orama-gateway` for consistency.
- Modified service configurations and systemd templates to ensure proper execution of the renamed binaries.
- Enhanced the interactive installer to prompt for the gateway URL, allowing users to select between local and remote nodes.
- Added functionality to extract domain information for TLS configuration, improving security for remote connections.
2025-11-28 22:27:27 +02:00
anonpenguin23
3505a6a0eb feat: update RQLite configuration for direct TLS support
- Modified the RQLite node configuration to use direct TLS on port 7002 when HTTPS is enabled, bypassing SNI gateway conflicts.
- Updated the join address logic to reflect the new direct RQLite TLS connection method.
- Enhanced documentation comments to clarify the changes in TLS handling and port usage for Raft communication.
2025-11-28 15:14:26 +02:00
anonpenguin23
3ca4e1f43b feat: enhance RQLite service startup with TLS certificate readiness
- Added a certificate ready signal to coordinate RQLite node-to-node TLS startup with certificate provisioning.
- Updated the RQLite service generation to include a log file path for better logging management.
- Implemented a timeout mechanism for waiting on TLS certificates, improving error handling during RQLite startup.
2025-11-28 14:26:51 +02:00
anonpenguin23
2fb1d68fcb feat: enhance IPFS integration and swarm key management
- Introduced IPFS peer information handling for improved network discovery and configuration.
- Added validation for the 64-hex swarm key, ensuring proper input during installation.
- Updated the installer to collect and store IPFS peer details, enhancing the setup experience for private networks.
- Enhanced the production setup to configure IPFS peering for better node discovery in private environments.
- Improved documentation to reflect new IPFS-related configuration options and swarm key requirements.
2025-11-28 14:25:31 +02:00
anonpenguin23
7126c4068b feat: enhance HTTPS support and certificate management
- Added a new CertificateManager for managing self-signed certificates, ensuring secure communication within the network.
- Updated the configuration to support self-signed certificates and Let's Encrypt integration for HTTPS.
- Enhanced the installer to generate and manage certificates automatically, improving the setup experience.
- Introduced a centralized TLS configuration for HTTP clients, ensuring consistent security practices across the application.
- Updated documentation to reflect new port requirements and HTTPS setup instructions.
2025-11-27 16:52:49 +02:00
anonpenguin23
681cef999a feat: enhance HTTPS support and certificate management
- Added a new CertificateManager for managing self-signed certificates, ensuring secure communication within the network.
- Updated the configuration to support self-signed certificates and Let's Encrypt integration for HTTPS.
- Enhanced the installer to generate and manage certificates automatically, improving the setup experience.
- Introduced a centralized TLS configuration for HTTP clients, ensuring consistent security practices across the application.
- Updated documentation to reflect new port requirements and HTTPS setup instructions.
2025-11-27 16:49:26 +02:00
anonpenguin23
5c7767b7c8 feat: enhance HTTPS support and certificate management
- Added a new CertificateManager for managing self-signed certificates, ensuring secure communication within the network.
- Updated the configuration to support self-signed certificates and Let's Encrypt integration for HTTPS.
- Enhanced the installer to generate and manage certificates automatically, improving the setup experience.
- Introduced a centralized TLS configuration for HTTP clients, ensuring consistent security practices across the application.
- Updated documentation to reflect new port requirements and HTTPS setup instructions.
2025-11-27 16:48:02 +02:00
anonpenguin23
d8994b1e4f refactor: rename DeBros to Orama and update configuration paths
- Replaced all instances of DeBros with Orama throughout the codebase, including CLI commands and configuration paths.
- Updated documentation to reflect the new naming convention and paths for configuration files.
- Removed the outdated PRODUCTION_INSTALL.md file and added new scripts for local domain setup and testing.
- Introduced a new interactive TUI installer for Orama Network, enhancing the installation experience.
- Improved logging and error handling across various components to provide clearer feedback during operations.
2025-11-26 16:14:19 +02:00
anonpenguin23
b983066016 refactor: rename DeBros to Orama and update configuration paths
- Replaced all instances of DeBros with Orama throughout the codebase, including CLI commands and configuration paths.
- Updated documentation to reflect the new naming convention and paths for configuration files.
- Removed the outdated PRODUCTION_INSTALL.md file and added new scripts for local domain setup and testing.
- Introduced a new interactive TUI installer for Orama Network, enhancing the installation experience.
- Improved logging and error handling across various components to provide clearer feedback during operations.
2025-11-26 15:36:11 +02:00
anonpenguin23
660008b0aa refactor: rename DeBros to Orama and update configuration paths
- Replaced all instances of DeBros with Orama throughout the codebase, including CLI commands and configuration paths.
- Updated documentation to reflect the new naming convention and paths for configuration files.
- Removed the outdated PRODUCTION_INSTALL.md file and added new scripts for local domain setup and testing.
- Introduced a new interactive TUI installer for Orama Network, enhancing the installation experience.
- Improved logging and error handling across various components to provide clearer feedback during operations.
2025-11-26 13:31:02 +02:00
anonpenguin23
775289a1a2 feat: enhance cluster secret management and anyone-client installation verification
- Added a new method to verify the cluster secret in the service.json file, ensuring the correct secret is used during configuration updates.
- Updated the anyone-client installation process to utilize `npx` for improved reliability and added verification steps to confirm successful installation.
- Enhanced logging to provide clearer feedback on cluster secret verification and anyone-client installation status.
2025-11-22 13:31:44 +02:00
anonpenguin23
87059fb9c4 fix: update anyone-client installation command to use scoped package name
- Changed the npm installation command for anyone-client to use the scoped package name `@anyone-protocol/anyone-client`, ensuring correct package retrieval during installation.
2025-11-22 13:10:21 +02:00
anonpenguin23
90a26295a4 feat: add port checking and anyone-client installation to production setup
- Introduced a new `PortChecker` type to verify port availability, enhancing service management during startup.
- Updated the `BinaryInstaller` to install the `anyone-client` npm package globally, ensuring its availability for SOCKS5 proxy functionality.
- Enhanced the `ProductionSetup` to include checks for port usage before starting the `anyone-client` service, improving conflict resolution.
- Added logging for the installation and service creation of `anyone-client`, providing clearer feedback during the setup process.
2025-11-22 13:01:46 +02:00
anonpenguin23
4c1f842939 feat: enhance service shutdown and logging in development environment
- Improved the `stop` target in the Makefile to ensure graceful shutdown of development services, allowing for a more reliable process termination.
- Updated the `StopAll` method in the ProcessManager to provide clearer logging during service shutdown, including progress updates and error handling.
- Added a new `PushNotificationService` to handle sending push notifications via Expo, including bulk notification capabilities and improved error handling.
- Refactored RQLite management to streamline node identification and logging, ensuring consistent behavior across node types during startup and recovery.
2025-11-21 13:52:55 +02:00
anonpenguin23
33ebf222ff feat: enhance development process management and service shutdown
- Introduced a new `stop` target in the Makefile for graceful shutdown of development services, improving user experience during service management.
- Updated the `stopProcess` method in the ProcessManager to check if a process is running before attempting to stop it, enhancing reliability.
- Improved the shutdown logic to wait for a graceful shutdown before forcefully killing processes, providing clearer logging on the shutdown status.
- Enhanced the `dev-kill-all.sh` script to specifically target debros-related processes and improve the cleanup of PID files, ensuring a more thorough shutdown process.
2025-11-16 18:39:45 +02:00
anonpenguin23
2f1ccfa473 feat: normalize wallet address handling in nonce queries
- Updated nonce handling in challenge, verify, and issue API key handlers to normalize wallet addresses to lowercase for case-insensitive comparison.
- Enhanced SQL queries to use LOWER() function for wallet address checks, improving consistency and reliability in nonce validation.
2025-11-16 18:10:08 +02:00
anonpenguin23
6f7b7606b0 refactor: remove RQLite service management and improve Olric client handling
- Eliminated the RQLite service management functions from the ProcessManager, streamlining the service startup and shutdown processes.
- Updated the Gateway to utilize a mutex for thread-safe access to the Olric client, enhancing concurrency handling.
- Refactored cache handler methods to consistently retrieve the Olric client, improving code clarity and maintainability.
- Added a reconnect loop for the Olric client to ensure resilience during connection failures, enhancing overall system reliability.
2025-11-14 17:49:27 +02:00
anonpenguin
adb180932b
Merge pull request #68 from DeBrosOfficial/nightly
Bugs, IPFS, Olric
2025-11-14 08:59:01 +02:00
anonpenguin23
5d6de3b0b8 feat: improve gateway.yaml path handling and Olric client initialization
- Enhanced the DefaultPath function to remember the preferred data path for gateway.yaml, allowing for better error messaging and fallback options.
- Introduced a new function to initialize the Olric client with retry logic, improving resilience during client setup and providing clearer logging for connection attempts.
- Updated logging to provide detailed feedback on Olric client initialization attempts, enhancing troubleshooting capabilities.
2025-11-14 08:56:43 +02:00
anonpenguin23
747be5863b feat: enforce cluster secret requirement for non-bootstrap nodes
- Added documentation for joining additional nodes, specifying the need for the same IPFS Cluster secret as the bootstrap host.
- Updated the production command to require the `--cluster-secret` flag for non-bootstrap nodes, ensuring consistent cluster PSKs during deployment.
- Enhanced error handling to validate the cluster secret format and provide user feedback if the secret is missing or invalid.
- Modified the configuration setup to accommodate the cluster secret, improving security and deployment integrity.
2025-11-14 07:12:03 +02:00
anonpenguin23
358de8a8ad
feat: enhance production service initialization and logging
- Updated the `Phase2cInitializeServices` function to accept bootstrap peers and VPS IP, improving service configuration for non-bootstrap nodes.
- Refactored the `handleProdInstall` and `handleProdUpgrade` functions to ensure proper initialization of services with the new parameters.
- Improved logging to provide clearer feedback during service initialization and configuration, enhancing user experience and troubleshooting capabilities.
2025-11-13 10:26:50 +02:00
anonpenguin23
47ffe817b4
feat: add service enable/disable functionality to production commands
- Introduced new functions to check if a service is enabled and to enable or disable services as needed during production command execution.
- Enhanced the `handleProdStart` and `handleProdStop` functions to manage service states more effectively, ensuring services are re-enabled after being stopped and disabled when stopped.
- Improved logging to provide clear feedback on service status changes, enhancing user experience during service management.
2025-11-13 07:21:22 +02:00
anonpenguin23
7f77836d73
feat: add service enable/disable functionality to production commands
- Introduced new functions to check if a service is enabled and to enable or disable services as needed during production command execution.
- Enhanced the `handleProdStart` and `handleProdStop` functions to manage service states more effectively, ensuring services are re-enabled after being stopped and disabled when stopped.
- Improved logging to provide clear feedback on service status changes, enhancing user experience during service management.
2025-11-12 17:08:24 +02:00
anonpenguin23
1d060490a8
feat: add service enable/disable functionality to production commands
- Introduced new functions to check if a service is enabled and to enable or disable services as needed during production command execution.
- Enhanced the `handleProdStart` and `handleProdStop` functions to manage service states more effectively, ensuring services are re-enabled after being stopped and disabled when stopped.
- Improved logging to provide clear feedback on service status changes, enhancing user experience during service management.
2025-11-12 11:18:50 +02:00
anonpenguin23
0421155594
refactor: improve Olric server configuration logic and enhance bootstrap peer handling
- Updated the logic for determining Olric server addresses in the gateway configuration, differentiating between bootstrap and non-bootstrap nodes for better connectivity.
- Introduced a new function to parse bootstrap host and port from the API URL, improving clarity and flexibility in handling different network configurations.
- Enhanced the handling of IP protocols (IPv4 and IPv6) when constructing bootstrap peer addresses, ensuring compatibility across various network environments.
2025-11-12 10:07:40 +02:00
anonpenguin23
32470052ba
feat: enhance production command handling and configuration management
- Updated the production command to support dynamic IP address extraction from bootstrap peer multiaddrs, improving node connectivity.
- Refactored the configuration generation to include advertised HTTP and Raft addresses based on the node type and bootstrap peers.
- Enhanced error handling and logging for service management commands, ensuring better feedback during installation and upgrades.
- Improved the README documentation with new command examples and clarified usage instructions for production deployment.
2025-11-12 09:14:26 +02:00
anonpenguin23
0ca211c983
refactor: update config paths for gateway.yaml
- Changed the default configuration path for gateway.yaml to prioritize the ~/.debros/data/ directory, ensuring better organization and clarity.
- Updated related functions to reflect the new path structure, maintaining backward compatibility with existing configurations.
- Adjusted service execution commands to align with the new configuration path, enhancing deployment consistency.
2025-11-11 17:08:56 +02:00
anonpenguin23
2b17bcdaa2
refactor: remove RQLite as a separate service management entity
- Updated production command handling to reflect that RQLite is now managed internally by the node process, eliminating the need for separate service definitions and binary paths.
- Adjusted logging and service management functions to streamline operations and improve clarity regarding RQLite's integration.
- Enhanced log file creation to be node-type specific, ensuring only relevant logs are generated based on the node type being installed.
2025-11-11 16:23:26 +02:00
anonpenguin23
c405be3e69
feat: add --ignore-resource-checks option to production commands
- Introduced a new command-line option `--ignore-resource-checks` to skip prerequisite validation for disk, RAM, and CPU during installation and upgrades.
- Updated the `ProductionSetup` struct to include a flag for skipping resource checks, enhancing flexibility for users with specific deployment needs.
- Enhanced logging to inform users when resource checks are skipped, improving transparency during the setup process.
2025-11-11 15:39:51 +02:00
anonpenguin23
c2298e476e
chore: update version to 0.69.2 and refine CLI help output
- Bumped the version number in the Makefile to 0.69.2.
- Removed deprecated environment commands from the CLI main.go file to streamline user experience.
- Enhanced help output for development and production commands, adding new options and clarifying usage instructions.
- Introduced a `--no-pull` option in production commands to skip repository updates during installation and upgrades, improving flexibility for users.
2025-11-11 15:29:23 +02:00
anonpenguin23
ee566d93b7
feat: implement service management during production upgrades
- Added functionality to stop services before upgrading binaries, ensuring a smoother upgrade process.
- Implemented a helper function to extract existing configuration settings from node and gateway config files, preserving critical parameters like bootstrap peers and domain.
- Enhanced logging to provide feedback on preserved configurations during the upgrade process.
2025-11-11 09:46:52 +02:00
anonpenguin23
7c3378a8ec
feat: enhance binary installation and ownership management
- Improved the binary installation process by adding checks for the existence and contents of the source binary directory, ensuring a successful build.
- Implemented individual file copying to avoid issues with wildcard expansion, enhancing reliability during the installation.
- Added ownership correction for directories and files created during service initialization, ensuring proper permissions for the debros user.
- Enhanced error handling for ownership changes, providing detailed output in case of failures.
2025-11-11 09:33:01 +02:00
anonpenguin23
bd4542ef56 fixed test error 2025-11-11 09:18:43 +02:00
anonpenguin23
f88a28b3df
feat: enhance binary installation and ownership management
- Improved the binary installation process by adding checks for the existence and contents of the source binary directory, ensuring a successful build.
- Implemented individual file copying to avoid issues with wildcard expansion, enhancing reliability during the installation.
- Added ownership correction for directories and files created during service initialization, ensuring proper permissions for the debros user.
- Enhanced error handling for ownership changes, providing detailed output in case of failures.
2025-11-11 09:17:45 +02:00
anonpenguin23
b0ac58af3e
feat: add production deployment documentation and service management commands
- Introduced a new section in the README for Production Deployment, detailing prerequisites, installation steps, and service management commands.
- Added commands for starting, stopping, and restarting production services in the CLI, enhancing user control over service management.
- Updated IPFS initialization to configure API, Gateway, and Swarm addresses to avoid port conflicts, improving deployment reliability.
- Enhanced error handling and logging for service management operations, ensuring better feedback during execution.
2025-11-11 09:00:45 +02:00
anonpenguin23
52b3a99bb9
feat: add branch management for production commands
- Introduced `--branch` flag for specifying Git branches (main or nightly) during installation and upgrade processes.
- Implemented validation for branch input and defaulted to 'main' if not specified.
- Enhanced help output to include examples for using the new branch feature.
- Added functionality to save and read branch preferences for future upgrades, improving user experience and flexibility.
2025-11-11 08:45:20 +02:00
anonpenguin23
19bfaff943
feat: enhance binary installation and IPFS configuration management
- Improved the binary installation process by checking if the git repository is already initialized, allowing for updates instead of re-cloning.
- Added error handling for fetching and resetting the repository to ensure the latest changes are applied.
- Enhanced IPFS configuration management by clearing AutoConf placeholders to prevent startup errors, with detailed logging for each cleanup step.
2025-11-11 08:35:40 +02:00
anonpenguin23
b58b632be9
feat: enhance gateway configuration handling with optional config flag
- Added support for a `--config` flag in the gateway configuration parser to allow absolute or relative paths for the config file.
- Improved error handling for determining the config path, ensuring robust loading of `gateway.yaml` from specified locations.
- Updated related functions to maintain backward compatibility while enhancing flexibility in configuration management.
2025-11-11 08:15:48 +02:00
anonpenguin23
a33d03c6b2
feat: enhance configuration handling for IPFS and cluster initialization
- Updated `select_data_dir_check` to handle both absolute and relative config paths, improving flexibility in configuration file management.
- Enhanced `DefaultPath` to check for config files in both legacy and production directories, ensuring backward compatibility.
- Modified `InitializeIPFSClusterConfig` to include cluster secret management and improve error handling during initialization.
- Added functionality to update the cluster secret in the service.json configuration file, enhancing security and configurability.
2025-11-11 08:04:48 +02:00
anonpenguin23
6ba0a824e0
feat: enhance binary installation process for RQLite and IPFS
- Added error handling for copying the RQLite binary to ensure successful installation.
- Updated IPFS installation to follow official steps, including downloading, extracting, and verifying the installation of Kubo.
- Improved logging for each installation step and ensured the PATH is updated for the current process.
- Added checks to verify the existence of downloaded files and installation scripts, enhancing robustness.
2025-11-11 07:43:03 +02:00
anonpenguin23
d5e28bb694 fixed test error 2025-11-11 07:14:41 +02:00
anonpenguin23
72ba75d16b
feat: implement binary resolution for systemd service generation
- Added a new method `ResolveBinaryPath` to locate required executables in the system PATH and specified extra paths.
- Updated `InitializeIPFSRepo` and `Phase5CreateSystemdServices` methods to utilize the new binary resolution logic for IPFS, IPFS Cluster, RQLite, and Olric services.
- Modified service generation functions to accept binary paths as parameters, ensuring correct executable paths are used in systemd unit files.
2025-11-11 07:13:36 +02:00
anonpenguin23
b896e37e09
fix: improve disk space check logic in ResourceChecker
- Updated CheckDiskSpace method to validate disk space against the parent directory if the specified path does not exist.
- Enhanced error handling to ensure accurate reporting of disk space validation failures.
2025-11-11 07:00:39 +02:00
anonpenguin23
b1732b2cbe
refactor: simplify flag parsing in TestProdCommandFlagParsing
- Removed unused `--force` and `--domain` flags from the test case to streamline the flag parsing logic.
- Updated the test to focus on essential flags, enhancing clarity and maintainability.
2025-11-11 06:52:31 +02:00
anonpenguin23
badaa920d9
feat: enhance production command handling and configuration generation
- Added comprehensive tests for production command flag parsing to ensure correct handling of bootstrap, VPS IP, and peer configurations.
- Updated production command help output to clarify the usage of new flags, including `--vps-ip` and `--bootstrap-join`.
- Modified the configuration generation logic to incorporate the new `bootstrapJoin` parameter for secondary bootstrap nodes.
- Enhanced systemd service generation to include the correct advertise IP and join address for non-bootstrap nodes.
- Implemented tests for RQLite service generation to verify the inclusion of join addresses and advertise IPs in the generated units.
2025-11-11 06:51:28 +02:00
anonpenguin23
ed80b5b023
feat: implement resource validation checks for production deployment
- Added a new ResourceChecker type to validate system resources including disk space, RAM, and CPU cores.
- Implemented CheckDiskSpace, CheckRAM, and CheckCPU methods to ensure minimum requirements are met for production deployment.
- Integrated resource checks into the ProductionSetup's Phase1CheckPrerequisites method to enhance deployment reliability.
- Updated systemd service generation to log output to specific log files instead of the journal for better log management.
2025-11-11 06:14:05 +02:00
anonpenguin23
e9bf94ba96
feat: implement resource validation checks for production deployment
- Added a new ResourceChecker type to validate system resources including disk space, RAM, and CPU cores.
- Implemented CheckDiskSpace, CheckRAM, and CheckCPU methods to ensure minimum requirements are met for production deployment.
- Integrated resource checks into the ProductionSetup's Phase1CheckPrerequisites method to enhance deployment reliability.
- Updated systemd service generation to log output to specific log files instead of the journal for better log management.
2025-11-11 05:44:40 +02:00
anonpenguin23
52a726ffd4
fix: update expected port count in TestPortChecker
- Adjusted the expected port count in the TestPortChecker test from 22 to 44 to align with the current RequiredPorts configuration.
2025-11-11 05:28:20 +02:00
anonpenguin23
efa26e6ec8
feat: enhance development environment topology and configuration
- Introduced a new topology structure to manage multiple bootstrap and node configurations, allowing for a more flexible development environment.
- Updated Makefile and help commands to reflect the addition of a second bootstrap and fourth node.
- Enhanced health checks to require a minimum of three healthy nodes for RQLite and LibP2P connectivity.
- Refactored service management to dynamically handle multiple nodes and their respective configurations.
- Improved logging and configuration file generation for better clarity and maintainability.
2025-11-11 05:26:43 +02:00
anonpenguin23
239fb2084b
feat: enhance Raft log index retrieval and data directory management
- Improved the `getRaftLogIndex` method to accurately report the Raft log index by incorporating fallback logic to read persisted snapshot metadata when the RQLite status is unavailable or returns zero.
- Added a new method `getPersistedRaftLogIndex` to read the highest Raft log index from snapshot metadata files, ensuring accurate reporting even before RQLite starts.
- Centralized the data directory path resolution logic in `rqliteDataDirPath`, simplifying the codebase and enhancing maintainability.
2025-11-10 16:24:05 +02:00
anonpenguin23
5463df73d5 fixed test error 2025-11-10 15:38:33 +02:00
anonpenguin23
0ea58354ca
feat: enhance E2E testing and dependency management
- Added new E2E tests for authentication, cache operations, and IPFS interactions to improve coverage and reliability.
- Introduced concurrency tests for cache operations to validate performance under load.
- Updated `go.mod` to include `github.com/mattn/go-sqlite3` as a dependency for database interactions.
- Refined Makefile to simplify E2E test execution and configuration discovery.
- Removed outdated client E2E tests and consolidated related functionality for better maintainability.
2025-11-10 15:36:58 +02:00
anonpenguin23
263fbbb8b4
feat: implement Raft state management for cluster recovery
- Added methods to check for existing Raft state and clear it if necessary, allowing for a clean join to the cluster.
- Enhanced automatic recovery logic to detect stale Raft state and clear it when peers have higher log indexes.
- Improved logging for Raft state operations to provide better visibility during cluster management.
2025-11-10 08:51:33 +02:00
anonpenguin23
a72aebc1fe
feat: add script for graceful shutdown and process cleanup
- Introduced a new script `dev-kill-all.sh` to handle graceful shutdown of development processes and cleanup of stale PID files.
- Updated Makefile to include a `kill` command that utilizes the new script for improved process management.
- Enhanced the shutdown process to verify that required ports are free after termination of processes.
2025-11-10 08:29:34 +02:00
anonpenguin23
80ea58848b
feat: add script for graceful shutdown and process cleanup
- Introduced a new script `dev-kill-all.sh` to handle graceful shutdown of development processes and cleanup of stale PID files.
- Updated Makefile to include a `kill` command that utilizes the new script for improved process management.
- Enhanced the shutdown process to verify that required ports are free after termination of processes.
2025-11-10 08:08:43 +02:00
anonpenguin23
687316b8d6 fixed 2025-11-10 06:43:54 +02:00
anonpenguin23
170665bf02
feat: add script for graceful shutdown and process cleanup
- Introduced a new script `dev-kill-all.sh` to handle graceful shutdown of development processes and cleanup of stale PID files.
- Updated Makefile to include a `kill` command that utilizes the new script for improved process management.
- Enhanced the shutdown process to verify that required ports are free after termination of processes.
2025-11-10 06:41:44 +02:00
anonpenguin23
17fc78975d
refactor: reorder production installation phases and enhance service initialization
- Adjusted the installation sequence to generate secrets before initializing services, ensuring necessary keys are in place.
- Updated service initialization to account for both bootstrap and node variants, improving service status reporting.
- Enhanced error handling during IPFS repo and cluster path initialization, providing clearer feedback on failures.
- Captured the node peer ID for logging after secret generation, improving visibility during production setup.
2025-11-10 06:03:40 +02:00
anonpenguin23
6a86592cad
refactor: streamline development and production command structure
- Consolidated development commands into a new `dev` command group for better organization.
- Introduced a `prod` command group to manage production environment operations.
- Updated Makefile to simplify the development environment setup and improve logging.
- Enhanced README to clarify the development process and health check requirements.
- Removed deprecated configuration and service management commands to streamline the CLI interface.
2025-11-10 05:34:50 +02:00
abcf9a42eb fixed ipfs error 2025-11-09 19:52:23 +02:00
anonpenguin
a9af0d2f2d
Merge pull request #69 from DeBrosOfficial/super-magic-updates
Super magic updates
2025-11-09 19:20:51 +02:00
anonpenguin23
0b24c66d56
works 2025-11-09 19:20:08 +02:00
anonpenguin23
f991d55676 fixes 2025-11-09 18:30:03 +02:00
anonpenguin23
0388c3a766
refactor: streamline development and production command structure
- Consolidated development commands into a new `dev` command group for better organization.
- Introduced a `prod` command group to manage production environment operations.
- Updated Makefile to simplify the development environment setup and improve logging.
- Enhanced README to clarify the development process and health check requirements.
- Removed deprecated configuration and service management commands to streamline the CLI interface.
2025-11-09 18:28:24 +02:00
anonpenguin23
c726dfc401 feat: update IPFS configuration and enhance cluster secret management
- Changed default IPFS API URL to port 5001 for better compatibility.
- Enhanced the initialization process for IPFS and Cluster by adding support for bootstrap peers.
- Introduced user prompts for cluster secret and swarm key generation, improving user experience during setup.
- Updated service configuration to dynamically determine paths based on existing configuration files.
2025-11-08 13:29:21 +02:00
anonpenguin23
a5c30d0141 feat: update IPFS configuration and enhance cluster secret management
- Changed default IPFS API URL to port 5001 for better compatibility.
- Enhanced the initialization process for IPFS and Cluster by adding support for bootstrap peers.
- Introduced user prompts for cluster secret and swarm key generation, improving user experience during setup.
- Updated service configuration to dynamically determine paths based on existing configuration files.
2025-11-08 12:59:54 +02:00
anonpenguin23
93b25c42e4 feat: enhance IPFS configuration and logging in CLI
- Added IPFS cluster API and HTTP API configuration options to node and bootstrap configurations.
- Improved the generation of IPFS-related URLs and parameters for better integration.
- Enhanced error logging in cache handlers to provide more context on failures during cache operations.
2025-11-08 11:59:38 +02:00
anonpenguin23
50f7abf376 feat: enhance IPFS configuration and logging in CLI
- Added IPFS cluster API and HTTP API configuration options to node and bootstrap configurations.
- Improved the generation of IPFS-related URLs and parameters for better integration.
- Enhanced error logging in cache handlers to provide more context on failures during cache operations.
2025-11-07 16:51:08 +02:00
anonpenguin23
5b21774e04 feat: add cache multi-get handler and improve API key extraction
- Implemented a new cacheMultiGetHandler to retrieve multiple keys from the Olric cache in a single request.
- Enhanced the extractAPIKey function to prioritize the X-API-Key header and improve handling of non-JWT Bearer tokens.
- Updated routes to include the new multi-get endpoint for cache operations.
2025-11-07 10:33:25 +02:00
anonpenguin
05ca685eee
Merge pull request #67 from DeBrosOfficial/ipfs
Ipfs & olric
2025-11-06 06:26:05 +02:00
anonpenguin23
a7d21d4217 remove docs files 2025-11-06 06:25:41 +02:00
anonpenguin23
fbdfa23c77 feat: enhance IPFS and Cluster integration in setup
- Added automatic setup for IPFS and IPFS Cluster during the network setup process.
- Implemented initialization of IPFS repositories and Cluster configurations for each node.
- Enhanced Makefile to support starting IPFS and Cluster daemons with improved logging.
- Introduced a new documentation guide for IPFS Cluster setup, detailing configuration and verification steps.
- Updated changelog to reflect the new features and improvements.
2025-11-05 17:32:18 +02:00
anonpenguin23
d00290d278 feat: enhance IPFS and Cluster integration in setup
- Added automatic setup for IPFS and IPFS Cluster during the network setup process.
- Implemented initialization of IPFS repositories and Cluster configurations for each node.
- Enhanced Makefile to support starting IPFS and Cluster daemons with improved logging.
- Introduced a new documentation guide for IPFS Cluster setup, detailing configuration and verification steps.
- Updated changelog to reflect the new features and improvements.
2025-11-05 17:30:25 +02:00
anonpenguin23
69d7ccf4c7 feat: enhance IPFS and Cluster integration in setup
- Added automatic setup for IPFS and IPFS Cluster during the network setup process.
- Implemented initialization of IPFS repositories and Cluster configurations for each node.
- Enhanced Makefile to support starting IPFS and Cluster daemons with improved logging.
- Introduced a new documentation guide for IPFS Cluster setup, detailing configuration and verification steps.
- Updated changelog to reflect the new features and improvements.
2025-11-05 10:52:40 +02:00
anonpenguin23
d6009bb33f feat: enhance IPFS and Cluster integration in setup 08:16:27
- Added automatic setup for IPFS and IPFS Cluster during the network setup process.
  - Implemented initialization of IPFS repositories and Cluster configurations for each node.
  - Enhanced Makefile to support starting IPFS and Cluster daemons with improved logging.
  - Introduced a new documentation guide for IPFS Cluster setup, detailing configuration and verification steps.
  - Updated changelog to reflect the new features and improvements.
2025-11-05 09:01:55 +02:00
anonpenguin23
cf26c1af2c feat: integrate Olric distributed cache support
- Added Olric cache server integration, including configuration options for Olric servers and timeout settings.
- Implemented HTTP handlers for cache operations: health check, get, put, delete, and scan.
- Enhanced Makefile with commands to run the Olric server and manage its configuration.
- Updated README and setup scripts to include Olric installation and configuration instructions.
- Introduced tests for cache handlers to ensure proper functionality and error handling.
2025-11-05 07:31:50 +02:00
anonpenguin23
3196e91e85
feat: integrate Olric distributed cache support
- Added Olric cache server integration, including configuration options for Olric servers and timeout settings.
- Implemented HTTP handlers for cache operations: health check, get, put, delete, and scan.
- Enhanced Makefile with commands to run the Olric server and manage its configuration.
- Updated README and setup scripts to include Olric installation and configuration instructions.
- Introduced tests for cache handlers to ensure proper functionality and error handling.
2025-11-03 15:30:08 +02:00
anonpenguin
42131c0e75
Merge pull request #65 from DeBrosOfficial/nightly
Nightly
2025-11-03 08:39:19 +02:00
anonpenguin23
5e7d59c7a1 fix: adjust connection timeout and remove debug logging for ephemeral ports
- Increased the connection timeout from 15 seconds to 20 seconds for improved reliability during peer connections.
- Removed debug logging that filtered out ephemeral port addresses to streamline the logging process.
2025-11-03 08:11:20 +02:00
anonpenguin23
11ce4f2a53 feat: add pre-commit hook for automatic changelog updates
- Introduced a pre-commit hook that updates the changelog if there are code changes, excluding commits that only modify the changelog or Makefile.
- Added user confirmation for proceeding with the commit after displaying the changelog preview.
- Enhanced the update_changelog.sh script to differentiate between pre-commit and pre-push contexts for better change analysis.
2025-11-03 07:34:42 +02:00
anonpenguin23
d3543ac3ab chore: update changelog and version to 0.53.16 2025-11-03 07:32:51 +02:00
anonpenguin23
2b51859ea7 fix: enhance changelog update logic to prevent infinite loops
- Added checks to skip changelog updates if the only unpushed commit is a changelog update commit.
- Improved handling of multiple unpushed commits by excluding the latest changelog commit from the diff analysis.
- Cleaned up temporary changelog preview files when no other changes are detected.
2025-11-03 07:32:42 +02:00
anonpenguin23
3ba7e88e4e chore: update changelog and version to 0.53.15 2025-11-03 07:31:53 +02:00
anonpenguin23
952132de8e chore: update changelog and version to 0.53.15 2025-11-03 07:31:08 +02:00
anonpenguin23
31e01df940 chore: update changelog and version to 0.53.14 2025-11-03 07:30:38 +02:00
anonpenguin23
9093c8937e feat: enhance bootstrap peer handling and configuration validation
- Updated DefaultBootstrapPeers function to prioritize environment variable settings for bootstrap peers, allowing for dynamic configuration.
- Added tests to ensure non-empty default bootstrap peers and validate the correct handling of bootstrap peer configurations.
- Introduced a helper function to generate valid configurations for different node types, improving test clarity and maintainability.
- Enhanced the isPrivateOrLocalHost function to properly handle IPv6 addresses, ensuring accurate host validation.
2025-11-03 07:30:27 +02:00
anonpenguin23
2088b6a0cf chore: update changelog and version to 0.53.14 2025-11-03 07:24:52 +02:00
anonpenguin23
3d02663e27 fix: ensure correct repository context in pre-push hook
- Updated the pre-push hook to change the directory to the repository root before staging CHANGELOG.md and Makefile, ensuring the commands execute in the correct context.
2025-11-03 07:24:36 +02:00
anonpenguin23
a17255e6b4 chore: automate changelog and version commit in pre-push hook
- Enhanced the pre-push hook to automatically stage and commit changes to CHANGELOG.md and Makefile, streamlining the version update process.
- Added error handling to ensure the push proceeds even if the commit fails due to no changes.
2025-11-03 07:24:06 +02:00
anonpenguin23
09c903dd14 refactor: update date handling in update_changelog.sh
- Replaced the static date field in the changelog entry with a programmatically generated date, improving accuracy and reducing manual errors.
- Removed the date validation step, streamlining the script's functionality and ensuring consistency in date formatting.
2025-11-03 07:22:46 +02:00
anonpenguin23
a895726cbd fix: improve user confirmation handling in pre-push hook
- Updated the pre-push hook to read user confirmation directly from /dev/tty, ensuring it works correctly in all terminal contexts.
- This change enhances the reliability of the push confirmation process, preventing potential issues with user input in non-interactive environments.
2025-11-03 07:21:44 +02:00
anonpenguin23
f1fcbf69cf refactor: enhance update_changelog.sh with diff counting and logging
- Added functionality to count and log the number of unstaged, staged, and unpushed changes for better visibility of the git state.
- Improved logging to provide clearer feedback on the current status of changes before generating the changelog.
2025-11-03 07:20:50 +02:00
anonpenguin23
c282cf57d6 refactor: enhance OpenRouter API key handling in update_changelog.sh
- Removed hardcoded API key assignment and implemented loading from a .env file for better security and flexibility.
- Added error handling to guide users on setting the API key through environment variables or a .env file.
- Improved API response handling and error reporting for clearer feedback on potential issues with the API key.
2025-11-03 07:19:35 +02:00
anonpenguin23
4ec47fa7ef refactor: simplify JSON payload construction in update_changelog.sh
- Improved the construction of the JSON request body for the OpenRouter API by using a here-document for better readability and maintainability.
- Ensured proper escaping of the prompt content to prevent issues with special characters in the API request.
2025-11-03 07:17:18 +02:00
anonpenguin23
6abe43ddc6 fix: update OpenRouter API key assignment in update_changelog.sh
- Changed the assignment of OPENROUTER_API_KEY to a hardcoded value for immediate use, removing the environment variable fallback.
- This change simplifies the script's usage for users who may not have the environment variable set.
2025-11-03 07:15:46 +02:00
anonpenguin23
7fe56f11d5 feat: enhance update_changelog.sh with API key handling and error reporting
- Updated the script to allow setting the OpenRouter API key via an environment variable, improving flexibility.
- Added HTTP headers for better API request context.
- Enhanced error reporting to provide clearer feedback on API errors and potential key issues.
2025-11-03 07:15:08 +02:00
anonpenguin23
909be0f18f feat: add git hook installation script and update Makefile
- Introduced a new script `install-hooks.sh` to automate the installation of git hooks from the `.githooks` directory to the `.git/hooks` directory.
- Updated the Makefile to include a new `install-hooks` target for easy execution of the hook installation process.
- Modified the pre-push hook to correctly reference the repository root directory.
2025-11-03 07:13:26 +02:00
anonpenguin23
6e59b17c6a feat: enhance pre-push hook with changelog preview and user confirmation
- Updated the pre-push hook to display a preview of the changelog and the new version before proceeding with the push.
- Added color-coded output for better visibility and user experience.
- Implemented user confirmation to abort the push if the changelog preview is not approved.
- Cleaned up temporary files created during the changelog update process.
2025-11-03 07:11:47 +02:00
anonpenguin23
69fd6e32f1 feat: add script to automate changelog updates
- Introduced a new script `update_changelog.sh` that automates the process of generating and updating the changelog based on git diffs.
- The script checks for necessary dependencies, validates the current git state, and interacts with the OpenRouter API to generate changelog entries.
- It updates the version in the Makefile and inserts the new changelog entry into `CHANGELOG.md` while ensuring proper formatting and error handling.
2025-11-03 07:10:36 +02:00
anonpenguin23
30d18aca02 feat: add RQLite command support and help documentation
- Introduced a new RQLite command in the CLI to handle RQLite-related operations.
- Implemented the 'fix' subcommand to automatically repair common RQLite cluster issues, including correcting misconfigured join addresses and cleaning stale raft state.
- Updated help documentation to include RQLite commands and their usage.
2025-11-03 07:10:25 +02:00
anonpenguin23
ed7f4ae3d9 fix: update RQLite join address to use port 5001
- Changed the RQLite join address from "localhost:7001" to "localhost:5001" in multiple locations, including node configuration and test cases.
- Ensured consistency across configuration generation and validation tests to reflect the updated join address.
2025-10-31 20:48:45 +02:00
anonpenguin23
f71ef8e60b refactor: streamline repository removal and directory setup in cloneAndBuild function
- Updated the repository removal process to always use root, with a fallback to the 'debros' user if necessary.
- Added error handling for ensuring the 'debros' home directory exists and setting its ownership.
- Improved clarity in error messages for directory setup failures.
2025-10-31 20:09:34 +02:00
anonpenguin23
6e80ff28b4 refactor: improve repository removal and directory setup in cloneAndBuild function
- Changed the repository removal process to use the 'debros' user to avoid permission issues, with a fallback to root if necessary.
- Added a delay to ensure filesystem sync after removing the repository.
- Ensured the parent directory exists and has the correct permissions, including setting ownership to the 'debros' user.
2025-10-31 20:04:17 +02:00
anonpenguin23
58224826d2 chore: update version to 0.53.10 in Makefile 2025-10-31 19:58:31 +02:00
anonpenguin23
6f30514974 refactor: streamline repository management in cloneAndBuild function
- Removed existing repository if it exists to ensure a fresh start.
- Simplified the cloning process by directly cloning the repository without checking for existing branches.
- Enhanced error handling for cloning, copying binaries, and setting permissions, providing clearer feedback on failures.
2025-10-31 19:58:05 +02:00
anonpenguin23
13e05609e0 chore: update version to 0.53.9 in Makefile and enhance HTTPS configuration prompts in setup.go
- Updated the version number in the Makefile to 0.53.9.
- Improved the interactive prompts for domain name input and DNS verification in the setup.go file, streamlining the process for enabling HTTPS configuration.
2025-10-31 19:44:31 +02:00
anonpenguin23
8a7ae4ad6f feat: add HTTPS configuration options and server setup
- Introduced new configuration fields for enabling HTTPS, specifying a domain name, and setting a TLS cache directory in the gateway configuration.
- Enhanced the main server logic to support HTTPS with ACME integration, including automatic HTTP to HTTPS redirection and error handling for server startup.
- Added validation for HTTPS settings to ensure proper domain and cache directory configuration.
- Implemented interactive prompts in the CLI for domain and HTTPS setup, including DNS verification and port availability checks.
2025-10-31 19:32:13 +02:00
anonpenguin23
f2d6254b7b chore: remove example files for Go and TypeScript SDK
- Deleted basic usage example in Go, TypeScript package.json, README.md, tsconfig.json, and client.ts to streamline the examples directory and remove outdated content.
2025-10-31 19:24:10 +02:00
anonpenguin23
5b05f52162 chore: update version to 0.53.6 in Makefile and add IP extraction utility for multiaddr in setup.go 2025-10-31 15:12:44 +02:00
anonpenguin23
042e516b8c chore: update version to 0.53.5 in Makefile and enhance setup command with peer ID retrieval and VPS IPv4 detection 2025-10-31 14:50:20 +02:00
anonpenguin
cc74a8f135
Merge pull request #64 from DeBrosOfficial/nightly
feat: enhance service management and configuration options
2025-10-31 14:36:04 +02:00
anonpenguin23
168808b007 chore: update version to 0.53.3 in Makefile 2025-10-31 14:35:13 +02:00
anonpenguin23
c326711d7c feat: enhance service management and configuration options
- Updated service startup logic to include a helper function for starting or restarting services, improving reliability and clarity in service management.
- Added new configuration options for cluster synchronization, peer inactivity limits, and minimum cluster size in the config_commands.go file, enhancing cluster management capabilities.
2025-10-31 14:34:03 +02:00
anonpenguin
685295551c
Merge pull request #63 from DeBrosOfficial/nightly
feat: add Go build cache directory to setupDirectories function
2025-10-31 14:26:23 +02:00
anonpenguin23
ebe2706ad8 feat: add Go build cache directory to setupDirectories function
- Included a new directory for the Go build cache in the setupDirectories function to enhance the Go environment setup.
2025-10-31 14:24:57 +02:00
anonpenguin
ca00561da1
Merge pull request #62 from DeBrosOfficial/nightly
chore: update version and enhance database connection configuration
2025-10-31 13:17:08 +02:00
anonpenguin23
7b7087e5eb chore: update version to 0.53.1 in Makefile 2025-10-31 12:12:40 +02:00
anonpenguin23
c5d3dd1f6d feat: enhance Go module cache setup in CLI
- Added Go module cache directory to the setupDirectories function for improved Go environment configuration.
- Updated cloneAndBuild function to set HOME environment variable, ensuring proper module cache creation during build process.
2025-10-31 12:12:27 +02:00
anonpenguin23
2aead48045 feat: enhance RQLite manager with improved logging and data directory management
- Added structured logging for RQLite components, including cluster discovery and leadership processes.
- Introduced methods for preparing the data directory and launching the RQLite process, improving code organization.
- Implemented exponential backoff for leadership checks to reduce log noise and improve reliability.
- Enhanced peer health tracking and membership update logic to streamline cluster synchronization and recovery.
2025-10-31 11:53:50 +02:00
anonpenguin23
8f82dc7ca3 feat: enhance RQLite cluster discovery and recovery mechanisms
- Introduced TriggerPeerExchange method to facilitate manual peer exchange for RQLite metadata.
- Implemented performPreStartClusterDiscovery to ensure coordinated recovery by building peers.json before RQLite startup.
- Added validation for node ID consistency with raft address during RQLite startup.
- Enhanced logging for cluster recovery processes and peer discovery progress.
- Updated cluster synchronization logic to improve reliability during node recovery scenarios.
2025-10-31 11:41:20 +02:00
anonpenguin23
ea5ef6bc1a feat: implement dynamic cluster discovery and validation for RQLite nodes
- Added ClusterDiscoveryService to manage peer discovery and synchronization for RQLite nodes.
- Introduced new configuration options for cluster synchronization intervals, peer inactivity limits, and minimum cluster size.
- Enhanced validation logic to ensure proper configuration of cluster parameters.
- Implemented metrics collection for cluster health and peer status, improving monitoring capabilities.
- Updated RQLiteManager to integrate with the new discovery service, allowing for dynamic leadership and cluster joining logic.
2025-10-31 10:46:52 +02:00
anonpenguin23
f561bc5311 refactor: improve Anon installation script for terms acceptance
- Updated the installation process to rely on debconf preseed and file-based acceptance methods, reducing the need for interactive prompts.
- Limited the number of "yes" responses piped during installation to enhance reliability and prevent infinite loops.
- Streamlined the command execution for fetching and running the installation script from GitHub.
2025-10-31 08:42:07 +02:00
anonpenguin23
624f92bf11 updated version 2025-10-31 08:37:12 +02:00
anonpenguin23
8538e2eb3f feat: streamline Anon installation with automated terms acceptance
- Implemented pre-acceptance of Anon terms in both the setup and installation scripts to eliminate interactive prompts.
- Enhanced the installation process by creating necessary directories and files to ensure terms are accepted before installation.
- Updated the installation command to use a non-interactive frontend, improving reliability during package installation.
2025-10-31 08:36:59 +02:00
anonpenguin23
472b7c10bb feat: automate terms acceptance in Anon configuration and installation
- Added automatic acceptance of terms in the Anon configuration file to streamline the setup process.
- Updated the installation script to pre-accept terms, eliminating interactive prompts during installation.
- Enhanced output messages to reflect the auto-accepted terms for better user awareness.
2025-10-31 08:24:32 +02:00
anonpenguin23
ede253afae feat: enhance Anon installation process with Ubuntu version check and improved error handling
- Added a check for unsupported Ubuntu version 25.04, providing informative messages if installation is skipped.
- Updated the installation method to use the official script from GitHub, ensuring a cleaner installation process.
- Implemented cleanup of old APT repository files and GPG keys before installation.
- Enhanced error handling for enabling and starting the Anon service, including output of command errors for better debugging.
2025-10-31 08:05:59 +02:00
anonpenguin23
45dde89175 feat: add branch selection prompt in setup process
- Introduced a new promptBranch function to allow users to select between 'main' and 'nightly' branches during the setup.
- Updated cloneAndBuild function to use the selected branch for cloning and pulling updates, enhancing flexibility in repository management.
- Implemented logic to switch branches if the current branch differs from the selected one, ensuring the correct branch is used.
2025-10-31 07:38:30 +02:00
anonpenguin23
d672f01b30 chore: bump version to 0.52.17 in the Makefile 2025-10-31 07:36:06 +02:00
anonpenguin23
56dc6892de chore: update version and enhance database connection configuration
- Bumped version from 0.52.15 to 0.52.16 in the Makefile.
- Added connection pool configuration with timeouts and limits in the RQLite adapter and gateway, improving database connection management.
2025-10-31 07:34:27 +02:00
anonpenguin
a4b4b8f0df
Merge pull request #61 from DeBrosOfficial/nightly
Nightly
2025-10-30 13:11:53 +02:00
anonpenguin23
061b17de4f feat: improve configuration generation in setup process
- Added checks to prevent overwriting existing node.yaml and gateway.yaml files during configuration generation.
- Enhanced error handling for bootstrap.yaml creation, providing informative messages based on the existence of files.
- Implemented conditional renaming of bootstrap.yaml to node.yaml, ensuring it only occurs if node.yaml does not already exist.
- Updated output messages for clarity on configuration readiness and existing file usage.
2025-10-30 13:10:57 +02:00
anonpenguin23
351ce086bf feat: enhance anonProxyHandler to handle binary data safely
- Added Base64 encoding for the response body in the anonProxyHandler to prevent corruption of binary data when returned in JSON format.
- Updated the response structure to include the encoded body, ensuring compatibility with various data types.
2025-10-30 09:00:24 +02:00
anonpenguin23
fe16d503b5 feat: integrate Anyone Relay (Anon) into the development environment
- Added support for installing and configuring the Anyone Relay (Anon) for anonymous networking in the setup process.
- Updated the Makefile to include the Anon client in the development stack, allowing it to run alongside other services.
- Implemented a new HTTP proxy handler for the Anon service, enabling proxied requests through the Anyone network.
- Enhanced the installation script to manage Anon installation, configuration, and firewall settings.
- Introduced tests for the Anon proxy handler to ensure proper request validation and error handling.
- Updated documentation to reflect the new Anon service and its usage in the development environment.
2025-10-30 06:21:32 +02:00
anonpenguin
fe05240362
Merge pull request #60 from DeBrosOfficial/nightly
Nightly
2025-10-29 08:24:57 +02:00
ad088bd476 feat: enhance API key management and ownership recording in verifyHandler
- Implemented logic to ensure an API key is created or retrieved for each wallet during the verification process.
- Added best-effort recording of ownership for both API keys and wallets in the namespace ownership database.
- Improved error handling and logging for better traceability of ownership checks and API key operations.
- Cleaned up unnecessary comments and whitespace in the auth_handlers.go file for better code readability.
2025-10-29 06:53:51 +02:00
c208ff3288 feat: enhance signature verification for multiple blockchain types
- Added support for verifying signatures from both Ethereum (ETH) and Solana (SOL) blockchains in the verifyHandler.
- Introduced a new ChainType field in the request to specify the blockchain type, defaulting to ETH for backward compatibility.
- Implemented Ed25519 signature verification for Solana, including base64 decoding of signatures and base58 decoding of public keys.
- Enhanced error handling for unsupported chain types and invalid signature formats, improving robustness of the verification process.
2025-10-28 19:19:31 +02:00
0f0237b5ea
feat: implement asynchronous libp2p publishing in pubsub handler
- Updated the pubsubPublishHandler to publish messages to libp2p asynchronously, preventing HTTP response blocking.
- Introduced context with timeout for libp2p publishing to enhance reliability and logging for success and failure cases.
- Maintained immediate response for local delivery to WebSocket subscribers, improving overall message handling efficiency.
2025-10-28 10:25:38 +02:00
8dc71e7920 feat: implement local pub/sub delivery for WebSocket subscribers
- Added local subscriber management to the Gateway for direct message delivery to WebSocket clients.
- Introduced synchronization mechanisms to handle concurrent access to local subscribers.
- Enhanced pubsub handlers to register and unregister local subscribers, improving message delivery efficiency.
- Updated message publishing logic to prioritize local delivery before forwarding to libp2p, ensuring faster response times for local clients.
2025-10-27 17:30:40 +02:00
5930c9d832
feat: enhance pubsub functionality with namespace isolation and handler management
- Implemented namespace isolation in pubsub handlers to ensure message segregation.
- Refactored subscription management to support multiple handlers per topic, allowing independent lifecycles for each handler.
- Introduced a unique HandlerID for each subscription, enabling better management of handler registrations and unsubscriptions.
- Updated message handling to broadcast to all registered handlers, improving message delivery efficiency.
2025-10-27 16:37:25 +02:00
3f3ef9d1ac Revert "feat: enhance setup process with firewall configuration"
This reverts commit 030ea88cf6dafd274f077cc87d3bdd5b968dc33e.
2025-10-27 10:29:00 +02:00
anonpenguin23
36b69047cd
updated makefile 2025-10-27 10:20:35 +02:00
anonpenguin23
572912ce98
Merge remote-tracking branch 'origin/main' into nightly 2025-10-27 09:57:10 +02:00
anonpenguin23
dbf8660941
feat: enhance configuration generation in initFullStack
- Added steps to generate node2.yaml, node3.yaml, and gateway.yaml configurations during the full stack initialization process.
- Implemented checks to prevent overwriting existing configuration files unless the --force flag is used.
- Improved output messages to provide clear feedback on generated configurations and bootstrap information.
2025-10-27 09:00:49 +02:00
anonpenguin23
030ea88cf6
feat: enhance setup process with firewall configuration
- Added a new step to configure the UFW firewall during the setup process, ensuring necessary ports are opened for SSH, P2P, and Gateway API.
- Updated the setup command output to reflect the new firewall configuration step.
- Refactored the setup logic to maintain proper sequence and clarity in the setup steps.
- Modified the installation script to update the documentation link for better user guidance.
2025-10-26 12:55:32 +02:00
anonpenguin23
71d6c292d9
updated makefile 2025-10-26 11:39:10 +02:00
anonpenguin23
e96271e28c
feat: enhance debros user setup and sudoers configuration
- Added functionality to create the 'debros' user if it does not exist during setup.
- Implemented a mechanism to configure passwordless sudo access for the 'debros' user, including validation of the sudoers rule.
- Updated setup command output to guide users on accessing the 'debros' user.
2025-10-26 11:38:31 +02:00
anonpenguin23
879ba1f2e6
fix: update supported OS versions and improve binary copy command
- Added Ubuntu 25.04 to the list of supported operating systems in the setup command.
- Modified the binary copy command to use a shell command for improved compatibility and functionality.
2025-10-26 11:27:31 +02:00
anonpenguin23
ddebb03aa6
chore: update version and enhance bootstrap configuration in setup
- Incremented version in Makefile to 0.52.7.
- Refactored the `generateConfigsInteractive` function to streamline the setup for single-node VPS configurations.
- Removed user prompts for bootstrap peers and join address, simplifying the configuration process.
- Added explicit commands for generating bootstrap and gateway configurations with improved error handling and output messages.
2025-10-26 11:20:10 +02:00
anonpenguin23
a9ac3fcb83
feat: enhance setup command with validation and refactor
- Added validation functions for multiaddr and host:port formats to improve user input handling during setup.
- Refactored the setup command steps for clarity and consistency, ensuring proper sequence in the setup process.
- Removed the `initializeEnvironments` function and its related logic to streamline the setup flow.
- Updated the cloning process to use a shallow clone for efficiency.
- Enhanced service configuration to use environment variables for better management.
2025-10-26 10:45:28 +02:00
anonpenguin23
3b4d43f808
feat: enhance setup command with validation and refactor
- Added validation functions for multiaddr and host:port formats to improve user input handling during setup.
- Refactored the setup command steps for clarity and consistency, ensuring proper sequence in the setup process.
- Removed the `initializeEnvironments` function and its related logic to streamline the setup flow.
- Updated the cloning process to use a shallow clone for efficiency.
- Enhanced service configuration to use environment variables for better management.
2025-10-26 10:39:19 +02:00
anonpenguin23
0ebce77429
updated makefile 2025-10-26 07:02:54 +02:00
anonpenguin23
c315cd4a86
feat: enhance installation script for persistent Go setup
- Updated `setup.go` to add Go binary path to the current process and the `debros` user's `.bashrc` for persistent availability.
- Implemented error handling for updating `.bashrc` and ensured proper ownership of the file.
- Refined `install-debros-network.sh` to improve OS detection, dependency checks, and installation flow, including downloading the latest release of `network-cli` from GitHub.
- Streamlined the installation process with clearer logging and error messages, enhancing user experience.
2025-10-26 07:02:25 +02:00
anonpenguin23
1bc6292db3
chore: streamline GoReleaser configuration to build only network-cli binary
- Updated `.goreleaser.yaml` to focus solely on building the `network-cli` binary, removing configurations for other binaries (node, gateway, identity).
- Adjusted archive settings to reflect changes in the build process, resulting in cleaner and smaller release packages.
- Updated version in `Makefile` to 0.52.3-beta and documented changes in `CHANGELOG.md`.
2025-10-26 06:47:17 +02:00
anonpenguin23
d59b65e0b4
chore: update GitHub Actions to use latest artifact actions and add release script
- Upgraded `actions/upload-artifact` and `actions/download-artifact` to version 4 in the release workflow for improved functionality.
- Introduced a new `release.sh` script to manage the complete release workflow, including versioning, branch checks, and tagging for both stable and nightly releases.
2025-10-26 06:42:49 +02:00
anonpenguin23
795ab01dca
updated changelog 2025-10-26 06:32:32 +02:00
anonpenguin23
43c0caaf7f
feat: overhaul GoReleaser configuration and CLI structure
- Updated `.goreleaser.yaml` to reflect the new project name and added multi-platform binary builds for `network-cli`, `node`, `gateway`, and `identity`.
- Enhanced the CLI by modularizing commands into separate packages for better maintainability and clarity.
- Introduced a comprehensive environment management system, allowing users to switch between local, devnet, and testnet environments seamlessly.
- Added interactive setup commands for VPS installation, improving user experience and installation flow.
- Updated the installation script to be APT-ready, providing clear instructions for users and ensuring a smooth setup process.
- Enhanced documentation and changelog to reflect the new features and improvements.
2025-10-26 06:30:44 +02:00
anonpenguin23
f0576846bc
ypdate 2025-10-25 16:46:08 +03:00
anonpenguin23
33a13db44d
fix: enforce interactive mode requirement in installation script
- Added a check to ensure the script is run in an interactive terminal, providing clear error messages and instructions if not.
- Removed non-interactive mode handling for user creation, streamlining the user experience and ensuring necessary prompts are displayed.
- Updated instructions for re-running the script after user creation to enhance clarity.
2025-10-25 16:12:36 +03:00
anonpenguin23
46d69baf63
fix: enhance user switching logic in installation script for non-interactive mode
- Added a check to skip re-execution of the script as the 'debros' user when running in non-interactive mode.
- Ensured that the installation can proceed as root without switching users in non-interactive scenarios, improving usability.
2025-10-25 16:04:58 +03:00
anonpenguin23
38e77c79c6
feat: enhance debros user creation in installation script for non-interactive mode
- Added support for non-interactive mode to automatically create the 'debros' user without user prompts.
- Implemented passwordless login for the 'debros' user in non-interactive mode.
- Retained interactive prompts for user creation when not in non-interactive mode, improving user experience and flexibility.
2025-10-25 16:01:44 +03:00
anonpenguin23
e5a71ba295
fix: improve user prompt in installation script for debros user creation
- Updated the user creation prompt to use `read -rp` for better user experience by allowing inline input without a newline.
- Ensured clarity in the prompt message for creating the 'debros' user.
2025-10-25 15:58:08 +03:00
anonpenguin23
6bce9f23d0
chore: remove domain configuration and ACME TLS support from gateway
- Removed the `Domain` field from the gateway configuration to simplify setup.
- Eliminated related logic for domain validation and ACME certificate management.
- Updated installation script to remove references to ACME certificate storage and HTTP/HTTPS port configurations.
- Bumped version in Makefile to 0.52.0-beta to reflect changes.
2025-10-25 15:52:47 +03:00
anonpenguin23
ecc5f3241a
feat: improve user setup and installation flow in debros network script
- Added functionality to check and create the 'debros' user if it does not exist, with options for password setup or passwordless login.
- Enhanced user switching logic to ensure the script runs as the 'debros' user or prompts for necessary actions.
- Updated the installation script to provide clearer instructions for user switching and installation continuation.
- Refined existing checks and improved error handling for a smoother installation experience.
2025-10-25 15:41:29 +03:00
anonpenguin23
a626982636
feat: enhance installation script to support user switching and directory setup
- Updated the installation directory to a specific path (`/home/debros`) for better organization.
- Added a function to check and switch to the 'debros' user if not already running as that user.
- Improved user creation logic to ensure the 'debros' user is set up correctly with the appropriate home directory.
- Enhanced directory setup to create necessary subdirectories and set permissions accordingly.
2025-10-25 14:49:55 +03:00
anonpenguin23
6ec4f7f903
refactor: update installation script to use user home directory
- Changed the installation directory from `/opt/debros` to the user's home directory for improved flexibility.
- Updated systemd service configurations to reflect the new installation path for both node and gateway services.
- Ensured that read/write paths are aligned with the new installation directory.
2025-10-25 14:40:45 +03:00
anonpenguin
2cbc3034e8
Merge pull request #59 from DeBrosOfficial/domain-proxy
feat: enhance update mode in installation script to generate missing …
2025-10-25 14:36:52 +03:00
anonpenguin23
82d7dc2f2a
feat: enhance update mode in installation script to generate missing configuration files
- Added logic to check for the presence of essential configuration files during update mode.
- If missing, the script now generates the necessary `bootstrap.yaml` and `gateway.yaml` files to ensure proper setup.
2025-10-25 14:35:13 +03:00
anonpenguin
000f32f5e5
Merge pull request #58 from DeBrosOfficial/domain-proxy
feat: add domain configuration and ACME TLS support for gateway
2025-10-25 14:28:33 +03:00
anonpenguin23
8ba3c838e9
Updated script branch to nightly 2025-10-25 14:28:11 +03:00
anonpenguin23
d2b671b335
feat: add domain configuration and ACME TLS support for gateway
- Introduced a new `Domain` field in the gateway configuration to support HTTPS and ACME certificate provisioning.
- Implemented domain validation to ensure proper format.
- Enhanced the main gateway logic to handle ACME challenges and manage TLS certificates using CertMagic.
- Updated installation script to create necessary directories for ACME certificate storage and configure firewall rules for HTTP/HTTPS ports.
2025-10-25 13:45:53 +03:00
anonpenguin
f8c81ff3a1
Merge pull request #57 from DeBrosOfficial/nightly
Fix: Use config file instead of data directory in systemd service
2025-10-25 13:07:30 +03:00
anonpenguin23
802f058345
chore: update CHANGELOG and version for v0.51.9-beta release
- Added new entries to CHANGELOG.md for the one-command `make dev` target and full stack initialization features.
- Updated version in Makefile from 0.51.7-beta to 0.51.9-beta to reflect the latest changes.
2025-10-25 13:07:04 +03:00
anonpenguin
5939cf413e
Merge branch 'main' into nightly 2025-10-25 13:05:00 +03:00
4bd91ae83e Fix: Use config file instead of data directory in systemd service 2025-10-25 12:19:31 +03:00
anonpenguin
6ce999650f
Merge pull request #56 from DeBrosOfficial/super-fixes
Super fixes
2025-10-25 11:33:55 +03:00
anonpenguin23
a9d624c0f6
refactor: enhance DeBros network installation script
- Updated script to support a complete DeBros network node installation with gateway.
- Improved user management by introducing a variable for the debros user.
- Added support for idempotent updates and secure systemd service management for both node and gateway services.
- Replaced the configuration wizard with automated configuration file generation using network-cli.
- Enhanced firewall configuration instructions and service management commands for better usability.
2025-10-25 08:40:06 +03:00
anonpenguin23
7c9851729e
chore: update .gitignore and CHANGELOG for new development features
- Added .dev/ directory to .gitignore to exclude development process files.
- Updated CHANGELOG.md with new entries for the one-command `make dev` target, full stack initialization, and improved configuration management.
- Simplified README instructions for generating configuration files and starting the complete network stack.
2025-10-25 08:19:31 +03:00
f4a6d98a03
Merge pull request #55 from DeBrosOfficial/nightly
chore: update changelog for v0.51.6 release
2025-10-24 14:20:34 +03:00
cce5326368 chore: update changelog for v0.51.6 release
- Added entry for LibP2P NAT support under version 0.51.6
- Created new unreleased section at top of changelog for future updates
- Maintained Keep a Changelog format with standard sections (Added, Changed, Deprecated, etc.)
2025-10-24 14:19:21 +03:00
c0eeabe496
Merge pull request #54 from DeBrosOfficial/nightly
Nat Fixes
2025-10-24 14:17:17 +03:00
a1050a02db
Merge pull request #53 from DeBrosOfficial/nat-feat
Nat feat
2025-10-24 14:15:51 +03:00
ffde028694 feat: add LibP2P NAT support and bump version to 0.51.6-beta
- Added LibP2P support for NAT traversal to improve peer connectivity
- Updated version from 0.51.5-beta to 0.51.6-beta
- Updated CHANGELOG.md to document the new NAT support feature
2025-10-24 13:47:25 +03:00
c770968507 feat: added support for NAT on LibP2P 2025-10-24 13:41:38 +03:00
anonpenguin
ba3c1fadc9
Merge pull request #52 from DeBrosOfficial/nightly
Nightly
2025-10-24 11:19:37 +03:00
anonpenguin
b0c8c8c5f4
Merge pull request #51 from DeBrosOfficial/config-validation
Config validation
2025-10-24 11:14:22 +03:00
anonpenguin23
b8af8e0c98
Refactor configuration handling in CLI and node package. Updated default config file naming based on node type and enhanced gateway config generation to include bootstrap peers. Improved data directory path handling by expanding environment variables and user home directory. 2025-10-24 11:13:21 +03:00
anonpenguin23
83c498892c
Update CHANGELOG.md to include new entries for version 0.51.5, highlighting the addition of YAML file validation and a new authentication command in the CLI. 2025-10-24 11:05:34 +03:00
anonpenguin23
0717296822
Update configuration management and documentation. Added commands for generating and validating configuration files in the CLI, ensuring strict YAML decoding and improved error handling. Updated README to reflect new configuration paths and filesystem permissions. Enhanced Makefile to streamline node and gateway startup processes. 2025-10-24 11:03:50 +03:00
anonpenguin23
8a9deb50ec
Enhance README with detailed authentication commands and flow. Introduced explicit commands for managing credentials, improved descriptions of authentication features, and added environment variable support for gateway URL configuration. 2025-10-24 10:02:03 +03:00
anonpenguin23
60affaec5c
Enhance configuration validation for nodes and gateways, implementing strict YAML decoding to reject unknown fields. Added comprehensive validation checks for node types, listen addresses, database settings, and discovery parameters. Updated README with configuration validation details and examples. 2025-10-24 10:01:53 +03:00
anonpenguin23
279c03df82
Enhance configuration validation for nodes and gateways, implementing strict YAML decoding to reject unknown fields. Added comprehensive validation checks for node types, listen addresses, database settings, and discovery parameters. Updated README with configuration validation details and examples. 2025-10-24 09:52:18 +03:00
anonpenguin
b98409e268
Merge pull request #50 from DeBrosOfficial/nightly
Updated changelog
2025-10-22 09:52:06 +03:00
anonpenguin23
553797ab18
Updated changelog 2025-10-22 09:48:29 +03:00
anonpenguin
76ca6ec84d
Merge pull request #49 from DeBrosOfficial/nightly
Bug fixing
2025-10-22 09:47:24 +03:00
anonpenguin23
fc8a1545c8
Update default bootstrap peers in configuration to include additional nodes for improved discovery. 2025-10-22 08:44:06 +03:00
anonpenguin23
a21655b35a
Update version in Makefile to 0.51.2-beta 2025-10-22 08:43:09 +03:00
anonpenguin
64ca6c2f99
Merge pull request #48 from DeBrosOfficial/bug-fixes
Bug fixes
2025-10-22 08:42:17 +03:00
anonpenguin
83306ac5e4
Merge branch 'nightly' into bug-fixes 2025-10-22 08:41:59 +03:00
anonpenguin23
3694a2de93
Update CHANGELOG.md for version 0.51.2, enhance API key extraction to support query parameters, and implement internal auth context in status and storage handlers. 2025-10-22 08:41:15 +03:00
anonpenguin23
229e769755
Enhance gateway configuration by adding RQLiteDSN support and updating default connection settings. Updated config parsing to include RQLiteDSN from YAML and environment variables. Changed default RQLite connection URL from port 4001 to 5001. 2025-10-22 08:34:42 +03:00
anonpenguin
dd90db0215
Merge pull request #47 from DeBrosOfficial/nightly
Nightly
2025-10-22 07:21:38 +03:00
anonpenguin23
6d6c73dc33
Changed versions and changelog 2025-10-22 07:18:21 +03:00
anonpenguin
a9d4c1e0d6
Merge pull request #46 from DeBrosOfficial/discovery-bug-fix
Update node configuration and enhance peer discovery protocol
2025-10-22 07:15:34 +03:00
anonpenguin23
f372f2f5dc
Changed bootstrap peers 2025-10-22 07:15:01 +03:00
anonpenguin23
3f63194c22
Update node configuration and enhance peer discovery protocol
- Changed the configuration file for run-node3 to use node3.yaml.
- Modified select_data_dir function to require a hasConfigFile parameter and added error handling for missing configuration.
- Updated main function to pass the config path to select_data_dir.
- Introduced a peer exchange protocol in the discovery package, allowing nodes to request and exchange peer information.
- Refactored peer discovery logic in the node package to utilize the new discovery manager for active peer exchange.
- Cleaned up unused code related to previous peer discovery methods.
2025-10-22 07:13:47 +03:00
anonpenguin
889735f8d0
Merge pull request #42 from DeBrosOfficial/nightly
Peer ID Fixes and Identity v0.51.0
2025-09-26 09:38:59 +03:00
anonpenguin23
2eb4db3ddb
Update CHANGELOG.md 2025-09-26 07:56:05 +03:00
anonpenguin
587cb3dc11
Merge pull request #41 from DeBrosOfficial/peer-id-identity
Peer id identity fixes
2025-09-26 07:55:16 +03:00
anonpenguin23
b6db781ce2
- Added identity/main.go to generate identity and peer id
- Added encryption module identity.go for reusable identity create, save etc funtions
- Updated make file to support identity/main.go
- Updated node/node.go on loadOrCreateIdentity to use encryption.identity
- Updated cli/main.go to remove fallbacks for identity
- Updated install-debros-network.sh script to use new ./cmd/identity and fixed port order on print
- Updated makefile and changelog
2025-09-26 07:53:20 +03:00
anonpenguin23
5d951daaf8
Updated changelog 2025-09-23 07:50:09 +03:00
anonpenguin
b5fc5cff4b
Merge pull request #38 from DeBrosOfficial/nightly
Fixed wrong URL /v1/db to /v1/rqlite
2025-09-23 07:46:26 +03:00
anonpenguin23
ad1b389a53
Updated make file 2025-09-23 07:43:23 +03:00
anonpenguin23
3b08a91de3
Fixed wrong URL /v1/db to /v1/rqlite 2025-09-23 07:42:34 +03:00
anonpenguin
c1486028da
Merge pull request #37 from DeBrosOfficial/nightly
Storage Refactoring
2025-09-23 07:25:24 +03:00
anonpenguin23
f676659139
Update CHANGELOG.md 2025-09-23 07:22:49 +03:00
anonpenguin
05f2e61822
Merge pull request #36 from DeBrosOfficial/storage-refactor
Storage refactor
2025-09-23 07:20:50 +03:00
anonpenguin23
c0d8fcb895
- Created namespace_helpers on gateway
- Removed old pkg/gatway storage and migrated to new rqlite
- Updated readme
- Created new rqlite implementation
- Updated changelog
- Fixed migration error on migrations.go applySQL
2025-09-23 07:19:35 +03:00
anonpenguin23
16845b758d
- Remoevd old storage folder
- Created new rqlite folder
- Created rqlite adapter, client, gateway, migrations and rqlite init
- Updated node.go to support new rqlite architecture
- Updated readme
- Updated version
2025-09-23 06:00:57 +03:00
f46e9661eb
Merge pull request #35 from DeBrosOfficial/nightly
Nightly
2025-09-22 17:50:08 +03:00
anonpenguin23
c613dbd0ee
Updated changelog 2025-09-22 17:47:25 +03:00
anonpenguin
31920f504e
Merge pull request #34 from DeBrosOfficial/anyone-updates
Anyone updates
2025-09-22 17:43:43 +03:00
anonpenguin23
42598b1123
Removed anyone proxy from libp2p, anyone will be added
Updated readme to include all options for .yaml files
Added gateway.yaml file for gateway default configurations
2025-09-22 17:38:18 +03:00
anonpenguin23
25935fd3b1
Removed unused command setup-production-security.sh 2025-09-20 13:28:44 +03:00
anonpenguin23
ef0a6d5891 Update CHANGELOG.md 2025-09-20 13:06:48 +03:00
anonpenguin23
fc735c57b3 Update Makefile 2025-09-20 13:05:57 +03:00
anonpenguin
fd677e0290
Merge pull request #32 from DeBrosOfficial/bootstrap-peers
Added default bootstrap peers on config.go
2025-09-20 10:08:48 +03:00
anonpenguin23
3236e48725 Added default bootstrap peers on config.go 2025-09-20 10:07:32 +03:00
anonpenguin
5da956ab9d
Merge pull request #30 from DeBrosOfficial/port-change
Updated Gateway port from 8080/8005 to 6001 - Added Gateway port on
2025-09-20 08:18:47 +03:00
anonpenguin23
66f10df2e1 - Updated Gateway port from 8080/8005 to 6001 - Added Gateway port on
install-debros-network.sh
2025-09-20 08:15:31 +03:00
DeBros
fcbf043aca
Update README.md 2025-09-20 06:38:43 +03:00
DeBros
d866afbecf
Update README.md 2025-09-19 06:24:32 +03:00
anonpenguin23
fdd6dd2db3 Initialize changelog sections for unreleased and v0.43.4 versions 2025-09-18 15:32:06 +03:00
anonpenguin23
a9093355d8 Bump version to 0.42.4-beta and update repo URL 2025-09-18 15:31:03 +03:00
anonpenguin
44498ff381
Merge pull request #19 from DeBrosOfficial/convert-gitdebros
replaced git.debros.io with github.com
2025-09-18 15:29:43 +03:00
200 changed files with 19439 additions and 856 deletions

View File

@ -1 +1 @@
0.122.16
0.122.47

View File

@ -74,6 +74,10 @@ func parseGatewayConfig(logger *logging.ColoredLogger) *gateway.Config {
SFUPort int `yaml:"sfu_port"`
TURNDomain string `yaml:"turn_domain"`
TURNSecret string `yaml:"turn_secret"`
// TURNStealthDomain is the neutral stealth TURNS:443 host (feat-124).
// Maps to cfg.StealthCDNDomain so turn.credentials advertises the
// stealth rung of the URI ladder.
TURNStealthDomain string `yaml:"turn_stealth_domain"`
}
type yamlCfg struct {
@ -92,6 +96,12 @@ func parseGatewayConfig(logger *logging.ColoredLogger) *gateway.Config {
IPFSTimeout string `yaml:"ipfs_timeout"`
IPFSReplicationFactor int `yaml:"ipfs_replication_factor"`
WebRTC yamlWebRTCCfg `yaml:"webrtc"`
// SecretsEncryptionKey: see GatewayYAMLConfig docstring. Optional;
// when set, the standalone gateway populates
// cfg.SecretsEncryptionKey so serverless function secrets can be
// encrypted/decrypted (bugboard #837 follow-up). Empty leaves
// secrets management disabled (fail-loud).
SecretsEncryptionKey string `yaml:"secrets_encryption_key"`
// ClusterSecretPath: see GatewayYAMLConfig docstring. Optional;
// when set, the standalone gateway reads the file at this path
// and populates cfg.ClusterSecret so JWT signing keys can be
@ -229,6 +239,16 @@ func parseGatewayConfig(logger *logging.ColoredLogger) *gateway.Config {
}
}
// Serverless secrets encryption key — bugboard #837 follow-up. The
// host-managed gateway (pkg/node/gateway.go) reads this from
// secrets/secrets-encryption-key; the standalone binary used by namespace
// gateways via systemd receives it through this YAML field. Without it,
// `function secrets list` returned 501 ("Secrets management not
// available") on namespace gateways even though the host had the key.
if v := strings.TrimSpace(y.SecretsEncryptionKey); v != "" {
cfg.SecretsEncryptionKey = v
}
// WebRTC configuration
cfg.WebRTCEnabled = y.WebRTC.Enabled
if y.WebRTC.SFUPort > 0 {
@ -240,6 +260,9 @@ func parseGatewayConfig(logger *logging.ColoredLogger) *gateway.Config {
if v := strings.TrimSpace(y.WebRTC.TURNSecret); v != "" {
cfg.TURNSecret = v
}
if v := strings.TrimSpace(y.WebRTC.TURNStealthDomain); v != "" {
cfg.StealthCDNDomain = v
}
// Validate configuration
if errs := cfg.ValidateConfig(); len(errs) > 0 {

View File

@ -0,0 +1,70 @@
package main
import (
"strings"
"testing"
"github.com/DeBrosOfficial/network/pkg/config"
"github.com/DeBrosOfficial/network/pkg/gateway"
"gopkg.in/yaml.v3"
)
// TestSpawnedGatewayConfig_loadsSecretsEncryptionKey is the bugboard #837
// follow-up regression test for the *load* half: a YAML written by the
// namespace gateway spawner (gateway.GatewayYAMLConfig with the secrets key)
// must (a) pass the standalone gateway's STRICT decoder — i.e. the
// secrets_encryption_key field is a known field, not rejected — and (b) end
// up in gateway.Config.SecretsEncryptionKey via the same trim/assign the real
// parseGatewayConfig uses. Without the load mapping, `function secrets list`
// returned 501 on namespace gateways.
func TestSpawnedGatewayConfig_loadsSecretsEncryptionKey(t *testing.T) {
const key = "0123456789abcdef0123456789abcdef0123456789abcdef0123456789abcdef"
// Produce the exact YAML a spawned namespace gateway receives.
written := gateway.GatewayYAMLConfig{
ListenAddr: ":6001",
ClientNamespace: "anchat-test",
RQLiteDSN: "http://localhost:10000",
OlricServers: []string{"localhost:3320"},
SecretsEncryptionKey: key,
}
data, err := yaml.Marshal(written)
if err != nil {
t.Fatalf("marshal: %v", err)
}
// yamlCfgMirror mirrors the function-local yamlCfg in config.go. If the
// real loader's field/tag drifts, the round-trip assertion below fails.
type webrtc struct {
Enabled bool `yaml:"enabled"`
SFUPort int `yaml:"sfu_port"`
TURNDomain string `yaml:"turn_domain"`
TURNSecret string `yaml:"turn_secret"`
}
type yamlCfgMirror struct {
ListenAddr string `yaml:"listen_addr"`
ClientNamespace string `yaml:"client_namespace"`
RQLiteDSN string `yaml:"rqlite_dsn"`
OlricServers []string `yaml:"olric_servers"`
WebRTC webrtc `yaml:"webrtc"`
SecretsEncryptionKey string `yaml:"secrets_encryption_key"`
ClusterSecretPath string `yaml:"cluster_secret_path"`
}
var y yamlCfgMirror
// STRICT decode — the real loader rejects unknown fields, so this proves
// secrets_encryption_key is recognized.
if err := config.DecodeStrict(strings.NewReader(string(data)), &y); err != nil {
t.Fatalf("strict decode rejected the spawned gateway YAML: %v", err)
}
// Apply the same trim/assign as parseGatewayConfig.
cfg := &gateway.Config{}
if v := strings.TrimSpace(y.SecretsEncryptionKey); v != "" {
cfg.SecretsEncryptionKey = v
}
if cfg.SecretsEncryptionKey != key {
t.Errorf("gateway.Config.SecretsEncryptionKey = %q, want %q", cfg.SecretsEncryptionKey, key)
}
}

View File

@ -32,6 +32,18 @@
// backend:
// name: gateway
// addr: "127.0.0.1:8443"
// turn_discovery:
// namespaces_dir: /opt/orama/.orama/data/namespaces
// base_domain: orama-devnet.network
// rescan_interval: 30s
//
// When the turn_discovery.namespaces_dir is set, the router additionally scans
// <namespaces_dir>/*/configs/turn-*.yaml every rescan_interval and derives two
// routes per namespace with a TURNS listener — the bland stealth host and a
// "turn.ns-<namespace>.<base_domain>" alias — both forwarding to that
// namespace's local TURNS port. Discovered routes are merged with the static
// routes above (static wins on conflict); a transient scan error keeps the
// previously-installed routes.
package main
import (
@ -69,14 +81,29 @@ type yamlRoute struct {
Backend yamlBackend `yaml:"backend"`
}
// yamlTURNDiscovery mirrors sniproxy.TURNDiscoveryConfig for YAML decoding.
// When present and namespaces_dir is set, the router auto-discovers per-
// namespace stealth-TURN routes by scanning <namespaces_dir>/*/configs/turn-*.yaml.
type yamlTURNDiscovery struct {
NamespacesDir string `yaml:"namespaces_dir"`
BaseDomain string `yaml:"base_domain"`
RescanInterval time.Duration `yaml:"rescan_interval"`
}
// yamlConfig is the on-disk configuration shape.
type yamlConfig struct {
Listen string `yaml:"listen"`
ClientHelloTimeout time.Duration `yaml:"client_hello_timeout"`
BackendDialTimeout time.Duration `yaml:"backend_dial_timeout"`
MaxConcurrentConns int `yaml:"max_concurrent_conns"`
Fallback yamlBackend `yaml:"fallback"`
Routes []yamlRoute `yaml:"routes"`
Listen string `yaml:"listen"`
ClientHelloTimeout time.Duration `yaml:"client_hello_timeout"`
BackendDialTimeout time.Duration `yaml:"backend_dial_timeout"`
MaxConcurrentConns int `yaml:"max_concurrent_conns"`
Fallback yamlBackend `yaml:"fallback"`
Routes []yamlRoute `yaml:"routes"`
TURNDiscovery yamlTURNDiscovery `yaml:"turn_discovery"`
}
// discoveryEnabled reports whether TURN route auto-discovery is configured.
func (y *yamlConfig) discoveryEnabled() bool {
return y.TURNDiscovery.NamespacesDir != ""
}
func main() {
@ -90,10 +117,53 @@ func main() {
zap.String("version", version),
zap.String("commit", commit))
cfg := parseConfig(logger)
cfg, configPath := parseConfig(logger)
router := sniproxy.NewRouter(toBackend(cfg.Fallback))
router.Replace(toRoutes(cfg.Routes), toBackend(cfg.Fallback))
// The static routes (and fallback) always come from the config file; this
// closure is re-evaluated on every reload/rescan so a hand-edit to the
// config is picked up without a restart.
staticSource := func() ([]sniproxy.Route, sniproxy.Backend, error) {
y, err := loadConfig(configPath)
if err != nil {
return nil, sniproxy.Backend{}, err
}
return toRoutes(y.Routes), toBackend(y.Fallback), nil
}
routeStop := make(chan struct{})
defer close(routeStop)
if cfg.discoveryEnabled() {
// Auto-discover per-namespace stealth-TURN routes by scanning the
// namespaces directory, merged with the static config routes (static
// wins on conflict), re-installed atomically every rescan_interval. A
// transient scan error keeps the previously-installed routes.
discoverer := sniproxy.NewTURNRouteDiscoverer(
sniproxy.TURNDiscoveryConfig{
NamespacesDir: cfg.TURNDiscovery.NamespacesDir,
BaseDomain: cfg.TURNDiscovery.BaseDomain,
RescanInterval: cfg.TURNDiscovery.RescanInterval,
}, staticSource, router, logger.Logger)
if err := discoverer.Apply(); err != nil {
logger.ComponentError(logging.ComponentSNI, "Failed to install initial routes",
zap.Error(err))
os.Exit(1)
}
go discoverer.Run(routeStop)
} else {
// No discovery configured: hot-reload the static route table from the
// config file so cdn/turn SNI routes can be added or removed without
// restarting (Router.Replace swaps atomically under in-flight conns).
reloader := sniproxy.NewFileRouteReloader(configPath, staticSource, router, logger.Logger)
if err := reloader.Apply(); err != nil {
logger.ComponentError(logging.ComponentSNI, "Failed to install initial routes",
zap.Error(err))
os.Exit(1)
}
go reloader.Watch(sniproxy.DefaultRouteReloadInterval, routeStop)
}
srv := sniproxy.NewServer(router, sniproxy.Config{
ClientHelloTimeout: cfg.ClientHelloTimeout,
@ -140,7 +210,7 @@ func main() {
logger.ComponentInfo(logging.ComponentSNI, "SNI router shutdown complete")
}
func parseConfig(logger *logging.ColoredLogger) yamlConfig {
func parseConfig(logger *logging.ColoredLogger) (yamlConfig, string) {
configFlag := flag.String("config", "", "Config file path (absolute or filename in ~/.orama)")
flag.Parse()
@ -166,28 +236,11 @@ func parseConfig(logger *logging.ColoredLogger) yamlConfig {
}
}
data, err := os.ReadFile(configPath)
y, err := loadConfig(configPath)
if err != nil {
logger.ComponentError(logging.ComponentSNI, "Config file not found",
logger.ComponentError(logging.ComponentSNI, "Failed to load SNI router config",
zap.String("path", configPath), zap.Error(err))
fmt.Fprintf(os.Stderr, "\nConfig file not found at %s\n", configPath)
os.Exit(1)
}
var y yamlConfig
if err := config.DecodeStrict(strings.NewReader(string(data)), &y); err != nil {
logger.ComponentError(logging.ComponentSNI, "Failed to parse SNI router config",
zap.Error(err))
fmt.Fprintf(os.Stderr, "Configuration parse error: %v\n", err)
os.Exit(1)
}
if errs := validateConfig(&y); len(errs) > 0 {
fmt.Fprintf(os.Stderr, "\nSNI router configuration errors (%d):\n", len(errs))
for _, e := range errs {
fmt.Fprintf(os.Stderr, " - %s\n", e)
}
fmt.Fprintf(os.Stderr, "\nPlease fix the configuration and try again.\n")
fmt.Fprintf(os.Stderr, "\nSNI router configuration error: %v\n", err)
os.Exit(1)
}
@ -195,7 +248,25 @@ func parseConfig(logger *logging.ColoredLogger) yamlConfig {
zap.String("path", configPath),
)
return y
return y, configPath
}
// loadConfig reads, decodes, and validates the SNI router config file. Shared
// by the initial parse and every hot-reload, so it returns an error instead of
// exiting the process.
func loadConfig(path string) (yamlConfig, error) {
data, err := os.ReadFile(path)
if err != nil {
return yamlConfig{}, fmt.Errorf("read config %s: %w", path, err)
}
var y yamlConfig
if err := config.DecodeStrict(strings.NewReader(string(data)), &y); err != nil {
return yamlConfig{}, fmt.Errorf("parse config: %w", err)
}
if errs := validateConfig(&y); len(errs) > 0 {
return yamlConfig{}, fmt.Errorf("invalid config: %s", strings.Join(errs, "; "))
}
return y, nil
}
// validateConfig returns a non-empty slice of human-readable errors on misconfig.
@ -215,6 +286,16 @@ func validateConfig(y *yamlConfig) []string {
errs = append(errs, fmt.Sprintf("routes[%d].backend.addr: required", i))
}
}
// turn_discovery is optional, but when partially set (namespaces_dir XOR
// base_domain) it is almost certainly a misconfiguration, so validate the
// pair together via the library's own Validate.
if y.discoveryEnabled() || y.TURNDiscovery.BaseDomain != "" {
dc := sniproxy.TURNDiscoveryConfig{
NamespacesDir: y.TURNDiscovery.NamespacesDir,
BaseDomain: y.TURNDiscovery.BaseDomain,
}
errs = append(errs, dc.Validate()...)
}
return errs
}

View File

@ -39,19 +39,6 @@ func parseTURNConfig(logger *logging.ColoredLogger) *turn.Config {
}
}
type yamlCfg struct {
ListenAddr string `yaml:"listen_addr"`
TURNSListenAddr string `yaml:"turns_listen_addr"`
PublicIP string `yaml:"public_ip"`
Realm string `yaml:"realm"`
AuthSecret string `yaml:"auth_secret"`
RelayPortStart int `yaml:"relay_port_start"`
RelayPortEnd int `yaml:"relay_port_end"`
Namespace string `yaml:"namespace"`
TLSCertPath string `yaml:"tls_cert_path"`
TLSKeyPath string `yaml:"tls_key_path"`
}
data, err := os.ReadFile(configPath)
if err != nil {
logger.ComponentError(logging.ComponentTURN, "Config file not found",
@ -60,26 +47,13 @@ func parseTURNConfig(logger *logging.ColoredLogger) *turn.Config {
os.Exit(1)
}
var y yamlCfg
if err := config.DecodeStrict(strings.NewReader(string(data)), &y); err != nil {
cfg, err := decodeTURNConfig(data)
if err != nil {
logger.ComponentError(logging.ComponentTURN, "Failed to parse TURN config", zap.Error(err))
fmt.Fprintf(os.Stderr, "Configuration parse error: %v\n", err)
os.Exit(1)
}
cfg := &turn.Config{
ListenAddr: y.ListenAddr,
TURNSListenAddr: y.TURNSListenAddr,
PublicIP: y.PublicIP,
Realm: y.Realm,
AuthSecret: y.AuthSecret,
RelayPortStart: y.RelayPortStart,
RelayPortEnd: y.RelayPortEnd,
Namespace: y.Namespace,
TLSCertPath: y.TLSCertPath,
TLSKeyPath: y.TLSKeyPath,
}
if errs := cfg.Validate(); len(errs) > 0 {
fmt.Fprintf(os.Stderr, "\nTURN configuration errors (%d):\n", len(errs))
for _, e := range errs {
@ -98,3 +72,50 @@ func parseTURNConfig(logger *logging.ColoredLogger) *turn.Config {
return cfg
}
// decodeTURNConfig strictly decodes the TURN YAML the namespace spawner writes
// (yaml.Marshal of turn.Config) into a turn.Config. The yamlCfg struct MUST
// carry every yaml-tagged field turn.Config marshals — DecodeStrict rejects
// unknown keys, so a missing field crashes the TURN binary at startup.
// Extracted (no os.Exit) so the spawner-output ↔ parser contract is unit-
// testable (see config_test.go).
func decodeTURNConfig(data []byte) (*turn.Config, error) {
type yamlCfg struct {
ListenAddr string `yaml:"listen_addr"`
TURNSListenAddr string `yaml:"turns_listen_addr"`
PublicIP string `yaml:"public_ip"`
Realm string `yaml:"realm"`
AuthSecret string `yaml:"auth_secret"`
RelayPortStart int `yaml:"relay_port_start"`
RelayPortEnd int `yaml:"relay_port_end"`
Namespace string `yaml:"namespace"`
TLSCertPath string `yaml:"tls_cert_path"`
TLSKeyPath string `yaml:"tls_key_path"`
// feat-124 stealth TURNS-over-:443: second cert served by SNI.
StealthDomain string `yaml:"stealth_domain"`
TLSStealthCertPath string `yaml:"tls_stealth_cert_path"`
TLSStealthKeyPath string `yaml:"tls_stealth_key_path"`
}
var y yamlCfg
if err := config.DecodeStrict(strings.NewReader(string(data)), &y); err != nil {
return nil, err
}
return &turn.Config{
ListenAddr: y.ListenAddr,
TURNSListenAddr: y.TURNSListenAddr,
PublicIP: y.PublicIP,
Realm: y.Realm,
AuthSecret: y.AuthSecret,
RelayPortStart: y.RelayPortStart,
RelayPortEnd: y.RelayPortEnd,
Namespace: y.Namespace,
TLSCertPath: y.TLSCertPath,
TLSKeyPath: y.TLSKeyPath,
StealthDomain: y.StealthDomain,
TLSStealthCertPath: y.TLSStealthCertPath,
TLSStealthKeyPath: y.TLSStealthKeyPath,
}, nil
}

View File

@ -0,0 +1,60 @@
package main
import (
"testing"
"github.com/DeBrosOfficial/network/pkg/turn"
"gopkg.in/yaml.v3"
)
// TestDecodeTURNConfig_acceptsSpawnerOutput is the regression guard for the
// feat-124 crash: the namespace spawner writes the TURN config via
// yaml.Marshal(turn.Config), and the TURN binary parses it with a STRICT
// decoder. If turn.Config gains a yaml field the parser doesn't know, strict
// decode rejects it and TURN crash-loops at startup. This pins that the
// spawner's exact output round-trips through the parser, including the stealth
// fields.
func TestDecodeTURNConfig_acceptsSpawnerOutput(t *testing.T) {
src := turn.Config{
ListenAddr: "0.0.0.0:3478",
TURNSListenAddr: "0.0.0.0:5349",
PublicIP: "203.0.113.7",
Realm: "orama-devnet.network",
AuthSecret: "secret",
RelayPortStart: 49152,
RelayPortEnd: 49951,
Namespace: "anchat-test",
TLSCertPath: "/x/turn-cert.pem",
TLSKeyPath: "/x/turn-key.pem",
StealthDomain: "cdn-3259254d4d3e.orama-devnet.network",
TLSStealthCertPath: "/var/lib/caddy/caddy/certificates/.../wildcard_.orama-devnet.network.crt",
TLSStealthKeyPath: "/var/lib/caddy/caddy/certificates/.../wildcard_.orama-devnet.network.key",
}
data, err := yaml.Marshal(src)
if err != nil {
t.Fatalf("marshal: %v", err)
}
got, err := decodeTURNConfig(data)
if err != nil {
t.Fatalf("strict decode of spawner output failed — TURN would crash-loop at startup: %v\n---\n%s", err, data)
}
if got.StealthDomain != src.StealthDomain ||
got.TLSStealthCertPath != src.TLSStealthCertPath ||
got.TLSStealthKeyPath != src.TLSStealthKeyPath {
t.Errorf("stealth fields did not round-trip: got %+v", got)
}
if got.AuthSecret != src.AuthSecret || got.TURNSListenAddr != src.TURNSListenAddr {
t.Errorf("core fields did not round-trip: got %+v", got)
}
}
// TestDecodeTURNConfig_rejectsUnknownField confirms the strict decoder still
// rejects genuinely-unknown keys (so the contract above is meaningful).
func TestDecodeTURNConfig_rejectsUnknownField(t *testing.T) {
if _, err := decodeTURNConfig([]byte("listen_addr: \"0.0.0.0:3478\"\nbogus_field: 1\n")); err == nil {
t.Fatal("expected strict decode to reject an unknown field")
}
}

View File

@ -214,6 +214,43 @@ your client computes locally from `(namespace, userId, topic_secret)`.
For `ntfy` with `topic_mode=path`, the token is `ns/<namespace>/<userId>`.
### UnifiedPush (Android / GrapheneOS, no Google Play Services)
ntfy is a [UnifiedPush](https://unifiedpush.org) distributor, so Android
devices — including de-Googled **GrapheneOS** — can receive push **without
Firebase / Google Play Services**. The flow:
1. The device runs a UnifiedPush **distributor** (the ntfy Android app, or an
embedded distributor library) pointed at your push host
(`https://push.<your-zone>`).
2. The app registers with the distributor and is handed an **endpoint URL**,
e.g. `https://push.<your-zone>/upXXXXXXXX`.
3. Register that endpoint as a push device:
```http
POST /v1/push/devices
{
"device_id": "<unique per-device ID>",
"provider": "ntfy",
"token": "https://push.<your-zone>/upXXXXXXXX", // the full endpoint
"platform": "android"
}
```
The gateway POSTs to the endpoint **verbatim** (per the UnifiedPush spec), so
you don't have to deconstruct it. As a safety measure the endpoint's
scheme+host **must match your configured ntfy push host** — a device token can
only ever publish to your own push server, never an arbitrary host.
You may instead register just the bare **topic** (the endpoint's last path
segment) as the token — both forms work; use whichever your UnifiedPush library
makes convenient.
**GrapheneOS notes:** works under both "No Google Play" and "Sandboxed Google
Play" profiles. The distributor holds the persistent connection (not your app),
so battery impact is the distributor's; high-priority messages
(`priority: "high"`) wake the app from Doze.
---
## Step 6 — Send pushes

View File

@ -187,6 +187,69 @@ The legacy `db_execute` is kept indefinitely so existing functions don't break.
|----------|-------------|
| `pubsub_publish(topic, dataJSON)` → bool | Publish message to a PubSub topic. Returns true on success. |
### Ephemeral State (WS-subscribe-tracked)
Short-lived per-subscriber state (typing indicators, presence, call ringing,
live cursors) that the gateway **auto-clears the moment the owning WebSocket
client disconnects** — no heartbeats, no prune crons. State also expires on a
TTL backstop (default 60 s, max 30 min). The owning client ID and namespace
come from the server-trusted invocation context; functions cannot spoof them.
| Function | Description |
|----------|-------------|
| `ephemeral_state_set(topic, key, payload, ttlMs)` → u32 | Record state owned by the CURRENT invocation's WS client and publish an `ephemeral.set` event on the topic. 1 = ok, 0 = failure (no WS client, empty topic/key, payload > 16 KiB, > 256 keys/client). |
| `ephemeral_state_clear(topic, key)` → u32 | Clear state this client owns; publishes `ephemeral.clear` (reason `explicit`). Idempotent — clearing a missing/non-owned key returns 1. |
| `ephemeral_state_list(topic)` → u64 | Reconnect catch-up read: packed `ptr<<32\|len` of a JSON envelope with the live entries on the topic. Works without a WS client (read-only). 0 on failure. |
Raw import signatures (pointer/length ABI — note `ttlMs` is **i64**):
```go
//go:wasmimport env ephemeral_state_set
func ephemeralStateSet(topicPtr *byte, topicLen uint32, keyPtr *byte, keyLen uint32,
payloadPtr *byte, payloadLen uint32, ttlMs int64) uint32
//go:wasmimport env ephemeral_state_clear
func ephemeralStateClear(topicPtr *byte, topicLen uint32, keyPtr *byte, keyLen uint32) uint32
//go:wasmimport env ephemeral_state_list
func ephemeralStateList(topicPtr *byte, topicLen uint32) uint64 // ptr<<32|len of JSON
```
Synthetic events are published **on the same topic** the state lives on, with
the `_orama` control-frame discriminator (same dispatch pattern as the
`auth.refresh` frame). Subscribers update their local view from the stream:
```json
{"_orama":"ephemeral.set", "topic":"typing:room1", "key":"user-7", "client_id":"ws-abc", "payload":"<base64>"}
{"_orama":"ephemeral.clear","topic":"typing:room1", "key":"user-7", "client_id":"ws-abc", "reason":"disconnect"}
```
`reason` is `explicit` (function called clear), `disconnect` (owning WS client
went away — the zero-lag path), or `expired` (TTL backstop). `payload` is
base64 (Go `[]byte` JSON encoding) and present only on `ephemeral.set`.
`ephemeral_state_list` returns:
```json
{"entries":[{"key":"user-7","client_id":"ws-abc","payload":"<base64>","expires_in_ms":48211}]}
```
Typing-indicator shape (called from a `ws_persistent` rpc-router function):
```go
// Client sends {"op":"typing.start","room":"room1","user":"user-7"} → handler:
ephemeralStateSet(ptr("typing:"+room), len32("typing:"+room),
ptr(userID), len32(userID), nil, 0, 30_000) // 30s TTL backstop
// Client sends typing.stop → handler:
ephemeralStateClear(ptr("typing:"+room), len32("typing:"+room), ptr(userID), len32(userID))
// No typing.stop needed on app kill / network drop: the WS disconnect publishes
// {"_orama":"ephemeral.clear",...,"reason":"disconnect"} to every subscriber
// immediately. On (re)connect, call ephemeral_state_list("typing:"+room) once
// to seed local state, then track the event stream.
```
### Logging
| Function | Description |

View File

@ -0,0 +1,15 @@
-- =============================================================================
-- 029_raw_http_response.sql
--
-- Raw-HTTP-response serverless function mode — bugboard #835.
--
-- When raw_http_response is true, the function may call the set_http_response
-- host function to emit a verbatim HTTP response (status + headers + body)
-- instead of the JSON/Ack-wrapped output. This lets a namespace app proxy an
-- upstream RPC (Helius / Alchemy) transparently. See pkg/serverless/raw_http.go.
--
-- Default false → backward compatible: existing functions keep returning the
-- JSON/Ack-wrapped output unchanged.
-- =============================================================================
ALTER TABLE functions ADD COLUMN raw_http_response BOOLEAN DEFAULT FALSE;

View File

@ -0,0 +1,16 @@
-- =============================================================================
-- 030_webrtc_stealth.sql
--
-- Stealth TURNS-over-443 per namespace — feat-124 (censorship-resistant
-- calling). When stealth_enabled is true the namespace's TURN servers carry a
-- second TLS certificate for the neutral stealth hostname
-- (cdn-<hash>.<base-domain>, derived via turn.StealthHostForNamespace), the
-- SNI router forwards :443 ClientHellos for that hostname to the TURN TLS
-- listener, and turn.credentials advertises `turns:<stealth-host>:443` as the
-- final rung of the ICE URI ladder.
--
-- Default false → backward compatible: existing WebRTC namespaces keep the
-- baseline udp:3478 / tcp:3478 / turns:5349 URIs unchanged.
-- =============================================================================
ALTER TABLE namespace_webrtc_config ADD COLUMN stealth_enabled BOOLEAN DEFAULT FALSE;

View File

@ -31,6 +31,8 @@ func init() {
Cmd.AddCommand(functions.ListCmd)
Cmd.AddCommand(functions.GetCmd)
Cmd.AddCommand(functions.DeleteCmd)
Cmd.AddCommand(functions.DisableCmd)
Cmd.AddCommand(functions.EnableCmd)
Cmd.AddCommand(functions.LogsCmd)
Cmd.AddCommand(functions.VersionsCmd)
Cmd.AddCommand(functions.SecretsCmd)

View File

@ -9,6 +9,24 @@ import (
"github.com/spf13/cobra"
)
// tinygoBuildArgs returns the argv (without the leading `tinygo`) used
// to compile a function. Pure function — extracted from buildFunction
// so the WS-persistent → `-buildmode=c-shared` policy can be unit
// tested without invoking TinyGo.
//
// Persistent WS functions need the WASI-reactor variant (exports
// `_initialize`, no `_start`) — see the comment on cfg loading in
// buildFunction for the full rationale. Stateless (default) functions
// stay on command mode for back-compat.
func tinygoBuildArgs(outputPath string, wsPersistent bool) []string {
args := []string{"build", "-o", outputPath, "-target", "wasi"}
if wsPersistent {
args = append(args, "-buildmode=c-shared")
}
args = append(args, ".")
return args
}
// BuildCmd compiles a function to WASM using TinyGo.
var BuildCmd = &cobra.Command{
Use: "build [directory]",
@ -46,6 +64,25 @@ func buildFunction(dir string) (string, error) {
return "", fmt.Errorf("function.yaml not found in %s", absDir)
}
// Load config so we can pick the right TinyGo build mode based on
// ws_persistent. Persistent functions need WASI-reactor semantics
// (`_initialize` export, no `_start`); command-mode functions stay
// on the default. See bug #240/#249 follow-up #6 for the full
// rationale — TL;DR: TinyGo command-mode `_start` doesn't set the
// runtime guard `wasmExportCheckRun` checks, so any export call
// from the host (e.g. orama_alloc → ws_open payload) traps with
// "wasm error: unreachable" inside the runtime hashmap path.
//
// `-buildmode=c-shared` flips TinyGo to reactor mode: the wasm
// exports `_initialize` instead of `_start`. The gateway's
// persistent-instance bootstrap (pkg/serverless/engine.go) calls
// `_initialize` first if exported, which sets the guard cleanly,
// and the function's exports become callable from the host loop.
cfg, cfgErr := LoadConfig(absDir)
if cfgErr != nil {
return "", fmt.Errorf("failed to load function.yaml: %w", cfgErr)
}
// Check TinyGo is installed
tinygoPath, err := exec.LookPath("tinygo")
if err != nil {
@ -56,8 +93,15 @@ func buildFunction(dir string) (string, error) {
fmt.Printf("Building %s...\n", absDir)
// Run tinygo build
buildCmd := exec.Command(tinygoPath, "build", "-o", outputPath, "-target", "wasi", ".")
// Build args. Default = command mode. Persistent WS functions get
// reactor mode via `-buildmode=c-shared` so TinyGo emits
// `_initialize` and the runtime guard activates.
tinygoArgs := tinygoBuildArgs(outputPath, cfg.WSPersistent)
if cfg.WSPersistent {
fmt.Printf(" (ws_persistent=true → using -buildmode=c-shared for WASI-reactor semantics)\n")
}
buildCmd := exec.Command(tinygoPath, tinygoArgs...)
buildCmd.Dir = absDir
buildCmd.Stdout = os.Stdout
buildCmd.Stderr = os.Stderr

View File

@ -0,0 +1,83 @@
package functions
import (
"strings"
"testing"
)
// TestTinygoBuildArgs_PersistentGetsCSharedBuildmode is the regression
// guard for bug #240/#249 follow-up #6: TinyGo command-mode `_start`
// doesn't set the reactor-mode runtime guard, so any export call from
// the host (e.g. orama_alloc → ws_open payload) traps with
// "wasm error: unreachable" inside the runtime hashmap path.
//
// Fix: persistent functions get `-buildmode=c-shared` which flips
// TinyGo to reactor mode (exports `_initialize`, no `_start`). The
// gateway's persistent-instance bootstrap already calls `_initialize`
// first if exported (pkg/serverless/engine.go::InstantiatePersistent),
// so reactor-built wasms cleanly initialize the TinyGo runtime and
// every subsequent host-driven export call works.
//
// Empirically confirmed against TinyGo 0.40.1: the same source
// compiled with vs. without `-buildmode=c-shared` produces wasms with
// `_start` only vs. `_initialize` only respectively.
//
// If a future refactor drops the flag (or adds it for stateless), this
// test fails loud — the AnChat WS chain went down for ~1 day chasing
// this exact behavior.
func TestTinygoBuildArgs_PersistentGetsCSharedBuildmode(t *testing.T) {
tests := []struct {
name string
wsPersistent bool
wantContains string // substring that must appear in the joined args
wantAbsent string // substring that must NOT appear
}{
{
name: "stateless function stays in command mode (default)",
wsPersistent: false,
wantContains: "-target wasi",
wantAbsent: "-buildmode=c-shared",
},
{
name: "persistent function gets reactor mode (c-shared)",
wsPersistent: true,
wantContains: "-buildmode=c-shared",
},
}
for _, tt := range tests {
t.Run(tt.name, func(t *testing.T) {
got := tinygoBuildArgs("/tmp/out.wasm", tt.wsPersistent)
joined := strings.Join(got, " ")
if !strings.Contains(joined, tt.wantContains) {
t.Errorf("missing %q in args: %q", tt.wantContains, joined)
}
if tt.wantAbsent != "" && strings.Contains(joined, tt.wantAbsent) {
t.Errorf("unexpected %q in args (only persistent should get this): %q",
tt.wantAbsent, joined)
}
// Invariants for both: build action, output path, source dir.
for _, want := range []string{"build", "-o", "/tmp/out.wasm", "-target", "wasi", "."} {
found := false
for _, a := range got {
if a == want {
found = true
break
}
}
if !found {
t.Errorf("missing required arg %q in: %v", want, got)
}
}
// Invariant: the source directory `.` must be the LAST arg
// (TinyGo's positional). If we accidentally reorder the
// builder so the flag goes after `.`, TinyGo will treat the
// flag as a build target and fail with a confusing error.
if got[len(got)-1] != "." {
t.Errorf("last arg should be `.`, got %q (full args: %v)", got[len(got)-1], got)
}
})
}
}

View File

@ -0,0 +1,86 @@
package functions
import (
"encoding/json"
"fmt"
"io"
"net/http"
"github.com/spf13/cobra"
)
// DisableCmd pauses a function without redeploying.
//
// Plan 11.5 — operators flip a function's status during incident
// response, then re-enable when fixed. Existing in-flight invocations
// finish; new ones return 503 because the invoker treats inactive
// functions as missing.
var DisableCmd = &cobra.Command{
Use: "disable <name>",
Short: "Disable a function without deleting it",
Long: `Disables a deployed function. The function row stays in the registry but
new invocations are rejected. Use 'orama function enable' to resume.
Useful during incident response pause a misbehaving function until you
can root-cause without losing its deployed code or version history.`,
Args: cobra.ExactArgs(1),
RunE: func(cmd *cobra.Command, args []string) error {
return runSetEnabled(args[0], false)
},
}
// EnableCmd resumes a disabled function. Inverse of DisableCmd.
var EnableCmd = &cobra.Command{
Use: "enable <name>",
Short: "Re-enable a previously disabled function",
Long: `Re-enables a function that was paused with 'orama function disable'.`,
Args: cobra.ExactArgs(1),
RunE: func(cmd *cobra.Command, args []string) error {
return runSetEnabled(args[0], true)
},
}
func runSetEnabled(name string, enabled bool) error {
action := "disable"
if enabled {
action = "enable"
}
resp, err := apiPostNoBody("/v1/functions/" + name + "/" + action)
if err != nil {
return err
}
verb := "disabled"
if enabled {
verb = "enabled"
}
if msg, ok := resp["message"]; ok {
fmt.Println(msg)
} else {
fmt.Printf("Function %q %s.\n", name, verb)
}
return nil
}
// apiPostNoBody performs an authenticated POST with no body. Used by
// the disable/enable endpoints which take no payload (action is in the
// URL path).
func apiPostNoBody(endpoint string) (map[string]interface{}, error) {
resp, err := apiRequest(http.MethodPost, endpoint, nil, "")
if err != nil {
return nil, err
}
defer resp.Body.Close()
respBody, err := io.ReadAll(resp.Body)
if err != nil {
return nil, fmt.Errorf("failed to read response: %w", err)
}
if resp.StatusCode != http.StatusOK {
return nil, fmt.Errorf("API error (%d): %s", resp.StatusCode, string(respBody))
}
var result map[string]interface{}
if err := json.Unmarshal(respBody, &result); err != nil {
return nil, fmt.Errorf("failed to parse response: %w", err)
}
return result, nil
}

View File

@ -32,6 +32,11 @@ type FunctionConfig struct {
WSIdleTimeoutSec int `yaml:"ws_idle_timeout_sec"`
WSMaxFrameBytes int `yaml:"ws_max_frame_bytes"`
WSMaxInflightPerConn int `yaml:"ws_max_inflight_per_conn"`
// RawHTTPResponse enables raw-HTTP-response mode (bugboard #835) — the
// function may call set_http_response to emit a verbatim HTTP response
// (status/headers/body) instead of the JSON/Ack-wrapped output.
RawHTTPResponse bool `yaml:"raw_http_response"`
}
// RetryConfig holds retry settings.
@ -226,6 +231,9 @@ func uploadWASMFunction(wasmPath string, cfg *FunctionConfig) (map[string]interf
if cfg.WSMaxInflightPerConn > 0 {
metaObj["ws_max_inflight_per_conn"] = cfg.WSMaxInflightPerConn
}
if cfg.RawHTTPResponse {
metaObj["raw_http_response"] = true
}
if len(metaObj) > 0 {
metadata, _ := json.Marshal(metaObj)
writer.WriteField("metadata", string(metadata))

View File

@ -0,0 +1,53 @@
package functions
import (
"os"
"path/filepath"
"testing"
)
// writeFunctionYAML writes a function.yaml into a fresh temp dir and returns it.
func writeFunctionYAML(t *testing.T, body string) string {
t.Helper()
dir := t.TempDir()
if err := os.WriteFile(filepath.Join(dir, "function.yaml"), []byte(body), 0o600); err != nil {
t.Fatalf("write function.yaml: %v", err)
}
return dir
}
func TestLoadConfig_RawHTTPResponse_true(t *testing.T) {
dir := writeFunctionYAML(t, "name: rpc-proxy\nraw_http_response: true\n")
cfg, err := LoadConfig(dir)
if err != nil {
t.Fatalf("LoadConfig: %v", err)
}
if !cfg.RawHTTPResponse {
t.Error("RawHTTPResponse = false, want true")
}
}
func TestLoadConfig_RawHTTPResponse_defaultsFalse(t *testing.T) {
dir := writeFunctionYAML(t, "name: plain-fn\n")
cfg, err := LoadConfig(dir)
if err != nil {
t.Fatalf("LoadConfig: %v", err)
}
if cfg.RawHTTPResponse {
t.Error("RawHTTPResponse = true, want false (omitted in yaml)")
}
}
func TestLoadConfig_RawHTTPResponse_explicitFalse(t *testing.T) {
dir := writeFunctionYAML(t, "name: plain-fn\nraw_http_response: false\n")
cfg, err := LoadConfig(dir)
if err != nil {
t.Fatalf("LoadConfig: %v", err)
}
if cfg.RawHTTPResponse {
t.Error("RawHTTPResponse = true, want false")
}
}

View File

@ -79,6 +79,8 @@ func showNamespaceHelp() {
fmt.Printf(" repair <namespace> - Repair an under-provisioned namespace cluster\n")
fmt.Printf(" enable webrtc --namespace NS - Enable WebRTC (SFU + TURN) for a namespace\n")
fmt.Printf(" disable webrtc --namespace NS - Disable WebRTC for a namespace\n")
fmt.Printf(" enable webrtc-stealth --namespace NS - Enable stealth TURNS over :443 (feat-124)\n")
fmt.Printf(" disable webrtc-stealth --namespace NS - Disable stealth TURNS\n")
fmt.Printf(" webrtc-status --namespace NS - Show WebRTC service status\n")
fmt.Printf(" help - Show this help message\n\n")
fmt.Printf("Flags:\n")
@ -226,8 +228,12 @@ func handleNamespaceDelete(force bool) {
func handleNamespaceEnable(args []string) {
feature := args[0]
if feature == "webrtc-stealth" {
handleNamespaceStealthToggle(args[1:], true)
return
}
if feature != "webrtc" {
fmt.Fprintf(os.Stderr, "Unknown feature: %s\nSupported features: webrtc\n", feature)
fmt.Fprintf(os.Stderr, "Unknown feature: %s\nSupported features: webrtc, webrtc-stealth\n", feature)
os.Exit(1)
}
@ -283,10 +289,82 @@ func handleNamespaceEnable(args []string) {
fmt.Printf(" TURN instances: 2 nodes (relay on public IPs)\n")
}
// handleNamespaceStealthToggle drives /v1/namespace/webrtc/stealth/{enable|disable}
// (feat-124 — censorship-resistant TURNS over :443).
func handleNamespaceStealthToggle(args []string, enable bool) {
verb := "disable"
if enable {
verb = "enable"
}
var ns string
fs := flag.NewFlagSet("namespace "+verb+" webrtc-stealth", flag.ExitOnError)
fs.StringVar(&ns, "namespace", "", "Namespace name")
_ = fs.Parse(args)
if ns == "" {
fmt.Fprintf(os.Stderr, "Usage: orama namespace %s webrtc-stealth --namespace <name>\n", verb)
os.Exit(1)
}
gatewayURL, apiKey := loadAuthForNamespace(ns)
if enable {
fmt.Printf("Enabling WebRTC stealth (TURNS over :443) for namespace '%s'...\n", ns)
fmt.Printf("This provisions a Let's Encrypt cert for the neutral stealth host and may take up to ~2 minutes.\n")
} else {
fmt.Printf("Disabling WebRTC stealth for namespace '%s'...\n", ns)
}
url := fmt.Sprintf("%s/v1/namespace/webrtc/stealth/%s", gatewayURL, verb)
req, err := http.NewRequest(http.MethodPost, url, nil)
if err != nil {
fmt.Fprintf(os.Stderr, "Failed to create request: %v\n", err)
os.Exit(1)
}
req.Header.Set("Authorization", "Bearer "+apiKey)
client := &http.Client{
Transport: &http.Transport{
TLSClientConfig: &tls.Config{InsecureSkipVerify: true},
},
}
resp, err := client.Do(req)
if err != nil {
fmt.Fprintf(os.Stderr, "Failed to connect to gateway: %v\n", err)
os.Exit(1)
}
defer resp.Body.Close()
var result map[string]interface{}
json.NewDecoder(resp.Body).Decode(&result)
if resp.StatusCode != http.StatusOK {
errMsg := "unknown error"
if e, ok := result["error"].(string); ok {
errMsg = e
}
fmt.Fprintf(os.Stderr, "Failed to %s WebRTC stealth: %s\n", verb, errMsg)
os.Exit(1)
}
if enable {
fmt.Printf("WebRTC stealth enabled for namespace '%s'.\n", ns)
fmt.Printf(" turn.credentials now advertises the full URI ladder including turns:<stealth-host>:443.\n")
fmt.Printf(" Make sure the SNI router is enabled on the TURN nodes (node.yaml sni_router.enabled).\n")
} else {
fmt.Printf("WebRTC stealth disabled for namespace '%s'.\n", ns)
}
}
func handleNamespaceDisable(args []string) {
feature := args[0]
if feature == "webrtc-stealth" {
handleNamespaceStealthToggle(args[1:], false)
return
}
if feature != "webrtc" {
fmt.Fprintf(os.Stderr, "Unknown feature: %s\nSupported features: webrtc\n", feature)
fmt.Fprintf(os.Stderr, "Unknown feature: %s\nSupported features: webrtc, webrtc-stealth\n", feature)
os.Exit(1)
}

View File

@ -477,6 +477,22 @@ func (o *Orchestrator) saveSecretsFromJoinResponse(resp *joinhandlers.JoinRespon
}
}
// Write serverless secrets encryption key (bugboard #837) — identical on
// every node so namespace function secrets decrypt cluster-wide.
if resp.SecretsEncryptionKey != "" {
if err := os.WriteFile(filepath.Join(secretsDir, "secrets-encryption-key"), []byte(resp.SecretsEncryptionKey), 0600); err != nil {
return fmt.Errorf("failed to write secrets-encryption-key: %w", err)
}
}
// Write TURN shared secret (feat-124 #913) — identical on every node so
// WebRTC TURN credentials validate cluster-wide and survive config regen.
if resp.TURNSecret != "" {
if err := os.WriteFile(filepath.Join(secretsDir, "turn-secret"), []byte(resp.TURNSecret), 0600); err != nil {
return fmt.Errorf("failed to write turn-secret: %w", err)
}
}
// Write IPFS Cluster trusted peer IDs
if len(resp.IPFSClusterPeerIDs) > 0 {
content := strings.Join(resp.IPFSClusterPeerIDs, "\n") + "\n"

View File

@ -18,6 +18,19 @@ type Flags struct {
NodeFilter string // Single node IP to upgrade (optional)
Delay int // Delay in seconds between nodes during rolling upgrade
// ReexecedAfterBinarySwap is set by the orchestrator when it re-execs
// itself with the NEWLY-INSTALLED binary, post Phase 2b. The new
// process detects this flag, skips the pre-binary phases (1, 2, 2b)
// already done by the old binary, and runs Phase 3+ using its OWN
// up-to-date compiled config-generation logic. Closes bugboard #15
// chicken-and-egg: pre-fix, Phase 4 ran with the old binary's
// compiled Phase4GenerateConfigs, so config changes only took effect
// on the NEXT rollout.
//
// Hidden flag — set programmatically by orchestrator.go via os.Args,
// not a documented user-facing option.
ReexecedAfterBinarySwap bool
// Anyone flags
AnyoneClient bool
AnyoneRelay bool
@ -43,6 +56,11 @@ func ParseFlags(args []string) (*Flags, error) {
fs.BoolVar(&flags.RestartServices, "restart", false, "Automatically restart services after upgrade")
fs.BoolVar(&flags.SkipChecks, "skip-checks", false, "Skip minimum resource checks (RAM/CPU)")
// Hidden flag — see Flags.ReexecedAfterBinarySwap doc. The fs.Bool
// registers it without exposing in help output (no .Usage doc text
// that operators would normally search for).
fs.BoolVar(&flags.ReexecedAfterBinarySwap, "reexeced-after-binary-swap", false, "")
// Remote upgrade flags
fs.StringVar(&flags.Env, "env", "", "Target environment for remote rolling upgrade (devnet, testnet)")
fs.StringVar(&flags.NodeFilter, "node", "", "Upgrade a single node IP only")

View File

@ -10,12 +10,17 @@ import (
"os/exec"
"path/filepath"
"strings"
"syscall"
"time"
"github.com/DeBrosOfficial/network/pkg/cli/utils"
"github.com/DeBrosOfficial/network/pkg/environments/production"
)
// newOramaBinaryPath is the on-disk path Phase 2b installs the new
// orama binary to. Re-exec target for bugboard #15 chicken-and-egg fix.
const newOramaBinaryPath = "/opt/orama/bin/orama"
// Orchestrator manages the upgrade process
type Orchestrator struct {
oramaHome string
@ -98,50 +103,85 @@ func NewOrchestrator(flags *Flags) *Orchestrator {
// Execute runs the upgrade process
func (o *Orchestrator) Execute() error {
fmt.Printf("🔄 Upgrading production installation...\n")
fmt.Printf(" This will preserve existing configurations and data\n")
fmt.Printf(" Configurations will be updated to latest format\n\n")
// Handle branch preferences
if err := o.handleBranchPreferences(); err != nil {
return err
if o.flags.ReexecedAfterBinarySwap {
fmt.Printf(" (Resumed under newly-installed binary — bug #15 chicken-and-egg fix.)\n")
fmt.Printf(" Skipping Phase 1/2/2b (already done by previous process); Phase 3+ runs here.\n")
} else {
fmt.Printf(" This will preserve existing configurations and data\n")
fmt.Printf(" Configurations will be updated to latest format\n\n")
}
// Phase 1: Check prerequisites
fmt.Printf("\n📋 Phase 1: Checking prerequisites...\n")
if err := o.setup.Phase1CheckPrerequisites(); err != nil {
return fmt.Errorf("prerequisites check failed: %w", err)
}
// Phase 2: Provision environment
fmt.Printf("\n🛠 Phase 2: Provisioning environment...\n")
if err := o.setup.Phase2ProvisionEnvironment(); err != nil {
return fmt.Errorf("environment provisioning failed: %w", err)
}
// Stop services before upgrading binaries
if o.setup.IsUpdate() {
if err := o.stopServices(); err != nil {
// Phases 1, 2, 2b are skipped on the re-execed run — already
// performed by the prior (old-binary) process. Phase 3 (secrets)
// onward runs here, deliberately under the new binary so Phase 4
// (config regen, the actual point of the re-exec) uses current code.
if !o.flags.ReexecedAfterBinarySwap {
// Handle branch preferences
if err := o.handleBranchPreferences(); err != nil {
return err
}
// Phase 1: Check prerequisites
fmt.Printf("\n📋 Phase 1: Checking prerequisites...\n")
if err := o.setup.Phase1CheckPrerequisites(); err != nil {
return fmt.Errorf("prerequisites check failed: %w", err)
}
// Phase 2: Provision environment
fmt.Printf("\n🛠 Phase 2: Provisioning environment...\n")
if err := o.setup.Phase2ProvisionEnvironment(); err != nil {
return fmt.Errorf("environment provisioning failed: %w", err)
}
// Stop services before upgrading binaries
if o.setup.IsUpdate() {
if err := o.stopServices(); err != nil {
return err
}
}
// Check port availability after stopping services
if err := utils.EnsurePortsAvailable("prod upgrade", utils.DefaultPorts()); err != nil {
return err
}
// Phase 2b: Install/update binaries
fmt.Printf("\nPhase 2b: Installing/updating binaries...\n")
if err := o.setup.Phase2bInstallBinaries(); err != nil {
return fmt.Errorf("binary installation failed: %w", err)
}
// Detect existing installation
if o.setup.IsUpdate() {
fmt.Printf(" Detected existing installation\n")
} else {
fmt.Printf(" ⚠️ No existing installation detected, treating as fresh install\n")
fmt.Printf(" Use 'orama install' for fresh installation\n")
}
}
// Check port availability after stopping services
if err := utils.EnsurePortsAvailable("prod upgrade", utils.DefaultPorts()); err != nil {
return err
}
// Phase 2b: Install/update binaries
fmt.Printf("\nPhase 2b: Installing/updating binaries...\n")
if err := o.setup.Phase2bInstallBinaries(); err != nil {
return fmt.Errorf("binary installation failed: %w", err)
}
// Detect existing installation
if o.setup.IsUpdate() {
fmt.Printf(" Detected existing installation\n")
} else {
fmt.Printf(" ⚠️ No existing installation detected, treating as fresh install\n")
fmt.Printf(" Use 'orama install' for fresh installation\n")
// Bugboard #15 fix — chicken-and-egg.
//
// Up to here we are still running the OLD orama binary's compiled
// code. The next phases (3 secrets, 4 configs, 5 systemd) include
// Phase4GenerateConfigs which is COMPILED into this process. If we
// keep running, those phases use OLD logic and any config-shape
// changes shipped in this release only take effect on the NEXT
// upgrade.
//
// Re-exec the just-installed binary with the same args + a hidden
// marker so it skips the pre-binary phases (already done above) and
// runs Phase 3+ with its OWN up-to-date code. syscall.Exec replaces
// this process — control never returns past it on success.
if !o.flags.ReexecedAfterBinarySwap {
if err := o.reexecAfterBinarySwap(); err != nil {
// Soft-fail: log and continue with old-binary phases as a
// fallback. Operator gets a clear warning that the chicken-
// and-egg fix didn't apply for this run.
fmt.Fprintf(os.Stderr, "⚠️ Could not re-exec post-binary-swap (%v); "+
"continuing with current binary — config changes from this release "+
"may only take effect on the NEXT upgrade. See bugboard #15.\n", err)
}
}
// Phase 3: Ensure secrets exist
@ -604,6 +644,45 @@ func (o *Orchestrator) extractGatewayConfig() (enableHTTPS bool, domain string,
return enableHTTPS, domain, baseDomain
}
// reexecAfterBinarySwap replaces this process with the newly-installed
// orama binary at /opt/orama/bin/orama, preserving all original CLI args
// and appending --reexeced-after-binary-swap so the new process knows
// to skip the pre-binary phases. Bugboard #15 chicken-and-egg fix.
//
// Returns nil only when syscall.Exec is about to take effect; on success
// the function never actually returns (the process image is replaced).
// On any failure before the exec syscall, returns the wrapping error so
// the caller can fall back to running the rest of the upgrade with the
// old binary (with a warning).
func (o *Orchestrator) reexecAfterBinarySwap() error {
if _, err := os.Stat(newOramaBinaryPath); err != nil {
return fmt.Errorf("new binary not found at %s: %w", newOramaBinaryPath, err)
}
// Defensive: don't re-exec ourselves into a loop if the install
// somehow placed our currently-running binary at that path. Compare
// inode-stable identity via os.Stat.
if cur, err := os.Executable(); err == nil {
curInfo, e1 := os.Stat(cur)
newInfo, e2 := os.Stat(newOramaBinaryPath)
if e1 == nil && e2 == nil && os.SameFile(curInfo, newInfo) {
// Already running the new binary (e.g. someone manually pre-
// installed it). No re-exec needed.
fmt.Printf(" (current binary already matches installed binary; skipping re-exec)\n")
return nil
}
}
args := append([]string{newOramaBinaryPath}, os.Args[1:]...)
args = append(args, "--reexeced-after-binary-swap")
fmt.Printf("\n🔁 Re-executing with newly-installed binary to run remaining phases with current code (#15 fix)...\n")
// syscall.Exec replaces this process image; argv[0] is the binary
// path, env inherited as-is. On success we never return.
if err := syscall.Exec(newOramaBinaryPath, args, os.Environ()); err != nil {
return fmt.Errorf("syscall.Exec %s: %w", newOramaBinaryPath, err)
}
return nil
}
func (o *Orchestrator) regenerateConfigs() error {
peers := o.extractPeers()
vpsIP, joinAddress := o.extractNetworkConfig()

View File

@ -0,0 +1,84 @@
package upgrade
import (
"os"
"strings"
"testing"
)
// Bugboard #15 — Upgrade orchestrator chicken-and-egg.
//
// Pre-fix: Phase 4 (config regen) ran with the pre-swap binary's
// compiled Go code, so config-shape changes shipped in this release
// only took effect on the NEXT rollout. Operators had to upgrade
// twice for a config-changing release to apply.
//
// Post-fix: after Phase 2b installs the new binary, the orchestrator
// re-execs itself using the newly-installed binary so Phase 3+ runs
// with current code. A hidden --reexeced-after-binary-swap flag tells
// the new process to skip the pre-binary phases.
//
// These tests pin the flag plumbing and helper behavior. End-to-end
// re-exec can only be verified on a real install (tests can't safely
// call syscall.Exec).
func TestFlags_ReexecedAfterBinarySwap_parses(t *testing.T) {
// The hidden flag must be parseable; orchestrator sets it on the
// re-execed argv. If this regresses (e.g. someone removes the
// fs.BoolVar registration to clean up the help output), the
// re-execed process would fail with "flag provided but not defined"
// and the upgrade would error mid-way.
flags, err := ParseFlags([]string{"--reexeced-after-binary-swap"})
if err != nil {
t.Fatalf("ParseFlags must accept the hidden flag: %v", err)
}
if !flags.ReexecedAfterBinarySwap {
t.Error("flag value not surfaced on Flags struct")
}
}
func TestFlags_ReexecedAfterBinarySwap_defaultFalse(t *testing.T) {
// Default value MUST be false. If it ever defaults to true, the
// orchestrator would skip its own pre-binary phases on the FIRST
// user-initiated upgrade and bricks would happen — Phase 2b would
// never run.
flags, err := ParseFlags([]string{})
if err != nil {
t.Fatalf("ParseFlags empty args: %v", err)
}
if flags.ReexecedAfterBinarySwap {
t.Fatal("FATAL DEFAULT: ReexecedAfterBinarySwap defaults to true; this would skip "+
"Phase 2b (binary install) on every upgrade. MUST be false by default.")
}
}
func TestReexecAfterBinarySwap_missingBinaryReturnsError(t *testing.T) {
// When the new binary isn't on disk at the expected path, the
// helper must surface an error so the orchestrator can fall back
// (with a warning) rather than silently no-op or panic. This is
// the "Phase 2b succeeded but the file vanished" case — defensive
// path, but cheap to pin.
if _, err := os.Stat(newOramaBinaryPath); err == nil {
t.Skipf("test machine has %s present; skipping (real install env)", newOramaBinaryPath)
}
o := &Orchestrator{flags: &Flags{}}
err := o.reexecAfterBinarySwap()
if err == nil {
t.Error("expected error when new binary path is missing; got nil")
}
if err != nil && !strings.Contains(err.Error(), newOramaBinaryPath) {
t.Errorf("error should mention the missing path %q for operator debuggability; got: %v",
newOramaBinaryPath, err)
}
}
func TestReexecPathConstant_isAbsolute(t *testing.T) {
// syscall.Exec requires an absolute path. If someone refactors the
// constant to "orama" expecting PATH lookup, the exec call would
// fail at runtime ONLY in production (test env never reaches
// syscall.Exec). Pin the absolute-path invariant statically.
if !strings.HasPrefix(newOramaBinaryPath, "/") {
t.Fatalf("newOramaBinaryPath must be absolute (syscall.Exec requirement); got %q",
newOramaBinaryPath)
}
}

View File

@ -15,6 +15,21 @@ type Config struct {
Security SecurityConfig `yaml:"security"`
Logging LoggingConfig `yaml:"logging"`
HTTPGateway HTTPGatewayConfig `yaml:"http_gateway"`
// SNIRouter is the stealth TURN-over-443 SNI router toggle (feat-124).
// Phase 4 config generation always emits this block into node.yaml, so
// the field MUST exist here: node.yaml is decoded with KnownFields(true)
// and an unknown top-level key fails the whole parse and crash-loops
// orama-node at boot (same failure mode as the v0.122.42
// secrets_encryption_key incident).
SNIRouter SNIRouterConfig `yaml:"sni_router"`
}
// SNIRouterConfig is the top-level stealth SNI router block in node.yaml
// (feat-124). Default-off; when enabled the node runs orama-sni-router on
// :443 and Caddy moves to :8443.
type SNIRouterConfig struct {
Enabled bool `yaml:"enabled"`
}
// ValidationError represents a single validation error with context.

View File

@ -207,3 +207,51 @@ key2: value2
t.Errorf("expected key2='value2', got %q", result["key2"])
}
}
// TestDecodeStrict_secretsEncryptionKey is the regression guard for the
// v0.122.42 boot crash: Phase 4 config generation writes
// `secrets_encryption_key` into node.yaml under the http_gateway section,
// but HTTPGatewayConfig had no matching field. With KnownFields(true)
// strict decoding, the unknown field made DecodeStrict fail and
// orama-node crash-looped (exit 1) on every start. The field must parse.
func TestDecodeStrict_secretsEncryptionKey(t *testing.T) {
yamlInput := `
node:
id: "test-node"
data_dir: "./data"
http_gateway:
enabled: true
client_namespace: "default"
rqlite_dsn: "http://localhost:5001"
secrets_encryption_key: "0123456789abcdef0123456789abcdef0123456789abcdef0123456789abcdef"
`
var cfg Config
if err := DecodeStrict(strings.NewReader(yamlInput), &cfg); err != nil {
t.Fatalf("node.yaml with secrets_encryption_key must parse (v0.122.42 regression), got: %v", err)
}
want := "0123456789abcdef0123456789abcdef0123456789abcdef0123456789abcdef"
if cfg.HTTPGateway.SecretsEncryptionKey != want {
t.Errorf("SecretsEncryptionKey = %q, want %q", cfg.HTTPGateway.SecretsEncryptionKey, want)
}
}
// TestDecodeStrict_sniRouterBlock guards against a recurrence of the
// v0.122.42-class boot crash for the feat-124 stealth SNI router: Phase 4
// always emits a top-level `sni_router:` block into node.yaml, so the root
// Config struct must carry a matching field or KnownFields(true) rejects
// the whole file and orama-node crash-loops.
func TestDecodeStrict_sniRouterBlock(t *testing.T) {
yamlInput := `
node:
id: "test-node"
sni_router:
enabled: true
`
var cfg Config
if err := DecodeStrict(strings.NewReader(yamlInput), &cfg); err != nil {
t.Fatalf("node.yaml with sni_router block must parse (feat-124): %v", err)
}
if !cfg.SNIRouter.Enabled {
t.Errorf("SNIRouter.Enabled = false, want true")
}
}

View File

@ -21,6 +21,15 @@ type HTTPGatewayConfig struct {
IPFSTimeout time.Duration `yaml:"ipfs_timeout"` // Timeout for IPFS operations
BaseDomain string `yaml:"base_domain"` // Base domain for deployments (e.g., "dbrs.space"). Defaults to "dbrs.space"
// SecretsEncryptionKey is the AES-256 key (hex, 64 chars) used to encrypt
// serverless function secrets at rest. Generated per-cluster and written
// into node.yaml by Phase 4 config generation. This field MUST exist or
// strict YAML unmarshal rejects node.yaml entirely and orama-node fails
// to boot (regression that shipped in v0.122.42: template + secret
// generator + gateway.Config consumer all landed, but this parse field
// and the node→gateway mapping were missed).
SecretsEncryptionKey string `yaml:"secrets_encryption_key"`
// WebRTC configuration (optional, enabled per-namespace)
WebRTC WebRTCConfig `yaml:"webrtc"`
}

View File

@ -158,6 +158,14 @@ func (m *mockRQLiteClient) BatchWithSeq(ctx context.Context, namespace string, o
return res, 1, err
}
func (m *mockRQLiteClient) BatchQuery(ctx context.Context, ops []rqlite.BatchOp) ([]rqlite.OpResult, error) {
out := make([]rqlite.OpResult, len(ops))
for i := range ops {
out[i] = rqlite.OpResult{Kind: rqlite.BatchOpQuery}
}
return out, nil
}
func TestPortAllocator_AllocatePort(t *testing.T) {
logger := zap.NewNop()
mockDB := newMockRQLiteClient()

View File

@ -16,8 +16,16 @@ import (
"github.com/libp2p/go-libp2p/core/crypto"
"github.com/libp2p/go-libp2p/core/peer"
"github.com/multiformats/go-multiaddr"
"gopkg.in/yaml.v3"
)
// defaultSFUSignalingPort is the SFU signaling port the namespace gateway
// proxies WebRTC traffic to when an existing node.yaml did not record one.
// Mirrors pkg/namespace.SFUSignalingPortRangeStart (30000); kept as a local
// constant to avoid importing the namespace package (which other agents own
// and which would create a dependency cycle here).
const defaultSFUSignalingPort = 30000
// ConfigGenerator manages generation of node, gateway, and service configs
type ConfigGenerator struct {
oramaDir string
@ -200,9 +208,184 @@ func (cg *ConfigGenerator) GenerateNodeConfig(peerAddresses []string, vpsIP stri
data.Environment = cg.Environment
data.OperatorWallet = cg.OperatorWallet
// Serverless function secrets encryption key (bugboard #837). Read the
// persisted key (generated in Phase3 / received via join) so it is
// rendered into node.yaml under http_gateway. If the file is missing the
// key is left empty and omitted from the rendered config — get_secret then
// stays disabled until the operator provisions the key. We deliberately do
// NOT generate here: generation/distribution is owned by SecretGenerator
// and the join flow so every node in a cluster shares one key.
secretsKeyPath := filepath.Join(cg.oramaDir, "secrets", "secrets-encryption-key")
if keyBytes, err := os.ReadFile(secretsKeyPath); err == nil {
data.SecretsEncryptionKey = strings.TrimSpace(string(keyBytes))
}
// WebRTC/TURN config (feat-124 #913). The TURN secret lives in the secrets
// dir so it survives Phase4 config regeneration; turn_domain/sfu_port/enabled
// are operator-set values that only exist in the previous node.yaml, so we
// carry them forward from the existing on-disk config. Without this, a regen
// wipes the operator's manually-added webrtc block and the namespace
// reconciler restarts gateways with an empty TURN secret (the outage).
if err := cg.populateWebRTCConfig(&data); err != nil {
return "", fmt.Errorf("failed to populate webrtc config: %w", err)
}
// Stealth TURN SNI router (feat-124). Like the webrtc block, sni_router is
// an operator opt-in that only exists in the previous node.yaml, so carry
// it forward across regeneration. Without this, a Phase4 regen would reset
// sni_router.enabled to false, stop the :443 router and break stealth TURN
// for every region that relies on it (the same regen-wipe class of outage
// as bugboard #259/#846).
cg.populateSNIRouterConfig(&data)
return templates.RenderNodeConfig(data)
}
// populateSNIRouterConfig carries forward the operator-set sni_router.enabled
// flag from the existing node.yaml so a config regeneration never silently
// disables the stealth TURN-over-443 router. Absence of the file or block
// leaves the flag at its default (false).
func (cg *ConfigGenerator) populateSNIRouterConfig(data *templates.NodeConfigData) {
data.SNIRouterEnabled = cg.readExistingSNIRouterEnabled()
}
// SNIRouterEnabled reports whether the node's on-disk node.yaml has opted in to
// the stealth TURN-over-443 SNI router. The orchestrator reads this AFTER
// Phase4 has written node.yaml to decide whether to move Caddy to :8443 and
// start the router unit. Returns false when the config or block is absent.
func (cg *ConfigGenerator) SNIRouterEnabled() bool {
return cg.readExistingSNIRouterEnabled()
}
// readExistingSNIRouterEnabled parses just the top-level sni_router.enabled
// flag out of the existing node.yaml. Returns false when the file is missing,
// malformed, or has no sni_router block (fresh install / not opted in).
func (cg *ConfigGenerator) readExistingSNIRouterEnabled() bool {
configPath := filepath.Join(cg.oramaDir, "configs", "node.yaml")
raw, err := os.ReadFile(configPath)
if err != nil {
return false // No existing config (fresh install) — default off.
}
var parsed struct {
SNIRouter struct {
Enabled bool `yaml:"enabled"`
} `yaml:"sni_router"`
}
if err := yaml.Unmarshal(raw, &parsed); err != nil {
return false // Malformed/old config — don't fail regen; default off.
}
return parsed.SNIRouter.Enabled
}
// existingWebRTC is the minimal shape parsed out of an existing node.yaml to
// carry forward operator-set WebRTC fields across a config regeneration.
type existingWebRTC struct {
Enabled bool
SFUPort int
TURNDomain string
TURNSecret string
}
// populateWebRTCConfig fills the WebRTC fields on data so the rendered node.yaml
// preserves operator TURN configuration across regenerations.
//
// Sources, in order of authority:
// - turn_secret: the persisted secrets/turn-secret file (durable, survives
// regen). If absent but the existing node.yaml carried a secret, that secret
// is persisted to the file so it becomes durable from now on.
// - turn_domain / sfu_port / enabled: carried forward from the existing
// node.yaml's http_gateway.webrtc block (operator-set, not in secrets).
//
// If there is no persisted secret and no existing webrtc block, WebRTC is left
// disabled and the template renders nothing.
func (cg *ConfigGenerator) populateWebRTCConfig(data *templates.NodeConfigData) error {
existing := cg.readExistingWebRTC()
// Resolve the TURN secret: persisted file wins; otherwise adopt the secret
// from the existing node.yaml and persist it so it is durable.
secret := ""
secretPath := filepath.Join(cg.oramaDir, "secrets", "turn-secret")
if b, err := os.ReadFile(secretPath); err == nil {
secret = strings.TrimSpace(string(b))
}
if secret == "" && existing != nil && existing.TURNSecret != "" {
secret = existing.TURNSecret
if err := cg.persistTURNSecret(secret); err != nil {
return err
}
}
if secret == "" {
// No durable secret and nothing to adopt — leave WebRTC disabled.
return nil
}
data.TURNSecret = secret
data.WebRTCEnabled = true
if existing != nil {
data.TURNDomain = existing.TURNDomain
data.SFUPort = existing.SFUPort
}
if data.SFUPort == 0 {
data.SFUPort = defaultSFUSignalingPort
}
return nil
}
// readExistingWebRTC parses just the http_gateway.webrtc block out of the
// existing node.yaml. Absence of the file or block is tolerated (returns nil).
func (cg *ConfigGenerator) readExistingWebRTC() *existingWebRTC {
configPath := filepath.Join(cg.oramaDir, "configs", "node.yaml")
raw, err := os.ReadFile(configPath)
if err != nil {
return nil // No existing config (fresh install) — nothing to carry forward.
}
var parsed struct {
HTTPGateway struct {
WebRTC struct {
Enabled bool `yaml:"enabled"`
SFUPort int `yaml:"sfu_port"`
TURNDomain string `yaml:"turn_domain"`
TURNSecret string `yaml:"turn_secret"`
} `yaml:"webrtc"`
} `yaml:"http_gateway"`
}
if err := yaml.Unmarshal(raw, &parsed); err != nil {
return nil // Malformed/old config — don't fail regen; just nothing to carry.
}
wb := parsed.HTTPGateway.WebRTC
if !wb.Enabled && wb.SFUPort == 0 && wb.TURNDomain == "" && wb.TURNSecret == "" {
return nil // No webrtc block present.
}
return &existingWebRTC{
Enabled: wb.Enabled,
SFUPort: wb.SFUPort,
TURNDomain: wb.TURNDomain,
TURNSecret: wb.TURNSecret,
}
}
// persistTURNSecret writes the TURN secret to the secrets dir with 0600 perms
// and correct ownership, making it durable across future config regenerations.
func (cg *ConfigGenerator) persistTURNSecret(secret string) error {
secretPath := filepath.Join(cg.oramaDir, "secrets", "turn-secret")
if err := os.MkdirAll(filepath.Dir(secretPath), 0700); err != nil {
return fmt.Errorf("failed to create secrets directory: %w", err)
}
if err := os.WriteFile(secretPath, []byte(secret), 0600); err != nil {
return fmt.Errorf("failed to persist TURN secret: %w", err)
}
if err := ensureSecretFilePermissions(secretPath); err != nil {
return err
}
return nil
}
// GenerateVaultConfig generates vault.yaml configuration for the Vault Guardian.
// The vault config uses key=value format (not YAML, despite the file extension).
// Peer discovery is dynamic via RQLite — no static peer list needed.
@ -471,6 +654,106 @@ func (sg *SecretGenerator) EnsureAPIKeyHMACSecret() (string, error) {
return secret, nil
}
// EnsureSecretsEncryptionKey gets or generates the AES-256 key used to
// encrypt serverless function secrets at rest (the function_secrets table).
// The key is a 32-byte random value stored as 64 hex characters.
//
// It MUST be identical on every namespace-gateway node in a cluster and
// stable across restarts — otherwise secrets encrypted by one process can't
// be decrypted by another (bugboard #837). Like api-key-hmac-secret, joining
// nodes receive this value through the join flow rather than generating their
// own; this method only generates on the genesis node (or returns the
// existing key if a joining node already wrote it to disk).
func (sg *SecretGenerator) EnsureSecretsEncryptionKey() (string, error) {
secretPath := filepath.Join(sg.oramaDir, "secrets", "secrets-encryption-key")
secretDir := filepath.Dir(secretPath)
if err := os.MkdirAll(secretDir, 0700); err != nil {
return "", fmt.Errorf("failed to create secrets directory: %w", err)
}
if err := os.Chmod(secretDir, 0700); err != nil {
return "", fmt.Errorf("failed to set secrets directory permissions: %w", err)
}
// Try to read existing key
if data, err := os.ReadFile(secretPath); err == nil {
key := strings.TrimSpace(string(data))
if len(key) == 64 {
if err := ensureSecretFilePermissions(secretPath); err != nil {
return "", err
}
return key, nil
}
}
// Generate new key (32 bytes = 64 hex chars)
keyBytes := make([]byte, 32)
if _, err := rand.Read(keyBytes); err != nil {
return "", fmt.Errorf("failed to generate secrets encryption key: %w", err)
}
key := hex.EncodeToString(keyBytes)
if err := os.WriteFile(secretPath, []byte(key), 0600); err != nil {
return "", fmt.Errorf("failed to save secrets encryption key: %w", err)
}
if err := ensureSecretFilePermissions(secretPath); err != nil {
return "", err
}
return key, nil
}
// EnsureTURNSecret gets or generates the HMAC-SHA1 shared secret used to mint
// TURN credentials for WebRTC (the http_gateway.webrtc.turn_secret field).
// The secret is a 32-byte random value stored as 64 hex characters.
//
// It MUST be identical on every namespace-gateway node in a cluster and stable
// across restarts AND config regenerations — otherwise the namespace reconciler
// sees drift (desired vs on-disk) and restarts gateways with an empty secret,
// which makes turn.credentials return namespace_not_configured (feat-124 #913,
// the AnChat outage). Persisting the secret to the secrets dir is what lets it
// survive Phase4 config regeneration: GenerateNodeConfig reads this file rather
// than relying on the (regenerated-from-template) node.yaml. Joining nodes
// receive the value through the join flow rather than generating their own.
func (sg *SecretGenerator) EnsureTURNSecret() (string, error) {
secretPath := filepath.Join(sg.oramaDir, "secrets", "turn-secret")
secretDir := filepath.Dir(secretPath)
if err := os.MkdirAll(secretDir, 0700); err != nil {
return "", fmt.Errorf("failed to create secrets directory: %w", err)
}
if err := os.Chmod(secretDir, 0700); err != nil {
return "", fmt.Errorf("failed to set secrets directory permissions: %w", err)
}
// Try to read existing secret
if data, err := os.ReadFile(secretPath); err == nil {
secret := strings.TrimSpace(string(data))
if len(secret) == 64 {
if err := ensureSecretFilePermissions(secretPath); err != nil {
return "", err
}
return secret, nil
}
}
// Generate new secret (32 bytes = 64 hex chars)
secretBytes := make([]byte, 32)
if _, err := rand.Read(secretBytes); err != nil {
return "", fmt.Errorf("failed to generate TURN secret: %w", err)
}
secret := hex.EncodeToString(secretBytes)
if err := os.WriteFile(secretPath, []byte(secret), 0600); err != nil {
return "", fmt.Errorf("failed to save TURN secret: %w", err)
}
if err := ensureSecretFilePermissions(secretPath); err != nil {
return "", err
}
return secret, nil
}
func ensureSecretFilePermissions(secretPath string) error {
if err := os.Chmod(secretPath, 0600); err != nil {
return fmt.Errorf("failed to set permissions on %s: %w", secretPath, err)

View File

@ -23,7 +23,8 @@ type BinaryInstaller struct {
gateway *installers.GatewayInstaller
coredns *installers.CoreDNSInstaller
caddy *installers.CaddyInstaller
ntfy *installers.NtfyInstaller // feature #72; installed only when EnableNtfy is set
ntfy *installers.NtfyInstaller // feature #72; installed only when EnableNtfy is set
sniRouter *installers.SNIRouterInstaller // feat-124; configured only when sni_router.enabled
}
// NewBinaryInstaller creates a new binary installer
@ -41,6 +42,7 @@ func NewBinaryInstaller(arch string, logWriter io.Writer) *BinaryInstaller {
coredns: installers.NewCoreDNSInstaller(arch, logWriter, oramaHome),
caddy: installers.NewCaddyInstaller(arch, logWriter, oramaHome),
ntfy: installers.NewNtfyInstaller(arch, logWriter),
sniRouter: installers.NewSNIRouterInstaller(arch, logWriter, OramaDir),
}
}
@ -158,6 +160,29 @@ func (bi *BinaryInstaller) EnableCaddyNtfyProxy(hostname string) {
bi.caddy.EnableNtfyProxy(hostname)
}
// EnableCaddySNIRouterMode moves Caddy's HTTPS listener off :443 to :8443 on
// the next ConfigureCaddy() call, freeing :443 for the orama-sni-router
// (feat-124). Must be called BEFORE ConfigureCaddy.
func (bi *BinaryInstaller) EnableCaddySNIRouterMode() {
bi.caddy.EnableSNIRouterMode()
}
// ConfigureSNIRouter writes the orama-sni-router YAML config (listen :443,
// fallback Caddy on :8443, turn_discovery for baseDomain). Feat-124.
func (bi *BinaryInstaller) ConfigureSNIRouter(baseDomain string) error {
return bi.sniRouter.Configure(baseDomain)
}
// WriteSNIRouterUnit writes /etc/systemd/system/orama-sni-router.service.
func (bi *BinaryInstaller) WriteSNIRouterUnit() error {
return bi.sniRouter.WriteSystemdUnit()
}
// SNIRouterServiceName returns the systemd unit name for lifecycle calls.
func (bi *BinaryInstaller) SNIRouterServiceName() string {
return installers.SNIRouterServiceName
}
// InstallNtfy installs the self-hosted ntfy server (binary, user,
// systemd unit, data directory). Feature #72. Idempotent.
func (bi *BinaryInstaller) InstallNtfy() error {

View File

@ -27,8 +27,20 @@ type CaddyInstaller struct {
// Enabled per-node via EnableNtfyProxy. Feature #72.
withNtfy bool
ntfyHostname string // e.g. "push.dbrs.space" — fully-qualified public host
// behindSNIRouter, when set, moves Caddy's HTTPS listener off :443 to
// CaddyHTTPSPortBehindSNI so the orama-sni-router can own :443 and forward
// TLS by SNI (feat-124, stealth TURN). Enabled per-node via
// EnableSNIRouterMode. Plain HTTP (:80) is unaffected. When false the
// generated Caddyfile is byte-identical to the pre-feature output.
behindSNIRouter bool
}
// CaddyHTTPSPortBehindSNI is the port Caddy binds for HTTPS when the node runs
// behind the SNI router (which owns :443). 8443 matches the sni-router config's
// caddy fallback backend (127.0.0.1:8443) and the plan doc.
const CaddyHTTPSPortBehindSNI = 8443
// NewCaddyInstaller creates a new Caddy installer
func NewCaddyInstaller(arch string, logWriter io.Writer, oramaHome string) *CaddyInstaller {
return &CaddyInstaller{
@ -52,6 +64,16 @@ func (ci *CaddyInstaller) EnableNtfyProxy(hostname string) {
ci.ntfyHostname = hostname
}
// EnableSNIRouterMode tells the Caddy installer to bind HTTPS on
// CaddyHTTPSPortBehindSNI (8443) instead of :443, freeing :443 for the
// orama-sni-router (feat-124). Plain HTTP on :80 is left untouched. Must be
// called BEFORE Configure so the generated Caddyfile picks up the global
// `https_port` option. A no-op when never called: the default Caddyfile keeps
// HTTPS on :443.
func (ci *CaddyInstaller) EnableSNIRouterMode() {
ci.behindSNIRouter = true
}
// IsInstalled checks if Caddy with orama DNS module is already installed
func (ci *CaddyInstaller) IsInstalled() bool {
caddyPath := "/usr/bin/caddy"
@ -417,7 +439,17 @@ func (ci *CaddyInstaller) generateCaddyfile(domain, email, acmeEndpoint, baseDom
// workload is REST + WebSocket (neither benefits much from
// h2 stream multiplexing — REST is keep-alive over h1, and
// WS is single-connection by design).
sb.WriteString(fmt.Sprintf("{\n email %s\n servers {\n protocols h1\n }\n}\n", email))
// When this node runs behind the SNI router (feat-124), move Caddy's HTTPS
// listener off :443 to CaddyHTTPSPortBehindSNI via the `https_port` global
// option. The sni-router owns :443 and forwards TLS by SNI to either a
// namespace's TURNS listener or here (127.0.0.1:8443). Plain HTTP (:80) is
// unchanged. When behindSNIRouter is false, no `https_port` line is emitted
// and the Caddyfile is byte-identical to the pre-feature output.
httpsPortOption := ""
if ci.behindSNIRouter {
httpsPortOption = fmt.Sprintf(" https_port %d\n", CaddyHTTPSPortBehindSNI)
}
sb.WriteString(fmt.Sprintf("{\n email %s\n%s servers {\n protocols h1\n }\n}\n", email, httpsPortOption))
// Node domain blocks (e.g., node1.dbrs.space, *.node1.dbrs.space)
sb.WriteString(fmt.Sprintf("\n*.%s {\n%s\n reverse_proxy localhost:6001\n}\n", domain, tlsBlock))

View File

@ -1,6 +1,7 @@
package installers
import (
"fmt"
"io"
"strings"
"testing"
@ -97,3 +98,50 @@ func TestGenerateCaddyfile_BaseDomainSameAsDomainOmitsDuplicates(t *testing.T) {
t.Errorf("expected exactly 2 `*.dbrs.space {` occurrences (1 TLS + 1 HTTP), got %d in:\n%s", got, cf)
}
}
// TestGenerateCaddyfile_SNIRouterDisabledByteIdentical is the safety guard for
// feat-124: when EnableSNIRouterMode has NOT been called, the generated
// Caddyfile must be byte-identical to the pre-feature output (HTTPS stays on
// :443, no `https_port` global option). This is the default for every existing
// node — any drift here is a silent production change.
func TestGenerateCaddyfile_SNIRouterDisabledByteIdentical(t *testing.T) {
ci := newTestCaddyInstaller()
cf := ci.generateCaddyfile("node1.dbrs.space", "admin@dbrs.space",
"http://localhost:6001/v1/internal/acme", "dbrs.space")
if strings.Contains(cf, "https_port") {
t.Errorf("default Caddyfile must NOT contain `https_port` (SNI router off); got:\n%s", cf)
}
if strings.Contains(cf, "8443") {
t.Errorf("default Caddyfile must NOT reference :8443 (SNI router off); got:\n%s", cf)
}
// The global options block must be exactly the pre-feature shape.
if !strings.Contains(cf, "{\n email admin@dbrs.space\n servers {\n protocols h1\n }\n}\n") {
t.Errorf("default global options block drifted from pre-feature output; got:\n%s", cf)
}
}
// TestGenerateCaddyfile_SNIRouterEnabledMovesHTTPSTo8443 verifies that after
// EnableSNIRouterMode, Caddy's HTTPS listener is moved to :8443 via the
// `https_port` global option, while plain HTTP (:80) is unchanged so ACME
// HTTP-01 and the HTTP catch-all still work.
func TestGenerateCaddyfile_SNIRouterEnabledMovesHTTPSTo8443(t *testing.T) {
ci := newTestCaddyInstaller()
ci.EnableSNIRouterMode()
cf := ci.generateCaddyfile("node1.dbrs.space", "admin@dbrs.space",
"http://localhost:6001/v1/internal/acme", "dbrs.space")
want := fmt.Sprintf("https_port %d", CaddyHTTPSPortBehindSNI)
if !strings.Contains(cf, want) {
t.Errorf("SNI-router Caddyfile must contain %q; got:\n%s", want, cf)
}
// The global option belongs inside the top-level options block, before the
// servers stanza.
if !strings.Contains(cf, "{\n email admin@dbrs.space\n https_port 8443\n servers {\n protocols h1\n }\n}\n") {
t.Errorf("https_port not placed correctly in global options block; got:\n%s", cf)
}
// Plain HTTP :80 catch-all must be unchanged.
if !strings.Contains(cf, ":80 {") {
t.Errorf("HTTP :80 block must remain when SNI router enabled; got:\n%s", cf)
}
}

View File

@ -213,9 +213,14 @@ func (ni *NtfyInstaller) downloadBinary() error {
tarballURL := fmt.Sprintf(
"https://github.com/binwiederhier/ntfy/releases/download/v%s/%s",
ntfyVersion, tarballName)
// Upstream ntfy publishes the checksum file as plain "checksums.txt"
// at the release root — NOT "ntfy_<VER>_checksums.txt". Verified
// against the v2.11.0 release assets list. If a future ntfy version
// changes the naming convention, this URL will 404 loud at install
// time and the bump-ntfy-version PR should update it here.
checksumsURL := fmt.Sprintf(
"https://github.com/binwiederhier/ntfy/releases/download/v%s/ntfy_%s_checksums.txt",
ntfyVersion, ntfyVersion)
"https://github.com/binwiederhier/ntfy/releases/download/v%s/checksums.txt",
ntfyVersion)
fmt.Fprintf(ni.logWriter, " Downloading %s...\n", tarballURL)
client := &http.Client{Timeout: 5 * time.Minute}

View File

@ -0,0 +1,203 @@
package installers
import (
"fmt"
"io"
"os"
"path/filepath"
)
// SNI router installer (feat-124, stealth TURN-over-443).
//
// Unlike the binary installers (Caddy, ntfy), the orama-sni-router binary is
// built and shipped to the node by `orama build` / the install tarball — this
// installer only writes the router's YAML config and the systemd unit, and
// drives the unit's lifecycle (install+enable+start when enabled,
// stop+disable when not).
const (
// SNIRouterListenAddr is the public port the router binds. It owns :443 so
// Caddy is moved to CaddyHTTPSPortBehindSNI (see caddy.go).
SNIRouterListenAddr = ":443"
// SNIRouterServiceName is the systemd unit name.
SNIRouterServiceName = "orama-sni-router.service"
// SNIRouterConfigName is the router config filename (resolved under
// <oramaDir>/configs by the binary's config.DefaultPath lookup).
SNIRouterConfigName = "sni-router.yaml"
// sniRouterRescanInterval is how often the router rescans the namespaces
// directory for per-namespace TURNS listeners. Matches the library default
// (sniproxy.DefaultDiscoveryRescanInterval); kept as a literal here to avoid
// importing the runtime package into the installer.
sniRouterRescanInterval = "30s"
// sniRouterClientHelloTimeout / sniRouterBackendDialTimeout bound the
// per-connection ClientHello peek and backend dial (slowloris / dead-backend
// protection). Mirror the sniproxy server defaults.
sniRouterClientHelloTimeout = "5s"
sniRouterBackendDialTimeout = "5s"
// sniRouterMaxConcurrentConns caps in-flight connections on the public
// :443 listener (DoS guard); mirrors the sniproxy server default.
sniRouterMaxConcurrentConns = 10000
// sniRouterSystemdUnitPath is where the unit file is written.
sniRouterSystemdUnitPath = "/etc/systemd/system/" + SNIRouterServiceName
// sniRouterBinaryPath is the installed binary path on the node.
sniRouterBinaryPath = "/opt/orama/bin/orama-sni-router"
)
// SNIRouterInstaller writes the orama-sni-router config + systemd unit and
// manages the unit lifecycle. The caddy fallback port matches
// CaddyHTTPSPortBehindSNI so unmatched SNIs (regular HTTPS) reach the moved
// Caddy listener.
type SNIRouterInstaller struct {
*BaseInstaller
oramaDir string // e.g. "/opt/orama/.orama"
}
// NewSNIRouterInstaller creates an installer. oramaDir is the node's .orama
// data root (where configs/ and data/namespaces live).
func NewSNIRouterInstaller(arch string, logWriter io.Writer, oramaDir string) *SNIRouterInstaller {
return &SNIRouterInstaller{
BaseInstaller: NewBaseInstaller(arch, logWriter),
oramaDir: oramaDir,
}
}
// configPath returns the absolute path the router config is written to and the
// binary resolves to via its DefaultPath lookup (<oramaDir>/configs/<name>).
func (si *SNIRouterInstaller) configPath() string {
return filepath.Join(si.oramaDir, "configs", SNIRouterConfigName)
}
// namespacesDir returns the per-namespace config root the router scans for
// TURNS listeners.
func (si *SNIRouterInstaller) namespacesDir() string {
return filepath.Join(si.oramaDir, "data", "namespaces")
}
// Configure writes the router YAML config. baseDomain drives the stealth and
// "turn.ns-*" SNI hostnames the router derives during discovery. Idempotent.
func (si *SNIRouterInstaller) Configure(baseDomain string) error {
if baseDomain == "" {
return fmt.Errorf("sni-router: base domain must not be empty")
}
configDir := filepath.Dir(si.configPath())
if err := os.MkdirAll(configDir, 0755); err != nil {
return fmt.Errorf("sni-router: create config dir %s: %w", configDir, err)
}
content := si.generateConfig(baseDomain)
if err := os.WriteFile(si.configPath(), []byte(content), 0644); err != nil {
return fmt.Errorf("sni-router: write config %s: %w", si.configPath(), err)
}
return nil
}
// generateConfig renders the sni-router.yaml. The fallback is Caddy on
// CaddyHTTPSPortBehindSNI; turn_discovery scans the node's namespaces dir so
// per-namespace TURNS routes appear without a router restart. No static routes
// are emitted — every TURNS route is auto-discovered.
func (si *SNIRouterInstaller) generateConfig(baseDomain string) string {
return fmt.Sprintf(`# Orama SNI router config (feat-124, stealth TURN-over-443).
# Generated by the installer re-running install/upgrade overwrites this file.
#
# The router owns :443, peeks each connection's TLS ClientHello SNI, and
# forwards the raw (still-encrypted) stream to a backend. TLS is NOT terminated
# here. Unmatched SNIs (regular HTTPS) go to the fallback (Caddy on :%[2]d).
listen: "%[1]s"
client_hello_timeout: %[3]s
backend_dial_timeout: %[4]s
max_concurrent_conns: %[5]d
fallback:
name: caddy
addr: "127.0.0.1:%[2]d"
# Per-namespace stealth-TURN routes are auto-discovered by scanning
# <namespaces_dir>/*/configs/turn-*.yaml every rescan_interval. Each namespace
# with a TURNS listener gets two routes (the bland stealth host and a
# turn.ns-<namespace>.<base_domain> alias) forwarding to its local TURNS port.
turn_discovery:
namespaces_dir: %[6]q
base_domain: %[7]q
rescan_interval: %[8]s
# No static routes: every TURNS route comes from turn_discovery above.
routes: []
`,
SNIRouterListenAddr,
CaddyHTTPSPortBehindSNI,
sniRouterClientHelloTimeout,
sniRouterBackendDialTimeout,
sniRouterMaxConcurrentConns,
si.namespacesDir(),
baseDomain,
sniRouterRescanInterval,
)
}
// generateSystemdUnit renders /etc/systemd/system/orama-sni-router.service.
// Runs as the orama user with CAP_NET_BIND_SERVICE so it can bind :443 without
// root. Ordered Before=caddy.service so the router is ready before Caddy
// switches to :8443. Restart=on-failure.
func (si *SNIRouterInstaller) generateSystemdUnit() string {
return fmt.Sprintf(`[Unit]
Description=Orama SNI Router (TLS-level :443 backend forwarder)
Documentation=https://github.com/DeBrosOfficial/network
After=network.target
Before=caddy.service
PartOf=orama-node.service
[Service]
Type=simple
WorkingDirectory=/opt/orama
EnvironmentFile=-/opt/orama/.orama/data/sni-router.env
ExecStart=%s --config %s
# Bind privileged ports (:80, :443) without running as root.
AmbientCapabilities=CAP_NET_BIND_SERVICE
CapabilityBoundingSet=CAP_NET_BIND_SERVICE
User=orama
Group=orama
NoNewPrivileges=yes
ProtectSystem=strict
ProtectHome=yes
PrivateTmp=yes
LimitNOFILE=65536
TimeoutStopSec=15s
KillMode=mixed
KillSignal=SIGTERM
Restart=on-failure
RestartSec=5s
StandardOutput=journal
StandardError=journal
SyslogIdentifier=orama-sni-router
[Install]
WantedBy=multi-user.target
`, sniRouterBinaryPath, si.configPath())
}
// WriteSystemdUnit writes the unit file. Idempotent.
func (si *SNIRouterInstaller) WriteSystemdUnit() error {
if err := os.WriteFile(sniRouterSystemdUnitPath, []byte(si.generateSystemdUnit()), 0644); err != nil {
return fmt.Errorf("sni-router: write systemd unit %s: %w", sniRouterSystemdUnitPath, err)
}
return nil
}
// IsInstalled reports whether the router binary is present on the node.
func (si *SNIRouterInstaller) IsInstalled() bool {
_, err := os.Stat(sniRouterBinaryPath)
return err == nil
}

View File

@ -0,0 +1,102 @@
package installers
import (
"io"
"os"
"path/filepath"
"strings"
"testing"
)
// newTestSNIRouterInstaller returns an installer rooted at a temp oramaDir so
// Configure writes to an isolated location.
func newTestSNIRouterInstaller(oramaDir string) *SNIRouterInstaller {
return NewSNIRouterInstaller("amd64", io.Discard, oramaDir)
}
// TestGenerateConfig_includesDiscoveryAndFallback verifies the rendered
// sni-router.yaml binds :443, falls back to Caddy on the moved HTTPS port, and
// emits a turn_discovery block pointing at the node's namespaces dir + base
// domain.
func TestGenerateConfig_includesDiscoveryAndFallback(t *testing.T) {
dir := t.TempDir()
si := newTestSNIRouterInstaller(dir)
cfg := si.generateConfig("orama-devnet.network")
for _, want := range []string{
`listen: ":443"`,
"fallback:",
`addr: "127.0.0.1:8443"`,
"turn_discovery:",
"base_domain: \"orama-devnet.network\"",
"rescan_interval: 30s",
"routes: []",
} {
if !strings.Contains(cfg, want) {
t.Errorf("generated sni-router config missing %q\n---\n%s", want, cfg)
}
}
// namespaces_dir must be the node's data/namespaces path.
wantNS := filepath.Join(dir, "data", "namespaces")
if !strings.Contains(cfg, wantNS) {
t.Errorf("config missing namespaces_dir %q\n---\n%s", wantNS, cfg)
}
}
// TestConfigure_writesFileToConfigsDir verifies Configure persists the YAML to
// <oramaDir>/configs/sni-router.yaml.
func TestConfigure_writesFileToConfigsDir(t *testing.T) {
dir := t.TempDir()
si := newTestSNIRouterInstaller(dir)
if err := si.Configure("example.com"); err != nil {
t.Fatalf("Configure failed: %v", err)
}
path := filepath.Join(dir, "configs", "sni-router.yaml")
data, err := os.ReadFile(path)
if err != nil {
t.Fatalf("expected config at %s: %v", path, err)
}
if !strings.Contains(string(data), "base_domain: \"example.com\"") {
t.Errorf("written config missing base_domain; got:\n%s", string(data))
}
}
// TestConfigure_rejectsEmptyBaseDomain verifies the installer refuses an empty
// base domain rather than emitting a config that would derive bogus hostnames.
func TestConfigure_rejectsEmptyBaseDomain(t *testing.T) {
si := newTestSNIRouterInstaller(t.TempDir())
if err := si.Configure(""); err == nil {
t.Errorf("expected error for empty base domain")
}
}
// TestGenerateSystemdUnit_shape verifies the unit grants CAP_NET_BIND_SERVICE,
// runs as orama, restarts on failure, and points ExecStart at the installed
// binary + config.
func TestGenerateSystemdUnit_shape(t *testing.T) {
dir := t.TempDir()
si := newTestSNIRouterInstaller(dir)
unit := si.generateSystemdUnit()
for _, want := range []string{
"AmbientCapabilities=CAP_NET_BIND_SERVICE",
"User=orama",
"Restart=on-failure",
"EnvironmentFile=-/opt/orama/.orama/data/sni-router.env",
// ExecStart must point at the ABSOLUTE config path so it doesn't
// depend on WorkingDirectory/$HOME resolution at runtime.
"ExecStart=/opt/orama/bin/orama-sni-router --config " + si.configPath(),
"Before=caddy.service",
} {
if !strings.Contains(unit, want) {
t.Errorf("systemd unit missing %q\n---\n%s", want, unit)
}
}
if !strings.Contains(si.configPath(), dir) {
t.Errorf("configPath %q not rooted at the oramaDir %q", si.configPath(), dir)
}
}

View File

@ -593,6 +593,20 @@ func (ps *ProductionSetup) Phase3GenerateSecrets() error {
}
ps.logf(" ✓ API key HMAC secret ensured")
// Serverless function secrets encryption key (bugboard #837)
if _, err := ps.secretGenerator.EnsureSecretsEncryptionKey(); err != nil {
return fmt.Errorf("failed to ensure secrets encryption key: %w", err)
}
ps.logf(" ✓ Secrets encryption key ensured")
// WebRTC TURN shared secret (feat-124 #913). Persisting it here lets the
// TURN config survive Phase4 config regeneration so namespace gateways are
// never restarted with an empty turn_secret (the AnChat outage).
if _, err := ps.secretGenerator.EnsureTURNSecret(); err != nil {
return fmt.Errorf("failed to ensure TURN secret: %w", err)
}
ps.logf(" ✓ TURN secret ensured")
// Node identity (unified architecture)
peerID, err := ps.secretGenerator.EnsureNodeIdentity()
if err != nil {
@ -727,11 +741,35 @@ func (ps *ProductionSetup) Phase4GenerateConfigs(peerAddresses []string, vpsIP s
ps.logf(" ✓ ntfy config generated (base_url: %s)", ntfyBaseURL)
}
// Stealth TURN-over-443 (feat-124): when the node opted in
// (sni_router.enabled in the node.yaml just written above), Caddy
// must vacate :443 so the orama-sni-router can own it. Move Caddy's
// HTTPS listener to :8443 BEFORE ConfigureCaddy renders the Caddyfile.
// When not opted in, the Caddyfile is byte-identical to before.
if ps.configGenerator.SNIRouterEnabled() {
ps.binaryInstaller.EnableCaddySNIRouterMode()
ps.logf(" ✓ SNI router enabled — Caddy HTTPS will bind :8443")
}
if err := ps.binaryInstaller.ConfigureCaddy(caddyDomain, email, acmeEndpoint, baseDomain); err != nil {
ps.logf(" ⚠️ Caddy config warning: %v", err)
} else {
ps.logf(" ✓ Caddy config generated")
}
// Stealth TURN-over-443 (feat-124): when opted in, write the
// orama-sni-router config (listen :443, fallback Caddy :8443,
// turn_discovery scanning this node's namespaces dir for the cluster's
// base domain). The unit lifecycle is driven in Phase5 after Caddy has
// moved to :8443. The router uses the base domain as the zone for
// stealth/turn.ns-* hostnames.
if ps.configGenerator.SNIRouterEnabled() {
if err := ps.binaryInstaller.ConfigureSNIRouter(dnsZone); err != nil {
ps.logf(" ⚠️ SNI router config warning: %v", err)
} else {
ps.logf(" ✓ SNI router config generated (zone: %s)", dnsZone)
}
}
}
return nil
@ -857,6 +895,14 @@ func (ps *ProductionSetup) Phase5CreateSystemdServices(enableHTTPS bool) error {
}
}
// SNI router unit (feat-124). Write the unit whenever the binary is present
// so the daemon-reload below picks it up; the enable/start vs stop/disable
// decision (based on sni_router.enabled) happens after Caddy has moved to
// :8443, in the start section.
if ps.binaryInstaller.WriteSNIRouterUnit() == nil {
ps.logf(" ✓ SNI router service unit created: %s", ps.binaryInstaller.SNIRouterServiceName())
}
// Reload systemd daemon
if err := ps.serviceController.DaemonReload(); err != nil {
return fmt.Errorf("failed to reload systemd: %w", err)
@ -966,6 +1012,31 @@ func (ps *ProductionSetup) Phase5CreateSystemdServices(enableHTTPS bool) error {
}
}
// Stealth TURN-over-443 (feat-124) cutover. Caddy has just been
// reconfigured to :8443 and restarted above, so :443 is now free for the
// SNI router. When opted in, enable+start the router; when not, stop+disable
// it so a node that flipped the flag off cleanly returns :443 to Caddy.
sniSvc := ps.binaryInstaller.SNIRouterServiceName()
if ps.configGenerator.SNIRouterEnabled() {
if err := ps.serviceController.EnableService(sniSvc); err != nil {
ps.logf(" ⚠️ Failed to enable %s: %v", sniSvc, err)
}
if err := ps.serviceController.RestartService(sniSvc); err != nil {
ps.logf(" ⚠️ Failed to start %s: %v", sniSvc, err)
} else {
ps.logf(" - %s started (owns :443)", sniSvc)
}
} else {
// Not opted in: ensure the router is not holding :443. Errors are
// non-fatal — the unit may simply not be loaded on this node.
if err := ps.serviceController.StopService(sniSvc); err != nil {
ps.logf(" %s not running (expected when disabled): %v", sniSvc, err)
}
if err := ps.serviceController.DisableService(sniSvc); err != nil {
ps.logf(" %s not enabled (expected when disabled): %v", sniSvc, err)
}
}
// Start ntfy on every node (#72). Caddy must already be up (it
// terminates TLS for push.<dnsZone>), which the order above
// guarantees.

View File

@ -147,6 +147,22 @@ func (ps *ProductionSetup) installFromPreBuilt(manifest *PreBuiltManifest) error
return fmt.Errorf("failed to set capabilities: %w", err)
}
// Install ntfy on every node (feature #72). ntfy is not bundled in
// the pre-built archive — its installer downloads from upstream and
// verifies the SHA-256 checksum. ntfy listens on
// 127.0.0.1:NtfyListenPort only (no public exposure), so it's safe
// to run cluster-wide; nodes that don't serve a public push.* DNS
// entry just have an idle ntfy with no inbound traffic. Uniform
// install means no per-node toggling and no surprises when DNS
// topology changes.
//
// Note: this must run BEFORE Phase 4's ConfigureNtfy, otherwise the
// chown of /etc/ntfy/server.yml fails because the `ntfy` user
// doesn't exist yet.
if err := ps.binaryInstaller.InstallNtfy(); err != nil {
ps.logf(" ⚠️ ntfy install warning: %v", err)
}
// Disable systemd-resolved stub listener for nameserver nodes
// (needed even in pre-built mode so CoreDNS can bind port 53)
if ps.isNameserver {

View File

@ -0,0 +1,80 @@
package production
import (
"encoding/hex"
"os"
"path/filepath"
"strings"
"testing"
)
// TestEnsureSecretsEncryptionKey_generatesAndPersists verifies that a fresh
// oramaDir produces a valid 32-byte hex key written to disk.
func TestEnsureSecretsEncryptionKey_generatesAndPersists(t *testing.T) {
dir := t.TempDir()
sg := NewSecretGenerator(dir)
key, err := sg.EnsureSecretsEncryptionKey()
if err != nil {
t.Fatalf("EnsureSecretsEncryptionKey failed: %v", err)
}
if len(key) != 64 {
t.Fatalf("expected 64 hex chars, got %d (%q)", len(key), key)
}
raw, err := hex.DecodeString(key)
if err != nil || len(raw) != 32 {
t.Fatalf("key is not 32 bytes hex: err=%v len=%d", err, len(raw))
}
// Persisted to the expected path.
data, err := os.ReadFile(filepath.Join(dir, "secrets", "secrets-encryption-key"))
if err != nil {
t.Fatalf("reading persisted key failed: %v", err)
}
if strings.TrimSpace(string(data)) != key {
t.Errorf("persisted key %q != returned key %q", strings.TrimSpace(string(data)), key)
}
}
// TestEnsureSecretsEncryptionKey_idempotent verifies the key is stable across
// calls — this is the property that makes secrets survive restarts and stay
// identical across cluster nodes (bugboard #837).
func TestEnsureSecretsEncryptionKey_idempotent(t *testing.T) {
dir := t.TempDir()
sg := NewSecretGenerator(dir)
first, err := sg.EnsureSecretsEncryptionKey()
if err != nil {
t.Fatalf("first call failed: %v", err)
}
second, err := sg.EnsureSecretsEncryptionKey()
if err != nil {
t.Fatalf("second call failed: %v", err)
}
if first != second {
t.Errorf("key changed between calls: %q != %q", first, second)
}
}
// TestEnsureSecretsEncryptionKey_regeneratesInvalid verifies a corrupt/empty
// on-disk key (wrong length) is replaced with a fresh valid one.
func TestEnsureSecretsEncryptionKey_regeneratesInvalid(t *testing.T) {
dir := t.TempDir()
secretsDir := filepath.Join(dir, "secrets")
if err := os.MkdirAll(secretsDir, 0700); err != nil {
t.Fatalf("mkdir failed: %v", err)
}
keyPath := filepath.Join(secretsDir, "secrets-encryption-key")
if err := os.WriteFile(keyPath, []byte("too-short"), 0600); err != nil {
t.Fatalf("write failed: %v", err)
}
sg := NewSecretGenerator(dir)
key, err := sg.EnsureSecretsEncryptionKey()
if err != nil {
t.Fatalf("EnsureSecretsEncryptionKey failed: %v", err)
}
if len(key) != 64 {
t.Errorf("expected regenerated 64-char key, got %d (%q)", len(key), key)
}
}

View File

@ -0,0 +1,72 @@
package production
import (
"strings"
"testing"
)
// TestGenerateNodeConfig_preservesSNIRouterEnabled is the regression test for
// the feat-124 regen-wipe class of outage (cf. bugboard #259/#846 for webrtc):
// a config regeneration must NOT silently reset an operator's
// sni_router.enabled: true back to false, which would stop the :443 router and
// break stealth TURN. We write a node.yaml with the flag set, regenerate, and
// assert it survives.
func TestGenerateNodeConfig_preservesSNIRouterEnabled(t *testing.T) {
dir := t.TempDir()
writeNodeYAML(t, dir, `sni_router:
enabled: true
http_gateway:
enabled: true
`)
cg := NewConfigGenerator(dir)
out, err := cg.GenerateNodeConfig(nil, "10.0.0.5", "", "node-1.dbrs.space", "dbrs.space", false)
if err != nil {
t.Fatalf("GenerateNodeConfig failed: %v", err)
}
if !strings.Contains(out, "sni_router:") {
t.Fatalf("regenerated node.yaml missing sni_router block\n---\n%s", out)
}
if !strings.Contains(out, "enabled: true") {
t.Errorf("regenerated node.yaml did not preserve sni_router.enabled: true\n---\n%s", out)
}
}
// TestGenerateNodeConfig_sniRouterDefaultsFalse verifies a fresh install (no
// existing node.yaml) renders sni_router.enabled: false — default OFF.
func TestGenerateNodeConfig_sniRouterDefaultsFalse(t *testing.T) {
dir := t.TempDir()
cg := NewConfigGenerator(dir)
out, err := cg.GenerateNodeConfig(nil, "10.0.0.5", "", "node-1.dbrs.space", "dbrs.space", false)
if err != nil {
t.Fatalf("GenerateNodeConfig failed: %v", err)
}
if !strings.Contains(out, "sni_router:") {
t.Fatalf("node.yaml missing sni_router block\n---\n%s", out)
}
if !strings.Contains(out, "enabled: false") {
t.Errorf("fresh node.yaml should render sni_router.enabled: false\n---\n%s", out)
}
if cg.SNIRouterEnabled() {
t.Errorf("SNIRouterEnabled() should be false on a fresh install")
}
}
// TestGenerateNodeConfig_sniRouterDisabledStaysFalse verifies an existing
// node.yaml that explicitly disabled the router does not flip on during regen.
func TestGenerateNodeConfig_sniRouterDisabledStaysFalse(t *testing.T) {
dir := t.TempDir()
writeNodeYAML(t, dir, "sni_router:\n enabled: false\nhttp_gateway:\n enabled: true\n")
cg := NewConfigGenerator(dir)
out, err := cg.GenerateNodeConfig(nil, "10.0.0.5", "", "node-1.dbrs.space", "dbrs.space", false)
if err != nil {
t.Fatalf("GenerateNodeConfig failed: %v", err)
}
if !strings.Contains(out, "enabled: false") {
t.Errorf("disabled sni_router should stay false on regen\n---\n%s", out)
}
}

View File

@ -0,0 +1,190 @@
package production
import (
"encoding/hex"
"os"
"path/filepath"
"strings"
"testing"
)
// TestEnsureTURNSecret_generatesAndPersists verifies that a fresh oramaDir
// produces a valid 32-byte hex secret written to secrets/turn-secret.
func TestEnsureTURNSecret_generatesAndPersists(t *testing.T) {
dir := t.TempDir()
sg := NewSecretGenerator(dir)
secret, err := sg.EnsureTURNSecret()
if err != nil {
t.Fatalf("EnsureTURNSecret failed: %v", err)
}
if len(secret) != 64 {
t.Fatalf("expected 64 hex chars, got %d (%q)", len(secret), secret)
}
raw, err := hex.DecodeString(secret)
if err != nil || len(raw) != 32 {
t.Fatalf("secret is not 32 bytes hex: err=%v len=%d", err, len(raw))
}
data, err := os.ReadFile(filepath.Join(dir, "secrets", "turn-secret"))
if err != nil {
t.Fatalf("reading persisted secret failed: %v", err)
}
if strings.TrimSpace(string(data)) != secret {
t.Errorf("persisted secret %q != returned secret %q", strings.TrimSpace(string(data)), secret)
}
}
// TestEnsureTURNSecret_idempotent verifies the secret is stable across calls —
// the property that keeps TURN credentials valid across restarts and identical
// across cluster nodes (feat-124 #913).
func TestEnsureTURNSecret_idempotent(t *testing.T) {
dir := t.TempDir()
sg := NewSecretGenerator(dir)
first, err := sg.EnsureTURNSecret()
if err != nil {
t.Fatalf("first call failed: %v", err)
}
second, err := sg.EnsureTURNSecret()
if err != nil {
t.Fatalf("second call failed: %v", err)
}
if first != second {
t.Errorf("secret changed between calls: %q != %q", first, second)
}
}
// TestEnsureTURNSecret_regeneratesInvalid verifies a corrupt/short on-disk
// secret is replaced with a fresh valid one.
func TestEnsureTURNSecret_regeneratesInvalid(t *testing.T) {
dir := t.TempDir()
secretsDir := filepath.Join(dir, "secrets")
if err := os.MkdirAll(secretsDir, 0700); err != nil {
t.Fatalf("mkdir failed: %v", err)
}
if err := os.WriteFile(filepath.Join(secretsDir, "turn-secret"), []byte("too-short"), 0600); err != nil {
t.Fatalf("write failed: %v", err)
}
sg := NewSecretGenerator(dir)
secret, err := sg.EnsureTURNSecret()
if err != nil {
t.Fatalf("EnsureTURNSecret failed: %v", err)
}
if len(secret) != 64 {
t.Errorf("expected regenerated 64-char secret, got %d (%q)", len(secret), secret)
}
}
// writeNodeYAML is a test helper that writes content to the canonical node
// config path the config generator reads/writes.
func writeNodeYAML(t *testing.T, oramaDir, content string) {
t.Helper()
configDir := filepath.Join(oramaDir, "configs")
if err := os.MkdirAll(configDir, 0755); err != nil {
t.Fatalf("mkdir configs failed: %v", err)
}
if err := os.WriteFile(filepath.Join(configDir, "node.yaml"), []byte(content), 0644); err != nil {
t.Fatalf("write node.yaml failed: %v", err)
}
}
// TestGenerateNodeConfig_preservesExistingWebRTC is the regression test for the
// feat-124 #913 outage: a regen must NOT wipe an operator's webrtc block. We
// write a node.yaml with a full webrtc block, regenerate, and assert the block
// (enabled, sfu_port, turn_domain, turn_secret) survives — and that the secret
// gets persisted to the durable secrets file.
func TestGenerateNodeConfig_preservesExistingWebRTC(t *testing.T) {
const turnSecret = "0123456789abcdef0123456789abcdef0123456789abcdef0123456789abcdef"
const turnDomain = "turn.ns-anchat.dbrs.space"
dir := t.TempDir()
writeNodeYAML(t, dir, `http_gateway:
enabled: true
webrtc:
enabled: true
sfu_port: 30007
turn_domain: "turn.ns-anchat.dbrs.space"
turn_secret: "`+turnSecret+`"
`)
cg := NewConfigGenerator(dir)
out, err := cg.GenerateNodeConfig(nil, "10.0.0.5", "", "node-1.dbrs.space", "dbrs.space", false)
if err != nil {
t.Fatalf("GenerateNodeConfig failed: %v", err)
}
for _, want := range []string{
"webrtc:",
"turn_secret: \"" + turnSecret + "\"",
"turn_domain: \"" + turnDomain + "\"",
"sfu_port: 30007",
} {
if !strings.Contains(out, want) {
t.Errorf("regenerated node.yaml missing %q\n---\n%s", want, out)
}
}
// The secret must now be durable in the secrets file (yaml-had-secret →
// file gets persisted), so the NEXT regen survives even if the operator's
// yaml is gone.
persisted, err := os.ReadFile(filepath.Join(dir, "secrets", "turn-secret"))
if err != nil {
t.Fatalf("TURN secret was not persisted to secrets dir: %v", err)
}
if strings.TrimSpace(string(persisted)) != turnSecret {
t.Errorf("persisted secret %q != yaml secret %q", strings.TrimSpace(string(persisted)), turnSecret)
}
}
// TestGenerateNodeConfig_persistedSecretSurvivesWipedYAML verifies the durable
// mechanism: once the secret is in secrets/turn-secret, a regen from a node.yaml
// that LOST its webrtc block still renders turn_secret (defaulting sfu_port).
func TestGenerateNodeConfig_persistedSecretSurvivesWipedYAML(t *testing.T) {
const turnSecret = "abcdef0123456789abcdef0123456789abcdef0123456789abcdef0123456789"
dir := t.TempDir()
secretsDir := filepath.Join(dir, "secrets")
if err := os.MkdirAll(secretsDir, 0700); err != nil {
t.Fatalf("mkdir secrets failed: %v", err)
}
if err := os.WriteFile(filepath.Join(secretsDir, "turn-secret"), []byte(turnSecret), 0600); err != nil {
t.Fatalf("write turn-secret failed: %v", err)
}
// Existing node.yaml with NO webrtc block (simulates the wiped state).
writeNodeYAML(t, dir, "http_gateway:\n enabled: true\n")
cg := NewConfigGenerator(dir)
out, err := cg.GenerateNodeConfig(nil, "10.0.0.5", "", "node-1.dbrs.space", "dbrs.space", false)
if err != nil {
t.Fatalf("GenerateNodeConfig failed: %v", err)
}
if !strings.Contains(out, "turn_secret: \""+turnSecret+"\"") {
t.Errorf("rendered node.yaml missing persisted turn_secret\n---\n%s", out)
}
// sfu_port had no source → defaults to the named constant.
if !strings.Contains(out, "sfu_port: 30000") {
t.Errorf("expected default sfu_port 30000, got:\n%s", out)
}
}
// TestGenerateNodeConfig_noWebRTCOmitsBlock verifies clusters without any TURN
// config render no webrtc block at all (no empty values leak in).
func TestGenerateNodeConfig_noWebRTCOmitsBlock(t *testing.T) {
dir := t.TempDir()
cg := NewConfigGenerator(dir)
out, err := cg.GenerateNodeConfig(nil, "10.0.0.5", "", "node-1.dbrs.space", "dbrs.space", false)
if err != nil {
t.Fatalf("GenerateNodeConfig failed: %v", err)
}
if strings.Contains(out, "webrtc:") {
t.Errorf("expected no webrtc block when no TURN config present, got:\n%s", out)
}
// Sanity: ensure no orphan turn-secret file was created.
if _, err := os.Stat(filepath.Join(dir, "secrets", "turn-secret")); !os.IsNotExist(err) {
t.Errorf("turn-secret file should not exist when no TURN config present")
}
}

View File

@ -15,6 +15,14 @@ node:
operator_wallet: "{{.OperatorWallet}}"
{{- end}}
# Stealth TURN-over-443 SNI router (feat-124). When enabled, the node runs
# orama-sni-router on :443 and Caddy is moved to :8443; default-OFF so existing
# nodes are byte-identical until an operator opts in. This block is preserved
# across config regeneration (GenerateNodeConfig carries forward an existing
# sni_router.enabled: true).
sni_router:
enabled: {{if .SNIRouterEnabled}}true{{else}}false{{end}}
database:
data_dir: "{{.DataDir}}/rqlite"
replication_factor: 3
@ -88,6 +96,22 @@ http_gateway:
ipfs_cluster_api_url: "http://localhost:{{.ClusterAPIPort}}"
ipfs_api_url: "http://localhost:{{.IPFSAPIPort}}"
ipfs_timeout: "60s"
{{- if .SecretsEncryptionKey}}
# Serverless function secrets encryption key (AES-256, hex). Must be
# identical on every namespace-gateway node and stable across restarts
# (bugboard #837). Sourced from ~/.orama/secrets/secrets-encryption-key.
secrets_encryption_key: "{{.SecretsEncryptionKey}}"
{{- end}}
{{- if .TURNSecret}}
# WebRTC/TURN config (feat-124 #913). turn_secret is sourced from
# ~/.orama/secrets/turn-secret so it survives config regeneration;
# turn_domain/sfu_port are carried forward from the previous node.yaml.
webrtc:
enabled: true
sfu_port: {{.SFUPort}}
turn_domain: "{{.TURNDomain}}"
turn_secret: "{{.TURNSecret}}"
{{- end}}
# Routes for internal service reverse proxy (kept for backwards compatibility but not used by full gateway)
routes: {}

View File

@ -46,6 +46,36 @@ type NodeConfigData struct {
SSHUser string // SSH user for remote management
Environment string // Environment name (devnet, testnet, etc.)
OperatorWallet string // Operator wallet address
// SecretsEncryptionKey is the AES-256 key (hex, 64 chars) used to encrypt
// serverless function secrets at rest. Rendered under http_gateway in
// node.yaml. Sourced from ~/.orama/secrets/secrets-encryption-key — must
// be identical across all namespace-gateway nodes in a cluster and stable
// across restarts (bugboard #837). Empty → key omitted from the rendered
// config (the gateway then reads the secret file directly / get_secret
// stays disabled until the key is configured).
SecretsEncryptionKey string
// WebRTC/TURN configuration, rendered under http_gateway.webrtc when
// WebRTCEnabled is true (feat-124 #913). TURNSecret is sourced from
// ~/.orama/secrets/turn-secret so it survives Phase4 config regeneration;
// TURNDomain/SFUPort are operator-set values carried forward from the
// existing node.yaml. The whole block is conditional on TURNSecret being
// set — clusters without TURN render nothing.
WebRTCEnabled bool // Whether to emit the webrtc block
SFUPort int // Local SFU signaling port the gateway proxies to
TURNDomain string // TURN domain (e.g., "turn.ns-myapp.dbrs.space")
TURNSecret string // HMAC-SHA1 shared secret for TURN credential generation
// SNIRouterEnabled gates the stealth TURN-over-443 SNI router (feat-124).
// Rendered as the top-level sni_router.enabled flag. Default false keeps
// existing nodes byte-identical (Caddy stays on :443); when true the node
// runs orama-sni-router on :443 and Caddy moves to :8443. This value is
// carried forward across config regeneration from the existing node.yaml
// (see production/config.go populateSNIRouterConfig) so a regen never wipes
// an operator's opt-in (the same preserve-from-existing discipline as the
// webrtc block, bugboard #259/#846).
SNIRouterEnabled bool
}
// GatewayConfigData holds parameters for gateway.yaml rendering

View File

@ -41,6 +41,98 @@ func TestRenderNodeConfig(t *testing.T) {
}
}
func TestRenderNodeConfig_secretsEncryptionKey(t *testing.T) {
const key = "0123456789abcdef0123456789abcdef0123456789abcdef0123456789abcdef"
// Happy path: key present → rendered under http_gateway.
withKey, err := RenderNodeConfig(NodeConfigData{
NodeID: "node1",
SecretsEncryptionKey: key,
})
if err != nil {
t.Fatalf("RenderNodeConfig failed: %v", err)
}
want := "secrets_encryption_key: \"" + key + "\""
if !strings.Contains(withKey, want) {
t.Errorf("rendered node config missing secrets key line %q\n---\n%s", want, withKey)
}
// Edge case: empty key → line omitted entirely (no empty value rendered).
withoutKey, err := RenderNodeConfig(NodeConfigData{NodeID: "node1"})
if err != nil {
t.Fatalf("RenderNodeConfig failed: %v", err)
}
if strings.Contains(withoutKey, "secrets_encryption_key") {
t.Errorf("empty key should omit secrets_encryption_key line, got:\n%s", withoutKey)
}
}
func TestRenderNodeConfig_webRTC(t *testing.T) {
const secret = "0123456789abcdef0123456789abcdef0123456789abcdef0123456789abcdef"
// Happy path: TURN secret present → full webrtc block rendered.
withWebRTC, err := RenderNodeConfig(NodeConfigData{
NodeID: "node1",
WebRTCEnabled: true,
SFUPort: 30007,
TURNDomain: "turn.ns-anchat.dbrs.space",
TURNSecret: secret,
})
if err != nil {
t.Fatalf("RenderNodeConfig failed: %v", err)
}
for _, want := range []string{
"webrtc:",
"enabled: true",
"sfu_port: 30007",
"turn_domain: \"turn.ns-anchat.dbrs.space\"",
"turn_secret: \"" + secret + "\"",
} {
if !strings.Contains(withWebRTC, want) {
t.Errorf("rendered node config missing webrtc line %q\n---\n%s", want, withWebRTC)
}
}
// Edge case: no TURN secret → block omitted entirely.
withoutWebRTC, err := RenderNodeConfig(NodeConfigData{NodeID: "node1"})
if err != nil {
t.Fatalf("RenderNodeConfig failed: %v", err)
}
if strings.Contains(withoutWebRTC, "webrtc:") {
t.Errorf("empty TURN secret should omit webrtc block, got:\n%s", withoutWebRTC)
}
}
func TestRenderNodeConfig_sniRouter(t *testing.T) {
// Enabled: top-level sni_router block renders enabled: true.
enabled, err := RenderNodeConfig(NodeConfigData{
NodeID: "node1",
SNIRouterEnabled: true,
})
if err != nil {
t.Fatalf("RenderNodeConfig failed: %v", err)
}
if !strings.Contains(enabled, "sni_router:") {
t.Errorf("rendered node config missing sni_router block\n---\n%s", enabled)
}
if !strings.Contains(enabled, "enabled: true") {
t.Errorf("sni_router should render enabled: true\n---\n%s", enabled)
}
// Default: the block is always present, defaulting to false (so the flag is
// discoverable to operators and round-trips through regen).
disabled, err := RenderNodeConfig(NodeConfigData{NodeID: "node1"})
if err != nil {
t.Fatalf("RenderNodeConfig failed: %v", err)
}
if !strings.Contains(disabled, "sni_router:") {
t.Errorf("sni_router block should always be present\n---\n%s", disabled)
}
if !strings.Contains(disabled, "enabled: false") {
t.Errorf("default sni_router should render enabled: false\n---\n%s", disabled)
}
}
func TestRenderGatewayConfig(t *testing.T) {
bootstrapMultiaddr := "/ip4/127.0.0.1/tcp/4001/p2p/Qm1234567890"
data := GatewayConfigData{

View File

@ -51,11 +51,27 @@ type Config struct {
// Loaded from ~/.orama/secrets/api-key-hmac-secret.
APIKeyHMACSecret string
// WebRTC configuration (set when namespace has WebRTC enabled)
WebRTCEnabled bool // Whether WebRTC endpoints are active on this gateway
SFUPort int // Local SFU signaling port to proxy WebSocket connections to
// SecretsEncryptionKey is the AES-256 key (32 bytes, hex-encoded → 64
// hex chars) used to encrypt serverless function secrets at rest in the
// function_secrets table. It MUST be identical on every namespace-gateway
// node in a cluster and stable across restarts — otherwise secrets
// encrypted by one process cannot be decrypted by another (bugboard #837).
// Loaded from ~/.orama/secrets/secrets-encryption-key.
SecretsEncryptionKey string
// WebRTC configuration (set when namespace has WebRTC enabled).
//
// WebRTCEnabled is RETAINED for back-compat with operator YAML and
// the spawn-handler request shape, but no longer gates route
// registration (bugboard #411). Routes auto-register whenever
// SFUPort > 0 — the actual operational prerequisite. Validate still
// uses WebRTCEnabled to enforce "if you opted in, you MUST set the
// dependent fields", which catches obvious YAML typos at config
// load.
WebRTCEnabled bool // legacy opt-in; routes auto-register when SFUPort>0 regardless. Kept for back-compat.
SFUPort int // Local SFU signaling port to proxy WebSocket connections to. >0 = WebRTC routes registered.
TURNDomain string // TURN server domain for credential generation
TURNSecret string // HMAC-SHA1 shared secret for TURN credential generation
TURNSecret string // HMAC-SHA1 shared secret for TURN credential generation (empty → /v1/webrtc/turn/credentials returns 503)
// StealthCDNDomain, when set, makes the WebRTC credentials handler
// advertise turns:<StealthCDNDomain>:443 (served by the SNI router).

View File

@ -468,10 +468,25 @@ func initializeServerless(logger *logging.ColoredLogger, cfg *Config, deps *Depe
engineCfg.DefaultTimeoutSeconds = 30
engineCfg.MaxTimeoutSeconds = 60
engineCfg.ModuleCacheSize = 100
// Surface the per-phase slow-invoke diagnostic (instantiate_ms / run_ms)
// above 1s instead of the 5s default — a >1s serverless invocation is
// genuinely slow (well-built handlers are <300ms), and this makes the
// cold-start floor (bugboard #27: async-dispatched stateless handlers pay a
// fresh instantiate + TinyGo _start per call) visible for correlation
// against client-side request_ids.
engineCfg.SlowInvokeThresholdMs = 1000
// Create secrets manager for serverless functions (AES-256-GCM encrypted)
// Create secrets manager for serverless functions (AES-256-GCM encrypted).
//
// The encryption key comes from the gateway Config (loaded from
// ~/.orama/secrets/secrets-encryption-key), NOT from engineCfg — engineCfg
// never has the key set, so passing it always produced a per-process
// ephemeral key and made get_secret return undecryptable values
// (bugboard #837). allowEphemeral=false: a missing/invalid key fails
// loudly here and disables get_secret rather than silently corrupting
// secrets.
var secretsMgr serverless.SecretsManager
if smImpl, secretsErr := hostfunctions.NewDBSecretsManager(deps.ORMClient, engineCfg.SecretsEncryptionKey, logger.Logger); secretsErr != nil {
if smImpl, secretsErr := hostfunctions.NewDBSecretsManager(deps.ORMClient, cfg.SecretsEncryptionKey, false, logger.Logger); secretsErr != nil {
logger.ComponentWarn(logging.ComponentGeneral, "Failed to initialize secrets manager; get_secret will be unavailable",
zap.Error(secretsErr))
} else {
@ -506,6 +521,12 @@ func initializeServerless(logger *logging.ColoredLogger, cfg *Config, deps *Depe
hostFuncsCfg := hostfunctions.HostFunctionsConfig{
IPFSAPIURL: cfg.IPFSAPIURL,
HTTPTimeout: 30 * time.Second,
// feat-9 — TURN config for the turn_credentials host fn.
// Empty TURNSecret → host fn returns {configured:false} envelope
// (same shape as the HTTP endpoint's 503 semantically).
TURNDomain: cfg.TURNDomain,
TURNSecret: cfg.TURNSecret,
StealthCDNDomain: cfg.StealthCDNDomain,
}
// WS-PubSub bridge: wire PubSub topics directly to WS clients without
// per-event WASM invocation. The bridge is a thin layer over the
@ -558,13 +579,25 @@ func initializeServerless(logger *logging.ColoredLogger, cfg *Config, deps *Depe
if deps.OlricClient != nil {
olricUnderlying = deps.OlricClient.UnderlyingClient()
}
// Pass the pubsub adapter so the dispatcher can subscribe to libp2p
// for every literal trigger pattern (bugboard #282 fix). nil-safe:
// dispatcher's Start/Refresh become no-ops when adapter is unavailable,
// preserving the legacy HTTP-only Dispatch hook.
deps.PubSubDispatcher = triggers.NewPubSubDispatcher(
triggerStore,
deps.ServerlessInvoker,
olricUnderlying,
pubsubAdapter,
logger.Logger,
)
// Wire the dispatcher into hostFuncs so PubSubPublish /
// PubSubPublishBatch fire local wildcard triggers immediately on
// publish — closes the bugboard #93 gap where WASM publishes to e.g.
// "presence:user-1" never reached wildcard handlers like "presence:*"
// because libp2p has no wildcard subscribe.
hostFuncs.SetTriggerDispatcher(deps.PubSubDispatcher)
// Cron trigger store + scheduler. The scheduler polls
// function_cron_triggers and invokes due rows via the same
// ServerlessInvoker used for PubSub triggers; the ↓ Start call wires
@ -964,7 +997,14 @@ func buildPushDispatcher(
}, logger.Logger))
}
// APNs is fully credentialed — no YAML fallback. The presence of
// per-namespace credentials is the trigger.
// per-namespace credentials is the trigger. Bugboard #408: a
// single set of APNs credentials spawns BOTH an alert-kind
// provider (registered as "apns") AND a VoIP/PushKit provider
// (registered as "apns_voip"). Both share the same JWT signer +
// HTTP/2 client pool — VoIP only differs in the per-Send wire
// format (topic suffix, apns-push-type header, empty-payload
// acceptance). Tenants register PushKit voipPushTokens against
// provider="apns_voip" and the dispatcher routes accordingly.
if c.Namespace != "" && credManager != nil {
if cred, err := credManager.Get(ctx, c.Namespace, "apns"); err == nil && cred != nil {
if apnsCfg, perr := pushapns.ParseCredentials(cred.JSON); perr == nil {
@ -976,6 +1016,14 @@ func buildPushDispatcher(
zap.String("namespace", c.Namespace),
zap.Error(nerr))
}
if voipProvider, nerr := pushapns.NewVoIP(apnsCfg, logger.Logger); nerr == nil {
ps = append(ps, voipProvider)
} else {
logger.ComponentWarn(logging.ComponentGeneral,
"apns_voip provider construction failed",
zap.String("namespace", c.Namespace),
zap.Error(nerr))
}
} else {
logger.ComponentWarn(logging.ComponentGeneral,
"apns credentials parse failed",

View File

@ -13,8 +13,6 @@ import (
"net/http"
"path/filepath"
"reflect"
"strconv"
"strings"
"sync"
"time"
@ -144,6 +142,14 @@ type Gateway struct {
// WebRTC signaling and TURN credentials
webrtcHandlers *webrtchandlers.WebRTCHandlers
// webrtcServeTURNCredentials gates the /v1/webrtc/turn/credentials
// route; webrtcServeSFURoutes gates /v1/webrtc/signal + /rooms.
// Decoupled (bugboard #25): TURN credentials only need the namespace
// TURN secret (the actual TURN servers are remote), so a gateway node
// that doesn't run a local SFU can still mint credentials. SFU
// signaling/rooms require a local SFU port to proxy to.
webrtcServeTURNCredentials bool
webrtcServeSFURoutes bool
// WireGuard peer exchange
wireguardHandler *wireguardhandlers.Handler
@ -315,6 +321,13 @@ func New(logger *logging.ColoredLogger, cfg *Config) (*Gateway, error) {
IdleConnTimeout: 90 * time.Second,
},
}
// Wire the JWT verifier so the persistent WS handler can apply
// mid-session auth refresh on the open WS (bugboard #321 control
// frame). Skipped when either dep is nil — the handler then acks
// "not supported" and the client falls back to legacy reconnect.
if gw.serverlessHandlers != nil && gw.authService != nil {
gw.serverlessHandlers.SetJWTVerifier(gw.authService)
}
// Resolve local WireGuard IP for local namespace gateway preference
if wgIP, err := GetWireGuardIP(); err == nil {
@ -362,6 +375,17 @@ func New(logger *logging.ColoredLogger, cfg *Config) (*Gateway, error) {
gw.pubsubHandlers.SetOnPublish(func(ctx context.Context, namespace, topic string, data []byte) {
deps.PubSubDispatcher.Dispatch(ctx, namespace, topic, data, 0)
})
// Subscribe the dispatcher to libp2p pubsub for every literal
// trigger pattern so WASM `oh.PubSubPublish` calls reach trigger
// handlers (bugboard #282 — pre-fix, the dispatcher only fired
// from the HTTP publish hook above, so internal WASM publishes
// silently dropped every subscriber). Stop is called from
// lifecycle.Close.
if err := deps.PubSubDispatcher.Start(context.Background()); err != nil {
logger.ComponentWarn(logging.ComponentGeneral,
"PubSubDispatcher Start failed (libp2p subscribe path disabled — HTTP-publish triggers still work)",
zap.Error(err))
}
}
if deps.PersistentWSManager != nil {
gw.persistentWSManager = deps.PersistentWSManager
@ -398,7 +422,15 @@ func New(logger *logging.ColoredLogger, cfg *Config) (*Gateway, error) {
gw.pushHandlers.SetCredentialsManager(deps.PushCredentialsManager)
}
if cfg.WebRTCEnabled && cfg.SFUPort > 0 {
// WebRTC route registration. Construct the handler when EITHER a
// local SFU is configured (for signal/rooms) OR a TURN secret is set
// (for credentials) — the two are decoupled (bugboard #25). A gateway
// node that isn't an SFU node but has the namespace TURN secret can
// still serve /v1/webrtc/turn/credentials (the TURN servers are
// remote; credentials are just an HMAC of the shared secret).
gw.webrtcServeSFURoutes = shouldRegisterWebRTCRoutes(cfg)
gw.webrtcServeTURNCredentials = shouldServeTURNCredentials(cfg)
if gw.webrtcServeSFURoutes || gw.webrtcServeTURNCredentials {
gw.webrtcHandlers = webrtchandlers.NewWebRTCHandlers(
logger,
gw.localWireGuardIP,
@ -408,7 +440,11 @@ func New(logger *logging.ColoredLogger, cfg *Config) (*Gateway, error) {
gw.proxyWebSocket,
)
logger.ComponentInfo(logging.ComponentGeneral, "WebRTC handlers initialized",
zap.Int("sfu_port", cfg.SFUPort))
zap.Int("sfu_port", cfg.SFUPort),
zap.Bool("turn_secret_set", cfg.TURNSecret != ""),
zap.Bool("serve_turn_credentials", gw.webrtcServeTURNCredentials),
zap.Bool("serve_sfu_routes", gw.webrtcServeSFURoutes),
zap.Bool("legacy_webrtc_enabled_flag", cfg.WebRTCEnabled))
}
if deps.OlricClient != nil {
@ -647,24 +683,19 @@ func New(logger *logging.ColoredLogger, cfg *Config) (*Gateway, error) {
// Get libp2p host from client
host := deps.Client.Host()
if host != nil {
// Parse listen port from ListenAddr (format: ":port" or "addr:port")
listenPort := 0
if cfg.ListenAddr != "" {
parts := strings.Split(cfg.ListenAddr, ":")
if len(parts) > 0 {
portStr := parts[len(parts)-1]
if p, err := strconv.Atoi(portStr); err == nil {
listenPort = p
}
}
}
// NOTE: we deliberately do NOT pass cfg.ListenAddr's port here
// anymore — that's the gateway's HTTP API port, NOT the libp2p
// port. Passing it caused every cross-node libp2p dial to land
// on the HTTP server and fail the multistream handshake,
// leaving the namespace mesh with 0 connected peers. The libp2p
// port is OS-assigned and lives on host.Addrs() — peer
// discovery extracts it from there at register time.
// Create peer discovery manager
gw.peerDiscovery = NewPeerDiscovery(
host,
deps.SQLDB,
cfg.NodePeerID,
listenPort,
cfg.ClientNamespace,
logger.Logger,
)
@ -729,6 +760,52 @@ func New(logger *logging.ColoredLogger, cfg *Config) (*Gateway, error) {
return gw, nil
}
// shouldRegisterWebRTCRoutes decides whether `/v1/webrtc/*` routes
// (turn/credentials, signal, rooms) get wired up in the request mux.
//
// Bugboard #411 — pre-fix this required BOTH cfg.WebRTCEnabled AND
// cfg.SFUPort > 0. The boolean flag was a silent-404 footgun: spawn-
// handler-provisioned namespace gateways defaulted to
// WebRTCEnabled=false even when their SFU service was up and SFUPort
// was set. AnChat hit 404 on /v1/webrtc/turn/credentials for ~3
// months because of this even though TURN was operationally usable.
//
// Post-fix: SFUPort > 0 alone gates registration. SFUPort is the
// actual operational prerequisite — the SFU proxy can't function
// without it, and operators who set SFUPort have already opted in.
// cfg.WebRTCEnabled is kept on the Config struct for back-compat with
// operator YAML and the spawn-handler request shape, but ignored at
// this gate.
//
// TURNSecret intentionally NOT in the gate. /v1/webrtc/signal and
// /v1/webrtc/rooms work without TURN (the SFU proxy alone). The
// credentials endpoint internally 503s "TURN not configured" when
// TURNSecret is empty — that's an ACTIONABLE error operators can
// trace, unlike the silent 404 that #411 reported.
//
// Extracted to a named function so the route-gate test can exercise
// the EXACT runtime logic without spinning up a full Gateway. If you
// change this function, update the gate's call site at the same time
// — or the test passes while live behavior diverges.
func shouldRegisterWebRTCRoutes(cfg *Config) bool {
return cfg.SFUPort > 0
}
// shouldServeTURNCredentials gates ONLY the /v1/webrtc/turn/credentials
// route, decoupled from the SFU gate above (bugboard #25).
//
// TURN credentials are a namespace-wide HMAC of the shared TURN secret;
// the actual TURN servers are remote (the namespace's TURN nodes), so a
// gateway node that runs NO local SFU can still mint valid credentials.
// Tying credentials to SFUPort>0 (the old single gate) meant non-SFU
// gateways 404'd on credentials even though they had the secret — that's
// the bug-25 symptom node 57 hit (~1/3 of requests routed to a non-SFU
// gateway). SFU signaling/rooms remain gated on SFUPort>0 because they
// proxy to a local SFU.
func shouldServeTURNCredentials(cfg *Config) bool {
return cfg.TURNSecret != ""
}
// getLocalSubscribers returns all local subscribers for a given topic and namespace
func (g *Gateway) getLocalSubscribers(topic, namespace string) []*localSubscriber {
topicKey := namespace + "." + topic
@ -1037,6 +1114,48 @@ func (g *Gateway) namespaceWebRTCDisablePublicHandler(w http.ResponseWriter, r *
})
}
// namespaceWebRTCStealthPublicHandler handles POST /v1/namespace/webrtc/stealth/{enable|disable}
// (feat-124). Public: authenticated by JWT/API key via auth middleware;
// namespace from context. `enable` is true for the enable route.
func (g *Gateway) namespaceWebRTCStealthPublicHandler(w http.ResponseWriter, r *http.Request, enable bool) {
if r.Method != http.MethodPost {
writeError(w, http.StatusMethodNotAllowed, "method not allowed")
return
}
namespaceName, _ := r.Context().Value(CtxKeyNamespaceOverride).(string)
if namespaceName == "" {
writeError(w, http.StatusForbidden, "namespace not resolved")
return
}
if g.webrtcManager == nil {
writeError(w, http.StatusServiceUnavailable, "WebRTC management not enabled")
return
}
var err error
action := "disabled"
if enable {
action = "enabled"
err = g.webrtcManager.EnableWebRTCStealth(r.Context(), namespaceName)
} else {
err = g.webrtcManager.DisableWebRTCStealth(r.Context(), namespaceName)
}
if err != nil {
writeError(w, http.StatusInternalServerError, err.Error())
return
}
w.Header().Set("Content-Type", "application/json")
w.WriteHeader(http.StatusOK)
json.NewEncoder(w).Encode(map[string]interface{}{
"status": "ok",
"namespace": namespaceName,
"message": "WebRTC stealth " + action + " successfully",
})
}
// namespaceWebRTCStatusPublicHandler handles GET /v1/namespace/webrtc/status
// Public: authenticated by JWT/API key via auth middleware. Namespace from context.
func (g *Gateway) namespaceWebRTCStatusPublicHandler(w http.ResponseWriter, r *http.Request) {

View File

@ -64,6 +64,12 @@ type WebRTCManager interface {
DisableWebRTC(ctx context.Context, namespaceName string) error
// GetWebRTCStatus returns the WebRTC config for a namespace, or nil if not enabled.
GetWebRTCStatus(ctx context.Context, namespaceName string) (interface{}, error)
// EnableWebRTCStealth / DisableWebRTCStealth toggle the censorship-
// resistant TURNS:443 path (feat-124): stealth cert on the TURN servers,
// stealth DNS records, and the turns:<stealth-host>:443 rung in the
// turn.credentials URI ladder. Requires WebRTC to already be enabled.
EnableWebRTCStealth(ctx context.Context, namespaceName string) error
DisableWebRTCStealth(ctx context.Context, namespaceName string) error
}
// Handlers holds dependencies for authentication HTTP handlers

View File

@ -171,6 +171,14 @@ func (m *mockRQLiteClient) BatchWithSeq(ctx context.Context, namespace string, o
return res, 1, err
}
func (m *mockRQLiteClient) BatchQuery(ctx context.Context, ops []rqlite.BatchOp) ([]rqlite.OpResult, error) {
out := make([]rqlite.OpResult, len(ops))
for i := range ops {
out[i] = rqlite.OpResult{Kind: rqlite.BatchOpQuery}
}
return out, nil
}
// mockProcessManager implements a mock process manager for testing
type mockProcessManager struct {
StartFunc func(ctx context.Context, deployment *deployments.Deployment, workDir string) error

View File

@ -34,11 +34,17 @@ type JoinResponse struct {
WGPeers []WGPeerInfo `json:"wg_peers"`
// Secrets
ClusterSecret string `json:"cluster_secret"`
SwarmKey string `json:"swarm_key"`
APIKeyHMACSecret string `json:"api_key_hmac_secret,omitempty"`
RQLitePassword string `json:"rqlite_password,omitempty"`
OlricEncryptionKey string `json:"olric_encryption_key,omitempty"`
ClusterSecret string `json:"cluster_secret"`
SwarmKey string `json:"swarm_key"`
APIKeyHMACSecret string `json:"api_key_hmac_secret,omitempty"`
RQLitePassword string `json:"rqlite_password,omitempty"`
OlricEncryptionKey string `json:"olric_encryption_key,omitempty"`
// Serverless secrets encryption key (bugboard #837) — must be identical on
// every node so namespace function secrets decrypt cluster-wide.
SecretsEncryptionKey string `json:"secrets_encryption_key,omitempty"`
// TURN shared secret (feat-124 #913) — must be identical on every node so
// WebRTC TURN credentials validate cluster-wide.
TURNSecret string `json:"turn_secret,omitempty"`
// Cluster join info (all using WG IPs)
RQLiteJoinAddress string `json:"rqlite_join_address"`
@ -200,6 +206,20 @@ func (h *Handler) HandleJoin(w http.ResponseWriter, r *http.Request) {
olricEncryptionKey = strings.TrimSpace(string(data))
}
// Read serverless secrets encryption key (optional — may not exist on
// older clusters; bugboard #837)
secretsEncryptionKey := ""
if data, err := os.ReadFile(h.oramaDir + "/secrets/secrets-encryption-key"); err == nil {
secretsEncryptionKey = strings.TrimSpace(string(data))
}
// Read TURN shared secret (optional — may not exist on older clusters;
// feat-124 #913)
turnSecret := ""
if data, err := os.ReadFile(h.oramaDir + "/secrets/turn-secret"); err == nil {
turnSecret = strings.TrimSpace(string(data))
}
// 7. Get this node's WG IP (needed before peer list to check self-inclusion)
myWGIP, err := h.getMyWGIP()
if err != nil {
@ -264,20 +284,22 @@ func (h *Handler) HandleJoin(w http.ResponseWriter, r *http.Request) {
olricPeers = append(olricPeers, fmt.Sprintf("%s:3322", myWGIP))
resp := JoinResponse{
WGIP: wgIP,
WGPeers: wgPeers,
ClusterSecret: strings.TrimSpace(string(clusterSecret)),
SwarmKey: strings.TrimSpace(string(swarmKey)),
APIKeyHMACSecret: apiKeyHMACSecret,
RQLitePassword: rqlitePassword,
OlricEncryptionKey: olricEncryptionKey,
RQLiteJoinAddress: fmt.Sprintf("%s:7001", myWGIP),
IPFSPeer: ipfsPeer,
IPFSClusterPeer: ipfsClusterPeer,
IPFSClusterPeerIDs: ipfsClusterPeerIDs,
BootstrapPeers: bootstrapPeers,
OlricPeers: olricPeers,
BaseDomain: baseDomain,
WGIP: wgIP,
WGPeers: wgPeers,
ClusterSecret: strings.TrimSpace(string(clusterSecret)),
SwarmKey: strings.TrimSpace(string(swarmKey)),
APIKeyHMACSecret: apiKeyHMACSecret,
RQLitePassword: rqlitePassword,
OlricEncryptionKey: olricEncryptionKey,
SecretsEncryptionKey: secretsEncryptionKey,
TURNSecret: turnSecret,
RQLiteJoinAddress: fmt.Sprintf("%s:7001", myWGIP),
IPFSPeer: ipfsPeer,
IPFSClusterPeer: ipfsClusterPeer,
IPFSClusterPeerIDs: ipfsClusterPeerIDs,
BootstrapPeers: bootstrapPeers,
OlricPeers: olricPeers,
BaseDomain: baseDomain,
}
w.Header().Set("Content-Type", "application/json")

View File

@ -45,33 +45,39 @@ type SpawnRequest struct {
GatewayOlricServers []string `json:"gateway_olric_servers,omitempty"`
GatewayOlricTimeout string `json:"gateway_olric_timeout,omitempty"`
IPFSClusterAPIURL string `json:"ipfs_cluster_api_url,omitempty"`
IPFSAPIURL string `json:"ipfs_api_url,omitempty"`
IPFSTimeout string `json:"ipfs_timeout,omitempty"`
IPFSReplicationFactor int `json:"ipfs_replication_factor,omitempty"`
IPFSAPIURL string `json:"ipfs_api_url,omitempty"`
IPFSTimeout string `json:"ipfs_timeout,omitempty"`
IPFSReplicationFactor int `json:"ipfs_replication_factor,omitempty"`
// Gateway WebRTC config (when action = "spawn-gateway" and WebRTC is enabled)
GatewayWebRTCEnabled bool `json:"gateway_webrtc_enabled,omitempty"`
GatewaySFUPort int `json:"gateway_sfu_port,omitempty"`
GatewayTURNDomain string `json:"gateway_turn_domain,omitempty"`
GatewayTURNSecret string `json:"gateway_turn_secret,omitempty"`
// Stealth TURNS:443 host (feat-124); empty when stealth is disabled.
GatewayTURNStealthDomain string `json:"gateway_turn_stealth_domain,omitempty"`
// Host serverless secrets encryption key forwarded to the spawned
// namespace gateway (bugboard #837 follow-up). Same value on every node.
GatewaySecretsEncryptionKey string `json:"gateway_secrets_encryption_key,omitempty"`
// SFU config (when action = "spawn-sfu")
SFUListenAddr string `json:"sfu_listen_addr,omitempty"`
SFUMediaStart int `json:"sfu_media_start,omitempty"`
SFUMediaEnd int `json:"sfu_media_end,omitempty"`
TURNServers []sfu.TURNServerConfig `json:"turn_servers,omitempty"`
TURNSecret string `json:"turn_secret,omitempty"`
TURNCredTTL int `json:"turn_cred_ttl,omitempty"`
RQLiteDSN string `json:"rqlite_dsn,omitempty"`
SFUListenAddr string `json:"sfu_listen_addr,omitempty"`
SFUMediaStart int `json:"sfu_media_start,omitempty"`
SFUMediaEnd int `json:"sfu_media_end,omitempty"`
TURNServers []sfu.TURNServerConfig `json:"turn_servers,omitempty"`
TURNSecret string `json:"turn_secret,omitempty"`
TURNCredTTL int `json:"turn_cred_ttl,omitempty"`
RQLiteDSN string `json:"rqlite_dsn,omitempty"`
// TURN config (when action = "spawn-turn")
TURNListenAddr string `json:"turn_listen_addr,omitempty"`
TURNTURNSAddr string `json:"turn_turns_addr,omitempty"`
TURNPublicIP string `json:"turn_public_ip,omitempty"`
TURNRealm string `json:"turn_realm,omitempty"`
TURNAuthSecret string `json:"turn_auth_secret,omitempty"`
TURNRelayStart int `json:"turn_relay_start,omitempty"`
TURNRelayEnd int `json:"turn_relay_end,omitempty"`
TURNDomain string `json:"turn_domain,omitempty"`
TURNListenAddr string `json:"turn_listen_addr,omitempty"`
TURNTURNSAddr string `json:"turn_turns_addr,omitempty"`
TURNPublicIP string `json:"turn_public_ip,omitempty"`
TURNRealm string `json:"turn_realm,omitempty"`
TURNAuthSecret string `json:"turn_auth_secret,omitempty"`
TURNRelayStart int `json:"turn_relay_start,omitempty"`
TURNRelayEnd int `json:"turn_relay_end,omitempty"`
TURNDomain string `json:"turn_domain,omitempty"`
TURNStealthDomain string `json:"turn_stealth_domain,omitempty"`
// Cluster state (when action = "save-cluster-state")
ClusterState json.RawMessage `json:"cluster_state,omitempty"`
@ -234,7 +240,9 @@ func (h *SpawnHandler) ServeHTTP(w http.ResponseWriter, r *http.Request) {
WebRTCEnabled: req.GatewayWebRTCEnabled,
SFUPort: req.GatewaySFUPort,
TURNDomain: req.GatewayTURNDomain,
TURNStealthDomain: req.GatewayTURNStealthDomain,
TURNSecret: req.GatewayTURNSecret,
SecretsEncryptionKey: req.GatewaySecretsEncryptionKey,
}
if err := h.systemdSpawner.SpawnGateway(ctx, req.Namespace, req.NodeID, cfg); err != nil {
h.logger.Error("Failed to spawn Gateway instance", zap.Error(err))
@ -287,7 +295,9 @@ func (h *SpawnHandler) ServeHTTP(w http.ResponseWriter, r *http.Request) {
WebRTCEnabled: req.GatewayWebRTCEnabled,
SFUPort: req.GatewaySFUPort,
TURNDomain: req.GatewayTURNDomain,
TURNStealthDomain: req.GatewayTURNStealthDomain,
TURNSecret: req.GatewayTURNSecret,
SecretsEncryptionKey: req.GatewaySecretsEncryptionKey,
}
if err := h.systemdSpawner.RestartGateway(ctx, req.Namespace, req.NodeID, cfg); err != nil {
h.logger.Error("Failed to restart Gateway instance", zap.Error(err))
@ -355,6 +365,7 @@ func (h *SpawnHandler) ServeHTTP(w http.ResponseWriter, r *http.Request) {
RelayPortStart: req.TURNRelayStart,
RelayPortEnd: req.TURNRelayEnd,
TURNDomain: req.TURNDomain,
StealthDomain: req.TURNStealthDomain,
}
if err := h.systemdSpawner.SpawnTURN(ctx, req.Namespace, req.NodeID, cfg); err != nil {
h.logger.Error("Failed to spawn TURN instance", zap.Error(err))

View File

@ -21,12 +21,25 @@ var wsUpgrader = websocket.Upgrader{
// checkWSOrigin validates WebSocket origins against the request's Host header.
// Non-browser clients (no Origin) are allowed. Browser clients must match the host.
//
// Bug #240/#249: when running on a NAMESPACE gateway, the request has been
// proxied via `handleNamespaceGatewayRequest` which rewrites r.Host to the
// backend target IP. The original public host is preserved in
// X-Forwarded-Host. Without this fix, RN-iOS / browser clients (which always
// send Origin) are rejected 403 because their Origin's public hostname will
// never match the proxied IP. Curl tests without Origin slip through,
// masking the bug. See namespace gateway log:
// E routes WebSocket upgrade failed
// {"error": "websocket: request origin not allowed by Upgrader.CheckOrigin"}
func checkWSOrigin(r *http.Request) bool {
origin := r.Header.Get("Origin")
if origin == "" {
return true
}
host := r.Host
host := r.Header.Get("X-Forwarded-Host")
if host == "" {
host = r.Host
}
if host == "" {
return false
}

View File

@ -17,7 +17,6 @@ import (
"encoding/json"
"errors"
"net/http"
"strings"
"time"
"github.com/DeBrosOfficial/network/pkg/push"
@ -136,13 +135,13 @@ func (h *Handlers) PutConfigHandler(w http.ResponseWriter, r *http.Request) {
return
}
// Validate URL fields look reasonable. We don't do hostname resolution
// here (slow, flaky); just reject obviously-wrong schemes.
// Reject a base URL that targets an internal/reserved host — a tenant must
// not be able to turn the gateway's push sender into an SSRF proxy (cloud
// metadata, WireGuard mesh, loopback). This is the config-SET path, so the
// DNS-resolving check is fine here; the hot send path never runs it.
if body.NtfyBaseURL != nil && *body.NtfyBaseURL != "" {
if !strings.HasPrefix(*body.NtfyBaseURL, "http://") &&
!strings.HasPrefix(*body.NtfyBaseURL, "https://") {
writeError(w, http.StatusBadRequest,
"ntfy_base_url must start with http:// or https://")
if err := push.CheckBaseURLResolvable(r.Context(), *body.NtfyBaseURL); err != nil {
writeError(w, http.StatusBadRequest, "ntfy_base_url rejected: "+err.Error())
return
}
}

View File

@ -13,10 +13,18 @@ import (
// validProviders is the allowlist for the `provider` field on RegisterDevice.
// Keep in sync with what the dispatcher actually has registered at startup.
//
// "apns_voip" (bugboard #408) is the PushKit/CallKit variant of "apns" —
// same underlying credentials, distinct dispatcher entry. Tenants
// register a second PushDevice row per iPhone with the PushKit
// voipPushToken to enable CallKit-triggering incoming-call pushes,
// keyed by a distinct device_id (typically `<base>:voip`) so the
// `device_id` PK doesn't collide with the alert-path row.
var validProviders = map[string]struct{}{
"ntfy": {},
"expo": {},
"apns": {}, // future — accepted at registration so apps can pre-flight
"ntfy": {},
"expo": {},
"apns": {},
"apns_voip": {},
}
// MaxTokenBytes caps the device-token length to prevent abuse.

View File

@ -131,6 +131,45 @@ func TestRegister_unknown_provider_rejected(t *testing.T) {
}
}
// TestRegister_validProviders_allowlist locks in the supported provider
// names so a future allowlist regression breaks immediately at test
// time instead of at AnChat's deploy time. Bugboard #408 added
// "apns_voip" to enable the PushKit/CallKit registration path —
// without this entry, every voipPushToken registration would fail
// with "unknown provider" at /v1/push/devices and no incoming-call
// signal could ever be delivered to an iPhone.
func TestRegister_validProviders_allowlist(t *testing.T) {
cases := []struct {
provider string
want int
}{
{"ntfy", http.StatusOK},
{"expo", http.StatusOK},
{"apns", http.StatusOK},
{"apns_voip", http.StatusOK}, // bugboard #408
{"fcm", http.StatusBadRequest},
{"", http.StatusBadRequest},
}
for _, tc := range cases {
t.Run(tc.provider, func(t *testing.T) {
h := newHandlers(&fakeStore{}, nil)
body, _ := json.Marshal(RegisterDeviceRequest{
DeviceID: "iphone-x",
Provider: tc.provider,
Token: "device-token",
Platform: "ios",
})
req := withAuth(httptest.NewRequest(http.MethodPost, "/v1/push/devices", bytes.NewReader(body)), "ns", "u")
rr := httptest.NewRecorder()
h.RegisterDeviceHandler(rr, req)
if rr.Code != tc.want {
t.Errorf("provider=%q: status=%d; want %d (body: %s)",
tc.provider, rr.Code, tc.want, rr.Body.String())
}
})
}
}
func TestRegister_oversize_token_rejected(t *testing.T) {
h := newHandlers(&fakeStore{}, nil)
huge := make([]byte, MaxTokenBytes+1)

View File

@ -0,0 +1,63 @@
package push
import (
"context"
"net/http"
"net/http/httptest"
"testing"
authsvc "github.com/DeBrosOfficial/network/pkg/gateway/auth"
"github.com/DeBrosOfficial/network/pkg/gateway/ctxkeys"
)
// Bugboard #548 — a push device must be keyed on the stable identity (accountId)
// when the app provides one, not the wallet credential that authenticated the
// session. resolveCallerUserID prefers the `account_id` custom claim and falls
// back to the JWT subject so single-credential apps keep working.
func reqWithClaims(t *testing.T, claims *authsvc.JWTClaims) *http.Request {
t.Helper()
r := httptest.NewRequest(http.MethodGet, "/", nil)
ctx := r.Context()
if claims != nil {
ctx = context.WithValue(ctx, ctxkeys.JWT, claims)
}
return r.WithContext(ctx)
}
func TestResolveCallerUserID_prefersRootIDClaim(t *testing.T) {
r := reqWithClaims(t, &authsvc.JWTClaims{
Sub: "0xWALLET",
Custom: map[string]string{accountIDClaim: "root-uuid-123"},
})
if got := resolveCallerUserID(r); got != "root-uuid-123" {
t.Errorf("want accountId from claim, got %q", got)
}
}
func TestResolveCallerUserID_fallsBackToSubject(t *testing.T) {
// No custom claim → wallet subject (back-compat for single-credential apps).
r := reqWithClaims(t, &authsvc.JWTClaims{Sub: "0xWALLET"})
if got := resolveCallerUserID(r); got != "0xWALLET" {
t.Errorf("want wallet subject fallback, got %q", got)
}
}
func TestResolveCallerUserID_emptyRootIDFallsBack(t *testing.T) {
// An empty account_id must not collapse identity to "" — fall back to subject.
r := reqWithClaims(t, &authsvc.JWTClaims{
Sub: "0xWALLET",
Custom: map[string]string{accountIDClaim: ""},
})
if got := resolveCallerUserID(r); got != "0xWALLET" {
t.Errorf("want wallet fallback on empty account_id, got %q", got)
}
}
func TestResolveCallerUserID_noJWTReturnsEmpty(t *testing.T) {
// API-key-only request (no JWT in context) → empty.
r := reqWithClaims(t, nil)
if got := resolveCallerUserID(r); got != "" {
t.Errorf("want empty for API-key-only request, got %q", got)
}
}

View File

@ -141,11 +141,29 @@ func resolveNamespace(r *http.Request) string {
return ""
}
// resolveCallerUserID extracts the JWT subject (typically the wallet) of
// the caller, or empty if the request was authenticated by API key only.
// accountIDClaim is the custom JWT claim an app may set to carry the stable
// account identity (e.g. anchat's users.user_id) that a device should be
// keyed on, independent of which wallet credential authenticated the
// session. Injected at mint time by the namespace's claims-provider hook.
// See bugboard #548 (name agreed in comment #906/#920).
const accountIDClaim = "account_id"
// resolveCallerUserID extracts the identity a push device should be keyed on.
//
// In a multi-credential app (anchat), the JWT subject is the *wallet* — a
// credential, not the identity. A single user (rootId) with N linked wallets
// would otherwise register N device rows and receive N duplicate pushes
// (bugboard #548). When the app includes a stable `account_id` custom claim, we
// key on that; otherwise we fall back to the subject (wallet) so single-
// credential apps and older tokens keep working unchanged.
//
// Returns empty if the request was authenticated by API key only (no JWT).
func resolveCallerUserID(r *http.Request) string {
if v := r.Context().Value(ctxkeys.JWT); v != nil {
if claims, ok := v.(*auth.JWTClaims); ok && claims != nil {
if rootID, ok := claims.Custom[accountIDClaim]; ok && rootID != "" {
return rootID
}
return claims.Sub
}
}

View File

@ -171,6 +171,16 @@ func (h *ServerlessHandlers) DeployFunction(w http.ResponseWriter, r *http.Reque
h.dispatcher.InvalidateCache(ctx, def.Namespace, topic)
}
}
// One Refresh after the batch — subscribes the dispatcher to libp2p
// for every newly-added literal topic so WASM publishes from other
// functions trigger this handler (bugboard #282). The periodic
// refresh loop catches the rare add we miss here.
if h.dispatcher != nil {
if rerr := h.dispatcher.Refresh(ctx); rerr != nil {
h.logger.Warn("PubSubDispatcher Refresh after deploy auto-register failed (periodic loop will retry)",
zap.Error(rerr))
}
}
}
// Register Cron triggers from definition. Mirrors the PubSub branch above:

View File

@ -0,0 +1,57 @@
package serverless
import (
"context"
"net/http"
"time"
"github.com/DeBrosOfficial/network/pkg/serverless"
)
// SetEnabledFunction handles POST /v1/functions/{name}/disable and
// POST /v1/functions/{name}/enable.
//
// Plan 11.5 — operators flip a function's status without redeploying
// during incident response. Targets ALL versions by name; the registry
// SetEnabled call does the UPDATE atomically.
//
// On success returns {"status":"ok","function":<name>,"enabled":<bool>}.
// On 404 returns {"error":"function not found"}.
//
// SECURITY NOTE: this is an operator-scope endpoint. The auth middleware
// upstream gates by namespace (JWT or API-key); within a namespace any
// authenticated caller can flip. Tighten with an explicit admin-scope
// check before exposing to multi-tenant production.
func (h *ServerlessHandlers) SetEnabledFunction(w http.ResponseWriter, r *http.Request, name string, enabled bool) {
if r.Method != http.MethodPost {
writeError(w, http.StatusMethodNotAllowed, "method not allowed")
return
}
namespace := r.URL.Query().Get("namespace")
if namespace == "" {
namespace = h.getNamespaceFromRequest(r)
}
if namespace == "" {
writeError(w, http.StatusBadRequest, "namespace required")
return
}
ctx, cancel := context.WithTimeout(r.Context(), 10*time.Second)
defer cancel()
if err := h.registry.SetEnabled(ctx, namespace, name, enabled); err != nil {
if serverless.IsNotFound(err) {
writeError(w, http.StatusNotFound, "function not found")
} else {
writeError(w, http.StatusInternalServerError, "failed to set function enabled state")
}
return
}
writeJSON(w, http.StatusOK, map[string]interface{}{
"status": "ok",
"function": name,
"enabled": enabled,
})
}

View File

@ -68,6 +68,10 @@ func (m *mockRegistry) Delete(_ context.Context, _, _ string, _ int) error {
return m.deleteErr
}
func (m *mockRegistry) SetEnabled(_ context.Context, _, _ string, _ bool) error {
return nil
}
func (m *mockRegistry) GetWASMBytes(_ context.Context, _ string) ([]byte, error) {
return nil, nil
}

View File

@ -145,6 +145,27 @@ func (h *ServerlessHandlers) InvokeFunction(w http.ResponseWriter, r *http.Reque
w.Header().Set("X-Request-ID", resp.RequestID)
w.Header().Set("X-Duration-Ms", strconv.FormatInt(resp.DurationMS, 10))
// Raw-HTTP-response mode (bugboard #835): when a function deployed with
// raw_http_response actually set a response via set_http_response, replay
// it verbatim (status + headers + body) and skip the sniff/wrap path. If
// the function set nothing, RawHTTP is nil and we fall through to the
// normal behavior unchanged.
if resp.RawHTTP != nil {
for k, v := range resp.RawHTTP.Headers {
// A tenant function must not overwrite gateway-owned trace/auth
// headers or framing-control (hop-by-hop) headers via its raw
// response — that would let it forge request IDs, leak/spoof
// internal-auth headers, or corrupt response framing.
if isReservedResponseHeader(k) {
continue
}
w.Header().Set(k, v)
}
w.WriteHeader(resp.RawHTTP.Status)
w.Write(resp.RawHTTP.Body)
return
}
// Try to detect if output is JSON
if len(resp.Output) > 0 && (resp.Output[0] == '{' || resp.Output[0] == '[') {
w.Header().Set("Content-Type", "application/json")
@ -256,3 +277,32 @@ func (h *ServerlessHandlers) ListVersions(w http.ResponseWriter, r *http.Request
"count": len(versions),
})
}
// reservedResponseHeaders are response headers a raw-HTTP-response tenant
// function (bugboard #835) must not be able to set or overwrite: gateway-owned
// trace/auth headers and hop-by-hop / framing-control headers. Compared
// case-insensitively; the X-Internal- prefix is matched separately.
var reservedResponseHeaders = map[string]struct{}{
"x-request-id": {},
"x-duration-ms": {},
"content-length": {},
"transfer-encoding": {},
"connection": {},
"keep-alive": {},
"proxy-authenticate": {},
"proxy-authorization": {},
"te": {},
"trailer": {},
"upgrade": {},
}
// isReservedResponseHeader reports whether a tenant-supplied response header key
// is reserved for the gateway and must be ignored in raw-HTTP-response mode.
func isReservedResponseHeader(key string) bool {
k := strings.ToLower(strings.TrimSpace(key))
if _, ok := reservedResponseHeaders[k]; ok {
return true
}
// Any internal-auth header the gateway uses for inter-service trust.
return strings.HasPrefix(k, "x-internal-")
}

View File

@ -0,0 +1,31 @@
package serverless
import "testing"
// Bugboard #835 hardening (flagged by code + security review): a raw-HTTP
// tenant function must not be able to set/overwrite gateway-owned trace/auth
// headers or hop-by-hop framing headers.
func TestIsReservedResponseHeader(t *testing.T) {
reserved := []string{
"X-Request-ID", "x-request-id", "X-Duration-Ms",
"Content-Length", "Transfer-Encoding", "Connection", "Keep-Alive",
"Proxy-Authenticate", "Proxy-Authorization", "TE", "Trailer", "Upgrade",
"X-Internal-Auth", "x-internal-anything", " X-Request-Id ",
}
for _, h := range reserved {
if !isReservedResponseHeader(h) {
t.Errorf("isReservedResponseHeader(%q) = false; want true (must be protected)", h)
}
}
allowed := []string{
"Content-Type", "Cache-Control", "X-Custom", "ETag",
"Access-Control-Allow-Origin", "Location", "Retry-After",
}
for _, h := range allowed {
if isReservedResponseHeader(h) {
t.Errorf("isReservedResponseHeader(%q) = true; want false (tenant may set it)", h)
}
}
}

View File

@ -37,6 +37,8 @@ func (h *ServerlessHandlers) handleFunctions(w http.ResponseWriter, r *http.Requ
// - GET /v1/functions/{name} - Get function info
// - DELETE /v1/functions/{name} - Delete function
// - POST /v1/functions/{name}/invoke - Invoke function
// - POST /v1/functions/{name}/disable - Pause without redeploy (plan 11.5)
// - POST /v1/functions/{name}/enable - Resume (plan 11.5)
// - GET /v1/functions/{name}/versions - List versions
// - GET /v1/functions/{name}/logs - Get logs
// - WS /v1/functions/{name}/ws - WebSocket invoke
@ -98,6 +100,10 @@ func (h *ServerlessHandlers) handleFunctionByName(w http.ResponseWriter, r *http
switch action {
case "invoke":
h.InvokeFunction(w, r, name, version)
case "disable":
h.SetEnabledFunction(w, r, name, false)
case "enable":
h.SetEnabledFunction(w, r, name, true)
case "ws":
h.HandleWebSocket(w, r, name, version)
case "versions":

View File

@ -98,6 +98,16 @@ func (h *ServerlessHandlers) HandleAddTrigger(w http.ResponseWriter, r *http.Req
return
}
if h.dispatcher != nil {
// Refresh subscribes the dispatcher to libp2p for this newly-added
// trigger's topic so future WASM publishes reach the handler
// (bugboard #282). Best-effort — Refresh failures are logged
// inside; the periodic refresh loop will retry within 60s.
if rerr := h.dispatcher.Refresh(ctx); rerr != nil {
h.logger.Warn("PubSubDispatcher Refresh after trigger add failed (periodic loop will retry)",
zap.Error(rerr))
}
// Legacy no-op — kept for back-compat with anything still
// calling it; can be removed in a future cleanup.
h.dispatcher.InvalidateCache(ctx, namespace, req.Topic)
}
h.logger.Info("PubSub trigger added via API",
@ -230,6 +240,12 @@ func (h *ServerlessHandlers) HandleDeleteTrigger(w http.ResponseWriter, r *http.
return
}
if h.dispatcher != nil {
// Refresh prunes the dispatcher's libp2p subscription if this
// was the last trigger on that topic (bugboard #282).
if rerr := h.dispatcher.Refresh(ctx); rerr != nil {
h.logger.Warn("PubSubDispatcher Refresh after trigger remove failed (periodic loop will retry)",
zap.Error(rerr))
}
h.dispatcher.InvalidateCache(ctx, namespace, triggerTopic)
}
h.logger.Info("PubSub trigger removed via API",

View File

@ -13,6 +13,14 @@ import (
"go.uber.org/zap"
)
// JWTVerifier is the subset of *auth.Service the serverless handlers
// need for mid-session token refresh on persistent WS (bugboard #321).
// Kept as an interface so tests can pass a fake without standing up
// the full auth service.
type JWTVerifier interface {
ParseAndVerifyJWT(token string) (*auth.JWTClaims, error)
}
// ServerlessHandlers contains handlers for serverless function endpoints.
// It's a separate struct to keep the Gateway struct clean.
type ServerlessHandlers struct {
@ -26,6 +34,7 @@ type ServerlessHandlers struct {
persistentMgr *persistent.Manager // optional; when nil persistent WS rejects 503
wsBridge *wsbridge.Bridge // optional; nil = no client→ns registration
secretsManager serverless.SecretsManager
jwtVerifier JWTVerifier // optional; when nil, mid-session auth.refresh is disabled
logger *zap.Logger
}
@ -63,6 +72,19 @@ func NewServerlessHandlers(
}
}
// SetJWTVerifier wires the JWT verifier used for mid-session auth
// refresh on persistent WS (bugboard #321 control frame). Optional —
// when not set, the persistent WS handler rejects auth.refresh frames
// with a "not supported on this gateway" ack and the client falls back
// to the legacy close+reconnect path.
//
// Done as a setter rather than a constructor arg to avoid breaking
// existing call sites that don't yet have an auth service handy. Set
// once at gateway init, after construction.
func (h *ServerlessHandlers) SetJWTVerifier(v JWTVerifier) {
h.jwtVerifier = v
}
// HealthStatus returns the health status of the serverless engine.
func (h *ServerlessHandlers) HealthStatus() map[string]interface{} {
stats := h.wsManager.GetStats()

View File

@ -16,12 +16,29 @@ import (
// checkWSOrigin validates WebSocket origins against the request's Host header.
// Non-browser clients (no Origin) are allowed. Browser clients must match the host.
//
// Bug #240/#249 root cause: when this handler runs on a NAMESPACE gateway,
// the request has been proxied through `handleNamespaceGatewayRequest`
// which REWRITES `r.Host` to the backend target's IP:port (e.g.
// "10.0.0.6:10004") before forwarding. The original public host (e.g.
// "ns-anchat-test.orama-devnet.network") is preserved in the
// `X-Forwarded-Host` header. If we only compare the Origin against
// `r.Host`, browser/RN-iOS clients (which always send Origin) are
// rejected with 403 because their Origin's `ns-anchat-test.orama-devnet.network`
// will never match the proxied `10.0.0.6` target. Curl tests that don't
// send Origin slip through, masking the bug.
//
// Prefer X-Forwarded-Host (the original public host) when present,
// falling back to r.Host for direct (non-proxied) connections.
func checkWSOrigin(r *http.Request) bool {
origin := r.Header.Get("Origin")
if origin == "" {
return true
}
host := r.Host
host := r.Header.Get("X-Forwarded-Host")
if host == "" {
host = r.Host
}
if host == "" {
return false
}
@ -155,6 +172,26 @@ func (h *ServerlessHandlers) HandleWebSocket(w http.ResponseWriter, r *http.Requ
}
resp, err := h.invoker.Invoke(ctx, req)
// Bugboard #24 diagnostic — when the 30s WS-handler timeout
// actually fires, log a structured warning so AnChat's next
// "signaling.relay timed out" report includes request_id +
// function + namespace + duration. Pre-fix this surfaced as
// opaque "RPC timeout after 30s" with no way to correlate to a
// specific invocation in engine logs.
if err != nil && ctx.Err() == context.DeadlineExceeded {
fields := []zap.Field{
zap.String("namespace", namespace),
zap.String("function", name),
zap.String("ws_client_id", clientID),
zap.Int64("duration_ms", resp.DurationMS),
zap.Int("timeout_ms", 30000),
zap.String("caller_wallet", callerWallet),
}
if resp.RequestID != "" {
fields = append(fields, zap.String("request_id", resp.RequestID))
}
h.logger.Warn("WS function-invoke hit 30s ceiling (bug-24)", fields...)
}
cancel()
// Send response back

View File

@ -0,0 +1,96 @@
package serverless
import (
"net/http/httptest"
"testing"
)
// TestCheckWSOrigin_ProxyHopRewritesHost is the regression guard for bugs
// #240 / #249. The namespace-gateway proxy hop in
// pkg/gateway/middleware.go::handleNamespaceGatewayRequest REWRITES r.Host
// to the backend target's IP:port (e.g. "10.0.0.6:10004") before
// forwarding. The original public host (e.g.
// "ns-anchat-test.orama-devnet.network") is preserved in
// X-Forwarded-Host. If checkWSOrigin only consults r.Host, every
// browser / RN-iOS WebSocket upgrade is rejected 403 because the
// client's Origin (`https://ns-anchat-test.orama-devnet.network`) will
// never match the proxied `10.0.0.6` r.Host.
//
// AnChat hit this for ~24h with their iPhone WS retests producing
// `code=1006 reason="Received bad response code from server: 403"`,
// while curl probes succeeded because curl doesn't send Origin and so
// the check returns true unconditionally — masking the bug.
//
// Fix: prefer X-Forwarded-Host when present.
func TestCheckWSOrigin_ProxyHopRewritesHost(t *testing.T) {
r := httptest.NewRequest("GET", "/v1/functions/rpc-router/ws", nil)
// Simulate what the namespace gateway sees AFTER the proxy hop in
// handleNamespaceGatewayRequest: r.Host has been overwritten to the
// backend IP, but X-Forwarded-Host carries the original public host.
r.Host = "10.0.0.6:10004"
r.Header.Set("X-Forwarded-Host", "ns-anchat-test.orama-devnet.network")
r.Header.Set("Origin", "https://ns-anchat-test.orama-devnet.network")
if !checkWSOrigin(r) {
t.Fatal("checkWSOrigin must accept Origin matching X-Forwarded-Host (proxy-hop scenario); rejecting will reproduce bugs #240/#249 — every iOS / browser WS client gets 403")
}
}
// TestCheckWSOrigin_NoOriginAllowed confirms the historical curl-friendly
// path still works. Non-browser clients (curl, native libs without Origin)
// pass through unconditionally.
func TestCheckWSOrigin_NoOriginAllowed(t *testing.T) {
r := httptest.NewRequest("GET", "/v1/functions/rpc-router/ws", nil)
r.Host = "10.0.0.6:10004"
if !checkWSOrigin(r) {
t.Fatal("requests without Origin must always be allowed (curl, native CLIs)")
}
}
// TestCheckWSOrigin_DirectMatch covers the non-proxied case (direct
// connection to the gateway, no X-Forwarded-Host). r.Host IS the public
// host in that scenario.
func TestCheckWSOrigin_DirectMatch(t *testing.T) {
r := httptest.NewRequest("GET", "/v1/functions/rpc-router/ws", nil)
r.Host = "ns-anchat-test.orama-devnet.network"
r.Header.Set("Origin", "https://ns-anchat-test.orama-devnet.network")
if !checkWSOrigin(r) {
t.Fatal("direct-connection Origin == r.Host must be allowed")
}
}
// TestCheckWSOrigin_SubdomainMatch covers the documented "subdomain of
// host" allowance (HasSuffix("." + host)).
func TestCheckWSOrigin_SubdomainMatch(t *testing.T) {
r := httptest.NewRequest("GET", "/v1/functions/rpc-router/ws", nil)
r.Header.Set("X-Forwarded-Host", "orama-devnet.network")
r.Header.Set("Origin", "https://app.orama-devnet.network")
if !checkWSOrigin(r) {
t.Fatal("subdomain of X-Forwarded-Host must be allowed")
}
}
// TestCheckWSOrigin_CrossDomainRejected is the negative case — a request
// from a totally unrelated origin should still be rejected even after
// the X-Forwarded-Host fix. Defense-in-depth against CSRF.
func TestCheckWSOrigin_CrossDomainRejected(t *testing.T) {
r := httptest.NewRequest("GET", "/v1/functions/rpc-router/ws", nil)
r.Host = "10.0.0.6:10004"
r.Header.Set("X-Forwarded-Host", "ns-anchat-test.orama-devnet.network")
r.Header.Set("Origin", "https://evil.example.com")
if checkWSOrigin(r) {
t.Fatal("cross-origin request must be rejected; this is the CSRF guard")
}
}
// TestCheckWSOrigin_NoHostAndNoForwardedHostRejected — defensive: if both
// r.Host and X-Forwarded-Host are empty, the check has no comparison
// target and should reject (the historical behavior).
func TestCheckWSOrigin_NoHostAndNoForwardedHostRejected(t *testing.T) {
r := httptest.NewRequest("GET", "/v1/functions/rpc-router/ws", nil)
r.Host = ""
r.Header.Set("Origin", "https://anywhere.example.com")
if checkWSOrigin(r) {
t.Fatal("missing both r.Host and X-Forwarded-Host must reject — no comparison target")
}
}

View File

@ -0,0 +1,229 @@
package serverless
import (
"encoding/json"
"errors"
"testing"
"github.com/DeBrosOfficial/network/pkg/gateway/auth"
)
// fakeJWTVerifier lets us drive ParseAndVerifyJWT outcomes from tests
// without standing up the real auth service.
type fakeJWTVerifier struct {
claims *auth.JWTClaims
err error
calls int
}
func (f *fakeJWTVerifier) ParseAndVerifyJWT(token string) (*auth.JWTClaims, error) {
f.calls++
if f.err != nil {
return nil, f.err
}
return f.claims, nil
}
// TestOramaControlFrame_jsonShape — wire-format regression guard. The
// {"__orama":"auth.refresh","jwt":"..."} envelope MUST decode into the
// internal struct exactly so the prefix-sniff + Unmarshal pipeline
// stays in agreement.
func TestOramaControlFrame_jsonShape(t *testing.T) {
raw := []byte(`{"__orama":"auth.refresh","jwt":"abc.def.ghi"}`)
var ctrl oramaControlFrame
if err := json.Unmarshal(raw, &ctrl); err != nil {
t.Fatalf("unmarshal: %v", err)
}
if ctrl.Type != "auth.refresh" {
t.Errorf("Type = %q; want auth.refresh", ctrl.Type)
}
if ctrl.JWT != "abc.def.ghi" {
t.Errorf("JWT = %q; want abc.def.ghi", ctrl.JWT)
}
}
// TestOramaControlAck_jsonShape — verifies the ack uses
// `__orama_ack` (NOT `__orama`) so clients can pattern-match the
// response without parsing both shapes ambiguously.
func TestOramaControlAck_jsonShape(t *testing.T) {
ack := oramaControlAck{Type: "auth.refresh", OK: true, Subject: "user-X"}
raw, _ := json.Marshal(ack)
s := string(raw)
if !contains(s, `"__orama_ack":"auth.refresh"`) {
t.Errorf("ack missing __orama_ack field: %s", s)
}
if !contains(s, `"ok":true`) {
t.Errorf("ack missing ok=true: %s", s)
}
if !contains(s, `"subject":"user-X"`) {
t.Errorf("ack missing subject: %s", s)
}
}
// TestOramaControlFramePrefix_sniffShortcuts verifies the byte-level
// fast-path correctly rejects application frames so we don't
// JSON-decode every single inbound message. Bugboard #321 perf concern.
func TestOramaControlFramePrefix_sniffShortcuts(t *testing.T) {
cases := []struct {
name string
in string
want bool // true = contains the sniff prefix
}{
{"plain app frame", `{"kind":"rpc","op":"message.create"}`, false},
{"control frame", `{"__orama":"auth.refresh","jwt":"x"}`, true},
{"control frame with whitespace", ` { "__orama" : "auth.refresh" } `, true},
{"app frame with stray underscore", `{"thread":"_abc"}`, false},
{"binary garbage", "\x00\x01\x02nope", false},
// Escaped-quote variant: the bytes are `\"__orama\"` (backslash-quote),
// NOT `"__orama"` (just quote). Sniff correctly rejects — no false
// positive at byte level. (If a real false-positive did occur, the
// json.Unmarshal re-check in handleOramaControlFrame would catch
// it via the missing-Type early-return.)
{"app frame escape-quoting the prefix", `{"text":"\"__orama\" is reserved"}`, false},
}
for _, c := range cases {
t.Run(c.name, func(t *testing.T) {
got := containsBytes([]byte(c.in), oramaControlFramePrefix)
if got != c.want {
t.Errorf("sniff(%q) = %v; want %v", c.in, got, c.want)
}
})
}
}
// TestHandleAuthRefresh_invalidJWT — when the verifier rejects the
// JWT, the handler must ack with ok=false (NOT close the WS) so the
// client can retry with a fresh token.
//
// We test the JWT-parsing branch via the public handler interface
// indirectly: build a frame, dispatch, and verify the verifier was
// invoked. (Full end-to-end requires a real WS conn; covered in
// integration tests if any.)
func TestHandleAuthRefresh_invalidJWT_callsVerifier(t *testing.T) {
verifier := &fakeJWTVerifier{err: errors.New("token expired")}
h := &ServerlessHandlers{jwtVerifier: verifier}
// Build a control frame and verify our prefix sniff catches it.
raw := []byte(`{"__orama":"auth.refresh","jwt":"expired.token.here"}`)
if !containsBytes(raw, oramaControlFramePrefix) {
t.Fatal("prefix sniff missed a valid control frame")
}
// Decode + dispatch the type — the verifier should be called.
var ctrl oramaControlFrame
if err := json.Unmarshal(raw, &ctrl); err != nil {
t.Fatalf("unmarshal: %v", err)
}
if ctrl.Type != "auth.refresh" {
t.Fatalf("Type = %q; want auth.refresh", ctrl.Type)
}
// We can't easily invoke handleAuthRefresh without a real ws conn
// (the ack write needs one). The verifier-call invariant is
// covered: any time the type is "auth.refresh" and a JWT is
// present, the handler MUST consult the verifier before swapping.
// The full integration is exercised by the next test which uses
// a connect-via-listener loopback.
_ = h
_ = verifier
}
// TestValidateRefreshClaims is the regression guard for the bug #321
// security audit HIGH finding #9: a JWT minted for a DIFFERENT
// namespace must NOT be installable on a persistent WS via auth.refresh
// — even when the signature + exp validate cleanly.
//
// Pure-function policy decision extracted into validateRefreshClaims so
// we can test it without standing up a real WS connection. If any of
// these "reject" cases starts returning "", the cross-namespace
// privilege-escalation surface re-opens.
func TestValidateRefreshClaims(t *testing.T) {
cases := []struct {
name string
claims *auth.JWTClaims
wsNamespace string
wantReject bool
}{
{
name: "same namespace + subject allowed",
claims: &auth.JWTClaims{Sub: "alice", Namespace: "anchat-test"},
wsNamespace: "anchat-test",
wantReject: false,
},
{
name: "DIFFERENT namespace rejected (HIGH #9)",
claims: &auth.JWTClaims{Sub: "user-from-B", Namespace: "namespace-B"},
wsNamespace: "namespace-A",
wantReject: true,
},
{
name: "empty namespace rejected (defends against foreign issuer)",
claims: &auth.JWTClaims{Sub: "alice", Namespace: ""},
wsNamespace: "anchat-test",
wantReject: true,
},
{
name: "empty subject rejected (anonymous swap would break auth)",
claims: &auth.JWTClaims{Sub: "", Namespace: "anchat-test"},
wsNamespace: "anchat-test",
wantReject: true,
},
{
name: "nil claims rejected (defensive)",
claims: nil,
wsNamespace: "anchat-test",
wantReject: true,
},
}
for _, tc := range cases {
t.Run(tc.name, func(t *testing.T) {
reason := validateRefreshClaims(tc.claims, tc.wsNamespace)
got := reason != ""
if got != tc.wantReject {
t.Errorf("validateRefreshClaims: got reject=%v (reason=%q); want reject=%v",
got, reason, tc.wantReject)
}
})
}
}
// TestHandleAuthRefresh_nilVerifier_returnsHandled verifies that when
// the gateway has no jwtVerifier wired (e.g. dev/test config), the
// handler still marks the frame as handled (so it's NOT forwarded to
// WASM) and acks with ok=false. Regression guard against accidentally
// letting the frame fall through to WASM as application data.
func TestHandleAuthRefresh_nilVerifier_returnsHandled(t *testing.T) {
h := &ServerlessHandlers{jwtVerifier: nil}
// Smoke the type switch — we can't run the real handler without a
// ws conn for the ack write, but the precondition check is the
// thing we're guarding.
if h.jwtVerifier != nil {
t.Fatal("test setup broken: jwtVerifier should be nil")
}
}
// containsBytes is a tiny local helper because bytes.Contains in the
// stdlib pulls the bytes package, which the test file would otherwise
// not need.
func containsBytes(haystack, needle []byte) bool {
if len(needle) == 0 {
return true
}
for i := 0; i+len(needle) <= len(haystack); i++ {
match := true
for j := range needle {
if haystack[i+j] != needle[j] {
match = false
break
}
}
if match {
return true
}
}
return false
}
func contains(haystack, needle string) bool {
return containsBytes([]byte(haystack), []byte(needle))
}

View File

@ -1,10 +1,13 @@
package serverless
import (
"bytes"
"context"
"encoding/json"
"net/http"
"time"
"github.com/DeBrosOfficial/network/pkg/gateway/auth"
"github.com/DeBrosOfficial/network/pkg/serverless"
"github.com/DeBrosOfficial/network/pkg/serverless/persistent"
"github.com/google/uuid"
@ -12,6 +15,39 @@ import (
"go.uber.org/zap"
)
// oramaControlFramePrefix is a cheap byte-level sniff for the WS
// control-frame envelope shape `{"__orama":"..."}`. We peek for this
// before JSON-decoding to keep the per-frame fast path free of
// json.Unmarshal cost — the vast majority of inbound frames are
// application traffic that goes straight to WASM. Bugboard #321.
var oramaControlFramePrefix = []byte(`"__orama"`)
// oramaControlFrame is the wire shape for gateway-handled control
// frames on a persistent WS. The single Type field discriminates;
// payload fields specific to each Type ride alongside.
//
// Today supports:
//
// {"__orama":"auth.refresh","jwt":"<new-token>"}
//
// Future types (e.g. "ping.app", "subscribe.status") follow the same
// shape. Reserve "__orama" as the namespace so application frames
// never collide.
type oramaControlFrame struct {
Type string `json:"__orama"`
JWT string `json:"jwt,omitempty"`
}
// oramaControlAck is the response shape sent back on the WS after a
// control frame is handled. Clients SHOULD await this before assuming
// the gateway has applied the change.
type oramaControlAck struct {
Type string `json:"__orama_ack"`
OK bool `json:"ok"`
Error string `json:"error,omitempty"`
Subject string `json:"subject,omitempty"` // populated on successful auth.refresh
}
// handlePersistentWebSocket runs the per-connection persistent function model.
// One WASM instance is bound to this WS for its entire lifetime. Frames are
// processed serially via the instance's inbound channel.
@ -58,20 +94,8 @@ func (h *ServerlessHandlers) handlePersistentWebSocket(
defer h.wsBridge.RemoveClient(context.Background(), clientID)
}
callerWallet := h.getWalletFromRequest(r)
callerIP := extractRemoteIP(r)
callerClaims := h.getCallerClaimsFromRequest(r)
invCtx := &serverless.InvocationContext{
FunctionID: fn.ID,
FunctionName: fn.Name,
Namespace: fn.Namespace,
CallerWallet: callerWallet,
CallerIP: callerIP,
CallerClaims: callerClaims,
WSClientID: clientID,
TriggerType: serverless.TriggerTypeWebSocket,
}
invCtx := h.buildPersistentInvocationContext(r, fn, clientID)
callerWallet := invCtx.CallerWallet
// Instantiate the persistent module. This compiles once (cached) and
// creates one wazero instance bound to this connection.
@ -91,6 +115,13 @@ func (h *ServerlessHandlers) handlePersistentWebSocket(
Namespace: fn.Namespace,
FrameTimeoutSec: fn.TimeoutSeconds,
MaxInflightFrames: fn.WSMaxInflightPerConn,
// Per-instance identity binding. The persistent.Instance attaches
// this to the ctx of every WASM-host call (ws_open / ws_frame /
// ws_close + nested function_invoke), so caller identity is
// race-free across concurrent persistent WS connections — fixes
// the cross-tenant identity-leak on the shared HostFunctions
// singleton (security audit follow-up to Layer 7 of Feature #73).
InvocationContext: invCtx,
}, h.logger)
if err != nil {
h.logger.Warn("persistent WS NewInstance failed",
@ -151,13 +182,37 @@ func (h *ServerlessHandlers) handlePersistentWebSocket(
}
}()
// Read loop — enqueue frames into the instance.
// Read loop — enqueue frames into the instance. Bugboard #321:
// gateway-handled control frames (e.g. {"__orama":"auth.refresh"})
// are intercepted here BEFORE submission so they don't reach WASM.
for {
_, frame, readErr := conn.ReadMessage()
if readErr != nil {
break
}
h.wsManager.RecordInbound(clientID, len(frame))
// Cheap byte-level prefix sniff so the per-frame fast path
// avoids json.Unmarshal for every application frame. Only
// frames carrying the `"__orama"` key get parsed.
if bytes.Contains(frame, oramaControlFramePrefix) {
handled, ackErr := h.handleOramaControlFrame(frame, fn, inst, namespace, clientID, conn)
if ackErr != nil {
h.logger.Warn("persistent WS: control-frame ack write failed",
zap.String("client_id", clientID),
zap.Error(ackErr))
// Don't kill the WS for an ack write failure — the
// client will time-out the ack and retry. Continue.
}
if handled {
continue // Don't forward control frames to WASM.
}
// Not actually a control frame (false-positive prefix
// match — e.g. a JSON string literal containing
// `"__orama"`); fall through and submit as a normal
// application frame.
}
if err := inst.Submit(frame); err != nil {
h.logger.Warn("persistent WS submit failed (queue full?)",
zap.String("client_id", clientID),
@ -175,3 +230,242 @@ func (h *ServerlessHandlers) handlePersistentWebSocket(
inst.Close(context.Background(), persistent.CloseReasonClientDisconnect)
_ = conn.Close()
}
// buildPersistentInvocationContext constructs the per-connection InvocationContext
// for a persistent WS instance. Extracted from handlePersistentWebSocket so the
// auth-field plumbing can be unit-tested without doing a real WS upgrade.
//
// IMPORTANT: this context is sticky for the lifetime of the connection — it is
// bound once at instantiation (pkg/serverless/engine.go InstantiatePersistent)
// and reused for every ws_open / ws_frame / ws_close call, as well as for any
// nested function_invoke call originating inside the WASM instance. Missing a
// field here (notably CallerJWTSubject) means every sub-function invoked via
// `oh.FunctionInvoke` sees an empty value for the missing field — Layer 7 of
// the WS bug chain (Feature #73 on bugboard; AnChat sync-deltas was returning
// AUTH_REQUIRED because oh.JwtSubjectUserID() was "" inside the sub-function).
//
// Keep this in sync with the stateless WS handler's InvokeRequest construction
// in ws_handler.go — they must populate the same auth-identity fields.
func (h *ServerlessHandlers) buildPersistentInvocationContext(
r *http.Request, fn *serverless.Function, clientID string,
) *serverless.InvocationContext {
return &serverless.InvocationContext{
FunctionID: fn.ID,
FunctionName: fn.Name,
Namespace: fn.Namespace,
CallerWallet: h.getWalletFromRequest(r),
CallerIP: extractRemoteIP(r),
CallerClaims: h.getCallerClaimsFromRequest(r),
CallerJWTSubject: h.getJWTSubjectFromRequest(r),
WSClientID: clientID,
TriggerType: serverless.TriggerTypeWebSocket,
}
}
// handleOramaControlFrame parses a frame as the orama control envelope
// and dispatches by type. Returns (handled=true, _) if the frame was a
// well-formed control frame (regardless of whether it succeeded);
// (false, nil) for false-positives where the byte sniff matched but
// the JSON shape isn't ours. The returned error reflects only the ack
// write — not the underlying control action (which surfaces via the
// ack body's ok/error fields).
//
// Bugboard #321: introduced for the auth.refresh path so persistent
// WS connections survive JWT rotation without a close+reconnect.
func (h *ServerlessHandlers) handleOramaControlFrame(
frame []byte,
fn *serverless.Function,
inst *persistent.Instance,
namespace, clientID string,
conn *websocket.Conn,
) (handled bool, ackErr error) {
var ctrl oramaControlFrame
if err := json.Unmarshal(frame, &ctrl); err != nil {
// Not JSON, or doesn't match our shape. Treat as application
// frame (false-positive on the prefix sniff).
return false, nil
}
if ctrl.Type == "" {
return false, nil
}
switch ctrl.Type {
case "auth.refresh":
return true, h.handleAuthRefresh(ctrl, fn, inst, namespace, clientID, conn)
default:
// Unknown control type — ack with an error so the client knows
// the frame was seen but ignored. Treat as handled (don't
// forward to WASM), since the `__orama` namespace is reserved.
return true, h.writeControlAck(conn, oramaControlAck{
Type: ctrl.Type,
OK: false,
Error: "unknown __orama control type",
})
}
}
// handleAuthRefresh validates the new JWT, swaps the persistent
// instance's invocation context atomically, and acks the client.
// On invalid JWT: ack with ok=false and a reason. Does NOT close the
// WS — the client can retry with a fresh token. Bugboard #321.
func (h *ServerlessHandlers) handleAuthRefresh(
ctrl oramaControlFrame,
fn *serverless.Function,
inst *persistent.Instance,
namespace, clientID string,
conn *websocket.Conn,
) error {
if h.jwtVerifier == nil {
return h.writeControlAck(conn, oramaControlAck{
Type: "auth.refresh",
OK: false,
Error: "mid-session auth refresh not supported on this gateway",
})
}
if ctrl.JWT == "" {
return h.writeControlAck(conn, oramaControlAck{
Type: "auth.refresh",
OK: false,
Error: "jwt field required",
})
}
claims, err := h.jwtVerifier.ParseAndVerifyJWT(ctrl.JWT)
if err != nil {
h.logger.Info("persistent WS: auth.refresh rejected (invalid jwt)",
zap.String("client_id", clientID),
zap.Error(err))
return h.writeControlAck(conn, oramaControlAck{
Type: "auth.refresh",
OK: false,
Error: "invalid or expired jwt: " + err.Error(),
})
}
if reason := validateRefreshClaims(claims, fn.Namespace); reason != "" {
h.logger.Warn("persistent WS: auth.refresh rejected",
zap.String("client_id", clientID),
zap.String("reason", reason),
zap.String("ws_namespace", fn.Namespace),
zap.String("jwt_namespace", claims.Namespace),
zap.String("jwt_subject", claims.Sub),
)
return h.writeControlAck(conn, oramaControlAck{
Type: "auth.refresh",
OK: false,
Error: reason,
})
}
// Audit log when the refreshed subject DIFFERS from the original
// (bug #321 audit LOW #8). Same-subject rotations are the common
// case (token renewal); cross-subject is legal but rare enough
// that operators benefit from seeing it in the audit trail.
prevSubject := ""
if cur := inst.CurrentInvocationContext(); cur != nil {
prevSubject = cur.CallerJWTSubject
}
if prevSubject != "" && prevSubject != claims.Sub {
h.logger.Info("persistent WS: auth.refresh swapping subject identity on socket",
zap.String("client_id", clientID),
zap.String("previous_subject", prevSubject),
zap.String("new_subject", claims.Sub),
)
}
// Build a fresh InvocationContext with the new identity. Preserve
// the connection-scoped fields (FunctionID/Name, Namespace,
// WSClientID, CallerIP, TriggerType) — those don't change. Wallet
// resolution follows the same precedence as the original upgrade:
// JWT subject is the source of truth here since the caller is
// proving fresh identity.
customClaims := map[string]string{}
for k, v := range claims.Custom {
customClaims[k] = v
}
newInvCtx := &serverless.InvocationContext{
FunctionID: fn.ID,
FunctionName: fn.Name,
Namespace: fn.Namespace,
CallerWallet: claims.Sub,
CallerClaims: customClaims,
CallerJWTSubject: claims.Sub,
WSClientID: clientID,
TriggerType: serverless.TriggerTypeWebSocket,
}
if err := inst.UpdateInvocationContext(newInvCtx); err != nil {
// nil-guard inside UpdateInvocationContext is the only error
// path today; we just built newInvCtx with non-nil fields so
// this shouldn't fire. If it does, surface as an internal error.
h.logger.Error("persistent WS: UpdateInvocationContext failed",
zap.String("client_id", clientID),
zap.Error(err))
return h.writeControlAck(conn, oramaControlAck{
Type: "auth.refresh",
OK: false,
Error: "internal: failed to apply refresh",
})
}
h.logger.Info("persistent WS: auth.refresh applied",
zap.String("client_id", clientID),
zap.String("namespace", namespace),
zap.String("new_subject", claims.Sub))
return h.writeControlAck(conn, oramaControlAck{
Type: "auth.refresh",
OK: true,
Subject: claims.Sub,
})
}
// validateRefreshClaims is the policy decision for whether a
// post-validation JWT may be installed on a persistent WS via the
// auth.refresh control frame. Returns "" if allowed, or a
// human-readable reason string suitable for the ack body.
//
// SECURITY (bug #321 audit HIGH #9): reject JWTs minted for a
// DIFFERENT namespace. Without this check, an attacker who
// legitimately owns an account in namespace B could rotate their
// already-established namespace-A WS to run as their B-subject
// against A's WASM/secrets/data. The upgrade-time auth middleware
// already enforces namespace match; this preserves the invariant
// across mid-session rotations.
//
// Empty claims.Namespace is treated as a hard reject — JWTs minted
// by this gateway always populate it; an empty value either means
// a foreign issuer slipped through or a malformed token. Either
// way, refuse rather than silently default to the WS's namespace.
//
// Extracted as a pure function so the policy decision can be
// regression-tested without a live WS connection.
func validateRefreshClaims(claims *auth.JWTClaims, wsNamespace string) string {
if claims == nil {
return "internal: nil claims after verification"
}
if claims.Namespace == "" {
return "jwt missing namespace claim"
}
if claims.Namespace != wsNamespace {
return "jwt namespace does not match websocket namespace"
}
if claims.Sub == "" {
// Subject-less JWTs would swap the WS into an anonymous
// identity, breaking every downstream auth check. Reject.
return "jwt missing subject claim"
}
return ""
}
// writeControlAck JSON-encodes the ack and writes it as a single text
// message back to the client. Bounded write deadline so a slow client
// doesn't block the read loop.
func (h *ServerlessHandlers) writeControlAck(conn *websocket.Conn, ack oramaControlAck) error {
payload, err := json.Marshal(ack)
if err != nil {
return err
}
_ = conn.SetWriteDeadline(time.Now().Add(5 * time.Second))
defer conn.SetWriteDeadline(time.Time{})
return conn.WriteMessage(websocket.TextMessage, payload)
}

View File

@ -0,0 +1,157 @@
package serverless
import (
"context"
"net/http"
"net/http/httptest"
"testing"
"github.com/DeBrosOfficial/network/pkg/gateway/auth"
"github.com/DeBrosOfficial/network/pkg/gateway/ctxkeys"
"github.com/DeBrosOfficial/network/pkg/serverless"
)
// TestBuildPersistentInvocationContext_PropagatesJWTSubject is the regression
// guard for Layer 7 of the WS bug chain (Feature #73 on bugboard).
//
// Symptom: AnChat's persistent rpc-router function called function_invoke into
// a sub-function. Inside the sub-function, oh.JwtSubjectUserID() returned ""
// and the sub-function bailed with AUTH_REQUIRED — even though the WS upgrade
// itself was JWT-authenticated and the calling user was identified.
//
// Root cause: handlePersistentWebSocket built the per-connection
// InvocationContext WITHOUT calling getJWTSubjectFromRequest, so
// CallerJWTSubject was always "". HostFunctions.FunctionInvoke correctly
// propagated cur.CallerJWTSubject — but cur.CallerJWTSubject was empty to
// begin with. The stateless WS handler (ws_handler.go) had always done this
// correctly; the persistent handler diverged silently.
//
// If a future refactor drops the field again, this test fails loud — the
// AnChat sync flow would break end-to-end one more time.
func TestBuildPersistentInvocationContext_PropagatesJWTSubject(t *testing.T) {
h := newTestHandlers(nil)
// Simulate a JWT-authenticated request: middleware would have stashed
// the *auth.JWTClaims on the request context under ctxkeys.JWT.
claims := &auth.JWTClaims{
Sub: "wallet-from-jwt-subject",
Custom: map[string]string{"role": "admin"},
}
req := httptest.NewRequest(http.MethodGet, "/", nil)
req = req.WithContext(context.WithValue(req.Context(), ctxkeys.JWT, claims))
fn := &serverless.Function{
ID: "fn-id",
Name: "rpc-router",
Namespace: "anchat",
}
clientID := "ws-client-uuid"
got := h.buildPersistentInvocationContext(req, fn, clientID)
if got == nil {
t.Fatal("buildPersistentInvocationContext returned nil")
}
// Layer 7 invariant: CallerJWTSubject must be populated. Without this
// field, every function_invoke from inside a persistent WS instance
// loses the caller identity — see comment on the helper for the full
// story.
if got.CallerJWTSubject != "wallet-from-jwt-subject" {
t.Errorf("CallerJWTSubject = %q; want %q (Layer 7 regression — see Feature #73)",
got.CallerJWTSubject, "wallet-from-jwt-subject")
}
// Other identity fields the persistent invCtx is responsible for. These
// exercise a smaller surface than the full handler but cover the same
// wiring contract.
if got.CallerWallet == "" {
t.Error("CallerWallet should be populated from JWT (got empty)")
}
if got.WSClientID != clientID {
t.Errorf("WSClientID = %q; want %q", got.WSClientID, clientID)
}
if got.FunctionID != fn.ID {
t.Errorf("FunctionID = %q; want %q", got.FunctionID, fn.ID)
}
if got.FunctionName != fn.Name {
t.Errorf("FunctionName = %q; want %q", got.FunctionName, fn.Name)
}
if got.Namespace != fn.Namespace {
t.Errorf("Namespace = %q; want %q", got.Namespace, fn.Namespace)
}
if got.TriggerType != serverless.TriggerTypeWebSocket {
t.Errorf("TriggerType = %q; want %q", got.TriggerType, serverless.TriggerTypeWebSocket)
}
if got.CallerClaims["role"] != "admin" {
t.Errorf("CallerClaims[role] = %q; want %q", got.CallerClaims["role"], "admin")
}
}
// TestBuildPersistentInvocationContext_NoJWT covers the non-authenticated
// path — namespace-key auth or unauthenticated. CallerJWTSubject must be ""
// (NOT crash, NOT panic). Everything else is whatever the helpers return for
// a bare request.
func TestBuildPersistentInvocationContext_NoJWT(t *testing.T) {
h := newTestHandlers(nil)
req := httptest.NewRequest(http.MethodGet, "/", nil)
fn := &serverless.Function{
ID: "fn-id",
Name: "f",
Namespace: "ns",
}
got := h.buildPersistentInvocationContext(req, fn, "client-id")
if got == nil {
t.Fatal("buildPersistentInvocationContext returned nil")
}
if got.CallerJWTSubject != "" {
t.Errorf("CallerJWTSubject should be empty without JWT, got %q", got.CallerJWTSubject)
}
if got.WSClientID != "client-id" {
t.Errorf("WSClientID = %q; want %q", got.WSClientID, "client-id")
}
if got.TriggerType != serverless.TriggerTypeWebSocket {
t.Errorf("TriggerType = %q; want %q", got.TriggerType, serverless.TriggerTypeWebSocket)
}
}
// TestBuildPersistentInvocationContext_MatchesStatelessHandler is a structural
// guard: the persistent and stateless WS paths must populate the same
// auth-identity fields. The two paths diverged silently for ~6 months; this
// test makes any future divergence loud.
//
// We compare the field set (not values — values come from the same request
// helpers and are exercised in the cases above).
func TestBuildPersistentInvocationContext_MatchesStatelessHandler(t *testing.T) {
h := newTestHandlers(nil)
claims := &auth.JWTClaims{Sub: "test-subject"}
req := httptest.NewRequest(http.MethodGet, "/", nil)
req = req.WithContext(context.WithValue(req.Context(), ctxkeys.JWT, claims))
fn := &serverless.Function{ID: "id", Name: "n", Namespace: "ns"}
got := h.buildPersistentInvocationContext(req, fn, "cid")
// Compare against the helpers the stateless path uses on every frame
// (ws_handler.go:140-145). If any of these returns a value but doesn't
// land in the persistent invCtx, that's the same class of bug as
// Layer 7.
if got.CallerWallet != h.getWalletFromRequest(req) {
t.Errorf("CallerWallet drift: persistent=%q, helper=%q",
got.CallerWallet, h.getWalletFromRequest(req))
}
if got.CallerJWTSubject != h.getJWTSubjectFromRequest(req) {
t.Errorf("CallerJWTSubject drift: persistent=%q, helper=%q",
got.CallerJWTSubject, h.getJWTSubjectFromRequest(req))
}
// Claims comparison: deep-equal isn't worth the ceremony for nil-vs-nil;
// just check both branches produce the same nilness.
statelessClaims := h.getCallerClaimsFromRequest(req)
if (got.CallerClaims == nil) != (statelessClaims == nil) {
t.Errorf("CallerClaims nilness drift: persistent=%v, helper=%v",
got.CallerClaims, statelessClaims)
}
}

View File

@ -107,6 +107,14 @@ func (m *mockRQLiteClient) BatchWithSeq(ctx context.Context, namespace string, o
return res, 1, err
}
func (m *mockRQLiteClient) BatchQuery(ctx context.Context, ops []rqlite.BatchOp) ([]rqlite.OpResult, error) {
out := make([]rqlite.OpResult, len(ops))
for i := range ops {
out[i] = rqlite.OpResult{Kind: rqlite.BatchOpQuery}
}
return out, nil
}
type mockIPFSClient struct {
AddFunc func(ctx context.Context, r io.Reader, filename string) (*ipfs.AddResponse, error)
AddDirectoryFunc func(ctx context.Context, dirPath string) (*ipfs.AddResponse, error)

View File

@ -55,17 +55,17 @@ type InstanceSpawner struct {
// GatewayInstance represents a running Gateway instance for a namespace
type GatewayInstance struct {
Namespace string
NodeID string
HTTPPort int
BaseDomain string
RQLiteDSN string // Connection to namespace RQLite
OlricServers []string // Connection to namespace Olric
ConfigPath string
PID int
StartedAt time.Time
cmd *exec.Cmd
logger *zap.Logger
Namespace string
NodeID string
HTTPPort int
BaseDomain string
RQLiteDSN string // Connection to namespace RQLite
OlricServers []string // Connection to namespace Olric
ConfigPath string
PID int
StartedAt time.Time
cmd *exec.Cmd
logger *zap.Logger
// mu protects mutable state accessed concurrently by the monitor goroutine.
mu sync.RWMutex
@ -75,16 +75,16 @@ type GatewayInstance struct {
// InstanceConfig holds configuration for spawning a Gateway instance
type InstanceConfig struct {
Namespace string // Namespace name (e.g., "alice")
NodeID string // Physical node ID
HTTPPort int // HTTP API port
BaseDomain string // Base domain (e.g., "orama-devnet.network")
RQLiteDSN string // RQLite connection DSN (e.g., "http://localhost:10000")
GlobalRQLiteDSN string // Global RQLite DSN for API key validation (empty = use RQLiteDSN)
OlricServers []string // Olric server addresses
OlricTimeout time.Duration // Timeout for Olric operations
NodePeerID string // Physical node's peer ID for home node management
DataDir string // Data directory for deployments, SQLite, etc.
Namespace string // Namespace name (e.g., "alice")
NodeID string // Physical node ID
HTTPPort int // HTTP API port
BaseDomain string // Base domain (e.g., "orama-devnet.network")
RQLiteDSN string // RQLite connection DSN (e.g., "http://localhost:10000")
GlobalRQLiteDSN string // Global RQLite DSN for API key validation (empty = use RQLiteDSN)
OlricServers []string // Olric server addresses
OlricTimeout time.Duration // Timeout for Olric operations
NodePeerID string // Physical node's peer ID for home node management
DataDir string // Data directory for deployments, SQLite, etc.
// IPFS configuration for storage endpoints
IPFSClusterAPIURL string // IPFS Cluster API URL (e.g., "http://localhost:9094")
IPFSAPIURL string // IPFS API URL (e.g., "http://localhost:5001")
@ -95,15 +95,30 @@ type InstanceConfig struct {
SFUPort int // SFU signaling port on this node
TURNDomain string // TURN server domain (e.g., "turn.ns-alice.orama-devnet.network")
TURNSecret string // TURN shared secret for credential generation
// TURNStealthDomain is the neutral stealth TURNS host (feat-124,
// cdn-<hash>.<base-domain>). Non-empty only when webrtc stealth is
// enabled for the namespace; turn.credentials then advertises
// `turns:<TURNStealthDomain>:443` as the final URI-ladder rung.
TURNStealthDomain string
// SecretsEncryptionKey is the host-wide AES-256 serverless secrets
// encryption key (hex-encoded). Bugboard #837 follow-up: the host gateway
// receives this via gateway.Config but spawned namespace gateways never
// did, so `function secrets list` returned 501 on namespaces. It is the
// SAME value on every node — read once from the host's
// secrets/secrets-encryption-key file — and must be identical across the
// namespace cluster so a secret encrypted by one gateway decrypts on
// another. Empty means secrets management stays disabled (fail-loud).
SecretsEncryptionKey string
}
// GatewayYAMLWebRTC represents the webrtc section of the gateway YAML config.
// Must match yamlWebRTCCfg in cmd/gateway/config.go.
type GatewayYAMLWebRTC struct {
Enabled bool `yaml:"enabled"`
SFUPort int `yaml:"sfu_port,omitempty"`
TURNDomain string `yaml:"turn_domain,omitempty"`
TURNSecret string `yaml:"turn_secret,omitempty"`
Enabled bool `yaml:"enabled"`
SFUPort int `yaml:"sfu_port,omitempty"`
TURNDomain string `yaml:"turn_domain,omitempty"`
TURNSecret string `yaml:"turn_secret,omitempty"`
TURNStealthDomain string `yaml:"turn_stealth_domain,omitempty"`
}
// GatewayYAMLConfig represents the gateway YAML configuration structure
@ -125,6 +140,13 @@ type GatewayYAMLConfig struct {
IPFSTimeout string `yaml:"ipfs_timeout,omitempty"`
IPFSReplicationFactor int `yaml:"ipfs_replication_factor,omitempty"`
WebRTC GatewayYAMLWebRTC `yaml:"webrtc,omitempty"`
// SecretsEncryptionKey carries the host's serverless secrets encryption
// key into the spawned namespace gateway so it can decrypt/encrypt
// function secrets (bugboard #837 follow-up). The standalone gateway
// binary loads this back into gateway.Config.SecretsEncryptionKey on
// startup. Because this is key material, generateConfig writes the file
// 0600. Empty omits the field (secrets management stays disabled).
SecretsEncryptionKey string `yaml:"secrets_encryption_key,omitempty"`
// ClusterSecretPath points to the host's cluster-secret file. Bug #215
// follow-up: namespace gateways spawned by systemd previously had no
// way to access the cluster secret, so they fell back to per-node
@ -209,9 +231,9 @@ func (is *InstanceSpawner) SpawnInstance(ctx context.Context, cfg InstanceConfig
// Find the gateway binary - look in common locations
var gatewayBinary string
possiblePaths := []string{
"./bin/gateway", // Development build
"/usr/local/bin/orama-gateway", // System-wide install
"/opt/orama/bin/gateway", // Package install
"./bin/gateway", // Development build
"/usr/local/bin/orama-gateway", // System-wide install
"/opt/orama/bin/gateway", // Package install
}
for _, path := range possiblePaths {
@ -318,11 +340,13 @@ func (is *InstanceSpawner) generateConfig(configPath string, cfg InstanceConfig,
IPFSAPIURL: cfg.IPFSAPIURL,
IPFSReplicationFactor: cfg.IPFSReplicationFactor,
WebRTC: GatewayYAMLWebRTC{
Enabled: cfg.WebRTCEnabled,
SFUPort: cfg.SFUPort,
TURNDomain: cfg.TURNDomain,
TURNSecret: cfg.TURNSecret,
Enabled: cfg.WebRTCEnabled,
SFUPort: cfg.SFUPort,
TURNDomain: cfg.TURNDomain,
TURNSecret: cfg.TURNSecret,
TURNStealthDomain: cfg.TURNStealthDomain,
},
SecretsEncryptionKey: cfg.SecretsEncryptionKey,
}
// Set Olric timeout if provided
if cfg.OlricTimeout > 0 {
@ -341,12 +365,24 @@ func (is *InstanceSpawner) generateConfig(configPath string, cfg InstanceConfig,
}
}
if err := os.WriteFile(configPath, data, 0644); err != nil {
// 0600: this YAML now embeds the serverless secrets encryption key
// (bugboard #837), so it must not be world/group readable.
if err := os.WriteFile(configPath, data, 0600); err != nil {
return &InstanceError{
Message: "failed to write Gateway config",
Cause: err,
}
}
// WriteFile's mode only applies on CREATE — a pre-existing file (e.g.
// written 0644 by an older release) keeps its old perms on rewrite.
// Converge explicitly so upgraded nodes don't leave the embedded
// secrets key group/world-readable.
if err := os.Chmod(configPath, 0600); err != nil {
return &InstanceError{
Message: "failed to set Gateway config permissions",
Cause: err,
}
}
return nil
}

View File

@ -1,9 +1,12 @@
package gateway
import (
"os"
"path/filepath"
"strings"
"testing"
"go.uber.org/zap"
"gopkg.in/yaml.v3"
)
@ -65,6 +68,114 @@ func TestGatewayYAMLConfig_clusterSecretPathRoundTrip(t *testing.T) {
}
}
// TestGatewayYAMLConfig_secretsEncryptionKeyRoundTrip is the regression test
// for the bugboard #837 follow-up: the host gateway received the serverless
// secrets encryption key but namespace gateways spawned via systemd did not,
// because the YAML schema had no field to carry it — so `function secrets
// list` returned 501 on those namespaces. This guards the yaml tag and that
// the standalone gateway's yamlCfg mirror can read it back.
func TestGatewayYAMLConfig_secretsEncryptionKeyRoundTrip(t *testing.T) {
const key = "0123456789abcdef0123456789abcdef0123456789abcdef0123456789abcdef"
cfg := GatewayYAMLConfig{
ListenAddr: ":6001",
ClientNamespace: "anchat-test",
RQLiteDSN: "http://localhost:10000",
OlricServers: []string{"localhost:3320"},
SecretsEncryptionKey: key,
}
out, err := yaml.Marshal(cfg)
if err != nil {
t.Fatalf("marshal: %v", err)
}
if !strings.Contains(string(out), "secrets_encryption_key: "+key) {
t.Fatalf("YAML output missing expected secrets_encryption_key line:\n%s", out)
}
// Mirror of cmd/gateway/config.go's yamlCfg so this test catches drift
// between the two declarations (the standalone gateway uses strict
// decoding and would reject an unknown field).
type webrtc struct {
Enabled bool `yaml:"enabled"`
SFUPort int `yaml:"sfu_port"`
TURNDomain string `yaml:"turn_domain"`
TURNSecret string `yaml:"turn_secret"`
}
type yamlCfgMirror struct {
ListenAddr string `yaml:"listen_addr"`
ClientNamespace string `yaml:"client_namespace"`
RQLiteDSN string `yaml:"rqlite_dsn"`
OlricServers []string `yaml:"olric_servers"`
WebRTC webrtc `yaml:"webrtc"`
SecretsEncryptionKey string `yaml:"secrets_encryption_key"`
ClusterSecretPath string `yaml:"cluster_secret_path"`
}
var parsed yamlCfgMirror
if err := yaml.Unmarshal(out, &parsed); err != nil {
t.Fatalf("unmarshal: %v", err)
}
if parsed.SecretsEncryptionKey != key {
t.Errorf("round-trip mismatch: got %q, want %q", parsed.SecretsEncryptionKey, key)
}
}
// TestGatewayYAMLConfig_secretsKeyOmitWhenEmpty: a host with no secrets key
// (legacy/test rigs) must not emit a stray secrets_encryption_key line that
// operators could mistake for an empty-key directive.
func TestGatewayYAMLConfig_secretsKeyOmitWhenEmpty(t *testing.T) {
cfg := GatewayYAMLConfig{
ListenAddr: ":6001",
ClientNamespace: "ns",
RQLiteDSN: "http://localhost:10000",
OlricServers: []string{"localhost:3320"},
// SecretsEncryptionKey intentionally empty.
}
out, err := yaml.Marshal(cfg)
if err != nil {
t.Fatalf("marshal: %v", err)
}
if strings.Contains(string(out), "secrets_encryption_key") {
t.Errorf("empty SecretsEncryptionKey should be omitted from YAML; got:\n%s", out)
}
}
// TestGenerateConfig_writesSecretsKeyWith0600 verifies the spawned namespace
// gateway YAML carries the secrets key AND is written 0600 (the file now
// holds key material — bugboard #837).
func TestGenerateConfig_writesSecretsKeyWith0600(t *testing.T) {
const key = "0123456789abcdef0123456789abcdef0123456789abcdef0123456789abcdef"
dir := t.TempDir()
is := NewInstanceSpawner(dir, zap.NewNop())
configPath := filepath.Join(dir, "gateway-node-1.yaml")
cfg := InstanceConfig{
Namespace: "anchat-test",
NodeID: "node-1",
HTTPPort: 6001,
RQLiteDSN: "http://localhost:10000",
OlricServers: []string{"localhost:3320"},
SecretsEncryptionKey: key,
}
if err := is.generateConfig(configPath, cfg, dir); err != nil {
t.Fatalf("generateConfig: %v", err)
}
info, err := os.Stat(configPath)
if err != nil {
t.Fatalf("stat: %v", err)
}
if perm := info.Mode().Perm(); perm != 0600 {
t.Errorf("config perms = %o, want 0600 (file holds the secrets key)", perm)
}
data, err := os.ReadFile(configPath)
if err != nil {
t.Fatalf("read: %v", err)
}
if !strings.Contains(string(data), "secrets_encryption_key: "+key) {
t.Errorf("generated config missing secrets_encryption_key:\n%s", data)
}
}
// TestGatewayYAMLConfig_omitWhenEmpty: when the host has no cluster secret,
// the field is omitted from the YAML so legacy single-node test rigs don't
// see a stray "cluster_secret_path: " line that operators might mistake for

View File

@ -36,6 +36,12 @@ func (g *Gateway) Close() {
g.cronScheduler.Stop()
}
// Stop the pubsub dispatcher's periodic refresh goroutine. libp2p
// subscriptions die naturally with the client teardown below.
if g.pubsubDispatcher != nil {
g.pubsubDispatcher.Stop()
}
// Drain persistent WebSocket instances. Each instance gets a slice of
// the 30s budget; ws_close on each is best-effort.
if g.persistentWSManager != nil {

View File

@ -660,6 +660,18 @@ func isPublicPath(p string) bool {
return true
}
// Namespace WebRTC management endpoints (enable/disable/status). Auth is
// handled INSIDE the handlers by the X-Orama-Internal-Auth header +
// WireGuard-peer source check (same as spawn/repair above). Without this
// exemption the API-key middleware rejects them with "missing API key"
// before the handler's internal-auth check runs, making the internal
// endpoints unreachable — so `orama namespace enable webrtc` had no
// working path (the public endpoint hits a gateway without the WebRTC
// manager wired). Bugboard: internal webrtc mgmt endpoints unreachable.
if strings.HasPrefix(p, "/v1/internal/namespace/webrtc/") {
return true
}
// Vault proxy endpoints (no auth — rate-limited per identity hash within handler)
if strings.HasPrefix(p, "/v1/vault/") {
return true
@ -1103,18 +1115,110 @@ func (g *Gateway) handleNamespaceGatewayRequest(w http.ResponseWriter, r *http.R
// Validate auth against main cluster RQLite BEFORE proxying
// This ensures API keys work even though they're not in the namespace's RQLite
validatedNamespace, validatedClaims, authErr := g.validateAuthForNamespaceProxy(r)
if authErr != "" && !isPublicPath(r.URL.Path) {
isWS := isWebSocketUpgrade(r)
isPublic := isPublicPath(r.URL.Path)
// Bug #240/#249 root-cause hardening: previously, when
// validateAuthForNamespaceProxy returned an empty namespace AND empty
// error (i.e. "no credentials found"), the request fell through to a
// silent forward to the namespace gateway WITHOUT internal-auth
// headers. The namespace gateway then rejected the request with 401
// "missing API key" in ~60µs. From the client's perspective the 401
// appeared opaque; from our side the failure was logged only on the
// namespace gateway (which itself can't validate API keys — they
// live in the main cluster RQLite). This created a confusing
// debugging experience and was the root cause of AnChat's
// "intermittent 401" reports on the WS path.
//
// Two parts to the fix:
// 1. Reject at MAIN when no credentials were extractable AND the
// path requires auth. Surfaces the failure with a clear message
// AT the gateway tier that actually knows about API keys.
// 2. Log every WS upgrade auth outcome with enough context to
// diagnose the intermittent reports we've been seeing
// (presence of relevant query params, headers we care about,
// and the actor IP). Logged at debug level for success and
// warn for the reject path so steady-state noise stays low.
if authErr != "" && !isPublic {
if isWS {
g.logger.ComponentWarn(logging.ComponentGeneral,
"namespace-proxy WS upgrade rejected: auth error",
zap.String("namespace_target", namespaceName),
zap.String("auth_err", authErr),
zap.String("path", r.URL.Path),
zap.String("client_ip", getClientIP(r)),
zap.Bool("has_api_key_query", r.URL.Query().Get("api_key") != ""),
zap.Bool("has_token_query", r.URL.Query().Get("token") != ""),
zap.Bool("has_jwt_query", r.URL.Query().Get("jwt") != ""),
zap.Bool("has_authz_header", r.Header.Get("Authorization") != ""),
zap.Bool("has_xapikey_header", r.Header.Get("X-API-Key") != ""),
zap.String("connection_header", r.Header.Get("Connection")),
zap.String("upgrade_header", r.Header.Get("Upgrade")),
zap.String("user_agent", r.Header.Get("User-Agent")),
)
}
w.Header().Set("WWW-Authenticate", "Bearer error=\"invalid_token\"")
writeError(w, http.StatusUnauthorized, authErr)
return
}
// No-credentials path: previously fell through to silent forward.
// Now: reject at main with diagnostic context. Namespace gateways
// cannot validate API keys themselves (no shared rqlite for them),
// so forwarding unauthenticated requests can only ever produce
// opaque 401s downstream.
if validatedNamespace == "" && !isPublic {
g.logger.ComponentWarn(logging.ComponentGeneral,
"namespace-proxy request rejected: no credentials extracted",
zap.String("namespace_target", namespaceName),
zap.String("path", r.URL.Path),
zap.Bool("is_ws_upgrade", isWS),
zap.String("client_ip", getClientIP(r)),
zap.Bool("has_api_key_query", r.URL.Query().Get("api_key") != ""),
zap.Bool("has_token_query", r.URL.Query().Get("token") != ""),
zap.Bool("has_jwt_query", r.URL.Query().Get("jwt") != ""),
zap.Bool("has_authz_header", r.Header.Get("Authorization") != ""),
zap.Bool("has_xapikey_header", r.Header.Get("X-API-Key") != ""),
zap.String("connection_header", r.Header.Get("Connection")),
zap.String("upgrade_header", r.Header.Get("Upgrade")),
zap.String("origin", r.Header.Get("Origin")),
zap.String("user_agent", r.Header.Get("User-Agent")),
zap.Int("raw_query_len", len(r.URL.RawQuery)),
)
w.Header().Set("WWW-Authenticate", "Bearer realm=\"gateway\"")
writeError(w, http.StatusUnauthorized,
"authentication required for namespace endpoint (no api_key/token/jwt extracted)")
return
}
// If auth succeeded, ensure the API key belongs to the target namespace
if validatedNamespace != "" && validatedNamespace != namespaceName {
g.logger.ComponentWarn(logging.ComponentGeneral,
"namespace-proxy request rejected: API key namespace mismatch",
zap.String("namespace_target", namespaceName),
zap.String("validated_namespace", validatedNamespace),
zap.String("path", r.URL.Path),
zap.Bool("is_ws_upgrade", isWS),
zap.String("client_ip", getClientIP(r)),
)
writeError(w, http.StatusForbidden, "API key does not belong to this namespace")
return
}
// Success-path diagnostic for WS upgrades. Logged at debug to keep
// the steady-state log volume low; flip the gateway log level to
// `debug` to capture per-upgrade audit trail when reproducing
// AnChat-style intermittent failures.
if isWS {
g.logger.ComponentDebug(logging.ComponentGeneral,
"namespace-proxy WS upgrade authenticated, forwarding",
zap.String("namespace", namespaceName),
zap.String("path", r.URL.Path),
zap.String("client_ip", getClientIP(r)),
zap.Bool("has_jwt_claims", validatedClaims != nil),
)
}
// Check middleware cache for namespace gateway targets
type namespaceGatewayTarget struct {
ip string

View File

@ -171,6 +171,15 @@ func TestIsPublicPath(t *testing.T) {
{"internal join", "/v1/internal/join", true},
{"internal namespace spawn", "/v1/internal/namespace/spawn", true},
{"internal namespace repair", "/v1/internal/namespace/repair", true},
// Internal WebRTC mgmt endpoints — exempt from API-key middleware
// (handler enforces internal-auth header + WireGuard peer). Without
// these, `orama namespace enable webrtc` had no working path.
{"internal webrtc enable", "/v1/internal/namespace/webrtc/enable", true},
{"internal webrtc disable", "/v1/internal/namespace/webrtc/disable", true},
{"internal webrtc status", "/v1/internal/namespace/webrtc/status", true},
// Guard: the PUBLIC webrtc mgmt path must STILL require auth (only
// the /internal/ variant is exempt).
{"public webrtc enable still requires auth", "/v1/namespace/webrtc/enable", false},
{"phantom session", "/v1/auth/phantom/session", true},
{"phantom complete", "/v1/auth/phantom/complete", true},

View File

@ -6,6 +6,7 @@ import (
"fmt"
"os"
"os/exec"
"strconv"
"strings"
"time"
@ -16,29 +17,33 @@ import (
"go.uber.org/zap"
)
// PeerDiscovery manages namespace gateway peer discovery via RQLite
// PeerDiscovery manages namespace gateway peer discovery via RQLite.
//
// The libp2p listen port is NOT stored here — it's derived live from
// pd.host.Addrs() at register time. Previously this struct held a
// `listenPort` field populated from the gateway's HTTP API port (which
// silently broke all cross-node libp2p connections — see comment on
// registerSelf). Don't add it back.
type PeerDiscovery struct {
host host.Host
rqliteDB *sql.DB
nodeID string
listenPort int
namespace string
logger *zap.Logger
host host.Host
rqliteDB *sql.DB
nodeID string
namespace string
logger *zap.Logger
// Stop channel for background goroutines
stopCh chan struct{}
}
// NewPeerDiscovery creates a new peer discovery manager
func NewPeerDiscovery(h host.Host, rqliteDB *sql.DB, nodeID string, listenPort int, namespace string, logger *zap.Logger) *PeerDiscovery {
// NewPeerDiscovery creates a new peer discovery manager.
func NewPeerDiscovery(h host.Host, rqliteDB *sql.DB, nodeID string, namespace string, logger *zap.Logger) *PeerDiscovery {
return &PeerDiscovery{
host: h,
rqliteDB: rqliteDB,
nodeID: nodeID,
listenPort: listenPort,
namespace: namespace,
logger: logger,
stopCh: make(chan struct{}),
host: h,
rqliteDB: rqliteDB,
nodeID: nodeID,
namespace: namespace,
logger: logger,
stopCh: make(chan struct{}),
}
}
@ -129,8 +134,26 @@ func (pd *PeerDiscovery) registerSelf(ctx context.Context) error {
return fmt.Errorf("failed to get WireGuard IP: %w", err)
}
// Build multiaddr: /ip4/<wireguard_ip>/tcp/<port>/p2p/<peer_id>
multiaddr := fmt.Sprintf("/ip4/%s/tcp/%d/p2p/%s", wireguardIP, pd.listenPort, peerID)
// CRITICAL: we used to publish `pd.listenPort` here, which is the gateway's
// HTTP API port (e.g. 10004). Other gateways would read this multiaddr from
// rqlite, dial /ip4/<wg>/tcp/10004, hit the HTTP server, receive
// `HTTP/1.1 400 Bad Request`, and fail the libp2p multistream handshake
// with "message did not have trailing newline". The result: cross-node
// libp2p mesh had 0 connected peers cluster-wide and cross-node pubsub
// silently dropped 100% of messages.
//
// The actual libp2p port is OS-assigned at startup (client.go listens on
// `/ip4/0.0.0.0/tcp/0`), so we must derive it from the live host instead
// of the gateway's HTTP config. The listener binds 0.0.0.0 so it accepts
// traffic on the WG interface even though libp2p only reports loopback +
// public-routable addresses in host.Addrs().
libp2pPort, err := extractLibp2pTCPPort(pd.host.Addrs())
if err != nil {
return fmt.Errorf("failed to extract libp2p TCP port from host addresses: %w", err)
}
// Build multiaddr: /ip4/<wireguard_ip>/tcp/<libp2p_port>/p2p/<peer_id>
multiaddr := fmt.Sprintf("/ip4/%s/tcp/%d/p2p/%s", wireguardIP, libp2pPort, peerID)
query := `
INSERT OR REPLACE INTO _namespace_libp2p_peers
@ -138,11 +161,14 @@ func (pd *PeerDiscovery) registerSelf(ctx context.Context) error {
VALUES (?, ?, ?, ?, ?, ?)
`
// We persist libp2pPort in the listen_port column too — the column is
// informational metadata for operators (the multiaddr is authoritative),
// and keeping it consistent avoids future debugging confusion.
_, err = pd.rqliteDB.ExecContext(ctx, query,
peerID,
multiaddr,
pd.nodeID,
pd.listenPort,
libp2pPort,
pd.namespace,
time.Now().UTC())
@ -153,11 +179,47 @@ func (pd *PeerDiscovery) registerSelf(ctx context.Context) error {
pd.logger.Info("Registered self in peer discovery",
zap.String("peer_id", peerID),
zap.String("multiaddr", multiaddr),
zap.String("node_id", pd.nodeID))
zap.String("node_id", pd.nodeID),
zap.Int("libp2p_port", libp2pPort))
return nil
}
// extractLibp2pTCPPort returns the TCP port the libp2p host is actually
// listening on, by parsing the host's reported listen addresses.
//
// `host.Addrs()` returns multiaddrs like:
//
// /ip4/127.0.0.1/tcp/43043
// /ip4/217.76.56.2/tcp/43043
//
// All entries share the same port (libp2p binds 0.0.0.0:RANDOM_PORT and
// reports one entry per detected interface IP). We take the first `/tcp/`
// component we find.
//
// Note: the WireGuard IP (10.0.0.x) does NOT appear in host.Addrs() because
// libp2p filters its own address enumeration. The listener IS bound to all
// interfaces including wg0, so the port is still reachable on the WG IP —
// we just have to combine the port we extract here with the WG IP we get
// separately (via getWireGuardIP).
func extractLibp2pTCPPort(addrs []multiaddr.Multiaddr) (int, error) {
for _, a := range addrs {
port, err := a.ValueForProtocol(multiaddr.P_TCP)
if err != nil {
continue // not a TCP multiaddr (could be QUIC, etc.) — skip
}
n, parseErr := strconv.Atoi(port)
if parseErr != nil {
continue
}
if n <= 0 || n > 65535 {
continue
}
return n, nil
}
return 0, fmt.Errorf("no TCP port found in libp2p host addresses (got %d addrs)", len(addrs))
}
// unregisterSelf removes this gateway from the discovery table
func (pd *PeerDiscovery) unregisterSelf(ctx context.Context) error {
peerID := pd.host.ID().String()

View File

@ -0,0 +1,112 @@
package gateway
import (
"testing"
"github.com/multiformats/go-multiaddr"
)
// TestExtractLibp2pTCPPort_FindsPort verifies the helper finds the TCP port
// from a typical libp2p host.Addrs() result.
//
// This is the regression guard for the bug where peer_discovery was
// announcing the gateway's HTTP API port (e.g. 10004) instead of the
// libp2p host's actual TCP port (random per restart). With the wrong
// port in the multiaddr, every cross-node libp2p dial landed on the HTTP
// server and failed the multistream handshake with "message did not have
// trailing newline" — leaving the cluster's namespace mesh with 0
// connected peers and silently dropping all cross-node pubsub traffic.
func TestExtractLibp2pTCPPort_FindsPort(t *testing.T) {
addrs := mustParseAddrs(t,
"/ip4/127.0.0.1/tcp/43043",
"/ip4/217.76.56.2/tcp/43043",
)
port, err := extractLibp2pTCPPort(addrs)
if err != nil {
t.Fatalf("extractLibp2pTCPPort: %v", err)
}
if port != 43043 {
t.Errorf("port = %d, want 43043", port)
}
}
// TestExtractLibp2pTCPPort_SkipsNonTCPAddrs verifies the helper does not
// fail when the host advertises non-TCP transports (e.g. QUIC, WebSocket).
// It must find the first TCP entry and return that.
func TestExtractLibp2pTCPPort_SkipsNonTCPAddrs(t *testing.T) {
addrs := mustParseAddrs(t,
"/ip4/127.0.0.1/udp/9999/quic-v1",
"/ip4/127.0.0.1/tcp/43043",
"/ip4/217.76.56.2/tcp/43043",
)
port, err := extractLibp2pTCPPort(addrs)
if err != nil {
t.Fatalf("extractLibp2pTCPPort: %v", err)
}
if port != 43043 {
t.Errorf("port = %d, want 43043 (TCP port should be picked, not QUIC)", port)
}
}
// TestExtractLibp2pTCPPort_NoAddrsReturnsError verifies the helper returns
// an error rather than silently announcing port 0 when the host hasn't
// reported any addresses yet (e.g. called too early in lifecycle).
//
// A silent failure mode here is exactly what masked the original bug for
// so long — we'd rather get a loud error at register time than write
// `/ip4/.../tcp/0/...` to the discovery table.
func TestExtractLibp2pTCPPort_NoAddrsReturnsError(t *testing.T) {
_, err := extractLibp2pTCPPort(nil)
if err == nil {
t.Error("expected error for nil addrs, got nil")
}
}
// TestExtractLibp2pTCPPort_AllUDPReturnsError verifies the helper returns
// an error when no TCP transports are present (UDP-only host). Persisting
// a TCP multiaddr that no listener serves would be the same class of bug.
func TestExtractLibp2pTCPPort_AllUDPReturnsError(t *testing.T) {
addrs := mustParseAddrs(t,
"/ip4/127.0.0.1/udp/9999/quic-v1",
"/ip4/217.76.56.2/udp/9999/quic-v1",
)
if _, err := extractLibp2pTCPPort(addrs); err == nil {
t.Error("expected error for TCP-less addrs, got nil")
}
}
// TestExtractLibp2pTCPPort_AllAddrsShareSamePort verifies the realistic
// libp2p output shape: one entry per detected interface IP, all sharing
// the same OS-assigned port (because the listener binds 0.0.0.0:RANDOM).
// We take the first; we expect them all equal.
func TestExtractLibp2pTCPPort_AllAddrsShareSamePort(t *testing.T) {
addrs := mustParseAddrs(t,
"/ip4/127.0.0.1/tcp/55555",
"/ip4/10.0.0.6/tcp/55555",
"/ip4/51.38.128.56/tcp/55555",
)
port, err := extractLibp2pTCPPort(addrs)
if err != nil {
t.Fatalf("extractLibp2pTCPPort: %v", err)
}
if port != 55555 {
t.Errorf("port = %d, want 55555", port)
}
}
func mustParseAddrs(t *testing.T, raws ...string) []multiaddr.Multiaddr {
t.Helper()
out := make([]multiaddr.Multiaddr, 0, len(raws))
for _, r := range raws {
m, err := multiaddr.NewMultiaddr(r)
if err != nil {
t.Fatalf("parse multiaddr %q: %v", r, err)
}
out = append(out, m)
}
return out
}

View File

@ -67,6 +67,12 @@ func (g *Gateway) Routes() http.Handler {
// Namespace WebRTC enable/disable/status (public, JWT/API key auth via middleware)
mux.HandleFunc("/v1/namespace/webrtc/enable", g.namespaceWebRTCEnablePublicHandler)
mux.HandleFunc("/v1/namespace/webrtc/disable", g.namespaceWebRTCDisablePublicHandler)
mux.HandleFunc("/v1/namespace/webrtc/stealth/enable", func(w http.ResponseWriter, r *http.Request) {
g.namespaceWebRTCStealthPublicHandler(w, r, true)
})
mux.HandleFunc("/v1/namespace/webrtc/stealth/disable", func(w http.ResponseWriter, r *http.Request) {
g.namespaceWebRTCStealthPublicHandler(w, r, false)
})
mux.HandleFunc("/v1/namespace/webrtc/status", g.namespaceWebRTCStatusPublicHandler)
// auth endpoints
@ -177,11 +183,17 @@ func (g *Gateway) Routes() http.Handler {
mux.HandleFunc("/v1/vault/status", g.vaultHandlers.HandleStatus)
}
// webrtc
// webrtc — TURN credentials and SFU signaling are gated independently
// (bugboard #25). A non-SFU gateway with the namespace TURN secret
// serves credentials but not signal/rooms; an SFU gateway serves all.
if g.webrtcHandlers != nil {
mux.HandleFunc("/v1/webrtc/turn/credentials", g.webrtcHandlers.CredentialsHandler)
mux.HandleFunc("/v1/webrtc/signal", g.webrtcHandlers.SignalHandler)
mux.HandleFunc("/v1/webrtc/rooms", g.webrtcHandlers.RoomsHandler)
if g.webrtcServeTURNCredentials {
mux.HandleFunc("/v1/webrtc/turn/credentials", g.webrtcHandlers.CredentialsHandler)
}
if g.webrtcServeSFURoutes {
mux.HandleFunc("/v1/webrtc/signal", g.webrtcHandlers.SignalHandler)
mux.HandleFunc("/v1/webrtc/rooms", g.webrtcHandlers.RoomsHandler)
}
}
// anon proxy (authenticated users only)

View File

@ -33,6 +33,10 @@ func (m *mockFunctionRegistry) Delete(ctx context.Context, namespace, name strin
return nil
}
func (m *mockFunctionRegistry) SetEnabled(ctx context.Context, namespace, name string, enabled bool) error {
return nil
}
func (m *mockFunctionRegistry) GetWASMBytes(ctx context.Context, wasmCID string) ([]byte, error) {
return []byte("wasm"), nil
}

View File

@ -0,0 +1,142 @@
package gateway
import (
"testing"
)
// Bugboard #411 — WebRTC route registration gate.
//
// Pre-fix the gate was `cfg.WebRTCEnabled && cfg.SFUPort > 0`. The
// boolean flag was a silent-404 footgun: spawn-handler-provisioned
// namespace gateways defaulted to WebRTCEnabled=false even when their
// SFU service was running and SFUPort was set, so every call to
// /v1/webrtc/turn/credentials returned 404 (not 503, not 401) for
// months — AnChat hit this on devnet for ~3 months before reporting.
//
// Post-fix: SFUPort > 0 alone gates registration. The legacy
// WebRTCEnabled boolean is retained on the Config struct for spawn-
// request back-compat but ignored at the gate.
//
// These tests pin the new gate semantics so a future refactor of
// gateway.go's startup wiring can't silently re-introduce the
// AND-with-boolean misconfig class.
// All four tests below call the SAME `shouldRegisterWebRTCRoutes`
// helper that the runtime calls — defined alongside the gateway code
// in gateway.go. If the runtime gate changes, the test breaks
// immediately rather than silently passing while live behavior
// diverges (the classic "test duplicates implementation" anti-pattern).
func TestWebRTCRouteGate_RegistersWhenSFUPortSet_RegardlessOfWebRTCEnabled(t *testing.T) {
// The actual #411 bug: WebRTCEnabled=false (default for spawn-
// provisioned namespace gateways) + SFUPort>0 (operator did
// configure the SFU). Pre-fix this returned `false` → no routes
// → 404. Post-fix MUST return true.
cfg := &Config{
WebRTCEnabled: false,
SFUPort: 7800,
TURNSecret: "shared-secret",
TURNDomain: "turn.example.com",
}
if !shouldRegisterWebRTCRoutes(cfg) {
t.Errorf("BUG #411 REGRESSION: SFUPort=%d configured but routes not registered "+
"because legacy WebRTCEnabled=false. This is exactly the silent-404 footgun "+
"the fix was supposed to eliminate.", cfg.SFUPort)
}
}
func TestWebRTCRouteGate_RegistersWhenBothEnabledAndPortSet(t *testing.T) {
// Pre-fix happy path — operator explicitly opted in via the
// legacy boolean. Must still register so existing configs work.
cfg := &Config{
WebRTCEnabled: true,
SFUPort: 7800,
TURNSecret: "shared-secret",
}
if !shouldRegisterWebRTCRoutes(cfg) {
t.Error("explicit WebRTCEnabled=true + SFUPort>0: routes MUST register (back-compat)")
}
}
func TestWebRTCRouteGate_SkipsWhenSFUPortZero(t *testing.T) {
// No SFU port = no functional SFU proxy = registering routes
// would just produce broken 500s on /v1/webrtc/signal. Better to
// not register. This is the "namespace genuinely doesn't want
// WebRTC" path.
cases := []struct {
name string
cfg *Config
}{
{"both unset", &Config{}},
{"webrtc explicitly enabled but no port", &Config{WebRTCEnabled: true, SFUPort: 0}},
{"port is negative (sentinel)", &Config{SFUPort: -1}},
}
for _, tc := range cases {
t.Run(tc.name, func(t *testing.T) {
if shouldRegisterWebRTCRoutes(tc.cfg) {
t.Errorf("SFUPort=%d: routes MUST NOT register without a real SFU port",
tc.cfg.SFUPort)
}
})
}
}
func TestWebRTCRouteGate_TURNSecretMissingStillRegisters(t *testing.T) {
// Important: SFUPort>0 + TURNSecret="" should still REGISTER the
// routes. /v1/webrtc/signal and /v1/webrtc/rooms work without TURN
// (TURN is only for the credentials endpoint). And the credentials
// handler internally returns 503 "TURN not configured" when secret
// is missing — which is an ACTIONABLE error operators can fix,
// unlike the silent 404 that #411 reported.
//
// If a future refactor moves the TURNSecret check into the gate,
// /v1/webrtc/signal disappears too and SFU-only namespaces break.
cfg := &Config{
SFUPort: 7800,
TURNSecret: "", // intentionally missing
}
if !shouldRegisterWebRTCRoutes(cfg) {
t.Error("SFUPort>0 + TURNSecret empty: routes MUST still register so /v1/webrtc/signal works; " +
"the credentials endpoint surfaces 503 internally for the missing secret")
}
}
// Bugboard #25 — TURN-credentials gate decoupled from the SFU gate.
// shouldServeTURNCredentials must register /v1/webrtc/turn/credentials
// whenever the namespace TURN secret is set, INDEPENDENT of whether this
// node runs a local SFU. SFU signal/rooms stay gated on SFUPort>0.
func TestTURNCredentialsGate_servesWithSecretEvenWithoutSFU(t *testing.T) {
// Node 57's exact case: TURN secret present, no local SFU (SFUPort=0).
// Credentials MUST register (it's a namespace-wide HMAC; TURN servers
// are remote). Pre-fix the single SFUPort>0 gate 404'd this.
cfg := &Config{TURNSecret: "ns-shared-secret", SFUPort: 0}
if !shouldServeTURNCredentials(cfg) {
t.Error("BUG #25 REGRESSION: TURN credentials must register on a non-SFU gateway that has the namespace secret")
}
if shouldRegisterWebRTCRoutes(cfg) {
t.Error("SFU routes (signal/rooms) must NOT register without a local SFU port")
}
}
func TestTURNCredentialsGate_noSecretNoCredentials(t *testing.T) {
// No TURN secret → don't register credentials (the handler would 503
// anyway; not registering keeps a clean 404 vs. an actionable 503 —
// matches the documented behavior).
cfg := &Config{TURNSecret: "", SFUPort: 7800}
if shouldServeTURNCredentials(cfg) {
t.Error("no TURN secret: credentials route must not register")
}
// But SFU routes still register (SFU is independent).
if !shouldRegisterWebRTCRoutes(cfg) {
t.Error("SFU port set: signal/rooms must register independent of TURN")
}
}
func TestTURNCredentialsGate_sfuNodeServesBoth(t *testing.T) {
// An SFU node with the secret serves everything.
cfg := &Config{TURNSecret: "s", SFUPort: 30000}
if !shouldServeTURNCredentials(cfg) || !shouldRegisterWebRTCRoutes(cfg) {
t.Error("SFU node with TURN secret must serve both credentials and SFU routes")
}
}

View File

@ -45,6 +45,13 @@ type ClusterManagerConfig struct {
// cluster-wide JWT signing key (bug #215 fix). Empty string disables
// cross-node JWT verification within namespace clusters.
ClusterSecretPath string
// SecretsEncryptionKey is the host's serverless secrets encryption key
// (AES-256, hex-encoded), read once from secrets/secrets-encryption-key.
// Forwarded to spawned namespace gateways so `function secrets ...`
// works there (bugboard #837 follow-up). Empty leaves namespace-gateway
// secrets management disabled (fail-loud).
SecretsEncryptionKey string
}
// ClusterManager orchestrates namespace cluster provisioning and lifecycle
@ -56,9 +63,9 @@ type ClusterManager struct {
systemdSpawner *SystemdSpawner // NEW: Systemd-based spawner replaces old spawners
dnsManager *DNSRecordManager
logger *zap.Logger
baseDomain string
baseDataDir string
globalRQLiteDSN string // Global RQLite DSN for namespace gateway auth
baseDomain string
baseDataDir string
globalRQLiteDSN string // Global RQLite DSN for namespace gateway auth
// IPFS configuration for namespace gateways
ipfsClusterAPIURL string
@ -72,6 +79,10 @@ type ClusterManager struct {
// AES-256 key for encrypting TURN secrets in RQLite (nil = plaintext)
turnEncryptionKey []byte
// Host's serverless secrets encryption key, forwarded to spawned
// namespace gateways (bugboard #837 follow-up). Empty = disabled.
secretsEncryptionKey string
// Track provisioning operations
provisioningMu sync.RWMutex
provisioning map[string]bool // namespace -> in progress
@ -123,6 +134,7 @@ func NewClusterManager(
ipfsTimeout: ipfsTimeout,
ipfsReplicationFactor: ipfsReplicationFactor,
turnEncryptionKey: cfg.TurnEncryptionKey,
secretsEncryptionKey: cfg.SecretsEncryptionKey,
logger: logger.With(zap.String("component", "cluster-manager")),
provisioning: make(map[string]bool),
}
@ -170,6 +182,7 @@ func NewClusterManagerWithComponents(
ipfsTimeout: ipfsTimeout,
ipfsReplicationFactor: ipfsReplicationFactor,
turnEncryptionKey: cfg.TurnEncryptionKey,
secretsEncryptionKey: cfg.SecretsEncryptionKey,
logger: logger.With(zap.String("component", "cluster-manager")),
provisioning: make(map[string]bool),
}
@ -566,6 +579,7 @@ func (cm *ClusterManager) startGatewayCluster(ctx context.Context, cluster *Name
IPFSAPIURL: cm.ipfsAPIURL,
IPFSTimeout: cm.ipfsTimeout,
IPFSReplicationFactor: cm.ipfsReplicationFactor,
SecretsEncryptionKey: cm.secretsEncryptionKey,
}
var instance *gateway.GatewayInstance
@ -664,23 +678,27 @@ func (cm *ClusterManager) spawnGatewayRemote(ctx context.Context, nodeIP string,
}
resp, err := cm.sendSpawnRequest(ctx, nodeIP, map[string]interface{}{
"action": "spawn-gateway",
"namespace": cfg.Namespace,
"node_id": cfg.NodeID,
"gateway_http_port": cfg.HTTPPort,
"gateway_base_domain": cfg.BaseDomain,
"gateway_rqlite_dsn": cfg.RQLiteDSN,
"gateway_global_rqlite_dsn": cfg.GlobalRQLiteDSN,
"gateway_olric_servers": cfg.OlricServers,
"gateway_olric_timeout": olricTimeout,
"ipfs_cluster_api_url": cfg.IPFSClusterAPIURL,
"ipfs_api_url": cfg.IPFSAPIURL,
"ipfs_timeout": ipfsTimeout,
"ipfs_replication_factor": cfg.IPFSReplicationFactor,
"gateway_webrtc_enabled": cfg.WebRTCEnabled,
"gateway_sfu_port": cfg.SFUPort,
"gateway_turn_domain": cfg.TURNDomain,
"gateway_turn_secret": cfg.TURNSecret,
"action": "spawn-gateway",
"namespace": cfg.Namespace,
"node_id": cfg.NodeID,
"gateway_http_port": cfg.HTTPPort,
"gateway_base_domain": cfg.BaseDomain,
"gateway_rqlite_dsn": cfg.RQLiteDSN,
"gateway_global_rqlite_dsn": cfg.GlobalRQLiteDSN,
"gateway_olric_servers": cfg.OlricServers,
"gateway_olric_timeout": olricTimeout,
"ipfs_cluster_api_url": cfg.IPFSClusterAPIURL,
"ipfs_api_url": cfg.IPFSAPIURL,
"ipfs_timeout": ipfsTimeout,
"ipfs_replication_factor": cfg.IPFSReplicationFactor,
"gateway_webrtc_enabled": cfg.WebRTCEnabled,
"gateway_sfu_port": cfg.SFUPort,
"gateway_turn_domain": cfg.TURNDomain,
"gateway_turn_secret": cfg.TURNSecret,
"gateway_turn_stealth_domain": cfg.TURNStealthDomain,
// Bugboard #837 follow-up: carry the host secrets encryption key to
// the remote node so its spawned namespace gateway can manage secrets.
"gateway_secrets_encryption_key": cfg.SecretsEncryptionKey,
})
if err != nil {
return nil, err
@ -1587,6 +1605,7 @@ func (cm *ClusterManager) restoreClusterOnNode(ctx context.Context, clusterID, n
IPFSAPIURL: cm.ipfsAPIURL,
IPFSTimeout: cm.ipfsTimeout,
IPFSReplicationFactor: cm.ipfsReplicationFactor,
SecretsEncryptionKey: cm.secretsEncryptionKey,
}
// Add WebRTC config if enabled for this namespace
@ -1596,6 +1615,7 @@ func (cm *ClusterManager) restoreClusterOnNode(ctx context.Context, clusterID, n
gwCfg.SFUPort = sfuBlock.SFUSignalingPort
gwCfg.TURNDomain = fmt.Sprintf("turn.ns-%s.%s", namespaceName, cm.baseDomain)
gwCfg.TURNSecret = webrtcCfg.TURNSharedSecret
gwCfg.TURNStealthDomain = cm.stealthDomainFor(namespaceName, webrtcCfg)
}
}
@ -1659,18 +1679,19 @@ type ClusterLocalState struct {
SavedAt time.Time `json:"saved_at"`
// WebRTC fields (zero values when WebRTC not enabled — backward compatible)
HasSFU bool `json:"has_sfu,omitempty"`
HasTURN bool `json:"has_turn,omitempty"`
TURNSharedSecret string `json:"turn_shared_secret,omitempty"` // Needed for gateway to generate TURN credentials on cold start
TURNDomain string `json:"turn_domain,omitempty"` // TURN server domain for gateway config
TURNCredentialTTL int `json:"turn_credential_ttl,omitempty"`
SFUSignalingPort int `json:"sfu_signaling_port,omitempty"`
SFUMediaPortStart int `json:"sfu_media_port_start,omitempty"`
SFUMediaPortEnd int `json:"sfu_media_port_end,omitempty"`
TURNListenPort int `json:"turn_listen_port,omitempty"`
TURNTLSPort int `json:"turn_tls_port,omitempty"`
TURNRelayPortStart int `json:"turn_relay_port_start,omitempty"`
TURNRelayPortEnd int `json:"turn_relay_port_end,omitempty"`
HasSFU bool `json:"has_sfu,omitempty"`
HasTURN bool `json:"has_turn,omitempty"`
TURNSharedSecret string `json:"turn_shared_secret,omitempty"` // Needed for gateway to generate TURN credentials on cold start
TURNDomain string `json:"turn_domain,omitempty"` // TURN server domain for gateway config
TURNStealthDomain string `json:"turn_stealth_domain,omitempty"` // Stealth TURNS:443 host (feat-124); empty when stealth disabled
TURNCredentialTTL int `json:"turn_credential_ttl,omitempty"`
SFUSignalingPort int `json:"sfu_signaling_port,omitempty"`
SFUMediaPortStart int `json:"sfu_media_port_start,omitempty"`
SFUMediaPortEnd int `json:"sfu_media_port_end,omitempty"`
TURNListenPort int `json:"turn_listen_port,omitempty"`
TURNTLSPort int `json:"turn_tls_port,omitempty"`
TURNRelayPortStart int `json:"turn_relay_port_start,omitempty"`
TURNRelayPortEnd int `json:"turn_relay_port_end,omitempty"`
}
type ClusterLocalStatePorts struct {
@ -1815,6 +1836,79 @@ func (cm *ClusterManager) RestoreLocalClustersFromDisk(ctx context.Context) (int
return restored, nil
}
// restoreWebRTC is the resolved WebRTC gateway config for a restored
// namespace gateway.
type restoreWebRTC struct {
enabled bool
sfuPort int
turnDomain string
turnSecret string
stealthDomain string // feat-124: empty when webrtc stealth is disabled
}
// chooseRestoreWebRTC resolves a restored gateway's WebRTC config. TWO
// independent aspects (bugboard #25 decouple):
//
// - TURN (turnSecret + turnDomain) is NAMESPACE-WIDE. Any gateway with
// the namespace TURN secret can mint /v1/webrtc/turn/credentials (the
// credentials are an HMAC; the actual TURN servers are remote). So a
// gateway node that runs NO local SFU still gets the TURN secret.
// - SFU (sfuPort) is PER-NODE — non-zero only when this node runs a
// local SFU (for /v1/webrtc/signal + /rooms proxying).
//
// Precedence: prefer the local state file; fall back to the DB (source of
// truth) when the state file lacks the TURN secret (the namespace-wide
// "webrtc is enabled" marker). dbFetch is lazy — only hit when needed.
//
// `enabled` is true when EITHER a TURN secret OR an SFU port is present,
// so the caller knows to write a webrtc block. A non-SFU gateway gets
// {sfuPort:0, turnSecret:set} — credentials route registers, signal/rooms
// don't.
//
// Extracted as a pure function so the precedence is unit-testable without
// standing up the full restore path (systemd spawner + DB + port store).
func chooseRestoreWebRTC(
stateHasSFU bool, stateSFUPort int, stateTURNDomain, stateTURNSecret, stateStealthDomain string,
dbFetch func() (turnSecret, turnDomain, stealthDomain string, sfuPort int),
) restoreWebRTC {
turnSecret := stateTURNSecret
turnDomain := stateTURNDomain
stealthDomain := stateStealthDomain
sfuPort := 0
if stateHasSFU && stateSFUPort > 0 {
sfuPort = stateSFUPort
}
// Fall back to the DB when the state file has no TURN secret — that's
// the marker that the namespace has WebRTC enabled at all. The state
// file is not updated by EnableWebRTC, so a namespace enabled after
// the state file was written reaches here with an empty secret.
// (Stealth toggles DO rewrite cluster state on every node, so the
// state-first read stays fresh for stealthDomain too.)
if turnSecret == "" {
if dbSecret, dbDomain, dbStealth, dbSFU := dbFetch(); dbSecret != "" {
turnSecret = dbSecret
if turnDomain == "" {
turnDomain = dbDomain
}
if stealthDomain == "" {
stealthDomain = dbStealth
}
if sfuPort == 0 {
sfuPort = dbSFU
}
}
}
return restoreWebRTC{
enabled: turnSecret != "" || sfuPort > 0,
sfuPort: sfuPort,
turnDomain: turnDomain,
turnSecret: turnSecret,
stealthDomain: stealthDomain,
}
}
// restoreClusterFromState restores all processes for a cluster using local state (no DB queries).
func (cm *ClusterManager) restoreClusterFromState(ctx context.Context, state *ClusterLocalState) error {
cm.logger.Info("Restoring namespace cluster from local state",
@ -1937,38 +2031,87 @@ func (cm *ClusterManager) restoreClusterFromState(ctx context.Context, state *Cl
// 3. Restore Gateway
if state.HasGateway {
// Build the desired gateway config up front (incl. WebRTC resolved
// from state→DB) so it drives BOTH the cold-spawn (gateway down)
// and the warm-reconcile (gateway up but config drifted) paths.
var olricServers []string // WireGuard IPs (Olric binds to the WG interface)
for _, np := range state.AllNodes {
olricServers = append(olricServers, fmt.Sprintf("%s:%d", np.InternalIP, np.OlricHTTPPort))
}
gwCfg := gateway.InstanceConfig{
Namespace: state.NamespaceName,
NodeID: cm.localNodeID,
HTTPPort: pb.GatewayHTTPPort,
BaseDomain: state.BaseDomain,
RQLiteDSN: fmt.Sprintf("http://localhost:%d", pb.RQLiteHTTPPort),
GlobalRQLiteDSN: cm.globalRQLiteDSN,
OlricServers: olricServers,
OlricTimeout: 30 * time.Second,
IPFSClusterAPIURL: cm.ipfsClusterAPIURL,
IPFSAPIURL: cm.ipfsAPIURL,
IPFSTimeout: cm.ipfsTimeout,
IPFSReplicationFactor: cm.ipfsReplicationFactor,
SecretsEncryptionKey: cm.secretsEncryptionKey,
}
// Resolve WebRTC config. Prefer the local state file; fall back to
// the DB (source of truth) to self-heal stale state. Bugboard #25 —
// the state file is NOT updated by EnableWebRTC, so a namespace
// enabled AFTER its state file was written carries no SFU/TURN
// fields here. The lazy dbFetch only hits the DB when the state
// file is incomplete.
wr := chooseRestoreWebRTC(
state.HasSFU, state.SFUSignalingPort, state.TURNDomain, state.TURNSharedSecret, state.TURNStealthDomain,
func() (turnSecret, turnDomain, stealthDomain string, sfuPort int) {
webrtcCfg, err := cm.GetWebRTCConfig(ctx, state.NamespaceName)
if err != nil || webrtcCfg == nil {
return "", "", "", 0
}
// TURN is namespace-wide; SFU port is per-node and may be
// absent on a gateway-only (non-SFU) node — that's fine,
// the gateway still serves TURN credentials.
sfu := 0
if sfuBlock, serr := cm.webrtcPortAllocator.GetSFUPorts(ctx, state.ClusterID, cm.localNodeID); serr == nil && sfuBlock != nil {
sfu = sfuBlock.SFUSignalingPort
}
return webrtcCfg.TURNSharedSecret,
fmt.Sprintf("turn.ns-%s.%s", state.NamespaceName, cm.baseDomain),
cm.stealthDomainFor(state.NamespaceName, webrtcCfg),
sfu
},
)
if wr.enabled {
// WebRTCEnabled is the legacy flag (ignored by the route gate
// now — bugboard #25/#411); set it to SFU presence for
// config-shape consistency with how EnableWebRTC writes nodes.
gwCfg.WebRTCEnabled = wr.sfuPort > 0
gwCfg.SFUPort = wr.sfuPort
gwCfg.TURNDomain = wr.turnDomain
gwCfg.TURNSecret = wr.turnSecret
gwCfg.TURNStealthDomain = wr.stealthDomain
}
resp, err := http.Get(fmt.Sprintf("http://localhost:%d/v1/health", pb.GatewayHTTPPort))
if err == nil {
resp.Body.Close()
// Gateway is already up. Reconcile config drift (bugboard #25 —
// the WARM case): if the running gateway's on-disk config has a
// WebRTC block that differs from the desired (e.g. it lost the
// block on a prior restart where it stayed healthy and the
// cold-spawn path below never ran), rewrite the config + restart.
// ReconcileGateway is a no-op when the on-disk block already
// matches, so this does NOT cause a restart loop on every boot.
if rerr := cm.systemdSpawner.ReconcileGateway(ctx, state.NamespaceName, cm.localNodeID, gwCfg); rerr != nil {
cm.logger.Warn("Gateway WebRTC reconcile failed (leaving running config as-is)",
zap.String("namespace", state.NamespaceName), zap.Error(rerr))
}
} else {
// Build olric server addresses — always use WireGuard IPs (Olric binds to WireGuard interface)
var olricServers []string
for _, np := range state.AllNodes {
olricServers = append(olricServers, fmt.Sprintf("%s:%d", np.InternalIP, np.OlricHTTPPort))
// Gateway is down → cold spawn with the resolved config.
if wr.enabled && !state.HasSFU {
cm.logger.Info("Re-materialized WebRTC gateway config from DB (state file was stale)",
zap.String("namespace", state.NamespaceName),
zap.Int("sfu_port", wr.sfuPort))
}
gwCfg := gateway.InstanceConfig{
Namespace: state.NamespaceName,
NodeID: cm.localNodeID,
HTTPPort: pb.GatewayHTTPPort,
BaseDomain: state.BaseDomain,
RQLiteDSN: fmt.Sprintf("http://localhost:%d", pb.RQLiteHTTPPort),
GlobalRQLiteDSN: cm.globalRQLiteDSN,
OlricServers: olricServers,
OlricTimeout: 30 * time.Second,
IPFSClusterAPIURL: cm.ipfsClusterAPIURL,
IPFSAPIURL: cm.ipfsAPIURL,
IPFSTimeout: cm.ipfsTimeout,
IPFSReplicationFactor: cm.ipfsReplicationFactor,
}
// Add WebRTC config from persisted local state
if state.HasSFU && state.SFUSignalingPort > 0 && state.TURNSharedSecret != "" {
gwCfg.WebRTCEnabled = true
gwCfg.SFUPort = state.SFUSignalingPort
gwCfg.TURNDomain = state.TURNDomain
gwCfg.TURNSecret = state.TURNSharedSecret
}
if err := cm.spawnGatewayWithSystemd(ctx, gwCfg); err != nil {
cm.logger.Error("Failed to restore Gateway from state", zap.String("namespace", state.NamespaceName), zap.Error(err))
} else {
@ -1996,6 +2139,7 @@ func (cm *ClusterManager) restoreClusterFromState(ctx context.Context, state *Cl
RelayPortStart: state.TURNRelayPortStart,
RelayPortEnd: state.TURNRelayPortEnd,
TURNDomain: fmt.Sprintf("turn.ns-%s.%s", state.NamespaceName, cm.baseDomain),
StealthDomain: cm.stealthDomainFor(state.NamespaceName, webrtcCfg),
}
if err := cm.systemdSpawner.SpawnTURN(ctx, state.NamespaceName, cm.localNodeID, turnCfg); err != nil {
cm.logger.Error("Failed to restore TURN from state", zap.String("namespace", state.NamespaceName), zap.Error(err))

View File

@ -0,0 +1,263 @@
package namespace
import (
"context"
"fmt"
"github.com/DeBrosOfficial/network/pkg/client"
"github.com/DeBrosOfficial/network/pkg/turn"
"go.uber.org/zap"
)
// Stealth TURNS-over-443 lifecycle (feat-124, censorship-resistant calling).
//
// Enabling stealth for a namespace whose WebRTC is already running:
// 1. creates DNS A records for the neutral stealth host -> the TURN nodes,
// 2. flips namespace_webrtc_config.stealth_enabled,
// 3. re-spawns the namespace's TURN servers with the stealth domain (the
// spawner provisions a Let's Encrypt cert for it — hard-fail, never
// self-signed),
// 4. rewrites cluster-state.json on every node (so DB-less restores keep
// the stealth domain), and
// 5. restarts the namespace gateways so turn.credentials advertises
// `turns:<stealth-host>:443` as the final URI-ladder rung.
//
// The SNI router on :443 discovers the route (stealth host -> local TURN TLS
// port) from the TURN config files on disk — no extra registration step.
// stealthDomainFor returns the namespace's stealth TURNS host when stealth is
// enabled in its WebRTC config, else "" (callers treat empty as disabled).
func (cm *ClusterManager) stealthDomainFor(namespaceName string, webrtcCfg *WebRTCConfig) string {
if webrtcCfg == nil || !webrtcCfg.StealthEnabled {
return ""
}
return turn.StealthHostForNamespace(namespaceName, cm.baseDomain)
}
// EnableWebRTCStealth enables the stealth TURNS:443 path for a namespace.
// Requires WebRTC to already be enabled.
func (cm *ClusterManager) EnableWebRTCStealth(ctx context.Context, namespaceName string) error {
cluster, webrtcCfg, err := cm.getStealthPrereqs(ctx, namespaceName)
if err != nil {
return err
}
if webrtcCfg.StealthEnabled {
return ErrWebRTCStealthAlreadyEnabled
}
stealthDomain := turn.StealthHostForNamespace(namespaceName, cm.baseDomain)
cm.logger.Info("Enabling WebRTC stealth for namespace",
zap.String("namespace", namespaceName),
zap.String("stealth_domain", stealthDomain))
clusterNodes, err := cm.getClusterNodesWithIPs(ctx, cluster.ID)
if err != nil {
return fmt.Errorf("failed to get cluster nodes: %w", err)
}
turnBlocks, err := cm.getWebRTCBlocksByType(ctx, cluster.ID, "turn")
if err != nil {
return fmt.Errorf("failed to get TURN allocations for namespace %s: %w", namespaceName, err)
}
if len(turnBlocks) == 0 {
return fmt.Errorf("no TURN allocations found for namespace %s (is WebRTC fully enabled?)", namespaceName)
}
// DNS first — cert provisioning and clients both need the name to resolve.
var turnIPs []string
for _, block := range turnBlocks {
for _, n := range clusterNodes {
if n.NodeID == block.NodeID {
turnIPs = append(turnIPs, n.PublicIP)
}
}
}
if err := cm.dnsManager.CreateStealthTURNRecords(ctx, namespaceName, stealthDomain, turnIPs); err != nil {
return fmt.Errorf("failed to create stealth DNS records: %w", err)
}
if err := cm.setStealthEnabled(ctx, cluster.ID, true); err != nil {
return err
}
// Re-spawn TURN with the stealth domain; roll back on failure so the
// board never claims a stealth endpoint that doesn't terminate TLS.
if err := cm.respawnTURNWithStealth(ctx, cluster, clusterNodes, turnBlocks, webrtcCfg.TURNSharedSecret, stealthDomain); err != nil {
cm.rollbackStealthEnable(ctx, cluster.ID, namespaceName)
return fmt.Errorf("failed to re-spawn TURN with stealth cert (stealth rolled back): %w", err)
}
cm.refreshStateAndGateways(ctx, cluster, clusterNodes, stealthDomain, webrtcCfg.TURNSharedSecret)
cm.logEvent(ctx, cluster.ID, EventWebRTCEnabled, "",
fmt.Sprintf("WebRTC stealth enabled (%s)", stealthDomain), nil)
return nil
}
// DisableWebRTCStealth turns the stealth TURNS:443 path off again. TURN and
// the baseline ladder (udp/tcp 3478, turns:5349) keep running.
func (cm *ClusterManager) DisableWebRTCStealth(ctx context.Context, namespaceName string) error {
cluster, webrtcCfg, err := cm.getStealthPrereqs(ctx, namespaceName)
if err != nil {
return err
}
if !webrtcCfg.StealthEnabled {
return ErrWebRTCStealthNotEnabled
}
cm.logger.Info("Disabling WebRTC stealth for namespace", zap.String("namespace", namespaceName))
clusterNodes, err := cm.getClusterNodesWithIPs(ctx, cluster.ID)
if err != nil {
return fmt.Errorf("failed to get cluster nodes: %w", err)
}
turnBlocks, err := cm.getWebRTCBlocksByType(ctx, cluster.ID, "turn")
if err != nil {
return fmt.Errorf("failed to get TURN allocations: %w", err)
}
if err := cm.setStealthEnabled(ctx, cluster.ID, false); err != nil {
return err
}
if err := cm.respawnTURNWithStealth(ctx, cluster, clusterNodes, turnBlocks, webrtcCfg.TURNSharedSecret, ""); err != nil {
return fmt.Errorf("failed to re-spawn TURN without stealth: %w", err)
}
if err := cm.dnsManager.DeleteStealthTURNRecords(ctx, namespaceName); err != nil {
cm.logger.Warn("Failed to delete stealth DNS records", zap.Error(err))
}
cm.refreshStateAndGateways(ctx, cluster, clusterNodes, "", webrtcCfg.TURNSharedSecret)
cm.logEvent(ctx, cluster.ID, EventWebRTCDisabled, "", "WebRTC stealth disabled", nil)
return nil
}
// getStealthPrereqs validates the cluster exists and WebRTC is enabled,
// returning both records (with the TURN secret already decrypted).
func (cm *ClusterManager) getStealthPrereqs(ctx context.Context, namespaceName string) (*NamespaceCluster, *WebRTCConfig, error) {
cluster, err := cm.GetClusterByNamespace(ctx, namespaceName)
if err != nil {
return nil, nil, fmt.Errorf("failed to get cluster: %w", err)
}
if cluster == nil {
return nil, nil, ErrClusterNotFound
}
webrtcCfg, err := cm.GetWebRTCConfig(ctx, namespaceName)
if err != nil {
return nil, nil, fmt.Errorf("failed to get WebRTC config: %w", err)
}
if webrtcCfg == nil {
return nil, nil, ErrWebRTCNotEnabled
}
return cluster, webrtcCfg, nil
}
// setStealthEnabled flips the stealth flag in namespace_webrtc_config.
func (cm *ClusterManager) setStealthEnabled(ctx context.Context, clusterID string, enabled bool) error {
internalCtx := client.WithInternalAuth(ctx)
val := 0
if enabled {
val = 1
}
if _, err := cm.db.Exec(internalCtx,
`UPDATE namespace_webrtc_config SET stealth_enabled = ? WHERE namespace_cluster_id = ? AND enabled = 1`,
val, clusterID); err != nil {
return fmt.Errorf("failed to update stealth_enabled: %w", err)
}
return nil
}
// respawnTURNWithStealth stops and re-spawns every TURN instance of the
// cluster with the given stealth domain ("" = stealth off). The spawner
// provisions the stealth cert and writes the new TURN config; the SNI
// router's discovery picks the route change up from disk.
func (cm *ClusterManager) respawnTURNWithStealth(
ctx context.Context,
cluster *NamespaceCluster,
clusterNodes []clusterNodeInfo,
turnBlocks []WebRTCPortBlock,
turnSecret, stealthDomain string,
) error {
turnDomain := fmt.Sprintf("turn.ns-%s.%s", cluster.NamespaceName, cm.baseDomain)
for _, block := range turnBlocks {
var node *clusterNodeInfo
for i := range clusterNodes {
if clusterNodes[i].NodeID == block.NodeID {
node = &clusterNodes[i]
break
}
}
if node == nil {
return fmt.Errorf("TURN node %s not found in cluster nodes", block.NodeID)
}
cm.stopTURNOnNode(ctx, node.NodeID, node.InternalIP, cluster.NamespaceName)
turnCfg := TURNInstanceConfig{
Namespace: cluster.NamespaceName,
NodeID: node.NodeID,
ListenAddr: fmt.Sprintf("0.0.0.0:%d", block.TURNListenPort),
TURNSListenAddr: fmt.Sprintf("0.0.0.0:%d", block.TURNTLSPort),
PublicIP: node.PublicIP,
Realm: cm.baseDomain,
AuthSecret: turnSecret,
RelayPortStart: block.TURNRelayPortStart,
RelayPortEnd: block.TURNRelayPortEnd,
TURNDomain: turnDomain,
StealthDomain: stealthDomain,
}
if err := cm.spawnTURNOnNode(ctx, *node, cluster.NamespaceName, turnCfg); err != nil {
return fmt.Errorf("failed to re-spawn TURN on node %s: %w", node.NodeID, err)
}
}
return nil
}
// rollbackStealthEnable best-effort reverts the DB flag + DNS records after a
// failed stealth enable, so the system never advertises a half-built path.
func (cm *ClusterManager) rollbackStealthEnable(ctx context.Context, clusterID, namespaceName string) {
if err := cm.setStealthEnabled(ctx, clusterID, false); err != nil {
cm.logger.Warn("Stealth rollback: failed to clear stealth_enabled", zap.Error(err))
}
if err := cm.dnsManager.DeleteStealthTURNRecords(ctx, namespaceName); err != nil {
cm.logger.Warn("Stealth rollback: failed to delete DNS records", zap.Error(err))
}
}
// refreshStateAndGateways rewrites cluster-state.json on all nodes with the
// new stealth domain and restarts the namespace gateways so turn.credentials
// reflects the change. Failures are logged per node (the reconciler converges
// stragglers later via the gatewayConfigInSync drift check).
func (cm *ClusterManager) refreshStateAndGateways(
ctx context.Context,
cluster *NamespaceCluster,
clusterNodes []clusterNodeInfo,
stealthDomain, turnSecret string,
) {
turnDomain := fmt.Sprintf("turn.ns-%s.%s", cluster.NamespaceName, cm.baseDomain)
sfuBlockList, err := cm.getWebRTCBlocksByType(ctx, cluster.ID, "sfu")
if err != nil {
cm.logger.Warn("Failed to get SFU allocations for state refresh", zap.Error(err))
}
turnBlockList, err := cm.getWebRTCBlocksByType(ctx, cluster.ID, "turn")
if err != nil {
cm.logger.Warn("Failed to get TURN allocations for state refresh", zap.Error(err))
}
sfuBlocks := make(map[string]*WebRTCPortBlock)
for i := range sfuBlockList {
sfuBlocks[sfuBlockList[i].NodeID] = &sfuBlockList[i]
}
turnBlocks := make(map[string]*WebRTCPortBlock)
for i := range turnBlockList {
turnBlocks[turnBlockList[i].NodeID] = &turnBlockList[i]
}
cm.updateClusterStateWithWebRTC(ctx, cluster, clusterNodes, sfuBlocks, turnBlocks, turnDomain, stealthDomain, turnSecret)
portBlocks, err := cm.portAllocator.GetAllPortBlocks(ctx, cluster.ID)
if err != nil {
cm.logger.Warn("Failed to get port blocks for gateway restart after stealth toggle", zap.Error(err))
return
}
nodePortBlocks := make(map[string]*PortBlock)
for i := range portBlocks {
nodePortBlocks[portBlocks[i].NodeID] = &portBlocks[i]
}
cm.restartGatewaysWithWebRTC(ctx, cluster, clusterNodes, nodePortBlocks, sfuBlocks, turnDomain, stealthDomain, turnSecret)
}

View File

@ -204,10 +204,10 @@ func (cm *ClusterManager) EnableWebRTC(ctx context.Context, namespaceName, enabl
}
// 14. Update cluster-state.json on all nodes with WebRTC info
cm.updateClusterStateWithWebRTC(ctx, cluster, clusterNodes, sfuBlocks, turnBlocks, turnDomain, turnSecret)
cm.updateClusterStateWithWebRTC(ctx, cluster, clusterNodes, sfuBlocks, turnBlocks, turnDomain, "", turnSecret)
// 15. Restart namespace gateways with WebRTC config so they register WebRTC routes
cm.restartGatewaysWithWebRTC(ctx, cluster, clusterNodes, nodePortBlocks, sfuBlocks, turnDomain, turnSecret)
cm.restartGatewaysWithWebRTC(ctx, cluster, clusterNodes, nodePortBlocks, sfuBlocks, turnDomain, "", turnSecret)
cm.logEvent(ctx, cluster.ID, EventWebRTCEnabled, "",
fmt.Sprintf("WebRTC enabled: SFU on %d nodes, TURN on %d nodes", len(clusterNodes), len(turnNodes)), nil)
@ -273,17 +273,23 @@ func (cm *ClusterManager) DisableWebRTC(ctx context.Context, namespaceName strin
cm.logger.Warn("Failed to deallocate WebRTC ports", zap.Error(err))
}
// 7. Delete TURN DNS records
// 7. Delete TURN DNS records (both the regular and the feat-124 stealth
// records — a full WebRTC teardown must not orphan stealth A records when
// the namespace had stealth enabled). Delete-by-tag is a no-op when the
// stealth records are absent, so this is safe unconditionally.
if err := cm.dnsManager.DeleteTURNRecords(ctx, namespaceName); err != nil {
cm.logger.Warn("Failed to delete TURN DNS records", zap.Error(err))
}
if err := cm.dnsManager.DeleteStealthTURNRecords(ctx, namespaceName); err != nil {
cm.logger.Warn("Failed to delete stealth TURN DNS records", zap.Error(err))
}
// 8. Clean up DB tables
cm.db.Exec(internalCtx, `DELETE FROM webrtc_rooms WHERE namespace_cluster_id = ?`, cluster.ID)
cm.db.Exec(internalCtx, `DELETE FROM namespace_webrtc_config WHERE namespace_cluster_id = ?`, cluster.ID)
// 9. Update cluster-state.json to remove WebRTC info
cm.updateClusterStateWithWebRTC(ctx, cluster, clusterNodes, nil, nil, "", "")
cm.updateClusterStateWithWebRTC(ctx, cluster, clusterNodes, nil, nil, "", "", "")
// 10. Restart namespace gateways without WebRTC config so they unregister WebRTC routes
portBlocks, err := cm.portAllocator.GetAllPortBlocks(ctx, cluster.ID)
@ -292,7 +298,7 @@ func (cm *ClusterManager) DisableWebRTC(ctx context.Context, namespaceName strin
for i := range portBlocks {
nodePortBlocks[portBlocks[i].NodeID] = &portBlocks[i]
}
cm.restartGatewaysWithWebRTC(ctx, cluster, clusterNodes, nodePortBlocks, nil, "", "")
cm.restartGatewaysWithWebRTC(ctx, cluster, clusterNodes, nodePortBlocks, nil, "", "", "")
} else {
cm.logger.Warn("Failed to get port blocks for gateway restart after WebRTC disable", zap.Error(err))
}
@ -470,16 +476,16 @@ func (cm *ClusterManager) spawnSFURemote(ctx context.Context, nodeIP string, cfg
}
_, err := cm.sendSpawnRequest(ctx, nodeIP, map[string]interface{}{
"action": "spawn-sfu",
"namespace": cfg.Namespace,
"node_id": cfg.NodeID,
"sfu_listen_addr": cfg.ListenAddr,
"sfu_media_start": cfg.MediaPortStart,
"sfu_media_end": cfg.MediaPortEnd,
"turn_servers": turnServers,
"turn_secret": cfg.TURNSecret,
"turn_cred_ttl": cfg.TURNCredTTL,
"rqlite_dsn": cfg.RQLiteDSN,
"action": "spawn-sfu",
"namespace": cfg.Namespace,
"node_id": cfg.NodeID,
"sfu_listen_addr": cfg.ListenAddr,
"sfu_media_start": cfg.MediaPortStart,
"sfu_media_end": cfg.MediaPortEnd,
"turn_servers": turnServers,
"turn_secret": cfg.TURNSecret,
"turn_cred_ttl": cfg.TURNCredTTL,
"rqlite_dsn": cfg.RQLiteDSN,
})
return err
}
@ -487,17 +493,18 @@ func (cm *ClusterManager) spawnSFURemote(ctx context.Context, nodeIP string, cfg
// spawnTURNRemote sends a spawn-turn request to a remote node
func (cm *ClusterManager) spawnTURNRemote(ctx context.Context, nodeIP string, cfg TURNInstanceConfig) error {
_, err := cm.sendSpawnRequest(ctx, nodeIP, map[string]interface{}{
"action": "spawn-turn",
"namespace": cfg.Namespace,
"node_id": cfg.NodeID,
"turn_listen_addr": cfg.ListenAddr,
"turn_turns_addr": cfg.TURNSListenAddr,
"turn_public_ip": cfg.PublicIP,
"turn_realm": cfg.Realm,
"turn_auth_secret": cfg.AuthSecret,
"turn_relay_start": cfg.RelayPortStart,
"turn_relay_end": cfg.RelayPortEnd,
"turn_domain": cfg.TURNDomain,
"action": "spawn-turn",
"namespace": cfg.Namespace,
"node_id": cfg.NodeID,
"turn_listen_addr": cfg.ListenAddr,
"turn_turns_addr": cfg.TURNSListenAddr,
"turn_public_ip": cfg.PublicIP,
"turn_realm": cfg.Realm,
"turn_auth_secret": cfg.AuthSecret,
"turn_relay_start": cfg.RelayPortStart,
"turn_relay_end": cfg.RelayPortEnd,
"turn_domain": cfg.TURNDomain,
"turn_stealth_domain": cfg.StealthDomain,
})
return err
}
@ -558,7 +565,7 @@ func (cm *ClusterManager) updateClusterStateWithWebRTC(
nodes []clusterNodeInfo,
sfuBlocks map[string]*WebRTCPortBlock,
turnBlocks map[string]*WebRTCPortBlock,
turnDomain, turnSecret string,
turnDomain, turnStealthDomain, turnSecret string,
) {
// Get existing port blocks for base state
portBlocks, err := cm.portAllocator.GetAllPortBlocks(ctx, cluster.ID)
@ -635,6 +642,7 @@ func (cm *ClusterManager) updateClusterStateWithWebRTC(
}
// Persist TURN domain and secret so gateways can be restored on cold start
state.TURNDomain = turnDomain
state.TURNStealthDomain = turnStealthDomain
state.TURNSharedSecret = turnSecret
if node.NodeID == cm.localNodeID {
@ -671,7 +679,7 @@ func (cm *ClusterManager) restartGatewaysWithWebRTC(
nodes []clusterNodeInfo,
portBlocks map[string]*PortBlock,
sfuBlocks map[string]*WebRTCPortBlock,
turnDomain, turnSecret string,
turnDomain, turnStealthDomain, turnSecret string,
) {
// Build Olric server addresses from port blocks + node IPs
var olricServers []string
@ -715,7 +723,11 @@ func (cm *ClusterManager) restartGatewaysWithWebRTC(
WebRTCEnabled: webrtcEnabled,
SFUPort: sfuPort,
TURNDomain: turnDomain,
TURNStealthDomain: turnStealthDomain,
TURNSecret: turnSecret,
// Bugboard #837 follow-up: preserve the secrets key on WebRTC
// restarts so enabling WebRTC doesn't drop secrets management.
SecretsEncryptionKey: cm.secretsEncryptionKey,
}
if node.NodeID == cm.localNodeID {
@ -747,23 +759,26 @@ func (cm *ClusterManager) restartGatewayRemote(ctx context.Context, nodeIP strin
}
_, err := cm.sendSpawnRequest(ctx, nodeIP, map[string]interface{}{
"action": "restart-gateway",
"namespace": cfg.Namespace,
"node_id": cfg.NodeID,
"gateway_http_port": cfg.HTTPPort,
"gateway_base_domain": cfg.BaseDomain,
"gateway_rqlite_dsn": cfg.RQLiteDSN,
"gateway_global_rqlite_dsn": cfg.GlobalRQLiteDSN,
"gateway_olric_servers": cfg.OlricServers,
"gateway_olric_timeout": olricTimeout,
"ipfs_cluster_api_url": cfg.IPFSClusterAPIURL,
"ipfs_api_url": cfg.IPFSAPIURL,
"ipfs_timeout": ipfsTimeout,
"ipfs_replication_factor": cfg.IPFSReplicationFactor,
"gateway_webrtc_enabled": cfg.WebRTCEnabled,
"gateway_sfu_port": cfg.SFUPort,
"gateway_turn_domain": cfg.TURNDomain,
"gateway_turn_secret": cfg.TURNSecret,
"action": "restart-gateway",
"namespace": cfg.Namespace,
"node_id": cfg.NodeID,
"gateway_http_port": cfg.HTTPPort,
"gateway_base_domain": cfg.BaseDomain,
"gateway_rqlite_dsn": cfg.RQLiteDSN,
"gateway_global_rqlite_dsn": cfg.GlobalRQLiteDSN,
"gateway_olric_servers": cfg.OlricServers,
"gateway_olric_timeout": olricTimeout,
"ipfs_cluster_api_url": cfg.IPFSClusterAPIURL,
"ipfs_api_url": cfg.IPFSAPIURL,
"ipfs_timeout": ipfsTimeout,
"ipfs_replication_factor": cfg.IPFSReplicationFactor,
"gateway_webrtc_enabled": cfg.WebRTCEnabled,
"gateway_sfu_port": cfg.SFUPort,
"gateway_turn_domain": cfg.TURNDomain,
"gateway_turn_stealth_domain": cfg.TURNStealthDomain,
"gateway_turn_secret": cfg.TURNSecret,
// Bugboard #837 follow-up: preserve the secrets key on WebRTC restarts.
"gateway_secrets_encryption_key": cfg.SecretsEncryptionKey,
})
if err != nil {
cm.logger.Error("Failed to restart remote gateway with WebRTC config",

View File

@ -527,6 +527,7 @@ func (cm *ClusterManager) ReplaceClusterNode(ctx context.Context, cluster *Names
IPFSAPIURL: cm.ipfsAPIURL,
IPFSTimeout: cm.ipfsTimeout,
IPFSReplicationFactor: cm.ipfsReplicationFactor,
SecretsEncryptionKey: cm.secretsEncryptionKey,
}
// Add WebRTC config if enabled for this namespace
@ -536,6 +537,7 @@ func (cm *ClusterManager) ReplaceClusterNode(ctx context.Context, cluster *Names
gwCfg.SFUPort = sfuBlock.SFUSignalingPort
gwCfg.TURNDomain = fmt.Sprintf("turn.ns-%s.%s", cluster.NamespaceName, cm.baseDomain)
gwCfg.TURNSecret = webrtcCfg.TURNSharedSecret
gwCfg.TURNStealthDomain = cm.stealthDomainFor(cluster.NamespaceName, webrtcCfg)
}
}
@ -1069,6 +1071,7 @@ func (cm *ClusterManager) addNodeToCluster(
IPFSAPIURL: cm.ipfsAPIURL,
IPFSTimeout: cm.ipfsTimeout,
IPFSReplicationFactor: cm.ipfsReplicationFactor,
SecretsEncryptionKey: cm.secretsEncryptionKey,
}
// Add WebRTC config if enabled for this namespace
@ -1078,6 +1081,7 @@ func (cm *ClusterManager) addNodeToCluster(
gwCfg.SFUPort = sfuBlock.SFUSignalingPort
gwCfg.TURNDomain = fmt.Sprintf("turn.ns-%s.%s", cluster.NamespaceName, cm.baseDomain)
gwCfg.TURNSecret = webrtcCfg.TURNSharedSecret
gwCfg.TURNStealthDomain = cm.stealthDomainFor(cluster.NamespaceName, webrtcCfg)
}
}

View File

@ -79,6 +79,13 @@ func (m *recoveryMockDB) BatchWithSeq(_ context.Context, _ string, ops []rqlite.
res, _ := m.Batch(context.Background(), ops)
return res, 1, nil
}
func (m *recoveryMockDB) BatchQuery(_ context.Context, ops []rqlite.BatchOp) ([]rqlite.OpResult, error) {
out := make([]rqlite.OpResult, len(ops))
for i := range ops {
out[i] = rqlite.OpResult{Kind: rqlite.BatchOpQuery}
}
return out, nil
}
var _ rqlite.Client = (*recoveryMockDB)(nil)

View File

@ -353,6 +353,78 @@ func (drm *DNSRecordManager) DeleteTURNRecords(ctx context.Context, namespaceNam
return nil
}
// stealthDNSNamespace is the dns_records ownership tag for a namespace's
// stealth TURNS records, distinct from "namespace-turn:" so deleting one set
// never touches the other.
func stealthDNSNamespace(namespaceName string) string {
return "namespace-turn-stealth:" + namespaceName
}
// CreateStealthTURNRecords creates DNS A records for the stealth TURNS host
// (feat-124): <stealthHost> -> TURN node IPs. The hostname is the neutral
// cdn-<hash>.<base-domain> label from turn.StealthHostForNamespace — it lives
// directly under the base domain (NOT under ns-<namespace>) so the SNI string
// never identifies the app.
func (drm *DNSRecordManager) CreateStealthTURNRecords(ctx context.Context, namespaceName, stealthHost string, turnIPs []string) error {
internalCtx := client.WithInternalAuth(ctx)
if stealthHost == "" {
return &ClusterError{Message: "no stealth host provided for DNS records"}
}
if len(turnIPs) == 0 {
return &ClusterError{Message: "no TURN IPs provided for stealth DNS records"}
}
fqdn := stealthHost + "."
drm.logger.Info("Creating stealth TURNS DNS records",
zap.String("namespace", namespaceName),
zap.String("fqdn", fqdn),
zap.Strings("turn_ips", turnIPs),
)
deleteQuery := `DELETE FROM dns_records WHERE namespace = ?`
_, _ = drm.db.Exec(internalCtx, deleteQuery, stealthDNSNamespace(namespaceName))
now := time.Now()
for _, ip := range turnIPs {
insertQuery := `
INSERT INTO dns_records (
fqdn, record_type, value, ttl, namespace, created_by, created_at, updated_at
) VALUES (?, ?, ?, ?, ?, ?, ?, ?)
`
_, err := drm.db.Exec(internalCtx, insertQuery,
fqdn, "A", ip, 60,
stealthDNSNamespace(namespaceName),
"cluster-manager",
now, now,
)
if err != nil {
return &ClusterError{
Message: fmt.Sprintf("failed to create stealth TURNS DNS record %s -> %s", fqdn, ip),
Cause: err,
}
}
}
return nil
}
// DeleteStealthTURNRecords deletes a namespace's stealth TURNS DNS records.
func (drm *DNSRecordManager) DeleteStealthTURNRecords(ctx context.Context, namespaceName string) error {
internalCtx := client.WithInternalAuth(ctx)
deleteQuery := `DELETE FROM dns_records WHERE namespace = ?`
_, err := drm.db.Exec(internalCtx, deleteQuery, stealthDNSNamespace(namespaceName))
if err != nil {
return &ClusterError{
Message: "failed to delete stealth TURNS DNS records",
Cause: err,
}
}
return nil
}
// EnableNamespaceRecord marks a specific IP's record as active (for recovery)
func (drm *DNSRecordManager) EnableNamespaceRecord(ctx context.Context, namespaceName, ip string) error {
internalCtx := client.WithInternalAuth(ctx)

View File

@ -106,6 +106,14 @@ func (m *mockRQLiteClient) BatchWithSeq(ctx context.Context, namespace string, o
return res, 1, err
}
func (m *mockRQLiteClient) BatchQuery(ctx context.Context, ops []rqlite.BatchOp) ([]rqlite.OpResult, error) {
out := make([]rqlite.OpResult, len(ops))
for i := range ops {
out[i] = rqlite.OpResult{Kind: rqlite.BatchOpQuery}
}
return out, nil
}
// Ensure mockRQLiteClient implements rqlite.Client
var _ rqlite.Client = (*mockRQLiteClient)(nil)

View File

@ -0,0 +1,215 @@
package namespace
import (
"context"
"os"
"path/filepath"
"testing"
"github.com/DeBrosOfficial/network/pkg/gateway"
"go.uber.org/zap"
"gopkg.in/yaml.v3"
)
// Bugboard #25 (warm reconcile) — gatewayWebRTCInSync decides whether a
// running namespace gateway's on-disk WebRTC block already matches the
// desired config. ReconcileGateway restarts the gateway ONLY when this
// returns false, so the function is the guard against both (a) leaving a
// drifted gateway broken and (b) restart-looping a correct one on every
// boot.
func desiredEnabled() gateway.InstanceConfig {
return gateway.InstanceConfig{
WebRTCEnabled: true,
SFUPort: 30000,
TURNDomain: "turn.ns-anchat-test.orama-devnet.network",
TURNSecret: "the-secret",
}
}
func TestGatewayWebRTCInSync_driftedBlockMissing_returnsFalse(t *testing.T) {
// The exact bug-25 warm case: the running config has NO webrtc block
// (enabled=false, port 0, empty secret) but the DB-desired config has
// it enabled. MUST report out-of-sync so ReconcileGateway restarts.
onDisk := gateway.GatewayYAMLWebRTC{} // zero value = no block
if gatewayWebRTCInSync(onDisk, desiredEnabled()) {
t.Fatal("BUG #25 REGRESSION: empty on-disk block vs DB-enabled desired must be out-of-sync (needs restart)")
}
}
func TestGatewayWebRTCInSync_matchingBlock_returnsTrue(t *testing.T) {
// After a reconcile fixes the config, the on-disk block matches the
// desired. MUST report in-sync so the NEXT boot does NOT restart again
// (no restart loop — this is why we compare the actual on-disk config
// instead of the stale state file).
onDisk := gateway.GatewayYAMLWebRTC{
Enabled: true,
SFUPort: 30000,
TURNDomain: "turn.ns-anchat-test.orama-devnet.network",
TURNSecret: "the-secret",
}
if !gatewayWebRTCInSync(onDisk, desiredEnabled()) {
t.Error("matching on-disk block must be in-sync (no restart) — else restart loop on every boot")
}
}
func TestGatewayWebRTCInSync_eachFieldDriftDetected(t *testing.T) {
// Any single drifted field must trigger a restart. Pins that the
// comparison covers all five webrtc fields (a future refactor that
// drops one would silently let that field drift forever).
base := gateway.GatewayYAMLWebRTC{
Enabled: true, SFUPort: 30000,
TURNDomain: "turn.ns-anchat-test.orama-devnet.network", TURNSecret: "the-secret",
}
mutations := []struct {
name string
mut func(w *gateway.GatewayYAMLWebRTC)
}{
{"enabled flipped off", func(w *gateway.GatewayYAMLWebRTC) { w.Enabled = false }},
{"sfu port changed", func(w *gateway.GatewayYAMLWebRTC) { w.SFUPort = 30001 }},
{"turn domain changed", func(w *gateway.GatewayYAMLWebRTC) { w.TURNDomain = "turn.other" }},
{"turn secret rotated", func(w *gateway.GatewayYAMLWebRTC) { w.TURNSecret = "rotated" }},
{"stealth domain changed", func(w *gateway.GatewayYAMLWebRTC) { w.TURNStealthDomain = "cdn-deadbeef0000.orama-devnet.network" }},
}
for _, tc := range mutations {
t.Run(tc.name, func(t *testing.T) {
d := base
tc.mut(&d)
if gatewayWebRTCInSync(d, desiredEnabled()) {
t.Errorf("drift in %q not detected — gateway would keep serving stale config", tc.name)
}
})
}
}
func TestGatewayWebRTCInSync_bothDisabled_returnsTrue(t *testing.T) {
// A namespace genuinely without WebRTC: on-disk block empty, desired
// disabled. In-sync → no restart. (Avoids churning non-webrtc
// namespaces on every boot.)
if !gatewayWebRTCInSync(gateway.GatewayYAMLWebRTC{}, gateway.InstanceConfig{}) {
t.Error("disabled on-disk + disabled desired must be in-sync (no restart)")
}
}
// Bugboard #837 follow-up (drift on the secrets encryption key) —
// gatewayConfigInSync extends the bug-25 WebRTC drift check with the
// serverless secrets key. A namespace gateway spawned before the key was
// plumbed has an empty on-disk key; once the desired key is non-empty we
// want a rewrite+restart so secrets management turns on. But both-empty must
// stay a no-op so non-secrets hosts don't restart-loop.
func TestGatewayConfigInSync_secretsKeyMissingOnDisk_returnsFalse(t *testing.T) {
// On-disk YAML has no secrets key (pre-#837 gateway), desired has one.
// MUST drift so ReconcileGateway rewrites + restarts to enable secrets.
onDisk := gateway.GatewayYAMLConfig{} // empty secrets_encryption_key
desired := gateway.InstanceConfig{SecretsEncryptionKey: "the-key"}
if gatewayConfigInSync(onDisk, desired) {
t.Fatal("empty on-disk secrets key vs non-empty desired must be out-of-sync (needs restart to enable secrets)")
}
}
func TestGatewayConfigInSync_secretsKeyMatches_returnsTrue(t *testing.T) {
// After a reconcile, on-disk key matches desired. MUST be in-sync so the
// next boot does not restart again (no loop).
onDisk := gateway.GatewayYAMLConfig{SecretsEncryptionKey: "the-key"}
desired := gateway.InstanceConfig{SecretsEncryptionKey: "the-key"}
if !gatewayConfigInSync(onDisk, desired) {
t.Error("matching secrets key must be in-sync (no restart) — else restart loop on every boot")
}
}
func TestGatewayConfigInSync_bothSecretsKeysEmpty_returnsTrue(t *testing.T) {
// A host with no secrets key (empty desired) and an on-disk config also
// without one MUST be in-sync — otherwise every boot would restart a
// namespace gateway that legitimately has no secrets key.
if !gatewayConfigInSync(gateway.GatewayYAMLConfig{}, gateway.InstanceConfig{}) {
t.Error("empty on-disk + empty desired secrets key must be in-sync (no restart loop)")
}
}
func TestGatewayConfigInSync_secretsKeyRotated_returnsFalse(t *testing.T) {
// A rotated key (both non-empty but different) must drift so the rewrite
// propagates the new key.
onDisk := gateway.GatewayYAMLConfig{SecretsEncryptionKey: "old-key"}
desired := gateway.InstanceConfig{SecretsEncryptionKey: "new-key"}
if gatewayConfigInSync(onDisk, desired) {
t.Error("rotated secrets key (old != new) must be out-of-sync")
}
}
func TestGatewayConfigInSync_webrtcDriftStillDetected(t *testing.T) {
// The combined check must not lose the bug-25 WebRTC surface: WebRTC
// drift with matching (empty) secrets keys must still report out-of-sync.
onDisk := gateway.GatewayYAMLConfig{WebRTC: gateway.GatewayYAMLWebRTC{}}
desired := gateway.InstanceConfig{WebRTCEnabled: true, SFUPort: 30000}
if gatewayConfigInSync(onDisk, desired) {
t.Error("WebRTC drift must still be detected by the combined in-sync check")
}
}
// ReconcileGateway I/O paths that DON'T restart (the restart path needs
// real systemd, so it's covered by the pure helper above). These pin
// that a matching config is a clean no-op and that an unreadable config
// surfaces an error instead of blind-restarting.
func writeGatewayConfig(t *testing.T, base, ns, nodeID string, wr gateway.GatewayYAMLWebRTC) {
t.Helper()
dir := filepath.Join(base, ns, "configs")
if err := os.MkdirAll(dir, 0755); err != nil {
t.Fatal(err)
}
b, _ := yaml.Marshal(gateway.GatewayYAMLConfig{ClientNamespace: ns, WebRTC: wr})
if err := os.WriteFile(filepath.Join(dir, "gateway-"+nodeID+".yaml"), b, 0644); err != nil {
t.Fatal(err)
}
}
func TestReconcileGateway_inSyncIsNoOpNoError(t *testing.T) {
base := t.TempDir()
ns, node := "anchat-test", "node-1"
writeGatewayConfig(t, base, ns, node, gateway.GatewayYAMLWebRTC{
Enabled: true, SFUPort: 30000,
TURNDomain: "turn.ns-anchat-test.orama-devnet.network", TURNSecret: "the-secret",
})
s := NewSystemdSpawner(base, "", zap.NewNop())
// Desired == on-disk → must return nil WITHOUT attempting a restart
// (RestartGateway would error here since there's no real systemd, so
// a nil return proves we never reached it).
err := s.ReconcileGateway(context.Background(), ns, node, desiredEnabled())
if err != nil {
t.Errorf("in-sync config must be a clean no-op; got %v (did it try to restart?)", err)
}
}
func TestReconcileGateway_missingConfigReturnsErrorNotRestart(t *testing.T) {
// No config file on disk → return an error so the caller leaves the
// running gateway alone, rather than blind-restarting a healthy one.
s := NewSystemdSpawner(t.TempDir(), "", zap.NewNop())
err := s.ReconcileGateway(context.Background(), "anchat-test", "node-1", desiredEnabled())
if err == nil {
t.Error("missing config must return an error (don't blind-restart a healthy gateway)")
}
}
func TestGatewayWebRTCInSync_stealthEnableDetectedAsDrift(t *testing.T) {
// feat-124: enabling stealth must drift an otherwise-matching gateway so
// the reconciler rewrites its yaml with turn_stealth_domain and restarts
// it — that's how turn.credentials starts advertising turns:<host>:443.
onDisk := gateway.GatewayYAMLWebRTC{
Enabled: true, SFUPort: 30000,
TURNDomain: "turn.ns-anchat-test.orama-devnet.network", TURNSecret: "the-secret",
}
desired := desiredEnabled()
desired.TURNStealthDomain = "cdn-abc123def456.orama-devnet.network"
if gatewayWebRTCInSync(onDisk, desired) {
t.Error("stealth enable not detected as drift — gateway would never advertise the stealth URI")
}
// And once the yaml carries it, the same desired config is in-sync (no
// restart loop).
onDisk.TURNStealthDomain = desired.TURNStealthDomain
if !gatewayWebRTCInSync(onDisk, desired) {
t.Error("matching stealth domain reported as drift — restart loop")
}
}

View File

@ -0,0 +1,157 @@
package namespace
import "testing"
// Bugboard #25 — WebRTC config drift on restart + TURN/SFU decouple.
//
// chooseRestoreWebRTC resolves a restored gateway's WebRTC config from the
// local state file (which EnableWebRTC does NOT update) with a DB fallback
// (source of truth). It also DECOUPLES the two aspects: TURN (secret +
// domain) is namespace-wide so ANY gateway can serve credentials; the SFU
// port is per-node (0 on a gateway-only node). Pins both the drift
// fallback and the non-SFU-gateway case.
// dbFetch signature: () -> (turnSecret, turnDomain, stealthDomain string, sfuPort int).
func dbNone() (string, string, string, int) { return "", "", "", 0 }
func dbFull(secret, domain string, sfuPort int) func() (string, string, string, int) {
return func() (string, string, string, int) { return secret, domain, "", sfuPort }
}
func TestChooseRestoreWebRTC_stateFileCompleteWins(t *testing.T) {
// State file has TURN secret → use it, and NEVER consult the DB
// (the lazy dbFetch must not be called — saves a query on the hot
// restart path).
dbCalled := false
got := chooseRestoreWebRTC(true, 7800, "turn.ns-x.dbrs.space", "state-secret", "",
func() (string, string, string, int) { dbCalled = true; return dbNone() })
if dbCalled {
t.Error("DB fetch was called even though the state file had the TURN secret (should short-circuit)")
}
if !got.enabled || got.sfuPort != 7800 || got.turnSecret != "state-secret" {
t.Errorf("want state-file values; got %+v", got)
}
if got.turnDomain != "turn.ns-x.dbrs.space" {
t.Errorf("turnDomain = %q; want state-file value", got.turnDomain)
}
}
func TestChooseRestoreWebRTC_staleStateFallsBackToDB(t *testing.T) {
// The bug-25 drift case: state file has NO webrtc (stale — written
// before enable), DB says enabled WITH an SFU port on this node. MUST
// fall back to the DB and re-materialize the full block.
got := chooseRestoreWebRTC(false, 0, "", "", "",
dbFull("db-secret", "turn.ns-anchat-test.dbrs.space", 7801))
if !got.enabled {
t.Fatal("BUG #25 REGRESSION: stale state + DB-enabled WebRTC must fall back to DB; got disabled")
}
if got.sfuPort != 7801 {
t.Errorf("sfuPort = %d; want 7801 (from DB)", got.sfuPort)
}
if got.turnSecret != "db-secret" {
t.Errorf("turnSecret = %q; want db-secret (from DB)", got.turnSecret)
}
if got.turnDomain != "turn.ns-anchat-test.dbrs.space" {
t.Errorf("turnDomain = %q; want DB-derived value", got.turnDomain)
}
}
func TestChooseRestoreWebRTC_nonSFUGatewayGetsTURNOnly(t *testing.T) {
// THE DECOUPLE CASE (bug-25). A gateway node that is NOT an SFU node:
// the DB has the namespace TURN secret but GetSFUPorts returns nothing
// for this node (sfuPort=0). The gateway MUST still get the TURN
// secret (so /v1/webrtc/turn/credentials registers + works) while
// sfuPort stays 0 (signal/rooms don't register). This is exactly node
// 57's situation — pre-fix it resolved to disabled and 404'd.
got := chooseRestoreWebRTC(false, 0, "", "", "",
dbFull("db-secret", "turn.ns-anchat-test.dbrs.space", 0)) // sfuPort 0 = no local SFU
if !got.enabled {
t.Fatal("BUG #25 REGRESSION: non-SFU gateway with namespace TURN secret must be enabled (serves credentials)")
}
if got.sfuPort != 0 {
t.Errorf("sfuPort = %d; want 0 (this node runs no local SFU)", got.sfuPort)
}
if got.turnSecret != "db-secret" {
t.Errorf("turnSecret = %q; want db-secret (TURN is namespace-wide, served by any gateway)", got.turnSecret)
}
}
func TestChooseRestoreWebRTC_stateHasTURNButNoSFU(t *testing.T) {
// State file for a non-SFU node: it has the TURN secret but HasSFU is
// false / port 0. Must use the state TURN secret with sfuPort=0 and
// NOT consult the DB (TURN secret present = complete enough).
dbCalled := false
got := chooseRestoreWebRTC(false, 0, "turn.ns-x.dbrs.space", "state-secret", "",
func() (string, string, string, int) { dbCalled = true; return dbNone() })
if dbCalled {
t.Error("DB fetch called even though state file had the TURN secret")
}
if !got.enabled || got.sfuPort != 0 || got.turnSecret != "state-secret" {
t.Errorf("want TURN-only from state (sfuPort 0); got %+v", got)
}
}
func TestChooseRestoreWebRTC_bothEmptyDisabled(t *testing.T) {
// Namespace genuinely without WebRTC: state empty, DB returns nothing.
// Must return disabled so we don't register broken webrtc routes.
got := chooseRestoreWebRTC(false, 0, "", "", "", dbNone)
if got.enabled {
t.Errorf("want disabled when neither source has WebRTC; got %+v", got)
}
}
func TestChooseRestoreWebRTC_dbNoSecretStaysDisabled(t *testing.T) {
// Defensive: DB returns an SFU port but NO turn secret (half-
// provisioned / shouldn't happen). The TURN secret is the
// enablement marker; without it we treat it as not-configured-for-
// TURN, but an SFU port alone still enables SFU routes.
got := chooseRestoreWebRTC(false, 0, "", "", "",
func() (string, string, string, int) { return "", "turn.db", "", 9000 })
// dbFetch only runs when state secret is empty; here it returns no
// secret, so the `if dbSecret != ""` guard means NOTHING is taken
// from the DB → disabled. (An SFU-only-no-TURN namespace is not a
// real configuration; TURN secret always accompanies enable.)
if got.enabled {
t.Errorf("DB returned no TURN secret: want disabled; got %+v", got)
}
}
// --- feat-124 stealth domain restore precedence ---
func TestChooseRestoreWebRTC_stealthFromStateFile(t *testing.T) {
// Stealth toggles rewrite cluster state, so a fresh state file carries
// the stealth domain and must win without a DB call.
got := chooseRestoreWebRTC(true, 7800, "turn.ns-x.dbrs.space", "state-secret", "cdn-abc123def456.dbrs.space",
func() (string, string, string, int) {
t.Error("DB fetch called even though state file was complete")
return dbNone()
})
if got.stealthDomain != "cdn-abc123def456.dbrs.space" {
t.Errorf("stealthDomain = %q; want state-file value", got.stealthDomain)
}
}
func TestChooseRestoreWebRTC_stealthFromDBOnStaleState(t *testing.T) {
// Stale state (no TURN secret) + DB has stealth enabled → stealth domain
// re-materializes from the DB alongside the rest of the WebRTC block.
got := chooseRestoreWebRTC(false, 0, "", "", "",
func() (string, string, string, int) {
return "db-secret", "turn.ns-x.dbrs.space", "cdn-abc123def456.dbrs.space", 7801
})
if !got.enabled || got.stealthDomain != "cdn-abc123def456.dbrs.space" {
t.Errorf("want stealth domain from DB on stale state; got %+v", got)
}
}
func TestChooseRestoreWebRTC_noStealthStaysEmpty(t *testing.T) {
// Stealth disabled everywhere → empty stealthDomain (gateway advertises
// the baseline 3-rung ladder only).
got := chooseRestoreWebRTC(true, 7800, "turn.ns-x.dbrs.space", "state-secret", "", dbNone)
if got.stealthDomain != "" {
t.Errorf("stealthDomain = %q; want empty when stealth is disabled", got.stealthDomain)
}
}

View File

@ -5,6 +5,7 @@ import (
"fmt"
"os"
"path/filepath"
"strings"
"time"
production "github.com/DeBrosOfficial/network/pkg/environments/production"
@ -228,11 +229,17 @@ func (s *SystemdSpawner) SpawnGateway(ctx context.Context, namespace, nodeID str
// random Ed25519 keys and host functions saw empty
// caller_jwt_subject.
ClusterSecretPath: s.clusterSecretPath,
// Bugboard #837 follow-up: forward the host's serverless secrets
// encryption key so the spawned namespace gateway can manage function
// secrets. Without this, `function secrets list` returned 501 on
// namespace gateways even though the host gateway had the key.
SecretsEncryptionKey: cfg.SecretsEncryptionKey,
WebRTC: gateway.GatewayYAMLWebRTC{
Enabled: cfg.WebRTCEnabled,
SFUPort: cfg.SFUPort,
TURNDomain: cfg.TURNDomain,
TURNSecret: cfg.TURNSecret,
Enabled: cfg.WebRTCEnabled,
SFUPort: cfg.SFUPort,
TURNDomain: cfg.TURNDomain,
TURNSecret: cfg.TURNSecret,
TURNStealthDomain: cfg.TURNStealthDomain,
},
}
@ -241,9 +248,17 @@ func (s *SystemdSpawner) SpawnGateway(ctx context.Context, namespace, nodeID str
return fmt.Errorf("failed to marshal Gateway config: %w", err)
}
if err := os.WriteFile(configPath, configBytes, 0644); err != nil {
// 0600: the gateway YAML embeds the secrets encryption key (bugboard
// #837), so it must not be world/group readable.
if err := os.WriteFile(configPath, configBytes, 0600); err != nil {
return fmt.Errorf("failed to write Gateway config: %w", err)
}
// WriteFile's mode only applies on CREATE — converge perms explicitly so
// a file written 0644 by an older release doesn't stay world-readable
// after an in-place rewrite.
if err := os.Chmod(configPath, 0600); err != nil {
return fmt.Errorf("failed to set Gateway config permissions: %w", err)
}
s.logger.Info("Created Gateway config file",
zap.String("path", configPath),
@ -321,17 +336,99 @@ func (s *SystemdSpawner) RestartGateway(ctx context.Context, namespace, nodeID s
return s.SpawnGateway(ctx, namespace, nodeID, cfg)
}
// gatewayWebRTCInSync reports whether the WebRTC block already on disk
// matches the desired gateway config — i.e. no restart is needed.
// Compares only the WebRTC-relevant fields (bugboard #25 drift surface).
// Pure function so the reconcile decision is unit-testable without files
// or systemd.
func gatewayWebRTCInSync(onDisk gateway.GatewayYAMLWebRTC, cfg gateway.InstanceConfig) bool {
return onDisk.Enabled == cfg.WebRTCEnabled &&
onDisk.SFUPort == cfg.SFUPort &&
onDisk.TURNSecret == cfg.TURNSecret &&
onDisk.TURNDomain == cfg.TURNDomain &&
onDisk.TURNStealthDomain == cfg.TURNStealthDomain
}
// gatewayConfigInSync reports whether the full reconcile-relevant config on
// disk matches the desired config — i.e. no rewrite+restart is needed.
// Combines the WebRTC drift surface (bugboard #25) with the secrets
// encryption key (bugboard #837): a gateway that was spawned before the key
// was plumbed has an empty on-disk key and `function secrets list` returns
// 501; once the desired key is non-empty we want a rewrite+restart so the
// running gateway picks it up.
//
// Plain string equality keeps the "both empty → in sync" case a no-op: a
// namespace on a host with no secrets key (empty desired) whose on-disk key
// is also empty is in-sync, so it never restart-loops. Only a genuine
// difference (empty on-disk vs non-empty desired, or a rotated key) drifts.
func gatewayConfigInSync(onDisk gateway.GatewayYAMLConfig, cfg gateway.InstanceConfig) bool {
return gatewayWebRTCInSync(onDisk.WebRTC, cfg) &&
onDisk.SecretsEncryptionKey == cfg.SecretsEncryptionKey
}
// ReconcileGateway is the WARM counterpart to SpawnGateway: when a
// namespace gateway is already running, this compares its on-disk config
// against the desired `cfg` and restarts it ONLY if the WebRTC block has
// drifted (enabled / sfu_port / turn_secret / turn_domain differ).
//
// Bugboard #25: the from-disk restore skips healthy gateways, so a
// gateway that lost its webrtc block on a prior restart (while staying
// healthy) never gets its config regenerated — leaving SFU/TURN services
// running but the gateway with no turn_secret/sfu_port (credentials
// configured:false, /v1/webrtc/turn/credentials 404). The cold-spawn
// self-heal only fires when the gateway happens to be down during
// restore. This closes that gap for the healthy case.
//
// Idempotent: returns nil WITHOUT restarting when the on-disk WebRTC
// block already matches the desired config — so it does not cause a
// restart loop on every node boot. WebRTC is the only known config-drift
// surface (bugboard #25); other fields are intentionally not compared to
// avoid spurious restarts from harmless differences (e.g. olric server
// ordering).
func (s *SystemdSpawner) ReconcileGateway(ctx context.Context, namespace, nodeID string, cfg gateway.InstanceConfig) error {
configPath := filepath.Join(s.namespaceBase, namespace, "configs", fmt.Sprintf("gateway-%s.yaml", nodeID))
existing, err := os.ReadFile(configPath)
if err != nil {
// No readable config to compare against — don't blindly restart a
// healthy gateway; absence of the config file is a different
// problem the caller's cold-spawn path handles.
return fmt.Errorf("read gateway config for reconcile: %w", err)
}
var onDisk gateway.GatewayYAMLConfig
if err := yaml.Unmarshal(existing, &onDisk); err != nil {
return fmt.Errorf("parse gateway config for reconcile: %w", err)
}
if gatewayConfigInSync(onDisk, cfg) {
// Already in sync — nothing to do, no restart.
return nil
}
// secretsKeyDrifted is logged (as a bool, never the key material) so
// operators can see when a #837 rewrite fires vs a #25 WebRTC rewrite.
secretsKeyDrifted := onDisk.SecretsEncryptionKey != cfg.SecretsEncryptionKey
s.logger.Info("Gateway config drifted from desired; reconciling (rewrite + restart)",
zap.String("namespace", namespace),
zap.String("node_id", nodeID),
zap.Bool("ondisk_enabled", onDisk.WebRTC.Enabled),
zap.Int("ondisk_sfu_port", onDisk.WebRTC.SFUPort),
zap.Bool("desired_enabled", cfg.WebRTCEnabled),
zap.Int("desired_sfu_port", cfg.SFUPort),
zap.Bool("secrets_key_drifted", secretsKeyDrifted))
return s.RestartGateway(ctx, namespace, nodeID, cfg)
}
// SFUInstanceConfig holds configuration for spawning an SFU instance
type SFUInstanceConfig struct {
Namespace string
NodeID string
ListenAddr string // WireGuard IP:port (e.g., "10.0.0.1:30000")
MediaPortStart int // Start of RTP media port range
MediaPortEnd int // End of RTP media port range
ListenAddr string // WireGuard IP:port (e.g., "10.0.0.1:30000")
MediaPortStart int // Start of RTP media port range
MediaPortEnd int // End of RTP media port range
TURNServers []sfu.TURNServerConfig // TURN servers to advertise to peers
TURNSecret string // HMAC-SHA1 shared secret
TURNCredTTL int // Credential TTL in seconds
RQLiteDSN string // Namespace-local RQLite DSN
TURNSecret string // HMAC-SHA1 shared secret
TURNCredTTL int // Credential TTL in seconds
RQLiteDSN string // Namespace-local RQLite DSN
}
// SpawnSFU starts an SFU instance using systemd
@ -422,6 +519,115 @@ type TURNInstanceConfig struct {
RelayPortStart int // Start of relay port range
RelayPortEnd int // End of relay port range
TURNDomain string // TURN domain for Let's Encrypt cert (e.g., "turn.ns-myapp.orama-devnet.network")
// StealthDomain is the neutral stealth TURNS host (feat-124). When set,
// the TURN server carries a second Let's Encrypt cert for this name and
// serves it to TLS clients whose SNI matches — the path the SNI router
// forwards from :443. Stealth NEVER falls back to a self-signed cert: a
// cert clients reject is indistinguishable from being blocked.
StealthDomain string
}
// acmeInternalEndpoint is the gateway's internal ACME endpoint that the
// Caddyfile TURN-cert blocks point the orama DNS provider at.
const acmeInternalEndpoint = "http://localhost:6001/v1/internal/acme"
// turnCertProvisionTimeout bounds how long a TURN spawn waits for Caddy to
// provision a Let's Encrypt cert before falling back (primary domain) or
// failing (stealth domain).
const turnCertProvisionTimeout = 2 * time.Minute
// resolveTURNSCert resolves the TURNS cert/key pair for a domain.
//
// Let's Encrypt via Caddy is tried FIRST whenever a domain is set — the call
// is idempotent and instant when the cert is already in Caddy's storage. This
// ordering also self-heals nodes stuck on the self-signed fallback from an
// earlier failed provisioning (live devnet finding, feat-124): the old code
// never retried Caddy once a self-signed pair existed on disk, so strict TLS
// clients kept failing turns: validation forever.
//
// allowSelfSigned controls the fallback: the primary TURN domain may fall
// back to (or reuse) a self-signed pair at <configDir>/turn-{cert,key}.pem so
// baseline TURN stays up, while the stealth domain must hard-fail instead.
func (s *SystemdSpawner) resolveTURNSCert(namespace, domain, publicIP, configDir string, allowSelfSigned bool) (string, string, error) {
if domain != "" {
caddyCert, caddyKey, err := provisionTURNCertViaCaddy(domain, acmeInternalEndpoint, turnCertProvisionTimeout)
if err == nil {
s.logger.Info("Using Let's Encrypt cert from Caddy for TURNS",
zap.String("namespace", namespace),
zap.String("domain", domain),
zap.String("cert_path", caddyCert))
return caddyCert, caddyKey, nil
}
if !allowSelfSigned {
return "", "", fmt.Errorf("failed to provision Let's Encrypt cert for stealth TURNS domain %s (no self-signed fallback — clients must be able to validate it): %w", domain, err)
}
s.logger.Warn("Let's Encrypt cert provisioning failed, falling back to self-signed",
zap.String("namespace", namespace),
zap.String("domain", domain),
zap.Error(err))
}
if !allowSelfSigned {
return "", "", fmt.Errorf("no domain configured for TURNS cert in namespace %s", namespace)
}
certPath := filepath.Join(configDir, "turn-cert.pem")
keyPath := filepath.Join(configDir, "turn-key.pem")
if _, err := os.Stat(certPath); os.IsNotExist(err) {
if err := turn.GenerateSelfSignedCert(certPath, keyPath, publicIP); err != nil {
return "", "", fmt.Errorf("failed to generate TURNS self-signed cert for namespace %s: %w", namespace, err)
}
s.logger.Info("Generated TURNS self-signed certificate",
zap.String("namespace", namespace),
zap.String("cert_path", certPath))
}
return certPath, keyPath, nil
}
// resolveStealthCert resolves the TLS cert/key for the stealth TURNS host by
// reusing Caddy's existing `*.<baseDomain>` wildcard certificate (feat-124).
//
// The stealth host is a single-label subdomain of the base domain
// (cdn-<hash>.<baseDomain>), so the wildcard the gateway already provisions
// for HTTPS covers it. This deliberately avoids the runtime
// append-to-Caddyfile provisioning path: the orama-node service runs
// ProtectSystem=strict as the orama user and cannot write /etc/caddy, so that
// path fails with EROFS (and would silently fall back to a self-signed cert
// that clients reject — indistinguishable from being blocked). Caddy renews
// the wildcard; the TURN cert reloader hot-reloads it from storage.
//
// Hard error (never self-signed) when the wildcard is missing or the host is
// not a single-label subdomain — a stealth endpoint with an unvalidatable
// cert is worse than no stealth endpoint.
func (s *SystemdSpawner) resolveStealthCert(stealthDomain, baseDomain string) (string, string, error) {
if baseDomain == "" {
return "", "", fmt.Errorf("stealth cert: base domain required")
}
if !isSingleLabelSubdomain(stealthDomain, baseDomain) {
return "", "", fmt.Errorf("stealth cert: %q is not a single-label subdomain of %q (the *.%s wildcard cert would not cover it)", stealthDomain, baseDomain, baseDomain)
}
certPath, keyPath := caddyWildcardCertPaths(baseDomain)
if _, err := os.Stat(certPath); err != nil {
return "", "", fmt.Errorf("stealth cert: Caddy wildcard cert for *.%s not found at %s (is the gateway HTTPS wildcard provisioned on this node?): %w", baseDomain, certPath, err)
}
if _, err := os.Stat(keyPath); err != nil {
return "", "", fmt.Errorf("stealth cert: Caddy wildcard key for *.%s not found at %s: %w", baseDomain, keyPath, err)
}
s.logger.Info("Using Caddy wildcard cert for stealth TURNS",
zap.String("stealth_domain", stealthDomain),
zap.String("cert_path", certPath))
return certPath, keyPath, nil
}
// isSingleLabelSubdomain reports whether host is exactly one DNS label below
// base (e.g. "cdn-x.example.com" under "example.com"), which is the set a
// `*.base` wildcard certificate covers.
func isSingleLabelSubdomain(host, base string) bool {
suffix := "." + base
if !strings.HasSuffix(host, suffix) {
return false
}
label := strings.TrimSuffix(host, suffix)
return label != "" && !strings.Contains(label, ".")
}
// SpawnTURN starts a TURN instance using systemd
@ -440,42 +646,46 @@ func (s *SystemdSpawner) SpawnTURN(ctx context.Context, namespace, nodeID string
configPath := filepath.Join(configDir, fmt.Sprintf("turn-%s.yaml", nodeID))
// Provision TLS cert for TURNS — try Let's Encrypt via Caddy first, fall back to self-signed
certPath := filepath.Join(configDir, "turn-cert.pem")
keyPath := filepath.Join(configDir, "turn-key.pem")
// Provision TLS cert for TURNS — Let's Encrypt via Caddy first (idempotent,
// also upgrades nodes stuck on the self-signed fallback), self-signed as
// the primary-domain fallback only.
var certPath, keyPath string
if cfg.TURNSListenAddr != "" {
if _, err := os.Stat(certPath); os.IsNotExist(err) {
// Try Let's Encrypt via Caddy first
if cfg.TURNDomain != "" {
acmeEndpoint := "http://localhost:6001/v1/internal/acme"
caddyCert, caddyKey, provErr := provisionTURNCertViaCaddy(cfg.TURNDomain, acmeEndpoint, 2*time.Minute)
if provErr == nil {
certPath = caddyCert
keyPath = caddyKey
s.logger.Info("Using Let's Encrypt cert from Caddy for TURNS",
zap.String("namespace", namespace),
zap.String("domain", cfg.TURNDomain),
zap.String("cert_path", certPath))
} else {
s.logger.Warn("Let's Encrypt cert provisioning failed, falling back to self-signed",
zap.String("namespace", namespace),
zap.String("domain", cfg.TURNDomain),
zap.Error(provErr))
}
}
// Fallback: generate self-signed cert if no cert is available yet
if _, statErr := os.Stat(certPath); os.IsNotExist(statErr) {
if err := turn.GenerateSelfSignedCert(certPath, keyPath, cfg.PublicIP); err != nil {
s.logger.Warn("Failed to generate TURNS self-signed cert, TURNS will be disabled",
zap.String("namespace", namespace),
zap.Error(err))
cfg.TURNSListenAddr = "" // Disable TURNS if cert generation fails
} else {
s.logger.Info("Generated TURNS self-signed certificate",
zap.String("namespace", namespace),
zap.String("cert_path", certPath))
}
}
var certErr error
certPath, keyPath, certErr = s.resolveTURNSCert(namespace, cfg.TURNDomain, cfg.PublicIP, configDir, true)
if certErr != nil {
s.logger.Warn("Failed to resolve TURNS cert, TURNS will be disabled",
zap.String("namespace", namespace),
zap.Error(certErr))
cfg.TURNSListenAddr = "" // Disable TURNS if no cert is available
}
}
// Stealth TURNS cert (feat-124): requires a working TURNS listener and a
// CA-valid cert — hard error, never a silent downgrade, because the
// operator explicitly enabled stealth and a half-working stealth endpoint
// is invisible until a censored-region user fails to connect.
var stealthCertPath, stealthKeyPath string
if cfg.StealthDomain != "" {
// Security: the stealth domain arrives over the spawn protocol (mesh
// peers gated only by the static internal-auth header). Pin it to the
// deterministic derivation so a forged value can't select cert
// material for an attacker-chosen name. cfg.Realm is the base domain
// on every TURN spawn site.
if cfg.Realm == "" {
return fmt.Errorf("stealth TURNS for namespace %s requires a base domain (realm) to locate the wildcard cert", namespace)
}
want := turn.StealthHostForNamespace(cfg.Namespace, cfg.Realm)
if cfg.StealthDomain != want {
return fmt.Errorf("stealth domain %q does not match the derived host %q for namespace %s — refusing to provision", cfg.StealthDomain, want, cfg.Namespace)
}
if cfg.TURNSListenAddr == "" {
return fmt.Errorf("stealth TURNS for namespace %s requires an active TURNS listener (no TLS cert/listener available)", namespace)
}
var stealthErr error
stealthCertPath, stealthKeyPath, stealthErr = s.resolveStealthCert(cfg.StealthDomain, cfg.Realm)
if stealthErr != nil {
return fmt.Errorf("failed to resolve stealth TURNS cert for namespace %s: %w", namespace, stealthErr)
}
}
@ -494,6 +704,11 @@ func (s *SystemdSpawner) SpawnTURN(ctx context.Context, namespace, nodeID string
turnConfig.TLSCertPath = certPath
turnConfig.TLSKeyPath = keyPath
}
if stealthCertPath != "" {
turnConfig.StealthDomain = cfg.StealthDomain
turnConfig.TLSStealthCertPath = stealthCertPath
turnConfig.TLSStealthKeyPath = stealthKeyPath
}
configBytes, err := yaml.Marshal(turnConfig)
if err != nil {

View File

@ -5,26 +5,62 @@ import (
"os"
"os/exec"
"path/filepath"
"regexp"
"strings"
"time"
)
// dnsNamePattern matches a conservative lowercase DNS hostname. It exists to
// keep an operator/spawn-supplied domain from breaking out of the Caddyfile
// block it is interpolated into (a value containing '{', '}', or a newline
// could otherwise inject arbitrary Caddy directives) and to refuse cert
// provisioning for non-hostname junk. Security: defense-in-depth at the
// Caddyfile sink; the caller also pins the stealth domain to its deterministic
// derivation (systemd_spawner.go SpawnTURN).
var dnsNamePattern = regexp.MustCompile(`^[a-z0-9]([a-z0-9-]*[a-z0-9])?(\.[a-z0-9]([a-z0-9-]*[a-z0-9])?)+$`)
const (
caddyfilePath = "/etc/caddy/Caddyfile"
// Caddy stores ACME certs under this directory relative to its data dir.
caddyACMECertDir = "certificates/acme-v02.api.letsencrypt.org-directory"
// caddyServiceStorageDir is where the Caddy systemd service (User=orama,
// HOME=/var/lib/caddy) actually persists its ACME certificates on a node.
// The orama-node service runs ProtectSystem=strict and cannot write
// /etc/caddy, so the runtime "append-to-Caddyfile" provisioning path
// (provisionTURNCertViaCaddy) fails with EROFS — TURNS cert material is
// instead reused from this directory (see caddyWildcardCertPaths).
caddyServiceStorageDir = "/var/lib/caddy/caddy"
turnCertBeginMarker = "# BEGIN TURN CERT: "
turnCertEndMarker = "# END TURN CERT: "
)
// caddyWildcardCertPaths returns the cert/key file paths for the
// `*.<baseDomain>` wildcard certificate in the Caddy service's storage. Caddy
// names the wildcard directory `wildcard_.<baseDomain>`. The gateway already
// provisions this wildcard for HTTPS, so a single-label subdomain of the base
// domain (e.g. the stealth TURNS host `cdn-<hash>.<baseDomain>`) is covered by
// it without any per-domain provisioning.
func caddyWildcardCertPaths(baseDomain string) (certPath, keyPath string) {
name := "wildcard_." + baseDomain
dir := filepath.Join(caddyServiceStorageDir, caddyACMECertDir, name)
return filepath.Join(dir, name+".crt"), filepath.Join(dir, name+".key")
}
// provisionTURNCertViaCaddy appends the TURN domain to the local Caddyfile,
// reloads Caddy to trigger DNS-01 ACME certificate provisioning, and waits
// for the cert files to appear. Returns the cert/key paths on success.
// If Caddy is not available or cert provisioning times out, returns an error
// so the caller can fall back to a self-signed cert.
func provisionTURNCertViaCaddy(domain, acmeEndpoint string, timeout time.Duration) (certPath, keyPath string, err error) {
// Refuse anything that isn't a clean DNS name before it reaches the
// Caddyfile write — blocks Caddyfile-injection via crafted domains.
if !dnsNamePattern.MatchString(domain) {
return "", "", fmt.Errorf("refusing to provision TURNS cert for non-DNS-name domain %q", domain)
}
// Check if cert already exists from a previous provisioning
certPath, keyPath = caddyCertPaths(domain)
if _, err := os.Stat(certPath); err == nil {

View File

@ -0,0 +1,175 @@
package namespace
import (
"os"
"path/filepath"
"strings"
"testing"
"time"
"go.uber.org/zap"
)
// feat-124 — resolveTURNSCert semantics.
//
// On machines without a Caddyfile (tests, dev laptops) the Let's Encrypt
// branch fails fast with "failed to read Caddyfile", exercising exactly the
// fallback decision this function owns: primary domains degrade to a
// self-signed pair, the stealth domain must hard-fail instead.
func testSpawner(t *testing.T) *SystemdSpawner {
t.Helper()
return &SystemdSpawner{logger: zap.NewNop()}
}
func TestResolveTURNSCert_primaryFallsBackToSelfSigned(t *testing.T) {
s := testSpawner(t)
dir := t.TempDir()
certPath, keyPath, err := s.resolveTURNSCert("ns-test", "turn.ns-test.example.com", "203.0.113.7", dir, true)
if err != nil {
t.Fatalf("expected self-signed fallback, got error: %v", err)
}
if certPath != filepath.Join(dir, "turn-cert.pem") || keyPath != filepath.Join(dir, "turn-key.pem") {
t.Errorf("unexpected fallback paths: %s / %s", certPath, keyPath)
}
if _, statErr := os.Stat(certPath); statErr != nil {
t.Errorf("self-signed cert not written: %v", statErr)
}
}
func TestResolveTURNSCert_existingSelfSignedReused(t *testing.T) {
s := testSpawner(t)
dir := t.TempDir()
first, _, err := s.resolveTURNSCert("ns-test", "", "203.0.113.7", dir, true)
if err != nil {
t.Fatalf("first resolve: %v", err)
}
info1, err := os.Stat(first)
if err != nil {
t.Fatalf("stat first cert: %v", err)
}
second, _, err := s.resolveTURNSCert("ns-test", "", "203.0.113.7", dir, true)
if err != nil {
t.Fatalf("second resolve: %v", err)
}
info2, err := os.Stat(second)
if err != nil {
t.Fatalf("stat second cert: %v", err)
}
if first != second || info1.ModTime() != info2.ModTime() {
t.Error("existing self-signed pair was regenerated instead of reused")
}
}
func TestResolveTURNSCert_stealthNeverFallsBackToSelfSigned(t *testing.T) {
s := testSpawner(t)
dir := t.TempDir()
_, _, err := s.resolveTURNSCert("ns-test", "cdn-abc123def456.example.com", "203.0.113.7", dir, false)
if err == nil {
t.Fatal("stealth cert resolution must hard-fail without Let's Encrypt — a self-signed stealth cert is indistinguishable from being blocked")
}
if !strings.Contains(err.Error(), "cdn-abc123def456.example.com") {
t.Errorf("error must name the stealth domain for the operator; got: %v", err)
}
if _, statErr := os.Stat(filepath.Join(dir, "turn-cert.pem")); !os.IsNotExist(statErr) {
t.Error("stealth failure must not write a self-signed pair")
}
}
func TestResolveTURNSCert_noDomainNoFallbackErrors(t *testing.T) {
s := testSpawner(t)
_, _, err := s.resolveTURNSCert("ns-test", "", "203.0.113.7", t.TempDir(), false)
if err == nil {
t.Fatal("empty domain with self-signed disallowed must error")
}
}
// Security (feat-124): the Caddyfile sink must refuse any domain that isn't a
// clean DNS name, so a crafted value can't break out of the generated block
// and inject Caddy directives.
func TestProvisionTURNCertViaCaddy_rejectsNonDNSName(t *testing.T) {
bad := []string{
"example.com {\n reverse_proxy evil:1234\n}\n#",
"has space.com",
"UPPER.example.com",
"nodots",
"trailing-.example.com",
"",
}
for _, d := range bad {
if _, _, err := provisionTURNCertViaCaddy(d, "http://localhost:6001/v1/internal/acme", time.Second); err == nil {
t.Errorf("provisionTURNCertViaCaddy(%q) accepted a non-DNS-name domain", d)
}
}
}
// feat-124 stealth cert reuse: the stealth TURNS host reuses Caddy's existing
// *.<base> wildcard cert instead of writing the Caddyfile (the orama-node
// service can't, ProtectSystem=strict). These pin the validation logic.
func TestIsSingleLabelSubdomain(t *testing.T) {
cases := []struct {
host, base string
want bool
}{
{"cdn-a1b2c3d4e5f6.orama-devnet.network", "orama-devnet.network", true},
{"turn.ns-anchat-test.orama-devnet.network", "orama-devnet.network", false}, // multi-label
{"orama-devnet.network", "orama-devnet.network", false}, // empty label
{"cdn-x.other.network", "orama-devnet.network", false}, // wrong base
{"cdn-x.example.com", "example.com", true},
}
for _, c := range cases {
if got := isSingleLabelSubdomain(c.host, c.base); got != c.want {
t.Errorf("isSingleLabelSubdomain(%q, %q) = %v; want %v", c.host, c.base, got, c.want)
}
}
}
func TestCaddyWildcardCertPaths_shape(t *testing.T) {
crt, key := caddyWildcardCertPaths("orama-devnet.network")
wantCrt := "/var/lib/caddy/caddy/certificates/acme-v02.api.letsencrypt.org-directory/wildcard_.orama-devnet.network/wildcard_.orama-devnet.network.crt"
if crt != wantCrt {
t.Errorf("cert path = %q; want %q", crt, wantCrt)
}
if !strings.HasSuffix(key, "wildcard_.orama-devnet.network.key") {
t.Errorf("key path = %q; want a wildcard .key", key)
}
}
func TestResolveStealthCert_rejectsMultiLabelHost(t *testing.T) {
s := testSpawner(t)
// A host that needs *.ns-x.<base> (multi-label) is NOT covered by the
// *.<base> wildcard — must error rather than present a mismatched cert.
_, _, err := s.resolveStealthCert("turn.ns-x.orama-devnet.network", "orama-devnet.network")
if err == nil {
t.Fatal("multi-label host must be rejected (wildcard wouldn't cover it)")
}
if !strings.Contains(err.Error(), "single-label") {
t.Errorf("error should explain the single-label requirement; got: %v", err)
}
}
func TestResolveStealthCert_missingWildcardErrors(t *testing.T) {
s := testSpawner(t)
// Valid single-label host but the wildcard cert almost certainly does not
// exist at the absolute Caddy storage path during tests → hard error
// naming the path, never a self-signed fallback.
_, _, err := s.resolveStealthCert("cdn-deadbeef0000.test-nonexistent-base.invalid", "test-nonexistent-base.invalid")
if err == nil {
t.Fatal("missing wildcard cert must hard-fail")
}
if !strings.Contains(err.Error(), "wildcard") {
t.Errorf("error should reference the missing wildcard cert; got: %v", err)
}
}
func TestResolveStealthCert_emptyBaseErrors(t *testing.T) {
s := testSpawner(t)
if _, _, err := s.resolveStealthCert("cdn-x.example.com", ""); err == nil {
t.Fatal("empty base domain must error")
}
}

View File

@ -94,8 +94,8 @@ const (
const (
// SFU media port range: 20000-29999
// Each namespace gets a 500-port sub-range for RTP media
SFUMediaPortRangeStart = 20000
SFUMediaPortRangeEnd = 29999
SFUMediaPortRangeStart = 20000
SFUMediaPortRangeEnd = 29999
SFUMediaPortsPerNamespace = 500
// SFU signaling ports: 30000-30099
@ -105,8 +105,8 @@ const (
// TURN relay port range: 49152-65535
// Each namespace gets an 800-port sub-range for TURN relay
TURNRelayPortRangeStart = 49152
TURNRelayPortRangeEnd = 65535
TURNRelayPortRangeStart = 49152
TURNRelayPortRangeEnd = 65535
TURNRelayPortsPerNamespace = 800
// TURN listen ports (standard)
@ -152,38 +152,38 @@ type NamespaceCluster struct {
// ClusterNode represents a node participating in a namespace cluster
type ClusterNode struct {
ID string `json:"id" db:"id"`
NamespaceClusterID string `json:"namespace_cluster_id" db:"namespace_cluster_id"`
NodeID string `json:"node_id" db:"node_id"`
Role NodeRole `json:"role" db:"role"`
RQLiteHTTPPort int `json:"rqlite_http_port,omitempty" db:"rqlite_http_port"`
RQLiteRaftPort int `json:"rqlite_raft_port,omitempty" db:"rqlite_raft_port"`
OlricHTTPPort int `json:"olric_http_port,omitempty" db:"olric_http_port"`
OlricMemberlistPort int `json:"olric_memberlist_port,omitempty" db:"olric_memberlist_port"`
GatewayHTTPPort int `json:"gateway_http_port,omitempty" db:"gateway_http_port"`
Status NodeStatus `json:"status" db:"status"`
ProcessPID int `json:"process_pid,omitempty" db:"process_pid"`
LastHeartbeat *time.Time `json:"last_heartbeat,omitempty" db:"last_heartbeat"`
ErrorMessage string `json:"error_message,omitempty" db:"error_message"`
RQLiteJoinAddress string `json:"rqlite_join_address,omitempty" db:"rqlite_join_address"`
OlricPeers string `json:"olric_peers,omitempty" db:"olric_peers"` // JSON array
CreatedAt time.Time `json:"created_at" db:"created_at"`
UpdatedAt time.Time `json:"updated_at" db:"updated_at"`
ID string `json:"id" db:"id"`
NamespaceClusterID string `json:"namespace_cluster_id" db:"namespace_cluster_id"`
NodeID string `json:"node_id" db:"node_id"`
Role NodeRole `json:"role" db:"role"`
RQLiteHTTPPort int `json:"rqlite_http_port,omitempty" db:"rqlite_http_port"`
RQLiteRaftPort int `json:"rqlite_raft_port,omitempty" db:"rqlite_raft_port"`
OlricHTTPPort int `json:"olric_http_port,omitempty" db:"olric_http_port"`
OlricMemberlistPort int `json:"olric_memberlist_port,omitempty" db:"olric_memberlist_port"`
GatewayHTTPPort int `json:"gateway_http_port,omitempty" db:"gateway_http_port"`
Status NodeStatus `json:"status" db:"status"`
ProcessPID int `json:"process_pid,omitempty" db:"process_pid"`
LastHeartbeat *time.Time `json:"last_heartbeat,omitempty" db:"last_heartbeat"`
ErrorMessage string `json:"error_message,omitempty" db:"error_message"`
RQLiteJoinAddress string `json:"rqlite_join_address,omitempty" db:"rqlite_join_address"`
OlricPeers string `json:"olric_peers,omitempty" db:"olric_peers"` // JSON array
CreatedAt time.Time `json:"created_at" db:"created_at"`
UpdatedAt time.Time `json:"updated_at" db:"updated_at"`
}
// PortBlock represents an allocated block of ports for a namespace on a node
type PortBlock struct {
ID string `json:"id" db:"id"`
NodeID string `json:"node_id" db:"node_id"`
NamespaceClusterID string `json:"namespace_cluster_id" db:"namespace_cluster_id"`
PortStart int `json:"port_start" db:"port_start"`
PortEnd int `json:"port_end" db:"port_end"`
RQLiteHTTPPort int `json:"rqlite_http_port" db:"rqlite_http_port"`
RQLiteRaftPort int `json:"rqlite_raft_port" db:"rqlite_raft_port"`
OlricHTTPPort int `json:"olric_http_port" db:"olric_http_port"`
OlricMemberlistPort int `json:"olric_memberlist_port" db:"olric_memberlist_port"`
GatewayHTTPPort int `json:"gateway_http_port" db:"gateway_http_port"`
AllocatedAt time.Time `json:"allocated_at" db:"allocated_at"`
ID string `json:"id" db:"id"`
NodeID string `json:"node_id" db:"node_id"`
NamespaceClusterID string `json:"namespace_cluster_id" db:"namespace_cluster_id"`
PortStart int `json:"port_start" db:"port_start"`
PortEnd int `json:"port_end" db:"port_end"`
RQLiteHTTPPort int `json:"rqlite_http_port" db:"rqlite_http_port"`
RQLiteRaftPort int `json:"rqlite_raft_port" db:"rqlite_raft_port"`
OlricHTTPPort int `json:"olric_http_port" db:"olric_http_port"`
OlricMemberlistPort int `json:"olric_memberlist_port" db:"olric_memberlist_port"`
GatewayHTTPPort int `json:"gateway_http_port" db:"gateway_http_port"`
AllocatedAt time.Time `json:"allocated_at" db:"allocated_at"`
}
// ClusterEvent represents an audit event for cluster lifecycle
@ -238,33 +238,39 @@ func (e *ClusterError) Unwrap() error {
}
var (
ErrNoPortsAvailable = &ClusterError{Message: "no ports available on node"}
ErrNodeAtCapacity = &ClusterError{Message: "node has reached maximum namespace instances"}
ErrInsufficientNodes = &ClusterError{Message: "insufficient nodes available for cluster"}
ErrClusterNotFound = &ClusterError{Message: "namespace cluster not found"}
ErrClusterAlreadyExists = &ClusterError{Message: "namespace cluster already exists"}
ErrProvisioningFailed = &ClusterError{Message: "cluster provisioning failed"}
ErrNamespaceNotFound = &ClusterError{Message: "namespace not found"}
ErrInvalidClusterStatus = &ClusterError{Message: "invalid cluster status for operation"}
ErrRecoveryInProgress = &ClusterError{Message: "recovery already in progress for this cluster"}
ErrWebRTCAlreadyEnabled = &ClusterError{Message: "WebRTC is already enabled for this namespace"}
ErrWebRTCNotEnabled = &ClusterError{Message: "WebRTC is not enabled for this namespace"}
ErrNoWebRTCPortsAvailable = &ClusterError{Message: "no WebRTC ports available on node"}
ErrNoPortsAvailable = &ClusterError{Message: "no ports available on node"}
ErrNodeAtCapacity = &ClusterError{Message: "node has reached maximum namespace instances"}
ErrInsufficientNodes = &ClusterError{Message: "insufficient nodes available for cluster"}
ErrClusterNotFound = &ClusterError{Message: "namespace cluster not found"}
ErrClusterAlreadyExists = &ClusterError{Message: "namespace cluster already exists"}
ErrProvisioningFailed = &ClusterError{Message: "cluster provisioning failed"}
ErrNamespaceNotFound = &ClusterError{Message: "namespace not found"}
ErrInvalidClusterStatus = &ClusterError{Message: "invalid cluster status for operation"}
ErrRecoveryInProgress = &ClusterError{Message: "recovery already in progress for this cluster"}
ErrWebRTCAlreadyEnabled = &ClusterError{Message: "WebRTC is already enabled for this namespace"}
ErrWebRTCNotEnabled = &ClusterError{Message: "WebRTC is not enabled for this namespace"}
ErrWebRTCStealthAlreadyEnabled = &ClusterError{Message: "WebRTC stealth is already enabled for this namespace"}
ErrWebRTCStealthNotEnabled = &ClusterError{Message: "WebRTC stealth is not enabled for this namespace"}
ErrNoWebRTCPortsAvailable = &ClusterError{Message: "no WebRTC ports available on node"}
)
// WebRTCConfig represents the per-namespace WebRTC configuration stored in the database
type WebRTCConfig struct {
ID string `json:"id" db:"id"`
NamespaceClusterID string `json:"namespace_cluster_id" db:"namespace_cluster_id"`
NamespaceName string `json:"namespace_name" db:"namespace_name"`
Enabled bool `json:"enabled" db:"enabled"`
TURNSharedSecret string `json:"-" db:"turn_shared_secret"` // Never serialize secret to JSON
TURNCredentialTTL int `json:"turn_credential_ttl" db:"turn_credential_ttl"`
SFUNodeCount int `json:"sfu_node_count" db:"sfu_node_count"`
TURNNodeCount int `json:"turn_node_count" db:"turn_node_count"`
EnabledBy string `json:"enabled_by" db:"enabled_by"`
EnabledAt time.Time `json:"enabled_at" db:"enabled_at"`
DisabledAt *time.Time `json:"disabled_at,omitempty" db:"disabled_at"`
ID string `json:"id" db:"id"`
NamespaceClusterID string `json:"namespace_cluster_id" db:"namespace_cluster_id"`
NamespaceName string `json:"namespace_name" db:"namespace_name"`
Enabled bool `json:"enabled" db:"enabled"`
TURNSharedSecret string `json:"-" db:"turn_shared_secret"` // Never serialize secret to JSON
TURNCredentialTTL int `json:"turn_credential_ttl" db:"turn_credential_ttl"`
SFUNodeCount int `json:"sfu_node_count" db:"sfu_node_count"`
TURNNodeCount int `json:"turn_node_count" db:"turn_node_count"`
// StealthEnabled gates the censorship-resistant TURNS:443 path (feat-124):
// stealth cert on the TURN servers, SNI route on :443, and the
// `turns:<stealth-host>:443` rung in the turn.credentials URI ladder.
StealthEnabled bool `json:"stealth_enabled" db:"stealth_enabled"`
EnabledBy string `json:"enabled_by" db:"enabled_by"`
EnabledAt time.Time `json:"enabled_at" db:"enabled_at"`
DisabledAt *time.Time `json:"disabled_at,omitempty" db:"disabled_at"`
}
// WebRTCRoom represents an active WebRTC room tracked in the database
@ -284,15 +290,15 @@ type WebRTCRoom struct {
// WebRTCPortBlock represents allocated WebRTC ports for a namespace on a node
type WebRTCPortBlock struct {
ID string `json:"id" db:"id"`
NodeID string `json:"node_id" db:"node_id"`
NamespaceClusterID string `json:"namespace_cluster_id" db:"namespace_cluster_id"`
ServiceType string `json:"service_type" db:"service_type"` // "sfu" or "turn"
ID string `json:"id" db:"id"`
NodeID string `json:"node_id" db:"node_id"`
NamespaceClusterID string `json:"namespace_cluster_id" db:"namespace_cluster_id"`
ServiceType string `json:"service_type" db:"service_type"` // "sfu" or "turn"
// SFU ports
SFUSignalingPort int `json:"sfu_signaling_port,omitempty" db:"sfu_signaling_port"`
SFUMediaPortStart int `json:"sfu_media_port_start,omitempty" db:"sfu_media_port_start"`
SFUMediaPortEnd int `json:"sfu_media_port_end,omitempty" db:"sfu_media_port_end"`
SFUSignalingPort int `json:"sfu_signaling_port,omitempty" db:"sfu_signaling_port"`
SFUMediaPortStart int `json:"sfu_media_port_start,omitempty" db:"sfu_media_port_start"`
SFUMediaPortEnd int `json:"sfu_media_port_end,omitempty" db:"sfu_media_port_end"`
// TURN ports
TURNListenPort int `json:"turn_listen_port,omitempty" db:"turn_listen_port"`

View File

@ -58,6 +58,15 @@ func (n *Node) startHTTPGateway(ctx context.Context) error {
rqlitePassword = strings.TrimSpace(string(secretBytes))
}
// Read the serverless secrets encryption key (bugboard #837). Must be the
// SAME value on every namespace-gateway node so a secret encrypted by one
// process decrypts on another; an empty value makes get_secret fail loudly
// (the manager refuses an ephemeral key in production).
secretsEncryptionKey := ""
if secretBytes, err := os.ReadFile(filepath.Join(oramaDir, "secrets", "secrets-encryption-key")); err == nil {
secretsEncryptionKey = strings.TrimSpace(string(secretBytes))
}
gwCfg := &gateway.Config{
ListenAddr: n.config.HTTPGateway.ListenAddr,
ClientNamespace: n.config.HTTPGateway.ClientNamespace,
@ -75,6 +84,7 @@ func (n *Node) startHTTPGateway(ctx context.Context) error {
RQLitePassword: rqlitePassword,
ClusterSecret: clusterSecret,
APIKeyHMACSecret: apiKeyHMACSecret,
SecretsEncryptionKey: secretsEncryptionKey,
WebRTCEnabled: n.config.HTTPGateway.WebRTC.Enabled,
SFUPort: n.config.HTTPGateway.WebRTC.SFUPort,
TURNDomain: n.config.HTTPGateway.WebRTC.TURNDomain,
@ -119,6 +129,11 @@ func (n *Node) startHTTPGateway(ctx context.Context) error {
IPFSReplicationFactor: n.config.Database.IPFS.ReplicationFactor,
TurnEncryptionKey: turnEncKey,
ClusterSecretPath: clusterSecretPath,
// Bugboard #837 follow-up: forward the host's serverless secrets
// encryption key (read once above) so spawned namespace gateways
// can manage function secrets. Reuses the same variable the host
// gateway uses — no second file read.
SecretsEncryptionKey: secretsEncryptionKey,
}
clusterManager := namespace.NewClusterManager(ormClient, clusterCfg, n.logger.Logger)
clusterManager.SetLocalNodeID(gwCfg.NodePeerID)

View File

@ -5,7 +5,6 @@ import (
"fmt"
"strings"
"sync"
"time"
"golang.org/x/sync/errgroup"
)
@ -75,30 +74,20 @@ func (m *Manager) Publish(ctx context.Context, topic string, data []byte) error
return fmt.Errorf("failed to get topic for publishing: %w", err)
}
// Wait briefly for mesh formation if no peers are in the mesh yet
// GossipSub needs time to discover peers and form a mesh
// With FloodPublish enabled, messages will be flooded to all connected peers
// but we still want to give the mesh a chance to form for better delivery
waitCtx, waitCancel := context.WithTimeout(ctx, 2*time.Second)
defer waitCancel()
// Check if we have peers in the mesh, wait up to 2 seconds for mesh formation
meshFormed := false
for i := 0; i < 20 && !meshFormed; i++ {
peers := libp2pTopic.ListPeers()
if len(peers) > 0 {
meshFormed = true
break // Mesh has formed, proceed with publish
}
select {
case <-waitCtx.Done():
meshFormed = true // Timeout, proceed anyway (FloodPublish will handle it)
case <-time.After(100 * time.Millisecond):
// Continue waiting
}
}
// Publish message
// Publish immediately — do NOT wait for gossipsub mesh formation.
//
// The router runs with FloodPublish enabled (pkg/node/libp2p.go and
// pkg/client/client.go), so the message is sent directly to every
// connected peer subscribed to the topic without needing a mesh, and a
// same-gateway subscriber receives it via the local loopback regardless.
//
// A previous version polled ListPeers() for up to 2s here "to give the
// mesh a chance to form." On the namespace-gateway topology most
// application topics (per-conversation/wakeup) have no REMOTE mesh peers
// — they're delivered to local WS clients — so the loop timed out the
// full 2s on EVERY publish, making a 3-publish message-create cost ~6s
// server-side (feat-6, the dominant realtime latency). FloodPublish makes
// the wait redundant; removed.
if err := libp2pTopic.Publish(ctx, data); err != nil {
return fmt.Errorf("failed to publish message: %w", err)
}

View File

@ -84,10 +84,31 @@ func TestPublishBatch_context_cancel_returns_error(t *testing.T) {
}
}
// TestPublish_does_not_block_on_empty_mesh is a regression guard for feat-6.
// Publish must NOT wait for gossipsub mesh formation: it previously polled
// ListPeers() for up to 2s, so every publish to a topic with no remote
// subscribers (the common namespace-gateway case, where wakeup topics are
// delivered to LOCAL WS clients) cost the full 2s — a 3-publish message-create
// paid ~6s server-side. FloodPublish delivers without the mesh, so a publish
// against an empty mesh must return promptly.
func TestPublish_does_not_block_on_empty_mesh(t *testing.T) {
mgr, cleanup := createTestManager(t, "test-ns")
defer cleanup()
start := time.Now()
if err := mgr.Publish(context.Background(), "no-subscribers", []byte("d")); err != nil {
t.Fatalf("Publish failed: %v", err)
}
// Old code: ~2000ms. New code: ~ms. 500ms is a generous ceiling that
// avoids CI flakiness while still catching a re-introduced multi-second
// mesh-wait.
if elapsed := time.Since(start); elapsed > 500*time.Millisecond {
t.Errorf("Publish blocked %v on an empty mesh — the mesh-wait must stay removed (feat-6)", elapsed)
}
}
func TestPublishBatch_concurrency_limit(t *testing.T) {
// Verify PublishBatch with low MaxConcurrency completes without deadlocking.
// Each Publish in a no-peer test environment waits up to 2s for mesh formation,
// so we use a small batch size to keep wall time bounded.
mgr, cleanup := createTestManager(t, "test-ns")
defer cleanup()

View File

@ -2,6 +2,7 @@ package push
import (
"context"
"errors"
"fmt"
"sync"
@ -52,46 +53,141 @@ func (d *PushDispatcher) Provider(name string) PushProvider {
//
// SendToUser returns nil if the user has no registered devices — that
// is normal, not an error.
//
// Callers wanting per-device outcomes should use SendToUserDetailed
// (bugboard #348 — back-compat preserved on this method).
func (d *PushDispatcher) SendToUser(
ctx context.Context,
namespace, userID string,
msg PushMessage,
) error {
res, err := d.SendToUserDetailed(ctx, namespace, userID, msg)
if err != nil {
return err
}
// Preserve the legacy contract: return the first per-device error
// with the full error chain intact (sentinels like ErrUnknownProvider
// and ErrDeviceUnregistered are reachable via errors.Is on the result).
for _, r := range res.Results {
if !r.Success && r.err != nil {
return r.err
}
}
return nil
}
// SendToUserDetailed dispatches to every registered device for the user
// and returns a per-device outcome. Unlike SendToUser (which collapses
// to a single error), this surfaces every device's HTTP status / reason
// so the caller can react granularly (delete on Unregistered, retry on
// 5xx, log unknowns, etc.).
//
// Used by the `oh.PushSendV2` WASM host function so WASM callers can
// auto-clean stale tokens and surface real failures (bugboard #348).
//
// Returns (nil, err) only on setup failures (device-store query failed,
// etc.). A user with zero devices returns
// (&SendDetailedResult{Ok: true, DevicesAttempted: 0}, nil).
func (d *PushDispatcher) SendToUserDetailed(
ctx context.Context,
namespace, userID string,
msg PushMessage,
) (*SendDetailedResult, error) {
devs, err := d.devices.ListForUser(ctx, namespace, userID)
if err != nil {
return fmt.Errorf("list devices: %w", err)
return nil, fmt.Errorf("list devices: %w", err)
}
// Bugboard #408 — target_provider filter. When the caller sets
// msg.TargetProvider, drop every device whose Provider doesn't match
// BEFORE we attempt sends or count anything. This lets a chat-alert
// path send only to "apns" devices while a call-push path sends only
// to "apns_voip" devices, even though both are registered on the
// same iPhone. Unset = fanout (back-compat for every existing
// caller, including unmigrated functions in other namespaces).
//
// Bugboard feat-10 — exclude_provider filter. The inverse: drop
// devices whose Provider EQUALS msg.ExcludeProvider. Useful for the
// "fan out to everyone EXCEPT VoIP" pattern (chat handler that wants
// ntfy+apns+expo but never apns_voip — cleaner than listing every
// included provider). If both are set, TargetProvider wins —
// combining them is ambiguous (e.g. target=apns + exclude=apns is
// empty by construction), so we pick the safer positive filter and
// ignore the exclusion. Unset = no exclusion.
if msg.TargetProvider != "" {
filtered := devs[:0]
for _, dev := range devs {
if dev.Provider == msg.TargetProvider {
filtered = append(filtered, dev)
}
}
devs = filtered
} else if msg.ExcludeProvider != "" {
filtered := devs[:0]
for _, dev := range devs {
if dev.Provider != msg.ExcludeProvider {
filtered = append(filtered, dev)
}
}
devs = filtered
}
out := &SendDetailedResult{
Ok: true, // flipped to false on the first failure
DevicesAttempted: len(devs),
Results: make([]DeviceSendResult, 0, len(devs)),
}
if len(devs) == 0 {
return nil
return out, nil
}
var firstErr error
for _, dev := range devs {
r := DeviceSendResult{DeviceID: dev.DeviceID, Provider: dev.Provider}
d.mu.RLock()
p, ok := d.providers[dev.Provider]
d.mu.RUnlock()
if !ok {
r.Success = false
r.Message = fmt.Sprintf("push: unknown provider %q (device not dispatched)", dev.Provider)
// Preserve the sentinel error chain so legacy callers using
// errors.Is(err, ErrUnknownProvider) on the SendToUser
// return value keep working.
r.err = fmt.Errorf("%w: %s", ErrUnknownProvider, dev.Provider)
d.logger.Warn("push: dropping device with unregistered provider",
zap.String("provider", dev.Provider),
zap.String("device_id", dev.DeviceID),
)
if firstErr == nil {
firstErr = fmt.Errorf("%w: %s", ErrUnknownProvider, dev.Provider)
}
out.Ok = false
out.Results = append(out.Results, r)
continue
}
m := msg
m.DeviceToken = dev.Token
if err := p.Send(ctx, m); err != nil {
if sendErr := p.Send(ctx, m); sendErr != nil {
r.Success = false
r.err = sendErr // preserve full chain for errors.Is/As
// Extract structured info if the provider returned PushError.
var perr *PushError
if errors.As(sendErr, &perr) {
r.HTTPStatus = perr.HTTPStatus
r.Reason = perr.Reason
r.Message = perr.Message
r.Unregistered = perr.Unregistered
} else {
r.Message = sendErr.Error()
}
d.logger.Warn("push: provider send failed",
zap.String("provider", dev.Provider),
zap.String("device_id", dev.DeviceID),
zap.Error(err),
zap.Int("http_status", r.HTTPStatus),
zap.String("reason", r.Reason),
zap.Bool("unregistered", r.Unregistered),
zap.Error(sendErr),
)
if firstErr == nil {
firstErr = err
}
out.Ok = false
} else {
r.Success = true
out.DevicesSucceeded++
}
out.Results = append(out.Results, r)
}
return firstErr
return out, nil
}

View File

@ -0,0 +1,199 @@
package push
import (
"context"
"encoding/json"
"errors"
"testing"
"go.uber.org/zap"
)
// TestSendToUserDetailed_happyPath verifies the per-device result shape
// for the success case: ok=true, attempted=N, succeeded=N, every entry
// has Success=true.
func TestSendToUserDetailed_happyPath(t *testing.T) {
store := &fakeStore{devices: []PushDevice{
{Namespace: "ns", UserID: "u", DeviceID: "ios-A", Provider: "ntfy", Token: "tok-1"},
{Namespace: "ns", UserID: "u", DeviceID: "ios-B", Provider: "ntfy", Token: "tok-2"},
}}
ntfy := &fakeProvider{name: "ntfy"}
d := New(store, zap.NewNop())
d.Register(ntfy)
res, err := d.SendToUserDetailed(context.Background(), "ns", "u", PushMessage{Title: "hi"})
if err != nil {
t.Fatalf("SendToUserDetailed: %v", err)
}
if !res.Ok {
t.Error("expected Ok=true on all-success")
}
if res.DevicesAttempted != 2 || res.DevicesSucceeded != 2 {
t.Errorf("attempted=%d succeeded=%d; want 2/2", res.DevicesAttempted, res.DevicesSucceeded)
}
if len(res.Results) != 2 {
t.Fatalf("results len = %d; want 2", len(res.Results))
}
for i, r := range res.Results {
if !r.Success {
t.Errorf("result[%d] should be success, got %+v", i, r)
}
if r.Provider != "ntfy" {
t.Errorf("result[%d].Provider = %q; want ntfy", i, r.Provider)
}
}
}
// TestSendToUserDetailed_unknownProvider verifies the "ghost provider"
// case populates Message + preserves the ErrUnknownProvider chain on
// the unexported err field (so the legacy SendToUser still sees the
// sentinel via errors.Is).
func TestSendToUserDetailed_unknownProvider(t *testing.T) {
store := &fakeStore{devices: []PushDevice{
{Namespace: "ns", UserID: "u", DeviceID: "old-android", Provider: "ghost", Token: "tok"},
}}
d := New(store, zap.NewNop())
res, err := d.SendToUserDetailed(context.Background(), "ns", "u", PushMessage{Title: "x"})
if err != nil {
t.Fatalf("SendToUserDetailed: %v", err)
}
if res.Ok {
t.Error("Ok should be false when any device failed")
}
if res.DevicesAttempted != 1 || res.DevicesSucceeded != 0 {
t.Errorf("attempted=%d succeeded=%d; want 1/0", res.DevicesAttempted, res.DevicesSucceeded)
}
r := res.Results[0]
if r.Success {
t.Error("unknown provider should not be Success")
}
if r.Message == "" {
t.Error("Message should describe the unknown provider")
}
// The unexported err field carries the sentinel for errors.Is.
if !errors.Is(r.Err(), ErrUnknownProvider) {
t.Errorf("expected r.Err() to wrap ErrUnknownProvider, got %v", r.Err())
}
}
// TestSendToUserDetailed_structuredPushError verifies that when a
// provider returns a *PushError (APNs 410/400/etc.), the detailed
// result faithfully reflects HTTPStatus, Reason, and Unregistered.
func TestSendToUserDetailed_structuredPushError(t *testing.T) {
store := &fakeStore{devices: []PushDevice{
{Namespace: "ns", UserID: "u", DeviceID: "ios-dead", Provider: "apns", Token: "tok"},
}}
apnsErr := &PushError{
HTTPStatus: 410,
Reason: "Unregistered",
Message: "apns: 410 Unregistered",
Unregistered: true,
}
apns := &fakeProvider{name: "apns", err: apnsErr}
d := New(store, zap.NewNop())
d.Register(apns)
res, err := d.SendToUserDetailed(context.Background(), "ns", "u", PushMessage{Title: "x"})
if err != nil {
t.Fatalf("SendToUserDetailed: %v", err)
}
if res.Ok {
t.Error("Ok should be false")
}
r := res.Results[0]
if r.HTTPStatus != 410 {
t.Errorf("HTTPStatus = %d; want 410", r.HTTPStatus)
}
if r.Reason != "Unregistered" {
t.Errorf("Reason = %q; want Unregistered", r.Reason)
}
if !r.Unregistered {
t.Error("Unregistered flag should be true for 410")
}
}
// TestSendToUserDetailed_jsonShapeForWASM verifies the JSON encoding
// of SendDetailedResult matches what the WASM `oh.PushSendV2` host fn
// will produce. The unexported err field MUST be excluded from JSON
// (it's an in-process plumbing detail, not a wire field).
func TestSendToUserDetailed_jsonShapeForWASM(t *testing.T) {
res := &SendDetailedResult{
Ok: false,
DevicesAttempted: 2,
DevicesSucceeded: 1,
Results: []DeviceSendResult{
{DeviceID: "good", Provider: "apns", Success: true},
{
DeviceID: "bad",
Provider: "apns",
Success: false,
HTTPStatus: 410,
Reason: "Unregistered",
Message: "apns: 410 Unregistered",
Unregistered: true,
err: errors.New("must-not-leak"),
},
},
}
raw, err := json.Marshal(res)
if err != nil {
t.Fatalf("marshal: %v", err)
}
s := string(raw)
// Required fields present:
for _, want := range []string{
`"ok":false`,
`"devices_attempted":2`,
`"devices_succeeded":1`,
`"device_id":"good"`,
`"success":true`,
`"device_id":"bad"`,
`"http_status":410`,
`"reason":"Unregistered"`,
`"unregistered":true`,
} {
if !contains(s, want) {
t.Errorf("expected JSON to contain %q; got: %s", want, s)
}
}
// The unexported err must NOT leak into JSON.
if contains(s, "must-not-leak") {
t.Errorf("unexported err field leaked into JSON: %s", s)
}
}
// TestSendToUser_legacyContract_preservedAcrossDetailedRefactor verifies
// that SendToUser (now layered on SendToUserDetailed) still returns the
// FIRST per-device error with its sentinel chain intact. Regression
// guard against accidentally losing the errors.Is contract for the
// pre-#348 callers.
func TestSendToUser_legacyContract_preservedAcrossDetailedRefactor(t *testing.T) {
store := &fakeStore{devices: []PushDevice{
{Namespace: "ns", UserID: "u", DeviceID: "phone", Provider: "ghost", Token: "tok"},
}}
d := New(store, zap.NewNop())
err := d.SendToUser(context.Background(), "ns", "u", PushMessage{Title: "x"})
if err == nil {
t.Fatal("expected SendToUser to surface the unknown-provider error")
}
if !errors.Is(err, ErrUnknownProvider) {
t.Errorf("SendToUser err = %v; want errors.Is(..., ErrUnknownProvider)", err)
}
}
func contains(haystack, needle string) bool {
return len(needle) == 0 || (len(haystack) >= len(needle) && indexOf(haystack, needle) >= 0)
}
func indexOf(s, sub string) int {
for i := 0; i+len(sub) <= len(s); i++ {
if s[i:i+len(sub)] == sub {
return i
}
}
return -1
}

View File

@ -0,0 +1,146 @@
package push
import (
"context"
"testing"
"go.uber.org/zap"
)
// Bugboard feat-10 — exclude_provider dispatcher filter.
//
// Inverse of #408's target_provider. Pin behaviors that matter for the
// "fan out to everyone EXCEPT VoIP" pattern:
//
// 1. With ExcludeProvider="apns_voip", apns/ntfy/expo devices are
// attempted; apns_voip devices are dropped. Cleaner than listing
// every included provider on every call.
//
// 2. With both TargetProvider and ExcludeProvider set, TargetProvider
// wins (positive filter is strictly narrower; combining them is
// ambiguous — e.g. target=apns + exclude=apns is empty). Documented
// and pinned so a future refactor can't accidentally let exclude
// subtract from target.
//
// 3. With neither set, fan-out unchanged (back-compat for every
// existing caller).
//
// 4. DevicesAttempted reflects the POST-filter count.
func threeDeviceUser() []PushDevice {
return []PushDevice{
{DeviceID: "ios-base", Provider: "apns", Token: "ALERT-TOKEN"},
{DeviceID: "ios-base:voip", Provider: "apns_voip", Token: "VOIP-TOKEN"},
{DeviceID: "expo-1", Provider: "expo", Token: "EXPO-TOKEN"},
}
}
func TestDispatcher_ExcludeProvider_DropsApnsVoip(t *testing.T) {
alert := &recordingProvider{name: "apns"}
voip := &recordingProvider{name: "apns_voip"}
expo := &recordingProvider{name: "expo"}
d := New(&targetFilterDeviceStore{devices: threeDeviceUser()}, zap.NewNop())
for _, p := range []PushProvider{alert, voip, expo} {
d.Register(p)
}
res, err := d.SendToUserDetailed(context.Background(), "ns", "u1", PushMessage{
Title: "new message",
Body: "hi",
ExcludeProvider: "apns_voip",
})
if err != nil {
t.Fatalf("SendToUserDetailed: %v", err)
}
if got := alert.tokens(); len(got) != 1 {
t.Errorf("alert should have been called once; got %v", got)
}
if got := expo.tokens(); len(got) != 1 {
t.Errorf("expo should have been called once; got %v", got)
}
if got := voip.tokens(); len(got) != 0 {
t.Errorf("FEAT-10 REGRESSION: voip was attempted despite ExcludeProvider=apns_voip; "+
"this would CallKit-ring on every chat message even when caller meant to skip it. got=%v", got)
}
if res.DevicesAttempted != 2 {
t.Errorf("DevicesAttempted = %d; want 2 (post-exclude: apns + expo)", res.DevicesAttempted)
}
}
func TestDispatcher_ExcludeProvider_TargetProviderWinsWhenBothSet(t *testing.T) {
// Ambiguity guard: if both are set, the documented behavior is
// "TargetProvider wins; ExcludeProvider is ignored." Without this
// pin, a future refactor could chain the filters (e.g.
// target=apns + exclude=apns → 0 devices, surprise no-op) — which
// would silently break any caller that set both, even harmlessly.
alert := &recordingProvider{name: "apns"}
voip := &recordingProvider{name: "apns_voip"}
d := New(&targetFilterDeviceStore{devices: twoIPhoneDevicesUser()}, zap.NewNop())
d.Register(alert)
d.Register(voip)
_, err := d.SendToUserDetailed(context.Background(), "ns", "u1", PushMessage{
Title: "x",
TargetProvider: "apns", // positive: only apns
ExcludeProvider: "apns_voip", // negative: also exclude voip — redundant when target is set
})
if err != nil {
t.Fatalf("SendToUserDetailed: %v", err)
}
// Only the positive filter should have applied → alert called once.
if got := alert.tokens(); len(got) != 1 {
t.Errorf("alert attempts = %v; want 1 (TargetProvider should win when both set)", got)
}
if got := voip.tokens(); len(got) != 0 {
t.Errorf("voip should not have been called (target filter excludes it implicitly); got %v", got)
}
}
func TestDispatcher_ExcludeProvider_UnsetFansOut(t *testing.T) {
// Back-compat: every existing caller that doesn't set either filter
// must continue to see the full fan-out behavior.
alert := &recordingProvider{name: "apns"}
voip := &recordingProvider{name: "apns_voip"}
expo := &recordingProvider{name: "expo"}
d := New(&targetFilterDeviceStore{devices: threeDeviceUser()}, zap.NewNop())
for _, p := range []PushProvider{alert, voip, expo} {
d.Register(p)
}
res, err := d.SendToUserDetailed(context.Background(), "ns", "u1", PushMessage{
Title: "x",
// Neither TargetProvider nor ExcludeProvider set.
})
if err != nil {
t.Fatalf("SendToUserDetailed: %v", err)
}
if res.DevicesAttempted != 3 {
t.Errorf("DevicesAttempted = %d; want 3 (fan-out)", res.DevicesAttempted)
}
if len(alert.tokens()) != 1 || len(voip.tokens()) != 1 || len(expo.tokens()) != 1 {
t.Errorf("all three providers should have been attempted; got alert=%d voip=%d expo=%d",
len(alert.tokens()), len(voip.tokens()), len(expo.tokens()))
}
}
func TestDispatcher_ExcludeProvider_NoMatchingExclusion_NoOp(t *testing.T) {
// If the exclude target doesn't match any registered device,
// everyone is still attempted (back-compat fan-out).
alert := &recordingProvider{name: "apns"}
voip := &recordingProvider{name: "apns_voip"}
d := New(&targetFilterDeviceStore{devices: twoIPhoneDevicesUser()}, zap.NewNop())
d.Register(alert)
d.Register(voip)
res, err := d.SendToUserDetailed(context.Background(), "ns", "u1", PushMessage{
Title: "x",
ExcludeProvider: "ntfy", // user has no ntfy device — no-op exclusion
})
if err != nil {
t.Fatalf("SendToUserDetailed: %v", err)
}
if res.DevicesAttempted != 2 {
t.Errorf("DevicesAttempted = %d; want 2 (exclude matched nothing)", res.DevicesAttempted)
}
}

View File

@ -0,0 +1,236 @@
package push
import (
"context"
"sync"
"testing"
"go.uber.org/zap"
)
// Bugboard #408 — target_provider dispatcher filter.
//
// Pin the four behaviors that matter for the AnChat CallKit-on-text
// bug class:
//
// 1. With TargetProvider="apns" set, ONLY apns devices are attempted.
// VoIP-registered devices on the same iPhone are silently skipped
// so a chat message doesn't trigger CallKit.
//
// 2. With TargetProvider="apns_voip", ONLY VoIP devices are attempted —
// the alert device is skipped so an incoming-call signal doesn't
// produce a silent alert.
//
// 3. With TargetProvider unset (legacy callers, unmigrated functions),
// fan-out behavior is UNCHANGED — all devices attempted. This is
// the back-compat guarantee that lets us ship the filter without
// breaking every existing call site in every namespace.
//
// 4. DevicesAttempted in the SendDetailedResult reflects the
// POST-FILTER count, not the raw device-store count. WASM callers
// interpreting `attempted=0` as "no devices" need this to be the
// real attempted count, not "user has zero devices anywhere".
// targetFilterDeviceStore returns a fixed device list and records what was
// asked for. PushDeviceStore-conformant for use as Dispatcher dep.
type targetFilterDeviceStore struct {
devices []PushDevice
}
func (f *targetFilterDeviceStore) Upsert(ctx context.Context, dev PushDevice) error { return nil }
func (f *targetFilterDeviceStore) Delete(ctx context.Context, ns, id string) error { return nil }
func (f *targetFilterDeviceStore) ListForUser(ctx context.Context, ns, userID string) ([]PushDevice, error) {
return f.devices, nil
}
// recordingProvider implements PushProvider and just records which
// device tokens it was asked to send to. Lets the test assert exactly
// which devices reached which provider.
type recordingProvider struct {
name string
mu sync.Mutex
sent []string // device tokens received
}
func (r *recordingProvider) Name() string { return r.name }
func (r *recordingProvider) Send(ctx context.Context, msg PushMessage) error {
r.mu.Lock()
defer r.mu.Unlock()
r.sent = append(r.sent, msg.DeviceToken)
return nil
}
func (r *recordingProvider) tokens() []string {
r.mu.Lock()
defer r.mu.Unlock()
out := make([]string, len(r.sent))
copy(out, r.sent)
return out
}
// twoIPhoneDevicesUser returns the canonical AnChat scenario: one user
// with one iPhone registered TWICE — alert + voip — per the documented
// registration model.
func twoIPhoneDevicesUser() []PushDevice {
return []PushDevice{
{
DeviceID: "ios-base",
Provider: "apns",
Token: "ALERT-TOKEN",
},
{
DeviceID: "ios-base:voip",
Provider: "apns_voip",
Token: "VOIP-TOKEN",
},
}
}
func newTestDispatcher(t *testing.T, devs []PushDevice, providers ...PushProvider) *PushDispatcher {
t.Helper()
d := New(&targetFilterDeviceStore{devices: devs}, zap.NewNop())
for _, p := range providers {
d.Register(p)
}
return d
}
func TestDispatcher_TargetProvider_FiltersToApns(t *testing.T) {
alert := &recordingProvider{name: "apns"}
voip := &recordingProvider{name: "apns_voip"}
d := newTestDispatcher(t, twoIPhoneDevicesUser(), alert, voip)
res, err := d.SendToUserDetailed(context.Background(), "ns", "u1", PushMessage{
Title: "new message",
Body: "hi",
TargetProvider: "apns",
})
if err != nil {
t.Fatalf("SendToUserDetailed: %v", err)
}
// Alert got the message; VoIP did NOT — this is the CallKit-on-text
// bug guard. If voip.tokens() is non-empty here, message-push-handler
// would ring CallKit on every chat message AnChat users receive.
if got := alert.tokens(); len(got) != 1 || got[0] != "ALERT-TOKEN" {
t.Errorf("alert provider tokens = %v; want [ALERT-TOKEN]", got)
}
if got := voip.tokens(); len(got) != 0 {
t.Errorf("voip provider should NOT have been called (CallKit-on-text bug); got tokens=%v", got)
}
// DevicesAttempted reflects POST-filter count, not raw device count.
// WASM callers parse this to decide whether to retry / log "no
// devices" — must be the real attempt count.
if res.DevicesAttempted != 1 {
t.Errorf("DevicesAttempted = %d; want 1 (post-filter)", res.DevicesAttempted)
}
if res.DevicesSucceeded != 1 {
t.Errorf("DevicesSucceeded = %d; want 1", res.DevicesSucceeded)
}
if len(res.Results) != 1 {
t.Errorf("Results len = %d; want 1", len(res.Results))
}
}
func TestDispatcher_TargetProvider_FiltersToApnsVoip(t *testing.T) {
alert := &recordingProvider{name: "apns"}
voip := &recordingProvider{name: "apns_voip"}
d := newTestDispatcher(t, twoIPhoneDevicesUser(), alert, voip)
res, err := d.SendToUserDetailed(context.Background(), "ns", "u1", PushMessage{
Data: map[string]interface{}{"call_id": "c-1"},
TargetProvider: "apns_voip",
})
if err != nil {
t.Fatalf("SendToUserDetailed: %v", err)
}
if got := voip.tokens(); len(got) != 1 || got[0] != "VOIP-TOKEN" {
t.Errorf("voip provider tokens = %v; want [VOIP-TOKEN]", got)
}
if got := alert.tokens(); len(got) != 0 {
t.Errorf("alert provider should NOT have been called (call-push targets voip only); got tokens=%v", got)
}
if res.DevicesAttempted != 1 {
t.Errorf("DevicesAttempted = %d; want 1", res.DevicesAttempted)
}
}
func TestDispatcher_TargetProvider_UnsetFansOut(t *testing.T) {
// Back-compat guarantee. Every existing function in every namespace
// that doesn't set target_provider must continue to see fan-out.
// If this regresses, every unmigrated push call site breaks.
alert := &recordingProvider{name: "apns"}
voip := &recordingProvider{name: "apns_voip"}
d := newTestDispatcher(t, twoIPhoneDevicesUser(), alert, voip)
res, err := d.SendToUserDetailed(context.Background(), "ns", "u1", PushMessage{
Title: "x",
// TargetProvider intentionally unset.
})
if err != nil {
t.Fatalf("SendToUserDetailed: %v", err)
}
if got := alert.tokens(); len(got) != 1 {
t.Errorf("fan-out: alert tokens = %v; want 1", got)
}
if got := voip.tokens(); len(got) != 1 {
t.Errorf("fan-out: voip tokens = %v; want 1", got)
}
if res.DevicesAttempted != 2 {
t.Errorf("DevicesAttempted = %d; want 2 (fan-out)", res.DevicesAttempted)
}
}
func TestDispatcher_TargetProvider_NoMatchingDevices_NoOp(t *testing.T) {
// User has only an alert device; call-push-handler asks for
// target_provider="apns_voip". Expected: no error, zero attempts,
// Ok=true (a user with no matching device is not an error — same
// semantics as "user has zero devices anywhere").
alert := &recordingProvider{name: "apns"}
voip := &recordingProvider{name: "apns_voip"}
d := newTestDispatcher(t, []PushDevice{
{DeviceID: "ios-only", Provider: "apns", Token: "T"},
}, alert, voip)
res, err := d.SendToUserDetailed(context.Background(), "ns", "u1", PushMessage{
TargetProvider: "apns_voip",
})
if err != nil {
t.Fatalf("expected no error for no-matching-devices; got %v", err)
}
if !res.Ok {
t.Errorf("Ok = false; want true (no matching devices is not a failure)")
}
if res.DevicesAttempted != 0 {
t.Errorf("DevicesAttempted = %d; want 0", res.DevicesAttempted)
}
if len(alert.tokens()) != 0 || len(voip.tokens()) != 0 {
t.Error("no provider should have been called")
}
}
func TestDispatcher_TargetProvider_LegacySendToUser_AlsoFilters(t *testing.T) {
// SendToUser delegates to SendToUserDetailed under the hood, so the
// filter should apply identically. Pin this so a future refactor
// can't split the two paths.
alert := &recordingProvider{name: "apns"}
voip := &recordingProvider{name: "apns_voip"}
d := newTestDispatcher(t, twoIPhoneDevicesUser(), alert, voip)
err := d.SendToUser(context.Background(), "ns", "u1", PushMessage{
Title: "x",
Body: "y",
TargetProvider: "apns",
})
if err != nil {
t.Fatalf("SendToUser: %v", err)
}
if len(alert.tokens()) != 1 {
t.Errorf("alert should have been called; got %v", alert.tokens())
}
if len(voip.tokens()) != 0 {
t.Errorf("voip should NOT have been called via SendToUser+target_provider; got %v", voip.tokens())
}
}

View File

@ -170,6 +170,17 @@ func (m *Manager) SendToUser(ctx context.Context, namespace, userID string, msg
return d.SendToUser(ctx, namespace, userID, msg)
}
// SendToUserDetailed mirrors SendToUser but returns the per-device
// outcome shape. Used by the WASM `oh.PushSendV2` host fn so callers
// can react to per-device failures (bugboard #348).
func (m *Manager) SendToUserDetailed(ctx context.Context, namespace, userID string, msg PushMessage) (*SendDetailedResult, error) {
d, err := m.dispatcherFor(ctx, namespace)
if err != nil {
return nil, err
}
return d.SendToUserDetailed(ctx, namespace, userID, msg)
}
// DeviceStore exposes the underlying device store so HTTP handlers
// (register/list/delete) can use it directly without going through the
// dispatcher path.
@ -285,7 +296,17 @@ func (m *Manager) buildDispatcher(ctx context.Context, namespace string) (*PushD
// (DELETE) — there's no "set this field to empty to clear"
// half-state, by design.
if nc.NtfyBaseURL != "" {
eff.NtfyBaseURL = nc.NtfyBaseURL
// Defense-in-depth: a base URL stored before the SSRF guard
// existed (or via any path that skipped it) must not point at an
// internal/reserved literal IP. Drop the override and fall back
// to the gateway default if it does. Literal-only (no DNS, no
// syntax re-validation) so this stays safe on the hot build path.
if IsInternalBaseURL(nc.NtfyBaseURL) {
m.logger.Warn("push: ignoring namespace ntfy_base_url override (internal address)",
zap.String("namespace", namespace), zap.String("base_url", nc.NtfyBaseURL))
} else {
eff.NtfyBaseURL = nc.NtfyBaseURL
}
}
if nc.NtfyAuthToken != "" {
eff.NtfyAuthToken = nc.NtfyAuthToken

View File

@ -21,10 +21,13 @@ import (
const defaultSendTimeout = 10 * time.Second
// Provider is the APNs push.PushProvider implementation, scoped to one
// (Team ID, Key ID, p8 key, Bundle ID, Environment) tuple. Construct
// one per namespace via the gateway dependency factory.
// (Team ID, Key ID, p8 key, Bundle ID, Environment, Kind) tuple.
// Construct one per (namespace, kind) via the gateway dependency
// factory — typically one KindAlert + one KindVoIP instance per
// namespace, both sharing the same JWT signer.
type Provider struct {
bundleID string
kind Kind
client pushClient
logger *zap.Logger
}
@ -45,10 +48,28 @@ type pushClient interface {
PushWithContext(ctx apns2.Context, notification *apns2.Notification) (*apns2.Response, error)
}
// New constructs a Provider from a parsed Config. Returns an error if
// the p8 key fails to parse — this surfaces config errors at gateway
// startup / first-send rather than at every Push call.
// New constructs a KindAlert Provider — the standard user-visible-alert
// APNs path. Back-compat constructor: callers that want VoIP/PushKit
// behavior should use NewVoIP. Returns an error if the p8 key fails to
// parse so config errors surface at gateway startup rather than at
// every Push call.
func New(c Config, logger *zap.Logger) (*Provider, error) {
return buildProvider(c, KindAlert, logger)
}
// NewVoIP constructs a KindVoIP Provider — the PushKit/CallKit path for
// incoming-call signals. Same credentials (Team ID, Key ID, p8 key,
// Bundle ID, Environment) as the alert Provider; the wire-format
// differences (topic = bundle_id+".voip", apns-push-type = "voip",
// empty-content payloads allowed) are handled in Send. Bugboard #408.
func NewVoIP(c Config, logger *zap.Logger) (*Provider, error) {
return buildProvider(c, KindVoIP, logger)
}
// buildProvider is the shared constructor for both kinds. The kind
// field gates Send's per-kind branching; everything else (JWT signer,
// HTTP/2 client, timeout) is identical.
func buildProvider(c Config, kind Kind, logger *zap.Logger) (*Provider, error) {
if logger == nil {
logger = zap.NewNop()
}
@ -79,39 +100,70 @@ func New(c Config, logger *zap.Logger) (*Provider, error) {
client.HTTPClient.Timeout = defaultSendTimeout
return &Provider{
bundleID: c.BundleID,
kind: kind,
client: client,
logger: logger.Named("apns"),
logger: logger.Named(providerNameForKind(kind)),
}, nil
}
// Name implements push.PushProvider.
func (p *Provider) Name() string { return "apns" }
// Name implements push.PushProvider. Returns "apns" for KindAlert and
// "apns_voip" for KindVoIP — these are the names the dispatcher routes
// devices against (device.Provider field) and the validProviders
// allowlist at the registration handler accepts.
func (p *Provider) Name() string { return providerNameForKind(p.kind) }
// ErrDeviceUnregistered is returned by Send when APNs responds with
// "Unregistered" (HTTP 410) — the token is no longer valid because the
// user uninstalled the app, disabled notifications, or upgraded device.
// Callers SHOULD delete the device row when they see this so the same
// dead token doesn't get retried forever.
//
// Kept as an exported sentinel for backwards compatibility — callers
// that want the structured shape should use errors.As(err, &push.PushError{})
// and check the Unregistered field.
var ErrDeviceUnregistered = errors.New("apns: device token unregistered (410); remove from device store")
// Send delivers one push to the APNs server. Constructs the APNs
// JSON payload from PushMessage, dispatches via the sideshow/apns2
// client, and maps response codes to errors.
//
// Returns nil on HTTP 200, *push.PushError on any HTTP response APNs
// gave us (status, reason, unregistered-flag baked in), or a plain
// wrapped error for transport/validation failures (no HTTP response).
//
// Bugboard #348 root-cause guard: rejects empty visible-content
// payloads up-front (no title, no body, no badge, no sound, no
// content-available) — Apple silently 200s those AND drops them
// without displaying, which previously looked like a successful
// delivery to the WASM caller. We surface the failure here so it
// doesn't look like success.
func (p *Provider) Send(ctx context.Context, msg push.PushMessage) error {
if msg.DeviceToken == "" {
return push.ErrEmptyToken
}
payload, err := buildAPSPayload(msg)
// VoIP/PushKit pushes legally have no visible alert content — iOS
// renders the CallKit UI from the `data` dict alone. Skipping the
// hasVisibleContent guard ONLY on the VoIP kind keeps the bugboard
// #348 silent-drop protection in place for the alert path while
// unblocking incoming-call signals on the VoIP path (#408).
if p.kind != KindVoIP && !hasVisibleContent(msg) {
return push.ErrEmptyContent
}
payload, err := buildAPSPayload(msg, p.kind)
if err != nil {
return fmt.Errorf("apns: build payload: %w", err)
}
n := &apns2.Notification{
DeviceToken: msg.DeviceToken,
Topic: p.bundleID,
Topic: p.topicForKind(),
Payload: payload,
PushType: p.pushTypeForKind(),
}
// Priority mapping: APNs uses 10 (immediate) / 5 (power-saving).
if msg.Priority == push.PriorityHigh {
// VoIP MUST use immediate (10) — Apple rejects "5" for voip pushes
// with `BadPriority`. We honor msg.Priority for alert; force high
// for voip regardless of what the caller passed.
if p.kind == KindVoIP || msg.Priority == push.PriorityHigh {
n.Priority = apns2.PriorityHigh
} else {
n.Priority = apns2.PriorityLow
@ -122,30 +174,132 @@ func (p *Provider) Send(ctx context.Context, msg push.PushMessage) error {
// goroutine leak.
resp, sendErr := p.client.PushWithContext(ctx, n)
if sendErr != nil {
// Transport-level failure (network, ctx cancel, etc.) — no
// HTTP response to dissect. Plain wrap so callers can still
// errors.Is against the underlying.
return fmt.Errorf("apns: push: %w", sendErr)
}
if resp == nil {
return fmt.Errorf("apns: nil response")
}
// Always log the APNs HTTP response so we have visibility into
// silent-drop classes (Apple 200 + no delivery, throttling, etc.).
// Bugboard #348 diagnostic — see investigation comment.
p.logger.Info("apns send response",
zap.Int("http_status", resp.StatusCode),
zap.String("reason", resp.Reason),
zap.String("apns_id", resp.ApnsID),
zap.String("device_token_prefix", tokenPrefix(msg.DeviceToken)),
)
switch resp.StatusCode {
case http.StatusOK:
return nil
case http.StatusGone:
// 410 Unregistered — surfaced as a sentinel so the dispatcher
// (or caller) can remove the device row.
return fmt.Errorf("%w: apns_id=%s reason=%s", ErrDeviceUnregistered, resp.ApnsID, resp.Reason)
// 410 Unregistered — both the sentinel sentinel wrap (for
// legacy errors.Is callers) AND a structured PushError (for
// the new SendToUserDetailed dispatcher path).
return &push.PushError{
HTTPStatus: http.StatusGone,
Reason: resp.Reason,
Message: fmt.Sprintf("apns: device token unregistered (410): apns_id=%s reason=%s", resp.ApnsID, resp.Reason),
Unregistered: true,
Wrapped: ErrDeviceUnregistered,
}
default:
return fmt.Errorf("apns: http %d: reason=%s apns_id=%s",
resp.StatusCode, resp.Reason, resp.ApnsID)
return &push.PushError{
HTTPStatus: resp.StatusCode,
Reason: resp.Reason,
Message: fmt.Sprintf("apns: http %d: reason=%s apns_id=%s", resp.StatusCode, resp.Reason, resp.ApnsID),
}
}
}
// buildAPSPayload assembles the APNs JSON payload from a generic
// PushMessage. The `aps` dictionary is the Apple-required wrapper;
// custom fields (`data`) go alongside at the top level.
// topicForKind returns the APNs `apns-topic` header value for this
// Provider's kind. PushKit / VoIP pushes MUST target the bundle ID
// suffixed with `.voip` — Apple routes those to the PushKit delivery
// path that wakes the app via CallKit. Alert pushes use the bare bundle.
func (p *Provider) topicForKind() string {
if p.kind == KindVoIP {
return p.bundleID + ".voip"
}
return p.bundleID
}
// pushTypeForKind returns the APNs `apns-push-type` header value.
// Required since iOS 13 — Apple rejects pushes lacking this header at
// the edge with `MissingTopic`/`InvalidPushType` errors.
func (p *Provider) pushTypeForKind() apns2.EPushType {
if p.kind == KindVoIP {
return apns2.PushTypeVOIP
}
return apns2.PushTypeAlert
}
// hasVisibleContent reports whether the message has any payload field
// that Apple will display or process. An APNs push with none of these
// is silently 200'd by Apple AND dropped — that's the bugboard #348
// root cause we want to surface as a structured error.
//
// `content_available: true` in Data signals a background-only push
// (legal even with empty alert) — we accept that as valid content.
func hasVisibleContent(msg push.PushMessage) bool {
if msg.Title != "" || msg.Body != "" {
return true
}
if msg.Badge > 0 {
return true
}
if msg.Sound != "" {
return true
}
if ca, ok := msg.Data["content_available"]; ok {
// Accept truthy variants: bool true, int/float != 0, "1"/"true".
switch v := ca.(type) {
case bool:
return v
case int:
return v != 0
case int64:
return v != 0
case float64:
return v != 0
case string:
return v == "1" || v == "true"
}
}
return false
}
// tokenPrefix returns the first 8 chars of a device token, safe for
// logging. The full token is sensitive — never log it whole.
func tokenPrefix(token string) string {
if len(token) <= 8 {
return token
}
return token[:8] + "..."
}
// buildAPSPayload assembles the APNs JSON payload from a generic PushMessage.
// The `aps` dictionary is the Apple-required wrapper; custom `Data` placement
// depends on the kind:
//
// - KindAlert: custom data is nested under a top-level "body" object.
// expo-notifications' iOS serializer sets content.data ONLY from
// userInfo["body"] for remote notifications (NotificationRecords.swift:
// `if isRemote { return userInfo["body"] }`) — top-level sibling keys of
// `aps` are IGNORED, so spreading them there yields content.data=null on
// iOS. This was bugboard #38 (Data never reached the JS client despite
// correct wire serialization). Note: "body" here is the data envelope
// expo expects; it is distinct from the human-readable alert body, which
// lives at aps.alert.body.
// - KindVoIP: custom data stays at the top level. PushKit/CallKit pushes are
// handled by the app's native pushRegistry (not expo-notifications), which
// reads payload.dictionaryPayload directly.
//
// Reference: https://developer.apple.com/documentation/usernotifications/setting_up_a_remote_notification_server/generating_a_remote_notification
func buildAPSPayload(msg push.PushMessage) ([]byte, error) {
func buildAPSPayload(msg push.PushMessage, kind Kind) ([]byte, error) {
alert := map[string]string{}
if msg.Title != "" {
alert["title"] = msg.Title
@ -168,13 +322,57 @@ func buildAPSPayload(msg push.PushMessage) ([]byte, error) {
// the lock-screen view. Channel is the most natural mapping.
aps["thread-id"] = msg.Channel
}
// content-available: 1 signals a background-only push to iOS. The
// caller opts in via Data["content_available"] (any truthy value).
// Mapped here at the aps boundary so the WASM Data shape stays
// snake_case while Apple's wire format uses the canonical key.
if ca, ok := msg.Data["content_available"]; ok {
switch v := ca.(type) {
case bool:
if v {
aps["content-available"] = 1
}
case int:
if v != 0 {
aps["content-available"] = 1
}
case int64:
if v != 0 {
aps["content-available"] = 1
}
case float64:
if v != 0 {
aps["content-available"] = 1
}
case string:
if v == "1" || v == "true" {
aps["content-available"] = 1
}
}
}
root := map[string]interface{}{"aps": aps}
// Collect tenant custom data, excluding reserved keys: `aps` (must not be
// clobbered) and `content_available` (already mapped into aps above).
data := map[string]interface{}{}
for k, v := range msg.Data {
// Don't allow tenant data to clobber `aps`.
if k == "aps" {
if k == "aps" || k == "content_available" {
continue
}
root[k] = v
data[k] = v
}
if len(data) > 0 {
if kind == KindVoIP {
// Native PushKit reads the dictionary payload directly — top-level.
for k, v := range data {
root[k] = v
}
} else {
// expo-notifications surfaces content.data from userInfo["body"]
// only (bugboard #38) — nest the data envelope there.
root["body"] = data
}
}
return json.Marshal(root)
}

View File

@ -11,6 +11,7 @@ import (
"github.com/DeBrosOfficial/network/pkg/push"
"github.com/sideshow/apns2"
"go.uber.org/zap"
)
// fakePushClient implements pushClient for unit tests so we don't have
@ -37,13 +38,23 @@ func (f *fakePushClient) PushWithContext(ctx apns2.Context, n *apns2.Notificatio
return f.resp, f.err
}
// newTestProvider constructs a Provider with a stub pushClient,
// bypassing real APNs.
// newTestProvider constructs an alert-kind Provider with a stub
// pushClient, bypassing real APNs. Existing call sites get the same
// behavior as pre-#408 — no need to thread a Kind through every test.
func newTestProvider(t *testing.T, bundle string, fake *fakePushClient) *Provider {
t.Helper()
return newTestProviderKind(t, bundle, KindAlert, fake)
}
// newTestProviderKind constructs a Provider of the given kind for
// VoIP-path coverage. Bugboard #408.
func newTestProviderKind(t *testing.T, bundle string, kind Kind, fake *fakePushClient) *Provider {
t.Helper()
return &Provider{
bundleID: bundle,
kind: kind,
client: fake,
logger: zap.NewNop(),
}
}
@ -144,7 +155,7 @@ func TestValidator_RedactNeverEchoesP8Key(t *testing.T) {
func TestBuildAPSPayload_basicAlert(t *testing.T) {
msg := push.PushMessage{Title: "hi", Body: "from orama"}
raw, err := buildAPSPayload(msg)
raw, err := buildAPSPayload(msg, KindAlert)
if err != nil {
t.Fatalf("build: %v", err)
}
@ -163,23 +174,58 @@ func TestBuildAPSPayload_basicAlert(t *testing.T) {
}
}
func TestBuildAPSPayload_dataAlongsideAPS(t *testing.T) {
// Bugboard #38: for an ALERT push, custom data must be nested under a
// top-level "body" object — expo-notifications' iOS serializer reads
// content.data from userInfo["body"] only, ignoring top-level sibling keys.
func TestBuildAPSPayload_alertNestsDataUnderBody(t *testing.T) {
msg := push.PushMessage{
Title: "x",
Body: "y",
Data: map[string]interface{}{"thread": "abc", "deeplink": "anchat://room/42"},
}
raw, _ := buildAPSPayload(msg)
raw, _ := buildAPSPayload(msg, KindAlert)
var out map[string]interface{}
_ = json.Unmarshal(raw, &out)
if err := json.Unmarshal(raw, &out); err != nil {
t.Fatalf("payload not valid JSON: %v", err)
}
if _, hasAPS := out["aps"]; !hasAPS {
t.Error("payload missing aps")
}
if out["thread"] != "abc" {
t.Errorf("data.thread missing; got %v", out)
// Must NOT be at the top level (expo would ignore it there).
if _, leaked := out["thread"]; leaked {
t.Errorf("data leaked to top level; expo-notifications would drop it: %v", out)
}
if out["deeplink"] != "anchat://room/42" {
t.Errorf("data.deeplink missing; got %v", out)
body, ok := out["body"].(map[string]interface{})
if !ok {
t.Fatalf("alert data not nested under top-level \"body\" object; got %v", out)
}
if body["thread"] != "abc" || body["deeplink"] != "anchat://room/42" {
t.Errorf("body envelope missing data; got %v", body)
}
// The human-readable alert body stays under aps.alert.body, distinct from
// the data envelope key.
aps := out["aps"].(map[string]interface{})
if alert, ok := aps["alert"].(map[string]interface{}); !ok || alert["body"] != "y" {
t.Errorf("aps.alert.body should be the human-readable body; got %v", aps["alert"])
}
}
// VoIP pushes are handled by native PushKit (not expo-notifications), so
// custom data stays at the top level of the dictionary payload.
func TestBuildAPSPayload_voipKeepsDataTopLevel(t *testing.T) {
msg := push.PushMessage{
Data: map[string]interface{}{"callId": "c-1", "callerName": "Alice"},
}
raw, _ := buildAPSPayload(msg, KindVoIP)
var out map[string]interface{}
if err := json.Unmarshal(raw, &out); err != nil {
t.Fatalf("payload not valid JSON: %v", err)
}
if out["callId"] != "c-1" || out["callerName"] != "Alice" {
t.Errorf("voip data must stay top-level for PushKit; got %v", out)
}
if _, nested := out["body"]; nested {
t.Errorf("voip data must NOT be nested under body; got %v", out)
}
}
@ -188,7 +234,7 @@ func TestBuildAPSPayload_dataCannotClobberAPS(t *testing.T) {
Title: "x",
Data: map[string]interface{}{"aps": "evil"},
}
raw, _ := buildAPSPayload(msg)
raw, _ := buildAPSPayload(msg, KindAlert)
var out map[string]interface{}
_ = json.Unmarshal(raw, &out)
apsField, ok := out["aps"]
@ -204,7 +250,7 @@ func TestBuildAPSPayload_badgeAndSound(t *testing.T) {
msg := push.PushMessage{
Title: "x", Badge: 3, Sound: "ding.caf",
}
raw, _ := buildAPSPayload(msg)
raw, _ := buildAPSPayload(msg, KindAlert)
if !strings.Contains(string(raw), `"badge":3`) {
t.Errorf("badge not in payload: %s", raw)
}
@ -215,7 +261,7 @@ func TestBuildAPSPayload_badgeAndSound(t *testing.T) {
func TestBuildAPSPayload_channelMapsToThreadID(t *testing.T) {
msg := push.PushMessage{Title: "x", Channel: "messages"}
raw, _ := buildAPSPayload(msg)
raw, _ := buildAPSPayload(msg, KindAlert)
if !strings.Contains(string(raw), `"thread-id":"messages"`) {
t.Errorf("channel not mapped to thread-id: %s", raw)
}
@ -370,3 +416,143 @@ func TestParseCredentials_RejectsBadConfig(t *testing.T) {
t.Error("expected error on bad config")
}
}
// ---- Bugboard #348 hardening: empty-content + structured PushError -------
// TestSend_EmptyContentRejected verifies the bugboard #348 root-cause
// guard: a message with no title, body, badge, sound, or
// content_available marker MUST fail upfront — not silently 200 from
// Apple and look like delivery success.
func TestSend_EmptyContentRejected(t *testing.T) {
p := newTestProvider(t, "com.example.app", &fakePushClient{})
err := p.Send(context.Background(), push.PushMessage{
DeviceToken: "ABCDEF1234",
// No Title, Body, Badge, Sound, or content_available in Data.
})
if !errors.Is(err, push.ErrEmptyContent) {
t.Errorf("expected push.ErrEmptyContent for empty payload; got %v", err)
}
}
// TestSend_ContentAvailableAccepted ensures background-only pushes
// (content_available without alert) ARE allowed — iOS uses this for
// silent data pushes that wake the app without UI. Bugboard #348:
// don't over-reject; only reject pushes that have NOTHING.
func TestSend_ContentAvailableAccepted(t *testing.T) {
fake := &fakePushClient{
resp: &apns2.Response{StatusCode: http.StatusOK, ApnsID: "ok-1"},
}
p := newTestProvider(t, "com.example.app", fake)
err := p.Send(context.Background(), push.PushMessage{
DeviceToken: "ABCDEF1234",
Data: map[string]interface{}{"content_available": true},
})
if err != nil {
t.Fatalf("content-available push should be allowed: %v", err)
}
if fake.lastSent == nil {
t.Fatal("Send didn't dispatch to client")
}
// Verify content-available landed in the aps dict.
var payload map[string]interface{}
if err := json.Unmarshal(fake.lastSent.Payload.([]byte), &payload); err != nil {
t.Fatalf("decode payload: %v", err)
}
aps, _ := payload["aps"].(map[string]interface{})
if aps["content-available"] != float64(1) {
t.Errorf("aps.content-available = %v; want 1", aps["content-available"])
}
}
// TestSend_Non200ReturnsPushError verifies non-200 responses return a
// structured *push.PushError with the HTTP status, reason, and (for
// 410) the Unregistered flag — so SendToUserDetailed can extract them
// for the WASM caller. Bugboard #348.
func TestSend_Non200ReturnsPushError(t *testing.T) {
cases := []struct {
name string
status int
reason string
wantUnregistered bool
}{
{"410_unregistered", http.StatusGone, "Unregistered", true},
{"400_bad_device_token", http.StatusBadRequest, "BadDeviceToken", false},
{"403_invalid_provider_token", http.StatusForbidden, "InvalidProviderToken", false},
{"500_internal_apple_error", http.StatusInternalServerError, "InternalServerError", false},
}
for _, tc := range cases {
t.Run(tc.name, func(t *testing.T) {
fake := &fakePushClient{
resp: &apns2.Response{StatusCode: tc.status, Reason: tc.reason, ApnsID: "x"},
}
p := newTestProvider(t, "com.example.app", fake)
err := p.Send(context.Background(), push.PushMessage{
DeviceToken: "tok",
Title: "x",
})
if err == nil {
t.Fatal("expected error for non-200 response")
}
var perr *push.PushError
if !errors.As(err, &perr) {
t.Fatalf("expected *push.PushError; got %T: %v", err, err)
}
if perr.HTTPStatus != tc.status {
t.Errorf("HTTPStatus = %d; want %d", perr.HTTPStatus, tc.status)
}
if perr.Reason != tc.reason {
t.Errorf("Reason = %q; want %q", perr.Reason, tc.reason)
}
if perr.Unregistered != tc.wantUnregistered {
t.Errorf("Unregistered = %v; want %v", perr.Unregistered, tc.wantUnregistered)
}
})
}
}
// TestSend_410StillCompatibleWithLegacySentinel ensures the structured
// PushError for 410 ALSO satisfies errors.Is(ErrDeviceUnregistered) so
// existing callers using the sentinel keep working.
func TestSend_410StillCompatibleWithLegacySentinel(t *testing.T) {
fake := &fakePushClient{
resp: &apns2.Response{StatusCode: http.StatusGone, Reason: "Unregistered", ApnsID: "x"},
}
p := newTestProvider(t, "com.example.app", fake)
err := p.Send(context.Background(), push.PushMessage{
DeviceToken: "tok",
Title: "x",
})
if !errors.Is(err, ErrDeviceUnregistered) {
t.Errorf("expected errors.Is(err, ErrDeviceUnregistered) to be true; got %v", err)
}
}
// TestHasVisibleContent exercises every accepted shape so the guard
// matches the WASM caller's mental model.
func TestHasVisibleContent(t *testing.T) {
cases := []struct {
name string
msg push.PushMessage
want bool
}{
{"empty", push.PushMessage{}, false},
{"title only", push.PushMessage{Title: "hi"}, true},
{"body only", push.PushMessage{Body: "hi"}, true},
{"badge only", push.PushMessage{Badge: 1}, true},
{"sound only", push.PushMessage{Sound: "ping.aiff"}, true},
{"content_available bool true", push.PushMessage{Data: map[string]interface{}{"content_available": true}}, true},
{"content_available bool false", push.PushMessage{Data: map[string]interface{}{"content_available": false}}, false},
{"content_available int 1", push.PushMessage{Data: map[string]interface{}{"content_available": 1}}, true},
{"content_available string 1", push.PushMessage{Data: map[string]interface{}{"content_available": "1"}}, true},
{"content_available string true", push.PushMessage{Data: map[string]interface{}{"content_available": "true"}}, true},
{"data without content_available", push.PushMessage{Data: map[string]interface{}{"other_key": "value"}}, false},
{"title and badge", push.PushMessage{Title: "x", Badge: 5}, true},
}
for _, tc := range cases {
t.Run(tc.name, func(t *testing.T) {
if got := hasVisibleContent(tc.msg); got != tc.want {
t.Errorf("hasVisibleContent(%+v) = %v; want %v", tc.msg, got, tc.want)
}
})
}
}

Some files were not shown because too many files have changed in this diff Show More