1196 Commits

Author SHA1 Message Date
anonpenguin23
e7ed718965 fix(namespace): don't silently disable TURN on unresolvable WebRTC secret (#130)
A freshly-joined namespace node could come up with TURN silently disabled
(turn.credentials -> namespace_not_configured) when GetWebRTCConfig errored:
the stored TURN shared secret was encrypted with a pre-rotation
cluster-secret-derived key and failed to decrypt, and the converge swallowed
that error into "WebRTC disabled", writing a TURN-disabled gateway config.

Distinguish "resolved but not enabled" (genuinely disabled, fine to write the
empty block) from "unresolved" (DB/decrypt error). chooseRestoreWebRTC's
dbFetch callback now returns a `resolved` bool; an unresolved lookup forces
enabled=false AND sets restoreWebRTC.unresolved. The converge then:
  - logs the decrypt/read error loudly with the exact remediation
    (`orama namespace disable webrtc` then `enable webrtc`) instead of
    swallowing it;
  - on the warm path, SKIPS ReconcileGateway so it doesn't rewrite a running
    gateway's working WebRTC block to empty (preserves TURN);
  - on the cold path, still spawns the gateway (the namespace needs one) but
    warns loudly that TURN is degraded until the secret is regenerated.

Healthy nodes are unaffected: a node whose state file holds the secret
short-circuits before dbFetch, so a flaky/rotated DB cannot disable it.
Dual-reviewed (code-quality APPROVED, security SECURE — no secret material is
logged; operator disable still resolves to disabled, not unresolved).

Pure-function coverage in restore_webrtc_test.go: unresolved marking,
resolved-empty-is-disabled, and state-secret-wins-over-db-error.
2026-06-12 16:44:25 +03:00
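The resolved/unresolved distinction described in e7ed718965 can be sketched as a small pure function. This is a minimal illustration, not the repository's code: the `restoreWebRTC` fields other than `unresolved`, and the exact `dbFetch` signature, are assumptions made for the example.

```go
package main

import "fmt"

// restoreWebRTC is a hypothetical result shape; only the "unresolved"
// field name is taken from the commit message.
type restoreWebRTC struct {
	enabled    bool
	secret     string
	unresolved bool
}

// chooseRestoreWebRTC prefers the secret from the node's state file and only
// consults the DB when the state file has none. dbFetch reports whether the
// lookup resolved at all, separately from whether a secret exists.
func chooseRestoreWebRTC(stateSecret string, dbFetch func() (secret string, resolved bool)) restoreWebRTC {
	if stateSecret != "" {
		// Short-circuit: a flaky or rotated DB cannot disable a healthy node.
		return restoreWebRTC{enabled: true, secret: stateSecret}
	}
	secret, resolved := dbFetch()
	if !resolved {
		// DB or decrypt error: not enabled, and flagged so the caller can
		// avoid overwriting a working gateway config.
		return restoreWebRTC{unresolved: true}
	}
	if secret == "" {
		// Resolved but not enabled: genuinely disabled.
		return restoreWebRTC{}
	}
	return restoreWebRTC{enabled: true, secret: secret}
}

func main() {
	dbError := func() (string, bool) { return "", false }
	dbEmpty := func() (string, bool) { return "", true }

	fmt.Printf("db error:     %+v\n", chooseRestoreWebRTC("", dbError))
	fmt.Printf("db empty:     %+v\n", chooseRestoreWebRTC("", dbEmpty))
	fmt.Printf("state secret: %+v\n", chooseRestoreWebRTC("state-secret", dbError))
}
```

The three calls in `main` mirror the three test cases the commit lists: unresolved marking, resolved-empty-is-disabled, and state-secret-wins-over-db-error.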
anonpenguin23
021d362b2f release: 0.122.50
v0.122.50-nightly
2026-06-12 10:14:00 +03:00
anonpenguin23
4d700aed54 feat(gateway): enforce jwt expiry on persistent websockets
- implement `wsJWTExpired` to validate token lifetime with a grace period
- capture jwt expiry at connection upgrade and update via auth.refresh
- close connections with custom code 4401 when tokens expire to force re-auth
- add unit tests to verify expiry logic and state transitions
2026-06-12 10:12:21 +03:00
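A grace-period expiry check like the one 4d700aed54 describes might look roughly as follows. Only the name `wsJWTExpired` and close code 4401 come from the commit; the signature, the 30-second `defaultGrace`, the `at` helper, and the handling of a zero expiry are assumptions for illustration.

```go
package main

import (
	"fmt"
	"time"
)

// closeCodeJWTExpired is the custom WebSocket close code named in the commit.
const closeCodeJWTExpired = 4401

// defaultGrace is a hypothetical grace period; the commit does not state one.
const defaultGrace = 30 * time.Second

// at builds a timestamp from Unix seconds to keep the examples short.
func at(sec int64) time.Time { return time.Unix(sec, 0) }

// wsJWTExpired reports whether a token expiring at exp should be treated as
// expired at now, after allowing grace. A zero exp is treated as "no expiry
// captured" and never expires (an assumption of this sketch).
func wsJWTExpired(exp, now time.Time, grace time.Duration) bool {
	if exp.IsZero() {
		return false
	}
	return now.After(exp.Add(grace))
}

func main() {
	exp := at(1000)
	for _, now := range []time.Time{at(990), at(1020), at(1031)} {
		if wsJWTExpired(exp, now, defaultGrace) {
			fmt.Printf("t=%d: close with code %d\n", now.Unix(), closeCodeJWTExpired)
		} else {
			fmt.Printf("t=%d: keep connection open\n", now.Unix())
		}
	}
}
```

In this sketch an `auth.refresh` would simply replace the stored `exp` for the connection, so the same check keeps working after a refresh.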
anonpenguin23
d113b75497 feat(auth): refresh-token custom claims hook (#548)
Custom JWT claims survive token refresh: migration 031 adds the
custom-claims column to refresh tokens, the new gateway ClaimsProvider
re-resolves claims on refresh, and the serverless invoke path carries
them through. Includes refresh-rotation, WS-JWT middleware, and
claims-provider test coverage.
v0.122.49-nightly
2026-06-12 08:05:27 +03:00
anonpenguin23
8472861ed3 merge main into nightly — keep nightly (staging) on all conflicts
Reconciles the divergence created by the May-13 nightly history rewrite
(main absorbed pre-rewrite SHAs via PRs #90-92). Content conflicts all
resolve in nightly's favor; nightly is the deployed, verified branch
(v0.122.47 live on devnet).
2026-06-11 17:32:37 +03:00
anonpenguin23
cd8c717363 chore(version): bump to 0.122.47
- refactor(turn): extract decodeTURNConfig for testability
- feat(turn): add stealth domain fields to config
- fix(apns): nest custom data under "body" for expo-notifications compatibility
v0.122.47-nightly
2026-06-11 11:45:12 +03:00
anonpenguin23
f4c58db710 release: 0.122.46
v0.122.46-nightly
2026-06-11 10:06:19 +03:00
anonpenguin23
8375d92109 feat(namespace): reuse caddy wildcard certificate for stealth turns
- Implement `resolveStealthCert` to use existing `*.<baseDomain>` wildcard certificates instead of dynamic Caddyfile provisioning.
- Avoids EROFS errors caused by `ProtectSystem=strict` on the orama-node service.
- Add strict validation to ensure stealth hosts are single-label subdomains covered by the wildcard.
2026-06-11 10:04:45 +03:00
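The "single-label subdomain covered by the wildcard" rule from 8375d92109 can be illustrated with a small check. This is a sketch under assumptions: `coveredByWildcard` is a hypothetical helper name, and the real validation in `resolveStealthCert` may apply further rules.

```go
package main

import (
	"fmt"
	"strings"
)

// coveredByWildcard reports whether host is exactly one label below
// baseDomain, which is what a "*.<baseDomain>" certificate covers.
func coveredByWildcard(host, baseDomain string) bool {
	host = strings.ToLower(strings.TrimSuffix(host, "."))
	base := strings.ToLower(strings.TrimSuffix(baseDomain, "."))
	if base == "" {
		return false
	}
	suffix := "." + base
	if !strings.HasSuffix(host, suffix) {
		return false
	}
	// Whatever precedes ".<baseDomain>" must be a single, literal label.
	label := strings.TrimSuffix(host, suffix)
	return label != "" && !strings.ContainsAny(label, ".*")
}

func main() {
	for _, h := range []string{
		"cdn.example.com",     // one label below the base domain
		"a.b.example.com",     // two labels: not covered by a wildcard cert
		"example.com",         // the apex itself is not covered
		"cdn.evil-example.com", // different domain
	} {
		fmt.Printf("%-22s covered=%v\n", h, coveredByWildcard(h, "example.com"))
	}
}
```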
anonpenguin23
37daf28b5a release: 0.122.45
v0.122.45-nightly
2026-06-11 08:00:31 +03:00
anonpenguin23
b425f80efb fix(config): add sni_router to root Config — prevents feat-124 boot crash
b9d5f54 (stealth TURN discovery) emits a top-level `sni_router:` block
into node.yaml unconditionally, but only added a lenient ad-hoc parse
in the carry-forward logic — not the field on config.Config that
orama-node strict-decodes (KnownFields(true)) at boot. Identical
failure mode to the v0.122.42 secrets_encryption_key incident: the
unknown key fails the whole node.yaml parse and orama-node crash-loops.

Caught pre-deploy this time by the strict-decode gate check; devnet
never saw it. Regression test added alongside the v0.122.42 one in
decode_test.go.
2026-06-11 08:00:31 +03:00
anonpenguin23
b9d5f542e1 feat(gateway): implement stealth TURN discovery and configuration
- Add `turn_stealth_domain` to gateway config for stealth TURN support
- Introduce `turn_discovery` in `sni-router` to auto-discover per-namespace routes
- Add database migration to enable stealth TURN per namespace
- Document ephemeral state API in `SERVERLESS.md`
2026-06-11 07:04:50 +03:00
anonpenguin23
f192cd0b84 release: 0.122.44
v0.122.44-nightly
2026-06-10 12:13:25 +03:00
anonpenguin23
ff3e273da8 feat(gateway): implement persistent secrets and webrtc configuration
- add `secrets_encryption_key` to gateway config for serverless secrets
- implement durable TURN secret persistence to prevent config regen outages
- add regression test for gateway config loading and field mapping
2026-06-10 12:10:52 +03:00
anonpenguin23
4c631243b3 release: 0.122.43
v0.122.43-nightly
2026-06-09 15:57:33 +03:00
anonpenguin23
e685c864fc fix(config): add secrets_encryption_key to HTTPGatewayConfig — fixes orama-node boot crash
v0.122.42 (f412425, secrets encryption) shipped the template emission,
the per-cluster secret generator, and the gateway.Config consumer — but
NOT the parse field on config.HTTPGatewayConfig. Phase 4 writes
`secrets_encryption_key` into node.yaml under the http_gateway section,
and pkg/config/yaml.go decodes with KnownFields(true) (strict). The
unknown field made every node.yaml parse fail, so orama-node exited 1
on every start and systemd crash-looped it (restart counter hit 380+ on
the first upgraded devnet node before the rolling controller halted).

Root cause: a generated-config field with no matching struct field under
strict unmarshal. Fix is the missing field. The runtime key itself is
still consumed from ~/.orama/secrets/secrets-encryption-key (pkg/node/
gateway.go), which already worked — so this one-field addition fully
restores boot AND the feature.

The standalone gateway (cmd/gateway/config.go) uses lenient parsing and
was unaffected.

Regression test in pkg/config/decode_test.go decodes a node.yaml
carrying secrets_encryption_key under strict mode.
2026-06-09 15:57:32 +03:00
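The failure mode in e685c864fc (a generated config key with no matching struct field under strict decoding) can be reproduced in miniature. The repository uses YAML with `KnownFields(true)`; to stay within the standard library this sketch swaps in `encoding/json` with `DisallowUnknownFields`, which behaves analogously. The struct names and the `listen_addr` field are hypothetical.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// gatewayBefore lacks the field the config generator already emits.
type gatewayBefore struct {
	ListenAddr string `json:"listen_addr"`
}

// gatewayAfter adds the missing parse field, which is the whole fix.
type gatewayAfter struct {
	ListenAddr           string `json:"listen_addr"`
	SecretsEncryptionKey string `json:"secrets_encryption_key"`
}

// strictDecode rejects any key that has no matching struct field
// (the JSON analogue of a strict YAML decode).
func strictDecode(data []byte, out any) error {
	dec := json.NewDecoder(bytes.NewReader(data))
	dec.DisallowUnknownFields()
	return dec.Decode(out)
}

func main() {
	generated := []byte(`{"listen_addr": ":6001", "secrets_encryption_key": "k1"}`)

	var before gatewayBefore
	fmt.Println("before fix:", strictDecode(generated, &before))

	var after gatewayAfter
	fmt.Println("after fix: ", strictDecode(generated, &after), after.SecretsEncryptionKey)
}
```

A regression test in this style decodes a generated document under strict mode, so a template that emits a new key without a matching field fails in CI rather than at boot.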
anonpenguin23
b6b518e005 release: 0.122.42
v0.122.42-nightly
2026-06-09 13:01:38 +03:00
anonpenguin23
f41242538e feat(serverless): add raw http response mode and secrets encryption
- Add `raw_http_response` configuration to functions to allow verbatim HTTP responses
- Implement cluster-wide secrets encryption key generation and distribution for serverless functions
- Update documentation with UnifiedPush support for ntfy on Android/GrapheneOS
2026-06-09 13:01:02 +03:00
anonpenguin23
aa04ab5f50 release: 0.122.41
v0.122.41-nightly
2026-06-09 09:24:59 +03:00
anonpenguin23
f8de4af704 feat(sni-router): implement hot-reloading for route configuration
- Add `FileRouteReloader` to watch and atomically update routes from disk
- Refactor `main` to support seamless configuration updates without restarts
- Ensure existing routes are preserved if a reload encounters an error
2026-06-09 09:23:54 +03:00
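The "atomically update, keep existing routes on error" behaviour in f8de4af704 can be sketched with an atomic pointer swap. This is an illustration only: the `reloader` type, the two-column route format, and `parseRoutes` are hypothetical, and file watching is left out so the example takes the file content as a string.

```go
package main

import (
	"fmt"
	"strings"
	"sync/atomic"
)

// routeTable maps an SNI host name to a backend address.
type routeTable map[string]string

// reloader holds the live table behind an atomic pointer so readers never
// observe a half-updated map.
type reloader struct {
	current atomic.Pointer[routeTable]
}

// parseRoutes reads "host backend" pairs, one per line (hypothetical format).
func parseRoutes(text string) (routeTable, error) {
	routes := routeTable{}
	for _, line := range strings.Split(text, "\n") {
		line = strings.TrimSpace(line)
		if line == "" {
			continue
		}
		fields := strings.Fields(line)
		if len(fields) != 2 {
			return nil, fmt.Errorf("malformed route line %q", line)
		}
		routes[fields[0]] = fields[1]
	}
	return routes, nil
}

// reload swaps in a new table only if the whole input parses; on error the
// previous table keeps serving.
func (r *reloader) reload(text string) error {
	routes, err := parseRoutes(text)
	if err != nil {
		return err
	}
	r.current.Store(&routes)
	return nil
}

// lookup resolves a host against whichever table is currently live.
func (r *reloader) lookup(host string) (string, bool) {
	table := r.current.Load()
	if table == nil {
		return "", false
	}
	backend, ok := (*table)[host]
	return backend, ok
}

func main() {
	var r reloader
	fmt.Println("first load:", r.reload("turn.example.com 10.0.0.5:3478"))
	fmt.Println("bad reload:", r.reload("this line has too many fields"))
	backend, ok := r.lookup("turn.example.com")
	fmt.Println("still routed:", backend, ok)
}
```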
anonpenguin23
32f7b3824e release: 0.122.40
v0.122.40-nightly
2026-06-04 10:08:59 +03:00
anonpenguin23
eade6e1742 feat(pubsub): remove mesh formation wait and add publish rate limiting
- Remove the 2-second polling wait for gossipsub mesh formation in `Publish`
  to eliminate unnecessary latency, relying on `FloodPublish` for delivery.
- Introduce a per-invocation publish budget (1000 messages) to prevent
  potential flooding of the shared gossipsub router by WASM functions.
- Add regression tests to ensure `Publish` remains non-blocking and that
  the publish budget is strictly enforced.
2026-06-04 10:08:10 +03:00
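A per-invocation publish budget such as the one in eade6e1742 can be as small as a counter created at the start of each invocation. The 1000-message limit comes from the commit; the type and function names below are assumptions for the sketch.

```go
package main

import (
	"errors"
	"fmt"
	"sync/atomic"
)

// maxPublishesPerInvocation is the budget stated in the commit message.
const maxPublishesPerInvocation = 1000

var errPublishBudgetExceeded = errors.New("publish budget exceeded for this invocation")

// publishBudget is created once per invocation and shared by every publish
// call made during it. The atomic counter keeps it safe if host functions
// are called from more than one goroutine.
type publishBudget struct {
	remaining atomic.Int64
}

func newPublishBudget(limit int64) *publishBudget {
	b := &publishBudget{}
	b.remaining.Store(limit)
	return b
}

// spend consumes one publish, or returns an error once the budget is gone.
func (b *publishBudget) spend() error {
	if b.remaining.Add(-1) < 0 {
		return errPublishBudgetExceeded
	}
	return nil
}

func main() {
	budget := newPublishBudget(maxPublishesPerInvocation)
	accepted := 0
	for i := 0; i < maxPublishesPerInvocation+5; i++ {
		if err := budget.spend(); err != nil {
			fmt.Println("rejected:", err)
			break
		}
		accepted++
	}
	fmt.Println("accepted publishes:", accepted)
}
```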
anonpenguin23
f3875d5157 release: 0.122.39
v0.122.39-nightly
2026-06-02 15:06:48 +03:00
anonpenguin23
9373c2ad92 feat(rqlite,serverless): add local read consistency and async invocation
- Introduce `BatchQueryConsistency` with `ReadConsistencyNone` to allow
  local SQLite reads, bypassing leader round-trips for performance.
- Add `function_invoke_async` host function to support non-blocking
  fire-and-forget function execution.
2026-06-01 19:59:30 +03:00
anonpenguin23
b2a3bff88c release: 0.122.38
v0.122.38-nightly
2026-06-01 10:13:15 +03:00
anonpenguin23
ca4ccbfcd4 feat(gateway): decouple turn credentials and sfu route registration
- split webrtc route gating into `webrtcServeTURNCredentials` and `webrtcServeSFURoutes` to allow non-SFU gateways to mint TURN credentials
- update `chooseRestoreWebRTC` to correctly resolve configurations for nodes without local SFU ports
- add unit tests to verify independent route registration logic (bugboard #25)
2026-06-01 10:12:07 +03:00
anonpenguin23
a3cf8384e9 release: 0.122.37
v0.122.37-nightly
2026-05-30 19:27:25 +03:00
anonpenguin23
bf0d5f9f9f feat(namespace): implement warm reconciliation for gateway webrtc config
- Add logic to reconcile gateway configuration drift for running instances
- Prevent unnecessary restart loops by verifying on-disk config state
- Add unit tests to validate synchronization logic and prevent regressions
2026-05-30 19:26:26 +03:00
anonpenguin23
3987ad0cf3 release: 0.122.36
v0.122.36-nightly
2026-05-30 14:41:51 +03:00
anonpenguin23
4fc975216f feat(gateway): fix WebRTC config persistence and endpoint access
- Add internal WebRTC management endpoints to public path exemption list
- Implement DB fallback for WebRTC configuration during cluster restore
- Add unit tests to verify WebRTC config precedence and state self-healing
2026-05-30 14:39:39 +03:00
anonpenguin23
9bace7bbf4 release: 0.122.35
v0.122.35-nightly
2026-05-29 12:09:12 +03:00
anonpenguin23
325a2471c7 Changes
2026-05-29 11:46:20 +03:00
anonpenguin23
0d352d0b42 release: 0.122.34
v0.122.34-nightly
2026-05-28 09:55:42 +03:00
anonpenguin23
cfff08d91e feat(serverless): add turn_credentials host function and slow invocation diagnostics
- Implement `turn_credentials` host function to provide TURN configuration to WASM modules.
- Add structured logging for slow serverless invocations exceeding 5s, providing per-phase timing (rate-limit, module-load, execution) to identify performance bottlenecks.
- Enhance WebSocket handler logging to capture request context when 30s timeouts occur.
2026-05-28 09:54:24 +03:00
anonpenguin23
8fbc4485c1 fix(serverless): enable system clocks for wasm modules
- opt into `WithSysWalltime` and `WithSysNanotime` to prevent wazero from using a frozen sentinel clock
- add regression tests to verify real-time clock behavior in wasm execution
- ensure serverless functions receive accurate timestamps for audit and cursor logic
2026-05-26 10:53:07 +03:00
anonpenguin23
1399b22676 release: 0.122.33
v0.122.33-nightly
2026-05-25 10:25:51 +03:00
anonpenguin23
1faf04e2a3 feat(cli): add function enable/disable and fix upgrade re-exec
- Add `enable` and `disable` commands to manage function status
- Implement process re-exec in the upgrade orchestrator to ensure
  Phase 4 config generation uses the newly-installed binary version
  (fixes bugboard #15)
2026-05-25 10:25:04 +03:00
anonpenguin23
bc2c25ff16 release: 0.122.32
v0.122.32-nightly
2026-05-25 09:35:16 +03:00
anonpenguin23
b2d35bbde1 feat(gateway): enable local wildcard triggers for pubsub
- wire PubSubDispatcher to host functions to support local wildcard
  triggers for WASM-published topics
- implement batch deduplication by topic to prevent redundant trigger
  invocations and bound fan-out
- propagate trigger depth through function invocations to maintain
  recursion limits during local dispatch
2026-05-25 09:34:01 +03:00
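The two guards mentioned in b2d35bbde1, batch deduplication by topic and trigger-depth propagation, could be sketched like this. Everything named here is hypothetical, including the depth limit of 5; the commit does not say which message of a duplicate topic is kept or what the actual limit is.

```go
package main

import "fmt"

// event is a hypothetical shape for a message published by a WASM function.
type event struct {
	Topic   string
	Payload string
}

// uniqueTopics returns each topic in the batch once, in first-seen order, so
// a batch of N messages on one topic fires its triggers once rather than N
// times.
func uniqueTopics(batch []event) []string {
	seen := make(map[string]struct{}, len(batch))
	var topics []string
	for _, e := range batch {
		if _, dup := seen[e.Topic]; dup {
			continue
		}
		seen[e.Topic] = struct{}{}
		topics = append(topics, e.Topic)
	}
	return topics
}

// maxTriggerDepth is an assumed recursion limit for trigger chains.
const maxTriggerDepth = 5

// nextDepth returns the depth to pass to a locally dispatched invocation, or
// false when the chain has reached the limit and must stop.
func nextDepth(depth int) (int, bool) {
	if depth >= maxTriggerDepth {
		return depth, false
	}
	return depth + 1, true
}

func main() {
	batch := []event{
		{Topic: "chat.room1", Payload: "a"},
		{Topic: "chat.room1", Payload: "b"},
		{Topic: "chat.room2", Payload: "c"},
	}
	fmt.Println("topics to dispatch:", uniqueTopics(batch))

	depth, ok := nextDepth(maxTriggerDepth)
	fmt.Println("at limit:", depth, ok)
}
```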
anonpenguin23
0463f37c0d release: 0.122.31
v0.122.31-nightly
2026-05-24 20:57:52 +03:00
anonpenguin23
98dad46a81 fix(gateway): decouple webrtc route registration from legacy flag
- Update route registration logic to rely solely on SFUPort > 0, resolving a silent 404 issue where gateways with valid SFU configurations were incorrectly disabled.
- Retain WebRTCEnabled in config for backward compatibility with existing operator YAML and request schemas.
- Add unit tests to pin registration behavior and prevent future regressions.
2026-05-24 20:56:08 +03:00
anonpenguin23
877563b86f release: 0.122.30
v0.122.30-nightly
2026-05-24 19:39:39 +03:00
anonpenguin23
62e4d1963b feat(gateway): add apns_voip provider support
- register "apns_voip" provider to handle PushKit/CallKit signals
- implement target provider filtering in dispatcher to prevent cross-talk
  between alert and VoIP push paths
- add comprehensive tests to ensure backward compatibility for fan-out
  and correct filtering behavior
2026-05-24 19:38:38 +03:00
anonpenguin23
57eb9f4f66 release: 0.122.29
v0.122.29-nightly
2026-05-23 12:49:53 +03:00
anonpenguin23
ccbcea0f3f fix(serverless): prevent invocation context race condition
- Attach InvocationContext to the execution context in Engine.Execute to
  ensure host functions resolve identity from the request context.
- Fixes a race condition where concurrent stateless invocations would
  overwrite the global singleton, causing cross-tenant leaks or nil
  namespace errors.
- Added a regression test to verify per-invocation isolation under load.
2026-05-23 12:48:45 +03:00
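The fix in ccbcea0f3f moves identity from a shared singleton onto the per-request context. A minimal sketch of that pattern with the standard `context` package follows; the `InvocationContext` fields and the helper names (`withInvocation`, `invocationFrom`, `hostGetNamespace`, `execute`) are assumptions, and `execute` uses `context.Background()` where a real engine would receive the request context.

```go
package main

import (
	"context"
	"fmt"
	"sync"
)

// InvocationContext carries per-call identity (fields are hypothetical).
type InvocationContext struct {
	Namespace    string
	CallerWallet string
}

// invCtxKey is an unexported key type so no other package can collide with it.
type invCtxKey struct{}

func withInvocation(ctx context.Context, inv InvocationContext) context.Context {
	return context.WithValue(ctx, invCtxKey{}, inv)
}

func invocationFrom(ctx context.Context) (InvocationContext, bool) {
	inv, ok := ctx.Value(invCtxKey{}).(InvocationContext)
	return inv, ok
}

// hostGetNamespace stands in for a host function: it resolves identity from
// the context it was called with, never from shared global state.
func hostGetNamespace(ctx context.Context) string {
	inv, ok := invocationFrom(ctx)
	if !ok {
		return ""
	}
	return inv.Namespace
}

// execute attaches the invocation to the execution context before running.
func execute(inv InvocationContext) string {
	ctx := withInvocation(context.Background(), inv)
	return hostGetNamespace(ctx)
}

func main() {
	results := make([]string, 8)
	var wg sync.WaitGroup
	for i := range results {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			// Each goroutine writes only its own slot, so no lock is needed.
			results[i] = execute(InvocationContext{Namespace: fmt.Sprintf("ns-%d", i)})
		}(i)
	}
	wg.Wait()
	fmt.Println(results)
}
```

Because each invocation reads from its own context value, concurrent calls cannot overwrite one another's namespace the way writes to a global would.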
anonpenguin23
370c48d575 release: 0.122.28
v0.122.28-nightly
2026-05-21 15:53:58 +03:00
anonpenguin23
e2bc9577ff feat(serverless): isolate invocation logs and enforce cron poll interval
- Fix log cross-contamination by introducing per-invocation LogBuffers
  (bugboard #108)
- Enforce a 100ms minimum for CronPollInterval to prevent scheduler
  starvation (bugboard #109)
- Add comprehensive validation tests for cron interval constraints
2026-05-21 15:52:46 +03:00
anonpenguin23
3b8139802c feat: APNs silent-drop guard + persistent-WS mid-session JWT refresh
#348 - APNs silent-drop guard
Apple's APNs silently returns HTTP 200 for pushes with no visible
content (no title, no body, no badge, no sound, no
content-available=1) and then drops them — which looked to the WASM
caller like a successful delivery. Now rejected up-front with the new
push.ErrEmptyContent sentinel, and the APNs provider returns the
structured push.PushError shape (HTTPStatus, Reason, Unregistered,
Wrapped) so the dispatcher can branch on Unregistered to remove dead
tokens automatically. Legacy ErrDeviceUnregistered sentinel is
preserved for errors.Is compatibility (wrapped inside PushError).

Always logs APNs HTTP response (status, reason, apns_id, token prefix)
so future silent-drop classes show up in operator logs.

content-available is also now correctly mapped from snake_case
Data["content_available"] (any truthy variant) into Apple's
canonical "content-available": 1 inside the aps dictionary.

#321 - mid-session JWT refresh on persistent WS
Long-lived persistent WS connections used to have to close+reconnect
when the JWT rolled — losing per-instance state, message queues, and
subscriptions. The handler now accepts an "auth.refresh" control
frame: client sends the new token, the gateway re-verifies it via
the new JWTVerifier interface, updates the per-instance invCtx
in-place (persistent.Instance.UpdateInvCtx), and acks. No close, no
state loss.

JWTVerifier is optional — handlers set it via SetJWTVerifier at
gateway init. When unwired the handler nacks with a "not supported
on this gateway" response and clients fall back to the old
close+reconnect path, so older deploys don't break.

Other:
- push/dispatcher.go: SendToUserDetailed returns per-device PushError
  shape so callers can act on Unregistered / HTTPStatus / Reason.
- serverless/hostfunctions/push.go: WASM host functions for the new
  detailed-error shape.
- serverless/persistent/instance.go: UpdateInvCtx mid-session.

Tests:
- ws_persistent_control_test.go: auth.refresh ack/nack paths.
- apns_test.go: empty-content rejection, PushError shape on 410 +
  generic non-200, content-available mapping.
- dispatcher_detailed_test.go: SendToUserDetailed result shape.
- instance_update_invctx_test.go: invCtx update is per-instance, not
  cross-tenant.

VERSION bumped to 0.122.27.
v0.122.27-nightly
2026-05-19 18:19:21 +03:00
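The error shape described in 3b8139802c (a structured `PushError` that still satisfies `errors.Is` against the legacy sentinel) can be sketched as follows. The four `PushError` field names and the two sentinel names come from the commit; the `notification` struct, `validate`, `classify`, and `isUnregistered` are hypothetical helpers for the example.

```go
package main

import (
	"errors"
	"fmt"
)

var (
	ErrEmptyContent       = errors.New("push: notification has no visible content")
	ErrDeviceUnregistered = errors.New("push: device unregistered")
)

// PushError carries the provider response in a form callers can branch on.
type PushError struct {
	HTTPStatus   int
	Reason       string
	Unregistered bool
	Wrapped      error
}

func (e *PushError) Error() string {
	return fmt.Sprintf("push failed: status=%d reason=%q", e.HTTPStatus, e.Reason)
}

// Unwrap keeps errors.Is(err, ErrDeviceUnregistered) working for old callers.
func (e *PushError) Unwrap() error { return e.Wrapped }

// notification is a hypothetical, simplified payload.
type notification struct {
	Title, Body, Sound string
	Badge              *int
	ContentAvailable   bool
}

// validate rejects pushes that would otherwise be accepted and then dropped.
func validate(n notification) error {
	if n.Title == "" && n.Body == "" && n.Sound == "" && n.Badge == nil && !n.ContentAvailable {
		return ErrEmptyContent
	}
	return nil
}

// classify maps a provider HTTP response to nil or a PushError.
func classify(status int, reason string) error {
	switch status {
	case 200:
		return nil
	case 410:
		return &PushError{HTTPStatus: status, Reason: reason, Unregistered: true, Wrapped: ErrDeviceUnregistered}
	default:
		return &PushError{HTTPStatus: status, Reason: reason}
	}
}

func isUnregistered(err error) bool { return errors.Is(err, ErrDeviceUnregistered) }

func main() {
	fmt.Println("empty push:", validate(notification{}))
	fmt.Println("silent push:", validate(notification{ContentAvailable: true}))

	err := classify(410, "Unregistered")
	var pe *PushError
	if errors.As(err, &pe) && pe.Unregistered {
		fmt.Println("remove dead token; legacy check still matches:", isUnregistered(err))
	}
}
```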
anonpenguin23
94f10c66c5 release: 0.122.26
v0.122.26-nightly
2026-05-18 10:42:50 +03:00
anonpenguin23
ebc9d51167 feat(gateway): implement pubsub dispatcher and batch query support
- Integrate PubSubDispatcher to enable libp2p subscription for trigger patterns
- Add BatchQuery to rqlite client to reduce round-trips for multi-query operations
- Implement lifecycle management for dispatcher and add safety limits for batch queries
2026-05-17 16:27:05 +03:00
anonpenguin23
17b06d38e4 fix(gateway,serverless): libp2p mesh peer-port + system-trigger auth bypass
Two serious bugs found via cross-node behavior observation:

1. libp2p peer-discovery published wrong port
   PeerDiscovery's multiaddr was using the gateway's HTTP API port (e.g.
   10004), not the actual libp2p TCP port. Remote gateways dialed that
   port, hit the HTTP server, received 400, and failed the libp2p
   multistream handshake ("message did not have trailing newline").
   Result: cluster-wide cross-node libp2p mesh had 0 connected peers
   and cross-node pubsub silently dropped 100% of messages.

   The libp2p port is OS-assigned at startup (client.go uses
   /ip4/0.0.0.0/tcp/0). It's not anywhere in cfg — it's only on
   host.Addrs(). Fix: drop the listenPort field from PeerDiscovery
   entirely and derive the port live from host.Addrs() via
   extractLibp2pTCPPort. WG IP still comes from getWireGuardIP
   (libp2p filters its own enumeration so WG IPs don't appear in
   host.Addrs(), but the listener is bound 0.0.0.0 so the port is
   reachable on the WG interface).

2. System triggers silently blocked by CanInvoke (#264)
   Cron, pubsub, database, timer, and job triggers all fire from
   gateway-internal state with no caller identity. Invoke() ran every
   request through CanInvoke(callerWallet) which returned false for
   the empty wallet — every fire returned ErrUnauthorized. Reported as
   a cron firing every minute with "unauthorized" for 19+ hours.

   Auth boundary for system triggers belongs at REGISTRATION time
   (POST /v1/functions/{name}/triggers, deploy-time auto-register
   from function.yaml). Skip the per-invocation check for system
   trigger types; user-driven triggers (HTTP, WebSocket) still gate
   on caller identity as before.

Tests:
- gateway/peer_discovery_test.go covers extractLibp2pTCPPort.
- serverless/invoke_system_trigger_test.go covers the bypass and the
  user-trigger gate.

VERSION bumped to 0.122.25.
v0.122.25-nightly
2026-05-16 15:43:18 +03:00
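Both fixes in 17b06d38e4 reduce to small pure functions. The sketch below is illustrative only: the real `extractLibp2pTCPPort` presumably works on libp2p multiaddr values from `host.Addrs()`, whereas this version parses their string form to avoid external dependencies, and the `triggerType` constants and `needsCallerCheck` are hypothetical names.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// extractLibp2pTCPPort returns the first TCP port found in a list of
// multiaddr strings such as "/ip4/0.0.0.0/tcp/38211".
func extractLibp2pTCPPort(addrs []string) (int, bool) {
	for _, addr := range addrs {
		parts := strings.Split(strings.Trim(addr, "/"), "/")
		for i := 0; i+1 < len(parts); i++ {
			if parts[i] != "tcp" {
				continue
			}
			port, err := strconv.Atoi(parts[i+1])
			if err == nil && port > 0 && port <= 65535 {
				return port, true
			}
		}
	}
	return 0, false
}

type triggerType string

const (
	triggerHTTP      triggerType = "http"
	triggerWebSocket triggerType = "websocket"
	triggerCron      triggerType = "cron"
	triggerPubSub    triggerType = "pubsub"
	triggerDatabase  triggerType = "database"
	triggerTimer     triggerType = "timer"
	triggerJob       triggerType = "job"
)

// needsCallerCheck reports whether an invocation must pass the per-call
// caller check. System triggers have no caller identity; they are authorised
// when the trigger is registered instead.
func needsCallerCheck(t triggerType) bool {
	switch t {
	case triggerCron, triggerPubSub, triggerDatabase, triggerTimer, triggerJob:
		return false
	}
	return true
}

func main() {
	addrs := []string{"/ip4/127.0.0.1/udp/4001/quic-v1", "/ip4/127.0.0.1/tcp/38211"}
	port, ok := extractLibp2pTCPPort(addrs)
	fmt.Println("libp2p tcp port:", port, ok)

	for _, t := range []triggerType{triggerCron, triggerHTTP} {
		fmt.Printf("%s needs caller check: %v\n", t, needsCallerCheck(t))
	}
}
```

Unknown trigger types fall through to `true` here, so a newly added type is gated on caller identity until it is explicitly listed as a system trigger.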