The HTTP client re-threw the raw platform error on a transport failure, so
callers (e.g. AnChat's JwtSession driving client.auth.refresh/challenge) could
only tell "couldn't reach the gateway" from a real HTTP error by string-
matching `TypeError: Network request failed`. The typed SDKError was built
only for the onNetworkError callback, never for the thrown error, and native
errors weren't retryable so they bubbled raw.
normalizeError() now maps a fetch failure -> SDKError{code:"NETWORK_ERROR",
httpStatus:0} and a timeout AbortError -> {code:"TIMEOUT",httpStatus:0}, and
that typed error is thrown to the caller. Genuine HTTP errors pass through
unchanged. httpStatus:0 is the stable "no HTTP response received" signal to
branch on; the original platform message is preserved for diagnostics.
Deliberately NOT auto-retrying network failures: a blind retry on a non-
idempotent POST like /v1/auth/refresh could burn the rotated refresh token on
a lost response. The SDK now just gives the app a typed error to drive its own
retry/failover.
Tests: tests/unit/http/network-errors-bug-129.test.ts (TypeError->NETWORK_ERROR,
AbortError->TIMEOUT, real 401 passes through, callback gets the typed error).
Full unit suite green, typecheck clean.
A freshly-joined namespace node could come up with TURN silently disabled
(turn.credentials -> namespace_not_configured) when GetWebRTCConfig errored:
the stored TURN shared secret was encrypted with a pre-rotation
cluster-secret-derived key and failed to decrypt, and the converge swallowed
that error into "WebRTC disabled", writing a TURN-disabled gateway config.
Distinguish "resolved but not enabled" (genuinely disabled, fine to write the
empty block) from "unresolved" (DB/decrypt error). chooseRestoreWebRTC's
dbFetch callback now returns a `resolved` bool; an unresolved lookup forces
enabled=false AND sets restoreWebRTC.unresolved. The converge then:
- logs the decrypt/read error loudly with the exact remediation
(`orama namespace disable webrtc` then `enable webrtc`) instead of
swallowing it;
- on the warm path, SKIPS ReconcileGateway so it doesn't rewrite a running
gateway's working WebRTC block to empty (preserves TURN);
- on the cold path, still spawns the gateway (the namespace needs one) but
warns loudly that TURN is degraded until the secret is regenerated.
Healthy nodes are unaffected: a node whose state file holds the secret
short-circuits before dbFetch, so a flaky/rotated DB cannot disable it.
Dual-reviewed (code-quality APPROVED, security SECURE — no secret material is
logged; operator disable still resolves to disabled, not unresolved).
Pure-function coverage in restore_webrtc_test.go: unresolved marking,
resolved-empty-is-disabled, and state-secret-wins-over-db-error.
- implement `wsJWTExpired` to validate token lifetime with a grace period
- capture jwt expiry at connection upgrade and update via auth.refresh
- close connections with custom code 4401 when tokens expire to force re-auth
- add unit tests to verify expiry logic and state transitions
Custom JWT claims survive token refresh: migration 031 adds the
custom-claims column to refresh tokens, the new gateway ClaimsProvider
re-resolves claims on refresh, and the serverless invoke path carries
them through. Includes refresh-rotation, WS-JWT middleware, and
claims-provider test coverage.
Reconciles the divergence created by the May-13 nightly history rewrite
(main absorbed pre-rewrite SHAs via PRs #90-92). Content conflicts all
resolve in nightly's favor; nightly is the deployed, verified branch
(v0.122.47 live on devnet).
- refactor(turn): extract decodeTURNConfig for testability
- feat(turn): add stealth domain fields to config
- fix(apns): nest custom data under "body" for expo-notifications compatibility
- Implement `resolveStealthCert` to use existing `*.<baseDomain>` wildcard certificates instead of dynamic Caddyfile provisioning.
- Avoids EROFS errors caused by `ProtectSystem=strict` on the orama-node service.
- Add strict validation to ensure stealth hosts are single-label subdomains covered by the wildcard.
b9d5f54 (stealth TURN discovery) emits a top-level `sni_router:` block
into node.yaml unconditionally, but only added a lenient ad-hoc parse
in the carry-forward logic — not the field on config.Config that
orama-node strict-decodes (KnownFields(true)) at boot. Identical
failure mode to the v0.122.42 secrets_encryption_key incident: the
unknown key fails the whole node.yaml parse and orama-node crash-loops.
Caught pre-deploy this time by the strict-decode gate check; devnet
never saw it. Regression test added alongside the v0.122.42 one in
decode_test.go.
- Add `turn_stealth_domain` to gateway config for stealth TURN support
- Introduce `turn_discovery` in `sni-router` to auto-discover per-namespace routes
- Add database migration to enable stealth TURN per namespace
- Document ephemeral state API in `SERVERLESS.md`
- add `secrets_encryption_key` to gateway config for serverless secrets
- implement durable TURN secret persistence to prevent config regen outages
- add regression test for gateway config loading and field mapping
v0.122.42 (f412425, secrets encryption) shipped the template emission,
the per-cluster secret generator, and the gateway.Config consumer — but
NOT the parse field on config.HTTPGatewayConfig. Phase 4 writes
`secrets_encryption_key` into node.yaml under the http_gateway section,
and pkg/config/yaml.go decodes with KnownFields(true) (strict). The
unknown field made every node.yaml parse fail, so orama-node exited 1
on every start and systemd crash-looped it (restart counter hit 380+ on
the first upgraded devnet node before the rolling controller halted).
Root cause: a generated-config field with no matching struct field under
strict unmarshal. Fix is the missing field. The runtime key itself is
still consumed from ~/.orama/secrets/secrets-encryption-key (pkg/node/
gateway.go), which already worked — so this one-field addition fully
restores boot AND the feature.
The standalone gateway (cmd/gateway/config.go) uses lenient parsing and
was unaffected.
Regression test in pkg/config/decode_test.go decodes a node.yaml
carrying secrets_encryption_key under strict mode.
- Add `raw_http_response` configuration to functions to allow verbatim HTTP responses
- Implement cluster-wide secrets encryption key generation and distribution for serverless functions
- Update documentation with UnifiedPush support for ntfy on Android/GrapheneOS
- Add `FileRouteReloader` to watch and atomically update routes from disk
- Refactor `main` to support seamless configuration updates without restarts
- Ensure existing routes are preserved if a reload encounters an error
- Remove the 2-second polling wait for gossipsub mesh formation in `Publish`
to eliminate unnecessary latency, relying on `FloodPublish` for delivery.
- Introduce a per-invocation publish budget (1000 messages) to prevent
potential flooding of the shared gossipsub router by WASM functions.
- Add regression tests to ensure `Publish` remains non-blocking and that
the publish budget is strictly enforced.
- Introduce `BatchQueryConsistency` with `ReadConsistencyNone` to allow
local SQLite reads, bypassing leader round-trips for performance.
- Add `function_invoke_async` host function to support non-blocking
fire-and-forget function execution.
- split webrtc route gating into `webrtcServeTURNCredentials` and `webrtcServeSFURoutes` to allow non-SFU gateways to mint TURN credentials
- update `chooseRestoreWebRTC` to correctly resolve configurations for nodes without local SFU ports
- add unit tests to verify independent route registration logic (bugboard #25)
- Add logic to reconcile gateway configuration drift for running instances
- Prevent unnecessary restart loops by verifying on-disk config state
- Add unit tests to validate synchronization logic and prevent regressions
- Add internal WebRTC management endpoints to public path exemption list
- Implement DB fallback for WebRTC configuration during cluster restore
- Add unit tests to verify WebRTC config precedence and state self-healing
- opt into `WithSysWalltime` and `WithSysNanotime` to prevent wazero from using a frozen sentinel clock
- add regression tests to verify real-time clock behavior in wasm execution
- ensure serverless functions receive accurate timestamps for audit and cursor logic
- Add `enable` and `disable` commands to manage function status
- Implement process re-exec in the upgrade orchestrator to ensure
Phase 4 config generation uses the newly-installed binary version
(fixes bugboard #15)
- wire PubSubDispatcher to host functions to support local wildcard
triggers for WASM-published topics
- implement batch deduplication by topic to prevent redundant trigger
invocations and bound fan-out
- propagate trigger depth through function invocations to maintain
recursion limits during local dispatch
- Update route registration logic to rely solely on SFUPort > 0, resolving a silent 404 issue where gateways with valid SFU configurations were incorrectly disabled.
- Retain WebRTCEnabled in config for backward compatibility with existing operator YAML and request schemas.
- Add unit tests to pin registration behavior and prevent future regressions.
- register "apns_voip" provider to handle PushKit/CallKit signals
- implement target provider filtering in dispatcher to prevent cross-talk
between alert and VoIP push paths
- add comprehensive tests to ensure backward compatibility for fan-out
and correct filtering behavior
- Attach InvocationContext to the execution context in Engine.Execute to
ensure host functions resolve identity from the request context.
- Fixes a race condition where concurrent stateless invocations would
overwrite the global singleton, causing cross-tenant leaks or nil
namespace errors.
- Added a regression test to verify per-invocation isolation under load.
#348 - APNs silent-drop guard
Apple's APNs silently returns HTTP 200 for pushes with no visible
content (no title, no body, no badge, no sound, no
content-available=1) and then drops them — which looked to the WASM
caller like a successful delivery. Now rejected up-front with the new
push.ErrEmptyContent sentinel, and the APNs provider returns the
structured push.PushError shape (HTTPStatus, Reason, Unregistered,
Wrapped) so the dispatcher can branch on Unregistered to remove dead
tokens automatically. Legacy ErrDeviceUnregistered sentinel is
preserved for errors.Is compatibility (wrapped inside PushError).
Always logs APNs HTTP response (status, reason, apns_id, token prefix)
so future silent-drop classes show up in operator logs.
content-available is also now correctly mapped from snake_case
Data["content_available"] (any truthy variant) into Apple's
canonical "content-available": 1 inside the aps dictionary.
#321 - mid-session JWT refresh on persistent WS
Long-lived persistent WS connections used to have to close+reconnect
when the JWT rolled — losing per-instance state, message queues, and
subscriptions. The handler now accepts an "auth.refresh" control
frame: client sends the new token, the gateway re-verifies it via
the new JWTVerifier interface, updates the per-instance invCtx
in-place (persistent.Instance.UpdateInvCtx), and acks. No close, no
state loss.
JWTVerifier is optional — handlers set it via SetJWTVerifier at
gateway init. When unwired the handler nack's with a "not supported
on this gateway" response and clients fall back to the old
close+reconnect path, so older deploys don't break.
Other:
- push/dispatcher.go: SendToUserDetailed returns per-device PushError
shape so callers can act on Unregistered / HTTPStatus / Reason.
- serverless/hostfunctions/push.go: WASM host functions for the new
detailed-error shape.
- serverless/persistent/instance.go: UpdateInvCtx mid-session.
Tests:
- ws_persistent_control_test.go: auth.refresh ack/nack paths.
- apns_test.go: empty-content rejection, PushError shape on 410 +
generic non-200, content-available mapping.
- dispatcher_detailed_test.go: SendToUserDetailed result shape.
- instance_update_invctx_test.go: invCtx update is per-instance, not
cross-tenant.
VERSION bumped to 0.122.27.
- Integrate PubSubDispatcher to enable libp2p subscription for trigger patterns
- Add BatchQuery to rqlite client to reduce round-trips for multi-query operations
- Implement lifecycle management for dispatcher and add safety limits for batch queries