Reconciles the divergence created by the May-13 nightly history rewrite
(main absorbed pre-rewrite SHAs via PRs #90-92). Content conflicts all
resolve in nightly's favor; nightly is the deployed, verified branch
(v0.122.47 live on devnet).
- refactor(turn): extract decodeTURNConfig for testability
- feat(turn): add stealth domain fields to config
- fix(apns): nest custom data under "body" for expo-notifications compatibility
- Implement `resolveStealthCert` to use existing `*.<baseDomain>` wildcard certificates instead of dynamic Caddyfile provisioning.
- Avoids EROFS errors caused by `ProtectSystem=strict` on the orama-node service.
- Add strict validation to ensure stealth hosts are single-label subdomains covered by the wildcard.
b9d5f54 (stealth TURN discovery) emits a top-level `sni_router:` block
into node.yaml unconditionally, but only added a lenient ad-hoc parse
in the carry-forward logic — not the field on config.Config that
orama-node strict-decodes (KnownFields(true)) at boot. Identical
failure mode to the v0.122.42 secrets_encryption_key incident: the
unknown key fails the whole node.yaml parse and orama-node crash-loops.
Caught pre-deploy this time by the strict-decode gate check; devnet
never saw it. Regression test added alongside the v0.122.42 one in
decode_test.go.
- Add `turn_stealth_domain` to gateway config for stealth TURN support
- Introduce `turn_discovery` in `sni-router` to auto-discover per-namespace routes
- Add database migration to enable stealth TURN per namespace
- Document ephemeral state API in `SERVERLESS.md`
- add `secrets_encryption_key` to gateway config for serverless secrets
- implement durable TURN secret persistence to prevent config regen outages
- add regression test for gateway config loading and field mapping
v0.122.42 (f412425, secrets encryption) shipped the template emission,
the per-cluster secret generator, and the gateway.Config consumer — but
NOT the parse field on config.HTTPGatewayConfig. Phase 4 writes
`secrets_encryption_key` into node.yaml under the http_gateway section,
and pkg/config/yaml.go decodes with KnownFields(true) (strict). The
unknown field made every node.yaml parse fail, so orama-node exited 1
on every start and systemd crash-looped it (restart counter hit 380+ on
the first upgraded devnet node before the rolling controller halted).
Root cause: a generated-config field with no matching struct field under
strict unmarshal. Fix is the missing field. The runtime key itself is
still consumed from ~/.orama/secrets/secrets-encryption-key (pkg/node/
gateway.go), which already worked — so this one-field addition fully
restores boot AND the feature.
The standalone gateway (cmd/gateway/config.go) uses lenient parsing and
was unaffected.
Regression test in pkg/config/decode_test.go decodes a node.yaml
carrying secrets_encryption_key under strict mode.
- Add `raw_http_response` configuration to functions to allow verbatim HTTP responses
- Implement cluster-wide secrets encryption key generation and distribution for serverless functions
- Update documentation with UnifiedPush support for ntfy on Android/GrapheneOS
- Add `FileRouteReloader` to watch and atomically update routes from disk
- Refactor `main` to support seamless configuration updates without restarts
- Ensure existing routes are preserved if a reload encounters an error
- Remove the 2-second polling wait for gossipsub mesh formation in `Publish`
to eliminate unnecessary latency, relying on `FloodPublish` for delivery.
- Introduce a per-invocation publish budget (1000 messages) to prevent
potential flooding of the shared gossipsub router by WASM functions.
- Add regression tests to ensure `Publish` remains non-blocking and that
the publish budget is strictly enforced.
- Introduce `BatchQueryConsistency` with `ReadConsistencyNone` to allow
local SQLite reads, bypassing leader round-trips for performance.
- Add `function_invoke_async` host function to support non-blocking
fire-and-forget function execution.
- split webrtc route gating into `webrtcServeTURNCredentials` and `webrtcServeSFURoutes` to allow non-SFU gateways to mint TURN credentials
- update `chooseRestoreWebRTC` to correctly resolve configurations for nodes without local SFU ports
- add unit tests to verify independent route registration logic (bugboard #25)
- Add logic to reconcile gateway configuration drift for running instances
- Prevent unnecessary restart loops by verifying on-disk config state
- Add unit tests to validate synchronization logic and prevent regressions
- Add internal WebRTC management endpoints to public path exemption list
- Implement DB fallback for WebRTC configuration during cluster restore
- Add unit tests to verify WebRTC config precedence and state self-healing
- opt into `WithSysWalltime` and `WithSysNanotime` to prevent wazero from using a frozen sentinel clock
- add regression tests to verify real-time clock behavior in wasm execution
- ensure serverless functions receive accurate timestamps for audit and cursor logic
- Add `enable` and `disable` commands to manage function status
- Implement process re-exec in the upgrade orchestrator to ensure
Phase 4 config generation uses the newly-installed binary version
(fixes bugboard #15)
- wire PubSubDispatcher to host functions to support local wildcard
triggers for WASM-published topics
- implement batch deduplication by topic to prevent redundant trigger
invocations and bound fan-out
- propagate trigger depth through function invocations to maintain
recursion limits during local dispatch
- Update route registration logic to rely solely on SFUPort > 0, resolving a silent 404 issue where gateways with valid SFU configurations were incorrectly disabled.
- Retain WebRTCEnabled in config for backward compatibility with existing operator YAML and request schemas.
- Add unit tests to pin registration behavior and prevent future regressions.
- register "apns_voip" provider to handle PushKit/CallKit signals
- implement target provider filtering in dispatcher to prevent cross-talk
between alert and VoIP push paths
- add comprehensive tests to ensure backward compatibility for fan-out
and correct filtering behavior
- Attach InvocationContext to the execution context in Engine.Execute to
ensure host functions resolve identity from the request context.
- Fixes a race condition where concurrent stateless invocations would
overwrite the global singleton, causing cross-tenant leaks or nil
namespace errors.
- Added a regression test to verify per-invocation isolation under load.
#348 - APNs silent-drop guard
Apple's APNs silently returns HTTP 200 for pushes with no visible
content (no title, no body, no badge, no sound, no
content-available=1) and then drops them — which looked to the WASM
caller like a successful delivery. Now rejected up-front with the new
push.ErrEmptyContent sentinel, and the APNs provider returns the
structured push.PushError shape (HTTPStatus, Reason, Unregistered,
Wrapped) so the dispatcher can branch on Unregistered to remove dead
tokens automatically. Legacy ErrDeviceUnregistered sentinel is
preserved for errors.Is compatibility (wrapped inside PushError).
Always logs APNs HTTP response (status, reason, apns_id, token prefix)
so future silent-drop classes show up in operator logs.
content-available is also now correctly mapped from snake_case
Data["content_available"] (any truthy variant) into Apple's
canonical "content-available": 1 inside the aps dictionary.
#321 - mid-session JWT refresh on persistent WS
Long-lived persistent WS connections used to have to close+reconnect
when the JWT rolled — losing per-instance state, message queues, and
subscriptions. The handler now accepts an "auth.refresh" control
frame: client sends the new token, the gateway re-verifies it via
the new JWTVerifier interface, updates the per-instance invCtx
in-place (persistent.Instance.UpdateInvCtx), and acks. No close, no
state loss.
JWTVerifier is optional — handlers set it via SetJWTVerifier at
gateway init. When unwired the handler nack's with a "not supported
on this gateway" response and clients fall back to the old
close+reconnect path, so older deploys don't break.
Other:
- push/dispatcher.go: SendToUserDetailed returns per-device PushError
shape so callers can act on Unregistered / HTTPStatus / Reason.
- serverless/hostfunctions/push.go: WASM host functions for the new
detailed-error shape.
- serverless/persistent/instance.go: UpdateInvCtx mid-session.
Tests:
- ws_persistent_control_test.go: auth.refresh ack/nack paths.
- apns_test.go: empty-content rejection, PushError shape on 410 +
generic non-200, content-available mapping.
- dispatcher_detailed_test.go: SendToUserDetailed result shape.
- instance_update_invctx_test.go: invCtx update is per-instance, not
cross-tenant.
VERSION bumped to 0.122.27.
- Integrate PubSubDispatcher to enable libp2p subscription for trigger patterns
- Add BatchQuery to rqlite client to reduce round-trips for multi-query operations
- Implement lifecycle management for dispatcher and add safety limits for batch queries
Two serious bugs found via cross-node behavior observation:
1. libp2p peer-discovery published wrong port
PeerDiscovery's multiaddr was using the gateway's HTTP API port (e.g.
10004), not the actual libp2p TCP port. Remote gateways dialed that
port, hit the HTTP server, received 400, and failed the libp2p
multistream handshake ("message did not have trailing newline").
Result: cluster-wide cross-node libp2p mesh had 0 connected peers
and cross-node pubsub silently dropped 100% of messages.
The libp2p port is OS-assigned at startup (client.go uses
/ip4/0.0.0.0/tcp/0). It's not anywhere in cfg — it's only on
host.Addrs(). Fix: drop the listenPort field from PeerDiscovery
entirely and derive the port live from host.Addrs() via
extractLibp2pTCPPort. WG IP still comes from getWireGuardIP
(libp2p filters its own enumeration so WG IPs don't appear in
host.Addrs(), but the listener is bound 0.0.0.0 so the port is
reachable on the WG interface).
2. System triggers silently blocked by CanInvoke (#264)
Cron, pubsub, database, timer, and job triggers all fire from
gateway-internal state with no caller identity. Invoke() ran every
request through CanInvoke(callerWallet) which returned false for
the empty wallet — every fire returned ErrUnauthorized. Reported as
a cron firing every minute with "unauthorized" for 19+ hours.
Auth boundary for system triggers belongs at REGISTRATION time
(POST /v1/functions/{name}/triggers, deploy-time auto-register
from function.yaml). Skip the per-invocation check for system
trigger types; user-driven triggers (HTTP, WebSocket) still gate
on caller identity as before.
Tests:
- gateway/peer_discovery_test.go covers extractLibp2pTCPPort.
- serverless/invoke_system_trigger_test.go covers the bypass and the
user-trigger gate.
VERSION bumped to 0.122.25.
HostFunctions is a process-wide singleton (one per gateway engine).
Its `invCtx` field is shared across all WASM instances. For STATELESS
execution the executor sets/clears it per-call but the lock is
released before WASM runs — two concurrent invocations can race on
the field and one's host call can read the other's identity. Window
is microseconds.
For PERSISTENT WS the bug was much worse: invCtx used to be bound
ONCE at instantiation and reused for the connection's lifetime. Two
simultaneous persistent WS connections from different namespaces /
wallets overwrote each other's invCtx, and EVERY subsequent
function_invoke / GetCallerJWTSubject / GetCallerWallet / GetSecret
call from inside the WASM read whatever was bound LAST. Result:
silent identity leak across tenants for as long as the connections
overlapped.
Fix: per-call invCtx propagation through Go's context.Context.
wazero passes the ctx given to api.Function.Call through to host
function callbacks, so every WASM-host hop carries its own invCtx.
- pkg/serverless/invocation_context.go (new): WithInvocationContext +
InvocationContextFromCtx helpers using an unexported invCtxKey.
- pkg/serverless/hostfunctions/invocation_context.go (new):
currentInvocationContext(ctx) — ctx-attached invCtx wins over the
singleton field.
- All host accessors (FunctionInvoke, GetEnv, GetSecret, GetRequestID,
GetCallerWallet, GetWSClientID, GetCallerClaim, GetCallerJWTSubject)
now route through currentInvocationContext(ctx).
- pkg/serverless/persistent/instance.go: every export call's ctx is
wrapped with the per-instance invCtx before being passed to wazero.
- pkg/gateway/handlers/serverless/ws_persistent_handler.go: invCtx is
built per-frame and attached to ctx, not stored on a shared field.
- pkg/serverless/engine.go: removed the SetInvocationContext call at
InstantiatePersistent (no longer needed; ctx carries it).
Stateless still uses the singleton field — its race is latent since
the host-functions split and migrating it is a separate scoped
change.
Tests:
- hostfunctions/invocation_context_test.go covers ctx-wins-over-singleton.
- gateway/handlers/serverless/ws_persistent_handler_test.go covers the
per-frame ctx wiring.
- cli/functions/build_test.go is new coverage for the build path
touched in this change.
VERSION bumped to 0.122.24.
The previous fix (v0.122.22) made `InstantiatePersistent` call `_start`
to bootstrap TinyGo's runtime, then catch the resulting ExitError(0).
That got past init, but the module STILL died — wazero's stock
`proc_exit` implementation calls `mod.CloseWithExitCode(exitCode)`
before panicking, which invalidates the module regardless of what
the caller does with the panic. Every subsequent call to ws_open /
ws_frame / ws_close / orama_alloc returned ExitError(0) ("module
already closed").
Wazero exposes no flag for this — the close is hard-coded. The only
intercept point is to override `proc_exit` at the WASI host-module
boundary. Documented pattern at imports/wasi_snapshot_preview1/wasi.go
lines 111-127.
Fix: build the WASI host module manually so we can override
`proc_exit`:
- exit code 0 → panic ExitError(0) BUT do NOT close the module.
This is TinyGo's "_start completed cleanly" signal; the module's
other exports must stay callable for the persistent lifecycle.
- exit code != 0 → preserve standard WASI behavior (close + panic).
A non-zero exit is a genuine app-signaled failure; we want
`proc_exit(N != 0)` to behave exactly as upstream does.
The InstantiatePersistent caller already distinguishes the two cases
via errors.As + ExitCode() check — added in v0.122.22, no change here.
Safe for stateless functions on the same runtime: the stateless
execution path closes its own module after each invocation, so the
"module stays alive on exit 0" override has no effect on that path.
VERSION bumped to 0.122.23.
InstantiatePersistent passed WithStartFunctions() with no args,
explicitly disabling both wasi entry points. The intent was to skip
main(); the side effect was leaving the TinyGo runtime
uninitialized. The first call to any export traps via
wasmExportCheckRun and managed-memory ops panic. Every persistent WS
function was effectively dead since plan #06 landed.
Earlier patch in this thread restored the call but only handled
wasi-reactor builds (_initialize). AnChat's rpc-router is a wasi
command build (`_start` export only, no `_initialize`) — wasm-objdump
confirms — so the reactor-only fix still left it broken.
This fix tries `_initialize` first, falls back to `_start`, and
bounds whichever runs with a 5s timeout so a buggy main() can't hang
instantiation forever. Logs the chosen hook at Debug, warns when
neither is exported.
Still pass WithStartFunctions() (no args) so wazero doesn't
auto-call `_start` during InstantiateModule — we want full control
over which hook runs and the timeout that bounds it.
VERSION bumped to 0.122.22.