1205 Commits

Author SHA1 Message Date
anonpenguin23
61635c4ce7 release: 0.122.54 v0.122.54-nightly 2026-06-13 09:25:16 +03:00
anonpenguin23
34f9da6f8d feat(gateway): implement ntfy cluster fan-out and improve secrets encryption
- Add `ntfyFanoutResolver` to distribute push notifications across all active cluster nodes, ensuring delivery when nodes lack shared state.
- Refactor secrets encryption key derivation to use cluster-wide secrets via HKDF, replacing ephemeral per-node keys to fix cross-node decryption issues.
- Add unit tests for fan-out resolution logic and caching behavior.
2026-06-13 09:23:14 +03:00
anonpenguin23
4ae8fa941d release: 0.122.53 2026-06-13 08:13:10 +03:00
anonpenguin23
2b184f0398 fix(namespace): make WebRTC config survive slow/cold node restarts (#130)
Root cause of the recurring "turn.credentials → namespace_not_configured" on a
distant node: at converge the gateway resolves its TURN secret from the
namespace rqlite, and on a slow/just-restarted node that read fails ONCE, so
the gateway is written with TURN disabled. Removing the node is not a fix — the
software must tolerate a slow read.

Two-part fix (complements e7ed718's "don't blank a warm config"):
  - RETRY the secret read (5×2s) at converge so a node whose rqlite is still
    syncing waits for it to land instead of writing an empty block once. A
    genuine decrypt failure still exhausts the retries → unresolved → the
    running config is preserved.
  - CACHE the resolved secret into the node's own cluster-state.json
    (applyResolvedWebRTCToState), so the NEXT cold start reads it from disk —
    chooseRestoreWebRTC is state-first and short-circuits before the DB. The
    state struct already had TURNSharedSecret "for cold start" but nothing
    populated it; now it's filled on every successful resolve (only rewritten
    on change). Each node self-heals its own cache; nothing new is sent
    cross-node.

cluster-state.json now carries the TURN secret, so both writers (local
saveLocalState and the remote SaveClusterState) are tightened to 0600 + chmod.
Stale-secret self-heals: disable/enable webrtc re-pushes every node's config
and the next converge re-caches the new value.

Dual-reviewed: code-quality APPROVED; security SECURE after the remote-write
0600 fix. Tests: cache populate + short-circuit, no-change, turn-only node.
2026-06-13 08:12:48 +03:00
anonpenguin23
66db54c094 release: 0.122.52 2026-06-13 07:53:33 +03:00
anonpenguin23
cf21668782 fix(push): cap VoIP apns-expiration to the ring window; record success status (#132)
VoIP call-invite pushes set no apns-expiration, so apns2 omits the header and
APNs store-and-forwards the push — delivering it minutes late and firing a
phantom "missed call" ring long after the call ended (and burning PushKit
goodwill, inviting throttling). Cap the VoIP apns-expiration to the ring window
(30s) so APNs delivers promptly or DISCARDS, never a stale invite. Alert pushes
keep the default store-and-forward so a message notification still lands after
the device reconnects.

Also surface HTTP 200 on a successful dispatch instead of leaving HTTPStatus at
0 — a successful push was logging "http=0", which reads like an opaque failure
and masked real false-success classes.

Tests: VoIP push carries an expiration within the ring-window cap; alert push
carries none. push package green.
2026-06-12 17:49:44 +03:00
anonpenguin23
33600092a8 fix(auth): bounded single-use refresh-token reuse grace (#125)
A lost rotation response strands the client on a just-revoked token: the retry
hits res.Count==0 → genuine 401 → SIWE, which is impossible on a VoIP-woken
locked screen, so the call dies. This recurred under the reconnect storms from
today's gateway rolls.

Add an RFC 9700 §4.13.2 reuse grace: a refresh token revoked within 60s whose
grace_used_at is still NULL is accepted ONCE more and mints a fresh session.
The grace path skips the revoke CAS (the token is already revoked — the CAS
would 0-match and mis-fire the replay tripwire) and is locked instead by a
single-use CAS on grace_used_at, so a stolen token can't be replayed at
leisure. The window predicate is repeated on the CAS to close the
SELECT→UPDATE TOCTOU, and the grace SELECT excludes expired tokens.

Security (found + fixed in review): explicit revocation (RevokeToken /
/v1/auth/logout) now also stamps grace_used_at, so a deliberately-logged-out
token can never be grace-recovered — closes a logout-bypass where a just-
revoked token would otherwise be resurrectable for 60s. Transient rqlite
errors on the grace lookup/CAS surface as 503 (retryable), not 401, preserving
the #125 transient-vs-genuine distinction.

Migration 032 adds grace_used_at (additive ALTER, rolling-safe; NULL = grace
available, the window predicate keeps historically-revoked tokens ineligible).

Dual-reviewed: code-quality APPROVED; security SECURE after the logout-bypass
fix. Tests: lost-response recovery, single-use second-attempt 401, genuine bad
token 401, and the logout-bypass regression.
2026-06-12 17:42:36 +03:00
anonpenguin23
f3145f1b90 release: 0.122.51 v0.122.51-nightly 2026-06-12 16:48:21 +03:00
anonpenguin23
6f5b2db95e fix(sdk): surface typed SDKError on network/timeout failures (#129)
The HTTP client re-threw the raw platform error on a transport failure, so
callers (e.g. AnChat's JwtSession driving client.auth.refresh/challenge) could
only tell "couldn't reach the gateway" from a real HTTP error by string-
matching `TypeError: Network request failed`. The typed SDKError was built
only for the onNetworkError callback, never for the thrown error, and native
errors weren't retryable so they bubbled raw.

normalizeError() now maps a fetch failure -> SDKError{code:"NETWORK_ERROR",
httpStatus:0} and a timeout AbortError -> {code:"TIMEOUT",httpStatus:0}, and
that typed error is thrown to the caller. Genuine HTTP errors pass through
unchanged. httpStatus:0 is the stable "no HTTP response received" signal to
branch on; the original platform message is preserved for diagnostics.

Deliberately NOT auto-retrying network failures: a blind retry on a non-
idempotent POST like /v1/auth/refresh could burn the rotated refresh token on
a lost response. The SDK now just gives the app a typed error to drive its own
retry/failover.

Tests: tests/unit/http/network-errors-bug-129.test.ts (TypeError->NETWORK_ERROR,
AbortError->TIMEOUT, real 401 passes through, callback gets the typed error).
Full unit suite green, typecheck clean.
2026-06-12 16:44:37 +03:00
anonpenguin23
e7ed718965 fix(namespace): don't silently disable TURN on unresolvable WebRTC secret (#130)
A freshly-joined namespace node could come up with TURN silently disabled
(turn.credentials -> namespace_not_configured) when GetWebRTCConfig errored:
the stored TURN shared secret was encrypted with a pre-rotation
cluster-secret-derived key and failed to decrypt, and the converge swallowed
that error into "WebRTC disabled", writing a TURN-disabled gateway config.

Distinguish "resolved but not enabled" (genuinely disabled, fine to write the
empty block) from "unresolved" (DB/decrypt error). chooseRestoreWebRTC's
dbFetch callback now returns a `resolved` bool; an unresolved lookup forces
enabled=false AND sets restoreWebRTC.unresolved. The converge then:
  - logs the decrypt/read error loudly with the exact remediation
    (`orama namespace disable webrtc` then `enable webrtc`) instead of
    swallowing it;
  - on the warm path, SKIPS ReconcileGateway so it doesn't rewrite a running
    gateway's working WebRTC block to empty (preserves TURN);
  - on the cold path, still spawns the gateway (the namespace needs one) but
    warns loudly that TURN is degraded until the secret is regenerated.

Healthy nodes are unaffected: a node whose state file holds the secret
short-circuits before dbFetch, so a flaky/rotated DB cannot disable it.
Dual-reviewed (code-quality APPROVED, security SECURE — no secret material is
logged; operator disable still resolves to disabled, not unresolved).

Pure-function coverage in restore_webrtc_test.go: unresolved marking,
resolved-empty-is-disabled, and state-secret-wins-over-db-error.
2026-06-12 16:44:25 +03:00
anonpenguin23
021d362b2f release: 0.122.50 v0.122.50-nightly 2026-06-12 10:14:00 +03:00
anonpenguin23
4d700aed54 feat(gateway): enforce jwt expiry on persistent websockets
- implement `wsJWTExpired` to validate token lifetime with a grace period
- capture jwt expiry at connection upgrade and update via auth.refresh
- close connections with custom code 4401 when tokens expire to force re-auth
- add unit tests to verify expiry logic and state transitions
2026-06-12 10:12:21 +03:00
anonpenguin23
d113b75497 feat(auth): refresh-token custom claims hook (#548)
Custom JWT claims survive token refresh: migration 031 adds the
custom-claims column to refresh tokens, the new gateway ClaimsProvider
re-resolves claims on refresh, and the serverless invoke path carries
them through. Includes refresh-rotation, WS-JWT middleware, and
claims-provider test coverage.
v0.122.49-nightly
2026-06-12 08:05:27 +03:00
anonpenguin23
8472861ed3 merge main into nightly — keep nightly (staging) on all conflicts
Reconciles the divergence created by the May-13 nightly history rewrite
(main absorbed pre-rewrite SHAs via PRs #90-92). Content conflicts all
resolve in nightly's favor; nightly is the deployed, verified branch
(v0.122.47 live on devnet).
2026-06-11 17:32:37 +03:00
anonpenguin23
cd8c717363 chore(version): bump to 0.122.47
- refactor(turn): extract decodeTURNConfig for testability
- feat(turn): add stealth domain fields to config
- fix(apns): nest custom data under "body" for expo-notifications compatibility
v0.122.47-nightly
2026-06-11 11:45:12 +03:00
anonpenguin23
f4c58db710 release: 0.122.46 v0.122.46-nightly 2026-06-11 10:06:19 +03:00
anonpenguin23
8375d92109 feat(namespace): reuse caddy wildcard certificate for stealth turns
- Implement `resolveStealthCert` to use existing `*.<baseDomain>` wildcard certificates instead of dynamic Caddyfile provisioning.
- Avoids EROFS errors caused by `ProtectSystem=strict` on the orama-node service.
- Add strict validation to ensure stealth hosts are single-label subdomains covered by the wildcard.
2026-06-11 10:04:45 +03:00
anonpenguin23
37daf28b5a release: 0.122.45 v0.122.45-nightly 2026-06-11 08:00:31 +03:00
anonpenguin23
b425f80efb fix(config): add sni_router to root Config — prevents feat-124 boot crash
b9d5f54 (stealth TURN discovery) emits a top-level `sni_router:` block
into node.yaml unconditionally, but only added a lenient ad-hoc parse
in the carry-forward logic — not the field on config.Config that
orama-node strict-decodes (KnownFields(true)) at boot. Identical
failure mode to the v0.122.42 secrets_encryption_key incident: the
unknown key fails the whole node.yaml parse and orama-node crash-loops.

Caught pre-deploy this time by the strict-decode gate check; devnet
never saw it. Regression test added alongside the v0.122.42 one in
decode_test.go.
2026-06-11 08:00:31 +03:00
anonpenguin23
b9d5f542e1 feat(gateway): implement stealth TURN discovery and configuration
- Add `turn_stealth_domain` to gateway config for stealth TURN support
- Introduce `turn_discovery` in `sni-router` to auto-discover per-namespace routes
- Add database migration to enable stealth TURN per namespace
- Document ephemeral state API in `SERVERLESS.md`
2026-06-11 07:04:50 +03:00
anonpenguin23
f192cd0b84 release: 0.122.44 v0.122.44-nightly 2026-06-10 12:13:25 +03:00
anonpenguin23
ff3e273da8 feat(gateway): implement persistent secrets and webrtc configuration
- add `secrets_encryption_key` to gateway config for serverless secrets
- implement durable TURN secret persistence to prevent config regen outages
- add regression test for gateway config loading and field mapping
2026-06-10 12:10:52 +03:00
anonpenguin23
4c631243b3 release: 0.122.43 v0.122.43-nightly 2026-06-09 15:57:33 +03:00
anonpenguin23
e685c864fc fix(config): add secrets_encryption_key to HTTPGatewayConfig — fixes orama-node boot crash
v0.122.42 (f412425, secrets encryption) shipped the template emission,
the per-cluster secret generator, and the gateway.Config consumer — but
NOT the parse field on config.HTTPGatewayConfig. Phase 4 writes
`secrets_encryption_key` into node.yaml under the http_gateway section,
and pkg/config/yaml.go decodes with KnownFields(true) (strict). The
unknown field made every node.yaml parse fail, so orama-node exited 1
on every start and systemd crash-looped it (restart counter hit 380+ on
the first upgraded devnet node before the rolling controller halted).

Root cause: a generated-config field with no matching struct field under
strict unmarshal. Fix is the missing field. The runtime key itself is
still consumed from ~/.orama/secrets/secrets-encryption-key (pkg/node/
gateway.go), which already worked — so this one-field addition fully
restores boot AND the feature.

The standalone gateway (cmd/gateway/config.go) uses lenient parsing and
was unaffected.

Regression test in pkg/config/decode_test.go decodes a node.yaml
carrying secrets_encryption_key under strict mode.
2026-06-09 15:57:32 +03:00
anonpenguin23
b6b518e005 release: 0.122.42 v0.122.42-nightly 2026-06-09 13:01:38 +03:00
anonpenguin23
f41242538e feat(serverless): add raw http response mode and secrets encryption
- Add `raw_http_response` configuration to functions to allow verbatim HTTP responses
- Implement cluster-wide secrets encryption key generation and distribution for serverless functions
- Update documentation with UnifiedPush support for ntfy on Android/GrapheneOS
2026-06-09 13:01:02 +03:00
anonpenguin23
aa04ab5f50 release: 0.122.41 v0.122.41-nightly 2026-06-09 09:24:59 +03:00
anonpenguin23
f8de4af704 feat(sni-router): implement hot-reloading for route configuration
- Add `FileRouteReloader` to watch and atomically update routes from disk
- Refactor `main` to support seamless configuration updates without restarts
- Ensure existing routes are preserved if a reload encounters an error
2026-06-09 09:23:54 +03:00
anonpenguin23
32f7b3824e release: 0.122.40 v0.122.40-nightly 2026-06-04 10:08:59 +03:00
anonpenguin23
eade6e1742 feat(pubsub): remove mesh formation wait and add publish rate limiting
- Remove the 2-second polling wait for gossipsub mesh formation in `Publish`
  to eliminate unnecessary latency, relying on `FloodPublish` for delivery.
- Introduce a per-invocation publish budget (1000 messages) to prevent
  potential flooding of the shared gossipsub router by WASM functions.
- Add regression tests to ensure `Publish` remains non-blocking and that
  the publish budget is strictly enforced.
2026-06-04 10:08:10 +03:00
anonpenguin23
f3875d5157 release: 0.122.39 v0.122.39-nightly 2026-06-02 15:06:48 +03:00
anonpenguin23
9373c2ad92 feat(rqlite,serverless): add local read consistency and async invocation
- Introduce `BatchQueryConsistency` with `ReadConsistencyNone` to allow
  local SQLite reads, bypassing leader round-trips for performance.
- Add `function_invoke_async` host function to support non-blocking
  fire-and-forget function execution.
2026-06-01 19:59:30 +03:00
anonpenguin23
b2a3bff88c release: 0.122.38 v0.122.38-nightly 2026-06-01 10:13:15 +03:00
anonpenguin23
ca4ccbfcd4 feat(gateway): decouple turn credentials and sfu route registration
- split webrtc route gating into `webrtcServeTURNCredentials` and `webrtcServeSFURoutes` to allow non-SFU gateways to mint TURN credentials
- update `chooseRestoreWebRTC` to correctly resolve configurations for nodes without local SFU ports
- add unit tests to verify independent route registration logic (bugboard #25)
2026-06-01 10:12:07 +03:00
anonpenguin23
a3cf8384e9 release: 0.122.37 v0.122.37-nightly 2026-05-30 19:27:25 +03:00
anonpenguin23
bf0d5f9f9f feat(namespace): implement warm reconciliation for gateway webrtc config
- Add logic to reconcile gateway configuration drift for running instances
- Prevent unnecessary restart loops by verifying on-disk config state
- Add unit tests to validate synchronization logic and prevent regressions
2026-05-30 19:26:26 +03:00
anonpenguin23
3987ad0cf3 release: 0.122.36 v0.122.36-nightly 2026-05-30 14:41:51 +03:00
anonpenguin23
4fc975216f feat(gateway): fix WebRTC config persistence and endpoint access
- Add internal WebRTC management endpoints to public path exemption list
- Implement DB fallback for WebRTC configuration during cluster restore
- Add unit tests to verify WebRTC config precedence and state self-healing
2026-05-30 14:39:39 +03:00
anonpenguin23
9bace7bbf4 release: 0.122.35 v0.122.35-nightly 2026-05-29 12:09:12 +03:00
anonpenguin23
325a2471c7 Changes 2026-05-29 11:46:20 +03:00
anonpenguin23
0d352d0b42 release: 0.122.34 v0.122.34-nightly 2026-05-28 09:55:42 +03:00
anonpenguin23
cfff08d91e feat(serverless): add turn_credentials host function and slow invocation diagnostics
- Implement `turn_credentials` host function to provide TURN configuration to WASM modules.
- Add structured logging for slow serverless invocations exceeding 5s, providing per-phase timing (rate-limit, module-load, execution) to identify performance bottlenecks.
- Enhance WebSocket handler logging to capture request context when 30s timeouts occur.
2026-05-28 09:54:24 +03:00
anonpenguin23
8fbc4485c1 fix(serverless): enable system clocks for wasm modules
- opt into `WithSysWalltime` and `WithSysNanotime` to prevent wazero from using a frozen sentinel clock
- add regression tests to verify real-time clock behavior in wasm execution
- ensure serverless functions receive accurate timestamps for audit and cursor logic
2026-05-26 10:53:07 +03:00
anonpenguin23
1399b22676 release: 0.122.33 v0.122.33-nightly 2026-05-25 10:25:51 +03:00
anonpenguin23
1faf04e2a3 feat(cli): add function enable/disable and fix upgrade re-exec
- Add `enable` and `disable` commands to manage function status
- Implement process re-exec in the upgrade orchestrator to ensure
  Phase 4 config generation uses the newly-installed binary version
  (fixes bugboard #15)
2026-05-25 10:25:04 +03:00
anonpenguin23
bc2c25ff16 release: 0.122.32 v0.122.32-nightly 2026-05-25 09:35:16 +03:00
anonpenguin23
b2d35bbde1 feat(gateway): enable local wildcard triggers for pubsub
- wire PubSubDispatcher to host functions to support local wildcard
  triggers for WASM-published topics
- implement batch deduplication by topic to prevent redundant trigger
  invocations and bound fan-out
- propagate trigger depth through function invocations to maintain
  recursion limits during local dispatch
2026-05-25 09:34:01 +03:00
anonpenguin23
0463f37c0d release: 0.122.31 v0.122.31-nightly 2026-05-24 20:57:52 +03:00
anonpenguin23
98dad46a81 fix(gateway): decouple webrtc route registration from legacy flag
- Update route registration logic to rely solely on SFUPort > 0, resolving a silent 404 issue where gateways with valid SFU configurations were incorrectly disabled.
- Retain WebRTCEnabled in config for backward compatibility with existing operator YAML and request schemas.
- Add unit tests to pin registration behavior and prevent future regressions.
2026-05-24 20:56:08 +03:00
anonpenguin23
877563b86f release: 0.122.30 v0.122.30-nightly 2026-05-24 19:39:39 +03:00