1217 Commits

Author SHA1 Message Date
anonpenguin23
3779ba9502 release: 0.122.59 v0.122.59-nightly 2026-06-15 21:57:39 +03:00
anonpenguin23
92428df7e9 fix(cli): node restart restores the caddy/coredns frontend
caddy/coredns declare Requires=orama-node.service, so stopping orama-node
cascade-stops them; Requires propagates STOP but not START, and
StartServicesOrdered only starts the orama services — so a bare
'orama node restart' left caddy dead and the node's :443 HTTPS frontend
offline until the next reboot. Capture the active frontend units before the
stop and restart exactly those after the gateway is healthy. Adds
ServiceUnitExists helper + selectFrontendToRestore policy test.
2026-06-15 21:57:29 +03:00
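The capture-then-restore policy described in 92428df7e9 might be sketched as below. This is a minimal illustration, not the repository's code: `captureActive`, the `isActive` probe, and the unit names in `frontendUnits` are assumptions made for the example.

```go
package main

// frontendUnits lists units assumed to declare Requires=orama-node.service.
// The exact unit names are hypothetical.
var frontendUnits = []string{"caddy.service", "coredns.service"}

// captureActive returns the units that are active right now. Calling it
// before the stop records which frontends must be started again afterwards.
// isActive is a hypothetical probe; a real one could wrap
// `systemctl is-active <unit>`.
func captureActive(units []string, isActive func(string) bool) []string {
	var active []string
	for _, u := range units {
		if isActive(u) {
			active = append(active, u)
		}
	}
	return active
}
```

Restarting only the captured subset means a frontend that an operator had deliberately stopped is not brought back by the node restart.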
anonpenguin23
d4bf187e94 fix(gateway): /v1/auth/token resolves HMAC-hashed api keys
APIKeyToJWTHandler looked up the namespace by the raw api key, but keys are
stored HMAC-hashed (Service.HashAPIKey), so it always returned 'invalid API
key' for real keys — no api-key holder could exchange for a JWT. Resolve the
hashed key first with a raw-key fallback for legacy rows, mirroring the
gateway middleware's lookupAPIKeyNamespace. Adds args-aware mock + tests for
the hashed-key, raw-fallback, and unknown-key paths.
2026-06-15 21:57:29 +03:00
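The hashed-first lookup with a raw-key fallback from d4bf187e94 (the same pattern 7b5587094d applies to the ownership check) could look roughly like this. It is a sketch under assumptions: HMAC-SHA256 with hex encoding is a guess at what `Service.HashAPIKey` does, and `hashAPIKey`, `resolveNamespace`, and the map-backed store are hypothetical stand-ins for the real database lookup.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/hex"
)

// hashAPIKey derives the stored form of an API key.
// Assumption: HMAC-SHA256 keyed with a server-side secret, hex encoded.
func hashAPIKey(secret []byte, key string) string {
	mac := hmac.New(sha256.New, secret)
	mac.Write([]byte(key))
	return hex.EncodeToString(mac.Sum(nil))
}

// resolveNamespace looks up the namespace for a presented API key.
// It tries the hashed form first, then falls back to the raw key so that
// legacy rows written before hashing was introduced still resolve.
func resolveNamespace(store map[string]string, secret []byte, presented string) (string, bool) {
	if ns, ok := store[hashAPIKey(secret, presented)]; ok {
		return ns, true
	}
	ns, ok := store[presented]
	return ns, ok
}
```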
anonpenguin23
cd869a588f release: 0.122.58 v0.122.58-nightly 2026-06-15 14:46:49 +03:00
anonpenguin23
123ca90b65 fix(serverless): get_secret round-trip via secrets.Encrypt + string scan (#837)
The base64 wrapper wasn't enough: DBSecretsManager scanned encrypted_value
into []byte, so the rqlite client applied base64 binary semantics on read and
the ciphertext never round-tripped — get_secret stayed empty. Mirror the
proven push-credentials store exactly: encrypt to a 'enc:'-prefixed base64
string via pkg/secrets and scan the column into a STRING for Decrypt. Text
round-trips cleanly through rqlite regardless of the BLOB column.
2026-06-15 14:46:36 +03:00
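A text-safe encrypted value of the kind 123ca90b65 describes can be sketched as follows. This is not the `pkg/secrets` implementation: the nonce-prepended AES-GCM layout and the names `encryptSecret` and `decryptSecret` are assumptions, and only the `enc:` prefix and the base64 string form come from the commit message.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"encoding/base64"
	"errors"
	"strings"
)

const encPrefix = "enc:"

// encryptSecret seals plaintext with AES-GCM and returns a plain string:
// "enc:" + base64(nonce || ciphertext). A string survives any driver that
// would otherwise re-encode a []byte parameter.
func encryptSecret(key []byte, plaintext string) (string, error) {
	block, err := aes.NewCipher(key)
	if err != nil {
		return "", err
	}
	gcm, err := cipher.NewGCM(block)
	if err != nil {
		return "", err
	}
	nonce := make([]byte, gcm.NonceSize())
	if _, err := rand.Read(nonce); err != nil {
		return "", err
	}
	sealed := gcm.Seal(nonce, nonce, []byte(plaintext), nil)
	return encPrefix + base64.StdEncoding.EncodeToString(sealed), nil
}

// decryptSecret reverses encryptSecret. The column must be scanned into a
// string, not []byte, before it is passed here.
func decryptSecret(key []byte, stored string) (string, error) {
	if !strings.HasPrefix(stored, encPrefix) {
		return "", errors.New("value is missing the enc: prefix")
	}
	raw, err := base64.StdEncoding.DecodeString(strings.TrimPrefix(stored, encPrefix))
	if err != nil {
		return "", err
	}
	block, err := aes.NewCipher(key)
	if err != nil {
		return "", err
	}
	gcm, err := cipher.NewGCM(block)
	if err != nil {
		return "", err
	}
	if len(raw) < gcm.NonceSize() {
		return "", errors.New("ciphertext too short")
	}
	plain, err := gcm.Open(nil, raw[:gcm.NonceSize()], raw[gcm.NonceSize():], nil)
	if err != nil {
		return "", err
	}
	return string(plain), nil
}
```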
anonpenguin23
a59017350b release: 0.122.57 v0.122.57-nightly 2026-06-15 13:55:03 +03:00
anonpenguin23
b77c3dae97 chore(devnet): remove 3 permanently-decommissioned nodes from nodes.conf
51.83.128.181, 144.217.160.15, 144.217.163.114 were removed from the cluster
permanently and must never be targeted by rollouts/monitoring again.
2026-06-15 13:54:51 +03:00
anonpenguin23
7165992b12 fix(serverless): get_secret returns empty — base64 round-trip for stored secrets (#837)
Second #837 root cause (the key-derivation fix was necessary but not
sufficient): DBSecretsManager stored the AES-GCM ciphertext as a raw []byte
parameter, but the rqlite client serializes []byte as base64 and reads it
back as that base64 TEXT — never the original bytes. So decrypt() always
received base64 ASCII instead of ciphertext and failed, making get_secret
return empty for every stored secret (exactly the reported symptom).

Encode the ciphertext to an explicit base64 string in Set and decode it in
Get (with a raw fallback), making the round-trip symmetric and
driver-independent. The test mock now emulates rqlite's blob->base64-text
behavior so it's a real regression guard.
2026-06-15 13:54:35 +03:00
anonpenguin23
dcc17f1e90 release: 0.122.56 v0.122.56-nightly 2026-06-15 11:20:32 +03:00
anonpenguin23
7b5587094d fix(gateway): api_key owners no longer 403 on namespaces they own
The namespace-ownership middleware compared an api_key caller's RAW key
against namespace_ownership.owner_id, but api_keys are stored HMAC-hashed
(HashAPIKey). So every api_key-authenticated owner got a 403 on a namespace
they actually own — blocking function deploy and PUT /v1/push/config.

Hash the presented api_key before the ownership comparison (hashed first,
raw second as a rolling-upgrade legacy fallback), mirroring the existing
lookupAPIKeyNamespace pattern. The wallet path is unchanged (wallets stored
raw). Security-reviewed: grants only to the correct key holder, no
escalation.
2026-06-15 11:20:23 +03:00
anonpenguin23
b1b8ac57d5 release: 0.122.55 v0.122.55-nightly 2026-06-15 08:05:47 +03:00
anonpenguin23
9c213a166c feat(serverless,namespace): cut namespace gateway RPC latency (#708)
The 5-10s RPCs that broke calling were not cold-start — they were
per-RPC sequential rqlite reads, each forwarded to a raft leader that
geography-blind election had placed on a 256ms-distant node.

Lever A (serverless): cache function metadata + env vars in-process
(5s TTL, invalidated on deploy/enable/disable/delete) and stop the hot
invoke path re-fetching the function for the authorization check —
removes ~820ms of leader-routed pre-flight reads from every op.

Lever B (namespace): a locality-aware leadership reconciler hands raft
leadership off a geographically-isolated namespace leader to the nearest
co-located voter, via rqlite's transfer-leadership API. All nodes stay
voters — membership, quorum and fault tolerance are unchanged. Cuts the
per-hop cost from ~274ms to ~20ms when a distant node had become leader.
2026-06-15 08:05:38 +03:00
anonpenguin23
61635c4ce7 release: 0.122.54 v0.122.54-nightly 2026-06-13 09:25:16 +03:00
anonpenguin23
34f9da6f8d feat(gateway): implement ntfy cluster fan-out and improve secrets encryption
- Add `ntfyFanoutResolver` to distribute push notifications across all active cluster nodes, ensuring delivery when nodes lack shared state.
- Refactor secrets encryption key derivation to use cluster-wide secrets via HKDF, replacing ephemeral per-node keys to fix cross-node decryption issues.
- Add unit tests for fan-out resolution logic and caching behavior.
2026-06-13 09:23:14 +03:00
anonpenguin23
4ae8fa941d release: 0.122.53 2026-06-13 08:13:10 +03:00
anonpenguin23
2b184f0398 fix(namespace): make WebRTC config survive slow/cold node restarts (#130)
Root cause of the recurring "turn.credentials → namespace_not_configured" on a
distant node: at converge the gateway resolves its TURN secret from the
namespace rqlite, and on a slow/just-restarted node that read fails ONCE, so
the gateway is written with TURN disabled. Removing the node is not a fix — the
software must tolerate a slow read.

Two-part fix (complements e7ed718's "don't blank a warm config"):
  - RETRY the secret read (5×2s) at converge so a node whose rqlite is still
    syncing waits for it to land instead of writing an empty block once. A
    genuine decrypt failure still exhausts the retries → unresolved → the
    running config is preserved.
  - CACHE the resolved secret into the node's own cluster-state.json
    (applyResolvedWebRTCToState), so the NEXT cold start reads it from disk —
    chooseRestoreWebRTC is state-first and short-circuits before the DB. The
    state struct already had TURNSharedSecret "for cold start" but nothing
    populated it; now it's filled on every successful resolve (only rewritten
    on change). Each node self-heals its own cache; nothing new is sent
    cross-node.

cluster-state.json now carries the TURN secret, so both writers (local
saveLocalState and the remote SaveClusterState) are tightened to 0600 + chmod.
Stale-secret self-heals: disable/enable webrtc re-pushes every node's config
and the next converge re-caches the new value.

Dual-reviewed: code-quality APPROVED; security SECURE after the remote-write
0600 fix. Tests: cache populate + short-circuit, no-change, turn-only node.
2026-06-13 08:12:48 +03:00
anonpenguin23
66db54c094 release: 0.122.52 2026-06-13 07:53:33 +03:00
anonpenguin23
cf21668782 fix(push): cap VoIP apns-expiration to the ring window; record success status (#132)
VoIP call-invite pushes set no apns-expiration, so apns2 omits the header and
APNs store-and-forwards the push — delivering it minutes late and firing a
phantom "missed call" ring long after the call ended (and burning PushKit
goodwill, inviting throttling). Cap the VoIP apns-expiration to the ring window
(30s) so APNs delivers promptly or DISCARDS, never a stale invite. Alert pushes
keep the default store-and-forward so a message notification still lands after
the device reconnects.

Also surface HTTP 200 on a successful dispatch instead of leaving HTTPStatus at
0 — a successful push was logging "http=0", which reads like an opaque failure
and masked real false-success classes.

Tests: VoIP push carries an expiration within the ring-window cap; alert push
carries none. push package green.
2026-06-12 17:49:44 +03:00
anonpenguin23
33600092a8 fix(auth): bounded single-use refresh-token reuse grace (#125)
A lost rotation response strands the client on a just-revoked token: the retry
hits res.Count==0 → genuine 401 → SIWE, which is impossible on a VoIP-woken
locked screen, so the call dies. This recurred under the reconnect storms from
today's gateway rolls.

Add an RFC 9700 §4.13.2 reuse grace: a refresh token revoked within 60s whose
grace_used_at is still NULL is accepted ONCE more and mints a fresh session.
The grace path skips the revoke CAS (the token is already revoked — the CAS
would 0-match and mis-fire the replay tripwire) and is locked instead by a
single-use CAS on grace_used_at, so a stolen token can't be replayed at
leisure. The window predicate is repeated on the CAS to close the
SELECT→UPDATE TOCTOU, and the grace SELECT excludes expired tokens.

Security (found + fixed in review): explicit revocation (RevokeToken /
/v1/auth/logout) now also stamps grace_used_at, so a deliberately-logged-out
token can never be grace-recovered — closes a logout-bypass where a just-
revoked token would otherwise be resurrectable for 60s. Transient rqlite
errors on the grace lookup/CAS surface as 503 (retryable), not 401, preserving
the #125 transient-vs-genuine distinction.

Migration 032 adds grace_used_at (additive ALTER, rolling-safe; NULL = grace
available, the window predicate keeps historically-revoked tokens ineligible).

Dual-reviewed: code-quality APPROVED; security SECURE after the logout-bypass
fix. Tests: lost-response recovery, single-use second-attempt 401, genuine bad
token 401, and the logout-bypass regression.
2026-06-12 17:42:36 +03:00
anonpenguin23
f3145f1b90 release: 0.122.51 v0.122.51-nightly 2026-06-12 16:48:21 +03:00
anonpenguin23
6f5b2db95e fix(sdk): surface typed SDKError on network/timeout failures (#129)
The HTTP client re-threw the raw platform error on a transport failure, so
callers (e.g. AnChat's JwtSession driving client.auth.refresh/challenge) could
only tell "couldn't reach the gateway" from a real HTTP error by string-
matching `TypeError: Network request failed`. The typed SDKError was built
only for the onNetworkError callback, never for the thrown error, and native
errors weren't retryable so they bubbled raw.

normalizeError() now maps a fetch failure -> SDKError{code:"NETWORK_ERROR",
httpStatus:0} and a timeout AbortError -> {code:"TIMEOUT",httpStatus:0}, and
that typed error is thrown to the caller. Genuine HTTP errors pass through
unchanged. httpStatus:0 is the stable "no HTTP response received" signal to
branch on; the original platform message is preserved for diagnostics.

Deliberately NOT auto-retrying network failures: a blind retry on a non-
idempotent POST like /v1/auth/refresh could burn the rotated refresh token on
a lost response. The SDK now just gives the app a typed error to drive its own
retry/failover.

Tests: tests/unit/http/network-errors-bug-129.test.ts (TypeError->NETWORK_ERROR,
AbortError->TIMEOUT, real 401 passes through, callback gets the typed error).
Full unit suite green, typecheck clean.
2026-06-12 16:44:37 +03:00
anonpenguin23
e7ed718965 fix(namespace): don't silently disable TURN on unresolvable WebRTC secret (#130)
A freshly-joined namespace node could come up with TURN silently disabled
(turn.credentials -> namespace_not_configured) when GetWebRTCConfig errored:
the stored TURN shared secret was encrypted with a pre-rotation
cluster-secret-derived key and failed to decrypt, and the converge swallowed
that error into "WebRTC disabled", writing a TURN-disabled gateway config.

Distinguish "resolved but not enabled" (genuinely disabled, fine to write the
empty block) from "unresolved" (DB/decrypt error). chooseRestoreWebRTC's
dbFetch callback now returns a `resolved` bool; an unresolved lookup forces
enabled=false AND sets restoreWebRTC.unresolved. The converge then:
  - logs the decrypt/read error loudly with the exact remediation
    (`orama namespace disable webrtc` then `enable webrtc`) instead of
    swallowing it;
  - on the warm path, SKIPS ReconcileGateway so it doesn't rewrite a running
    gateway's working WebRTC block to empty (preserves TURN);
  - on the cold path, still spawns the gateway (the namespace needs one) but
    warns loudly that TURN is degraded until the secret is regenerated.

Healthy nodes are unaffected: a node whose state file holds the secret
short-circuits before dbFetch, so a flaky/rotated DB cannot disable it.
Dual-reviewed (code-quality APPROVED, security SECURE — no secret material is
logged; operator disable still resolves to disabled, not unresolved).

Pure-function coverage in restore_webrtc_test.go: unresolved marking,
resolved-empty-is-disabled, and state-secret-wins-over-db-error.
2026-06-12 16:44:25 +03:00
anonpenguin23
021d362b2f release: 0.122.50 v0.122.50-nightly 2026-06-12 10:14:00 +03:00
anonpenguin23
4d700aed54 feat(gateway): enforce jwt expiry on persistent websockets
- implement `wsJWTExpired` to validate token lifetime with a grace period
- capture jwt expiry at connection upgrade and update via auth.refresh
- close connections with custom code 4401 when tokens expire to force re-auth
- add unit tests to verify expiry logic and state transitions
2026-06-12 10:12:21 +03:00
anonpenguin23
d113b75497 feat(auth): refresh-token custom claims hook (#548)
Custom JWT claims survive token refresh: migration 031 adds the
custom-claims column to refresh tokens, the new gateway ClaimsProvider
re-resolves claims on refresh, and the serverless invoke path carries
them through. Includes refresh-rotation, WS-JWT middleware, and
claims-provider test coverage.
v0.122.49-nightly
2026-06-12 08:05:27 +03:00
anonpenguin23
8472861ed3 merge main into nightly — keep nightly (staging) on all conflicts
Reconciles the divergence created by the May-13 nightly history rewrite
(main absorbed pre-rewrite SHAs via PRs #90-92). Content conflicts all
resolve in nightly's favor; nightly is the deployed, verified branch
(v0.122.47 live on devnet).
2026-06-11 17:32:37 +03:00
anonpenguin23
cd8c717363 chore(version): bump to 0.122.47
- refactor(turn): extract decodeTURNConfig for testability
- feat(turn): add stealth domain fields to config
- fix(apns): nest custom data under "body" for expo-notifications compatibility
v0.122.47-nightly
2026-06-11 11:45:12 +03:00
anonpenguin23
f4c58db710 release: 0.122.46 v0.122.46-nightly 2026-06-11 10:06:19 +03:00
anonpenguin23
8375d92109 feat(namespace): reuse caddy wildcard certificate for stealth turns
- Implement `resolveStealthCert` to use existing `*.<baseDomain>` wildcard certificates instead of dynamic Caddyfile provisioning.
- Avoids EROFS errors caused by `ProtectSystem=strict` on the orama-node service.
- Add strict validation to ensure stealth hosts are single-label subdomains covered by the wildcard.
2026-06-11 10:04:45 +03:00
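The strict validation mentioned in 8375d92109 (a stealth host must be a single-label subdomain covered by `*.<baseDomain>`) can be sketched as a pure function. `coveredByWildcard` is a hypothetical name and the normalization rules (case folding, trailing dot) are assumptions.

```go
package main

import "strings"

// coveredByWildcard reports whether host is exactly one label below
// baseDomain, which is the only shape a *.<baseDomain> certificate covers.
// Deeper names such as a.b.example.com and the bare base domain are rejected.
func coveredByWildcard(host, baseDomain string) bool {
	host = strings.ToLower(strings.TrimSuffix(host, "."))
	base := strings.ToLower(strings.TrimSuffix(baseDomain, "."))
	if base == "" {
		return false
	}
	suffix := "." + base
	if !strings.HasSuffix(host, suffix) {
		return false
	}
	label := strings.TrimSuffix(host, suffix)
	return label != "" && label != "*" && !strings.Contains(label, ".")
}
```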
anonpenguin23
37daf28b5a release: 0.122.45 v0.122.45-nightly 2026-06-11 08:00:31 +03:00
anonpenguin23
b425f80efb fix(config): add sni_router to root Config — prevents feat-124 boot crash
b9d5f54 (stealth TURN discovery) emits a top-level `sni_router:` block
into node.yaml unconditionally, but only added a lenient ad-hoc parse
in the carry-forward logic — not the field on config.Config that
orama-node strict-decodes (KnownFields(true)) at boot. Identical
failure mode to the v0.122.42 secrets_encryption_key incident: the
unknown key fails the whole node.yaml parse and orama-node crash-loops.

Caught pre-deploy this time by the strict-decode gate check; devnet
never saw it. Regression test added alongside the v0.122.42 one in
decode_test.go.
2026-06-11 08:00:31 +03:00
anonpenguin23
b9d5f542e1 feat(gateway): implement stealth TURN discovery and configuration
- Add `turn_stealth_domain` to gateway config for stealth TURN support
- Introduce `turn_discovery` in `sni-router` to auto-discover per-namespace routes
- Add database migration to enable stealth TURN per namespace
- Document ephemeral state API in `SERVERLESS.md`
2026-06-11 07:04:50 +03:00
anonpenguin23
f192cd0b84 release: 0.122.44 v0.122.44-nightly 2026-06-10 12:13:25 +03:00
anonpenguin23
ff3e273da8 feat(gateway): implement persistent secrets and webrtc configuration
- add `secrets_encryption_key` to gateway config for serverless secrets
- implement durable TURN secret persistence to prevent config regen outages
- add regression test for gateway config loading and field mapping
2026-06-10 12:10:52 +03:00
anonpenguin23
4c631243b3 release: 0.122.43 v0.122.43-nightly 2026-06-09 15:57:33 +03:00
anonpenguin23
e685c864fc fix(config): add secrets_encryption_key to HTTPGatewayConfig — fixes orama-node boot crash
v0.122.42 (f412425, secrets encryption) shipped the template emission,
the per-cluster secret generator, and the gateway.Config consumer — but
NOT the parse field on config.HTTPGatewayConfig. Phase 4 writes
`secrets_encryption_key` into node.yaml under the http_gateway section,
and pkg/config/yaml.go decodes with KnownFields(true) (strict). The
unknown field made every node.yaml parse fail, so orama-node exited 1
on every start and systemd crash-looped it (restart counter hit 380+ on
the first upgraded devnet node before the rolling controller halted).

Root cause: a generated-config field with no matching struct field under
strict unmarshal. Fix is the missing field. The runtime key itself is
still consumed from ~/.orama/secrets/secrets-encryption-key (pkg/node/
gateway.go), which already worked — so this one-field addition fully
restores boot AND the feature.

The standalone gateway (cmd/gateway/config.go) uses lenient parsing and
was unaffected.

Regression test in pkg/config/decode_test.go decodes a node.yaml
carrying secrets_encryption_key under strict mode.
2026-06-09 15:57:32 +03:00
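The failure mode in e685c864fc (and again in b425f80efb) is a generated field with no matching struct field under strict decoding. The sketch below reproduces it with the standard library's JSON decoder and `DisallowUnknownFields` as a stand-in for YAML `KnownFields(true)`. That substitution is deliberate, to keep the example dependency-free, and the struct shapes and the `listen_addr` field are hypothetical.

```go
package main

import (
	"bytes"
	"encoding/json"
)

// configBefore models the struct as shipped: the generated config carries
// secrets_encryption_key, but there is no field to receive it.
type configBefore struct {
	ListenAddr string `json:"listen_addr"`
}

// configAfter models the one-field fix.
type configAfter struct {
	ListenAddr           string `json:"listen_addr"`
	SecretsEncryptionKey string `json:"secrets_encryption_key"`
}

// decodeStrict rejects any key that has no matching struct field, so a
// single unknown key fails the whole document.
func decodeStrict(data []byte, out interface{}) error {
	dec := json.NewDecoder(bytes.NewReader(data))
	dec.DisallowUnknownFields()
	return dec.Decode(out)
}
```

A regression test in this style decodes a document containing every key the generator can emit, which catches the mismatch before a node ever boots with it.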
anonpenguin23
b6b518e005 release: 0.122.42 v0.122.42-nightly 2026-06-09 13:01:38 +03:00
anonpenguin23
f41242538e feat(serverless): add raw http response mode and secrets encryption
- Add `raw_http_response` configuration to functions to allow verbatim HTTP responses
- Implement cluster-wide secrets encryption key generation and distribution for serverless functions
- Update documentation with UnifiedPush support for ntfy on Android/GrapheneOS
2026-06-09 13:01:02 +03:00
anonpenguin23
aa04ab5f50 release: 0.122.41 v0.122.41-nightly 2026-06-09 09:24:59 +03:00
anonpenguin23
f8de4af704 feat(sni-router): implement hot-reloading for route configuration
- Add `FileRouteReloader` to watch and atomically update routes from disk
- Refactor `main` to support seamless configuration updates without restarts
- Ensure existing routes are preserved if a reload encounters an error
2026-06-09 09:23:54 +03:00
anonpenguin23
32f7b3824e release: 0.122.40 v0.122.40-nightly 2026-06-04 10:08:59 +03:00
anonpenguin23
eade6e1742 feat(pubsub): remove mesh formation wait and add publish rate limiting
- Remove the 2-second polling wait for gossipsub mesh formation in `Publish`
  to eliminate unnecessary latency, relying on `FloodPublish` for delivery.
- Introduce a per-invocation publish budget (1000 messages) to prevent
  potential flooding of the shared gossipsub router by WASM functions.
- Add regression tests to ensure `Publish` remains non-blocking and that
  the publish budget is strictly enforced.
2026-06-04 10:08:10 +03:00
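A per-invocation publish budget like the one in eade6e1742 can be sketched as a simple counter checked before each publish. `publishBudget`, `take`, and `errBudgetExceeded` are hypothetical names; the limit of 1000 comes from the commit message and is supplied by the caller here.

```go
package main

import (
	"errors"
	"sync"
)

var errBudgetExceeded = errors.New("publish budget exceeded")

// publishBudget limits how many messages one function invocation may publish.
type publishBudget struct {
	mu        sync.Mutex
	remaining int
}

func newPublishBudget(limit int) *publishBudget {
	return &publishBudget{remaining: limit}
}

// take consumes one unit of budget, or returns errBudgetExceeded without
// consuming anything once the budget is spent.
func (b *publishBudget) take() error {
	b.mu.Lock()
	defer b.mu.Unlock()
	if b.remaining <= 0 {
		return errBudgetExceeded
	}
	b.remaining--
	return nil
}
```

Creating a fresh budget per invocation, rather than per function, keeps one runaway call from starving later calls of the same function.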
anonpenguin23
f3875d5157 release: 0.122.39 v0.122.39-nightly 2026-06-02 15:06:48 +03:00
anonpenguin23
9373c2ad92 feat(rqlite,serverless): add local read consistency and async invocation
- Introduce `BatchQueryConsistency` with `ReadConsistencyNone` to allow
  local SQLite reads, bypassing leader round-trips for performance.
- Add `function_invoke_async` host function to support non-blocking
  fire-and-forget function execution.
2026-06-01 19:59:30 +03:00
anonpenguin23
b2a3bff88c release: 0.122.38 v0.122.38-nightly 2026-06-01 10:13:15 +03:00
anonpenguin23
ca4ccbfcd4 feat(gateway): decouple turn credentials and sfu route registration
- split webrtc route gating into `webrtcServeTURNCredentials` and `webrtcServeSFURoutes` to allow non-SFU gateways to mint TURN credentials
- update `chooseRestoreWebRTC` to correctly resolve configurations for nodes without local SFU ports
- add unit tests to verify independent route registration logic (bugboard #25)
2026-06-01 10:12:07 +03:00
anonpenguin23
a3cf8384e9 release: 0.122.37 v0.122.37-nightly 2026-05-30 19:27:25 +03:00
anonpenguin23
bf0d5f9f9f feat(namespace): implement warm reconciliation for gateway webrtc config
- Add logic to reconcile gateway configuration drift for running instances
- Prevent unnecessary restart loops by verifying on-disk config state
- Add unit tests to validate synchronization logic and prevent regressions
2026-05-30 19:26:26 +03:00
anonpenguin23
3987ad0cf3 release: 0.122.36 v0.122.36-nightly 2026-05-30 14:41:51 +03:00
anonpenguin23
4fc975216f feat(gateway): fix WebRTC config persistence and endpoint access
- Add internal WebRTC management endpoints to public path exemption list
- Implement DB fallback for WebRTC configuration during cluster restore
- Add unit tests to verify WebRTC config precedence and state self-healing
2026-05-30 14:39:39 +03:00