32 Commits

Author SHA1 Message Date
anonpenguin23
251630a5c7 fix(serverless): per-call invCtx propagation prevents cross-tenant identity leak in persistent WS
HostFunctions is a process-wide singleton (one per gateway engine).
Its `invCtx` field is shared across all WASM instances. For STATELESS
execution the executor sets/clears it per-call but the lock is
released before WASM runs — two concurrent invocations can race on
the field and one's host call can read the other's identity. Window
is microseconds.

For PERSISTENT WS the bug was much worse: invCtx used to be bound
ONCE at instantiation and reused for the connection's lifetime. Two
simultaneous persistent WS connections from different namespaces /
wallets overwrote each other's invCtx, and EVERY subsequent
function_invoke / GetCallerJWTSubject / GetCallerWallet / GetSecret
call from inside the WASM read whatever was bound LAST. Result:
silent identity leak across tenants for as long as the connections
overlapped.

Fix: per-call invCtx propagation through Go's context.Context.
wazero passes the ctx given to api.Function.Call through to host
function callbacks, so every WASM-host hop carries its own invCtx.

- pkg/serverless/invocation_context.go (new): WithInvocationContext +
  InvocationContextFromCtx helpers using an unexported invCtxKey.
- pkg/serverless/hostfunctions/invocation_context.go (new):
  currentInvocationContext(ctx) — ctx-attached invCtx wins over the
  singleton field.
- All host accessors (FunctionInvoke, GetEnv, GetSecret, GetRequestID,
  GetCallerWallet, GetWSClientID, GetCallerClaim, GetCallerJWTSubject)
  now route through currentInvocationContext(ctx).
- pkg/serverless/persistent/instance.go: every export call's ctx is
  wrapped with the per-instance invCtx before being passed to wazero.
- pkg/gateway/handlers/serverless/ws_persistent_handler.go: invCtx is
  built per-frame and attached to ctx, not stored on a shared field.
- pkg/serverless/engine.go: removed the SetInvocationContext call at
  InstantiatePersistent (no longer needed; ctx carries it).

Stateless still uses the singleton field — its race has been latent
since the host-functions split, and migrating it is a separately
scoped change.

Tests:
- hostfunctions/invocation_context_test.go covers ctx-wins-over-singleton.
- gateway/handlers/serverless/ws_persistent_handler_test.go covers the
  per-frame ctx wiring.
- cli/functions/build_test.go is new coverage for the build path
  touched in this change.

VERSION bumped to 0.122.24.
2026-05-15 13:36:35 +03:00
anonpenguin23
80b466af68 fix(serverless): override WASI proc_exit so command-mode persistent WS stays alive
The previous fix (v0.122.22) made `InstantiatePersistent` call `_start`
to bootstrap TinyGo's runtime, then catch the resulting ExitError(0).
That got past init, but the module STILL died — wazero's stock
`proc_exit` implementation calls `mod.CloseWithExitCode(exitCode)`
before panicking, which invalidates the module regardless of what
the caller does with the panic. Every subsequent call to ws_open /
ws_frame / ws_close / orama_alloc returned ExitError(0) ("module
already closed").

Wazero exposes no flag for this — the close is hard-coded. The only
intercept point is to override `proc_exit` at the WASI host-module
boundary. Documented pattern at imports/wasi_snapshot_preview1/wasi.go
lines 111-127.

Fix: build the WASI host module manually so we can override
`proc_exit`:

  - exit code 0 → panic ExitError(0) BUT do NOT close the module.
    This is TinyGo's "_start completed cleanly" signal; the module's
    other exports must stay callable for the persistent lifecycle.
  - exit code != 0 → preserve standard WASI behavior (close + panic).
    A non-zero exit is a genuine app-signaled failure; we want
    `proc_exit(N != 0)` to behave exactly as upstream does.
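
The two-branch policy above can be sketched with local stand-ins. Nothing here calls wazero: exitError, module, procExit, and runStart are hypothetical names that only model the close-or-keep-open decision and the panic the caller recovers.

```go
package main

import "fmt"

// exitError is a local stand-in for the exit error the runtime panics with.
type exitError struct{ code uint32 }

func (e *exitError) Error() string {
	return fmt.Sprintf("module exited with code %d", e.code)
}

// module is a hypothetical stand-in for an instantiated WASM module.
type module struct{ closed bool }

func (m *module) closeWithExitCode(uint32) { m.closed = true }

// procExit mirrors the override policy: always panic with the exit error,
// but close the module only when the exit code is non-zero.
func procExit(m *module, code uint32) {
	if code != 0 {
		m.closeWithExitCode(code)
	}
	panic(&exitError{code: code})
}

// runStart invokes procExit the way `_start` would and converts the exit
// panic back into an error. Any other panic is re-raised untouched.
func runStart(m *module, code uint32) (err error) {
	defer func() {
		r := recover()
		if r == nil {
			return
		}
		e, ok := r.(*exitError)
		if !ok {
			panic(r)
		}
		err = e
	}()
	procExit(m, code)
	return nil
}

func main() {
	clean, failed := &module{}, &module{}
	errClean := runStart(clean, 0)
	errFailed := runStart(failed, 3)
	fmt.Println(errClean, "closed:", clean.closed)
	fmt.Println(errFailed, "closed:", failed.closed)
}
```

Both calls return an exit error; only the non-zero case leaves the stand-in module closed, so later export calls would still be possible after a clean exit.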

The InstantiatePersistent caller already distinguishes the two cases
via errors.As + ExitCode() check — added in v0.122.22, no change here.

Safe for stateless functions on the same runtime: the stateless
execution path closes its own module after each invocation, so the
"module stays alive on exit 0" override has no effect on that path.

VERSION bumped to 0.122.23.
2026-05-15 11:56:29 +03:00
anonpenguin23
6a0043a244 fix(serverless): bootstrap TinyGo runtime in persistent WS instances (#240/#249)
InstantiatePersistent passed WithStartFunctions() with no args,
explicitly disabling both WASI entry points. The intent was to skip
main(); the side effect was leaving the TinyGo runtime
uninitialized. The first call to any export traps via
wasmExportCheckRun and managed-memory ops panic. Every persistent WS
function was effectively dead since plan #06 landed.

An earlier patch in this thread restored the call but only handled
wasi-reactor builds (_initialize). AnChat's rpc-router is a wasi
command build (`_start` export only, no `_initialize`) — wasm-objdump
confirms — so the reactor-only fix still left it broken.

This fix tries `_initialize` first, falls back to `_start`, and
bounds whichever runs with a 5s timeout so a buggy main() can't hang
instantiation forever. Logs the chosen hook at Debug, warns when
neither is exported.

Still pass WithStartFunctions() (no args) so wazero doesn't
auto-call `_start` during InstantiateModule — we want full control
over which hook runs and the timeout that bounds it.
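
A rough sketch of the hook selection and timeout bound, assuming exports can be modelled as a map of plain Go functions. The hook type, pickHook, bootstrap, okHook, and hangingHook are all hypothetical names; the real code calls wazero exports instead.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// hook is a stand-in for an exported WASM entry point.
type hook func(ctx context.Context) error

var errNoHook = errors.New("neither _initialize nor _start is exported")

// pickHook prefers the reactor hook and falls back to the command hook.
func pickHook(exports map[string]hook) (string, hook) {
	for _, name := range []string{"_initialize", "_start"} {
		if fn, ok := exports[name]; ok {
			return name, fn
		}
	}
	return "", nil
}

// bootstrap runs the chosen hook and gives up after timeout. The hook is
// expected to honor ctx cancellation; one that ignores it would leak the
// goroutine in this sketch.
func bootstrap(exports map[string]hook, timeout time.Duration) (string, error) {
	name, fn := pickHook(exports)
	if fn == nil {
		return "", errNoHook
	}
	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel()

	done := make(chan error, 1)
	go func() { done <- fn(ctx) }()

	select {
	case err := <-done:
		return name, err
	case <-ctx.Done():
		return name, fmt.Errorf("%s did not finish within %s: %w", name, timeout, ctx.Err())
	}
}

// okHook returns immediately, like a well-behaved runtime init.
func okHook(context.Context) error { return nil }

// hangingHook models a buggy main() that blocks until it is cancelled.
func hangingHook(ctx context.Context) error {
	<-ctx.Done()
	return ctx.Err()
}

func main() {
	name, err := bootstrap(map[string]hook{"_start": okHook}, 5*time.Second)
	fmt.Println(name, err)
	name, err = bootstrap(map[string]hook{"_start": hangingHook}, 50*time.Millisecond)
	fmt.Println(name, err)
}
```

The 5s value in main matches the bound stated above; the 50ms value is only there to make the hanging case finish quickly when the sketch is run.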

VERSION bumped to 0.122.22.
2026-05-15 10:40:27 +03:00
anonpenguin23
62a8fbf2df fix(serverless): registry read paths now load WS persistent metadata (#240/#249)
Register() writes the four ws_* columns (ws_persistent,
ws_idle_timeout_sec, ws_max_frame_bytes, ws_max_inflight_per_conn) to
the functions table, but every read path — Get, List, GetByID,
GetByNameInternal — silently dropped them from the SELECT. functionRow
had no fields for them either. Result: fn.WSPersistent was always the
zero value (false) at runtime, no matter what the DB row said. Every
WS function ran in per-frame stateless mode regardless of its
`ws_persistent: true` config.

AnChat's rpc-router was the canary: it relies on per-connection
instance state (request_id ↔ reply correlation, subscription
bookkeeping) that the stateless model destroys every frame. The
gateway telemetry envelope still reached the client
({request_id, status, duration_ms}) so the failure looked like
"function works, frames don't" — every RPC timed out at 15 s.

Fix: include the four columns in every SELECT, add the matching
functionRow fields, and copy them into Function in rowToFunction.
No schema change (columns have been in migration 011 from the start).
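
For illustration, the shape of the fix might look like the sketch below. The four column names are taken from this message; the Go field types, the Name field, and the `name` column in the printed query are assumptions, since the real functionRow and Function definitions are not shown here.

```go
package main

import "fmt"

// wsColumns lists the four columns every read path has to select.
const wsColumns = "ws_persistent, ws_idle_timeout_sec, ws_max_frame_bytes, ws_max_inflight_per_conn"

// functionRow mirrors one scanned DB row (field types are assumed).
type functionRow struct {
	Name                 string
	WSPersistent         bool
	WSIdleTimeoutSec     int
	WSMaxFrameBytes      int
	WSMaxInflightPerConn int
}

// Function is the runtime view handed to the engine (assumed shape).
type Function struct {
	Name                 string
	WSPersistent         bool
	WSIdleTimeoutSec     int
	WSMaxFrameBytes      int
	WSMaxInflightPerConn int
}

// rowToFunction copies every ws_* field. Leaving any of them out silently
// yields the zero value at runtime, which is the failure described above.
func rowToFunction(r functionRow) Function {
	return Function{
		Name:                 r.Name,
		WSPersistent:         r.WSPersistent,
		WSIdleTimeoutSec:     r.WSIdleTimeoutSec,
		WSMaxFrameBytes:      r.WSMaxFrameBytes,
		WSMaxInflightPerConn: r.WSMaxInflightPerConn,
	}
}

func main() {
	fmt.Println("SELECT name, " + wsColumns + " FROM functions")
	fn := rowToFunction(functionRow{Name: "rpc-router", WSPersistent: true, WSIdleTimeoutSec: 300})
	fmt.Printf("%+v\n", fn)
}
```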

Regression tests in registry_ws_columns_test.go cover the Get / List
paths against an in-memory SQLite that mirrors the production DDL.

VERSION bumped to 0.122.21.
2026-05-15 09:01:42 +03:00
anonpenguin23
a0a1decd06 fix(ws): prefer X-Forwarded-Host in Origin check — root cause #240/#249
handleNamespaceGatewayRequest rewrites r.Host to the backend target
IP:port (e.g. "10.0.0.6:10004") before forwarding. The original
public host (e.g. "ns-anchat-test.orama-devnet.network") is preserved
in X-Forwarded-Host. checkWSOrigin in both pubsub/ws_client.go and
serverless/ws_handler.go was comparing the client's Origin against
the proxied r.Host only — so every browser / RN-iOS WS upgrade was
rejected 403 because their Origin's public hostname can never match
10.0.0.6.

curl probes don't send Origin, so the check returned true
unconditionally for them and the bug was invisible to operator smoke
tests. AnChat's iPhone
WS clients hit `code=1006 reason="Received bad response code from
server: 403"` for ~24h.

Fix: prefer X-Forwarded-Host (the original public host) when present,
fall back to r.Host for direct (non-proxied) connections. Applied
identically to both WS handlers. Regression test in
serverless/ws_origin_test.go covers the proxy-hop case, no-Origin
case, and direct-connection case.
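
A hedged sketch of the host selection, not the actual handler code. effectiveHost, originAllowed, and hostnameOnly are hypothetical helpers, and comparing hostnames case-insensitively while ignoring the port is an assumption about the matching rule.

```go
package main

import (
	"fmt"
	"net"
	"net/http"
	"net/url"
	"strings"
)

// effectiveHost prefers the original public host recorded by the proxy hop
// and falls back to the request host for direct connections.
func effectiveHost(forwardedHost, host string) string {
	if fh := strings.TrimSpace(forwardedHost); fh != "" {
		return fh
	}
	return host
}

// hostnameOnly strips an optional :port suffix.
func hostnameOnly(hostport string) string {
	if h, _, err := net.SplitHostPort(hostport); err == nil {
		return h
	}
	return hostport
}

// originAllowed accepts requests with no Origin (non-browser clients such
// as curl) and otherwise requires the Origin hostname to match the host.
func originAllowed(origin, forwardedHost, host string) bool {
	if origin == "" {
		return true
	}
	u, err := url.Parse(origin)
	if err != nil || u.Host == "" {
		return false
	}
	return strings.EqualFold(u.Hostname(), hostnameOnly(effectiveHost(forwardedHost, host)))
}

// checkWSOrigin adapts the string-based check to an incoming request.
func checkWSOrigin(r *http.Request) bool {
	return originAllowed(r.Header.Get("Origin"), r.Header.Get("X-Forwarded-Host"), r.Host)
}

func main() {
	const public = "ns-anchat-test.orama-devnet.network"
	fmt.Println(originAllowed("https://"+public, public, "10.0.0.6:10004")) // proxy hop
	fmt.Println(originAllowed("https://"+public, "", "10.0.0.6:10004"))     // old behavior
	fmt.Println(originAllowed("", "", "10.0.0.6:10004"))                    // curl-style probe
}
```

X-Forwarded-Host is only trustworthy when the upstream proxy sets or overwrites it, so a check like this assumes the handler is reachable only through that proxy or directly from trusted peers.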

This is the real fix; v0.122.19 only closed a separate silent-forward
auth hole that produced opaque 401s on a different code path.

VERSION bumped to 0.122.20.
2026-05-15 07:03:28 +03:00
anonpenguin23
872c553d1c fix(gateway): namespace-proxy rejects unauthed requests at main, logs WS audit
Root-cause hardening for bug #240 and #249's "intermittent 401 over WS"
reports. handleNamespaceGatewayRequest previously had a third code
path beyond "auth ok" and "auth error": when validateAuthForNamespaceProxy
returned empty namespace AND empty error (i.e. "no credentials found"),
the request fell through to a silent forward to the namespace gateway
WITHOUT internal-auth headers. The namespace gateway then rejected
with 401 "missing API key" in ~60µs.

From the client's perspective: opaque 401.
From our side: only the namespace gateway logged it, and that tier
can't validate API keys (they live in the main cluster RQLite), so
the operator had no signal that the main gateway had even seen the
request. AnChat's intermittent 401-on-WS reports went unsolved for
this exact reason.

Fix:
- Explicit reject at main when no credentials extracted AND path
  isn't public. Returns 401 with WWW-Authenticate: Bearer realm and a
  clear message naming the three accepted credential sources.
- Rich structured logging on every WS upgrade auth outcome: presence
  of api_key/token/jwt query params, Authorization + X-API-Key
  headers, Connection/Upgrade headers, Origin, User-Agent, client IP,
  raw query length. Steady-state stays low-noise: success path logs at
  debug, reject paths log at warn.
- Namespace-mismatch reject (existing branch) now also logs.

VERSION bumped to 0.122.19.
2026-05-14 17:53:38 +03:00
anonpenguin23
5c1404849b fix(#72): correct ntfy upstream checksum URL
Upstream publishes the checksums asset as a plain "checksums.txt" at
the release root, not "ntfy_<VER>_checksums.txt". The version-prefixed
URL we were constructing 404'd, so InstallNtfy bailed in the
download-binary step and ntfy never landed even after we wired
InstallNtfy into the pre-built install path.

Verified against the v2.11.0 release assets list. If a future version
changes the naming convention, the install will 404 loud and this URL
gets bumped in the same PR as ntfyVersion.

VERSION bumped to 0.122.18.
2026-05-14 14:29:24 +03:00
anonpenguin23
7e47f42f91 fix(#72): install ntfy in pre-built path too — devnet path was missing it
Phase 2b auto-detects pre-built archive mode and routes to
installFromPreBuilt(). That path copies bundled binaries (caddy, orama,
gateway, …) into place but never called InstallNtfy() — because ntfy
is downloaded from upstream github, not bundled. Result: on devnet
(which always uses pre-built mode), ntfy never installed even though
the always-on code path in installFromSource() was correctly wired up.

Fix: add InstallNtfy() call to installFromPreBuilt right after the
binary deploy + setCapabilities steps, before disableResolvedStub
runs. Ordering matters because Phase 4's ConfigureNtfy chowns
/etc/ntfy/server.yml to the ntfy user, which needs to exist.

VERSION bumped to 0.122.17.
2026-05-14 12:16:28 +03:00
anonpenguin23
8b4abb7eef feat(#72): install ntfy on every node, drop --with-ntfy gating
ntfy is now part of the standard node install, just like Caddy. The
binary, /etc/ntfy/server.yml, and the Caddy push.<dnsZone> reverse-
proxy block are written unconditionally on every node, and the
ntfy.service starts as part of the standard service order.

Why uniform: ntfy listens on 127.0.0.1:NtfyListenPort only, reachable
exclusively via the local Caddy reverse-proxy block. Nodes that don't
serve a public push.* DNS entry just have an idle ntfy with no
inbound traffic — zero operational cost, zero attack surface change.
Removing the flag means no per-node toggling, no preference drift
between nodes, no "did we remember to set --with-ntfy" mistakes when
DNS topology changes (e.g. promoting a node to nameserver later).

Removed:
- NodePreferences.NtfyHost (yaml: ntfy_host)
- ProductionSetup.isNtfyHost field, SetNtfyHost, IsNtfyHost
- install/flags.go --with-ntfy + NtfyHost field
- upgrade/flags.go --with-ntfy + NtfyHost field + isFlagPassed helper
  (was only used for --with-ntfy tri-state semantics)
- upgrade/orchestrator.go preference-load and persist for ntfy
- upgrade/remote.go --with-ntfy forwarding

Phase 2 always calls InstallNtfy.
Phase 4 always calls EnableCaddyNtfyProxy + ConfigureNtfy.
Phase 5 always enables ntfy.service.
Phase 5b always starts ntfy.service.

VERSION bumped to 0.122.16.
2026-05-14 11:51:08 +03:00
anonpenguin23
8c37ef547e fix(upgrade): forward per-node flags to remote so --with-ntfy actually lands
`orama node upgrade --node <ip> --with-ntfy --restart` parsed the flag
locally but `upgradeNode()` ran a hardcoded
`orama node upgrade --restart` on the remote — dropping --with-ntfy,
--nameserver, --force, and --skip-checks on the floor. The remote
orchestrator then read the SAVED preference (or default false for
nameserver/ntfy), so operator overrides like enabling ntfy on a
nameserver were silently ignored. Bug surfaced in devnet today:
running --with-ntfy reported success but ntfy was never installed.

Fix forwards the four passthrough flags to the remote command,
preserving the tri-state semantics for the pointer flags (nil = honor
saved preference; non-nil = explicit override).
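
One way to picture the forwarding, as a sketch only. The upgradeFlags struct, remoteArgs, and the `--flag=true|false` spelling for the pointer flags are assumptions; the commit names the four flags but not how they are serialized.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// upgradeFlags is a hypothetical view of the locally parsed flags.
// Pointer fields are tri-state: nil means "honor the saved preference".
type upgradeFlags struct {
	Restart    bool
	Force      bool
	SkipChecks bool
	Nameserver *bool
	WithNtfy   *bool
}

func boolPtr(b bool) *bool { return &b }

// remoteArgs rebuilds the remote command line from the local flags.
// A nil pointer emits nothing, so the remote keeps its saved preference.
func remoteArgs(f upgradeFlags) []string {
	args := []string{"orama", "node", "upgrade"}
	if f.Restart {
		args = append(args, "--restart")
	}
	if f.Force {
		args = append(args, "--force")
	}
	if f.SkipChecks {
		args = append(args, "--skip-checks")
	}
	if f.Nameserver != nil {
		args = append(args, "--nameserver="+strconv.FormatBool(*f.Nameserver))
	}
	if f.WithNtfy != nil {
		args = append(args, "--with-ntfy="+strconv.FormatBool(*f.WithNtfy))
	}
	return args
}

// remoteCommand joins the arguments for display or logging.
func remoteCommand(f upgradeFlags) string {
	return strings.Join(remoteArgs(f), " ")
}

func main() {
	fmt.Println(remoteCommand(upgradeFlags{Restart: true}))
	fmt.Println(remoteCommand(upgradeFlags{Restart: true, WithNtfy: boolPtr(true)}))
	fmt.Println(remoteCommand(upgradeFlags{Restart: true, Nameserver: boolPtr(false)}))
}
```

The third call shows why a plain bool is not enough: an explicit false has to be forwarded, while an unset flag must stay silent.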

VERSION bumped to 0.122.15.
2026-05-14 11:44:47 +03:00
anonpenguin23
07638354d2 feat(#72): full-privacy push — self-hosted ntfy + APNs-direct provider
Migration 028: namespace_push_credentials
- Per-(namespace, provider) AES-256-GCM encrypted credential blob.
- Generic schema — apns/ntfy/expo/future plug in with zero migration.
- Separated from migration 026's namespace_push_config (preferences vs
  credentials, different access patterns).

pkg/push/credentials
- Manager + Registry + RQLite store; HKDF purpose "namespace-push-credentials"
  via pkg/secrets. Provider Validator interface for per-provider schema.

pkg/push/providers/apns
- Apple Push Notification service direct provider (no Expo proxy).
- Validator + dispatcher; credentials are p8 signing key + key_id + team_id.

pkg/push/providers/ntfy/credentials.go
- ntfy credential schema (auth_token + default topic). Used both with
  the public ntfy.sh and our self-hosted instance.

pkg/environments/production/installers/ntfy.go
- Self-hosted ntfy server installer. Binary, system user, hardened
  /etc/ntfy/server.yml, systemd unit. Listens on 127.0.0.1:NtfyListenPort
  only — Caddy is the only public path.

pkg/environments/production/installers/caddy.go
- Emit reverse_proxy block for push.<dnsZone> -> 127.0.0.1:NtfyListenPort
  when operator enables ntfy on a node.

CLI: install/upgrade orchestrators learn a new "ntfy" install/preserve
phase; flag gating in install/flags.go + upgrade/flags.go.

Gateway handlers/push/credentials_handler.go
- GET/PUT/DELETE /v1/namespace/push-credentials/{provider}.
- PUT validates against provider Validator before encrypting and storing.
- GET returns a redacted view (booleans + non-secret fields only).

Push manager: provider resolution now also consults
namespace_push_credentials before falling back to YAML defaults.

Docs: core/docs/PUSH_NOTIFICATIONS.md walks through end-to-end setup.

VERSION bumped to 0.122.14.
2026-05-14 10:48:00 +03:00
anonpenguin23
32a2a62e0d fix(caddy): disable HTTP/2 to keep WebSocket upgrade auth working (#249)
HTTP/2 forbids the `Connection: Upgrade` and `Upgrade: websocket`
headers per RFC 7540 §8.1.2.2. With h2 advertised at the listener,
ALPN negotiates h2 for TLS-capable clients, the WS-upgrade request
arrives at Caddy with those headers stripped, and Caddy forwards a
plain HTTP/1.1 GET to the gateway. The gateway's `isWebSocketUpgrade(r)`
then returns false, the `?api_key=` / `?jwt=` query-string WS-auth
fallback never runs, and clients see 401.
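
To make the failure chain concrete, here is a sketch of a typical upgrade detection. It is a hypothetical reimplementation written against http.Header, not the gateway's actual `isWebSocketUpgrade(r)`; headerHasToken and upgradeFromValues are illustrative helpers.

```go
package main

import (
	"fmt"
	"net/http"
	"strings"
)

// headerHasToken reports whether any comma-separated value equals token,
// ignoring case and surrounding whitespace.
func headerHasToken(values []string, token string) bool {
	for _, v := range values {
		for _, part := range strings.Split(v, ",") {
			if strings.EqualFold(strings.TrimSpace(part), token) {
				return true
			}
		}
	}
	return false
}

// isWebSocketUpgrade needs both hop-by-hop headers to be present. When they
// are missing, the request looks like a plain GET.
func isWebSocketUpgrade(h http.Header) bool {
	return headerHasToken(h.Values("Connection"), "upgrade") &&
		headerHasToken(h.Values("Upgrade"), "websocket")
}

// upgradeFromValues builds a header set from raw values for quick checks.
func upgradeFromValues(connection, upgrade string) bool {
	h := http.Header{}
	if connection != "" {
		h.Set("Connection", connection)
	}
	if upgrade != "" {
		h.Set("Upgrade", upgrade)
	}
	return isWebSocketUpgrade(h)
}

func main() {
	fmt.Println(upgradeFromValues("keep-alive, Upgrade", "websocket")) // HTTP/1.1 handshake
	fmt.Println(upgradeFromValues("", ""))                            // headers absent
}
```

With both headers absent the detection returns false, which is the point where the query-string auth fallback is skipped.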

RFC 8441 ("Bootstrapping WebSockets with HTTP/2") fixes this, but iOS
RN and most other mobile WS libraries don't implement it. Until they
do, h1 is the only protocol that keeps WS auth working.

Trade-off: lose h2 multiplexing on plain HTTP traffic. Acceptable for
an API gateway whose dominant workload is REST + WebSocket — neither
benefits much from h2 streams.

caddy_test.go adds a regression guard so anyone re-enabling h2 in the
listener protocols fails CI loud.

Also (separate, was uncommitted): pkg/cli/build/builder.go now reads
VERSION from the repo-root /VERSION file first, falling back to
parsing the Makefile only if absent. The previous Makefile-only path
broke after VERSION moved to /VERSION (Makefile got `$(shell cat ...)`
which the CLI builder pulled in literally).

VERSION bumped to 0.122.13.
2026-05-14 07:50:47 +03:00
anonpenguin23
fda47533c3 feat: per-namespace rate-limit self-service + WS JWT auth + release 0.122.12
Per-namespace rate-limit config (feature #69)
- Migration 027: new `namespace_rate_limit_config` table
  (namespace PK, requests_per_minute, burst, audit metadata).
- pkg/ratelimit: Manager + RQLite ConfigStore + types. Same pattern
  as the push config in bug #220's follow-up — LRU cache, invalidate
  on PUT/DELETE, falls back to YAML defaults when no row exists.
- pkg/gateway/handlers/ratelimit: GET/PUT/DELETE /v1/namespace/rate-limit.
  PUT requests are rejected if they exceed the operator's configured
  ceiling (MaxRequestsPerMinute / MaxBurst) — tenants self-serve but
  cannot raise their quota past the cap.
- pkg/gateway/rate_limiter.go: per-namespace lookup, default fallback.
- pkg/gateway/middleware.go: WS JWT middleware (middleware_ws_jwt_test.go).
- pkg/gateway/auth/service.go: refresh-token rotation hardening with
  regression test in refresh_rotation_test.go.
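
The PUT ceiling check for the rate-limit config could look roughly like this. Limits, Ceiling, and validatePut are hypothetical names; only MaxRequestsPerMinute and MaxBurst appear in this message, and rejecting non-positive values is an added assumption.

```go
package main

import "fmt"

// Limits is what a tenant asks for in the PUT body (assumed shape).
type Limits struct {
	RequestsPerMinute int
	Burst             int
}

// Ceiling is the operator-configured cap.
type Ceiling struct {
	MaxRequestsPerMinute int
	MaxBurst             int
}

// validatePut rejects requests above the operator ceiling so tenants can
// self-serve without raising their quota past the cap.
func validatePut(req Limits, ceil Ceiling) error {
	if req.RequestsPerMinute <= 0 || req.Burst <= 0 {
		return fmt.Errorf("requests_per_minute and burst must be positive")
	}
	if req.RequestsPerMinute > ceil.MaxRequestsPerMinute {
		return fmt.Errorf("requests_per_minute %d exceeds ceiling %d",
			req.RequestsPerMinute, ceil.MaxRequestsPerMinute)
	}
	if req.Burst > ceil.MaxBurst {
		return fmt.Errorf("burst %d exceeds ceiling %d", req.Burst, ceil.MaxBurst)
	}
	return nil
}

func main() {
	ceil := Ceiling{MaxRequestsPerMinute: 1000, MaxBurst: 100}
	fmt.Println(validatePut(Limits{RequestsPerMinute: 600, Burst: 50}, ceil))
	fmt.Println(validatePut(Limits{RequestsPerMinute: 5000, Burst: 50}, ceil))
}
```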

AI agent instructions
- Add AGENTS.md, CLAUDE.md, .github/copilot-instructions.md (DeBros v0.2.0
  baseline).

DeBros rules bumped to v0.2.0 (sha bb6e6ef).

VERSION bumped to 0.122.12.
2026-05-13 15:41:36 +03:00
anonpenguin23
91774de465 fix(gateway): update rqlite consistency level and improve column mapping
- Change RQLite consistency level from `none` to `weak` to ensure reads
  route to the leader and prevent stale data reads (fixes #235)
- Add `normalizeColumnKey` to allow snake_case SQL columns to map to
  CamelCase Go struct fields automatically (fixes #65)
- Add comprehensive unit tests for DSN generation and column mapping
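
A possible shape for the column mapping, sketched with reflection. Only the name `normalizeColumnKey` comes from this commit; its body, scanInto, and nodeRow are assumptions about one way to make snake_case columns land on CamelCase fields.

```go
package main

import (
	"fmt"
	"reflect"
	"strings"
)

// normalizeColumnKey folds snake_case and CamelCase onto the same key by
// dropping underscores and lowercasing.
func normalizeColumnKey(s string) string {
	return strings.ToLower(strings.ReplaceAll(s, "_", ""))
}

// scanInto copies row values into the exported fields of the struct that
// dst points to, matching names through normalizeColumnKey.
func scanInto(dst any, row map[string]any) error {
	v := reflect.ValueOf(dst)
	if v.Kind() != reflect.Ptr || v.Elem().Kind() != reflect.Struct {
		return fmt.Errorf("dst must be a pointer to a struct")
	}
	v = v.Elem()
	t := v.Type()

	// Index the struct fields by their normalized names.
	byKey := make(map[string]int, t.NumField())
	for i := 0; i < t.NumField(); i++ {
		byKey[normalizeColumnKey(t.Field(i).Name)] = i
	}

	for col, val := range row {
		i, ok := byKey[normalizeColumnKey(col)]
		if !ok {
			continue // unknown column: ignore
		}
		rv := reflect.ValueOf(val)
		if !rv.IsValid() {
			continue // nil value: keep the zero value
		}
		f := v.Field(i)
		if !f.CanSet() || !rv.Type().AssignableTo(f.Type()) {
			return fmt.Errorf("column %q cannot be assigned to field %s", col, t.Field(i).Name)
		}
		f.Set(rv)
	}
	return nil
}

// nodeRow is a hypothetical destination struct.
type nodeRow struct {
	NodeID    string
	CreatedAt int64
}

func main() {
	var r nodeRow
	err := scanInto(&r, map[string]any{"node_id": "n1", "created_at": int64(7)})
	fmt.Println(r, err)
}
```

Two fields whose names differ only by underscores or case would collide under this normalization, which a real implementation would need to detect.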
2026-05-12 09:13:03 +03:00
anonpenguin23
f55c7269cd feat(gateway): implement self-service tenant push notifications
- Add `namespace_push_config` table for per-namespace provider settings
- Introduce `cluster_secret_path` to enable deterministic JWT signing and
  AES-256-GCM encryption for push credentials
- Update gateway config to support per-namespace overrides of push
  notification providers (ntfy/Expo)
- Bump version to 0.122.3
2026-05-08 11:23:53 +03:00
anonpenguin23
b5f6fb4497 docs: update deployment and serverless documentation
- bump version to 0.122.2
- document schema migration invariants and push notification configuration
- add serverless host function aliases and v2 database API documentation
- introduce schema roundtrip test to prevent migration drift
2026-05-07 07:33:52 +03:00
anonpenguin23
bd26af2cb1 feat(serverless): register host module under "orama" alias
- Add "orama" to the list of host module registration names to support
  common developer intuition and prevent instantiation errors.
- Add comprehensive regression tests to ensure all aliases ("env",
  "host", "orama") remain registered.
- Update SDK documentation to clarify import conventions and alias
  support.
2026-05-06 15:43:11 +03:00
anonpenguin23
4cce4bd97b feat(migrations): implement schema version contract enforcement
- Add `contract.go` to manage and validate embedded SQL migrations
- Introduce `AssertSchema` to verify database version at startup
- Include `SchemaMismatchError` with actionable recovery instructions
- Add comprehensive unit tests for version parsing and validation
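
A small sketch of what a startup version check could look like. `AssertSchema` and `SchemaMismatchError` are named in this commit, but their signatures, the `NNN_name.sql` filename convention used by parseVersion, and the wording of the recovery hint are assumptions.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// SchemaMismatchError reports a database that does not match the version
// the binary was built against (assumed fields).
type SchemaMismatchError struct {
	Have int
	Want int
}

func (e *SchemaMismatchError) Error() string {
	return fmt.Sprintf("schema version mismatch: database is at %d, binary expects %d; "+
		"apply the pending migrations or deploy a matching binary", e.Have, e.Want)
}

// parseVersion extracts the numeric prefix from a migration filename such
// as "028_namespace_push_credentials.sql" (naming scheme assumed).
func parseVersion(filename string) (int, error) {
	prefix, _, ok := strings.Cut(filename, "_")
	if !ok {
		return 0, fmt.Errorf("migration %q has no version prefix", filename)
	}
	n, err := strconv.Atoi(prefix)
	if err != nil {
		return 0, fmt.Errorf("migration %q has a non-numeric version: %w", filename, err)
	}
	return n, nil
}

// AssertSchema compares the database version with the expected one.
func AssertSchema(have, want int) error {
	if have != want {
		return &SchemaMismatchError{Have: have, Want: want}
	}
	return nil
}

func main() {
	want, err := parseVersion("028_namespace_push_credentials.sql")
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(AssertSchema(27, want))
	fmt.Println(AssertSchema(28, want))
}
```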
2026-05-06 08:23:13 +03:00
0f42816a78 etc 2026-05-05 11:35:35 +03:00
ba68291566 Serverless Engine Patch 2026-05-05 07:40:16 +03:00
anonpenguin23
604ce221d5 feat(gateway): implement persistent WebSockets and namespace sequencing
- Add migrations for per-namespace publish sequences and persistent WebSocket function settings
- Integrate PersistentWSManager and WSBridge into the gateway dependency graph
- Upgrade serverless engine to use a multi-tier rate limiter
- Update JWT claims to support custom application-defined fields
2026-05-04 11:38:19 +03:00
anonpenguin23
9225215ed3 feat(core): implement sni-router for stealth turn
- add `orama-sni-router` binary to build process
- introduce `cmd/sni-router` for TLS-level SNI routing
- add documentation for stealth turn deployment architecture
2026-05-03 18:20:21 +03:00
anonpenguin23
b71fc9bf06 feat(ssh): allow running remote commands via ssh
- update ssh command to accept optional remote command argument
- modify sshInto to execute commands non-interactively when provided
- comment out unreachable node in nodes.conf
2026-05-03 14:55:43 +03:00
anonpenguin23
f9947eadb5 Fix 2026-04-02 15:22:06 +03:00
anonpenguin23
93dface005 feat(cli): add fanout push strategy and improve website responsiveness
- Add --fanout flag to push command for server-to-server deployment
- Implement agent forwarding for efficient multi-node distribution
- Update landing page scene heights and section padding for mobile devices
2026-03-28 15:27:54 +02:00
anonpenguin23
9917abcd16 feat(cli): add push command and improve node setup
- Add `orama push` command to upload and extract binary archives to nodes
- Update `node setup` to pass operator metadata and auto-configure environments
- Improve SSH configuration and node registration logic
2026-03-28 14:30:55 +02:00
anonpenguin23
750e742c61 feat(cli): add node setup command
- implement automated VPS bootstrapping for Orama nodes
- add SSH key management via rootwallet
- support genesis node creation and cluster joining via invite tokens
2026-03-28 10:24:48 +02:00
anonpenguin23
8c4e18908b feat(auth): integrate rootwallet agent and update service hardening
- Replace CLI-based rootwallet calls with agent-based communication
- Update production provisioner to support sudo-based service management
- Add API key-to-wallet resolution for gateway operator handlers
2026-03-28 08:59:11 +02:00
anonpenguin23
fe4823dbba feat(cli): add node management and rollout commands
- implement `nodes`, `rollout`, `ssh`, and `status` commands
- add `migrate-conf` utility to register existing nodes with the gateway
- update database schema to support operator wallet tracking for nodes
2026-03-27 16:25:32 +02:00
anonpenguin23
89e6c428e8 feat(monitor): add vault health checks and reporting
- integrate vault into node alerts (service, responsive, status, restarts)
- add vault report collection (systemd, logs, HTTP status)
- update production CLI (clean, restart, stop, services)
- add comprehensive unit tests for vault alerts
2026-03-27 14:52:41 +02:00
anonpenguin23
218adcecf8 refactor(cli): extract AddEnvironment/RemoveEnvironment functions
- support upsert in AddEnvironment, no-op RemoveEnvironment if absent
- fallback active env to devnet on remove, add tests
- integrate with sandbox create/destroy, ignore core/plans/
2026-03-27 14:16:51 +02:00
anonpenguin23
86fe0588b9 refactor: move Go project into core/ for monorepo structure 2026-03-26 18:14:52 +02:00