14 Commits

Author SHA1 Message Date
anonpenguin23
62a8fbf2df fix(serverless): registry read paths now load WS persistent metadata (#240/#249)
Register() writes the four ws_* columns (ws_persistent,
ws_idle_timeout_sec, ws_max_frame_bytes, ws_max_inflight_per_conn) to
the functions table, but every read path — Get, List, GetByID,
GetByNameInternal — silently dropped them from the SELECT. functionRow
had no fields for them either. Result: fn.WSPersistent was always the
zero value (false) at runtime, no matter what the DB row said. Every
WS function ran in per-frame stateless mode regardless of its
`ws_persistent: true` config.

AnChat's rpc-router was the canary: it relies on per-connection
instance state (request_id ↔ reply correlation, subscription
bookkeeping) that the stateless model destroys every frame. The
gateway telemetry envelope still reached the client
({request_id, status, duration_ms}) so the failure looked like
"function works, frames don't" — every RPC timed out at 15 s.

Fix: include the four columns in every SELECT, add the matching
functionRow fields, and copy them into Function in rowToFunction.
No schema change (columns have been in migration 011 from the start).

Regression tests in registry_ws_columns_test.go cover the Get / List
paths against an in-memory SQLite that mirrors the production DDL.

VERSION bumped to 0.122.21.
2026-05-15 09:01:42 +03:00
anonpenguin23
a0a1decd06 fix(ws): prefer X-Forwarded-Host in Origin check — root cause #240/#249
handleNamespaceGatewayRequest rewrites r.Host to the backend target
IP:port (e.g. "10.0.0.6:10004") before forwarding. The original
public host (e.g. "ns-anchat-test.orama-devnet.network") is preserved
in X-Forwarded-Host. checkWSOrigin in both pubsub/ws_client.go and
serverless/ws_handler.go was comparing the client's Origin against
the proxied r.Host only — so every browser / RN-iOS WS upgrade was
rejected 403 because their Origin's public hostname can never match
10.0.0.6.

curl probes don't send Origin, so curl returned true unconditionally
and the bug was invisible to operator smoke tests. AnChat's iPhone
WS clients hit `code=1006 reason="Received bad response code from
server: 403"` for ~24h.

Fix: prefer X-Forwarded-Host (the original public host) when present,
fall back to r.Host for direct (non-proxied) connections. Applied
identically to both WS handlers. Regression test in
serverless/ws_origin_test.go covers the proxy-hop case, no-Origin
case, and direct-connection case.

This is the real fix; v0.122.19 only closed a separate silent-forward
auth hole that produced opaque 401s on a different code path.

VERSION bumped to 0.122.20.
2026-05-15 07:03:28 +03:00
anonpenguin23
872c553d1c fix(gateway): namespace-proxy rejects unauthed requests at main, logs WS audit
Root-cause hardening for bug #240 and #249's "intermittent 401 over WS"
reports. handleNamespaceGatewayRequest previously had a third code
path beyond "auth ok" and "auth error": when validateAuthForNamespaceProxy
returned empty namespace AND empty error (i.e. "no credentials found"),
the request fell through to a silent forward to the namespace gateway
WITHOUT internal-auth headers. The namespace gateway then rejected
with 401 "missing API key" in ~60µs.

From the client's perspective: opaque 401.
From our side: only the namespace gateway logged it, and that tier
can't validate API keys (they live in the main cluster RQLite), so
the operator had no signal that the main gateway had even seen the
request. AnChat's intermittent 401-on-WS reports went unsolved for
this exact reason.

Fix:
- Explicit reject at main when no credentials extracted AND path
  isn't public. Returns 401 with WWW-Authenticate: Bearer realm and a
  clear message naming the three accepted credential sources.
- Rich structured logging on every WS upgrade auth outcome: presence
  of api_key/token/jwt query params, Authorization + X-API-Key
  headers, Connection/Upgrade headers, Origin, User-Agent, client IP,
  raw query length. Steady-state stays low-noise: success path logs at
  debug, reject paths log at warn.
- Namespace-mismatch reject (existing branch) now also logs.

VERSION bumped to 0.122.19.
2026-05-14 17:53:38 +03:00
anonpenguin23
5c1404849b fix(#72): correct ntfy upstream checksum URL
Upstream publishes the checksums asset as a plain "checksums.txt" at
the release root, not "ntfy_<VER>_checksums.txt". The version-prefixed
URL we were constructing 404'd, so InstallNtfy bailed in the
download-binary step and ntfy never landed even after we wired
InstallNtfy into the pre-built install path.

Verified against the v2.11.0 release assets list. If a future version
changes the naming convention, the install will 404 loud and this URL
gets bumped in the same PR as ntfyVersion.

VERSION bumped to 0.122.18.
2026-05-14 14:29:24 +03:00
anonpenguin23
7e47f42f91 fix(#72): install ntfy in pre-built path too — devnet path was missing it
Phase 2b auto-detects pre-built archive mode and routes to
installFromPreBuilt(). That path copies bundled binaries (caddy, orama,
gateway, …) into place but never called InstallNtfy() — because ntfy
is downloaded from upstream github, not bundled. Result: on devnet
(which always uses pre-built mode), ntfy never installed even though
the always-on code path in installFromSource() was correctly wired up.

Fix: add InstallNtfy() call to installFromPreBuilt right after the
binary deploy + setCapabilities steps, before disableResolvedStub
runs. Ordering matters because Phase 4's ConfigureNtfy chowns
/etc/ntfy/server.yml to the ntfy user, which needs to exist.

VERSION bumped to 0.122.17.
2026-05-14 12:16:28 +03:00
anonpenguin23
8b4abb7eef feat(#72): install ntfy on every node, drop --with-ntfy gating
ntfy is now part of the standard node install, just like Caddy. The
binary, /etc/ntfy/server.yml, and the Caddy push.<dnsZone> reverse-
proxy block are written unconditionally on every node, and the
ntfy.service starts as part of the standard service order.

Why uniform: ntfy listens on 127.0.0.1:NtfyListenPort only, reachable
exclusively via the local Caddy reverse-proxy block. Nodes that don't
serve a public push.* DNS entry just have an idle ntfy with no
inbound traffic — zero operational cost, zero attack surface change.
Removing the flag means no per-node toggling, no preference drift
between nodes, no "did we remember to set --with-ntfy" mistakes when
DNS topology changes (e.g. promoting a node to nameserver later).

Removed:
- NodePreferences.NtfyHost (yaml: ntfy_host)
- ProductionSetup.isNtfyHost field, SetNtfyHost, IsNtfyHost
- install/flags.go --with-ntfy + NtfyHost field
- upgrade/flags.go --with-ntfy + NtfyHost field + isFlagPassed helper
  (was only used for --with-ntfy tri-state semantics)
- upgrade/orchestrator.go preference-load and persist for ntfy
- upgrade/remote.go --with-ntfy forwarding

Phase 2 always calls InstallNtfy.
Phase 4 always calls EnableCaddyNtfyProxy + ConfigureNtfy.
Phase 5 always enables ntfy.service.
Phase 5b always starts ntfy.service.

VERSION bumped to 0.122.16.
2026-05-14 11:51:08 +03:00
anonpenguin23
8c37ef547e fix(upgrade): forward per-node flags to remote so --with-ntfy actually lands
`orama node upgrade --node <ip> --with-ntfy --restart` parsed the flag
locally but `upgradeNode()` ran a hardcoded
`orama node upgrade --restart` on the remote — dropping --with-ntfy,
--nameserver, --force, and --skip-checks on the floor. The remote
orchestrator then read the SAVED preference (or default false for
nameserver/ntfy), so operator overrides like enabling ntfy on a
nameserver were silently ignored. Bug surfaced in devnet today:
running --with-ntfy reported success but ntfy was never installed.

Fix forwards the four passthrough flags to the remote command,
preserving the tri-state semantics for the pointer flags (nil = honor
saved preference; non-nil = explicit override).

VERSION bumped to 0.122.15.
2026-05-14 11:44:47 +03:00
anonpenguin23
07638354d2 feat(#72): full-privacy push — self-hosted ntfy + APNs-direct provider
Migration 028: namespace_push_credentials
- Per-(namespace, provider) AES-256-GCM encrypted credential blob.
- Generic schema — apns/ntfy/expo/future plug in with zero migration.
- Separated from migration 026's namespace_push_config (preferences vs
  credentials, different access patterns).

pkg/push/credentials
- Manager + Registry + RQLite store; HKDF purpose "namespace-push-credentials"
  via pkg/secrets. Provider Validator interface for per-provider schema.

pkg/push/providers/apns
- Apple Push Notification service direct provider (no Expo proxy).
- Validator + dispatcher; credentials are p8 signing key + key_id + team_id.

pkg/push/providers/ntfy/credentials.go
- ntfy credential schema (auth_token + default topic). Used both with
  the public ntfy.sh and our self-hosted instance.

pkg/environments/production/installers/ntfy.go
- Self-hosted ntfy server installer. Binary, system user, hardened
  /etc/ntfy/server.yml, systemd unit. Listens on 127.0.0.1:NtfyListenPort
  only — Caddy is the only public path.

pkg/environments/production/installers/caddy.go
- Emit reverse_proxy block for push.<dnsZone> -> 127.0.0.1:NtfyListenPort
  when operator enables ntfy on a node.

CLI: install/upgrade orchestrators learn a new "ntfy" install/preserve
phase; flag gating in install/flags.go + upgrade/flags.go.

Gateway handlers/push/credentials_handler.go
- GET/PUT/DELETE /v1/namespace/push-credentials/{provider}.
- PUT validates against provider Validator before encrypting and storing.
- GET returns a redacted view (booleans + non-secret fields only).

Push manager: provider resolution now also consults
namespace_push_credentials before falling back to YAML defaults.

Docs: core/docs/PUSH_NOTIFICATIONS.md walks through end-to-end setup.

VERSION bumped to 0.122.14.
2026-05-14 10:48:00 +03:00
anonpenguin23
32a2a62e0d fix(caddy): disable HTTP/2 to keep WebSocket upgrade auth working (#249)
HTTP/2 forbids the `Connection: Upgrade` and `Upgrade: websocket`
headers per RFC 7540 §8.1.2.2. With h2 advertised at the listener,
ALPN negotiates h2 for TLS-capable clients, the WS-upgrade request
arrives at Caddy with those headers stripped, and Caddy forwards a
plain HTTP/1.1 GET to the gateway. The gateway's `isWebSocketUpgrade(r)`
then returns false, the `?api_key=` / `?jwt=` query-string WS-auth
fallback never runs, and clients see 401.

RFC 8441 ("Bootstrapping WebSockets with HTTP/2") fixes this, but iOS
RN and most other mobile WS libraries don't implement it. Until they
do, h1 is the only protocol that keeps WS auth working.

Trade-off: lose h2 multiplexing on plain HTTP traffic. Acceptable for
an API gateway whose dominant workload is REST + WebSocket — neither
benefits much from h2 streams.

caddy_test.go adds a regression guard so anyone re-enabling h2 in the
listener protocols fails CI loud.

Also (separate, was uncommitted): pkg/cli/build/builder.go now reads
VERSION from the repo-root /VERSION file first, falling back to
parsing the Makefile only if absent. The previous Makefile-only path
broke after VERSION moved to /VERSION (Makefile got `$(shell cat ...)`
which the CLI builder pulled in literally).

VERSION bumped to 0.122.13.
2026-05-14 07:50:47 +03:00
anonpenguin23
fda47533c3 feat: per-namespace rate-limit self-service + WS JWT auth + release 0.122.12
Per-namespace rate-limit config (feature #69)
- Migration 027: new `namespace_rate_limit_config` table
  (namespace PK, requests_per_minute, burst, audit metadata).
- pkg/ratelimit: Manager + RQLite ConfigStore + types. Same pattern
  as the push config in bug #220's follow-up — LRU cache, invalidate
  on PUT/DELETE, falls back to YAML defaults when no row exists.
- pkg/gateway/handlers/ratelimit: GET/PUT/DELETE /v1/namespace/rate-limit.
  PUT requests are rejected if they exceed the operator's configured
  ceiling (MaxRequestsPerMinute / MaxBurst) — tenants self-serve but
  cannot raise their quota past the cap.
- pkg/gateway/rate_limiter.go: per-namespace lookup, default fallback.
- pkg/gateway/middleware.go: WS JWT middleware (middleware_ws_jwt_test.go).
- pkg/gateway/auth/service.go: refresh-token rotation hardening with
  regression test in refresh_rotation_test.go.

AI agent instructions
- Add AGENTS.md, CLAUDE.md, .github/copilot-instructions.md (DeBros v0.2.0
  baseline).

DeBros rules bumped to v0.2.0 (sha bb6e6ef).

VERSION bumped to 0.122.12.
2026-05-13 15:41:36 +03:00
anonpenguin23
3676b000a6 chore: adopt DeBros DAO baseline rules + release 0.122.11
Standardization batch — no application code changes. Pulls in the
DeBros DAO baseline rules (v0.1.0, sha 51ce3f8) for supply-chain
defense and toolchain pinning.

Files added:
- DEBROS.md + debros.json — adopted-rules manifest
- .debros/compliance/{go,javascript-typescript,zig}.md — per-language
  compliance docs
- .github/workflows/security.yml — auto-detecting security CI
  (npm audit + go vulncheck), runs on main + weekly cron
- renovate.json — 30-day dependency cooldown, no auto-merge,
  vulnerability alerts bypass cooldown
- .nvmrc — pin Node 20.18.0
- vault/.zigversion — pin Zig 0.14.0
- sdk/.npmrc, website/.npmrc — supply-chain hardening
  (ignore-scripts, strict-peer-dependencies, save-exact, etc.)

Files modified:
- core/go.mod, os/agent/go.mod, website/invest-api/go.mod —
  add `toolchain go1.24.6` directive for reproducible builds
- VERSION + sdk/package.json — bump to 0.122.11
2026-05-12 11:10:10 +03:00
anonpenguin23
d990d0d6b3 release: 0.122.10 2026-05-12 10:14:53 +03:00
anonpenguin23
58d541d9ee ci: goreleaser v2 hooks need string form, bump to 0.122.9
GoReleaser v2.15.4 rejects the {cmd: ..., dir: ...} map syntax for
before.hooks even though v2 docs show it. Reverting to the simple
string form `go -C core mod tidy` that worked in v1.
2026-05-12 09:54:58 +03:00
anonpenguin23
8e4d11a6ce ci: single VERSION file, version guards, goreleaser v2, CI on push
Workflow hardening based on the four-cycle release-debugging session:

Centralized versioning
- Add /VERSION at repo root as single source of truth.
- core/Makefile reads VERSION via `$(shell cat ../VERSION)`.
- Add `make bump VER=X.Y.Z` target that updates /VERSION and syncs
  sdk/package.json in one shot.

Version mismatch guards
- All three release workflows (release.yaml, release-apt.yml,
  publish-sdk.yml) now verify the release tag matches /VERSION at the
  very first step. Stale-VERSION releases fail fast with a clear hint
  to run `make bump`.

GoReleaser v2 migration
- Upgrade goreleaser-action v5 -> v6 (pinned `~> v2`).
- Add `version: 2` to .goreleaser.yaml.
- Migrate to v2 syntax: `archives.format` -> `formats: [...]`,
  `brews.folder` -> `directory`, `snapshot.name_template` ->
  `version_template`, `builds`-style references replaced with `ids:`.
- `before.hooks` can use map syntax again (v2 supports it).

Homebrew tap on stable only
- `brews.skip_upload` is now `'{{ if .Prerelease }}true{{ else }}false{{ end }}'`.
- Stops nightly releases from polluting the tap and from hitting 401
  on stale HOMEBREW_TAP_TOKEN. Stable main releases still publish.

CI on every push
- New ci.yml runs `go vet` + `go test -race` on the core module and
  typecheck/build/unit-tests on the SDK for every push to main/nightly
  and every PR. version-sanity job warns when /VERSION and
  sdk/package.json drift.

Version bump for next pipeline test
- /VERSION: 0.122.8
- sdk/package.json: 0.122.8
2026-05-12 09:49:33 +03:00