Mirror of https://github.com/DeBrosOfficial/network.git, synced 2025-12-15 04:28:49 +00:00
Dynamic Database Clustering — Implementation Plan
Scope
Implement the feature described in DYNAMIC_DATABASE_CLUSTERING.md: decentralized metadata via libp2p pubsub, dynamic per-database rqlite clusters (3-node default), idle hibernation/wake-up, node failure replacement, and client UX that exposes cli.Database(name) with app namespacing.
Guiding Principles
- Reuse existing `pkg/pubsub` and `pkg/rqlite` where practical; avoid singletons.
- Backward-compatible config migration with deprecations; feature-flag controlled rollout.
- Strong eventual consistency (vector clocks + periodic gossip) over centralized control planes.
- Tests and observability at each phase.
Phase 0: Prep & Scaffolding
- Add feature flag `dynamic_db_clustering` (env/config) → default off.
- Introduce the config shape for the new `database` fields while supporting legacy fields (soft deprecated).
- Create empty packages and interfaces to enable incremental compilation:
  - `pkg/metadata/{types.go,manager.go,pubsub.go,consensus.go,vector_clock.go}`
  - `pkg/dbcluster/{manager.go,lifecycle.go,subprocess.go,ports.go,health.go,metrics.go}`
- Ensure rqlite subprocess availability (binary path detection, `scripts/install-debros-network.sh` update if needed).
- Establish CI jobs for the new unit/integration suites and longer-running e2e.
Phase 1: Metadata Layer (No hibernation yet)
- Implement metadata types and store (RW locks, versioning) inside `pkg/rqlite/metadata.go`: `DatabaseMetadata`, `NodeCapacity`, `PortRange`, `MetadataStore`.
- Pubsub schema and handlers inside `pkg/rqlite/pubsub.go` using the existing `pkg/pubsub` bridge:
  - Topic `/debros/metadata/v1`; messages for create request/response/confirm, status, node capacity, health.
- Consensus helpers inside `pkg/rqlite/consensus.go` and `pkg/rqlite/vector_clock.go`:
  - Deterministic coordinator (lowest peer ID), vector clocks, merge rules, periodic full-state gossip (checksums + fetch diffs).
- Reuse existing node connectivity/backoff; no new ping service required.
- Skip unit tests for now; validate by wiring e2e flows later.
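The vector-clock merge rules and lowest-peer-ID coordinator election above can be sketched as follows; type and method names are assumptions for illustration:

```go
package main

import (
	"fmt"
	"sort"
)

// VectorClock maps peer IDs to logical counters.
type VectorClock map[string]uint64

// Tick increments this node's counter before it publishes an update.
func (vc VectorClock) Tick(peerID string) { vc[peerID]++ }

// Merge takes the element-wise maximum — the standard vector-clock join
// used when reconciling gossiped metadata.
func (vc VectorClock) Merge(other VectorClock) {
	for id, n := range other {
		if n > vc[id] {
			vc[id] = n
		}
	}
}

// Descends reports whether vc dominates other (vc >= other everywhere),
// i.e. other carries no information vc lacks.
func (vc VectorClock) Descends(other VectorClock) bool {
	for id, n := range other {
		if vc[id] < n {
			return false
		}
	}
	return true
}

// coordinator picks the deterministic coordinator: the lowest peer ID.
func coordinator(peers []string) string {
	sort.Strings(peers)
	return peers[0]
}

func main() {
	a := VectorClock{"peerA": 2, "peerB": 1}
	b := VectorClock{"peerB": 3, "peerC": 1}
	a.Merge(b)
	fmt.Println(a["peerB"], a["peerC"]) // 3 1
	fmt.Println(coordinator([]string{"peerC", "peerA", "peerB"})) // peerA
}
```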
Phase 2: Database Creation & Client API
- Port management: `PortManager` with bind-probing, random allocation within configured ranges; local bookkeeping.
- Subprocess control: `RQLiteInstance` lifecycle (start, wait for readiness via `/status` and a simple query, stop, status).
- Cluster manager: `ClusterManager` keeps `activeClusters`, listens to metadata events, executes the creation protocol, fans in readiness, and surfaces failures.
- Client API:
  - Update `pkg/client/interface.go` to include `Database(name string)`.
  - Implement app namespacing in `pkg/client/client.go` (sanitize app name + db name).
  - Backoff polling for readiness during creation.
- Data isolation:
  - Data dir per db: `./data/<app>_<db>/rqlite` (respect the node `data_dir` base).
- Integration tests: create single db across 3 nodes; multiple databases coexisting; cross-node read/write.
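The `PortManager` bind-probing approach above might look like this minimal sketch (struct shape and method names are assumptions; the real manager would also handle raft ports and persist bookkeeping):

```go
package main

import (
	"fmt"
	"math/rand"
	"net"
)

// PortManager allocates ports inside a configured range by bind-probing.
type PortManager struct {
	start, end int          // inclusive range, e.g. http_start..http_end
	allocated  map[int]bool // local bookkeeping of ports we handed out
}

func NewPortManager(start, end int) *PortManager {
	return &PortManager{start: start, end: end, allocated: map[int]bool{}}
}

// free checks availability by actually binding, which also catches ports
// held by other processes, not just our own bookkeeping.
func free(port int) bool {
	l, err := net.Listen("tcp", fmt.Sprintf("127.0.0.1:%d", port))
	if err != nil {
		return false
	}
	l.Close()
	return true
}

// Allocate tries random ports within the range until one binds.
func (pm *PortManager) Allocate() (int, error) {
	span := pm.end - pm.start + 1
	for i := 0; i < span; i++ {
		p := pm.start + rand.Intn(span)
		if !pm.allocated[p] && free(p) {
			pm.allocated[p] = true
			return p, nil
		}
	}
	return 0, fmt.Errorf("no free port in %d-%d", pm.start, pm.end)
}

// Release returns a port to the pool, e.g. when hibernation frees it.
func (pm *PortManager) Release(port int) { delete(pm.allocated, port) }

func main() {
	pm := NewPortManager(5001, 5999)
	p, err := pm.Allocate()
	fmt.Println(p, err)
}
```

Binding to verify availability avoids a check-then-use race against other local processes, at the cost of a brief listen/close per probe.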
Phase 3: Hibernation & Wake-Up
- Idle detection and coordination:
  - Track `LastQuery` per instance; periodic scan; all-nodes-idle quorum → coordinated shutdown schedule.
- Hibernation protocol:
  - Broadcast idle notices; the coordinator schedules `DATABASE_SHUTDOWN_COORDINATED`; graceful SIGTERM; ports freed; status → `hibernating`.
- Wake-up protocol:
  - Client detects `hibernating`, performs a CAS to `waking`, and triggers a wake request; reuse ports if available, else re-negotiate; start instances; status → `active`.
- Client retry UX:
  - Transparent retries with exponential backoff; treat `waking` as a wait-only state.
- Tests: hibernation under load; thundering herd; resource verification and persistence across cycles.
Phase 4: Resilience (Failure & Replacement)
- Continuous health checks with timeouts → mark node unhealthy.
- Replacement orchestration:
  - Coordinator initiates `NODE_REPLACEMENT_NEEDED`; eligible nodes respond; selection is confirmed; the new node joins raft via `-join`, then syncs.
- Startup reconciliation:
  - Detect and clean up orphaned or non-member local data directories.
- Rate-limit replacements to prevent cascades; prioritize by usage metrics.
- Tests: forced crashes, partitions, replacement within target SLO; reconciliation sanity.
Phase 5: Production Hardening & Optimization
- Metrics/logging:
  - Structured logs with trace IDs; counters for queries/min, hibernations, wake-ups, replacements; health and capacity gauges.
- Config validation, replication factor settings (1,3,5), and debugging APIs (read-only metadata dump, node status).
- Client metadata caching and query routing improvements (simple round-robin → latency-aware later).
- Performance benchmarks and operator-facing docs.
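The counters listed above could start as plain atomics before wiring a real metrics sink (e.g. expvar or Prometheus); the field names are assumptions:

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// clusterMetrics holds the counters named in the plan. Atomic counters keep
// the hot query path lock-free; a snapshot feeds the debugging APIs.
type clusterMetrics struct {
	queries      atomic.Int64
	hibernations atomic.Int64
	wakeUps      atomic.Int64
	replacements atomic.Int64
}

// snapshot returns a point-in-time copy suitable for a read-only dump.
func (m *clusterMetrics) snapshot() map[string]int64 {
	return map[string]int64{
		"queries":      m.queries.Load(),
		"hibernations": m.hibernations.Load(),
		"wake_ups":     m.wakeUps.Load(),
		"replacements": m.replacements.Load(),
	}
}

func main() {
	var m clusterMetrics
	m.queries.Add(3)
	m.hibernations.Add(1)
	s := m.snapshot()
	fmt.Println(s["queries"], s["hibernations"]) // 3 1
}
```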
File Changes (Essentials)
- `pkg/config/config.go`
  - Remove (deprecate, then delete): `Database.DataDir`, `RQLitePort`, `RQLiteRaftPort`, `RQLiteJoinAddress`.
  - Add: `ReplicationFactor int`, `HibernationTimeout time.Duration`, `MaxDatabases int`, `PortRange {HTTPStart, HTTPEnd, RaftStart, RaftEnd int}`, `Discovery.HealthCheckInterval`.
- `pkg/client/interface.go` / `pkg/client/client.go`
  - Add `Database(name string)` and the app namespace requirement (`DefaultClientConfig(appName)`); backoff polling.
- `pkg/node/node.go`
  - Wire `metadata.Manager` and `dbcluster.ClusterManager`; remove direct rqlite singleton usage.
- `pkg/rqlite/*`
  - Refactor from singleton to instance-oriented helpers.
- New packages under `pkg/metadata` and `pkg/dbcluster` as listed above.
- `configs/node.yaml` and validation paths to reflect the new `database` block.
Config Example (target end-state)
```yaml
node:
  data_dir: "./data"

database:
  replication_factor: 3
  hibernation_timeout: 60
  max_databases: 100
  port_range:
    http_start: 5001
    http_end: 5999
    raft_start: 7001
    raft_end: 7999

discovery:
  health_check_interval: 10s
```
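The YAML above might map onto config structs like the following sketch, including the replication-factor validation mentioned under Phase 5; the yaml tag convention and field names are assumptions:

```go
package main

import (
	"fmt"
	"time"
)

// PortRange mirrors the port_range block of the config example.
type PortRange struct {
	HTTPStart int `yaml:"http_start"`
	HTTPEnd   int `yaml:"http_end"`
	RaftStart int `yaml:"raft_start"`
	RaftEnd   int `yaml:"raft_end"`
}

// DatabaseConfig mirrors the target database block.
type DatabaseConfig struct {
	ReplicationFactor  int           `yaml:"replication_factor"`
	HibernationTimeout time.Duration `yaml:"hibernation_timeout"`
	MaxDatabases       int           `yaml:"max_databases"`
	PortRange          PortRange     `yaml:"port_range"`
}

// Validate enforces the plan's constraints: replication factor of 1, 3, or
// 5, and non-empty port ranges.
func (c DatabaseConfig) Validate() error {
	switch c.ReplicationFactor {
	case 1, 3, 5:
	default:
		return fmt.Errorf("replication_factor must be 1, 3, or 5; got %d", c.ReplicationFactor)
	}
	if c.PortRange.HTTPStart >= c.PortRange.HTTPEnd || c.PortRange.RaftStart >= c.PortRange.RaftEnd {
		return fmt.Errorf("port ranges must be non-empty")
	}
	return nil
}

func main() {
	cfg := DatabaseConfig{
		ReplicationFactor: 3,
		MaxDatabases:      100,
		PortRange:         PortRange{HTTPStart: 5001, HTTPEnd: 5999, RaftStart: 7001, RaftEnd: 7999},
	}
	fmt.Println(cfg.Validate()) // <nil>
}
```

Note that unmarshalling a bare `60` into a `time.Duration` needs custom handling in most YAML libraries, so the final schema may prefer an explicit unit string such as `60s`.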
Rollout Strategy
- Keep feature flag off by default; support legacy single-cluster path.
- Ship Phase 1 behind flag; enable in dev/e2e only.
- Incrementally enable creation (Phase 2), then hibernation (Phase 3) per environment.
- Remove legacy config after deprecation window.
Testing & Quality Gates
- Unit tests: metadata ops, consensus, ports, subprocess, manager state machine.
- Integration tests under `e2e/` for creation, isolation, hibernation, failure handling, partitions.
- Benchmarks for creation (<10s), wake-up (<8s), metadata sync (<5s), query overhead (<10ms).
- Chaos suite for randomized failures and partitions.
Risks & Mitigations (operationalized)
- Metadata divergence → vector clocks + periodic checksums + majority read checks in client.
- Raft churn → adaptive timeouts; allow an `always_on` flag per db (future).
- Cascading replacements → global rate limiter and prioritization.
- Debuggability → verbose structured logging and metadata dump endpoints.
Timeline (indicative)
- Weeks 1-2: Phases 0-1
- Weeks 3-4: Phase 2
- Weeks 5-6: Phase 3
- Weeks 7-8: Phase 4
- Weeks 9-10+: Phase 5
To-dos
- Add feature flag, scaffold packages, CI jobs, rqlite binary checks
- Extend `pkg/config/config.go` and YAML schemas; deprecate legacy fields
- Implement metadata types and thread-safe store with versioning
- Implement pubsub messages and handlers using existing pubsub manager
- Implement coordinator election, vector clocks, gossip reconciliation
- Implement `PortManager` with bind-probing and allocation
- Implement rqlite subprocess control and readiness checks
- Implement `ClusterManager` and creation lifecycle orchestration
- Add `Database(name)` and app namespacing to the client; backoff polling
- Adopt per-database data dirs under the node `data_dir`
- Integration tests for creation and isolation across nodes
- Idle detection, coordinated shutdown, status updates
- Wake-up CAS to `waking`, port reuse/renegotiation, restart
- Client transparent retry/backoff for hibernation and waking
- Health checks, replacement orchestration, rate limiting
- Implement orphaned data reconciliation on startup
- Add metrics and structured logging across managers
- Benchmarks for creation, wake-up, sync, query overhead
- Operator and developer docs; config and migration guides