# Dynamic Database Clustering — Implementation Plan
### Scope

Implement the feature described in `DYNAMIC_DATABASE_CLUSTERING.md`: decentralized metadata via libp2p pubsub, dynamic per-database rqlite clusters (3-node default), idle hibernation/wake-up, node failure replacement, and client UX that exposes `cli.Database(name)` with app namespacing.
### Guiding Principles

- Reuse existing `pkg/pubsub` and `pkg/rqlite` where practical; avoid singletons.
- Backward-compatible config migration with deprecations; rollout controlled by a feature flag.
- Strong eventual consistency (vector clocks + periodic gossip) rather than a centralized control plane.
- Tests and observability at each phase.
### Phase 0: Prep & Scaffolding

- Add a feature flag `dynamic_db_clustering` (env/config), default off; see the sketch after this list.
- Introduce the config shape for the new `database` fields while still supporting the legacy fields (soft-deprecated).
- Create empty packages and interfaces to enable incremental compilation:
  - `pkg/metadata/{types.go,manager.go,pubsub.go,consensus.go,vector_clock.go}`
  - `pkg/dbcluster/{manager.go,lifecycle.go,subprocess.go,ports.go,health.go,metrics.go}`
- Ensure rqlite subprocess availability (binary path detection, `scripts/install-debros-network.sh` update if needed).
- Establish CI jobs for the new unit/integration suites and longer-running e2e tests.

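A minimal sketch of how the flag could be surfaced in the node config with an env override for dev/e2e; the field, env variable, and method names are illustrative, not the final API.

```go
package config

import "os"

// FeatureFlags gates experimental code paths; dynamic clustering defaults to off.
type FeatureFlags struct {
	DynamicDBClustering bool `yaml:"dynamic_db_clustering"`
}

// DynamicDBClusteringEnabled lets an environment variable override node.yaml so
// e2e jobs can flip the flag without editing config files. Names are assumptions.
func (f FeatureFlags) DynamicDBClusteringEnabled() bool {
	if v := os.Getenv("DEBROS_DYNAMIC_DB_CLUSTERING"); v != "" {
		return v == "1" || v == "true"
	}
	return f.DynamicDBClustering
}
```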
### Phase 1: Metadata Layer (No hibernation yet)

- Implement metadata types and a store (RW locks, versioning) inside `pkg/rqlite/metadata.go`:
  - `DatabaseMetadata`, `NodeCapacity`, `PortRange`, `MetadataStore` (see the first sketch after this list).
- Pubsub schema and handlers inside `pkg/rqlite/pubsub.go`, using the existing `pkg/pubsub` bridge:
  - Topic `/debros/metadata/v1`; messages for create request/response/confirm, status, node capacity, and health.
- Consensus helpers inside `pkg/rqlite/consensus.go` and `pkg/rqlite/vector_clock.go`:
  - Deterministic coordinator (lowest peer ID), vector clocks, merge rules, and periodic full-state gossip (checksums plus fetching diffs); see the second sketch below.
- Reuse existing node connectivity/backoff; no new ping service is required.
- Skip unit tests for now; validate by wiring e2e flows later.

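A rough sketch of the metadata types named above, assuming one vector-clock-versioned record per database behind an RW lock; all field names are illustrative.

```go
package metadata

import "sync"

// VectorClock maps peer ID -> logical counter; merging takes the per-peer max.
type VectorClock map[string]uint64

// DatabaseMetadata is the gossiped record for one logical database.
type DatabaseMetadata struct {
	Name      string            // namespaced "<app>_<db>" name
	Status    string            // "active", "hibernating", "waking", ...
	Members   map[string]string // peer ID -> raft address
	HTTPPorts map[string]int    // peer ID -> rqlite HTTP port
	Version   VectorClock       // causality information for conflict resolution
}

// MetadataStore is the node-local, thread-safe view of all known databases.
type MetadataStore struct {
	mu        sync.RWMutex
	databases map[string]*DatabaseMetadata
}

// Get returns the current record for a database, if any.
func (s *MetadataStore) Get(name string) (*DatabaseMetadata, bool) {
	s.mu.RLock()
	defer s.mu.RUnlock()
	md, ok := s.databases[name]
	return md, ok
}
```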
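The deterministic coordinator rule and the vector-clock merge could look roughly like this, building on the `VectorClock` type from the previous sketch; comparing peer IDs as strings is an assumption of how "lowest peer ID" is interpreted.

```go
package metadata

import "sort"

// Coordinator returns the deterministic coordinator for a set of peers: the
// lexicographically lowest peer ID. Every node computes the same answer locally.
func Coordinator(peerIDs []string) string {
	if len(peerIDs) == 0 {
		return ""
	}
	sorted := append([]string(nil), peerIDs...)
	sort.Strings(sorted)
	return sorted[0]
}

// Merge folds another clock into this one, keeping the per-peer maximum.
func (vc VectorClock) Merge(other VectorClock) {
	for peer, counter := range other {
		if counter > vc[peer] {
			vc[peer] = counter
		}
	}
}
```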
### Phase 2: Database Creation & Client API

- Port management:
  - `PortManager` with bind-probing and random allocation within the configured ranges; local bookkeeping (see the sketch after this list).
- Subprocess control:
  - `RQLiteInstance` lifecycle (start, wait for readiness via `/status` plus a simple query, stop, status).
- Cluster manager:
  - `ClusterManager` keeps `activeClusters`, listens to metadata events, executes the creation protocol, fans in readiness signals, and surfaces failures.
- Client API (usage sketch below):
  - Update `pkg/client/interface.go` to include `Database(name string)`.
  - Implement app namespacing in `pkg/client/client.go` (sanitize app name + db name).
  - Backoff polling for readiness during creation.
- Data isolation:
  - Data dir per database: `./data/<app>_<db>/rqlite` (respecting the node's `data_dir` base).
- Integration tests: create a single database across 3 nodes; multiple databases coexisting; cross-node read/write.

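Bind-probing can be as simple as attempting to listen on a random candidate port within the configured range; this is a sketch under that assumption, not the final `PortManager` (there is still a small window between probing and the rqlite subprocess binding the port).

```go
package dbcluster

import (
	"errors"
	"fmt"
	"math/rand"
	"net"
)

// allocatePort probes random candidates in [start, end] and returns the first
// port the OS lets us bind, so two local instances do not collide.
func allocatePort(start, end, attempts int) (int, error) {
	for i := 0; i < attempts; i++ {
		port := start + rand.Intn(end-start+1)
		l, err := net.Listen("tcp", fmt.Sprintf("127.0.0.1:%d", port))
		if err != nil {
			continue // already taken, try another candidate
		}
		l.Close() // release it; the rqlite subprocess binds it for real
		return port, nil
	}
	return 0, errors.New("no free port found in range")
}
```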
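From the application side, the intended flow is roughly: construct a client with an app name, ask for a database by name, and let the client poll with backoff until the cluster is ready. The import path, constructor name, and return shape below are assumptions for illustration; `DefaultClientConfig(appName)` and `Database(name)` are the pieces named in this plan.

```go
package main

import (
	"log"

	"github.com/DeBrosOfficial/network/pkg/client" // import path assumed
)

func main() {
	// The app name drives namespacing: "orders" becomes "my-app_orders".
	cfg := client.DefaultClientConfig("my-app")

	cli, err := client.New(cfg) // constructor name is an assumption
	if err != nil {
		log.Fatal(err)
	}

	// Database polls with backoff until the cluster for this database is ready.
	db, err := cli.Database("orders") // return shape assumed for illustration
	if err != nil {
		log.Fatal(err)
	}
	_ = db // run queries against db as usual
}
```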
### Phase 3: Hibernation & Wake-Up

- Idle detection and coordination:
  - Track `LastQuery` per instance; periodic scan; an all-nodes-idle quorum triggers a coordinated shutdown schedule.
- Hibernation protocol:
  - Broadcast idle notices; the coordinator schedules `DATABASE_SHUTDOWN_COORDINATED`; graceful SIGTERM; ports freed; status moves to `hibernating`.
- Wake-up protocol:
  - A client detects `hibernating`, performs a CAS to `waking`, and triggers a wake request; ports are reused if still available, otherwise re-negotiated; instances start; status moves to `active`.
- Client retry UX (see the sketch after this list):
  - Transparent retries with exponential backoff; treat `waking` as a wait-only state.
- Tests: hibernation under load; thundering herd; resource verification and persistence across cycles.

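On the client side, the `hibernating` and `waking` states could be folded into one retry loop with exponential backoff, roughly as below. The function signature is a stand-in: statuses would come from the metadata layer and the CAS to `waking` happens on the coordination path, not here.

```go
package client

import (
	"context"
	"fmt"
	"time"
)

// waitForActive polls the database status with exponential backoff. A
// "hibernating" database triggers one wake request; "waking" is wait-only.
func waitForActive(ctx context.Context, status func() (string, error), wake func() error) error {
	backoff := 200 * time.Millisecond
	for {
		st, err := status()
		if err != nil {
			return err
		}
		switch st {
		case "active":
			return nil
		case "hibernating":
			if err := wake(); err != nil {
				return err
			}
		case "waking":
			// another caller already triggered wake-up; just wait and poll again
		default:
			return fmt.Errorf("unexpected database status %q", st)
		}
		select {
		case <-ctx.Done():
			return ctx.Err()
		case <-time.After(backoff):
		}
		if backoff < 5*time.Second {
			backoff *= 2
		}
	}
}
```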
### Phase 4: Resilience (Failure & Replacement)

- Continuous health checks with timeouts → mark the node unhealthy.
- Replacement orchestration:
  - The coordinator initiates `NODE_REPLACEMENT_NEEDED`, eligible nodes respond, the selection is confirmed, and the new node joins the raft cluster via `-join`, then syncs.
- Startup reconciliation:
  - Detect and clean up orphaned or non-member local data directories.
- Rate limiting replacements to prevent cascades; prioritize by usage metrics (see the sketch after this list).
- Tests: forced crashes, partitions, replacement within the target SLO; reconciliation sanity checks.

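One simple way to rate limit replacements is a rolling window on the coordinator: refuse to start a new replacement if too many were started recently. A minimal sketch under that assumption; the type and its parameters are illustrative.

```go
package dbcluster

import (
	"sync"
	"time"
)

// replacementLimiter allows at most maxPerWindow replacement starts per window,
// so a single bad event cannot cascade into cluster-wide churn.
type replacementLimiter struct {
	mu           sync.Mutex
	window       time.Duration
	maxPerWindow int
	starts       []time.Time
}

// Allow records a replacement start if the rolling window still has headroom.
func (l *replacementLimiter) Allow(now time.Time) bool {
	l.mu.Lock()
	defer l.mu.Unlock()
	cutoff := now.Add(-l.window)
	kept := l.starts[:0]
	for _, t := range l.starts {
		if t.After(cutoff) {
			kept = append(kept, t)
		}
	}
	l.starts = kept
	if len(l.starts) >= l.maxPerWindow {
		return false
	}
	l.starts = append(l.starts, now)
	return true
}
```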
### Phase 5: Production Hardening & Optimization

- Metrics/logging:
  - Structured logs with trace IDs; counters for queries/min, hibernations, wake-ups, and replacements; health and capacity gauges.
- Config validation, replication factor settings (1, 3, 5), and debugging APIs (read-only metadata dump, node status).
- Client metadata caching and query-routing improvements (simple round-robin first, latency-aware later).
- Performance benchmarks and operator-facing docs.
### File Changes (Essentials)

- `pkg/config/config.go`
  - Remove (deprecate, then delete): `Database.DataDir`, `RQLitePort`, `RQLiteRaftPort`, `RQLiteJoinAddress`.
  - Add: `ReplicationFactor int`, `HibernationTimeout time.Duration`, `MaxDatabases int`, `PortRange {HTTPStart, HTTPEnd, RaftStart, RaftEnd int}`, `Discovery.HealthCheckInterval` (see the sketch after this list).
- `pkg/client/interface.go` / `pkg/client/client.go`
  - Add `Database(name string)` and the app-namespace requirement (`DefaultClientConfig(appName)`); backoff polling.
- `pkg/node/node.go`
  - Wire in `metadata.Manager` and `dbcluster.ClusterManager`; remove direct rqlite singleton usage.
- `pkg/rqlite/*`
  - Refactor from the singleton to instance-oriented helpers.
- New packages under `pkg/metadata` and `pkg/dbcluster` as listed above.
- `configs/node.yaml` and validation paths to reflect the new `database` block.

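The new `database` fields in `pkg/config/config.go` could take roughly this shape, mirroring the YAML example below; the struct names, YAML tags, and comments are illustrative.

```go
package config

import "time"

// DatabaseConfig replaces the legacy single-cluster database fields.
type DatabaseConfig struct {
	ReplicationFactor  int           `yaml:"replication_factor"` // 1, 3, or 5
	HibernationTimeout time.Duration `yaml:"hibernation_timeout"`
	MaxDatabases       int           `yaml:"max_databases"`
	PortRange          PortRange     `yaml:"port_range"`
}

// PortRange bounds the HTTP and Raft ports handed out per database instance.
type PortRange struct {
	HTTPStart int `yaml:"http_start"`
	HTTPEnd   int `yaml:"http_end"`
	RaftStart int `yaml:"raft_start"`
	RaftEnd   int `yaml:"raft_end"`
}
```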
### Config Example (target end-state)

```yaml
node:
  data_dir: "./data"

database:
  replication_factor: 3
  hibernation_timeout: 60
  max_databases: 100
  port_range:
    http_start: 5001
    http_end: 5999
    raft_start: 7001
    raft_end: 7999

discovery:
  health_check_interval: 10s
```
### Rollout Strategy

- Keep the feature flag off by default; support the legacy single-cluster path.
- Ship Phase 1 behind the flag; enable it in dev/e2e only.
- Incrementally enable creation (Phase 2), then hibernation (Phase 3), per environment.
- Remove the legacy config after a deprecation window.
### Testing & Quality Gates

- Unit tests: metadata ops, consensus, ports, subprocess, manager state machine.
- Integration tests under `e2e/` for creation, isolation, hibernation, failure handling, and partitions.
- Benchmarks with targets: creation <10s, wake-up <8s, metadata sync <5s, query overhead <10ms.
- Chaos suite for randomized failures and partitions.
### Risks & Mitigations (operationalized)

- Metadata divergence → vector clocks + periodic checksums + majority-read checks in the client.
- Raft churn → adaptive timeouts; allow an `always_on` flag per database (future).
- Cascading replacements → global rate limiter and prioritization.
- Debuggability → verbose structured logging and metadata dump endpoints.
### Timeline (indicative)

- Weeks 1-2: Phases 0-1
- Weeks 3-4: Phase 2
- Weeks 5-6: Phase 3
- Weeks 7-8: Phase 4
- Weeks 9-10+: Phase 5
### To-dos

- [ ] Add feature flag, scaffold packages, CI jobs, rqlite binary checks
- [ ] Extend `pkg/config/config.go` and YAML schemas; deprecate legacy fields
- [ ] Implement metadata types and thread-safe store with versioning
- [ ] Implement pubsub messages and handlers using the existing pubsub manager
- [ ] Implement coordinator election, vector clocks, gossip reconciliation
- [ ] Implement `PortManager` with bind-probing and allocation
- [ ] Implement rqlite subprocess control and readiness checks
- [ ] Implement `ClusterManager` and creation lifecycle orchestration
- [ ] Add `Database(name)` and app namespacing to the client; backoff polling
- [ ] Adopt per-database data dirs under the node `data_dir`
- [ ] Integration tests for creation and isolation across nodes
- [ ] Idle detection, coordinated shutdown, status updates
- [ ] Wake-up CAS to `waking`, port reuse/renegotiation, restart
- [ ] Client transparent retry/backoff for hibernation and waking
- [ ] Health checks, replacement orchestration, rate limiting
- [ ] Implement orphaned data reconciliation on startup
- [ ] Add metrics and structured logging across managers
- [ ] Benchmarks for creation, wake-up, sync, query overhead
- [ ] Operator and developer docs; config and migration guides