Dynamic Database Clustering - User Guide

Overview

Dynamic Database Clustering enables on-demand creation of isolated, replicated rqlite database clusters with automatic resource management through hibernation. Each database runs as a separate 3-node cluster with its own data directory and port allocation.

Key Features

Multi-Database Support - Create isolated databases on-demand, bounded only by configured capacity (max_databases and port ranges)
3-Node Replication - Fault-tolerant by default (configurable)
Auto Hibernation - Idle databases hibernate to save resources
Transparent Wake-Up - Automatic restart on access
App Namespacing - Databases are scoped by application name
Decentralized Metadata - LibP2P pubsub-based coordination
Failure Recovery - Automatic node replacement on failures
Resource Optimization - Dynamic port allocation and data isolation

Configuration

Node Configuration (configs/node.yaml)

node:
  data_dir: "./data"
  listen_addresses:
    - "/ip4/0.0.0.0/tcp/4001"
  max_connections: 50

database:
  replication_factor: 3           # Number of replicas per database
  hibernation_timeout: 60s        # Idle time before hibernation
  max_databases: 100              # Max databases per node
  port_range_http_start: 5001     # HTTP port range start
  port_range_http_end: 5999       # HTTP port range end
  port_range_raft_start: 7001     # Raft port range start
  port_range_raft_end: 7999       # Raft port range end

discovery:
  bootstrap_peers:
    - "/ip4/127.0.0.1/tcp/4001/p2p/..."
  discovery_interval: 30s
  health_check_interval: 10s

Key Configuration Options

database.replication_factor (default: 3)

Number of nodes that will host each database cluster. Minimum 1, recommended 3 for fault tolerance.

database.hibernation_timeout (default: 60s)

Time of inactivity before a database is hibernated. Set to 0 to disable hibernation.

database.max_databases (default: 100)

Maximum number of databases this node can host simultaneously.

database.port_range_*

Port ranges for dynamic allocation. Each database consumes one HTTP port and one Raft port, so the two ranges together must cover at least max_databases * 2 ports (at least max_databases ports per range).

Client Usage

Creating/Accessing Databases

package main

import (
    "context"
    "github.com/DeBrosOfficial/network/pkg/client"
)

func main() {
    // Create client with app name for namespacing
    cfg := client.DefaultClientConfig("myapp")
    cfg.BootstrapPeers = []string{
        "/ip4/127.0.0.1/tcp/4001/p2p/...",
    }
    
    c, err := client.NewClient(cfg)
    if err != nil {
        panic(err)
    }
    
    // Connect to network
    if err := c.Connect(); err != nil {
        panic(err)
    }
    defer c.Disconnect()
    
    // Get database client (creates database if it doesn't exist)
    db, err := c.Database().Database("users")
    if err != nil {
        panic(err)
    }
    
    // Use the database
    ctx := context.Background()
    err = db.CreateTable(ctx, `
        CREATE TABLE users (
            id INTEGER PRIMARY KEY,
            name TEXT NOT NULL,
            email TEXT UNIQUE
        )
    `)
    if err != nil {
        panic(err)
    }

    // Query data
    result, err := db.Query(ctx, "SELECT * FROM users")
    if err != nil {
        panic(err)
    }
    _ = result // process result rows...
}

Database Naming

Databases are automatically namespaced by your application name:

  • client.Database("users") → creates myapp_users internally
  • This prevents name collisions between different applications
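The namespacing rule above can be sketched as follows (the underscore separator matches the myapp_users example; the helper name is hypothetical):

```go
package main

import "fmt"

// namespacedName prefixes a database name with the application name,
// matching the "users" -> "myapp_users" example above.
func namespacedName(app, db string) string {
	return fmt.Sprintf("%s_%s", app, db)
}

func main() {
	fmt.Println(namespacedName("myapp", "users")) // myapp_users
}
```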

Gateway API Usage

If you prefer HTTP/REST API access instead of the Go client, you can use the gateway endpoints:

Base URL

http://gateway-host:8080/v1/database/

Execute SQL (INSERT, UPDATE, DELETE, DDL)

POST /v1/database/exec
Content-Type: application/json

{
  "database": "users",
  "sql": "INSERT INTO users (name, email) VALUES (?, ?)",
  "args": ["Alice", "alice@example.com"]
}

Response:
{
  "rows_affected": 1,
  "last_insert_id": 1
}

Query Data (SELECT)

POST /v1/database/query
Content-Type: application/json

{
  "database": "users",
  "sql": "SELECT * FROM users WHERE name LIKE ?",
  "args": ["A%"]
}

Response:
{
  "items": [
    {"id": 1, "name": "Alice", "email": "alice@example.com"}
  ],
  "count": 1
}

Execute Transaction

POST /v1/database/transaction
Content-Type: application/json

{
  "database": "users",
  "queries": [
    "INSERT INTO users (name, email) VALUES ('Bob', 'bob@example.com')",
    "UPDATE users SET email = 'alice.new@example.com' WHERE name = 'Alice'"
  ]
}

Response:
{
  "success": true
}

Get Schema

GET /v1/database/schema?database=users

# OR

POST /v1/database/schema
Content-Type: application/json

{
  "database": "users"
}

Response:
{
  "tables": [
    {
      "name": "users",
      "columns": ["id", "name", "email"]
    }
  ]
}

Create Table

POST /v1/database/create-table
Content-Type: application/json

{
  "database": "users",
  "schema": "CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, email TEXT)"
}

Response:
{
  "rows_affected": 0
}

Drop Table

POST /v1/database/drop-table
Content-Type: application/json

{
  "database": "users",
  "table_name": "old_table"
}

Response:
{
  "rows_affected": 0
}

List Databases

GET /v1/database/list

Response:
{
  "databases": ["users", "products", "orders"]
}

Important Notes

  1. Authentication Required: All endpoints require authentication (JWT or API key)
  2. Database Creation: Databases are created automatically on first access
  3. Hibernation: The gateway handles hibernation/wake-up transparently - you may experience a delay (< 8s) on first query to a hibernating database
  4. Timeouts: Query timeout is 30s, transaction timeout is 60s
  5. Namespacing: Database names are automatically prefixed with your app name
  6. Concurrent Access: All endpoints are safe for concurrent use
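The query endpoint above can also be called from Go without the client library. This is a minimal sketch: the request/response shapes come from the examples above, but the `Authorization: Bearer` header format is an assumption based on the JWT requirement, and `newQueryRequest` is a hypothetical helper.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// queryRequest mirrors the /v1/database/query payload shown above.
type queryRequest struct {
	Database string `json:"database"`
	SQL      string `json:"sql"`
	Args     []any  `json:"args"`
}

// newQueryRequest builds an authenticated query request. The bearer-token
// header format is an assumption; adjust to your gateway's auth scheme.
func newQueryRequest(base, token string, q queryRequest) (*http.Request, error) {
	body, err := json.Marshal(q)
	if err != nil {
		return nil, err
	}
	req, err := http.NewRequest(http.MethodPost, base+"/v1/database/query", bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/json")
	req.Header.Set("Authorization", "Bearer "+token)
	return req, nil
}

func main() {
	req, err := newQueryRequest("http://gateway-host:8080", "my-token", queryRequest{
		Database: "users",
		SQL:      "SELECT * FROM users WHERE name LIKE ?",
		Args:     []any{"A%"},
	})
	if err != nil {
		panic(err)
	}
	fmt.Println(req.Method, req.URL.Path) // POST /v1/database/query
	// Send with http.DefaultClient.Do(req) against a live gateway.
}
```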

Database Lifecycle

1. Creation

When you first access a database:

  1. Request Broadcast - Node broadcasts DATABASE_CREATE_REQUEST
  2. Node Selection - Eligible nodes respond with available ports
  3. Coordinator Selection - Deterministic coordinator (lowest peer ID) chosen
  4. Confirmation - Coordinator selects nodes and broadcasts DATABASE_CREATE_CONFIRM
  5. Instance Startup - Selected nodes start rqlite subprocesses
  6. Readiness - Nodes report active status when ready

Typical creation time: < 10 seconds

2. Active State

  • Database instances run as rqlite subprocesses
  • Each instance tracks LastQuery timestamp
  • Queries update the activity timestamp
  • Metadata synced across all network nodes

3. Hibernation

After hibernation_timeout of inactivity:

  1. Idle Detection - Nodes detect idle databases
  2. Idle Notification - Nodes broadcast idle status
  3. Coordinated Shutdown - When all nodes report idle, coordinator schedules shutdown
  4. Graceful Stop - SIGTERM sent to rqlite processes
  5. Port Release - Ports freed for reuse
  6. Status Update - Metadata updated to hibernating

Data persists on disk during hibernation

4. Wake-Up

On first query to hibernating database:

  1. Detection - Client/node detects hibernating status
  2. Wake Request - Broadcast DATABASE_WAKEUP_REQUEST
  3. Port Allocation - Reuse original ports or allocate new ones
  4. Instance Restart - Restart rqlite with existing data
  5. Status Update - Update to active when ready

Typical wake-up time: < 8 seconds

5. Failure Recovery

When a node fails:

  1. Health Detection - Missed health checks trigger failure detection
  2. Replacement Request - Surviving nodes broadcast NODE_REPLACEMENT_NEEDED
  3. Offers - Healthy nodes with capacity offer to replace
  4. Selection - First offer accepted (simple approach)
  5. Join Cluster - New node joins existing Raft cluster
  6. Sync - Data synced from existing members

Data Management

Data Directories

Each database gets its own data directory:

./data/
  ├── myapp_users/        # Database: users
  │   └── rqlite/
  │       ├── db.sqlite
  │       └── raft/
  ├── myapp_products/     # Database: products
  │   └── rqlite/
  └── myapp_orders/       # Database: orders
      └── rqlite/

Orphaned Data Cleanup

On node startup, the system automatically:

  • Scans data directories
  • Checks against metadata
  • Removes directories for:
    • Non-existent databases
    • Databases where this node is not a member

Monitoring & Debugging

Structured Logging

All operations are logged with structured fields:

INFO  Starting cluster manager node_id=12D3... max_databases=100
INFO  Received database create request database=myapp_users requester=12D3...
INFO  Database instance started database=myapp_users http_port=5001 raft_port=7001
INFO  Database is idle database=myapp_users idle_time=62s
INFO  Database hibernated successfully database=myapp_users
INFO  Received wakeup request database=myapp_users
INFO  Database woke up successfully database=myapp_users

Health Checks

Nodes perform periodic health checks:

  • Every health_check_interval (default: 10s)
  • Tracks last-seen time for each peer
  • 3 missed checks → node marked unhealthy
  • Triggers replacement protocol for affected databases

Best Practices

1. Capacity Planning

# For 100 databases with 3-node replication:
database:
  max_databases: 100
  port_range_http_start: 5001
  port_range_http_end: 5200    # 200 ports (one HTTP port per database, plus headroom)
  port_range_raft_start: 7001
  port_range_raft_end: 7200
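A startup sanity check for the sizing above could look like this (a sketch under the assumption that each database takes exactly one port from each range; the function name is hypothetical):

```go
package main

import "fmt"

// validatePortRanges checks that each configured range holds at least one
// port per database, since every database needs one HTTP and one Raft port.
func validatePortRanges(maxDatabases, httpStart, httpEnd, raftStart, raftEnd int) error {
	if httpEnd-httpStart+1 < maxDatabases {
		return fmt.Errorf("HTTP range too small: %d ports for %d databases", httpEnd-httpStart+1, maxDatabases)
	}
	if raftEnd-raftStart+1 < maxDatabases {
		return fmt.Errorf("Raft range too small: %d ports for %d databases", raftEnd-raftStart+1, maxDatabases)
	}
	return nil
}

func main() {
	// The capacity-planning example above: 200 ports per range, 100 databases.
	fmt.Println(validatePortRanges(100, 5001, 5200, 7001, 7200)) // <nil>
}
```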

2. Hibernation Tuning

  • High Traffic: Set hibernation_timeout: 300s or higher
  • Development: Set hibernation_timeout: 30s for faster cycles
  • Always-On DBs: Set hibernation_timeout: 0 to disable

3. Replication Factor

  • Development: replication_factor: 1 (single node, no replication)
  • Production: replication_factor: 3 (fault tolerant)
  • High Availability: replication_factor: 5 (survives 2 failures)

4. Network Topology

  • Use at least 3 nodes for replication_factor: 3
  • Ensure max_databases * replication_factor <= total_cluster_capacity
  • Example: 3 nodes × 100 max_databases = 300 instance slots, enough for 100 databases at replication_factor: 3

Troubleshooting

Database Creation Fails

Problem: insufficient nodes responded: got 1, need 3

Solution:

  • Ensure you have at least replication_factor nodes online
  • Check max_databases limit on nodes
  • Verify port ranges aren't exhausted

Database Not Waking Up

Problem: Database stays in waking status

Solution:

  • Check node logs for rqlite startup errors
  • Verify rqlite binary is installed
  • Check port conflicts (use different port ranges)
  • Ensure data directory is accessible

Orphaned Data

Problem: Disk space consumed by old databases

Solution:

  • Orphaned data is automatically cleaned on node restart
  • Manual cleanup: Delete directories from ./data/ that don't match metadata
  • Check logs for reconciliation results

Node Replacement Not Working

Problem: Failed node not replaced

Solution:

  • Ensure remaining nodes have capacity (CurrentDatabases < MaxDatabases)
  • Check network connectivity between nodes
  • Verify health check interval is reasonable (not too aggressive)

Advanced Topics

Metadata Consistency

  • Vector Clocks: Each metadata update includes vector clock for conflict resolution
  • Gossip Protocol: Periodic metadata sync via checksums
  • Eventual Consistency: All nodes eventually agree on database state
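Vector-clock reconciliation typically merges two versions by taking the element-wise maximum of their counters. A sketch of that standard operation; the actual clock representation in the codebase may differ:

```go
package main

import "fmt"

// merge combines two vector clocks by taking the element-wise maximum,
// the usual rule for reconciling concurrent metadata updates.
func merge(a, b map[string]uint64) map[string]uint64 {
	out := make(map[string]uint64)
	for k, v := range a {
		out[k] = v
	}
	for k, v := range b {
		if v > out[k] {
			out[k] = v
		}
	}
	return out
}

func main() {
	a := map[string]uint64{"nodeA": 3, "nodeB": 1}
	b := map[string]uint64{"nodeA": 2, "nodeB": 4}
	fmt.Println(merge(a, b)) // map[nodeA:3 nodeB:4]
}
```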

Port Management

  • Ports allocated randomly within configured ranges
  • Bind-probing ensures ports are actually available
  • Ports reused during wake-up when possible
  • Failed allocations fall back to new random ports
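The random-pick-then-bind-probe strategy above can be sketched like this (illustrative names; the real allocator also has to avoid ports already handed to other local instances):

```go
package main

import (
	"fmt"
	"math/rand"
	"net"
)

// allocatePort picks a random port in [start, end] and bind-probes it to
// confirm it is actually free, falling back to a new random port on failure.
func allocatePort(start, end, attempts int) (int, error) {
	for i := 0; i < attempts; i++ {
		port := start + rand.Intn(end-start+1)
		l, err := net.Listen("tcp", fmt.Sprintf("127.0.0.1:%d", port))
		if err != nil {
			continue // port in use; try another random port
		}
		l.Close()
		return port, nil
	}
	return 0, fmt.Errorf("no free port found in range %d-%d", start, end)
}

func main() {
	port, err := allocatePort(5001, 5999, 20)
	if err != nil {
		panic(err)
	}
	fmt.Println(port >= 5001 && port <= 5999) // true
}
```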

Coordinator Election

  • Deterministic selection based on lexicographical peer ID ordering
  • Lowest peer ID becomes coordinator
  • No persistent coordinator state
  • Re-election occurs for each database operation
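The lowest-peer-ID rule above can be sketched in a few lines; because every node sorts the same peer list, they all arrive at the same coordinator without exchanging election messages:

```go
package main

import (
	"fmt"
	"sort"
)

// electCoordinator returns the lexicographically lowest peer ID,
// matching the deterministic selection rule described above.
func electCoordinator(peers []string) string {
	sorted := append([]string(nil), peers...)
	sort.Strings(sorted)
	return sorted[0]
}

func main() {
	peers := []string{"12D3KooWC", "12D3KooWA", "12D3KooWB"}
	fmt.Println(electCoordinator(peers)) // 12D3KooWA
}
```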

Migration from Legacy Mode

If upgrading from single-cluster rqlite:

  1. Backup Data: Backup your existing ./data/rqlite directory
  2. Update Config: Remove deprecated fields:
    • database.data_dir
    • database.rqlite_port
    • database.rqlite_raft_port
    • database.rqlite_join_address
  3. Add New Fields: Configure dynamic clustering (see Configuration section)
  4. Restart Nodes: Restart all nodes with new configuration
  5. Migrate Data: Create new database and import data from backup

Future Enhancements

The following features are planned for future releases:

Advanced Metrics (Future)

  • Prometheus-style metrics export
  • Per-database query counters
  • Hibernation/wake-up latency histograms
  • Resource utilization gauges

Performance Benchmarks (Future)

  • Automated benchmark suite
  • Creation time SLOs
  • Wake-up latency targets
  • Query overhead measurements

Enhanced Monitoring (Future)

  • Dashboard for cluster visualization
  • Database status API endpoint
  • Capacity planning tools
  • Alerting integration

Support

For issues, questions, or contributions:

License

See LICENSE file for details.