GPU on Demand POC: When Architecture Meets Reality

December 21, 2025

The Gap Between Vision and Version 1

In my last post, I wrote about architecting an AI orchestrator with grand visions of auto-scaling controllers, queue-based request routing, and multi-provider GPU orchestration. The architecture looked beautiful in ASCII diagrams.

Then I sat down to actually build it.

This is the story of what happened when theory met practice - the decisions we made, the complexity we cut, and why our actual POC looks nothing like the original architecture (and that's totally fine).

TL;DR - What Changed?

Original Vision:

K8s Cluster → API Gateway → Request Queue → Auto-Scaler Controller
→ SSH Tunnel Pod → vast.ai GPU → vLLM

Actual POC:

Client → Trigger API (K8s) → vast.ai API → GPU Instance
Client → Direct HTTPS → GPU Instance vLLM ✅

Key Simplifications:

  • ❌ No request queue (Redis/RabbitMQ)
  • ❌ No auto-scaler controller monitoring queue depth
  • ❌ No SSH tunnel complexity
  • ✅ Simple trigger API with API key auth
  • ✅ Direct client access to GPU instances
  • ✅ Hetzner S3 for model storage (specific, not generic)
  • ✅ Prometheus remote write (detailed implementation)

Let's dive into why we made these decisions.


Decision 1: Direct Access vs. Proxy Pattern

Original Plan: SSH Tunnel from K8s

The initial architecture had this whole SSH tunnel setup:

# What I thought I needed
apiVersion: v1
kind: Pod
metadata:
  name: vastai-tunnel
spec:
  containers:
  - name: ssh-tunnel
    image: <image-with-ssh-client>   # required field; any small image bundling an ssh client
    command: ["/bin/sh", "-c"]
    args:
      - ssh -L 0.0.0.0:8000:localhost:8000 root@vastai-host -N

The reasoning was solid:

  • All traffic routes through K8s (easier monitoring)
  • Centralized access control
  • Clients never see vast.ai IPs directly

What We Chose: Direct Client → GPU Access

# Actual POC flow
curl https://123.45.67.89:8000/v1/completions \
  -H "Authorization: Bearer ${VASTAI_TOKEN}" \
  -d '{"model": "llama-3.1-8b", "prompt": "..."}'

Why we simplified:

  1. SSH tunnels add latency - Every request goes K8s → SSH → GPU instead of Client → GPU
  2. Operational complexity - Managing SSH keys, tunnel health checks, reconnection logic
  3. POC goal clarity - We're validating GPU provisioning, not building a production proxy
  4. vast.ai already has HTTPS - The instances support direct HTTPS access with bearer tokens

The hard truth: We were solving problems we don't have yet. For a POC validating "$0.35/hr GPU instances can run vLLM inference," the SSH tunnel is architectural gold-plating.

When we'll add it back: Production migration to RunPod, when we need:

  • Centralized rate limiting
  • Request logging/audit trails
  • Client IP hiding
  • Load balancing across multiple GPUs

Decision 2: Trigger API vs. Queue-Based Auto-Scaling

Original Plan: Queue Depth Monitoring

The vision was beautiful:

# Imagined auto-scaler logic
while True:
    queue_depth = redis.llen('inference_queue')

    if queue_depth > 10 and active_gpus == 0:
        provision_gpu_instance()
    elif queue_depth == 0 and gpu_idle_time > 900:  # 15min
        terminate_gpu_instance()

    time.sleep(30)

What this requires:

  • Redis/RabbitMQ for request queue
  • Background worker monitoring queue depth
  • State management for active instances
  • Request routing logic (which GPU gets which request?)
  • Graceful shutdown handling (in-flight requests)

What We Built: On-Demand Trigger API

# FastAPI endpoint - actual POC
@app.post("/provision")
async def provision_gpu(request: ProvisionRequest):
    # Search vast.ai for available instance
    offers = vastai.search_offers(
        gpu_name="RTX 4090",
        max_price=0.50
    )

    # Create instance
    instance = vastai.create_instance(
        offer_id=offers[0].id,
        image="vllm/vllm-openai:latest"
    )

    # Wait for SSH, bootstrap vLLM
    await bootstrap_instance(instance.id)

    return {
        "endpoint": f"https://{instance.ip}:8000",
        "id": instance.id
    }
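The endpoint above is only half the story: the "simple trigger API with API key auth" from the TL;DR needs the auth part too. Here's a minimal sketch of how that check could look as a FastAPI dependency - the header name matches the X-API-Key used in the test call later in this post, and the ORCHESTRATOR_API_KEY env var name is illustrative:

import os

from fastapi import Depends, FastAPI, HTTPException, Security
from fastapi.security import APIKeyHeader
from pydantic import BaseModel

app = FastAPI()
api_key_header = APIKeyHeader(name="X-API-Key")

class ProvisionRequest(BaseModel):
    model: str
    gpu_type: str = "rtx4090"

def require_api_key(key: str = Security(api_key_header)) -> None:
    # The expected key is mounted from a K8s Secret as an env var
    # (ORCHESTRATOR_API_KEY is an illustrative name).
    if key != os.environ.get("ORCHESTRATOR_API_KEY"):
        raise HTTPException(status_code=401, detail="invalid API key")

@app.post("/provision", dependencies=[Depends(require_api_key)])
async def provision_gpu(request: ProvisionRequest):
    ...  # same search/create/bootstrap flow as above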

Why we simplified:

  1. POC usage pattern - We're manually testing, not handling production traffic
  2. Complexity explosion - Queue-based scaling is 5x the code for 0x the POC value
  3. Vast.ai reality - Instance startup takes 60-120 seconds anyway, queue won't help cold starts
  4. Iteration speed - We can test provisioning in 2 minutes vs. 2 weeks building queue infrastructure

The key insight: For a POC, "on-demand" can literally mean "I call an API when I want a GPU." We don't need to auto-scale from zero usage yet.

When we'll add queues: When we have:

  • Multiple concurrent users
  • Unpredictable request patterns
  • Need to batch requests for cost efficiency
  • SLA requirements for response time

Right now? We have none of that.


Decision 3: Hetzner S3 for Model Storage

Original Plan: "Model Registry & Cache"

The original architecture had a vague box labeled "Model Registry & Cache" with bullet points:

  • Model weights storage ✓
  • Version management ✓

Cool. How do we actually build that?

What We Specified: Hetzner Object Storage (S3)

# Actual implementation
# 1. Upload model to Hetzner S3 (one-time setup)
aws s3 cp models/llama-3.1-8b-instruct/ \
  s3://gpu-inference-models/llama-3.1-8b-instruct/ \
  --endpoint-url=https://fsn1.your-objectstorage.com \
  --recursive

# 2. Bootstrap script on GPU instance downloads model
aws s3 cp s3://gpu-inference-models/llama-3.1-8b-instruct \
  /models/llama-3.1-8b-instruct \
  --endpoint-url=$HETZNER_S3_ENDPOINT \
  --recursive

# 3. Start vLLM with downloaded model
python -m vllm.entrypoints.openai.api_server \
  --model /models/llama-3.1-8b-instruct \
  --port 8000
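Inside the orchestrator itself, the same transfer can be driven with boto3 instead of shelling out to the AWS CLI. A rough sketch, assuming credentials come from the standard AWS_* environment variables and the endpoint from HETZNER_S3_ENDPOINT (bucket and prefix names here are illustrative):

import os
from pathlib import Path

import boto3

def download_model(bucket: str, prefix: str, dest: str) -> None:
    """Mirror s3://<bucket>/<prefix>/ into dest/ via the S3-compatible endpoint.
    Access keys are picked up from AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY."""
    s3 = boto3.client("s3", endpoint_url=os.environ["HETZNER_S3_ENDPOINT"])
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        for obj in page.get("Contents", []):
            if obj["Key"].endswith("/"):
                continue  # skip directory-marker objects
            target = Path(dest) / Path(obj["Key"]).relative_to(prefix)
            target.parent.mkdir(parents=True, exist_ok=True)
            s3.download_file(bucket, obj["Key"], str(target))

download_model("gpu-inference-models", "llama-3.1-8b-instruct",
               "/models/llama-3.1-8b-instruct")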

Why Hetzner S3 specifically:

  1. Regional proximity - Hetzner's Falkenstein datacenter → vast.ai Europe instances = faster downloads
  2. Cost - €0.0049/GB storage + €0.01/GB egress (first 1TB free)
  3. S3-compatible API - Works with standard AWS CLI/boto3
  4. Already using Hetzner - Same provider, same authentication model
  5. Terraform integration - We can provision buckets alongside K8s resources

Cost comparison for 8B model (~16GB):

Provider                Storage      Egress (10x downloads/mo)   Total
Hetzner S3              €0.08/mo     €1.60                       €1.68/mo
AWS S3 (eu-central-1)   $0.18/mo     $9.00                       $9.18/mo
Cloudflare R2           $0.15/mo     $0.00                       $0.15/mo

Why not Cloudflare R2 (cheapest)? For a POC, staying in the Hetzner ecosystem is worth €1.53/mo to avoid managing another provider's credentials and networking.

Tradeoff we accepted: Vendor concentration. We're all-in on Hetzner for now (K8s + S3). That's fine for POC, we can migrate later if needed.


Decision 4: Full Automation from Day 1

Original Plan: Manual POC → Automation Later

The original roadmap was:

Phase 1: Manually provision vast.ai (web UI), SSH in, run commands
Phase 2: Build automation

What We're Actually Doing: Automate Everything Immediately

# Instance manager - from POC day 1
async def provision_and_bootstrap(model_name: str):
    """Fully automated GPU provisioning"""

    # 1. Find best instance
    instance = await find_best_vastai_instance(
        gpu_type="rtx4090",
        max_price=0.50
    )

    # 2. Create instance
    created = await vastai_api.create_instance(instance.id)

    # 3. Wait for SSH (with timeout)
    await wait_for_ssh(created.ip, timeout=300)

    # 4. Run bootstrap script via SSH
    await ssh_execute(created.ip, f"""
        # Download model from Hetzner S3
        aws s3 cp s3://models/{model_name} /models/{model_name} \
          --endpoint-url={S3_ENDPOINT} --recursive

        # Start vLLM
        docker run -d --gpus all -p 8000:8000 \
          vllm/vllm-openai:latest \
          --model /models/{model_name}

        # Start monitoring agents
        prometheus-node-exporter --web.listen-address=:9100 &
        nvidia-gpu-exporter --web.listen-address=:9101 &
    """)

    # 5. Health check vLLM endpoint
    await wait_for_healthy(f"https://{created.ip}:8000/health")

    # 6. Save to state store
    await state_store.save({
        "id": created.id,
        "endpoint": f"https://{created.ip}:8000",
        "cost_per_hour": instance.price,
        "created_at": datetime.utcnow()
    })

    return created
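The helpers referenced above (wait_for_ssh, ssh_execute, wait_for_healthy) are where most of the real work hides. As one example, here's a minimal polling sketch for wait_for_healthy using httpx; the timeouts and the TLS shortcut are POC assumptions:

import asyncio
import time

import httpx

async def wait_for_healthy(url: str, timeout: float = 300, interval: float = 5) -> None:
    """Poll the vLLM /health endpoint until it returns 200 or the timeout expires."""
    deadline = time.monotonic() + timeout
    # verify=False is a POC shortcut: the instance is addressed by raw IP,
    # so certificate verification would fail anyway (see Decision 1).
    async with httpx.AsyncClient(verify=False, timeout=10) as client:
        while True:
            try:
                if (await client.get(url)).status_code == 200:
                    return
            except httpx.HTTPError:
                pass  # endpoint not up yet
            if time.monotonic() > deadline:
                raise TimeoutError(f"{url} not healthy after {timeout}s")
            await asyncio.sleep(interval)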

Why we changed our minds:

  1. Manual testing is slow - 10+ manual steps per test, across 10-20 planned test runs, adds up to hours wasted
  2. Reproducibility - Automated bootstrapping ensures every instance is configured identically
  3. Metrics from day 1 - If we're automating anyway, we get monitoring for free
  4. POC ≠ Throwaway code - This orchestrator is the production foundation, just simplified

The key realization: Building the automation takes about a day upfront and saves roughly 30 minutes per test. Across the 10-20 test runs we're planning, that day pays for itself.

What we're NOT automating (yet):

  • Auto-scaling based on metrics (requires queue)
  • Multi-region failover (single region for POC)
  • Cost optimization algorithms (use first available instance)
  • A/B testing across GPU types

Decision 5: Prometheus Remote Write (Not Just "Monitoring")

Original Plan: "Add Monitoring Later"

The first architecture doc had:

  • ✅ Deploy vLLM
  • ✅ Test inference
  • 🔜 Add monitoring

What We Designed: Monitoring as a First-Class Component

# Prometheus config on GPU instance
remote_write:
  # The GPU instance runs outside the cluster, so the write target has to be an
  # externally reachable Prometheus endpoint (e.g. exposed via Ingress, with
  # --web.enable-remote-write-receiver turned on), not an in-cluster service name.
  - url: "https://prometheus.example.com/api/v1/write"
    basic_auth:
      username: "remote-write"
      password_file: /etc/prometheus/password

scrape_configs:
  - job_name: 'vllm'
    static_configs:
      - targets: ['localhost:8000']  # vLLM metrics endpoint
    scrape_interval: 15s

  - job_name: 'node'
    static_configs:
      - targets: ['localhost:9100']  # node_exporter

  - job_name: 'gpu'
    static_configs:
      - targets: ['localhost:9101']  # nvidia_gpu_exporter

Grafana dashboard queries:

# GPU utilization by instance
avg by (instance_id) (nvidia_gpu_utilization_percent)

# Cost per hour (custom metric)
sum(vast_instance_cost_per_hour) by (instance_id)

# Inference latency p95
histogram_quantile(0.95,
  rate(vllm_request_duration_seconds_bucket[5m])
)

# Idle time (for auto-teardown logic)
time() - vast_instance_last_request_timestamp_seconds
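The vast_* series in those queries don't come from any exporter - they're custom metrics the orchestrator has to publish itself. One way to do that (our assumption, not something vast.ai provides) is a small prometheus_client endpoint on the orchestrator pod; metric and label names follow the queries above:

from prometheus_client import Gauge, start_http_server

COST_PER_HOUR = Gauge(
    "vast_instance_cost_per_hour",
    "Hourly price of a running vast.ai instance (USD)",
    ["instance_id"],
)
LAST_REQUEST_TS = Gauge(
    "vast_instance_last_request_timestamp_seconds",
    "Unix timestamp of the last inference request routed to an instance",
    ["instance_id"],
)

# Expose /metrics from the orchestrator; Prometheus scrapes it like any other job.
start_http_server(9102)  # port is illustrative

def record_instance(instance_id: str, price_per_hour: float) -> None:
    COST_PER_HOUR.labels(instance_id=instance_id).set(price_per_hour)

def record_request(instance_id: str) -> None:
    LAST_REQUEST_TS.labels(instance_id=instance_id).set_to_current_time()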

Why we prioritized this:

  1. Cost tracking - Without metrics, we're flying blind on actual costs
  2. Performance validation - POC means measuring if vLLM is actually fast enough
  3. Idle detection - Auto-shutdown requires knowing when instance is idle
  4. Debugging - When things break (they will), we need visibility

The insight: Monitoring isn't a "later" feature, it's how we know if the POC succeeded.

Our POC success criteria are literally metric-based:

  • ✅ Provisioning time <3 minutes (measured by vast_instance_provisioning_duration_seconds)
  • ✅ Inference p95 latency <2s (measured by vllm_request_duration_seconds)
  • ✅ Cost <$0.50/hour (measured by vast_instance_cost_per_hour)

Without metrics, we can't validate the POC. So monitoring isn't optional.
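Concretely, that means the orchestrator has to emit these series itself. The provisioning-time criterion, for instance, could be captured by wrapping provision_and_bootstrap in a prometheus_client Histogram; a sketch with illustrative bucket boundaries:

import time

from prometheus_client import Histogram

PROVISIONING_SECONDS = Histogram(
    "vast_instance_provisioning_duration_seconds",
    "Time from provision request to a healthy vLLM endpoint",
    buckets=(30, 60, 90, 120, 180, 240, 300),
)

async def timed_provision(model_name: str):
    start = time.monotonic()
    instance = await provision_and_bootstrap(model_name)  # defined earlier
    PROVISIONING_SECONDS.observe(time.monotonic() - start)
    return instance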


What We Learned: The POC Philosophy Shift

Before Planning: "Let's Validate the Idea Quickly"

Thinking:

  • Manually provision vast.ai instance
  • SSH in, run vLLM manually
  • Test a few requests
  • Check if it works
  • Estimated time: 2-3 hours

Problem: This validates "can vLLM run on vast.ai?" but not "can we orchestrate vast.ai instances?"

After Planning: "POC = Production Foundation, Simplified"

Thinking:

  • Build the orchestrator (trigger API, instance manager)
  • Automate provisioning and bootstrapping
  • Integrate monitoring from day 1
  • Test the system, not just the GPU
  • Estimated time: 1-2 weeks

Value: This validates the architecture, not just the technology.

What we're building:

┌─────────────────────────────────────────────┐
│         Production Architecture             │
│  ┌─────────────────────────────────────┐   │
│  │  Queue-Based Auto-Scaler            │   │ ← Add later
│  │  Multi-Provider Failover            │   │ ← Add later
│  │  Advanced Cost Optimization         │   │ ← Add later
│  └─────────────────────────────────────┘   │
│                                              │
│  ┌─────────────────────────────────────┐   │
│  │  POC Foundation (Week 1-5)          │   │
│  │  ✓ Trigger API                      │   │ ← Building now
│  │  ✓ Instance Manager                 │   │ ← Building now
│  │  ✓ Automated Bootstrapping          │   │ ← Building now
│  │  ✓ Prometheus Integration           │   │ ← Building now
│  │  ✓ Hetzner S3 Model Storage         │   │ ← Building now
│  └─────────────────────────────────────┘   │
└─────────────────────────────────────────────┘

The shift: POC isn't a throwaway prototype. It's the foundation we'll iteratively add complexity to.


Hard Truths We Accepted

1. Direct Client Access is "Insecure"

Reality: Clients hit vast.ai IPs directly, no centralized gateway.

Why it's fine for POC:

  • We're the only users (no multi-tenancy)
  • vast.ai provides HTTPS + bearer token auth
  • We can add API gateway later when we have >1 user

When it becomes a problem: Production with external users needing rate limiting, audit logs, or IP whitelisting.

2. Vast.ai Instance Failures Will Happen

Reality: Marketplace provider, instance reliability varies.

Why it's fine for POC:

  • Manual retry is acceptable (we're testing, not serving production traffic)
  • Failures teach us what error handling to build
  • Instance selection algorithm will evolve based on real failure data

What we're tracking:

# Metrics to inform production failover logic
vast_instance_provision_failures_total
vast_instance_ssh_timeout_seconds
vast_instance_bootstrap_failure_reason

3. We're Ignoring Multi-Region/Multi-Provider

Reality: Single provider (vast.ai), single region (wherever we get an instance).

Why it's fine for POC:

  • Latency doesn't matter for testing
  • Multi-provider adds 3x complexity
  • We'll migrate to RunPod for production anyway

When we'll address it: RunPod migration (Phase 2), when we need:

  • Sub-3 second cold starts (RunPod serverless)
  • SLA guarantees (vast.ai has none)
  • SOC 2 compliance (production requirement)

4. No Request Batching/Optimization

Reality: One request = one API call to vLLM. No batching, no connection pooling, no smart queuing.

Why it's fine for POC:

  • vLLM handles batching internally (continuous batching)
  • Low request volume (manual testing)
  • Premature optimization is the root of all evil

When we'll optimize: When metrics show it's needed (e.g., request rate >10/sec, queue depth regularly >5).
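Because the batching happens server-side (point 1 above), the only thing the client needs to do is issue requests concurrently and let the engine pack them. A quick sanity check we can run during the POC looks roughly like this - the endpoint and token are placeholders from the /provision response:

import asyncio

import httpx

async def one_completion(client: httpx.AsyncClient, prompt: str) -> str:
    resp = await client.post(
        "/v1/completions",
        json={"model": "llama-3.1-8b", "prompt": prompt, "max_tokens": 64},
    )
    resp.raise_for_status()
    return resp.json()["choices"][0]["text"]

async def main() -> None:
    async with httpx.AsyncClient(
        base_url="https://123.45.67.89:8000",   # from the /provision response
        headers={"Authorization": "Bearer <token>"},
        verify=False,   # POC shortcut, see Decision 1
        timeout=120,
    ) as client:
        # 16 concurrent prompts; vLLM's continuous batching handles the packing.
        prompts = [f"Question {i}: ..." for i in range(16)]
        results = await asyncio.gather(*(one_completion(client, p) for p in prompts))
        print(len(results), "completions")

asyncio.run(main())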


The 5-Week Implementation Plan (Reality Check)

Week 1: Manual Validation

# Goal: Prove vast.ai + vLLM + S3 works at all

# 1. Create S3 bucket, upload test model
terraform apply -target=hetzner_s3_bucket.models

# 2. Manually provision vast.ai instance (web UI)
# 3. SSH in, download model from S3
# 4. Run vLLM manually
# 5. Test inference with curl

# Success metric: One successful inference request

Expected failures:

  • S3 authentication errors (wrong credentials format)
  • Model download timeout (large file, slow network)
  • vLLM OOM (picked wrong GPU size)

Week 2: Orchestrator Core

# Goal: API-triggered provisioning (no manual clicks)

# Build FastAPI trigger API
@app.post("/provision")
async def provision_gpu(request: ProvisionRequest):
    instance = await create_vastai_instance(request.gpu_type)
    return {"id": instance.id, "status": "provisioning"}

# Deploy to K8s
kubectl apply -f k8s/gpu-orchestrator.yaml

# Test
curl -X POST http://gpu-orchestrator/provision \
  -H "X-API-Key: ${API_KEY}" \
  -d '{"model": "llama-3.1-8b", "gpu_type": "rtx4090"}'

Expected failures:

  • vast.ai API rate limiting
  • Instance not found (all RTX 4090s taken)
  • Timeout waiting for SSH (instance provisioning slow)
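All three of these failures are transient, so the Week 2 code needs a retry wrapper from day one. A generic sketch around the offer search (find_best_vastai_instance is the helper from the provisioning code earlier; attempt counts and delays are illustrative):

import asyncio
import random

async def find_offer_with_retry(gpu_type: str, max_price: float, attempts: int = 5):
    """Retry the offer search with exponential backoff plus jitter.
    Covers transient failures: rate limiting, no offers currently listed."""
    for attempt in range(attempts):
        try:
            offer = await find_best_vastai_instance(gpu_type=gpu_type,
                                                    max_price=max_price)
            if offer is not None:
                return offer
        except Exception as exc:  # e.g. HTTP 429 from the vast.ai API
            print(f"offer search failed ({exc}), retrying")
        await asyncio.sleep(min(60, 2 ** attempt) + random.random())
    raise RuntimeError(f"no {gpu_type} offer found after {attempts} attempts")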

Week 3: Bootstrap Automation

# Goal: Instance auto-configures itself (no manual SSH)

# Bootstrap script (runs on instance startup)
#!/bin/bash
set -euo pipefail

# Download model
aws s3 cp s3://models/llama-3.1-8b /models/llama-3.1-8b \
  --endpoint-url=$S3_ENDPOINT --recursive

# Start vLLM
docker run -d --gpus all -p 8000:8000 \
  vllm/vllm-openai:latest \
  --model /models/llama-3.1-8b

# Health check
until curl -f http://localhost:8000/health; do sleep 5; done

Expected failures:

  • S3 credentials not passed correctly to instance
  • Docker not installed on base image
  • vLLM fails to load model (wrong format, corrupted download)
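The credentials failure is the one worth designing for up front: the orchestrator holds the S3 keys (from a K8s Secret), and the bootstrap script needs them without them ever landing in logs. One simple approach is to prepend shell-quoted exports to the script just before sending it over SSH; a sketch with illustrative variable names:

import os
import shlex

S3_ENV_VARS = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY", "HETZNER_S3_ENDPOINT")

def with_s3_env(bootstrap_script: str) -> str:
    """Prefix the bootstrap script with S3 credentials taken from the
    orchestrator's environment (populated from a K8s Secret).
    The combined script is sent over SSH and never written to logs."""
    exports = "\n".join(
        f"export {name}={shlex.quote(os.environ[name])}" for name in S3_ENV_VARS
    )
    return f"{exports}\n{bootstrap_script}"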

Week 4: Monitoring Integration

# Goal: Metrics flow to Hetzner Prometheus

# Deploy Prometheus with remote write enabled
# Configure GPU instances to scrape and push
# Build Grafana dashboard

# Queries we'll actually run
#   - GPU utilization over time
#   - Cost per inference request
#   - Instance idle time (for auto-shutdown logic)

Expected failures:

  • Remote write authentication issues
  • Firewall blocking Prometheus traffic
  • Missing GPU metrics (nvidia_gpu_exporter not installed)

Week 5: Idle Timeout & Cost Tracking

# Goal: Instance auto-terminates when idle

async def monitor_idle_instances():
    while True:
        instances = await state_store.get_all()

        for instance in instances:
            idle_time = time.time() - instance.last_request

            if idle_time > IDLE_TIMEOUT_SECONDS:
                await vastai_api.destroy_instance(instance.id)
                await state_store.delete(instance.id)

        await asyncio.sleep(60)

Expected failures:

  • Race condition (terminate instance during active request)
  • Cost calculation errors (timezone issues)
  • Metrics show idle but instance still processing
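The race condition is the nastiest of the three. The cheapest mitigation is to re-read the idle timestamp immediately before destroying the instance and add a grace period for in-flight requests; a sketch building on the loop above (it assumes the state store also exposes a get() for a single instance):

import time

GRACE_PERIOD_SECONDS = 120  # longest request we expect, plus margin

async def safe_terminate(instance) -> bool:
    """Destroy the instance only if it is still idle at the moment of teardown."""
    # Re-read the freshest state right before acting: a request may have
    # arrived while we were iterating over instances.
    latest = await state_store.get(instance.id)
    idle_for = time.time() - latest.last_request
    if idle_for < IDLE_TIMEOUT_SECONDS + GRACE_PERIOD_SECONDS:
        return False  # busy again, or not idle long enough - skip this round
    await vastai_api.destroy_instance(instance.id)
    await state_store.delete(instance.id)
    return True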

Comparing Costs: Theory vs Reality

Original Cost Estimate (from first blog post)

Scenario             Monthly Cost (RTX 4090 @ $0.34/hr)
24/7                 $248.20
On-demand (20h/mo)   $6.80

Assumptions:

  • Perfect utilization (no idle time)
  • Instant provisioning/teardown (no waste)
  • No failed instances (no retry costs)

POC Reality Budget

# What we'll actually measure

# 1. Development/testing costs
provision_testing = 20  # 20 test runs
avg_instance_lifetime = 0.5  # 30min per test
cost_per_test = 0.35 * 0.5  # $0.175
total_dev_cost = 20 * cost_per_test  # $3.50

# 2. Idle waste (instance ready but not processing)
model_load_time = 60  # seconds
avg_requests_per_session = 5
request_duration = 10  # seconds
session_duration = model_load_time + (avg_requests_per_session * request_duration)  # 110 seconds
idle_percentage = model_load_time / session_duration  # ~55% idle!

# 3. Failed provisioning attempts
failure_rate = 0.1  # 10% of provisions fail (SSH timeout, etc.)
retry_cost = failure_rate * cost_per_test  # $0.0175 per attempt

# Actual expected cost
total_poc_cost = (
    total_dev_cost +  # $3.50
    (idle_percentage * total_dev_cost) +  # $1.89 (idle waste)
    (20 * retry_cost)  # $0.35 (retries)
)  # ~$5.74

print(f"POC budget: ${total_poc_cost:.2f}")
# vs original estimate: $0.50 (manual POC)

The insight: Our automated POC will cost roughly 10x more than the original "manual validation" plan (~$5.76 vs $0.50), but it gives us 100x more value (reusable orchestration system vs one-off test).


What Didn't Change (The Core Principles)

Despite all the simplifications, these stayed constant:

1. vLLM for Inference ✅

Why we stuck with it:

  • Best throughput for LLM serving (continuous batching, PagedAttention)
  • OpenAI-compatible API (easy migration from OpenAI to self-hosted)
  • Active development, supports latest models

Alternatives we considered:

  • Ollama (easier setup, lower throughput)
  • TGI/Text Generation Inference (HuggingFace, good but less performant)
  • Custom FastAPI + transformers (too much work)

Decision: vLLM is the right choice for production, so POC should use it too.
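That OpenAI compatibility is also what keeps the client side trivial: the standard openai SDK can be pointed straight at a provisioned instance. A sketch, assuming the endpoint and token come from the /provision response (TLS verification caveats from Decision 1 still apply):

from openai import OpenAI

# base_url and api_key come from the orchestrator's /provision response
client = OpenAI(
    base_url="https://123.45.67.89:8000/v1",
    api_key="<vastai-bearer-token>",
)

completion = client.completions.create(
    model="llama-3.1-8b",
    prompt="Explain continuous batching in one sentence.",
    max_tokens=64,
)
print(completion.choices[0].text)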

2. Cost Optimization as Primary Goal ✅

Original thesis: On-demand GPU saves 90%+ vs always-on.

Still true: Our POC validates this by measuring:

  • Actual cost per inference request
  • Idle time percentage
  • Provisioning overhead cost

Even with POC inefficiencies (manual testing, frequent teardown/setup), we expect to validate the cost model.

3. Security Consciousness ✅

Original: Start with SOC 2/GDPR-compliant providers (RunPod, DataCrunch).

POC compromise: Use vast.ai (marketplace) but:

  • No sensitive data in POC (only test prompts)
  • Document security limitations
  • Plan migration to RunPod for production

What we didn't compromise on: encryption and credential handling. No unencrypted connections, API keys live in K8s Secrets, and S3 credentials are never logged.

4. Observability from Day 1 ✅

Original: Prometheus + Grafana integration.

POC: Same, but with specific implementation (remote write, custom metrics).

Why this matters: You can't optimize what you don't measure. Cost tracking, performance metrics, idle detection—all require metrics.


The Meta-Lesson: POC Design is About Tradeoffs

What Makes a Good POC?

Bad POC:

  • "Quick and dirty" manual tests
  • No reusable artifacts
  • Validates technology, not architecture
  • Throwaway code

Good POC:

  • Automated foundation for production
  • Validates architectural decisions
  • Includes monitoring/observability
  • Iteratively adds complexity

Our POC:

  • ✅ Reusable orchestrator (FastAPI + instance manager)
  • ✅ Automated bootstrapping (SSH scripts, S3 integration)
  • ✅ Monitoring integration (Prometheus remote write)
  • ✅ Cost tracking (custom metrics)
  • ❌ No queue-based auto-scaling (add in Week 6+)
  • ❌ No multi-provider failover (add in production migration)
  • ❌ No advanced cost optimization (add after measuring real usage)

The Tradeoff Matrix

Feature               POC Value                     Implementation Cost     Decision
Trigger API           High (core orchestration)     Low (1 day)             ✅ Include
Instance manager      High (automation)             Medium (2-3 days)       ✅ Include
S3 model storage      High (reproducibility)        Low (1 day)             ✅ Include
Prometheus metrics    High (validation criteria)    Medium (2 days)         ✅ Include
Queue-based scaling   Low (manual testing)          High (5+ days)          ❌ Defer
SSH tunnel proxy      Low (direct access works)     Medium (3 days)         ❌ Defer
Multi-provider        Low (vast.ai sufficient)      Very high (10+ days)    ❌ Defer

The principle: Include features that are either:

  1. Required for validation (metrics, automation)
  2. Cheap to build now, expensive later (S3 integration, bootstrapping)

Defer features that are:

  1. Not needed for POC (multi-tenancy, SLA guarantees)
  2. Can easily add later (queue, advanced scaling)

What We'll Write About Next

Upcoming Blog Posts

Week 2-3: "First GPU Provisioning"

  • What worked immediately
  • What failed spectacularly
  • Actual vast.ai API gotchas
  • Bootstrap script debugging war stories

Week 4-5: "POC Results: Metrics Don't Lie"

  • Real cost per inference request
  • Actual provisioning time distribution (p50, p95, p99)
  • GPU utilization patterns
  • Idle time waste analysis

Week 6+: "From POC to Production: RunPod Migration"

  • Why we're switching providers
  • Architecture changes for serverless endpoints
  • Cost comparison (vast.ai vs RunPod actual)
  • What we kept from the POC

The Code (What We're Actually Building)

All architecture docs and implementation code are open source:

Repository structure:

hetzner-cloud-minimal-kubernetes-cluster/
├── gpu-poc-draft.md              # Full architecture spec
├── terraform/
│   └── kubernetes/
│       └── gpu-orchestrator/
│           ├── main.tf           # K8s deployment
│           ├── orchestrator/     # FastAPI app
│           │   ├── api.py        # Trigger endpoints
│           │   ├── manager.py    # Instance lifecycle
│           │   └── vastai.py     # API client wrapper
│           └── monitoring/
│               └── dashboards/   # Grafana JSON
└── newsletter-blog/
    └── content/blog/
        ├── ai-orchestrator-architecture-gpu-poc/  # Original vision
        └── gpu-poc-reality-check/                 # This post



Conclusion: Architecture is a Journey

The difference between the original architecture diagram and our actual POC isn't a failure—it's learning.

What we learned:

  1. Start simple, add complexity later - Queue-based scaling can wait
  2. Automate what matters - Manual tests waste time, automation is reusable
  3. Measure everything - Metrics tell us what to optimize
  4. POC ≠ Throwaway - Build the foundation, not a prototype
  5. Tradeoffs are explicit - Every deferred feature is documented with "when we'll add it"

The original vision is still valid - we just found a better path to get there.

Instead of:

Plan everything → Build everything → Test everything

We're doing:

Plan foundation → Build foundation → Test →
Measure → Learn → Add complexity → Repeat

Week 1 starts Monday. Time to turn architecture diagrams into running code.

Wish us luck. We'll document every success and failure along the way.



Next post: "Week 1 Results: First GPU Provisioning" - What happens when theory meets vast.ai's reality.

The build begins.


Fractiunate AI

Building a cutting-edge AI framework on Hetzner Cloud Kubernetes infrastructure, with AI.

Follow the journey: Twitter · GitHub · LinkedIn · Website