Project Status Update - From Infrastructure to Microservices

December 20, 2025

Where We Are Today

What started as a minimal single-node Kubernetes cluster has evolved into a production-ready platform with microservices, comprehensive observability, and automated workflows. This update captures the current state of the project and the journey from bare infrastructure to a working application platform.

Infrastructure Foundation

The core infrastructure remains focused on simplicity and cost-effectiveness:

Terraform-Based Infrastructure as Code

Hetzner Cloud Server: CPX21 (2 vCPU, 4GB RAM) running Ubuntu 22.04
Kubernetes 1.30: Single-node cluster with kubeadm + containerd
Networking: Flannel CNI for pod networking
Automation: Cloud-init templates for reproducible deployments

The infrastructure layer is fully defined in Terraform modules:

terraform/
├── hetzner/minimal/           # Server provisioning
│   ├── main.tf               # Compute, network, firewall
│   └── templates/
│       └── kube-init.yaml    # Kubernetes bootstrap
└── kubernetes/
    ├── monitoring/           # Observability stack
    └── todo-backend/         # Microservice deployment

Recent Achievement: Memory Optimization

A key milestone was optimizing the monitoring stack's memory consumption, which is essential for running everything on a single 4 GB node.

31% Memory Reduction

Through careful tuning, we reduced the monitoring stack's memory footprint from ~1.3 GB to ~0.9 GB:

Component Optimizations:

Prometheus: 256Mi request / 512Mi limit, 7-day retention, 20Gi storage
Grafana: 128Mi request / 256Mi limit, 5Gi storage
Loki: 128Mi request / 256Mi limit, 10Gi storage, 7-day retention
AlertManager: 64Mi request / 128Mi limit, 2Gi storage
Node Exporter: 32Mi request / 64Mi limit
Kube State Metrics: 64Mi request / 128Mi limit
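As a quick sanity check on these numbers, the sketch below sums the memory settings, assuming the first value in each pair is the Kubernetes request and the second the limit. Promtail is not listed, so it is left out, and the helper names are illustrative.

```rust
/// Memory settings in MiB: (component, request, limit).
/// Assumption: in each "A/B" pair above, A is the request and B the limit.
const COMPONENTS: [(&str, u32, u32); 6] = [
    ("prometheus", 256, 512),
    ("grafana", 128, 256),
    ("loki", 128, 256),
    ("alertmanager", 64, 128),
    ("node-exporter", 32, 64),
    ("kube-state-metrics", 64, 128),
];

/// Returns (total requests, total limits) in MiB across the listed components.
fn totals() -> (u32, u32) {
    COMPONENTS
        .iter()
        .fold((0, 0), |(req, lim), &(_, r, l)| (req + r, lim + l))
}

/// Percentage reduction going from `before` to `after` (same unit for both).
fn reduction_pct(before: f64, after: f64) -> f64 {
    (before - after) / before * 100.0
}
```

Under those assumptions, requests total 672Mi and limits total 1344Mi (about 1.31 GiB), so the measured ~0.9 GB sits between what the scheduler reserves and the worst case where every component hits its limit. The 1.3 GB → 0.9 GB drop works out to about 30.8%, which rounds to the 31% above.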

Key Changes:

Reduced data retention (15d → 7d for Prometheus)
Increased scrape intervals to 60s
Added rate limits for Loki log ingestion
Set explicit resource limits for all components
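To see what the retention change means for stored data, this sketch counts samples per time series, assuming exactly one sample per scrape. The previous scrape interval is not stated, so both retention windows are compared at 60s; `samples_per_series` is a hypothetical helper.

```rust
/// Samples a single time series accumulates over the retention window,
/// assuming one sample is ingested per scrape interval.
fn samples_per_series(retention_days: u64, scrape_interval_secs: u64) -> u64 {
    let retention_secs = retention_days * 24 * 60 * 60;
    retention_secs / scrape_interval_secs
}
```

At 60s, a series holds 10,080 samples over 7 days versus 21,600 over 15 days. Keep in mind that retention mainly shrinks on-disk storage; Prometheus memory is driven more by the number of active series and the ingestion rate held in its in-memory head block, which is where the longer scrape interval helps.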

This optimization leaves headroom on the 4 GB node for the Kubernetes control plane and application workloads while maintaining comprehensive observability.

First Microservice: Todo Backend

The platform now hosts its first production-ready microservice: a RESTful API built in Rust.

Architecture Highlights

Technology Stack:

Language: Rust 1.75+ for performance and safety
Framework: Axum for async HTTP handling
Storage: In-memory with PostgreSQL-ready repository pattern
Container: Alpine-based image (~15-20MB)

Clean Architecture Pattern:

HTTP Request
    ↓
Handlers (API layer)
    ↓
Services (Business logic)
    ↓
Repository Trait (Abstraction)
    ↓
In-Memory Implementation (Arc<RwLock<HashMap>>)
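The layering above can be sketched with only the standard library. This is a simplified, synchronous illustration rather than the service's actual code: the real handlers run on Axum's async runtime, ids here are `u64` counters instead of UUIDs, and the empty-title rule is a hypothetical example of business logic living in the service layer.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::{Arc, RwLock};

/// Simplified todo record (the real model has more fields).
#[derive(Clone, Debug, PartialEq)]
pub struct Todo {
    pub id: u64, // hypothetical: the real service uses UUIDs
    pub title: String,
}

/// Repository abstraction: services depend on this trait, not on a concrete store.
pub trait TodoRepository {
    fn insert(&self, todo: Todo);
    fn get(&self, id: u64) -> Option<Todo>;
    fn delete(&self, id: u64) -> bool;
}

/// In-memory implementation; cloning shares the same map via the Arc.
#[derive(Clone, Default)]
pub struct InMemoryRepo {
    store: Arc<RwLock<HashMap<u64, Todo>>>,
}

impl TodoRepository for InMemoryRepo {
    fn insert(&self, todo: Todo) {
        // Write lock: exclusive access while mutating (unwrap panics if poisoned).
        self.store.write().unwrap().insert(todo.id, todo);
    }
    fn get(&self, id: u64) -> Option<Todo> {
        // Read lock: many readers may hold it concurrently.
        self.store.read().unwrap().get(&id).cloned()
    }
    fn delete(&self, id: u64) -> bool {
        self.store.write().unwrap().remove(&id).is_some()
    }
}

/// Business-logic layer, generic over any repository implementation.
pub struct TodoService<R: TodoRepository> {
    repo: R,
    next_id: AtomicU64,
}

impl<R: TodoRepository> TodoService<R> {
    pub fn new(repo: R) -> Self {
        Self { repo, next_id: AtomicU64::new(1) }
    }

    /// Validates input, assigns an id, and persists through the repository.
    pub fn create(&self, title: &str) -> Result<Todo, String> {
        let title = title.trim();
        if title.is_empty() {
            return Err("title must not be empty".to_string());
        }
        let id = self.next_id.fetch_add(1, Ordering::Relaxed);
        let todo = Todo { id, title: title.to_string() };
        self.repo.insert(todo.clone());
        Ok(todo)
    }

    pub fn get(&self, id: u64) -> Option<Todo> {
        self.repo.get(id)
    }
}
```

Because `InMemoryRepo` wraps its map in an `Arc`, cloning it shares one store between the service and any other holder. A PostgreSQL-backed type implementing the same trait could replace it without touching `TodoService`.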

API Capabilities

The service exposes a RESTful API with full CRUD operations:

GET /health - Kubernetes readiness/liveness probes
GET /metrics - Prometheus metrics endpoint
POST /api/v1/todos - Create todo items
GET /api/v1/todos - List todos with filtering
GET /api/v1/todos/{id} - Retrieve specific todo
PUT /api/v1/todos/{id} - Update todo
DELETE /api/v1/todos/{id} - Delete todo
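To make the route table concrete, here is a small std-only matcher that maps a method and path to one of the seven routes listed. Axum performs this matching itself in the real service; the `Route` enum and `route` function are illustrative assumptions, and query strings (such as list filters, whose parameters are not documented here) are simply ignored.

```rust
/// One variant per endpoint listed above; ids are borrowed from the path.
#[derive(Debug, PartialEq)]
enum Route<'a> {
    Health,
    Metrics,
    CreateTodo,
    ListTodos,
    GetTodo(&'a str),
    UpdateTodo(&'a str),
    DeleteTodo(&'a str),
    NotFound,
}

fn route<'a>(method: &str, path: &'a str) -> Route<'a> {
    // Drop any query string before matching path segments.
    let path = path.split('?').next().unwrap_or(path);
    let segments: Vec<&'a str> = path.trim_matches('/').split('/').collect();
    match (method, segments.as_slice()) {
        ("GET", ["health"]) => Route::Health,
        ("GET", ["metrics"]) => Route::Metrics,
        ("POST", ["api", "v1", "todos"]) => Route::CreateTodo,
        ("GET", ["api", "v1", "todos"]) => Route::ListTodos,
        ("GET", ["api", "v1", "todos", id]) => Route::GetTodo(*id),
        ("PUT", ["api", "v1", "todos", id]) => Route::UpdateTodo(*id),
        ("DELETE", ["api", "v1", "todos", id]) => Route::DeleteTodo(*id),
        _ => Route::NotFound,
    }
}
```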

Data Model:

{
  "id": "uuid",
  "user_id": "string",
  "title": "string",
  "description": "string | null",
  "due_date": "ISO 8601 datetime | null",
  "severity": "low | medium | high | critical",
  "created_at": "ISO 8601 datetime",
  "updated_at": "ISO 8601 datetime"
}
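One way to mirror this data model in Rust is sketched below. It stays dependency-free, so ids and timestamps are plain `String`s; a real implementation would more likely use dedicated types (for example from the `uuid` and `chrono` crates) plus serde for JSON. Nullable JSON fields map to `Option`, and `severity` becomes an enum that parses from and prints to the lowercase strings shown.

```rust
use std::fmt;
use std::str::FromStr;

/// Severity levels from the data model; Ord follows declaration order.
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
pub enum Severity {
    Low,
    Medium,
    High,
    Critical,
}

impl FromStr for Severity {
    type Err = String;
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "low" => Ok(Severity::Low),
            "medium" => Ok(Severity::Medium),
            "high" => Ok(Severity::High),
            "critical" => Ok(Severity::Critical),
            other => Err(format!("invalid severity: {other}")),
        }
    }
}

impl fmt::Display for Severity {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        let s = match self {
            Severity::Low => "low",
            Severity::Medium => "medium",
            Severity::High => "high",
            Severity::Critical => "critical",
        };
        f.write_str(s)
    }
}

/// Field-for-field mirror of the JSON model (strings stand in for UUIDs/timestamps).
#[derive(Clone, Debug)]
pub struct Todo {
    pub id: String,
    pub user_id: String,
    pub title: String,
    pub description: Option<String>, // "string | null"
    pub due_date: Option<String>,    // ISO 8601 datetime or null
    pub severity: Severity,
    pub created_at: String,
    pub updated_at: String,
}
```

Deriving `Ord` orders severities as `Low < Medium < High < Critical`, which is convenient if list filtering ever supports a minimum severity.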

Production-Ready Features

Observability:

Prometheus metrics (todo_operations_total, active_todos_total)
Structured JSON logging with correlation IDs
Health check endpoints for Kubernetes probes
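For reference, the `/metrics` endpoint serves Prometheus's text exposition format. The sketch renders the two metrics by hand only to show that format; the `operation` label, the HELP strings, and treating `active_todos_total` as a gauge are assumptions, and a real service would normally use a metrics client library instead.

```rust
use std::collections::BTreeMap;
use std::fmt::Write;

/// Renders the metrics in Prometheus text exposition format.
/// `ops` maps a hypothetical `operation` label value to its counter value.
fn render_metrics(ops: &BTreeMap<&str, u64>, active_todos: u64) -> String {
    let mut out = String::new();
    writeln!(out, "# HELP todo_operations_total Total number of todo operations.").unwrap();
    writeln!(out, "# TYPE todo_operations_total counter").unwrap();
    for (op, count) in ops {
        // Label values are fixed operation names here, so no escaping is done.
        writeln!(out, "todo_operations_total{{operation=\"{op}\"}} {count}").unwrap();
    }
    writeln!(out, "# HELP active_todos_total Number of todos currently stored.").unwrap();
    writeln!(out, "# TYPE active_todos_total gauge").unwrap();
    writeln!(out, "active_todos_total {active_todos}").unwrap();
    out
}
```

Since the number of stored todos can go down, a gauge is the natural type for `active_todos_total`; Prometheus naming conventions usually reserve the `_total` suffix for counters, so that name may be worth revisiting.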

Security Hardening:

Non-root container (UID 1000)
Read-only root filesystem
All Linux capabilities dropped
No privilege escalation allowed
ClusterIP service (internal only)

Performance:

Thread-safe in-memory storage with RwLock
Optimized release build with LTO
Minimal image size (15-20MB)
Memory: 64Mi request, 128Mi limit

Deployment Automation

Helm Chart

A production-grade Helm chart packages the todo backend with:

Configurable resource limits
Horizontal Pod Autoscaler (HPA) support
Ingress controller integration
ServiceMonitor for Prometheus
ConfigMap-based configuration

The chart lives in helm/charts/todo-backend/ alongside its documentation.

Terraform Module

Infrastructure-as-code deployment via terraform/kubernetes/todo-backend/:

Automated namespace creation
ConfigMap management
Helm release orchestration
Output values for service access

CI/CD Pipeline

GitHub Actions workflow (.github/workflows/todo-backend-ci.yml) automates:

Testing: cargo fmt, clippy, and unit tests
Build: Multi-stage Docker build with layer caching
Publish: Push to GitHub Container Registry
Validation: Semantic versioning checks
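The exact semantic versioning checks in the workflow aren't spelled out here. As a hypothetical sketch, such a check might accept only `vMAJOR.MINOR.PATCH` tags without leading zeros (a SemVer 2.0.0 rule for numeric identifiers) and require each release to be strictly greater than the previous one. Pre-release and build metadata are deliberately not handled, and both function names are illustrative.

```rust
/// Parses "vMAJOR.MINOR.PATCH" (leading 'v' optional) into a comparable tuple.
/// Returns None for anything else, including pre-release/build suffixes.
fn parse_release_tag(tag: &str) -> Option<(u64, u64, u64)> {
    let version = tag.strip_prefix('v').unwrap_or(tag);
    let mut nums = Vec::with_capacity(3);
    for part in version.split('.') {
        let valid = !part.is_empty()
            && part.bytes().all(|b| b.is_ascii_digit())
            // SemVer 2.0.0: numeric identifiers must not have leading zeros.
            && (part.len() == 1 || !part.starts_with('0'));
        if !valid {
            return None;
        }
        nums.push(part.parse::<u64>().ok()?);
    }
    match nums.as_slice() {
        [major, minor, patch] => Some((*major, *minor, *patch)),
        _ => None,
    }
}

/// A new release tag must parse and be strictly greater than the previous one.
fn is_valid_bump(previous: &str, next: &str) -> bool {
    match (parse_release_tag(previous), parse_release_tag(next)) {
        (Some(p), Some(n)) => n > p, // tuples compare lexicographically
        _ => false,
    }
}
```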

Monitoring Stack

The platform includes a comprehensive observability layer:

Components

Prometheus: Time-series metrics database with 7-day retention
Grafana: Visualization dashboards with pre-configured views
Loki: Log aggregation and querying
Promtail: Log shipping from Kubernetes pods
AlertManager: Alert routing and notification
Node Exporter: Host metrics collection
Kube State Metrics: Kubernetes object state metrics

Custom Dashboards

Pre-configured Grafana dashboards for:

Kubernetes cluster overview
Node resource monitoring
Pod metrics and logs
Custom application metrics

All deployed via Terraform with optimized resource allocations.

Development Workflow

The project implements best practices for collaborative development:

Code Quality

Pre-commit hooks: Automated validation before commits
Conventional Commits: Semantic versioning via commit messages
Claude Code Integration: AI-assisted code review via GitHub Actions
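Conventional Commits ties commit types to version bumps: `fix` implies a patch release, `feat` a minor release, and a `!` after the type/scope or a `BREAKING CHANGE:` footer a major release. The sketch below applies those rules to commit messages; the function names are hypothetical, and the project's actual release tooling may behave differently.

```rust
/// Version bump implied by a commit; Ord follows declaration order.
#[derive(Debug, PartialEq, Eq, PartialOrd, Ord, Clone, Copy)]
enum Bump {
    NoRelease,
    Patch,
    Minor,
    Major,
}

fn bump_for(message: &str) -> Bump {
    let header = message.lines().next().unwrap_or("");
    // The type, optional "(scope)", and optional '!' come before the first ':'.
    let Some((prefix, _)) = header.split_once(':') else {
        return Bump::NoRelease;
    };
    let breaking = prefix.ends_with('!')
        || message
            .lines()
            .any(|l| l.starts_with("BREAKING CHANGE:") || l.starts_with("BREAKING-CHANGE:"));
    if breaking {
        return Bump::Major;
    }
    // Strip an optional "(scope)" to get the bare commit type.
    let commit_type = prefix.split('(').next().unwrap_or(prefix);
    match commit_type {
        "feat" => Bump::Minor,
        "fix" => Bump::Patch,
        _ => Bump::NoRelease,
    }
}

/// A release takes the largest bump among all commits since the last tag.
fn release_bump(messages: &[&str]) -> Bump {
    messages.iter().map(|m| bump_for(m)).max().unwrap_or(Bump::NoRelease)
}
```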

Testing Strategy

Unit tests: Comprehensive test coverage for services
Integration tests: API endpoint validation
Format checks: Automated code formatting (cargo fmt)
Linting: Clippy with warnings-as-errors

Git Workflow

feature/branch → PR → Claude review → Tests → Merge → Deploy

Current Project Structure

hetzner-cloud-minimal-kubernetes-cluster/
├── .github/workflows/        # CI/CD automation
│   ├── claude-code-review.yml
│   ├── claude.yml
│   └── todo-backend-ci.yml
├── .scripts/                 # Utility scripts
├── helm/charts/
│   └── todo-backend/         # Helm chart for microservice
├── newsletter-blog/          # This blog (Gatsby)
│   └── content/blog/         # Article content
├── services/backend/
│   └── todo/                 # Rust todo service
│       ├── src/              # Application code
│       ├── tests/            # Integration tests
│       └── Dockerfile        # Container definition
└── terraform/
    ├── hetzner/minimal/      # Infrastructure provisioning
    └── kubernetes/
        ├── monitoring/       # Observability stack
        └── todo-backend/     # Service deployment

Lessons Learned

1. Memory Constraints Drive Design

Running a full stack on 4GB forced aggressive optimization:

Reduced retention periods
Tuned scrape intervals
Set strict resource limits
Monitored actual usage vs. limits

2. Repository Pattern Enables Flexibility

Starting with in-memory storage allowed rapid iteration while maintaining a clean migration path to PostgreSQL later.

3. Infrastructure as Code is Non-Negotiable

Every component, from servers to Helm releases, is version-controlled and reproducible.

4. Observability from Day One

Building metrics and logging into the first service avoided the technical debt of retrofitting observability later.

5. Security is Incremental

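Each layer adds security: non-root containers, dropped capabilities, read-only filesystems, and network policies. Note that Flannel on its own does not enforce Kubernetes NetworkPolicies, so that last layer requires a policy-capable CNI such as Calico.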

What's Next

Short Term

Frontend Development: React/Next.js UI for the todo application
API Gateway: Kong or Traefik for routing and authentication
PostgreSQL Integration: Migrate from in-memory to persistent storage
Ingress Setup: Expose services with TLS termination

Medium Term

Authentication: JWT-based auth with refresh tokens
Service Mesh: Istio or Linkerd for advanced traffic management
Backup Strategy: Automated PostgreSQL backups to Hetzner Object Storage
Monitoring Alerts: PagerDuty integration for critical alerts

Long Term

Multi-Service Architecture: Add more microservices
AI/ML Integration: Model serving infrastructure
GitOps: ArgoCD for continuous deployment
Multi-Node and Multi-Region: Expand beyond the single-node cluster

Cost Analysis

Current monthly operating costs (EUR):

Hetzner CPX21 Server: ~€5.83
Traffic: Included (1TB free tier)
Container Registry: Free (GitHub Container Registry)
CI/CD: Free (GitHub Actions free tier)

Total: ~€6/month for a complete production-ready platform.

Key Metrics

As of December 20, 2025:

Lines of Code: ~4,900 lines added across the last 3 commits
Terraform Modules: 3 (infrastructure, monitoring, todo-backend)
Docker Images: 1 microservice (~15-20MB)
API Endpoints: 7 (health, metrics, and 5 CRUD routes)
Memory Footprint: ~900MB for monitoring stack
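Availability: Liveness/readiness probes restart unhealthy pods or take them out of rotation, and HPA support is in place; however, a single node offers no redundancy, and with in-memory storage each replica would hold its own data until the PostgreSQL migration.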

Conclusion

What began as a simple Kubernetes learning project has matured into a production-capable platform. The combination of cost-effective infrastructure, modern development practices, and comprehensive observability creates a solid foundation for building distributed applications.

The project demonstrates that you don't need expensive cloud providers or complex setups to run production workloads: a €6/month Hetzner server with thoughtful architecture delivers a capable platform.

Every line of code is open source and available on GitHub.

Get Involved

Interested in following along or contributing?

Explore the code: Check out the repository structure
Try it yourself: Deploy your own cluster using the Terraform modules
Share feedback: Open issues or PRs with improvements
Follow updates: Watch the repo for new features

Next article: Deep dive into the Rust microservice architecture - clean code patterns, testing strategies, and performance optimization techniques.