Architecture

Last updated: 15 March 2026

Architecture & Tech Stack

A complete overview of the LVG Platform architecture, technology choices, and how the system operates for each customer.

Platform Overview

Hub-and-spoke architecture with a central management cluster orchestrating isolated customer environments

MANAGEMENT CLUSTER
Rancher: Cluster Management
ArgoCD: GitOps Deployment
Monitoring: Prometheus + Grafana
Keycloak: Identity & SSO
Velero: Backup & DR
cert-manager: TLS Certificates
Terraform: Infrastructure as Code
Hetzner Cloud: GDPR Infrastructure
GitHub: Source of Truth
PagerDuty: Incident Alerts

MANAGED CONNECTION

Customer A, Production Cluster: RKE2, Cilium, Kyverno, Falco, Workloads
Customer B, Dev + Staging Cluster: RKE2, Cilium, Kyverno, Falco, Workloads
Customer N, Enterprise HA Cluster: RKE2, Cilium, Kyverno, Falco, Workloads

Technology Stack

Every component is open-source. No vendor lock-in. Full control.

Container Orchestration

RKE2

FIPS-compliant Kubernetes distribution by Rancher. Powers every cluster with hardened defaults and CIS benchmarks.

Kubernetes

Rancher

Multi-cluster management UI. Provides a single pane of glass for provisioning, monitoring, and managing all customer clusters.

Management

ArgoCD

Declarative GitOps continuous delivery. All deployments are driven by Git commits, ensuring auditability and rollback capability.

GitOps
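
As an illustration of what "driven by Git commits" can look like, the sketch below builds an Argo CD Application manifest as a Python dict and prints it as JSON. The application name, repository URL, path, and namespaces are hypothetical placeholders, and the automated sync policy with pruning and self-healing is an assumption about a typical setup, not a statement of how this platform is configured.

```python
import json


def argocd_application(name: str, repo_url: str, path: str, namespace: str) -> dict:
    """Build an Argo CD Application manifest (API group argoproj.io/v1alpha1)."""
    return {
        "apiVersion": "argoproj.io/v1alpha1",
        "kind": "Application",
        "metadata": {"name": name, "namespace": "argocd"},
        "spec": {
            "project": "default",
            "source": {
                "repoURL": repo_url,       # Git repository holding the manifests
                "path": path,              # directory inside the repository
                "targetRevision": "main",  # branch, tag, or commit to track
            },
            "destination": {
                "server": "https://kubernetes.default.svc",  # the local cluster
                "namespace": namespace,
            },
            # Assumption: sync automatically, remove resources deleted from Git,
            # and revert manual drift back to the state in Git.
            "syncPolicy": {"automated": {"prune": True, "selfHeal": True}},
        },
    }


# Hypothetical values, for illustration only.
app = argocd_application(
    name="platform-monitoring",
    repo_url="https://github.com/example-org/platform-stack.git",
    path="monitoring",
    namespace="monitoring",
)
print(json.dumps(app, indent=2))
```

With a manifest like this, rolling back amounts to reverting the Git commit: the tracked revision changes and Argo CD reconciles the cluster to match it.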

Networking & Security

Cilium + Hubble

eBPF-powered CNI for high-performance networking with deep traffic visibility. Enforces network policies for tenant isolation.

Networking
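
A common starting point for policy-based isolation is a default-deny rule per namespace, with explicit allow rules layered on top. The sketch below generates a standard Kubernetes NetworkPolicy (which Cilium enforces) as a Python dict. The namespace name is a hypothetical placeholder, and whether the platform uses exactly this baseline is an assumption.

```python
import json


def default_deny_policy(namespace: str) -> dict:
    """Build a NetworkPolicy that denies all ingress and egress in a namespace."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": "default-deny-all", "namespace": namespace},
        "spec": {
            # An empty podSelector matches every pod in the namespace.
            "podSelector": {},
            # Listing both types with no allow rules blocks all traffic
            # until more specific policies permit it.
            "policyTypes": ["Ingress", "Egress"],
        },
    }


# Hypothetical namespace, for illustration only.
policy = default_deny_policy("customer-apps")
print(json.dumps(policy, indent=2))
```

Note that denying all egress also blocks DNS lookups, so a real setup would normally pair this with an allow rule for the cluster DNS service.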

Kyverno

Kubernetes-native policy engine. Validates and enforces security policies, pod standards, and organizational rules at admission time.

Policy

Falco

Runtime security monitoring. Detects anomalous activity in containers and Kubernetes clusters using system call analysis.

Security

Keycloak

Enterprise identity and access management. Provides SSO and OIDC/SAML integration with customer identity providers.

Identity

Observability

Prometheus

Industry-standard metrics collection and alerting. Scrapes node, pod, and application metrics across all clusters.

Metrics

Grafana

Visualization platform with pre-built dashboards for cluster health, resource utilization, and application performance.

Dashboards

Loki

Lightweight log aggregation system. Efficiently indexes and queries logs from all pods and system components.

Logging

Alertmanager

Alert routing, deduplication, and notification. Routes critical alerts to PagerDuty, Slack, and on-call engineers.

Alerting

Infrastructure & Operations

Terraform

Infrastructure as Code for reproducible, version-controlled provisioning of servers, networks, and storage on Hetzner Cloud.

IaC

Velero

Kubernetes-native backup and disaster recovery. Daily cluster snapshots are stored on a Hetzner Storage Box with 30-day retention.

Backup
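
To make the daily schedule and 30-day retention concrete, the sketch below builds a Velero Schedule resource as a Python dict. Only the daily cadence and the 30-day retention come from the text; the schedule name, the 02:00 run time, and backing up all namespaces are assumptions chosen for illustration.

```python
import json

RETENTION_DAYS = 30  # from the text: 30-day retention


def velero_schedule(name: str, cron: str, retention_days: int) -> dict:
    """Build a Velero Schedule manifest (API group velero.io/v1)."""
    ttl_hours = retention_days * 24
    return {
        "apiVersion": "velero.io/v1",
        "kind": "Schedule",
        "metadata": {"name": name, "namespace": "velero"},
        "spec": {
            "schedule": cron,  # standard cron expression
            "template": {
                "includedNamespaces": ["*"],  # assumption: back up everything
                # Velero expresses retention as a duration string.
                "ttl": f"{ttl_hours}h0m0s",
            },
        },
    }


# Hypothetical name and run time (daily at 02:00), for illustration only.
schedule = velero_schedule("daily-cluster-backup", "0 2 * * *", RETENTION_DAYS)
print(json.dumps(schedule, indent=2))
```

Thirty days corresponds to a TTL of 720 hours, after which Velero garbage-collects the expired backup.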

cert-manager

Automated TLS certificate management. Issues and renews certificates via Let's Encrypt for all customer domains.

Certificates

Hetzner Cloud

German cloud provider with excellent price-performance. GDPR-compliant data centers ensure data sovereignty for EU customers.

Infrastructure

Single Customer Architecture

Each customer receives a fully isolated Kubernetes environment with dedicated infrastructure (hard multi-tenancy)

Customer Environment: Fully Isolated
Dedicated compute, dedicated networking, dedicated storage

CONTROL PLANE (3x HA)
etcd: Distributed state store
API Server: Kubernetes API
Scheduler: Pod placement
Controller Mgr: Reconciliation
RKE2 Runtime

PLATFORM SERVICES
Cilium: CNI + Policies
Kyverno: Admission
Falco: Runtime Sec
Prometheus: Metrics
cert-mgr: TLS
Velero: Backup

WORKER NODES (auto-scalable)
Worker Node 1: Pod: App A, Pod: App B, Pod: DB, Pod: Cache
Worker Node 2: Pod: API, Pod: Worker, Pod: Queue Processor
Worker Node 3: Pod: Frontend, Pod: Auth, Monitoring Agent

PERSISTENT STORAGE: Block Storage, NVMe SSD, PV Claims
NETWORKING: Ingress, Load Bal.
BACKUP: StorageBox (30d)

Isolation Guarantees

1. Compute Isolation

Dedicated servers per customer. No shared worker nodes, no noisy-neighbor effects. Full CPU and memory allocation.

2. Network Isolation

Cilium enforces strict network policies. Each cluster has its own network space. No cross-customer traffic is possible.

3. Data Isolation

Separate persistent volumes and backup storage. Customer data is never commingled. Encrypted at rest and in transit.

4. Access Isolation

Per-customer RBAC and SSO. Customers authenticate through their own identity provider via Keycloak OIDC/SAML.

Customer Lifecycle

From contract signing to production workloads in 3-5 business days

1. Infrastructure Provisioning

Terraform provisions servers, networking, load balancers, and storage on Hetzner Cloud. Fully automated, 30-60 minutes.

2. Kubernetes Bootstrap

RKE2 initializes the cluster with an HA control plane (3 etcd members). Rancher registers the cluster for centralized management. 15-30 min.

3. Platform Stack Deployment

ArgoCD deploys the full platform stack: Cilium, Kyverno, Falco, Prometheus, Grafana, Loki, cert-manager, and Velero. 15-30 min.

4. Security & Identity Configuration

RBAC roles, SSO integration with customer IdP, network policies, and pod security standards are configured and validated.

5. Monitoring & Backup Setup

Grafana dashboards, alert rules, and Velero backup schedules (daily, 30-day retention) are configured. The customer gets read-only dashboard access.

6. Validation & Handover

Full connectivity check, backup restore test, and monitoring verification. The customer receives access credentials and a training session.
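
The stated durations can be added up to see how much of the onboarding time is automated. The sketch below models the six steps as plain data, using only the minute ranges given in the text; steps without a stated duration are left as None rather than estimated. The Step class and function names are illustrative.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Step:
    name: str
    # (min, max) duration in minutes as stated in the text, or None if not stated
    minutes: Optional[Tuple[int, int]]


STEPS = [
    Step("Infrastructure Provisioning", (30, 60)),
    Step("Kubernetes Bootstrap", (15, 30)),
    Step("Platform Stack Deployment", (15, 30)),
    Step("Security & Identity Configuration", None),
    Step("Monitoring & Backup Setup", None),
    Step("Validation & Handover", None),
]


def timed_total(steps: List[Step]) -> Tuple[int, int]:
    """Sum the lower and upper bounds of all steps that state a duration."""
    timed = [s.minutes for s in steps if s.minutes is not None]
    return sum(lo for lo, _ in timed), sum(hi for _, hi in timed)


low, high = timed_total(STEPS)
print(f"Steps with stated durations: {low}-{high} minutes")
# prints "Steps with stated durations: 60-120 minutes"
```

On these figures the first three steps take one to two hours in total, which suggests that most of the 3-5 business days goes to configuration, validation, and handover.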

Ongoing Operations

Continuous monitoring, automated updates, and proactive incident management

24/7 Monitoring

Prometheus, Loki, and Falco continuously monitor infrastructure, applications, and security. Alerts route to PagerDuty for immediate response.

Zero-Downtime Updates

Rolling Kubernetes upgrades, automated OS patches, and GitOps-driven component updates. No maintenance windows required.

Disaster Recovery

Daily Velero backups, 6-hourly etcd snapshots. Full cluster restore in under 4 hours. Quarterly DR testing for validation.
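
The backup intervals imply an upper bound on data loss: a failure just before the next scheduled run loses up to one full interval. The sketch below works this out from the figures in the text and checks a restore duration against the 4-hour target. The function names and the drill duration are hypothetical.

```python
from datetime import timedelta

# Intervals and restore target as stated in the text.
BACKUP_INTERVALS = {
    "velero_backup": timedelta(days=1),
    "etcd_snapshot": timedelta(hours=6),
}
RESTORE_TARGET = timedelta(hours=4)


def worst_case_data_loss(interval: timedelta) -> timedelta:
    """A failure just before the next run loses up to one full interval."""
    return interval


def runs_per_day(interval: timedelta) -> int:
    """How many scheduled runs fit into 24 hours."""
    return timedelta(days=1) // interval


def meets_restore_target(measured: timedelta) -> bool:
    """True if a measured restore finished in under the stated target."""
    return measured < RESTORE_TARGET


for name, interval in BACKUP_INTERVALS.items():
    print(name, runs_per_day(interval), "run(s)/day, worst-case loss:",
          worst_case_data_loss(interval))

# Hypothetical drill result, for illustration only.
print(meets_restore_target(timedelta(hours=3, minutes=15)))
```

Under these assumptions, cluster state in etcd can be recovered to within 6 hours of a failure, while data covered only by the daily Velero backup can be up to 24 hours old.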

SLA Guarantees

Up to 99.95% uptime for Enterprise tier. 15-minute P1 response time. Financial credits for SLA breaches.
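
An uptime percentage translates directly into a downtime budget. The sketch below computes that budget for the 99.95% figure; the 30-day and 365-day measurement periods are assumptions, since the text does not say how the SLA period is defined.

```python
def downtime_budget_minutes(uptime_percent: float, period_days: int) -> float:
    """Minutes of downtime permitted in a period at a given uptime percentage."""
    total_minutes = period_days * 24 * 60
    return total_minutes * (100 - uptime_percent) / 100


# Assumed measurement periods: a 30-day month and a 365-day year.
monthly = downtime_budget_minutes(99.95, 30)
yearly = downtime_budget_minutes(99.95, 365)

print(f"Per 30 days:  {monthly:.1f} minutes")   # prints "Per 30 days:  21.6 minutes"
print(f"Per 365 days: {yearly:.1f} minutes")    # prints "Per 365 days: 262.8 minutes"
```

At 99.95%, the budget is about 21.6 minutes per 30 days, or roughly 4.4 hours per year.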