
KCD Rio 2025, How Syself Manages Thousands of Kubernetes Clusters — Without a Traditional Monitoring Stack

March 28, 2025


Managing Clusters Without the Stack

Rio de Janeiro, Brazil – March 2025 — At Kubernetes Community Days (KCD) Rio 2025, Lucas Rattz, DevOps Engineer at Syself, revealed how Syself operates thousands of Kubernetes clusters — without relying on Prometheus, Grafana, or Alertmanager.

In his talk “Managing Thousands of Kubernetes Clusters with Cluster API and No Monitoring Stack,” Lucas explained Syself’s novel approach: combining Kubernetes-native primitives, node-level diagnostics, and external verifications to enable automated recovery and eliminate manual alerting.

While most teams still rely on alerts and dashboards to surface issues, Syself built a system that lets Kubernetes detect and resolve failures autonomously.

Rethinking “Monitoring” for Kubernetes

Kubernetes doesn’t inherently monitor infrastructure — it responds to signals like node heartbeats. But at scale, that model is too slow.
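To make the latency point concrete, here is a minimal Python sketch of a purely heartbeat-based staleness check. The 40-second grace period is an illustrative assumption: the real threshold is set by kube-controller-manager's `--node-monitor-grace-period` flag, and its default has varied across Kubernetes versions. `node_is_stale` is a hypothetical helper, not Kubernetes source code.

```python
from datetime import datetime, timedelta, timezone

# Illustrative grace period only; the real value comes from
# kube-controller-manager's --node-monitor-grace-period.
GRACE_PERIOD = timedelta(seconds=40)


def node_is_stale(last_heartbeat: datetime, now: datetime,
                  grace: timedelta = GRACE_PERIOD) -> bool:
    """Heartbeat-only health view: a node becomes suspect only after its
    most recent heartbeat is older than the grace period."""
    return now - last_heartbeat > grace


# A node sends a heartbeat at t0 and fails immediately afterwards.
t0 = datetime(2025, 3, 28, tzinfo=timezone.utc)
for seconds in (5, 30, 41):
    print(seconds, node_is_stale(t0, t0 + timedelta(seconds=seconds)))
# 5 False
# 30 False
# 41 True
```

In this scenario the failed node is indistinguishable from a healthy one for the whole grace period, which is the gap that richer, node-level signals aim to close.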

We realized that if we want clusters to heal themselves, Kubernetes needs better signals. Not more dashboards. Just better inputs.

Lucas Rattz, DevOps Engineer at Syself

Syself's solution delivers two layers of insight:

  • On-node daemons run health checks and update the Kubernetes API directly — even if the container runtime is unresponsive.
  • External controllers verify the responsiveness of nodes and key control plane components like etcd and the API server — even when internal processes can’t.
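The two layers above can be illustrated with a minimal Python sketch. It is not Syself's code, and every name in it (`build_condition`, `ExternalVerifier`, the `ContainerRuntimeUnhealthy` condition type, the three-failure threshold) is a hypothetical assumption. Only the dict keys mirror the fields of a real Kubernetes `NodeCondition`. The API write (a PATCH of `node.status.conditions`) and the actual probes (for example an HTTP GET against the API server's `/readyz` or etcd's `/health` endpoint) are left out so the logic stays self-contained.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone
from typing import Optional


# Layer 1 (on-node): turn a local check result into a dict with the same
# fields as a Kubernetes NodeCondition. Sending it to the API server
# is deliberately omitted.
def build_condition(cond_type: str, problem: bool, reason: str, message: str,
                    now: datetime, previous: Optional[dict] = None) -> dict:
    status = "True" if problem else "False"  # "True" means the problem is present
    ts = now.isoformat()
    # lastTransitionTime only moves when the status actually flips.
    transition = ts
    if previous is not None and previous.get("status") == status:
        transition = previous["lastTransitionTime"]
    return {
        "type": cond_type,
        "status": status,
        "reason": reason,
        "message": message,
        "lastHeartbeatTime": ts,
        "lastTransitionTime": transition,
    }


# Layer 2 (external): count consecutive failed probes per target and flag
# a target only after `threshold` failures in a row, so a single dropped
# request does not trigger remediation.
@dataclass
class ExternalVerifier:
    threshold: int = 3
    failures: dict = field(default_factory=dict)

    def record(self, target: str, ok: bool) -> bool:
        """Record one probe result; return True if the target is considered down."""
        self.failures[target] = 0 if ok else self.failures.get(target, 0) + 1
        return self.failures[target] >= self.threshold


t0 = datetime(2025, 3, 28, tzinfo=timezone.utc)
c1 = build_condition("ContainerRuntimeUnhealthy", True, "RuntimeTimeout",
                     "runtime did not answer within 5s", t0)
c2 = build_condition("ContainerRuntimeUnhealthy", True, "RuntimeTimeout",
                     "runtime did not answer within 5s",
                     t0 + timedelta(seconds=10), previous=c1)
print(c2["lastTransitionTime"] == c1["lastTransitionTime"])  # True

verifier = ExternalVerifier(threshold=3)
print([verifier.record("etcd-0", ok) for ok in (False, False, True, False, False, False)])
# [False, False, False, False, False, True]
```

Because the on-node daemon writes its own condition, it keeps working when the container runtime is stuck. The external verifier covers the opposite case, where the node or a control plane component cannot report on itself at all.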

Together, these mechanisms ensure rapid detection and response — without a traditional monitoring stack.

No Alerts. No Dashboards. Just Resilience.

When a failure occurs, Syself’s system acts — not alerts:

  • Traffic is rerouted from failed nodes at the network layer.
  • Recoverable nodes reboot automatically.
  • Unrecoverable nodes are reprovisioned, optionally with hardware diagnostics for bare metal environments.
  • Pods in failure states are deleted and restarted — not just individual containers.
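The escalation order in the list above can be sketched as a small decision function. This is a minimal Python sketch, not Syself's implementation. `plan_remediation`, `pods_to_delete`, the one-reboot limit, and the choice of `CrashLoopBackOff` and `CreateContainerError` as "failure states" are all assumptions. Only the Pod field names (`status.phase`, `containerStatuses[].state.waiting.reason`) follow the real Kubernetes Pod schema.

```python
from enum import Enum


class Action(Enum):
    REROUTE_TRAFFIC = "reroute-traffic"
    REBOOT = "reboot"
    HARDWARE_DIAGNOSTICS = "hardware-diagnostics"
    REPROVISION = "reprovision"


def plan_remediation(healthy: bool, reboots_attempted: int, bare_metal: bool,
                     max_reboots: int = 1) -> list[Action]:
    """Escalate: reroute first, try a reboot, then reprovision once reboots are used up."""
    if healthy:
        return []
    steps = [Action.REROUTE_TRAFFIC]
    if reboots_attempted < max_reboots:
        steps.append(Action.REBOOT)
    else:
        if bare_metal:
            steps.append(Action.HARDWARE_DIAGNOSTICS)  # optional, bare metal only
        steps.append(Action.REPROVISION)
    return steps


# Waiting reasons treated as "failure states" here are an assumption.
FAILURE_REASONS = {"CrashLoopBackOff", "CreateContainerError"}


def pods_to_delete(pods: list[dict]) -> list[str]:
    """Pick names of Pod-shaped dicts whose phase is Failed or whose
    containers are waiting with a reason listed in FAILURE_REASONS."""
    selected = []
    for pod in pods:
        status = pod.get("status", {})
        waiting_reasons = {
            cs.get("state", {}).get("waiting", {}).get("reason")
            for cs in status.get("containerStatuses", [])
        }
        if status.get("phase") == "Failed" or waiting_reasons & FAILURE_REASONS:
            selected.append(pod["metadata"]["name"])
    return selected


plan = plan_remediation(healthy=False, reboots_attempted=1, bare_metal=True)
print([a.name for a in plan])
# ['REROUTE_TRAFFIC', 'HARDWARE_DIAGNOSTICS', 'REPROVISION']
```

Deleting a whole pod, rather than letting the kubelet restart a container in place, only helps when a controller such as a Deployment owns the pod and recreates it; the scheduler is then free to place the replacement on a different node. A bare pod deleted this way is simply gone.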

All of this happens without human intervention.

A Platform Built to Operate — Not Just Orchestrate

Kubernetes doesn’t manage infrastructure — it responds to signals. So we focused on giving it the right signals, and clear logic on how to act.

Sven Batista Steinbach, CEO at Syself

Instead of adding complexity, Syself removes it. No pagers. No Slack alerts. Just self-healing clusters that solve problems before operators are even aware.

Built From Real-World Experience

For Syself, KCD Rio wasn’t just a stage — it was a chance to reflect on years of engineering challenges at scale.

We built this system to solve very real problems our team encountered over the years. Sharing it at KCD Rio wasn’t just about presenting a solution — it was about showing there’s another way to run Kubernetes: simpler, autonomous, and alert-free.

Lucas Rattz, DevOps Engineer at Syself

About Syself

Syself is a Kubernetes Management Platform powered by Cluster API. Designed to simplify infrastructure lifecycle operations across cloud and bare metal, Syself helps teams build clusters that are resilient, secure, and self-operating.

Rather than relying on external tools, Syself extends Kubernetes with practical, built-in automation — so your workloads keep running, even when the infrastructure underneath doesn’t.

Learn more at syself.com.


Syself automates the entire lifecycle of clusters, freeing up your teams to work on what really matters.