The Cluster Autoscaler is a tool designed to automatically adjust the size of a cluster, scaling nodes in or out depending on demand. When pods are pending because no node has capacity available for them, the autoscaler adds new nodes. Conversely, when nodes are underutilized and their workloads can run efficiently on fewer nodes, it removes nodes.
To automate resource optimization and cost efficiency end to end, you can combine the Cluster Autoscaler with Horizontal Pod Autoscalers: the HPA scales the number of pod replicas, and because your pods declare resource requests, replicas that cannot be scheduled onto existing nodes become pending and trigger the Cluster Autoscaler to add nodes.
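As a minimal sketch, assuming a hypothetical Deployment named `my-app` whose containers already declare CPU requests:

```bash
# Hypothetical example: scale "my-app" between 2 and 20 replicas based on CPU load.
# Replicas that cannot be scheduled on the existing nodes stay Pending and
# trigger the Cluster Autoscaler to add nodes.
kubectl autoscale deployment my-app --cpu-percent=70 --min=2 --max=20
```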
Syself Autopilot provides seamless integration with Cluster Autoscaler on Cluster API.
Syself Autopilot supports deploying the Cluster Autoscaler within the workload cluster, using service account credentials to authenticate against the independent management cluster.
This mode of operation is referred to as "incluster-kubeconfig". For additional details, refer to the Cluster Autoscaler Helm Chart documentation.
For the autoscaler in the workload cluster to function correctly, it must authenticate with the management cluster. This can be achieved by retrieving the credentials secret from the Autopilot management cluster and then templating a new secret from it for the workload cluster.
Command:
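Below is a sketch of the two-step approach. The secret name, its data key, the org namespace, and the kubeconfig context names are placeholders; substitute the values from your environment.

```bash
# Step 1: read the management-cluster kubeconfig out of the secret in the
# Autopilot/management cluster (secret name and data key "value" are assumptions;
# check the actual secret in your org namespace).
kubectl --context <management-context> -n org-<your-org> \
  get secret <cluster-name>-autoscaler-kubeconfig \
  -o jsonpath='{.data.value}' | base64 -d > mgmt-kubeconfig

# Step 2: create a secret containing that kubeconfig in the workload cluster,
# in the namespace where the Cluster Autoscaler will run. The key "value"
# matches the Helm chart's default clusterAPICloudConfigPath; verify this
# against the chart version you use.
kubectl --context <workload-context> -n kube-system \
  create secret generic cluster-autoscaler-management-kubeconfig \
  --from-file=value=mgmt-kubeconfig
```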
One-Step Command with Namespace and Workload Cluster Context:
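The same operation sketched as a single pipeline, using `yq` (v4) to rewrite the secret's metadata before applying it with the workload cluster's context (names and contexts are placeholders as above):

```bash
# Copy the secret from the management cluster to the workload cluster in one step,
# stripping cluster-specific metadata and setting the target name and namespace.
kubectl --context <management-context> -n org-<your-org> \
  get secret <cluster-name>-autoscaler-kubeconfig -o yaml \
  | yq 'del(.metadata.uid) | del(.metadata.resourceVersion) | del(.metadata.creationTimestamp) |
        del(.metadata.ownerReferences) | del(.metadata.managedFields) |
        .metadata.name = "cluster-autoscaler-management-kubeconfig" |
        .metadata.namespace = "kube-system"' \
  | kubectl --context <workload-context> apply -f -
```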
We recommend using the cluster-autoscaler Helm chart to deploy the autoscaler to the workload cluster.
Please update the following configurations:

- `autoDiscovery.labels[0].namespace`: set this to the namespace of your cluster object. Note that it starts with the prefix `org-` followed by your organization's name.
- `autoDiscovery.clusterName`: set this to the name of your cluster, which can be taken from the cluster object.

To optimize autoscaling behavior, specific flags can be adjusted. When deploying the Cluster Autoscaler with Helm, these flags can be passed using the `extraArgs` parameter; an example installation is shown after the flag reference below.
Flag | Description | Default |
---|---|---|
enforce-node-group-min-size | Should CA scale up the node group to the configured min size if needed | false |
scale-down-delay-after-add | How long after scale up that scale down evaluation resumes | 10 minutes |
scale-down-delay-after-delete | How long after node deletion that scale down evaluation resumes, defaults to scan-interval | scan-interval |
scale-down-delay-after-failure | How long after scale down failure that scale down evaluation resumes | 3 minutes |
scale-down-unneeded-time | How long a node should be unneeded before it is eligible for scale down | 10 minutes |
scale-down-unready-time | How long an unready node should be unneeded before it is eligible for scale down | 20 minutes |
scale-down-utilization-threshold | Node utilization level, defined as the sum of requested resources divided by capacity, below which a node can be considered for scale down | 0.5 |
scale-down-non-empty-candidates-count | Maximum number of non-empty nodes considered in one iteration as candidates for scale down with drain. A lower value means better CA responsiveness but possibly slower scale down latency; a higher value can affect CA performance with big clusters (hundreds of nodes). Set to a non-positive value to turn this heuristic off; CA will not limit the number of nodes it considers. | 30 |
scale-down-candidates-pool-ratio | Ratio of nodes that are considered as additional non-empty candidates for scale down when some candidates from the previous iteration are no longer valid. A lower value means better CA responsiveness but possibly slower scale down latency; a higher value can affect CA performance with big clusters (hundreds of nodes). Set to 1.0 to turn this heuristic off; CA will take all nodes as additional candidates. | 0.1 |
scale-down-candidates-pool-min-count | Minimum number of nodes that are considered as additional non-empty candidates for scale down when some candidates from the previous iteration are no longer valid. When calculating the pool size for additional candidates, the maximum of `#nodes * scale-down-candidates-pool-ratio` and `scale-down-candidates-pool-min-count` is taken. | 50 |
scan-interval | How often the cluster is reevaluated for scale up or down | 10 seconds |
max-nodes-total | Maximum number of nodes in all node groups. Cluster Autoscaler will not grow the cluster beyond this number. | 0 |
cores-total | Minimum and maximum number of cores in the cluster, in the format `<min>:<max>`. Cluster Autoscaler will not scale the cluster beyond these numbers. | 0:320000 |
memory-total | Minimum and maximum number of gigabytes of memory in the cluster, in the format `<min>:<max>`. Cluster Autoscaler will not scale the cluster beyond these numbers. | 0:6400000 |
max-node-provision-time | Maximum time CA waits for a node to be provisioned | 15 minutes |
emit-per-nodegroup-metrics | If true, emit per node group metrics | false |
estimator | Type of resource estimator to be used in scale up | binpacking |
expander | Type of node group expander to be used in scale up | random |
ignore-daemonsets-utilization | Whether DaemonSet pods will be ignored when calculating resource utilization for scaling down | false |
ignore-mirror-pods-utilization | Whether mirror pods will be ignored when calculating resource utilization for scaling down | false |
write-status-configmap | Should CA write status information to a configmap | true |
status-config-map-name | The name of the status ConfigMap that CA writes | cluster-autoscaler-status |
max-inactivity | Maximum time from last recorded autoscaler activity before automatic restart | 10 minutes |
max-failing-time | Maximum time from last recorded successful autoscaler run before automatic restart | 15 minutes |
balance-similar-node-groups | Detect similar node groups and balance the number of nodes between them | false |
skip-nodes-with-system-pods | If true, cluster autoscaler will never delete nodes with pods from kube-system (except for DaemonSet or mirror pods) | true |
skip-nodes-with-local-storage | If true, cluster autoscaler will never delete nodes with pods with local storage, e.g. EmptyDir or HostPath | true |
skip-nodes-with-custom-controller-pods | If true, cluster autoscaler will never delete nodes with pods owned by custom controllers | true |
daemonset-eviction-for-empty-nodes | Whether DaemonSet pods will be gracefully terminated from empty nodes | false |
daemonset-eviction-for-occupied-nodes | Whether DaemonSet pods will be gracefully terminated from non-empty nodes | true |
record-duplicated-events | Enable the autoscaler to print duplicated events within a 5 minute window | false |
The above flags focus on scaling behaviors. Adjust these based on the specific demands and characteristics of your workloads to achieve optimal scaling performance.
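Putting this together, here is a sketch of a Helm-based installation. The chart value names (`cloudProvider`, `clusterAPIMode`, `clusterAPIKubeconfigSecret`, `autoDiscovery.*`, `extraArgs.*`) come from the upstream cluster-autoscaler chart and should be verified against the chart version you use; the cluster name, org namespace, context, and secret name are placeholders.

```bash
# Add the upstream chart repository (skip if already configured).
helm repo add autoscaler https://kubernetes.github.io/autoscaler
helm repo update

# Install the autoscaler into the workload cluster. The autoDiscovery values
# must match your cluster object; the extraArgs entries are examples of the
# flags from the table above.
helm upgrade --install cluster-autoscaler autoscaler/cluster-autoscaler \
  --kube-context <workload-context> \
  --namespace kube-system \
  --set cloudProvider=clusterapi \
  --set clusterAPIMode=incluster-kubeconfig \
  --set clusterAPIKubeconfigSecret=cluster-autoscaler-management-kubeconfig \
  --set autoDiscovery.clusterName=<cluster-name> \
  --set 'autoDiscovery.labels[0].namespace=org-<your-org>' \
  --set extraArgs.scale-down-unneeded-time=5m \
  --set extraArgs.scale-down-utilization-threshold=0.6
```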
To ensure that the deployment was successful, check if the pod is running using the following command:
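For example, assuming the release name and namespace from the installation sketch above:

```bash
# List the autoscaler pod(s) by the standard Helm instance label of the release.
kubectl --context <workload-context> -n kube-system get pods \
  -l app.kubernetes.io/instance=cluster-autoscaler
```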
Syself Autopilot, in conjunction with Cluster Autoscaler, offers advanced autoscaling configurations. By adding specific annotations to your machine deployments, you can tailor autoscaling behavior to fit your needs.
For those using hcloud machines, a unique feature available is the ability to scale node groups down to zero. This feature is invaluable when particular node groups aren't actively running any pods, allowing them to scale down fully and save on infrastructure costs.
To activate this zero-scale feature on hcloud machines, annotate your MachineDeployments with `cluster.x-k8s.io/cluster-api-autoscaler-node-group-min-size: "0"`.
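A sketch using kubectl against the management cluster, where the MachineDeployment objects live (context, namespace, and MachineDeployment name are placeholders):

```bash
# Allow this node group to scale down to zero nodes.
kubectl --context <management-context> -n org-<your-org> \
  annotate machinedeployment <machine-deployment-name> \
  cluster.x-k8s.io/cluster-api-autoscaler-node-group-min-size="0"
```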