Rebooting bare metal nodes

Introduction

If you want to reboot a bare metal server that is registered as a node in your cluster, the easiest way is to reprovision it.

To do so, delete the Machine object associated with the bare metal node.

However, in some cases it is preferable to reboot the machine in place, for example when the local disk holds data that would take too long to re-sync.

This guide explains how to do that.

Step 1: Set the paused annotation

In the Autopilot cluster, set the paused annotation for the machine:

shell

kubectl annotate machine your-cluster-md-abc-xyz-ijk cluster.x-k8s.io/paused=true

Double-check that the annotation contains no typos. With a misspelled annotation the machine is not paused, so the node will be flagged as not functional and the machine will be reprovisioned.
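One way to guard against a typo is to read the annotation back before continuing. The following is a minimal sketch, not part of the official procedure: the helper name `machine_is_paused` is hypothetical, and it assumes the annotation was set with a non-empty value such as `true`, as in the command above.

```shell
# Hypothetical helper: succeeds only if the machine carries a non-empty
# cluster.x-k8s.io/paused annotation. The dots in the annotation key are
# escaped so that jsonpath treats the key as a single field name.
machine_is_paused() {
  machine="$1"
  value="$(kubectl get machine "$machine" \
    -o jsonpath='{.metadata.annotations.cluster\.x-k8s\.io/paused}')"
  [ -n "$value" ]
}
```

Run against the Autopilot cluster, `machine_is_paused your-cluster-md-abc-xyz-ijk && echo paused` should print `paused` only when the key is spelled correctly.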

Step 2: Drain the node

In your workload cluster, drain the node:

shell

kubectl drain bm-your-cluster-12345678 --ignore-daemonsets

Then wait until all pods have terminated. Because of --ignore-daemonsets, pods managed by DaemonSets stay on the node and can be ignored. You can check with this command:

shell

kubectl get pods --all-namespaces --field-selector spec.nodeName=bm-your-cluster-12345678
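If you prefer not to re-run the check by hand, the wait can be scripted. This is a rough sketch under a few assumptions: the helper names `count_evictable_pods` and `wait_for_drain` are made up for illustration, and pods owned by a DaemonSet or by the Node itself (static pod mirrors) are treated as expected leftovers.

```shell
# Hypothetical helper: prints the number of pods on the node that are
# not owned by a DaemonSet or by the Node object.
count_evictable_pods() {
  node="$1"
  kubectl get pods --all-namespaces --no-headers \
    --field-selector "spec.nodeName=$node" \
    -o custom-columns='NAME:.metadata.name,OWNER:.metadata.ownerReferences[0].kind' |
    awk '$2 != "DaemonSet" && $2 != "Node" { n++ } END { print n + 0 }'
}

# Hypothetical helper: polls until no such pods remain.
# The second argument is the polling interval in seconds (default 5).
wait_for_drain() {
  node="$1"
  interval="${2:-5}"
  while [ "$(count_evictable_pods "$node")" -gt 0 ]; do
    sleep "$interval"
  done
}
```

For example, `wait_for_drain bm-your-cluster-12345678` returns once only DaemonSet and static pods are left. The loop has no timeout, so interrupt it manually if a pod is stuck terminating.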

Step 3: Reboot the server

Now you can SSH into the server and perform any needed maintenance tasks.

If you are unsure how to do that, refer to the How to SSH into nodes guide.

To reboot it, run this in the server shell:

shell

reboot
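After the reboot, the kubelet has to come back up before the node can take workloads again. As a sketch, you could wait for this from the workload cluster with `kubectl wait`; the helper name `wait_for_node_ready` and the 10 minute default timeout are assumptions chosen for illustration.

```shell
# Hypothetical helper: blocks until the node reports the Ready condition,
# or fails once the timeout (default 10m) expires.
wait_for_node_ready() {
  node="$1"
  timeout="${2:-10m}"
  kubectl wait --for=condition=Ready "node/$node" --timeout="$timeout"
}
```

Note that right after the reboot command the node may still be reported as Ready for a short while, until the control plane notices the missing heartbeats. Calling `wait_for_node_ready bm-your-cluster-12345678` is therefore most meaningful once the server has actually gone down.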

Step 4: Add the node back

First, uncordon the node in your workload cluster:

shell

kubectl uncordon bm-your-cluster-12345678

Then, in the Autopilot cluster, remove the paused annotation:

shell

kubectl annotate machine your-cluster-md-abc-xyz-ijk cluster.x-k8s.io/paused-

Now the node is a functional member of the cluster again.
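To confirm the result, you can check that the node is no longer cordoned. The snippet below is only a sketch: `node_is_schedulable` is a hypothetical helper name, and it relies on the fact that a cordoned node has `spec.unschedulable` set to `true`, while an uncordoned node leaves the field unset.

```shell
# Hypothetical helper: succeeds if the node is not marked unschedulable.
node_is_schedulable() {
  node="$1"
  unschedulable="$(kubectl get node "$node" -o jsonpath='{.spec.unschedulable}')"
  [ "$unschedulable" != "true" ]
}
```

In the workload cluster, `node_is_schedulable bm-your-cluster-12345678 && echo schedulable` should print `schedulable` after the uncordon step.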