Installing Talos on a Hetzner Dedicated Server

When you rent a bare-metal server such as the EX44 from Hetzner (as opposed to the Hetzner Cloud service), you get access to a real machine, with all of its quirks.

I was trying to install Talos OS v1.7.0 to get an immutable OS for my small bare-metal cluster, and it just didn’t work: the machine simply wouldn’t boot. I even had a technician write the ISO to a USB stick, plug it in, and connect a KVM switch so that I could see what was happening. The only message on the screen was the following:

EFI stub: Loaded initrd from LINUX_EFI_INITRD_MEDIA_GUID device path

The same setup worked in the vKVM rescue system (which runs your whole machine as a VM and gives you screen access), so there was nothing wrong with the image itself.

After a few days of digging into possible causes of Linux boot problems, I found that in some cases the kernel needs to be told explicitly where to print its logs. Some setups therefore require extra kernel command-line parameters that configure the kernel console, such as console=ttyS0,9600, as netboot.xyz instructs here for Oracle Cloud.
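
To check which console= parameters a running kernel was booted with (for example in the rescue system), you can read its command line from /proc/cmdline. Below is a minimal sketch, assuming bash with GNU tr and grep; console_args is a hypothetical helper, not part of any tool. Per the kernel's serial-console documentation, console= may be given more than once: output goes to every listed console, and the last one becomes /dev/console.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list the console= parameters in a kernel command line.
# Reads /proc/cmdline (the running kernel's cmdline) unless a file is given.
console_args() {
  local cmdline_file="${1:-/proc/cmdline}"
  # The cmdline is one whitespace-separated line: put each token on its own
  # line and keep only the console= ones. "|| true" keeps the exit status 0
  # when there are none.
  tr -s '[:space:]' '\n' < "$cmdline_file" | grep '^console=' || true
}

# Example: inspect the running kernel
#   console_args
```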

Talos has an Image Factory where you can generate Talos images with extra options, such as additional kernel parameters. However, an image built with console=ttyS0,9600 didn’t boot either.
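
The Image Factory web UI builds a "schematic" for you, but the same can be done over its HTTP API. This sketch assumes the public factory.talos.dev instance, its POST /schematics endpoint (which takes a schematic YAML and returns JSON of the form {"id": "..."}), and the /image/<id>/<version>/metal-amd64.raw.xz download path. The helper names are hypothetical, and parsing the JSON with sed is a shortcut that jq would handle more robustly.

```shell
#!/usr/bin/env bash
# Sketch: build an Image Factory schematic with extra kernel args and derive
# the raw disk image URL. Assumes the public factory.talos.dev instance.

FACTORY_URL="https://factory.talos.dev"

# Print a schematic YAML that adds each argument as an extra kernel arg.
make_schematic() {
  printf 'customization:\n  extraKernelArgs:\n'
  local arg
  for arg in "$@"; do
    printf '    - %s\n' "$arg"
  done
}

# Extract the "id" field from the factory's compact JSON response on stdin.
schematic_id() {
  sed -n 's/.*"id"[[:space:]]*:[[:space:]]*"\([0-9a-f]*\)".*/\1/p'
}

# Print the metal-amd64.raw.xz URL for a schematic ID and a Talos version.
image_url() {
  printf '%s/image/%s/%s/metal-amd64.raw.xz\n' "$FACTORY_URL" "$1" "$2"
}

# Usage (needs network access):
#   id=$(make_schematic console=tty0 \
#     | curl -sS -X POST --data-binary @- "$FACTORY_URL/schematics" \
#     | schematic_id)
#   image_url "$id" v1.7.0
```

The printed URL is what you would later pass to wget in the rescue system.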

What I ended up doing was the following:

  1. Boot into the rescue system (pick the Linux option so that you get SSH access) and log in.

  2. List the tty devices:

    ls -al /dev | grep tty
    

    For me, this didn’t return any ttyS* devices, only plain tty* ones, including tty0. This is likely because the machine doesn’t have a hardware serial port, so all of its ttys are virtual consoles. See more details here about the difference.

    If you do see ttyS0, you likely want to use that one.

  3. Go to the Image Factory and add the following as an additional kernel parameter:

    console=tty0
    
  4. In the rescue system, download the raw disk image (metal-amd64.raw.xz) generated by the Image Factory:

    cd /tmp
    wget <disk url>
    
  5. List all the disks and choose the appropriate one.

    lsblk -f
    
  6. Write the image to the whole disk rather than to a partition, since the image already contains the partition table. In my case, I had two disks, /dev/nvme0n1 and /dev/nvme1n1, and chose /dev/nvme0n1. This erases everything on the chosen disk.

    xz -d -c /tmp/metal-amd64.raw.xz | dd of=/dev/<your disk> bs=4M status=progress && sync
    
  7. Now go back to the Hetzner Robot web interface and trigger a hardware reset.

  8. After a while, confirm that it booted by running the following command from your local machine:

    talosctl -n <Machine IP Address> disks --insecure
    

    You should see output like the following:

    DEV            MODEL                        SERIAL           TYPE   UUID   WWID                   MODALIAS   NAME   SIZE     BUS_PATH                                                   SUBSYSTEM          READ_ONLY   SYSTEM_DISK
    /dev/nvme0n1   SAMSUNG MZVL2512HCJQ-00B00   S675NL0W675607   NVME   -      eui.002538b631a62b48   -          -      512 GB   /pci0000:00/0000:00:01.0/0000:01:00.0/nvme/nvme0/nvme0n1   /sys/class/block               *
    /dev/nvme1n1   SAMSUNG MZVL2512HCJQ-00B00   S675NL0W675614   NVME   -      eui.002538b631a62b4f   -          -      512 GB   /pci0000:00/0000:00:06.0/0000:02:00.0/nvme/nvme1/nvme1n1   /sys/class/block
    

    Note that if the model shows QEMU HARDDISK, Talos is running inside the vKVM rescue system. In that case, restart the machine so that it leaves the vKVM rescue system and boots from the real disk.

  9. Make sure the same kernel parameter is also set in your Talos machine config, so that it is not lost when Talos rewrites its boot configuration (for example, on upgrade):

    machine:
      install:
        extraKernelArgs:
          - console=tty0
    
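
The rescue-system part of the steps above (checking for a serial port in step 2, then writing the image in steps 4 to 6) can be wrapped in a couple of shell functions. This is a minimal sketch, assuming bash with GNU coreutils, util-linux's lsblk and xz available in the rescue system; pick_console and write_image are hypothetical names. A /dev/ttyS0 node can exist even without a physical serial port, so treat pick_console's answer as a hint only.

```shell
#!/usr/bin/env bash
# Sketch of the rescue-system steps. Helper names are hypothetical.

# Step 2: suggest a console= argument. If a ttyS0 node exists under the given
# device directory (default /dev), suggest the serial port; otherwise fall
# back to the current virtual console, tty0.
pick_console() {
  local dev_dir="${1:-/dev}"
  if [ -e "$dev_dir/ttyS0" ]; then
    echo "console=ttyS0"
  else
    echo "console=tty0"
  fi
}

# Steps 4-6: decompress the raw image onto a whole disk. Refuses to run unless
# the target is a block device with nothing on it mounted. Destructive!
write_image() {
  local image="$1" disk="$2"
  if [ ! -b "$disk" ]; then
    echo "refusing: $disk is not a block device" >&2
    return 1
  fi
  # lsblk prints one (possibly empty) mount point per disk and partition;
  # any non-empty line means something on the disk is mounted.
  if [ -n "$(lsblk -nro MOUNTPOINT "$disk")" ]; then
    echo "refusing: $disk or one of its partitions is mounted" >&2
    return 1
  fi
  # conv=fsync flushes the written data before dd exits; PIPESTATUS catches
  # a failed decompression as well as a failed write.
  xz -d -c "$image" | dd of="$disk" bs=4M status=progress conv=fsync
  local status=("${PIPESTATUS[@]}")
  [ "${status[0]}" -eq 0 ] && [ "${status[1]}" -eq 0 ]
}

# Usage in the rescue system:
#   pick_console
#   cd /tmp && wget <disk url>
#   write_image /tmp/metal-amd64.raw.xz /dev/nvme0n1
```

The two guards exist because dd will overwrite whatever it is pointed at; checking the exit status of both sides of the pipe avoids reporting success when xz failed halfway.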
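
Step 8 says to wait "a while"; instead of re-running the check by hand, a small retry loop can poll until the Talos API answers. A minimal bash sketch; wait_for is a hypothetical helper, and the attempt count and delay in the usage line are arbitrary.

```shell
#!/usr/bin/env bash
# Retry a command until it succeeds or the attempts run out.
# Usage: wait_for ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
wait_for() {
  local attempts="$1" delay="$2" i
  shift 2
  for ((i = 1; i <= attempts; i++)); do
    if "$@"; then
      return 0
    fi
    echo "attempt $i/$attempts failed" >&2
    # Only sleep if another attempt follows.
    if ((i < attempts)); then
      sleep "$delay"
    fi
  done
  return 1
}

# Usage from your local machine (30 attempts, 10 s apart):
#   wait_for 30 10 talosctl -n <Machine IP Address> disks --insecure
```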
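
One way to get the kernel argument from step 9 into a generated config is a config patch. This sketch assumes talosctl gen config with its --config-patch and --install-disk flags; write_console_patch, the cluster name, the endpoint and the patch file name are placeholders, and /dev/nvme0n1 mirrors the disk chosen in step 6.

```shell
#!/usr/bin/env bash
# Sketch: keep the console= argument in the machine config via a patch.
# Hypothetical helper; the file name and argument are assumptions.
write_console_patch() {
  local out="$1" arg="${2:-console=tty0}"
  cat > "$out" <<EOF
machine:
  install:
    extraKernelArgs:
      - ${arg}
EOF
}

# Usage (requires talosctl; cluster name, endpoint and disk are placeholders):
#   write_console_patch console-patch.yaml
#   talosctl gen config my-cluster https://<Machine IP Address>:6443 \
#     --install-disk /dev/nvme0n1 \
#     --config-patch @console-patch.yaml
#   talosctl apply-config --insecure -n <Machine IP Address> --file controlplane.yaml
```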

It turns out that some single-board computers, like the Raspberry Pi, need this setting as well, and Talos handles them with an overlay that passes this argument in.

Follow @muvaffakonus on Twitter.