Friday, 2 November 2018

Solution - Rancher 2 (k8s), private registry, self-signed certificates

Since Rancher switched to Kubernetes in version 2.x, I'm exposed to a lot of stupidity and limitations k8s introduced, but I can live with that, at least for a moment... What I couldn't accept was that I could no longer use my private registry (with self-signed certificate) that works perfectly fine with older Rancher (1.6 - before move to k8s).

That is now resolved!

My cluster setup

  • Rancher 2 cluster (based on Kubernetes), all running on latest RancherOS
  • Private registry available only within the LAB network - hence self-signed certificate
  • Registry has an internal host name, resolvable via internal DNS server
  • Registry does not require user accounts, so no need for credentials, but self-signed certificate prevents it from working, resulting with following error when image is pulled
x509: certificate signed by unknown authority

Dead ends

First of all, please ignore RancherOS documentation - last one I found was for version 1.2, current RancherOS is 1.4.2... anyway, it no longer works (it did for older RancherOS and Rancher 1.6 though, but new Rancher is more Kubernetes than anything else). In my research I also read a bunch of bug reports, feature requests, stack exchange articles, etc... mostly waste of time, but they gave me a good idea on rabbit holes to avoid. Some of the more useful reads are here and here, I also have a feeling this will be useful for me quite soon.
Another trick I noticed was that if I followed RancherOS docs above, the registry CA key was overwritten with something else on node reboot.

Solution (a.k.a "works for me")

Go old school Linux admin style:

  1. SSH to the RancherOS node (user is rancher@<node>), having your private CA certificate at hand
  2. As user rancheros try docker pull <registry:port>/<my image> - you should get a CA error
  3. Check your /etc/resolv.conf - mine was regularly overwritten by dhcp but it was not writing name servers correctly - this should be easily fixed by writing what you want to /etc/resolv.conf.tail (in hopes dhcp will append it when it regenerates resolv.conf).
  4. Now the key element - edit the OS wide trusted CA list (hint hint - may disappear after sudo ros os upgrade, but this can be fixed with sudo chattr +i /etc/resolv.conf)  and add your CA certificate there. Running vi /etc/ssl/certs/ca-certificates.crt and copy'n'paste does the trick!
  5. Try docker pull again, now it worked for me.