Upgrade Cert-Manager for Your Production Deployment Without Downtime

Jacky Jiang
Published in ITNEXT
7 min read · Jun 26, 2022


Photo by Brian Scott on Unsplash

What’s Cert-Manager?

Cert-Manager is one of the most widely adopted cloud native certificate management solutions for Kubernetes and OpenShift. It simplifies obtaining, renewing and using certificates from various sources, including the well-known nonprofit certificate authority “Let’s Encrypt”. To support this, Cert-Manager adds a number of custom resource types to the cluster, such as Certificates, CertificateRequests and Issuers, each defined by a CRD (Custom Resource Definition).

When upgrading Cert-Manager to a newer version, you often need to update those CRDs as well. Unfortunately, this is not a straightforward process. When problems arise, you usually have to uninstall Cert-Manager completely to fix them, which leads to downtime in your production deployment.

In this article, we will talk about a few options that help ensure a smooth Cert-Manager upgrade and avoid downtime in your production deployment.

Recommended Installation / Setup Option

Many options are available to deploy a fresh Cert-Manager installation on your cluster. However, not every option helps much when it comes to upgrading or recovering when things go wrong. If you are about to deploy a new installation, you might want to consider my recommended installation/setup option, which makes future upgrades and maintenance easier. You will learn more about the reasons behind it in the following sections.

  • Install the required CRDs manually using kubectl, and install Cert-Manager itself with Helm.
    - Make sure the --enable-certificate-owner-ref flag is NOT set to true. The default value of this flag is false. However, if you set it to true in your helm config, the Secret object containing the X.509 certificates will be auto-removed when the associated Cert-Manager custom Certificate resource is removed. This might cause unnecessary downtime during reinstallation.
  • Manually create Cert-Manager Certificate resources rather than securing your Ingress resources with the annotations that ingress-shim watches.
    - Certificate resources created for an Ingress via ingress-shim will have an owner reference pointing to the Ingress resource. This owner reference will be incorrect when you have to recreate your Ingress resource, e.g. for migrating to a new cluster. Creating/managing the Certificate resources without ingress-shim will save the trouble of fixing this potential issue.
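
The two recommendations above could be put into practice roughly as follows. This is only a minimal sketch: the function names, the `cert-manager` namespace and release name, the `example-com-tls` names and the `letsencrypt-prod` ClusterIssuer are all assumptions for illustration, not something your cluster necessarily has.

```shell
# Sketch: install the CRDs with kubectl, then Cert-Manager itself with Helm.
# $1 = version tag with a leading "v", e.g. v1.8.2
# The "cert-manager" namespace and release name are assumptions.
install_cert_manager() {
  version="$1"
  kubectl apply -f "https://github.com/cert-manager/cert-manager/releases/download/${version}/cert-manager.crds.yaml" &&
  helm repo add jetstack https://charts.jetstack.io &&
  helm repo update &&
  helm install cert-manager jetstack/cert-manager \
    --namespace cert-manager --create-namespace \
    --version "${version#v}"   # chart version without the leading "v"
}

# Sketch: print a hand-written Certificate manifest (no ingress-shim annotations).
# Every name below is a hypothetical placeholder.
certificate_manifest() {
  cat <<'EOF'
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: example-com-tls
  namespace: default
spec:
  secretName: example-com-tls
  dnsNames:
    - example.com
  issuerRef:
    name: letsencrypt-prod
    kind: ClusterIssuer
EOF
}
```

You would then run `install_cert_manager v1.8.2`, followed by `certificate_manifest | kubectl apply -f -` once your issuer exists. Because the Certificate is created by hand, it carries no owner reference to any Ingress.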

Backup Cert-Manager Resources

To avoid downtime, we should ideally upgrade the existing Cert-Manager installation in place. However, there is always a chance that you will have to remove the current installation before you can get a newer version to work correctly. When that happens, a backup of the existing Cert-Manager resources will be very helpful for restoring the existing setup and avoiding downtime.

To back up all necessary Cert-Manager configuration resources, you can run:

kubectl get --all-namespaces -oyaml issuer,clusterissuer,cert > backup.yaml

This command will back up the Issuer, ClusterIssuer and Certificate resources you may have created in your cluster.

Please note, if you use ingress-shim to auto-create the Certificate resources, you should exclude Certificate resources from the backup. Otherwise, the incorrect owner reference on the restored Certificate resources will lead to a config sync issue, i.e. updates to the Ingress will no longer be applied to the Certificate.

To back up without Certificate resources, you can run the following command instead:

kubectl get --all-namespaces -oyaml issuer,clusterissuer > backup.yaml

Besides the Cert-Manager configuration resources, you will also want to back up the Secret resource containing X.509 certificates:

kubectl -n [namespace] get secret [secret name] -oyaml > secret-bak.yaml 

Here, [namespace] is the namespace containing the Secret and [secret name] is the name of the Secret resource.
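
If you have more than a handful of certificates, backing up each Secret by hand gets tedious. The sketch below loops over every Certificate and saves the Secret named in its `spec.secretName`. The function name and the `<namespace>_<secret>.yaml` file layout are my own assumptions, and it assumes each referenced Secret actually exists.

```shell
# Sketch: back up every Secret referenced by a Certificate's spec.secretName.
# $1 = output directory (files are written as <namespace>_<secret>.yaml)
backup_certificate_secrets() {
  outdir="$1"
  mkdir -p "$outdir" || return 1
  # Print one "namespace secretName" pair per Certificate.
  kubectl get certificate --all-namespaces \
    -o jsonpath='{range .items[*]}{.metadata.namespace}{" "}{.spec.secretName}{"\n"}{end}' |
  while read -r ns secret; do
    # Skip blank or incomplete lines.
    [ -n "$secret" ] || continue
    kubectl -n "$ns" get secret "$secret" -o yaml > "${outdir}/${ns}_${secret}.yaml"
  done
}
```

For example, `backup_certificate_secrets ./secret-backups` would leave one YAML file per certificate Secret in that directory.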

Besides the Secret resources containing issued X.509 certificates, if you are transferring data to a new cluster, you may also need to copy across any additional Secret resources that your configured Issuers reference, such as ACME account keys. More information can be found in the cert-manager documentation on backing up and restoring resources.

How to Use Your Backup

When things go wrong and the Secret resource has been lost for any reason, always restore the Secret resource first. Do this before you attempt to restore any Issuer, ClusterIssuer, Certificate or Ingress resources, to avoid triggering unnecessary certificate reissuance that might cause downtime.

To restore the Secret resource, you can run:

kubectl apply -f secret-bak.yaml

To restore Cert-Manager configuration resources, you can run:

kubectl apply -f <(awk '!/^ *(resourceVersion|uid): [^ ]+$/' backup.yaml)

The awk command will remove the uid and resourceVersion fields that do not need to be restored.
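
If you would like to look at what is about to be applied before touching the cluster, the same filter can be wrapped in a small function. This is just a sketch, and `strip_server_fields` is a name I made up.

```shell
# Sketch: print a backup file with the server-assigned uid and
# resourceVersion lines removed, without applying anything.
# $1 = backup file
strip_server_fields() {
  awk '!/^ *(resourceVersion|uid): [^ ]+$/' "$1"
}
```

You can review the result with `strip_server_fields backup.yaml | less`, and apply it with `strip_server_fields backup.yaml | kubectl apply -f -` once you are happy with it.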

Verify `--enable-certificate-owner-ref` Flag

Before upgrading your existing Cert-Manager installation, you should ensure the --enable-certificate-owner-ref flag is NOT set to true on your current deployment. Otherwise, when the Certificate resource is removed for any reason, the Secret resource containing issued X.509 certificates will also be removed, which will cause downtime in most cases.

The default value of this flag is false. However, if the flag was ever set to true on your current deployment, you should change your deployment config to set it back to false before upgrading Cert-Manager to a newer version.
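
One way to check the flag is to look at the arguments of the running controller. The sketch below assumes the controller is the first container of the given Deployment (with a Helm release named `cert-manager`, the Deployment is typically also called `cert-manager`); the function name is hypothetical.

```shell
# Sketch: succeed (exit 0) if --enable-certificate-owner-ref is switched on.
# $1 = namespace, $2 = controller Deployment name (both are assumptions
# about your installation).
owner_ref_enabled() {
  args=$(kubectl -n "$1" get deployment "$2" \
    -o jsonpath='{.spec.template.spec.containers[0].args}') || return 2
  case "$args" in
    *--enable-certificate-owner-ref=false*) return 1 ;;  # explicitly off
    *--enable-certificate-owner-ref*)       return 0 ;;  # bare flag or =true
    *)                                      return 1 ;;  # not set: default is false
  esac
}
```

For example: `owner_ref_enabled cert-manager cert-manager && echo "Flag is on, turn it off before upgrading"`.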

Upgrade Cert-Manager Resources API Version

If you need to upgrade Cert-Manager to version 1.6 or later, you must first upgrade the API version of the existing Cert-Manager resources. The easiest option is to use the cmctl CLI tool. You can download the binary for your system from the release page.

Once the cmctl CLI tool is installed, you can run the following to upgrade the API version of the existing Cert-Manager resources in the cluster:

cmctl upgrade migrate-api-version --qps 5 --burst 10

cmctl also comes with a conversion tool that allows you to convert saved Cert-Manager resource manifest files (e.g. the previously produced backup file) to newer versions.

cmctl convert -f cert.yaml
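
To see whether the migration has anything left to do, you can look at which API versions each CRD still records in `status.storedVersions`. This is a sketch under the assumption that the six CRD names below are the ones installed in your cluster; the function name is made up.

```shell
# Sketch: print the stored API versions of each cert-manager CRD.
cert_manager_stored_versions() {
  for crd in \
    certificates.cert-manager.io \
    certificaterequests.cert-manager.io \
    issuers.cert-manager.io \
    clusterissuers.cert-manager.io \
    orders.acme.cert-manager.io \
    challenges.acme.cert-manager.io
  do
    printf '%s: ' "$crd"
    kubectl get crd "$crd" -o jsonpath='{.status.storedVersions}'
    printf '\n'
  done
}
```

After a successful migration, I would expect every line to list only v1. Anything older suggests the migration command should be run again before upgrading.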

Upgrade CRD Definition

The first step of upgrading the existing Cert-Manager installation is upgrading the installed CRD definitions. Since Cert-Manager itself is managed by Helm, apply only the CRD manifest for the target version:

kubectl apply -f https://github.com/cert-manager/cert-manager/releases/download/<version>/cert-manager.crds.yaml

Here, <version> is the version you plan to upgrade to, e.g. v1.8.2

Upgrade Cert-Manager

After the CRD definitions are upgraded, you can upgrade the Cert-Manager Helm chart to the target version:

helm upgrade --namespace <namespace> --version <version> <release_name> jetstack/cert-manager

Here, <namespace> is the namespace Cert-Manager is deployed in, <version> is the version that you plan to upgrade to, e.g. 1.8.2 (without the leading v), and <release_name> is your Helm release name.

You can use cmctl CLI to verify the upgraded installation:

$ cmctl check api
The cert-manager API is ready
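
Put together, the upgrade steps could be chained so that a failure in one step stops the rest. This is a minimal sketch, not a hardened script: `upgrade_cert_manager` is a hypothetical name, and it assumes the `jetstack` Helm repository has already been added.

```shell
# Sketch: CRDs first, then the Helm release, then an API readiness check.
# $1 = version tag with a leading "v", e.g. v1.8.2
# $2 = namespace, $3 = Helm release name
upgrade_cert_manager() {
  version="$1"; namespace="$2"; release="$3"
  kubectl apply -f "https://github.com/cert-manager/cert-manager/releases/download/${version}/cert-manager.crds.yaml" &&
  helm repo update &&
  helm upgrade --namespace "$namespace" --version "${version#v}" \
    "$release" jetstack/cert-manager &&
  # Wait up to two minutes for the cert-manager API to become ready.
  cmctl check api --wait=2m
}
```

For example: `upgrade_cert_manager v1.8.2 cert-manager cert-manager`. Each step only runs if the previous one succeeded, so a failed CRD update never reaches the Helm upgrade.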

Force Renew Certificate

You can check the status of an existing Certificate with the cmctl CLI tool:

cmctl status certificate [cert name]

If a Certificate renewal order is stuck after the upgrade, you can force a renewal with the cmctl CLI tool:

cmctl renew [cert name]
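
With many certificates, it can help to find the ones that are not Ready and renew only those. The sketch below does that by reading each Certificate's Ready condition. Both function names are assumptions of mine, and you should review the list before renewing, since Let’s Encrypt applies rate limits.

```shell
# Sketch: print "namespace name" for every Certificate whose Ready
# condition is missing or not "True".
list_unready_certificates() {
  kubectl get certificate --all-namespaces \
    -o jsonpath='{range .items[*]}{.metadata.namespace}{" "}{.metadata.name}{" "}{.status.conditions[?(@.type=="Ready")].status}{"\n"}{end}' |
  awk 'NF >= 2 && $3 != "True" { print $1, $2 }'
}

# Sketch: force a renewal for each Certificate listed above.
renew_unready_certificates() {
  list_unready_certificates | while read -r ns name; do
    cmctl renew --namespace "$ns" "$name"
  done
}
```

Run `list_unready_certificates` first to see what would be affected, then `renew_unready_certificates` if the list looks right.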

When Things Go Wrong

First, you should check the logs of the Cert-Manager controller pod to rule out any obvious config mistakes, e.g. incorrect ACME credentials on a ClusterIssuer.

kubectl -n [namespace] logs cert-manager-xxxxxxxxxx-xxxxx

Here, [namespace] is the namespace Cert-Manager is deployed in and `cert-manager-xxxxxxxxxx-xxxxx` is the name of one of the cert-manager controller pods.

Nevertheless, there is always a chance that you cannot pin down the root cause of the problem with the current installation. When that happens, you might have to completely remove the Cert-Manager installation and reinstall it in your cluster.
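
If you don’t want to look up the pod name first, you can ask for the logs through the Deployment and filter for lines that look like errors. This is only a rough sketch: the function name is made up, the Deployment name is an assumption about your installation, and the `error|failed` pattern is a crude heuristic rather than a complete filter.

```shell
# Sketch: show error-looking lines from recent controller logs.
# $1 = namespace, $2 = controller Deployment name (assumed, often "cert-manager")
controller_errors() {
  kubectl -n "$1" logs "deployment/$2" --tail=500 | grep -iE 'error|failed'
}
```

For example: `controller_errors cert-manager cert-manager`. No output means nothing in the last 500 lines matched the pattern.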

Remove Cert-Manager Installation

To reinstall Cert-Manager, you first need to remove the current installation completely. This will not necessarily cause downtime for your application as long as the Secret resource containing issued X.509 certificates is not removed. That’s why we need to make sure the --enable-certificate-owner-ref flag of your installation is NOT set to true, so that removing Certificate resources does not also remove the Secret resource.

To remove the Cert-Manager installation, we must first ensure that all cert-manager resources have been deleted. You can list any that remain with:

kubectl get Issuers,ClusterIssuers,Certificates,CertificateRequests,Orders,Challenges --all-namespaces
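
If you want to use that check as a gate in a script, you could wrap it so that it fails while anything is left. A minimal sketch, with a function name of my own choosing:

```shell
# Sketch: succeed only when no cert-manager custom resources remain.
no_cert_manager_resources_left() {
  # If kubectl itself fails, report that instead of assuming "nothing left".
  remaining=$(kubectl get Issuers,ClusterIssuers,Certificates,CertificateRequests,Orders,Challenges \
    --all-namespaces -o name) || return 2
  if [ -n "$remaining" ]; then
    printf 'Still present:\n%s\n' "$remaining" >&2
    return 1
  fi
}
```

For example: `no_cert_manager_resources_left && echo "Safe to uninstall"`.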

Once all these resources have been deleted you can uninstall Cert-Manager by:

helm --namespace <namespace> delete <release_name>

Here, <namespace> is the cert-manager deployment namespace. <release_name> is the cert-manager helm chart deployment release name.

If your current installation was not installed using Helm, you can delete it using the kubectl CLI tool instead:

kubectl delete -f https://github.com/cert-manager/cert-manager/releases/download/vX.Y.Z/cert-manager.yaml

Here, vX.Y.Z is the version of your current installation, e.g. v1.8.2

Finally, we need to delete all CRD definitions installed:

kubectl delete -f https://github.com/cert-manager/cert-manager/releases/download/vX.Y.Z/cert-manager.crds.yaml

Here, `vX.Y.Z` is the version of your current installation, e.g. v1.8.2

Reinstall Cert-Manager

Once the existing Cert-Manager installation is completely removed, we can install the newer version of Cert-Manager using the recommended installation option. The Cert-Manager configuration backup produced earlier can then be used to restore the previous setup.

After all resources are restored, we can use the cmctl CLI tool introduced earlier to verify the installation and force renew any existing certificates if necessary.
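
The order of the restore matters, so here is a sketch of how it could be scripted after the reinstall. The function name is hypothetical, and the file names are whatever you used when taking the backup. It waits for the Cert-Manager API before applying the configuration, on the assumption that Issuer and Certificate resources are rejected while the webhook is not yet ready.

```shell
# Sketch: restore in a safe order after reinstalling Cert-Manager.
# $1 = Secret backup file, $2 = Cert-Manager config backup file
restore_cert_manager_backup() {
  secret_file="$1"; config_file="$2"
  # 1. Secrets first, so no certificate is reissued unnecessarily.
  kubectl apply -f "$secret_file" &&
  # 2. Wait until the Cert-Manager API accepts resources.
  cmctl check api --wait=2m &&
  # 3. Config resources, minus the server-assigned uid/resourceVersion lines.
  awk '!/^ *(resourceVersion|uid): [^ ]+$/' "$config_file" | kubectl apply -f -
}
```

For example: `restore_cert_manager_backup secret-bak.yaml backup.yaml`.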

Conclusion

Cert-Manager is an excellent cloud native certificate management solution for Kubernetes and OpenShift. However, upgrading an existing Cert-Manager installation is not straightforward. This article introduces a general process that helps upgrades go smoothly. In case unexpected problems leave you with a dysfunctional installation, it also covers how to avoid downtime during the reinstallation process.
