gMSA on AKS and Private Endpoints


A few weeks ago, I spent some time with our support and engineering teams helping a customer solve a problem that appeared after they enabled Group Managed Service Accounts (gMSA) on Azure Kubernetes Service (AKS).

I decided to write this blog so other customers with the same issue can avoid going through it altogether. I’m writing it in the order I experienced it, but if you’re just looking for the solution, feel free to skip to the end.

When that customer enabled gMSA on their cluster, a few things started to happen:

  • Any gMSA-enabled deployment/container/pod entered a failed state. The events from the deployments showed the pods with the following error: Event Detail: Failed to setup the external credentials for Container ‘‘: The RPC server is unavailable.
  • Any non-gMSA deployment/container/pod using the customer’s private images and running on Windows nodes also entered a failed state. These deployments showed an ErrImagePull event.
  • All other deployments/containers/pods, on both Windows and Linux nodes, that were not using private images stayed healthy.

Removing the gMSA configuration from the cluster automatically returned the entire cluster to a healthy state.

The error on the gMSA pods immediately reminded me of other cases in which I’ve seen customers hit similar issues because of network connectivity. The most common gMSA issues I’ve seen so far are:

  • Blocked ports: A firewall sits between your AKS cluster and the Active Directory (AD) Domain Controllers (DCs). AD uses multiple protocols for communication between clients and DCs. I even created a simple script that validates the ports.
  • Incorrect DNS configuration: AD uses DNS for service discovery. Domain Controllers have “SRV” records in DNS that clients query so they can find not only all DCs, but also the closest one. If either the nodes or the pods can’t resolve the domain FQDN to a DC, gMSA won’t work.
  • Incorrect secret in Azure Key Vault (AKV): The Windows nodes use a user account rather than a computer account, because the nodes are not domain-joined. The format of the secret should be :.
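The port check from the first bullet can be approximated with a few lines of Python. This is a minimal sketch, not the script mentioned above: it only tests TCP connectivity to a commonly used set of AD ports (the port list is an assumption for illustration), and it cannot validate UDP traffic or the dynamic RPC range (49152-65535). Run it from somewhere on the same network path as the Windows nodes, for example a VM in the cluster’s vNet.

```python
import socket

# Commonly used TCP ports between Windows clients and AD Domain Controllers.
# Assumed list for illustration: UDP and the dynamic RPC range are not covered.
AD_TCP_PORTS = {
    53: "DNS",
    88: "Kerberos",
    135: "RPC endpoint mapper",
    389: "LDAP",
    445: "SMB",
    464: "Kerberos password change",
    636: "LDAPS",
    3268: "Global Catalog",
    3269: "Global Catalog over SSL",
}


def check_tcp_ports(host, ports, timeout=2.0):
    """Return {port: bool}, True when a TCP handshake to host:port succeeds."""
    results = {}
    for port in ports:
        try:
            # create_connection resolves the host and completes the TCP handshake.
            with socket.create_connection((host, port), timeout=timeout):
                results[port] = True
        except OSError:
            # Covers refused connections, timeouts and name resolution failures.
            results[port] = False
    return results
```

For example, check_tcp_ports("dc01.contoso.com", AD_TCP_PORTS), with a hypothetical DC name, returns a dictionary mapping each port to True or False; any False entry points at a firewall rule worth reviewing.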

There are other minor issues I’ve seen, but these are the main ones. In this customer’s case, we reviewed all of the above and everything seemed to be configured properly.

At that point, I brought in other folks, and they caught something that I knew existed but had not yet seen combined with gMSA: AKS private clusters.

This customer has a security policy in place that mandates Azure resources use private endpoints whenever possible. That was true for the AKS cluster, and it introduced a behavior that broke the cluster.

I mentioned above that gMSA uses DNS for DC discovery. Let me explain what the default configuration is and what happened after enabling gMSA:

By default, Linux and Windows nodes on AKS use the Azure vNet DNS server for DNS queries, while Windows and Linux pods use CoreDNS. Azure DNS can’t resolve AD domain FQDNs, since these are usually private to on-premises or private cloud networks.

For that reason, when you enable gMSA and pass the DNS server to be used, two things change in the AKS cluster. First, the Windows nodes start using the DNS server provided. Second, the CoreDNS configuration is changed to add a forwarder, which sends anything under the domain FQDN to the specified DNS server. With these two changes, Windows nodes and Windows pods can now “find” the DCs.

Azure Portal showing the CoreDNS configuration with a DNS forwarder after gMSA has been configured.
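To make the CoreDNS change concrete, the sketch below renders the kind of server block that forwards the AD domain to a custom DNS server and wraps it in a coredns-custom ConfigMap, which is where AKS reads CoreDNS customizations from. The domain contoso.com and the address 10.0.0.4 are hypothetical, and the exact block AKS writes when gMSA is enabled may differ from this approximation.

```python
import json


def render_coredns_forward_block(domain, dns_servers, cache_seconds=30):
    """Render a CoreDNS server block forwarding one zone to specific DNS servers.

    Approximation of the gMSA forwarder; AKS's actual block may differ.
    """
    if not dns_servers:
        raise ValueError("at least one DNS server is required")
    lines = [
        f"{domain}:53 {{",
        "    errors",
        f"    cache {cache_seconds}",
        # 'forward .' sends every query in this zone to the listed upstreams.
        f"    forward . {' '.join(dns_servers)}",
        "}",
    ]
    return "\n".join(lines)


def build_coredns_custom_configmap(domain, dns_servers):
    """Wrap the server block in a coredns-custom ConfigMap manifest (as a dict)."""
    return {
        "apiVersion": "v1",
        "kind": "ConfigMap",
        "metadata": {"name": "coredns-custom", "namespace": "kube-system"},
        # On AKS, keys ending in ".server" are loaded as extra CoreDNS server blocks.
        "data": {f"{domain}.server": render_coredns_forward_block(domain, dns_servers)},
    }


if __name__ == "__main__":
    manifest = build_coredns_custom_configmap("contoso.com", ["10.0.0.4"])
    print(json.dumps(manifest, indent=2))
```

The printed JSON could be piped to kubectl apply -f - if you ever need to reproduce the forwarder by hand; enabling gMSA already configures it, so review any existing coredns-custom content before applying anything.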

However, this introduces another issue when combined with a private AKS cluster. Private endpoints sit behind a private DNS zone. Azure DNS servers can resolve names in these zones, but non-Azure DNS servers can’t. Since the Windows nodes and Windows pods now use a DNS server outside of Azure, the private zone of the AKS cluster can’t be resolved, so the DCs can’t access the Windows nodes and Windows pods.

Not only that, but this customer also had their Azure Container Registry (ACR) behind a private endpoint. This configuration also caused the second symptom above: the Windows nodes could no longer resolve the private zone of the ACR registry and, consequently, couldn’t pull their private images.
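It helps to spell out which names a Windows node has to resolve to pull a private image. With a private endpoint, each public ACR endpoint is a CNAME into the privatelink.azurecr.io zone, which only Azure DNS can answer. A minimal sketch, assuming a single-region registry named myregistry in eastus (both hypothetical); geo-replicated registries have one data endpoint per replica region.

```python
def acr_private_link_names(registry, region):
    """Return {public name: privatelink CNAME target} for a single-region registry.

    Both targets live in the privatelink.azurecr.io zone, so the resolver used by
    the Windows node must be able to reach Azure DNS, directly or via a forwarder.
    """
    registry = registry.lower()
    region = region.lower()
    return {
        # Registry login/REST endpoint used by the container runtime.
        f"{registry}.azurecr.io": f"{registry}.privatelink.azurecr.io",
        # Regional data endpoint that serves the image layers.
        f"{registry}.{region}.data.azurecr.io":
            f"{registry}.{region}.data.privatelink.azurecr.io",
    }


if __name__ == "__main__":
    for public, private in acr_private_link_names("myregistry", "eastus").items():
        print(f"{public} -> CNAME {private}")
```

If either name fails to resolve from the node, the image pull fails, which matches the ErrImagePull events above.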

For reference, these are the container-related services and their private zones:

  • Azure Kubernetes Service – Kubernetes API (Microsoft.ContainerService/managedClusters)
    Subresource: management
    Private DNS zone names: privatelink.{regionName}.azmk8s.io, {subzone}.privatelink.{regionName}.azmk8s.io
    Public DNS zone forwarder: {regionName}.azmk8s.io
  • Azure Container Apps (Microsoft.App/ManagedEnvironments)
    Subresource: managedEnvironments
    Private DNS zone name: privatelink.{regionName}.azurecontainerapps.io
    Public DNS zone forwarder: azurecontainerapps.io
  • Azure Container Registry (Microsoft.ContainerRegistry/registries)
    Subresource: registry
    Private DNS zone names: privatelink.azurecr.io, {regionName}.data.privatelink.azurecr.io
    Public DNS zone forwarders: azurecr.io, {regionName}.data.azurecr.io

For a full list of zones, check out the Azure documentation.

The solution here is simple: for non-Azure DNS servers to resolve Private Endpoint zones, you can create a DNS forwarder.

This customer had a very specific implementation, but in general you need to configure a DNS forwarder for the zones related to the services you are using. For example:

  • AKS clusters: Create a forwarder for azmk8s.io to 168.63.129.16.
  • ACR registries: Create a forwarder for azurecr.io to 168.63.129.16.
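Conditional forwarders are matched by DNS suffix: the DNS server picks the most specific forwarder zone that contains the queried name and otherwise falls back to its default resolution. The sketch below models that selection to show why forwarders for azmk8s.io and azurecr.io cover the private endpoint names. It is a simplified model, not how any particular DNS server is implemented, and the "default upstream" label and cluster name are hypothetical. On a Windows DNS server, the forwarders themselves would typically be created with the Add-DnsServerConditionalForwarderZone cmdlet (for example, -Name azmk8s.io -MasterServers 168.63.129.16).

```python
# Azure platform virtual IP that provides DNS inside Azure vNets.
AZURE_DNS = "168.63.129.16"


def normalize(name):
    """Lowercase a DNS name and drop any trailing root dot."""
    return name.rstrip(".").lower()


def pick_forwarder(query_name, forwarders, default):
    """Return the upstream used for query_name.

    forwarders maps a zone (for example "azurecr.io") to an upstream address.
    The longest zone that contains the name wins; if none match, default is used.
    """
    name = normalize(query_name)
    best_zone, best_target = None, default
    for zone, target in forwarders.items():
        z = normalize(zone)
        # Match the zone itself or a name under it, on a label boundary,
        # so "notazurecr.io" does not match "azurecr.io".
        if name == z or name.endswith("." + z):
            if best_zone is None or len(z) > len(best_zone):
                best_zone, best_target = z, target
    return best_target


if __name__ == "__main__":
    forwarders = {"azmk8s.io": AZURE_DNS, "azurecr.io": AZURE_DNS}
    for q in [
        "myregistry.azurecr.io",
        "myregistry.eastus.data.privatelink.azurecr.io",
        "mycluster-abc123.privatelink.eastus.azmk8s.io",  # hypothetical API server name
        "www.example.com",
    ]:
        print(q, "->", pick_forwarder(q, forwarders, default="default upstream"))
```

Because matching is by suffix, one forwarder per public zone is enough; there is no need to list every regional or privatelink subzone separately.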

168.63.129.16 is the virtual IP address of the Azure platform that serves as the communication channel to platform resources, and one of its services is DNS. In fact, this is the original service the Windows nodes and Windows pods were using before gMSA was enabled. Because this address is only reachable from inside an Azure vNet, the DNS server holding these forwarders must itself run in Azure.

It’s always DNS!

If you are using gMSA on AKS, keep in mind that Windows nodes and Windows pods will start using a DNS server outside of Azure (or one with no direct visibility into the Azure platform, such as Private Endpoint zones). You might need to configure DNS forwarders once you start using gMSA on AKS, although the same can be true for any service that relies on a custom DNS server.

I hope this blog post helps you avoid this issue, or helps you troubleshoot it. Let us know in the comments!
