Configure Cloud Provider

The configurations for Cloud Provider Azure.

This doc describes the cloud provider config file, which is passed to azure-cloud-controller-manager via the --cloud-config flag.

Here is a config file sample:

{
    "cloud":"AzurePublicCloud",
    "tenantId": "00000000-0000-0000-0000-000000000000",
    "aadClientId": "00000000-0000-0000-0000-000000000000",
    "aadClientSecret": "00000000-0000-0000-0000-000000000000",
    "subscriptionId": "00000000-0000-0000-0000-000000000000",
    "resourceGroup": "<name>",
    "location": "eastus",
    "subnetName": "<name>",
    "securityGroupName": "<name>",
    "securityGroupResourceGroup": "<name>",
    "vnetName": "<name>",
    "vnetResourceGroup": "<name>",
    "routeTableName": "<name>",
    "primaryAvailabilitySetName": "<name>",
    "routeTableResourceGroup": "<name>",
    "cloudProviderBackoff": false,
    "useManagedIdentityExtension": false,
    "useInstanceMetadata": true
}

Note: All values are of type string if not explicitly called out.

Auth configs

| Name | Description | Remark |
| --- | --- | --- |
| cloud | The cloud environment identifier | Valid values include the presets listed under Azure Stack Configuration. Default to AzurePublicCloud. |
| tenantID | The AAD Tenant ID for the Subscription that the cluster is deployed in | Required. |
| aadClientID | The ClientID for an AAD application with RBAC access to talk to Azure RM APIs | Used for service principal authn. |
| aadClientSecret | The ClientSecret for an AAD application with RBAC access to talk to Azure RM APIs | Used for service principal authn. |
| aadClientCertPath | The path of a client certificate for an AAD application with RBAC access to talk to Azure RM APIs | Used for client cert authn. |
| aadClientCertPassword | The password of the client certificate for an AAD application with RBAC access to talk to Azure RM APIs | Used for client cert authn. |
| useManagedIdentityExtension | Use managed service identity for the virtual machine to access Azure ARM APIs | Boolean type, default to false. |
| userAssignedIdentityID | The Client ID of the user assigned MSI which is assigned to the underlying VMs | Required for user-assigned managed identity. |
| subscriptionId | The ID of the Azure Subscription that the cluster is deployed in | Required. |
| identitySystem | The identity system for AzureStack. Supported values are: ADFS | Only used for AzureStack |
| networkResourceTenantID | The AAD Tenant ID for the Subscription that the network resources are deployed in | Optional. Supported since v1.18.0. Only used for hosting network resources in different AAD Tenant and Subscription than those for the cluster. |
| networkResourceSubscriptionID | The ID of the Azure Subscription that the network resources are deployed in | Optional. Supported since v1.18.0. Only used for hosting network resources in different AAD Tenant and Subscription than those for the cluster. |

Note: The cloud provider currently supports three authentication methods. Choose one of them:

  • Managed Identity:
    • For system-assigned managed identity: set useManagedIdentityExtension to true
    • For user-assigned managed identity: set useManagedIdentityExtension to true and also set userAssignedIdentityID
  • Service Principal: set aadClientID and aadClientSecret
  • Client Certificate: set aadClientCertPath and aadClientCertPassword

If more than one method is configured, the order of precedence is Managed Identity > Service Principal > Client Certificate.
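
The precedence above can be sketched as a small helper. This only illustrates the documented order and is not the cloud provider's actual code. The function name `pick_auth_method` is hypothetical, and keys are compared case-insensitively only because this document spells them both as `aadClientId` and `aadClientID`.

```python
def pick_auth_method(config: dict) -> str:
    """Return the auth method that wins under the documented precedence."""
    # Normalize key case so "aadClientId" and "aadClientID" are treated alike.
    cfg = {key.lower(): value for key, value in config.items()}

    # 1. Managed Identity has the highest precedence.
    if cfg.get("usemanagedidentityextension") is True:
        # A user-assigned identity additionally needs userAssignedIdentityID.
        if cfg.get("userassignedidentityid"):
            return "managed identity (user-assigned)"
        return "managed identity (system-assigned)"
    # 2. Service Principal comes next.
    if cfg.get("aadclientid") and cfg.get("aadclientsecret"):
        return "service principal"
    # 3. Client Certificate is considered last.
    if cfg.get("aadclientcertpath") and cfg.get("aadclientcertpassword"):
        return "client certificate"
    raise ValueError("no complete set of credentials found in config")


# Both managed identity and service principal fields are present here.
mixed = {
    "useManagedIdentityExtension": True,
    "aadClientId": "<client-id>",
    "aadClientSecret": "<client-secret>",
}
print(pick_auth_method(mixed))
```

With both sets of fields present, the helper reports managed identity, matching the order stated above.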

Cluster config

| Name | Description | Remark |
| --- | --- | --- |
| resourceGroup | The name of the resource group that the cluster is deployed in | |
| location | The location of the resource group that the cluster is deployed in | |
| vnetName | The name of the VNet that the cluster is deployed in | |
| vnetResourceGroup | The name of the resource group that the VNet is deployed in | |
| subnetName | The name of the subnet that the cluster is deployed in | |
| securityGroupName | The name of the security group attached to the cluster’s subnet | |
| securityGroupResourceGroup | The name of the resource group that the security group is deployed in | |
| routeTableName | The name of the route table attached to the subnet that the cluster is deployed in | Optional in 1.6 |
| primaryAvailabilitySetName* | The name of the availability set that should be used as the load balancer backend | Optional |
| vmType | The type of azure nodes. Candidate values are: vmss and standard | Optional, default to standard |
| primaryScaleSetName* | The name of the scale set that should be used as the load balancer backend | Optional |
| cloudProviderBackoff | Enable exponential backoff to manage resource request retries | Boolean value, default to false |
| cloudProviderBackoffRetries | Backoff retry limit | Integer value, valid if cloudProviderBackoff is true |
| cloudProviderBackoffExponent | Backoff exponent | Float value, valid if cloudProviderBackoff is true |
| cloudProviderBackoffDuration | Backoff duration | Integer value, valid if cloudProviderBackoff is true |
| cloudProviderBackoffJitter | Backoff jitter | Float value, valid if cloudProviderBackoff is true |
| cloudProviderBackoffMode | Backoff mode, supported values are “v2” and “default”. Note that “v2” has been deprecated since v1.18.0. | Default to “default” |
| cloudProviderRateLimit | Enable rate limiting | Boolean value, default to false |
| cloudProviderRateLimitQPS | Rate limit QPS (Read) | Float value, valid if cloudProviderRateLimit is true |
| cloudProviderRateLimitBucket | Rate limit Bucket Size | Integer value, valid if cloudProviderRateLimit is true |
| cloudProviderRateLimitQPSWrite | Rate limit QPS (Write) | Float value, valid if cloudProviderRateLimit is true |
| cloudProviderRateLimitBucketWrite | Rate limit Bucket Size | Integer value, valid if cloudProviderRateLimit is true |
| useInstanceMetadata | Use instance metadata service where possible | Boolean value, default to false |
| loadBalancerSku | Sku of Load Balancer and Public IP. Candidate values are: basic and standard. | Default to basic. |
| excludeMasterFromStandardLB | ExcludeMasterFromStandardLB excludes master nodes from standard load balancer. | Boolean value, default to true. |
| disableOutboundSNAT | Disable outbound SNAT for SLB | Default to false and available since v1.11.9, v1.12.7, v1.13.5 and v1.14.0 |
| maximumLoadBalancerRuleCount | Maximum allowed LoadBalancer Rule Count is the limit enforced by Azure Load balancer | Integer value, default to 148 |
| routeTableResourceGroup | The resource group name for routeTable | Default same as resourceGroup and available since v1.15.0 |
| loadBalancerName | Working together with loadBalancerResourceGroup to determine the LB name in a different resource group | Since v1.18.0, default is cluster name setting on kube-controller-manager |
| loadBalancerResourceGroup | The load balancer resource group name, which is different from node resource group | Since v1.18.0, default is same as resourceGroup |
| disableAvailabilitySetNodes | Disable supporting for AvailabilitySet virtual machines in vmss cluster. It should be only used when vmType is “vmss” and all the nodes (including master) are VMSS virtual machines | Since v1.18.0, default is false |
| availabilitySetNodesCacheTTLInSeconds | Cache TTL in seconds for availabilitySet Nodes | Since v1.18.0, default is 900 |
| vmssCacheTTLInSeconds | Cache TTL in seconds for VMSS | Since v1.18.0, default is 600 |
| vmssVirtualMachinesCacheTTLInSeconds | Cache TTL in seconds for VMSS virtual machines | Since v1.18.0, default is 600 |
| vmCacheTTLInSeconds | Cache TTL in seconds for virtual machines | Since v1.18.0, default is 60 |
| loadBalancerCacheTTLInSeconds | Cache TTL in seconds for load balancers | Since v1.18.0, default is 120 |
| nsgCacheTTLInSeconds | Cache TTL in seconds for network security group | Since v1.18.0, default is 120 |
| routeTableCacheTTLInSeconds | Cache TTL in seconds for route table | Since v1.18.0, default is 120 |
| disableAzureStackCloud | DisableAzureStackCloud disables AzureStackCloud support. It should be used when setting Cloud with “AZURESTACKCLOUD” to customize ARM endpoints while the cluster is not running on AzureStack. Default is false. | Optional. Supported since v1.20.0 in out-of-tree cloud provider Azure. |
| tags | tags that would be tagged onto the cloud provider managed resources, including lb, public IP, network security group and route table. | Optional. Supported since v1.20.0. |
| systemTags | tag keys that should not be deleted when being updated. | Optional. Supported since v1.21.0. |
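
The backoff options in the table interact, so a rough sketch of the resulting retry schedule may help. The table does not spell out the exact formula; the code below assumes a conventional exponential backoff in which the delay starts at cloudProviderBackoffDuration seconds, is multiplied by cloudProviderBackoffExponent after each retry, and may be stretched by up to cloudProviderBackoffJitter times its own length. The function names, the exponent 1.5 and the jitter 1.0 are illustrative assumptions.

```python
from typing import List, Tuple


def backoff_delays(duration: float, exponent: float, retries: int) -> List[float]:
    """Base delay in seconds before each retry, ignoring jitter (assumed formula)."""
    return [duration * exponent ** attempt for attempt in range(retries)]


def jitter_bounds(delay: float, jitter: float) -> Tuple[float, float]:
    """Smallest and largest delay once jitter is applied (assumed semantics)."""
    return (delay, delay * (1 + jitter))


# duration=5 and retries=6 mirror sample values in this document;
# exponent=1.5 and jitter=1.0 are made-up values for illustration.
schedule = backoff_delays(duration=5, exponent=1.5, retries=6)
for attempt, delay in enumerate(schedule, start=1):
    low, high = jitter_bounds(delay, jitter=1.0)
    print(f"retry {attempt}: wait between {low:.2f}s and {high:.2f}s")
```

Under these assumptions the base delays grow from 5 seconds to roughly 38 seconds over six retries, which gives a feel for how long a failing request can be retried before the provider gives up.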

primaryAvailabilitySetName

If this is set, the Azure cloud provider will only add nodes from that availability set to the load balancer backend pool. If this is not set and multiple agent pools (availability sets) are used, the cloud provider will try to add all nodes to a single backend pool, which is forbidden. In other words, if you use multiple agent pools (availability sets), you MUST set this field.

primaryScaleSetName

If this is set, the Azure cloud provider will only add nodes from that scale set to the load balancer backend pool. If this is not set and multiple agent pools (scale sets) are used, the cloud provider will try to add all nodes to a single backend pool, which is forbidden when using the Load Balancer Basic SKU. In other words, if you use multiple agent pools (scale sets) and loadBalancerSku is set to basic, you MUST set this field.
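
The two rules for primaryAvailabilitySetName and primaryScaleSetName can be expressed as a pre-flight check. This is a sketch, not part of the cloud provider: `check_primary_backend` and the `pool_names` argument are hypothetical, and the code assumes that vmType standard corresponds to availability set agent pools.

```python
from typing import List


def check_primary_backend(config: dict, pool_names: List[str]) -> List[str]:
    """Return a list of problems with the primary backend settings."""
    problems: List[str] = []
    # With a single agent pool there is nothing to disambiguate.
    if len(pool_names) <= 1:
        return problems

    vm_type = config.get("vmType", "standard")      # documented default
    sku = config.get("loadBalancerSku", "basic")    # documented default

    if vm_type == "standard" and not config.get("primaryAvailabilitySetName"):
        problems.append(
            "multiple availability sets: primaryAvailabilitySetName must be set")
    if vm_type == "vmss" and sku == "basic" and not config.get("primaryScaleSetName"):
        problems.append(
            "multiple scale sets with basic SKU: primaryScaleSetName must be set")
    return problems


print(check_primary_backend({"vmType": "vmss"}, ["pool-a", "pool-b"]))
```

A cluster with two scale sets and the default basic SKU is reported as misconfigured until primaryScaleSetName is set or the SKU is changed to standard.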

excludeMasterFromStandardLB

Master nodes are not added to the backends of Azure Load Balancer (ALB) if excludeMasterFromStandardLB is set.

Nodes labeled with node-role.kubernetes.io/master are also excluded from ALB by default. If you want to add the master nodes to ALB, set excludeMasterFromStandardLB to false and remove the node-role.kubernetes.io/master label if it has already been applied.
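
Both conditions have to hold before a master node can join the load balancer backend. The following sketch restates that rule in code; `master_can_join_lb` is a hypothetical helper written for illustration and does not reproduce the provider's own node selection logic.

```python
MASTER_LABEL = "node-role.kubernetes.io/master"


def master_can_join_lb(config: dict, node_labels: dict) -> bool:
    """True only if the config allows masters and the master label is absent."""
    # excludeMasterFromStandardLB defaults to true per the cluster config table.
    exclude = config.get("excludeMasterFromStandardLB", True)
    return (not exclude) and MASTER_LABEL not in node_labels


# Setting the flag alone is not enough while the label is still present.
print(master_can_join_lb({"excludeMasterFromStandardLB": False},
                         {MASTER_LABEL: ""}))
# With the flag off and the label removed, the master is eligible.
print(master_can_join_lb({"excludeMasterFromStandardLB": False}, {}))
```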

Setting Azure cloud provider from Kubernetes secrets

Since v1.21.0, the Azure cloud provider supports reading the cloud config from Kubernetes secrets. The secret is a serialized version of the azure.json file. When the secret changes, the cloud controller manager reconstructs itself without restarting the pod.

To enable this feature, set --enable-dynamic-reloading=true and configure the secret name, namespace and data key with --cloud-config-secret-name, --cloud-config-secret-namespace and --cloud-config-key. When initializing from a secret, --cloud-config is ignored.

Note that --enable-dynamic-reloading cannot be false if --cloud-config is empty. To build the cloud provider from the classic config file, explicitly specify --cloud-config and do not set --enable-dynamic-reloading=true. In this mode, the cloud controller manager is not updated when the config file changes; you need to restart the pod to trigger the re-initialization manually.
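
As a minimal sketch of what "a serialized version of azure.json" could look like as a Secret, the snippet below builds the manifest in Python. The helper name is hypothetical. The secret name azure-cloud-provider and namespace kube-system are taken from the RBAC sample in this document, while the data key cloud-config is an assumption; whatever values you choose must match the three --cloud-config-secret-* / --cloud-config-key flags.

```python
import base64
import json


def build_cloud_config_secret(cloud_config: dict,
                              name: str = "azure-cloud-provider",
                              namespace: str = "kube-system",
                              key: str = "cloud-config") -> dict:
    """Wrap the azure.json content in a Kubernetes Secret manifest."""
    serialized = json.dumps(cloud_config, indent=2).encode("utf-8")
    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "metadata": {"name": name, "namespace": namespace},
        "type": "Opaque",
        # Secret "data" values must be base64-encoded.
        "data": {key: base64.b64encode(serialized).decode("ascii")},
    }


azure_json = {"cloud": "AzurePublicCloud", "useInstanceMetadata": True}
secret = build_cloud_config_secret(azure_json)
# The JSON output can be saved to a file and applied like any other manifest.
print(json.dumps(secret, indent=2))
```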

Because the Azure cloud provider reads Kubernetes secrets, the following RBAC rules must also be configured:

---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  labels:
    kubernetes.io/cluster-service: "true"
  name: system:azure-cloud-provider-secret-getter
rules:
- apiGroups: [""]
  resources: ["secrets"]
  resourceNames: ["azure-cloud-provider"]
  verbs:
  - get
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  labels:
    kubernetes.io/cluster-service: "true"
  name: system:azure-cloud-provider-secret-getter
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:azure-cloud-provider-secret-getter
subjects:
- kind: ServiceAccount
  name: azure-cloud-provider
  namespace: kube-system

Per-client rate limiting

Since v1.18.0, the original global rate limiting has been switched to per-client rate limiting. A new set of rate limit options is introduced, one for each client:

  • RouteRateLimit
  • SubnetsRateLimit
  • InterfaceRateLimit
  • RouteTableRateLimit
  • LoadBalancerRateLimit
  • PublicIPAddressRateLimit
  • SecurityGroupRateLimit
  • VirtualMachineRateLimit
  • StorageAccountRateLimit
  • DiskRateLimit
  • SnapshotRateLimit
  • VirtualMachineScaleSetRateLimit
  • VirtualMachineSizeRateLimit

The original rate limiting options (“cloudProviderRateLimitBucket”, “cloudProviderRateLimitBucketWrite”, “cloudProviderRateLimitQPS”, “cloudProviderRateLimitQPSWrite”) are still supported, and they would be the default values if per-client rate limiting is not configured.

Here is an example of per-client config:

{
  // default rate limit (enabled).
  "cloudProviderRateLimit": true,
  "cloudProviderRateLimitBucket": 1,
  "cloudProviderRateLimitBucketWrite": 1,
  "cloudProviderRateLimitQPS": 1,
  "cloudProviderRateLimitQPSWrite": 1,
  "virtualMachineScaleSetRateLimit": {  // VMSS specific (enabled).
    "cloudProviderRateLimit": true,
    "cloudProviderRateLimitBucket": 2,
    "cloudProviderRateLimitBucketWrite": 2,
    "cloudProviderRateLimitQPS": 0,
    "cloudProviderRateLimitQPSWrite": 0
  },
  "loadBalancerRateLimit": {  // LB specific (disabled)
    "cloudProviderRateLimit": false
  },
  ... // other cloud provider configs
}
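
The fallback described above (global options act as defaults when a client has no section of its own) can be sketched as follows. `effective_rate_limit` is a hypothetical helper, not the provider's implementation, and it only models the whole-section fallback stated in this document; how individual missing fields inside a per-client section are handled is not covered here.

```python
GLOBAL_KEYS = (
    "cloudProviderRateLimit",
    "cloudProviderRateLimitQPS",
    "cloudProviderRateLimitBucket",
    "cloudProviderRateLimitQPSWrite",
    "cloudProviderRateLimitBucketWrite",
)


def effective_rate_limit(config: dict, client_key: str) -> dict:
    """Return the per-client section if present, else the global defaults."""
    per_client = config.get(client_key)
    if per_client is not None:
        return dict(per_client)
    return {key: config[key] for key in GLOBAL_KEYS if key in config}


config = {
    "cloudProviderRateLimit": True,
    "cloudProviderRateLimitQPS": 1,
    "cloudProviderRateLimitBucket": 1,
    "loadBalancerRateLimit": {"cloudProviderRateLimit": False},
}
print(effective_rate_limit(config, "loadBalancerRateLimit"))  # per-client section
print(effective_rate_limit(config, "routeRateLimit"))         # global defaults
```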

Run Kubelet without Azure identity

Since v1.15.0, kubelet can run without an Azure identity when it runs alongside kube-controller-manager.

Both kube-controller-manager and kubelet should configure --cloud-provider=azure --cloud-config=/etc/kubernetes/azure.json, but the contents of azure.json are different:

(1) For kube-controller-manager, refer to the sections above for setting azure.json.

(2) For kubelet, useInstanceMetadata is required to be true and Azure identities are not required. A sample azure.json for kubelet:

{
  "useInstanceMetadata": true,
  "vmType": "vmss"
}
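
A quick way to sanity-check a kubelet-side azure.json against the rule above is sketched below. `check_kubelet_config` is a hypothetical helper; the list of credential fields comes from the auth configs table, and the check treats leftover credentials as a warning rather than an error because the document only says they are not required.

```python
from typing import List

# Credential fields from the auth configs table, lower-cased for comparison.
CREDENTIAL_KEYS = {
    "aadclientid",
    "aadclientsecret",
    "aadclientcertpath",
    "aadclientcertpassword",
}


def check_kubelet_config(config: dict) -> List[str]:
    """Return findings for a kubelet azure.json that runs without identity."""
    cfg = {key.lower(): value for key, value in config.items()}
    findings: List[str] = []
    if cfg.get("useinstancemetadata") is not True:
        findings.append("error: useInstanceMetadata must be true")
    leftover = sorted(CREDENTIAL_KEYS.intersection(cfg))
    if leftover:
        findings.append("warning: credentials not needed: " + ", ".join(leftover))
    return findings


print(check_kubelet_config({"useInstanceMetadata": True, "vmType": "vmss"}))
```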

Azure Stack Configuration

Azure Stack has different API endpoints, depending on the Azure Stack deployment. These endpoints need to be provided to the Azure SDK. Currently this is done by adding an extra JSON file with the arguments, plus an environment variable pointing to this file.

There are several available presets, namely:

  • AzureChinaCloud
  • AzureGermanCloud
  • AzurePublicCloud
  • AzureUSGovernmentCloud

A preset is selected with the cloud field (cloud: <PRESET>) described above for azure.json.

When cloud is set to AzureStackCloud, the extra environment variable used by the Azure SDK to find the Azure Stack configuration file is:

  • AZURE_ENVIRONMENT_FILEPATH

The configuration parameters of this file:

{
  "name": "AzureStackCloud",
  "managementPortalURL": "...",
  "publishSettingsURL": "...",
  "serviceManagementEndpoint": "...",
  "resourceManagerEndpoint": "...",
  "activeDirectoryEndpoint": "...",
  "galleryEndpoint": "...",
  "keyVaultEndpoint": "...",
  "graphEndpoint": "...",
  "serviceBusEndpoint": "...",
  "batchManagementEndpoint": "...",
  "storageEndpointSuffix": "...",
  "sqlDatabaseDNSSuffix": "...",
  "trafficManagerDNSSuffix": "...",
  "keyVaultDNSSuffix": "...",
  "serviceBusEndpointSuffix": "...",
  "serviceManagementVMDNSSuffix": "...",
  "resourceManagerVMDNSSuffix": "...",
  "containerRegistryDNSSuffix": "...",
  "cosmosDBDNSSuffix": "...",
  "tokenAudience": "...",
  "resourceIdentifiers": {
    "graph": "...",
    "keyVault": "...",
    "datalake": "...",
    "batch": "...",
    "operationalInsights": "..."
  }
}

The full list of existing settings for the AzureChinaCloud, AzureGermanCloud, AzurePublicCloud and AzureUSGovernmentCloud is available in the source code at https://github.com/Azure/go-autorest/blob/master/autorest/azure/environments.go#L51.

Host Network Resources in different AAD Tenant and Subscription

Since v1.18.0, the Azure cloud provider supports hosting network resources (Virtual Network, Network Security Group, Route Table, Load Balancer and Public IP) in a different AAD Tenant and Subscription than those of the cluster. To enable this feature, set networkResourceTenantID and networkResourceSubscriptionID in the auth config. Note that their values need to be different from the values of tenantID and subscriptionID.

With this feature enabled, the network resources of the cluster are created in networkResourceSubscriptionID in networkResourceTenantID, and the remaining resources of the cluster stay in subscriptionID in tenantID. Properties which specify the resource groups of network resources are compatible with this feature. For example, the Virtual Network will be created in vnetResourceGroup in networkResourceSubscriptionID in networkResourceTenantID.

Among the authentication methods, only Service Principal supports this feature, and aadClientID and aadClientSecret are used to authenticate with both AAD Tenants and Subscriptions. Managed Identity and Client Certificate do not support this feature. Azure Stack does not support this feature either.
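
The constraints in this section can be collected into one validation pass. The helper below is a sketch for illustration only (its name is hypothetical and it is not part of the cloud provider); keys are compared case-insensitively because this document uses both `Id` and `ID` spellings.

```python
from typing import List


def check_network_resource_settings(config: dict) -> List[str]:
    """Return problems with the cross-tenant network resource settings."""
    cfg = {key.lower(): value for key, value in config.items()}
    net_tenant = cfg.get("networkresourcetenantid")
    net_sub = cfg.get("networkresourcesubscriptionid")
    problems: List[str] = []

    # Feature not in use: nothing to validate.
    if not net_tenant and not net_sub:
        return problems
    if not (net_tenant and net_sub):
        problems.append("set both networkResourceTenantID and networkResourceSubscriptionID")
    if net_tenant and net_tenant == cfg.get("tenantid"):
        problems.append("networkResourceTenantID must differ from tenantID")
    if net_sub and net_sub == cfg.get("subscriptionid"):
        problems.append("networkResourceSubscriptionID must differ from subscriptionID")
    # Only Service Principal authentication supports this feature.
    uses_msi = cfg.get("usemanagedidentityextension") is True
    has_sp = bool(cfg.get("aadclientid") and cfg.get("aadclientsecret"))
    if uses_msi or not has_sp:
        problems.append("only Service Principal authentication is supported")
    if str(cfg.get("cloud", "")).lower() == "azurestackcloud":
        problems.append("Azure Stack does not support this feature")
    return problems


print(check_network_resource_settings({
    "tenantID": "t1", "subscriptionId": "s1",
    "networkResourceTenantID": "t2", "networkResourceSubscriptionID": "s2",
    "aadClientID": "<client-id>", "aadClientSecret": "<client-secret>",
}))
```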

Current default rate-limiting values

The following are the default rate limiting values configured in AKS and AKS-Engine clusters prior to Kubernetes version v1.18.0.

    "cloudProviderBackoff": true,
    "cloudProviderBackoffRetries": 6,
    "cloudProviderBackoffDuration": 5,
    "cloudProviderRateLimit": true,
    "cloudProviderRateLimitQPS": 10,
    "cloudProviderRateLimitBucket": 100,
    "cloudProviderRateLimitQPSWrite": 10,
    "cloudProviderRateLimitBucketWrite": 100,

For v1.18.0+, refer to the per-client rate limiting config.
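
To get a feel for what a QPS of 10 with a bucket of 100 means, here is a minimal token bucket sketch. It assumes the common token bucket model (the bucket starts full, refills at QPS tokens per second, and each request consumes one token); the class below is written for illustration and is not the rate limiter the cloud provider uses.

```python
class TokenBucket:
    """Toy token bucket driven by explicit timestamps (seconds)."""

    def __init__(self, qps: float, bucket: int) -> None:
        self.qps = qps
        self.capacity = bucket
        self.tokens = float(bucket)  # assume the bucket starts full
        self.last = 0.0

    def allow(self, now: float) -> bool:
        # Refill according to elapsed time, capped at the bucket size.
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.qps)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


limiter = TokenBucket(qps=10, bucket=100)
burst = sum(limiter.allow(now=0.0) for _ in range(120))
later = sum(limiter.allow(now=1.0) for _ in range(20))
print(burst, later)
```

Under this model a burst of up to 100 requests passes immediately, after which throughput settles at about 10 requests per second.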