Deploying Cluster Autoscaler on EKS

Posted on Dec 10, 2020

This post is day 10 of the Kubernetes Advent Calendar 2020. We introduced Cluster Autoscaler to EKS so that a new product could run there, and this post summarizes the research we did beforehand.

Background

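Cluster Autoscaler is a tool that autoscales the Kubernetes cluster itself: it automatically adds and removes Nodes according to demand. When Pods cannot start because resources are insufficient, it adds Nodes (scale-out). When an underutilized Node exists and the Pods running on it can be moved to other Nodes, it removes Nodes (scale-in). In other words, the Autoscaler is triggered at two moments: when resources run short and when they are in surplus.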

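A resource shortage means there are Pods in the Pending state. A resource surplus means a Node's requested resources are low enough that its Pods can be moved to other Nodes (the default threshold is 50%).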
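A minimal sketch of the "resource surplus" check, assuming a simplified rule: a Node becomes a scale-in candidate when the higher of its CPU and memory request ratios is below 50% of allocatable. The function name and the allocatable values (1930m CPU, 3300Mi memory) are hypothetical; the real Cluster Autoscaler additionally verifies that every Pod on the Node can be rescheduled elsewhere.

```python
# Default utilization threshold described above (50%).
SCALE_DOWN_UTILIZATION_THRESHOLD = 0.5


def is_scale_down_candidate(cpu_requested_m, cpu_allocatable_m,
                            mem_requested_mi, mem_allocatable_mi,
                            threshold=SCALE_DOWN_UTILIZATION_THRESHOLD):
    """Return True if the Node's request-based utilization is below the threshold.

    Utilization is computed from Requests, not from actual usage.
    Simplified assumption: the higher of the CPU and memory ratios is
    compared with the threshold.
    """
    cpu_util = cpu_requested_m / cpu_allocatable_m
    mem_util = mem_requested_mi / mem_allocatable_mi
    return max(cpu_util, mem_util) < threshold


# One 500m / 512Mi Pod on a hypothetical 1930m / 3300Mi Node: candidate.
print(is_scale_down_candidate(500, 1930, 512, 3300))    # True
# Three such Pods push CPU utilization above 50%: not a candidate.
print(is_scale_down_candidate(1500, 1930, 1536, 3300))  # False
```

Because only Requests enter the calculation, a Node whose Pods are busy but have small Requests can still be judged "surplus".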

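Scaling is not triggered when the load average of the cluster or of individual Nodes rises; it fires only once a Pod becomes Pending. So if Requests and Limits are not set appropriately, the cluster may scale out even though the actual load average is low, or fail to scale out even though the load average is high. Resource-based scheduling is basically driven by Requests.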

Avoid a large gap between Requests and Limits, and do not make Requests too large (otherwise the cluster may scale out even though the actual load average is low).
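The points above can be illustrated with a request-based fit check. This is a simplified sketch, not the kube-scheduler's implementation: the PodSpec class, the 2000m allocatable CPU and all Pod numbers are hypothetical, and only CPU is considered.

```python
from dataclasses import dataclass


@dataclass
class PodSpec:
    name: str
    cpu_request_m: int  # what the scheduler looks at
    cpu_usage_m: int    # actual usage; ignored by scheduling


def fits_on_node(new_pod, running_pods, allocatable_m):
    """Request-based fit check: actual usage plays no part."""
    requested = sum(p.cpu_request_m for p in running_pods)
    return requested + new_pod.cpu_request_m <= allocatable_m


new_pod = PodSpec("c", 500, 50)

# Oversized Requests: 1800m requested, but only 300m actually used.
oversized = [PodSpec("a", 900, 150), PodSpec("b", 900, 150)]
print(fits_on_node(new_pod, oversized, 2000))   # False -> Pending -> scale-out
print(sum(p.cpu_usage_m for p in oversized))    # 300: the Node is mostly idle

# Undersized Requests: 200m requested, but 1800m actually used.
undersized = [PodSpec("d", 100, 900), PodSpec("e", 100, 900)]
print(fits_on_node(new_pod, undersized, 2000))  # True -> no scale-out despite high load
```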

Architecture

Core Logic

Fetching Kubernetes resource information
Simulating Pod startup
Resource calculation
Triggering scale-in/scale-out
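A simplified sketch of the "Pod startup simulation" step: given Pending Pods and the shape of a new Node, estimate how many Nodes to add. This uses textbook first-fit decreasing bin packing as a stand-in; it is not Cluster Autoscaler's actual estimator code, and the function name, the (cpu, memory) tuple format and the 1930m/3300Mi Node shape are hypothetical.

```python
def estimate_new_nodes(pending, node_cpu_m, node_mem_mi):
    """Estimate Nodes needed for Pending Pods via first-fit decreasing.

    pending: list of (cpu_request_m, mem_request_mi) tuples.
    Pods that cannot fit even on an empty Node are skipped, since adding
    Nodes of this shape would not help them.
    """
    nodes = []  # each entry: [free_cpu_m, free_mem_mi]
    # Place the largest Pods first (by CPU, then memory).
    for cpu, mem in sorted(pending, reverse=True):
        if cpu > node_cpu_m or mem > node_mem_mi:
            continue  # would never fit on a new Node of this shape
        for free in nodes:
            if free[0] >= cpu and free[1] >= mem:
                free[0] -= cpu
                free[1] -= mem
                break
        else:
            # No simulated Node has room: open a new one.
            nodes.append([node_cpu_m - cpu, node_mem_mi - mem])
    return len(nodes)


# Seven 500m/512Mi Pods, three per hypothetical 1930m/3300Mi Node.
print(estimate_new_nodes([(500, 512)] * 7, 1930, 3300))  # 3
```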

Status ConfigMap

Node Pool information
Number of Nodes managed by the Autoscaler
Scale-in candidate information
Various timestamps

Cloud Provider Logic

Concrete representation of Node Pools
Management of infrastructure resources
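The split between Core Logic and Cloud Provider Logic can be illustrated with a small Python class loosely modeled on the NodeGroup interface in Cluster Autoscaler's cloudprovider package (written in Go, with methods such as MinSize, MaxSize, TargetSize and IncreaseSize). The class below and its in-memory behavior are hypothetical; on AWS the real implementation maps a NodeGroup to an ASG and changes its Desired Capacity.

```python
class NodeGroup:
    """Hypothetical in-memory stand-in for a cloud provider node group (e.g. an ASG)."""

    def __init__(self, name, min_size, max_size, target_size):
        if not min_size <= target_size <= max_size:
            raise ValueError("target size must be within [min_size, max_size]")
        self.name = name
        self.min_size = min_size
        self.max_size = max_size
        self.target_size = target_size  # corresponds to the ASG's Desired Capacity

    def increase_size(self, delta):
        """Scale out by delta Nodes, refusing to exceed max_size."""
        if delta <= 0:
            raise ValueError("delta must be positive")
        if self.target_size + delta > self.max_size:
            raise ValueError("size increase too large")
        self.target_size += delta

    def decrease_size(self, delta):
        """Scale in by delta Nodes, refusing to go below min_size."""
        if delta <= 0:
            raise ValueError("delta must be positive")
        if self.target_size - delta < self.min_size:
            raise ValueError("size decrease too large")
        self.target_size -= delta


# Same bounds as the ASG configured in the hands-on: min 3, desired 3, max 5.
ng = NodeGroup("example-asg", min_size=3, max_size=5, target_size=3)
ng.increase_size(2)
print(ng.target_size)  # 5
```

The Core Logic decides how many Nodes it wants; the cloud provider object is the only place that knows these Nodes are backed by an ASG.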

Hands-on

Building the EKS cluster

Quickly build an EKS cluster with eksctl.
- eksctl also updates .kube/config automatically, so you can access the EKS cluster right away.

eksctl create cluster \
    --vpc-cidr 10.0.0.0/16 \
    --name gotoken-sandbox-cluster-for-auto-scale \
    --nodes=3 \
    --version 1.17 \
    --nodegroup-name gotoken-sandbox-ng-for-auto-scale \
    --node-type t3.medium

Cluster Autoscaler

Check the Auto Scaling group settings

>  aws autoscaling \
    describe-auto-scaling-groups \
    --query "AutoScalingGroups[?Tags[?(Key=='alpha.eksctl.io/cluster-name') && Value=='gotoken-sandbox-cluster-for-auto-scale']].[AutoScalingGroupName, MinSize, MaxSize,DesiredCapacity]" \
    --output table
-------------------------------------------------------------------------------------------------------------------------------------
|                                                     DescribeAutoScalingGroups                                                     |
+--------------------------------------------------------------------------------------------------------------------+----+----+----+
|  eksctl-gotoken-sandbox-cluster-for-auto-scale-nodegroup-gotoken-sandbox-ng-for-auto-scale-NodeGroup-7X05BWEKJEYQ  |  3 |  3 |  3 |
+--------------------------------------------------------------------------------------------------------------------+----+----+----+

Change the Auto Scaling group settings

export ASG_NAME=$(aws autoscaling describe-auto-scaling-groups --query "AutoScalingGroups[? Tags[? (Key=='alpha.eksctl.io/cluster-name') && Value=='gotoken-sandbox-cluster-for-auto-scale']].AutoScalingGroupName" --output text)


aws autoscaling \
    update-auto-scaling-group \
    --auto-scaling-group-name ${ASG_NAME} \
    --min-size 3 \
    --desired-capacity 3 \
    --max-size 5

aws autoscaling \
    describe-auto-scaling-groups \
    --query "AutoScalingGroups[?Tags[?(Key=='alpha.eksctl.io/cluster-name') && Value=='gotoken-sandbox-cluster-for-auto-scale']].[AutoScalingGroupName, MinSize, MaxSize,DesiredCapacity]" \
    --output table

Enable IRSA (IAM Roles for Service Accounts)

eksctl utils associate-iam-oidc-provider \
    --cluster gotoken-sandbox-cluster-for-auto-scale \
    --approve

Create the IAM Policy

aws iam create-policy   \
  --policy-name k8s-asg-policy \
  --policy-document file://asg-policy.json
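The contents of asg-policy.json are not shown here. A minimal sketch that prints such a policy document, assuming the action list commonly given for Cluster Autoscaler on AWS (for example in the EKS Workshop); verify it against the Cluster Autoscaler AWS documentation for your version. The optional asg_arn parameter, which scopes the mutating actions to specific ASGs, is a hypothetical addition.

```python
import json

# Read-only actions; these are granted on "*".
DESCRIBE_ACTIONS = [
    "autoscaling:DescribeAutoScalingGroups",
    "autoscaling:DescribeAutoScalingInstances",
    "autoscaling:DescribeLaunchConfigurations",
    "autoscaling:DescribeTags",
    "ec2:DescribeLaunchTemplateVersions",
]
# Actions that change ASGs; these can be scoped to specific ASG ARNs.
MUTATING_ACTIONS = [
    "autoscaling:SetDesiredCapacity",
    "autoscaling:TerminateInstanceInAutoScalingGroup",
]


def build_policy(asg_arn="*"):
    """Return an IAM policy document (dict) for Cluster Autoscaler."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {"Effect": "Allow", "Action": DESCRIBE_ACTIONS, "Resource": "*"},
            {"Effect": "Allow", "Action": MUTATING_ACTIONS, "Resource": asg_arn},
        ],
    }


if __name__ == "__main__":
    print(json.dumps(build_policy(), indent=2))
```

Redirect the output into the file, e.g. python gen_policy.py > asg-policy.json (the script name is hypothetical). To scope the mutating actions, pass an ASG ARN pattern such as arn:aws:autoscaling:*:XXXXXXXXXXXX:autoScalingGroup:*:autoScalingGroupName/eksctl-gotoken-sandbox-cluster-for-auto-scale-* to build_policy.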

Create the ServiceAccount and IAM Role

eksctl create iamserviceaccount \
    --name cluster-autoscaler \
    --namespace kube-system \
    --cluster gotoken-sandbox-cluster-for-auto-scale \
    --attach-policy-arn "arn:aws:iam::XXXXXXXXXXXX:policy/k8s-asg-policy" \
    --approve \
    --override-existing-serviceaccounts

Confirm that the ServiceAccount exists and carries the IAM Role ARN as an annotation

> kubectl -n kube-system describe sa cluster-autoscaler
Name:                cluster-autoscaler
Namespace:           kube-system
Labels:              <none>
Annotations:         eks.amazonaws.com/role-arn: arn:aws:iam::XXXXXXXXXXXX:role/eksctl-gotoken-sandbox-cluster-for-auto-scal-Role1-95ZRL9U4QN24
Image pull secrets:  <none>
Mountable secrets:   cluster-autoscaler-token-m8xnk
Tokens:              cluster-autoscaler-token-m8xnk
Events:              <none>

Edit the manifest to configure Auto Discovery

curl -O https://www.eksworkshop.com/beginner/080_scaling/deploy_ca.files/cluster-autoscaler-autodiscover.yaml

#            - --node-group-auto-discovery=asg:tag=k8s.io/cluster-autoscaler/enabled,k8s.io/cluster-autoscaler/eksworkshop-eksctl
            - --node-group-auto-discovery=asg:tag=kubernetes.io/cluster/gotoken-sandbox-cluster-for-auto-scale
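Auto Discovery selects ASGs by their tags. A minimal sketch of that filtering, assuming that an ASG is discovered only when it carries every tag key listed after asg:tag= (and the value, when written as key=value). The dictionaries mimic the Tags list returned by describe-auto-scaling-groups; the ASG names and tag values in the sample data are hypothetical.

```python
def parse_discovery_spec(spec):
    """Parse 'asg:tag=key1,key2=value' into {key: value_or_None}."""
    prefix = "asg:tag="
    if not spec.startswith(prefix):
        raise ValueError(f"unsupported discovery spec: {spec}")
    wanted = {}
    for item in spec[len(prefix):].split(","):
        key, sep, value = item.partition("=")
        wanted[key] = value if sep else None
    return wanted


def discover(asgs, spec):
    """Return names of ASGs whose tags match every key (and value, if given)."""
    wanted = parse_discovery_spec(spec)
    names = []
    for asg in asgs:
        tags = {t["Key"]: t["Value"] for t in asg["Tags"]}
        if all(k in tags and (v is None or tags[k] == v) for k, v in wanted.items()):
            names.append(asg["AutoScalingGroupName"])
    return names


asgs = [
    {"AutoScalingGroupName": "ng-a",
     "Tags": [{"Key": "kubernetes.io/cluster/gotoken-sandbox-cluster-for-auto-scale",
               "Value": "owned"}]},
    {"AutoScalingGroupName": "ng-other",
     "Tags": [{"Key": "kubernetes.io/cluster/some-other-cluster", "Value": "owned"}]},
]
print(discover(asgs, "asg:tag=kubernetes.io/cluster/gotoken-sandbox-cluster-for-auto-scale"))
# ['ng-a']
```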

Deploy Cluster Autoscaler

kubectl apply -f cluster-autoscaler-autodiscover.yaml

Verifying Node autoscaling

Prepare an nginx Deployment

apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-to-scaleout
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        service: nginx
        app: nginx
    spec:
      containers:
      - image: nginx
        name: nginx-to-scaleout
        resources:
          limits:
            cpu: 500m
            memory: 512Mi
          requests:
            cpu: 500m
            memory: 512Mi

kubectl apply -f nginx.yaml

Increase the number of nginx Pods

kubectl scale --replicas=10 deployment/nginx-to-scaleout

> k get po -w
NAME                                 READY   STATUS    RESTARTS   AGE
nginx-to-scaleout-77cc8cc66f-8qfx6   0/1     Pending   0          45s
nginx-to-scaleout-77cc8cc66f-f997x   1/1     Running   0          45s
nginx-to-scaleout-77cc8cc66f-h6vxf   1/1     Running   0          45s
nginx-to-scaleout-77cc8cc66f-jx4tx   1/1     Running   0          45s
nginx-to-scaleout-77cc8cc66f-kdzj6   1/1     Running   0          116s
nginx-to-scaleout-77cc8cc66f-lvbml   1/1     Running   0          45s
nginx-to-scaleout-77cc8cc66f-tzrff   1/1     Running   0          45s
nginx-to-scaleout-77cc8cc66f-v9r75   1/1     Running   0          45s
nginx-to-scaleout-77cc8cc66f-wllsz   1/1     Running   0          45s
nginx-to-scaleout-77cc8cc66f-xnfk7   1/1     Running   0          45s

Scaling the nginx Deployment to 10 replicas produced a Pending Pod, and the Nodes scaled out.
Scaling the nginx Deployment back to 1 replica made the Nodes scale in.
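The observed result (9 Running, 1 Pending) can be sanity-checked with request arithmetic on CPU alone. The per-Node numbers are assumptions, not measurements: 1930m allocatable CPU for a t3.medium (2 vCPU minus a system reservation) and 200m already requested by system Pods on each Node.

```python
import math

NODE_ALLOCATABLE_CPU_M = 1930  # assumption: t3.medium allocatable CPU
SYSTEM_REQUESTS_CPU_M = 200    # assumption: CPU requested by system Pods per Node
POD_CPU_REQUEST_M = 500        # from the nginx Deployment above


def pods_per_node():
    """How many nginx Pods fit on one Node, by CPU Requests."""
    return (NODE_ALLOCATABLE_CPU_M - SYSTEM_REQUESTS_CPU_M) // POD_CPU_REQUEST_M


def pending_and_extra_nodes(replicas, nodes):
    """Return (Pending Pods, Nodes to add) for the given replica and Node counts."""
    capacity = pods_per_node() * nodes
    pending = max(0, replicas - capacity)
    extra_nodes = math.ceil(pending / pods_per_node())
    return pending, extra_nodes


print(pods_per_node())                 # 3
print(pending_and_extra_nodes(10, 3))  # (1, 1)
```

Under these assumptions, one extra Node (4 in total) is enough, which stays within the ASG's max size of 5.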

Deep dive

Auto Discovery

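Tags are used to specify which ASGs are treated as Node Groups.
Instead of tag-based Auto Discovery, you can also specify the target ASGs manually:
- --nodes=<min>:<max>:<asg-name>
  - <min> and <max> must fall within the minimum / maximum counts defined on the ASG
  - With manual configuration, all Nodes must have the same CPU/memory capacity (roughly, the same instance type)
- If the EKS cluster uses multiple ASGs, specify this flag (once per ASG)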
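A minimal sketch of the constraint on the manual --nodes flag: parse <min>:<max>:<asg-name> and reject ranges outside the ASG's own bounds. The helper name is hypothetical and it mirrors the rule stated above, not Cluster Autoscaler's actual validation code.

```python
def parse_nodes_flag(value, asg_min, asg_max):
    """Parse '<min>:<max>:<asg-name>' and check it fits the ASG's own bounds."""
    parts = value.split(":", 2)
    if len(parts) != 3:
        raise ValueError(f"expected <min>:<max>:<asg-name>, got {value!r}")
    min_size, max_size, name = int(parts[0]), int(parts[1]), parts[2]
    if min_size > max_size:
        raise ValueError("min must not exceed max")
    if min_size < asg_min or max_size > asg_max:
        raise ValueError(
            f"{min_size}:{max_size} is outside the ASG range {asg_min}:{asg_max}")
    return min_size, max_size, name


# With the ASG from the hands-on (min 3, max 5):
print(parse_nodes_flag("3:5:example-asg", asg_min=3, asg_max=5))  # (3, 5, 'example-asg')
```

With several ASGs, each --nodes value would be checked against its own ASG's bounds.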

scale-down-unneeded-time

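There is a delay between a Worker Node being judged surplus and scale-in starting.
By default, a Node must stay unneeded for 10 minutes before it is judged removable.
You can shorten this delay with the --scale-down-unneeded-time parameter.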
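A minimal sketch of how such a delay can be tracked: remember when each Node was first seen as unneeded, forget it as soon as it is needed again, and report it as removable only after the configured duration. The class and method names are hypothetical, and timestamps are plain seconds to keep the example self-contained.

```python
class UnneededTracker:
    """Track how long each Node has been continuously unneeded."""

    def __init__(self, unneeded_time_s=600):  # 10 minutes, the default above
        self.unneeded_time_s = unneeded_time_s
        self.unneeded_since = {}  # Node name -> first time seen unneeded

    def update(self, unneeded_nodes, now):
        """Record the current set of unneeded Nodes; return those removable now."""
        # Nodes that are needed again lose their accumulated time.
        for name in list(self.unneeded_since):
            if name not in unneeded_nodes:
                del self.unneeded_since[name]
        for name in unneeded_nodes:
            self.unneeded_since.setdefault(name, now)
        return sorted(n for n, since in self.unneeded_since.items()
                      if now - since >= self.unneeded_time_s)


tracker = UnneededTracker()
print(tracker.update({"node-a"}, now=0))    # []
print(tracker.update({"node-a"}, now=300))  # []
print(tracker.update({"node-a"}, now=600))  # ['node-a']
```

Setting --scale-down-unneeded-time=5m would correspond to unneeded_time_s=300 in this sketch.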

Cluster Autoscaler status management

The status of Node scaling by Cluster Autoscaler is stored in a ConfigMap.

> k describe cm cluster-autoscaler-status -n kube-system

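Overview

It is broadly divided into two sections:
- Cluster-wide
- NodeGroups
NodeGroups
- The internal representation (ID) of a NodePool
- On EKS, Name is the name of the ASG
If a difference between registered and cloudProviderTarget (≈ Desired) persists, Cluster Autoscaler reconciles it.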
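A minimal sketch that computes the registered vs cloudProviderTarget gap from a NodeGroup's Health line, assuming that line contains integer key=value pairs such as registered=3 and cloudProviderTarget=3. The sample line below is hypothetical and for illustration only; the exact ConfigMap text varies between Cluster Autoscaler versions, so compare it with the output of the describe command above before relying on it.

```python
import re

# Matches key=value pairs with integer values, e.g. "registered=3".
PAIR = re.compile(r"(\w+)=(\d+)")


def parse_counts(health_line):
    """Extract all integer key=value pairs from a Health line."""
    return {k: int(v) for k, v in PAIR.findall(health_line)}


def target_gap(health_line):
    """Return cloudProviderTarget - registered (positive: Nodes still joining)."""
    counts = parse_counts(health_line)
    return counts["cloudProviderTarget"] - counts["registered"]


# Hypothetical Health line shortly after a scale-out request.
line = ("Healthy (ready=3 unready=0 notStarted=0 longNotStarted=0 "
        "registered=3 longUnregistered=0 cloudProviderTarget=4 (minSize=3, maxSize=5))")
print(target_gap(line))  # 1
```

A gap that stays non-zero for a long time is the condition described above.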

Toward production-ready operation

Grant the IAM Role to the Cluster Autoscaler Pod via IRSA
- When doing so, define the required Actions by referring to the official documentation
- Specify the target Resources explicitly
Bind the ClusterRole/Role to the cluster-autoscaler ServiceAccount
- The sample grants no read/delete permissions on Secrets, so it looks safe to apply as is
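Cluster Autoscaler can run in a namespace other than kube-system if you specify the namespace both in the manifest and in Cluster Autoscaler's command-line arguments (--namespace)
- If you simply run the Cluster Autoscaler Deployment in another namespace without this argument, the status ConfigMap is not created and it does not work properly.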
Cluster Autoscaler scales out by rewriting the ASG's Desired Capacity through the AWS API
- While Cluster Autoscaler has scaled out, the Terraform code and the actual settings drift apart
- After introducing Cluster Autoscaler, set desired_capacity to null (desired_capacity of aws_autoscaling_group is Optional, so null is allowed)
Specify NodeGroups through tag-based Auto Discovery. This requires two things:
- Pass the tags in the --node-group-auto-discovery argument when starting Cluster Autoscaler
- Set the tags on the ASG
ScaleUp and ScaleDown events are available as Kubernetes Events, so it should be possible to send ScaleUp/ScaleDown notifications from Datadog
A mechanism to evict Pods during scale-in needs to be introduced
- Use ASG lifecycle events (lifecycle hooks) to run some process that performs the eviction (e.g. a Lambda function)
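A minimal sketch of the Lambda idea above, assuming the ASG has a termination lifecycle hook whose events reach the function through EventBridge with EC2InstanceId, LifecycleHookName, AutoScalingGroupName, LifecycleActionToken and LifecycleTransition in the event's detail. To stay self-contained, draining and completion are injected callables: drain_node is hypothetical (it would find the Node for the instance, cordon it and evict its Pods through the Kubernetes API), and complete_action would in practice be the boto3 Auto Scaling client's complete_lifecycle_action.

```python
def handle_termination(event, drain_node, complete_action):
    """Drain the Node backing a terminating instance, then let the ASG proceed.

    drain_node(instance_id): hypothetical; cordon the Node and evict its Pods.
    complete_action(**kwargs): e.g. boto3 autoscaling complete_lifecycle_action.
    """
    detail = event["detail"]
    if detail.get("LifecycleTransition") != "autoscaling:EC2_INSTANCE_TERMINATING":
        return "ignored"
    instance_id = detail["EC2InstanceId"]
    try:
        drain_node(instance_id)
    finally:
        # For termination hooks the instance is terminated either way;
        # completing the action just avoids waiting for the hook timeout.
        complete_action(
            LifecycleHookName=detail["LifecycleHookName"],
            AutoScalingGroupName=detail["AutoScalingGroupName"],
            LifecycleActionToken=detail["LifecycleActionToken"],
            LifecycleActionResult="CONTINUE",
            InstanceId=instance_id,
        )
    return "completed"


# Usage with fake callables and a hypothetical event.
calls = []
event = {"detail": {
    "LifecycleTransition": "autoscaling:EC2_INSTANCE_TERMINATING",
    "EC2InstanceId": "i-0123456789abcdef0",
    "LifecycleHookName": "example-hook",
    "AutoScalingGroupName": "example-asg",
    "LifecycleActionToken": "example-token",
}}
print(handle_termination(
    event,
    drain_node=lambda i: calls.append(("drain", i)),
    complete_action=lambda **kw: calls.append(("complete", kw["InstanceId"])),
))  # completed
print(calls)  # [('drain', 'i-0123456789abcdef0'), ('complete', 'i-0123456789abcdef0')]
```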