[Kubernetes] What Is HPA (Horizontal Pod Autoscaling)?

nayoungs 2022. 5. 26. 03:33

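HPA (Horizontal Pod Autoscaling) automatically updates a workload resource,

adjusting the replica count of a Deployment, ReplicaSet, or StatefulSet.

The Metrics Server collects each Pod's CPU usage and exposes it through the metrics API, where the HPA controller periodically reads it;

the actual addition or removal of Pods is then carried out by the ReplicaSet or Deployment whose replica count the HPA changes.

When usage of a resource such as CPU rises, the number of Pods is increased; when usage falls, it is reduced again.

To scale on CPU or memory usage, HPA therefore needs a source for those measurements, typically the Metrics Server.

Note that HPA does not apply to objects that cannot be scaled (e.g. DaemonSet, Service).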

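AutoScaling targets

Pod
- HPA : Horizontal Pod Autoscaling
- VPA : Vertical Pod Autoscaler --> not built in; installed as a separate add-on
  - scales a Pod vertically (adjusts its Request and Limit automatically)
Node
- ClusterAutoScaler --> requires a supported infrastructure provider (typically a cloud)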

Checking the resource

$ kubectl api-resources | grep hpa
horizontalpodautoscalers          hpa          autoscaling/v1                         true         HorizontalPodAutoscaler

Checking how the resource is defined

$ kubectl explain hpa.spec

scaleTargetRef

hpa.spec.scaleTargetRef

Describes the resource to be autoscaled; this is a required field.

apiVersion
kind(required)
name(required)

targetCPUUtilizationPercentage

hpa.spec.targetCPUUtilizationPercentage

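Defines the target average CPU utilization, as a percentage of the CPU each Pod requests; HPA adds Pods when the average rises above this value and removes them when it falls below.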

☁️ Note

In HPA version v2beta2, metrics plays a role similar to targetCPUUtilizationPercentage,

but it can target many other kinds of metrics, not only CPU.

$ kubectl explain hpa.spec.metrics --api-version autoscaling/v2beta2
...
FIELDS:
   containerResource    <Object>
     container resource refers to a resource metric (such as those specified in
     requests and limits) known to Kubernetes describing a single container in
     each pod of the current scale target (e.g. CPU or memory). Such metrics are
     built in to Kubernetes, and have special scaling options on top of those
     available to normal per-pod metrics using the "pods" source. This is an
     alpha feature and can be enabled by the HPAContainerMetrics feature flag.

   external     <Object>
     external refers to a global metric that is not associated with any
     Kubernetes object. It allows autoscaling based on information coming from
     components running outside of cluster (for example length of queue in cloud
     messaging service, or QPS from loadbalancer running outside of cluster).

   object       <Object>
     object refers to a metric describing a single kubernetes object (for
     example, hits-per-second on an Ingress object).

   pods <Object>
     pods refers to a metric describing each pod in the current scale target
     (for example, transactions-processed-per-second). The values will be
     averaged together before being compared to the target value.

   resource     <Object>
     resource refers to a resource metric (such as those specified in requests
     and limits) known to Kubernetes describing each pod in the current scale
     target (e.g. CPU or memory). Such metrics are built in to Kubernetes, and
     have special scaling options on top of those available to normal per-pod
     metrics using the "pods" source.

   type <string> -required-
     type is the type of metric source. It should be one of "ContainerResource",
     "External", "Object", "Pods" or "Resource", each mapping to a matching
     field in the object. Note: "ContainerResource" type is available on when
     the feature-gate HPAContainerMetrics is enabled

minReplicas

hpa.spec.minReplicas

The minimum number of Pods (defaults to 1 when omitted).

maxReplicas

hpa.spec.maxReplicas

The maximum number of Pods; this is a required field.

HPA algorithm

desiredReplicas = ceil[currentReplicas * ( currentMetricValue / desiredMetricValue )]
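
The formula above is easy to check by hand. The following Python sketch evaluates it and then clamps the result to the minReplicas/maxReplicas range. The function name desired_replicas and its parameters are illustrative only, not part of any Kubernetes API, and the sketch deliberately leaves out details of the real controller such as the tolerance around the target and the handling of Pods that are not ready.

```python
import math


def desired_replicas(current_replicas, current_metric, desired_metric,
                     min_replicas=1, max_replicas=10):
    """Evaluate ceil(currentReplicas * currentMetric / desiredMetric), clamped.

    Hypothetical helper for illustration; min/max mirror hpa.spec.minReplicas
    and hpa.spec.maxReplicas.
    """
    if desired_metric <= 0:
        raise ValueError("desired_metric must be positive")
    raw = math.ceil(current_replicas * current_metric / desired_metric)
    # The result never leaves the [min_replicas, max_replicas] range.
    return max(min_replicas, min(max_replicas, raw))


# 2 Pods averaging 100% CPU against a 50% target
print(desired_replicas(2, 100, 50))  # 4
# 4 Pods averaging 20% CPU against a 50% target
print(desired_replicas(4, 20, 50))   # 2
```

Because of the ceiling, any fractional result rounds up, so the sketch errs toward more Pods rather than fewer.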

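Stability of workload scaling

Scaling too frequently consumes a lot of resources and can end up being counterproductive.

This pattern of repeatedly crossing back and forth over the threshold is called thrashing or flapping.

Stabilization window

To avoid flapping, HPA does not act on a single measurement: it looks at the recommendations computed during the window and scales in only as far as the highest of them.

The default window is 300 seconds, and by default it applies to scale-in only.

Scale-out: 0 seconds (applied immediately)
Scale-in: 300 seconds

Example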

behavior:
  scaleDown:
    stabilizationWindowSeconds: 300
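
As a rough illustration of what a scale-down stabilization window does, the Python sketch below keeps a history of replica recommendations and returns the highest one that is still inside the window. The function stabilized_scale_down and the (timestamp, replicas) history format are assumptions made for this example; the real controller tracks this state internally.

```python
def stabilized_scale_down(recommendations, now, window_seconds=300):
    """Return the replica count to use after applying a scale-down window.

    recommendations: list of (timestamp_seconds, desired_replicas) pairs,
    one per evaluation of the autoscaling formula (hypothetical format).
    """
    recent = [replicas for ts, replicas in recommendations
              if now - ts <= window_seconds]
    if not recent:
        raise ValueError("no recommendations inside the window")
    # Scaling in is held back to the highest recent recommendation.
    return max(recent)


history = [(0, 5), (100, 3), (200, 2), (290, 2)]
# The recommendation of 5 made at t=0 is still inside the 300-second window.
print(stabilized_scale_down(history, now=290))  # 5
# At t=310 that entry has aged out, so the next highest value applies.
print(stabilized_scale_down(history, now=310))  # 3
```

A short dip in load therefore does not remove Pods right away; the lower value has to persist for the whole window.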

💻 Hands-on : AutoScaling with HPA

myweb-deploy.yaml

apiVersion: apps/v1
kind: Deployment
metadata:
  name: myweb-deploy
spec:
  replicas: 2
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
        - name: myweb
          image: ghcr.io/c1t1d0s7/go-myweb
          ports:
            - containerPort: 8080

myweb-hpa.yaml

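apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: myweb-hpa
spec:
  minReplicas: 1                       # never scale in below 1 Pod
  maxReplicas: 10                      # never scale out above 10 Pods
  targetCPUUtilizationPercentage: 50   # target average CPU utilization (% of requests)
  scaleTargetRef:                      # the workload to scale
    apiVersion: apps/v1
    kind: Deployment
    name: myweb-deploy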

$ kubectl create -f myweb-deploy.yaml -f myweb-hpa.yaml

Checking the hpa shows that TARGETS is <unknown>;

normally, a measured value appears in its place after a short while.

$ kubectl get deploy,rs,po,hpa   
NAME                                     READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/myweb-deploy             2/2     2            2           6s

NAME                                                DESIRED   CURRENT   READY   AGE
replicaset.apps/myweb-deploy-657f957c85             2         2         2       6s

NAME                                          READY   STATUS    RESTARTS      AGE
pod/myweb-deploy-657f957c85-cskms             1/1     Running   0             6s
pod/myweb-deploy-657f957c85-n6dqh             1/1     Running   0             6s

NAME                                            REFERENCE                 TARGETS         MINPODS   MAXPODS   REPLICAS   AGE
horizontalpodautoscaler.autoscaling/myweb-hpa   Deployment/myweb-deploy   <unknown>/50%   1         10        0          6s

In this case, however, <unknown> never goes away no matter how long we wait.

Why❔❔

Looking at Events with the kubectl describe command, we can see missing request for cpu.

Events:
  Type     Reason                        Age                From                       Message
  ----     ------                        ----               ----                       -------
  Warning  FailedGetResourceMetric       2s (x8 over 107s)  horizontal-pod-autoscaler  failed to get cpu utilization: missing request for cpu
  Warning  FailedComputeMetricsReplicas  2s (x8 over 107s)  horizontal-pod-autoscaler  invalid metrics (1 invalid out of 1), first error is: failed to get cpu utilization: missing request for cpu

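This is because the Deployment has no resource requests set.

CPU utilization is calculated as a percentage of the requested CPU, so the containers in the Pods managed by the HPA must have a CPU request set.⭐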
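
To see why the request is needed, consider how a utilization percentage could be computed. The Python sketch below is a simplification that treats each Pod as a single (usage, request) pair in millicores; the function name cpu_utilization_percent is hypothetical, and the error text is borrowed from the event message above purely for illustration.

```python
def cpu_utilization_percent(pods):
    """pods: list of (usage_millicores, request_millicores or None) per Pod."""
    total_usage = 0
    total_request = 0
    for usage, request in pods:
        if request is None:
            # Without a request there is no denominator to compare usage against.
            raise ValueError("missing request for cpu")
        total_usage += usage
        total_request += request
    # Integer percentage of total usage relative to total requested CPU.
    return total_usage * 100 // total_request


print(cpu_utilization_percent([(100, 200), (60, 200)]))  # 40

try:
    cpu_utilization_percent([(100, None)])
except ValueError as err:
    print(err)  # missing request for cpu
```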

After modifying the manifest to include requests as follows and applying it, <unknown> is replaced by a measured value.

myweb-deploy.yaml

apiVersion: apps/v1
kind: Deployment
metadata:
  name: myweb-deploy
spec:
  replicas: 2
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
        - name: myweb
          image: ghcr.io/c1t1d0s7/go-myweb:alpine
          ports:
            - containerPort: 8080
          resources:
            requests:
              cpu: 200m
            limits:
              cpu: 200m

$ kubectl apply -f myweb-deploy.yaml

$ kubectl get hpa
NAME        REFERENCE                 TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
myweb-hpa   Deployment/myweb-deploy   0%/50%    1         10        2          5m27s

Now, to see scaling in action, let's put artificial load on a Pod.

$ kubectl get po 
NAME                                      READY   STATUS    RESTARTS        AGE
myweb-deploy-6dcf6c95c6-dmw84             1/1     Running   0               3m49s

$ kubectl exec myweb-deploy-6dcf6c95c6-dmw84 -- sha256sum /dev/zero

After a moment, CPU utilization gradually rises,

$ kubectl get hpa
NAME        REFERENCE                 TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
myweb-hpa   Deployment/myweb-deploy   13%/50%   1         10        1          4m33s

and Replicas is scaled to 3, with new Pods created.

$ kubectl get hpa 
NAME        REFERENCE                 TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
myweb-hpa   Deployment/myweb-deploy   33%/50%   1         10        3          6m35s

$ kubectl top pods
NAME                                      CPU(cores)   MEMORY(bytes)   
myweb-deploy-6dcf6c95c6-dmw84             201m         1Mi             
myweb-deploy-6dcf6c95c6-kmm4n             0m           1Mi             
myweb-deploy-6dcf6c95c6-z68j8             0m           1Mi
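
One way to read the 33% in the TARGETS column: assuming utilization is the total CPU usage of the Pods divided by their total CPU requests, the kubectl top values above reproduce it. The Python sketch below does that arithmetic; parse_millicores is a hypothetical helper that handles only plain core values and the m suffix.

```python
def parse_millicores(value):
    """Convert a CPU quantity such as '201m' or '0.2' to millicores."""
    if value.endswith("m"):
        return int(value[:-1])
    return round(float(value) * 1000)


# CPU(cores) column from the kubectl top pods output
usages = [parse_millicores(v) for v in ("201m", "0m", "0m")]
# Per-Pod CPU request from myweb-deploy.yaml
request = parse_millicores("200m")

utilization = sum(usages) * 100 // (request * len(usages))
print(utilization)  # 33
```

A single busy Pod at about 100% of its request, averaged with two idle Pods, comes out at roughly a third, which is below the 50% target.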

☁️ Note

Written for version v2beta2, myweb-hpa.yaml looks like this.

myweb-hpa-v2beta2.yaml

apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: myweb-hpa
spec:
  minReplicas: 1
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization # percentage of requested CPU
          averageUtilization: 50
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: myweb-deploy

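myweb-hpa.yaml and myweb-hpa-v2beta2.yaml define HPAs that behave identically.

On current clusters the same spec is written with apiVersion: autoscaling/v2, since v2beta2 stopped being served in Kubernetes v1.26.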

License: Attribution-NonCommercial