Downsizing the GKE setup #29

Closed
opened 3 years ago by greg · 8 comments
greg commented 3 years ago
Owner

Last week I downsized the GKE setup from 4 to 2 nodes. I cannot find an option to change the machine type of an existing node pool (both nodes have 1 vCPU right now), so it looks like we would have to recreate the pool. GKE offers machine types with a half shared CPU; we need to check whether the performance is acceptable for Gitea.

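Recreating the pool could presumably look something like the following (a sketch only; the cluster name `sidamo` is inferred from the node names later in this thread, and the pool name, machine type, and flags would need checking):

```shell
# Sketch: create a smaller node pool alongside the existing one,
# drain the old nodes, then delete the old pool.
# "sidamo" is the cluster name as inferred from the node names in this thread.
gcloud container node-pools create small-pool \
  --cluster=sidamo --machine-type=g1-small --num-nodes=2

# Move workloads off the old pool (drain cordons each node first).
for node in $(kubectl get nodes -l cloud.google.com/gke-nodepool=default-pool -o name); do
  kubectl drain "$node" --ignore-daemonsets --delete-local-data
done

gcloud container node-pools delete default-pool --cluster=sidamo
```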
greg self-assigned this 3 years ago
greg added the
ops
label 3 years ago
Owner

Yes, of course it has to be a new node pool. A cluster can have many pools.

And I think there's no question about any type of CPU being enough for Gitea. It uses hardly any resources at all, because it's all compiled machine code. No need to spend any time testing that beforehand.

Poster
Owner

Here are the current pods running on our two nodes:

```
$ kubectl describe nodes
Name:               gke-sidamo-default-pool-289d39e8-31cj
Non-terminated Pods:         (10 in total)
  Namespace                  Name                                                         CPU Requests  CPU Limits  Memory Requests  Memory Limits   AGE
  ---------                  ----                                                         ------------  ----------  ---------------  -------------   ---
  kube-system                fluentd-gcp-scaler-86b957c9c8-cwtp9                          0 (0%)        0 (0%)      0 (0%)           0 (0%)          5h7m
  kube-system                fluentd-gcp-v3.1.1-c9kjr                                     100m (10%)    1 (106%)    200Mi (14%)      500Mi (36%)     5h7m
  kube-system                heapster-v1.6.1-67fb84b7b6-m2rwr                             63m (6%)      63m (6%)    216040Ki (15%)   216040Ki (15%)  5h16m
  kube-system                kube-dns-b46cc9485-bfbwc                                     260m (27%)    0 (0%)      110Mi (7%)       170Mi (12%)     5h16m
  kube-system                kube-dns-b46cc9485-jxf57                                     260m (27%)    0 (0%)      110Mi (7%)       170Mi (12%)     5h16m
  kube-system                kube-proxy-gke-sidamo-default-pool-289d39e8-31cj             100m (10%)    0 (0%)      0 (0%)           0 (0%)          5h7m
  kube-system                l7-default-backend-7ff48cffd7-6nkjp                          10m (1%)      10m (1%)    20Mi (1%)        20Mi (1%)       5h16m
  kube-system                metrics-server-v0.3.1-57c75779f-cw4t7                        48m (5%)      143m (15%)  105Mi (7%)       355Mi (25%)     5h16m
  kube-system                prometheus-to-sd-rncdt                                       1m (0%)       3m (0%)     20Mi (1%)        20Mi (1%)       5h7m
  kube-system                stackdriver-metadata-agent-cluster-level-6c7cc6b7bc-5vzgp    40m (4%)      0 (0%)      50Mi (3%)        0 (0%)          5h15m
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource                   Requests        Limits
  --------                   --------        ------
  cpu                        882m (93%)      1219m (129%)
  memory                     845800Ki (59%)  1480680Ki (104%)
  ephemeral-storage          0 (0%)          0 (0%)
  attachable-volumes-gce-pd  0               0


Name:               gke-sidamo-default-pool-289d39e8-mt4b
[snip]
Non-terminated Pods:         (10 in total)
  Namespace                  Name                                                CPU Requests  CPU Limits  Memory Requests  Memory Limits  AGE
  ---------                  ----                                                ------------  ----------  ---------------  -------------  ---
  default                    gitea-db-6b8869ff8-f4w2c                            250m (26%)    500m (53%)  150Mi (10%)      300Mi (21%)    5h7m
  default                    gitea-server-76c9945b4c-nwdv2                       250m (26%)    500m (53%)  256Mi (18%)      512Mi (36%)    5h16m
  kube-system                event-exporter-v0.2.5-7d99d74cf8-mvgt2              0 (0%)        0 (0%)      0 (0%)           0 (0%)         5h7m
  kube-system                fluentd-gcp-v3.1.1-z4lnv                            100m (10%)    1 (106%)    200Mi (14%)      500Mi (36%)    5h7m
  kube-system                kube-dns-autoscaler-bb58c6784-ts6r5                 20m (2%)      0 (0%)      10Mi (0%)        0 (0%)         5h7m
  kube-system                kube-proxy-gke-sidamo-default-pool-289d39e8-mt4b    100m (10%)    0 (0%)      0 (0%)           0 (0%)         5h7m
  kube-system                prometheus-to-sd-h5dg5                              1m (0%)       3m (0%)     20Mi (1%)        20Mi (1%)      5h7m
  kube-system                sealed-secrets-controller-6b9f699f5-nl85x           0 (0%)        0 (0%)      0 (0%)           0 (0%)         5h7m
  kube-system                tiller-deploy-6fd8d857bc-5mg6m                      0 (0%)        0 (0%)      0 (0%)           0 (0%)         5h7m
  velero                     velero-db6459bb-swqxk                               0 (0%)        0 (0%)      0 (0%)           0 (0%)         5h7m
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource                   Requests     Limits
  --------                   --------     ------
  cpu                        721m (76%)   2003m (213%)
  memory                     636Mi (45%)  1332Mi (96%)
  ephemeral-storage          0 (0%)       0 (0%)
  attachable-volumes-gce-pd  0            0
```

The big system pods are fluentd-gcp (2 pods, each requesting 200Mi RAM and 10% of a CPU), kube-dns (2 pods, each requesting 110Mi RAM and 27% of a CPU), and metrics-server (1 pod, requesting 105Mi RAM and 5% of a CPU).

If we switch to 3 small cluster nodes ([g1-small](https://cloud.google.com/compute/vm-instance-pricing#sharedcore)), we'd have 3 x 1.7GB RAM (5.1GB) and 3 shared vCPUs, as opposed to 2 x 2GB (4GB) and 2 x 1 vCPU right now.

We should be able to switch to 3 small nodes without changing the resource requests and limits, and then probably adjust the CPU requests afterwards. These 3 small nodes should cost around USD 43.38/month according to the [price calculator](https://cloud.google.com/products/calculator/).

Owner

So what you're saying is that you want to run 5.1GB of memory for Gitea, which only needs 400MB max?

I don't understand this reasoning. It sounds like you're saying the smallest node type doesn't work at all by default. So Google would be offering a node type that cannot work with GKE, which is both unreasonable and also doesn't match the blog posts around the net about using $5 nodes for GKE.

Owner

Btw, running two nodes, if that's what is happening right now, is actually the first thing that the official docs (as well as others) say you should never do, because the config orchestration needs one master and two slaves minimum. Hence, a minimum of 3 nodes is always recommended, even for the smallest of clusters.

Poster
Owner

I found a resource that lists the actual allocatable memory for different node types: https://cloud.google.com/kubernetes-engine/docs/concepts/cluster-architecture#memory_cpu

The system pods do not count towards that limit, but they still use actual RAM: around 400MB per node according to the docs, ~500MB right now for us.

3 f1-micros would be USD 12.88/month. A micro has 0.6GB RAM, 240MB of which is allocatable (1.8GB RAM total across 3 nodes, 720MB allocatable). That should work for Gitea and its database (plus the system pods).

On Monday, when I tried to migrate to 2 micros, we ran out of CPU, but that should not happen with 3 micros.
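As a rough sanity check (treating Mi and MB as equivalent for back-of-the-envelope purposes), Gitea's memory requests from the pod listing above fit into 3 micros' allocatable RAM with room to spare:

```shell
# Allocatable RAM across 3 f1-micros, per the GKE docs linked above (~240MB each)
allocatable=$((3 * 240))
# Memory requests from the pod listing: gitea-db 150Mi + gitea-server 256Mi
requests=$((150 + 256))
echo "allocatable=${allocatable}MB requests=${requests}MB"
# allocatable=720MB requests=406MB
```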

We could do the switch to 3 micros tomorrow, what do you think?

Owner

> We could do the switch to 3 micros tomorrow, what do you think?

I think my opinion doesn't matter anymore. You just explained why it makes sense to switch to micro instances.

Owner

Note from chat: the Gitea resource limits have to be adjusted before downsizing the cluster again.

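For reference, the limits could presumably be adjusted with `kubectl set resources` (a sketch; the deployment names match the pods listed earlier, but the numbers here are only illustrative placeholders):

```shell
# Illustrative values only; the actual numbers need to be decided before downsizing.
kubectl set resources deployment/gitea-server \
  --requests=cpu=100m,memory=256Mi --limits=cpu=250m,memory=512Mi
kubectl set resources deployment/gitea-db \
  --requests=cpu=100m,memory=150Mi --limits=cpu=250m,memory=300Mi
```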
Poster
Owner

It turns out we can get rid of one of the 3 small nodes after all. The master is managed by GKE (https://cloud.google.com/kubernetes-engine/docs/concepts/cluster-architecture#master). The minimum of 3 nodes applies when you run your own cluster, because etcd needs at least 3 nodes for a production cluster; that is what the Kubernetes docs and blog posts are talking about, not the case of a managed service like Amazon EKS or GKE. This is confusing because they are still called "nodes", but they don't have a control-plane role and only run pods; they do not provide the actual Kubernetes API (the GKE-managed master does).

Running only one node would mean that Gitea would go down completely if the node goes down, but two would be just fine for our usage for now.

Edit: The GKE docs call these nodes "worker machines" as opposed to the "cluster master"

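Dropping from 3 worker nodes to 2 should then just be a resize (a sketch; the node pool name is an assumption):

```shell
# Resize the (assumed) node pool down to 2 worker nodes.
gcloud container clusters resize sidamo \
  --node-pool=default-pool --num-nodes=2
```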
raucao added this to the Production readiness milestone 3 years ago
raucao closed this issue 2 years ago
Reference: kosmos/gitea.kosmos.org#29