Merge pull request #4886 from sgibson91/docs/prom-resize
Add documentation on resizing prometheus' disk
sgibson91 authored Sep 25, 2024
2 parents 46d605b + f2a16d6 commit c86b695
Showing 2 changed files with 104 additions and 0 deletions.
1 change: 1 addition & 0 deletions docs/index.md
@@ -35,6 +35,7 @@ sre-guide/support/index.md
sre-guide/manage-k8s/index.md
sre-guide/node-scale-up/index.md
sre-guide/common-problems-solutions.md
sre-guide/prometheus-disk-resize.md
```

(hub-deployment-guide)=
103 changes: 103 additions & 0 deletions docs/sre-guide/prometheus-disk-resize.md
@@ -0,0 +1,103 @@
# Resizing Prometheus' disk

As Prometheus collects more and more metrics data, we may need to resize the disk it stores that data on.

On GCP clusters, the storage classes are set by default to permit auto-expansion.
Therefore, simply defining a new persistent volume size in the support chart values and redeploying should suffice.
However, this may not be the case on other cloud providers.
The steps below will walk you through resizing the disk.

## Resizing the disk

```bash
# Set the KUBE_EDITOR env var to point to a text editor you're comfortable with
export KUBE_EDITOR="/usr/bin/nano"

# Set the name of the cluster to work against
export CLUSTER_NAME=...

# Authenticate against the cluster
deployer use-cluster-credentials $CLUSTER_NAME
```
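
Before choosing a new size, it can be useful to check how full the current volume is. The snippet below is a sketch: the deployment name, container name, and `/data` mount point are assumptions based on the upstream Prometheus chart's defaults, so adjust them to match your deployment.

```bash
# Show disk usage of the Prometheus data volume.
# Deployment/container names and the mount point are assumed chart defaults.
kubectl -n support exec deploy/support-prometheus-server \
  -c prometheus-server -- df -h /data
```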

1. Set the desired size of the Prometheus server persistent volume in the relevant `support.values.yaml` file.

   ```yaml
   prometheus:
     server:
       persistentVolume:
         size: <desired-size>
   ```
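
   For comparison, you can print the size the claim currently requests. This assumes the claim is named `support-prometheus-server`, as used in the steps below.

   ```bash
   # Print the storage currently requested by the Prometheus server PVC
   kubectl -n support get pvc support-prometheus-server \
     -o jsonpath='{.spec.resources.requests.storage}'
   ```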
1. Check the reclaim policy on the persistent volume.

   ```bash
   # List all the PVs. They are not namespaced.
   kubectl get pv
   ```

1. Edit the persistent volume's reclaim policy to be `Retain` if it is not already.
   This will prevent us from losing the data Prometheus has already collected.
   A non-interactive alternative is sketched below.

   ```bash
   kubectl edit pv <pv-name>
   ```
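
   If you prefer not to open an editor, the same change can be made with a patch (a sketch; `<pv-name>` comes from the previous step):

   ```bash
   # Set the reclaim policy to Retain without an interactive editor
   kubectl patch pv <pv-name> \
     --patch '{"spec": {"persistentVolumeReclaimPolicy": "Retain"}}'
   ```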

1. Check the value of `ALLOWVOLUMEEXPANSION` for the default storage class, identified by `(default)` next to its name.

   ```bash
   kubectl get storageclass
   ```
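
   To print just the relevant column, a `custom-columns` query works too; `allowVolumeExpansion` is the underlying field that the `ALLOWVOLUMEEXPANSION` column reports.

   ```bash
   # Show each storage class alongside its allowVolumeExpansion setting
   kubectl get storageclass \
     -o custom-columns='NAME:.metadata.name,EXPANSION:.allowVolumeExpansion'
   ```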

1. Set `allowVolumeExpansion` to `true` on the default storage class if it is not already.
   This will allow its persistent volumes to be dynamically resized.

   ```bash
   kubectl patch storageclass <storage-class-name> --patch '{"allowVolumeExpansion": true}'
   ```

   ```{note}
   At this point, you could try to redeploy the support chart and see if it succeeds.
   If it doesn't, continue with the following steps.
   ```

1. Delete the persistent volume claim for the Prometheus server.
   Persistent volume claims cannot be patched in place, so we will need to recreate this one.

   ```bash
   # List all PVCs in the support namespace
   kubectl -n support get pvc
   # Delete the Prometheus server PVC
   kubectl -n support delete pvc support-prometheus-server
   ```
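
   The persistent volume should now report a `Released` status with a `Retain` reclaim policy, meaning the data survived the claim's deletion.

   ```bash
   # Confirm the PV was retained rather than deleted
   kubectl get pv <pv-name>
   ```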

1. In another terminal with the `CLUSTER_NAME` variable set, redeploy the support chart.
   It should fail, with the PVC stuck in a `Pending` state.

   ```bash
   deployer deploy-support $CLUSTER_NAME
   ```

1. Edit the persistent volume so that `spec.claimRef` carries the same UID and resource version as the newly created PVC.

   ```bash
   # Get the UID and resource version of the PVC
   kubectl -n support get pvc support-prometheus-server -o yaml
   # Edit the PV to reference these values under `spec.claimRef`
   kubectl edit pv <pv-name>
   ```
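
   If you only want those two values, a `jsonpath` query avoids scrolling through the full manifest (a sketch):

   ```bash
   # Print just the UID and resource version of the new PVC
   kubectl -n support get pvc support-prometheus-server \
     -o jsonpath='uid: {.metadata.uid}{"\n"}resourceVersion: {.metadata.resourceVersion}{"\n"}'
   ```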

1. Delete the Prometheus server pod and check that it comes back up.

   ```bash
   kubectl -n support delete pod support-prometheus-server-<hash>
   kubectl -n support get pods --watch
   ```
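
   If you'd rather not copy the pod's hash by hand, you can delete it via a label selector instead. The labels below are assumptions based on the upstream chart's defaults, so adjust them if your pods are labelled differently.

   ```bash
   # Delete the Prometheus server pod by label rather than by name
   # (label names are assumed chart defaults)
   kubectl -n support delete pod \
     -l app.kubernetes.io/name=prometheus,app.kubernetes.io/component=server
   ```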

1. Redeploy the support chart again; this time it should succeed.

   ```bash
   deployer deploy-support $CLUSTER_NAME
   ```
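
   Finally, it is worth verifying that the new size actually took effect on both the claim and the volume.

   ```bash
   # The CAPACITY column should now show the new size on both objects
   kubectl -n support get pvc support-prometheus-server
   kubectl get pv <pv-name>
   ```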
