The customer reports that the eks-operator is constantly sending DeleteCluster calls to the AWS API for clusters that have already been deleted from Rancher. Restarting Rancher does not help; the calls repeat every 150 seconds.
Business impact:
This is not causing workload outages, but it is a persistent annoyance for the customer.
Troubleshooting steps:
We held several calls to trace where the requests were coming from. We found references to the clusters in the ekscc (EKSClusterConfig) objects, but in some of the newer logs the original clusters of concern no longer appeared.
Actual behavior:
The cluster in question was removed from both Rancher and EKS. However, Rancher continues to send requests to delete it.
Expected behavior:
When a cluster is removed from Rancher, it should be deleted once, and no further DeleteCluster requests should be sent to AWS.
Files, logs, traces:
(See JIRA)
Additional notes:
Note that the customer has unusual permission restrictions on the AWS side, applied "for security reasons" that they could not explain further. This accounts for the errors seen in the AWS logs.
See SURE-8366 for the rest of the logs & the impacted cluster list
I have asked the customer to provide the output of the following commands:
kubectl get clusters.management.cattle.io -A
kubectl get clusters.provisioning.cattle.io -A
kubectl get eksclusterconfigs.eks.cattle.io -A
kubectl get clusters.management.cattle.io -A -o yaml
kubectl get clusters.provisioning.cattle.io -A -o yaml
kubectl get eksclusterconfigs.eks.cattle.io -A -o yaml
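A leftover EKSClusterConfig object stuck in deletion (i.e. with a deletion timestamp set but a finalizer still present) is a plausible cause of this kind of repeating delete loop. As a supplement to the commands above, a minimal sketch of how to surface that state at a glance; the column layout is illustrative, and no specific finalizer name is assumed:

```shell
# Show each EKSClusterConfig's deletion timestamp and finalizers.
# An object with a non-empty DELETION column that never disappears
# suggests a finalizer is blocking cleanup and the operator keeps retrying.
kubectl get eksclusterconfigs.eks.cattle.io -A \
  -o custom-columns='NAMESPACE:.metadata.namespace,NAME:.metadata.name,DELETION:.metadata.deletionTimestamp,FINALIZERS:.metadata.finalizers'
```

If an orphaned object shows up here for a cluster that no longer exists in Rancher or EKS, its full YAML (from the commands above) should identify which controller owns the blocking finalizer.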