Every time we restarted Atlantis, the tool we use to plan and apply Terraform changes, we’d be stuck for 30 minutes waiting for it to come back up. No plans, no applies, no infrastructure changes for any repository managed by Atlantis. With roughly 100 restarts a month for credential rotations and onboarding, that added up to roughly 50 hours of blocked engineering time every month, and paged the on-...
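Arc Codex had run into a common Kubernetes problem: a default setting that is harmless on small volumes becomes a bottleneck as a large persistent volume fills up with data. When a pod sets fsGroup, Kubernetes by default (fsGroupChangePolicy: Always) recursively changes the ownership and permissions of every file on the volume each time it is mounted, so mount time grows with the number of files. Those recursive permission changes on every mount were what made each restart so slow. By setting fsGroupChangePolicy to OnRootMismatch, which skips the recursive change when the volume's root directory already has the expected ownership and permissions, Arc Codex significantly reduced restart times for its infrastructure.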
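To make the difference between the two policies concrete, here is a small Python model that counts how many filesystem entries a mount touches under each one. It is a simplified sketch, not kubelet's implementation: the Volume class and mount function are hypothetical, and the model tracks only group ownership, ignoring the permission-bit checks that the real OnRootMismatch comparison also performs.

```python
from dataclasses import dataclass, field


# Hypothetical, simplified model: a volume is a root directory plus a flat
# list of files, and only each entry's group ID is tracked (the real kubelet
# also checks and fixes permission bits).
@dataclass
class Volume:
    root_gid: int
    file_gids: list[int] = field(default_factory=list)


def mount(volume: Volume, fs_group: int, policy: str = "Always") -> int:
    """Simulate the ownership change applied at mount time.

    Returns the number of filesystem entries (root plus files) touched.
    """
    if policy not in ("Always", "OnRootMismatch"):
        raise ValueError(f"unknown fsGroupChangePolicy: {policy}")

    # OnRootMismatch inspects only the root directory; if it already has the
    # expected group, the recursive walk is skipped entirely.
    if policy == "OnRootMismatch" and volume.root_gid == fs_group:
        return 0

    # Always (or a root mismatch) walks every entry on the volume, so the
    # cost grows linearly with the number of files.
    volume.root_gid = fs_group
    for i in range(len(volume.file_gids)):
        volume.file_gids[i] = fs_group
    return 1 + len(volume.file_gids)


if __name__ == "__main__":
    files = 100_000
    for policy in ("Always", "OnRootMismatch"):
        vol = Volume(root_gid=0, file_gids=[0] * files)
        # Three consecutive mounts of the same volume, like three restarts.
        touched = [mount(vol, fs_group=1000, policy=policy) for _ in range(3)]
        print(policy, touched)
    # Always [100001, 100001, 100001]
    # OnRootMismatch [100001, 0, 0]
```

Under Always, every restart pays for the full walk; under OnRootMismatch, only a mount whose root directory does not yet match does. In a real pod, the setting sits next to fsGroup in the pod-level security context (spec.securityContext.fsGroupChangePolicy).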
This incident serves as a reminder that understanding and auditing Kubernetes settings is cruci...
