We spent quite a few days troubleshooting an intermittent problem in our dev clusters.
I ended up deploying a k8s cluster by hand on Ubuntu and watching the output of each command to finally figure out the issue we were experiencing.
When I installed Istio, I got this output:
    bin/istioctl install
    ❗ detected Calico CNI with 'bpfConnectTimeLoadBalancing=TCP'; this must be set to 'bpfConnectTimeLoadBalancing=Disabled' in the Calico configuration
    This will install the Istio 1.28.0 profile "default" into the cluster. Proceed? (y/N) y
    ✔ Istio core installed ⛵️
    ✔ Istiod installed 🧠
    ✔ Ingress gateways installed 🛬
    ✔ Installation complete
I read a little bit more on Calico and figured out where that configuration was and modified it with:
kubectl edit -n kube-system felixconfigurations.crd.projectcalico.org
Normally this would be in the calico-system namespace but on the clusters deployed with Kubespray it is in the kube-system namespace.
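For anyone who prefers not to edit the resource interactively, the same change can probably be made with a merge patch. This is only a sketch: it assumes the FelixConfiguration object is named "default" (the usual name, but check yours), and it reuses the kube-system namespace from the command above. The function names are made up for illustration.

```shell
# Resource type and namespace as used in the kubectl edit command above.
FELIX_RESOURCE="felixconfigurations.crd.projectcalico.org"
FELIX_NAMESPACE="kube-system"
# Assumption: the object is called "default"; verify with "kubectl get" first.
FELIX_NAME="default"
# Merge patch that sets the value istioctl asked for.
FELIX_PATCH='{"spec":{"bpfConnectTimeLoadBalancing":"Disabled"}}'

# Hypothetical helper: apply the patch to the Felix configuration.
set_connect_time_lb_disabled() {
  kubectl patch -n "$FELIX_NAMESPACE" "$FELIX_RESOURCE" "$FELIX_NAME" \
    --type merge -p "$FELIX_PATCH"
}

# Hypothetical helper: print the current value so the change can be checked.
show_connect_time_lb() {
  kubectl get -n "$FELIX_NAMESPACE" "$FELIX_RESOURCE" "$FELIX_NAME" \
    -o jsonpath='{.spec.bpfConnectTimeLoadBalancing}'
}
```

Calling set_connect_time_lb_disabled and then show_connect_time_lb should print the new value if the patch was accepted. Adjust FELIX_NAMESPACE and FELIX_NAME to whatever your own cluster uses.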
After we restarted the cluster, everything was stable and much faster.
I like automated deployments, but they hide a lot of the output, so we had no idea what to look for.
It is also difficult to keep on top of everything that changes in every new version of every component of our stack and to be aware of the impact of these changes. This requires a lot of time.
Versions involved:
- Kubernetes 1.31, 1.32 and 1.34
- Calico 3.31
- Istio 1.28
We still have an issue with Calico Whisker because we disabled IPv6 on the nodes, but it is a known bug with a workaround.