29 Apr 2024 |
Tim Jones | talosctl mounts will show you that information | 10:01:52 |
Andrew Rynhard | Cool! I’m trying to get Talos running with something like kata. I think Talos as a pod using a secure runtime could be super cool. | 11:28:19 |
Jukka Väisänen | yea I was interrupted by real work so will get back to it later.. 🙂 | 11:31:35 |
Andrew | I cannot wait to hear of kata progress! | 11:48:08 |
Jukka Väisänen | lesson learned, how to screw up your 3-pod-cp cluster.. restart all 3 pods at the same time --> all etcd members change their IP addresses --> no quorum --> no apiserver 😄 | 12:46:26 |
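For context on why this kills the cluster: etcd only accepts writes while a strict majority (quorum) of its registered members is reachable, and members are identified by their peer URLs. If all three pods restart with new IPs, zero of the registered peer addresses resolve, so quorum can never re-form on its own. A minimal sketch of the arithmetic:

```shell
# etcd quorum is floor(n/2) + 1 of the registered members.
# A 3-member cluster tolerates losing 1 member; if all 3 come back
# under new IPs, 0 registered peers are reachable and the cluster
# cannot recover without manual intervention (snapshot restore or
# re-bootstrapping the member list).
quorum() { echo $(( $1 / 2 + 1 )); }
quorum 3   # prints 2
quorum 5   # prints 3
```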
Andrew | that's a good thing tho, you re-discovered the opening quote of chapter 12 SRE Book on your own and you will never forget this lesson https://sre.google/sre-book/effective-troubleshooting/ | 12:48:41 |
Jukka Väisänen | been there done that sooo many times! | 12:50:48 |
Jukka Väisänen | worst case ever was Solaris tcp stack slowing down because it kept a route cache which kept growing and the hash algorithm sucked so when this box was acting as a web proxy for a million users the cache would fill up with millions of entries.. and there was only one route out of the box 🤦 | 12:52:03 |
Jukka Väisänen | the only way to flush the cache was to artificially create a condition where the kernel thought it was running out of memory | 12:52:35 |
Jukka Väisänen | so we made a balloon driver | 12:52:40 |
Andrew | I would have really liked working on Solaris systems back in the day | 12:52:54 |
Jukka Väisänen | not sure if that bug has been fixed even today | 12:52:59 |
Jukka Väisänen | many awesome things about solaris and we still have customers running SPARC boxes.. but I digress | 12:54:38 |
Jukka Väisänen | I'm just thinking how do I prevent this kind of etcd mayhem in the future.. Andrew Rynhard any ideas? | 12:55:35 |
Jukka Väisänen | tl;dr - if all my virtual CP nodes (pods) restart, all of them get new IPs -> etcd is dead | 12:56:05 |
Andrew | a hacky solution I can think of is to use kyverno to implement a policy that at least 1 of these pods must be in a ready state at all times. | 13:05:15 |
Jukka Väisänen | I guess I could build an etcd discovery server.. or use DNS SRV discovery somehow | 13:07:54 |
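For reference: upstream etcd's DNS SRV discovery (enabled with `--discovery-srv <domain>`) resolves member addresses from SRV records instead of a static `--initial-cluster` list, so members can be found by name rather than by IP. The zone would need entries along these lines; the domain and host names here are hypothetical:

```
_etcd-server._tcp.etcd.example.com. 300 IN SRV 0 0 2380 cp-0.etcd.example.com.
_etcd-server._tcp.etcd.example.com. 300 IN SRV 0 0 2380 cp-1.etcd.example.com.
_etcd-server._tcp.etcd.example.com. 300 IN SRV 0 0 2380 cp-2.etcd.example.com.
_etcd-client._tcp.etcd.example.com. 300 IN SRV 0 0 2379 cp-0.etcd.example.com.
```

Note this only breaks the chicken-and-egg cycle if the per-member DNS names stay stable while the pod IPs churn, e.g. via a headless Service per member.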
Jukka Väisänen | 🐔 🥚 ♻️ | 13:24:58 |
Andrew | so how do you ensure the etcd discovery server is always on? you use another kyverno policy to enforce that 😛 | 13:26:22 |
Jukka Väisänen | I'm starting to think this isn't really doable with the current Talos etcd member discovery, after reading through a ton of source code.. so I will stick to single-node CP for now. | 14:31:43 |
| Norman Nunley joined the room. | 15:07:50 |
| oleg joined the room. | 15:15:03 |
Endre Karlson | The reason I was asking earlier about a disk usage command (talosctl mounts) is that I get a full-disk message when trying to start a pod on Talos, and then get the below
Warning FailedMount 12s kubelet MountVolume.SetUp failed for volume "kube-api-access-h4884" : write /var/lib/kubelet/pods/81e228a8-3cc4-4d12-ab4d-0690ea4848de/volumes/kubernetes.io~projected/kube-api-access-h4884/..2024_04_29_17_49_24.825184148/ca.crt: no space left on device
Even though
talosctl -n 172.16.53.152 mounts
NODE FILESYSTEM SIZE(GB) USED(GB) AVAILABLE(GB) PERCENT USED MOUNTED ON
172.16.53.152 devtmpfs 8.34 0.00 8.34 0.00% /dev
172.16.53.152 tmpfs 8.38 0.00 8.37 0.04% /run
172.16.53.152 tmpfs 8.38 0.00 8.38 0.00% /system
172.16.53.152 tmpfs 0.07 0.00 0.07 0.00% /tmp
172.16.53.152 overlay 0.01 0.01 0.00 100.00% /
172.16.53.152 tmpfs 8.38 0.00 8.38 0.00% /dev/shm
172.16.53.152 tmpfs 8.38 0.00 8.38 0.00% /etc/cri/conf.d/hosts
172.16.53.152 overlay 8.38 0.00 8.38 0.00% /usr/etc/udev
172.16.53.152 overlay 8.38 0.00 8.38 0.00% /usr/local/lib/containers/talos-vmtoolsd
172.16.53.152 /dev/sda5 0.10 0.01 0.09 6.27% /system/state
172.16.53.152 /dev/sda6 84.57 19.80 64.77 23.42% /var
172.16.53.152 overlay 84.57 19.80 64.77 23.42% /etc/cni
172.16.53.152 overlay 84.57 19.80 64.77 23.42% /etc/kubernetes
172.16.53.152 overlay 84.57 19.80 64.77 23.42% /usr/libexec/kubernetes
172.16.53.152 overlay 84.57 19.80 64.77 23.42% /opt
How can it then run out of space? 🤔 | 18:05:52 |