5 Apr 2022 |
| Dave Scott set a profile picture. | 18:00:29 |
7 Apr 2022 |
| _slack_kubeflow_U03AZF4FF2M joined the room. | 19:06:54 |
12 Apr 2022 |
| _slack_kubeflow_U03BVHCHD4G joined the room. | 13:14:39 |
18 Apr 2022 |
| @wybpip:matrix.org joined the room. | 08:53:20 |
| @wybpip:matrix.org left the room. | 08:53:21 |
| Dave Scott changed their profile picture. | 14:49:33 |
Vlad | Hey guys, I have a question. Have you ever had this issue with PyTorch and Kubeflow?
RuntimeError: unable to write to file /torch_253_338982586_0 : No space left on device (28) | 19:49:55 |
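This error usually means PyTorch ran out of shared memory. DataLoader worker processes pass batches through /dev/shm, and a container's /dev/shm is often only 64 MB by default. A common workaround, and the one Vlad ends up applying manually later in the thread, is to mount a memory-backed emptyDir at /dev/shm. The sketch below patches a plain pod-spec dict; the volume name `dshm` and the 2Gi size limit are hypothetical choices, not values from this thread.

```python
import copy
import json


def add_shm_volume(pod_spec: dict, size_limit: str = "2Gi", container_index: int = 0) -> dict:
    """Return a copy of pod_spec with a memory-backed emptyDir mounted at /dev/shm.

    The volume name "dshm" and the default size are illustrative, not required values.
    """
    spec = copy.deepcopy(pod_spec)  # leave the caller's dict untouched
    # emptyDir with medium "Memory" is tmpfs (RAM) and counts against the container's memory.
    spec.setdefault("volumes", []).append(
        {"name": "dshm", "emptyDir": {"medium": "Memory", "sizeLimit": size_limit}}
    )
    # Mount it over /dev/shm so PyTorch's shared-memory files land on the larger volume.
    container = spec["containers"][container_index]
    container.setdefault("volumeMounts", []).append({"name": "dshm", "mountPath": "/dev/shm"})
    return spec


if __name__ == "__main__":
    pod = {"containers": [{"name": "train", "image": "pytorch/pytorch:latest"}]}
    print(json.dumps(add_shm_volume(pod), indent=2))
```

Setting the DataLoader's `num_workers=0` also sidesteps shared memory, at the cost of loading data in the main process.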
19 Apr 2022 |
Vlad | Hi, has anyone worked with PyTorch and Kale? | 19:02:59 |
20 Apr 2022 |
Benjamin Tan | Is this in a VM? | 01:09:39 |
Vlad | I think this is coming from the pod. | 01:11:38 |
Benjamin Tan | What about K8s? Is this from Vagrant or a full-blown cloud K8s? | 01:38:29 |
| Joseph Olaide joined the room. | 08:25:34 |
21 Apr 2022 |
Vlad | It's a full EKS cluster. | 23:35:47 |
22 Apr 2022 |
Benjamin Tan | That's interesting. Any idea how big the generated files are locally? | 01:57:30 |
Vlad | Hmm, nope, but we created the shm volume manually (without Kale) and now we have the following issue: | 18:57:52 |
Vlad | RuntimeError: CUDA out of memory. Tried to allocate 162.00 MiB (GPU 0; 14.76 GiB total capacity; 13.53 GiB already allocated; 36.75 MiB free; 13.63 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF | 18:57:52 |
Vlad | I think this may be a limit of the GPU memory... | 20:04:52 |
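The numbers in the error support this guess: 13.53 GiB of the 14.76 GiB card is already allocated, and the allocator holds only about 0.1 GiB (roughly 102 MiB) reserved but unused, which is less than the 162 MiB requested. Reserved memory is therefore not much larger than allocated memory, so `max_split_size_mb` (which the message suggests for fragmentation) is unlikely to help much here, and a smaller batch is the usual fix. The sketch below parses the message and applies that comparison; the field names and the threshold are a hypothetical heuristic, not PyTorch's own logic.

```python
import re

# The error from the log above, shortened to the part that carries the numbers.
OOM_MSG = (
    "RuntimeError: CUDA out of memory. Tried to allocate 162.00 MiB "
    "(GPU 0; 14.76 GiB total capacity; 13.53 GiB already allocated; "
    "36.75 MiB free; 13.63 GiB reserved in total by PyTorch)"
)

# Regexes assume this exact wording; other PyTorch versions may phrase the message differently.
FIELDS = {
    "requested_mib": r"Tried to allocate ([\d.]+) MiB",
    "total_gib": r"([\d.]+) GiB total capacity",
    "allocated_gib": r"([\d.]+) GiB already allocated",
    "free_mib": r"([\d.]+) MiB free",
    "reserved_gib": r"([\d.]+) GiB reserved in total",
}


def parse_cuda_oom(msg: str) -> dict:
    """Pull the memory figures out of a CUDA out-of-memory message."""
    stats = {}
    for key, pattern in FIELDS.items():
        match = re.search(pattern, msg)
        if match is None:
            raise ValueError(f"field {key!r} not found in message")
        stats[key] = float(match.group(1))
    return stats


def diagnose(stats: dict) -> str:
    """Rough heuristic: compare the allocator's cached-but-unused memory with the request."""
    cached_free_mib = (stats["reserved_gib"] - stats["allocated_gib"]) * 1024
    if cached_free_mib > stats["requested_mib"]:
        # Enough memory is cached but no single block fits: fragmentation is plausible.
        return "fragmentation likely: try PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:<N>"
    return "GPU is nearly full: reduce the batch size or model size, or use a larger GPU"


if __name__ == "__main__":
    stats = parse_cuda_oom(OOM_MSG)
    print(stats)
    print(diagnose(stats))
```

If fragmentation does turn out to be the problem, the allocator option is set as an environment variable in the step's container, e.g. `PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128`, before the process makes its first CUDA allocation; 128 is an arbitrary example value.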
Vlad | Denise Mariel Cari Martinez | 20:06:51 |
26 Apr 2022 |
Vlad | Hey guys, how are you? I have a question: how big can the pod for each step in the pipeline be, and how should I define that? Thanks!! | 18:24:32 |
27 Apr 2022 |
Benjamin Tan | Size as in disk space? | 01:25:48 |
Benjamin Tan | I think if you do a kubectl describe pod -n namespace and eyeball the resources section, you should be able to tell | 01:26:54 |
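The values `kubectl describe pod` prints under Requests and Limits are Kubernetes quantities such as `500m` or `4Gi`, and the same notation is used when defining a step's resources; `ephemeral-storage` is the resource that covers local disk space. The sketch below converts those strings to plain numbers and shows an example `resources` block. The specific values are made up, and the parser is simplified (it ignores exponent forms like `1e3`).

```python
import re

# Binary (power-of-two) and decimal suffixes allowed in Kubernetes quantities.
BINARY = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40, "Pi": 2**50, "Ei": 2**60}
DECIMAL = {"": 1, "k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12, "P": 10**15, "E": 10**18}


def parse_quantity(q: str) -> float:
    """Convert a quantity such as "500m", "4Gi" or "10G" to a plain number.

    Simplified: exponent notation (e.g. "1e3") is not handled.
    """
    match = re.fullmatch(r"([0-9]+(?:\.[0-9]+)?)([A-Za-z]*)", q.strip())
    if match is None:
        raise ValueError(f"unsupported quantity: {q!r}")
    number, suffix = float(match.group(1)), match.group(2)
    if suffix == "m":  # milli, mostly seen on CPU ("500m" = half a core)
        return number / 1000
    if suffix in BINARY:
        return number * BINARY[suffix]
    if suffix in DECIMAL:
        return number * DECIMAL[suffix]
    raise ValueError(f"unknown suffix {suffix!r} in {q!r}")


# Example container `resources` block (values are illustrative only).
resources = {
    "requests": {"cpu": "500m", "memory": "4Gi", "ephemeral-storage": "10Gi"},
    "limits": {"memory": "8Gi", "nvidia.com/gpu": "1"},
}

if __name__ == "__main__":
    for kind, values in resources.items():
        for name, qty in values.items():
            print(f"{kind:<9} {name:<18} {qty:>5} -> {parse_quantity(qty):,}")
```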
Benjamin Tan | But thinking about it a bit more, maybe you can store it in MinIO | 01:28:08 |
Benjamin Tan | i.e. use Output to do it | 01:28:25 |
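The idea behind storing outputs in MinIO is that a step writes to a path the pipeline hands it instead of to a fixed local directory, so large files leave the pod's disk once the step finishes. In the KFP v1 SDK, annotating the parameter as `OutputPath()` and wrapping the function with `kfp.components.create_component_from_func` makes KFP choose that path and upload the file to the artifact store (MinIO in a default Kubeflow install). The sketch below shows only the plain-Python side of that pattern; `save_metrics` and the metric names are hypothetical.

```python
import json
import os
import tempfile


def save_metrics(metrics: dict, output_path: str) -> None:
    """Hypothetical step: write results to whatever path the pipeline provides.

    Under KFP v1 this parameter would be annotated `output_path: OutputPath()`
    and the function wrapped with kfp.components.create_component_from_func,
    so KFP picks the path and uploads the file as an artifact after the step.
    """
    parent = os.path.dirname(output_path)
    if parent:
        os.makedirs(parent, exist_ok=True)  # the parent directory may not exist yet
    with open(output_path, "w") as f:
        json.dump(metrics, f)


if __name__ == "__main__":
    # Outside KFP we supply the path ourselves, here inside a temporary directory.
    with tempfile.TemporaryDirectory() as tmp:
        out = os.path.join(tmp, "metrics", "metrics.json")
        save_metrics({"loss": 0.42}, out)
        with open(out) as f:
            print(json.load(f))
```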
28 Apr 2022 |
| Atra Akandeh joined the room. | 14:09:37 |
Vlad | Cool! Thanks, Benjamin, I'll try. | 20:02:17 |
29 Apr 2022 |
| Bogdan Kowalczyk joined the room. | 15:14:00 |
| Bogdan Kowalczyk changed their display name from _slack_kubeflow_U03E7U44FC0 to Bogdan Kowalczyk. | 15:14:50 |
| Bogdan Kowalczyk set a profile picture. | 15:14:58 |
1 May 2022 |
| @wybpip:matrix.org joined the room. | 18:31:30 |
| @wybpip:matrix.org left the room. | 18:31:31 |