17 May 2022 |
Amit Jha | Do you have a namespace selected in the drop-down at the top of the page? | 18:53:14 |
Amit Jha | If it does not let you select one, or the selection does not stick, some of the pods may need to be restarted. | 18:54:06 |
Clark Updike | I think I know what happened. I was using the KFP Client in a Jupyter Notebook and it provided URLs directly to the Pipeline UI (outside of Kubeflow)... and those URLs don't apply any namespace. When I went back into experiments from the main Kubeflow UI (where the namespace is shown at the top), it started working... | 18:58:38 |
Clark Updike | Thanks for the pointer that made me realize what was happening! | 18:58:50 |
18 May 2022 |
Abhishek Sharma | Hi everyone,
I am facing an issue while trying to pass a dict as a parameter in a KFP component. Error: Structure "OrderedDict()" is incompatible with type "typing.Union[str, int, float, bool, NoneType]" - none of the types in Union are compatible.
According to the Kubeflow SDK v2 docs, we can pass a dict as a pipeline parameter [1], but another KFP document [2] says that parameters are passed into your component by value and can be of any of the following types: int, double, float, or str.
I am getting conflicting information on this. Can you please help me understand what is going on? Is this a versioning issue, or am I missing something?
I am loading the KFP component as follows: component = kfp.components.load_component_from_text(manifest_string)
[1] https://www.kubeflow.org/docs/components/pipelines/sdk-v2/v2-component-io/
[2] https://www.kubeflow.org/docs/components/pipelines/sdk-v2/build-pipeline/ | 08:06:30 |
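One workaround that sidesteps the type mismatch, regardless of which document applies to the installed SDK version, is to serialize the dict to a JSON string and declare the parameter as a plain string. The sketch below uses only the standard library; `encode_params` and `train` are hypothetical names for illustration and are not part of the KFP API, and the component body is assumed to decode the string itself.

```python
import json
from collections import OrderedDict


def encode_params(params: dict) -> str:
    """Serialize a dict so it can travel as a plain string parameter."""
    return json.dumps(params, sort_keys=True)


def train(config_json: str) -> float:
    """Hypothetical component body: decode the string back into a dict."""
    config = json.loads(config_json)
    return config["learning_rate"] * config["epochs"]


# The pipeline side passes a str, which fits the Union in the error message.
payload = encode_params(OrderedDict(learning_rate=0.5, epochs=4))
result = train(payload)
```

The trade-off is that the component loses type checking on the individual keys, so the decoding side should validate what it reads.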
Cornelis Boon | Been using KF pipelines (the one deployed by GCP AI platform) for a while. Something I’ve not really looked into is automated testing of the pipelines. Has anyone set this up for themselves? Are these the way to go? Or is someone using an alternative toolset for e2e testing of pipelines?
https://github.com/kubeflow/pipelines/wiki/Tests
https://github.com/kubeflow/testing | 08:06:42 |
droctothorpe | Weird question for folks who use the KFP CLI: why? As in, why not just use the SDK? Does the CLI provide any particular features above and beyond what the SDK provides that motivate you to use it? Thanks! | 13:24:03 |
19 May 2022 |
Ian Miller | Hi all, question for the community. Are most organizations backing kfp with Argo Workflows or Tekton? Is it possible to back it with both? We currently back with Tekton, but have users wanting to leverage kfp v2 features which (to my knowledge) are not yet supported by kfp-tekton. This results in users frequently finding tutorials/examples which don't work on our clusters. | 01:45:41 |
Ferdinand von den Eichen [Kineo.ai] | Who here is using spot instances for training as part of KF pipelines? I love the financial opportunities, but long-running training jobs seem to be uniquely poorly suited to spot.
1. Has anyone figured out how to handle eviction properly during training? i.e. to pause and pick up on a new pod?
2. Failing that, would setting a .set_retry(X) on training steps be good enough? The intuition being that if we have to get evicted, we can just retry our training on the new node… | 14:55:09 |
Amira Menfis | Hi, could you let me know how to run a Kubeflow pipeline outside of Kubeflow? | 16:55:24 |
Amit Jha | kfp.Client.run_pipeline - https://www.kubeflow.org/docs/components/pipelines/sdk/sdk-overview/ | 18:49:29 |
droctothorpe | Long-running components are a bad use case for spot instances, IMO, but set_retry should theoretically help. | 20:14:59 |
Rahul Mehta | In the case of model training, most libraries support some notion of checkpointing. If you include the checkpoint dir (in s3/other cloud storage) as an argument to your component & appropriately set the retry policy, then you should be able to resume from the checkpoint when the pod is scheduled | 20:27:02 |
Rahul Mehta | Re (1), the kubernetes scheduler should handle that for you; when a node is removed from the cluster, k8s will taint that node with NoSchedule -- when the pod retries after being evicted, it will only be able to schedule on nodes without that taint | 20:27:49 |
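The checkpoint-plus-retry idea from the messages above can be sketched with the standard library alone. This is a simulation under stated assumptions, not a KFP implementation: the `Evicted` exception stands in for a spot node being reclaimed, `run_with_retries` stands in for the retry policy set on the step, and a local temporary directory stands in for the S3 or other cloud storage path that would be passed to the component. All names are hypothetical.

```python
import json
import os
import tempfile
from typing import Optional, Tuple


class Evicted(RuntimeError):
    """Stands in for the pod being killed when a spot node is reclaimed."""


def train(checkpoint_dir: str, total_steps: int, evict_at: Optional[int] = None) -> int:
    """Run up to total_steps, resuming from a checkpoint if one exists.

    Returns the step this attempt resumed from.
    """
    path = os.path.join(checkpoint_dir, "checkpoint.json")
    step = 0
    if os.path.exists(path):
        with open(path) as f:
            step = json.load(f)["step"]
    resumed_from = step
    while step < total_steps:
        if evict_at is not None and step == evict_at:
            raise Evicted(f"evicted at step {step}")
        step += 1  # one unit of training work
        # Write to a temp file and rename so a kill never leaves a partial file.
        tmp = path + ".tmp"
        with open(tmp, "w") as f:
            json.dump({"step": step}, f)
        os.replace(tmp, path)
    return resumed_from


def run_with_retries(checkpoint_dir: str, total_steps: int, max_retries: int,
                     evict_at: Optional[int] = None) -> Tuple[int, int]:
    """Retry train() after eviction; returns (attempts used, step resumed from)."""
    for attempt in range(1, max_retries + 2):
        try:
            # Only the first attempt is interrupted in this simulation.
            resumed = train(checkpoint_dir, total_steps,
                            evict_at if attempt == 1 else None)
            return attempt, resumed
        except Evicted:
            continue
    raise RuntimeError("retries exhausted")


with tempfile.TemporaryDirectory() as ckpt_dir:
    attempts, resumed_from = run_with_retries(
        ckpt_dir, total_steps=10, max_retries=2, evict_at=3)
```

The retried attempt picks up at the last saved step instead of step 0, so the cost of an eviction is bounded by the checkpoint interval rather than by the length of the whole run.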
David Aronchick | Hi, can you say what you're looking for? Not sure I understand | 21:52:04 |