19 May 2022 |
| _slack_kubeflow_U03FXUQEWFN joined the room. | 23:17:47 |
| jaklan joined the room. | 23:32:14 |
20 May 2022 |
| jaklan changed their display name from _slack_kubeflow_U01444C5K89 to jaklan. | 00:32:55 |
| jaklan set a profile picture. | 00:32:56 |
jaklan | Hi guys, we have the following issue:
• we would like to have a root pipeline (== global pipeline) which would, after running some common jobs, trigger child pipelines (== country-specific pipelines)
• from the root perspective - when one country pipeline fails, we should get a failure status for it, but not block the other country pipelines (so just parallel tasks with something like continueOn: failed: true in Argo Workflows)
• but now - there are a few business requirements:
◦ we should be able to open a detailed view of a selected child pipeline from the root pipeline (but it can simply be e.g. a link to another pipeline run, it doesn't have to be a dynamically expandable DAG-of-DAGs view)
◦ users have to be able to both re-run and retry the child pipeline, depending on which step failed (for now - just manually)
And now the question - how can we achieve something like that with KFP? KFP doesn't have any concept of a pipeline of pipelines, so we imagine we could create a root pipeline which just triggers child pipelines via API calls, but then it becomes very problematic to follow their status - we would need to e.g. poll the child pipelines' status every 30 seconds or so, and if a child pipeline fails and someone decides to re-run or retry it, we won't be able to get the new status, as the run was already marked as failed before (at least not without dirty workarounds like manually modifying the status in the database etc.).
A good example of what we want to achieve is... GitLab CI child pipelines. You can create a parent pipeline which just triggers child pipelines, and define whether the root pipeline should wait for the status of the child pipelines (and fail if they fail), or always pass no matter what happens in them. In the first scenario, if you decide to retry a failed job and it succeeds, the child pipeline just continues, and if it goes green, so does the rest of the root pipeline. And if you want to re-run the whole pipeline - that's also pretty easy. | 00:32:57 |
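The status-polling approach jaklan describes could be sketched roughly like this. It's a minimal sketch: `get_status` is a placeholder callable that in a real setup might wrap `kfp.Client().get_run(run_id).run.status`; the function name and parameters are hypothetical, not part of the KFP SDK.

```python
import time

# Terminal states a KFP (v1) run can end in, reported as strings.
TERMINAL = {"Succeeded", "Failed", "Error", "Skipped"}

def poll_children(run_ids, get_status, interval=30, max_polls=1000):
    """Poll child pipeline runs until all reach a terminal state.

    run_ids    - IDs of the child pipeline runs.
    get_status - callable mapping a run ID to its current status string;
                 in practice this could wrap kfp.Client().get_run().
    Returns a dict of run_id -> final status. A failed child does not
    stop polling of its siblings (parallel, continue-on-failure style).
    """
    statuses = {}
    pending = set(run_ids)
    for _ in range(max_polls):
        for run_id in list(pending):
            status = get_status(run_id)
            if status in TERMINAL:
                statuses[run_id] = status
                pending.discard(run_id)
        if not pending:
            break
        time.sleep(interval)
    return statuses
```

Note the caveat raised in the message above: once a run has been recorded as Failed, a later manual retry produces new state that a loop like this will never see again, since the parent already treated the run as terminal.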
Jonny Browning | Hi Jakub - what's the timeline for your project? I think KFP V2 (later this year) will be adding support for graph-based components, so you should then be able to achieve pipelines-of-pipelines | 10:21:01 |
jaklan | Oh that's interesting, can you provide any more details about it? Any article, presentation, docs link? | 10:21:46 |
Jonny Browning | I will have a look! As I'm sure you know, a lot of the documentation isn't fully up to date! | 10:22:30 |
jaklan | haha that's the never-ending problem with KFP 😄 | 10:22:52 |
Jonny Browning | some hints at ongoing work in the GH repo https://github.com/kubeflow/pipelines/pull/7551 | 10:24:39 |
jaklan | thanks! | 11:29:02 |
jaklan | and what would you recommend as a workaround for the coming months, so we don't end up creating dirty, unmaintainable workarounds? | 11:30:37 |
Jonny Browning | Tricky - KFP v2 will likely require a decent migration effort anyway as it's a major release | 11:31:45 |
Jonny Browning | some interesting discussion here also (slightly older but you might find it useful) - https://github.com/kubeflow/pipelines/issues/4555 | 11:33:23 |
jaklan | thanks again, will have a look | 11:34:33 |
Jonny Browning | You might be able to use this piece from the client library https://kubeflow-pipelines.readthedocs.io/en/latest/source/kfp.client.html#kfp.Client.wait_for_run_completion | 11:34:52 |
Jonny Browning | (within a pipeline component for your "parent" pipeline) | 11:35:03 |
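The idea suggested here - blocking on child runs from inside a "parent" pipeline component - could look something like the sketch below. It's hedged: `client` is assumed to behave like `kfp.Client`, whose `wait_for_run_completion(run_id, timeout)` returns the finished run detail (with the final state at `.run.status`); the wrapper function itself is hypothetical and injected with the client so it stays testable without a live KFP deployment.

```python
def wait_for_children(client, run_ids, timeout=3600):
    """Block until each child run completes, collecting final statuses.

    client  - any object exposing wait_for_run_completion(run_id, timeout),
              e.g. kfp.Client; injected here so the logic can be tested.
    Returns a dict of run_id -> final status string. A failed child does
    not raise, so the remaining siblings are still waited on.
    """
    results = {}
    for run_id in run_ids:
        # Since the children run concurrently on the cluster, waiting on
        # them one by one costs roughly as long as the slowest child.
        run_detail = client.wait_for_run_completion(run_id, timeout)
        results[run_id] = run_detail.run.status
    return results
```

The parent component could then fail (or not) based on the returned statuses, which mirrors the GitLab CI "wait and fail" vs. "always pass" choice - but, as discussed below, it still doesn't solve the retry/re-run visibility problem.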
Jonny Browning | Don't know what platform you are running on, but you could also use something like Google Cloud Composer (/Airflow) for orchestrating the whole thing - but I appreciate that's quite a heavyweight solution! | 11:37:07 |
jaklan | definitely useful, but it's still challenging to monitor retried pipelines, as a retry wouldn't affect the parent pipeline - it's just one-way communication | 11:37:22 |
jaklan | we have Kubeflow deployed on AWS and in general we try to leverage Argo for scheduling (but we're rather just learning it) | 11:38:22 |
jaklan | but even with Argo I think the issue is exactly the same - how to build that two-way communication | 11:39:01 |
| Krzysztof Romanowski joined the room. | 11:42:27 |
| Krzysztof Romanowski changed their display name from _slack_kubeflow_U03GAQYGV4K to Krzysztof Romanowski. | 11:45:14 |
| Krzysztof Romanowski set a profile picture. | 11:45:15 |
Krzysztof Romanowski | Hello, I work with jaklan and I'm the admin of our Kubeflow instance. This is very interesting Jonny Browning, will definitely take a look! 🙂 | 11:45:16 |
| fredkid joined the room. | 11:49:24 |
| willykin joined the room. | 18:24:04 |
| Vu Dat joined the room. | 21:45:41 |
Vu Dat | Do you have a solution? | 21:52:49 |
Vu Dat | Vinay Anantharaman | 21:52:58 |