5 May 2022 |
| _slack_kubeflow_U03E8N9U56E joined the room. | 16:07:19 |
Sam Gaunt | I am trying to run through the setup process here and keep getting stuck waiting for CloudFormation stacks to finish. The script times out waiting for the IAM service account stack. In the AWS Console I can see the stack finishes within ~30 seconds, but the script just keeps printing the same waiting message. | 23:33:56 |
Sam Gaunt | =================================================================
Cluster Secrets Setup
=================================================================
Creating secrets IAM service account...
2022-05-06 09:27:19 [ℹ] eksctl version 0.95.0
2022-05-06 09:27:19 [ℹ] using region ap-southeast-2
2022-05-06 09:27:21 [ℹ] 1 iamserviceaccount (kubeflow/kubeflow-secrets-manager-sa) was included (based on the include/exclude rules)
2022-05-06 09:27:21 [!] metadata of serviceaccounts that exist in Kubernetes will be updated, as --override-existing-serviceaccounts was set
2022-05-06 09:27:21 [ℹ] 1 task: {
2 sequential sub-tasks: {
create IAM role for serviceaccount "kubeflow/kubeflow-secrets-manager-sa",
create serviceaccount "kubeflow/kubeflow-secrets-manager-sa",
} }
2022-05-06 09:27:21 [ℹ] building iamserviceaccount stack "eksctl-kubeflow-spike-addon-iamserviceaccount-kubeflow-kubeflow-secrets-manager-sa"
2022-05-06 09:27:21 [ℹ] deploying stack "eksctl-kubeflow-spike-addon-iamserviceaccount-kubeflow-kubeflow-secrets-manager-sa"
2022-05-06 09:27:21 [ℹ] waiting for CloudFormation stack "eksctl-kubeflow-spike-addon-iamserviceaccount-kubeflow-kubeflow-secrets-manager-sa"
2022-05-06 09:27:51 [ℹ] waiting for CloudFormation stack "eksctl-kubeflow-spike-addon-iamserviceaccount-kubeflow-kubeflow-secrets-manager-sa"
2022-05-06 09:28:35 [ℹ] waiting for CloudFormation stack "eksctl-kubeflow-spike-addon-iamserviceaccount-kubeflow-kubeflow-secrets-manager-sa"
2022-05-06 09:30:12 [ℹ] waiting for CloudFormation stack "eksctl-kubeflow-spike-addon-iamserviceaccount-kubeflow-kubeflow-secrets-manager-sa"
2022-05-06 09:31:47 [ℹ] waiting for CloudFormation stack "eksctl-kubeflow-spike-addon-iamserviceaccount-kubeflow-kubeflow-secrets-manager-sa"
2022-05-06 09:33:36 [ℹ] waiting for CloudFormation stack "eksctl-kubeflow-spike-addon-iamserviceaccount-kubeflow-kubeflow-secrets-manager-sa"
2022-05-06 09:34:20 [ℹ] waiting for CloudFormation stack "eksctl-kubeflow-spike-addon-iamserviceaccount-kubeflow-kubeflow-secrets-manager-sa" | 23:34:29 |
6 May 2022 |
Ryan McCaffrey | There's another permission you're missing. I forget which one, but I think you can find it by logging into CloudFormation in the console and checking the failure log. I'm pretty sure I ran into the same issue. | 00:01:26 |
Sam Gaunt | There is no failure in CloudFormation though. It shows all events and resources completed with no errors. | 00:02:52 |
Ryan McCaffrey | For the CloudFormation parts I had to add these permissions:
"cloudformation:ListStacks",
"cloudformation:CreateStack",
"iam:CreateRole",
"cloudformation:DescribeStacks",
"cloudformation:DescribeStackEvents"
Maybe check that you have those. | 00:09:48 |
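The five actions listed above could be collected into a single IAM policy document. The following is a minimal sketch, not the project's official policy: the helper name `build_policy` and the wildcard `Resource` are assumptions made for illustration, and the action list may be incomplete, since it comes from memory in the message above.

```python
import json

# Actions copied from the message above. The list may be incomplete.
CLOUDFORMATION_SETUP_ACTIONS = [
    "cloudformation:ListStacks",
    "cloudformation:CreateStack",
    "iam:CreateRole",
    "cloudformation:DescribeStacks",
    "cloudformation:DescribeStackEvents",
]


def build_policy(actions, resource="*"):
    """Return an IAM policy document that allows the given actions.

    `resource="*"` is an assumption for brevity; scope it down in practice.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": sorted(actions),
                "Resource": resource,
            }
        ],
    }


if __name__ == "__main__":
    # Print the JSON so it can be pasted into the IAM console or a file.
    print(json.dumps(build_policy(CLOUDFORMATION_SETUP_ACTIONS), indent=2))
```

Without `cloudformation:DescribeStacks`, a caller may be unable to observe that a stack has finished, which would be consistent with the endless waiting messages reported earlier.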
Kartik Kalamadi | Good point
The IAM user that you pass as a parameter to the automated script only requires read and write access to objects in an S3 bucket.
But to run the scripts themselves you need a lot of different permissions.
We tested all the automated scripts with Admin credentials, so we never ran into any errors.
We will fix the documentation
Thanks
-----------------------------
GITHUB ISSUES :
https://github.com/awslabs/kubeflow-manifests/issues/219
https://github.com/awslabs/kubeflow-manifests/issues/215 | 03:18:15 |
Gautam Kumar | Looking at this https://github.com/awslabs/kubeflow-manifests/blob/main/docs/deployment/cognito/README-automated.md
It seems that without a custom domain it's not possible. | 03:26:40 |
Sam Gaunt | Thanks, thought so. Just going without cognito for now then. | 03:27:10 |
Sam Gaunt | Ok I have finally got it working. I think there was a conflict between the IAM account created in the automated setup option and my own user account.
I am using saml2aws to auth, and I think that exporting the access key and secret access key in step 4 overrode the saml2aws auth and caused issues.
I got it to work by passing the access key and secret access key directly to the script rather than through environment variables, like so:
PYTHONPATH=.. python utils/rds-s3/auto-rds-s3-setup.py --region $CLUSTER_REGION --cluster $CLUSTER_NAME --bucket $S3_BUCKET --s3_aws_access_key_id AWS_ACCESS_KEY_ID_HERE --s3_aws_secret_access_key AWS_SECRET_ACCESS_KEY_HERE | 04:32:47 |
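The conflict described above is consistent with how the AWS SDKs and CLI resolve credentials: static keys exported as environment variables are read before the shared credentials file, which is where saml2aws stores its session. A minimal sketch of a pre-flight check follows; the function name `overriding_credential_vars` is hypothetical and not part of the Kubeflow scripts.

```python
import os

# Environment variables that the AWS SDKs and CLI read before falling back
# to the shared credentials file (where saml2aws writes its session).
STATIC_CREDENTIAL_VARS = (
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
    "AWS_SESSION_TOKEN",
)


def overriding_credential_vars(env=None):
    """Return the names of credential variables that are set and non-empty.

    A non-empty result means exported keys may shadow a profile-based login.
    """
    env = os.environ if env is None else env
    return [name for name in STATIC_CREDENTIAL_VARS if env.get(name)]


if __name__ == "__main__":
    found = overriding_credential_vars()
    if found:
        print("Exported credentials may override your profile:", ", ".join(found))
    else:
        print("No static credential variables are exported.")
```

Running a check like this before the setup script would show whether the S3 user's exported keys are also being picked up by eksctl and the other tools the script calls.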
Gautam Kumar | That comment is coming from the eksctl command. | 05:09:10 |
| _slack_kubeflow_U03E5S89214 joined the room. | 09:41:05 |
| _slack_kubeflow_U03EETBT66P joined the room. | 18:05:07 |
7 May 2022 |
| _slack_kubeflow_U03EDC2FA3X joined the room. | 03:40:42 |
10 May 2022 |
| Yihong Wang joined the room. | 15:28:36 |
12 May 2022 |
| Jim Nolan joined the room. | 21:01:50 |
| Jim Nolan changed their display name from _slack_kubeflow_U03F94272TU to Jim Nolan. | 21:01:54 |
| Jim Nolan set a profile picture. | 21:01:55 |
| _slack_kubeflow_U03EUHGBS07 joined the room. | 21:09:34 |
16 May 2022 |
| _slack_kubeflow_U03FYE5JUKT joined the room. | 10:58:23 |
| idahotokens joined the room. | 11:22:51 |
| idahotokens left the room. | 16:41:03 |
Haris Farooqui | I am seeing an issue after upgrading to EKS 1.22 (Terraform deployment):
kubectl describe pod my_pipeline -n kubeflow
-----
...
Warning FailedMount 9m46s (x3 over 9m48s) kubelet MountVolume.SetUp failed for volume "kube-api-access-kz47c" : object "kubeflow"/"kube-root-ca.crt" not registered
Warning FailedMount 9m46s (x3 over 9m48s) kubelet MountVolume.SetUp failed for volume "mlpipeline-minio-artifact" : object "kubeflow"/"mlpipeline-minio-artifact" not registered
| 17:36:02 |
Haris Farooqui | I found a post suggesting explicitly setting automountServiceAccountToken: false to keep rootCAConfigMap from publishing kube-root-ca.crt in the kubeflow namespace. While this takes care of the MountVolume.SetUp failed for volume "kube-api-access-kz47c" : object "kubeflow"/"kube-root-ca.crt" not registered warning, it creates other issues: pipeline jobs start failing with the following error:
This step is in Error state with this message: Error (exit code 2): invalid configuration: no configuration has been provided, try setting KUBERNETES_MASTER environment variable
| 17:36:20 |
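One plausible reading of the error above: with automountServiceAccountToken: false, the pod no longer gets the service account token and CA bundle mounted, so a Kubernetes client inside the step has no in-cluster configuration to load. The sketch below checks for those files at the standard mount path. The function name `missing_in_cluster_files` is hypothetical, and the link to the pipeline failure is an assumption, not something confirmed in this thread.

```python
from pathlib import Path

# Standard location where Kubernetes mounts service account credentials
# when token automounting is enabled for the pod.
DEFAULT_SA_DIR = Path("/var/run/secrets/kubernetes.io/serviceaccount")


def missing_in_cluster_files(sa_dir=DEFAULT_SA_DIR):
    """Return the credential files that are absent from the mount directory.

    An empty list means both files needed for in-cluster config are present.
    """
    required = ("token", "ca.crt")
    return [name for name in required if not (Path(sa_dir) / name).is_file()]


if __name__ == "__main__":
    missing = missing_in_cluster_files()
    if missing:
        print("In-cluster credentials missing:", ", ".join(missing))
    else:
        print("Service account token and CA bundle are mounted.")
```

Running this inside a failing step's container would show whether the token mount is what the step is missing.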
Haris Farooqui | https://stackoverflow.com/questions/69038012/mountvolume-setup-failed-for-volume-kube-api-access-fcz9j-object-default | 17:36:28 |
Alexandre Brown | Note that Kubeflow doesn't support k8s 1.22 yet | 17:48:37 |