!LuUSGaeArTeoOgUpwk:matrix.org

kubeflow-kfserving

433 Members
2 Servers

4 Dec 2021
@_slack_kubeflow_U02AYBVSLSK:matrix.orgAlexandre Brown
In reply to@_slack_kubeflow_U02NN0J9K5G:matrix.org
Thanks for the info, Alexandre Brown. My question is: when we use a GPU on a Kubernetes cluster, it won't be serverless (it is always running), right? I want the GPU turned on only when its event handler or API endpoint is triggered.
Amit Singh, yes, that is entirely possible and it is what KServe is for. Setting the minimum replica count to 0 makes KServe scale your model server's pods down to 0 automatically. There is not much to configure; it is a one-line change in the InferenceService definition. See https://kserve.github.io/website/modelserving/autoscaling/autoscaling/#enable-scale-down-to-zero Now, this gives you pods that scale to and from 0, but scaling the GPU node itself to and from 0 is outside the scope of KServe. You handle that at the cluster level, with the Kubernetes Cluster Autoscaler. Here is the setup for AWS: https://docs.aws.amazon.com/eks/latest/userguide/autoscaling.html#cluster-autoscaler In your cluster configuration, set the minimum size of the GPU node group to 0, then set up the autoscaler. Check your cloud provider's autoscaling docs for more details, but for AWS it's not too complicated if you follow the doc.
00:38:30
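A minimal sketch of the one-line setting described in the message above, written in Python so the manifest can be generated and inspected. The service name, the storage URI, the choice of the `tensorflow` predictor, and the GPU limit are illustrative assumptions and do not come from the thread; only `minReplicas: 0` is the setting the message refers to.

```python
import json


def inference_service(name, storage_uri, min_replicas=0, gpus=1):
    """Return a KServe v1beta1 InferenceService manifest as a plain dict.

    name and storage_uri are placeholders supplied by the caller.
    """
    return {
        "apiVersion": "serving.kserve.io/v1beta1",
        "kind": "InferenceService",
        "metadata": {"name": name},
        "spec": {
            "predictor": {
                # The one-liner: allow the model server to scale down to zero pods.
                "minReplicas": min_replicas,
                # Assumption: a TensorFlow model that requests one GPU.
                "tensorflow": {
                    "storageUri": storage_uri,
                    "resources": {"limits": {"nvidia.com/gpu": str(gpus)}},
                },
            }
        },
    }


# Hypothetical name and bucket, for illustration only.
manifest = inference_service("gpu-model", "gs://example-bucket/models/my-model")
print(json.dumps(manifest, indent=2))
```

The printed JSON can be saved to a file and applied with `kubectl apply -f`, since kubectl accepts JSON as well as YAML.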
@_slack_kubeflow_U02AYBVSLSK:matrix.orgAlexandre Brown
(edited, text appended after "... the doc.") With both of these points covered, you'll have a GPU node group with a minimum of 0 and no node running initially. When a request comes in, KServe will try to schedule a pod for the model server. The pod will stay Pending because the number of running GPU nodes is 0. The autoscaler will react and start a GPU node, so the number of running GPU nodes is now 1. The Kubernetes scheduler can now place the pending pod (model server) on the GPU node. Once the request is over, since we set minReplicas to 0 in the InferenceService definition, KServe will automatically scale the model server pod down from 1 to 0. After X minutes (configurable), the autoscaler will notice that the GPU node has no running pods, meaning it can scale the node down from 1 to 0. And voilà, that's the gist of it.
00:43:42
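The sequence in the message above can be sketched as a toy simulation. This is only an illustration of the order of events under simplified assumptions: the `Cluster` class and its method names are hypothetical, there is a single GPU node and a single pod, and the configurable scale-down delay is not modeled.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    gpu_nodes: int = 0       # GPU node group starts at its minimum of 0
    pending_pods: int = 0
    running_pods: int = 0
    log: list = field(default_factory=list)

    def request_arrives(self):
        # KServe asks for one model server pod; no GPU node exists yet.
        self.pending_pods += 1
        self.log.append("pod Pending: 0 GPU nodes running")

    def autoscaler_tick(self):
        # Scale up for pending pods, scale down when the node sits empty.
        if self.pending_pods and self.gpu_nodes == 0:
            self.gpu_nodes = 1
            self.log.append("autoscaler: GPU nodes 0 -> 1")
        elif self.gpu_nodes and not (self.pending_pods or self.running_pods):
            self.gpu_nodes = 0
            self.log.append("autoscaler: GPU nodes 1 -> 0")

    def scheduler_tick(self):
        # The scheduler places the pending pod once a node is available.
        if self.pending_pods and self.gpu_nodes:
            self.pending_pods -= 1
            self.running_pods += 1
            self.log.append("scheduler: model server pod Running")

    def traffic_stops(self):
        # minReplicas is 0, so the model server pod is removed.
        self.running_pods = 0
        self.log.append("KServe: model server pods 1 -> 0")


cluster = Cluster()
cluster.request_arrives()
cluster.autoscaler_tick()
cluster.scheduler_tick()
cluster.traffic_stops()
cluster.autoscaler_tick()
print("\n".join(cluster.log))
```

In a real cluster the first request also pays for node provisioning and image pull time, so a cold start from 0 GPU nodes can take minutes rather than seconds.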
7 Dec 2021
@_slack_kubeflow_U019640DQ06:matrix.orgDimitris Poulopoulos Hello to the community. We want to serve a TensorFlow Recommenders model, which contains a ScaNN layer (https://www.tensorflow.org/recommenders/api_docs/python/tfrs/layers/factorized_top_k/ScaNN). Is there out-of-the-box support on TFServing, Triton, or Seldon MLServer backend for this? 10:24:43
@_slack_kubeflow_UM56LA7N3:matrix.orgBenjamin Tan
In reply to@_slack_kubeflow_U019640DQ06:matrix.org
Hello to the community. We want to serve a TensorFlow Recommenders model, which contains a ScaNN layer (https://www.tensorflow.org/recommenders/api_docs/python/tfrs/layers/factorized_top_k/ScaNN). Is there out-of-the-box support on TFServing, Triton, or Seldon MLServer backend for this?
If you can convert it to a SavedModel, then TFServing can serve it
10:33:02
8 Dec 2021
@_slack_kubeflow_U02B94TFPCJ:matrix.orgYoshihiro NISHIWAKI Hi. I’m trying to build a custom model image with docker build and I’m getting the following error.
docker build -t username/custom -f python/custom_model.Dockerfile python
[+] Building 7.4s (8/10)                                                                                        
 => [internal] load build definition from custom_model.Dockerfile                                                                0.0s
 => => transferring dockerfile: 50B                                                                               0.0s
 => [internal] load .dockerignore                                                                                0.0s
 => => transferring context: 2B                                                                                 0.0s
 => [internal] load metadata for docker.io/library/python:3.7-slim                                                                0.8s
 => [internal] load build context                                                                                0.0s
 => => transferring context: 12.03kB                                                                               0.0s
 => [1/6] FROM docker.io/library/python:3.7-slim@sha256:9e51c1a3fea7e0a2b93df2538c02f1afe31d2c69b10d6dcbd372c10c72b325aa                                     0.0s
 => CACHED [2/6] COPY custom_model custom_model                                                                         0.0s
 => CACHED [3/6] COPY kserve kserve                                                                               0.0s
 => ERROR [4/6] RUN pip install --upgrade pip && pip install -e ./kserve                                                             6.5s
------                                                                                                 
 > [4/6] RUN pip install --upgrade pip && pip install -e ./kserve:                                                                   
#8 0.864 Requirement already satisfied: pip in /usr/local/lib/python3.7/site-packages (21.2.4)                                                     
#8 0.958 Collecting pip                                                                                        
#8 1.012  Downloading pip-21.3.1-py3-none-any.whl (1.7 MB)                                                                      
#8 1.104 Installing collected packages: pip                                                                              
#8 1.104  Attempting uninstall: pip
#8 1.104   Found existing installation: pip 21.2.4
#8 1.170   Uninstalling pip-21.2.4:
#8 1.269    Successfully uninstalled pip-21.2.4
#8 1.727 Successfully installed pip-21.3.1
#8 1.727 WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv
#8 1.962 Obtaining file:///kserve
#8 1.962  Preparing metadata (setup.py): started
#8 2.146  Preparing metadata (setup.py): finished with status 'done'
#8 2.230 Collecting certifi>=14.05.14
#8 2.270  Downloading certifi-2021.10.8-py2.py3-none-any.whl (149 kB)
#8 2.310 Collecting six>=1.15
#8 2.318  Downloading six-1.16.0-py2.py3-none-any.whl (11 kB)
#8 2.341 Collecting python_dateutil>=2.5.3
#8 2.353  Downloading python_dateutil-2.8.2-py2.py3-none-any.whl (247 kB)
#8 2.364 Requirement already satisfied: setuptools>=21.0.0 in /usr/local/lib/python3.7/site-packages (from kserve==0.7.0) (57.5.0)
#8 2.396 Collecting urllib3>=1.15.1
#8 2.405  Downloading urllib3-1.26.7-py2.py3-none-any.whl (138 kB)
#8 2.439 Collecting kubernetes>=12.0.0
#8 2.450  Downloading kubernetes-20.13.0-py2.py3-none-any.whl (1.8 MB)
#8 2.567 Collecting tornado>=6.0.0
#8 2.578  Downloading tornado-6.1-cp37-cp37m-manylinux2014_aarch64.whl (428 kB)
#8 2.607 Collecting argparse>=1.4.0
#8 2.615  Downloading argparse-1.4.0-py2.py3-none-any.whl (23 kB)
#8 2.653 Collecting minio<7.0.0,>=4.0.9
#8 2.663  Downloading minio-6.0.2-py2.py3-none-any.whl (73 kB)
#8 2.706 Collecting google-cloud-storage==1.41.1
#8 2.719  Downloading google_cloud_storage-1.41.1-py2.py3-none-any.whl (105 kB)
#8 2.738 Collecting adal>=1.2.2
#8 2.747  Downloading adal-1.2.7-py2.py3-none-any.whl (55 kB)
#8 2.763 Collecting table_logger>=0.3.5
#8 2.771  Downloading table_logger-0.3.6-py3-none-any.whl (14 kB)
#8 3.000 Collecting numpy>=1.17.3
#8 3.010  Downloading numpy-1.21.4-cp37-cp37m-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (13.0 MB)
#8 3.475 Collecting azure-storage-blob==12.8.1
#8 3.485  Downloading azure_storage_blob-12.8.1-py2.py3-none-any.whl (345 kB)
#8 3.515 Collecting azure-identity>=1.6.0
#8 3.527  Downloading azure_identity-1.7.1-py2.py3-none-any.whl (129 kB)
#8 3.545 Collecting cloudevents>=1.2.0
#8 3.553  Downloading cloudevents-1.2.0-py3-none-any.whl (26 kB)
#8 3.571 Collecting avro>=1.10.1
#8 3.603  Downloading avro-1.11.0.tar.gz (83 kB)
#8 3.676  Installing build dependencies: started
#8 4.841  Installing build dependencies: finished with status 'done'
#8 4.846  Getting requirements to build wheel: started
#8 4.948  Getting requirements to build wheel: finished with status 'done'
#8 4.950  Preparing metadata (pyproject.toml): started
#8 5.053  Preparing metadata (pyproject.toml): finished with status 'done'
#8 5.303 Collecting boto3==1.18.18
#8 5.315  Downloading boto3-1.18.18-py3-none-any.whl (131 kB)
#8 5.650 Collecting botocore==1.21.18
#8 5.665  Downloading botocore-1.21.18-py3-none-any.whl (7.8 MB)
#8 6.061 Collecting psutil>=5.0
#8 6.071  Downloading psutil-5.8.0.tar.gz (470 kB)
#8 6.121  Preparing metadata (setup.py): started
#8 6.255  Preparing metadata (setup.py): finished with status 'done'
#8 6.317 ERROR: Could not find a version that satisfies the requirement ray[serve]==1.5.0 (from kserve) (from versions: none)
#8 6.317 ERROR: No matching distribution found for ray[serve]==1.5.0
------
executor failed running [/bin/sh -c pip install --upgrade pip && pip install -e ./kserve]: exit code: 1
07:36:05
@_slack_kubeflow_U02B94TFPCJ:matrix.orgYoshihiro NISHIWAKI
In reply to@_slack_kubeflow_U02B94TFPCJ:matrix.org
Buildpacks also don't work.
pack build --builder=heroku/buildpacks:20 username/custom-model:v1
20: Pulling from heroku/buildpacks
Digest: sha256:09935c3a5011d5c5720b7c6cb56ca2cd8d4a042808edfc15f4c3adc91469894b
Status: Image is up to date for heroku/buildpacks:20
20: Pulling from heroku/pack
Digest: sha256:b5a4da988ac2918ba50d9ab8e6ab685acf18a34bfef63714d52c6e6266237e66
Status: Image is up to date for heroku/pack:20
===> DETECTING
heroku/go    0.3.1
heroku/procfile 0.6.2
===> ANALYZING
Previous image with name "nishiwakidf/custom-model:v1" not found
===> RESTORING
===> BUILDING
-----> Fetching jq... done
-----> Fetching stdlib.sh.v8... done
-----> 
    Detected go modules via go.mod
-----> 
    Detected Module Name: github.com/kserve/kserve
-----> 
 !!  The go.mod file for this project does not specify a Go version
 !!   
 !!  Defaulting to go1.12.17
 !!   
 !!  For more details see: https://devcenter.heroku.com/articles/go-apps-with-modules#build-configuration
 !!   
-----> New Go Version, clearing old cache
-----> Installing go1.12.17
-----> Fetching go1.12.17.linux-amd64.tar.gz... done
-----> Determining packages to install
07:40:09
@_slack_kubeflow_U02B94TFPCJ:matrix.orgYoshihiro NISHIWAKI
In reply toundefined
(edited) The pack build output now ends with two additional lines:
ERROR: failed to build: exit status 1
ERROR: failed to build: executing lifecycle: failed with status code: 51
07:55:12
@_slack_kubeflow_U026RKS3A87:matrix.orgBhagat Khemchandani
In reply to@_slack_kubeflow_U02B94TFPCJ:matrix.org
Mind sharing the Dockerfile here, or in a personal chat?
08:35:03
@_slack_kubeflow_U026RKS3A87:matrix.orgBhagat Khemchandani
In reply to@_slack_kubeflow_U026RKS3A87:matrix.org
Mind sharing the Dockerfile here, or in a personal chat?
To understand this better, please send the Git repo or URL of the sample you are referring to.
09:01:27
@_slack_kubeflow_U02B94TFPCJ:matrix.orgYoshihiro NISHIWAKI
In reply to@_slack_kubeflow_U026RKS3A87:matrix.org
To understand this better, please send the Git repo or URL of the sample you are referring to.
Here is the URL: https://github.com/kserve/kserve
09:03:49
@_slack_kubeflow_U01T25HRREK:matrix.orgMark Winter
In reply to@_slack_kubeflow_U02B94TFPCJ:matrix.org
Here is the URL: https://github.com/kserve/kserve
Unfortunately, ray==1.5.0 doesn't have a release for M1 (the build log shows pip resolving aarch64 wheels).
12:24:47
@_slack_kubeflow_U01T25HRREK:matrix.orgMark Winter
In reply to@_slack_kubeflow_U01T25HRREK:matrix.org
Unfortunately, ray==1.5.0 doesn't have a release for M1 (the build log shows pip resolving aarch64 wheels).
They started publishing arm64 builds from ray 1.8 or thereabouts.
12:25:01
@_slack_kubeflow_U01T25HRREK:matrix.orgMark Winter
In reply to@_slack_kubeflow_U01T25HRREK:matrix.org
They started publishing arm64 builds from ray 1.8 or thereabouts.
Maybe we can get ray updated in KServe.
12:25:27
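One possible workaround for the missing arm64 wheel, which nobody in the thread proposed or tested, is to build the image for linux/amd64 so that pip resolves x86_64 wheels instead. The sketch below only assembles the `docker build` command from the earlier message and adds `--platform linux/amd64` on ARM hosts. The helper name is hypothetical, and it is an assumption that the pinned ray version installs cleanly on amd64 and that emulation is acceptable for your build times.

```python
import platform

# Machine names reported by platform.machine() on ARM hosts.
ARM_MACHINES = {"arm64", "aarch64"}


def docker_build_command(tag, dockerfile, context, machine=None):
    """Return the docker build argv, forcing amd64 when the host is ARM."""
    machine = (machine or platform.machine()).lower()
    cmd = ["docker", "build"]
    if machine in ARM_MACHINES:
        # Assumption: an amd64 build lets pip find a matching ray wheel.
        cmd += ["--platform", "linux/amd64"]
    cmd += ["-t", tag, "-f", dockerfile, context]
    return cmd


# Same tag, Dockerfile and context as the failing command in the thread.
arm_cmd = docker_build_command(
    "username/custom", "python/custom_model.Dockerfile", "python", machine="arm64"
)
print(" ".join(arm_cmd))
```

Emulated amd64 builds on an M1 are noticeably slower, so updating the ray pin in KServe, as suggested above, would be the cleaner long-term fix.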
