!LuUSGaeArTeoOgUpwk:matrix.org

kubeflow-kfserving

26 Nov 2021
@_slack_kubeflow_U02NN0J9K5G:matrix.org Amit Singh
theofpa it's a TensorFlow model and I have it saved with an inference function. Is there any resource to check out for deploying with serverless GPU inference? More like FaaS
08:22:19
@_slack_kubeflow_U02AYBVSLSK:matrix.org Alexandre Brown
Hello Amit Singh, in addition to what theofpa said, the flow usually goes as follows:
1. Pick a model server (e.g. NVIDIA Triton, TorchServe, TensorFlow Serving, etc.).
   a. A model server is the part that creates an endpoint for your model. Some model servers work with multiple frameworks (see NVIDIA Triton, for instance), while others work only with TensorFlow or PyTorch, etc.
   b. You can use a pre-built model server (available out of the box in KServe/Kubeflow) or build a custom one. Check this page for a complete list of the built-in model servers: https://github.com/kserve/kserve/blob/master/docs/samples/README.md
2. On the storage solution, create a folder structure that matches what the chosen model server expects.
   a. For instance, the NVIDIA Triton model server expects a root folder with sub-folders named after the models, etc. The expected structure is documented.
3. Upload the model to the storage solution of your choice (e.g. S3, Google Cloud Storage, etc.).
   a. Some model servers require config files to be uploaded as well; this is model-server dependent, but everything is documented.
4. Create an InferenceService YAML definition.
   a. In the YAML you specify which model server you use, e.g. triton for the NVIDIA Triton server, etc.
   b. You specify the storage URI from step 3 (some model servers expect the URI to be a root folder and not the path to the model file; this depends on the model server and everything is documented).
   c. For scale to zero you must specify minReplicas: 0 and that's it (see https://kserve.github.io/website/modelserving/autoscaling/autoscaling/#autoscaling-on-gpu). This scales to 0 pods when there are no requests. Scaling your GPU node to 0 is handled by your cluster autoscaler.
   d. Optionally you can also specify, via a node selector, which node you want this inference service to run on. This is useful when you have dedicated inference GPU nodes, for instance, that you only want to use for inference.
5. Deploy the inference service.
   a. kubectl apply -f my-inference-service.yaml, or via the Kubeflow Model UI.
That's the gist of it. When you create a new model version you upload it to your storage solution, update the URI in the InferenceService, then re-deploy. You can also deploy it using a Kubeflow component instead of manually creating a YAML and applying it. Hopefully this helps a little.
19:07:00
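For reference, a minimal sketch of what step 4 in the walkthrough above might look like, assuming a Triton predictor with scale-to-zero and a dedicated GPU node pool. The model repository path, node label, and resource values are illustrative, not taken from this thread:

```yaml
# Illustrative InferenceService showing minReplicas: 0 and a GPU node selector (KServe v1beta1).
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: my-gpu-model
spec:
  predictor:
    minReplicas: 0                                          # scale predictor pods to zero when idle
    nodeSelector:
      cloud.google.com/gke-accelerator: nvidia-tesla-t4     # hypothetical GPU node label
    triton:
      storageUri: gs://my-bucket/triton-model-repository    # hypothetical root folder from steps 2-3
      resources:
        limits:
          nvidia.com/gpu: "1"
```

It would then be deployed as in step 5, e.g. kubectl apply -f my-inference-service.yaml.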
29 Nov 2021
@_slack_kubeflow_U027DFX2T46:matrix.org Sidhartha Panigrahi Is there any way to find how much TPS a single inference service can serve? 09:50:52
@_slack_kubeflow_U015G7WJBUJ:matrix.org Ferdinand von den Eichen [Kineo.ai]
In reply to @_slack_kubeflow_U015G7WJBUJ:matrix.org
Amazing, thank you! 🙂
Do you by chance have a sample of where this is demonstrated?
10:36:33
@_slack_kubeflow_U015G7WJBUJ:matrix.org Ferdinand von den Eichen [Kineo.ai] How do you deal with a lack of packages in the prebuilt KFServing images? We wanted to try the sklearn one and got: ModuleNotFoundError: No module named 'pandas'. Details:
[I 211129 13:24:35 storage:35] Copying contents of /mnt/models to local
/usr/local/lib/python3.7/site-packages/sklearn/base.py:253: UserWarning: Trying to unpickle estimator OneHotEncoder from version 0.23.2 when using version 0.20.3. This might lead to breaking code or invalid results. Use at your own risk.
UserWarning)
Traceback (most recent call last):
File "/usr/local/lib/python3.7/runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "/usr/local/lib/python3.7/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/sklearnserver/sklearnserver/__main__.py", line 33, in  module 
model.load()
File "/sklearnserver/sklearnserver/model.py", line 37, in load
self._model = joblib.load(model_file) #pylint:disable=attribute-defined-outside-init
File "/usr/local/lib/python3.7/site-packages/joblib/numpy_pickle.py", line 585, in load
obj = _unpickle(fobj, filename, mmap_mode)
File "/usr/local/lib/python3.7/site-packages/joblib/numpy_pickle.py", line 504, in _unpickle
obj = unpickler.load()
File "/usr/local/lib/python3.7/pickle.py", line 1088, in load
dispatch[key[0]](self)
File "/usr/local/lib/python3.7/pickle.py", line 1376, in load_global
klass = self.find_class(module, name)
File "/usr/local/lib/python3.7/pickle.py", line 1426, in find_class
__import__(module, level=0)
ModuleNotFoundError: No module named 'pandas' 
13:29:33
@_slack_kubeflow_U02659LUHBM:matrix.org Varun Sharma
In reply to @_slack_kubeflow_US7RRCDL2:matrix.org
We use OpenTelemetry clients in our custom predictors to connect to Jaeger and to add trace_ids to our logs.
Timothy Laurent could you elaborate on how you do this? I'm currently using the now-deprecated Jaeger client libraries with my predictor, but as far as I know the OpenTelemetry clients work in a similar way. If I run it without KServe, I can run my predictor container with all the ports exposed that Jaeger wants, run the jaeger-all-in-one container, and then it can pick up the pushed data from the predictor. The same technique doesn't work with KServe because I can't expose more ports.
23:11:19
30 Nov 2021
@_slack_kubeflow_UM56LA7N3:matrix.org Benjamin Tan
In reply to @_slack_kubeflow_U02AYBVSLSK:matrix.org
You can always build your own images using the KFserving one as the base
01:13:15
@_slack_kubeflow_UM56LA7N3:matrix.org Benjamin Tan
In reply to @_slack_kubeflow_UM56LA7N3:matrix.org
You can always build your own images using the KFserving one as the base
Then you'll have to change the configmap to point to the new image:
predictors: |-
    {
        "tensorflow": {
            "image": "tensorflow/serving",
            "defaultImageVersion": "1.14.0",
            "defaultGpuImageVersion": "1.14.0-gpu",
            "defaultTimeout": "60",
            "supportedFrameworks": [
              "tensorflow"
            ],
            "multiModelServer": false
        },
        "onnx": {
            "image": "mcr.microsoft.com/onnxruntime/server",
            "defaultImageVersion": "v1.0.0",
            "supportedFrameworks": [
              "onnx"
            ],
            "multiModelServer": false
        },
        "sklearn": {
          "v1": {
            "image": "gcr.io/kfserving/sklearnserver",
            "defaultImageVersion": "v0.5.1",
            "supportedFrameworks": [
              "sklearn"
            ],
            "multiModelServer": false
          },
          "v2": {
01:14:58
@_slack_kubeflow_UM56LA7N3:matrix.org Benjamin Tan
In reply to @_slack_kubeflow_UM56LA7N3:matrix.org
Then you'll have to change the configmap to point to the new image:
kubectl edit cm -n kubeflow inferenceservice-config
01:15:16
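As a rough sketch of what that edit might look like after building a custom image on top of the prebuilt sklearn server: only the image field of the relevant predictor entry changes. The image name below is hypothetical, not from this thread, and the other predictor entries are omitted for brevity:

```yaml
# Inside the inferenceservice-config ConfigMap (kubectl edit cm -n kubeflow inferenceservice-config);
# other predictor entries from the snippet above stay as they are.
predictors: |-
    {
        "sklearn": {
          "v1": {
            "image": "my-registry/sklearnserver-with-pandas",
            "defaultImageVersion": "v0.5.1",
            "supportedFrameworks": [
              "sklearn"
            ],
            "multiModelServer": false
          }
        }
    }
```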
@_slack_kubeflow_UM56LA7N3:matrix.org Benjamin Tan
I assume you're using the sklearn server
01:17:37
@_slack_kubeflow_UM56LA7N3:matrix.org Benjamin Tan
In reply to @_slack_kubeflow_UM56LA7N3:matrix.org
gcr.io/kfserving/sklearnserver
https://github.com/kserve/kserve/tree/master/python/sklearnserver#building-your-own-scikit-learn-server-docker-image
01:17:57
@_slack_kubeflow_UM56LA7N3:matrix.org Benjamin Tan
In reply to @_slack_kubeflow_U027DFX2T46:matrix.org
Is there any way to find how much TPS a single inference service can serve?
Load test it?
01:18:38
@_slack_kubeflow_U026RKS3A87:matrix.org Bhagat Khemchandani
Ferdinand von den Eichen [Kineo.ai] - as Benjamin said - bake your own image, then use a custom predictor in the InferenceService to point at it. A couple of examples: https://github.com/kserve/kserve/tree/37af39054499caf9145664a48981740ca4ce14f5/docs/samples/v1alpha2/custom/kfserving-custom-model https://github.com/kserve/kserve/tree/master/docs/samples/v1beta1/custom/custom_model
06:23:36
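A minimal sketch of the custom-predictor route mentioned above, assuming an image built on top of the prebuilt sklearn server with pandas added; the image name and storage URI are hypothetical:

```yaml
# Illustrative InferenceService using a custom predictor image instead of the built-in sklearn server.
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: sklearn-with-pandas
spec:
  predictor:
    containers:
      - name: kserve-container
        image: my-registry/sklearnserver-with-pandas:latest   # hypothetical custom image
        env:
          - name: STORAGE_URI
            value: gs://my-bucket/models/sklearn              # hypothetical model location
```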
@_slack_kubeflow_U02EYSQRNTF:matrix.org Alexander Abramov Hi all, quick question regarding multiple versions of a framework. I see that for some frameworks, e.g. pytorch, there are images for "v1" and "v2". Is there a way to deploy my model to a specific version? Which fields in a model deployment YAML control which version is selected? 18:28:59
@_slack_kubeflow_U02EYSQRNTF:matrix.org Alexander Abramov
In reply to @_slack_kubeflow_U02EYSQRNTF:matrix.org
Hi all, quick question regarding multiple versions of a framework. I see that for some frameworks, e.g. pytorch, there are images for "v1" and "v2". Is there a way to deploy my model to a specific version? Which fields in a model deployment YAML control which version is selected?
hmm I see references to protocolVersion in the documentation. Seems like this is it?
18:29:59
@_slack_kubeflow_U02EYSQRNTF:matrix.org Alexander Abramov
In reply to @_slack_kubeflow_U02EYSQRNTF:matrix.org
hmm I see references to protocolVersion in the documentation. Seems like this is it?
Looks like it is. Seems like I just need to RTFM 😅
18:31:30
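For reference, a minimal sketch of how protocolVersion is set on a predictor to pick between the v1 and v2 image entries shown in the ConfigMap earlier; the service name and storage URI are illustrative, not from this thread:

```yaml
# Illustrative InferenceService pinning the sklearn predictor to protocol version v2.
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: sklearn-v2-example
spec:
  predictor:
    sklearn:
      protocolVersion: v2
      storageUri: gs://my-bucket/models/sklearn   # hypothetical model location
```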
