29 Apr 2022 |
| Bogdan Kowalczyk joined the room. | 15:13:45 |
| Bogdan Kowalczyk changed their display name from _slack_kubeflow_U03E7U44FC0 to Bogdan Kowalczyk. | 15:14:55 |
| Bogdan Kowalczyk set a profile picture. | 15:14:57 |
Shri Javadekar | Fantastic... let me try this out | 15:40:42 |
30 Apr 2022 |
Dan Sun | Jan Migoń this seems like the related issue
https://github.com/kserve/kserve/issues/1342 | 00:26:51 |
| _slack_kubeflow_UL3871NM6 joined the room. | 10:27:25 |
Dan Sun | Let me know if that works, would be great to see your monitoring doc contribution!! | 14:27:39 |
Shri Javadekar | Here's what I have come up with so far:
• I could see a bunch of metrics at ports 9091 and 9090 that are exported by the queue-proxy
• The ones on 9090, such as requests_per_second, were more interesting to me. The ones on 9091 are about Go.
I made the changes to Prometheus as suggested here and also imported the Grafana dashboards.
However, I do not see all the metrics being scraped by Prometheus. In particular, the Knative Serving - Revision HTTP Requests dashboard shows No Data. I see that there are no activator_request_count metrics in Prometheus. I don't even know if these are exported by any component.
Are the activator metrics available to be scraped by Prometheus? | 20:59:52 |
Dan Sun | Yes you will need to add the Prometheus annotation on the activator pod | 21:40:01 |
Dan Sun | Also autoscaler pod | 21:40:02 |
Dan Sun | Did you see the revision HTTP request metrics? | 21:41:04 |
Shri Javadekar | Oh... I see. I just added the following annotations to the activator, autoscaler (and also the controller) pods. I see they have metrics being exported on port 9090.
prometheus.io/scrape: 'true'
prometheus.io/port: '9090'
Will know the results shortly.. | 23:16:59 |
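A note on the annotations above: annotations added directly to a running pod are lost when the pod is recreated, so they are usually set on the pod template of the owning Deployment. Below is a minimal Python sketch that builds such a patch. It assumes the Knative components run as Deployments named activator, autoscaler, and controller in the knative-serving namespace (check your own install), and it only prints the `kubectl patch` commands rather than running them. The annotations have an effect only if Prometheus is configured to honor them.

```python
import json
import shlex

# Annotations quoted in the message above.
SCRAPE_ANNOTATIONS = {
    "prometheus.io/scrape": "true",
    "prometheus.io/port": "9090",
}


def pod_template_patch(annotations):
    """Build a merge-patch body that sets annotations on a Deployment's pod template."""
    return {"spec": {"template": {"metadata": {"annotations": dict(annotations)}}}}


def kubectl_patch_command(deployment, namespace, annotations):
    """Return the argv for a `kubectl patch` call (built here, not executed)."""
    body = json.dumps(pod_template_patch(annotations), sort_keys=True)
    return ["kubectl", "patch", "deployment", deployment,
            "-n", namespace, "--type", "merge", "-p", body]


if __name__ == "__main__":
    # Deployment and namespace names are assumptions; adjust to your cluster.
    for name in ("activator", "autoscaler", "controller"):
        print(shlex.join(kubectl_patch_command(name, "knative-serving", SCRAPE_ANNOTATIONS)))
```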
Shri Javadekar | Hmm.. does the Prometheus config need to explicitly include specific namespaces? | 23:32:50 |
Shri Javadekar | Ok.. I think I see some metrics in Prometheus. This article helped a lot.
• Basically, the config serviceMonitorNamespaceSelector: {} in the Prometheus CRD means all namespaces will be watched for service monitor objects.
• The serviceMonitorSelector: field in the Prometheus CRD indicates the labels that should be put on ServiceMonitor objects. I had this set to release: kube-prometheus-stack-1651295153 because I used --generate-name when installing the Helm chart.
• The ServiceMonitors created in the knative-serving namespace didn't have this label.
• I added this label to the service monitor objects.
• Next, each ServiceMonitor object needs a selector for the services it should scrape.
• I saw that all services had the label serving.knative.dev/release=v0.22.1 . I added this in the ServiceMonitor and I'm seeing this in Prometheus. | 23:56:05 |
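The label matching described in the bullets above can be sketched in Python. This is a simplified model of `matchLabels`-style selectors only (it ignores `matchExpressions`), not the Prometheus Operator's actual code. The label values are the ones quoted in the message; the function names are illustrative.

```python
def matches(selector, labels):
    """Return True if `labels` satisfies a matchLabels-style selector.

    An empty selector ({}) matches everything, which is why
    serviceMonitorNamespaceSelector: {} watches all namespaces.
    """
    return all(labels.get(key) == value for key, value in selector.items())


# serviceMonitorSelector from the Prometheus CRD, as quoted in the message.
SERVICE_MONITOR_SELECTOR = {"release": "kube-prometheus-stack-1651295153"}

# Label seen on the Knative services, as quoted in the message.
SERVICE_LABELS = {"serving.knative.dev/release": "v0.22.1"}


def is_scraped(monitor_labels, monitor_service_selector, service_labels):
    """A service is scraped only if both selection steps succeed."""
    picked_up = matches(SERVICE_MONITOR_SELECTOR, monitor_labels)
    selects_service = matches(monitor_service_selector, service_labels)
    return picked_up and selects_service


if __name__ == "__main__":
    service_selector = {"serving.knative.dev/release": "v0.22.1"}
    # Before: the ServiceMonitor lacks the release label, so Prometheus ignores it.
    print(is_scraped({}, service_selector, SERVICE_LABELS))
    # After: the release label is added to the ServiceMonitor.
    print(is_scraped(dict(SERVICE_MONITOR_SELECTOR), service_selector, SERVICE_LABELS))
```

Both steps have to match: the Prometheus object must select the ServiceMonitor, and the ServiceMonitor must select the service.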
Shri Javadekar | Let me look at the Grafana dashboards | 23:56:30 |
1 May 2022 |
Shri Javadekar | Seems to be working 😄 | 00:02:09 |
Shri Javadekar | I think I have everything I need at this point. I will add these details to the https://github.com/knative/docs repo and send out a PR by Monday. | 00:04:00 |
Shri Javadekar | Thanks a lot Dan Sun! | 00:04:05 |
Shri Javadekar | I wanted to explore how I could get the prediction metrics themselves (e.g. the confidence score of the predictions) into Prometheus. But I will explore that later. | 00:05:29 |
| _slack_kubeflow_U03DQAW3Z36 joined the room. | 03:28:20 |
Benjamin Tan | I think those are custom, so you would have to push those metrics yourself | 05:47:32 |
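As an illustration of what a custom prediction metric could look like, here is a minimal Python sketch that records confidence scores in a histogram and renders them in the Prometheus text exposition format. The metric name `prediction_confidence` and the bucket bounds are hypothetical, and the class is a standard-library stand-in: in practice a client library such as `prometheus_client` would normally handle this, and the model server or transformer code would call `observe` for each prediction.

```python
import bisect


class ConfidenceHistogram:
    """Tiny histogram that renders Prometheus text exposition format."""

    def __init__(self, name, help_text, buckets):
        self.name = name
        self.help_text = help_text
        self.bounds = sorted(buckets)
        # One counter per bound, plus a final slot for the +Inf bucket.
        self.counts = [0] * (len(self.bounds) + 1)
        self.total = 0.0

    def observe(self, value):
        # bisect_left returns the first bound >= value, matching "le" semantics.
        self.counts[bisect.bisect_left(self.bounds, value)] += 1
        self.total += value

    def render(self):
        lines = [
            f"# HELP {self.name} {self.help_text}",
            f"# TYPE {self.name} histogram",
        ]
        cumulative = 0
        labels = [repr(b) for b in self.bounds] + ["+Inf"]
        for le, count in zip(labels, self.counts):
            cumulative += count  # Prometheus buckets are cumulative
            lines.append(f'{self.name}_bucket{{le="{le}"}} {cumulative}')
        lines.append(f"{self.name}_sum {self.total}")
        lines.append(f"{self.name}_count {cumulative}")
        return "\n".join(lines) + "\n"


if __name__ == "__main__":
    hist = ConfidenceHistogram(
        "prediction_confidence", "Confidence score of predictions", [0.5, 0.9]
    )
    for score in (0.3, 0.5, 0.95):
        hist.observe(score)
    print(hist.render(), end="")
```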
| @wybpip:matrix.org joined the room. | 16:03:35 |
| @wybpip:matrix.org left the room. | 16:03:36 |
2 May 2022 |
| Ajay kumar saini joined the room. | 12:02:32 |
Jan Migoń | Thanks, it helped me. Problem solved. I deleted the mutating and validating webhooks for v1alpha2, as I don't need them and they had the same names as the ones for v1beta1, which was causing the error. | 12:27:52 |
_slack_kubeflow_U9UFLSBM4 | I have some fairly vague questions. Hopefully, someone can help. Are there any docs around that deal with various aspects of monitoring models served with modelmesh-serving? Specifically, data drift, model drift, outlier detection.
Also, is there a way to get explanations for predictions or to figure out feature attribution? | 18:56:03 |
Rachit Chauhan | Is there a version compatibility matrix showing which versions of Kubeflow work with which versions of KServe? | 19:20:09 |
| _slack_kubeflow_U03E5L0V0SV joined the room. | 19:29:19 |
Nick | Hi croberts we don't have this kind of thing in modelmesh yet, but have discussed it in the past. There's some in-progress work (mostly complete I think) to support kserve transformers with modelmesh predictors, I expect something similar could be done with explainers. | 20:54:45 |
_slack_kubeflow_U9UFLSBM4 | Thanks Nick. Is there support for batch inference in mm-serving? The other thing I couldn't find anything on was the possibility of canary rollout or A/B testing. Anything on those? | 20:56:15 |