Sender | Message | Time |
---|---|---|
30 Jan 2023 | ||
Hi, I'm doing some load tests using https://gitlab.futo.org/load-testing/matrix-locust on monolith and noticed some strange behaviour with 0.11.0 and postgres 15: The test logs in and creates a room with ~30 invitees, all of them on the same domain (no federation). I see around 10k(!) SQL queries and it takes around 6 seconds for a single room. I noticed a lot of transaction rollbacks (71). When looking into the queries, I see some duplicated queries (probably due to rollbacks), but e.g. the following query is done about 5,000 times: I think I made a setup mistake, but did anybody see this before? Dendrite's log does not contain anything (using level 'warn'). I'll check if older versions show the same behaviour, but any help would be appreciated. (Register and login on the same server is working flawlessly) | 14:47:17 | |
In reply to @matthias:asra.gr: Seems like every invitee is taking ~200ms, so 30 x 0.2 = 6 seconds | 14:55:07 | |
Is that for initial syncs or incremental ones? | 14:55:38 | |
You mean /sync? No /sync is happening, only /login and then a single /createRoom with 30 invitees | 14:58:12 | |
Hu, fun. | 14:59:36 | |
:-) I think I also saw relatively slow behavior on <0.11.0 during createRoom, but I had problems with locust, too. It was only now that I discovered the amount of queries. I'll check older versions if they show the same amount of queries | 15:02:37 | |
There are currently https://github.com/matrix-org/dendrite/issues/2911 and https://github.com/matrix-org/dendrite/issues/2777, #2777 mentions the same query | 15:04:46 | |
https://github.com/matrix-org/dendrite/issues/2777 : /sync performance slow since v0.10.0 | 15:04:47 | |
Thanks for the issues! So I'll check 0.9.9 first. | 15:14:55 | |
Interesting: with 0.9.9 it is slightly better 5sec vs 6sec and there is only a single ROLLBACK left, however the amount of queries is the same (~10k). | 15:53:56 | |
Yeah, I’ve been debugging those SQL related weirdnesses too, but haven’t had free time to continue recently. Another big problem with sync seems to be huge parameter sets given to some queries. Like multi megabyte list to ANY( … ) for example. | 15:59:24 | |
On 0.3.0 at least there is only 1/3 of the number of queries, but it still takes >5sec for a single room. So I assume it's really the number of invitees that causes the delay. I've gone back until 0.1.0, which is again slightly faster than 0.2.0 (~4.5sec). I'll check the number of queries again and compare it to a room creation without any invitees/only one invitee | 16:39:00 | |
In reply to @jassu:kapsi.fi: we should be chunking them if we aren't already. In practice, having such large param sets isn't really an issue insofar that the alternative isn't really much better and is significantly slower | 18:18:22 | |
ideally we wouldn't be needing to query such vast amounts of data in the first place | 18:18:40 | |
Huh? How so slower? That kind of stuff should be picked in by the SQL query, not fetching it from the DB by another query, parse it in the Dendrite and then using it as a multi megabyte parameter… -_- | 18:31:32 | |
I don’t have full picture of what exactly would be needed to serve the full whatever the client expects, but I got it down to a minute on my raspberryPi, when I actually combined two of the heaviest queries into one and returned the whole set in one go from the DB in a single optimized query. Compared to not finishing the other one during the night before I started fiddling with that thing. | 18:35:07 | |
It’s always considerably faster to tell psql exactly what is wanted, and then building the query to get that and exactly that out of there. Add indexes as required, of course. | 18:37:04 | |
So, from literally not finishing the query in 7+ hours down to ~a minute. When going from Dendrite’s exact query on my live db to the one that actually got considerably more of the required set covered. 🤷🏼‍♀️ | 18:39:10 | |
A relational database is not a KV store. | 18:40:19 | |
The problem with the SQL backend in Dendrite is more the opinionated style, i.e. queries for a table are generally grouped together in the same file, generally don't reach into other tables, don't join unless absolutely essential | 19:24:52 | |
Those barriers could be broken down and you can push more work on the SQL engine | 19:25:04 | |
... but maybe you make more headaches for the devs in doing so :D | 19:25:16 | |
There are a lot of unnecessary database round trips though, and /sync is just a monstrosity that vomits out huge quantities of data anyway | 19:25:44 | |
21:37:55 |
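
A minimal sketch of the round-trip point from the discussion above: per-row lookups, chunked `IN (...)` parameter lists, and a single joined query. Everything here is an assumption for illustration. The `invites` and `profiles` tables, the function names, and the chunk size are made up and are not Dendrite's schema or code, and SQLite stands in for Postgres so the example runs on its own. It only counts round trips; it says nothing about real timings.

```python
import sqlite3

# Hypothetical schema for illustration only; not Dendrite's real tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE invites (room_id TEXT, user_id TEXT);
    CREATE TABLE profiles (user_id TEXT PRIMARY KEY, display_name TEXT);
""")
ROOM = "!room:example.org"
users = [f"@user{i}:example.org" for i in range(30)]
conn.executemany("INSERT INTO invites VALUES (?, ?)", [(ROOM, u) for u in users])
conn.executemany(
    "INSERT INTO profiles VALUES (?, ?)",
    [(u, u[1:].split(":")[0]) for u in users],
)


def invitees_of(room_id):
    """Return the invitee IDs of a room (one round trip)."""
    rows = conn.execute(
        "SELECT user_id FROM invites WHERE room_id = ? ORDER BY user_id",
        (room_id,),
    )
    return [r[0] for r in rows]


def names_per_row(room_id):
    """KV-store style: one query for the IDs, then one query per invitee."""
    invitees = invitees_of(room_id)
    round_trips = 1
    names = []
    for user_id in invitees:
        row = conn.execute(
            "SELECT display_name FROM profiles WHERE user_id = ?", (user_id,)
        ).fetchone()
        round_trips += 1
        names.append(row[0])
    return names, round_trips


def names_chunked(room_id, chunk_size=10):
    """Fetch the IDs, then look them up in fixed-size IN (...) chunks."""
    invitees = invitees_of(room_id)
    round_trips = 1
    found = {}
    for start in range(0, len(invitees), chunk_size):
        chunk = invitees[start:start + chunk_size]
        placeholders = ",".join("?" * len(chunk))
        rows = conn.execute(
            "SELECT user_id, display_name FROM profiles "
            f"WHERE user_id IN ({placeholders})",
            chunk,
        )
        round_trips += 1
        found.update(rows.fetchall())
    return [found[u] for u in invitees], round_trips


def names_joined(room_id):
    """Push the lookup into the SQL engine: a single joined query."""
    rows = conn.execute(
        "SELECT p.display_name FROM invites i "
        "JOIN profiles p ON p.user_id = i.user_id "
        "WHERE i.room_id = ? ORDER BY i.user_id",
        (room_id,),
    )
    return [r[0] for r in rows], 1
```

With the 30 invitees inserted above, the three variants return the same names using 31, 4, and 1 round trips respectively. Chunking bounds the size of each parameter list, while the join avoids shipping the IDs back to the database at all.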