Sender | Message | Time |
---|---|---|
14 Jan 2025 | ||
ngn | it accepts three params: void* pointer to memory, length, and itemsize so it's universal - works for 64bit, 32bit, 8bit | 22:22:26 |
ngn | it's from my private library for PE | 22:24:42 |
dzaima | ↰ ngn it's somewhere here with some levels of macro abstractions (incl. calling into Singeli for the cumulative sum steps), not gonna be able to easily extract it out (written by Marshall, not me) | 22:24:58 |
ngn | here is mine:
i'll add comment later | 22:27:44 |
ngn |
| 22:54:47 |
silas | why'd you not do G*g=r ? | 23:27:41 |
ngn | i started with separate code for srtU() and srtW() , some things were different, there are some traces left.. | 23:31:49 |
ngn | i'll change it, thanks | 23:33:04 |
ngn | ah, i remembered! :) | 23:34:27 |
ngn | b depends on the sizeof the type of G (because i wanted to experiment with a 16bit g too)c 's size depends on b and i wanted to initialize c with ={}; instead of a memset, so it had to move g=r after c (otherwise gcc refuses to compile the ={}; but it still didnt work and i left it like this.. | 23:39:37 |
ngn | * b depends on the sizeof the type of g (because i wanted to experiment with a 16bit g too)c 's size depends on b and i wanted to initialize c with ={}; instead of a memset, so it had to move g=r after c (otherwise gcc refuses to compile the ={}; but it still didnt work and i left it like this.. | 23:39:52 |
silas | cool, feel I'll need to read more to get the logic but looks interesting | 23:51:22 |
ngn | the logic of radix sort or the logic of my implementation of it? | 23:56:21 |
ngn | radix sort is simple: sort by least significant bytes, then by the second-least significant, etc | 23:57:20 |
ngn | each pass is a sort by counting | 23:57:50 |
silas | your implementation - don't get the need for multi-dimension counter array, although is that for the number of rounds? | 23:58:40 |
ngn | it's because i //go through g only once and do the counting for all of the w passes | 23:59:10 |
ngn | in other words, c[i] are the counters for the i th pass | 23:59:38 |
silas | right, thought so - thanks | 23:59:56 |
15 Jan 2025 | ||
ngn | it's a slightly better memory access pattern - access g[0], g[1], g[2].. instead of g[0], g[8], g[16].. and then g[1], g[9], g[17], and then.. | 00:01:00 |
silas | indeed | 00:02:06 |
ngn | new improved version:
| 00:05:41 |
ngn | explicit inline to increase chances of getting proper speed in other environmentssupport for odd w surrendered to the thought that 8bit g works best (not 16bit) and hardcoded 256 (and 257) | 00:09:00 |
ngn | * explicit inline to increase chances of getting proper speed in other environmentssupport for odd w surrendered to the thought that 8bit g works best (not 16bit) and hardcoded 256 (and 257)rounded sumsU 's args, it may simplify its vectorization "one day" | 00:10:04 |
ngn | it looks like Marshall doesnt want compilers to steal his job :) - he has separate copies for all possible types (and even those are macros), manual loop unrolling.. | 00:29:42 |
ngn | btw, what's the difference between "radix sort i8" and "counting sort i8"? | 00:31:05 |
dzaima | there is a good amount of variation between the copies - reusing buffers, handling the last byte separately (cause signed ints; includes having said buffer be offset). Perhaps you could move all that in loops but I'm not sure if the result would be much clearer | 00:38:48 |
dzaima | also there's a radix sort impl in Singeli here that's less duplicatey (along with a bunch of other sort algorithms in the same project) | 00:43:35 |
ngn | too many languages | 00:58:53 |
ngn | dzaima: do you have a fast plain c version of +\ by any chance? | 01:01:01 |