14 Jan 2025 |
dzaima | the sort-as-numbers version is still 140x faster than the hashmap thing | 21:28:44 |
dzaima | * the sort-as-numbers version is still ~100x faster than the hashmap thing for both length 1000 and 1e6 | 21:29:36 |
dzaima | (_time_ being from https://gist.github.com/dzaima/ec7bf874535a7186a9321e172536c384) | 21:31:22 |
panadestein | And why is that? I am surprised that sorting for a million elements is faster | 21:32:50 |
dzaima | integer sorting here goes to counting sort, so basically the /⁼ version with extra steps; character sorting is lame and falls back to the generic timsort | 21:32:53 |
dzaima | moderately surprised that for 1e6 the hashmap scalar thing beats timsort given how slow CBQNs scalar code execution is. I guess the logarithm does add up over time, and timsort is already non-trivial scalar code (albeit executed at C speeds, not BQN speeds) | 21:35:51 |
dzaima | of course, character sorting should be using the counting sort (/ whatever other fancy sort algorithms there are) but doesn't currently | 21:36:55 |
| twobular | 21:38:57 |
Kamila | both are linear time and they actually degenerate into the same code | 21:49:31 |
Kamila | first of all, in approach 1 you don't need a proper hashmap, you can just use an array that maps ASCII -> count | 21:49:49 |
rubenverg | sort is linear time? | 21:49:57 |
Kamila | second of all, sorting on a finite alphabet is linear time thanks to an algorithm called radix sort that you can find on wikipedia | 21:50:13 |
Kamila | but because the range of values is small, you use an algorithm called counting sort | 21:50:22 |
Kamila | and counting sort is precisely the same thing as the first approach | 21:50:33 |
Kamila | a sufficiently intelligent compiler would optimise both of these approaches into the same code, | 21:50:56 |
Kamila | * a sufficiently intelligent compiler would optimise both of these approaches into the same code | 21:50:58 |
Kamila | see e.g. https://palaiologos.rocks/posts/radix-float/ for an exposition of an O(n) sort for 32-bit float s | 21:51:18 |
Kamila | the linearithmic bound is asymptotically tight for comparison-based sorts and radix sorts are not comparison-based. | 21:51:37 |
Kamila | this is not just a theoretical improvement: Dyalog has linear-time key. | 21:52:38 |
.casenc | Why assume the string is ascii? | 21:52:45 |
Kamila | because it's the most logical thing to do; if it's not, then you use sparse arrays for counting sort and histogram, both of which are again hashmaps and degenerate to the same code. | 21:53:40 |
rubenverg | this is as much an exposition as my TinyAPL articles are — a bunch of code intercalated by a couple useless lines of text | 21:53:41 |
rubenverg | I guess now I know how people who don't know Haskell feel when they read my posts | 21:53:59 |
.casenc | How is it most logical? | 21:54:13 |
Kamila | it's perfectly readable, educational and easy to follow | 21:54:20 |
Kamila | i also think that my implementation of heap sort is stupid sexy. | 21:55:42 |
brian_e | what do you mean when you say "perfectly readable"? | 21:56:38 |
ngn | Kamila:
"Assuming non-degenerate data (i.e. no NaNs, infinities, etc.) we can sort IEEE754 floats via naive memory comparisons."
are negative floats degenerate?
| 21:58:20 |
dzaima | you xor with 0x80000000 and that problem goes away | 22:00:01 |
Olivia | got to sign extend | 22:00:20 |