15 Jan 2025 |
.casenc | Sure | 08:56:26 |
.casenc | This is a programming skill issue then 🤷‍♀️ | 09:02:13 |
.casenc | (I've never heard of locale-dependence before, that is very interesting) | 09:04:28 |
.casenc | * (I've never heard of locale-dependence like that before, that is very interesting) | 09:04:40 |
loke | In reply to .casenc ("(I've never heard of locale-dependence like that before, that is very interesting)"): The Aalborg and Soeren examples are a bit bogus; the IJ letter in Dutch would be a bit more valid, though. | 09:06:05 |
Kamila | unicode compliance was the bane of my existence a few weeks ago. | 09:06:15 |
Kamila | another awful part is that unicode normalisation algorithms keep getting extended for some reason. | 09:06:32 |
Kamila | so e.g. nowadays you're expected to properly lay out hangul joiners and stuff in a string. | 09:06:47 |
.casenc | "unicode too hard" does not sound like a good reason to assume ascii-ness to me | 09:07:20 |
.casenc | but i've never had to program myself mechanisms to deal with it at the byte-level | 09:07:42 |
.casenc | * then again, i've never had to program myself mechanisms to deal with it at the byte-level | 09:07:51 |
loke | The real problem (and I think what Kamila is getting at) of the palindrome check thing is that "reverse a string" is very hard. In fact, it's practically impossible. | 09:08:15 |
loke | Because the concept of reversing a string is not well defined. How do you do that in Arabic? | 09:08:49 |
loke | (Arabic has different letter forms depending on where the character is in a word) | 09:09:24 |
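A small illustration of why "reverse a string" is under-specified, as a sketch only: the helper names below are made up for this example, and the only library call is Python's standard `unicodedata.normalize`. It shows that reversing code points moves a combining mark onto the wrong letter, and that normalising to NFC first only helps when a precomposed character exists. It does not address Arabic shaping or other script-specific behaviour.

```python
import unicodedata

def reverse_code_points(s: str) -> str:
    """Naive reversal: reverses the sequence of code points."""
    return s[::-1]

def reverse_after_nfc(s: str) -> str:
    """Compose to NFC first, then reverse code points.

    This only helps when a precomposed form exists (e + U+0308 -> U+00EB).
    """
    return unicodedata.normalize("NFC", s)[::-1]

# "noël" spelled with a plain "e" followed by COMBINING DIAERESIS (U+0308).
word = "noe\u0308l"

# The diaeresis now follows "l", so it renders on the wrong letter.
print(ascii(reverse_code_points(word)))  # 'l\u0308eon'

# After NFC the "ë" is a single code point and survives the reversal.
print(ascii(reverse_after_nfc(word)))    # 'l\xebon'
```

Both results are "a reversed string" under some definition, which is the point being made: the answer depends on what unit is being reversed.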
loke | So even though I agree with the most recent message from .casenc, I'd also argue that the palindrome check problem is such a specific issue, one that doesn't make sense outside some very well-defined restrictions, that one might just as well define the problem to only cover ASCII and be done with it. :-) | 09:11:06 |
.casenc | I agree | 09:11:40 |
loke | In reply to Kamila ("another awful part is that unicode normalisation algorithms keep getting extended for some reason."): Did you implement the Unicode normalisation algorithms? Usually one just calls into ICU or something. | 09:12:40 |
Kamila | my algorithm easily extends to other fixed-width types | 09:13:43 |
Kamila | i saw the original remark as a nitpick and now for my own amusement i drowned us all in nitpicks and unicode nonsense. | 09:13:58 |
.casenc | It was not a nitpick it was a proper concern | 09:14:23 |
Kamila | gcc/clang do it by themselves for example | 09:14:24 |
Kamila | back when i didn't know how bad the situation with unicode normalisation is, i was very surprised to see mentions of the korean alphabet in the preprocessor source code. | 09:14:58 |
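For context on why Hangul tends to show up in normalisation code: Unicode composes and decomposes Hangul syllables arithmetically rather than through lookup tables, so an implementation usually carries a few Hangul constants. The sketch below is an assumption about what such code looks like in general, not a copy of anything in gcc or clang. `compose_hangul` is a hypothetical helper, and it does no range checking.

```python
import unicodedata

# Constants from the Unicode Hangul syllable composition arithmetic.
S_BASE = 0xAC00   # first precomposed syllable
L_BASE = 0x1100   # first leading consonant (choseong)
V_BASE = 0x1161   # first vowel (jungseong)
T_BASE = 0x11A7   # one before the first trailing consonant (jongseong)
V_COUNT = 21
T_COUNT = 28

def compose_hangul(lead: str, vowel: str, trail: str = "") -> str:
    """Compose conjoining jamo into one syllable (hypothetical helper).

    Assumes the arguments are valid modern jamo; no validation is done.
    """
    l_index = ord(lead) - L_BASE
    v_index = ord(vowel) - V_BASE
    t_index = ord(trail) - T_BASE if trail else 0
    return chr(S_BASE + (l_index * V_COUNT + v_index) * T_COUNT + t_index)

jamo = "\u1112\u1161\u11ab"          # three conjoining jamo
syllable = compose_hangul(*jamo)      # one precomposed syllable, U+D55C

# The standard library's NFC normalisation gives the same result.
print(syllable == unicodedata.normalize("NFC", jamo))  # True
```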
loke | Annoying. The fewer implementations there are, the better. Broken Unicode implementations are very annoying. And these are very easy to get wrong. | 09:15:10 |
Kamila | for once you can just... not implement unicode too.. | 09:15:23 |
.casenc | My language(s) are European, use the Latin alphabet, and I think most of the people in my class would give you an Index Error if you tried to sort their names | 09:15:34 |
Kamila | if you scan a utf-8 sequence of bytes then your palindrome check is correct in most cases | 09:15:38 |
Kamila | because then the multibyte sequences contribute individually to the histogram and you never have false positives | 09:15:51 |
Kamila | * because then the multibyte sequences contribute individually to the histogram and you never have false negatives | 09:16:00 |
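A sketch for experimenting with the byte-level idea. The original problem statement and Kamila's algorithm are not shown in this log, so this assumes the check is the usual histogram test for "can this string be rearranged into a palindrome" (at most one symbol with an odd count). All function names here are invented for the example.

```python
from collections import Counter

def can_form_palindrome(items) -> bool:
    """True if at most one distinct item occurs an odd number of times."""
    odd = sum(1 for count in Counter(items).values() if count % 2)
    return odd <= 1

def check_code_points(s: str) -> bool:
    # Histogram over Unicode code points.
    return can_form_palindrome(s)

def check_utf8_bytes(s: str) -> bool:
    # Histogram over the individual bytes of the UTF-8 encoding.
    return can_form_palindrome(s.encode("utf-8"))

# ASCII input: one byte per code point, so both checks always agree.
print(check_code_points("racecar"), check_utf8_bytes("racecar"))  # True True

# "a", U+00E9, "a": the lone middle character is two bytes (C3 A9),
# so the byte histogram sees two odd counts.
s1 = "a\u00e9a"
print(check_code_points(s1), check_utf8_bytes(s1))  # True False

# Four distinct code points whose bytes (C3 A9, C4 A9, C3 A8, C4 A8)
# happen to pair up at the byte level.
s2 = "\u00e9\u0129\u00e8\u0128"
print(check_code_points(s2), check_utf8_bytes(s2))  # False True
```

Under this assumed definition the two histograms are identical for ASCII and can disagree for multibyte input, which is where the "most cases" qualifier matters.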
Kamila | latvian or finnish | 09:16:13 |