Searching yarn

Twts matching #ascii
In-reply-to » Hmmmm, I somehow run into an encoding problem where my inserted data end up mangled in the database. But, both SQLite and Go use UTF-8. What's happening here? :-?

@movq@www.uninformativ.de Non-ASCII characters were broken, like U+2028, the degree sign (°), etc.
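These particular symptoms are what you get when UTF-8 bytes are decoded as Windows-1252: each byte of a multi-byte sequence becomes its own character, so ° turns into Â° and U+2028 into â€¨. A minimal, stdlib-only sketch of that mis-decoding follows. `decodeWindows1252` is a hypothetical helper written for illustration; real code would use `golang.org/x/text/encoding/charmap` instead. Passing the five bytes that Windows-1252 leaves undefined through unchanged is an assumption of this sketch.

```go
package main

import (
	"fmt"
	"strings"
)

// win1252High maps bytes 0x80-0x9F to their Windows-1252 code points.
// The undefined bytes (0x81, 0x8D, 0x8F, 0x90, 0x9D) are passed through
// as the code point of the same value; that choice is an assumption here.
var win1252High = [32]rune{
	0x20AC, 0x0081, 0x201A, 0x0192, 0x201E, 0x2026, 0x2020, 0x2021,
	0x02C6, 0x2030, 0x0160, 0x2039, 0x0152, 0x008D, 0x017D, 0x008F,
	0x0090, 0x2018, 0x2019, 0x201C, 0x201D, 0x2022, 0x2013, 0x2014,
	0x02DC, 0x2122, 0x0161, 0x203A, 0x0153, 0x009D, 0x017E, 0x0178,
}

// decodeWindows1252 (hypothetical) interprets every byte as one
// Windows-1252 character. Bytes below 0x80 and from 0xA0 upward map
// to the code point with the same value, as in Latin-1.
func decodeWindows1252(b []byte) string {
	var sb strings.Builder
	for _, c := range b {
		if c >= 0x80 && c <= 0x9F {
			sb.WriteRune(win1252High[c-0x80])
		} else {
			sb.WriteRune(rune(c))
		}
	}
	return sb.String()
}

func main() {
	for _, s := range []string{"25 °C", "a\u2028b"} {
		// UTF-8 input, mis-decoded as Windows-1252, comes out as mojibake.
		fmt.Printf("%q -> %q\n", s, decodeWindows1252([]byte(s)))
	}
}
```

Pure ASCII passes through untouched, which is exactly why the breakage only shows up on the non-ASCII characters.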

Turns out I used a silly library to detect the encoding and transcode to UTF-8 if needed. When there is no Content-Type header, as with local files, it looks at the first 1024 bytes. Since it only saw ASCII in that region, the damn thing assumed the data was Windows-1252 (which kinda makes sense for web pages):

// TODO: change default depending on user's locale?
return charmap.Windows1252, "windows-1252", false

https://cs.opensource.google/go/x/net/+/master:html/charset/charset.go;l=102

This default is hardcoded and cannot be changed.

Trying to be smart by adding automatic support for other encodings turned out to be a bad move on my end. At least I can reduce my dependency list again. :-)

I now just reject anything whose Content-Type explicitly specifies a media type other than text/plain, or a charset other than utf-8 (compared case-insensitively; the charset parameter itself is optional). Otherwise I assume it's in UTF-8 (just like the twtxt file format specification mandates) and hope for the best.
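A minimal sketch of that policy using only the standard library's `mime.ParseMediaType`. `acceptContentType` is a hypothetical name, and rejecting a malformed header outright is an assumption; the post doesn't say how that case is handled.

```go
package main

import (
	"fmt"
	"mime"
	"strings"
)

// acceptContentType (hypothetical) implements the policy described above:
// no header at all means "assume UTF-8"; otherwise the media type must be
// text/plain and a charset parameter, if present, must be utf-8.
func acceptContentType(header string) bool {
	if strings.TrimSpace(header) == "" {
		return true // e.g. local files: assume UTF-8, as the twtxt spec mandates
	}
	mediaType, params, err := mime.ParseMediaType(header)
	if err != nil {
		return false // assumption: a malformed header is rejected
	}
	// ParseMediaType lower-cases the media type and parameter names,
	// but preserves the case of parameter values.
	if mediaType != "text/plain" {
		return false
	}
	if cs, ok := params["charset"]; ok && !strings.EqualFold(cs, "utf-8") {
		return false
	}
	return true
}

func main() {
	for _, h := range []string{
		"",
		"text/plain; charset=UTF-8",
		"text/html",
		"text/plain; charset=iso-8859-1",
	} {
		fmt.Printf("%q accepted: %v\n", h, acceptContentType(h))
	}
}
```

Everything that passes this check is then decoded as UTF-8 with no sniffing at all, which is the whole point: there is no heuristic left to guess wrong.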

In-reply-to » Check out the Nex Protocol. It's designed to be even simpler than Gemini and Gopher. What do you think? Could be great to host a twtxt feed on.

@shreyan@twtxt.net The only problem is that there is no such thing as “plain text”. Is it ASCII? UTF-8? DOS or UNIX line endings? Something else?

.txt or “plain text” are ambiguous terms, I’m afraid. 🫤
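To make the ambiguity concrete, a small sketch that answers two of those questions for a given byte slice. `describeText` is hypothetical. Note its limits: every ASCII file is also valid UTF-8, it can only say "not UTF-8" rather than name the actual legacy encoding, and it ignores old CR-only Mac line endings.

```go
package main

import (
	"bytes"
	"fmt"
	"unicode/utf8"
)

// describeText (hypothetical) reports a plausible encoding and the
// line-ending style of b. It cannot distinguish, say, Latin-1 from
// Windows-1252; anything that isn't valid UTF-8 is just "not UTF-8".
func describeText(b []byte) (encoding, lineEndings string) {
	switch {
	case !utf8.Valid(b):
		encoding = "not UTF-8"
	case bytes.IndexFunc(b, func(r rune) bool { return r >= utf8.RuneSelf }) >= 0:
		encoding = "UTF-8"
	default:
		encoding = "ASCII" // which is also valid UTF-8
	}
	crlf := bytes.Count(b, []byte("\r\n"))
	lf := bytes.Count(b, []byte("\n")) - crlf // bare LFs only
	switch {
	case crlf > 0 && lf > 0:
		lineEndings = "mixed"
	case crlf > 0:
		lineEndings = "DOS (CRLF)"
	case lf > 0:
		lineEndings = "UNIX (LF)"
	default:
		lineEndings = "none"
	}
	return encoding, lineEndings
}

func main() {
	for _, b := range [][]byte{
		[]byte("hello\nworld\n"),
		[]byte("grüße\r\n"),
		{'g', 'r', 0xfc, '\n'}, // "grü" in Latin-1
	} {
		enc, eol := describeText(b)
		fmt.Printf("%q: %s, %s\n", b, enc, eol)
	}
}
```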

Other than that, it looks neat and interesting. 😅

In-reply-to » My kid just uncovered a bug in a program I wrote by grabbing my laptop and smacking the keyboard a bunch. Biological input fuzzing; a real-life chaos monkey.

“ç”, I think. Any character outside 7-bit ASCII would’ve done it, though.
