compressed_subject(msg_singlelined)
be configurable, so only a certain number of characters get displayed, ending on ellipses? Right now the entire twtxt is crammed into the Subject:
. This request aims to make twtxts display on mutt
/neomutt
, etc. more like emails do.
I mean, really, it couldnāt get any better. I love it!
@eldersnake@we.loveprivacy.club I wanted to ask you, are you running Headscale and WireGuard on the same VPS? I want to test Headscale, but currently run a small container with WireGuard, and I wonder if I need to stop (and eventually get rid of) the container to get Headscale going. Did you use the provided .deb
to install Headscale, or some other method?
Speaking of AI tech (sorry!); Just came across this really cool tool built by some engineers at Google⢠(currently completely free to use without any signup) called NotebookLM š Looks really good for summarizing and talking to document š
@eldersnake@we.loveprivacy.club there has to be less reliance on a single point of failure. It is not so much about creating jobs in the US (which come with it, anyway), but about the ability to produce whatās needed at home too. Whatās the trade off? Is it going to be a little bit more expensive to manufacture, perhaps?
@quark@ferengi.one It does not. That is why Iām advocating for not using hashes for treads, but a simpler link-back scheme.
the stem matching is the same as how GIT does its branch hashes. i think you can stem it down to 2 or 3 sha bytes.
if a client sees someone in a yarn using a byte longer hash it can lengthen to match since it can assume that maybe the other client has a collision that it doesnt know about.
@prologic@twtxt.net Wikipedia claims sha1 is vulnerable to a āchosen-prefix attackā, which I gather means I can write any two twts I like, and then cause them to have the exact same sha1 hash by appending something. I guess a twt ending in random junk might look suspcious, but perhaps the junk could be worked into an image URL like
. If thatās not possible now maybe it will be later.git only uses sha1 because theyāre stuck with it: migrating is very hard. There was an effort to move git to sha256 but I donāt know its status. I think there is progress being made with Game Of Trees, a git clone that uses the same on-disk format.
I canāt imagine any benefit to using sha1, except that maybe some very old software might support sha1 but not sha256.
@david@collantes.us āHello backā from the other corner of the world! š«”
Alright. My first mentionsāwhich were picked not so randomly, LOLāare @prologic@twtxt.net, @lyse@lyse.isobeef.org, and @movq@www.uninformativ.de. I am also posting my first image too, which you see below. Thatās my neighbourhood, in a āwinterā day. Hopefully @prologic@twtxt.net will add my domain to his allowed list, so that the image (and any other further) renders.
@movq@www.uninformativ.de Agreed that hashes have a benefit. I came up with a similar example where when I twted about an 11-character hash collision. Perhaps hashes could be made optional somehow. Like, you could use the āreplytoā idea and then additionally put a hash somewhere if you want to lock in which version of the twt you are replying to.
There is nothing wrong with how we currently run a diff to see what has been removed. if i build a merkle tree off all the twt hashes in a feed i can use that to verify a twt should be in a feed or not. and gossip that to my peers.
isnāt the benefit of blake2b that it is a more efficient algo than sha1 and has the same or similar entropy to sha3? i thought we had partially solved this with some type of expanding hash size? additionally we could increase bit density by using base36 or base64/url-safeā¦
Iām not advocating in either direction, btw. I havenāt made up my mind yet. š Just braindumping here.
The (replyto:ā¦)
proposal is definitely more in the spirit of twtxt, Iād say. Itās much simpler, anyone can use it even with the simplest tools, no need for any client code. That is certainly a great property, if you ask me, and itās things like that that brought me to twtxt in the first place.
Iād also say that in our tiny little community, message integrity simply doesnāt matter. Signed feeds donāt matter. I signed my feed for a while using GPG, someone else did the same, but in the end, nobody cares. The community is so tiny, thereās enough āimplicit trustā or whatever you want to call it.
If twtxt/Yarn was to grow bigger, then this would become a concern again. But even Mastodon allows editing, so how much of a problem can it really be? š
I do have to āadmitā, though, that hashes feel better. It feels good to know that we can clearly identify a certain twt. It feels more correct and stable.
Hm.
I suspect that the (replyto:ā¦)
proposal would work just as well in practice.
Hey, @movq@www.uninformativ.de, a tiny thing to add to jenny
, a -v
switch. That way when you twtxt āThatās an older format that was used before jenny version v23.04ā, I can go and run jenny -v
, and āduh!ā myself on the way to a git pull
. :-D
@movq@www.uninformativ.de to paraphrase US Presidents speech on each State of the Union, āthe State of the Jenny is strong!ā :-D As for the potential upcoming changes, there has to be a knowledgeable head honcho that will agglomerate and coalesce, and guide onto the direction that will be taken. All that with the strong input from the developers that will be implementing the changes, and a lesser (but not less valuable) input from users.
Thereās a simple reason all the current hashes end in a or q: the hash is 256 bits, the base32 encoding chops that into groups of 5 bits, and 256 isnāt divisible by 5. The last character of the base32 encoding just has that left-over single bit (256 mod 5 = 1).
So I agree with #3 below, but do you have a source for #1, #2 or #4? I would expect any lack of variability in any part of a hash functionās output would make it more vulnerable to attacks, so designers of hash functions would want to make the whole output vary as much as possible.
Other than the divisible-by-5 thing, my current intuition is it doesnāt matter what part you take.
Hash Structure: Hashes are typically designed so that their outputs have specific statistical properties. The first few characters often have more entropy or variability, meaning they are less likely to have patterns. The last characters may not maintain this randomness, especially if the encoding method has a tendency to produce less varied endings.
Collision Resistance: When using hashes, the goal is to minimize the risk of collisions (different inputs producing the same output). By using the first few characters, you leverage the full distribution of the hash. The last characters may not distribute in the same way, potentially increasing the likelihood of collisions.
Encoding Characteristics: Base32 encoding has a specific structure and padding that might influence the last characters more than the first. If the data being hashed is similar, the last characters may be more similar across different hashes.
Use Cases: In many applications (like generating unique identifiers), the beginning of the hash is often the most informative and varied. Relying on the end might reduce the uniqueness of generated identifiers, especially if a prefix has a specific context or meaning.
@prologic@twtxt.net I just realised the jenny
also does what I want, as of latest commit. Simply use jenny --debug-feed <feed url>
, and it will do what I wanted too!
I came across this Gallery Theme for Hugo, and @lyse@lyse.isobeef.org immediately came to mind. I think it would be a very fitting theme to use for all your photos, Lyse!
Taking the last n characters of a base32 encoded hash instead of the first n can be problematic for several reasons:
Hash Structure: Hashes are typically designed so that their outputs have specific statistical properties. The first few characters often have more entropy or variability, meaning they are less likely to have patterns. The last characters may not maintain this randomness, especially if the encoding method has a tendency to produce less varied endings.
Collision Resistance: When using hashes, the goal is to minimize the risk of collisions (different inputs producing the same output). By using the first few characters, you leverage the full distribution of the hash. The last characters may not distribute in the same way, potentially increasing the likelihood of collisions.
Encoding Characteristics: Base32 encoding has a specific structure and padding that might influence the last characters more than the first. If the data being hashed is similar, the last characters may be more similar across different hashes.
Use Cases: In many applications (like generating unique identifiers), the beginning of the hash is often the most informative and varied. Relying on the end might reduce the uniqueness of generated identifiers, especially if a prefix has a specific context or meaning.
In summary, using the first n characters generally preserves the intended randomness and collision resistance of the hash, making it a safer choice in most cases.
@quark@ferengi.one Do you mean something like this?
$ ./yarnc debug ~/Public/twtxt.txt | tail -n 1
kp4zitq 2024-09-08T02:08:45Z (#wsdbfna) @<aelaraji https://aelaraji.com/twtxt.txt> My work has this thing called "compressed work", where you can **buy** extra time off (_as much as 4 additional weeks_) per year. It comes out of your pay though, so it's not exactly a 4-day work week but it could be useful, just haven't tired it yet as I'm not entirely sure how it'll affect my net pay
@prologic@twtxt.net I saw those, yes. I tried using yarnc
, and it would work for a simple twtxt. Now, for a more convoluted one it truly becomes a nightmare using that tool for the job. I know there are talks about changing this hash, so this might be a moot point right now, but it would be nice to have a tool that:
- Would calculate the hash of a twtxt in a file.
- Would calculate all hashes on a
twtxt.txt
(local and remote).
Again, something lovely to have after any looming changes occur.
Could someone knowledgable reply with the steps a grandpa will take to calculate the hash of a twtxt from the CLI, using out-of-the-box tools? I swear I read about it somewhere, but canāt find it.
@aelaraji@aelaraji.com this is the little script I am using on my publish_command
:
#!/usr/bin/env bash
twtxt2html -t "Quark's twtxt feed" /var/www/sites/ferengi.one/twtxt.txt > /var/www/sites/ferengi.one/index.html
I named it twtxtit
. :-)
@sorenpeter@darch.dk I like this idea. Just for fun, Iām using a variant in this twt. (Also because Iām curious how it non-hash subjects appear in jenny and yarn.)
URLs can contain commas so I suggest a different character to separate the url from the date. Is this twt Iāve used space (also after āreplytoā, for symmetry).
I think this solves:
- Changing feed identities: although @mckinley@twtxt.net points out URLs can change, I think this syntax should be okay as long as the feed at that URL can be fetched, and as long as the current canonical URL for the feed lists this one as an alternate.
- editing, if you donāt care about message integrity
- finding the root of a thread, if youāre not following the author
An optional hash could be added if message integrity is desired. (E.g. if you donāt trust the feed author not to make a misleading edit.) Other recent suggestions about how to deal with edits and hashes might be applicable then.
People publishing multiple twts per second should include sub-second precision in their timestamps. As you suggested, the timestamp could just be copied verbatim.
Trying to figure out how to use the publish_command
to vomit the HTML into a file, using twtxt2html
.
@movq@www.uninformativ.de Non-ASCII characters were broken. Like U+2028, degrees (°), etc.
Turns out I used a silly library to detect the encoding and transform to UTF-8 if needed. When there is no Content-Type header, like for local files, it looks at the first 1024 bytes. Since it only saw ASCII in that region, the damn thing assumed the data to be in Windows-1252 (which for web pages kinda makes sense):
// TODO: change default depending on user's locale?
return charmap.Windows1252, "windows-1252", false
https://cs.opensource.google/go/x/net/+/master:html/charset/charset.go;l=102
This default is hardcoded and cannot be changed.
Trying to be smart and adding automatic support for other encodings turned out to be a bad move on my end. At least I can reduce my dependency list again. :-)
I now just reject everything that explicitly specifies something different than text/plain
and an optional charset other than utf-8
(ignoring casing). Otherwise I assume itās in UTF-8 (just like the twtxt file format specification mandates) and hope for the best.
-T/--template
in case you need a custom template š
@bender@twtxt.net I should put the template that is used by default as a file in the repo. Look at the source for now and youāll see š
(#hash;#originalHash)
would also work.
Maybe Iām being a bit too purist/minimalistic here. As I said before (in one of the 1372739 posts on this topic ā or maybe I didnāt even send that twt, I donāt remember š ), I never really liked hashes to begin with. They arenāt super hard to implement but they are kind of against the beauty of the original twtxt ā because you need special client support for them. Itās not something that you could write manually in your
twtxt.txt
file. With @sorenpeter@darch.dkās proposal, though, that would be possible.
Tangentially related, I was a bit disappointed to learn that the twt subject extension is now never used except with hashes. Manually-written subjects sounded so beautifully ad-hoc and organic as a way to disambiguate replies. Maybe Iāll try it some time just for fun.
Hmmmm, I somehow run into an encoding problem where my inserted data end up mangled in the database. But, both SQLite and Go use UTF-8. Whatās happening here? :-?
() @falsifian@www.falsifian.org You mean the idea of being able to inline
# url =
changes in your feed?
Yes, that one. But @lyse@lyse.isobeef.org pointed out suffers a compatibility issue, since currently the first listed url is used for hashing, not the last. Unless your feed is in reverse chronological order. Heh, I guess another metadata field could indicate which version to use.
Or maybe url changes could somehow be combined with the archive feeds extension? Could the url metadata field be local to each archive file, so that to switch to a new url all you need to do is archive everything youāve got and start a new file at the new url?
I donāt think itās that likely my feed url will change.
@movq@www.uninformativ.de I did started from scratch, today. I using am commit 6e8ce5afdabd5eac22eae4275407b3bd2a167daf (HEAD -> main, origin/main, origin/HEAD)
, I keep myself up-to-date, LOL. Still, that specific twtxt (o6dsrga
) is no longer.
Since
jenny
canāt fetch archived twtxts
I wiped my entire maildir and re-fetched everything. I did that recently because @aelaraji@aelaraji.com asked me to š , but I guess I also did this back in 2023.
What did you do to make yours work?
jenny does fetch archived feeds during the normal jenny -f
operation. Only when using the recently implemented --fetch-context
, archived feeds are not fetched (yet). That was an oversight and I intend to fix that.
@movq@www.uninformativ.de I figured it will be something like this, yet, you were able to reply just fine, and I wasnāt. Looking at your twtxt.txt
I see this line:
2024-09-16T17:37:14+00:00 (#o6dsrga) @<prologic https://twtxt.net/user/prologic/twtxt.txt>
@<quark https://ferengi.one/twtxt.txt> This is what I get. š¤
Which is using the right hash. Mine, on the other hand, when I replied to the original, old style message (Message-Id: <o6dsrga>
), looks like this:
2024-09-16T16:42:27+00:00 (#o) @<prologic https://twtxt.net/user/prologic/twtxt.txt> this was your first twtxt. Cool! :-P
What did you do to make yours work? I simply went to the oldest @prologic@twtxt.netās entry on my Maildir, and replied to it (jenny
set the reply-to
hash to #o
, even though the Message-Id
is o6dsrga
). Since jenny
canāt fetch archived twtxts, how could I go to re-fetch everything? And, most importantly, would re-fetching fix the Message-Id:
?
@mckinley@twtxt.net Yes, changing domains is be a problem if you tie your identity to an https url. But I also worry about being stuck with a key I canāt rotate. Whatever gets used, it would be nice to be able to rotate identities. I like @lyse@lyse.isobeef.orgās idea for that.
i have been using grapheneos for a long time (maybe 7 years now?) and i would fully reccomend it to anyone who is OK with buying a new pixel device to run it on. the install instructions are really easy to follow and can be executed on any device that supports WebUSB https://grapheneos.org/install/web
Can anyone recommend a decent Android ROM that strips out as much of the spyware as possible? Is GrapheneOS a good option? I need to get a new phone anyway so I donāt mind buying within a supported device list as long as I can get one on the used market for $300-$400 or less.
If anyone could recommend some learning resources for this stuff Iād really appreciate it.
the first version of the minibase flake cdn is online as of today. more details on how this is populated and funded will be coming along soon. currently minibase includes a very small selection of packages, but you can install them using the experimental flakes feature today! https://src.cyb.red/
Hmm⦠I replied to this message:
From: prologic <prologic>
Subject: Hello World! š
Date: Sat, 18 Jul 2020 08:39:52 -0400
Message-Id: <o6dsrga>
X-twtxt-feed-url: https://twtxt.net/user/prologic/twtxt.txt
Hello World! š
And see how the hash shows⦠Is it because that hash isnāt longer used?
(replyto:http://darch.dk/twtxt.txt,2024-09-15T12:06:27Z)
I think I like this a lot. š¤
The problem with using hashes always was that theyāre āone-directionalā: You can construct a hash from URL + timestamp + twt, but you cannot do the inverse. When I see ā, I have no idea what that could possibly refer to.
But of course something like (replyto:http://darch.dk/twtxt.txt,2024-09-15T12:06:27Z)
has all the information you need. This could simplify twt/feed discovery quite a bit, couldnāt it? š¤ That thing that I just implemented ā jenny asking some Yarn pod for some twt hash ā would not be necessary anymore. Clients could easily and automatically fetch complete threads instead of requiring the user to follow all relevant feeds.
Only using the timestamp to identify a twt also solves the edit problem.
It even is better for non-Yarn clients, because you now donāt have to read, understand, and implement a ātwt hash specificationā before you can reply to someone.
The only problem, really, is that (replyto:http://darch.dk/twtxt.txt,2024-09-15T12:06:27Z)
is so long. Clients would have to try harder to hide this. š
More:
Subject: The [tag URI scheme](https://en.wikipedia.org/wiki/Tag_URI_scheme) looks interesting. I like that it human read- and writable. And since we already got the timestamp in the twtxt.txt it would be
somewhat trivial to parse. But there are still the issue with what the name/id should be... Maybe it doesn't have to bee that stick? Instead of using `tag:` as the prefix/protocol, it would more it clear
what we are talking about by using `in-reply-to:` (https://indieweb.org/in-reply-to) or `replyto:` similar to `mailto:` 1. `(reply:sorenpeter@darch.dk,2024-09-15T12:06:27Z)' 2.
`(in-reply-to:darch.dk/twtxt.txt,2024-09-15T12:06:27Z)' 2. `(replyto:http://darch.dk/twtxt.txt,2024-09-15T12:06:27Z)' I know it's longer that 7-11 characters, but it's self-explaining when looking at the
twtxt.txt in the raw, and the cases above can all be caught with this regex: `\([\w-]*reply[\w-]*\:` Is this something that would work?
Subject: The [tag URI scheme](https://en.wikipedia.org/wiki/Tag_URI_scheme) looks interesting. I like that it human read- and writable. And since we already got the timestamp in the twtxt.txt it would be
somewhat trivial to parse. But there are still the issue with what the name/id should be... Maybe it doesn't have to bee that stick? Instead of using `tag:` as the prefix/protocol, it would more it clear
what we are talking about by using `in-reply-to:` (https://indieweb.org/in-reply-to) or `replyto:` similar to `mailto:` 1. `(reply:sorenpeter@darch.dk,2024-09-15T12:06:27Z)` 2.
`(in-reply-to:darch.dk/twtxt.txt,2024-09-15T12:06:27Z)` 3. `(replyto:http://darch.dk/twtxt.txt,2024-09-15T12:06:27Z)` I know it's longer that 7-11 characters, but it's self-explaining when looking at the
twtxt.txt in the raw, and the cases above can all be caught with this regex: `\([\w-]*reply[\w-]*\:` Is this something that would work?
Notice the difference? Soren edited, and broke everything.
@prologic@twtxt.net Itās all Iām using ⦠I have barely touched any other social media since Iāve discovered Twtxt back in April š maybe a little bit of Mastodon and IRC, bluesky even less, but nothing else worth mentioning.
The tag URI scheme looks interesting. I like that it human read- and writable. And since we already got the timestamp in the twtxt.txt it would be somewhat trivial to parse. But there are still the issue with what the name/id should be⦠Maybe it doesnāt have to bee that stick?
Instead of using tag:
as the prefix/protocol, it would more it clear what we are talking about by using in-reply-to:
(https://indieweb.org/in-reply-to) or replyto:
similar to mailto:
(reply:sorenpeter@darch.dk,2024-09-15T12:06:27Z)
(in-reply-to:darch.dk/twtxt.txt,2024-09-15T12:06:27Z)
(replyto:http://darch.dk/twtxt.txt,2024-09-15T12:06:27Z)
I know itās longer that 7-11 characters, but itās self-explaining when looking at the twtxt.txt in the raw, and the cases above can all be caught with this regex: \([\w-]*reply[\w-]*\:
Is this something that would work?
Thank you @aelaraji@aelaraji.com, Iām glad you like it. I use PHP because itās everywhere on cheap hosting and no need for the user to log into a terminal to setup it up. Timeline is not mean to be use locally. For that I think something like twtxt2html is a better fit. (and happy to see you using simple.css on you new log page;)
Crap, I canāt find how and why rdomain is not set using /etc/hostname.if #openbsd
@falsifian@www.falsifian.org TLS wonāt help you if you change your domain name. How will people know if itās really you? Maybe thatās not the biggest problem for something with such low stakes as twtxt, but itās a reasonable concern that could be solved using signatures from an unchanging cryptographic key.
This idea is the basis of Nostr. Notes can be posted to many relays and every note is signed with your private key. It doesnāt matter where you get the note from, your client can verify its authenticity. That way, relays donāt need to be trusted.
@prologic@twtxt.net Brute force. I just hashed a bunch of versions of both tweets until I found a collision.
I mostly just wanted an excuse to write the program. I donāt know how I feel about actually using super-long hashes; could make the twts annoying to read if you prefer to view them untransformed.
@prologic@twtxt.net earlier you suggested extending hashes to 11 characters, but hereās an argument that they should be even longer than that.
Imagine I found this twt one day at https://example.com/twtxt.txt :
2024-09-14T22:00Z Useful backup command: rsync -a ā$HOMEā /mnt/backup
and I responded with ā(#5dgoirqemeq) Thanks for the tip!ā. Then Iāve endorsed the twt, but it could latter get changed to
2024-09-14T22:00Z Useful backup command: rm -rf /some_important_directory
which also has an 11-character base32 hash of 5dgoirqemeq. (Iām using the existing hashing method with https://example.com/twtxt.txt as the feed url, but Iām taking 11 characters instead of 7 from the end of the base32 encoding.)
Thatās what I meant by āspoofingā in an earlier twt.
I donāt know if preventing this sort of attack should be a goal, but if it is, the number of bits in the hash should be at least two times log2(number of attempts we want to defend against), where the ātwo timesā is because of the birthday paradox.
Side note: current hashes always end with āaā or āqā, which is a bit wasteful. Maybe we should take the first N characters of the base32 encoding instead of the last N.
Code I used for the above example: https://fossil.falsifian.org/misc/file?name=src/twt_collision/find_collision.c
I only needed to compute 43394987 hashes to find it.
Theyāre in Section 6:
Receiver should adopt UDP GRO. (Something about saving CPU processing UDP packets; Iām a but fuzzy about it.) And they have suggestions for making GRO more useful for QUIC.
Some other receiver-side suggestions: āsending delayed QUICK ACKsā; āusing recvmsg to read multiple UDF packets in a single system callā.
Use multiple threads when receiving large files.
HTTPS is supposed to do [verification] anyway.
TLS provides verification that nobody is tampering with or snooping on your connection to a server. It doesnāt, for example, verify that a file downloaded from server A is from the same entity as the one from server B.
I was confused by this response for a while, but now I think I understand what youāre getting at. You are pointing out that with signed feeds, I can verify the authenticity of a feed without accessing the original server, whereas with HTTPS I canāt verify a feed unless I download it myself from the origin server. Is that right?
I.e. if the HTTPS origin server is online and I donāt mind taking the time and bandwidth to contact it, then perhaps signed feeds offer no advantage, but if the origin server might not be online, or I want to download a big archive of lots of feeds at once without contacting each server individually, then I need signed feeds.
feed locations [being] URLs gives some flexibility
It does give flexibility, but perhaps we should have made them URIs instead for even more flexibility. Then, you could use a tag URI,
urn:uuid:*
, or a regular old URL if you wanted to. The spec seems to indicate that theurl
tag should be a working URL that clients can use to find a copy of the feed, optionally at multiple locations. Iām not very familiar with IP{F,N}S but if it ensures you own an identifier forever and that identifier points to a current copy of your feed, it could be a great way to fix it on an individual basis without breaking any specs :)
Iām also not very familiar with IPFS or IPNS.
I havenāt been following the other twts about signatures carefully. I just hope whatever you smart people come up with will be backwards-compatible so it still works if Iām too lazy to change how I publish my feed :-)
@sorenpeter@darch.dk !! I freaking love your Timeline ⦠I kind of have an justified PHP phobia š but, Iām definitely thinking about giving it a try!
/ME wondering if itās possible to use it locally just to read and manage my feed at first and then maybe make it publicly accessible later.