yarn

lyse.isobeef.org

↳ In-reply-to » @zvava The problem you now have is that you lose integrity of the message content if you compute the hashes at runtime rather than on the way in. So if your message content or database becomes corrupt in any way, so do your hashes.

@prologic@twtxt.net In my opinion, the integrity isn’t lost. The same input data always results in the same output hash, no matter when you calculate the hashes. It’s true that corrupt database contents yield corrupt hashes, but then you have a much bigger problem than just receiving different hashes. :-D

lyse

lyse.isobeef.org

Mon, Dec 29 3:45AM (8w ago)

↳ In-reply-to » @lyse while caching those is a good idea the problem is baking data that can be calculated into the database instead of some cache, because post hashes are not fixed and change for every post edit. you can always easily look up other twts by hash with a cached lookup table, but now you're not locked into them so supporting hashv2 or other hash variants or any other solution becomes far easier

@zvava@twtxt.net By hashing definition, if you edit your message, it simply becomes a new message. It’s just not the same message anymore. At least from a technical point of view. As a human, personally I disagree, but that’s what I’m stuck with. There’s no reliable way to detect and “correct” for that.

Storing the hash in your database doesn’t prevent you from switching to another hashing implementation later on. As of now, message creation timestamps earlier than some magical point in time use twt hash v1, messages on or after that magical timestamp use twt hash v2. So, a message either has a v1 or a v2 hash, but not both. At least one of them is never meaningful.

Once you “upgrade” your database schema, you can check for stored messages from the future which should have been hashed using v2, but were actually v1-hashed and simply fix them.

If there is ever another addressing scheme, you could reuse the existing hash column if it supersedes the v1/v2 hashes. Otherwise, a new column might be useful, or perhaps no column at all (looking at location-based addressing or whatever it was called). The old v1/v2 hashes are still needed for all past conversation trees.

In my opinion, always recalculating the hashes is a big waste of time and energy. But if it serves you well, then go for it.
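
A minimal sketch of the stored-hash approach with a cut-off date, under several assumptions: the cut-off is the 2026-03-01 00:00:00 UTC value proposed elsewhere in this thread, both hash versions use the same lowercase, unpadded base32 encoding of a 256-bit BLAKE2b digest (the v2 draft may specify something different), and timestamps are normalised to UTC. The names `hash_for` and `fix_stored_hashes` and the row layout are hypothetical.

```python
import base64
import hashlib
from datetime import datetime, timezone

# Assumption: the cut-off proposed in this thread; the agreed value may differ.
HASHV2_CUTOFF = datetime(2026, 3, 1, tzinfo=timezone.utc)


def encoded_digest(url, created, text):
    """Lowercase, unpadded base32 of a 256-bit BLAKE2b digest (assumed for v1 and v2)."""
    # Simplification: the timestamp is normalised to UTC with a "Z" suffix.
    timestamp = created.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    payload = f"{url}\n{timestamp}\n{text}".encode("utf-8")
    digest = hashlib.blake2b(payload, digest_size=32).digest()
    return base64.b32encode(digest).decode("ascii").lower().rstrip("=")


def hash_for(url, created, text):
    """Pick the hash version from the creation timestamp alone."""
    encoded = encoded_digest(url, created, text)
    if created < HASHV2_CUTOFF:
        return encoded[-7:]  # v1: last seven characters
    return encoded[:12]      # v2: first twelve characters


def fix_stored_hashes(rows):
    """Rewrite stored hashes that do not match the expected version; return the count."""
    fixed = 0
    for row in rows:
        expected = hash_for(row["url"], row["created"], row["text"])
        if row["hash"] != expected:
            row["hash"] = expected
            fixed += 1
    return fixed
```

Because the version follows from the timestamp, a single hash column is enough in this sketch, and the fix-up pass only touches rows that were hashed with the wrong version.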

prologic

twtxt.net

Sun, Dec 28 9:24PM (8w ago)

↳ In-reply-to » very good blog post that reminded me why it's taking so long to ship bbycll — previously i had computed the hashes of every post before storing them in the database, after realizing it's a much better idea to compute the hashes during runtime and only store the post content & timestamp i'm now having to rewrite every function that reads & writes data. i hope the reason as to why i lost motivation is obvious — thankfully i caught it early enough so that once i'm done rewriting just those functions i should™ be able to finalize 1.0-rc with little hassle

@zvava@twtxt.net The problem you now have is that you lose integrity of the message content if you compute the hashes at runtime rather than on the way in. So if your message content or database becomes corrupt in any way, so do your hashes.
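
One way to picture the integrity argument: a hash stored at ingest time can later serve as a checksum, while a hash computed only at runtime has nothing to be compared against. The sketch below assumes the v1 scheme is the last seven characters of the lowercase, unpadded base32 encoding of a 256-bit BLAKE2b digest over URL, timestamp and text; `ingest` and `is_intact` are hypothetical helper names.

```python
import base64
import hashlib


def twt_hash_v1(url, timestamp, text):
    """Assumed v1 scheme: last 7 chars of unpadded base32(BLAKE2b-256)."""
    payload = f"{url}\n{timestamp}\n{text}".encode("utf-8")
    digest = hashlib.blake2b(payload, digest_size=32).digest()
    encoded = base64.b32encode(digest).decode("ascii").lower().rstrip("=")
    return encoded[-7:]


def ingest(url, timestamp, text):
    """Hash once on the way in and keep the hash next to the content."""
    return {"url": url, "timestamp": timestamp, "text": text,
            "hash": twt_hash_v1(url, timestamp, text)}


def is_intact(record):
    """Recompute the hash and compare it with the stored one."""
    recomputed = twt_hash_v1(record["url"], record["timestamp"], record["text"])
    return record["hash"] == recomputed
```

With a hash this short the check is only a weak safeguard, so it should be read as an illustration of the argument rather than a recommendation.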

zvava

twtxt.net

Sun, Dec 28 9:14PM (8w ago)

↳ In-reply-to » very good blog post that reminded me why it's taking so long to ship bbycll — previously i had computed the hashes of every post before storing them in the database, after realizing it's a much better idea to compute the hashes during runtime and only store the post content & timestamp i'm now having to rewrite every function that reads & writes data. i hope the reason as to why i lost motivation is obvious — thankfully i caught it early enough so that once i'm done rewriting just those functions i should™ be able to finalize 1.0-rc with little hassle

@lyse@lyse.isobeef.org while caching those is a good idea the problem is baking data that can be calculated into the database instead of some cache, because post hashes are not fixed and change for every post edit. you can always easily look up other twts by hash with a cached lookup table, but now you’re not locked into them so supporting hashv2 or other hash variants or any other solution becomes far easier

lyse

lyse.isobeef.org

Tue, Dec 23 3:00PM (9w ago)

↳ In-reply-to » very good blog post that reminded me why it's taking so long to ship bbycll — previously i had computed the hashes of every post before storing them in the database, after realizing it's a much better idea to compute the hashes during runtime and only store the post content & timestamp i'm now having to rewrite every function that reads & writes data. i hope the reason as to why i lost motivation is obvious — thankfully i caught it early enough so that once i'm done rewriting just those functions i should™ be able to finalize 1.0-rc with little hassle

@zvava@twtxt.net I might misunderstand what you wrote, but only hashing the message once and storing the hash together with the message in the database seems a way better approach to me. It’s fixed and doesn’t change, so there’s no need to recompute it during runtime over and over and over again. You just have it. And you can easily look up other messages by hash.

zvava

twtxt.net

Tue, Dec 23 1:48PM (9w ago)

very good blog post that reminded me why it’s taking so long to ship bbycll — previously i had computed the hashes of every post before storing them in the database, after realizing it’s a much better idea to compute the hashes during runtime and only store the post content & timestamp i’m now having to rewrite every function that reads & writes data. i hope the reason as to why i lost motivation is obvious — thankfully i caught it early enough so that once i’m done rewriting just those functions i should™ be able to finalize 1.0-rc with little hassle

⇒ the cardinal sin of software architecture: the unnecessary distribution, replication, or restructuring of state, both in space and time.

movq

www.uninformativ.de

Sun, Nov 30 7:38AM (12w ago)

↳ In-reply-to » Which actively maintained Yarn/twtxt clients are there at the moment? Client authors raise your hands! 🙋

@lyse@lyse.isobeef.org Damn. That was stupid of me. I should have posted examples using 2026-03-01 as cutoff date. 😂

In my actual test suite, everything uses 2027-01-01 and then I have this, hoping that that’s good enough. 🥴

def test_rollover():
    d = jenny.HASHV2_CUTOFF_DATE
    assert len(jenny.make_twt_hash(URL, d - timedelta(days=7), TEXT)) == 7
    assert len(jenny.make_twt_hash(URL, d - timedelta(seconds=3), TEXT)) == 7
    assert len(jenny.make_twt_hash(URL, d - timedelta(seconds=2), TEXT)) == 7
    assert len(jenny.make_twt_hash(URL, d - timedelta(seconds=1), TEXT)) == 7
    assert len(jenny.make_twt_hash(URL, d, TEXT)) == 12
    assert len(jenny.make_twt_hash(URL, d + timedelta(seconds=1), TEXT)) == 12
    assert len(jenny.make_twt_hash(URL, d + timedelta(seconds=2), TEXT)) == 12
    assert len(jenny.make_twt_hash(URL, d + timedelta(seconds=3), TEXT)) == 12
    assert len(jenny.make_twt_hash(URL, d + timedelta(days=7), TEXT)) == 12

(In other words, I don’t care as long as it’s before 2027-01-01. 😏😅)

shinyoukai

neko.laidback.moe

Thu, Nov 27 8:41PM (13w ago)

The funny thing is, Yarn moving to Twt Hash v2 sounds a tad more optimistic than Git adopting SHA-256.

Git is several years too late, while Yarn is pretty much on time.

movq

www.uninformativ.de

Tue, Nov 25 6:28AM (13w ago)

↳ In-reply-to » Which actively maintained Yarn/twtxt clients are there at the moment? Client authors raise your hands! 🙋

Hm, so regarding the hash change:

https://git.mills.io/yarnsocial/twtxt.dev/pulls/28

How about 2026-03-01 00:00:00 UTC as the cut-off date? 🤔

lyse

lyse.isobeef.org

Sat, Nov 22 5:45AM (13w ago)

All my newly added test cases failed. movq thankfully provided them in https://git.mills.io/yarnsocial/twtxt.dev/pulls/28#issuecomment-20801 for the draft of the twt hash v2 extension. The first error was easy to see in the diff: the hashes were way too long. You’ve already guessed it, I had cut the hash from the twelfth character towards the end instead of taking the first twelve characters: hash[12:] instead of hash[:12].
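
The slicing mix-up can be reproduced in a few lines. This sketch assumes the encoded hash is the unpadded base32 form of a 256-bit digest (52 characters); the input bytes are arbitrary.

```python
import base64
import hashlib

# Arbitrary input; only the length of the encoded digest matters here.
digest = hashlib.blake2b(b"example", digest_size=32).digest()
encoded = base64.b32encode(digest).decode("ascii").lower().rstrip("=")

wrong = encoded[12:]  # everything from index 12 to the end
right = encoded[:12]  # the first twelve characters
```

The wrong slice keeps 40 of the 52 characters, which matches the "way too long" hashes seen in the diff.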

After fixing this rookie mistake, the tests still all failed. Hmmm. Did I still cut the wrong twelve characters? :-? I even checked the Go reference implementation in the document itself. But it read basically the same as mine. Strange, what the heck is going on here?

Turns out that my vim replacements to transform the Python code into Go code butchered all the URLs. ;-) The order of operations matters. I first replaced the equals with colons for the subtest struct fields and then wanted to transform the RFC 3339 timestamp strings to time.Date(…) calls. So, I replaced the colons in the time with commas and spaces. Hence, my URLs then also all read https, //example.com/twtxt.txt.
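
The same pitfall can be shown outside of vim. In this sketch the input line is made up, and the narrower pattern is just one possible way to restrict the replacement to time fields.

```python
import re

# Hypothetical test-case line containing both a URL and a timestamp.
line = 'url="https://example.com/twtxt.txt", created="2025-11-22T05:45:00Z"'

# Blanket replacement: every colon is rewritten, including the one in the URL scheme.
naive = line.replace(":", ", ")

# Narrower replacement: only colons between two-digit fields are rewritten.
careful = re.sub(r"(?<=\d\d):(?=\d\d)", ", ", line)
```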

But that was it. All tests green. \o/

lyse

lyse.isobeef.org

Wed, Nov 12 2:45PM (15w ago)

↳ In-reply-to » Hmmm, looks like my twt hash algorithm implementation calculates incorrect values. Might be the tilde in the URL that throws something off. :-? At least yarnd and jenny agree on a different hash.

No, I was using an empty hash URL when the feed didn’t specify a url metadata field. Now I’m correctly falling back to the feed URL.
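
A minimal sketch of that fallback, assuming metadata lines have the form `# url = <URL>` and that the first non-empty `url` value wins. The function name `hashing_url` is hypothetical.

```python
def hashing_url(feed_text, fetch_url):
    """Return the first `# url = ...` value, or the fetch URL if there is none."""
    # Split on "\n" only: str.splitlines() would also split on U+2028,
    # which multi-line twts use inside a single feed line.
    for line in feed_text.split("\n"):
        if not line.startswith("#"):
            continue
        key, sep, value = line[1:].partition("=")
        if sep and key.strip() == "url" and value.strip():
            return value.strip()
    return fetch_url
```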

lyse

lyse.isobeef.org

Wed, Nov 12 1:45PM (15w ago)

Hmmm, looks like my twt hash algorithm implementation calculates incorrect values. Might be the tilde in the URL that throws something off. :-? At least yarnd and jenny agree on a different hash.

aelaraji

aelaraji.com

Tue, Oct 28 8:11PM (17w ago)

↳ In-reply-to » @aelaraji tell us all about it, without omitting details!

Just typing twts directly into my twtxt file.

Details:

Opening my twtxt file remotely using vim scp://user@remote:port//path/to/twtxt.txt
Inserting the date, time and tab part of the twt with :.!echo "$(date -Is)\t"
In case I need to add a new line I just hit Ctrl+Shift+u, type in 2028 (the code point of the Unicode line separator) and hit Enter
In order to reply, you just steal a twt hash from your favorite Yarn instance.

It looks tedious, but it’s fun to know I can twt no matter where I am, as long as I can ssh in.
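
For comparison, the manual steps above could be scripted roughly as follows. This is a sketch, not how any particular client does it: `format_twt` is a hypothetical helper, and the `(#hash)` reply prefix follows the subject format used elsewhere in this thread.

```python
from datetime import datetime, timezone
from typing import Optional

LINE_SEPARATOR = "\u2028"  # stands in for newlines inside a single twt


def format_twt(text: str, when: datetime, reply_to: Optional[str] = None) -> str:
    """Build one feed line: timestamp, tab, optional reply subject, text."""
    timestamp = when.isoformat(timespec="seconds")
    body = text.replace("\r\n", "\n").replace("\n", LINE_SEPARATOR)
    if reply_to:
        body = f"(#{reply_to}) {body}"
    return f"{timestamp}\t{body}\n"


if __name__ == "__main__":
    print(format_twt("Hello\nworld", datetime.now(timezone.utc)), end="")
```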

alexonit

twtxt.alessandrocutolo.it

Wed, Oct 1 3:07PM (21w ago)

↳ In-reply-to » @zvava Mixing both addressing schemes combines the worst of both worlds in my opinion. Please don't do that.

@lyse@lyse.isobeef.org I think it will be bad if handled incorrectly.

The client must reference both properly or it would miss posts. Including both this way is a bit pointless if you can’t use the hash or url separately.

Since it is highly likely to be a breaking change anyway, I think @zvava@twtxt.net’s proposal looks much better.

zvava

twtxt.net

Wed, Oct 1 1:53PM (21w ago)

↳ In-reply-to » @zvava Mixing both addressing schemes combines the worst of both worlds in my opinion. Please don't do that.

@lyse@lyse.isobeef.org i would like to ditch hash addressing but as was pointed out it would be a pain in the ass to get clients currently working off of hashv1 to suddenly switch to location-based addressing instead of just hashv2 with the option to eventually phase it out — unless we can rally together all active client developers to decide on a location-based addressing specification (i still think my original suggestion of #<https://example.com/tw.txt#yyyy-mm-ddThh:mm:ssZ> is foolproof)

movq

www.uninformativ.de

Wed, Oct 1 12:44PM (21w ago)

↳ In-reply-to » is the first url metadata field unequivocally treated as the canon feed url when calculating hashes, or are they ignored if they're not at least proper urls? do you just tolerate it if they're impersonating someone else's feed, or pointing to something that isn't even a feed at all?

@zvava@twtxt.net My client trusts the first url field it finds. If there is none, it uses the URL that I’m using for fetching the feed.

No validation, no logging.

In practice, I’ve not seen issues with people messing with this field. (What I do see, of course, is broken threads when people do legitimate edits that change the hash.)

I don’t see a way how anyone can impersonate anybody else this way. 🤔 Sure, you could use my URL in your url field, but then what? You will still show up as zvava in my client or, if you also change your nick field, as movq (zvava).

lyse

lyse.isobeef.org

Wed, Oct 1 12:00PM (21w ago)

↳ In-reply-to » is the first url metadata field unequivocally treated as the canon feed url when calculating hashes, or are they ignored if they're not at least proper urls? do you just tolerate it if they're impersonating someone else's feed, or pointing to something that isn't even a feed at all?

@zvava@twtxt.net Yes, the specification defines the first url to be used for hashing. No matter if it points to a different feed or whatever. Just unsubscribe from malicious feeds and you’re done.

Since the first url is used for hashing, it must never change. Otherwise, it will break threading, as you already noticed. If your feed moves and you wanna keep the old messages in the same new feed, you still have to point to the old url location and keep that forever. But you can add more urls. As I said several times in the past, in hindsight, using the first url was a big mistake. It would have been much better, if the last encountered url were used for hashing onwards. This way, feed moves would be relatively straightforward. However, that ship has sailed. Luckily, feeds typically don’t relocate.

zvava

twtxt.net

Wed, Oct 1 5:39AM (21w ago)

↳ In-reply-to » is the first url metadata field unequivocally treated as the canon feed url when calculating hashes, or are they ignored if they're not at least proper urls? do you just tolerate it if they're impersonating someone else's feed, or pointing to something that isn't even a feed at all?

@alexonit@twtxt.alessandrocutolo.it prologic has me sold on the idea of hashv2 being served alongside a text fragment, eg. (#abcdefghijkl https://example.com/tw.txt#:~:text=2025-10-01T10:28:00Z), because it can be simply hacked in to clients currently on hashv1 and provides an off-ramp to location-based addressing (though i still think the format should be changed to smth like #<abc... http://example.com/...> so it’s cleaner once we finally drop hashes)
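
A sketch of how a client might read such a combined subject, going only by the example given above. The regular expression and the name `parse_subject` are assumptions, not part of any specification, and the hash alphabet is assumed to be lowercase base32.

```python
import re

# Matches "(#hash)" and "(#hash url#:~:text=timestamp)" at the start of a twt.
SUBJECT_RE = re.compile(
    r"^\(#(?P<hash>[a-z2-7]+)"
    r"(?:\s+(?P<url>\S+?)#:~:text=(?P<timestamp>[^)\s]+))?"
    r"\)\s*(?P<text>.*)$",
    re.DOTALL,
)


def parse_subject(twt_text):
    """Return hash, url, timestamp and remaining text, or None without a subject."""
    match = SUBJECT_RE.match(twt_text)
    if match is None:
        return None
    return match.groupdict()
```

A client that only understands hashes could ignore the `url` and `timestamp` fields, which is the "hacked in" path described above.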

zvava

twtxt.net

Wed, Oct 1 4:41AM (21w ago)

is the first url metadata field unequivocally treated as the canon feed url when calculating hashes, or are they ignored if they’re not at least proper urls? do you just tolerate it if they’re impersonating someone else’s feed, or pointing to something that isn’t even a feed at all?

and if the first url metadata field changes, should it be logged with a time so we can still calculate hashes for old posts? or should it never be updated? (in the case of a pod, where the end user has no choice in how such events are treated) or do we redirect all the old hashes to the new ones (probably this, since it would be helpful for edits too)

prologic

twtxt.net

Sun, Sep 28 9:02AM (21w ago)

↳ In-reply-to » The twtiverse appears to have shrunk. Among the 61 feeds that I follow, I don’t see any hash collisions anymore. 🤔

@movq@www.uninformativ.de You were seeing that many hash collisions for you to notice this? 😱

movq

www.uninformativ.de

Sun, Sep 28 8:48AM (21w ago)

The twtiverse appears to have shrunk. Among the 61 feeds that I follow, I don’t see any hash collisions anymore. 🤔

alexonit

twtxt.alessandrocutolo.it

Sat, Sep 27 10:34AM (21w ago)

↳ In-reply-to » @prologic to clarify: i meant the ability to parse feeds using unix command line utilities, as a principle of twtxtv1's design. im not sure how feasible it is to build a simple feed reader out of common scripting utilities when hashing is in play, and;

@prologic@twtxt.net I think nobody will stop you if you replace the current hashing with SHA-256 if you call it improvement™ 😉

lyse

lyse.isobeef.org

Fri, Sep 26 9:15AM (22w ago)

↳ In-reply-to » @prologic the simplest thing to do is to completely forgo hashing anything because we are communicating using plain text files right now :3 while i agree hashes are incredibly helpful in the backend im not sure it has a place outside of it, it basically eliminates two core design principles of twtxt (human readability and integrating well with unix command line utilities) and makes new clients more difficult to build than it should be

Exactly, @zvava@twtxt.net, I agree. (Although, in my client at least, I wouldn’t use hashes anywhere.)

prologic

twtxt.net

Fri, Sep 26 6:19AM (22w ago)

↳ In-reply-to » @prologic to clarify: i meant the ability to parse feeds using unix command line utilities, as a principle of twtxtv1's design. im not sure how feasible it is to build a simple feed reader out of common scripting utilities when hashing is in play, and;

@alexonit@twtxt.alessandrocutolo.it Yeah I think we’re overstating the UNIX principles a bit here 🤣 I get what you’re trying to say though @zvava@twtxt.net 😅 If I could go back in time and do it all over again, I would have gotten the Hash length correct and I would have used SHA-256 instead. But someone way smarter than me designed the Twt Hash spec, we adopted it and well here we are today, it works™ 😅

alexonit

twtxt.alessandrocutolo.it

Fri, Sep 26 4:56AM (22w ago)

↳ In-reply-to » @prologic to clarify: i meant the ability to parse feeds using unix command line utilities, as a principle of twtxtv1's design. im not sure how feasible it is to build a simple feed reader out of common scripting utilities when hashing is in play, and;

That’s what I’m using right now, while my own client is still in the making.

A simple bash script to write a post in a mktemp file then clean it with regex.
I don’t even bother to hash the replies, I just open https://twtxt.net and copy the hash by hand since I’m checking the new posts from there anyway (temporarily, as I might end up DoS-ing everyone’s feed in my client right now).

zvava

twtxt.net

Fri, Sep 26 3:47AM (22w ago)

↳ In-reply-to » @bender Really? 🤔

plus, if hashv2 was implemented in combination with text fragments the way you proposed that would solve both scripting and human readability woes!!

…though, the presence of the text fragments then makes reversing the replied-to twt (and therefore its hash) trivial, which could allow clients to tolerate the omission of the hash — and while it would be ‘non-standard’ this would be the best of both worlds; potential to tolerate (or pave a glacial path toward? :o) human writable replies whilst keeping a unique id for twts that is universal across all pods

zvava

twtxt.net

Fri, Sep 26 3:40AM (22w ago)

↳ In-reply-to » @bender Really? 🤔

@prologic@twtxt.net to clarify: i meant the ability to parse feeds using unix command line utilities, as a principle of twtxtv1’s design. im not sure how feasible it is to build a simple feed reader out of common scripting utilities when hashing is in play, and;

i concede, it does make a lot of sense to fix up the hashing spec rather than completely supplant it at this point, just thinking about what the rewrite would be like is dreadful in and of itself x.x

prologic

twtxt.net

Fri, Sep 26 2:34AM (22w ago)

↳ In-reply-to » @bender Really? 🤔

@zvava@twtxt.net Going to have to hard disagree here, I’m sorry. a) No-one reads the raw/plain twtxt.txt files; the only time you do is to debug something, or to have a sticky beak at the comments, which most clients will strip out and ignore. And b) I’m sorry, you’ve completely lost me! I’m old enough to predate Linux becoming popular, so I’m not sure what UNIX principles you think are being broken or violated by having a Twt Subject (Subject) whose content is a cryptographic content-addressable hash of the “thing”™ you’re replying to, forming a chain of other replies (a thread).

I’m sorry, but the simplest thing to do is to make the smallest number of changes to the Spec as possible and all agree on a “Magic Date” for which our clients use the modified function(s).

zvava

twtxt.net

Fri, Sep 26 1:42AM (22w ago)

↳ In-reply-to » @bender Really? 🤔

@prologic@twtxt.net the simplest thing to do is to completely forgo hashing anything because we are communicating using plain text files right now :3 while i agree hashes are incredibly helpful in the backend im not sure it has a place outside of it, it basically eliminates two core design principles of twtxt (human readability and integrating well with unix command line utilities) and makes new clients more difficult to build than it should be

prologic

twtxt.net

Thu, Sep 25 10:49PM (22w ago)

↳ In-reply-to » @bender Really? 🤔

@bender@twtxt.net Well honestly, this is just it. My strong position on this is quite simple:

Do the simplest thing that could work.

It’s one of the age-old UNIX philosophies.

Therefore, the simplest thing™ to do here is to just increase the hash length, mark a magic™ date/time as @lyse@lyse.isobeef.org has indicated and call it a day. We’ll then be fine for a few hundred years, at which point there’ll be no-one left alive to give a shit™ anyway 🤣
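
To put rough numbers on what a longer hash buys, the birthday approximation can be evaluated directly. This sketch assumes hashes are uniformly distributed and that v1 keeps the last seven characters of an unpadded base32-encoded 256-bit digest. Under that assumption the final character carries only one bit (256 = 51 × 5 + 1), so v1 has 31 effective bits, while twelve leading characters give 60 bits.

```python
import math


def collision_probability(n_twts: int, bits: int) -> float:
    """Birthday approximation: chance of at least one collision among n_twts."""
    pairs = n_twts * (n_twts - 1) / 2
    return -math.expm1(-pairs / 2**bits)


V1_BITS = 6 * 5 + 1  # seven trailing chars, the last one carrying a single bit
V2_BITS = 12 * 5     # twelve leading chars, five bits each

if __name__ == "__main__":
    for n in (1_000, 100_000, 10_000_000):
        print(n, collision_probability(n, V1_BITS), collision_probability(n, V2_BITS))
```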

bender

twtxt.net

Thu, Sep 25 9:26PM (22w ago)

↳ In-reply-to » @bender Really? 🤔

@prologic@twtxt.net considering the other alternatives we have seen (of which I have lost track already), yes. Why don’t you guys (client makers) take one step at a time and, for now, increase the hash length to deal with the collisions? Then location-based addressing can be added… or not, you know. 😅

alexonit

twtxt.alessandrocutolo.it

Thu, Sep 25 1:04PM (22w ago)

↳ In-reply-to » TNO Threading (draft):
Each origin feed numbers new threads (tno:N). Replies carry both (tno:N) and (ofeed:<origin-url>). Thread identity = (ofeed, tno).

@prologic@twtxt.net I think a counter in the client is not a good choice given the decentralized nature of twtxt, especially if someone uses multiple clients together.

After thinking about it for a while I got to two solutions:

Proposal 1: Thread syntax (using subject)

Each post has an implicit and an optional explicit root reference:

Implicit (no action needed, all data required are already there)
- URL + timestamp
Explicit (subject required)
- Identity (client generated)
- External reference
- Random value

We then include a “root” subject in each post for generating explicit threads:

1. `[ROOT_ID] (REPLY_ID)`: simpler with no need of prefixes
2. `(root:ROOT_ID) (reply:REPLY_ID)`: more complex but could allow expansions
	- `(rt:ROOT_ID) (re:REPLY_ID)`: same but with a compact version
	- `($ROOT_ID) (>REPLY_ID)`: same but with single characters

Each post can have both references. As with the current hash approach, the reference can be treated as a simple string and doesn’t have a real meaning.

Using a custom reference this way allows a client to decide how to generate them:

Identity: can be a content hash or signature or anything else, without enforcing how it is generated we can upgrade the algorithm/length freely
External references: can be provided from another system (Eg. 7e073bd345, yarnsocial/yarn latest commit)
Random value: like a UUID (Eg. 9a0c34ed-d11e-447e-9257-0a0f57ef6e07)

Proposal 2: Threaded mentions (featuring zvava)

Inspired by @zvava@twtxt.net’s solution it could be simplified into: #<nick url#timestamp> or #<url#timestamp>

It can be shown like a mention or hidden like a subject.

If we’re thinking of using a counter in the client, I think there’s no point in avoiding the timestamp anymore.

alexonit

twtxt.alessandrocutolo.it

Thu, Sep 25 11:04AM (22w ago)

↳ In-reply-to » I would personally rather see something like this:

@prologic@twtxt.net While it might work if you want to keep both, I think the point was to be able to use one or the other, if we still have to generate the hash anyway it might be pointless to use this format.

prologic

twtxt.net

Thu, Sep 25 8:47AM (22w ago)

↳ In-reply-to » Here is just a small list of things™ that I'm aware will break, some quite badly, others in minor ways:

Of course we still have to fix the hashing algorithm and length.

alexonit

twtxt.alessandrocutolo.it

Thu, Sep 25 6:50AM (22w ago)

↳ In-reply-to » Here is just a small list of things™ that I'm aware will break, some quite badly, others in minor ways:

@prologic@twtxt.net That is really great to hear!

If there are opposing opinions we either build a bridge or provide a new parallel road.

Also, I wouldn’t call my opinion a “stance”, I just wish for a better twtxt thanks to everyone’s effort.

The last thing we need to do is decide a proper format for the location-based version.

My proposal is to keep the “Subject extension” unchanged and include the reference to the mention like this:

// Current hash format: starts with a '#'
(#hash) here's text
(#hash) @<nick url> here's text

// New location format: valid URL-like + '#' + TIMESTAMP (verbatim format of feed source)
(url#timestamp) here's text
(url#timestamp) @<nick url> here's text

I think the timestamp should be referenced verbatim to prevent broken references with multiple variations (especially with the many timezones out there) which would also make it even easier to implement for everyone.
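
A sketch of how a client could tell the two subject formats apart while keeping the timestamp verbatim. The function name `parse_reference` and the returned dictionary layout are hypothetical. The URL is split off at the last `#`, on the assumption that the timestamp itself never contains one.

```python
import re


def parse_reference(twt_text):
    """Classify a leading subject as a hash or a location reference."""
    match = re.match(r"^\(([^)\s]+)\)\s*", twt_text)
    if match is None:
        return None
    ref = match.group(1)
    if ref.startswith("#"):
        return {"kind": "hash", "hash": ref[1:]}
    # Location format: URL + '#' + timestamp, copied verbatim from the feed.
    url, sep, timestamp = ref.rpartition("#")
    if sep and url and timestamp:
        return {"kind": "location", "url": url, "timestamp": timestamp}
    return None
```

Because the timestamp is never parsed or normalised here, two clients only agree on a reference if they copy it character for character, which is the point of quoting it verbatim.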

I’m sure we can get @zvava@twtxt.net, @lyse@lyse.isobeef.org and everyone else to help on this one.

I personally think we should also consider allowing a generic format to build on custom references. This would allow for creating threads using any custom source (manual, computed or externally generated), maybe using a new “Topic extension”. Here are some examples.

// New format for custom references: starts with a '!' maybe?
(!custom) here's text
(!custom) @<nick url> here's text

// A possible "Topic" parse as a thread root:
[!custom] start here
[custom] simpler format

This one is just an idea of mine, but I feel it can unleash new ways of using twtxt.

itsericwoodward

itsericwoodward.com

Thu, Sep 25 12:02AM (22w ago)

I finally resolved my issues with hashing twts… with REGEX!

Dates in JavaScript are truly strange creatures.

alexonit

twtxt.alessandrocutolo.it

Tue, Sep 23 1:34AM (22w ago)

↳ In-reply-to » Here is just a small list of things™ that I'm aware will break, some quite badly, others in minor ways:

@lyse@lyse.isobeef.org @prologic@twtxt.net Can’t we find a middle ground and support both?

The thread is defined by two parts:

The hash
The subject

The client/pod generates the hash and indexes it in its database/cache, then it simply queries the subject of other posts to find the related posts, right?

In my own client’s current implementation (using hashes), the only calculation is the hash generation; the rest is a verbatim copy of the subject (minus the # character). If this is the commonly implemented approach, then adding the location-based one is fairly simple.

function setPostIndex(post) {
    // Current hash approach
    const hash = createHash(post.url, post.timestamp, post.content);

    // New location approach
    const location = post.url + '#' + post.timestamp;

    // Unchanged (probably)
    const subject = post.subject;

    // Index them all
    addToIndex(hash, post);
    addToIndex(location, post);
    addToIndex(subject, post);
}

// Both should work if the index contains both versions
getThreadBySubject('#abcdef'); // Hash: returns [post1, post2, post3]
getThreadBySubject('https://example.com#2025-01-01T12:00:00'); // Location: returns [post1, post2, post3]

As I said before, the mention is already location based @<example https://example.com/twtxt.txt>, so I think we should keep that in consideration.

Of course this will lead to a bit of fragmentation (without merging the two) but I think this can make everyone happy.

Otherwise, the only other solution I can think of is a different approach where the value doesn’t matter, allowing anything to be used as a reference (hash, location, git commit) for greater flexibility and freedom of implementation (this probably needs a fixed “header” for each post, but it can be seen as a separate extension).

prologic

twtxt.net

Mon, Sep 22 10:48PM (22w ago)

↳ In-reply-to » Here is just a small list of things™ that I'm aware will break, some quite badly, others in minor ways:

@lyse@lyse.isobeef.org I don’t think there’s any point in continuing the discussion of Location vs. Content based addressing.

I want us to preserve Content based addressing.

Let’s improve the user experience and fix the hash collision problems.

lyse

lyse.isobeef.org

Mon, Sep 22 1:00PM (22w ago)

↳ In-reply-to » Here is just a small list of things™ that I'm aware will break, some quite badly, others in minor ways:

@prologic@twtxt.net I know we won’t ever convince each other of the other’s favorite addressing scheme. :-D But I wanna address (haha) your concerns:

I don’t see any difference between the two schemes regarding link rot and migration. If the URL changes, both approaches are equally terrible as the feed URL is part of the hashed value and reference of some sort in the location-based scheme. It doesn’t matter.
The same is true for duplication and forks. Even today, the “canonical URL” has to be chosen to build the hash. That’s exactly the same with location-based addressing. Why would a mirror only duplicate stuff with location- but not content-based addressing? I really fail to see that. Also, who is using mirrors or relays anyway? I don’t know of any such software to be honest.
If there is a spam feed, I just unfollow it. Done. Not a concern for me at all. Not the slightest bit. And the byte verification is THE source of all broken threads when the conversation start is edited. Yes, this can be viewed as a feature, but how many times was it actually a feature rather than an anti-feature in terms of user experience?
I don’t get your argument. If the feed in question is offline, one can simply look in local caches and see if there is a message at that particular time, just like looking up a hash. Where’s the difference? Except that the lookup key is longer or compound or whatever depending on the cache format.
Even a new hashing algorithm requires work on clients etc. It’s not that you get some backwards-compatibility for free. It just cannot be backwards-compatible in my opinion, no matter which approach we take. That’s why I believe some magic time for the switch causes the least amount of trouble. You leave the old world untouched and working.
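
The fourth point above (looking in local caches for a message at a particular time, just like looking up a hash) could be sketched roughly as follows. The `Cache` class and its method names are invented for illustration only; nothing here describes how an existing client stores its data.

```python
from typing import Dict, Optional, Tuple

# Hypothetical message shape: (feed_url, timestamp, content)
Message = Tuple[str, str, str]


class Cache:
    """Toy local cache that answers both kinds of lookups."""

    def __init__(self) -> None:
        self._by_hash: Dict[str, Message] = {}
        self._by_location: Dict[Tuple[str, str], Message] = {}

    def add(self, twt_hash: str, message: Message) -> None:
        feed_url, timestamp, _content = message
        # Content-based key: the short hash.
        self._by_hash[twt_hash] = message
        # Location-based key: a compound of feed URL and timestamp.
        self._by_location[(feed_url, timestamp)] = message

    def by_hash(self, twt_hash: str) -> Optional[Message]:
        return self._by_hash.get(twt_hash)

    def by_location(self, feed_url: str, timestamp: str) -> Optional[Message]:
        return self._by_location.get((feed_url, timestamp))
```

Either lookup works without the feed being online; only the shape of the key differs.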

If these are general concerns, I’m completely with you. But I don’t think that they only apply to location-based addressing. That’s how I interpreted your message. I could be wrong. Happy to read your explanations. :-)


alexonit

twtxt.alessandrocutolo.it

Mon, Sep 22 7:49AM (22w ago)

↳ In-reply-to » @zvava @lyse I also think a location based reference might be better.

@prologic@twtxt.net I can see the issues mentioned, but I think some can be fixed.

The current hash relies on a url field too, by specification, it will use the first # url = <URL> in the feed’s metadata if present, that too can be different from the fetching source, if that field changes it would break the existing hashes too, a better solution would be to use a non-URL key like # feed_id = <UNIQUE_RANDOM_STRING> with the url as fallback.
We can prevent duplications if the reference uses that same url field too or the client “collapse” any reference of all the urls defined in the metadata.
I agree that hashing based on content is good, but we still use the URL as part of the hashing, which is just a field in the feed, easily replicable by a bot, also noting that edits can also break the hash, for this issue an alternative solution (E.g. a private key not included in the feed) should be considered.
For offline reading the source would be downloaded already, the fetching of non-followed feeds would fill the gap in the same way mentions do, maybe I’m missing some context on this one.
To prevent collisions there was a discussion on extending the hash (forgot if that was already fixed or not), but without a fallback that would break existing clients too, we should think of a parallel format that maintains current implementations unchanged, we are already backward compatible with the original that don’t use threads at all, a mention style format for that could be even more user-friendly for those clients.
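
The fallback idea from the first point (prefer a non-URL key, fall back to the first `# url` field, then to the fetching source) might look like this minimal sketch. Note that `# feed_id` is only the field proposed in this post, not part of any published extension, and `feed_identity` is a made-up helper name.

```python
from typing import Optional


def feed_identity(feed_text: str, fetch_url: str) -> str:
    """Return feed_id if present, else the first url field, else the fetch URL."""
    feed_id: Optional[str] = None
    first_url: Optional[str] = None
    for line in feed_text.splitlines():
        # Metadata lives in comment lines of the form "# key = value".
        if not line.startswith("#") or "=" not in line:
            continue
        key, _, value = line[1:].partition("=")
        key, value = key.strip(), value.strip()
        if key == "feed_id" and feed_id is None:
            feed_id = value  # hypothetical field from the proposal above
        elif key == "url" and first_url is None:
            first_url = value  # only the first url field counts
    return feed_id or first_url or fetch_url
```

With such a key, moving the feed to another URL would leave the identity untouched as long as the `feed_id` line travels with the file.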

We should also keep in mind that the current mention format is already location based (@<example https://example.com/twtxt.txt>) so I’m not that worried about threads working the same way.

Hope to see some other thoughts about this matter. 🤓


prologic

twtxt.net

Mon, Sep 22 3:07AM (22w ago)

↳ In-reply-to » @zvava @lyse I also think a location based reference might be better.

Here is just a small list of things™ that I’m aware will break, some quite badly, others in minor ways:

Link rot & migrations: domain changes, path reshuffles, CDN/mirror use, or moving from txt → jsonfeed will orphan replies unless every reader implements perfect 301/410 history, which they won’t.
Duplication & forks: mirrors/relays produce multiple valid locations for the same post; readers see several “parents” and split the thread.
Verification & spam-resistance: content addressing lets you dedupe and verify you’re pointing at exactly the post you meant (hash matches bytes). Location anchors can be replayed or spoofed more easily unless you add signing and canonicalization.
Offline/cached reading: without the original URL being reachable, readers can’t resolve anchors; with hashes they can match against local caches/archives.
Ecosystem churn: all existing clients, archives, and tools that assume content-derived IDs need migrations, mapping layers, and fallback logic. Expect long-lived threads to fracture across implementations.


prologic

twtxt.net

Mon, Sep 22 3:06AM (22w ago)

↳ In-reply-to » @zvava @lyse I also think a location based reference might be better.

We’ve been discussing the idea of changing the threading model from Content-based Addressing to Location-based addressing for years now. The problem is quite complex, but I feel I have to keep reminding y’all of the potential perils of changing this and the pros/cons of each model:

With content-addressed threading, a reply points at something that’s intrinsically identified (hash of author/feed URI + timestamp + content). That ID never changes as long as the content doesn’t. Switching to location-based anchors makes the reply target extrinsic—it now depends on where the post currently lives. In a pull-based, decentralised network, locations drift. The moment they do, thread identity fragments.


alexonit

twtxt.alessandrocutolo.it

Fri, Sep 19 2:58AM (23w ago)

↳ In-reply-to » @lyse i dont mind if the hash is not backward compatible but im not sure if this is the right way to proceed because the added complexity dealing with two hash versions isnt justified

@zvava@twtxt.net @lyse@lyse.isobeef.org I also think a location based reference might be better.

A thread is a single post of a single feed as a root, but the hash has the drawback of not referencing the source, in a distributed network like twtxt it might leave some people out of the whole conversation.

I suggest a simpler format, something like: (#<TIMESTAMP URL>)

This solves three issues:

Easier referencing: no need to generate a hash, just copy the timestamp and url, it’s also simpler to implement in a client without the risk of collisions when putting things together
Fetchable source: you can find the source within the reference and construct the thread from there
Allow editing: If a post is modified the hash becomes invalid since it depends on [ timestamp, url, content ]
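
To give a feel for how little a client would need for the suggested `(#<TIMESTAMP URL>)` format, here is a rough parsing sketch. The regular expression, the `LocationRef` type and the function name are assumptions for illustration; the proposal itself does not define any grammar beyond the example above.

```python
import re
from typing import NamedTuple, Optional


class LocationRef(NamedTuple):
    timestamp: str
    url: str


# Assumed grammar: subject at the start of the message, timestamp and URL
# separated by a single space, neither containing whitespace.
_LOCATION_REF = re.compile(r"^\(#<(\S+) ([^\s>]+)>\)")


def parse_location_ref(text: str) -> Optional[LocationRef]:
    """Extract the reply target from a message, or None if there is none."""
    match = _LOCATION_REF.match(text)
    if match is None:
        return None
    return LocationRef(timestamp=match.group(1), url=match.group(2))
```

The returned URL is also where a client could fetch the root feed from, which is the "fetchable source" point in the list.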


lyse

lyse.isobeef.org

Tue, Sep 16 9:45AM (23w ago)

↳ In-reply-to » @lyse i dont mind if the hash is not backward compatible but im not sure if this is the right way to proceed because the added complexity dealing with two hash versions isnt justified

@zvava@twtxt.net There would be only one hash for a message. Some to be defined magic date selects which hash to use. If the message creation timestamp is before this epoch, hash it with v1, otherwise hammer it through v2. Eventually, support for v1 could be dropped as nobody interacts with the old stuff anymore. But I’d keep it around in my client, because why not.
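
The selection rule could be as small as the following sketch. The cut-over instant `HASH_V2_EPOCH` is a placeholder picked for the example, since the actual magic date is still to be defined here, and the function name is made up.

```python
from datetime import datetime, timezone

# Placeholder value only; the real cut-over date is "to be defined".
HASH_V2_EPOCH = datetime(2026, 1, 1, tzinfo=timezone.utc)


def hash_version_for(created: str) -> int:
    """Pick the hash version from an RFC 3339 creation timestamp with offset."""
    # Rewrite a trailing "Z" because datetime.fromisoformat() only
    # accepts it from Python 3.11 onwards.
    moment = datetime.fromisoformat(created.replace("Z", "+00:00"))
    # Before the epoch: v1. On or after the epoch: v2.
    return 1 if moment < HASH_V2_EPOCH else 2
```

The comparison happens on the absolute instant, so a message written at 01:30 in a +02:00 zone on the cut-over day still counts as before the epoch.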

If users choose a client which supports the extensions, they don’t have to mess around with v1 and v2 hashing, just like today.

As for the school of thought, personally, I’d prefer something else, too. I’m in camp location-based addressing, or whatever it is called. The more I think about it, a complete redesign of twtxt and its extensions would be necessary in my opinion. Retrofitting has its limits. Of course, this is much more work, though.


zvava

twtxt.net

Mon, Sep 15 6:41PM (23w ago)

↳ In-reply-to » @prologic im unsure how i feel about the hash v2 proposal, given it is completely backward incompatible with hash v1 it doesn't really solve any of the problems with it. it only delays collisions, and still fragments threads on post edits

@lyse@lyse.isobeef.org i dont mind if the hash is not backward compatible but im not sure if this is the right way to proceed because the added complexity dealing with two hash versions isnt justified

regular end users wont care to understand how twt hashes are formed, they just want to use twtxt! so i guess i could work in protecting users from themselves by disallowing post edits on old posts or posts with replies, but i’m not fond of this either really. if they want to break a thread, they can just delete the post (though i’ve noticed yarn handling post deletes dubiously…)

on activitypub i do genuinely find myself looking through several month or even year old posts sometimes and deciding to edit/reword them a little to be slightly less confusing, this should be trivial to handle on twtxt which is an infinitely simpler specification


movq

www.uninformativ.de

Mon, Sep 15 7:49AM (23w ago)

↳ In-reply-to » wait why are so many of my post hashes not generating correctly ;w;

@zvava@twtxt.net I was about to suggest that you post some examples. By now, we’re pretty good at debugging hashing issues, because that happens so often. 😂 But it looks like you figured it out on your own. ✌️


zvava

twtxt.net

Sun, Sep 14 9:20PM (23w ago)

im unable to figure out why bbycll is not generating post hashes for @lyse@lyse.isobeef.org’s feed correctly (or at least different from the ones generated by yarn)

i’m pretty sure the timezone is stripped off the offset correctly (2025-09-14T12:45:00+02:00 → 2025-09-14T12:45:00Z) though messing with how the hash is generated i can’t get it to make one that matches…but all other hashes for all other feeds seem to be correct? does yarn use a different canonical url for lyse internally? is there a bug in the libraries im using? bwehhh
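
For comparing implementations, here is a minimal sketch of the v1 hash as I read the Twt Hash extension: BLAKE2b with 256 bits of output over feed URL, timestamp and content joined by newlines, base32 without padding, lowercased, last seven characters. The function name is made up. The timestamp handling is an assumption worth double-checking against the spec: this sketch only rewrites the UTC offsets `+00:00` and `-00:00` to `Z` and leaves every other offset as written.

```python
import base64
import hashlib


def twt_hash_v1(feed_url: str, timestamp: str, content: str) -> str:
    """Return the 7-character v1 twt hash for one message."""
    # Assumption: only UTC offsets become "Z"; "+02:00" and friends stay.
    for utc_offset in ("+00:00", "-00:00"):
        if timestamp.endswith(utc_offset):
            timestamp = timestamp[: -len(utc_offset)] + "Z"
    payload = "\n".join((feed_url, timestamp, content)).encode("utf-8")
    digest = hashlib.blake2b(payload, digest_size=32).digest()
    # Base32, padding stripped, lowercased; the hash is the last 7 characters.
    encoded = base64.b32encode(digest).decode("ascii").rstrip("=").lower()
    return encoded[-7:]
```

Under that reading, `2025-09-14T12:45:00+02:00` and `2025-09-14T12:45:00Z` are different inputs and give different hashes, so it is worth checking which form each client feeds into the hash.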

