@lyse@lyse.isobeef.org Yeah, to avoid cutting off bits at the end, which makes hashes end in either `q` or `a` 🤣
`tt2` from @lyse and Twtxtory from @javivf?
@prologic@twtxt.net if I understand correctly it's just to increase the hash size from 7 to 12 once it gets calculated, isn't it? BTW is this change already approved? I still don't understand how a proposal becomes an implementation in the twtxtverse 🤔
The reason I think this can work so well, and I'm in full support of it, is that it's the least disruptive way to resolve the issue of:
where did this hash come from?
@prologic@twtxt.net Not sure I'd attach any `if` clauses to this. My point is: Every time I see a hash, I'd like to have a hint as to where to find the corresponding twt.
@movq@www.uninformativ.de If we're focusing on solving the "missing roots" problem, I would start to think about "client recommendations". The first recommendation would be:
- Replying to a Twt that has no initial Subject must itself have a Subject of the form (hash; url).
This way it's a hint to fetching clients that follow B, but not A (in the case of no mentions), that the Subject/Root is (very likely) in the feed `url`.
If we must stick to hashes for threading, can we maybe make it mandatory to always include a reference to the original twt URL when writing replies?
Instead of
(#123467) hello foo bar
you would have
(#123467 http://foo.com/tw.txt) hello foo bar
or maybe even:
(#123467 2025-04-30T12:30:31Z http://foo.com/tw.txt) hello foo bar
This would greatly help in reconstructing broken threads, since hashes are obviously unfortunately one-way tickets. The URL/timestamp would not be used for threading, just for discovery of feeds that you don't already follow.
I don't insist on including the timestamp, but having some idea which feed we're talking about would help a lot.
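To make the three subject shapes above concrete, here is a rough sketch of how a client might parse them. The names `Subject`, `SUBJECT_RE` and `parse_subject` are made up for illustration, and the exact grammar (whitespace-separated fields, timestamp before URL) is only an assumption based on the examples in this twt, not an agreed spec.

```python
import re
from typing import NamedTuple, Optional


class Subject(NamedTuple):
    twt_hash: str
    timestamp: Optional[str]  # only a discovery hint, not used for threading
    feed_url: Optional[str]   # only a discovery hint, not used for threading


# Accepts "(#hash)", "(#hash url)" and "(#hash timestamp url)" at the start of a twt.
SUBJECT_RE = re.compile(
    r"^\(#(?P<hash>[a-z0-9]+)"
    r"(?:\s+(?P<ts>\d{4}-\d{2}-\d{2}T[0-9:]+(?:Z|[+-]\d{2}:\d{2})))?"
    r"(?:\s+(?P<url>\S+))?\)"
)


def parse_subject(text: str) -> Optional[Subject]:
    """Return the parsed subject, or None if the twt does not start with one."""
    m = SUBJECT_RE.match(text)
    if m is None:
        return None
    return Subject(m.group("hash"), m.group("ts"), m.group("url"))


print(parse_subject("(#123467 http://foo.com/tw.txt) hello foo bar"))
```

Threading would still key on `twt_hash` alone; a client would only look at `feed_url` when the hash is unknown and it wants to know which feed to fetch.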
July 1st. 63 days from now to implement a backward-incompatible change, apparently not open to other ideas like replacing blake with SHA, or discussing implementation challenges for other languages and platforms.
Finally just closing #18, #19 and #20 without starting a proper discussion and ignoring a "micro consensus" feels… not right.
I don't know what to think other than letting it rest (May will be busy here) and focusing on other stuff in the future.
I will be adding the code in for `yarnd` very soon™ for this change, with an `if the date is >= 2025-07-01 then compute_new_hashes else compute_old_hashes`
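A minimal sketch of what such a date gate could look like, in Python rather than yarnd's Go. The hash input (`url`, timestamp and text joined by newlines, BLAKE2b-256, unpadded lowercase base32) follows the Twt Hash extension; gating on the twt's own timestamp rather than on the current date is an assumption here, and the function names are only illustrative.

```python
import base64
import hashlib
from datetime import datetime, timezone

# Assumed cutover: twts created on or after this instant get the new hash.
CUTOVER = datetime(2025, 7, 1, tzinfo=timezone.utc)


def _b32_digest(feed_url: str, timestamp: str, text: str) -> str:
    payload = f"{feed_url}\n{timestamp}\n{text}".encode("utf-8")
    digest = hashlib.blake2b(payload, digest_size=32).digest()
    return base64.b32encode(digest).decode("ascii").rstrip("=").lower()


def compute_old_hash(feed_url: str, timestamp: str, text: str) -> str:
    # Last 7 characters. 256 bits fill 51 base32 characters plus 1 leftover
    # bit, so the final character can only be 'a' (0) or 'q' (16).
    return _b32_digest(feed_url, timestamp, text)[-7:]


def compute_new_hash(feed_url: str, timestamp: str, text: str) -> str:
    # First 12 characters, as proposed.
    return _b32_digest(feed_url, timestamp, text)[:12]


def twt_hash(feed_url: str, timestamp: str, text: str) -> str:
    # Assumes an RFC 3339 timestamp that carries an offset or "Z".
    created = datetime.fromisoformat(timestamp.replace("Z", "+00:00"))
    if created >= CUTOVER:
        return compute_new_hash(feed_url, timestamp, text)
    return compute_old_hash(feed_url, timestamp, text)


print(twt_hash("https://example.com/twtxt.txt", "2025-06-30T23:59:59Z", "Hello"))
print(twt_hash("https://example.com/twtxt.txt", "2025-07-01T00:00:00Z", "Hello"))
```

Gating on the twt's timestamp would keep old twts (and every existing reply to them) on their current hashes, which is presumably the point of the `if`.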
Finally I propose that we increase the Twt Hash length from `7` to `12` and use the first `12` characters of the base32 encoded blake2b hash. This will solve two problems, the fact that all hashes today end in either `q` or `a` (oops)
And increasing the Twt Hash size will ensure that we never run into the chance of collision for eons to come. Chances of a 50% collision with 64 bits / 12 characters is roughly ~12.44B Twts. That ought to be enough! -- I also propose that we modify all our clients and make this change from the 1st July 2025, which will be Yarn.social's 5th birthday and 5 years since I started this whole project and endeavour! #Twtxt #Update
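For anyone who wants to sanity-check collision numbers like these, here is a small sketch using the usual birthday approximation n ≈ sqrt(2 · N · ln(1 / (1 − p))). It assumes uniformly distributed hashes, and the function name is made up. Note that 12 base32 characters carry 12 × 5 = 60 bits, so both 60 and 64 bits are shown; under this approximation the results come out lower than the figure quoted above, which may rest on a different formula.

```python
import math


def twts_for_collision(bits: int, probability: float = 0.5) -> float:
    """Approximate number of random hashes needed before a collision
    occurs with the given probability (birthday approximation)."""
    space = 2.0 ** bits
    return math.sqrt(2.0 * space * math.log(1.0 / (1.0 - probability)))


for label, bits in [("12 base32 chars (60 bits)", 60), ("64 bits", 64)]:
    print(f"{label}: ~{twts_for_collision(bits):.3g} twts for a 50% collision")
# 60 bits gives roughly 1.26e9, 64 bits roughly 5.06e9.
```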
I had Chick-fil-A breakfast today (sausage, egg, and cheese biscuit, hash browns, coffee, and orange juice). Then at lunch my work place offered hot dogs. I had two (kosher, if that matters), plus a coke, a macadamia nuts cookie, and a small chocolate brownie.
So, here I am, at home, feeling hungry but guilty and refusing to eat anything else for the rest of the day. To top it off, I have only clocked 4,000 steps today (and I don't feel like walking). I am going to hell, am I?
`dm-only.txt` feeds.
by commenting out DMs are you giving up on simplicity? See the Metadata extension holding the data inside comments, as the client doesn't need to show it inside the timeline.
I don't think that commenting out DMs, as we are doing for metadata, is giving up on simplicity (it's a feature already), and it helps to hide unwanted DMs from clients that will take months to add support for something named… an extension.
For some other extensions in https://twtxt.dev/extensions.html (for example the reply-to hash `#abcdfeg` or the mention `@<example http://example.org/twtxt.txt>`) it is not a big deal. The twt is still understandable in plain text.
For DMs, it's only interesting for you if you are the recipient; otherwise you see a scrambled message like `1234567890abcdef=`. Even if you see it, you'll need some decryption to read it. I've said before that DMs shouldn't be in the same section as the timeline, as it's confusing.
So my point stands, and as I've said before, we are discussing it as a community, so let's see what other maintainers add to the convo.
After reading you, @eapl.me@eapl.me, I'll tell you my point of view.
In my opinion, a feed does not have to be equivalent to a timeline. A timeline is a representation of the feed adapted to a user. You may not be interested in seeing other people's threads or DMs. But perhaps they are interested in seeing mentions or DMs directed at them. It is important not to fall into the trap. With that clarification…
I insist, this is my point of view, it is not an absolute truth: I don't think extensions should be respectful of clients that are no longer maintained.
We cannot have a system that is simple, backwards compatible and extensible all at the same time. We have to give up one of the 3 points. I would not like to give up simplicity, because that would make it harder to maintain the clients that do stay. Therefore, I think it is better to give up backwards compatibility and play with new formulas in the extensions. I don't think it's a good idea to make a hash carry so much load: a hashtag, a thread and also a DM.
`MaxAgeDays` configuration at the pod level, that now some profiles are rather empty. This is only because well, they're a bit "inactive" so to speak 🗣️ Not sure what to do about this at the moment... Open to ideas? 💡
yes, it used to be `http://` only, and to keep hashes from breaking I added `# url = http://...` and now we are stuck with it due to the current specs.
Hmmm there's a bug somewhere in the way I'm ingesting archived feeds 🤔
sqlite> select * from twts where content like 'The web is such garbage these days%';
hash = 37sjhla
feed_url = https://twtxt.net/user/prologic/twtxt.txt/1
content = The web is such garbage these days Or is it the garbage search engines? 🤔
created = 2024-11-14T01:53:46Z
created_dt = 2024-11-14 01:53:46
subject = #37sjhla
mentions = []
tags = []
links = []
sqlite>
Some A hole has been trying to pull every single Twtxt feed that existed/still exists since forever. How do I know? Welp… They've been querying my Timeline™ instance for all of it, every single twtxt file and twt Hash they can find. 🤦 It must have been going on for days and I have just noticed… + it's all coming from the same ASN AS136907 HWCLOUDS-AS-AP HUAWEI CLOUDS
Thank you Huawei for the DDoS you sons of Glitches!!!
@quark@ferengi.one No editing old Twts that are the root of a thread with replies in the ecosystem. Just results in a fork. Unless the client has an implementation that does not store Twts keyed by Hash.
Ha! I stand corrected, didn't scroll long enough. Indeed, it should be added (you will need an account on Mills' Gitea), noted.
@eaplme@eapl.me you wrote:
"That PHP snippet could be merged into https://twtxt.dev/exts/twt-hash.html"
Why, though? AFAIK @andros@twtxt.andros.dev's client is on Emacs, @lyse@lyse.isobeef.org's is on Python (and Golang, for `tt2`), @movq@www.uninformativ.de's is on Python, and @prologic@twtxt.net's is on Golang. All the client creator needs to know is in the documentation already, coding language agnostic.
just a note that we are doing that on PHP: https://github.com/eapl-gemugami/twtxt-php/blob/master/docs/03-hash-extension.md#php-72
That PHP snippet could be merged into https://twtxt.dev/exts/twt-hash.html
@david@collantes.us @andros@twtxt.andros.dev The correct hash would be `si4er3q`. See https://twtxt.dev/exts/twt-hash.html, a timezone offset of `+00:00` or `-00:00` must be replaced by `Z`.
(That said, there's a bug in jenny as well. It only replaces `+00:00`, not `-00:00`. 🤡)
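The normalization rule mentioned here is small enough to sketch. This is only an illustration of the rule as quoted from the spec (both zero offsets become `Z` before hashing); `normalize_timestamp` and `buggy_normalize` are made-up names, and the second one merely mimics the kind of bug described, not jenny's actual code.

```python
import re

# A trailing zero offset, with either sign, is rewritten to "Z".
_ZERO_OFFSET_RE = re.compile(r"[+-]00:00$")


def normalize_timestamp(ts: str) -> str:
    """Replace a trailing +00:00 or -00:00 by Z; leave other offsets alone."""
    return _ZERO_OFFSET_RE.sub("Z", ts)


def buggy_normalize(ts: str) -> str:
    """Only handles +00:00, so -00:00 slips through and changes the hash input."""
    return ts.replace("+00:00", "Z")


for ts in ("2025-04-16T22:49:11+00:00", "2025-04-16T22:49:11-00:00", "2025-04-16T22:49:11+02:00"):
    print(ts, "->", normalize_timestamp(ts))
```

Since the timestamp is part of the hashed text, a client that skips this step ends up with a different hash for the same twt, and replies stop threading.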
@andros@twtxt.andros.dev the hash on @aelaraji@aelaraji.comās last message (as I type this) is:
[si4er3q] [2025-04-16 22:49:11+00:00] [Am I tripping or `rsync` is actually THIS effing faster than `scp`!!? 🫨]
So, si4er3q
@prologic@twtxt.net @bender@twtxt.net
What is the hash of the last message from?: https://aelaraji.com/twtxt.txt
@bender@twtxt.net @aelaraji@aelaraji.com The client should ignore twts if it's not compatible or they're not addressed to me. It's a simple regex to add! It's similar to the Twt Hash Extension; should they be in another file? They are child messages, not flat twts. Of course not!
@prologic@twtxt.net interesting. What would happen on a hash collision? 🤔
@bender@twtxt.net It's a bug in the UI for sure. The hash is the primary key.
@david@collantes.us Yeah, we've been debugging that a bit yesterday. Looks like the wrong input (sometimes) gets fed to the hash function → broken threads.
@movq@www.uninformativ.de @kat@yarn.girlonthemoon.xyz Heck yeah, that's crazy! :-) Fingers crossed! (`tt` also agrees with the right™ hash)
`./yarnc debug <your feed url>`:
The actual hash is `fs7673q`.
`./yarnc debug <your feed url>`:
@prologic@twtxt.net that's not what I see. The hash `znf6csa` cannot be found.
@prologic@twtxt.net There was no edit according to my Git history. 🤔 On my end, the hash is `fs7673q` and that's also what kat used to reply.
Doesn't look like it Hmmm
sqlite> select * from twts where content LIKE '%Linux installation%';
hash = znf6csa
feed_url = https://www.uninformativ.de/twtxt.txt
content = I wonder if my current Linux installation will actually make it to 20 years:
$ head -n 1 /var/log/pacman.log
[2011-07-07 11:19] installed filesystem (2011.04-1)
It's not toooo far into the future.
It would be crazy … 20 years without reinstalling once … phew. 🥴
created = 2025-04-07T19:59:51Z
subject = (#znf6csa)
mentions = []
tags = []
links = []
@movq@www.uninformativ.de Apparently you wrote it :D The hash doesn't lie? 🤣 https://twtxt.net/twt/znf6csa
@prologic@twtxt.net What happened here, did I edit my twt or is this hash wrong? 🥴
@prologic@twtxt.net Spring cleanup! That's one way to encourage people to self-host their feeds. :-D
Since I'm only interested in the `url` metadata field for hashing, I do not keep any comments or metadata for that matter, just the messages themselves. The last time I fetched was probably some time yesterday evening (UTC+2). I cannot tell exactly, because the recorded last fetch timestamp has been overridden with today's by now.
I dumped my new SQLite cache into: https://lyse.isobeef.org/tmp/backup.tar.gz This time maybe even correctly, if you're lucky. I'm not entirely sure. It took me a few attempts (date and time were separated by space instead of `T` at first, I normalized offsets `+00:00` to `Z` as yarnd does and converted newlines back to `U+2028`). At least now the simple cross check with the Twtxt Feed Validator does not yield any problems.
@andros@twtxt.andros.dev sha256 hash of twt in json. Look at converter script
Amazing! It is a good tool for reading feeds. What did you use to calculate the hash?
Hello, I want to present my new revolutionary twtxt v3 format - twjson
Here's why you should use it:
- It's easy to parse
- It's easy to read (in formatted mode :D)
- It uses actual \n for newlines, you don't need unprintable symbols
- Forget about hash collisions, because it uses the full hash
Here is my twjson feed: https://doesnm.p.psf.lt/twjson.json
And twtxt2json converter: https://doesnm.p.psf.lt/twjson.js
@eapl.me@eapl.me Interesting! Two points stood right out to me:
Why the hell are e-mail newsletters considered a valid option in the first place? Just offer an Atom feed and be done with it! Especially for a blog of this very type. This doesn't even involve a third party service. Although, in addition he also links to Feedburner, what the fuck!? No e-mail address or the like is needed and subject to being disclosed.
When these spam mailers want to prevent resubscribing, then for fuck's sake, why don't they use a hash of the e-mail address (I saw that in yarnd) for that purpose? Storing the e-mail address in clear text after unsubscribing is illegal in my book.
There are 82.108 read statuses, but only 24.421 messages in the cache. In contrast to the cache with the messages, the read statuses are never cleaned up when a feed was unsubscribed from. And the read statuses also contain old style hashes, from before we settled on what we have today. Still a huge difference. Hmm.
`tt` reimplementation that I already followed with the old Python `tt`. Previously, I just had a few feeds for testing purposes in my new config. While transferring, I "dropped" heaps of feeds that appeared to be inactive.
Thanks, @movq@www.uninformativ.de!
My backing SQLite database with indices is 8.7 MiB in size right now.
The `twtxt` cache is 7.6 MiB, it uses Python's `pickle` module. And next to it there is a 16.0 MiB second database with all the read statuses for the old `tt`. Wow, super inefficient, it shouldn't contain anything else, it's a giant, pickled `{"$hash": {"read": True/False}, …}`. What the heck, why is it so big?! O_o
(Back in `tt`.) Well, it kinda worked. At least appending to the file. But my cache database got screwed up. I do not yet support replies, so the subject and root hash columns have not been set at all, resulting in a message that is just not shown at all. I gotta do something about that next. The good thing is, though, after simply fixing the two columns the message appeared on screen.
@bender@twtxt.net Yeah, as you mentioned in the other thread, @andros@twtxt.andros.dev's hashes appear to be not quite right. 🤔
@andros@twtxt.andros.dev your client is breaking things, I am afraid. This hash (`ptxsca`), which you seem to be using to reply to @movq@www.uninformativ.de, is not the right one.
@movq@www.uninformativ.de I have no doubt that you're seeing the images correctly. It's just that it's broken when viewing them, in my case, and analyzing the URLs, I've seen everything I mentioned.
Regarding the hash, you're right. I'll have to investigate what's going on. I'm having a hard time getting the hash generation to work properly.
@andros@twtxt.andros.dev Hm, looks correct to me. The image to be displayed is a thumbnail and this links to the full-sized image. The thumbnail (JPG) is auto-generated from the full image (PNG), hence the two extensions.
What does look strange, though, is that your client came up with the hash `pqsmcka`, while it should have been `te5quba`. 🤔
Why not just use a registry? It can be personal or hosted by someone, like registry.twtxt.org. It just needs to be adapted to support hashes.
@prologic@twtxt.net We can't agree on this idea because that makes things even more complicated than they already are today. The beauty of twtxt is, you put one file on your server, done. One. Not five million. Granted, there might be archive feeds, so it might be already a bit more, but still faaaaaaar less than one file per message.
Also, you would need to host not only your own hash files, but also those of everybody else you follow. Otherwise, what is that supposed to achieve? If people are already following my feed, they know what hashes I have, so this is of no use to them (unless they want to look up a message from an archive feed and don't process them). But the far more common scenario is that an unknown hash originates from a feed that they have not subscribed to.
Additionally, yarnd's URL schema would then also break, because `https://twtxt.net/twt/<hash>` now becomes `https://twtxt.net/user/prologic/<hash>`, `https://twtxt.net/user/bender/<hash>` and so on. To me, that looks like you would only get hashes if they belonged to this particular user. Of course, you could define rules that if there is a `/user/` part in the path, then use a different URL, but this complicates things even more.
Sorry, I don't like that idea.
One of the biggest gripes of the community with the way the threading model currently works with Twtxt v1.2 (https://twtxt.dev) is this notion of:
What is this hash?
What does it refer to?
Idea: Why can't we all agree to implement a simple URI scheme where we host our Twtxt feeds?
That is, if you host your feed at `https://example.com/twtxt.txt`, why couldn't you also host various JSON files (let's agree on the spec of course) at `https://example.com/twt/<hash>`? 🤔
That way we solve this problem in a truly decentralised way, rather than everyone relying on `yarnd` pods alone.
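As a rough illustration of the idea, here is how a client might derive the lookup URL from a feed URL. `twt_lookup_url` is a made-up helper, and placing `/twt/<hash>` at the root of the host (rather than next to the feed file) is an assumption read off the example URLs above; the JSON format itself is still to be agreed, so nothing is fetched or parsed here.

```python
from urllib.parse import urljoin


def twt_lookup_url(feed_url: str, twt_hash: str) -> str:
    """Build the hypothetical per-twt URL on the same host as the feed."""
    # A leading "/" makes urljoin replace the whole path of the feed URL.
    return urljoin(feed_url, f"/twt/{twt_hash}")


print(twt_lookup_url("https://example.com/twtxt.txt", "fs7673q"))
print(twt_lookup_url("https://twtxt.net/user/prologic/twtxt.txt", "fs7673q"))
```

The second call shows the catch for multi-user hosts: every feed on the host maps to the same `/twt/` namespace, so the scheme would have to say whether that is intended.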
a few async ideas for later
The editing process needs a lot of consideration and compromises.
On one side, editing and deleting are necessary IMO. People will do it anyway, and personally I like to edit my texts, so I'd put some effort into making it work.
Should we keep a history of edits? Should we hash every edit to avoid abuse? Should we mark a twt internally as deleted, but keep the replies?
I think that's part of a more complete "thread" extension, although I'd say it's worth agreeing on something that reflects the real usage in the wild, along with what people usually do on other platforms.
looks good to me!
About alice's hash, using SHA256, I get `96473b4f` or `96473B4F` for the last 8 characters. I'll add it as an implementation example.
The idea of including it besides the follow URL is to avoid calculating it every time we load the file (assuming the client did that correctly), and it helps to track replies across the file with a simple search.
Also, looking at your example, I'm thinking now that instead of `{url=96473B4F,id=1}`, which is ambiguous about which URL we are referring to, it could be something like:
`{reply_to=[URL_HASH]_[TWT_ID]}` / `{reply_to=96473B4F_1}`
That way, the "full twt ID" could be `96473B4F_1`.
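A small sketch of how the `reply_to` form could be produced and parsed. Everything beyond "last 8 characters of the SHA256" is an assumption: that the hashed input is the feed URL string encoded as UTF-8, that the ID is a decimal integer, and all function names.

```python
import hashlib
import re
from typing import Optional, Tuple


def url_hash(feed_url: str) -> str:
    """Last 8 hex characters of the SHA-256 of the feed URL, upper-cased."""
    return hashlib.sha256(feed_url.encode("utf-8")).hexdigest()[-8:].upper()


def full_twt_id(feed_url: str, twt_id: int) -> str:
    """Combine URL hash and per-feed twt ID, e.g. '96473B4F_1'."""
    return f"{url_hash(feed_url)}_{twt_id}"


REPLY_TO_RE = re.compile(r"\{reply_to=(?P<url_hash>[0-9A-Fa-f]{8})_(?P<twt_id>\d+)\}")


def parse_reply_to(text: str) -> Optional[Tuple[str, int]]:
    """Return (URL hash, twt ID) from a '{reply_to=...}' marker, if present."""
    m = REPLY_TO_RE.search(text)
    if m is None:
        return None
    return m.group("url_hash").upper(), int(m.group("twt_id"))


print(parse_reply_to("{reply_to=96473B4F_1} looks good to me!"))
print(full_twt_id("https://example.com/twtxt.txt", 1))
```

Upper-casing on both sides means `96473b4f` and `96473B4F` compare equal, so a plain-text search for the full ID finds replies regardless of which casing a client writes.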