True. Though if the idea turns out to be better.. then community will adopt it.
if you look at the subject for that twt you will see that it uses the extended hash format to include a URL address.
@bmallred@staystrong.run I forgot one more effect of edits. If clients remember the read status of massages by hash, an edit will mark the updated message as unread again. To some degree that is even the right behavior, because the message was updated, so the user might want to have a look at the updated version. On the other hand, if itās just a small typo fix, itās maybe not worth to tell the user about. But the client doesnāt know, at least not with additional logic.
Having said that, it appears that this only affects me personally, noone else. I donāt know of any other client that saves read statuses. But donāt worry about me, all good. Just keep doing what youāve done so far. I wanted to mention that only for the sake of completeness. :-)
I like this syntax, you have my vote, although Iād change it a bit like
#<Alice https://example.com/twtxt.com#2024-12-18T14:18:26+01:00>
Hashes are not a problem on PHP, I dont know why itās slow to calculate them from your side, but I agree with your points.
BTW, did you have the chance to read my proposal on twtxt 2.0? I shared a few ideas about possible improvements to discuss:
https://text.eapl.mx/a-few-ideas-for-a-next-twtxt-version
https://text.eapl.mx/reply-to-lyse-about-twtxt
@bmallred@staystrong.run Any edit automatically changes the twt hash, because the hash is built over the hash URL, message timestamp and message text. https://twtxt.dev/exts/twt-hash.html So, it is only a problem, if somebody replied to your original message with the old hash. The original message suddenly doesnāt exist anymore and the reply becomes detached, orphaned, whatever you wanna call it. Threading doesnāt break, though, if nobody replied to your message.
@andros@twtxt.andros.dev I believe you have just reproduced the bug⦠it looks like youāve replayed to a twt but the hash is wrong. I can see the hash here from Jenny, but it doesnāt look like it corresponds to any{twt,thing}. if you check it out on any yarn instance it wonāt look like a replay.
My hypothesis about that thing breaking my twts is that it might have something to do with the parenthesis surrounding the root twt hash in the replay twt-A
when I replay to it with fork-twt-B
; I imagine elisp interpreting those as a s-expression thus breaking the generation precess of hash (#twt-A) before prepending it to for-twt-B
⦠but then Iām too ignorant to figure out how to test my theory (heck I couldnāt even recalculate the hashes myself correctly in bash xD). Iāll keep trying tho.
@andros@twtxt.andros.dev yes, that usually happens when twts get edited and we just made a gentlemen agreement to avoid edits as much as possible (at least for the time being). But the thing is, That is not whatās happening with my broken twtsā hashes. Since Iāve bee mostly replaying to my own twts as a test and I know for sure that I havenāt edited any. (I usually fork-replay instead of edit a twt when needed)
@aelaraji@aelaraji.com Can you give me examples of hashes that you have detected wrong between Emacs client and twtxt.net?
Perhaps there is some character, some space, that is creating the discrepancy.
@prologic@twtxt.net Agreed! But clients can hallucinate and generate wrong hashes
aka Lies
𤣠Also, If you chheck your own twt on twtxt.net, it looks like a root twt instead of a replay.
the hash in that test replay should have been s243lua
instead š¤·
@prologic@twtxt.net Are you sure? xD ⦠it was supposed to be a replay to another twt, but the twt hash is wrong (I think).
@andros@twtxt.andros.dev is it me or twtxt-el generates a wrong twt hash when I use the [ ā³ Reply to twt ]
button?
@prologic@twtxt.net Of course you donāt notice it when yarnd only shows at most the last n messages of a feed. As an example, check out mckinleyās message from 2023-01-09T22:42:37Z. It has ā[Scheduled][Scheduled][Scheduled]ā⦠in it. This text in square brackets is repeated numerous times. If you search his feed for closing square bracket followed by an opening square bracket (][
) you will find a bunch more of these. It goes without question he never typed that in his feed. My client saves each twt hash Iāve explicitly marked read. A few days ago, I got plenty of apparently years old, yet suddenly unread messages. Each and every single one of them containing this repeated bracketed text thing. The only conclusion is that something messed up the feed again.
Definitely something going on with replies. This one was replying to the wrong twt and even when I got clever and pasted the right hash it didnāt work.
Iām realizing that my performance bottleneck is @prologic@twtxt.net ! It is actually calculating the hash to make the replicas, and specifically users with very long feeds š . Iām seriously thinking about enabling replies via configuration.
Well, thatās another bug: The search https://twtxt.net/search?q=%22LOOOOL%2C+great+programming+tutorial+music%22 yields the wrong hash. It should have been poyndha instead.
@lyse@lyse.isobeef.org Danke! Ja, es gibt noch unzƤhlige Stellschrauben an dem Ding. Deine Anmerkungen werde ich einarbeiten. Eine mobile Ansicht wƤr auch noch schƶn. Derzeit sitzt es auf dem Smartphone doch noch recht stramm.
@Unterhaltungen: Die von gestern zu verschlüsselten Nachrichten war ausschlaggebend für die Umsetzung. In āTimelineā und āYarnā haben mich die LƶsungsansƤtze bisher nicht überzeugt. Aber wir kƶnnen ja alle etwas von einander lernen.
another one would be to allow changing public keys over time (as it may be a good practice [0]
). A syntax like the following could help to know what public key you used to encrypt the message, and which private key the client should use to decrypt it:
!<nick url> <encrypted_message> <public_key_hash_7_chars>
Also Iād remove support for storing the message as hex, only allowing base64 (more compact, aiming for a minimalistic spec, etc.)
Or using the same twt hash method, but only for the URL, to generate the nick, if it doesnāt exist, like so, @5vxo4ia@twtxt.net
@doesnmppsflt@doesnm.p.psf.lt Not sure which bug youāre referring to. š¤ (Did I forget?)
Those long IDs like (#113797927355322708) are simply part of that feed. Looks like the author just dumps ActivityPub IDs into twtxt. I think this used to work in the past, but the corresponding spec (https://twtxt.dev/exts/hash-tag.html) has been deprecated and jenny doesnāt support ā actually, jenny never supported that.
jenny can only group threads by exactly one criterium (because it writes a Message-ID
into the mail file) and thatās the regular twt hash. So, anything else, like people doing ā#CoolTopicā, isnāt possible.
Iām still making progress with the Emacs client. Iām proud to say that the code that is responsible for reading the feeds is almost finished, including: Twt Hash Extension, Twt Subject Extension, Multiline Extension and Metadata Extension. Iām fine-tuning some tests and will soon do the first buffer that displays the twts.
Added TwtHash hashes to every message on my personal Twtxt HTML renderer. Code is not yet ready for prime-time. Need to work out some kinks still.
@aelaraji@aelaraji.com You could use https://lyse.isobeef.org/tmp/twthash.py to generate twt hashes. I cobbled that together in order to generate test data for my client.
@bender@twtxt.net So turns out something is setting my HashingURI to the value {{ .Profile.URI }}
and that is making my hashes wrong so it cannot delete or edit twts.
āDesigned by Apple in Californiaā stopped being sold in 2019. It sold for $199, and $299 (two sizes). Check the eBay pricing on that link, if you want to know the selling price today.
Righto, @eapl.me@eapl.me, ta for the writeup. Here we go. :-)
Metadata on individual twts are too much for me. I do like the simplicity of the current spec. But I understand where youāre coming from.
Numbering twts in a feed is basically the attempt of generating message IDs. Itās an interesting idea, but I reckon it is not even needed. Iād simply use location based addressing (feed URL + ā#ā + timestamp) instead of content addressing. If one really wanted to, one could hash the feed URL and timestamp, but the raw form would actually improve disoverability and would not even require a richer client. But the majority of twtxt users in the last poll wanted to stick with content addressing.
yarnd actually sends If-Modified-Since
request headers. Not only can I observe heaps of 304 responses for yarnds in my access log, but in Cache.FetchFeeds(ā¦)
we can actually see If-Modified-Since
being deployed when the feed has been retrieved with a Last-Modified
response header before: https://git.mills.io/yarnsocial/yarn/src/commit/98eee5124ae425deb825fb5f8788a0773ec5bdd0/internal/cache.go#L1278
Turns out etags with If-None-Match
are only supported when yarnd serves avatars (https://git.mills.io/yarnsocial/yarn/src/commit/98eee5124ae425deb825fb5f8788a0773ec5bdd0/internal/handlers.go#L158) and media uploads (https://git.mills.io/yarnsocial/yarn/src/commit/98eee5124ae425deb825fb5f8788a0773ec5bdd0/internal/media_handlers.go#L71). However, it ignores possible etags when fetching feeds.
I donāt understand how the discovery URLs should work to replace the User-Agent
header in HTTP(S) requests. Do you mind to elaborate?
Different protocols are basically just a client thing.
I reckon itās best to just avoid mixing several languages in one feed in the first place. Personally, I find it okay to occasionally write messages in other languages, but if that happens on a more regularly basis, Iād definitely create a different feed for other languages.
Isnāt the emoji thing ājustā a client feature? So, feed do not even have to state any emojis. As a user Iād configure my client to use a certain symbol for feed ABC. Currently, I can do a similar thing in tt
where I assign colors to feeds. On the other hand, what if a user wants to control what symbol should be displayed, similar to the feedās nick? Hmm. But still, my terminal font doesnāt even render most of emojis. So, Unicode boxes everywhere. This makes me think it should actually be a only client feature.
similar to data packets in NDN, each message has multiple names. a true name, which is an encoded cryptographic hash of the file itself. we call this kind of information self-certifying. given a true name, you can find a file and verify its integrity. additionally, agents can associate a self-certifying name with a pet name or subjective label of their choosing and share it with their friends/peers. zokoās triangle can suck it. gemini://sunshinegardens.org/~xjix/wiki/cryptogenāspecification/
@movq@www.uninformativ.de iām sorry if I sound too contrarian. Iām not a fan of using an obscure hash as well. The problem is that of future and backward compatibility. If we change to sha256 or another we donāt just need to support sha256. But need to now support both sha256 AND blake2b. Or we devide the community. Users of some clients will still use the old algorithm and get left behind.
Really we should all think hard about how changes will break things and if those breakages are acceptable.
I share I did write up an algorithm for it at some point I think it is lost in a git comment someplace. Iāll put together a pseudo/go code this week.
Super simple:
Making a reply:
- If yarn has one use that. (Maybe do collision check?)
- Make hash of twt raw no truncation.
- Check local cache for shortest without collision
- in SQL:
select len(subject) where head_full_hash like subject || '%'
- in SQL:
Threading:
- Get full hash of head twt
- Search for twts
- in SQL:
head_full_hash like subject || '%' and created_on > head_timestamp
- in SQL:
The assumption being replies will be for the most recent head. If replying to an older one it will use a longer hash.
If we stuck with Blake2b for Twt Hash(es); what do we think we need to reasonably go to in bit length/size?
=> https://gist.mills.io/prologic/194993e7db04498fa0e8d00a528f7be6
e.g: (turns out @xuu@txt.sour.is is right about Blak2b being easy/simple too!):
$ printf "%s\t%s\t%s" "https://example.com/twtxt.txt" "2024-09-29T13:30:00Z" "Hello World!" | b2sum -l 32 -t | awk '{ print $1 }'
7b8b79dd
These collisions arenāt important unless someone tries to fork. So.. for the vast majority its not a big deal. Using the grow hash algorithm could inform the client to add another char when they fork.
Thanks @david@collantes.us, good to know, but we need to agree on what character we use, otherwise the hashes will not be the same:)
Ok, i know how to command working (not sure), but seems it only grab from cache. Maybe make fetch from twtxt.net if hash not found?
@prologic@twtxt.net Regarding the new way of generating twt-hashes, to me it makes more sense to use tabs as separator instead of spaces, since the you can just copy/past a line directly from a twtxt-file that already go a tab between timestamp and message. But tabs might be hard to ātypeā when you are in a terminal, since it will activate autocompleteā¦š¤
Another thing, it seems that you sugget we only use the domain in the hash-creation and not the full path to the twtxt.txt
$ echo -e "https://example.com 2024-09-29T13:30:00Z Hello World!" | sha256sum - | awk '{ print $1 }' | base64 | head -c 12
@doesnm@doesnm.p.psf.lt twt
probably isnāt the best client Iām afraid. It doesnāt really cache twts by their key (hash) to display threads properly. Jenny however does š
twet display twts in raw format with some formatting (sadly no newlines). And for reply messages i just seen (#hash). But which text hidden on hash? currenly im open twtxt.net/twt/hash to see this
š Thanks for joining us on our Sept monthly Yarn.social meetup today yāall šāāļø We had @david@collantes.us @sorenpeter@darch.dk @doesnm@doesnm.p.psf.lt @falsifian@www.falsifian.org and @xuu@txt.sour.is šŖ Nice turn out! (not all at once of course, as we normally run this over 4 hours as we span many time zones!)
Things we talked about:
- Decentralised vs. Distributed
- Use of SHA256 for Twt Hash(es)
- We solved Edits! š„³
- UUID(s) probably wonāt work! (susceptible to sppofing)
- Helped @sorenpeter@darch.dk write some PHP to process/parse
User-Agent
and service his feed via a custom PHP script š
- @falsifian@www.falsifian.org introduced himself š
- Talked about Merkle Trees š³
Did I miss anything? š¤
More thoughts about changes to twtxt (as if we havenāt had enough thoughts):
- There are lots of great ideas here! Is there a benefit to putting them all into one document? Seems to me this could more easily be a bunch of separate efforts that can progress at their own pace:
1a. Better and longer hashes.
1b. New possibly-controversial ideas like edit: and delete: and location-based references as an alternative to hashes.
1c. Best practices, e.g. Content-Type: text/plain; charset=utf-8
1d. Stuff already described at dev.twtxt.net that doesnāt need any changes.
We wonāt know what will and wonāt work until we try them. So Iām inclined to think of this as a bunch of draft ideas. Maybe later when weāve seen it play out it could make sense to define a group of recommended twtxt extensions and give them a name.
Another reason for 1 (above) is: I like the current situation where all you need to get started is these two short and simple documents:
https://twtxt.readthedocs.io/en/latest/user/twtxtfile.html
https://twtxt.readthedocs.io/en/latest/user/discoverability.html
and everything else is an extension for anyone interested. (Deprecating non-UTC times seems reasonable to me, though.) Having a big long ātwtxt v2ā document seems less inviting to people looking for something simple. (@prologic@twtxt.net you mentioned an anonymous comment āyouāve ruined twtxtā and while I donāt completely agree with that commenterās sentiment, I would feel like twtxt had lost something if it moved away from having a super-simple core.)All that being said, these are just my opinions, and Iām not doing the work of writing software or drafting proposals. Maybe I will at some point, but until then, if youāre actually implementing things, youāre in charge of what you decide to make, and Iām grateful for the work.
@prologic@twtxt.net Done. Also, I went ahead and made two changes: changed hexadecimal to base64 for hashes (wasnāt sure if anyone objected), and changed āMUST follow the chainā to āSHOULD follow the chain.
We:
- Drop
# url=
from the spec.
- We donāt adopt
# uuid =
ā Something @anth@a.9srv.net also mentioned (see below)
We instead use the @nick@domain
to identify your feed in the first place and use that as the identify when calculating Twt hashes <id> + <timestamp> + <content>
. Now in an ideal world I also agree, use WebFinger for this and expect that for the most part youāll be doing a WebFinger lookup of @user@domain
to fetch someoneās feed in the first place.
The only problem with WebFinger is should this be mandated or a recommendation?
@prologic@twtxt.net a wise plan! Who knows, ideas change, and often plans do not hash, right? Mature, mature! :-)
Good writeup, @anth@a.9srv.net! I agree to most of your points.
3.2 Timestamps: I feel no need to mandate UTC. Timezones are fine with me. But I could also live with this new restriction. I fail to see, though, how this change would make things any easier compared to the original format.
3.4 Multi-Line Twts: What exactly do you think are bad things with multi-lines?
4.1 Hash Generation: I do like the idea with with a new uuid
metadata field! Any thoughts on two feeds selecting the same UUID for whatever reason? Well, the same could happen today with url
.
5.1 Reply to last & 5.2 More work to backtrack: I do not understand anything youāre saying. Can you rephrase that?
8.1 Metadata should be collected up front: I generally agree, but if the uuid
metadata field were a feed URL and no real UUID, there should be probably an exception to change the feed URL mid-file after relocation.
yarnd
does for example) and equally a 5x increase in on-disk storage as well. This is based on the Twt Hash going from a 13 bytes (content-addressing) to 63 bytes (on average for location-based addressing). There is roughly a ~20-150% increase in the size of individual feeds as well that needs to be taken into consideration (on the average case).
(#2024-09-24T12:44:35Z) There is a increase in space/memory for sure. But calculating the hashes also takes up CPU. Iām not good with that kind of math, but itās a tradeoff either way.
There is also a ~5x increase cost in memory utilization for any implementations or implementors that use or wish to use in-memory storage (yarnd
does for example) and equally a 5x increase in on-disk storage as well. This is based on the Twt Hash going from a 13 bytes (content-addressing) to 63 bytes (on average for location-based addressing). There is roughly a ~20-150% increase in the size of individual feeds as well that needs to be taken into consideration (on the average case).
So really your argument is just that switching to a location-based addressing ājust makes senseā. Why? Without concrete pros/cons of each approach this isnāt really a strong argument Iām afraid. In fact I probably need to just sit down and detail the properties of both approaches and the pros/cons of both.
I also donāt really buy the argument of simplicity either personally, because I donāt technically see it much more difficult to take a echo -e "<url>\t<timestamp>\t<content>" | sha256sum | base64
as the Twt Subject or concatenating the <url> <timestamp>
ā The āeffortā is the same. If weāre going to argue that SHA256 or cryptographic hashes are ātoo complicatedā then Iām not really sure how to support that argument.
@falsifian@www.falsifian.org I believe the preserve means to include the original subject hash in the start of the twt such as (#somehash)
Sorry, youāre right, I should have used numbers!
Iām donāt understand what āpreserve the original hashā could mean other than āmake sure thereās still a twt in the feed with that hashā. Maybe the text could be clarified somehow.
Iām also not sure what you mean by markdown already being part of it. Of course people can already use Markdown, just like presumably nothing stopped people from using (twt subjects) before they were formally described. But itās not universal; e.g. as a jenny user I just see the plain text.
@prologic@twtxt.net Thanks for writing that up!
I hope it can remain a living document (or sequence of draft revisions) for a good long time while we figure out how this stuff works in practice.
I am not sure how I feel about all this being done at once, vs. letting conventions arise.
For example, even today I could reply to twt abc1234 with ā(#abc1234) Edit: ā¦ā and I think all you humans would understand it as an edit to (#abc1234). Maybe eventually it would become a common enough convention that clients would start to support it explicitly.
Similarly we could just start using 11-digit hashes. We should iron out whether itās sha256 or whatever but thereās no need get all the other stuff right at the same time.
I have similar thoughts about how some users could try out location-based replies in a backward-compatible way (append the replyto: stuff after the legacy (#hash) style).
However I recognize that Iām not the one implementing this stuff, and itās less work to just have everything determined up front.
Misc comments (I havenāt read the whole thing):
Did you mean to make hashes hexadecimal? You lose 11 bits that way compared to base32. Iād suggest gaining 11 bits with base64 instead.
āClients MUST preserve the original hashā ā do you mean they MUST preserve the original twt?
Thanks for phrasing the bit about deletions so neutrally.
I donāt like the MUST in āClients MUST follow the chain of reply-to referencesā¦ā. If someone writes a client as a 40-line shell script that requires the user to piece together the threading themselves, IMO we shouldnāt declare the client non-conforming just because they didnāt get to all the bells and whistles.
Similarly I donāt like the MUST for user agents. For one thing, you might want to fetch a feed without revealing your identty. Also, it raises the bar for a minimal implementation (Iām again thinking again of the 40-line shell script).
For āwho followsā lists: why must the long, random tokens be only valid for a limited time? Do you have a scenario in mind where they could leak?
Why canāt feeds be served over HTTP/1.0? Again, thinking about simple software. I recently tried implementing HTTP/1.1 and it wasnāt too bad, but 1.0 would have been slightly simpler.
Why get into the nitty-gritty about caching headers? This seems like generic advice for HTTP servers and clients.
Iām a little sad about other protocols being not recommended.
I donāt know how I feel about including markdown. I donāt mind too much that yarn users emit twts full of markdown, but Iām more of a plain text kind of person. Also it adds to the length. I wonder if putting a separate document would make more sense; that would also help with the length.
Had to build a list of all feeds (that I follow) and all twts in them and there are two collisions already:
$ ./stats
Saw 58263 hashes
7fqcxaa
https://twtxt.net/user/justamoment/twtxt.txt
https://twtxt.net/user/prologic/twtxt.txt
ntnakqa
https://twtxt.net/user/prologic/twtxt.txt
https://twtxt.net/user/thecanine/twtxt.txt
Namely:
$ jenny -D https://twtxt.net/user/justamoment/twtxt.txt | grep 7fqcxaa
[7fqcxaa] [2022-12-28 04:53:30+00:00] [(#pmuqoca) @prologic@twtxt.net I checked the GitHub discussion, it became a request to join forces.
Do you plan on having them join?
Also for the name, how about:
- āprogitā or āprologitā (prologic official hard fork)
- āgit-stanceā (git instance)
- āGitTreeā (Gitea inspired, maybe to related)
- āGitomataā (git automata)
- āGit.Sourceā
- āForgorā (forgit is taken so I forgor) š¤£
- āSweetGitā (as salty chat)
- āPepper Gitā (other ingredients) š
- āGitHeartā (core of git with a GitHub sounding name)
- āGitTakaā (With music in mind)
Ok, enough fun⦠Hope this helps sprout some ideas from others if nothing is to your taste.]
$ jenny -D https://twtxt.net/user/prologic/twtxt.txt/5 | grep 7fqcxaa
[7fqcxaa] [2022-02-25 21:14:45+00:00] [(#bqq6fxq) Itās handled by blue Monday]
And:
$ jenny -D https://twtxt.net/user/thecanine/twtxt.txt | grep ntnakqa
[ntnakqa] [2022-01-23 10:24:09+00:00] [(#2wh7r4q) <a href="https://yarn.girlonthemoon.xyz/external?uri=https://twtxt.net/user/prologic/twtxt.txt">@prologic<em>@twtxt.net</em></a> I know, I was just hoping it might have also gotten fixed by that change, by some kind of backend miracles. š]
$ jenny -D https://twtxt.net/user/prologic/twtxt.txt/1 | grep ntnakqa
[ntnakqa] [2024-02-27 05:51:50+00:00] [(#otuupfq) <a href="https://yarn.girlonthemoon.xyz/external?uri=https://twtxt.net/user/shreyan/twtxt.txt">@shreyan<em>@twtxt.net</em></a> Ahh š]