👋 Thanks for joining us on our Sept monthly Yarn.social meetup today y’all 🙇♂️ We had @david@collantes.us @sorenpeter@darch.dk @doesnm@doesnm.p.psf.lt @falsifian@www.falsifian.org and @xuu@txt.sour.is 💪 Nice turnout! (not all at once of course, as we normally run this over 4 hours since we span many time zones!)
Things we talked about:
- Decentralised vs. Distributed
- Use of SHA256 for Twt Hash(es)
- We solved Edits! 🥳
- UUID(s) probably won’t work! (susceptible to spoofing)
- Helped @sorenpeter@darch.dk write some PHP to process/parse User-Agent and serve his feed via a custom PHP script 😅
- @falsifian@www.falsifian.org introduced himself 👌
- Talked about Merkle Trees 🌳
Did I miss anything? 🤔
@lyse@lyse.isobeef.org It’s from 12pm to 4pm UTC so if you can make it at all, that’d be great 👍
@doesnm@doesnm.p.psf.lt Are you sure? Not seen the mail yet…
@doesnm@doesnm.p.psf.lt Ooops you might want to re-send that to james instead 🤣
@doesnm@doesnm.p.psf.lt My Salty public key is:
kex1fhxntuc0av7q48hlfj970ve297dzzghn82wp5cahr9r92y8rlrqqtwp983
@doesnm@doesnm.p.psf.lt Do you have a sample Caddy log file you can supply? I’ll see if we can improve the tool 👌
@doesnm@doesnm.p.psf.lt For a sample access log? Which tool are you using?
@doesnm@doesnm.p.psf.lt I couldn’t find any references to this anywhere either.
@doesnm@doesnm.p.psf.lt Like now?
We:
- Drop # url= from the spec.
- Don’t adopt # uuid = – Something @anth@a.9srv.net also mentioned (see below)
We instead use the @nick@domain to identify your feed in the first place and use that as the identity when calculating Twt hashes: <id> + <timestamp> + <content>. Now in an ideal world I also agree, use WebFinger for this and expect that for the most part you’ll be doing a WebFinger lookup of @user@domain to fetch someone’s feed in the first place.
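For anyone wondering what that lookup would look like, here is a minimal sketch. It assumes a standard WebFinger endpoint (RFC 7033) on the feed's domain; the function names are made up for illustration and this is not how yarnd or any other client actually resolves feeds.

```python
from urllib.parse import urlencode


def parse_mention(address: str) -> tuple[str, str]:
    """Split '@nick@domain' into (nick, domain)."""
    nick, sep, domain = address.lstrip("@").partition("@")
    if not sep or not nick or not domain:
        raise ValueError(f"not a @nick@domain address: {address!r}")
    return nick, domain


def webfinger_url(address: str) -> str:
    """Build the WebFinger query URL for a @nick@domain address."""
    nick, domain = parse_mention(address)
    # RFC 7033: the account is passed as an 'acct:' URI in the 'resource' parameter.
    query = urlencode({"resource": f"acct:{nick}@{domain}"})
    return f"https://{domain}/.well-known/webfinger?{query}"


if __name__ == "__main__":
    print(webfinger_url("@anth@a.9srv.net"))
```

The response would then need to carry a link to the actual feed URL, and how that link is labelled is exactly the kind of thing a spec would have to pin down.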
The only open question with WebFinger is whether it should be mandated or merely recommended.
Something @anth@a.9srv.net said on IRC
17:42 I should also note in there that it doesn’t address the two things i really want it to: mandate utf-8 (which should be easy to fit in) and something for better @ mentions.
I actually agree on both counts and it got me thinking…
you’ve ruined twtxt
Not sure what to say here. 🤔
Sharing the comments of the poll (anonymous so I have no idea whom the comments are from):
your poll should include questions about markdown. personally i think inline bits like style, links, images are yes. block quotes, code blocks, bullet lists are mid. but tables and footnotes are no.
Yes sorry about this, I wasn’t able to change much after publishing the poll 😅
Gemini and Gopher Twtxt feeds each account for less than 1% of the feeds in existence (about 1% combined):
$ total=$(inspect-db yarns.db | jq -r '.Value.URL' | awk -F'//' '{if ($1 ~ /^https?/) print "http/https:"; else print $1}' | sort | uniq -c | awk '{sum+=$1} END {print sum}'); inspect-db yarns.db | jq -r '.Value.URL' | awk -F'//' '{if ($1 ~ /^https?/) print "http/https:"; else print $1}' | sort | uniq -c | awk -v total="$total" '{printf "%d %s %.2f%%\n", $1, $2, ($1/total)*100}' | sort -r
7 gemini: 0.66%
4 gopher: 0.38%
1046 http/https: 98.96%
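That one-liner is a bit dense, so here is a readable sketch of the same counting logic. It assumes you already have the feed URLs as a list of strings; `scheme_breakdown` is a hypothetical helper, not part of inspect-db or yarnd.

```python
from collections import Counter


def scheme_breakdown(urls):
    """Count feed URLs per scheme, folding http and https together.

    Returns {scheme: (count, percentage_of_total)}.
    """
    counts = Counter()
    for url in urls:
        scheme = url.split("//", 1)[0]  # e.g. "https:" or "gemini:"
        if scheme.startswith("http"):
            scheme = "http/https:"
        counts[scheme] += 1
    total = sum(counts.values())
    return {s: (n, 100 * n / total) for s, n in counts.items()}


if __name__ == "__main__":
    # Placeholder URLs that reproduce the counts from the output above.
    urls = ["gemini://a"] * 7 + ["gopher://b"] * 4 + ["https://c"] * 1046
    for scheme, (n, pct) in sorted(scheme_breakdown(urls).items()):
        print(f"{n} {scheme} {pct:.2f}%")
```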
@bender@twtxt.net Re that broken thread (#bqor23a). It’s the same one. My pod doesn’t have the Root Twt: https://twtxt.net/twt/bqor23a => 404 Not Found.
How in the hell did you even reply to this in the first place?
@david@collantes.us SQLite
“For every complex problem, there is a solution that is clear, simple, and wrong.”
– H.L. Mencken
“Everything should be made as simple as possible, but not simpler.”
– Albert Einstein
The beauty of simplicity lies in not losing the essence.
Starting a couple of new projects (geez where do I find the time?!):
HomeTunnel:
HomeTunnel is a self-hosted solution that combines secure tunneling, proxying, and automation to create your own private cloud. Utilizing Wireguard for VPN, Caddy for reverse proxying, and Traefik for service routing, HomeTunnel allows you to securely expose your home network services (such as Gitea, Poste.io, etc.) to the Internet. With seamless automation and on-demand TLS, HomeTunnel gives you the power to manage your own cloud-like environment with the control and privacy of self-hosting.
CraneOps:
craneops is an open-source operator framework, written in Go, that allows self-hosters to automate the deployment and management of infrastructure and applications. Inspired by Kubernetes operators, CraneOps uses declarative YAML Custom Resource Definitions (CRDs) to manage Docker Swarm deployments on Proxmox VE clusters.
And finally the legibility of feeds when viewing them in their raw form is worsened as you go from a Twt Subject of (#abcdefg12345) to something like (https://twtxt.net/user/prologic/twtxt.txt 2024-09-22T07:51:16Z).
There is also a ~5x increase in memory cost for any implementations that use or wish to use in-memory storage (yarnd does for example) and equally a ~5x increase in on-disk storage as well. This is based on the Twt Hash going from 13 bytes (content-addressing) to 63 bytes (on average for location-based addressing). There is also a roughly 20-150% increase in the size of individual feeds that needs to be taken into consideration (in the average case).
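To make the ~5x figure concrete, here is a tiny sketch that just measures the two example subjects from this thread. The single location-based example here comes out slightly under the 63-byte average quoted above, since that number is an average across feeds.

```python
# Byte sizes of the two example references used in this discussion.
content_subject = "#abcdefg12345"
location_subject = "https://twtxt.net/user/prologic/twtxt.txt 2024-09-22T07:51:16Z"

content_bytes = len(content_subject.encode("utf-8"))
location_bytes = len(location_subject.encode("utf-8"))
ratio = location_bytes / content_bytes

if __name__ == "__main__":
    print(f"content-addressed:  {content_bytes} bytes")
    print(f"location-addressed: {location_bytes} bytes")
    print(f"ratio: {ratio:.1f}x")
```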
So really your argument is just that switching to a location-based addressing “just makes sense”. Why? Without concrete pros/cons of each approach this isn’t really a strong argument I’m afraid. In fact I probably need to just sit down and detail the properties of both approaches and the pros/cons of both.
I also don’t really buy the argument of simplicity either personally, because I don’t see it as technically much more difficult to take echo -e "<url>\t<timestamp>\t<content>" | sha256sum | base64 as the Twt Subject than to concatenate the <url> <timestamp> – The “effort” is the same. If we’re going to argue that SHA256 or cryptographic hashes are “too complicated” then I’m not really sure how to support that argument.
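To show how little separates the two in terms of effort, here is a rough sketch of both. It assumes tab-separated fields as in the shell pipeline above and a hex digest truncated to 11 characters; both are illustrative choices rather than anything the spec says, and the function names are made up.

```python
import hashlib


def content_subject(url: str, timestamp: str, content: str, length: int = 11) -> str:
    """Content-addressed subject: a short hash over url, timestamp and content."""
    payload = "\t".join((url, timestamp, content)).encode("utf-8")
    return "(#" + hashlib.sha256(payload).hexdigest()[:length] + ")"


def location_subject(url: str, timestamp: str) -> str:
    """Location-addressed subject: the feed URL and timestamp, concatenated."""
    return f"({url} {timestamp})"


if __name__ == "__main__":
    url = "https://example.com/twtxt.txt"
    ts = "2024-09-18T23:08:00+10:00"
    print(content_subject(url, ts, "Hello World"))
    print(location_subject(url, ts))
```

Either way it is one function call for the client; the difference is in what the resulting reference can guarantee, not in how hard it is to produce.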
@sorenpeter@darch.dk Points 2 & 3 aren’t really applicable to the discussion of the threading model I’m afraid. WebMentions is completely orthogonal to the discussion. Further, no-one that uses Twtxt really uses WebMentions; whilst yarnd supports WebMentions, it’s very rarely used in practice (if ever) – In fact I should just drop the feature entirely.
The use of WebSub OTOH is far more useful and is used by every single yarnd pod everywhere (not that there are that many around these days) to subscribe to feed updates in ~near real-time without having to poll constantly.
Can someone make the edit?
So I whipped up a quick shell script to demonstrate what I mean by the increase in feed size on average as well as the expected increase in storage and retrieval requirements.
$ ./compare.sh
Original file size: 28145 bytes
Modified file size: 70672 bytes
Percentage increase in file size: 151.10%
...
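The script itself isn't shown here, so the following is only a guess at the kind of comparison it does: swap each (#hash) subject for a (url timestamp) one and measure the growth. The regex, the lookup table and both function names are assumptions for illustration.

```python
import re

# Matches a hash-style Twt Subject such as "(#abcdefg)".
SUBJECT = re.compile(r"\(#([a-z0-9]+)\)")


def to_location_subjects(feed: str, index: dict[str, tuple[str, str]]) -> str:
    """Rewrite (#hash) subjects as (url timestamp) using a hash -> (url, timestamp) index.

    Subjects whose hash is not in the index are left untouched.
    """
    def swap(match: re.Match) -> str:
        target = index.get(match.group(1))
        if target is None:
            return match.group(0)
        url, timestamp = target
        return f"({url} {timestamp})"

    return SUBJECT.sub(swap, feed)


def percent_increase(original: int, modified: int) -> float:
    """Relative growth from original to modified, as a percentage."""
    return (modified - original) / original * 100


if __name__ == "__main__":
    # The two file sizes reported by ./compare.sh above.
    print(f"{percent_increase(28145, 70672):.2f}%")
```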
So in a location-based system, how exactly do I reply to one of these two Twts from @Yarns@search.twtxt.net ? 🤔
2024-09-07T12:55:56Z 🥳 NEW FEED: @<twtxt http://edsu.github.io/twtxt/twtxt.txt>
2024-09-07T12:55:56Z 🥳 NEW FEED: @<kdy https://twtxt.kdy.ch/twtxt.txt>
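The problem in code form: both Twts share a feed and a timestamp, so a (url, timestamp) reference cannot tell them apart, while a hash that includes the content can. This is a sketch only; the feed URL is a placeholder, and the SHA256 truncated to 11 hex characters is an assumption for illustration.

```python
import hashlib

FEED = "https://example.com/twtxt.txt"  # placeholder, not the real feed URL

twts = [
    ("2024-09-07T12:55:56Z", "🥳 NEW FEED: @<twtxt http://edsu.github.io/twtxt/twtxt.txt>"),
    ("2024-09-07T12:55:56Z", "🥳 NEW FEED: @<kdy https://twtxt.kdy.ch/twtxt.txt>"),
]


def location_key(url: str, timestamp: str) -> tuple[str, str]:
    """Location-based reference: feed URL plus timestamp."""
    return (url, timestamp)


def content_key(url: str, timestamp: str, content: str) -> str:
    """Content-based reference: short hash over url, timestamp and content."""
    payload = "\n".join((url, timestamp, content)).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()[:11]


location_keys = {location_key(FEED, ts) for ts, _ in twts}
content_keys = {content_key(FEED, ts, text) for ts, text in twts}

if __name__ == "__main__":
    print(len(location_keys), "distinct location-based reference(s)")
    print(len(content_keys), "distinct content-based reference(s)")
```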
Okay folks, I’ve spent all day on this today, and I think it’s in “good enough”™ shape to share:
Twtxt v2:
- Specification: https://docs.mills.io/uJXuisaYTRWYDrl8A2jADg?both
- implementation: https://gist.mills.io/prologic/afdec15443da4d7aa898f383f171ec1b
LOL 😂 Not only have I tried to write up a full Twtxt v2 specification, I’ve also written a Bash shell script that implements the new spec 😅
👋 Reminder folks of the upcoming Yarn.social monthly online meetup:
I hope to see @david@collantes.us @movq@www.uninformativ.de @lyse@lyse.isobeef.org @xuu@txt.sour.is @sorenpeter@darch.dk and hopefully others too @aelaraji@aelaraji.com @falsifian@www.falsifian.org and anyone else that sees this! 🙏 We’re hopefully going to primarily discuss the future of Twtxt and the last few weeks of discussions 🤣
- Event: Yarn.social Online Meetup
- When: 28th September 2024 at 12:00pm UTC (midday)
- Where: Mills Meet : Yarn.social
- Cadence: 4th Saturday of every Month
Agenda:
- Let’s talk about the upcoming changes to the Twtxt spec(s)
- See #xgghhnq
@aelaraji@aelaraji.com This is one of the reasons why yarnd has a couple of settings with some sensible/sane defaults:
I could already imagine a couple of extreme cases where, somewhere, in this peaceful world one’s exercise of freedom of speech could get them in Real trouble (if not danger) if found out, it wouldn’t necessarily have to involve something to do with Law or legal authorities. So, If someone asks, and maybe fearing for… let’s just say ‘Their well being’, would it hurt if a pod just purged their content if it’s serving it publicly (maybe relay the info to other pods) and call it a day? It doesn’t have to be about some law/convention somewhere … 🤷 I know! Too extreme, but I’ve seen news of people who’d gone to jail or got their lives ruined for as little as a silly joke. And it doesn’t even have to be about any of this.
There are two settings:
$ ./yarnd --help 2>&1 | grep max-cache
--max-cache-fetchers int set maximum numnber of fetchers to use for feed cache updates (default 10)
-I, --max-cache-items int maximum cache items (per feed source) of cached twts in memory (default 150)
-C, --max-cache-ttl duration maximum cache ttl (time-to-live) of cached twts in memory (default 336h0m0s)
So yarnd pods by default are designed to only keep Twts publicly visible on either the anonymous Frontpage or Discover View or your Timeline or the feed’s Timeline for up to 2 weeks with a maximum of 150 items, whichever is exceeded first. Any Twts over this are considered “old” and drop off the active cache.
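Roughly, the rule works like the sketch below. This is not yarnd's actual code (yarnd is written in Go); it only assumes the two defaults shown in the help output above, 150 items per feed and a TTL of 336h, and `prune` is a hypothetical name.

```python
from datetime import datetime, timedelta, timezone

MAX_CACHE_ITEMS = 150                 # --max-cache-items default
MAX_CACHE_TTL = timedelta(hours=336)  # --max-cache-ttl default, i.e. 2 weeks


def prune(twts, now, max_items=MAX_CACHE_ITEMS, max_ttl=MAX_CACHE_TTL):
    """Return the twts of one feed that stay in the active cache, newest first.

    twts is a list of (created, text) tuples where created is an aware datetime.
    """
    # Drop anything older than the TTL, then cap what is left at max_items.
    fresh = [twt for twt in twts if now - twt[0] <= max_ttl]
    fresh.sort(key=lambda twt: twt[0], reverse=True)
    return fresh[:max_items]


if __name__ == "__main__":
    now = datetime(2024, 9, 22, tzinfo=timezone.utc)
    # One twt every two hours, going back more than two weeks.
    feed = [(now - timedelta(hours=h), f"twt {h}") for h in range(0, 400, 2)]
    print(len(prune(feed, now)), "of", len(feed), "twts kept")
```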
It’s a feature that my old man @off_grid_living@twtxt.net was very strongly in support of, as was I back in the day of yarnd’s design (nothing particularly to do with Twtxt per se) that I’ve to this day stuck by – Even though there are some 😉 that have different views on this 🤣
Bahahahaha very clever @lyse@lyse.isobeef.org I look forward to reading your report ! 🤣 However…
$ yarnc debug https://twtxt.net/user/prologic/twtxt.txt | grep -E '^pqst4ea' | tee | wc -l
0
I very quickly proved that Twt was never from me 🤣
Yeah I’m curious to find out too beyond just hearsay. But regardless of whether we should or shouldn’t care about this or should or shouldn’t comply: we should IMO. I’d hate to build something that horrendously violates someone’s rights in another country.
@falsifian@www.falsifian.org Do you have specifics about the GDPR law about this?
Would the GDPR apply to a one-person client like jenny? I seriously hope not. If someone asks me to delete an email they sent me, I don’t think I have to honour that request, no matter how European they are.
I’m not sure myself now. So let’s find out whether parts of the GDPR actually apply to a truly decentralised system? 🤔
👋 Reminder that next Saturday 28th September will be our monthly online meetup! Hope to see some/all of you there 👌
@lyse@lyse.isobeef.org I don’t think this is true.
Can I get someone like maybe @xuu@txt.sour.is or @abucci@anthony.buc.ci or even @eldersnake@we.loveprivacy.club – If you have some spare time – to test this yarnd PR that upgrades the Bitcask dependency for its internal database to v2? 🙏
VERY IMPORTANT If you do; Please Please Please back up your yarn.db database first! 😅 Heaven knows I don’t want to be responsible for fucking up a production database here or there 🤣
Location Addressing is fine in smaller or single systems. But when you’re talking about large decentralised systems with no single point of control (kind of the point), things like independently verifiable integrity become quite important.
Speaking of AI tech (sorry!); Just came across this really cool tool built by some engineers at Google™ (currently completely free to use without any signup) called NotebookLM 👌 Looks really good for summarizing and talking to documents 📃
An alternate idea for supporting (properly) Twt Edits is to denote them as such and extend the meaning of a Twt Subject (which would need to be called something better?); For example, let’s say I produced the following Twt:
2024-09-18T23:08:00+10:00 Hllo World
And my feed’s URI is https://example.com/twtxt.txt. The hash for this Twt is therefore 229d24612a2:
$ echo -n "https://example.com/twtxt.txt\n2024-09-18T23:08:00+10:00\nHllo World" | sha1sum | head -c 11
229d24612a2
You wish to correct your mistake, so you make an amendment to that Twt like so:
2024-09-18T23:10:43+10:00 (edit:#229d24612a2) Hello World
Which would then have a new Twt hash value of 026d77e03fa:
$ echo -n "https://example.com/twtxt.txt\n2024-09-18T23:10:43+10:00\nHello World" | sha1sum | head -c 11
026d77e03fa
Clients would then take this edit:#229d24612a2 to mean, this Twt is an edit of 229d24612a2 and should be replaced in the client’s cache, or indicated as such to the user that this is the intended content.
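On the client side, handling this could look something like the sketch below. It assumes the hash is the first 11 hex characters of a SHA-1 over url, timestamp and content joined by newlines (as in the commands above, with the edit subject left out of the hashed content) and that the cache is a plain dict. The names are hypothetical and no client implements this today.

```python
import hashlib
import re

# Matches a leading edit subject such as "(edit:#229d24612a2) ".
EDIT = re.compile(r"^\(edit:#([0-9a-f]+)\)\s*")


def twt_hash(url: str, timestamp: str, content: str) -> str:
    """First 11 hex characters of SHA-1 over url, timestamp and content."""
    payload = "\n".join((url, timestamp, content)).encode("utf-8")
    return hashlib.sha1(payload).hexdigest()[:11]


def apply_twt(cache: dict, url: str, timestamp: str, text: str) -> str:
    """Store a twt in cache (hash -> content), replacing the twt it edits, if any."""
    match = EDIT.match(text)
    content = text[match.end():] if match else text
    new_hash = twt_hash(url, timestamp, content)
    if match:
        # Drop the edited twt if we have it; the edit replaces it.
        cache.pop(match.group(1), None)
    cache[new_hash] = content
    return new_hash


if __name__ == "__main__":
    cache = {}
    url = "https://example.com/twtxt.txt"
    first = apply_twt(cache, url, "2024-09-18T23:08:00+10:00", "Hllo World")
    apply_twt(cache, url, "2024-09-18T23:10:43+10:00", f"(edit:#{first}) Hello World")
    print(cache)
```

A real client would probably also want to check that the edit comes from the same feed as the Twt it replaces, otherwise anyone could "edit" anyone else's Twts.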
@quark@ferengi.one My money is on a SHA1SUM hash encoding to keep things much simpler:
$ echo -n "https://twtxt.net/user/prologic/twtxt.txt\n2020-07-18T12:39:52Z\nHello World! 😊" | sha1sum | head -c 11
87fd9b0ae4e
Taking the last n characters of a base32 encoded hash instead of the first n can be problematic for several reasons:
Hash Structure: A cryptographic hash spreads its entropy evenly across the raw output bits, so the digest itself has no weak end. The encoding can change that, though: the final characters of the encoded string may carry fewer bits than the rest.
Collision Resistance: When using hashes, the goal is to minimize the risk of collisions (different inputs producing the same output). Each base32 character should contribute 5 bits; a trailing character that carries fewer bits shrinks the effective hash space, increasing the likelihood of collisions.
Encoding Characteristics: Base32 pads its input to a multiple of 5 bits. A 256-bit digest encodes to 52 characters, and the last one carries only 1 bit of hash data, so it can only ever take two values.
Use Cases: In many applications (like generating unique identifiers), a truncated hash should keep as many bits as possible. Relying on the end of the encoded string gives up some of them and reduces the uniqueness of the generated identifiers.
In summary, using the first n characters preserves the full randomness and collision resistance of the hash, making it the safer choice in most cases.
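This is easy to check for yourself. The sketch below assumes a SHA256 digest and standard base32 with the padding stripped; it hashes a thousand different inputs and collects which characters show up in the first and last position.

```python
import base64
import hashlib


def base32_digest(data: bytes) -> str:
    """Base32-encode the SHA256 digest of data, without '=' padding."""
    encoded = base64.b32encode(hashlib.sha256(data).digest()).decode("ascii")
    return encoded.rstrip("=")


# Characters seen in the first and last position over 1000 different inputs.
first_chars = {base32_digest(str(i).encode())[0] for i in range(1000)}
last_chars = {base32_digest(str(i).encode())[-1] for i in range(1000)}

if __name__ == "__main__":
    print("distinct first characters:", len(first_chars))
    print("distinct last characters: ", sorted(last_chars))
```

So a hash made from the last n characters effectively wastes most of one character, whereas the first n characters use all of their bits.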
@quark@ferengi.one Do you mean something like this?
$ ./yarnc debug ~/Public/twtxt.txt | tail -n 1
kp4zitq 2024-09-08T02:08:45Z (#wsdbfna) @<aelaraji https://aelaraji.com/twtxt.txt> My work has this thing called "compressed work", where you can **buy** extra time off (_as much as 4 additional weeks_) per year. It comes out of your pay though, so it's not exactly a 4-day work week but it could be useful, just haven't tired it yet as I'm not entirely sure how it'll affect my net pay
So yeah no, whilst it technically works, neither jenny nor yarnd supports it very well. Only at a very basic level.
@bender@twtxt.net I should put the template that is used by default as a file in the repo. Look at the source for now and you’ll see 😅
Just that yarnd (at least) doesn’t support creating such a custom TwtSubject, but it will reply and respect and thread one if one was constructed.
@aelaraji@aelaraji.com I just added support for passing a custom template file via -T/--template in case you need a custom template 👌
prologic@JamessMacStudio
Wed Sep 18 01:27:29
~/Projects/yarnsocial/twtxt2html
(main) 130
$ ./twtxt2html --help
Usage: twtxt2html [options] FILE|URL
twtxt2html converts a twtxt feed to a static HTML page
-d, --debug enable debug logging
-l, --limit int limit number ot twts (default all) (default -1)
-n, --noreldate do now show twt relative dates
-r, --reverse reverse the order of twts (oldest first)
-T, --template string path to template file
-t, --title string title of generated page (default "Twtxt Feed")
-v, --version display version information
pflag: help requested
@aelaraji@aelaraji.com Btw, I’m also open to ideas for this tool and welcome any contributions 👌
This scheme also only supports threading off a specific Twt of someone’s feed. What if you’re not replying to anyone in particular?
@quark@ferengi.one We will fix this soon™ 🔜