I just built a poc search engine / crawler for Twtxt. I managed to crawl this pod (twtxt.net) and a couple of others (sorry @etux@twt.u53.us and @xuu@txt.sour.is I used your pods in the tests too!). So far so good. I might keep going with this and see what happens 😀
@lyse@lyse.isobeef.org @prologic@twtxt.net very curious… i worked on a very similar track. i built a spider that will trace off any follows =
comments and mentions from other users and came up with:
twters: 744
total: 52073
@prologic@twtxt.netd It is pretty basic, and depends on some local changes i am still working out on my branch.. https://gist.github.com/JonLundy/dc19028ec81eb4ad6af74c50255e7cee
@prologic@twtxt.net yeah it reads a seed file. I’m using mine. it scans for any mention links and then scans them recursively. it reads from http/s or gopher. i don’t have much of a db yet.. it just writes to disk the feed and checks modified dates.. but I will add a db that has hashs/mentions/subjects and such.
@prologic@twtxt.net the add function just scans recursivley everything.. but the idea is to just add and any new mentions then have a cron to update all known feeds
@prologic@twtxt.net sounds about right. I tend to try to build my own before pulling in libs. learn more that way. I was looking at using it as a way to build my twt mirroring idea. and testing the lex parser with a wide ranging corpus to find edge cases. (the pgp signed feeds for one)
@prologic@twtxt.net in theory shouldn’t need to let users add feeds.. if they get mentioned by a tracked feed they will get added automagically. on a pod it would just need to scan the twtxt feed to know about everyone.