prologic

twtxt.net

"Problems are Solved by Method" 🇦🇺👨‍💻👨‍🦯🏹♔ 🏓⚯ 👨‍👩‍👧‍👧🛥 -- James Mills (operator of twtxt.net / creator of Yarn.social 🧶)

Recent twts from prologic

For the time being… I’ve just blocked all of OpenAI’s bots. They (thankfully) publish a JSON endpoint of IP prefixes that you can use to block all OpenAI crawlers from reaching your server (in my case, I block them at the edge). Example:

proxy-1:~# curl -qs https://openai.com/gptbot.json | jq -r '.prefixes[].ipv4Prefix' | xargs -I{} ./block-ip.sh {}

Where block-ip.sh is simply:

#!/bin/sh

# Insert the deny rule at position 1 so it is matched before any existing allow rules.
ufw insert 1 deny from "$1" to any
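Because the pipeline feeds every line jq prints straight into ufw, a slightly more defensive variant of block-ip.sh can refuse anything that is not an IPv4 address or prefix (for example a literal null, which jq prints for an entry that has no ipv4Prefix field) and offer a dry-run mode. This is a sketch; the DRY_RUN variable and the is_ipv4_cidr/block_ip helper names are hypothetical additions, and the address check is purely syntactic.

```shell
#!/bin/sh
# block-ip.sh (hypothetical hardened variant): insert a ufw deny rule for one
# IPv4 address or CIDR prefix. DRY_RUN=1 prints the ufw command instead of running it.

# is_ipv4_cidr: succeed if $1 looks like a.b.c.d or a.b.c.d/0-32.
# Coarse syntactic check only; it does not reject octets above 255.
is_ipv4_cidr() {
  printf '%s\n' "$1" | grep -Eq '^([0-9]{1,3}\.){3}[0-9]{1,3}(/([0-9]|[12][0-9]|3[0-2]))?$'
}

block_ip() {
  if ! is_ipv4_cidr "$1"; then
    echo "refusing to block invalid prefix: $1" >&2
    return 1
  fi
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "ufw insert 1 deny from $1 to any"
  else
    # Position 1 puts the deny ahead of any existing allow rules.
    ufw insert 1 deny from "$1" to any
  fi
}

# With an argument (as in the xargs pipeline), block it; without one, only
# define the functions so the file can also be sourced.
if [ "$#" -gt 0 ]; then
  block_ip "$1"
fi
```

Running the original pipeline with DRY_RUN=1 exported first shows exactly which rules would be inserted before anything touches the firewall.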

In-reply-to » Any idea What's this "twtxtfeevalidator/0.0.1" UA about? I thought I could ask before throwing a 1000GB file at it 🪤 could it be the same 'xt' thing @lyse was talking about the other day?

@aelaraji@aelaraji.com Yes! 👏 This is exactly what it is! 🤣 I will of course soon™ be hosting this service, likely at validator.twtxt.net 😅😅

In-reply-to » How in da fuq do you actually make these fucking useless AI bots go away?

@kat Haha 🤣 If someone figures this out, please let me know 🙏🙏 – In the meantime, I’m going to very soon™ write a daemon that watches the audit log for repeated violations and adds the offending IPs to the network firewall.
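The core of such a daemon could be as small as counting violations per client IP and flagging the ones over a threshold. A minimal sketch, assuming one entry line per violation in the shape coraza-log-formatter prints (date, time, client IP, port, request…); the offenders name and the threshold are hypothetical, and the real audit log may need different parsing.

```shell
#!/bin/sh
# offenders THRESHOLD: read formatted audit-log lines on stdin and print each
# client IP with at least THRESHOLD violations, sorted.
# ASSUMPTION: entry lines start with a YYYY/MM/DD date and the client IP is the
# 3rd field, e.g. "2025/01/04 23:17:04 4.227.36.76 58982 GET /external?...".
# Detail lines such as "Message: Bad User Agent" fail the date check and are skipped.
offenders() {
  awk -v t="$1" '
    $1 ~ /^[0-9][0-9][0-9][0-9]\/[0-9][0-9]\/[0-9][0-9]$/ { n[$3]++ }
    END { for (ip in n) if (n[ip] >= t + 0) print ip }
  ' | sort
}
```

A daemon could run this on a timer and hand each printed IP to block-ip.sh the same way the OpenAI pipeline does, e.g. piping into offenders 5 and then xargs -I{} ./block-ip.sh {}.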

In-reply-to » Nice! I wrote another useful tool 👌

This is better:

proxy-1:~# ./audit-log-by-ip.sh 4.227.36.76 | coraza-log-formatter -m -
2025/01/04 23:17:04 4.227.36.76 58982 GET /external?aff-HY0BLO=&f=mediaonly&f=noreplies&nick=g1n&uri=https%3A%2F%2Fthe-president-codes.linegames.org null 0  On OWASP_CRS/4.7.0
Actionset: OWASP_CRS/4.7.0
Message: Bad User Agent
Severity: 0
Raw: SecRule REQUEST_HEADERS:User-Agent "@pmFromFile /etc/caddy/waf/bad_user_agents.txt" "id:2000,log,phase:1,deny,msg:'Bad User Agent'"
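The Bad User Agent rule above uses @pmFromFile, which ModSecurity documents as a case-insensitive phrase match of the input against the phrases listed in the file. A rough local approximation for checking a User-Agent against the same list before deploying it is grep with fixed strings. This is a hypothetical helper, not part of Coraza, and it assumes blank lines and # comment lines in the list should be ignored.

```shell
#!/bin/sh
# ua_matches UA LISTFILE: succeed if UA contains any phrase from LISTFILE,
# ignoring case (rough approximation of @pmFromFile).
# Blank and '#' lines are dropped first: an empty pattern given to grep -F
# would otherwise match every User-Agent.
ua_matches() {
  _pats=$(mktemp) || return 2
  grep -v -e '^[[:space:]]*$' -e '^#' "$2" >"$_pats"
  printf '%s\n' "$1" | grep -qiF -f "$_pats"
  _rc=$?
  rm -f "$_pats"
  return "$_rc"
}
```

For example, ua_matches "$ua" /etc/caddy/waf/bad_user_agents.txt && echo blocked shows whether a User-Agent seen in the access log would trip the rule.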


Nice! I wrote another useful tool 👌

proxy-1:~# ./audit-log-by-ip.sh 4.227.36.76 | coraza-log-formatter -m -
Actionset: OWASP_CRS/4.7.0
Message: Bad User Agent
Severity: 0
Raw: SecRule REQUEST_HEADERS:User-Agent "@pmFromFile /etc/caddy/waf/bad_user_agents.txt" "id:2000,log,phase:1,deny,msg:'Bad User Agent'"
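The source of audit-log-by-ip.sh isn’t shown, so here is only a guess at what such a helper could look like. It assumes the audit log is written one JSON entry per line (so each matching line is a whole entry), and the default log path is a placeholder, not the real one. The point worth copying is matching the IP as a fixed string on word boundaries, since a plain regex would treat the dots as wildcards and 4.227.36.7 would also match 4.227.36.76.

```shell
#!/bin/sh
# audit_log_by_ip IP [LOGFILE]: print log lines that mention exactly IP.
# HYPOTHETICAL helper; /var/log/caddy/audit.log is a placeholder path.
# -F treats the dots literally; -w rejects matches embedded in a longer
# address (4.227.36.7 inside 4.227.36.76 or 14.227.36.7).
audit_log_by_ip() {
  grep -Fw -- "$1" "${2:-/var/log/caddy/audit.log}"
}
```

Its output can then be piped into coraza-log-formatter -m - exactly as in the session above.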


How in da fuq do you actually make these fucking useless AI bots go away?

proxy-1:~# jq '. | select(.request.remote_ip=="4.227.36.76")' /var/log/caddy/access/mills.io.log | jq -s '. | last' | caddy-log-formatter -
4.227.36.76 - [2025-01-05 04:05:43.971 +0000] "GET /external?aff-QNAXWV=&f=mediaonly&f=noreplies&nick=g1n&uri=https%3A%2F%2Fmy-hero-ultra-impact-codes.linegames.org HTTP/2.0" 0 0
proxy-1:~# date
Sun Jan  5 04:05:49 UTC 2025

😱
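Before deciding what to block, it can help to rank clients by how many requests they make. A minimal sketch, assuming Caddy’s JSON access log with the client address in .request.remote_ip (the same field the jq filter above selects on) and jq installed; the top_talkers name is hypothetical.

```shell
#!/bin/sh
# top_talkers LOGFILE [N]: print the N (default 10) most frequent client IPs in
# a Caddy JSON access log, busiest first, in uniq -c form (count, then IP).
# Assumes one JSON object per line with .request.remote_ip set; requires jq.
top_talkers() {
  jq -r '.request.remote_ip // empty' "$1" | sort | uniq -c | sort -rn | head -n "${2:-10}"
}
```

For example, top_talkers /var/log/caddy/access/mills.io.log 5 would list the five busiest IPs in the log used above.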

In-reply-to » Having a lot of fun with Coraza today. A Web Application Firewall library written in Go that also happens to have a Caddy module.

@kat I’ve actually moved most of my stuff off of Cloudflare now 🤣 I’m very happy with my edge proxy setup, which reverse proxies, caches and acts as a web application firewall 🥳

In-reply-to » morning yarn friends i've been playing with astro the SSG and it's a blast i see why my friends love it and rec it to everyone. i may think javascript was a mistake but this is super cool

@kat Have you seen the SSG that I built and use on all my static sites? zs 🤔


Oh gawd. I can’t enable caching on my edge proxy everywhere 😱 Some shit™ doesn’t deal with a caching reverse proxy in front of it very well for some reason I don’t have time to dig into right now 🤔


What’s a reasonable per-second or per-minute rate limit that I could apply in general at my edge proxy for all clients (no matter what)? Like, a good reasonable upper bound? 🤔

In-reply-to » I just banned 41 bad user agents from accessing any of my services. 😱

@doesnm@doesnm.p.psf.lt No. I generally don’t put up any robots.txt files at all really, because they mostly get ignored. I don’t generally mind if “normal” web crawlers crawl things. But LLM(s) can go fuck themselves 🤣

In-reply-to » I just banned 41 bad user agents from accessing any of my services. 😱

@movq@www.uninformativ.de Yeah, it’s starting to piss me off too 🤣 Not nearly as much as that guy, but still. Anyway, I’m having fun! Now I just need to find a good IP/subnet list that I can blacklist entirely, ideally one that’s updated frequently so I can refresh my firewall rules.
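Refreshing firewall rules from a frequently updated list mostly comes down to working out which prefixes are new and which have disappeared since the last run. A minimal sketch using comm; the diff_prefixes name and the one-prefix-per-line file format are assumptions.

```shell
#!/bin/sh
# diff_prefixes OLD NEW: compare two prefix lists (one IP/CIDR per line) and
# print "+PREFIX" for entries only in NEW and "-PREFIX" for entries only in OLD.
# comm needs sorted input, so both lists are sorted and de-duplicated under the
# C locale, which comm then uses too.
diff_prefixes() {
  _old=$(mktemp) && _new=$(mktemp) || return 2
  LC_ALL=C sort -u "$1" >"$_old"
  LC_ALL=C sort -u "$2" >"$_new"
  LC_ALL=C comm -13 "$_old" "$_new" | sed 's/^/+/'
  LC_ALL=C comm -23 "$_old" "$_new" | sed 's/^/-/'
  rm -f "$_old" "$_new"
}
```

Each + line could become ufw insert 1 deny from PREFIX to any and each - line ufw delete deny from PREFIX to any; afterwards the new list is saved as the old one for the next refresh.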

In-reply-to » I just banned 41 bad user agents from accessing any of my services. 😱

Bloody fucking hell. I think one of Google’s GenAI crawlers was just hitting my Gitea instance quite hard. Fuck 🤬 Geez

In-reply-to » @andros What do you mean by API? yarnd (which powers Yarn.social pods like twtxt.net) does have an API, however that API is designed for clients to interact with the pod and the user's account and feed. e.g: there is a command-line client called yarnc and I used to maintain a mobile native app (using Flutter).

@doesnm@doesnm.p.psf.lt It is the same API that yarnc the command-line client uses.

In-reply-to » Having a lot of fun with Coraza today. A Web Application Firewall library written in Go that also happens to have a Caddy module.

i.e: Not much point in running a WAF on a static site. But OTOH if there’s enough abuse from shitty assholes, there might be 🤔🤔

In-reply-to » Having a lot of fun with Coraza today. A Web Application Firewall library written in Go that also happens to have a Caddy module.

I’m just basically learning now how ModSecurity rules work and how to write my own.

The built-in OWASP rules are already working nicely 👌 – And yeah, I won’t include the WAF on every site block, probably just my main/primary domain where I tend to run demo services and other things.

In-reply-to » Having a lot of fun with Coraza today. A Web Application Firewall library written in Go that also happens to have a Caddy module.

@kat If you’ve been following my yarns the other day about me getting off of Clownflare and building my own WAF, Proxy and effectively my own Edge network, you’ll know I’m doing this at the very edge 🤣🤣

In-reply-to » fighting for my life trying to learn golang WHAT THE FUCK IS A POINTER (rhetorical)

A pointer holds the memory address of a value (usually a variable), so it lets you refer to that value instead of a copy of it. Pointers are typically used with structs, especially with pointer receiver methods, so that a method can modify the struct’s fields.

In-reply-to » help i've had this account for barely 2 days and i'm nearly at 100 posts

Also, post as much as you want! It’s a free world. It’s your feed. It’s your daughter. 🤣 Nobody actually has to read any of it, let alone follow you, if they don’t want to. 🙃 That’s kind of the beauty of a truly decentralized slow social media ecosystem. 😎

In-reply-to » help i've had this account for barely 2 days and i'm nearly at 100 posts

@kat You should’ve seen me back in the day! These days I try to post a little less often so as not to cause too much noise in the ecosystem 🤣 nobody cares what I think anyway right? 😅

In-reply-to » @prologic Is it possible to interact with twtxt.net from outside? For example, an search API

@andros@twtxt.andros.dev What do you mean by API? yarnd (which powers Yarn.social pods like twtxt.net) does have an API, however that API is designed for clients to interact with the pod and the user’s account and feed. e.g: there is a command-line client called yarnc and I used to maintain a mobile native app (using Flutter).

What use-case did you have in mind?

In-reply-to » @prologic YAYYY fuck cloudflare!!! caddy+wireguard amazing combo

@kat So far it’s been alright. I wasn’t too impressed with Caddy’s logging capabilities, though, or with the fact that you have to custom-build Caddy just to support the DNS-01 ACME challenge. But other than that, it’s okay.
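The usual way to get such a custom build, with a DNS provider module for the DNS-01 challenge or a module like the Coraza WAF, is xcaddy. A minimal sketch; the build_caddy wrapper and DRY_RUN variable are hypothetical, and the module paths in the example are illustrations (pick the caddy-dns provider that matches your DNS host).

```shell
#!/bin/sh
# build_caddy MODULE...: build a custom Caddy binary with the given Go modules
# via xcaddy. DRY_RUN=1 prints the command instead of running it.
build_caddy() {
  cmd="xcaddy build"
  for m in "$@"; do
    cmd="$cmd --with $m"
  done
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$cmd"
  else
    # Unquoted on purpose: split into words (module paths contain no spaces).
    $cmd
  fi
}
```

For example, build_caddy github.com/caddy-dns/cloudflare github.com/corazawaf/coraza-caddy/v2 produces a caddy binary in the current directory with both modules compiled in.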
