@lyse@lyse.isobeef.org Cool 👌
Hmmm so I’ve sustained two DDoS attacks on my Gitea server today. A few hours apart. Still analyzing the traffic…
For the time being… I’ve just blocked all of OpenAI(s) Bots. They (thankfully) publish a JSON endpoint that you can use to block all OpenAI crawlers from reaching your server (in my case, blocking it at the edge). Example:
proxy-1:~# curl -qs https://openai.com/gptbot.json | jq -r '.prefixes[].ipv4Prefix' | xargs -I{} ./block-ip.sh {}
Where block-ip.sh is simply:
#!/bin/sh
ufw insert 1 deny from "$1" to any
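Since that pipeline feeds remote JSON straight into a firewall command, it may be worth validating each entry first. A minimal Go sketch, assuming only the document shape implied by the jq filter (.prefixes[].ipv4Prefix); the sample data is made up and uses documentation address ranges, and validPrefixes is a hypothetical helper name:

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/netip"
)

// prefixList mirrors only the fields the jq filter touches
// (.prefixes[].ipv4Prefix); anything else in the document is ignored.
type prefixList struct {
	Prefixes []struct {
		IPv4Prefix string `json:"ipv4Prefix"`
	} `json:"prefixes"`
}

// validPrefixes returns the entries that parse as IPv4 CIDR prefixes
// and drops everything else (empty strings, garbage, IPv6).
func validPrefixes(doc []byte) ([]netip.Prefix, error) {
	var list prefixList
	if err := json.Unmarshal(doc, &list); err != nil {
		return nil, err
	}
	var out []netip.Prefix
	for _, p := range list.Prefixes {
		pfx, err := netip.ParsePrefix(p.IPv4Prefix)
		if err != nil || !pfx.Addr().Is4() {
			continue
		}
		// Masked() zeroes the host bits so the prefix is canonical.
		out = append(out, pfx.Masked())
	}
	return out, nil
}

// sample is made-up data, not a copy of the real endpoint's response.
const sample = `{"prefixes":[
  {"ipv4Prefix":"192.0.2.0/24"},
  {"ipv4Prefix":"not-a-prefix; rm -rf /"},
  {"ipv4Prefix":"198.51.100.17/28"}
]}`

func main() {
	prefixes, err := validPrefixes([]byte(sample))
	if err != nil {
		panic(err)
	}
	for _, p := range prefixes {
		fmt.Println(p) // each line is safe to hand to block-ip.sh
	}
}
```

Only strings that parse as a CIDR prefix ever reach the firewall script, so a malformed or hostile entry gets skipped instead of ending up as an argument to ufw.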
"twtxtfeevalidator/0.0.1" UA about? I thought I could ask before throwing a 1000GB file at it 🪤 could it be the same 'xt' thing @lyse was talking about the other day?
@aelaraji@aelaraji.com Yes! 👏 This is exactly what it is! 🤣 I will of course soon™ be hosting this service, likely at validator.twtxt.net
😅😅
@kat Haha 🤣 If someone figures this out, please let me know 🙏🙏 – In the meantime, I’m going to very soon™ write a daemon that will watch the audit log for repeated violations and add the offenders to the network firewall.
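The core of such a daemon could be a per-IP sliding window over violations. A rough Go sketch of just that counting logic, under stated assumptions: the violation type, the window and threshold values, and the sample events are all hypothetical, and parsing the audit log or calling the firewall is left out entirely:

```go
package main

import (
	"fmt"
	"time"
)

// violation is one denied request taken from the audit log.
// How it gets parsed out of the log is deliberately not shown.
type violation struct {
	ip string
	at time.Time
}

// offenders remembers recent violation times per client IP.
type offenders struct {
	window    time.Duration
	threshold int
	seen      map[string][]time.Time
}

func newOffenders(window time.Duration, threshold int) *offenders {
	return &offenders{window: window, threshold: threshold, seen: make(map[string][]time.Time)}
}

// record stores one violation, forgets entries older than the window,
// and reports whether the IP has reached the threshold.
func (o *offenders) record(v violation) bool {
	cutoff := v.at.Add(-o.window)
	kept := o.seen[v.ip][:0]
	for _, t := range o.seen[v.ip] {
		if t.After(cutoff) {
			kept = append(kept, t)
		}
	}
	kept = append(kept, v.at)
	o.seen[v.ip] = kept
	return len(kept) >= o.threshold
}

// replay feeds events through the tracker and returns each IP once,
// in the order it crossed the threshold.
func replay(o *offenders, events []violation) []string {
	var blocked []string
	done := make(map[string]bool)
	for _, v := range events {
		if o.record(v) && !done[v.ip] {
			done[v.ip] = true
			blocked = append(blocked, v.ip)
		}
	}
	return blocked
}

// Hypothetical settings and made-up events (documentation IP ranges).
const (
	sampleWindow    = 5 * time.Minute
	sampleThreshold = 3
)

func sampleEvents() []violation {
	base := time.Date(2025, 1, 5, 4, 0, 0, 0, time.UTC)
	return []violation{
		{"203.0.113.7", base},
		{"198.51.100.2", base.Add(10 * time.Second)},
		{"203.0.113.7", base.Add(20 * time.Second)},
		{"203.0.113.7", base.Add(40 * time.Second)},
		{"198.51.100.2", base.Add(10 * time.Minute)},
		{"198.51.100.2", base.Add(10*time.Minute + 10*time.Second)},
	}
}

func main() {
	for _, ip := range replay(newOffenders(sampleWindow, sampleThreshold), sampleEvents()) {
		fmt.Println("would block:", ip)
	}
}
```

In the sample, only the first client hits three violations inside five minutes; the second one's early violation has aged out by the time it comes back, so it stays under the threshold.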
This is better:
proxy-1:~# ./audit-log-by-ip.sh 4.227.36.76 | coraza-log-formatter -m -
2025/01/04 23:17:04 4.227.36.76 58982 GET /external?aff-HY0BLO=&f=mediaonly&f=noreplies&nick=g1n&uri=https%3A%2F%2Fthe-president-codes.linegames.org null 0 On OWASP_CRS/4.7.0
Actionset: OWASP_CRS/4.7.0
Message: Bad User Agent
Severity: 0
Raw: SecRule REQUEST_HEADERS:User-Agent "@pmFromFile /etc/caddy/waf/bad_user_agents.txt" "id:2000,log,phase:1,deny,msg:'Bad User Agent'"
Nice! I wrote another useful tool 👌
proxy-1:~# ./audit-log-by-ip.sh 4.227.36.76 | coraza-log-formatter -m -
Actionset: OWASP_CRS/4.7.0
Message: Bad User Agent
Severity: 0
Raw: SecRule REQUEST_HEADERS:User-Agent "@pmFromFile /etc/caddy/waf/bad_user_agents.txt" "id:2000,log,phase:1,deny,msg:'Bad User Agent'"
How in da fuq do you actually make these fucking useless AI bots go away?
proxy-1:~# jq '. | select(.request.remote_ip=="4.227.36.76")' /var/log/caddy/access/mills.io.log | jq -s '. | last' | caddy-log-formatter -
4.227.36.76 - [2025-01-05 04:05:43.971 +0000] "GET /external?aff-QNAXWV=&f=mediaonly&f=noreplies&nick=g1n&uri=https%3A%2F%2Fmy-hero-ultra-impact-codes.linegames.org HTTP/2.0" 0 0
proxy-1:~# date
Sun Jan 5 04:05:49 UTC 2025
😱
Done.
@lyse@lyse.isobeef.org Oh good! It works haha 🤣 I’ll bump it up a bit 👌
And now I’ve applied rate limits on every site to reasonable values 👌
@bender@twtxt.net Isn’t that why I’m yarning my progress? 🤣
@kat I’ve actually moved most of my stuff off of Cloudflare now 🤣 I’m actually very happy with my edge proxy setup that reverse proxies, caches and acts as a web application firewall 🥳
@kat Have you seen the SSG that I built and use on all my static sites? zs 🤔
Oh gawd. I can’t enable caching on my edge proxy everywhere 😱 Some shit™ doesn’t deal with a caching reverse proxy in front of it very well for some reason I don’t have time to dig into right now 🤔
What’s a reasonable per second or per minute rate limit that I could apply in general at my edge proxy for all clients? (no matter what) … Like a good reasonable upper bound? 🤔
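This doesn't answer what number is reasonable, but here is a sketch of the shape a per-client limit usually takes: a token bucket with a steady rate plus a burst allowance. It is plain Go, not the edge proxy's actual rate limit module, and the numbers (2 requests per second, burst of 5) are hypothetical:

```go
package main

import (
	"fmt"
	"math"
	"time"
)

// bucket holds the remaining tokens for one client.
type bucket struct {
	tokens float64
	last   time.Time
}

// limiter is a per-client token bucket: rate tokens are refilled per
// second and at most burst tokens can be saved up.
// Note: this sketch never evicts idle clients from the map.
type limiter struct {
	rate, burst float64
	clients     map[string]*bucket
}

func newLimiter(rate, burst float64) *limiter {
	return &limiter{rate: rate, burst: burst, clients: make(map[string]*bucket)}
}

// allow refills the client's bucket for the elapsed time and spends
// one token if one is available.
func (l *limiter) allow(ip string, now time.Time) bool {
	b, ok := l.clients[ip]
	if !ok {
		b = &bucket{tokens: l.burst, last: now}
		l.clients[ip] = b
	}
	elapsed := now.Sub(b.last).Seconds()
	b.tokens = math.Min(l.burst, b.tokens+elapsed*l.rate)
	b.last = now
	if b.tokens < 1 {
		return false
	}
	b.tokens--
	return true
}

// demo sends 8 requests at once, then 3 more one second later, and
// returns how many were allowed in each batch.
func demo() (atOnce, afterOneSecond int) {
	l := newLimiter(2, 5) // hypothetical: 2 req/s, burst of 5
	now := time.Date(2025, 1, 5, 4, 0, 0, 0, time.UTC)
	for i := 0; i < 8; i++ {
		if l.allow("203.0.113.7", now) {
			atOnce++
		}
	}
	now = now.Add(time.Second)
	for i := 0; i < 3; i++ {
		if l.allow("203.0.113.7", now) {
			afterOneSecond++
		}
	}
	return atOnce, afterOneSecond
}

func main() {
	first, second := demo()
	fmt.Println("allowed at once:", first)
	fmt.Println("allowed one second later:", second)
}
```

The burst value is what lets a normal browser fetch a page and its assets in one go, while the rate is the part that caps a crawler hammering away for minutes, so the two are worth tuning separately.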
@movq@www.uninformativ.de Yeah I swear to god the engineers that write this shit™ don’t know how to write distributed crawlers that don’t hammer the shit™ out of their targets 🤦‍♂️
@doesnm@doesnm.p.psf.lt No. I generally don’t put up any robots.txt files at all really, because they mostly get ignored. I don’t generally mind if “normal” web crawlers crawl things. But LLM(s) can go fuck themselves 🤣
@movq@www.uninformativ.de Yeah it’s starting to piss me off too 🤣 Not nearly as much as that guy, but still. Anyway I’m having fun! Now I just need to find a good IP/Subnet list that I can blacklist entirely, ideally one that’s updated frequently so I can refresh firewall rules.
Bloody fucking hell. I think one of Google’s GenAI crawlers was just hitting my Gitea instance quite hard. Fuck 🤬 Geez
@movq@www.uninformativ.de Oh 🤦‍♂️
I just banned 41 bad user agents from accessing any of my services. 😱
@movq@www.uninformativ.de How do you manage to get those skylines on your photos? 🤔
@doesnm@doesnm.p.psf.lt No, it’s only designed for yarnd. What did you have in mind here? 🤔
@doesnm@doesnm.p.psf.lt It is the same API that yarnc, the command-line client, uses.
i.e: Not much point in running a WAF on a static site. But OTOH if there’s enough abuse from shitty assholes, there might be 🤔🤔
I’m just basically learning now how ModSecurity rules work and how to write my own.
The builtin OWASP rules are already working nicely 👌 – And yeah I won’t include the WAF on every site block, probably just my main/primary domain where I tend to run demo services and other things.
@kat If you’ve been following my yarns the other day about me getting off of Clownflare and building my own WAF, Proxy and effectively my own Edge network, you’ll know I’m doing this at the very edge 🤣🤣
Having a lot of fun with Coraza today. A Web Application Firewall library written in Go that also happens to have a Caddy module.
@bender@twtxt.net Hey ! 👋
@eapl.me@eapl.me And here I always lived by:
Problems are solved by method.
– Dr. Don Abel.
🥱 morning y’all 👋 Soo tired 🥱 Need coffee!!! ☕️☕️☕️☕️
@lyse@lyse.isobeef.org It does not 🤣 Shall I enable it? 🤣
@bender@twtxt.net It’s true! 🤣 It’s a total garbage nonsense title. But the actual research paper that the video references is real. Apple did in fact do a bunch of research and proved what we already know 🤣 – That is, AI is stupid 🤣
@movq@www.uninformativ.de Amen 🙏
But to be fair, we already knew this… I’ve observed it first hand, we knew it at the beginning. I’ll just leave you with this:
Stochastic Parrot
or put simply:
Artificial Incompetence
@movq@www.uninformativ.de Yup! 😅
I can walk you through some examples later tonight when I get back if you like?
A pointer is basically a reference to a variable. It is typically used with structs and especially in pointer receiver methods so that you can modify fields of a struct.
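A small Go sketch of that, for illustration only (the counter type and its method names are made up): a plain pointer to a variable first, then a value receiver next to a pointer receiver on the same struct.

```go
package main

import "fmt"

// counter is a made-up struct for the example.
type counter struct {
	hits int
}

// incCopy has a value receiver: it works on a copy of the struct,
// so the caller's counter is left unchanged.
func (c counter) incCopy() {
	c.hits++
}

// inc has a pointer receiver: it works on the caller's struct,
// so the change is visible after the call.
func (c *counter) inc() {
	c.hits++
}

func main() {
	// A pointer refers to a variable; writing through it changes the variable.
	n := 1
	p := &n // p points at n
	*p = 2  // assign through the pointer
	fmt.Println(n) // 2

	c := counter{}
	c.incCopy()
	fmt.Println(c.hits) // 0, only the copy was incremented

	c.inc() // Go takes &c automatically for pointer receiver methods
	fmt.Println(c.hits) // 1
}
```

The only difference between the two methods is the `*` in the receiver, and that is what decides whether the struct you called the method on actually gets modified.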
@kat Oh! I can totally help you 🤗 I love Go! 😍
Holy Smokes 🤣 And this has only been <24h 😱
Also post as much as you want! It’s a free world. It’s your feed. It’s your daughter. 🤣 nobody actually has to read any of it let alone follow you if they don’t want to. 🙃 that’s kind of the beauty of a truly decentralized slow social media ecosystem. 😎
@kat You should’ve seen me back in the day! These days I try to post a little less often so as not to cause too much noise in the ecosystem 🤣 nobody cares what I think anyway right? 😅
@kat yarnd actually stores your feed in plain text on disk too 🤣
@andros@twtxt.andros.dev What do you mean by API? yarnd (which powers Yarn.social pods like twtxt.net) does have an API, however that API is designed for clients to interact with the pod and the user’s account and feed. e.g: there is a command-line client called yarnc and I used to maintain a mobile native app (using Flutter).
What use-case did you have in mind?
@kat So far it’s been alright. I wasn’t too impressed with Caddy’s logging capabilities though or the fact you have to custom build caddy just to support DNS-01 ACME challenge. But other than that, it’s okay.
@bender@twtxt.net Well technically now I can turn off ingress access to my infra on ports 80/443 etc and just rely on the outbound wireguard tunnelling for the ingress back in.