For the time beingā¦ Iāve just blocked all of OpenAI(s) Bots. They (thankfully) publish a JSON endpoint that you can use to block all OpenAI crawlers from reaching your server (in my case, blocking it at the edge). Example:
proxy-1:~# curl -qs https://openai.com/gptbot.json | jq -r '.prefixes[].ipv4Prefix' | xargs -I{} ./block-ip.sh {}
Where block-ip.sh
is simply:
#!/bin/sh
ufw insert 1 deny from "$1" to any