I'm wondering about deploying Nepenthes on my personal site, to punish any AI crawlers.
https://zadzmo.org/code/nepenthes/
On the one hand I'm wary about tanking my search rankings, on the other... I'm not finding the major search engines much good anymore anyways!
I do like SearchMySite, but I *think* it's smart enough to not be tripped up by it.
O.K., I've got a clear answer already!
A few people are concerned about the wastefulness, but most are keen to see them poisoned! And reflecting upon my motivation to not be DoS'd again...
I think I'll configure aggressive rate limits first, and then I'm seeing some more tools to choose between...
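For the rate-limit step, a minimal sketch assuming nginx — the zone name, rate, and burst values here are placeholders to tune, not anything Nepenthes-specific:

```nginx
# Track clients by IP address; allow roughly 2 requests/second per client,
# with state kept in a 10 MB shared-memory zone named "crawlers".
limit_req_zone $binary_remote_addr zone=crawlers:10m rate=2r/s;

server {
    listen 80;
    server_name example.org;  # placeholder domain

    location / {
        # Permit short bursts of up to 20 requests, then reject immediately
        # (nodelay) rather than queueing; answer the excess with 429.
        limit_req zone=crawlers burst=20 nodelay;
        limit_req_status 429;
    }
}
```

Legitimate visitors rarely notice limits like these, while a crawler hammering every link trips them almost immediately.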
I considered Poison the WeLLMs before...
Iocaine: https://git.madhouse-project.org/algernon/iocaine
Quixotic: https://marcusb.org/hacks/quixotic.html
Poison the WeLLMs: https://codeberg.org/MikeCoats/poison-the-wellms
@alcinnz I saw a comment about that software saying you will burn CPU cycles to make some other software burn CPU cycles, in the end it is a lot of wasted resources ending up in warming up the planet.
Made me rethink it.
@lucabtz Fair complaint.
I would deploy it in the defensive setting rather than offensive, which should minimize that drawback.
My goal is to avoid getting DoS'd again, so I'm open to other suggestions!
@alcinnz I'm not sure I know enough to evaluate the strengths and weaknesses of different approaches, but have you seen iocaine?
@alcinnz This explanation helped me understand the motivation for iocaine:
> This has the downside that I need to redirect explicitly, so unknown agents will not be trapped (while they will be caught in Nepenthes' maze!). On the other hand, it means that my content is unavailable for AI crawlers in the first place - at least to known ones.
https://come-from.mad-scientist.club/@algernon/statuses/01JHRNX1KFQV19AVTB54ZFB8EZ
@skyfaller Thanks for the links, I'm investigating!
@alcinnz any search engine worth its salt honors robots.txt, so you can use that to keep them out of the trap. These tools were made specifically to punish those who do not honor robots.txt.
As far as I'm aware, Google still honors it.
@OliverUv The issue as I understand it is that any bot would see the Nepenthes tarpit, even well-behaved ones. So unless the crawler's limiting how much of your site it visits...
Or we could ask well-behaved bots not to crawl the tarpit, I'm guessing the misbehaved ones won't honor that hint...
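That hint would just be a robots.txt rule; a sketch, assuming the tarpit is mounted at a path like /nepenthes/ (the path is a placeholder for wherever you actually put it):

```
# robots.txt — well-behaved crawlers will skip the tarpit;
# misbehaved ones ignore this file and wander in anyway.
User-agent: *
Disallow: /nepenthes/
```

Which is exactly the filtering effect you'd want: the Disallow line only deters the bots you weren't trying to punish.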
@alcinnz @skyfaller @OliverUv they all seem cool tools