Building on an anti-spam cybersecurity tactic known as tarpitting, he created Nepenthes, malicious software named after a carnivorous plant that will “eat just about anything that finds its way inside.”

Aaron clearly warns users that Nepenthes is aggressive malware. It’s not to be deployed by site owners uncomfortable with trapping AI crawlers and sending them down an “infinite maze” of static files with no exit links, where they “get stuck” and “thrash around” for months, he tells users. Once trapped, the crawlers can be fed gibberish data, aka Markov babble, which is designed to poison AI models. That’s likely an appealing bonus feature for any site owners who, like Aaron, are fed up with paying for AI scraping and just want to watch AI burn.
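To make the "infinite maze" and "Markov babble" ideas concrete, here is a minimal sketch in Python. It is not Nepenthes's actual code: the function names (`build_chain`, `babble`, `maze_page`), the `/maze` path, and the tiny corpus are all assumptions made for illustration. Each page is seeded from its own path, so a crawler that revisits a URL sees the same nonsense, and every link leads only deeper into the maze.

```python
import random
from collections import defaultdict


def build_chain(text, order=1):
    """Map each run of `order` words to the words seen right after it."""
    words = text.split()
    chain = defaultdict(list)
    for i in range(len(words) - order):
        key = tuple(words[i:i + order])
        chain[key].append(words[i + order])
    return chain


def babble(chain, length, rng):
    """Walk the chain to emit `length` words of plausible-looking nonsense."""
    key = rng.choice(sorted(chain))  # raises IndexError if the corpus was too short
    out = list(key)
    while len(out) < length:
        followers = chain.get(key)
        if not followers:
            # Dead end (this key was never followed by anything): restart elsewhere.
            key = rng.choice(sorted(chain))
            followers = chain[key]
        out.append(rng.choice(followers))
        key = tuple(out[-len(key):])
    return " ".join(out[:length])


def maze_page(path, chain, links=3, words=40):
    """Return (body, hrefs) for a hypothetical maze URL.

    Seeding the generator from the path keeps each page stable across visits,
    and every href points further into the maze, never back out.
    """
    rng = random.Random(path)
    body = babble(chain, words, rng)
    base = path.rstrip("/")
    hrefs = [f"{base}/{rng.getrandbits(32):08x}" for _ in range(links)]
    return body, hrefs


if __name__ == "__main__":
    corpus = "the cat sat on the mat and the cat ran"
    body, hrefs = maze_page("/maze", build_chain(corpus))
    print(body)
    print(hrefs)
```

A real tarpit would also serve these pages slowly to tie up the crawler's connections; that part is omitted here.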

    • cm0002@lemmy.world (↑21, 2 months ago)

      It might be initially, but they’ll figure out a way around it soon enough.

      Remember those articles about “poisoning” images? They didn’t get very far with that either.

      • EldritchFeminity@lemmy.blahaj.zone (↑5, 2 months ago)

        This kind of stuff has always been an endless war of escalation, the same as any kind of security. There was a period of time where all it took to mess with Gen AI was artists uploading images of large circles or something with random tags to their social media accounts. People ended up with random bits of stop signs and stuff in their generated images for like a week. Now, artists are moving to sites that treat AI scrapers like malware attacks, and are degrading the quality of the images that they upload.

  • aesthelete@lemmy.world (↑33 ↓1, 2 months ago)

    Notice how it’s “AI haters” and not “people trying to protect their IP” as it would be if it were say…China instead of AI companies stealing the IP.

  • pHr34kY@lemmy.world (↑30, edited, 2 months ago)

    I am so gonna deploy this. I want the crawlers to index the entire Mandelbrot set.

    I’ll train it with lyrics from Beck Hansen and Smash Mouth so that none of it makes sense.

  • NullPointer@programming.dev (↑16, edited, 2 months ago)

    Why bother wasting resources on the infinite maze instead of just doing what the old-school .htaccess bot traps do: ban any IP that hits the nono-zone defined in robots.txt?
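The bot-trap idea in this comment can be sketched in a few lines of Python. This is an illustrative assumption, not how any particular .htaccess setup works: the class name `TrapBanList`, the `/nono-zone/` prefix, and the in-memory ban set are all hypothetical. The premise is that robots.txt disallows the trap path, so only clients that ignore robots.txt ever request it.

```python
from urllib.parse import urlsplit


class TrapBanList:
    """Ban any client IP that requests a path robots.txt told it to avoid."""

    def __init__(self, trap_prefixes):
        # Path prefixes that robots.txt disallows (hypothetical values).
        self.trap_prefixes = tuple(trap_prefixes)
        self.banned = set()

    def allow(self, ip, url):
        """Return True if the request should be served, False if blocked."""
        if ip in self.banned:
            return False
        path = urlsplit(url).path
        if path.startswith(self.trap_prefixes):
            # A well-behaved crawler never gets here, so ban on first hit.
            self.banned.add(ip)
            return False
        return True


if __name__ == "__main__":
    bans = TrapBanList(["/nono-zone/"])
    print(bans.allow("203.0.113.7", "https://example.com/blog/"))
    print(bans.allow("203.0.113.7", "https://example.com/nono-zone/page"))
    print(bans.allow("203.0.113.7", "https://example.com/blog/"))
```

The trade-off against a tarpit is that a ban costs the server almost nothing, but a scraper that rotates through many IP addresses only loses one address per trap hit.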

  • Docus@lemmy.world (↑4 ↓1, 2 months ago)

    Does it also trap search engine crawlers? That would be a problem

    • Pasta Dental@sh.itjust.works (↑23, 2 months ago)

      The big search engine crawlers like Google’s or Microsoft’s should respect your robots.txt file. This trick affects those that don’t honor the file and just scrape your website even if you told them not to.
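For reference, this is roughly what "respecting robots.txt" looks like from the crawler's side, using Python's standard `urllib.robotparser`. The robots.txt contents, the `/maze/` path, the `ExampleBot` user agent, and the helper name `may_fetch` are assumptions for illustration; a polite crawler runs a check like this before each request and so never enters the trap.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that fences off the trap directory for every crawler.
ROBOTS_TXT = """\
User-agent: *
Disallow: /maze/
"""


def may_fetch(robots_txt, user_agent, url):
    """Return True if `user_agent` is allowed to fetch `url` under `robots_txt`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    print(may_fetch(ROBOTS_TXT, "ExampleBot", "https://example.com/blog/post"))
    print(may_fetch(ROBOTS_TXT, "ExampleBot", "https://example.com/maze/a"))
```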

    • Soup@lemmy.world (↑12, 2 months ago)

      I imagine that if those obey robots.txt, it’s not a problem.

  • LovableSidekick@lemmy.world (↑4 ↓9, edited, 2 months ago)

    OTOH infinite loop detection is a well known coding issue with well known, freely available solutions, so this approach will only affect the lamest implementations of AI,