• some_guy@lemmy.sdf.org · 1 month ago

    In a blogpost called “AI crawlers need to be more respectful”, they claim that blocking all AI crawlers immediately decreased their traffic by 75%, from 800GB/day to 200GB/day, saving the project around $1,500 a month.

    “AI” companies are a plague on humanity. From now on, I’m mentally designating them as terrorists.

  • carrylex@lemmy.world · 1 month ago

    While AI crawlers are a problem, I’m also kind of astonished that so many projects don’t use tools like rate limiters or IP blocklists. These are pretty simple to set up, cause little to no additional load, and don’t cause collateral damage for legitimate users who just happen to use a different browser.
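
    For what it’s worth, a per-IP rate limit really is only a few lines. Here’s a rough Python sketch of a token bucket plus blocklist (names and numbers made up for illustration; real deployments usually do this at the reverse proxy, e.g. nginx’s limit_req, rather than in app code):

    ```python
    import time
    from collections import defaultdict

    # Minimal per-IP token bucket: each client gets `rate` requests per second,
    # with bursts up to `burst`. Anything beyond that gets rejected.
    class RateLimiter:
        def __init__(self, rate=5.0, burst=20):
            self.rate = rate      # tokens refilled per second
            self.burst = burst    # maximum bucket size
            self.buckets = defaultdict(lambda: (burst, time.monotonic()))

        def allow(self, ip: str) -> bool:
            tokens, last = self.buckets[ip]
            now = time.monotonic()
            # Refill proportionally to elapsed time, capped at the burst size.
            tokens = min(self.burst, tokens + (now - last) * self.rate)
            if tokens < 1:
                self.buckets[ip] = (tokens, now)
                return False
            self.buckets[ip] = (tokens - 1, now)
            return True

    limiter = RateLimiter()
    BLOCKLIST = {"203.0.113.7"}   # example address from a documentation range

    def handle(ip: str) -> int:
        if ip in BLOCKLIST or not limiter.allow(ip):
            return 429            # 403 would also be reasonable for blocklisted IPs
        return 200
    ```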

    • bountygiver [any]@lemmy.ml · 1 month ago

      The article posted yesterday mentioned that a lot of these requests are made only once per IP address; the botnet is absolutely huge.

  • LiveLM@lemmy.zip · 1 month ago (edited)

    If you’re wondering if it’s really that bad, have this quote:

    GNOME sysadmin Bart Piotrowski kindly shared some numbers to let people fully understand the scope of the problem. According to him, in around two and a half hours they received 81k total requests, and out of those only 3% passed Anubis’ proof of work, hinting at 97% of the traffic being bots.
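
    For context, Anubis makes each visitor solve a small proof-of-work puzzle before the page is served: trivial for one real browser, but expensive for a crawler hammering millions of URLs. A generic hashcash-style sketch of the idea (not Anubis’ actual code):

    ```python
    import hashlib
    import os

    # Hashcash-style challenge: the client must find a nonce such that
    # sha256(challenge + nonce) starts with `difficulty` zero hex digits.
    # Verifying costs the server one hash; finding the nonce takes many.
    def make_challenge() -> str:
        return os.urandom(16).hex()

    def solve(challenge: str, difficulty: int = 4) -> int:
        nonce = 0
        while not hashlib.sha256(f"{challenge}{nonce}".encode()).hexdigest().startswith("0" * difficulty):
            nonce += 1
        return nonce

    def verify(challenge: str, nonce: int, difficulty: int = 4) -> bool:
        digest = hashlib.sha256(f"{challenge}{nonce}".encode()).hexdigest()
        return digest.startswith("0" * difficulty)

    challenge = make_challenge()
    nonce = solve(challenge)          # in practice the visitor's browser does this in JS
    assert verify(challenge, nonce)   # the server only spends one cheap hash
    ```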

    And this is just one quote. The article is full of quotes from people all over reporting that they can’t focus on their work, either because the infra they rely on is constantly down or because they’re the ones fighting to keep it functional.

    This shit is unsustainable. Fuck all of these AI companies.

      • LiveLM@lemmy.zip · 1 month ago

        I’m sure that if it were that simple, people would be doing it already…

      • Strawberry@lemmy.blahaj.zone · 1 month ago

        The bots scrape costly endpoints, like the entire edit history of every page on a wiki. You can’t always just cache every possible generated page at the same time.

      • nutomic@lemmy.ml · 1 month ago

        Cache size is limited and usually only holds the most recently viewed pages. But these bots go through every single page on the website, even old ones that are never viewed by users. Since they only send one request per page, caching doesn’t really help.
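
        To make that concrete, here’s a toy simulation with made-up numbers: an LRU cache that serves repeat-visitor traffic fine gets essentially zero hits from a crawler requesting every page exactly once.

        ```python
        import random
        from collections import OrderedDict

        # Toy LRU cache: evicts the least recently used page once full.
        class LRUCache:
            def __init__(self, capacity: int):
                self.capacity = capacity
                self.pages = OrderedDict()
                self.hits = self.misses = 0

            def get(self, page: int) -> None:
                if page in self.pages:
                    self.pages.move_to_end(page)
                    self.hits += 1
                else:
                    self.misses += 1
                    self.pages[page] = True
                    if len(self.pages) > self.capacity:
                        self.pages.popitem(last=False)

        TOTAL_PAGES = 100_000   # hypothetical wiki size
        CACHE_SIZE = 5_000      # cache holds 5% of the site

        # Normal users mostly revisit a small set of popular pages -> high hit rate.
        users = LRUCache(CACHE_SIZE)
        for _ in range(100_000):
            users.get(random.randint(0, 1_000))

        # A crawler requests every page exactly once -> nearly every request misses.
        crawler = LRUCache(CACHE_SIZE)
        for page in range(TOTAL_PAGES):
            crawler.get(page)

        print("user hit rate:   ", users.hits / 100_000)       # roughly 0.99
        print("crawler hit rate:", crawler.hits / TOTAL_PAGES)  # 0.0
        ```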