I was wrong about robots.txt (evgeniipendragon.com)
Posted by KarlHeinzSchwuke@feddit.org to Technology@lemmy.world · English · 3 months ago · 22 comments
thedruid@lemmy.world · 3 months ago

So, if I can add something here for everyone's benefit: no search engine really obeys robots.txt. Their publicly acknowledged crawlers do, but they have other crawlers that aren't publicly known and that ignore the file. Google knows every inch of your site, allowed or not.

See, just because a search engine says it doesn't know about a page doesn't mean it hasn't crawled it. It just doesn't display the results, based on your settings.
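To make the "obeys" part concrete: robots.txt is only a request, and the check happens inside the crawler, not on your server. Here is a minimal sketch of what a well-behaved crawler does before fetching a page, using Python's standard `urllib.robotparser`. The robots.txt content, the `ExampleBot` user agent, and the example.com URLs are all made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, made up for this example.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# A polite crawler runs this check and skips anything that is disallowed.
private_ok = is_allowed(ROBOTS_TXT, "ExampleBot", "https://example.com/private/page")
public_ok = is_allowed(ROBOTS_TXT, "ExampleBot", "https://example.com/index.html")

print(private_ok)  # False: /private/ is disallowed for every user agent
print(public_ok)   # True: nothing forbids /index.html
```

Nothing in this sketch stops a crawler from skipping the check and requesting the URL anyway. Actually blocking a crawler takes server-side measures such as authentication or filtering by user agent or IP.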
ell1e@leminal.space · 3 months ago

And allowing the public crawler might also have it feed their AI: https://arstechnica.com/tech-policy/2025/07/cloudflare-wants-google-to-change-its-ai-search-crawling-google-likely-wont/