Google has quietly updated its list of user-triggered fetchers with new documentation for Google NotebookLM. This seemingly minor change matters because it makes clear that Google NotebookLM won’t obey robots.txt.
Google NotebookLM
NotebookLM is an AI research and writing tool that lets users add a web page URL. NotebookLM processes that page’s content, and users can then ask a range of questions about it and generate summaries based on it.
Google’s tool can automatically create an interactive mind map that organizes the topics on a website and extracts key takeaways from it.
User-Triggered Fetchers Ignore Robots.txt
Google’s User-Triggered Fetchers are web agents that are triggered by users and, by default, ignore the robots.txt protocol.
According to Google’s User-Triggered Fetchers documentation:
“Because the fetch was requested by a user, these fetchers generally ignore robots.txt rules.”
Google-NotebookLM Ignores Robots.txt
The purpose of robots.txt is to give publishers control over bots that index web pages. But agents like the Google-NotebookLM fetcher aren’t indexing web content. They’re acting on behalf of users who are interacting with the website’s content through Google’s NotebookLM.
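robots.txt is only advisory: it works when the client chooses to read it and honor it. The sketch below uses Python’s standard-library urllib.robotparser to show what a compliant crawler would conclude from a hypothetical robots.txt that disallows Google-NotebookLM. According to Google’s documentation, user-triggered fetchers generally skip this check, so a rule like this would not stop them. The robots.txt contents and the example.com URL are illustrative assumptions.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt a publisher might write to opt out of NotebookLM.
ROBOTS_TXT = """\
User-agent: Google-NotebookLM
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(user_agent: str, url: str) -> bool:
    """Return True if a *compliant* client may fetch url under ROBOTS_TXT.

    Only clients that choose to run a check like this are bound by it.
    """
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())  # parse() also marks the rules as loaded
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://example.com/article"
    print(is_allowed("Google-NotebookLM", url))  # False: a compliant client would stay out
    print(is_allowed("SomeOtherBot", url))       # True: falls through to the * group
```

The “False” result only describes what a well-behaved crawler would do. A fetcher that never calls a check like this is unaffected, which is why the blocking has to happen on the server.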
How To Block NotebookLM
Google uses the Google-NotebookLM user agent when extracting website content. Publishers who want to stop NotebookLM users from accessing their content can therefore create rules that automatically block that user agent. For example, a simple solution for WordPress publishers is to use Wordfence to create a custom rule that blocks all site visitors using the Google-NotebookLM user agent.
Another way to do it is with .htaccess, using the following rule:
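# Requires mod_rewrite. Returns 403 Forbidden to any request whose
# User-Agent contains "Google-NotebookLM" (case-insensitive match via [NC]).
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} Google-NotebookLM [NC]
RewriteRule .* - [F,L]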
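For sites served by a Python application rather than Apache, the same idea can be sketched as WSGI middleware. This is a minimal, hypothetical example: the names block_notebooklm, hello_app, and call are illustrative and do not come from any library. It lowercases the User-Agent header and returns 403 Forbidden when the header contains “google-notebooklm”, mirroring the case-insensitive [NC] match and the [F] flag of the .htaccess rule. Like that rule, it only catches requests that identify themselves with that token.

```python
from typing import Callable, Iterable

# Loose WSGI type aliases, kept simple for readability.
StartResponse = Callable[..., object]
WSGIApp = Callable[[dict, StartResponse], Iterable[bytes]]

BLOCKED_TOKEN = "google-notebooklm"  # compared against the lowercased User-Agent


def block_notebooklm(app: WSGIApp) -> WSGIApp:
    """Wrap a WSGI app so Google-NotebookLM requests get 403 Forbidden."""
    def middleware(environ: dict, start_response: StartResponse) -> Iterable[bytes]:
        user_agent = environ.get("HTTP_USER_AGENT", "").lower()
        if BLOCKED_TOKEN in user_agent:
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Forbidden"]
        return app(environ, start_response)  # everyone else passes through
    return middleware


def hello_app(environ: dict, start_response: StartResponse) -> Iterable[bytes]:
    """Stand-in for a real site: always answers 200 OK."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"Hello"]


def call(app: WSGIApp, user_agent: str) -> tuple:
    """Invoke a WSGI app with a given User-Agent; return (status, body)."""
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status

    body = b"".join(app({"HTTP_USER_AGENT": user_agent}, start_response))
    return captured["status"], body


if __name__ == "__main__":
    site = block_notebooklm(hello_app)
    # Hypothetical User-Agent strings, for illustration only.
    print(call(site, "Mozilla/5.0 (compatible; Google-NotebookLM)"))
    print(call(site, "Mozilla/5.0"))
```

In a real deployment, block_notebooklm would wrap the application object that the WSGI server (for example, wsgiref or a production server) is configured to run.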