Google has introduced a new web crawler called Google-CloudVertexBot, which crawls webpages on behalf of business clients using its Vertex AI platform. However, the documentation for the new bot is unclear about its scope and functionality, particularly whether it accesses public sites or only verified domains.
Google-CloudVertexBot has been added to Google's crawler documentation. Its primary function is to crawl site content for clients who use Vertex AI. Unlike other Google bots tied to search or advertising, Google-CloudVertexBot collects data specifically for AI-related services.
According to the documentation, Google-CloudVertexBot crawls site content on behalf of Vertex AI clients. The documentation describes two types of website indexing: basic and advanced. Basic website indexing draws on public website data, text and images tagged with metadata, and does not require domain verification. Advanced website indexing, by contrast, requires domain verification and comes with indexing quotas.
The issue stems from the documentation's ambiguous language. While it appears that Google-CloudVertexBot is intended for use on sites controlled by the requesting site owners, the changelog entry adds a further layer of uncertainty. The changelog states that the new crawler was documented to help site owners identify the new crawler activity, which implies it may be crawling public sites, despite what the documentation seems to say.
The documentation states that the bot crawls at the "site owners' request," which makes it hard to determine whether it will target public websites or only crawl URLs on domains verified to be under the requester's control.
Given the ambiguity, webmasters may be unsure whether to preemptively block Google-CloudVertexBot in their robots.txt files. While the documentation suggests the crawler operates only on verified domains, the changelog leaves open the possibility that it crawls public sites as well. Until Google clarifies the bot's scope and behavior, site owners should stay informed and adjust their site management practices to account for any unexpected crawling.
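For site owners who would rather opt out while the scope remains unclear, a robots.txt rule keyed to the crawler's name is the usual approach. The following is a minimal sketch, assuming the crawler honors robots.txt and matches the "Google-CloudVertexBot" token under which it appears in Google's crawler documentation; the file would sit at the root of the site, for example at https://example.com/robots.txt:

    # Block Google's Vertex AI crawler from the entire site
    # (assumes the bot respects robots.txt and matches this user-agent token)
    User-agent: Google-CloudVertexBot
    Disallow: /

    # Crawlers not named in their own group fall back to this default
    # and keep full access
    User-agent: *
    Disallow:

A rule like this affects only the named crawler; Googlebot and Google's other search and advertising user agents keep whatever access the rest of the file already grants them, so blocking Google-CloudVertexBot should not, on its own, change how the site appears in Google Search.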