Note: this article describes how to monitor bots with Universal Analytics (AKA GA3) using the Measurement Protocol. I will write another article in the future in case you want to implement it with GA4, the new version of Analytics.
You have probably heard many times about GoogleBot, the Google crawler that crawls websites to see what resources (mainly pages and images) each website has and then indexes them in Google.
I'm sure you've also heard about Crawl Budget, i.e. the time a bot spends crawling your website; many articles have been published about it.
But have you ever wanted to know how the Google bot (and those of other search engines) behaves on your website: which pages it crawls and what results it gets? All this information could be very useful for optimising the crawling results and speeding up the indexing of your website.
Recently, while I was publishing a series of posts about the basics of Google Analytics, I noticed that the pages were indexed almost instantly, while other people were telling me on Twitter that it took them longer to get indexed. That made me think that it would be good to know when and where the indexing bots go through a website.
Well, there is a way to monitor that: server logs, for example through AWStats reports. The problem is that you have to pre-process and update the data, which makes it a rather tedious task.
Googling, I found the article "Monitorizar GoogleBot con Google Analytics" by Lino Urruñuela, in which he developed a solution to track this type of traffic. I also found the article "Monitorizar los Bots de Google desde tu WordPress con Analytics" by Santiago Alonso, in which he adapted the code that Lino showed into a plugin for WordPress. With small modifications, I applied it to my website to be able to track the activity of the different crawlers.
How tracking works
The idea is simple: bots or crawlers do the same as any other user. They visit a website and jump from link to link to collect information and then index it. Well, that is the theory, because on a first visit what they actually do is go through the website, note down the URLs in a list and come back another time to read the information and index it. What is clear is that there is a visit, and that we have a very widespread tool to measure visits: Google Analytics.
So we have it easy: we can use Analytics, a tool we know, to track the traffic of these bots.
To do this, the proposal is to use one of the methods offered by Analytics since version 3 (Universal Analytics), which allows us to create a hit and send it to a Google Analytics property: the Measurement Protocol. It is the same method spam bots use to send hits to Analytics, which I already talked about in the tutorial "How to exclude bot sessions in Google Analytics".
To keep this information clean, the ideal is to create a new property in Google Analytics (a new UA) that we will use to collect and process the data from bot visits, and only that data.
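To make the idea more concrete, here is a minimal Python sketch of what a Measurement Protocol pageview hit could look like. The property ID `UA-XXXXXXX-1`, the helper names `build_pageview_hit` and `send_hit`, and the sample values are placeholders of mine, not something taken from Lino's or Santiago's code. The parameter names (`v`, `tid`, `cid`, `t`, `dp`, `dt`, `ua`) and the `/collect` endpoint are those of the Universal Analytics Measurement Protocol.

```python
import uuid
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Universal Analytics Measurement Protocol endpoint.
COLLECT_URL = "https://www.google-analytics.com/collect"


def build_pageview_hit(tracking_id, path, title, user_agent, client_id=None):
    """Return the URL-encoded body of a pageview hit (hypothetical helper)."""
    params = {
        "v": "1",                               # protocol version
        "tid": tracking_id,                     # property ID (placeholder below)
        "cid": client_id or str(uuid.uuid4()),  # anonymous client ID
        "t": "pageview",                        # hit type
        "dp": path,                             # path of the requested page
        "dt": title,                            # page title
        "ua": user_agent,                       # user agent of the visiting bot
    }
    return urlencode(params)


def send_hit(body):
    """POST the hit to Analytics. Defined for completeness, not called here."""
    request = Request(COLLECT_URL, data=body.encode("utf-8"), method="POST")
    with urlopen(request, timeout=5) as response:
        return response.status


if __name__ == "__main__":
    # Placeholder values only; replace the tracking ID with your own property.
    print(build_pageview_hit("UA-XXXXXXX-1", "/blog/", "Blog", "Googlebot/2.1"))
```

The sketch only builds the hit; calling `send_hit` with the result would actually send it to the property, so it is best tried against a test property.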
1. Create and configure a Google Analytics Property
To create a new property, go to your Google Analytics account and click on the Admin option, the last one that appears in the vertical bar on the left. This will take you to the administration of your Analytics account.
Once in the administration screen, you will see two or three columns, depending on whether you have Universal Analytics properties or the new GA4 version. As an aside: for now the implementation works ONLY in Universal Analytics. Later I will write another post where we will see how to implement it in the new version.
Once this is clear, the next step is to create our new property. To do this, click on the blue button at the top of the second column, "+ Create property". A new screen will open in which we will configure the data of the new property: property name, reporting time zone (country and GMT zone) and currency (the latter is not important since we are not going to monetise, but it is always good to do things right...).
Once this is done we come to an important step. As I said earlier, this implementation only works with Universal Analytics, so we must click on the text "Show advanced options", enable the option "Create a Universal Analytics property", enter the URL of our website and select the option "Create only a Universal Analytics property".
At this point we can click on the "Next" button, which will take us to a series of options for giving Google information about our company that, in our case, are not necessary. After this we can click on the "Create" button and we will have the property that we are going to use to monitor the bots.
One last detail remains. When this property was created, a view was created as well. For all this to work, we must go to the settings of the view and uncheck the option "Exclude all hits from known bots and spiders", because otherwise Analytics will not collect the data that we are going to send.
2. Create custom dimensions
As we are going to use Santiago Alonso's plugin, we only need to create two custom dimensions to collect two types of data in Google Analytics: Bot User Agent and http Code. Thanks to these custom dimensions we will be able to see in our reports which version of the bot visited our page and what the result of the visit was, i.e. whether the page is correct (code 200) or does not exist (code 404).
There is an important detail when creating these custom dimensions: you must respect the order. First create the Bot User Agent dimension and then the http Code dimension. The reason is that, as the plugin is configured, the index of the user agent dimension has to be 1 and the index of the http code dimension has to be 2.
And one last detail: the scope of both dimensions must be Hit.
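The reason the order matters is that, in the Measurement Protocol, a custom dimension travels as a parameter named `cd` followed by its index. The following Python sketch illustrates that mapping under the indexes described above; the function name `custom_dimension_params` and the sample values are hypothetical, and I am not reproducing the plugin's actual code.

```python
from urllib.parse import urlencode

# Indexes assumed from the creation order described in the article.
BOT_USER_AGENT_INDEX = 1  # first dimension created: Bot User Agent
HTTP_CODE_INDEX = 2       # second dimension created: http Code


def custom_dimension_params(user_agent, http_code):
    """Map both values to their Measurement Protocol keys (cd1 and cd2)."""
    return {
        f"cd{BOT_USER_AGENT_INDEX}": user_agent,
        f"cd{HTTP_CODE_INDEX}": str(http_code),
    }


if __name__ == "__main__":
    # Sample values only: a Googlebot-like user agent hitting a missing page.
    params = custom_dimension_params("Mozilla/5.0 (compatible; Googlebot/2.1)", 404)
    print(urlencode(params))
```

If the dimensions were created in the opposite order, the same hit would put the HTTP code in the Bot User Agent report and vice versa.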
3. Installing and configuring the SEOBot Monitor plugin
Now it's time to install the SEOBot Monitor plugin, which you can download from the WordPress repository. Once installed, the configuration is very simple, as it only has four parameters:
- Google Analytics UA tracking Code. Here you will need to enter the tracking ID of the Universal Analytics property you created earlier.
- Page Title Origin. You can choose between the default title that you put on your WordPress pages or the SEO title if you have installed the Yoast SEO plugin. Personally I recommend the first option, as on some of the websites I tested the second one did not work correctly.
- Default 404 page title. Here you can define which title, by default, you want to be picked up when the bot makes a request to a page that does not exist.
- RegEx for bot user agent. This parameter defines the tokens (identifiers) of the bots you want to track, written as a regular expression.
The syntax is as follows:
- Starts with "/", without the inverted commas.
- Add the tokens of the bots you want to track, separated by the "|" character (the vertical bar, which on a Spanish keyboard is on the number 1 key).
- Ends with "/i", without the inverted commas.
As for the bot tokens, here are the main ones:
- GoogleBot => Google search robot token (desktop and mobile).
- AdsBot-Google => Google Ads search bot token (desktop).
- AdsBot-Google-mobile => Google Ads search bot token (mobile).
- Bingbot => Bing search bot token (desktop and mobile).
- Slurp => Yahoo search robot token.
- DuckDuckbot => DuckDuckGo search bot token.
- BaiduSpider => Baidu search robot token.
- YandexBot => Yandex search bot token.
- facebot => Facebook search bot token.
- ia_archiver => Alexa search robot token.
- Twitterbot => Twitter search bot token.
- LinkedInbot => LinkedIn search bot token.
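To check which user agents an expression like this catches before saving it in the plugin, you can try it out separately. Following the syntax rules above, a pattern for a few of these tokens could be written as `/googlebot|adsbot-google|bingbot/i`. The Python sketch below is only an illustration of mine: the token selection and the function name `matched_bot_token` are assumptions, and the surrounding "/" delimiters and the "i" flag of the plugin's format become `re.IGNORECASE` in Python.

```python
import re

# Hypothetical selection of tokens from the list above.
# The "/.../i" wrapper of the plugin's format maps to re.IGNORECASE here.
BOT_PATTERN = re.compile(
    r"googlebot|adsbot-google|bingbot|slurp|duckduckbot|baiduspider|yandexbot",
    re.IGNORECASE,
)


def matched_bot_token(user_agent):
    """Return the token found in the user agent, or None if no token matches."""
    match = BOT_PATTERN.search(user_agent)
    return match.group(0) if match else None


if __name__ == "__main__":
    samples = [
        "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
        "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)",
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) Chrome/120.0 Safari/537.36",
    ]
    for user_agent in samples:
        print(matched_bot_token(user_agent), "<-", user_agent)
```

Note that a token such as AdsBot-Google also matches user agents containing AdsBot-Google-mobile, since one is a prefix of the other; the full user agent stored in the custom dimension is what lets you tell them apart.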
And once you have configured all this, you will be able to monitor the visits of the bots you have defined.
Don't go yet
We have seen how to monitor the behaviour of bots on our website in a very simple way thanks to Google Analytics and the SEOBot Monitor plugin.
I invite you to leave your impressions and/or questions in the contact form and to suggest new topics that you would like me to cover in these tutorials. I will be happy to answer you by email and to write about them on this blog.