Googlebot, JavaScript, Ajax and Cookies: It's Complicated

Search bots are notoriously bad at handling JavaScript...

Google's complicated relationship with client-side scripts

So, following on from the previous post “Can the Googlebot read JavaScript? Ajax? Cookies?”, the wait is over and the results are in…
Overall the results are quite confusing, but they lead me to some strange, tentative conclusions:

  • Googlebot CAN read JavaScript
  • Googlebot CAN execute, and read the results of, AJAX requests
  • Googlebot CAN NOT store/read Cookies

But the killer is:

  • Googlebot DOES NOT use this information in search relevancy calculations

If we take a look at the cached version of the page, it doesn't actually tell us much about the Googlebot. The only test string visible is the one written by inline JavaScript, which is pretty much expected: the Ajax requests use relative rather than absolute URLs, so they fail on the cached copy, which is served from Google's cache domain rather than from our own site.
And, as we predicted, the cookie is not set, telling us that the Googlebot didn’t/can’t read cookies.
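For reference, here is a minimal sketch of what the first two tests presumably look like on the page. This is my reconstruction, not the actual test code: the string values, element id and file name are placeholders.

    // Hypothetical reconstruction of the test page's client-side code.
    // The strings, the element id and the file name are placeholders.

    // Test 1: an inline script writes a unique string straight into the page.
    // This string survives in Google's cached copy because the script itself
    // is part of the cached HTML.
    document.write('UNIQUE-INLINE-JS-STRING-1');

    // Test 2: a second unique string is pulled in via Ajax using a *relative* URL.
    // On the live site the URL resolves against the test domain; on the cached
    // copy it resolves against webcache.googleusercontent.com, where the file
    // does not exist, so the request fails and the string never appears.
    var xhr = new XMLHttpRequest();
    xhr.open('GET', 'ajax-test-2.html', true); // relative, not absolute
    xhr.onreadystatechange = function () {
      if (xhr.readyState === 4 && xhr.status === 200) {
        // assumes the page contains <div id="ajax-target"></div>
        document.getElementById('ajax-target').innerHTML = xhr.responseText;
      }
    };
    xhr.send();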

Googlebot reading JavaScript, example 1

How do they know the string exists if they didn't parse the JavaScript?

The ability of the Googlebot to read the JS and Ajax is not immediately obvious.

JavaScript

Indeed, if we take the first test string and search for it in Google, we get no results. Try it yourself.

…However

A search for the line immediately before the JavaScript text (“so… Relentless Marauder becomes…”) returns the page.
When you check out the page preview, the search query “so… Relentless Marauder becomes…” is highlighted.
Right below it is our JavaScript text….

This is really interesting as it shows the JavaScript text in the right context on the page. It also shows that:

Googlebot is able to read the JavaScript as a string of text.

But the only place that you can see evidence of that is in the page preview section, which is quite strange.

What makes this more interesting is that the JavaScript string is conspicuously missing in the snippet of this search result.
Given the exact search query, we are presented with a snippet that shows the context of the query on the target page, but the JavaScript text is simply missing altogether.

The JavaScript string is missing in the snippet.

This would suggest to me that there are perhaps two kinds of Googlebot: the classic text-based crawler, and a more advanced, browser-like bot that can handle JavaScript, CSS, etc.

On the results page here we see Google showing us two different interpretations of the content of our page. OK, in this case it is only two words, but theoretically it could be a huge difference.

Ajax Requests

So we know that Googlebot can read the JavaScript as a string of text, but chooses not to use it when calculating relevancy to a search query. But what about the Ajax requests we make?

Googlebot reading the Ajax result, in context on the page

If we take the second test string and run another Google search for that phrase, we see a strange result. Try it yourself.
As you can see, the search returns the file that contains that string. On the main test page we fetch this file with an Ajax call and display its contents. The file is not linked from any other page, which leads me to think that Googlebot understood the structure of the Ajax request and sent off a spider to grab the contents of the file.

Whilst I can't completely rule out that Googlebot got there via an external link, I think that, given the time frame and the obscurity of the file location, this is very unlikely.
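It is also worth noting that the file's address sits in the page source as a plain string literal, so in principle Googlebot would not even need to execute the request to discover it; simply lifting URL-like strings out of the script would be enough. A toy illustration of that idea (the source line, regex and file name are mine, purely for illustration):

    // Toy illustration: finding candidate Ajax endpoints in a script's source
    // without executing it. The source line and regex are illustrative only.
    var scriptSource = "xhr.open('GET', 'ajax-test-2.html', true);";

    var candidateUrls = scriptSource.match(/'([^']+\.(?:html|php|json))'/g) || [];
    console.log(candidateUrls); // ["'ajax-test-2.html'"] - could simply be queued for crawling

Which is exactly why the referrer-filtered test further down matters: it can distinguish a plain fetch of the file from a genuine Ajax call made from the main page.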

Again, if we search for the text immediately prior to the string we are testing, we get a results page with the string showing up in the page preview but not in the snippet. This is exactly the same as with the JavaScript text, and again we have Google showing us two different interpretations of the page content.

With the third test string we also apply some referrer filtering, to make sure the text is only output when the file is requested via an Ajax call from the main page.
Again, this file was found and indexed. See here.
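To illustrate the idea, here is a minimal sketch of that kind of referrer filtering. This is only an assumption of mine about how it could be wired up (written as a small Node.js handler to keep all the examples in JavaScript; the domain, path and string are placeholders, and the original test presumably used whatever server-side setup the site already runs):

    // Minimal sketch of referrer filtering for the third test (assumptions:
    // Node.js, placeholder domain/path/string). The unique string is only
    // served when the request's Referer header points at the main test page,
    // i.e. when it arrives as an Ajax call made from that page.
    var http = require('http');

    http.createServer(function (req, res) {
      var referer = req.headers['referer'] || '';
      var fromMainPage = referer.indexOf('example.com/googlebot-test-page') !== -1;

      if (req.url === '/ajax-test-3.html' && fromMainPage) {
        res.writeHead(200, { 'Content-Type': 'text/plain' });
        res.end('UNIQUE-AJAX-STRING-3'); // only visible via the Ajax call
      } else {
        res.writeHead(404);
        res.end(); // direct requests (no matching referrer) get nothing
      }
    }).listen(8080);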

But, again, when we run a search for the text prior to the expected string, we are presented with the Ajax text in the page preview but not in the snippet. Try it yourself. This is perhaps the strongest evidence I have seen yet that, in some way, Googlebot DOES execute Ajax requests and CAN read the resulting output.

Cookies

The fourth test yielded no results whatsoever, backing up the idea that Googlebot CAN NOT store and read cookies (which is good, as it's the main premise behind my other post “Faking Backlinks using the Referrer”).
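For completeness, here is one possible shape of such a cookie test, again purely my reconstruction with placeholder names: a fourth unique string is only written into the page if a cookie can be stored and read back, so a client with no cookie jar never produces it.

    // Hypothetical cookie test (placeholder cookie name and string).
    // Write a cookie, then only output the unique string if it can be read
    // back; a client that does not store cookies never reaches the output.
    document.cookie = 'gbot_cookie_test=1; path=/';

    if (document.cookie.indexOf('gbot_cookie_test=1') !== -1) {
      document.write('UNIQUE-COOKIE-STRING-4');
    }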

Conclusion

So, in conclusion, it would seem that Googlebot has the ability to parse JavaScript content and to read the results of Ajax requests, but for some reason these elements are not being used to calculate search relevancy in the same way that on-page text is.

Why is this?

Happy Face Man is Happy.

I would guess that Google is doing this to protect the quality of its search results. At the moment the idea of reading JavaScript and Ajax seems new and experimental, and it could mess up a lot of things in the SERPs if Google suddenly switched to valuing such text. I would also guess that the current algorithm for calculating search relevancy has been hammered out over the last 15 or so years and is very, very sensitive. This would also go some way to explaining why Facebook comments are being crawled and indexed: the source can be trusted, and the possibility of ‘poisoning’ the SERPs with blackhat-style tricks must surely be minimal. But these are just my thoughts.

Finally, I would like to say a big thanks to everyone who helped spread the previous post; it was great to get feedback from other great SEOs out there.