
Google Web Tools

Today, I stumbled upon a new link in Google, one that brought me to a page full of tools a webmaster can use to see how their website is doing according to the Google Spider (the program that crawls all the pages on your website).

https://www.google.com/webmasters/

I was not aware that you could download a CSV file (a spreadsheet file compatible with Open Office and MS Excel) with all the errors last generated by the Google Spider. You can also look at the errors directly in your account.

So… did I have errors? Well! Many. In part because each page exists in three versions (English, French and Spanish), but also because, as I discovered, the Google Spider will at times change the case of your filenames before querying a new page. Some people say that Google is case sensitive, as expected, but I say that Google must have an strtolower() call somewhere. Either that, or it follows invalid links from other pages (pages outside of our website).

The main problem here is that the error report does not show any referrer, so I cannot know whether the wrong case comes from Google itself or from a bad link somewhere else. What is certain is that Google requests these wrong filenames from our site.
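To separate the case-only errors from pages that are really missing, a small script helps. Here is a minimal sketch in Python. It assumes you already have the failing paths from the error report and a list of the paths that really exist on the server; the function name and the sample paths are made up for illustration.

```python
def find_case_only_mismatches(requested_paths, existing_paths):
    """Map each failing path to the real path it matches once case is ignored."""
    existing_paths = list(existing_paths)
    existing = set(existing_paths)

    # Index the real paths by their lowercase form.
    by_lower = {}
    for path in existing_paths:
        by_lower.setdefault(path.lower(), path)

    mismatches = {}
    for requested in requested_paths:
        if requested in existing:
            continue  # exact match, nothing wrong with this one
        real = by_lower.get(requested.lower())
        if real is not None:
            mismatches[requested] = real
    return mismatches


# Hypothetical paths, for illustration only.
existing = ["/en/docs/classFoo.html", "/en/index.html"]
requested = ["/en/docs/classfoo.html", "/en/index.html", "/en/missing.html"]
print(find_case_only_mismatches(requested, existing))
```

Anything the function returns is a case problem. Anything left over in the report is a different kind of error and needs a separate look.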

Now, in most cases, I use lowercase names for my pages. Actually, I do not think that any page I wrote has a name with capital letters. However, the files generated by Doxygen do, and those are the ones that generate the case errors. I'm not too sure whether this can be fixed in the Doxyfile, but I found a website mentioning that the mod_speling module from Apache can handle it for you automatically. So I turned that feature on to see whether it works properly. I will have to wait a week or so to see whether Google still reports problems. One concern is that the page served under the all-lowercase name will be 100% the same as the page under the original mixed-case name, meaning that Google will see what looks like duplicates.

You can find the documentation for mod_speling on the Apache website (apache.org). The module is described in detail there and currently supports two options: CheckCaseOnly and CheckSpelling. CheckSpelling turns the spelling feature on: any filename that differs from an existing one by only one character is accepted (it may also be two swapped characters, as in foo76.html instead of foo67.html). CheckCaseOnly restricts the module to case differences only. I suggest you be careful with full spelling correction. It is powerful, but it can also produce strange behavior. Case-only correction is what I used.
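To get a feel for how loose the "one character" rule is, here is a rough sketch in Python of that kind of test. This is not the code mod_speling uses; it is only my illustration of the rule as described above, and the function name is made up.

```python
def within_one_edit(requested, candidate):
    """True if the names are equal or differ by one extra character, one
    missing character, one wrong character, or two adjacent swapped characters."""
    if requested == candidate:
        return True
    len_r, len_c = len(requested), len(candidate)
    if abs(len_r - len_c) > 1:
        return False

    # Skip the part both names have in common at the start.
    i = 0
    while i < min(len_r, len_c) and requested[i] == candidate[i]:
        i += 1

    if len_r == len_c:
        # One wrong character: everything after it must match.
        if requested[i + 1:] == candidate[i + 1:]:
            return True
        # Two adjacent characters swapped.
        return (i + 1 < len_r
                and requested[i] == candidate[i + 1]
                and requested[i + 1] == candidate[i]
                and requested[i + 2:] == candidate[i + 2:])

    # Lengths differ by one: drop the extra character from the longer name.
    longer, shorter = (requested, candidate) if len_r > len_c else (candidate, requested)
    return longer[i + 1:] == shorter[i:]


print(within_one_edit("foo76.html", "foo67.html"))  # swapped digits
print(within_one_edit("foo12.html", "foo67.html"))  # two wrong digits
```

With a rule like this, foo76.html and foo67.html count as the same file, which is exactly why numbered pages are risky with full spelling correction.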

Note that you can use these options within a Directory block. I do so because the problem only happens with the Doxygen files, not with all the files. Thus, I have the following in my Apache configuration file:

<Directory "…/en/sswf_docs">
# CheckSpelling enables mod_speling; CheckCaseOnly limits it to case fixes
CheckSpelling on
CheckCaseOnly on
</Directory>
