Wednesday, June 16, 2010

Cookieless Session and Google Indexing (Googlebot)

Hi,

Today I'm discussing one of the more challenging problems I have come across during my 5+ years of career.

Most of us build websites, and these websites are automatically indexed by Google's crawler, Googlebot. I was working on an e-commerce website developed in ASP.NET 3.5. It had previously been written in PHP, and all of its pages were already crawled. After releasing the first version, we discovered that Google was not crawling the new pages. SEO is a very vast area, and honestly I had never been involved in it before, because in the past we had SEO experts who looked after it. This time the situation was different and we had no SEO expert, so I had to jump in and find out what was wrong with the website and why Google was not able to crawl the new pages.

After Googling for some time, we realized the website was using forms authentication. If you remember, web.config has a cookieless attribute (on the sessionState and forms elements) that you set to AutoDetect to allow cookieless sessions or to UseCookies to disallow them. Ours was set to AutoDetect, so ASP.NET appended the session id to the URL whenever a client connected through a cookieless browser. When Googlebot tried to connect to the website for crawling, the ASP.NET engine identified it as a cookieless browser, embedded the session id in the URL, and sent a 302 redirect to that rewritten URL. That redirect, to a URL that changed with every session, was why Google was not able to index the new pages.
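For reference, the setting in question looks like this in web.config. This is a minimal sketch of the relevant elements only; the loginUrl and timeout values are placeholders, not our actual configuration:

```xml
<configuration>
  <system.web>
    <!-- AutoDetect probes each client for cookie support; clients that
         look cookieless (as Googlebot did) get the session id embedded
         in the URL via a 302 redirect. -->
    <sessionState cookieless="AutoDetect" timeout="20" />
    <authentication mode="Forms">
      <forms loginUrl="Login.aspx" cookieless="AutoDetect" />
    </authentication>
  </system.web>
</configuration>
```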

Possible Solutions:

1. The easiest solution to the problem is to stop using ASP.NET's AutoDetect session mode and force cookie-based sessions instead (see the first sketch after this list).

2. If you need to keep AutoDetect, you can instead configure ASP.NET to recognize Google's spider as supporting cookies (see the second sketch after this list). This article shows how: http://www.kowitz.net/archive/2006/12/11/asp.net-2.0-mozilla-browser-detection-hole.aspx

3. You can implement automatic support for URL-based sessions yourself (see the third sketch after this list). This takes some time to implement, and the benefits may not be worth the implementation cost. It works like this:
- use cloaking: generate session IDs only when the visitor is not a web spider
- start generating session IDs only when the session is really needed for tracking (for example, after the visitor adds items to his or her shopping cart); this way you don't feed your users URL-based session IDs unless you really need to
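For solution 1, the change is essentially one attribute: switch cookieless to UseCookies so ASP.NET never rewrites the URL. A minimal sketch of the relevant elements:

```xml
<system.web>
  <!-- UseCookies never issues the session-id redirect; cookieless
       clients simply get a fresh session on each request. -->
  <sessionState cookieless="UseCookies" />
  <authentication mode="Forms">
    <forms loginUrl="Login.aspx" cookieless="UseCookies" />
  </authentication>
</system.web>
```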
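For solution 2, the linked article's approach comes down to a custom browser definition file dropped into the App_Browsers folder. A minimal sketch; the file name and the id/parentID values are my own choices, but the element syntax is the standard .browser format:

```xml
<browsers>
  <!-- Tell ASP.NET that Googlebot supports cookies, so the
       AutoDetect probe and its 302 redirect never fire for it. -->
  <browser id="Googlebot" parentID="Mozilla">
    <identification>
      <userAgent match="Googlebot" />
    </identification>
    <capabilities>
      <capability name="cookies" value="true" />
    </capabilities>
  </browser>
</browsers>
```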
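For solution 3, the cloaking part is just a user-agent check before you ever touch session state. A rough C# sketch; the SpiderDetector class, its crawler list, and the add-to-cart usage below are hypothetical illustrations, not code from our site:

```csharp
using System;
using System.Web;

public static class SpiderDetector
{
    // Illustrative list of crawler user-agent tokens; extend as needed.
    private static readonly string[] CrawlerTokens =
        { "Googlebot", "msnbot", "Slurp" };

    public static bool IsSpider(HttpRequest request)
    {
        string ua = request.UserAgent ?? string.Empty;
        foreach (string token in CrawlerTokens)
        {
            // Case-insensitive substring match against the UA string.
            if (ua.IndexOf(token, StringComparison.OrdinalIgnoreCase) >= 0)
                return true;
        }
        return false;
    }
}
```

In the add-to-cart handler you would then write to Session, which is what triggers session-id generation, only for real visitors:

```csharp
if (!SpiderDetector.IsSpider(Request))
{
    Session["CartId"] = Guid.NewGuid().ToString();
}
```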

Happy Programming!!!

References:

http://www.seoegghead.com/blog/seo/aspnet-20-setting-dangerous-for-google-indexing-p217.html
http://www.kowitz.net/archive/2006/12/11/asp.net-2.0-mozilla-browser-detection-hole.aspx