Indexing Limits – When do the spiders stop?
November 1st, 2007 | Posted by in SEO Factors
During a regular call, one of my clients, whose site is being redesigned, asked how large a page can really be. In my experience, a page larger than 50K is probably too much and most likely won’t get crawled. Actually, I have seen much larger pages get crawled, and have heard the same from many other SEOs, but I wanted to see actual data to support the 50K maximum rule.
I came across a great experiment on SitePoint, in which 25 pages of different sizes (from 45 KB to 4151 KB) were created and unique, non-existent keywords were inserted into each page at 10 KB intervals. These pages were clearly generated specifically for this experiment and not for human use.
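To make the setup concrete, here is a rough sketch of how such a test page could be generated. It is not the SitePoint author's actual script. The function name `build_test_page`, the filler text, the `zq` keyword prefix, and the choice of 1 KB = 1024 bytes are all assumptions made for illustration.

```python
import uuid

INTERVAL = 10 * 1024  # assumption: "10 KB" is taken as 10,240 bytes


def build_test_page(total_kb, filler="lorem ipsum dolor sit amet "):
    """Return (html, markers) for a page of roughly total_kb KB.

    markers is a list of (byte_offset, keyword) pairs, with one made-up
    keyword inserted roughly every 10 KB. All text is ASCII, so character
    counts and byte counts are the same.
    """
    target = total_kb * 1024
    parts = ["<html><body><p>"]
    size = len(parts[0])
    markers = []
    next_mark = INTERVAL
    while size < target:
        if size >= next_mark:
            # A random token that should not exist anywhere else on the web.
            word = "zq" + uuid.uuid4().hex[:12]
            markers.append((size, word))
            chunk = word + " "
            next_mark += INTERVAL
        else:
            chunk = filler
        parts.append(chunk)
        size += len(chunk)
    parts.append("</p></body></html>")
    return "".join(parts), markers


html, markers = build_test_page(45)
for offset, word in markers:
    print(offset, word)
```

Once the page has been crawled, you would search for each keyword in the engine. The deepest keyword that returns the page gives an estimate of how far into the page the spider read.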
I was pretty surprised to learn that the leading search engines differ considerably in the amount of page text they’re able to crawl. In the experiment, the limit for Yahoo! was 210 KB; for Google, 520 KB; and for MSN, 1030 KB.
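As a quick sanity check, a page's size can be compared against these reported figures. The sketch below only hard-codes the numbers from the 2007 experiment. They are not documented limits from any search engine, and the helper name `engines_that_truncate` and the 1 KB = 1024 bytes conversion are assumptions.

```python
# Limits as reported by the 2007 experiment, in KB (not official figures).
REPORTED_LIMITS_KB = {"Yahoo!": 210, "Google": 520, "MSN": 1030}


def engines_that_truncate(page_size_bytes, limits=REPORTED_LIMITS_KB):
    """Return the engines whose reported limit is smaller than the page."""
    size_kb = page_size_bytes / 1024  # assumption: 1 KB = 1024 bytes
    return sorted(name for name, limit in limits.items() if size_kb > limit)


print(engines_that_truncate(50 * 1024))
print(engines_that_truncate(300 * 1024))
```

By these figures, a 50 KB page is well under every limit, while a 300 KB page would only have been cut short by Yahoo!.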

Can you give me more detail on the case study you carried out for this purpose?
Sure – here’s the link to the experiment I was referring to: http://www.sitepoint.com/article/indexing-limits-where-bots-stop/ (it is 2 years old now). I think the bots are more sophisticated today and can probably take on as much or more of the page. The biggest issue to consider should be page download time rather than how much of the page the spiders can index, which is especially true for visitors who still have a dial-up connection.