As part of our initial and weekly website health checks we attempt to spider your website and crawl all the pages we can find.
Due to size constraints (there are some huge sites out there!) we only display the first 3 levels of crawled pages in the 'Crawled Pages' table.
My Site Shows Thousands More Pages Crawled Than I Have - What Should I Do?
If the number of indexed pages reported for your site is very large (e.g. thousands) and you do not have that many pages on your site, then it is likely that our crawler has indexed search results pages or pages created dynamically by your Content Management System. You should review the structure of your site and consider using robots.txt or the rel=canonical tag to steer search engine robots towards the content you do want to get indexed.
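As a rough illustration of both options, the sketch below uses Python's standard library to check a robots.txt rule and to build a rel=canonical tag. The /search path, the example.com URLs and the helper names is_crawlable and canonical_link are assumptions made for this example, not details of our crawler or of your site. Dropping the whole query string to form the canonical URL is also an assumption that will not suit every site.

```python
from urllib.parse import urlsplit, urlunsplit
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: keep all robots out of on-site search results.
ROBOTS_TXT = """\
User-agent: *
Disallow: /search
"""


def is_crawlable(robots_txt: str, url: str, agent: str = "*") -> bool:
    """Return True if the given robots.txt text lets `agent` fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


def canonical_link(url: str) -> str:
    """Build a rel=canonical tag pointing at the URL without its query string.

    Assumption: every query-string variant of a page is a duplicate of the
    plain URL. Adjust this if your query parameters select real content.
    """
    parts = urlsplit(url)
    clean = urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))
    return f'<link rel="canonical" href="{clean}">'


if __name__ == "__main__":
    for candidate in (
        "https://example.com/products/shoes",
        "https://example.com/search?q=shoes",
    ):
        print(candidate, is_crawlable(ROBOTS_TXT, candidate))
    print(canonical_link("https://example.com/products/shoes?sessionid=abc"))
```

Note that Python's robotparser matches simple path prefixes only, so wildcard rules that some search engines accept are not evaluated by this check.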
My Site Shows 0 Pages Crawled - What Should I Do?
Our spidering process is an important pre-requisite for a whole number of tasks, so if your site is reporting 0 pages crawled by our spider then please check the Automated Tasks to-do-list on the bottom right-hand side of the Site Audit tab.
If 'Spider Site' is showing a tick to indicate that it is complete, and both your 'Recent Activity' report and the 'Indexed Pages' graph are showing 0 pages crawled by our spider, then there is clearly a problem crawling your site. Please contact us and we will do what we can to rectify the issue.
If 'Spider Site' is showing our software's icon, then the spidering is still in progress. Click on the icon next to 'Spider Site' and it will take you to a new page telling you when our system expects to complete this task.
Please then wait for the spidering process to finish. Once the process has finished, if the number of crawled pages is still showing 0 then please contact us.
What can I use Crawled Pages for?
Once our spider has crawled, indexed, analysed and counted all the pages on your site you can use the information we find to give you some valuable insights into the performance of your website and potential areas of concern.
It is useful to compare the number of pages our spider (called Curious George) has found with the number of pages each major search engine reports in its own index.
Here are some different scenarios and examples of what action may be necessary in each instance:
If we have found, say, 100 pages on your site and a search engine is showing 0 pages, then it is highly likely that your website is not indexed in that search engine. If this is the case, the system will automatically generate a task to help you get your site indexed. Click on the relevant task and follow the instructions on screen.
From time to time search engines may stop giving out numbers of indexed pages or may give unusually high numbers. They may have gremlins in their systems, or we may have them in ours. Our system tracks this daily, so it is often best to wait a day or two before taking any action. By the same token, if our system suddenly reports 0 indexed pages in a search engine where you have previously been indexed, it does not mean that your site has been dropped from that engine. It is most likely a temporary glitch and the numbers will return to expected levels the following day.
If we have found say 20 pages on your site but you think there are many more than that, then perhaps there is an issue that is preventing us from properly crawling your site. By comparing the numbers of indexed pages in other search engines you can determine whether there are fundamental problems with the 'crawl-ability' of your site.
If we have found, say, 50 pages on your site and there are approximately 40 in each search engine, then you clearly have some pages (different for each engine) that are not yet indexed. The 'Review Unindexed Pages' task will give you advice and suggestions on actions you can take to get these pages indexed. Submitting an XML sitemap to the search engines is also useful.
If we have found say 50 pages on your site and one or more of the search engines is reporting many more pages than you know you have, then chances are you have a duplicate content issue.
This could have a number of causes, but the likely favourites are:
Whilst we do not currently believe there is a duplicate content penalty for this, it clearly does not help your SEO efforts. Search engines allocate only a certain amount of 'crawl equity' to your site. If all they are finding is reams of identical content, this isn't going to make them increase that 'crawl equity', nor is it likely that they will get to all of your most important content. It also gives them confusing signals as to which pages to rank for a particular topic.
So if you think you have a duplicate content issue, then check the Webmaster Tools account for each search engine and perhaps ask for some professional help in resolving the issue.
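The comparison scenarios described above can be summarised in a short Python sketch. This is only an illustration of the reasoning, not how our system works internally. The function name diagnose, the returned labels and the 1.5 duplicate_factor threshold are all assumptions chosen for the example.

```python
def diagnose(crawled: int, indexed: int, duplicate_factor: float = 1.5) -> str:
    """Compare our crawled page count with one search engine's indexed count.

    `duplicate_factor` is a hypothetical threshold: an engine reporting more
    than this multiple of the crawled count is treated as "many more pages".
    """
    if crawled <= 0:
        # Nothing to compare against: the crawl itself needs attention first.
        return "crawl problem"
    if indexed == 0:
        # May also be a temporary reporting glitch, so re-check in a day or two.
        return "not indexed"
    if indexed > crawled * duplicate_factor:
        return "possible duplicate content"
    if indexed < crawled:
        return "some pages unindexed"
    return "ok"


if __name__ == "__main__":
    # Page counts taken from the scenarios in the text; 500 stands in for
    # "many more pages than you know you have".
    for crawled, indexed in ((100, 0), (50, 40), (50, 500), (50, 50)):
        print(crawled, indexed, diagnose(crawled, indexed))
```

Because search engine numbers fluctuate, a result from a single day is best treated as a prompt to look again later rather than as a firm diagnosis.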