How Googlebots Crawl and Index Your Website
NASHVILLE, TN – There are millions of websites out there competing for a top spot on Google. Google needs to know what all these sites are about to correctly categorize them and deliver relevant results to users.
“We use a huge set of computers to fetch (or “crawl”) billions of pages on the Web,” Google says. I’d like to know how many computers are working automatically to search and categorize these billions of pages. One report from last year estimated that Google uses about 900,000 servers. Google has never revealed how many computers it runs, but it does publish some information on the energy efficiency of its data centers.
Googlebots are simply computer programs that run set algorithms to fetch web pages. They go by many names: spiders, robots and bots. These programs decide how often your website is crawled and which pages to fetch from it. Search engine optimization firms don’t know exactly how these algorithms are set up to crawl your pages, but Google gives us some insight in its Webmaster Tools.
“Google’s crawl process begins with a list of Web page URLs generated from previous crawl processes and augmented with Sitemap data provided by webmasters,” Google says.
The bots start crawling the URLs on their lists, and each time they find a new link, it’s added to the list to be crawled. Dead links are noted along the way and used to update the Google index.
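To make that loop concrete, here is a minimal sketch of a crawl list in Python. It is not Google’s actual algorithm, which isn’t public. The `crawl` function, the `get_links` callback and the `TOY_WEB` dictionary are all made up for illustration, and the toy dictionary stands in for real page fetches.

```python
from collections import deque

def crawl(seed_urls, get_links):
    """Breadth-first crawl over a link graph.

    get_links(url) is a hypothetical callback: it returns the list of URLs
    linked from url, or None when the page cannot be fetched (a dead link).
    """
    frontier = deque(seed_urls)   # URLs waiting to be crawled
    seen = set(seed_urls)         # URLs already queued, so nothing is crawled twice
    crawled, dead = [], []
    while frontier:
        url = frontier.popleft()
        links = get_links(url)
        if links is None:
            dead.append(url)      # note the dead link and move on
            continue
        crawled.append(url)
        for link in links:
            if link not in seen:  # each newly found link joins the list
                seen.add(link)
                frontier.append(link)
    return crawled, dead

# Toy in-memory "web" standing in for real HTTP requests (an assumption).
TOY_WEB = {
    "https://example.com/": ["https://example.com/about", "https://example.com/blog"],
    "https://example.com/about": ["https://example.com/"],
    "https://example.com/blog": ["https://example.com/old-post"],
}

crawled, dead = crawl(["https://example.com/"], TOY_WEB.get)
print("Crawled:", crawled)
print("Dead links:", dead)
```

In this toy run, the blog page links to a post that no longer exists, so that URL ends up in the dead-link list instead of the crawled list.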
Googlebot goes through each line of your pages. Every word counts, and Google even notes where on the page the words are found. That’s why it’s important to place your most valuable keywords near the top of the page rather than further down.
“We process information included in key content tags and attributes, such as Title tags and ALT attributes,” Google says.
Make sure your HTML contains keywords in these important sections. ALT attributes describe the images on your website. Title tags can include your desired keywords, but be careful not to “stuff” them with keywords. Updating your website often and writing descriptive tags and attributes are some of the best SEO techniques.
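If you want to check your own pages for these tags, a short script can do it. The sketch below uses Python’s built-in `html.parser` module to pull out the Title tag and list images with a missing or empty ALT attribute. The `TagAudit` class, the `audit` function and the sample page are invented for this example, and this is only a rough check, not a picture of how Googlebot reads a page.

```python
from html.parser import HTMLParser

class TagAudit(HTMLParser):
    """Collects the page title and any <img> tags without ALT text."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title = ""
        self.images_missing_alt = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self.in_title = True
        elif tag == "img" and not (attrs.get("alt") or "").strip():
            # Record the image source so the missing ALT text is easy to find.
            self.images_missing_alt.append(attrs.get("src"))

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.title += data

def audit(html):
    """Return (title text, list of image sources lacking ALT text)."""
    parser = TagAudit()
    parser.feed(html)
    parser.close()
    return parser.title.strip(), parser.images_missing_alt

# Sample page made up for this example.
SAMPLE = """<html><head><title>Nashville Bakery | Fresh Bread Daily</title></head>
<body><img src="loaf.jpg" alt="Sourdough loaf"><img src="logo.png"></body></html>"""

title, missing = audit(SAMPLE)
print("Title:", title)
print("Images missing ALT text:", missing)
```

Here the logo image has no ALT attribute, so it shows up in the list of images to fix.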
“Googlebot can process many, but not all, content types,” according to the Webmaster Tools. “For example, we cannot process the content of some rich media files or dynamic pages.”
Interestingly, Google’s blogging platform Blogger recently made dynamic pages part of its template options, even though Google says it cannot process the content of some dynamic pages.
Now that you’ve optimized your HTML tags, it’s time to make sure all your links are working. Dead links and bad links are no good.
Google tells us that most websites in its database haven’t been manually added, but were found by automatic crawling. Google does miss some sites, usually for one of a few reasons: no other sites link to them, they have very few inbound links, or they are poorly designed for crawling. (Here are some helpful tips from Google on how to make your site easy to crawl.)
You can check to see if Google has indexed your site. Type “site:yoursite.com” into the Google query bar. If your site comes up, then Google has indexed it. Provide Google with a Sitemap of your website to encourage Google to crawl and index your site.
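A Sitemap is just an XML file that lists the URLs you want crawled. Here is a minimal sketch that builds one with Python’s standard library, following the public sitemaps.org format. The `build_sitemap` function and the yoursite.com URLs are placeholders for illustration, and a real Sitemap file would normally also start with an XML declaration.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

def build_sitemap(urls):
    """Return a Sitemap XML string with one <url><loc> entry per URL."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
    return ET.tostring(urlset, encoding="unicode")

# Placeholder URLs; swap in the pages of your own site.
sitemap_xml = build_sitemap([
    "https://yoursite.com/",
    "https://yoursite.com/contact",
])
print(sitemap_xml)
```

You would save the output as a file such as sitemap.xml on your site and then submit its address to Google.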
Remember to ask Google to crawl your site after you’ve significantly updated a page or added a new one. That lets Google know you’re ready for a new crawl.
Thanks for reading, and please feel free to leave comments and ask questions.



