Excerpt from Search Engine Journal
Google has published a fresh installment of its educational video series 'How Search Works,' explaining how its search engine discovers and accesses web pages through crawling.
In the seven-minute episode, hosted by Google Analyst Gary Illyes, the company provides an in-depth look at how Googlebot, the software Google uses to crawl the web, works.
How Googlebot Crawls the Web
Googlebot starts by following links from known webpages to uncover new URLs, a process called URL discovery.
It avoids overloading sites by crawling each one at a customized speed based on server response times and content quality.
Googlebot renders pages with a current version of the Chrome browser so that JavaScript executes and dynamically loaded content displays correctly. It crawls only publicly available pages, not those behind logins.
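To make the URL-discovery step concrete, here is a minimal sketch in Python (standard library only) that follows the same idea: fetch a known page, collect its links, and resolve them into candidate URLs. The seed URL is a hypothetical placeholder, and this illustrates the concept rather than how Googlebot itself is implemented.

```python
# A minimal URL-discovery sketch, not Googlebot's implementation: fetch one
# known page, collect its <a href> links, and resolve them into candidate URLs.
# The seed URL below is a hypothetical placeholder.
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen


class LinkCollector(HTMLParser):
    """Collect href values from <a> tags on a single page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


seed = "https://example.com/"  # hypothetical seed URL (a "known page")
html = urlopen(seed, timeout=10).read().decode("utf-8", errors="replace")

parser = LinkCollector()
parser.feed(html)

# Resolve relative links against the seed; in a real crawler, URLs not seen
# before would be added to the crawl queue.
discovered = {urljoin(seed, href) for href in parser.links}
for url in sorted(discovered):
    print(url)
```

In practice Googlebot also discovers URLs from sitemaps and prioritizes its crawl queue, which this sketch leaves out.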
Here are some additional tactics for making your site more crawlable:
- Avoid crawl budget exhaustion – Websites that update frequently can overwhelm Googlebot’s crawl budget, preventing new content from being discovered. Careful CMS configuration and rel="next" / rel="prev" tags can help.
- Implement good internal linking – Linking to new content from category and hub pages enables Googlebot to discover new URLs. An effective internal linking structure aids crawlability.
- Make sure pages load quickly – Sites that respond slowly to Googlebot fetches may have their crawl rate throttled. Optimizing pages for performance can allow faster crawling.
- Eliminate soft 404 errors – Fixing soft 404s caused by CMS misconfigurations ensures URLs lead to valid pages, improving crawl success. A simple way to spot them is sketched after this list.
- Consider robots.txt tweaks – An overly strict robots.txt can block pages you want crawled. An SEO audit may uncover restrictions that can safely be removed; a quick robots.txt check is also sketched after this list.
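As a rough illustration of the soft-404 check mentioned above, the sketch below flags URLs that answer with a 200 status but whose body reads like an error page. The URL list and the phrase list are hypothetical placeholders for your own inventory.

```python
# A rough soft-404 heuristic, not an official check: flag URLs that answer
# 200 OK but whose body looks like an error page. The URLs and phrases below
# are hypothetical placeholders.
from urllib.request import urlopen

NOT_FOUND_PHRASES = ("page not found", "no longer available", "nothing here")


def looks_like_soft_404(url: str) -> bool:
    """Return True if the URL returns 200 but the body resembles an error page."""
    try:
        with urlopen(url, timeout=10) as response:
            if response.status != 200:
                return False  # a real error status is not a *soft* 404
            body = response.read().decode("utf-8", errors="replace").lower()
    except OSError:  # covers HTTPError, URLError, and timeouts
        return False  # hard errors (real 404s, unreachable hosts) are a separate issue
    return any(phrase in body for phrase in NOT_FOUND_PHRASES)


for url in ["https://example.com/", "https://example.com/discontinued-page"]:
    status = "possible soft 404" if looks_like_soft_404(url) else "looks fine"
    print(f"{url}: {status}")
```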
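And for the robots.txt point, here is a small sketch using Python's standard urllib.robotparser to check whether specific URLs are blocked for Googlebot. The site and URL list are hypothetical; the goal is simply to confirm that pages you expect to be crawled are not disallowed.

```python
# A small robots.txt audit sketch using Python's standard urllib.robotparser.
# The site and the URL list are hypothetical placeholders.
from urllib.robotparser import RobotFileParser

robots = RobotFileParser("https://example.com/robots.txt")  # hypothetical site
robots.read()

urls_to_check = [
    "https://example.com/",
    "https://example.com/category/widgets",
]

for url in urls_to_check:
    allowed = robots.can_fetch("Googlebot", url)
    print(f"{'allowed' if allowed else 'BLOCKED'}: {url}")
```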
Read the complete article at Search Engine Journal.