Earlier this month, Google announced a new robots.txt user agent, Googlebot-News, which tells the Google News crawler what it may and may not fetch. It is used exactly the same way as the Googlebot user agent. To make things clear, here are some of the examples Google gave for using it.
Include pages in Google web search, but not in News:
User-agent: Googlebot
Disallow:

User-agent: Googlebot-News
Disallow: /
Include pages in Google News, but not Google web search:
User-agent: Googlebot
Disallow: /

User-agent: Googlebot-News
Disallow:
Block different sets of pages from Google web search and Google News:
User-agent: Googlebot
Disallow: /latest_news

User-agent: Googlebot-News
Disallow: /archives
According to Google, “The pages blocked from Google web search and Google News can be controlled independently. This robots.txt file blocks recent news articles (URLs in the /latest_news folder) from Google web search, but allows them to appear on Google News. Conversely, it blocks premium content (URLs in the /archives folder) from Google News, but allows them to appear in Google web search.” The same technique works for individual pages, not just folders.
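To see why the two crawlers can be controlled independently, here is a minimal Python sketch of how a crawler might choose its group and evaluate the rules. It assumes Google-style semantics: a crawler obeys only the group for its most specific matching user agent (modeled here as a hypothetical fallback list such as ["googlebot-news", "googlebot", "*"]), the longest matching path prefix wins, and an empty Disallow: blocks nothing. The function names are made up for illustration, and wildcards (* and $) are not handled. Python's built-in urllib.robotparser is not used because it matches user-agent names by substring and takes the first matching group, so a Googlebot group listed first would also capture Googlebot-News.

```python
def parse_groups(robots_txt):
    """Split robots.txt text into a list of (user_agents, rules) groups.

    Consecutive User-agent lines share one group; Allow/Disallow lines
    belong to the most recent group. Comments and other fields are skipped.
    """
    groups, current, last_was_agent = [], None, False
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()
        if ":" not in line:
            continue  # blank line or malformed entry
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if not last_was_agent:
                current = ([], [])  # start a new group
                groups.append(current)
            current[0].append(value.lower())
            last_was_agent = True
        else:
            if field in ("allow", "disallow") and current is not None:
                current[1].append((field, value))
            last_was_agent = False
    return groups


def is_allowed(robots_txt, agent_chain, path):
    """Return True if the first agent in agent_chain that has a group may fetch path."""
    groups = parse_groups(robots_txt)
    rules = None
    for token in agent_chain:
        # Merge every group naming this token; only one token's rules are used.
        matched = [r for agents, r in groups if token in agents]
        if matched:
            rules = [rule for group_rules in matched for rule in group_rules]
            break
    if rules is None:
        return True  # no group applies, so nothing is blocked

    best_len, allowed = -1, True
    for field, value in rules:
        # An empty value (a bare "Disallow:") matches nothing.
        if value and path.startswith(value):
            # Longest prefix wins; on a tie, Allow beats Disallow.
            if len(value) > best_len or (len(value) == best_len and field == "allow"):
                best_len, allowed = len(value), field == "allow"
    return allowed


ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /latest_news

User-agent: Googlebot-News
Disallow: /archives
"""

WEB_SEARCH = ["googlebot", "*"]
NEWS = ["googlebot-news", "googlebot", "*"]

print(is_allowed(ROBOTS_TXT, WEB_SEARCH, "/latest_news/story.html"))  # False
print(is_allowed(ROBOTS_TXT, NEWS, "/latest_news/story.html"))        # True
print(is_allowed(ROBOTS_TXT, WEB_SEARCH, "/archives/2009.html"))      # True
print(is_allowed(ROBOTS_TXT, NEWS, "/archives/2009.html"))            # False
```

Because each crawler picks exactly one group, the /latest_news rule never reaches Googlebot-News and the /archives rule never reaches Googlebot.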
Stop Google web search and Google News from crawling pages:
User-agent: Googlebot
Disallow: /
In this case, Googlebot is disallowed from the whole site and there is no separate Googlebot-News group, so the news crawler falls back to the Googlebot rules and does not crawl the pages either.
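The fallback can be sketched in a few lines of Python. This is a stripped-down, hypothetical helper (not Google's code) that reads only User-agent and Disallow lines, ignores Allow lines and wildcards, and assumes the crawler name chain ["googlebot-news", "googlebot", "*"]. It also shows that adding an explicit Googlebot-News group with an empty Disallow:, as in the second example earlier in this post, lets the news crawler back in while web search stays blocked.

```python
def disallow_rules(robots_txt, agent_chain):
    """Return the Disallow values for the first agent in agent_chain that has its own group."""
    groups = {}                 # user-agent token -> list of Disallow values
    run, in_agent_run = [], False
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        if field.lower() == "user-agent":
            if not in_agent_run:
                run = []        # consecutive User-agent lines form one group
            run.append(value.lower())
            groups.setdefault(value.lower(), [])
            in_agent_run = True
        else:
            if field.lower() == "disallow":
                for agent in run:
                    groups[agent].append(value)
            in_agent_run = False
    # Use the most specific agent that has a group of its own.
    for token in agent_chain:
        if token in groups:
            return groups[token]
    return []


def is_blocked(robots_txt, agent_chain, path):
    # Simplified: any non-empty Disallow value that prefixes the path blocks it.
    return any(rule and path.startswith(rule) for rule in disallow_rules(robots_txt, agent_chain))


NEWS = ["googlebot-news", "googlebot", "*"]

only_googlebot = "User-agent: Googlebot\nDisallow: /\n"
with_news_group = only_googlebot + "\nUser-agent: Googlebot-News\nDisallow:\n"

print(is_blocked(only_googlebot, NEWS, "/2010/01/story.html"))   # True: falls back to Googlebot
print(is_blocked(with_news_group, NEWS, "/2010/01/story.html"))  # False: own group, empty Disallow
```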
Happy New Year!