April 19, 2010

Hands Off, Google!

By: Hostway Team

By Melissa J Luther

If you have a Web site that you’ve worked hard to get ranked well with the search engines, you may be wondering why on earth you’d want to prevent any part of it from getting indexed. However, there are several types of pages you might want to share only with certain visitors. These might include:

Duplicate content: for example, printer-friendly or downloadable PDF versions of your HTML pages
Error message pages
Thank-you and confirmation pages
Special landing pages: for example, pages you have designed specifically for PPC or email advertising campaigns

There are several ways to tell the search engines to ignore specific pages on your Web site, but the most common and easiest are meta tags and the robots.txt file.

The NoIndex Meta Tag

Probably the simplest way to exclude just a few pages is with a meta tag on each page you want the search engines to ignore. The following tag, placed in a page’s head section, tells all search engine robots not to index that page:

<meta name="robots" content="noindex">

The search engines will still read these pages, but will not index them.

You can also target a specific bot by using its name in place of robots. For example, this tag applies only to Google’s crawler:

<meta name="googlebot" content="noindex">

This tag works only in HTML pages. You cannot use it to block .pdf, .doc or other non-HTML files.
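To check that a page really carries the tag, a short script can help. The following is a minimal sketch, assuming the usual tag form <meta name="robots" content="noindex">, or a bot name such as googlebot in place of robots. The names RobotsMetaParser and blocks_indexing are made up for this illustration, and the check only looks for the noindex directive.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives of every <meta name=... content=...> tag."""

    def __init__(self):
        super().__init__()
        self.directives = {}  # lowercased meta name -> set of directives

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr_map = dict(attrs)
        name = (attr_map.get("name") or "").strip().lower()
        content = attr_map.get("content") or ""
        if not name:
            return
        # Directives are comma separated, e.g. "noindex, nofollow".
        values = {part.strip().lower() for part in content.split(",") if part.strip()}
        self.directives.setdefault(name, set()).update(values)


def blocks_indexing(html, bot="googlebot"):
    """Return True if the page has a noindex tag for all robots or for `bot`."""
    parser = RobotsMetaParser()
    parser.feed(html)
    for name in ("robots", bot.lower()):
        if "noindex" in parser.directives.get(name, set()):
            return True
    return False


page = '<html><head><meta name="robots" content="noindex"></head><body>Thanks!</body></html>'
print(blocks_indexing(page))  # True
```

A tag addressed to one bot only counts for that bot, so a page tagged for googlebot would still report False when checked with bot="bingbot".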

The Robots.txt File

The robots.txt file is a plain text file that lives in a site’s root directory and tells the search engines which pages and directories are off limits. Most search engine robots automatically look for this file. As with the meta tag, you can specify which robots the rules apply to.

To tell all robots to stay away from directories named “promo” and “print”, your robots.txt file would look like this:

User-agent: *
Disallow: /promo/
Disallow: /print/

Remember the trailing slash. Without it, the bots match on the prefix and will ignore any file or directory whose path begins with /promo or /print (for example, /promotions.html), which may not be your intent.
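The effect of the trailing slash can be checked with the robots.txt parser in Python’s standard library. This is a rough sketch rather than a model of how any particular search engine behaves: the paths /promo/offer.html and /promotions.html and the agent name ExampleBot are invented for the example.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_lines, path, agent="ExampleBot"):
    """Parse robots.txt lines and report whether `agent` may fetch `path`."""
    parser = RobotFileParser()
    parser.parse(robots_lines)
    return parser.can_fetch(agent, path)


WITH_SLASH = ["User-agent: *", "Disallow: /promo/", "Disallow: /print/"]
WITHOUT_SLASH = ["User-agent: *", "Disallow: /promo", "Disallow: /print"]

# With the slash, only the contents of the directory are blocked.
print(is_allowed(WITH_SLASH, "/promo/offer.html"))    # False
print(is_allowed(WITH_SLASH, "/promotions.html"))     # True

# Without it, every path starting with /promo is blocked.
print(is_allowed(WITHOUT_SLASH, "/promotions.html"))  # False
```

Running a draft robots.txt through a check like this before uploading it is a cheap way to catch a rule that blocks more than intended.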

While you can specify individual files, rather than directories, this can quickly become cluttered and unmanageable. It’s much easier to organize your site in such a way that any files you want to exclude from indexing reside in only a few directories.

This method is not foolproof: robots.txt is a voluntary convention, and some crawlers ignore it.

Other ways to block search engine indexing include password protection (search engines cannot access content protected by a password) and the X-Robots-Tag HTTP header for blocking non-HTML content. You may also have heard that not linking to a page will keep search engines from indexing it. While this makes sense in theory, many webmasters have found that it rarely works in practice: supposedly orphaned pages have appeared in search engine results within weeks.
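The X-Robots-Tag is sent as an HTTP response header, so it is normally set in the Web server’s configuration. As an illustration only, here is a small sketch using Python’s built-in http.server module. The helper extra_headers, the class NoIndexHandler and the list of file extensions are assumptions made for this example, not part of any server product.

```python
from http.server import SimpleHTTPRequestHandler
from pathlib import PurePosixPath

# File types that should not be indexed (an assumed list for this sketch).
NOINDEX_SUFFIXES = {".pdf", ".doc"}


def extra_headers(path):
    """Return the extra response headers for a requested URL path."""
    suffix = PurePosixPath(path.split("?", 1)[0]).suffix.lower()
    if suffix in NOINDEX_SUFFIXES:
        return {"X-Robots-Tag": "noindex"}
    return {}


class NoIndexHandler(SimpleHTTPRequestHandler):
    """Serves files as usual, adding X-Robots-Tag to matching responses."""

    def end_headers(self):
        for name, value in extra_headers(self.path).items():
            self.send_header(name, value)
        super().end_headers()


print(extra_headers("/print/guide.pdf"))  # {'X-Robots-Tag': 'noindex'}
print(extra_headers("/index.html"))       # {}
```

To try it locally, the handler can be passed to http.server.HTTPServer in place of the default handler. On a production site the same header would more likely be added through the server’s own configuration.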

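For most small Web sites, using a noindex tag, a robots.txt file or both is sufficient to block indexing of specific pages. If you use both, apply them to different pages: a robot that robots.txt keeps away from a page never reads that page’s noindex tag.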

About the Author

Melissa J Luther, owner and founder of LookSee Information Solutions, LLC, helps small businesses create and maintain a strong online presence. She takes a multi-channel approach, with a well-optimized Web site as the center of an online presence that includes content creation, PPC advertising, linking and social media as appropriate.
