SEOClerks

Noindex, Nofollow and Robots.txt - What's the difference?




I'm frequently amazed by the number of webmasters and programmers out there who can't tell the difference between the nofollow tag, the noindex tag and robots.txt.

Let's start with the nofollow tag. 

If you want to stop Google or any other major search engine from following the links on a page, and so from crawling deeper into a certain section of your website through that page, you need to implement a nofollow attribute. A lot of people confuse the nofollow tag with the noindex tag, hoping nofollow will prevent search engines from indexing certain pages. It won't: if a page has only the nofollow tag implemented, the page itself can still be fetched and indexed, but the spider won't go past it through its links.

I personally use the nofollow tag on links I don't want to transfer authority to, and also on links to pages I don't want crawlers to find. I have never used the nofollow attribute alone on internal pages without also using the noindex tag.
I usually have two options. The first is to put noindex on a page but still allow robots to follow its links, because I probably want some lower-level pages to get indexed. The second is to use both nofollow and noindex, so that the page won't get indexed and the links on the page itself don't get followed.
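As a rough illustration of the link-level nofollow described above, here is a minimal Python sketch that splits the links on a page into those a well-behaved crawler would follow and those marked rel="nofollow". The class name LinkCollector and the sample HTML are made up for this example, and real crawlers apply more rules than this.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Split the links on a page into followed and nofollow lists."""

    def __init__(self):
        super().__init__()
        self.followed = []
        self.nofollowed = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        href = attributes.get("href")
        if href is None:
            return
        # rel can hold several space-separated tokens, e.g. "nofollow noopener"
        rel_tokens = (attributes.get("rel") or "").lower().split()
        if "nofollow" in rel_tokens:
            self.nofollowed.append(href)
        else:
            self.followed.append(href)


# Hypothetical page: one normal link and one link we don't want to vouch for
sample_html = """
<a href="/services">Our services</a>
<a href="http://www.example.com/" rel="nofollow">Some other site</a>
"""

collector = LinkCollector()
collector.feed(sample_html)
print(collector.followed)    # ['/services']
print(collector.nofollowed)  # ['http://www.example.com/']
```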

Second comes the noindex tag.

The sole purpose of the noindex tag is to prevent search engines from indexing a certain page. If you really want to prevent a certain page from turning up in Google search results, add the following line to the <head> of the page source code: <META NAME="robots" CONTENT="noindex"> - it has worked for me 100% of the time. I have had problems with Google and my robots.txt, but I have never seen Google or any other search engine index a page that had robots noindex implemented.

Contrary to popular belief, the noindex attribute won't prevent search engines from crawling your website at will. Sure, they won't index the noindex page itself, but they will still crawl it, follow its links and index anything they find that has no noindex tag.
To ensure crawlers don't index a certain page and don't go past it, you need to implement both a noindex and a nofollow attribute, like this: <META NAME="robots" CONTENT="noindex,nofollow">
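To make the two directives concrete, here is a small Python sketch of how a crawler might read the robots meta tag and decide whether it may index the page and whether it may follow its links. The names RobotsMetaParser and robots_meta_policy are invented for this example, and the sketch only looks at noindex and nofollow, not at other values such as none or nosnippet.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the directives from <meta name="robots" content="...">."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() != "robots":
            return
        content = attributes.get("content") or ""
        for token in content.split(","):
            token = token.strip().lower()
            if token:
                self.directives.add(token)


def robots_meta_policy(html):
    """Return what the page's robots meta tag allows (default: everything)."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return {
        "index": "noindex" not in parser.directives,
        "follow": "nofollow" not in parser.directives,
    }


# Option 1: keep the page out of the index but let robots follow its links
print(robots_meta_policy('<META NAME="robots" CONTENT="noindex">'))
# Option 2: keep the page out of the index and don't follow its links
print(robots_meta_policy('<META NAME="robots" CONTENT="noindex,nofollow">'))
```

The first call reports index as False and follow as True, the second reports both as False, which matches the two options described earlier.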

If you have a WordPress blog you can easily set both nofollow and noindex with the help of an SEO plugin like Yoast SEO.

Robots.txt

Robots.txt is the usual way to keep certain robots and spiders from crawling certain portions or sections of your website. Although robots.txt is usually respected by search engines, I have plenty of times seen Google index the homepage of a website on which I had placed "Disallow: /", which means I don't want any crawler going through my website.

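So yeah, disallowing something in robots.txt doesn't necessarily mean it won't get indexed. Robots.txt only controls crawling, so Google can still index a disallowed URL without fetching it, for example when other websites link to it. If this happens you might see results that have the following meta description: "A description for this result is not available because of this site's robots.txt – learn more"
If you really want to stop crawlers from indexing a page and following its links you need to implement the robots noindex and nofollow tags shown above. Keep in mind that a crawler has to be able to fetch the page to see those tags, so the page must not also be disallowed in robots.txt.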
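A quick way to see what "Disallow: /" actually says is to feed it to the robots.txt parser in Python's standard library. This is only a sketch: the domain is the placeholder used in this article, and Googlebot is just an example user agent. Note that the parser answers one question only, whether a URL may be fetched. It says nothing about whether the URL may show up in an index.

```python
from urllib.robotparser import RobotFileParser

# The rules from a robots.txt that blocks every crawler from the whole site
rules = RobotFileParser()
rules.parse([
    "User-agent: *",
    "Disallow: /",
])

# can_fetch() only answers "may this robot crawl this URL?"
homepage_allowed = rules.can_fetch("Googlebot", "http://www.website.com/")
inner_page_allowed = rules.can_fetch("Googlebot", "http://www.website.com/some/page.html")

print(homepage_allowed)    # False
print(inner_page_allowed)  # False
```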

Unless I need to tell crawlers not to crawl a long list of URL filters or other automatically generated URLs, which Google understands perfectly and doesn't ignore, I usually have a very simple robots.txt that always looks the same.
Example of a simple and proper robots.txt that allows everything:

User-Agent: *
Disallow: 

Sitemap: http://www.website.com/sitemap.xml


As you can see, I also place my sitemap link inside the robots file because the robots.txt is one of the first things that crawlers check when they first start crawling a new website. 
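If you want to double-check that a file like the one above really allows everything and exposes the sitemap, the same standard-library parser can read it. This is a sketch under a couple of assumptions: site_maps() needs Python 3.8 or newer, and the URLs are the placeholders from the example file.

```python
from urllib.robotparser import RobotFileParser

# The simple allow-everything robots.txt from the example above
robots_txt = """\
User-Agent: *
Disallow:

Sitemap: http://www.website.com/sitemap.xml
"""

rules = RobotFileParser()
rules.parse(robots_txt.splitlines())

# An empty Disallow value means nothing is blocked
page_allowed = rules.can_fetch("Googlebot", "http://www.website.com/any/page.html")
# Sitemap lines are collected separately from the crawl rules
sitemaps = rules.site_maps()

print(page_allowed)  # True
print(sitemaps)      # ['http://www.website.com/sitemap.xml']
```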

And there you have it. These are the main differences between robots.txt, noindex and nofollow. If you have anything to add regarding this subject please do it in the comments below!

Comments

Tronia
Thank you for giving us more detailed information regarding these three things. I can honestly admit that I only knew the very basic functions of each one but I could still tell them apart.

Noindex = don't access that site (for example Google), so its content won't appear on the search engine,
Nofollow = used for links, self-explanatory name.

There are also other META tags like nosnippet and noarchive.

Definitely worth reading and learning.




Makefort
Yeah, I agree. I myself didn't know any of these, thanks for sharing. What I don't get is how can your site get indexed if you disallowed it in the first place. Shouldn't search engines be respecting some rules as well?




Corzhens
I am confused with this no-follow tag for the attribute of the web page. That means the crawler will not see it and that will not be included in the evaluation of search engines for the inclusion to the search list. Pardon me for this ignorant question but why make the web page a no-follow when you also want your web page to be included in the search list for the potential traffic? I guess it is really confusing to me.


