19 Mar What is a Robots.txt file?
A what? Is this some sort of robot? Here is another technical blog about how you can optimize your website in the best possible way. In our blog about 5 technical SEO optimizations we already touched on how you can optimize your website with some technical tricks. In this blog we dive deeper into the Robots.txt file so you can learn more about SEO optimization.
Let’s start at the beginning: every website should have its own Robots.txt file. The first thing a search engine crawler does when it enters a website is look for a Robots.txt file, because it wants to know whether there are any special instructions it has to follow. You can see this file as an instruction document for the search engines you want to talk to.
The instructions found in the robots.txt file are simply called the guidelines of your website. The robots.txt file is very important from an SEO perspective: it tells search engines the best way to crawl your website. Why would you want to do this? Because search engines have a limited “crawl budget” for your website, and you want that budget to be used as efficiently as possible. So you don’t want Google, for example, to crawl pages that are not relevant for search.
In short, the Robots.txt file manages crawler access to your website and can mark areas as off limits.
And what if there is no robots.txt file on a website?
This is pretty simple. When there is no Robots.txt file with defined guidelines on your website, the search engine assumes it is allowed to crawl the entire website. Treat the file as a set of guidelines rather than hard rules, since a search engine is still able to ignore the file or parts of it. Most of the big search engines do respect the guidelines, though.
The example below shows a case where the search engine didn’t follow the guidelines in the robots.txt file.
How to make the best use of a Robots.txt file
As you’ve read, there are a lot of possibilities with the robots.txt file. For larger websites this is enormously important, especially for the webshops out there. Still, a lot of companies are not making the best use of it. We at Digital Movers specialize in SEO and have a lot of experience with SEO issues, including Robots.txt files. We can help you set it up in the best possible way.
The technical side of the Robots.txt file
Where can you find the robots.txt file? Simply type your domain followed by /robots.txt into the browser, for example https://example.com/robots.txt. Underneath you see the Robots.txt file of the Digital Movers website.
The first part of the file starts with User-agent: *
This simply means that the rules that follow apply to every type of bot from every search engine.
When a bot reads this line, it knows that the lines underneath are addressed to it.
Underneath the “User-agent:” line the bot sees Disallow: /wp-admin/, which means it is not allowed to crawl that part of the site, because this is the backend of the website, controlled by WordPress. Paths can be disallowed for various reasons; most often they are administration pages or pages that are under maintenance. Especially when you have a webshop, it is highly important to disallow pages that add no value for the search engine, such as the checkout process, filtering options, etc.
We at Digital Movers advise you to keep the number of Disallow rules to a minimum, because blocking parts of your website, such as scripts or stylesheets that your pages depend on, can keep search engines from loading and rendering those pages correctly.
In the Allow line you see the path /wp-admin/admin-ajax.php. This makes an exception to the Disallow rule above it: crawlers may still fetch this one file inside /wp-admin/, which WordPress themes and plugins commonly call from public pages. It does not open up the WordPress login or admin area to anyone. Since we don’t have a really broad website, there is no need to disallow many more pages.
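The way the Allow line carves an exception out of the Disallow line can be sketched in a few lines of Python. This is a simplified illustration, not how any particular search engine is implemented: it assumes plain path prefixes (no wildcards), uses only the three lines described above, and applies the "most specific (longest) matching rule wins, Allow wins a tie" precedence that the major search engines document. The function name `is_allowed` and the sample paths are hypothetical.

```python
# Rules for "User-agent: *" as described in the text above.
RULES = [
    ("Disallow", "/wp-admin/"),
    ("Allow", "/wp-admin/admin-ajax.php"),
]


def is_allowed(rules, path):
    """Return True if `path` may be crawled under `rules`.

    Simplified matching: a rule applies when the path starts with the
    rule's prefix; the longest matching prefix decides, and Allow wins
    when an Allow and a Disallow prefix are equally long.
    """
    best_len = -1
    allowed = True  # no matching rule means the path may be crawled
    for directive, prefix in rules:
        if prefix and path.startswith(prefix):
            is_allow = directive.lower() == "allow"
            if len(prefix) > best_len or (len(prefix) == best_len and is_allow):
                best_len = len(prefix)
                allowed = is_allow
    return allowed


if __name__ == "__main__":
    for sample in ("/wp-admin/options.php", "/wp-admin/admin-ajax.php", "/blog/"):
        print(sample, "->", "crawl" if is_allowed(RULES, sample) else "skip")
```

With these rules the admin pages are skipped, while admin-ajax.php and ordinary pages such as /blog/ remain crawlable.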
Is your Robots.txt file secured?
Note that your Robots.txt file is visible to anyone who wants to look at it. It is therefore very important that passwords and other sensitive information are not included in it.
Keep your Robots.txt very clean and don’t list any private pages in it, since listing them only tells everyone where they are. You can list pages or sections of your website that you are still optimizing. This keeps crawlers away while you work on them, although a Disallow rule on its own does not guarantee that a page stays out of the index.
Optimizing Robots.txt for SEO
How you optimize your Robots.txt file depends entirely on the content you have on your site. There are all kinds of ways to use Robots.txt to your advantage.
We’ll go over some of the most common ones, since we have clients with webshops and clients with regular websites, and the robots.txt file looks totally different for each.
One of the best ways to use the Robots.txt file is to make the most of a search engine’s crawl budget by telling it not to crawl the parts of your site that aren’t shown to the public. So if you don’t want a bot to crawl a page, you add a Disallow line with that page’s path underneath the User-agent line.
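As a rough sketch of what such rules look like and how a well-behaved crawler reads them, the Python snippet below feeds a small robots.txt to the standard library's `urllib.robotparser`. The paths `/checkout/` and `/cart/`, the bot name `ExampleBot`, and the example.com URLs are made up for illustration; your own file would list your own paths.

```python
from urllib.robotparser import RobotFileParser

# A hypothetical robots.txt for a webshop: keep every bot out of the
# checkout and cart pages, leave everything else crawlable.
ROBOTS_TXT = """\
User-agent: *
Disallow: /checkout/
Disallow: /cart/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# An empty rule set stands in for a site without a robots.txt file.
no_rules = RobotFileParser()
no_rules.parse([])


def may_crawl(url, rules=parser, bot="ExampleBot"):
    """Ask the parsed rules whether `bot` may fetch `url`."""
    return rules.can_fetch(bot, url)


if __name__ == "__main__":
    print(may_crawl("https://example.com/checkout/payment"))
    print(may_crawl("https://example.com/products/shoes"))
    print(may_crawl("https://example.com/checkout/payment", rules=no_rules))
```

The checkout URL is refused under the sample rules, the product URL is allowed, and with no rules at all every URL is allowed.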
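There are two other directives you should know of: Noindex and Nofollow.
We talked earlier about not wanting certain pages to be indexed, remember? You might think that the Disallow directive will prevent your page from getting indexed; unfortunately that’s not true.
So theoretically you could disallow a page and it could still end up in the index, for example when other sites link to it. Generally you don’t want that, and that’s where the Noindex directive comes in. The idea is that it works together with the Disallow directive to make sure bots don’t visit or index certain pages. Be aware, though, that Noindex was never an official part of the robots.txt standard and that Google stopped honoring it in robots.txt in September 2019; the supported route is a noindex robots meta tag or X-Robots-Tag header on a page that crawlers are still allowed to fetch. If you have pages that you don’t want indexed and you still want to use both directives in robots.txt, here is an example: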