Okay, so this post may come across as a little rantish, but it needs to be said… relative URLs SUCK for SEO. I can’t even begin to count the number of times one of our crawlers has gotten stuck in an endless web, spidering a broken relative URL that leads to 404 pages that have relative URLs… forever!

Dear Developers… PLEASE stop using relative URLs. I know it can make things easier when you are building in a dev environment and pushing things to production, but ultimately it generates more work for you and creates SEO issues for your clients… oh, and using base href doesn’t solve all the issues.

Why is this so important?    First, let’s take a look at what a relative URL is:

A relative URL is a URL that doesn’t contain the full domain name and path, but instead only a portion of the path. For example, when linking to your about us page from the homepage of your site, you might link to it using an anchor similar to this <a href="about-us">About Us</a> or this <a href="/about-us">About Us</a>. In the first example, the browser resolves the link against the directory of the current page (everything up to the last slash in the current URL), and in the second example, the / tells the browser to go to the main directory and append the rest. Oftentimes, you might even find a link such as <a href="../about-us">About Us</a>, which tells the browser to go back a directory and then append the URL. Sounds like it could get a little confusing, right? Browsers can figure out where the link is supposed to go, so functionally, a user may never know the difference.
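To see those three cases side by side, here is a minimal sketch using Python’s standard `urllib.parse.urljoin`, which follows the same resolution rules browsers are expected to follow. The `/products/widgets` page is a made-up example, and `resolve` is just a hypothetical helper name.

```python
from urllib.parse import urljoin

# Hypothetical page the links appear on (illustrative only).
page = "http://www.domain.com/products/widgets"

def resolve(base: str, href: str) -> str:
    """Resolve an href against the URL of the page that contains it."""
    return urljoin(base, href)

# The three link styles discussed above, resolved from the same page.
for href in ("about-us", "/about-us", "../about-us"):
    print(f"{href!r:15} -> {resolve(page, href)}")

# The same href lands somewhere else once the page URL ends in a slash.
print(resolve(page + "/", "about-us"))
```

Notice that the first link resolves to /products/about-us, not /about-us, and that a single trailing slash on the page URL changes the result again. That is exactly the kind of room for interpretation that causes trouble.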

The best way to link to the about page from EVERY page on the site would be to use an absolute URL, <a href="http://www.domain.com/about-us">About Us</a>. This tells the browser exactly where to go, and does not leave any room for interpretation or mistakes. Relative URLs were somewhat logical years ago when all pages were static HTML pages, and you didn’t have global templates for things like headers, navigation, etc. Additionally, various sources spread the myth that relative URLs improve site speed, which further convinced developers they were doing something positive.

Enter reality…  Relative URLs suck!   Here are a few examples of how relative URLs can impact your search rankings:

  1. http vs. https – When using relative URLs, you can suddenly find that you have a complete duplicate of your site because the spiders found a link to an https page, which then contains relative links to all the other pages of your site. Very quickly, all of your pages are indexed in both http and https versions. NOT GOOD!
  2. Canonical URLs – Google recently accepted rel=canonical as a method for webmasters to specify the canonical version of any URL. The goal was to help eliminate duplicate content by letting a webmaster state explicitly which URL the page is supposed to be indexed as. Unfortunately, using relative URLs in your canonical tag won’t help if your site can be visited through the direct IP address and doesn’t have a redirect in place to handle it. Even worse, as spiders crawl the page via the IP address, https, etc., they find a canonical tag that resolves to a different URL for the same content each time, completely defeating the purpose of the canonical. USE ABSOLUTE URLs FOR YOUR CANONICAL TAG!
  3. Infinite crawls – When using relative URLs in navigational elements and similar, you can easily create a situation where a spider can endlessly crawl your site. For example, if you have a link in your menu such as <a href="about-us/">About Us</a>, and a spider finds a broken internal link on one of your pages such as http://yourdomain.com/gobble/, it is going to land on a 404 page that now has a link to http://yourdomain.com/gobble/about-us/, which has a link to http://yourdomain.com/gobble/about-us/about-us/, which has a link to http://yourdomain.com/gobble/about-us/about-us/about-us/, and so on. DON’T USE RELATIVE LINKS!
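The infinite crawl is easy to reproduce on paper. The sketch below is a simulation, not a real crawler: it assumes directory-style URLs ending in a slash, and it assumes every page, including the 404 page, renders the same menu link. `MENU_HREF` and `crawl_trap` are hypothetical names used only for illustration.

```python
from urllib.parse import urljoin

# Assumption: every page, including the 404 page, renders this same menu link.
MENU_HREF = "about-us/"

def crawl_trap(start: str, max_hops: int = 4) -> list[str]:
    """Follow the menu link from page to page, as a naive spider would."""
    visited = [start]
    for _ in range(max_hops):
        # Each hop resolves the relative link against the page just visited.
        visited.append(urljoin(visited[-1], MENU_HREF))
    return visited

# Start from the broken internal link and watch the path keep growing.
for url in crawl_trap("http://yourdomain.com/gobble/"):
    print(url)

# The same relative link also inherits whatever scheme the spider arrived on.
print(urljoin("https://yourdomain.com/", MENU_HREF))
```

Every hop produces a brand new URL, so the spider never runs out of pages to request. The last line shows the http vs. https problem from the same angle: the relative link simply follows the scheme of the page it was found on. Swap `MENU_HREF` for an absolute URL and every hop resolves to that one URL instead.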

Hopefully I’ve made my point, and you will immediately go and check whether you are using relative URLs on your site. An easy way to check is to view the source of a page on your site and then search the page for either of the following: <a href="/ or <a href=". This won’t find all instances, but if you find a few like this, odds are there are more to be fixed.
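If you would rather not eyeball the source, a rough sketch of the same check in Python might look like this. It uses only the standard library, flags any anchor whose href has no scheme (so it also catches links like about-us that the manual search misses), and the sample HTML is invented for the example. `RelativeLinkFinder` and `find_relative_links` are hypothetical names.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit

class RelativeLinkFinder(HTMLParser):
    """Collect href values on <a> tags that do not start with a scheme."""

    def __init__(self) -> None:
        super().__init__()
        self.relative_links: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href or href.startswith("#"):
            return  # skip missing hrefs and in-page anchors
        if not urlsplit(href).scheme:
            self.relative_links.append(href)

def find_relative_links(html: str) -> list[str]:
    """Return every relative href found in the given HTML source."""
    finder = RelativeLinkFinder()
    finder.feed(html)
    return finder.relative_links

# Made-up page source for illustration.
sample = """
<a href="about-us">About Us</a>
<a href="/contact">Contact</a>
<a href="../blog">Blog</a>
<a href="http://www.domain.com/about-us">About Us</a>
<a href="mailto:me@domain.com">Email</a>
"""
print(find_relative_links(sample))
```

This only looks at anchor tags in a single page’s source, so treat it as a starting point; canonical tags, images, and scripts would need their own checks.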