Having become a bit of a self-proclaimed expert on mod_rewrite, I spend an inordinate amount of time answering mod_rewrite questions on #apache, on irc.freenode.net.
Unfortunately, much of that time is utterly wasted on something known as “Search Engine Optimization”, or SEO. SEO is an industry that has grown up around profound ignorance about how search engines work, and the basic theory goes like this. Certain URLs are “bad” and others are “good”, and you need to use mod_rewrite to convert “bad” URLs into “good” ones. Bad URLs are those that contain “?” and “&” and “=” and other non-alphanumeric characters. Thus, one must use mod_rewrite to create URLs that lack these characters.
This is all very well except for one thing – it’s nonsense, utterly untrue. Now, it may have been true 10 years ago, but since then, search engine algorithms have gotten better, and it’s just not the case any more.
Below is a comprehensive list of the search engines that matter:
The algorithm used by the search engines in this list goes basically like this: If your content is worthwhile, other people will link to you. Therefore, sites that have a lot of links to them are good sites and should appear at the top of searches.
That’s it. Nothing to do with the characters appearing in the URL. So if you’ve paid some SEO firm a great sum of money to increase your search engine ranking, the chances are very, very high that you’ve wasted that money.
And yet, it seems that more than half of the questions that we field on #apache have to do directly with this false belief in the principles of SEO. This is a great shame, since there are actually some people with legitimate questions, and they have to wait while we deal with the nonsense.
Now, the notion of “good URLs” does have one redeeming quality. URLs that are easier to read to someone over the phone, and easier to remember, have a certain amount of value in marketing. This is undeniable. For this purpose, Apache provides the Redirect directive, whereby you redirect the easy-to-remember URL to the actual URL.
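As a rough sketch of what that looks like, assuming a made-up short path (/catalog) and a made-up dynamic URL on www.example.com, neither of which comes from any real site:

```apache
# Hypothetical example: the path, hostname, and query string are placeholders.
# Sends requests for the easy-to-remember /catalog to the real, dynamic URL.
# With no status argument, Redirect issues a temporary (302) redirect.
Redirect /catalog http://www.example.com/index.php?section=catalog&lang=en
```

Note that Redirect matches on a path prefix, so if only the exact short URL should be redirected, RedirectMatch with an anchored pattern is probably the better fit.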
Another useful technique is to actually design your website with non-convoluted URLs from the start.
However, none of these techniques will improve your search engine ranking if your content isn’t good. Content is important. Your URL isn’t important.
I realize that there are folks who are convinced of the necessity of creating “pretty URLs”, and reading this article won’t in any way dissuade them. The reasoning seems to go that everybody is doing it, and therefore it must be right. I’ll not waste your time with explaining the fallacy in this position. I will, however, direct you to the page that Google themselves have compiled to tell you about how their search algorithms work.
And, yes, I know that there are actually other search engines that matter. Their ranking algorithms are not so very different, and even when they are, the people programming them are not so stupid as to be unaware that almost every site on the web is composed of applications that load content out of databases, and are therefore likely to have URLs containing query string characters. So de-valuing sites that contain those characters in their URLs would be exceedingly counterproductive. Give them some credit.
Updated Sept 20 18:43: Note that I’ve updated the title of this article, due to some of the feedback in the comments. I concede that there may be people in the “SEO” industry who are not snake-oil salesmen. I can even somewhat believe that all the satisfied customers never make it to #apache, and the folks who are there are the very ones who received bad advice. It’s more than a little damning that every SEO website that I’ve looked at is full of terrible advice. Presumably, then, the folks who are good at their job are just being secretive.