Cruft-Free URLs

Saturday, September 4, 2004 at 1:02 am

I've decided that it's time to clean up my URLs. I've really been digging cruft-free URLs lately, as they just look so much better and are much more user friendly than URLs full of cruft.

What is URL cruft?

Basically, anything in the URL that isn't meaningful to the visitor is cruft. In my case, I'm just removing the .php extension from my links. The extension only tells the web server that the file needs to be parsed as PHP, so it qualifies as cruft. Visitors to this site have no reason to ever see it, as it's meaningless to them.

How I rewrote the URLs

Since TypeSpace already produces cruft-free URLs for the archived entries, it was simply a matter of rewriting the desired URLs to the corresponding PHP page. It's really very easy to do.

Here's the relevant part of my .htaccess file:

RewriteEngine On
RewriteBase /
RewriteRule ^about/?$ /about.php
RewriteRule ^designer/?$ /designer.php
RewriteRule ^archives/?$ /archives.php
RewriteRule ^archives/.+ /xblogpro/showblog.php
RewriteRule ^sitemap/?$ /sitemap.php
RewriteRule ^contact/?$ /contact.php

...and so on for other URLs that I wished to rewrite.

Explaining the rules

First, we turn on the rewrite engine with this directive: RewriteEngine On. That's all there is to that one; it's either on or off.

Next, we set our rewrite base, which is the URL prefix mod_rewrite uses when a rule rewrites to a relative path. My .htaccess file lives in the site root and applies site wide, so I set it to /. If I kept a separate .htaccess file in a sub-directory such as my software section, I'd set the rewrite base there to /software/. Simple enough, eh?
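To make the sub-directory case concrete, here is a rough sketch of what a separate .htaccess file inside the software section might look like. The download page is a made-up example, not something from my actual site.

```apache
# Hypothetical /software/.htaccess (the "download" page is invented for
# illustration).
RewriteEngine On
RewriteBase /software/

# The pattern is matched against the path relative to this directory, so a
# request for /software/download or /software/download/ matches here.
# The substitution is a relative path, so mod_rewrite prefixes it with the
# RewriteBase, giving /software/download.php.
RewriteRule ^download/?$ download.php
```

The rules only apply to requests under /software/ because that is where the file lives; RewriteBase just supplies the prefix for the relative target.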

Finally, we get to the rewrite rules themselves. These are a bit more complicated than what we've seen so far and can be used for far more than what I'm doing with them here.

Those familiar with regex will likely be able to pick up on this quickly, as rewriting is done with a simple set of regular expressions for the rules. Here's how I'm matching:

The ^ is the start-of-line anchor, which ensures the match starts at the beginning of the requested path. Next, we match the desired URL, and allow (but don't require) a trailing slash with this bit: /?$. That optionally (the ? is the 0 or 1 quantifier) matches a trailing slash, while the $ anchors it to the end. Lastly, we simply add the location of the page that we want to rewrite the URLs to.
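Annotated piece by piece, one of the rules from my .htaccess file reads like this. Keep in mind that in an .htaccess file the leading slash is stripped before matching, which is why the pattern starts with about and not /about.

```apache
# ^       the match must begin at the start of the path
# about   the literal text to match
# /?      zero or one trailing slash
# $       nothing else may follow
#
# Matches:        about      about/
# Does not match: aboutus    about/team    my/about
RewriteRule ^about/?$ /about.php
```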

Since I needed my archived entries to be sent to /xblogpro/showblog.php, I used this bit of code: RewriteRule ^archives/.+ /xblogpro/showblog.php. The .+ matches 1 or more of any character. So, if it finds more characters after the /archives/ part of the URL, the request is sent on to the /xblogpro/showblog.php page, which then fetches the desired entry.
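If your own script needs to be told which entry was requested, one common variation is to capture the rest of the path and pass it along as a query string. This is only a sketch of that variation, not what my rule does: the entry parameter is hypothetical, and showblog.php may well work out the entry some other way.

```apache
RewriteEngine On

# (.+) captures everything after archives/ and $1 inserts it into the target.
# The parameter name "entry" is an assumption made for this example.
# [L] stops processing further rules once this one has matched.
RewriteRule ^archives/(.+)$ /xblogpro/showblog.php?entry=$1 [L]
```

With this variation, a request for /archives/some-entry-title would reach the script as /xblogpro/showblog.php?entry=some-entry-title.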

Very simple stuff, but it makes a world of difference. Are you cruft-free?

Comments

DarkBlue
September 4th, 2004
1:49 AM | #

It's worth noting that, where possible, URL rewriting is best defined within Apache's "httpd.conf" rather than ".htaccess".

The reason is simple: Performance!

".htaccess" is opened and parsed for every page request whereas "httpd.conf" is opened and parsed once only, at server start-up.

The performance differential becomes noticeable when your rewriting is more complex or as your ".htaccess" file grows.

Under high-traffic conditions (/.) ".htaccess" becomes a real bottleneck.
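As a rough sketch, the same kind of rules moved into "httpd.conf" might look like the block below. The server name and document root are placeholders, and the patterns need a leading slash here because, outside of ".htaccess", mod_rewrite matches against the full URL path.

```apache
# Sketch of a virtual host block in httpd.conf. example.com and the
# DocumentRoot path are placeholders, not taken from the post.
<VirtualHost *:80>
    ServerName example.com
    DocumentRoot /var/www/example

    RewriteEngine On

    # In server or virtual host context the pattern sees the leading slash.
    # [PT] passes the result on as a URL path so it is mapped under the
    # DocumentRoot rather than being treated as a file-system path.
    RewriteRule ^/about/?$ /about.php [PT]
    RewriteRule ^/archives/?$ /archives.php [PT]
    RewriteRule ^/archives/.+ /xblogpro/showblog.php [PT]
</VirtualHost>
```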

DarkBlue
September 4th, 2004
1:54 AM | #

Oh, sorry, one more little thing I noticed. The line:

RewriteBase /

is superfluous. "mod_rewrite" assumes "/" by default. Thus "RewriteBase" only needs to be defined if it is not "/".

Jona
September 4th, 2004
4:52 AM | #

Just to clear up any misunderstandings, this is an Apache-only module -- if you're not running Apache as your webserver, you won't be able to use these commands.

Anyway, great tip, Ryan; I've been using cruft-free URLs since I loaded Apache as a local testing server (in the place of Aprelium's Abyss Webserver X1). In my upcoming blog, I'll be cruft-free as well.

Dean
September 4th, 2004
7:21 AM | #

Great topic, and thanks for the tip, Ryan. I never heard it referred to as 'cruft-free URLs' before, but I've been a big fan of Content Negotiation ever since reading about it a few years ago.

They sound like basically the same thing, except with content negotiation you write your links without file extensions initially, rather than going back later and converting them. But of course your way would be a lot faster if you have a large site to convert.

I've been wondering: Is it possible to do anything like this on a Windows server? Sounds like the answer is no but it sure would be nice if we could.

Scott
September 4th, 2004
3:06 PM | #

Awesome tip, Ryan! I use the mod_rewrite rules that come standard with WordPress and modified them to fit my other pages.

DarkBlue
September 4th, 2004
6:26 PM | #

"Is it possible to do anything like this on a Windows server?"

It is possible providing you use Apache on your Windows server rather than IIS.

If you have to use IIS then you can enjoy similar (although not quite as clever or fast) rewriting with ISAPI_rewrite (http://www.isapirewrite.com/).

Matt Galaviz
September 5th, 2004
2:29 AM | #

These are also called SEF URLs, or Search Engine Friendly URLs, as search engines don't spider URLs of the ?var=value type as well, or that's what I've heard.

Bob
September 5th, 2004
4:16 AM | #

Awesome tip, Ryan! I implemented it on my site upon reading this, it makes it a bit more user friendly, methinks. Also, would you happen to know of any good tutorials on this? Apache.org was a bit confusing and I can't get Rewrite rules for pages in subdirectories to work...

DarkBlue
September 5th, 2004
7:21 PM | #

"Also, would you happen to know of any good tutorials on this?"

The very best documentation is the mod_rewrite author's own: A Users Guide to URL Rewriting with the Apache Webserver.

Ryan Brill
September 7th, 2004
7:26 PM | #

DarkBlue - Thanks for taking over while I was out for the weekend. ;)

For me (and probably most people), editing the httpd.conf file is not an option, as I am on a shared server. It's a good thing to note, though, for those who do have the capability to edit their httpd.conf file. Also, thanks for pointing out that RewriteBase / isn't needed - I need to learn to read. :D

Jonathan M. Hollin (DarkBlue)
September 7th, 2004
7:36 PM | #

"Thanks for taking over while I was out for the weekend."

No problem Ryan! ;-)

"editing the httpd.conf file is not an option, as I am on a shared server"

I appreciate that. However, Apache can have multiple "httpd.conf" files, all of which would be loaded at start-up. I blogged about this. It might be worth asking your web host if they'll support this and "include" your custom "httpd.conf" in their Apache configurations.

Ryan Brill
September 7th, 2004
7:39 PM | #

"It might be worth asking your web host if they'll support this and "include" your custom "httpd.conf" in their Apache configurations."

Cool, I'll do that. I kinda doubt they'll allow it, but hey, it doesn't hurt to ask...

Jonathan M. Hollin (DarkBlue)
September 7th, 2004
7:42 PM | #

"I kinda doubt they'll allow it..."

If they don't, ask them why.

You can't break Apache (or the other websites on your shared server) with Virtual Hosts, so there's no reason for the hosting company not to support them.

Matt Galaviz
September 8th, 2004
8:23 AM | #

It seems like you are following the Movable Type naming convention for these, when they should really be called "Search Engine Friendly" URLs. A quick search on Google will return approx. 700 results for cruft-free, while returning almost triple that for SEF URLs. As for a search on "Search Engine Friendly" URLs, try it yourself to see the results. Whether you call it cruft-free or Search Engine Friendly is a matter of semantics, but I feel that people will get a better understanding if they have both nomenclatures at their disposal.

Ray
September 8th, 2004
9:58 PM | #

Ah, yes. Cruft Free. I love Cruft Free... especially Cruft Free Singles. Those little slices of processed cheese in individual packages. They're awesome for taking your lunch to work. That way the cheese isn't on the sandwich all morning to get soggy. It can stay in its individual wrapper until you get ready to slap it on your ham 'n cheese on rye. All this talk of Cruft Free has got me hungry.

...What's that? "Cruft" Free you say? Oh!!! I thought you said "Kraft" Free! :) Well, don't I look foolish!

Jason Hoffman
September 9th, 2004
7:52 PM | #

To comment on the content negotiation note by Dean.

For example, the problem with MT isn't that it writes a file named archives/january.php but that it writes permalinks on pages as archives/january.php.

You can write a page to the filesystem at archives/january.php and then, if content negotiation is on (it's a feature of Apache, Zeus, IIS, ...), the file archives/january.php can be pulled up at http://domain.com/archives/january, http://domain.com/archives/january/ or http://domain.com/archives/january.php

So the key is to have your CMS make cruft-free URLs in permalinks.

Ryan Brill
September 9th, 2004
9:26 PM | #

Hey Jason, maybe I shouldn't take my own entry off topic, but when are you guys going to start offering reseller hosting over at TextDrive?

Right now would be a fine time for me. ;)

Harish
November 27th, 2004
8:35 AM | #

Can you suggest a solution for cleaning the URLs of ASP sites?

Dirk
December 8th, 2004
1:05 AM | #

You might want to check out this module for IIS:

http://www.port80software.com/products/pagexchanger/

Even their site has cruft-free urls on IIS. They seem like a pretty cool company. Sort of giving IIS the Apache treatment.

Jonathan M. Hollin (DarkBlue)
December 8th, 2004
1:28 PM | #

Looks good Dirk. They certainly have some great software for those poor IIS users.

Ali Karbassi
March 27th, 2006
8:18 AM | #

Why not use 'Options +MultiViews'? The only thing I've read going against 'Options +MultiViews' is: http://www.gerd-riesselmann.net/archives/2005/04/beware-of-apaches-multiviews
