[ andrewho dot co dot uk ]

Clean URLs on jekyll/Apache

I use a static site generator, jekyll, to transform some templates into a set of static *.html files. I like to keep the URLs clean and hide the .html extension, partly because I think it looks better and partly so that the URLs reflect only the content, not the underlying files or the CMS used to serve it. In short, whilst the file being served might be $DOCUMENT_ROOT/weblog/title.html, the canonical URL for that resource should be /weblog/title. Here’s how I do that in .htaccess.

The first step is to turn on mod_rewrite:

RewriteEngine On

Create 404.html and tell Apache to serve it for missing pages:

ErrorDocument 404 /404.html

All resources should be served from the main domain name, so redirect requests for any subdomain to it:

RewriteCond %{HTTP_HOST} ^[^\.]+\.andrewho\.co\.uk$ [NC]
RewriteRule ^(.*)$ http://andrewho.co.uk/$1 [R=301,L]
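
Note that the pattern above matches exactly one label in front of the domain, so a deeper name such as a.b.andrewho.co.uk (a made-up example) would slip through. A possible variant, assuming you want every hostname other than the bare domain redirected and the site is served on the default port, negates the condition instead. This is a sketch, not what I run:

# Assumption: any Host other than the bare domain should be redirected.
# Assumption: default port, so the Host header carries no ":port" suffix.
RewriteCond %{HTTP_HOST} !^andrewho\.co\.uk$ [NC]
RewriteRule ^(.*)$ http://andrewho.co.uk/$1 [R=301,L]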

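Remove trailing slashes. By default mod_dir redirects a request for a directory that lacks a trailing slash to the same URL with the slash added, which would undo this rule, so turn that off with DirectorySlash Off. With that redirect disabled, a slashless request for a directory could show a listing instead of its index page if indexes are enabled, hence Options -Indexes: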

RewriteRule ^(.+)/$ http://%{HTTP_HOST}/$1 [R=301,L]
Options -Indexes
DirectorySlash Off

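Hide the *.html files by redirecting all requests for foo.html to foo. The condition tests %{THE_REQUEST}, the request line exactly as the client sent it, so the internal rewrite to foo.html in the final step does not trigger this redirect again and cause a loop: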

RewriteCond %{THE_REQUEST} ^(GET|HEAD)\ /.+\.html\ HTTP
RewriteRule ^(.+)\.html$ http://%{HTTP_HOST}/$1 [R=301,L]

If the client requests /foo, no file exists at that path, and foo.html does exist, then serve foo.html through an internal rewrite (the URL the client sees stays /foo):

RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME}.html -f
RewriteRule ^.+$ %{REQUEST_FILENAME}.html [L]

Throw that all into .htaccess (in that order) and you should have clean URLs.
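
For reference, here are the pieces above assembled in order. The comments are additions for readability, and the trailing-slash rule carries the L flag like the other redirects. Treat this as a sketch to adapt (the domain name in particular) rather than a drop-in file:

# 1. Turn on mod_rewrite
RewriteEngine On
# 2. Custom 404 page
ErrorDocument 404 /404.html
# 3. Redirect single-label subdomains to the main domain
RewriteCond %{HTTP_HOST} ^[^\.]+\.andrewho\.co\.uk$ [NC]
RewriteRule ^(.*)$ http://andrewho.co.uk/$1 [R=301,L]
# 4. Strip trailing slashes; stop mod_dir adding them back; no directory listings
RewriteRule ^(.+)/$ http://%{HTTP_HOST}/$1 [R=301,L]
Options -Indexes
DirectorySlash Off
# 5. Redirect client requests for foo.html to foo
RewriteCond %{THE_REQUEST} ^(GET|HEAD)\ /.+\.html\ HTTP
RewriteRule ^(.+)\.html$ http://%{HTTP_HOST}/$1 [R=301,L]
# 6. Internally serve foo.html when /foo is requested and foo.html exists
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME}.html -f
RewriteRule ^.+$ %{REQUEST_FILENAME}.html [L]

With this in place, a request for /weblog/title.html should receive a 301 to /weblog/title, which is then served from title.html without the client seeing the extension.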