Converting a WordPress blog into a static site
My personal blog at el73 hasn't seen an update in years. Constantly upgrading WordPress because of a newly discovered security hole on a website I no longer publish to feels a lot like wasting time.
That's why I decided to finally remove the WordPress parts from the equation and turn the blog into a static, archived site. Two tools made this conversion easier than I could have imagined from the start.
Wget
The first step was to archive the entire published site. With a little help from wget I got all of the roughly 500 pages downloaded in a little less than an hour, thanks to Peter Upfold and this command (-m mirrors the site, -k rewrites links so they work locally, -w sets the wait between requests):
$ wget -mk -w 20 http://el73.be/
Cleaning up the HTML
So now I've got a local mirror of the site, but I need to change some things around. Since I'm not planning on keeping WordPress around, the form to add a comment will have to be removed. Plus there's a link to search the website's contents in the navigation. Since that's also powered by WordPress, it'll have to be removed as well.
My first attempt to clean up the HTML consisted of processing all the HTML files and removing those two parts using regular expressions. Now, I'm no command line guru chaining 5 separate commands into a single perfectly working combination of *nix glory. And as it turns out, OS X doesn't support the full range of arguments that grep offers on other systems such as Linux anyway. Sure, I could have delved in and spent a lot of time on getting this to work, but it would be just that: wasting time. I'm not planning on doing this again anytime soon, so whatever works best right now is going to get my vote.
As it turns out, PhantomJS is what worked best for me. PhantomJS allowed me to use a simple script that would walk the entire mirrored site and collect the HTML files. After that, it was a matter of loading each HTML file, removing the elements I wanted removed using plain DOM operations and saving the content back to file.
    // PhantomJS modules: file system access and page loading
    var fs = require('fs');
    var webpage = require('webpage');

    var cleanUpHtml = function(fp) {
        // Open the file at the given location
        console.log('Cleaning up HTML in ' + fp);
        var address = 'file://' + fp;
        console.log('Opening file at ' + address);
        var page = webpage.create();
        page.open(address, function(status) {
            if (status !== 'success') {
                console.log('Unable to load ' + fp);
            } else {
                // This function runs inside the loaded page, so it has DOM access
                var innerHtml = page.evaluate(function() {
                    // Remove the comment form
                    var el = document.getElementById('commentform');
                    if (el) {
                        el.parentNode.removeChild(el);
                    }
                    // Remove the search link (the fourth item in the navigation list)
                    el = document.getElementById('nav');
                    if (el && el.tagName === 'UL') {
                        var lis = el.getElementsByTagName('LI');
                        if (lis && lis.length === 4) {
                            el.removeChild(lis[3]);
                        }
                    }
                    // And return the inner HTML of the root
                    return document.documentElement.innerHTML;
                });
                // Wrap the inner HTML and overwrite the original file
                fs.write(fp, '<!DOCTYPE HTML><html lang="nl">' + innerHtml + '</html>', 'w');
                console.log(fp + ' processed');
            }
        });
    };
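The part that walks the mirrored site and collects the HTML files isn't shown above. As a rough idea of what that step involves, here is a minimal sketch. It makes two assumptions: it is written for Node.js rather than PhantomJS (whose fs module uses different function names), and collectHtmlFiles is a hypothetical helper name, not something from my original script.

```javascript
// Sketch for Node.js (an assumption; not the PhantomJS script itself).
var fs = require('fs');
var path = require('path');

// Hypothetical helper: recursively collect the paths of all
// .html/.htm files below the directory `dir`.
function collectHtmlFiles(dir) {
    var found = [];
    fs.readdirSync(dir).forEach(function(name) {
        var full = path.join(dir, name);
        if (fs.statSync(full).isDirectory()) {
            // Descend into subdirectories of the mirror
            found = found.concat(collectHtmlFiles(full));
        } else if (/\.html?$/i.test(name)) {
            found.push(full);
        }
    });
    return found;
}

// Usage (the directory name is only an example):
// collectHtmlFiles('./el73.be').forEach(function(fp) { console.log(fp); });
```

Each collected path is what a function like cleanUpHtml expects as its fp argument. Since that function builds a file:// URL from it, absolute paths are the safer choice.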
Upload... and profit!
So that's how I could turn an entire website backed by WordPress into a site consisting of nothing but static assets in about 2 hours of work, not counting the time needed to download and process the pages. Upgrading to the next two patched versions of WordPress would probably cost me the same or more.