So, I’ve just finished importing 610 entries into my blog’s database. As I mentioned before, I had used a tool called Warrick to scour the caches of various search services (Google, Yahoo, etc.) and archiving services (Internet Archive) to reconstruct a snapshot of cubanlinks.org. This process left me with 1029 files scattered throughout 1106 directories.
(Remember, /archives/2003/12/01/some-post/index.html consists of 5 directories and 1 file.)
So, the next step was to get these static files into the database. I accomplished this by writing a script in Ruby to extract title/date/body information from the page and insert these values into the database using ActiveRecord. Thank god for Typo being kind enough to fill in the other values for me on save (guid, permalink, etc).
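For the curious, here is a rough sketch of what the extraction half of such a script could look like. It is not the actual script: the `Post` struct, the `extract_post` and `import_all` helpers, and the idea that the body sits in a `<div class="entry">` are all assumptions made for illustration. Only the `/archives/YYYY/MM/DD/slug/index.html` layout comes from the recovered files themselves. It sticks to the standard library, so the ActiveRecord save is left to the caller.

```ruby
require "date"

# Hypothetical container for the fields pulled out of one recovered page.
Post = Struct.new(:title, :published_on, :body)

# Matches paths shaped like /archives/2003/12/01/some-post/index.html
ARCHIVE_PATH = %r{archives/(\d{4})/(\d{2})/(\d{2})/[^/]+/index\.html\z}

# Returns a Post, or nil when the path or markup does not look like a post.
def extract_post(path, html)
  date_parts = ARCHIVE_PATH.match(path)
  return nil unless date_parts

  year, month, day = date_parts.captures.map(&:to_i)
  return nil unless Date.valid_date?(year, month, day)

  # Assumption: the title is in <title> and the body in <div class="entry">.
  # The non-greedy match stops at the first </div>, so nested divs would
  # need a real HTML parser instead of a regex.
  title = html[%r{<title>(.*?)</title>}mi, 1]
  body  = html[%r{<div class="entry">(.*?)</div>}mi, 1]
  return nil unless title && body

  Post.new(title.strip, Date.new(year, month, day), body.strip)
end

# Walks the recovered tree and yields each post that could be extracted.
def import_all(root)
  return enum_for(:import_all, root) unless block_given?

  Dir.glob(File.join(root, "archives", "**", "index.html")).sort.each do |path|
    post = extract_post(path, File.read(path))
    yield post if post
  end
end

# Usage sketch. Article and its column names are assumed, not confirmed:
# import_all("recovered") do |post|
#   Article.create!(title: post.title, body: post.body,
#                   published_at: post.published_on)
# end
```

Taking the date from the directory path rather than from the page markup means one less thing to scrape, and pages that do not fit the pattern are skipped instead of imported half-empty.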
It’s unclear how many posts never got recovered with Warrick in the first place. Eyeballing it, I’d say I have at least 80% of my posts. And you know what? I’ll take that.