Refinerycms-wordpress-import

This litte project is an importer for WordPress XML dumps into refinerycms(-blog).

You can find the source code on github: github.com/mremolt/refinerycms-wordpress-import

Keep in mind that links to other pages of your blog are just copied, as WordPress exports them as <a>-Tags. If your site (blog) structure uses new urls, the links WILL break! For example, if you used the popular WP blog url structure “YYYY-MM/slug”, be warned that Refinery just uses “blog/slug”. So your inner site links will point to the old WP url.

Prerequisites

As refinerycms-wordpress-import is an addon for RefineryCMS, is shares the prerequisites with it. So you’ll first need a running installation of refinerycms and refinerycms-blog. Make sure the site is running, all migrations are run and you created the first refinery user.

Installation

Just add the gem to your projects Gemfile:

gem 'refinerycms-wordpress-import'

Or if you want to stay on the bleeding edge:

gem 'refinerycms-wordpress-import', :git => 'git://github.com/mremolt/refinerycms-wordpress-import.git'

and run

bundle

Usage

Importing the XML dump is done via rake tasks:

rake wordpress:reset_blog

This one basically deletes all data from blog relevant tables (taggings, tags, blog_comments, blog_categories, blog_posts, blog_categories_blog_posts). Use this one first, if you want a clean import of your old blog.

rake wordpress:import_blog[file_name]

This one does all the heavy work of parsing the dump and importing the data into refinery tables. The parameter is the path to the dump file. Got a report from a Mac user, that the ~ didn’t work in the path. I’ll have a look at it, but till then, don’t use it please.

If you don’t want to import draft posts, you can set the ENV variable ONLY_PUBLISHED to true:

rake wordpress:import_blog[file_name] ONLY_PUBLISHED=true

The task will then skip all posts that are not published.

rake wordpress:reset_and_import_blog[file_name]

This one combines the two previous tasks.

If you also want to import the cms part of WordPress, three more rake tasks manage the import into RefineryCMS Pages:

rake wordpress:reset_pages

This task deletes all data from the cms tables, ensuring a clean import. Otherwise existing pages could break the import because of duplicate IDs.

rake wordpress:import_pages[file_name]

This task imports all the WordPress pages into Refinery. The page structure (parent - child) is preserved.

If you want to skip the draft pages, add the ONLY_PUBLISHED parameter to this task, just like with wordpress:import_blog.

rake wordpress:import_pages[file_name] ONLY_PUBLISHED=true

If you want to clean the tables and import in one task:

rake wordpress:reset_and_import_pages[file_name]

Finally, if you want to reset and import all data including media (see below):

rake wordpress:full_import[file_name]

Importing media files

The WP XML dump contains absolute links to media files linked inside posts, like:

www.mysite.com/wordpress/wp-content/uploads/2011/05/cv.txt

The dump does NOT contain the files itself! To get them imported, this gem downloads the files from the given URL and imports them to refinery. So for a working media import the old site with the media URLs must still be online.

After importing the files, this gem replaces the old links in pages and blog posts with the new generated ones. It parses all existing records searching for the right pattern. That means, you have to import pages and posts FIRST to get the URLs replaced.

Now to the rake tasks for media import:

rake wordpress:reset_media

This task deletes all data from the media tables (images and resources), ensuring a clean import.

rake wordpress:import_and_replace_media[file_name]

This task imports all the WordPress media into Refinery. After the import it parses all pages and blog posts, replacing the legacy links with the current refinery ones.

If you want to clean the tables and import in one task:

rake wordpress:reset_import_and_replace_media[file_name]

Usage on ZSH

One more hint for users of zsh (like myself):

The square brackets following the rake task need to be escaped on zsh, as they have a special meaning there. So the syntax is:

rake wordpress:reset_and_import_blog\[file_name\]

Ugly, but it works. This is the case for all rake tasks by the way, not just mine.

Feedback

This is still a very new gem. It manages to import my own blog and a standard WordPress 3.1 dump with some sample posts. The first feedback is quite good, so it seems, the gem doesn’t eat the machines it is installed on.

If you want to help make it even more stable, please throw your own WP dumps against it and see what happens. If you encounter any bugs, please file a bug report here on github. A sample dump that breaks this gem would be really helpful in that case.

For extra karma, fork it, fix it yourself and send a pull request! ;-)