hyrme.com

Importing Wikipedia content into Drupal

Recently I was asked to help import Wikipedia content into a Drupal server. The import itself was relatively straightforward, but getting the pages to display correctly was another matter and is still a work in progress.

If you would like to do the same, here is what you will need:

  • The following modules installed and configured in Drupal
  • A CCK node type to contain the wiki content (you could dump it into a story node if you wish).
  • Create a new input format that includes the PEAR wiki filter and the MediaWiki format.
  • Grab your Wikipedia data dump. The full Wikipedia data dumps are described at http://meta.wikimedia.org/wiki/Data_dumps
  • Install the Parse::MediaWikiDump library for Perl.
  • A Perl script can then be used to convert the dump files into CSV files, which are then imported into Drupal using the MySQL or PostgreSQL command-line tools.
  • The Parse::MediaWikiDump library is extremely straightforward to use.
  • In the MySQL import you will need to specify the content type of the imported content, replace the author information with your own if necessary (unless you want to create every author or already have your authors in the system), and specify that the content uses the new input format that you defined earlier.
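
The dump-to-CSV step can be sketched roughly as follows. This is not the Perl script built on Parse::MediaWikiDump that I used; it is a stand-in written in Python with only the standard library, to show the shape of the conversion. The function names, the column order, and the default values for the content type, author id and input format id (`wiki_page`, `1`, `4`) are all assumptions made for illustration, so substitute whatever your own Drupal site uses.

```python
import csv
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the XML namespace, e.g. '{http://...}page' -> 'page'."""
    return tag.rsplit("}", 1)[-1]


def iter_pages(xml_source):
    """Yield (title, wikitext) for each <page> in a MediaWiki XML dump.

    xml_source may be a filename or a binary file object.
    """
    for _event, elem in ET.iterparse(xml_source, events=("end",)):
        if local_name(elem.tag) != "page":
            continue
        title, text = "", ""
        for child in elem.iter():
            name = local_name(child.tag)
            if name == "title":
                title = child.text or ""
            elif name == "text":
                # If a page carries several revisions, the last one wins.
                text = child.text or ""
        yield title, text
        elem.clear()  # release the page's subtree; full dumps are very large


def dump_to_csv(xml_source, csv_out, node_type="wiki_page", uid=1, input_format=4):
    """Write one CSV row per page and return the number of rows written.

    node_type, uid and input_format are hypothetical placeholder values for
    the content type, the author and the input format of the imported nodes.
    """
    writer = csv.writer(csv_out)
    count = 0
    for title, text in iter_pages(xml_source):
        writer.writerow([title, text, node_type, uid, input_format])
        count += 1
    return count
```

Called as `dump_to_csv("dump.xml", open("pages.csv", "w", newline=""))` (both file names are placeholders), this produces a CSV file whose constant columns carry the content type, author and input format, so those values do not have to be patched in after the database import.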

The PEAR MediaWiki filter module does not support MediaWiki template tags, so you will need to either filter them out or write your own filter to handle them.
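
For the "filter them out" option, one possible approach is to strip the `{{...}}` template calls from the wikitext before it is imported. The sketch below is a hypothetical helper in Python, not part of the PEAR filter or of any Drupal module, and it simply discards templates rather than rendering them.

```python
def strip_templates(wikitext):
    """Remove {{...}} template calls, including nested ones.

    Limitations of this sketch: an unclosed '{{' causes the rest of the
    text to be dropped, and triple-brace parameters such as {{{name}}}
    leave a stray '}' behind.
    """
    out = []
    depth = 0  # how many '{{' are currently open
    i = 0
    while i < len(wikitext):
        pair = wikitext[i:i + 2]
        if pair == "{{":
            depth += 1
            i += 2
        elif pair == "}}" and depth > 0:
            depth -= 1
            i += 2
        else:
            if depth == 0:
                out.append(wikitext[i])  # keep text outside any template
            i += 1
    return "".join(out)
```

Running this over each page's text before writing the CSV row keeps the unsupported tags out of Drupal altogether, at the cost of losing whatever the template would have displayed (infoboxes, citations and so on).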

Click here for a sample of the imported wiki pages. You will notice that the pages contain several tags that are not yet handled by the PEAR filter.

Contact me to find out more and to see if I can help you get your own Wikipedia data imported into Drupal.