How to Extract and Parse WordPress XML Data Export Backup

Recently I had to extract and parse WordPress XML data that came from WordPress’s Export Tool and migrate it into another WordPress installation. For a multitude of reasons, the WordPress backup (WXR) file could not simply be imported into the new installation, so I wrote a WordPress XML backup parser to migrate all the data for me.

Here is a basic tutorial on parsing the XML data in a WordPress WXR backup with PHP’s SimpleXML extension.

Before you begin, open your WXR file in a text editor and save it with a .xml extension.

First off, parsing out the post title and post link is easy, as shown below.

// Load the XML file
$xml = simplexml_load_file( 'your-wordpress-backup.xml' );
if ( $xml ) {
	// Each post is represented by an item node, so loop through each one
	foreach ( $xml->channel->item as $content ) {
		$title = (string) $content->title;
		$link  = (string) $content->link;
	}
}
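If the export is damaged, simplexml_load_file() returns false and the if check quietly skips everything. Here is a minimal sketch of how the parse errors could be collected instead, assuming you want to log them yourself. The malformed string is made up for illustration and stands in for a broken export file.

```php
<?php
// Collect libxml parse errors instead of emitting PHP warnings
libxml_use_internal_errors( true );

// Deliberately malformed XML standing in for a damaged export file
$xml = simplexml_load_string( '<rss><channel><item></channel></rss>' );

$messages = array();
if ( false === $xml ) {
	foreach ( libxml_get_errors() as $error ) {
		// Each LibXMLError carries the line number and a message
		$messages[] = sprintf( 'Line %d: %s', $error->line, trim( $error->message ) );
	}
	libxml_clear_errors();
}
```

The same pattern should work with simplexml_load_file(), since both functions report errors through libxml.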

In your XML file, you will notice that some items include a colon in their item/node name. These items have to be accessed and extracted differently. Take, for example, the content:encoded node, which holds the body of your post item. At the very top of your XML file, you will see a list of URLs inside the rss node.


Those URLs identify the namespaces used in the file. The text before the colon in a node name is a namespace prefix, and the matching URL in the rss node is what you pass to SimpleXML to reach the nodes in that namespace. Besides content:encoded, you will see other namespaced nodes, such as the ones starting with wp:.
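To make the prefix-to-URL mapping concrete, here is a small sketch that runs against a made-up, WXR-like document. The sample content and the namespace URLs in it are assumptions for illustration (the wp version number in particular varies between exports), so copy the real URLs from the rss node of your own file.

```php
<?php
// A tiny, made-up WXR-like document; the namespace URLs here are
// assumptions, so check them against the rss node of your own export.
$sample = <<<XML
<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:wp="http://wordpress.org/export/1.2/">
	<channel>
		<item>
			<title>Hello world</title>
			<dc:creator>admin</dc:creator>
			<content:encoded><![CDATA[<p>Post body</p>]]></content:encoded>
			<wp:post_id>1</wp:post_id>
		</item>
	</channel>
</rss>
XML;

$xml = simplexml_load_string( $sample );

// Prefix => URL map of the namespaces declared on the root (rss) node
$namespaces = $xml->getDocNamespaces();

// Pass the URL for a prefix to children() to reach the nodes behind it
$item    = $xml->channel->item[0];
$post_id = (int) $item->children( $namespaces['wp'] )->post_id;
$author  = (string) $item->children( $namespaces['dc'] )->creator;
```

Reading the map from the root node with getDocNamespaces() is one option. Calling getNamespaces(true) on each item also works, but it only lists the namespaces that item actually uses.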

// Load the XML file
$xml = simplexml_load_file( 'your-wordpress-backup.xml' );
if ( $xml ) {
	// Each post is represented by an item node, so loop through each one
	foreach ( $xml->channel->item as $content ) {
		$title = (string) $content->title;
		$link  = (string) $content->link;

		// Map each namespace prefix (wp, dc, content) to its URL
		$namespaces = $content->getNamespaces( true );

		// Fetch the child nodes that belong to each namespace
		$wp_children      = $content->children( $namespaces['wp'] );
		$dc_children      = $content->children( $namespaces['dc'] );
		$content_children = $content->children( $namespaces['content'] );
		// category nodes are not namespaced; read them as $content->category

		// Custom fields are stored as wp:postmeta nodes
		foreach ( $wp_children->postmeta as $wp_meta ) {
			$meta_key = (string) $wp_meta->meta_key;
			if ( '_thumbnail_id' === $meta_key ) {
				$post_thumbnail_id = (int) $wp_meta->meta_value;
			}
		}
	}
}

The code above pulls the children out of each namespace. You will see that various nodes have their content inside a CDATA wrapper. To extract that value and make it usable (i.e., echo it or assign it to a variable), you have to cast it explicitly to the data type you want:

<?php
$post_body = (string) $content_children->encoded;
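To show why the cast matters, here is a minimal sketch using a hypothetical one-node document in place of a real content:encoded node. Until it is cast, the value is still a SimpleXMLElement object rather than a plain string.

```php
<?php
// Hypothetical one-node document standing in for a content:encoded node
$node = simplexml_load_string( '<encoded><![CDATA[<p>Hello & welcome</p>]]></encoded>' );

$is_object = $node instanceof SimpleXMLElement; // still an object at this point
$post_body = (string) $node;                    // the text inside the CDATA wrapper
$length    = strlen( $post_body );              // string functions now behave as expected
```

The same idea applies to numeric values, such as the (int) cast used for the thumbnail ID.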

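When it comes to posts that have custom meta fields attached, the key and value live in the wp:postmeta nodes and are read through the $wp_children->postmeta loop shown earlier. Categories and other taxonomy terms are stored in each item’s category nodes, with the taxonomy and the term slug held in attributes, and can be parsed like so: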

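<?php
$xml = simplexml_load_file( 'your-wordpress-backup.xml' );
if ( $xml ) {
	foreach ( $xml->channel->item as $content ) {
		// category nodes are not namespaced, so read them directly from the item
		foreach ( $content->category as $type ) {
			$attributes = $type->attributes();
			$domain = (string) $attributes->domain;   // taxonomy, e.g. category or post_tag
			$value  = (string) $attributes->nicename; // term slug
			$cdata  = (string) $type;                 // term name from the CDATA wrapper
		}
	}
}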
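For the custom meta fields mentioned above, it can be handy to gather every key and value of a post into one array before migrating it. The sketch below is one way to do that. The helper name wxr_collect_post_meta() is hypothetical (it is not part of WordPress or SimpleXML), and the sample document and its wp namespace URL are made up for illustration.

```php
<?php
// Hypothetical helper (not part of WordPress or SimpleXML): returns the
// custom fields of one item as a meta_key => meta_value array.
function wxr_collect_post_meta( SimpleXMLElement $item, $wp_namespace ) {
	$meta = array();
	foreach ( $item->children( $wp_namespace )->postmeta as $wp_meta ) {
		// Select the wp: children of this postmeta node explicitly
		$fields = $wp_meta->children( $wp_namespace );
		// A repeated key overwrites the earlier value in this sketch
		$meta[ (string) $fields->meta_key ] = (string) $fields->meta_value;
	}
	return $meta;
}

// Made-up sample; the wp namespace URL is an assumption
$sample = <<<XML
<rss version="2.0" xmlns:wp="http://wordpress.org/export/1.2/">
	<channel>
		<item>
			<title>Hello world</title>
			<wp:postmeta>
				<wp:meta_key><![CDATA[_thumbnail_id]]></wp:meta_key>
				<wp:meta_value><![CDATA[42]]></wp:meta_value>
			</wp:postmeta>
			<wp:postmeta>
				<wp:meta_key><![CDATA[subtitle]]></wp:meta_key>
				<wp:meta_value><![CDATA[A short subtitle]]></wp:meta_value>
			</wp:postmeta>
		</item>
	</channel>
</rss>
XML;

$xml        = simplexml_load_string( $sample );
$namespaces = $xml->getDocNamespaces();
$meta       = wxr_collect_post_meta( $xml->channel->item[0], $namespaces['wp'] );
```

The values come back as strings, so cast them (for example to int for _thumbnail_id) before using them in the new installation.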

Those are the different ways to extract and parse WordPress XML data from a WordPress WXR backup file. I originally developed this to handle a Custom Post Type and Custom Taxonomy WordPress infrastructure, so the above methods work for that situation as well. Feel free to post any questions, suggestions or other comments.

