03 October 2012

After spending a few hours setting up a basic UI to manage an XML-based application, I ran into a corruption problem. Configurations were written to an XML file, and some of them could contain an HTML string. That caused no problems for the XML parser itself, but the XML was passed through PEAR's XML_Beautifier before being written to the static file. That formatting step matters when humans (mainly me) need to be able to read, update and `svn diff` the XML files over the long term.

A simplified example XML:

<tag>&lt;p&gt;Hello World!&lt;/p&gt;</tag>

Run it through PEAR's XML_Beautifier ($b->formatString($str)) and you get:

<tag>&lt; p &gt; Hello World! &lt; /p &gt;</tag>

Obviously, this breaks the HTML markup contained in the XML. I traced the problem fairly far into the class, but the bug seems to sit at such a low level that I did not want to risk a fix that might affect other systems. Instead, I simply switched over to DOMDocument:

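$dom = new DOMDocument;
// Both flags must be set before loading: drop whitespace-only text nodes...
$dom->preserveWhiteSpace = false;
// ...so that the serializer can re-indent the tree on output.
$dom->formatOutput = true;
$dom->loadXML($xmlStr);
$str = $dom->saveXML();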

The result encoded \r characters (instead of eliminating or ignoring them) and indented with 2 spaces (instead of tabs), but it worked reliably. I added a couple of lines of PHP to clean up those downsides, and everything is now humming along safely.
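
For anyone wanting to do the same, here is a rough sketch of what that cleanup could look like. It is not the exact code I used: the function name beautifyXml is made up for illustration, and it assumes the encoded carriage returns show up as &#13; in the output and that swapping each leading pair of spaces for a tab is acceptable.

```php
<?php
// Hypothetical helper: the name and the cleanup rules are assumptions,
// not the original code from this post.
function beautifyXml(string $xmlStr): string
{
    $dom = new DOMDocument;
    $dom->preserveWhiteSpace = false;
    $dom->formatOutput = true;
    if (!$dom->loadXML($xmlStr)) {
        throw new InvalidArgumentException('Input is not well-formed XML');
    }
    $str = $dom->saveXML();

    // Drop carriage returns that were serialized as character references.
    $str = str_replace('&#13;', '', $str);

    // Turn every leading pair of spaces into one tab.
    return preg_replace_callback(
        '/^(?:  )+/m',
        function (array $m): string {
            return str_repeat("\t", intdiv(strlen($m[0]), 2));
        },
        $str
    );
}
```

One caveat: the indentation pass looks at every line, so a multi-line text value whose continuation lines start with spaces would be rewritten too. That was not an issue for single-line values like the example above, but check your own data first.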


