Adding a Doctype Declaration to a DOMDocument in PHP

Posted in PHP and Programming on Monday, the 9th of March, 2009.


I've recently been spending quite a lot of time with PHP's DOM extension, which is extremely useful for both generating and parsing XML.

In this particular case, I'm generating XML, and it's imperative that the generated markup contains a Doctype declaration referencing its DTD. It isn't hard to do that using DOM, but it did take a little bit of hunting around in the manual and online, so here's a quick overview of how to add a Doctype declaration to a DOMDocument.

Of course, there isn't anything as handy as, say, $document->addDoctype('xhtml'); that would be too easy. Instead, the first step is to instantiate DOMImplementation, and have that create an instance of DOMDocumentType.

Once the DOMDocumentType object has been created, it can be passed as a parameter to DOMImplementation::createDocument(), which returns a DOMDocument that you can start working with.

That isn't terribly clear, so let's take a look at an example in action. In this case let's imagine that we're generating a WML page, and so our Doctype declaration will look like:

<!DOCTYPE wml PUBLIC "-//WAPFORUM//DTD WML 1.3//EN"
        "http://www.wapforum.org/DTD/wml13.dtd">

DOMImplementation::createDocumentType() takes three parameters: "the qualified name of the document type to create", "the external subset public identifier" and finally "the external subset system identifier". If that means as little to you as it did to me, don't worry: they can all be copied and pasted directly from an existing Doctype declaration. The PHP code is as follows:

<?php
 
$implementation = new DOMImplementation();
 
// The three arguments are copied straight from the Doctype declaration above.
$dtd = $implementation->createDocumentType('wml',
        '-//WAPFORUM//DTD WML 1.3//EN',
        'http://www.wapforum.org/DTD/wml13.dtd');
 
$document = $implementation->createDocument('', '', $dtd);

From then on, you can work with $document exactly as with any other DOMDocument, for that is what it is.

For good measure, here's an example of creating an XHTML Mobile Profile (XHTML-MP) document:

<?php
 
$implementation = new DOMImplementation();
 
$dtd = $implementation->createDocumentType('html',
        '-//WAPFORUM//DTD XHTML Mobile 1.1//EN',
        'http://www.openmobilealliance.org/tech/DTD/xhtml-mobile11.dtd');
 
$document = $implementation->createDocument('', '', $dtd);

Once this is all in place, the Doctype declaration will automatically be added to the WML or XHTML generated by, for example, DOMDocument::saveXML().
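
To see the declaration in the serialized output, here's a minimal sketch that reuses the XHTML-MP Doctype from above. The bare html root element is only a placeholder I've assumed so that there is something to serialize; a real document would obviously carry more.

```php
<?php

$implementation = new DOMImplementation();

$dtd = $implementation->createDocumentType(
    'html',
    '-//WAPFORUM//DTD XHTML Mobile 1.1//EN',
    'http://www.openmobilealliance.org/tech/DTD/xhtml-mobile11.dtd'
);

$document = $implementation->createDocument('', '', $dtd);

// Placeholder content (an assumption for this sketch): an empty root element.
$document->appendChild($document->createElement('html'));

// The serialized string carries the Doctype declaration ahead of the root element.
$xml = $document->saveXML();
echo $xml;
```

The Doctype declaration should appear between the XML declaration and the root element.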

Validation

One really nice side effect of specifying the DTD in a DOMDocument is that the document itself can subsequently be validated against the DTD very easily. It's as simple as:

<?php
 
$document->validate();

That's a nice win, and a helpful addition to your test suite if you happen to be unit testing code which generates markup - which can otherwise be very fiddly indeed. DOMDocument::validate() returns a boolean true/false, so it should be trivial to integrate this into tests written against any testing framework.
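
As a rough sketch of how that might look inside a test, the snippet below validates a small document and then collects libxml's messages when validation fails. The note document and its inline DTD are invented for illustration, and I've used an inline DTD only so the example runs without fetching anything over the network; a document built with createDocumentType() would be validated against its external DTD instead.

```php
<?php

// Collect libxml errors instead of having them raised as PHP warnings.
libxml_use_internal_errors(true);

// Hypothetical document with an inline DTD (an assumption for this sketch).
$xml = <<<'XML'
<?xml version="1.0"?>
<!DOCTYPE note [
  <!ELEMENT note (to, body)>
  <!ELEMENT to (#PCDATA)>
  <!ELEMENT body (#PCDATA)>
]>
<note><to>Reader</to><body>Hello</body></note>
XML;

$document = new DOMDocument();
$document->loadXML($xml);

// The document matches its DTD, so this is expected to pass.
$isValid = $document->validate();

// Break the document: <note> now lacks its required <body> child.
$document->documentElement->removeChild(
    $document->getElementsByTagName('body')->item(0)
);

libxml_clear_errors();
$isStillValid = $document->validate();

// Gather the human-readable reasons for the failure.
$messages = array_map(
    fn (LibXMLError $error) => trim($error->message),
    libxml_get_errors()
);
```

In a test you would assert on $isValid directly, and could print $messages when the assertion fails to see what went wrong.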

Comments

Posted by José on Friday, the 6th of November, 2009.

Hi, and thanks for this post; it really helped me a lot in building something similar, but I have a question. How can I insert a DocType into a DOMDocument that I have already loaded from an XML file with $domdocument->load('my.xml')?
The file has no doctype declared, but I need to insert one in order to validate it.
Can you give me some ideas?

Posted by Ciaran McNulty on Friday, the 6th of November, 2009.

@José - You could create a new document, then import the nodes from the existing document into it:

// $simonsDom is the new document created with the doctype; $yourDom is the one you loaded.
$child = $simonsDom->importNode($yourDom->documentElement, true);
$simonsDom->appendChild($child);
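
A fuller sketch of that approach, under some assumptions: the inline WML string stands in for a file loaded with load(), and the variable names $loaded and $withDoctype are made up for this example. The key details are that importNode() needs true as its second argument to copy the whole subtree, and that the imported root is appended to the new document itself.

```php
<?php

// Stand-in for a document loaded from a file that has no doctype.
$loaded = new DOMDocument();
$loaded->loadXML('<wml><card id="main"><p>Hello</p></card></wml>');

// Create an empty document that carries the desired doctype.
$implementation = new DOMImplementation();
$dtd = $implementation->createDocumentType(
    'wml',
    '-//WAPFORUM//DTD WML 1.3//EN',
    'http://www.wapforum.org/DTD/wml13.dtd'
);
$withDoctype = $implementation->createDocument('', '', $dtd);

// Deep-import the root element (second argument true), then attach it.
$root = $withDoctype->importNode($loaded->documentElement, true);
$withDoctype->appendChild($root);

echo $withDoctype->saveXML();
```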

Posted by José on Friday, the 6th of November, 2009.

Thanks again for the answer and your time. I did this:
$node = $documentXML->importNode($loadedXML->documentElement, true);
// $documentXML is an empty XML document
$documentXML->appendChild($node);

but the result isn't what I expected:
when I echo $documentXML->saveXML() I see some changes in the structure of the XML tree and in its elements' attributes and values. Sorry to have to ask, but do you have any idea why that happens?

[comment snipped]

Posted by Simon Harris on Tuesday, the 10th of November, 2009.

José -

I've snipped your later comments as they didn't display as I assume you hoped they might. They also contained your email address in plain text.

You'll probably have more luck asking questions like yours over at Stack Overflow - you'll find a much larger audience there, and many of the people there really have a lot of time on their hands, so you're sure to find help.

Posted by Paulo Fonseca on Tuesday, the 13th of April, 2010.

Hi, I used your code in the following way:

<?php

$implementation = new DOMImplementation();

$dtd = $implementation->createDocumentType('ementa', '', 'ementa.dtd');

$document = $implementation->createDocument('', '', $dtd);

//$document = new DOMDocument('1.0', 'UTF-8');

$document->formatOutput = true;

// Create the root element
$root = $document->createElement( "ementa" );
$document->appendChild( $root );

This will create the xml file with the correct DOCTYPE.

My question:

The 1st header generated is:

<?xml version="1.0"?>

How can I add encoding info to this header to make it like this:

<?xml version="1.0" encoding="UTF-8"?>

Many Thanks for your help!

Posted by Windigo on Sunday, the 1st of August, 2010.

Thanks so much; this was "less than obvious" in the PHP documentation at best, and exactly what I was looking for!

Posted by Terence Simpson on Saturday, the 20th of November, 2010.

@Paulo
You can set the encoding after creating the document node with:
$document->encoding = 'utf-8';
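
Put together with the code from the question, a minimal sketch might look like the following. It reuses the ementa names from above; the only addition is the encoding assignment, which saveXML() then writes into the XML declaration.

```php
<?php

$implementation = new DOMImplementation();
$dtd = $implementation->createDocumentType('ementa', '', 'ementa.dtd');
$document = $implementation->createDocument('', '', $dtd);

// Set the encoding on the document itself, after it has been created.
$document->encoding = 'UTF-8';
$document->formatOutput = true;

$document->appendChild($document->createElement('ementa'));

// The XML declaration in the output now includes the encoding.
$xml = $document->saveXML();
echo $xml;
```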

Posted by nickl- on Sunday, the 3rd of June, 2012.

After some deliberation and lots of persistence I managed to change the doctype on render. This is what the render function looks like:


public function render($doctype = '')
{
    if ($doctype) {
        $doc = new DOMDocument();
        $doc->loadHTML($doctype);
        $dt = $doc->doctype;
        $di = new DOMImplementation();
        $dt = $di->createDocumentType($dt->name, $dt->publicId, $dt->systemId);
        $this->dom->replaceChild($dt, $this->dom->doctype);
    }
    return preg_replace('/\n/', '', $this->dom->saveXML());
}

Hope that helps =)

Posted by mavigozler on Wednesday, the 24th of October, 2012.

The saveXML() method does not output the XML declaration attributes "encoding" or "standalone" no matter if they are set by the class user, nor does it output the DTD. Apparently the PHP library developers thought it was ONLY important to add the DTD and 'encoding' and 'standalone' attributes for purposes of validating the DOMDocument XML markup and not for generating the serialized string using the saveXML() method.

Here is a problem with that though: if you set the 'standalone' attribute to be 'true', this means that you MUST embed the DTD or an XML schema in the markup for the document to absolutely be standalone.

So for the DOMDocument to work and the saveXML() method to work properly with standalone=true, the method MUST output a string that contains the validated DTD just after the XML declaration OR a XML schema definition.

The DOMDocument class maintainers should go back to the drawing board.
