Wednesday, 31 May 2006

A really common problem facing people moving over from SGML to XML (and yes, there are industries such as aerospace that are still thoroughly SGML!) and from XML DTDs to XML Schemas (including RELAX NG, Schematron, XSD) is the unwillingness to forgo entity references for special characters. ISO defines entity names for a whole lot of special characters: &copy; and so on.

In XML, you can define entities for special characters up in the internal subset of the prolog and use them in element values or attribute values. Or you can have entity declarations in external parameter entities that form part of the DTD. So if you get rid of external DTDs, you lose those declarations too, and you end up with unresolved entity references in your information set.
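To make the problem concrete, here is a minimal sketch using Python's standard xml.etree.ElementTree parser. The document strings and variable names are made up for illustration. The first document declares the entity in its internal subset; the second is the same content with the declaration gone.

```python
import xml.etree.ElementTree as ET

# Entity declared in the internal subset of the prolog: the parser expands it.
declared = '<!DOCTYPE doc [<!ENTITY copy "&#169;">]><doc>&copy; 2006</doc>'
resolved_text = ET.fromstring(declared).text
print(resolved_text)

# The same content once the declaration is gone: the reference is unresolved
# and a non-validating parser rejects the document.
undeclared = '<doc>&copy; 2006</doc>'
try:
    ET.fromstring(undeclared)
    parse_failed = False
except ET.ParseError as err:
    parse_failed = True
    print("parse failed:", err)
```

The second parse fails outright, which is why people are so reluctant to let go of the DTD.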

There is a way out. You can define various rules in XSLT to overcome this. But wait, I hear you cry, I am no XSLT programmer, why isn’t there a standard way of declaring entities for special characters in a simple direct format that can be implemented easily?

Good news. Martin Bryan and the WG1 team at ISO SC34 are defining a standard format for renaming elements, attributes, enumerated values, namespaces, prefixes and, here’s the good news, for mapping undeclared entity references in data content to string values: the (special character) entity problem finally solved! I’ve seen working code for this, which should be released soon.

The format is called DSRL (pronounced “disrule”) and will be ISO DSDL Part 7.
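The mapping idea itself is easy to picture. The following is not DSRL syntax and not the working code mentioned above; it is only a rough Python sketch of what "mapping undeclared entity references to string values" amounts to. The table ENTITY_MAP and the function map_entities are hypothetical names, and the regular expression is a simplification that knows nothing about CDATA sections or comments.

```python
import re
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Hypothetical mapping table: entity name -> replacement string.
ENTITY_MAP = {"copy": "\u00a9", "eacute": "\u00e9", "nbsp": "\u00a0"}

# The five entities every XML parser already understands; leave them alone.
PREDEFINED = {"lt", "gt", "amp", "apos", "quot"}

# Named entity references only (numeric references such as &#169; do not match).
ENTITY_REF = re.compile(r"&([A-Za-z_][\w.\-]*);")

def map_entities(xml_text, mapping=ENTITY_MAP):
    """Replace mapped entity references with their string values."""
    def replace(match):
        name = match.group(1)
        if name in PREDEFINED or name not in mapping:
            return match.group(0)  # keep predefined or unmapped references as they are
        return escape(mapping[name])  # escape so the result is still well-formed
    return ENTITY_REF.sub(replace, xml_text)

source = "<doc>Caf&eacute; &copy; 2006 &amp; later</doc>"
mapped = map_entities(source)
text = ET.fromstring(mapped).text
print(text)
```

Here the mapping runs as a text pre-pass before parsing, so the DTD-less document parses cleanly. A declarative format has the advantage that the table lives in a file rather than in somebody's program.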

