May 2019
Text by Jang Graat

Image: © Besjunior/istockphoto.com

Jang F.M. Graat is a philosopher and self-taught programmer with more than three decades of experience in technical communication. He lives in Amsterdam where he founded his company Smart Information Design.


jang[at]smartinfodesign.com
www.smartinfodesign.com


Customizing DITA – the Relax(i)NG way

DITA was specifically designed to cater to a large range of content domains. But only when customized to your specific needs will it reveal its true power.

When someone asks me how many elements there are in DITA, I answer "too many and too few". This is because DITA, like other XML standards, defines its elements semantically. And how many semantic tags do you need to describe the content in your content domain?

The DITA standard contains the elements most commonly used across the content domains where it is currently applied. But your content does not require all of those elements. At the same time, there is always something very special about your domain that others do not need at all. And of course, other domains need semantics you would not even know how to apply.

DITA was specifically designed to cater to such a diverse universe of content domains. And because of its design, it makes little sense to use DITA without customizing it: using non-customized DITA means not understanding what DITA is all about. This article aims to dispel your misconceptions and, most of all, your fears about customizing DITA. A follow-up article will be published in tcworld magazine in July explaining how you can add your own semantic tags to DITA without breaking away from the standard. Together, these two articles will show you all it takes to make DITA fit your content like a glove.


Why should everyone use and customize DITA?

When doing a repair job on the roof, you would be crazy to carry all your tools up the ladder. Instead, you would probably use a belt that contains only the tools you are likely to need for this particular job. When you create structured content, the tags are your tools, and with DITA, you have a set of preconfigured toolbelts that you can choose from (by including or excluding domains). Each toolbelt is a document shell and contains all the tags available in that type of document, plus the content models that define which elements are allowed where.

Instead of taking one of the already preconfigured document shells, you can create your very own shell containing the exact set of tools you need for your particular job. It is even possible to add your own very special tags without making your content incompatible. In that sense, DITA is the only true XML (eXtensible) standard.

Unlike any other XML standard for technical content, DITA allows configuring document shells without losing the capability to reuse content across different content domains. After all, configuring a document shell merely defines a smaller set of tags to be used in your documents – so your content will always fit the more elaborate model of the standard. And even when you add your own specialized tags, they can always be traced back to the element on which they are based – so that any DITA processing software can work with your special tags without choking on them. This is truly unique, and we would be silly not to put it into practice.


Relax – customization no longer requires a nerd

Until DITA 1.3, creating toolbelts for specific jobs (i.e. specific business domains) required a fair bit of knowledge of DTDs and of the way DITA's modular DTDs are designed. Excluding a domain or a single element could take up to five coordinated edits in different files. And if any of those edits contained a typo or was done the wrong way, you would end up with an invalid DTD – or with a valid but non-DITA DTD.

With the 1.3 release of the DITA standard, configuring document shells has become much easier. This is due to a change in the way the content models are defined: the modular set of DTDs has been replaced, as the normative definition, by a modular set of Relax NG files. This change makes all the difference, as DTDs were never really good at modularity, whereas Relax NG is. In fact, many concepts in Relax NG are very similar to the reuse we know and love in DITA. Relax NG files – especially the ones created for DITA 1.3 – are a collection of reusable patterns and pattern references.

Instead of having to make multiple changes to DTDs to remove a single element from your toolbelt, you usually only need to edit a single line in Relax NG. And whereas DTDs are not written in XML at all, Relax NG (in the XML syntax that DITA uses) is. This makes configuring a content model in Relax NG a lot easier than trying to make it work in DTDs. And once you are done with your customizations, there are tools available to transform your Relax NG modules into the shell DTD that your authoring system uses for validation.

By the way, for those using DITA 1.2, Relax NG files have attributes to indicate which additions were done for version 1.3. DITA, unlike many other XML standards, is fully forward- AND backward-compatible, so there is no valid reason at all to stick with an older version of the standard. And even if you must, it is really easy to filter the DITA 1.3 additions from the Relax NG files, leaving you with just the DITA 1.2 materials, but in a much better format than the older DTDs.


Basic Relax NG model

Relax NG is based on patterns. Each pattern has a name by which it can be referenced from anywhere in the file. This should make DITA users happy, as it is very similar to the conref mechanism. You define your pattern in one spot and reference it wherever you want to reuse the same one. Yes, DTDs also allow this type of reuse, but they rely on entities, which are not nearly as easy to handle (and they are not really XML).
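
To make the conref analogy concrete, here is a minimal Python sketch (standard library only) that treats a tiny, hypothetical Relax NG grammar like a set of reusable fragments: it collects every <define>, counts how often each pattern is referenced with <ref>, and reports references that have no definition. The grammar and its pattern names are illustrative assumptions, not copied from the DITA modules.

```python
import xml.etree.ElementTree as ET
from collections import Counter

RNG = "{http://relaxng.org/ns/structure/1.0}"

# Hypothetical grammar fragment (no <start>, so not a complete schema):
# "ph" is defined once and referenced from two other patterns.
GRAMMAR = """
<grammar xmlns="http://relaxng.org/ns/structure/1.0">
  <define name="ph"><element name="ph"><text/></element></define>
  <define name="p.content"><zeroOrMore><ref name="ph"/></zeroOrMore></define>
  <define name="li.content"><zeroOrMore><ref name="ph"/></zeroOrMore></define>
</grammar>
"""

def reference_report(grammar_xml):
    """Count references per pattern name and list refs that have no <define>."""
    root = ET.fromstring(grammar_xml.strip())
    defined = {d.get("name") for d in root.iter(RNG + "define")}
    uses = Counter(r.get("name") for r in root.iter(RNG + "ref"))
    unresolved = sorted(name for name in uses if name not in defined)
    return uses, unresolved

uses, unresolved = reference_report(GRAMMAR)
print(uses["ph"], unresolved)  # prints: 2 []
```

Changing the single "ph" definition changes it everywhere it is referenced, which is exactly the kind of single-source reuse DITA authors know from conref.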

Let’s look at a fictional content model, "myelement.content", to understand the idea of patterns. This element has a mandatory "title" pattern, followed by a single optional "metadata" pattern, then some content (at least one piece of plain text or a "ph" or "term" pattern), and finally any number of "note" patterns.
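
Written out in Relax NG's XML syntax, that fictional "myelement.content" pattern could look like the fragment embedded below. This is a reconstruction from the description, not an official DITA pattern. The Python code (standard library only) walks the fragment and lists how often each referenced pattern may occur.

```python
import xml.etree.ElementTree as ET

RNG = "{http://relaxng.org/ns/structure/1.0}"

# Reconstruction of the fictional pattern described in the text.
MYELEMENT = """
<define name="myelement.content" xmlns="http://relaxng.org/ns/structure/1.0">
  <ref name="title"/>
  <optional><ref name="metadata"/></optional>
  <oneOrMore>
    <choice>
      <text/>
      <ref name="ph"/>
      <ref name="term"/>
    </choice>
  </oneOrMore>
  <zeroOrMore><ref name="note"/></zeroOrMore>
</define>
"""

# How many times a wrapped pattern may occur, per wrapper element.
OCCURS = {"optional": "0..1", "zeroOrMore": "0..n", "oneOrMore": "1..n"}

def describe(define_xml):
    """Return (pattern, occurrences) for each child of the <define>."""
    rows = []
    for child in ET.fromstring(define_xml.strip()):
        tag = child.tag.replace(RNG, "")
        if tag == "ref":
            rows.append((child.get("name"), "1"))
        elif tag in OCCURS:
            inner = child[0]
            if inner.tag == RNG + "choice":
                # <text/> has no name attribute, so label it explicitly.
                label = " | ".join(alt.get("name") or "text" for alt in inner)
            else:
                label = inner.get("name")
            rows.append((label, OCCURS[tag]))
    return rows

for pattern, occurs in describe(MYELEMENT):
    print(f"{pattern:18} {occurs}")
```

The <choice> inside <oneOrMore> is what allows any mix of text, "ph" and "term", as long as there is at least one piece of content.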

Note: It seems a little tedious to talk about patterns instead of elements all the time, but there is a good reason for this, which will become clearer when we get to specialization in part 2. In the remainder of this article, I use quotation marks to indicate a pattern and angle brackets to indicate an element.


Customizing DITA patterns is constraining them

When we want to customize "myelement.content", we have several options. If we want to force at least one "note" at the end, we can change the <zeroOrMore> into <oneOrMore>. Making the "metadata" mandatory is done by unwrapping the <optional>. We can also remove the "metadata" and the "note" altogether, together with the elements that wrap them, as <optional>, <zeroOrMore> and <oneOrMore> cannot be empty.
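
The first two customizations can be sketched with Python's xml.etree.ElementTree, assuming the fictional "myelement.content" pattern described earlier, written out in Relax NG's XML syntax. The helper names find_wrapper and unwrap are hypothetical; they simply rename or dissolve a wrapper element around a reference.

```python
import xml.etree.ElementTree as ET

NS = "http://relaxng.org/ns/structure/1.0"
RNG = "{" + NS + "}"
ET.register_namespace("", NS)  # serialize with a default namespace, not "ns0:"

MYELEMENT = """
<define name="myelement.content" xmlns="http://relaxng.org/ns/structure/1.0">
  <ref name="title"/>
  <optional><ref name="metadata"/></optional>
  <oneOrMore><choice><text/><ref name="ph"/><ref name="term"/></choice></oneOrMore>
  <zeroOrMore><ref name="note"/></zeroOrMore>
</define>
"""

def find_wrapper(define, tag, ref_name):
    """Return the <tag> child of define that wraps only <ref name=ref_name>."""
    for wrapper in define.findall(RNG + tag):
        if len(wrapper) == 1 and wrapper[0].get("name") == ref_name:
            return wrapper
    raise LookupError(f"no <{tag}> wrapping {ref_name!r}")

def unwrap(parent, wrapper):
    """Replace wrapper by its children, at the same position in parent."""
    position = list(parent).index(wrapper)
    children = list(wrapper)
    if children:
        children[-1].tail = wrapper.tail  # keep the original line break
    parent.remove(wrapper)
    for offset, child in enumerate(children):
        parent.insert(position + offset, child)

define = ET.fromstring(MYELEMENT.strip())

# Force at least one "note": <zeroOrMore> becomes <oneOrMore>.
find_wrapper(define, "zeroOrMore", "note").tag = RNG + "oneOrMore"

# Make "metadata" mandatory: unwrap its <optional>.
unwrap(define, find_wrapper(define, "optional", "metadata"))

print(ET.tostring(define, encoding="unicode"))
```

Removing "metadata" or "note" entirely would instead delete the <optional> or <zeroOrMore> element together with the reference inside it.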

The rules for customizing DITA are quite simple, and they all come down to the same single strategy that allows DITA to remain forward- and backward-compatible:

You cannot introduce any new content items anywhere.

The reason for this restriction of your freedom is that your content must be a subset of the full DITA model, so that any DITA processing tool can make sense of it. This is even true for specialized elements, as will be explained in part 2 of this article.

The above strategy leaves you with the following set of allowed actions:


  • unwrap <optional>: makes the referenced pattern mandatory
  • remove <optional>: makes the referenced pattern not allowed here
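  • unwrap <choice>: makes all the patterns in it mandatory (only when the <choice> is the only child of a <zeroOrMore> or <oneOrMore>, so that every repetition still matches the original choice)
  • remove <choice>: makes all the patterns in it not allowed here (only when the <choice> itself is optional, i.e. wrapped in an <optional> or <zeroOrMore> that you remove along with it)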
  • remove a pattern from a <choice>: limits the choice to the remaining patterns (when there is only one pattern left in the <choice> you should unwrap <choice>)
  • change <zeroOrMore> into <oneOrMore>: makes sure the contained pattern is used at least once
  • change <zeroOrMore> into <optional>: limits the contained pattern to one optional occurrence
  • unwrap <zeroOrMore>: limits the contained pattern to one mandatory occurrence
  • remove <zeroOrMore>: makes the contained pattern not allowed here
  • unwrap <oneOrMore>: limits the contained pattern to one mandatory occurrence

None of the above actions will either introduce a pattern where it was not allowed before or remove a mandatory pattern. This means the changed content model is a subset of the model that you started with, and this ensures you remain within the DITA standard.
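
The subset argument can be checked mechanically if each way of referencing a pattern is reduced to the range of occurrences it allows. The sketch below is a simplification that only models occurrence counts for a single reference (it ignores <choice> and ordering); a change counts as a legal constraint when the new range lies inside the old one.

```python
import math

# Allowed number of occurrences (min, max) of a pattern, per way of referencing it.
RANGES = {
    "required": (1, 1),            # bare <ref>
    "optional": (0, 1),
    "zeroOrMore": (0, math.inf),
    "oneOrMore": (1, math.inf),
    "removed": (0, 0),             # <ref> deleted together with its wrapper
}

def is_constraint(before, after):
    """True if every occurrence count allowed after the edit was allowed before."""
    low_before, high_before = RANGES[before]
    low_after, high_after = RANGES[after]
    return low_after >= low_before and high_after <= high_before

ALLOWED = [
    ("optional", "required"),      # unwrap <optional>
    ("optional", "removed"),       # remove <optional>
    ("zeroOrMore", "oneOrMore"),   # change <zeroOrMore> into <oneOrMore>
    ("zeroOrMore", "optional"),    # change <zeroOrMore> into <optional>
    ("zeroOrMore", "required"),    # unwrap <zeroOrMore>
    ("zeroOrMore", "removed"),     # remove <zeroOrMore>
    ("oneOrMore", "required"),     # unwrap <oneOrMore>
]
FORBIDDEN = [
    ("required", "removed"),       # removes a mandatory pattern
    ("optional", "zeroOrMore"),    # allows more occurrences than before
    ("oneOrMore", "zeroOrMore"),   # makes a mandatory pattern optional
]

for before, after in ALLOWED + FORBIDDEN:
    print(f"{before:>10} -> {after:<10} {is_constraint(before, after)}")
```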

There are a couple of more complex actions that are allowed when a <choice> is the only child of a <zeroOrMore> or <oneOrMore> element. Here is an example from the DITA <section> element:

In the actual DITA model definition, the "section.content" pattern contains a <zeroOrMore> with a reference to the "section.cnt" pattern, which in turn contains the choice shown. I have unwrapped the "section.cnt" pattern to clarify the example.

Note that you can have any number of "title" patterns in a section, and they can occur anywhere. An obvious restriction most DITA users would want to put into practice is allowing only one mandatory "title" at the start of the section. This is achieved by moving the "title" reference out of the <choice> to the start of the content pattern, before the <zeroOrMore>. The new content model for section looks like this:
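
A minimal sketch of that edit with Python's xml.etree.ElementTree, assuming a simplified "section.content" whose <choice> holds only "title", "p", "ul" and plain text (the real DITA choice lists many more patterns). The function name title_first is hypothetical.

```python
import xml.etree.ElementTree as ET

RNG = "{http://relaxng.org/ns/structure/1.0}"

# Simplified stand-in for DITA's section model ("section.cnt" already unwrapped).
SECTION = """
<define name="section.content" xmlns="http://relaxng.org/ns/structure/1.0">
  <zeroOrMore>
    <choice>
      <ref name="title"/>
      <ref name="p"/>
      <ref name="ul"/>
      <text/>
    </choice>
  </zeroOrMore>
</define>
"""

def title_first(define):
    """Move the "title" ref out of the repeated choice to the front of the model."""
    repeat = define.find(RNG + "zeroOrMore")
    choice = repeat.find(RNG + "choice")
    title = next(ref for ref in choice if ref.get("name") == "title")
    choice.remove(title)
    define.insert(list(define).index(repeat), title)
    return define

section = title_first(ET.fromstring(SECTION.strip()))
print([child.tag.replace(RNG, "") for child in section])   # ['ref', 'zeroOrMore']
print([alt.get("name") or "text" for alt in section[1][0]])  # ['p', 'ul', 'text']
```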

This is allowed, as every section that is valid against the new model is still valid against the original definition. In fact, you could also copy the "title" reference there: this forces at least one "title" at the start of a section but still allows more "title" patterns anywhere else inside the section. Finally, it is also perfectly legal to move the "title" to the very end of the section, right after the <zeroOrMore>. I cannot think of a practical use case for this, but as the DITA model needs to cater to any possible business domain, its base content models tend to be very loose. This is why every organization should apply a set of constraints as the first step in a sound content strategy.
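
Because these content models are regular, the claim that the constrained models stay inside the original can be spot-checked by brute force. The sketch below writes each model as a Python regular expression over element names (ignoring text), enumerates all sequences up to a fixed length, and reports whether anything valid under a customized model would be rejected by the original. The "caption" element is a hypothetical addition used to show a customization that breaks the rule; a bounded check like this is an illustration, not a proof.

```python
import itertools
import re

# Element names only; text content is ignored in this simplified model.
ALPHABET = ["title", "p", "ul", "caption"]

# Each model matches a sequence written as "name name name " (one space after each).
MODELS = {
    "original":     r"(?:(?:title|p|ul) )*",
    "title first":  r"title (?:(?:p|ul) )*",
    "title copied": r"title (?:(?:title|p|ul) )*",
    "title last":   r"(?:(?:p|ul) )*title ",
    "adds caption": r"(?:(?:title|p|ul|caption) )*",  # hypothetical, breaks the rule
}

def accepts(model, sequence):
    text = "".join(name + " " for name in sequence)
    return re.fullmatch(MODELS[model], text) is not None

def stays_within_original(model, max_len=5):
    """Bounded check: nothing valid under `model` is rejected by the original."""
    for length in range(max_len + 1):
        for sequence in itertools.product(ALPHABET, repeat=length):
            if accepts(model, sequence) and not accepts("original", sequence):
                return False
    return True

for name in MODELS:
    print(f"{name:13} {stays_within_original(name)}")
```

The reverse direction does not hold: a section consisting of a "p" followed by a "title" is valid under the original model but not under "title first", which is exactly what makes it a constraint.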


Modularity and pattern overrides

Modularity is as natural to Relax NG as it is to DITA. Patterns can be defined in separate files, and these files can be combined via the <include> element, with an @href that points to the file to be imported. All the imported pattern definitions come into scope. To remove a full domain from your DITA shell, you simply remove (or comment out) the line that includes the Relax NG file for the unwanted domain. So, if I work in the machinery business and have no need for any software, programming or user interface elements, I can remove a lot of elements by editing my concept shell like this:
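
The same edit can be scripted. The sketch below assumes a heavily simplified, hypothetical concept shell that includes its modules by plain file name (the real DITA 1.3 shells use URN-style references resolved through a catalog and contain much more), and replaces each unwanted domain <include> with an XML comment so that it is easy to restore later.

```python
import xml.etree.ElementTree as ET

NS = "http://relaxng.org/ns/structure/1.0"
RNG = "{" + NS + "}"
ET.register_namespace("", NS)

# Heavily simplified, hypothetical concept shell.
SHELL = """
<grammar xmlns="http://relaxng.org/ns/structure/1.0">
  <include href="conceptMod.rng"/>
  <include href="hazardstatementDomain.rng"/>
  <include href="highlightDomain.rng"/>
  <include href="programmingDomain.rng"/>
  <include href="softwareDomain.rng"/>
  <include href="uiDomain.rng"/>
</grammar>
"""

UNWANTED = {"programmingDomain.rng", "softwareDomain.rng", "uiDomain.rng"}

def comment_out(shell_xml, unwanted):
    """Replace each unwanted <include> with a comment holding its href."""
    root = ET.fromstring(shell_xml.strip())
    for position, child in enumerate(list(root)):
        href = child.get("href")
        if child.tag == RNG + "include" and href in unwanted:
            comment = ET.Comment(f' <include href="{href}"/> ')
            comment.tail = child.tail
            root.remove(child)
            root.insert(position, comment)
    return root

shell = comment_out(SHELL, UNWANTED)
print(ET.tostring(shell, encoding="unicode"))
```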


So, what if I want to use some of the patterns defined in an included file, but not all of them? And what if I want to constrain a pattern from an included file in this topic shell, but not in all the others at the same time? This is where the pattern override feature of Relax NG comes in handy.

The override is a pattern definition that is placed inside the <include> for the file that contains the original pattern. This allows constraining the content model for an element in one topic type while keeping it unconstrained in all the other topic types. If I want to apply the "title" constraint for <section> shown previously to all my topics, I would edit the file in which the "section.content" pattern is defined: topicMod.rng. But if I only want to apply that constraint to my reference topic, I would add an override to the reference.rng file:
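
As a sketch of the override mechanism, the Python code below adds a redefinition of "section.content" inside the <include> of a hypothetical, flattened reference shell. The shell layout, the helper names and the simplified choice ("p", "ul" and text) are assumptions. Relax NG requires the included grammar to define a pattern with the same name as the override; otherwise the schema is in error.

```python
import xml.etree.ElementTree as ET

NS = "http://relaxng.org/ns/structure/1.0"
RNG = "{" + NS + "}"
ET.register_namespace("", NS)

# Hypothetical reference shell whose includes have been flattened to the top level.
SHELL = """
<grammar xmlns="http://relaxng.org/ns/structure/1.0">
  <include href="topicMod.rng"/>
  <include href="referenceMod.rng"/>
</grammar>
"""

def section_title_first():
    """Build a constrained "section.content": one title, then the (simplified) choice."""
    define = ET.Element(RNG + "define", name="section.content")
    ET.SubElement(define, RNG + "ref", name="title")
    choice = ET.SubElement(ET.SubElement(define, RNG + "zeroOrMore"), RNG + "choice")
    for name in ("p", "ul"):
        ET.SubElement(choice, RNG + "ref", name=name)
    ET.SubElement(choice, RNG + "text")
    return define

def add_override(shell, href, override):
    """Place the override <define> inside the <include> of the module that owns it."""
    include = next(i for i in shell.iter(RNG + "include") if i.get("href") == href)
    include.append(override)
    return include

shell = ET.fromstring(SHELL.strip())
add_override(shell, "topicMod.rng", section_title_first())
print(ET.tostring(shell, encoding="unicode"))
```

Other topic shells that include topicMod.rng without this override keep the original, unconstrained section model.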

Unfortunately, the original DTDs for DITA used nested includes, which was not a very wise design decision. The Relax NG files needed to remain backward-compatible and inherit the same design flaw. This means that the override pattern is only really useful for the domains and not for modules with common elements. (I have edited the basic DITA 1.3 Relax NG files to bring all nested includes up to the topic shells. This allows me to keep all editing restricted to the single top-level Relax NG file that defines a topic. If you are interested in this set of adapted Relax NG files, drop me an email.)

A very useful Relax NG pattern for overrides is <notAllowed>. It makes it easy to remove a particular element from your entire topic. Let’s say you need to allow some but not all elements from the highlight domain. You then simply create an override containing <notAllowed> for each element you want to suppress:
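
In a script, suppressing elements amounts to adding one <define> with a <notAllowed/> child per element inside the domain's <include>. This sketch assumes that the domain module defines one pattern per element, named after the element (check this against your own module); "highlightDomain.rng", the shell and the suppress helper are simplified placeholders.

```python
import xml.etree.ElementTree as ET

NS = "http://relaxng.org/ns/structure/1.0"
RNG = "{" + NS + "}"
ET.register_namespace("", NS)

SHELL = """
<grammar xmlns="http://relaxng.org/ns/structure/1.0">
  <include href="highlightDomain.rng"/>
</grammar>
"""

# Highlight-domain elements this hypothetical shell should not offer.
SUPPRESS = ["u", "tt", "line-through", "overline"]

def suppress(shell, href, names):
    """Override each named pattern with <notAllowed/> inside the domain's <include>."""
    include = next(i for i in shell.iter(RNG + "include") if i.get("href") == href)
    for name in names:
        define = ET.SubElement(include, RNG + "define", name=name)
        ET.SubElement(define, RNG + "notAllowed")
    return include

shell = ET.fromstring(SHELL.strip())
include = suppress(shell, "highlightDomain.rng", SUPPRESS)
print(ET.tostring(include, encoding="unicode"))
```

Elements that are not overridden, such as <b> and <i>, remain available in this shell.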

Again, this does not make any changes to the included module – so that other topics may still use all the elements in the highlight domain.


Closing remarks

If you are using <oXygen/> for authoring DITA, you can use your edited Relax NG files to do validation. Instead of the usual DOCTYPE declaration that points to a DTD, you will have to link your DITA topic to the top-level Relax NG file. The details to make this work fall outside the scope of this article but are easy enough to find in the support forum.
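
One common way to make that link is the W3C xml-model processing instruction, which names the schema and its schema language; oXygen recognizes it. The sketch below builds such a prolog for a topic and checks that the result is well-formed. The schema path "../shells/myConcept.rng", the rng_prolog helper and the topic content are hypothetical.

```python
import xml.etree.ElementTree as ET

RNG_NS = "http://relaxng.org/ns/structure/1.0"

def rng_prolog(schema_href):
    """XML declaration plus an xml-model PI pointing at a Relax NG shell."""
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        f'<?xml-model href="{schema_href}" schematypens="{RNG_NS}"?>\n'
    )

# Hypothetical shell path and topic content; no DOCTYPE declaration is used.
topic = rng_prolog("../shells/myConcept.rng") + (
    '<concept id="pump-maintenance">\n'
    "  <title>Pump maintenance</title>\n"
    "</concept>\n"
)

root = ET.fromstring(topic)  # raises ParseError if the result is not well-formed
print(topic)
```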

If your authoring application (or CCMS) requires DTDs to validate your DITA content, you still need to transform the Relax NG files to DTD. This can be done with a free plugin for the DITA Open Toolkit, which was developed by Eliot Kimber. This transform creates a modular set of DTDs in the style that was used before DITA 1.3 – with all the module names, entity declarations and even documentation transferred to the DTDs as if you had manually edited them.

Another option is to use a transform that I have recently created. This transform creates a single (i.e. non-modular) DTD out of your set of Relax NG modules for a particular topic type. The transform is available via www.smartdita.com/dita-tailor. If all else fails, you can drop me an email and I am sure we can figure something out.
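
To illustrate what such a transform does at its core, here is a toy converter (neither Eliot Kimber's plugin nor the dita-tailor transform) that turns an element-only Relax NG pattern into a DTD content model string. It assumes pattern names equal element names and rejects <text/> and anything else it does not handle; a real transform must also resolve references, mixed content and attribute lists.

```python
import xml.etree.ElementTree as ET

RNG = "{http://relaxng.org/ns/structure/1.0}"
SUFFIX = {"optional": "?", "zeroOrMore": "*", "oneOrMore": "+"}

def to_dtd(node):
    """Translate an element-only Relax NG pattern into DTD content-model syntax."""
    tag = node.tag.replace(RNG, "")
    if tag == "ref":
        return node.get("name")  # assumes pattern name == element name
    if tag == "choice":
        return "(" + " | ".join(to_dtd(child) for child in node) + ")"
    if tag in SUFFIX:
        return sequence(node) + SUFFIX[tag]
    if tag in ("define", "group"):
        return sequence(node)
    raise ValueError(f"<{tag}> is not handled by this sketch")

def sequence(node):
    """Join children with commas; reuse a lone parenthesized group as-is."""
    parts = [to_dtd(child) for child in node]
    if len(parts) == 1 and parts[0].startswith("(") and parts[0].endswith(")"):
        return parts[0]
    return "(" + ", ".join(parts) + ")"

SECTION = """
<define name="section.content" xmlns="http://relaxng.org/ns/structure/1.0">
  <ref name="title"/>
  <zeroOrMore><choice><ref name="p"/><ref name="ul"/></choice></zeroOrMore>
</define>
"""

model = to_dtd(ET.fromstring(SECTION.strip()))
print(f"<!ELEMENT section {model}>")  # <!ELEMENT section (title, (p | ul)*)>
```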

In part 2 of this article (which will be published later in 2019) I will cover specialization. Here, you will learn how to add your own tags to DITA without breaking away from the standard – and without causing any hiccups in your DITA authoring or processing applications.