November 2015
By Jang Graat


Jang F.M. Graat is a philosopher and self-taught programmer, with almost three decades of experience in technical communication. He lives in Amsterdam and recently joined the US-based company The Content Era.




Faster than agile – live XML documents

Live XML content offers a vast number of opportunities for technical writers. However, its proper adoption requires a paradigm shift.

The Content Era is here

Who needs a table of contents or an index if we have Google? And, who needs Google if the info can be requested by an app and processed into shapes, graphs and colors that users learn to recognize much faster than their reading skills would ever allow?

We live in an era where content may be requested anywhere, anytime. Consumers of our content keep raising their demands – not for more content, but for more localized, more personalized and more technologically advanced content. We went from printed manuals to PDFs and from PDFs to web pages, and then to responsive HTML5 and embedded Help functionality. The consumers have changed, too. More target groups than ever may be accessing your content for a wider variety of reasons. Some of these groups are not even human. Recently, non-human users of the Internet have outnumbered human users. And all of these consumers request small snippets of information, but faster, much, much faster.

What is the problem?

We used to have one manual that explained everything about the product: it came in the box and started with congratulations on buying such a wonderful product, followed by installation instructions, etc. Nobody ever read such manuals cover to cover (except the occasional reviewer). Since those long-gone days, manuals have become shorter, as they were partly replaced by Help files, web-based Help systems and instruction videos.

Once the path to the Web was found, users started demanding more personalized info, and the documentation department started using personas to develop multiple sets of documentation for the same product, each catering to a specific target user group. With every new persona, the number of publications for the same product increased. Still, this was not a big problem, as publication was only done once every year and everything was placed on a website rather than being printed, bound and put into a box.

All seemed well in the world of web-based technical documentation. But in the past decade, most product development, especially in software, has become a non-stop evolutionary process. Small changes are being implemented on a weekly or even daily basis, and parts of the documentation are quickly outdated. All of a sudden, the tech docs department has to produce a growing number of publications on much shorter notice than ever before.

Copycat behavior

Many technical documentation teams are simply following their product development counterparts in adopting agile methodologies. Where they used to have months or even years to do a thorough rewrite of the entire product manual, they are now issuing small updates every couple of weeks. Instead of working towards a complete overhaul of the previous version of the manual, the authors are tinkering here and there, only changing what is required due to changes in the product or customer complaints.

There is nothing wrong with this approach to documentation: Changing only what needs to change is a sound strategy to bring down the pressure and gradually improve the product. But what happens next is a complete republication of the materials, in the same way that a software company has to rebuild its entire product even if only one line of code was fixed. Many companies have installed an automated (re)publication process that runs overnight. This is where the documentation team forgets to think outside the book, and thereby loses out on opportunities that would indeed make it faster than its agile counterparts.

What is not a problem?

For software development, it makes perfect sense to rebuild the entire product after even the smallest change. Software might crash when there is even the slightest inconsistency in the code (that is why these inconsistencies are usually referred to as bugs, not elephants). But information does not crash, at least not in the way a hardware or software product does. Even if one page contains a typo, an error or a broken link, all the other pages still “work” the way they are supposed to. And since the Web does not have page numbering, inserting pages does not require a rebuild of all subsequent pages and cross-references.

The real reason for companies to rebuild their entire documentation with every change is that these companies still live in the book paradigm of the past. When the manual was printed and packed into the box with the product, there was no other choice: even with one small change you had to reprint the entire manual. But with the web-based content of today, single pages can be replaced without ever touching any of the others.

The problem is not the outdated materials. It is the outdated book paradigm, in which the unit of publication is the entire documentation set. Changing that process offers a wealth of opportunities that allows the technical documentation department to overtake the agile teams in product development and become the fastest team in the house.

Thinking outside the book

The only solution that really makes a difference is letting go of the oh-so-familiar book paradigm and going live. Not by making the automated publication cycle run even faster, with hourly builds instead of weekly or daily ones. Nor am I proposing to publish each individual page whenever it gets changed. Instead, each piece of content is stored directly on the server, in the XML format that your editing software produces. Publishing (in the sense the word has nowadays: transforming the content into a format that a web browser can handle) is deferred to the moment when the content is needed.

There is no technological reason to change your XML content into HTML5 or other formats before passing it to a browser on the user’s device. Today’s web servers have sufficient technology and power to carry out the transformation to the required formats on the fly at the moment a piece of information is requested. Publish on demand. One page at a time, or rather – one piece of content at a time, as the “user” might be an app that only requires one small part from your documentation set.
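The idea of deferring publication can be sketched in a few lines. The following is a minimal illustration, not a production pipeline: a stored XML topic is only turned into HTML at the moment a client asks for it. The element names (`topic`, `title`, `para`) are illustrative assumptions, not any particular standard.

```python
# Minimal sketch of on-demand publication: the XML source is
# transformed into HTML only when a request for it arrives.
# Element names ("topic", "title", "para") are assumptions.
import xml.etree.ElementTree as ET

def publish_on_demand(xml_source: str) -> str:
    """Transform a stored XML topic into HTML at request time."""
    topic = ET.fromstring(xml_source)
    title = topic.findtext("title", default="")
    paras = "".join(f"<p>{p.text}</p>" for p in topic.iter("para"))
    return f"<article><h1>{title}</h1>{paras}</article>"

topic_xml = ("<topic><title>Cleaning the filter</title>"
             "<para>Unplug the device first.</para></topic>")
print(publish_on_demand(topic_xml))
```

In a real deployment, this function would sit behind a web server route; the point is that the HTML never exists on disk, only the XML does.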

Once this seemingly small mental step is taken, the opportunities become overwhelming. In this article, I am only scratching the surface, as there is so much more that can be done with live XML content.

Adaptive vs. responsive design

Since the inclusion of media queries in CSS and the subsequent development of libraries like Bootstrap, the world of the Web has gone responsive. The motto of responsive web design is “mobile first”. But do media queries really achieve this goal? What they can do is make certain content invisible, or replace a high-resolution image with a low-resolution one. But all content still needs to be transferred to your smartphone, just to be hidden from view. Contrary to common belief, I have plenty of real-life proof that the Internet is not simply available and fast wherever I go. Unlimited bandwidth is a fairy tale.

With the “live XML documents” approach, publication of the content is carried out when a lot of knowledge is available about the user who is requesting the information. Device type, screen size, browser type etc. – all of this information is part of the request that any browser sends to a web server when loading a page. Even more information can be added via forms, login procedures and/or cookies.

With all that information available, transformation to personalized content becomes an easy task. Using the same XML content, the process may create a fully responsive HTML5 page for a desktop computer or a trimmed-down HTML page with low-resolution images for a smartphone. Depending on screen size, geographical location, authorization, etc., the process can transform the requested content to a certain type of media. If the requesting device has a screen reader, indicating a blind user, all images may be skipped and replaced with speakable text. Everything becomes more efficient and more personalized at the same time.
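The decision logic can be as simple as a dispatch on the request metadata. This is a hedged sketch: the variant names, the width threshold and the way the screen-reader hint arrives are all assumptions, standing in for whatever your server framework exposes.

```python
# Hypothetical sketch: pick a rendering variant from metadata that a
# browser sends (User-Agent) or that a login/cookie could supply.
# Variant names and the 600px threshold are illustrative assumptions.
def choose_variant(user_agent: str, screen_width: int,
                   screen_reader: bool) -> str:
    if screen_reader:
        return "speakable-text"    # skip images, emit speakable text
    if "Mobile" in user_agent or screen_width < 600:
        return "trimmed-html"      # low-res images, reduced layout
    return "responsive-html5"      # full desktop experience

print(choose_variant("Mozilla/5.0 (iPhone; Mobile)", 375, False))
```

The same XML source then feeds whichever transformation the chosen variant names.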

Live editing and reviewing

Having the XML content available on the server opens up more opportunities than just creating adaptive design, especially in a world where all browsers support HTML5, and HTML5 allows in-browser content editing. Using another standard technology called AJAX, the browser can send the changed data back to the server, where it is integrated with the original XML to create a changed version. With this technology, a typo in content published on a live XML document website can even be corrected from a smartphone.
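The server side of such an edit round-trip can be sketched as follows, under some assumptions: the browser posts the edited text together with the `id` of the element it belongs to, and the server merges it back into the stored XML. The `id` attribute convention is mine, chosen for illustration.

```python
# Sketch of the server side of in-browser editing: an edited text
# fragment arrives (e.g. via an AJAX POST) with the id of the element
# it belongs to, and is merged back into the stored XML source.
# The "id" attribute convention is an illustrative assumption.
import xml.etree.ElementTree as ET

def apply_edit(xml_source: str, element_id: str, new_text: str) -> str:
    root = ET.fromstring(xml_source)
    for el in root.iter():
        if el.get("id") == element_id:
            el.text = new_text
    return ET.tostring(root, encoding="unicode")

doc = '<topic><para id="p1">Teh device</para></topic>'
print(apply_edit(doc, "p1", "The device"))
```

Because the merge happens element by element, nothing else in the document is touched by the correction.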

As the content is and remains XML, live editing can be controlled much better than would be possible in a Wiki. Attribute values can be used to define who is authorized to make changes to each particular part of the content. Using the rich semantics of XML, it is easy to give an editor access to the running text, while allowing an engineer to make changes to a table of data.
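An authorization check of that kind could look like the sketch below. The `edit-role` attribute name and the role values are assumptions, not a standard; the point is only that XML attributes give the server something concrete to test before accepting a change.

```python
# Sketch of attribute-based edit authorization: each element carries
# an "edit-role" attribute saying who may change it. The attribute
# name and role values are illustrative assumptions.
import xml.etree.ElementTree as ET

def may_edit(xml_source: str, element_id: str, user_role: str) -> bool:
    root = ET.fromstring(xml_source)
    for el in root.iter():
        if el.get("id") == element_id:
            return el.get("edit-role", "author") == user_role
    return False

doc = ('<topic>'
       '<para id="p1" edit-role="editor">Running text.</para>'
       '<table id="t1" edit-role="engineer"><row>42</row></table>'
       '</topic>')
print(may_edit(doc, "p1", "editor"), may_edit(doc, "t1", "editor"))
```

A Wiki has no comparable hook: its markup carries no semantics the server could test against.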

Of course the integration process can be adapted to different needs as well. Instead of changing the XML source directly, the changes can be passed into a database and offered to the author as tracked changes. All that is required on the reviewer’s device is a modern browser and the URL of the page to be reviewed.

One source fits all

As indicated above, we are seeing more types of users accessing the same website for information. Many companies are still maintaining two almost completely separate sites: one for the end users and one for their own service staff. This separation involves lots of logistical problems, as content needs to be prepared and maintained in two different sets. Very often, the internal documentation is a chaotic collection of Word documents, drawings, emailed notes and the like, rather than a well-structured website that is easy to navigate and pleasant to look at on all kinds of devices.

With the live XML document approach, there is no longer a need for full separation of the information that is handed out to the two main groups of users (end users and internal staff). Instead, a small piece of authorization code on the external website suffices to make the on-demand publication process yield different sets of information from the same XML source. Content that is marked as “internal only” will only be pushed out when the person requesting the page has been authorized as internal staff.

A relatively new development in this content era is the request for information by non-human consumers. The Internet of Things is often explained by a fridge that warns you that your milk is too old or automatically orders new milk from the shop. In reality, the Internet of Things is much more subtle and ubiquitous than this. Most of the time, you will not even notice all the non-human users of the Internet. You merely notice that some things in modern life have become a lot easier than they used to be.

Many of these non-human users of the Internet search for information, too. And a live XML document site can, in turn, serve these non-human agents in a form that is optimized for them. It does not take rocket science to have a device send an XQuery string that retrieves a piece of data from the same XML content that describes the product to a human user.
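As a stand-in for a full XQuery engine, the sketch below uses the limited XPath subset that Python's `ElementTree` supports; the element names and the query are illustrative assumptions. The shape of the interaction is the point: a device asks for one data point and gets exactly that, not a rendered page.

```python
# Sketch of a machine consumer: a device requests one data point with
# a query string. ElementTree's limited XPath support stands in for
# full XQuery; element names and the query are assumptions.
import xml.etree.ElementTree as ET

product_xml = ('<product><specs>'
               '<spec name="voltage">230</spec>'
               '<spec name="fuse">5A</spec>'
               '</specs></product>')

def query(xml_source: str, path: str) -> str:
    node = ET.fromstring(xml_source).find(path)
    return node.text if node is not None else ""

print(query(product_xml, ".//spec[@name='fuse']"))
```

The same `product_xml` that feeds the human-readable page answers the machine's question directly.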

Automatic localization

As a final example of options that become available in a live XML document approach, it is quite easy to have a transformation automatically localize certain pieces of content, such as temperatures, dimensions and even spelling. If the author has marked up the content with semantics that define the measurement units, the on-demand publication of the content can convert any measurement to another unit, defined by the locale of the user who requests the page.
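A temperature conversion shows the pattern; the `measure` element and its `unit` attribute are assumed markup conventions, and only one conversion is sketched here. The author writes the value once, and the on-demand transform localizes it per request.

```python
# Sketch of automatic localization of measurements: the author marks
# a value with its unit, and the on-demand transform converts it for
# the requesting user's locale. Markup names are assumptions, and
# only Celsius-to-Fahrenheit is implemented in this sketch.
import xml.etree.ElementTree as ET

def localize(xml_source: str, target_unit: str) -> str:
    root = ET.fromstring(xml_source)
    for m in root.iter("measure"):
        if m.get("unit") == "C" and target_unit == "F":
            m.text = str(round(float(m.text) * 9 / 5 + 32))
            m.set("unit", "F")
    return ET.tostring(root, encoding="unicode")

doc = ('<para>Max. operating temperature: '
       '<measure unit="C">40</measure> degrees.</para>')
print(localize(doc, "F"))
```

The same mechanism extends to dimensions, weights, or regional spelling, given suitable semantic markup.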

If this type of automatic localization had been in place sixteen years ago, the Mars Climate Orbiter might not have burnt up in the Martian atmosphere. That 125-million-dollar disaster was caused by a missed conversion between metric and imperial units.