November 2016
By Fabrice Lacroix


Fabrice Lacroix is the CEO of Antidot and Fluid Topics, a software vendor specializing in search technology, content enrichment and dynamic content publishing. With more than 150 clients, Antidot is well known for its advanced technology and breakthroughs in the field of machine learning. Fluid Topics is its dynamic content delivery solution.


lacroix[at]antidot.net
www.fluidtopics.com


 


 

Writing for the 21st century

Industry 4.0 and new technologies such as intelligent agents or Augmented Reality are bringing a new user experience and transforming customer support. Their coming of age forces us to think about how we produce technical documentation to make the most of this revolution.

Technical documentation is part and parcel of the product experience. It is essential to many phases of the customer journey, from pre-sales, installation, use and maintenance through to support. Historically delivered on paper in the form of books or manuals, technical documentation is a narrative provided via words and phrases, written by humans to be read by other humans.

The advent of digital technology has disrupted this state of affairs, raising all sorts of questions concerning information architecture. The former supremacy of large books and manuals has been challenged by the emergence of short content focused on one particular topic (articles, knowledge bases). This has in turn engendered fresh challenges on how to organize this profusion of information in order to make it consistent and meaningful. Embodied by products such as Fluid Topics, Dynamic Content Delivery has managed to provide an effective response to this transformation.

The shift from book to article remains firmly within the scope of content that continues to be written and read by humans in the form of typed text. However, it falls short in accommodating the next step. Indeed, a revolution is underway, one that will introduce a new way of accessing and using information with new interfaces and a groundbreaking user experience, far removed from traditional pages of text. It will combine:

  • Self-documented devices that give predictive and contextual indications
  • Augmented Reality, already present on tablets and mobile devices
  • Wearable tools such as smart glasses
  • Interactive agents, already standard fare with Apple's Siri or OK Google

These innovations sound appealing, but they require us to rethink our approach to technical communication. Can anyone seriously imagine reading a three-page PDF article using connected eyewear? Or listening to an intelligent agent reading a ten-minute response to a question? Technical documentation as customarily produced, and especially as it continues to be published today in the form of short or long texts, is clearly inappropriate.

Information professionals must take stock of these challenges and prepare for them. They must write for machines, i.e. produce information in a form that these new tools can ingest and use.
How? And in what form? Flat text is useless. And one thing is certain: hyper-structured knowledge bases such as those used in the 1990s do not provide the answer either, given their overly complex modeling, costly maintenance, slow expert-driven input, etc.

What approach should these professionals adopt to satisfy today's publishing methods and anticipate those of tomorrow? The answer is quite straightforward and within reach, since it involves implementing three practices that are already known and proven. We call it the Content Value Path:

  • Structure
  • Metadata
  • Semantics

We will review these three steps below and show how they tie in closely with one another, and how their combined effect opens up new opportunities.

1. Structure

Structure is meant here as part of the structured content authoring concept, a widespread practice in technical documentation that consists of breaking content down into small pieces (from a few lines to a few paragraphs) called components or topics. These are subsequently assembled via maps (similar to tables of contents) to create the final content. The corresponding standards and tools are well known, DITA and S1000D for example. This approach runs contrary to writing long, unstructured documents with word processing tools. It was designed to optimize the production and maintenance of large bodies of documentation: writing in parallel, avoiding duplicate content by reusing topics, facilitating modifications, reducing translation costs, etc.
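As a minimal sketch (the file names, topic titles and product are invented for illustration), a DITA map that assembles reusable topics into a deliverable could look like this:

    <!-- maintenance-guide.ditamap: assembles standalone topics into one publication -->
    <map>
      <title>Pump X200 Maintenance Guide</title>
      <topicref href="safety-precautions.dita"/>
      <topicref href="replacing-the-filter.dita"/>
      <topicref href="draining-the-circuit.dita"/>
    </map>

The same topics can be referenced by other maps, which is what makes reuse, parallel writing and single-sourced translation possible.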



Note that, in this production-driven approach, the granularity of the topics is determined by production concerns and is potentially disconnected from the content itself, i.e. the subjects addressed in the text. In our Content Value Path approach, the breakdown into topics must be aligned with the subject matter, because we need consistent and complete grains of information. Excessively long topics that deal with several subjects must therefore be broken down to a more granular level. Conversely, excessively small topics (such as a phrase or fragment) resulting from a given documentation technique must be assembled within intermediate maps that are not necessarily intended for publication, but that serve to define this consistent level of information.

 


Why is this breakdown so important? Because it enables these technologies and algorithms to work with complete grains of information, and thus to target the elements needed to answer a question effectively and unambiguously. But to do that, we must still add the metadata and semantics layers.

2. Metadata

Metadata is data that describes other data. For example, the date a topic was created, its author and its publishing status are management metadata. Other types of metadata play a more editorial role and, as such, are part of the content itself: for example, the software application or version, the type of task described (installation, maintenance, etc.), or the required level of expertise. Metadata can be dates, numbers, free keywords or, more typically in documentation, labels taken from controlled lists, either flat or hierarchical (in which case we speak of a taxonomy). This metadata can be attached to topics or to maps (in which case it is inherited by all the topics included in the map).
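As an illustrative sketch (the product name, version and label values are invented), this kind of editorial metadata could be carried in the prolog of a DITA topic:

    <task id="replacing-the-filter">
      <title>Replacing the filter</title>
      <prolog>
        <metadata>
          <audience type="administrator"/>
          <prodinfo>
            <prodname>X200</prodname>
            <vrmlist><vrm version="3.1"/></vrmlist>
          </prodinfo>
          <othermeta name="task-type" content="maintenance"/>
        </metadata>
      </prolog>
      <taskbody><!-- steps omitted --></taskbody>
    </task>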

The link between structure and metadata clearly emerges: for metadata to be meaningful and usable, it must apply to the entire topic unambiguously. For example, if a topic contains information on both the installation and the maintenance of a product, it will have to be split into two distinct topics, one containing the information specific to installation, the other the information specific to maintenance, so that each can be labeled more accurately.

The underlying issue to address concerns the choice of metadata: which metadata, with which values? While there are clearly a number of obvious options and best practices, there is no universal response as such. It will depend on your products, your content and how you want the content to be used. Here are some typical use cases:

  • In a search engine, create filters – also called facets – so that users can refine their search, for example by only retaining responses specific to their product.
  • In online Help, display contextual information linked to the exact version of the product and its configuration.
  • For a maintenance operation, the machine displays a bar code and an error number; the operator scans the code and enters the number to access the relevant maintenance procedure. The bar code is translated into the list of the machine's constituent subsystems and applicability conditions, which are precisely the metadata used to filter the content (see the sketch below).
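With DITA, one possible way to implement this kind of filtering (a sketch with invented attribute values) is to profile content with conditional attributes and to apply a DITAVAL filter at publishing time:

    <!-- In the topic: this step only applies to the "pump-x200" subsystem -->
    <step product="pump-x200">
      <cmd>Close the intake valve.</cmd>
    </step>

    <!-- filter.ditaval: keep only the content matching the scanned machine -->
    <val>
      <prop att="product" action="exclude"/>
      <prop att="product" val="pump-x200" action="include"/>
    </val>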




If you are starting from scratch with your metadata, or if you have already gathered some metadata and are wondering how to proceed, follow these steps:

  1. First, define some use cases via scenarios that involve typical users, based on standard techniques such as personas and storytelling.
  2. Next, identify the metadata needed to support these scenarios: which criteria are necessary to extract the content with filters?
  3. Match the metadata with the content. Here, you may have to adapt your content's granularity, as mentioned in the Structure step.

This last step may appear somewhat daunting and even beyond your reach if you have thousands or even tens of thousands of topics. This is where technology steps in. Today's automatic classification algorithms, which use the latest technological advances in artificial intelligence, are extremely accurate. They are able to learn from a representative set of manually labeled topics (supervised learning phase), and then proceed on their own (automatic classification phase).

Consequently, with just a few hundred pre-labeled topics, you can tag thousands or even millions of topics in a matter of minutes. You can do the same for content from any other source (wikis, knowledge bases), and thus benefit from a fully aligned body of documentation. Here too, we must stress the need for topics with the right level of granularity: the more focused a topic's content is (in particular for the topics used for learning), the more precise the automatic labeling will be, easily as good as that produced by humans.




3. Semantics

The last step in the Content Value Path, and no doubt the least familiar in the technical documentation world, is semantic enablement. This technique is in fact widely used for web content in the context of Search Engine Optimization (SEO). It involves labeling the text with tags that allow algorithms to unambiguously identify and extract information such as the names of people or products, dates, quantities, references, events, etc.

In a page of Google search results, for example, we can see that, in addition to the text, Google displays structured data: the rating and the reviewer.




This is possible because the web pages explicitly mark this information with tags (see schema.org, RDFa, JSON-LD and Microdata): behind an ordinary sentence, markers identify the relevant entities so that an algorithm can extract them automatically and unambiguously. The extracted data can then be used to build a knowledge graph and to answer complex questions such as "What is the list of maintenance procedures involving component X23b?" or "Give me the list of tools needed for an intervention on machine b45."
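As a purely illustrative sketch (the vocabulary URL, the entity types and the tool name are invented; the component number is borrowed from the question above), a sentence from a maintenance procedure could carry Microdata-style markers like these:

    <!-- example.com/vocab is a hypothetical vocabulary, not a real standard -->
    <p itemscope itemtype="http://example.com/vocab/MaintenanceStep">
      Replace the
      <span itemprop="component">filter cartridge X23b</span>
      using the
      <span itemprop="tool">T15 torque wrench</span>.
    </p>

Once every procedure carries markers of this kind, an algorithm can reliably collect all the steps that mention a given component or tool.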

The discerning reader will have noticed that the standards used to write structured documentation already provide some semantics. In an HTML wiki or a word processing tool, the list of steps involved in a maintenance task would simply be written as an ordered list.
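For instance (the procedure and its steps are invented for illustration):

    <ol>
      <li>Switch off the pump.</li>
      <li>Close the intake valve.</li>
      <li>Remove the filter cartridge.</li>
      <li>Insert the new cartridge and reopen the valve.</li>
    </ol>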



In DITA, the same content would be written with explicit task semantics.
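A minimal sketch of the equivalent DITA task (same invented steps):

    <task id="replace-filter">
      <title>Replacing the filter cartridge</title>
      <taskbody>
        <prereq>The pump is switched off.</prereq>
        <steps>
          <step><cmd>Close the intake valve.</cmd></step>
          <step><cmd>Remove the filter cartridge.</cmd></step>
          <step><cmd>Insert the new cartridge and reopen the valve.</cmd></step>
        </steps>
      </taskbody>
    </task>

Each element now says explicitly what the information is (a prerequisite, a step, a command), which is exactly what an algorithm needs.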



Even if this level of semantics represents a significant contribution, it is insufficient for our Content Value Path, as it is limited to the structural aspects of the documentation. It must therefore be complemented by a more granular form of semantic enablement in order to identify the informational elements specific to the business context. How do we accomplish this?

  1. Here again, you must define the use cases to which you wish to respond: which questions might your user persona ask, in which circumstances and with which objectives?
  2. This will generate the key information elements, and you will be able to list the types of entities that must be labeled (products, references, components, technologies, measurements, dates, etc.).
  3. But the most important question is: how do you go about labeling all your content? Will you have to insert all these markers manually? This would appear to be a superhuman and unrealistic task, especially if you have thousands of pages of documentation.

Here again, technology comes to the rescue. Thanks once more to the progress made in machine learning, algorithms can perform this marking automatically with a very high level of precision. They only need a few dozen manually tagged topics to learn from before being able to continue on their own. This enhancement task is usually performed outside the content management system (CMS) and is integrated into the publishing system, which performs it on the fly while analyzing the received content.





Conclusion

Structure, metadata and semantic enablement of content – these are the three steps inherent in the Content Value Path.
As we have seen, the way to obtain enhanced content is clearly marked:

  • Define the use cases and standard questions to which you wish to respond.
  • Go on to deduce the metadata and the necessary entities.
  • Switch to structured documentation and adapt the granularity of the breakdown into topics and maps.
  • Apply metadata and entity labels to a representative sample of your content. Then use this sample to "teach" the automatic classification and entity extraction tools.

For this last step, you will need to rely on your dynamic publication tool, which must integrate and provide the building blocks needed to enhance content. This is, for example, offered by the Fluid Topics solution.

The enhanced data can then be efficiently "consumed", not only by humans but also by algorithms, thus opening up a world full of opportunities. Remember that smart glasses and Augmented Reality are already here. And intelligent agents will be commonplace in the next three years. So what about you? Will you be ready?