December 2017
By Jang F.M. Graat

Image: © Besjunior/istockphoto.com

Jang F.M. Graat is a philosopher and self-taught programmer, with more than three decades of experience in technical communication. He lives in Amsterdam where he founded his company Smart Information Design.

jang[at]smartinfodesign.com
www.smartinfodesign.com

Troubleshooting reengineered

Are you still trying to solve defects and malfunctions by delivering large volumes of documentation and providing training to your customer’s service engineer? Perhaps it is time to rethink your approach and start creating embedded troubleshooting flowcharts.

This article outlines a methodology that is undervalued in the world of technical information development: the use of flowcharts. Instead of merely drafting flowcharts in the design phase and pushing them aside when the implementation is done, I am suggesting the use of flowcharts as the core of real-life troubleshooting. This approach can save a lot of time, effort and money.

You cannot be an expert on everything

Let’s begin by analyzing the situation today: Every day, expert service engineers board airplanes to fly to customer sites around the globe. All too often, the problem is solved by some simple actions that could easily have been taken by an employee with basic engineering skills. The problem is not the actions that need to be performed; it is deciding which actions will solve the problem. More often than not, the solution turns out to be the simple replacement of a defective part or a change to a single setting. That one fault has caused an avalanche of problems and symptoms, which then covered up the original issue and made it almost impossible for a non-expert service engineer to fix.

In the software industry, problems tend to be deeply embedded in the code and often require a developer to fix the issue. The advantage is that today’s fast and reliable internet connections allow engineers to log into the customer's system remotely. Using standard remote access software packages, developers can completely take over the customer's computer system and make the required changes without leaving their desk. This saves a lot of time and travel expenses.

Other industries rely on on-site service engineers to fix malfunctions. Machine manufacturers try to solve troubleshooting issues by creating massive amounts of documentation and/or providing extensive training for their customers’ service engineers. But this approach is highly questionable, and here is why: The manufacturer's service engineers can concentrate on the machines that their company produces, which are usually variations of the same basic design. Having expert knowledge about one machine makes them capable of solving issues with all of them. Simply by being exposed to just one type of machinery, they become specialists in their company's business domain. The customer's service engineers, on the other hand, are in a completely different position: They need to maintain and service a dozen or more different machines for which they are responsible.

Another issue is that the machines usually work fine during the first months or even years after installation. When a machine eventually starts to malfunction, the training that was provided by the machine’s manufacturer (often by expert engineers who are not necessarily experts in training) is long past. The customer’s service engineer might have used the knowledge acquired during the training to correctly maintain and service the machine, but when the servicing information is needed the most, little is left of all that precious knowledge and rarely used skills. In addition, service engineers do not always remain with the same customer throughout their career, and the new engineer rarely gets the required training.

How problems are really solved

Machine documentation invariably contains a chapter on troubleshooting. In most cases, this is simply a three-column table linking symptoms to possible causes and remedies. The remedies may just be hints or references to service procedures that are listed in other sections of the manual. These tables are so common that various technical documentation standards include a troubleshooting topic of this type. Sometimes an extra column is added to the table to allow for combinations of symptoms, but it is almost impossible to cover all conceivable combinations of symptoms in a table and still keep the table navigable.
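To get a feeling for how quickly such a table outgrows the page, here is a minimal Python sketch. The three table rows and their wording are invented for illustration; only the arithmetic matters. With n symptoms that can occur independently, a complete table would need a row for each of the 2^n - 1 non-empty combinations.

```python
from itertools import combinations

# Hypothetical three-column troubleshooting table: symptom, possible cause, remedy.
TABLE = [
    {"symptom": "no pressure", "cause": "pump failure", "remedy": "replace pump"},
    {"symptom": "overheating", "cause": "clogged filter", "remedy": "clean filter"},
    {"symptom": "alarm 17", "cause": "sensor fault", "remedy": "recalibrate sensor"},
]

def lookup(symptom):
    """Return every row whose symptom matches exactly."""
    return [row for row in TABLE if row["symptom"] == symptom]

def count_combinations(symptoms):
    """Count the non-empty symptom combinations a complete table would need."""
    return sum(1 for size in range(1, len(symptoms) + 1)
               for _ in combinations(symptoms, size))

symptoms = [row["symptom"] for row in TABLE]
print(count_combinations(symptoms))  # 3 symptoms -> 7 combinations
print(2 ** 20 - 1)                   # 20 symptoms -> 1048575 combinations
```

A single-symptom lookup stays trivial, but the number of rows needed for combined symptoms doubles with every symptom added.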

When the customer’s service engineer gives up and calls in the expert, a completely different method of finding and fixing the cause of the problem is set in motion. The expert – perhaps without even realizing it – uses a mental flowchart to find out what the ultimate cause of the symptoms may be.

Whenever the problem is too complex (many symptoms at the same time) or too vague (no clear indication where the real problem is), the expert will start to check on basic conditions, trying to rule out possible causes one by one, zooming in on the probable cause until the real issue is found and can be fixed.

Let me give you a simple example from my own life: A few years ago I was on a business trip when my wife called me in a state of mild panic – she needed to get to work but the car wasn’t starting. This is a vague symptom, so I asked whether she could hear a click when turning the key. She told me that there was no sound at all. This ruled out a number of possible causes (out of gas, forgot to use the choke, etc.). I asked her to switch on the lights and tell me whether they worked. The lights worked, so I could rule out a dead battery. I suspected that the starter motor was in a position where it would not move no matter how much electric power was applied to it. There is just a fraction of an inch where the magnetic fields cancel each other out. The roadside service would open the hood and hit the starter motor with a hammer in these cases, but I could not begin to explain where the starter motor is, and my wife could have hit something else instead. So I told her to put the car in gear and push it forward or backward a few inches. She followed my instructions and then tried to start again. The engine fired up and she made it to work in time.

I must add that this was a pretty old European car for which I knew this strategy would work. But nevertheless, it illustrates how we use mental flowcharts to solve real-life problems. Our mind does not use a lookup table that matches symptoms to causes and remedies. Instead, it navigates a map on which it can move from one area to another, zooming in and out as required to find the one spot where the solution lies.
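The phone call above can be written down as an explicit flowchart. The following Python sketch is only one possible encoding: the node names, the data layout and the wording of the branch that the anecdote does not cover are assumptions made for illustration.

```python
# Each question node maps an answer to the next node name or to a conclusion.
# Anything that is not a node name is treated as a conclusion.
CAR_FLOWCHART = {
    "click": {
        "question": "Do you hear a click when turning the key?",
        "yes": "Outside the anecdote: look at fuel and choke first",  # assumed branch
        "no": "lights",
    },
    "lights": {
        "question": "Do the lights work?",
        "yes": "Starter motor stuck: put the car in gear and push it a few inches",
        "no": "Dead battery: charge or replace the battery",
    },
}

def diagnose(flowchart, start, answers):
    """Follow the answers (node name -> 'yes' or 'no') until a conclusion is reached."""
    node = start
    while node in flowchart:
        node = flowchart[node][answers[node]]
    return node

print(diagnose(CAR_FLOWCHART, "click", {"click": "no", "lights": "yes"}))
```

Each answer removes a whole branch of possible causes, which is exactly what a flat lookup table cannot express.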

Making flowcharts explicit and interactive

It has always puzzled me why technical authors ignore their natural mental flowcharting skills when creating troubleshooting chapters. There may be two main reasons: a lack of imagination and a lack of suitable tools. The first one I am trying to eliminate by writing this article and giving presentations on the topic at various conferences. The second can be overcome by using adequate flowchart modeling software.

Visio used to be the only suitable software package for creating flowcharts. For the past several years, an online product called Lucidchart has been quite popular. But both of these software packages (and lesser-known similar ones) are mainly known and used in the business modeling world, rather than in technical documentation. Which is a shame: Once you start creating flowcharts, you quickly find that this is really the only effective way to design potent troubleshooting information.

So how are you going to present the flowchart to the customer? The few troubleshooting flowcharts that I have found were just images embedded in the technical documentation. These flowcharts are necessarily unspecific, as there is only so much space on a page. The true power of troubleshooting flowcharts lies in their complexity, their level of detail, and their coverage of every possible fault-finding scenario. Such flowcharts quickly become too complex and too large to be presented in their entirety at one glance.

This leads to my conclusion that flowcharts need to be processed into interactive media to become effective for troubleshooting. As the majority of technical documentation has already moved to the Web, this is not a huge step to take. It just requires the right tools to create and maintain the flowcharts efficiently, and to convert them into the required interactive browser pages.
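As a rough illustration of that conversion step, the Python sketch below turns each node of a small flowchart into its own browser page, so the reader only ever sees one question at a time. The flowchart content, the page naming scheme and the HTML layout are all assumptions; a real tool chain would start from the export format of the flowchart software.

```python
from html import escape

# Hypothetical flowchart: each node has a text and a set of labeled exits.
FLOWCHART = {
    "start": {"text": "Is the power indicator lit?",
              "options": {"Yes": "fuse", "No": "plug"}},
    "fuse": {"text": "Replace fuse F1, then restart.", "options": {}},
    "plug": {"text": "Connect the power cable, then restart.", "options": {}},
}

def render_node(node_id):
    """Render one node as an HTML fragment with a link per possible answer."""
    node = FLOWCHART[node_id]
    links = "".join(
        f'<li><a href="{escape(target)}.html">{escape(label)}</a></li>'
        for label, target in node["options"].items()
    )
    options = f"<ul>{links}</ul>" if links else ""
    return f"<h1>{escape(node['text'])}</h1>{options}"

def render_site():
    """Return a mapping of file name to page content, one page per node."""
    return {f"{node_id}.html": render_node(node_id) for node_id in FLOWCHART}

for name, page in render_site().items():
    print(name, page)
```

However large the flowchart grows, each page stays as small as a single decision.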

Self-diagnosing machines

But there is more to gain, especially in the domain of large, specialized machines. These machines are equipped with lots of sensors that keep track of production conditions. The signals are used to adapt to all possible circumstances. Service engineers push virtual buttons on a touchscreen instead of making manual changes to physical settings.

Under normal operating conditions, machines are pretty much self-regulating. They are equipped with sensors that feed signals into the central processing unit, which in turn activates servos and valves to influence the production process. Of course, there are limits to what the machine's actuators can do, and there will also be limits to the conditions the machine is keeping track of. Still, most machinery does a pretty good job at continuously diagnosing itself.

All the available signals are of importance to the service engineer when trouble arises. This is why logs were invented: They show the expert engineer what happened in the seconds and minutes leading up to the machine failure. But the core intelligence is still in the mind of the service engineer, who scans through the logs to detect patterns or specific messages that feed into the mental (or explicit) troubleshooting flowchart.

This is where I propose a paradigm shift, by pushing troubleshooting intelligence into the machine itself. Instead of showing an error code and letting the service engineer solve the issue at hand, the machine uses an embedded troubleshooting flowchart to check on possible causes and zoom in on the true cause of the failure. When the limits of the machine's set of signals and actuators are reached, the engineer is called upon via the touchscreen (much like telling my wife to switch on the lights in the real-life example given above). The input given by the engineer allows the machine to proceed in the flowchart.

When the machine's embedded troubleshooting procedure reaches a conclusion about a defective part, it will show the replacement procedure on the screen and wait for the engineer to perform the procedure. When the engineer is done, the machine proceeds to the next step in the flowchart to check the result of the intervention. Depending on those checks, more procedures may be required until the true cause of the problem is solved.
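A minimal Python sketch of such an embedded procedure could look as follows. Everything specific in it is hypothetical: the node kinds, the signal name, the limits and the prompts are made up, and the sensor readings and touchscreen answers are simulated so that the example can run on a desk instead of a machine.

```python
def run_flowchart(chart, start, read_sensor, ask_engineer):
    """Walk the chart until a 'done' node is reached; return the visited steps."""
    trace = []
    node_id = start
    while True:
        node = chart[node_id]
        kind = node["kind"]
        if kind == "sensor":       # the machine answers the question itself
            value = read_sensor(node["signal"])
            outcome = "ok" if node["low"] <= value <= node["high"] else "fault"
        elif kind == "ask":        # no signal available: ask via the touchscreen
            outcome = ask_engineer(node["prompt"])
        elif kind == "procedure":  # show the procedure and wait for confirmation
            ask_engineer(node["prompt"])
            outcome = "next"
        else:                      # kind == "done": end of the flowchart
            trace.append((node_id, node["prompt"]))
            return trace
        trace.append((node_id, outcome))
        node_id = node["next"][outcome]

# Hypothetical flowchart for a low oil pressure fault.
CHART = {
    "pressure": {"kind": "sensor", "signal": "oil_pressure_bar", "low": 2.0, "high": 6.0,
                 "next": {"ok": "solved", "fault": "leak"}},
    "leak": {"kind": "ask", "prompt": "Is oil visible under the pump? (yes/no)",
             "next": {"yes": "replace_seal", "no": "call_expert"}},
    "replace_seal": {"kind": "procedure", "prompt": "Replace the pump seal, then confirm.",
                     "next": {"next": "pressure"}},
    "solved": {"kind": "done", "prompt": "Pressure is within limits."},
    "call_expert": {"kind": "done", "prompt": "Not covered: contact the manufacturer."},
}

# Simulated machine: pressure is too low at first and normal after the repair.
readings = iter([0.8, 4.1])
trace = run_flowchart(CHART, "pressure",
                      read_sensor=lambda signal: next(readings),
                      ask_engineer=lambda prompt: "yes")
print(trace)
```

Note that the procedure node leads back to the pressure check: the machine verifies the result of the intervention before it declares the problem solved.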

Moving intelligence to where it belongs

The key to this approach lies in putting the intelligence where it has the highest chance of being effective. First of all, the machine gets direct access to lots of signals without human intervention. More importantly, each machine only needs to know about itself, whereas in the traditional paradigm, described at the start of this article, the service engineer needs to be an expert on all the machines the company operates. When the intelligence about a particular machine is placed inside the machine itself, the engineer only requires basic engineering and servicing skills to fix problems that may arise. The result: less effort spent creating huge sets of manuals, and fewer hours spent on training for cases that may never happen in the course of the machine's life span (or in the remaining time of the engineer's employment at this company). No more effort and money wasted on trying to put intelligence in places where it is least effective (the documentation and/or the head of the customer's service engineer).

When trouble arises, the machine will use its built-in intelligence to determine what the causes may be. Where required, it will tell the engineer which actions to take, and which parts to replace, until the problem is solved. Only in rare cases that are not covered by the embedded troubleshooting flowchart will the customer have to call in help from the manufacturer, and will an expert have to board an airplane.

But wait, there is more

With troubleshooting procedures embedded in the machine comes the virtually free option of capturing every step in the troubleshooting procedure in a detailed report. Creating this report does not require any of the hopelessly flawed and outdated methods of debriefing an exhausted service engineer (who has just spent a couple of frantic days on the other side of the planet to solve a complex problem and is not looking forward to the backlog of regular work that is waiting for him back home).
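How little it takes to capture such a report can be sketched in a few lines of Python. The class name, the field names and the machine identifier below are assumptions for illustration; the point is only that every step is recorded at the moment it happens, by the machine, without anyone having to remember it afterwards.

```python
import json
from datetime import datetime, timezone

class TroubleshootingReport:
    """Collects every step of a troubleshooting session as it happens."""

    def __init__(self, machine_id, clock=lambda: datetime.now(timezone.utc)):
        self.machine_id = machine_id
        self.clock = clock  # injectable, so reports can be reproduced in tests
        self.steps = []

    def record(self, node, actor, result):
        """Store one step: which node was visited, who acted, and the outcome."""
        self.steps.append({
            "time": self.clock().isoformat(),
            "node": node,
            "actor": actor,
            "result": result,
        })

    def to_json(self):
        """Serialize the whole session for the manufacturer's service department."""
        return json.dumps({"machine": self.machine_id, "steps": self.steps}, indent=2)

report = TroubleshootingReport("press-07")  # hypothetical machine identifier
report.record("pressure", "machine", "fault")
report.record("leak", "engineer", "yes")
print(report.to_json())
```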

And there is potential for even more. As logging all kinds of operational data is an integral part of today’s complex machinery, predictive maintenance data can be accessed from the flowcharted troubleshooting procedures. This will hugely increase the effectiveness of such procedures, as it enriches the generic troubleshooting strategies with the unique operational history of each individual machine.
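One conceivable way of using that history, sketched here in Python under invented assumptions, is to let the flowchart check the most worn candidate parts first. The part names, rated lifetimes and operating hours are made up, and ranking by wear ratio is just one simple heuristic, not an established method.

```python
def rank_candidates(candidates, hours_since_replacement):
    """Order candidate parts by wear ratio (hours used / rated life), most worn first."""
    def wear(part):
        return hours_since_replacement.get(part["name"], 0) / part["rated_life_h"]
    return sorted(candidates, key=wear, reverse=True)

# Generic knowledge that ships with every machine of this type (hypothetical values).
candidates = [
    {"name": "pump seal", "rated_life_h": 8000},
    {"name": "drive belt", "rated_life_h": 5000},
    {"name": "inlet filter", "rated_life_h": 2000},
]

# Operational history of this one machine, taken from its own logs (hypothetical values).
history = {"pump seal": 7600, "drive belt": 1200, "inlet filter": 1500}

for part in rank_candidates(candidates, history):
    print(part["name"])
```

The same flowchart would therefore take a different route on two machines of the same type, depending on what each of them has been through.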