May 2019
Text by Michael Schaffner

Image: © amtitus/istockphoto.com

Prof. Dr. Michael Schaffner is a lecturer at the FOM University of Applied Sciences for Economics and Management, Berlin. He teaches business administration with a focus on organization and innovation management. At the KompetenzCentrum for Technology and Innovation Management he researches knowledge management in the context of Industry 4.0 and technical communication. As owner of BIOS Dr.-Ing. Schaffner Consulting Ltd he supports change processes as a coach and consultant, for example in technical communication.


michael[at]schaffner.de  


www.schaffner.de

The paradoxical world of numbers

"Weighing the pig does not make it any fatter" – a proverb that proves especially true when dealing with indicator systems. But what opportunities do key performance indicators (KPIs) offer, and how can they be properly calculated and applied?

Key performance indicators are the "language of managers". In order to defend strategies or results, senior management often likes to push the use of KPIs to assess performance. The risks that might arise from not handling data properly are often ignored. However, even some basic knowledge of statistics can help optimize the use of key performance indicators.

KPIs can be tempting. After all, they provide the hope of having everything under control. And preventing a loss of control usually equals less stress. The paradox: just like weighing the pig does not make it any fatter or constantly testing students does not make them any wiser, a survey to determine KPIs does not optimize any process. Should KPIs fail and not reflect the desired result, they can even increase the stress potential. A basic understanding of statistics and a systematic approach to dealing with KPIs can help.

Key performance indicators are "en vogue", and many people hope that by using them, they will create more appreciation for their performance. However, reluctance to invest the necessary effort, or ignorance of the correct procedures, often leads to negligent use of KPIs. The collection, evaluation and interpretation of KPIs often lack rigor, causing both the KPIs and the people presenting them to lose credibility.

Not like this

During the colonial rule in Vietnam, a bounty on rats was announced, paid per rat tail delivered. The clever Vietnamese promptly set up a flourishing rat-breeding business. Conclusion: people do what they are rewarded for. The possible consequences of using economically relevant KPIs in incentive systems must be kept in mind.

For a sector study, the salary structure of employees is examined. In a company with seven employees, the study shows a mean income of EUR 2,393. The data series is: [1500, 1500, 1600, 1650, 1700, 1800, 7000]. The fact that the statistical outlier is the salaried managing director is not disclosed, which distorts the result. For data series with outliers, the median (EUR 1,650) is much more suitable. Conclusion: the evaluation of data requires a suitable calculation basis.

There is an actual correlation between the birth rate and the number of stork pairs in European countries, yet there is no causality between the two: the correlation can simply be explained by the size of the countries (a further variable) [1]. Conclusion: when interpreting data, the cause-and-effect relationship must be closely analyzed.
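The mean-versus-median effect in the salary example can be reproduced with Python's standard `statistics` module:

```python
from statistics import mean, median

# Monthly salaries (EUR); the 7,000 outlier is the salaried managing director.
salaries = [1500, 1500, 1600, 1650, 1700, 1800, 7000]

print(round(mean(salaries)))   # 2393 -- distorted by the outlier
print(median(salaries))        # 1650 -- robust against the outlier
```

With a single extreme value, the arithmetic mean shifts by almost EUR 750, while the median stays at the typical salary level.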

The threat of nitrogen dioxide caused by diesel engines, compared to petrol engines, is deduced from the fact that limits are exceeded much more frequently in Germany. However, this is misleading. While the EU follows WHO recommendations for the nitrogen dioxide limit (40 micrograms), for economic-policy reasons it allows twice the WHO limit for fine dust (40 instead of 20 micrograms) [2]. Conclusion: when interpreting data, tactical influences on the development, collection and evaluation of KPIs need to be considered.

 

KPIs go a long way

Still, KPIs are an important control parameter, providing a compact, quantifiable statement about a condition or a change. They are a bit like GPS coordinates – without a GPS device, a route planner or a map with latitude and longitude, they are not really useful. The navigational concept behind KPIs is called "controlling". The aim and purpose of controlling is to secure and enhance the rationality of management and to support decision-making – in order to optimize the achievement of performance targets [4].

According to the cybernetic view, controlling serves to regulate and manage dynamic systems, in order to maintain certain functions, such as keeping course or adapting the system to new conditions (for example, taking countermeasures against extreme crosswinds) [5]. If the GPS coordinates are imprecise, they will lead to wrong destinations (results), just like a faulty routing algorithm would.

Controlling first needs to establish which KPIs should ideally be used and how they can be interpreted. There is nothing more grotesque than collecting the wrong KPIs and then speculating about them.

Keep this in mind

Performance indicators and measured values must be clear, credible and consistent. But simply working with KPIs does not optimize any process – the proverb above already taught us that. Nor is there a single most important success indicator to watch. On many occasions, several KPIs need to be combined to reach a conclusive interpretation. For example, many people consider the reuse rate of text modules in an editorial system a KPI for economic efficiency. But is that really the case? Should the search effort not also be evaluated? And what about quality? Frequent and possibly unreflected use of text modules can lead to context errors and corrections, which also affect efficiency and the quality of documents.

There are also phenomena that cannot be measured directly. Statistics distinguishes between manifest variables, which can be "observed" (measured, counted) directly, and latent variables, which cannot. Rules of correspondence use manifest variables, also called indicators, to make latent variables measurable. If interpretations are to be credible, the operationalization of non-measurable phenomena must be conclusive. For example, the quality of editorial work can only be measured indirectly through indicators. But what are suitable indicators – spelling mistakes, complaints, revision loops or the customer's text comprehension? And how can these be measured?

If the phenomenon "economic efficiency" is measured only by the KPI "reuse rate", or if wrong indicators are used for the phenomenon "quality", the results are unreliable and appear implausible and inconsistent. All of this leads to the conclusion that a logically structured indicator system (an ordered set of KPIs) is needed, built around these key questions:

 

  • Which goals shall be reached, which processes shall be controlled?
  • How can target achievement be monitored?
  • Which KPIs enable clear monitoring?
  • How can conclusive interpretations lead to sensible corrections?


Avoiding the absurd

To avoid discrepancies, a systematic approach must be taken (chart 01, left). Useful tools for identifying the interests of relevant stakeholders are the stakeholder analysis and the balanced scorecard (BSC). The strategy of the department or company provides the necessary guideline [6]. A holistic workshop is a good setting for this, as it integrates many different perspectives. Afterwards, the established goals and controlling objectives are translated into suitable KPIs. Their calculation basis and evaluation options are recorded in a key data sheet – for example, an exact description of the formula and underlying data, survey methods and periods, statistical comparators and interpretation tools.

During data collection, the special importance of metadata in CCMS and CAT systems becomes apparent: anything that cannot be calculated or filtered is unusable for controlling. Phases one through three of the initial stage are therefore mostly iterative, in order to achieve congruence between interests, KPIs and the possibilities of data collection. The economic aspect must also be kept in mind – the cost of the survey should not outweigh its usefulness. The objective interpretation of KPIs determines subsequent controlling measures. Three basic mistakes are frequently made when designing, collecting and interpreting KPIs:

 

  • Unsuitable KPIs are designed due to false target settings
  • Quality criteria and rules are ignored during data collection
  • Statistical variation is ignored when interpreting data, thus suggesting false causality


Adjusting the direction

Controlling can be viewed from different angles, for example regarding the target level (strategic or operational), the nature of the data (qualitative or quantitative) or the orientation (cost, efficiency or effectiveness controlling). These angles can be combined. A strategic-quantitative KPI in effectiveness controlling might be the number of annual patents, used to measure innovation management. Customer satisfaction with technical documentation would be an operational-qualitative KPI in efficiency controlling (measured, for example, through a scale-based customer survey) – see the table below.

 

When looking at the dimension "orientation of controlling", it can often be observed that companies like to establish budget control and expect it to also control effectiveness (target: "Do the right things!") or efficiency (conserving resources: "Do things right!"). One example is when adherence to the investment budget is supposed to demonstrate the high efficiency of the documentation department, or when adherence to the set purchasing budget for freelance services is supposed to demonstrate a highly efficient purchasing process. Nonsense, of course, but it happens just the same. To avoid contradictions, a BSC workshop should therefore differentiate between the three distinct orientations of controlling (table 1) and link them with controlling targets – "Should budgets, economic efficiency or performance targets be controlled?" [7].

Afterwards, KPIs that fit the BSC logic – and the available survey options – are identified, target values are set, and control measures to ensure adherence to the KPIs are defined. The goal, however, should not be to juggle as many KPIs as possible. This would unbalance survey effort and controlling benefit, make controlling work sloppier, and cause it to lose credibility and controlling effect.

Like soccer without rules

Empirical work refers to the methodical, systematic collection of data and the rule-based production of knowledge. Data can be qualitative (non-numerical) or quantitative (measurable or countable). KPI work is initially quantitative empiricism – for example turnover, error rate, reuse rate. But controlling can also monitor qualitative phenomena like quality, adherence to delivery dates or customer satisfaction. For this purpose, these must be made measurable (operationalization). The correspondence between non-measurable phenomena and measurable indicators is delicate. Determining the quality of editorial work simply by counting orthographic mistakes (because they are easy to measure) is an example of an ill-advised approach. Ensuring that the necessary data can be accessed is the prime requirement.

A central question could be: which metadata is available? This can and should also be a criterion for system decisions. Some CCMS vendors claim that customers do not yet request controlling-relevant metadata. Controlling is usually based on both quantitative and qualitative empirical measures (mixed methods). To protect credibility and transparency, the quality criteria of quantitative and qualitative empirical research should therefore be considered. But first, let's look at data collection.

Samples must be representative

The evaluation of empirical data is done statistically. Descriptive statistics analyzes one-dimensional data series (x1, x2, x3, … xn) by looking at location and dispersion parameters (mean, variation) and determines connections between multi-dimensional data series (for example, two-dimensional: xi and yi) with regard to correlation and causality [8].

In cases where cost or time permit only a partial collection of data (a sample), inductive statistics helps transfer results from the (random) sample to the respective statistical population [9]. This is done with mathematical methods of probability calculus.
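As a sketch of this transfer step – using the normal-approximation confidence interval for a proportion and made-up sample numbers, both of which are assumptions of this example rather than part of the article – the inference from sample to population could look like this:

```python
import math

def proportion_ci(errors: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """95% confidence interval (normal approximation) for an error rate
    observed in a random sample of size n."""
    p = errors / n
    margin = z * math.sqrt(p * (1 - p) / n)
    return (p - margin, p + margin)

# Hypothetical result: 12 faulty modules in a random sample of 100.
low, high = proportion_ci(12, 100)
print(f"{low:.3f} .. {high:.3f}")  # roughly 0.056 .. 0.184
```

The interval shows why stochastic knowledge matters: a 12 percent sample error rate only pins the population rate down to a range, not to a single number.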

If a random sample (for example, the number of terminology errors in a 1-per-mille sample) can provide data about the population, this is called representativeness. There are two distinct methods of sampling:

 

  • Random sampling: every element of the population can become part of the survey by chance.
  • Systematic sampling: elements are chosen in a way in which the frequency distribution of relevant characteristics correlates with the population. However, this requires knowledge about the distribution of characteristics in the population.

In random sampling, a random generator picks elements from the statistical population. For 100,000 text modules and a 1-per-mille random sample to check quality, 100 IDs have to be randomly generated and drawn. This is a relatively simple survey. However, stochastic knowledge is needed to draw conclusions about the statistical population.
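A minimal sketch of such a random draw, assuming hypothetical module IDs and a fixed seed for reproducibility:

```python
import random

# Hypothetical population: 100,000 text-module IDs.
module_ids = range(1, 100_001)

# Draw a 1-per-mille random sample (100 IDs) without replacement.
random.seed(42)  # fixed seed so the draw can be reproduced
sample = random.sample(module_ids, k=100)

print(len(sample))                      # 100
print(len(set(sample)) == len(sample))  # True: no module drawn twice
```

Every element of the population has the same chance of ending up in the sample, which is exactly the defining property of random sampling.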

Drawing every one-thousandth text module would seem the more sensible choice of elements. However, it would have to be ensured that the frequency distribution of relevant characteristics (for example, the respective editor, client or text type) in the sample is identical to that of the statistical population. Project-induced clustering effects can prevent this.

Ideally, quota sampling is used. In this method, characteristics are defined and their rates (frequencies) determined. The chosen characteristics need to correlate with the object under study (module size, editor, language). It can be very difficult to then choose a sample that exhibits the same characteristics as the population to be surveyed.

Metadata might enable such a selection; however, such metadata is often unavailable, or the technical possibilities of the editorial system are insufficient. Should the selection be made by a person, subjective selection bias might distort the result.

Survey quality             

Let's get back to the quality criteria of quantitative and qualitative empiricism. For the collection of quantitative-empirical data, three main criteria are generally distinguished [11]: validity, reliability and objectivity. Economic efficiency is often used as a secondary criterion, understood as the relation between survey effort and controlling value.

Validity indicates the degree to which a survey measures (or counts) what it was constructed to measure (meaning: no systematic errors). It is usually greatly influenced by the available survey options (for example, the quality of metadata). Especially for indicators representing phenomena that cannot be measured directly, results may be blurred. For example: which indicators can measure the customer's text comprehension (such as scale-based individual questions)?

Reliability refers to the credibility of a survey – the extent to which a repeated survey under the same conditions would deliver the same result (meaning: no random errors). Reproducibility requires that external factors do not influence the survey; for example, the original data might be accidentally affected by a specific person (such as an intern). Especially when surveys involve human beings (for example, interviews), reliability can be hard to achieve for psychological reasons – will people state their actual opinion?

Objectivity means that a survey result is not affected in its implementation, evaluation or interpretation by the people conducting the survey: different experts will reach the same results. This is referred to as inter-subjectivity. If, for example, results are manipulated, or interpretations can vary, the results are not objective.

For controlling processes, it is advisable to also keep in mind the quality criteria used in qualitative research; this helps establish plausibility [12]:

 

  • Procedure documentation
  • Argumentative hedging of interpretations
  • Rule conformity
  • Close reference to the subject
  • Communicative validation
  • Triangulation

From a scientific point of view, results without a documented survey method (lacking transparency) are useless. Documentation should include the controlling logic, survey method, formula calculation, and a link to the underlying data or interpretations.

Should the empirical data not prove facts directly and need interpretation, these interpretations have to be explained. This includes references to a common underlying understanding and to the theory behind the interpretation (such as accepted regularities), as well as a conclusive argumentation. Addressing critical objections or alternative interpretations makes the argumentation authentic.

The described approach should be taken systematically and step by step. Deviations from the set rules are possible but need to be explained – for example, why the survey period was changed.

The close reference to the subject requires the survey to be based on a realistic framework of the working world. It should not constitute a laboratory or experimental study.

The validity of qualitative results (for example, interpretations or interviews) can be ensured by allowing feedback on the results (communicative validation). If the survey participants confirm the compiled results, a certain validity can be assumed.

Triangulation is an important research strategy: whenever different approaches, based on different data sources, deliver comparable or plausible results for the same question, validity is strengthened.

The KPI itself is not important

KPIs are often based on one-dimensional data series (x1, x2, x3, … xn), characterized by a mean value. For example: a 10 percent random sample is drawn from 110 documents, their orthographic errors are counted and normalized to 1,000 words. The data series reads: (1.2; 0.7; 1.6; 0.8; 0.9; 3.5; 1.7; 0.9; 1.1; 1.3; 0.75). Due to one very faulty document (error rate 3.5), the arithmetic mean (1.314) might not be a good KPI if this outlier was caused by an atypical situation, such as a student intern. The data series could be trimmed (ignoring the outlier); the trimmed mean is 1.095. The median (1.1) could also be used. The modal value (0.9) is probably not suitable as a KPI, because the most frequent value does not provide a plausible result here.
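These location parameters can be reproduced with Python's `statistics` module (values written with decimal points):

```python
from statistics import mean, median, mode

# Orthographic errors per 1,000 words in an 11-document sample.
errors = [1.2, 0.7, 1.6, 0.8, 0.9, 3.5, 1.7, 0.9, 1.1, 1.3, 0.75]

trimmed = [x for x in errors if x != max(errors)]  # drop the 3.5 outlier

print(round(mean(errors), 3))   # 1.314
print(round(mean(trimmed), 3))  # 1.095
print(median(errors))           # 1.1
print(mode(errors))             # 0.9
```

The four candidates differ noticeably, which is exactly why the choice of location parameter has to be justified in the key data sheet.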

A location parameter alone is often not sufficient to describe the characteristics of a data series satisfactorily. Dispersion parameters look at how far the individual values deviate from the location parameter (for example, the arithmetic mean) – showing how variable the data is. The statistical variation is used to analyze this.

Variation shows the distribution of individual values around the mean. Data series a = (10, 50, 90) and b = (49, 50, 51), for example, have the same mean (x̄ = 50), but their variation differs: data series a shows a high variation, data series b only a low one. One of the most important dispersion parameters is the variance (s²). Data series a has a variance of s² ≈ 1066.7 ((40² + 0² + 40²)/3); data series b has a much lower variance, s² ≈ 0.67 ((1² + 0² + 1²)/3).
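The variance calculation can be checked with the `statistics` module; note that series a is written here as (10, 50, 90) so that both example series actually share the mean 50:

```python
from statistics import mean, pvariance

a = [10, 50, 90]   # same mean as b, but widely spread
b = [49, 50, 51]   # values cluster tightly around the mean

assert mean(a) == mean(b) == 50

print(round(pvariance(a), 3))  # 1066.667
print(round(pvariance(b), 3))  # 0.667
```

`pvariance` divides by n (the population formula used in the text), not by n − 1 as the sample variance would.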

When interpreting a KPI, it is not only important how close a quality indicator is to a set target, but also what the variation of the individual values looks like. With a high variance in a data series examining orthographic errors in documents (the 11-value series mentioned above shows s² = 0.58), a company is likely to have many satisfied customers, but also a few very delighted and a few very disappointed ones – which might lead to customer complaints. If, on the other hand, all 11 values were close to 1.3, the variance would be nearly 0 and the quality consistently high. If 1.3 is also the target value of our KPI, a high customer satisfaction can be assumed (chart 03, p. 61).

Correlation does not equal causality

KPIs can be related in their development: they can develop together or diverge (correlation). If an indicator y develops in strict dependence on another (independent) indicator x, we speak of a cause-and-effect relationship (causality) (x → y). This can be shown quantitatively in two data series (x1, x2, x3, … xn) and (y1, y2, y3, … yn) (for example, document size related to the number of errors). However, qualitative correlations are also possible.

There is a big difference between a mere correlation between two variables x and y and an actual effect of variable x on variable y (causality). Should a correlation between the two variables be apparent, there can be four different reasons for it. The following phenomena apply to both quantitative and qualitative data; the examples, however, focus on non-numerical data.

  1. There is an actual causality from x to y. It does not have to be monocausal; a third variable z may also influence y (x → y and z → y). Example: compliance with terminology "x" influences the editorial text quality "y"; furthermore, the editor's ability to articulate "z" also influences "y".
  2. There is a so-called mutual causality (x -> y and y -> x). Example: degree of prominence of a product "x" and sales figures "y".
  3. There is an indirect correlation, in which variable "x" influences variable "y" only through a third (unobserved) variable "z". Example: implementing an editorial system "x" does not directly influence the customer's satisfaction with the product literature "y". However, since the editorial quality "z" improves, a correlation can be established.
  4. There is a mere coincidental effect (spurious correlation). Example: Improvement of workplace lighting "x" and improvement of text quality "y".
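The indirect-correlation case (3) can be illustrated numerically. In this made-up sketch, x and y are each driven by a hidden third variable z, which is enough to produce a perfect correlation without any direct causal link between them:

```python
def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long data series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    sx = (sum((x - mx) ** 2 for x in xs) / n) ** 0.5
    sy = (sum((y - my) ** 2 for y in ys) / n) ** 0.5
    return cov / (sx * sy)

# A hidden third variable z drives both x and y:
z = list(range(1, 11))
x = [2 * v + 1 for v in z]   # x depends on z
y = [3 * v - 2 for v in z]   # y depends on z, not on x

print(round(pearson_r(x, y), 3))  # 1.0 -- perfect correlation, no direct causality
```

The coefficient alone cannot distinguish this situation from a genuine x → y effect; only an analysis of the mechanism can.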

It is very dangerous to interpret a correlation as causal without thorough analysis, or to construct a clear cause-and-effect relationship from an alleged connection.

With qualitative phenomena – under strict compliance with the qualitative quality criteria – logical reasoning can be used to prevent paradoxical interpretations. The inductive approach derives possible principles from observations; the deductive approach derives specific statements from a general principle (premise leads to conclusion). Multi-perspective discussions can be helpful here, as they ensure a degree of objectivity. Another helpful tool is dialectic argumentation – confronting each thesis with an antithesis.

For quantitative phenomena (data series), correlation calculation is used. The covariance sxy helps determine whether a general correlation exists (for example, between document size and error rate), and the correlation coefficient r determines its direction and strength. For a linear correlation between two observed variables x and y, r can vary between −1 and +1, indicating a positive or negative linear correlation. For r = 0, the two variables are linearly uncorrelated.
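A small worked example – with made-up figures for document size and error count – shows how covariance and correlation coefficient are computed from two data series:

```python
# Hypothetical data: document size (pages) and number of errors found.
size   = [10, 20, 30, 40, 50]
errors = [ 3,  5,  6, 10, 11]

n = len(size)
mx, my = sum(size) / n, sum(errors) / n

# Covariance: do the series move together at all?
cov = sum((x - mx) * (y - my) for x, y in zip(size, errors)) / n

# Correlation coefficient r: direction and strength, normalized to [-1, +1].
sx = (sum((x - mx) ** 2 for x in size) / n) ** 0.5
sy = (sum((y - my) ** 2 for y in errors) / n) ** 0.5
r = cov / (sx * sy)

print(round(cov, 1), round(r, 3))  # 42.0 0.979
```

A positive covariance says only that larger documents tend to contain more errors; r ≈ 0.98 additionally says the linear relationship is very strong.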

Let’s get started

Without going into too much statistical detail, a few simple considerations yield several helpful conclusions for everyday work. The goal should always be to make sure controlling works with reliable KPIs. In summary, this article introduced:

 

  • Strategic development of key performance indicators
  • Conclusive target setting
  • Sensible sampling
  • Observing quality criteria
  • Detecting variations around the indicator
  • Discovering causality

In any case, the cart should not be put before the horse by looking at indicators the system happens to calculate and then trying to find a suitable application for them. The starting point always has to be the target-oriented question "What do we want to control?". Afterwards, a suitable survey method needs to be defined and the survey source provided. A vital part of this is the diligent maintenance of metadata.

Many companies complain: "We receive useless indicators." A closer look often reveals that top-down guidelines are the result of missing bottom-up engagement. This can be counteracted by being proactive: the six-point plan above provides a starting point for convincing superiors of the most suitable indicator system for controlling a certain field. Most managements tend to encourage such initiative.


Resources

[1] Matthews, R. (2000): Storks deliver babies. In: Teaching Statistics.

[2] Lesch’s Kosmos (2018): Feinstaub & Co. – die Wahrheit über das Risiko.

[3] Horváth, P. (2011): Controlling, München: Vahlen.

[4] Küpper, H.-U.; Friedl, G.; Hofmann, C.; Hofmann, Y. (2013): Controlling – Konzeption, Aufgaben, Instrumente; Stuttgart: Schäffer-Poeschel.

[5] Schaffner, Michael (2016): Bewertungs- und Steuerungsmechanismen im Übersetzungsmanagement; Transline-Seminar.

[6] Schaffner, Michael (2013): Das Richtige richtig tun. In: technische kommunikation.

[7] in Anlehnung an: Wunderer, R; Sailer, M. (1987): Die Controlling-Funktion im Personalwesen, in: Personalführung.

[8] Schuldenzucker, U.: Prüfungstraining – Deskriptive Statistik, Stuttgart: Schäffer-Poeschel.

[9] wirtschaftslexikon.gabler.de/Archiv/2213/inferenzstatistik-v10.html

[10] Bleymüller, J. (2012): Statistik für Wirtschaftswissenschaftler, 16. Aufl., München: Vahlen.

[11] Friedrichs, J. (1990): Methoden empirischer Sozialforschung; Opladen: Westdeutscher Verlag.

[12] Flick, U. (2016): Qualitative Sozialforschung – Eine Einführung; Reinbek bei Hamburg: Rowohlt.