“I can predict things. I can improve the uptime and the reliability. I can intervene and cause a better outcome before there’s a problem.”
In previous articles I described how to consider availability in Performance Based Contracts (PBCs). In this companion article, we are now going to look at reliability and how we describe it within our PBCs.
As with availability, the reliability of a system or service is one of the highest priorities for both buyer and seller. But what is reliability and how do we describe it within our PBC?
In general terms reliability is defined as “The probability that an item can perform its intended function for a specified interval under stated conditions.” Using this definition, reliability requirements typically reflect either success or failure, expressed in one of three ways:
- as a whole number (e.g. the number of successes or failures; 1, 10, 100);
- as a percentage (e.g. 95% success or fail); or
- as a rate (e.g. 1 failure per million operations).
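The three forms above are different views of the same record of attempts and failures. A minimal sketch, assuming a hypothetical function name and made-up figures that are not taken from any real contract:

```python
def summarise_reliability(attempts: int, failures: int) -> dict:
    """Express one record of attempts and failures as a count, a percentage and a rate."""
    if attempts <= 0:
        raise ValueError("attempts must be positive")
    if not 0 <= failures <= attempts:
        raise ValueError("failures must be between 0 and attempts")
    successes = attempts - failures
    return {
        "successes": successes,                                   # whole number
        "failures": failures,                                     # whole number
        "success_percent": 100.0 * successes / attempts,          # percentage
        "failures_per_million": 1_000_000 * failures / attempts,  # rate
    }


# Illustrative figures only: 2 failures in 2,000,000 operations.
summary = summarise_reliability(attempts=2_000_000, failures=2)
```

Whichever form the contract uses, it helps to state the underlying attempt and failure counts so that buyer and seller can convert between them without dispute.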
Moreover, these measures of reliability represent either:
- end-to-end, all-encompassing measures such as “mission success rate”, defined as the percentage of missions successfully completed out of all missions undertaken; or
- technical specifications, such as Mean Time Between Failure (MTBF) or Mean Time Between Critical Failure (MTBCF), defined as the average time (measured in units such as operating hours, distance, number of operations, etc.) between failures.
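Both kinds of measure reduce to simple arithmetic once the counts are agreed. The sketch below assumes hypothetical function names and illustrative numbers, and uses the common estimate of MTBF as total usage divided by the number of failures. MTBCF uses the same calculation with only critical failures counted.

```python
from typing import Optional


def mission_success_rate(missions_successful: int, missions_undertaken: int) -> float:
    """Percentage of missions successfully completed out of all missions undertaken."""
    if missions_undertaken <= 0:
        raise ValueError("missions_undertaken must be positive")
    return 100.0 * missions_successful / missions_undertaken


def mean_time_between_failures(total_usage: float, failure_count: int) -> Optional[float]:
    """Average usage (operating hours, distance, operations, etc.) between failures."""
    if failure_count == 0:
        return None  # no failures observed, so the average is undefined
    return total_usage / failure_count


# Illustrative figures only.
success = mission_success_rate(missions_successful=190, missions_undertaken=200)
mtbf = mean_time_between_failures(total_usage=5000.0, failure_count=4)   # all failures
mtbcf = mean_time_between_failures(total_usage=5000.0, failure_count=1)  # critical failures only
```

With the same 5,000 operating hours, counting all four failures gives a much shorter interval than counting only the single critical failure, which is why the contract needs to say which failures count.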
While the term “mission” is being used here, this simply refers to the intended function or outcome. The mission could be the processing of a financial transaction, delivery of a meal, or completion of a flight from one city to another. The key to defining a successful mission is that the seller has met the buyer’s (or end customer’s) need.
In terms of defining whether the measure is success or failure, people vary on which is better to use. I always prefer to use success, regardless of whether I am representing the buyer or seller, since it creates a positive discussion focused on success (as opposed to negative discussions on failures). The only exception is for very high reliability systems, such as telecommunication / computer equipment, that have ultra-high percentages such as 99.995%. Here the numbers are so high that both buyer and seller need to focus on the failures / exceptions as opposed to success.
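To see why failures become the more useful view at very high percentages, it can help to convert the success figure into an expected number of failures. This is a rough sketch with a hypothetical function name and an assumed volume of 100,000 operations. Decimal arithmetic is used only to avoid floating-point rounding noise.

```python
from decimal import Decimal


def expected_failures(success_percent: str, operations: int) -> Decimal:
    """Number of failures implied by a success percentage over a given number of operations."""
    failure_fraction = (Decimal(100) - Decimal(success_percent)) / Decimal(100)
    return failure_fraction * operations


# Assumed volume for illustration: 100,000 operations.
at_target = expected_failures("99.995", 100_000)  # 5 failures
slightly_lower = expected_failures("99.990", 100_000)  # 10 failures
```

The two success percentages differ only in the third decimal place, yet the second implies twice as many failures as the first.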
Essential to defining reliability is the criticality of the failure to the essential operation of the product or service. Failures are sometimes referred to as either:
- Mission Critical Failure – which makes the product / service unusable (i.e. the ‘mission’ cannot continue); or
- Logistics Failure – which may degrade the product / service, although it can still be used / delivered, albeit at a lower quality or in a slower timeframe.
As you can see from these descriptions, the type of failure becomes central to how we deal with a reliability performance measure. For example, consider a vehicle, whether motorbike, car, bus or truck. All these vehicles have lights that enable them to be driven safely at night or in low visibility conditions. So what if the lights on your car did not work and you had to drive to the shops for food on a beautiful clear, sunny day? How would you see this failure? A Mission Critical Failure or Logistics Failure? Since you could still drive safely to the shops, this is possibly only a Logistics Failure. However, what if it was night-time (i.e. dark outside) or there was a long tunnel between your house and the shops? How would you consider this now? Mission Critical Failure or Logistics Failure?
Alternatively, what if your engine did not start or the radio / CD / MP3 player did not work? Typically, any failure to an engine would automatically be considered a Mission Critical Failure since the primary role of a vehicle is movement. However, the failure of the radio / CD / MP3 player would only be considered a Logistics Failure since it is not required to achieve the mission (although those of us with small children on long road trips may dispute this!).
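The vehicle example shows that the same failure can fall into either category depending on the conditions of the mission. A minimal sketch of that idea, where the component names, the condition flags and the rules themselves are all hypothetical and chosen only to mirror the example:

```python
from enum import Enum


class FailureType(Enum):
    MISSION_CRITICAL = "Mission Critical Failure"
    LOGISTICS = "Logistics Failure"


# Hypothetical rules: for each component, a test of the mission conditions
# under which its failure would stop the mission.
CRITICAL_WHEN = {
    "engine": lambda conditions: True,  # movement is the primary role of a vehicle
    "lights": lambda conditions: conditions["dark"] or conditions["tunnel_on_route"],
    "radio": lambda conditions: False,  # not required to achieve the mission
}


def classify_failure(component: str, conditions: dict) -> FailureType:
    """Classify a component failure for a given set of mission conditions."""
    is_critical = CRITICAL_WHEN[component](conditions)
    return FailureType.MISSION_CRITICAL if is_critical else FailureType.LOGISTICS


sunny_day = {"dark": False, "tunnel_on_route": False}
night_time = {"dark": True, "tunnel_on_route": False}

lights_by_day = classify_failure("lights", sunny_day)
lights_by_night = classify_failure("lights", night_time)
```

Here `lights_by_day` is a Logistics Failure and `lights_by_night` is a Mission Critical Failure, even though the failed component is the same. Writing the conditions down in advance is what removes the argument later.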
To make it simpler to understand and to avoid disputes, especially for large, highly complex systems, many organisations define the systems that, if not working, would automatically be considered a mission failure. For example, in the airline industry the Master Minimum Equipment List (MMEL) serves this purpose: it identifies the equipment that may be inoperative, under specified conditions, while the aircraft is still considered to be in a working (sometimes called operational or serviceable) state, so equipment outside that list needs to be in working order. In the Australian Department of Defence, more generically, this list is known as the Mission Critical Item List (MCIL).
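Once such a list is agreed, checking whether a system is serviceable becomes a simple lookup rather than a debate. The sketch below assumes a hypothetical list for a road vehicle. The item names are illustrative and are not drawn from any real MMEL or MCIL.

```python
# Hypothetical mission critical item list for a road vehicle.
MISSION_CRITICAL_ITEMS = frozenset({"engine", "brakes", "steering"})


def critical_failures(failed_items) -> list:
    """Return the failed items that appear on the mission critical list."""
    return sorted(MISSION_CRITICAL_ITEMS.intersection(failed_items))


def is_serviceable(failed_items) -> bool:
    """A system is serviceable if none of its failed items is on the list."""
    return MISSION_CRITICAL_ITEMS.isdisjoint(failed_items)


radio_only = is_serviceable(["radio"])              # True: nothing critical has failed
brakes_failed = is_serviceable(["radio", "brakes"])  # False: brakes are on the list
```

Any failure outside the list would then be treated as a Logistics Failure by default.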
Now that we have defined what reliability is, in the next article we will consider the commercial aspects of reliability by looking at who is responsible for failure.