Building A Fault Tolerant Rebreather: Our Path to Simplicity

Divesoft’s factory instructor trainer Jakub Šimánek presents the design philosophy and considerations behind the creation of its family of Liberty rebreathers, which feature fully-redundant electronics that keep the rebreather operating despite one or more electronic failures.

Published

February 2, 2021

InDEPTH

by Jakub Šimánek
Photos courtesy of Divesoft unless otherwise noted. Illustrations by Aleš Procháska. Header photo by Martin Strmiska.

Full Disclosure: Divesoft is a sponsor of InDepth.

With open circuit diving, there is a general consensus that what is simple is safer and more functional, and the community has adopted simple, yet sufficiently redundant configurations. However, as rebreathers have come to the fore, it is time to ask what is necessary and simplest, and what can no longer be simplified.

The simplest way is not always the best way

The simplest, functionally ingenious, and, in principle, trouble-free rebreather is undoubtedly the oxygen rebreather. Simplicity itself. A breathing bag, CO₂ absorber, oxygen bottle, oxygen regulator, breathing hose, and directional mouthpiece are all you need. I wish this were the final solution! Unfortunately, as we know, with only oxygen we cannot safely dive very deep, and it is necessary to complicate the device quite a bit to do so.

Rebreathers have developed through semi-closed (SCR), electronic closed circuit (eCCR), manual closed circuit (mCCR), and passive semi-closed (PSCR) designs. We already know that if we want a safe device that allows us to dive deep, we cannot do without electronics, because all mechanical solutions have their limitations. SCR wastes too much gas; PSCR is much more economical, but its decompression is not ideal. Manual CCR is as powerful as electronically controlled closed circuit in terms of gas savings and decompression efficiency, but has depth limitations because the reference ports in the first stage regulator, which sense ambient pressure, are blocked. As a result, the intermediate pressure, and ultimately the low pressure delivered by the regulator, does not increase with depth.

While an open circuit regulator valve delivers gas at a fixed pressure above ambient pressure, an mCCR constant flow valve (also known as a leaky valve, trickle valve, or fixed orifice) delivers gas, in this case oxygen, at a fixed pressure independent of depth. Accordingly, the valve will deliver oxygen only until the ambient pressure equals the pressure of the gas exiting the flow valve. So, for example, if the flow valve delivers a pressure of 10 bar, the depth will be limited to 90 m/295 ft, where the ambient pressure is 10 bar absolute.
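The arithmetic behind that example can be written out as a short sketch. It assumes sea-level surface pressure of 1 bar and the common rule of thumb of 1 bar per 10 m of water; the function names are illustrative and are not taken from any rebreather firmware.

```python
# Rough depth-limit arithmetic for a constant flow valve (illustrative only).
SURFACE_PRESSURE_BAR = 1.0   # assumption: sea-level atmospheric pressure
METRES_PER_BAR = 10.0        # assumption: rule of thumb, 10 m of water per bar


def ambient_pressure_bar(depth_m: float) -> float:
    """Absolute ambient pressure at a given depth."""
    return SURFACE_PRESSURE_BAR + depth_m / METRES_PER_BAR


def max_depth_m(valve_pressure_bar: float) -> float:
    """Depth at which ambient pressure equals the fixed valve pressure."""
    return (valve_pressure_bar - SURFACE_PRESSURE_BAR) * METRES_PER_BAR


def oxygen_still_flows(depth_m: float, valve_pressure_bar: float) -> bool:
    """The valve can only deliver gas while its pressure exceeds ambient."""
    return valve_pressure_bar > ambient_pressure_bar(depth_m)


if __name__ == "__main__":
    print(max_depth_m(10.0))               # depth limit for a 10 bar valve
    print(oxygen_still_flows(60.0, 10.0))  # shallower than the limit
    print(oxygen_still_flows(90.0, 10.0))  # at the limit, flow stops
```

This only reproduces the pressure balance described in the text; it says nothing about how the flow rate changes as the diver approaches that depth.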

This is not the only problem with mCCR. mCCRs have been designed for maximum simplicity and with the elimination of “dangerous” electronics (excluding oxygen sensors) in mind. This moves the most critical part, i.e., the control of the partial pressure of oxygen, to the constant flow nozzle with manual addition by the diver. However, the diver is a human who can suffer from bad moods and poor concentration. Divers can underestimate the situation and have limited ability to concentrate on multiple things at once, especially in ‘critical’ situations, and they are affected by limited visibility, cramped spaces, inert gas effects, great depths, and so on.

We want to avoid electronics and take control, but we must acknowledge that the weakest link in the whole chain is ourselves. How does our human error rate compare with that of electronics? How accurate are our oxygen injection calculations versus the machine’s? Aren’t human factors the hottest topic of recent times in the diving world? The answers are obvious when we consider that a person makes three to six mistakes per hour.¹

Personally, not counting the testing of pre-production prototypes, I have not had a dive computer fail underwater since 1996. There are no statistics on underwater electronics failure, but I would argue that the ratio of the human error rate to the error rate of electronic dive computers is quite high. And what is the difference between a dive computer and a rebreather’s control electronics? Only that the control electronics of the rebreather process information from multiple sources and have a software algorithm for controlling the solenoid.

Yet some people choose mCCR anyway. Research² has even shown that people are more willing to trust other people and forgive inevitable human mistakes rather than trusting computers.

Machines can’t think, and we can’t do that for them. We have to think for ourselves, but we can entrust straightforward calculations to computers. They are much better at it than we are.

Important Things Must Be Redundant

Electronic devices have become an integral part of our lives. We wake up with them, we move with them on the ground, in the air, in space, and underwater. We spend the whole day with them, and we fall asleep again with them in the night. Some electronic devices work 24 hours a day, 7 days a week.

Electronics help us in critical situations. When we are driving a car, ABS and other electronic systems assist us, and we let electronics navigate us from point A to point B. We entrust our lives to electronics when we travel by airliner, and today’s airliners cannot function without them at all. The development of electronics is constantly advancing, but we must admit that electronics, like any other part of a device, can fail. If it is a mobile phone or a television, it is not usually a tragedy. But if it is a hard drive on which you have your family photos or the results of your many years of work, or a device on which human life depends, the data or device must have a reliable backup.

We all know from both diving and everyday life that those who do not back up their data will be sorry when they lose it. Those who do not have a backup plan in diving, such as a backup source of gas, can lose their lives. We are talking about redundancy.

Redundancy when diving with open circuit means having two first and two second stages, a buddy team, enough gas for the diver and their partner to surface, a buoyancy compensator that is backed up by a dry suit, and any measuring device, such as a computer, backed up by a second measuring device. So there is partly a kind of team backup. CCR failures cannot necessarily be solved by team backup; a complete OC bailout ascent should be the last solution when there is no other option left. A CCR cannot be backed up by a team member, and it cannot be shared by two divers. The rebreather must contain its own backup.

What is a Fault Tolerant System?

Fault tolerance is a feature that allows a system (often a computer system) to continue to work properly even if one of its components fails. It is desirable for systems that require high availability or provide vital functions, such as a space shuttle or an aircraft, so why not a rebreather? Life depends on it just as much.

As already mentioned, a rebreather is a complex system. What does such a system consist of? The electronic control unit receives information from several sensors, evaluates the data, and calculates the appropriate next action, such as firing the injector or adjusting the decompression calculation. The inputs are as follows: a pressure sensor, an oxygen sensor, an RTC (real-time clock), and possibly additional helium or CO₂ sensors. We also need a power source, i.e., a battery, and a user interface in the form of a handset or a simple display of ppO₂ values. Those are the basics.
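To make the "read sensors, evaluate, act" cycle concrete, here is a minimal sketch of one control step. It is not the Liberty's algorithm: the setpoint, the deadband, the simple averaging of readings, and the name `control_step` are all assumptions chosen for illustration.

```python
# One step of a hypothetical ppO2 control loop (illustrative, not real firmware).

def control_step(ppo2_readings_bar, setpoint_bar=1.2, deadband_bar=0.05):
    """Return (measured ppO2, whether to fire the oxygen solenoid).

    setpoint_bar and deadband_bar are assumed example values.
    """
    if not ppo2_readings_bar:
        raise ValueError("no oxygen sensor readings available")
    # Evaluate: here simply the mean of the available sensor readings.
    measured = sum(ppo2_readings_bar) / len(ppo2_readings_bar)
    # Act: inject oxygen only when clearly below the setpoint.
    fire_solenoid = measured < setpoint_bar - deadband_bar
    return measured, fire_solenoid


if __name__ == "__main__":
    print(control_step([1.00, 1.02]))  # below the setpoint: inject
    print(control_step([1.20, 1.22]))  # at the setpoint: do nothing
```

A real unit would also weigh depth, time, and sensor health before acting; the point here is only that each step depends on every input in the chain being available.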

Therefore, in order to appreciate what a fault tolerant rebreather is, let’s first look at a standard eCCR as shown in Fig. 1. The system is very vulnerable. A single error can cause us to either execute special skills or go straight to bailout. Try to imagine a failure of a battery, a single sensor, a solenoid, or a control unit.

The next diagram illustrates a fault tolerant rebreather schema, namely the Divesoft Liberty CCR (Fig. 2).

*Fig. 2* The Divesoft Liberty Rebreather.

We start the dive with a fully functional rebreather. The loop is not very different from a conventional rebreather, except that the gas addition elements are doubled (a diluent manual add valve (MAV) and automatic diluent valve (ADV), an oxygen MAV and two solenoids). In addition, the electronics are doubled: two control units, two main displays, two secondary displays (a HUD and a buddy display (BD)), and doubled solenoids, batteries, and sensors. These are actually two self-sufficient systems that are interconnected and can communicate with each other.

Let’s take out the individual components one by one and observe how robust and resilient the fault tolerant system is. Let’s say I just cut off the handset cable in the wreck. Nothing happens: I still have a second handset that shows me all the data, and I am able to control the whole device. In addition, water does not leak into the device, because the cables are protected by watertight partitions on all inputs and outputs.

Oops, I broke the second handset against a rock. Still, nothing happens. All the sensors work and I am able to read the ppO₂ from the HUD. Now one of the batteries is exhausted or has failed. This means the loss of one control unit (CU). But I still have the second CU, and therefore two O₂ sensors, a depth sensor, and a solenoid. I can still continue on the unit and return safely to the surface without any need to intervene.

How To Make A System Fault Tolerant

The fault tolerant rebreather design consists of three basic parts:

1. A robust software solution

2. Hardware redundancy

3. A fault detection system

Complex software for modern CCRs consists of individual software modules. The software modules (tasks) are: ppO₂ measurement, ambient pressure measurement, ppO₂ regulation, decompression calculation, and the user interface. Which of these components probably experiences the most errors? Every software engineer sees it right away: it’s the user interface, the most complex part of almost any software. What can we do about that?

The solution is simple. We move the user interface into a separate hardware unit, i.e., a handset. It communicates with the rest of the system, i.e., with the control unit, via the bus in a fixed, formal, precisely defined manner. In the event of a fault, only the handset stops working, while the vital core of the system goes on (Fig. 4).

Because the control unit (CU) does not include a user interface, it can be programmatically divided into small, simple modules that are easier to verify.
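As a rough illustration of what a "fixed, formal, precisely defined" exchange between handset and control unit could look like, the sketch below uses an invented four-byte frame with a checksum. The frame layout, the command ID, and the accepted setpoint range are all hypothetical and are not the Liberty's actual bus protocol.

```python
import struct

# Hypothetical frame: 1-byte command, 2-byte value (little-endian), 1-byte checksum.
FRAME = struct.Struct("<BHB")
CMD_SET_SETPOINT = 0x01  # assumed command ID; value is a setpoint in millibar


def checksum(payload: bytes) -> int:
    """Simple additive checksum over the payload bytes."""
    return sum(payload) & 0xFF


def encode(cmd: int, value: int) -> bytes:
    """Build a frame the way a handset might."""
    body = struct.pack("<BH", cmd, value)
    return body + bytes([checksum(body)])


def decode(frame: bytes):
    """Return (cmd, value), or None if the frame is malformed."""
    if len(frame) != FRAME.size:
        return None
    cmd, value, chk = FRAME.unpack(frame)
    if chk != checksum(frame[:-1]):
        return None
    return cmd, value


class ControlUnit:
    """Core state lives here and changes only through validated frames."""

    def __init__(self):
        self.setpoint_mbar = 1200  # assumed default

    def on_frame(self, frame: bytes) -> bool:
        msg = decode(frame)
        if msg is None:
            return False  # garbage from a faulty handset is simply ignored
        cmd, value = msg
        if cmd == CMD_SET_SETPOINT and 400 <= value <= 1600:  # assumed limits
            self.setpoint_mbar = value
            return True
        return False


if __name__ == "__main__":
    cu = ControlUnit()
    print(cu.on_frame(encode(CMD_SET_SETPOINT, 1300)), cu.setpoint_mbar)
    print(cu.on_frame(b"\xff\xff"), cu.setpoint_mbar)
```

The design point is that the control unit never trusts the handset: anything that is not a well-formed, in-range request is dropped, and the control unit keeps running with its last valid state.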

The second principle is hardware redundancy. For hardware redundancy to really work as fault tolerant, it requires a sophisticated system.

Figure 5 shows a primitive rebreather without redundancy. One fault is enough to make it inoperative.

*Fig.5: A control unit, sensor and solenoid*

We improve the system by placing three oxygen sensors. In addition to these, we connect a backup monitoring system, and the battery is doubled (Fig.6).

Figure 7 shows another improvement: a separate oxygen sensor for the backup. As in the previous case, the backup is short-circuit-proof on the measuring bus. Together these measures form a fail-safe system. If any one element fails, the system may not necessarily remain functional, but we will immediately learn about the problem and go to bailout.

Further improvements: everything is completely doubled. In case of any one fault, the system remains functional (Fig.8). In this case, we can already talk about a fault tolerant system.
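The benefit of doubling everything can be sketched with basic probability. The example below assumes that component failures are independent and uses placeholder failure probabilities that are purely illustrative, not measured values for any rebreather.

```python
from math import prod


def chain_failure_probability(component_failure_probs):
    """A single chain (battery, sensor, solenoid, CU) fails if ANY part fails."""
    return 1.0 - prod(1.0 - p for p in component_failure_probs)


def duplicated_failure_probability(chain_p):
    """Two independent, identical chains fail only if BOTH fail."""
    return chain_p ** 2


if __name__ == "__main__":
    # Placeholder per-dive failure probabilities, for illustration only.
    parts = [0.01, 0.01, 0.01, 0.01]
    single = chain_failure_probability(parts)
    doubled = duplicated_failure_probability(single)
    print(f"single chain:     {single:.4f}")
    print(f"duplicated chain: {doubled:.6f}")
```

The independence assumption is the weak spot: a common cause, such as flooding that shorts both halves at once, would defeat the calculation, which is why the isolation between the two systems matters as much as the duplication itself.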

Third principle: a fault detection system. If I only have two oxygen sensors (Fig. 9) in each part of the system, it is not possible to automatically evaluate which one is defective (although a diver can decide manually, because he or she sees all four sensors). How do we solve the problem?

Add three plus three sensors. This will, of course, double the cost of sensors (Fig. 10).

Alternatively, enable each system to see all four sensors. The problem is that it is not at all easy to prevent a short circuit in one system (e.g., from flooding with salt water) from shorting the other system at the same time (Fig. 11).

Or, simply connect both systems via a bus, through which they can send the measured values from the sensors to each other. In the event of a short circuit, the bus is much easier to disconnect than analog measurement lines.
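The difference between two and four visible sensors can be illustrated with a small voting sketch. It assumes a median-based vote and an arbitrary agreement tolerance; the sensor names, the tolerance, and the `vote` function are hypothetical and do not describe the Liberty's actual fault detection.

```python
from statistics import median

TOLERANCE_BAR = 0.1  # assumed agreement window, for illustration only


def vote(readings, tolerance=TOLERANCE_BAR):
    """Return (consensus ppO2 or None, list of suspect sensor names).

    readings maps a sensor name to its ppO2 reading in bar.
    """
    if not readings:
        raise ValueError("no sensor readings available")
    names = sorted(readings)
    values = [readings[n] for n in names]
    if len(values) < 3:
        # Two sensors can reveal a disagreement but not which one is wrong.
        if max(values) - min(values) > tolerance:
            return None, names
        return sum(values) / len(values), []
    mid = median(values)
    suspects = [n for n in names if abs(readings[n] - mid) > tolerance]
    trusted = [readings[n] for n in names if n not in suspects]
    if not trusted:
        return None, names
    return sum(trusted) / len(trusted), suspects


if __name__ == "__main__":
    own = {"A1": 1.20, "A2": 0.70}    # this control unit's own sensors
    peer = {"B1": 1.21, "B2": 1.19}   # values received over the bus
    print(vote(own))                  # disagreement, culprit unknown
    print(vote({**own, **peer}))      # four values: A2 is singled out
```

With only its own pair, a control unit can do no more than raise an alarm; once the peer's values arrive over the bus, the same code can isolate the outlier and keep regulating on the remaining sensors.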

Other sensors (depth, temperature, etc.) are treated in the same way, effectively doubling them.

Internal complexity does not mean complexity for the user.

If we look at the fault tolerant rebreather scheme, one may think that the system is too complex, but that’s just on the inside. We have already shown that an increased number of components does not lead to a greater risk; on the contrary, in the case of a completely redundant system, it leads to greater safety.

Externally, on the other hand, the control of such a system is very simple and the user does not feel the internal complexity at all. It’s like driving a modern car full of the latest cutting-edge technology, and all we need to do is step on the gas and turn the steering wheel and enjoy the d(r)ive.

Sources:

1. Edkins, G.; “Human Factors, Human Error and the Role of Bad Luck in Incident Investigations,” May 23, 2016.
2. Prahl, Andrew, and Van Swol, Lyn M.; “The Computer Said I Should: How Does Receiving Advice From a Computer Differ From Receiving Advice From a Human,” presented at the 66th Annual International Communication Association Conference, Fukuoka, Japan, June 9-13, 2016.
3. Procháska, Aleš; “Principles of Fault Tolerant Rebreather,” 2016, PowerPoint presentation.

Dive Deeper:

aquaCORPS #12 Survivors: Designing a Redundant Life-Support System by William C. Stone (1995)

Jakub Šimánek graduated as a biology and physical education teacher from Charles University in Prague. Thanks to his father, he has been diving since childhood. He has carried his experience from teaching, biology, diving, and sports over into his work as an instructor. Since 2012, he has been working at Divesoft, where he participates in the analysis and development of diving equipment, mainly rebreathers. He has been working as a Factory Instructor Trainer since 2014 and is an author of training procedures for the Liberty. He is currently actively involved in developing and diving with bailout rebreathers.