I worked for many years with, and for Stratus Technologies, a company that made fault tolerant computers – computers that just didn’t go down. One of the important things that we did at Stratus was longevity testing.
All software errors are not detectable quickly – some take time. Sometimes, just leaving a system to idle for a long time can cause problems. And we used to test for all of those things.
Which is why, when I see stuff like this, it makes me wonder what knowledge we are losing in this mad race towards ‘agile’ and ‘CI/CD’.
Airbus A350 software bug forces airlines to turn planes off and on every 149 hours
The AWD reads, in part
Prompted by in-service events where a loss of communication occurred between some avionics systems and avionics network, analysis has shown that this may occur after 149 hours of continuous aeroplane power-up. Depending on the affected aeroplane systems or equipment, different consequences have been observed and reported by operators, from redundancy loss to complete loss on a specific function hosted on common remote data concentrator and core processing input/output modules.
and this:
Required Action(s) and Compliance Time(s):
Repetitive Power Cycle (Reset):
(1) Within 30 days after 01 August 2017 [the effective date of the original issue of this AD], and, thereafter, at intervals not to exceed 149 hours of continuous power-up (as defined in the AOT), accomplish an on ground power cycle in accordance with the instructions of the AOT .
What is ridiculous about this particular issue is that it comes on the heals of Boeing 787 software bug can shut down planes’ generators IN FLIGHT, a bug where the generators would shutdown after 250 days of continuous operation, a problem that prompted this AWD!
Come on Airbus, my Windows PC has been up longer than your dreamliner!