Detecting System Failures in Autonomous Systems
Engineering | November 10, 2023. MIT engineers have developed an automated sampling algorithm that can quickly identify a range of potential failures in autonomous systems and suggest repairs to avoid system breakdowns. The algorithm also weighs candidate repairs using a cost-utility score.
From vehicle collision avoidance to airline scheduling systems to power supply grids, many of the services we rely on are managed by computers. As these autonomous systems grow in complexity and ubiquity, so too could the ways in which they fail. Now, MIT engineers have developed an approach that can be paired with any autonomous system to quickly identify a range of potential failures in that system before it is deployed in the real world. What's more, the approach can suggest repairs to avoid those system breakdowns.
The team has shown that the approach can root out failures in a variety of simulated autonomous systems, including small and large power grid networks, an aircraft collision avoidance system, a team of rescue drones, and a robotic manipulator. In each of these systems, the new approach, in the form of an automated sampling algorithm, quickly identifies a range of likely failures as well as repairs to avoid those failures.
The new algorithm takes a different tack from other automated searches, which are designed to spot the most severe failures in a system. These approaches, the team says, could miss subtler though significant vulnerabilities that the new algorithm can catch.
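One way to picture that difference is with likelihood-weighted sampling: rather than returning only the single worst case, a sampler can keep every scenario whose severity, weighted by how likely it is to occur, crosses a threshold. The sketch below is a hypothetical illustration of this idea (the simulator, the prior, and all parameter names are invented for the example), not the team's actual algorithm:

```python
import random

def simulate(wind_speed):
    """Hypothetical toy simulator: severity of a drone failure (0 = none)."""
    return max(0.0, wind_speed - 15.0) ** 2

def likelihood(wind_speed):
    """Hypothetical prior: extreme winds are rarer than moderate ones."""
    return 1.0 / (1.0 + wind_speed)

def sample_failures(n, threshold, seed=0):
    """Sample scenarios and keep all whose likelihood-weighted severity
    exceeds the threshold -- a range of failures, not just the worst one."""
    rng = random.Random(seed)
    found = []
    for _ in range(n):
        wind = rng.uniform(0.0, 40.0)
        score = likelihood(wind) * simulate(wind)
        if score > threshold:
            found.append((round(wind, 1), round(score, 3)))
    return found

failures = sample_failures(n=1000, threshold=1.0)
```

Because the score discounts severity by rarity, moderately severe but plausible scenarios can outrank catastrophic-but-vanishingly-rare ones, which is the kind of subtle vulnerability a worst-case-only search would miss.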
"In reality, there's a whole range of messiness that could happen for these more complex systems," says Charles Dawson, a graduate student in MIT's Department of Aeronautics and Astronautics. "We want to be able to trust these systems to drive us around, or fly an aircraft, or manage a power grid. It's really important to know their limits and in what cases they're likely to fail."
Dawson and Chuchu Fan, assistant professor of aeronautics and astronautics at MIT, are presenting their work this week at the Conference on Robot Learning in Atlanta.
Sensitivity over adversaries
In 2021, a major system meltdown in Texas got Fan and Dawson thinking. In February of that year, winter storms rolled through the state, bringing unexpectedly frigid temperatures that set off failures across the power grid. The crisis left more than 4.5 million homes and businesses without power for multiple days. The system-wide breakdown made for the worst energy crisis in Texas' history.
"That was a pretty major failure that made me wonder whether we could have predicted it beforehand," Dawson says. "Could we use our knowledge of the physics of the electricity grid to understand where its weak points could be, and then target upgrades and software fixes to strengthen those vulnerabilities before something catastrophic happened?"
Dawson and Fan's work focuses on robotic systems and finding ways to make them more resilient in their environment. Prompted in part by the Texas power crisis, they set out to expand their scope, to spot and fix failures in other more complex, large-scale autonomous systems. To do so, they realized they would have to shift the conventional approach to finding failures.
Designers often test the safety of autonomous systems by identifying their most likely, most severe failures. They start with a computer simulation of the system that represents its underlying physics and all the variables that could influence system performance. They then have the computer scan through those variables and identify the most likely, most severe system breakdowns, or so-called "adversaries" -- events or combinations of events that could result in failure.
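As an illustration of that conventional workflow, the toy sketch below scans a grid of environment variables through a simulator and returns only the single most severe breakdown. Every function name and parameter here is hypothetical, standing in for a real physics-based model rather than reproducing the MIT team's setup:

```python
import itertools

def simulate(temperature, demand):
    """Hypothetical toy simulator: returns failure severity (0 = no failure).
    Stands in for a physics model of, e.g., a power grid under load."""
    return max(0.0, demand - 1.0) + max(0.0, -temperature / 10.0)

def find_worst_adversary(temps, demands):
    """Scan all variable combinations; return the most severe breakdown."""
    worst = max(
        itertools.product(temps, demands),
        key=lambda scenario: simulate(*scenario),
    )
    return worst, simulate(*worst)

temps = [-20, -10, 0, 10, 20]    # ambient temperature (deg C)
demands = [0.8, 1.0, 1.2, 1.5]   # load relative to grid capacity
adversary, severity = find_worst_adversary(temps, demands)
```

In this toy case the search lands on the coldest, highest-demand scenario. The limitation the article points to is visible here: everything except that single worst combination is discarded, no matter how likely or instructive it might be.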