Detecting System Failures in Autonomous Systems

Category Engineering

tldr #

MIT engineers have developed an automated sampling algorithm which can be used to quickly identify a range of potential failures in autonomous systems, and suggest repairs to avoid system breakdowns. The algorithm optimizes potential solutions to the system breakdowns by predicting accuracy based on a cost-utility score.


content #

From vehicle collision avoidance to airline scheduling systems to power supply grids, many of the services we rely on are managed by computers. As these autonomous systems grow in complexity and ubiquity, so too could the ways in which they fail. Now, MIT engineers have developed an approach that can be paired with any autonomous system to quickly identify a range of potential failures in that system before it is deployed in the real world. What's more, the approach can find fixes to the failures, and suggest repairs to avoid system breakdowns.

Modern machine learning algorithms and AI-driven systems have been incorporated into many applications, such as self-driving cars and robotic surgery.

The team has shown that the approach can root out failures in a variety of simulated autonomous systems, including a small and large power grid network, an aircraft collision avoidance system, a team of rescue drones, and a robotic manipulator. In each of the systems, the new approach, in the form of an automated sampling algorithm, quickly identifies a range of likely failures as well as repairs to avoid those failures.

A research group in MIT's Department of Aeronautics and Astronautics created a method that uses simulations of autonomous systems to identify and fix system failures before they occur.

The new algorithm takes a different tack from other automated searches, which are designed to spot the most severe failures in a system. These approaches, the team says, could miss subtler though significant vulnerabilities that the new algorithm can catch.
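The contrast between a worst-case search and a sampling-based search can be illustrated with a toy example. The sketch below is not the team's algorithm: it swaps in plain rejection sampling over a single made-up disturbance variable, and the function `failure_severity`, its thresholds, and the standard normal distribution of disturbances are all assumptions chosen for illustration.

```python
import random


def failure_severity(x):
    """Hypothetical toy system: severity > 0 means failure, 0 means safe."""
    if x > 2.0:
        return 3.0 * (x - 2.0)   # steep, severe failure mode
    if x < -2.0:
        return 0.5 * (-2.0 - x)  # shallow, subtler failure mode
    return 0.0


def worst_case(candidates):
    """Worst-case search: return only the single most severe disturbance."""
    return max(candidates, key=failure_severity)


def sample_failures(n_draws, seed=0):
    """Rejection sampling: draw disturbances from an assumed N(0, 1) prior
    and keep every draw that makes the toy system fail."""
    rng = random.Random(seed)
    draws = (rng.gauss(0.0, 1.0) for _ in range(n_draws))
    return [x for x in draws if failure_severity(x) > 0.0]


# A grid of candidate disturbances from -4.0 to 4.0 in steps of 0.1.
grid = [i / 10.0 for i in range(-40, 41)]
worst = worst_case(grid)

failures = sample_failures(20000)
high_side = [x for x in failures if x > 0]
low_side = [x for x in failures if x < 0]

print("worst-case search reports:", worst)
print("sampled failures on the severe side:", len(high_side))
print("sampled failures on the subtler side:", len(low_side))
```

In this toy setting the worst-case search reports one point on the severe side, while the sampled set contains failures from both sides, including the shallower mode that a severity-only search would not surface.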

"In reality, there's a whole range of messiness that could happen for these more complex systems," says Charles Dawson, a graduate student in MIT's Department of Aeronautics and Astronautics. "We want to be able to trust these systems to drive us around, or fly an aircraft, or manage a power grid. It's really important to know their limits and in what cases they're likely to fail."

The 2021 power grid failure in Texas helped prompt the research project.

Dawson and Chuchu Fan, assistant professor of aeronautics and astronautics at MIT, are presenting their work this week at the Conference on Robot Learning in Atlanta.

Sensitivity over adversaries

In 2021, a major system meltdown in Texas got Fan and Dawson thinking. In February of that year, winter storms rolled through the state, bringing unexpectedly frigid temperatures that set off failures across the power grid. The crisis left more than 4.5 million homes and businesses without power for multiple days. The system-wide breakdown made for the worst energy crisis in Texas' history.

Because of the complexity of autonomous systems, the research team moved away from conventional testing methods to an automated sampling algorithm that identifies vulnerable points in a system.

"That was a pretty major failure that made me wonder whether we could have predicted it beforehand," Dawson says. "Could we use our knowledge of the physics of the electricity grid to understand where its weak points could be, and then target upgrades and software fixes to strengthen those vulnerabilities before something catastrophic happened?"

Dawson and Fan's work focuses on robotic systems and finding ways to make them more resilient in their environment. Prompted in part by the Texas power crisis, they set out to expand their scope, to spot and fix failures in other more complex, large-scale autonomous systems. To do so, they realized they would have to shift the conventional approach to finding failures.


Designers often test the safety of autonomous systems by identifying their most likely, most severe failures. They start with a computer simulation of the system that represents its underlying physics and all the variables that could influence system performance. They then have the computer scan through all those variables and identify the most likely, most severe system breakdowns or so-called 'adversaries' -- events or combinations of events that could result in failure.
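As a rough illustration of this conventional workflow, the sketch below scans a grid of variables in a toy simulation and ranks failing scenarios by likelihood times severity. Everything in it is hypothetical: the simulator `simulate_margin`, the likelihood model, and all numbers are invented for the example and do not come from the researchers' work or from any real grid.

```python
import itertools
import math


def simulate_margin(load_mw, temp_c):
    """Hypothetical toy grid: spare capacity in MW; negative means failure."""
    cold = max(0.0, -temp_c)         # degrees below freezing
    capacity = 100.0 - 2.0 * cold    # assumed: cold weather reduces capacity
    demand = load_mw + 1.5 * cold    # assumed: cold weather raises demand
    return capacity - demand


def likelihood(load_mw, temp_c):
    """Hypothetical unnormalised likelihood of a scenario
    (typical load 70 MW, typical temperature 10 C)."""
    return math.exp(-((load_mw - 70.0) / 15.0) ** 2
                    - ((temp_c - 10.0) / 12.0) ** 2)


def find_adversaries(loads, temps, top_k=3):
    """Scan every variable combination and return the top_k failing
    scenarios ranked by likelihood times severity."""
    scored = []
    for load_mw, temp_c in itertools.product(loads, temps):
        margin = simulate_margin(load_mw, temp_c)
        if margin < 0.0:
            severity = -margin
            score = likelihood(load_mw, temp_c) * severity
            scored.append((score, load_mw, temp_c))
    scored.sort(reverse=True)
    return scored[:top_k]


loads = range(40, 121, 5)    # MW
temps = range(-30, 31, 5)    # degrees C
adversaries = find_adversaries(loads, temps)

for score, load_mw, temp_c in adversaries:
    print(f"load={load_mw} MW, temp={temp_c} C, score={score:.4f}")
```

Because the ranking keeps only the highest-scoring scenarios, failures that are less severe but still significant fall off the list.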

This method can be applied to a variety of autonomous systems, such as power grids, aircraft collision avoidance systems, teams of drones, and robotic manipulators.
