New Statistical Technique for Using Machine Learning Predictions to Test Scientific Hypotheses

Category Machine Learning

tldr #

Researchers from the University of California, Berkeley have developed a new technique, called prediction-powered inference (PPI), which uses a small amount of real-world data to correct the output of large, general machine learning models, such as AlphaFold. The technique is designed to let scientists test hypotheses more safely and accurately by correcting for potential errors and biases in AI models.


content #

Over the past decade, AI has permeated nearly every corner of science: Machine learning models have been used to predict protein structures, estimate the fraction of the Amazon rainforest that has been lost to deforestation and even classify faraway galaxies that might be home to exoplanets.

But while AI can be used to speed scientific discovery, helping researchers make predictions about phenomena that may be difficult or costly to study in the real world, it can also lead scientists astray. In the same way that chatbots sometimes "hallucinate," or make things up, machine learning models can sometimes present misleading or downright false results.

PPI requires much less real-world data than other methods of using ML predictions

In a paper published online in Science, researchers at the University of California, Berkeley, present a new statistical technique for safely using the predictions obtained from machine learning models to test scientific hypotheses.

The technique, called prediction-powered inference (PPI), uses a small amount of real-world data to correct the output of large, general models—such as AlphaFold, which predicts protein structures—in the context of specific scientific questions.
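For the simplest case of estimating a population mean, the idea can be sketched in a few lines of code: predictions on a large unlabeled dataset supply the bulk of the estimate, while a correction term (the paper calls it a "rectifier") measured on a small gold-standard sample removes the model's average error. The sketch below is illustrative; the function and variable names are ours, not from the study's released code.

```python
import numpy as np
from scipy.stats import norm

def ppi_mean_ci(y_labeled, preds_labeled, preds_unlabeled, alpha=0.05):
    """Prediction-powered (1 - alpha) confidence interval for a mean.

    y_labeled:       n gold-standard measurements (small, expensive)
    preds_labeled:   model predictions on those same n points
    preds_unlabeled: model predictions on a large unlabeled set of size N
    """
    y, f, f_tilde = map(np.asarray, (y_labeled, preds_labeled, preds_unlabeled))
    n, N = len(y), len(f_tilde)

    # Rectifier: the model's average error, measured on the labeled sample.
    rectifier = y - f

    # Point estimate: mean prediction, corrected by the average error.
    theta = f_tilde.mean() + rectifier.mean()

    # Uncertainty combines both sources: the plentiful cheap predictions
    # and the scarce gold-standard correction term.
    se = np.sqrt(f_tilde.var(ddof=1) / N + rectifier.var(ddof=1) / n)
    z = norm.ppf(1 - alpha / 2)
    return theta - z * se, theta + z * se
```

Because the model's average error is measured and subtracted, the interval remains valid even when the model is systematically biased; the payoff is that the many cheap predictions can shrink the interval well below what the small labeled sample could support on its own.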

PPI can also be used to rank scientific hypotheses according to their plausibility

"These models are meant to be general: They can answer many questions, but we don't know which questions they answer well and which questions they answer badly—and if you use them naively, without knowing which case you're in, you can get bad answers," said study author Michael Jordan, the Pehong Chen Distinguished Professor of electrical engineering and computer science and of statistics at UC Berkeley. "With PPI, you're able to use the model, but correct for possible errors, even when you don't know the nature of those errors at the outset." .

Machine learning models can be biased due to the data they are trained on

The risk of hidden biases

When scientists conduct experiments, they're not just looking for a single answer—they want to obtain a range of plausible answers. This is done by calculating a "confidence interval," which, in the simplest case, can be found by repeating an experiment many times and seeing how the results vary.
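As a concrete illustration (a toy sketch, not drawn from the study), here is how a classical confidence interval for a mean is computed from repeated measurements:

```python
import numpy as np
from scipy.stats import t

def classical_ci(measurements, alpha=0.05):
    """Classical t-interval for the mean of repeated measurements."""
    x = np.asarray(measurements, dtype=float)
    n = len(x)
    se = x.std(ddof=1) / np.sqrt(n)                 # standard error of the mean
    half_width = t.ppf(1 - alpha / 2, df=n - 1) * se
    return x.mean() - half_width, x.mean() + half_width

# e.g., ten repeated runs of the same hypothetical experiment
print(classical_ci([9.8, 10.1, 10.0, 9.9, 10.2, 9.7, 10.3, 10.0, 9.9, 10.1]))
```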

In most scientific studies, a confidence interval refers to a summary statistic computed over many data points, not to any individual data point. Unfortunately, machine learning systems focus on individual data points, and thus do not provide scientists with the kinds of uncertainty assessments they care about. For instance, AlphaFold predicts the structure of a single protein, but it provides neither a notion of confidence for that structure nor a way to obtain confidence intervals for general properties of proteins.

PPI uses the prediction and real-world data together to help determine which predictions are more accurate

Scientists may be tempted to treat the predictions from AlphaFold as if they were measured data and use them to compute classical confidence intervals. The problem with this approach is that machine learning systems have many hidden biases that can skew the results. These biases arise, in part, from the data on which the models are trained, generally existing scientific research that may not have had the same focus as the current study.
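To see how badly this can go wrong, consider a small simulation with invented numbers: the predictions carry a hidden systematic bias, so the naive interval is confidently wrong, while the prediction-powered interval, built with the ppi_mean_ci sketch above, measures the bias on the labeled sample and subtracts it out.

```python
import numpy as np

# Toy simulation with invented numbers; assumes ppi_mean_ci from the
# sketch above is already defined.
rng = np.random.default_rng(0)
true_mean = 5.0
N, n = 100_000, 200                    # many unlabeled points, few labeled ones

y_unlabeled = rng.normal(true_mean, 1.0, N)   # ground truth we never observe
y_labeled = rng.normal(true_mean, 1.0, n)     # small gold-standard sample

bias = 0.5                                    # hidden systematic model error
preds_unlabeled = y_unlabeled + bias + rng.normal(0.0, 0.3, N)
preds_labeled = y_labeled + bias + rng.normal(0.0, 0.3, n)

# Naive approach: treat predictions as if they were data.
se_naive = preds_unlabeled.std(ddof=1) / np.sqrt(N)
naive_ci = (preds_unlabeled.mean() - 1.96 * se_naive,
            preds_unlabeled.mean() + 1.96 * se_naive)
print("naive CI:", naive_ci)   # tight interval around ~5.5; misses 5.0

# PPI: the rectifier corrects the bias, and the interval covers 5.0.
print("PPI CI:", ppi_mean_ci(y_labeled, preds_labeled, preds_unlabeled))
```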

PPI can help scientists identify potential issues with machine learning models that provide questionable results

"Indeed, in scientific problems, we're often interested in phenomena which are at the edge between the known and the unknown," Jordan said. "Very often, th .

