Accelerating Bayesian Inference with Deterministic ADVI

Category Computer Science

tldr #

Bayesian inference is a widely used technique in the sciences, but it can be slow and computationally demanding. A new method, DADVI, automates the process and delivers more accurate results faster, making it useful across a variety of research areas.


content #

Pollsters trying to predict presidential election results and physicists searching for distant exoplanets have at least one thing in common: They often use a tried-and-true scientific technique called Bayesian inference.

Bayesian inference allows these scientists to effectively estimate some unknown parameter—like the winner of an election—from data such as poll results. But Bayesian inference can be slow, sometimes consuming weeks or even months of computation time or requiring a researcher to spend hours deriving tedious equations by hand.

Bayesian inference is named after the mathematician Thomas Bayes, who first described the technique in the 18th century.

Researchers from MIT and elsewhere have introduced an optimization technique that speeds things up without requiring a scientist to do a lot of additional work. Their method can achieve more accurate results faster than another popular approach for accelerating Bayesian inference.

Using this new automated technique, a scientist could simply input their model and then the optimization method does all the calculations under the hood to provide an approximation of some unknown parameter. The method also offers reliable uncertainty estimates that can help a researcher understand when to trust its predictions.

Bayesian inference is based on Bayes' theorem, which calculates the probability of an event based on prior knowledge or information.
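To make that concrete, here is a minimal sketch of Bayes' theorem applied to a toy polling scenario; the setup and every number in it are invented purely for illustration and are not drawn from the study.

```python
# A minimal, illustrative application of Bayes' theorem.
# The scenario and all numbers are hypothetical, chosen only to show the update step.

# Prior belief: before seeing any polls, assume the candidate is equally likely to win or lose.
prior_win = 0.5

# Model of the data: how likely is a poll showing the candidate ahead,
# given that they will (or will not) actually win?
p_poll_lead_given_win = 0.8
p_poll_lead_given_loss = 0.3

# Total probability of observing the poll result (the denominator in Bayes' theorem).
p_poll_lead = (p_poll_lead_given_win * prior_win
               + p_poll_lead_given_loss * (1 - prior_win))

# Bayes' theorem: updated (posterior) probability of winning after seeing the poll.
posterior_win = p_poll_lead_given_win * prior_win / p_poll_lead
print(f"P(win | poll lead) = {posterior_win:.3f}")  # ~0.727
```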

This versatile technique could be applied to a wide array of scientific quandaries that incorporate Bayesian inference. For instance, it could be used by economists studying the impact of microcredit loans in developing nations or sports analysts using a model to rank top tennis players.

"When you actually dig into what people are doing in the social sciences, physics, chemistry, or biology, they are often using a lot of the same tools under the hood. There are so many Bayesian analyses out there.

The application of Bayesian inference is not limited to scientific fields, but can also be used in fields such as finance and medicine.

"If we can build a really great tool that makes these researchers lives easier, then we can really make a difference to a lot of people in many different research areas," says senior author Tamara Broderick, an associate professor in MIT's Department of Electrical Engineering and Computer Science (EECS) and a member of the Laboratory for Information and Decision Systems and the Institute for Data, Systems, and Society.

Bayesian inference has grown in popularity in recent years as available data and computing power have increased.

Broderick is joined on the paper by co-lead authors Ryan Giordano, an assistant professor of statistics at the University of California at Berkeley; and Martin Ingram, a data scientist at the AI company KONUX. The paper was recently published in the Journal of Machine Learning Research.

Faster results

When researchers seek a faster form of Bayesian inference, they often turn to a technique called automatic differentiation variational inference (ADVI), which is typically both fast to run and easy to use.
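In practice, ADVI is available off the shelf in probabilistic programming libraries such as PyMC. The sketch below shows roughly how a researcher might fit a toy polling model with it; the model and numbers are made up for illustration, and the exact function names may differ between PyMC versions.

```python
# Sketch: fitting a toy polling model with ADVI in PyMC (illustrative only).
import pymc as pm

# Hypothetical poll: 540 of 1,000 respondents support the candidate.
supporters, respondents = 540, 1000

with pm.Model() as model:
    # Prior over the candidate's true support share.
    support = pm.Beta("support", alpha=1, beta=1)
    # Likelihood of the observed poll result.
    pm.Binomial("poll", n=respondents, p=support, observed=supporters)

    # Run ADVI: a stochastic optimization that fits an approximate posterior.
    approx = pm.fit(n=20000, method="advi")

# Draw samples from the fitted approximation to summarize the posterior.
idata = approx.sample(2000)
print(idata.posterior["support"].mean().item())
```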

Their alternative, deterministic ADVI (described below), can handle even high-dimensional models more efficiently than standard ADVI.

But Broderick and her collaborators have found a number of practical issues with ADVI. It has to solve an optimization problem and can do so only approximately. So, ADVI can still require a lot of computation time and user effort to determine whether the approximate solution is good enough. And once it arrives at a solution, it tends to provide poor uncertainty estimates.

Rather than reinventing the wheel, the team took many ideas from ADVI but turned them around to create a technique called deterministic ADVI (DADVI) that doesn't have these downsides.

Variational inference techniques like DADVI are also used in machine learning, for example to train Bayesian neural networks.

Like ADVI, DADVI frames inference as an optimization problem: it searches for a simple probability distribution that comes as close as possible to the true posterior distribution over the model's unknown parameters, given the observed data.

The key difference is how DADVI handles the randomness in that optimization. ADVI re-draws random samples at every step, which makes its objective noisy; DADVI instead fixes a single set of samples up front, turning the objective into an ordinary deterministic function. That allows it to use standard, well-understood optimization tools and makes it far easier to tell when the optimization has actually converged. A separate correction step then produces the reliable uncertainty estimates about the parameters mentioned above.

DADVI also includes other smart heuristics that perform certain calculations only when they are needed to get accurate results.
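To make the "deterministic" part concrete, here is a bare-bones sketch of the core idea as described above: fix the random draws once, so the usual ADVI objective becomes an ordinary deterministic function that an off-the-shelf optimizer can handle. The toy model, variable names, and sample sizes are all illustrative; this is not the authors' implementation, and it omits the correction step for uncertainty estimates mentioned above.

```python
# Sketch of the core "deterministic ADVI" idea: a fixed-sample (deterministic)
# approximation of the usual ADVI objective, optimized with a standard optimizer.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)

# Toy problem: infer an unknown mean theta from noisy observations,
# with a N(0, 10^2) prior and a N(theta, 1) likelihood.
y = rng.normal(loc=2.0, scale=1.0, size=50)

def log_joint(theta):
    """log p(theta, y) up to an additive constant."""
    log_prior = -0.5 * theta**2 / 10.0**2
    log_lik = -0.5 * np.sum((y - theta) ** 2)
    return log_prior + log_lik

# Fix the Monte Carlo draws ONCE. Standard ADVI would redraw these at every
# optimization step; freezing them is what makes the objective deterministic.
z = rng.normal(size=30)

def negative_elbo(params):
    mu, log_sigma = params
    sigma = np.exp(log_sigma)
    theta_draws = mu + sigma * z  # reparameterized draws from q = N(mu, sigma^2)
    expected_log_joint = np.mean([log_joint(t) for t in theta_draws])
    entropy = log_sigma + 0.5 * np.log(2.0 * np.pi * np.e)  # entropy of q
    return -(expected_log_joint + entropy)

# Because the objective no longer changes randomly between calls, an
# off-the-shelf optimizer can be applied directly, and convergence can be
# checked in the usual way.
result = minimize(negative_elbo, x0=np.array([0.0, 0.0]), method="BFGS")
mu_hat, sigma_hat = result.x[0], np.exp(result.x[1])
print(f"approximate posterior for theta: N({mu_hat:.3f}, {sigma_hat:.3f}^2)")
```

In this toy problem the exact posterior happens to be available in closed form, so the fitted mean and standard deviation are easy to check; in realistic models the same recipe applies, with the log joint supplied by the probabilistic model.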

