Neural Scaling of Large Neural Networks for Chemical Data Analysis

Category Computer Science

tldr #

Researchers at MIT recently investigated the neural scaling behavior of large DNN-based models trained to generate advantageous chemical compositions and learn interatomic potentials. Their paper, published in Nature Machine Intelligence, shows how quickly the performance of these models can improve as their size and the pool of data they are trained on are increased.


content #

Deep neural networks (DNNs) have proved to be highly promising tools for analyzing large amounts of data, which could speed up research in various scientific fields. For instance, over the past few years, some computer scientists have trained models based on these networks to analyze chemical data and identify promising chemicals for various applications.

Researchers at the Massachusetts Institute of Technology (MIT) recently carried out a study investigating the neural scaling behavior of large DNN-based models trained to generate advantageous chemical compositions and learn interatomic potentials. Their paper, published in Nature Machine Intelligence, shows how quickly the performance of these models can improve as their size and the pool of data they are trained on are increased.

The research began in 2021, prior to the release of the AI-based platforms ChatGPT and Dall-E 2.

"The paper 'Scaling Laws for Neural Language Models' by Kaplan et al., was the main inspiration for our research," Nathan Frey, one of the researchers who carried out the study, told Tech Xplore. "That paper showed that increasing the size of a neural network and the amount of data it's trained on leads to predictable improvements in model training. We wanted to see how 'neural scaling' applies to models trained on chemistry data, for applications like drug discovery." .

The team studied two distinct models - a large language model (LLM) and a graph neural network (GNN)-based model.

Frey and his colleagues started working on this research project back in 2021, before the release of the renowned AI-based platforms ChatGPT and Dall-E 2. At the time, the future upscaling of DNNs was already seen as particularly relevant to some fields, but studies exploring their scaling in the physical or life sciences were scarce.

The researchers' study explores the neural scaling of two distinct types of models for chemical data analysis: a large language model (LLM) and a graph neural network (GNN)-based model. These two different types of models can be used to generate chemical compositions and learn the potentials between different atoms in chemical substances, respectively.

The research examined the effects of both a model's size and the size of the dataset used to train it.

"We studied two very different types of models: an autoregressive, GPT-style language model we built called 'ChemGPT' and a family of GNNs," Frey explained. "ChemGPT was trained in the same way ChatGPT is, but in our case ChemGPT is trying to predict the next token in a string that represents a molecule. The GNNs are trained to predict the energy and forces of a molecule." .

To probe the scalability of the ChemGPT model and of the GNNs, Frey and his colleagues examined how a model's size and the size of its training dataset affect various relevant metrics. This allowed them to derive a rate at which these models improve as they become larger and are fed more data.
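A minimal sketch of how such an improvement rate can be read off, assuming losses measured at several model sizes follow a power law L(N) = a * N^(-alpha): on log-log axes the exponent alpha is simply the negated slope of a straight-line fit. The model sizes and loss values below are made-up placeholders, not numbers from the paper.

```python
import numpy as np

model_sizes = np.array([1e6, 3e6, 1e7, 3e7, 1e8])   # parameter counts (hypothetical)
losses = np.array([1.10, 0.93, 0.78, 0.66, 0.55])    # validation losses (hypothetical)

# On log-log axes a power law L(N) = a * N**(-alpha) is a straight line, so the
# scaling exponent alpha is the negated slope of a simple linear fit.
slope, intercept = np.polyfit(np.log(model_sizes), np.log(losses), 1)
alpha = -slope
print(f"fitted scaling exponent alpha ~ {alpha:.3f}")   # ~0.15 for these placeholder numbers
```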

Incorporating physics in GNNs through a property called 'equivariance' can have dramatic effects on scaling efficiency.

"We do find 'neural scaling behavior' for chemical models, reminiscent of the scaling behavior seen in LLM and vision models for various applications," Frey said.

"We also showed that we are not near any kind of fundamental limit for scaling chemical models, so there is still a lot of room to investigate further with more compute and bigger datasets, Incorporating physics into GNNs via a property called 'equivariance' has a dramatic effect on improving scaling efficiency, which is an exciting result becaus that's something that can potentially be incorporated into many other models." .

The research was published in the prestigious journal Nature Machine Intelligence.
