Revolutionary Thought Speech Interface from fMRI Scans

Category Technology

tldr #

In a recent study, researchers developed a noninvasive brain-computer interface capable of converting thoughts into words. Using GPT-1, a large language model, together with fMRI data collected from volunteers, the system predicted whole sentences that participants were hearing with surprising accuracy. While the technology is still in early development, it has potential applications in assisting people with communication disabilities.

content #

A noninvasive brain-computer interface capable of converting a person’s thoughts into words could one day help people who have lost the ability to speak as a result of injuries like strokes or conditions including ALS.

In a new study, published in Nature Neuroscience today, a model trained on functional magnetic resonance imaging (fMRI) scans of three volunteers was able to predict whole sentences they were hearing with surprising accuracy—just by looking at their brain activity. The findings demonstrate the need for future policies to protect our brain data, the team says.

The GPT-1 model was developed by OpenAI, a leading research laboratory in the artificial intelligence field.

Speech has been decoded from brain activity before, but the process typically requires highly invasive electrode devices to be embedded within a person’s brain. Other noninvasive systems have tended to be restricted to decoding single words or short phrases.

This is the first time whole sentences have been produced from noninvasive brain recordings collected through fMRI, according to the interface’s creators, a team of researchers from the University of Texas at Austin. While normal MRI takes pictures of the structure of the brain, functional MRI scans evaluate blood flow in the brain, depicting which parts are activated by certain activities.

Functional magnetic resonance imaging (fMRI) is a neuroimaging technique used to measure brain activity.

First, the team trained GPT-1, a large language model developed by OpenAI, on a data set of English sentences sourced from Reddit, 240 stories from The Moth Radio Hour, and transcriptions of the New York Times’s Modern Love podcast. The researchers wanted the narratives to be interesting and fun to listen to, because that was more likely to produce good fMRI data than something that left the participants bored. "We all like to listen to podcasts, so why not lie in an MRI scanner listening to podcasts?" jokes Alexander Huth, assistant professor of neuroscience and computer science at the University of Texas at Austin, who led the project.

The team used 240 stories from The Moth Radio Hour, plus transcriptions of the New York Times’s Modern Love podcast, to train the GPT-1 model.

During the study, three participants each listened to 16 hours of different episodes of the same podcasts, plus a couple of TED talks, while in an MRI scanner. The goal was to collect a wealth of data, which the team says is over five times larger than the language data sets typically used in language-related fMRI experiments.

The model learned to predict the brain activity that hearing certain words would trigger. To decode, it guessed sequences of words, predicted how the brain would respond to each guess, and then compared that prediction with the actual measured brain responses to judge how closely the guess resembled the words the participant had heard.
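The guess-and-compare loop described above can be illustrated with a small Python sketch. This is a toy under stated assumptions, not the researchers' implementation: the vocabulary, the three-number "response" patterns, the `propose_next` stand-in for the language model, the squared-distance score, and the beam width are all hypothetical, and the sketch assumes one measured response per word, whereas real fMRI signals are slow and blend many words together.

```python
from typing import Dict, List, Sequence, Tuple

Vector = List[float]

# Hypothetical encoding model: a made-up response pattern for each word.
# In the study this mapping is learned from hours of fMRI recordings.
TOY_WORD_RESPONSES: Dict[str, Vector] = {
    "she": [1.0, 0.0, 0.0],
    "he": [0.8, 0.3, 0.0],
    "drives": [0.0, 1.0, 0.0],
    "walks": [0.0, 0.2, 1.0],
    "home": [0.5, 0.5, 0.5],
    "fast": [0.0, 0.9, 0.4],
}


def propose_next(prefix: Sequence[str]) -> List[str]:
    """Stand-in for the language model: propose candidate next words.

    This toy version ignores the prefix and offers the whole vocabulary.
    """
    return list(TOY_WORD_RESPONSES)


def score(guess: Sequence[str], measured: Sequence[Vector]) -> float:
    """Higher is better: negative squared error between the response
    predicted for the guessed words and the measured response."""
    total = 0.0
    for word, actual in zip(guess, measured):
        predicted = TOY_WORD_RESPONSES[word]
        total -= sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return total


def decode(measured: Sequence[Vector], beam_width: int = 3) -> List[str]:
    """Beam search: extend the best guesses one word at a time and keep
    the ones whose predicted responses best match the measurements."""
    beams: List[Tuple[float, List[str]]] = [(0.0, [])]
    for _ in range(len(measured)):
        candidates: List[Tuple[float, List[str]]] = []
        for _, prefix in beams:
            for word in propose_next(prefix):
                guess = prefix + [word]
                candidates.append((score(guess, measured), guess))
        candidates.sort(key=lambda c: c[0], reverse=True)
        beams = candidates[:beam_width]
    return beams[0][1]


# Noisy "measurements" that roughly follow the patterns for "she drives home".
measured = [[0.9, 0.1, 0.0], [0.1, 0.9, 0.1], [0.4, 0.5, 0.6]]
print(" ".join(decode(measured)))
```

With these toy numbers the search recovers "she drives home". The point of the sketch is that the decoder never reads words off the scan directly: it only ranks candidate word sequences by how well their predicted responses match what was measured.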

The study involved three participants, each of whom listened to 16 hours of podcast episodes while in an MRI scanner.

When they tested the model on new podcast episodes, it was able to recover the gist of what users were hearing just from their brain activity, often identifying exact words and phrases. For example, a user heard the words "I don’t have my driver’s license yet." The decoder returned the sentence "She has not even started to learn to drive yet."

The researchers also showed the participants short Pixar videos that didn’t contain any dialogue, and recorded their brain responses in a separate experiment designed to test whether the decoder was able to recover the general content of what the user was watching. It turned out that it was.

The noninvasive brain-computer interface was successful in decoding users' brain activity to provide a clear gist of what the user was hearing or watching.

Romain Brette, a theoretical neuroscientist at the Vision Institute in Paris who was not involved in the experiment, is not wholly convinced of the technology’s efficacy. He believes that, despite the impressive results, it is still too early for something like this to become available for use.

hashtags #
worddensity #