PAT: Physics-Aware Transformer for Multidimensional Computational Imaging

Category Science

tldr #

An international group of researchers led by Minghao Hu and Zongliang Wu compared classic snapshot compressive imaging systems with their proposed system, PAT, which combines multiscale manifold sampling with neural decompression. They found that their method achieves comparable image quality and holds strong promise, given expected advances in the processing capabilities of artificial neural networks.


content #

An international group of researchers led by Minghao Hu of the University of Arizona in Tucson and Zongliang Wu of Westlake University in Hangzhou conducted experiments in multidimensional computational imaging, comparing classic snapshot compressive imaging systems with their proposed system, which combines multiscale manifold sampling with neural decompression. They found that their method achieves comparable image quality and holds strong promise, given expected advances in the processing capabilities of artificial neural networks. The group's research was published in Intelligent Computing.

With the PAT system, multiple camera arrays can be set up to capture images with more accuracy and less complexity.

Cameras and other sensors can capture rich data, but the more accurate the data is, the harder it becomes to capture, store, and process. The solution is sampling: taking strategically spaced or masked snapshots of the target and combining them into an approximation of the original by exploiting correlations in the data that can be inferred computationally. The challenge is to make smart trade-offs so that setup and operation are manageable and the result is of good quality despite being compressed.

The datasets used for the research were simulated, as there is still no real-time application for PAT.

Among the collection of techniques called snapshot compressive imaging, coded aperture snapshot spectral imagers are specialized for capturing color information, and coded aperture compressive temporal imagers are specialized for video. These systems, known as CASSI and CACTI respectively, may be eclipsed by methods that rely on neural networks for image reconstruction. Neural networks make for systems that are more compact and more easily implemented because they perform optimization in a more generalized way, which allows less complicated types of sampling, such as multiscale manifold sampling. CASSI-style "interlaced" sampling is like using color filters in photography, whereas multiscale manifold sampling is more like stereo-pair imaging and can be implemented on camera arrays, which are flexible and scalable. Such camera arrays can even be programmed while they are capturing images.
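To make the idea of masked snapshots concrete, here is a minimal sketch of the measurement step commonly used to describe video snapshot compressive imaging: each frame is multiplied by its own mask and the masked frames are summed into a single 2D snapshot. The article does not give the researchers' forward model, so the function names, the random binary masks, and the toy frame values below are all illustrative assumptions, not the PAT or CACTI implementation.

```python
import random


def random_binary_masks(num_frames, height, width, seed=0):
    """Hypothetical helper: one random 0/1 mask per frame (an assumption)."""
    rng = random.Random(seed)
    return [
        [[rng.randint(0, 1) for _ in range(width)] for _ in range(height)]
        for _ in range(num_frames)
    ]


def snapshot_measurement(frames, masks):
    """Collapse T frames into one snapshot: y[i][j] = sum over t of mask_t[i][j] * frame_t[i][j]."""
    if len(frames) != len(masks):
        raise ValueError("need exactly one mask per frame")
    height, width = len(frames[0]), len(frames[0][0])
    snapshot = [[0.0] * width for _ in range(height)]
    for frame, mask in zip(frames, masks):
        for i in range(height):
            for j in range(width):
                snapshot[i][j] += mask[i][j] * frame[i][j]
    return snapshot


# Toy data: three 2x2 "video frames" with made-up pixel values.
frames = [
    [[1.0, 2.0], [3.0, 4.0]],
    [[5.0, 6.0], [7.0, 8.0]],
    [[9.0, 10.0], [11.0, 12.0]],
]
masks = random_binary_masks(len(frames), 2, 2)
snapshot = snapshot_measurement(frames, masks)
```

The snapshot has the size of a single frame, which is where the compression comes from. Recovering the individual frames from it is the reconstruction problem that, according to the article, neural networks are increasingly used to solve.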

In the research, CASSI and CACTI were used as control methods for comparison.

The researchers' multiscale manifold sampling system uses a kind of neural network called a "transformer" for visual data processing. Specifically, it is a physics-aware transformer network, hence the name PAT. The researchers compared PAT and CASSI for imaging with nine color bands using simulated data. Their quantitative analysis shows that both methods have strengths and weaknesses: PAT images score higher in structural similarity, and CASSI images score higher in peak signal-to-noise ratio. They also compared PAT and CACTI for video using simulated data. The two systems achieved similar results, but the PAT system is easier and cheaper to set up. In addition, the researchers demonstrated PAT by successfully reconstructing two scenes. One was captured in high-resolution grayscale and in low-resolution color, and the other was captured using a four-camera array that captured three colors plus texture.
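For readers unfamiliar with peak signal-to-noise ratio, the following sketch computes it from its standard definition, 10 * log10(MAX^2 / MSE), where MSE is the mean squared error between a reference image and a reconstruction. The article does not say how the researchers computed their scores, so the flattened-list input format, the function name, and the 8-bit peak value of 255 are assumptions made for illustration.

```python
import math


def psnr(reference, reconstruction, max_value=255.0):
    """Peak signal-to-noise ratio in decibels for two equal-length pixel lists.

    max_value=255.0 assumes 8-bit pixels; this is an illustrative default.
    """
    if not reference or len(reference) != len(reconstruction):
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((r - x) ** 2 for r, x in zip(reference, reconstruction)) / len(reference)
    if mse == 0:
        return math.inf  # identical images: no noise at all
    return 10.0 * math.log10(max_value ** 2 / mse)


# Made-up pixel values: each reconstructed pixel is off by 10 gray levels.
score = psnr([100.0, 100.0], [110.0, 90.0])
```

A higher value means the reconstruction is closer to the reference pixel by pixel. Structural similarity, the other metric mentioned, instead compares local patterns of brightness and contrast, which is why the two metrics can rank methods differently.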

PAT has the advantage of not relying on complex signal processing operations, using instead a more compact and easily implemented neural network system.
