Detecting Spoofing Attacks with Physiological-Physical Feature Fusion

Tuesday - May 16 2023, 21:07 UTC

tldr #

A research team led by Junxiao Xue proposed a voice spoofing detection method based on physiological-physical feature fusion. Compared with existing methods, it improved the tandem decision cost function (t-DCF) and equal error rate (EER) scores by 5% and 7%, respectively. The team also added face features to several baseline models, which showed varying degrees of performance improvement.

content #

Biometric speech recognition systems are often subject to various spoofing attacks, the most common of which are speech synthesis and speech conversion attacks.

When a spoofed utterance is incorrectly accepted as genuine, the security of the system is compromised.

Researchers have made many efforts to address this problem, but existing voice spoofing detection methods consider only the physical features of speech, which limits their detection performance.

The research team led by Junxiao Xue found that their proposed model's performance improved by 5% and 7% in terms of tandem decision cost function and equal error rate scores, respectively.

To solve the problem, a research team led by Junxiao Xue published their new research on March 27, 2023 in Frontiers of Computer Science.

The team proposed a voice spoofing detection method based on physiological-physical feature fusion. The method included a feature extractor, a densely connected convolutional neural network with squeeze and excitation blocks (SE-DenseNet), and a feature fusion strategy. Compared to existing methods, the tandem decision cost function and equal error rate scores improved by 5% and 7% respectively.

The team showed that their proposed model was more effective than existing models, including some of the best single systems.

Specifically, physiological features were first extracted from the audio using a pre-trained convolutional network. SE-DenseNet was then used to extract the physical features. This densely connected model has high parameter efficiency, and the squeeze and excitation blocks enhance the efficiency of feature transmission. Finally, the two sets of features were fed together into the classification network for voice spoofing detection.
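The pipeline described above can be illustrated with a minimal pure-Python sketch of a squeeze and excitation step followed by feature fusion. This is not the team's implementation. The function names, the tiny weight matrices, the placeholder physiological embedding, and the choice of concatenation as the fusion strategy are all assumptions made for illustration; the article does not specify how the features are fused.

```python
import math

def squeeze_excite(channels, w1, w2):
    """Reweight feature channels with a squeeze and excitation gate.

    channels: list of C lists of floats (flattened feature maps).
    w1: R x C weights of the bottleneck layer (hypothetical values).
    w2: C x R weights of the expansion layer (hypothetical values).
    """
    # Squeeze: global average pooling gives one descriptor per channel.
    z = [sum(ch) / len(ch) for ch in channels]
    # Excitation: bottleneck layer with ReLU, then a sigmoid gate per channel.
    hidden = [max(0.0, sum(w * x for w, x in zip(row, z))) for row in w1]
    gates = [1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
             for row in w2]
    # Scale: multiply every value in a channel by that channel's gate.
    scaled = [[g * v for v in ch] for g, ch in zip(gates, channels)]
    return scaled, gates

def fuse(physiological, physical):
    """Fuse two feature vectors by concatenation (an assumed strategy)."""
    return list(physiological) + list(physical)

# Toy input: 3 channels with 2 values each.
channels = [[1.0, 3.0], [2.0, 2.0], [0.0, 0.0]]
w1 = [[0.5, 0.5, 0.0]]        # 3 channels -> 1 hidden unit
w2 = [[1.0], [0.0], [-1.0]]   # 1 hidden unit -> 3 gates
scaled, gates = squeeze_excite(channels, w1, w2)

# Pool the reweighted channels into a physical feature vector, then fuse it
# with a placeholder physiological embedding.
physical = [sum(ch) / len(ch) for ch in scaled]
fused = fuse([0.1, 0.2], physical)
print(len(fused), [round(g, 3) for g in gates])
```

In a real model the gate weights are learned and the fused vector would feed a classification network; here they are fixed by hand so that the reweighting is easy to follow.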

The method used a pre-trained convolutional network to extract the physiological features and a densely connected convolutional neural network with squeeze and excitation blocks (SE-DenseNet) to extract the physical features.

They compared the proposed model with some of the best single systems. The experiments showed that the proposed model performed better on both EER and t-DCF. To validate the effectiveness of the face features, they also evaluated several baseline models augmented with face features. The baselines showed different degrees of performance improvement when combined with the face features, indicating that the face features are also useful for the baseline models.
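For readers unfamiliar with the EER metric mentioned above, the following is a minimal sketch of how an equal error rate can be estimated from detection scores. It assumes that a higher score means "more likely bona fide" and uses made-up scores; it is a simple threshold sweep rather than the evaluation code used in the study, and evaluation toolkits may interpolate between thresholds instead. The t-DCF metric additionally depends on the speaker verification system and cost parameters, so it is not sketched here.

```python
def equal_error_rate(bonafide_scores, spoof_scores):
    """Estimate the EER by sweeping thresholds over the observed scores.

    Returns (eer, threshold) at the point where the false acceptance rate
    and the false rejection rate are closest to each other.
    """
    thresholds = sorted(set(bonafide_scores) | set(spoof_scores))
    best = None
    for t in thresholds:
        # False rejection: bona fide trials scored below the threshold.
        frr = sum(s < t for s in bonafide_scores) / len(bonafide_scores)
        # False acceptance: spoofed trials scored at or above the threshold.
        far = sum(s >= t for s in spoof_scores) / len(spoof_scores)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2, t)
    return best[1], best[2]

# Made-up scores for illustration only.
bonafide = [0.9, 0.8, 0.7, 0.4]
spoof = [0.6, 0.3, 0.2, 0.1]
eer, threshold = equal_error_rate(bonafide, spoof)
print(eer, threshold)  # 0.25 0.6
```

A lower EER means the detector separates bona fide and spoofed speech more cleanly; a detector with perfectly separated scores reaches an EER of 0.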

In addition, the experiments revealed that different baseline models showed different degrees of performance improvement when combined with face features.

Future work may attempt to extract more accurate face features and study more effective feature fusion strategies to detect spoofing attacks.

hashtags #

biometricspeechrecognition voicespoofing voicespoofingdetection physiological-physicalfeaturefusion se-densenet facefeatures
