LI, Mu, XU, Wenjin, ZENG, Chao and WANG, Ning (2025). Self-Supervised Voice Denoising Network for Multi-Scenario Human-Robot Interaction. Biomimetics, 10 (9): 603. [Article]
Documents
Wang-SelfSupervisedDenoisingNetwork(VoR).pdf - Published Version
Available under License Creative Commons Attribution.
Abstract
Human-robot interaction (HRI) via voice command has advanced significantly in recent years, with large Vision-Language-Action (VLA) models showing particular promise for voice interaction. However, these systems still struggle with environmental noise during voice interaction and lack a specialized denoising network for isolating a command among multiple speakers when speech overlaps. To overcome these challenges, we introduce a method for voice command-based HRI in noisy environments that leverages synthetic data and a self-supervised denoising network to improve real-world applicability. Our approach focuses on improving the self-supervised network's performance in denoising mixed-noise audio by scaling the training data. Extensive experiments show that our method outperforms existing approaches in simulation and achieves 7.5% higher accuracy than the state-of-the-art method in noisy real-world environments, enhancing voice-guided robot control.
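The abstract does not say how the synthetic mixed-noise audio is produced. As a minimal sketch of one common approach, and not necessarily the authors' pipeline, a clean utterance can be combined with a noise recording at a chosen signal-to-noise ratio (SNR). The function names `mix_at_snr` and `measured_snr_db`, the 16 kHz sample rate, and the sine-wave stand-in for speech are all assumptions made for illustration.

```python
import math
import random


def power(signal):
    """Mean squared amplitude of a sequence of samples."""
    return sum(x * x for x in signal) / len(signal)


def mix_at_snr(speech, noise, snr_db):
    """Scale `noise` so the speech-to-noise power ratio is `snr_db`, then add it to `speech`."""
    if len(speech) != len(noise):
        raise ValueError("speech and noise must have the same length")
    speech_power = power(speech)
    noise_power = power(noise)
    if speech_power == 0 or noise_power == 0:
        raise ValueError("signals must have non-zero power")
    # Target noise power is speech_power / 10^(snr_db / 10); the amplitude gain is its square root.
    gain = math.sqrt(speech_power / (noise_power * 10 ** (snr_db / 10)))
    return [s + gain * n for s, n in zip(speech, noise)]


def measured_snr_db(speech, mixture):
    """SNR in dB of `mixture`, treating everything that is not `speech` as noise."""
    residual = [m - s for s, m in zip(speech, mixture)]
    return 10 * math.log10(power(speech) / power(residual))


# Hypothetical stand-ins: a 220 Hz tone for the voice command, Gaussian noise for the environment.
rng = random.Random(0)
speech = [math.sin(2 * math.pi * 220 * t / 16000) for t in range(16000)]
noise = [rng.gauss(0.0, 1.0) for _ in range(16000)]
noisy = mix_at_snr(speech, noise, snr_db=5.0)
```

Repeating the mix over many utterances, noise types, and SNR values is one way to grow a training set of the kind the abstract refers to; a second speaker's recording could be passed as `noise` to imitate overlapping speech.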