Self-Supervised Voice Denoising Network for Multi-Scenario Human-Robot Interaction.

LI, Mu, XU, Wenjin, ZENG, Chao and WANG, Ning (2025). Self-Supervised Voice Denoising Network for Multi-Scenario Human-Robot Interaction. Biomimetics, 10 (9): 603. [Article]

Wang-SelfSupervisedDenoisingNetwork(VoR).pdf - Published Version
Available under License Creative Commons Attribution.

Abstract
Human-robot interaction (HRI) via voice command has significantly advanced in recent years, with large Vision-Language-Action (VLA) models demonstrating particular promise in human-robot voice interaction. However, these systems still struggle with environmental noise contamination during voice interaction and lack a specialized denoising network for multi-speaker command isolation in an overlapping speech scenario. To overcome these challenges, we introduce a method to enhance voice command-based HRI in noisy environments, leveraging synthetic data and a self-supervised denoising network to enhance its real-world applicability. Our approach focuses on improving self-supervised network performance in denoising mixed-noise audio through training data scaling. Extensive experiments show our method outperforms existing approaches in simulation and achieves 7.5% higher accuracy than the state-of-the-art method in noisy real-world environments, enhancing voice-guided robot control.