REN, Sirui, ZENG, Chao, YANG, Chenguang and WANG, Ning (2026). Toward high-precision robotic assembly in large workspaces using multimodal reinforcement learning. Robotic Intelligence and Automation. [Article]
Accepted Version (PDF)
Available under License Creative Commons Attribution.
Abstract
Purpose
This study aims to address the challenges of complex contact dynamics, structural constraints and perceptual uncertainty in robotic peg-in-hole assembly tasks, particularly under large workspace and high-precision requirements.
Design/methodology/approach
The authors propose a reinforcement learning framework that integrates multimodal perception, self-supervised representation modeling and hybrid control mechanisms. The framework takes visual images, proprioceptive states and target pose information as inputs. A self-supervised pretraining phase jointly optimizes image reconstruction and forward prediction to learn structurally aware and temporally consistent latent state representations. Furthermore, a Spectrum Random Masking (SRM) technique introduces frequency-domain perturbations to the visual modality, encouraging stable spectral feature learning and enhancing perception robustness for sim-to-real transfer. During execution, an adaptive impedance control mode is activated when excessive contact force is detected, mitigating insertion impacts and jamming.
Findings
Real-world experiments in extended operational spaces and across diverse peg-hole geometries demonstrate that the proposed method achieves millimeter-level accuracy, high success rates in seen configurations and strong zero-shot generalization to unseen shapes. These results validate the effectiveness of the framework in ensuring robust and precise robotic assembly across large and variable workspaces.
Originality/value
This work presents a novel reinforcement learning framework that integrates multimodal perception, spectral feature learning and hybrid control switching to tackle the challenges of high-precision peg-in-hole assembly. By introducing SRM for robust visual representation and combining it with adaptive impedance control, the framework offers a distinctive solution that enhances sim-to-real transfer and generalization in complex assembly scenarios.
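
Illustrative sketch (not from the paper)
The abstract states only that an adaptive impedance control mode is activated when excessive contact force is detected. It gives no control law, thresholds or gains. The Python sketch below shows one way such a force-triggered switch could look along a single insertion axis. Every name and number in it (`HybridController`, the 10 N trigger, the 5 N release level, the exponential stiffness schedule, critical damping for unit mass) is a hypothetical choice made for illustration, not the authors' implementation.

```python
import math
from dataclasses import dataclass


@dataclass
class HybridController:
    """Hypothetical 1-D switch between a learned policy and an impedance mode."""
    force_threshold: float = 10.0   # N, assumed level that triggers impedance mode
    release_threshold: float = 5.0  # N, assumed level for returning to the policy
    k_max: float = 800.0            # N/m, assumed stiffness at the trigger level
    k_min: float = 100.0            # N/m, assumed softest stiffness
    decay: float = 0.1              # 1/N, assumed rate at which stiffness softens
    mode: str = "policy"

    def update_mode(self, force: float) -> str:
        """Switch modes with hysteresis so the controller does not chatter."""
        magnitude = abs(force)
        if self.mode == "policy" and magnitude > self.force_threshold:
            self.mode = "impedance"
        elif self.mode == "impedance" and magnitude < self.release_threshold:
            self.mode = "policy"
        return self.mode

    def stiffness(self, force: float) -> float:
        """Soften the stiffness as contact force rises above the trigger level."""
        excess = max(0.0, abs(force) - self.force_threshold)
        return self.k_min + (self.k_max - self.k_min) * math.exp(-self.decay * excess)

    def command(self, policy_action: float, pos_error: float,
                velocity: float, force: float) -> float:
        """Return a scalar command along the insertion axis."""
        if self.update_mode(force) == "policy":
            return policy_action  # pass the learned action through unchanged
        k = self.stiffness(force)
        d = 2.0 * math.sqrt(k)    # critical damping, assuming unit virtual mass
        return k * pos_error - d * velocity
```

The release threshold sits below the trigger threshold so that a force hovering near 10 N does not flip the mode on every control cycle. A real system would work with full 6-D wrenches and poses and would need gains tuned to the robot and the task.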