Toward high-precision robotic assembly in large workspaces using multimodal reinforcement learning

REN, Sirui, ZENG, Chao, YANG, Chenguang and WANG, Ning (2026). Toward high-precision robotic assembly in large workspaces using multimodal reinforcement learning. Robotic Intelligence and Automation. [Article]

Accepted Version. Available under License Creative Commons Attribution.

Abstract

Purpose

This study aims to address the challenges of complex contact dynamics, structural constraints and perceptual uncertainty in robotic peg-in-hole assembly tasks, particularly under large workspace and high-precision requirements.

Design/methodology/approach

The authors propose a reinforcement learning framework that integrates multimodal perception, self-supervised representation modeling and hybrid control mechanisms. The framework takes visual images, proprioceptive states and target pose information as inputs. A self-supervised pretraining phase jointly optimizes image reconstruction and forward prediction to learn structurally aware and temporally consistent latent state representations. Furthermore, a Spectrum Random Masking (SRM) technique introduces frequency-domain perturbations to the visual modality, encouraging stable spectral feature learning and enhancing perception robustness for sim-to-real transfer. During execution, an adaptive impedance control mode is activated when excessive contact force is detected, mitigating insertion impacts and jamming.
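As an illustration of what a frequency-domain perturbation of the visual input could look like, the following is a minimal sketch only. The abstract does not specify how SRM is implemented, so everything here is an assumption: the function name `spectrum_random_mask`, the `mask_ratio` parameter, the choice to zero individual Fourier coefficients uniformly at random and the decision to always keep the DC term are hypothetical and are not taken from the paper.

```python
import numpy as np


def spectrum_random_mask(image, mask_ratio=0.1, rng=None):
    """Hypothetical sketch: zero a random subset of 2D Fourier coefficients.

    image: array of shape (H, W) or (H, W, C).
    mask_ratio: approximate fraction of coefficients set to zero (assumed parameter).
    """
    rng = np.random.default_rng() if rng is None else rng
    original = np.asarray(image, dtype=np.float64)
    img = original[..., None] if original.ndim == 2 else original
    h, w, c = img.shape
    out = np.empty_like(img)
    for ch in range(c):
        # Move the channel into the frequency domain.
        spec = np.fft.fft2(img[..., ch])
        # Keep each coefficient with probability 1 - mask_ratio.
        keep = rng.random((h, w)) >= mask_ratio
        # Assumption: always keep the DC term so mean brightness is unchanged.
        keep[0, 0] = True
        # Back to the spatial domain; the mask is not conjugate-symmetric,
        # so the imaginary part is discarded.
        out[..., ch] = np.fft.ifft2(spec * keep).real
    return out.reshape(original.shape)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    frame = rng.random((64, 64, 3))  # stand-in for a camera image
    augmented = spectrum_random_mask(frame, mask_ratio=0.2, rng=rng)
    print(augmented.shape)
```

In a training pipeline, an augmentation of this kind would typically be applied to each image before it reaches the visual encoder, with a fresh random mask per sample; whether the authors mask individual coefficients, frequency bands or amplitude and phase separately is not stated in the abstract.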

Findings

Real-world experiments in extended operational spaces and across diverse peg-hole geometries demonstrate that the proposed method achieves millimeter-level accuracy, high success rates in seen configurations and strong zero-shot generalization to unseen shapes. These results validate the effectiveness of the framework in ensuring robust and precise robotic assembly across large and variable workspaces.

Originality/value

This work presents a novel reinforcement learning framework that integrates multimodal perception, spectral feature learning and hybrid control switching to tackle the challenges of high-precision peg-in-hole assembly. By introducing SRM for robust visual representation and combining it with adaptive impedance control, the framework offers a distinctive solution that enhances sim-to-real transfer and generalization in complex assembly scenarios.