UniDoorManip: Learning Universal Door Manipulation Policy Over Large-scale and Diverse Door Manipulation Environments

*Indicates equal contribution; †indicates corresponding authors
¹Beijing University of Posts and Telecommunications, ²School of CS, Peking University


Our Proposed Environment, Dataset and Universal Manipulation Policy. We build a novel door manipulation environment equipped with a large-scale door dataset covering 6 door categories with hundreds of door bodies and handles, and configure it with different realistic door manipulation mechanisms. To learn a universal door manipulation policy, we propose a novel framework that generalizes to unseen shapes and categories.
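
To make the dataset composition concrete, below is a minimal Python sketch of how a door instance could be assembled from a body, a handle, and a mechanism configuration. All class names, field names, mechanism labels, and file paths here are hypothetical illustrations, not identifiers from the released code.

from dataclasses import dataclass
from enum import Enum, auto

class Mechanism(Enum):
    # Hypothetical mechanism types; the actual environment defines its own set.
    LEVER_LATCH = auto()  # rotate the handle to release the latch, then pull
    ROUND_KNOB = auto()   # twist the knob before pulling
    PUSH_BAR = auto()     # push the bar to release and open

@dataclass
class DoorInstance:
    category: str         # one of the 6 door categories (label is illustrative)
    body_mesh: str        # path to the door-body asset
    handle_mesh: str      # path to the handle asset
    mechanism: Mechanism  # latch logic coupling handle motion to the door joint

# Pairing bodies and handles combinatorially is what turns hundreds of parts
# into thousands of distinct door instances.
door = DoorInstance("interior", "bodies/body_017.urdf",
                    "handles/lever_042.obj", Mechanism.LEVER_LATCH)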

Abstract

Learning a universal manipulation policy that encompasses doors of diverse categories, geometries and mechanisms is crucial for future embodied agents to work effectively in complex and broad real-world scenarios. Due to limited datasets and unrealistic simulation environments, previous works fail to achieve good performance across various doors. In this work, we build a novel door manipulation environment reflecting different realistic door manipulation mechanisms, and further equip it with a large-scale door dataset covering 6 door categories with hundreds of door bodies and handles, composing thousands of distinct door instances. Additionally, to better emulate real-world scenarios, we introduce a mobile robot as the agent and use partial, occluded point clouds as the observation, two factors that previous works do not consider but that matter for real-world deployment. To learn a universal policy over diverse doors, we propose a novel framework that disentangles the whole manipulation process into three stages and integrates them by training in the reverse order of inference. Extensive experiments validate the effectiveness of our designs and demonstrate our framework's strong performance in both simulation and the real world.
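
As a rough illustration of the reverse-order training scheme, the stage executed last at inference is trained first, so each earlier stage can be trained against already-trained successors. The sketch below is a hedged Python outline under assumed stage names (grasping, handle actuation, door opening); the actual stage definitions and training objectives are given in the paper's Method section.

def train(stage, downstream):
    """Placeholder: optimize `stage` so its output succeeds when the
    already-trained, frozen `downstream` stages are executed after it."""
    ...

stages = ["grasp_handle", "actuate_handle", "open_door"]  # inference order

trained = []
for stage in reversed(stages):        # training order: open_door first
    train(stage, downstream=trained)  # later stages serve as frozen context
    trained.insert(0, stage)

# At inference, the stages run in the original order:
# grasp_handle -> actuate_handle -> open_door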

Video

Dataset


Annotation and Generation

Mechanism


Method


Full Pipeline

Results


Simulation Results

Real-World Results

BibTeX

@article{li2024unidoormanip,
  title={UniDoorManip: Learning Universal Door Manipulation Policy Over Large-scale and Diverse Door Manipulation Environments},
  author={Li, Yu and Zhang, Xiaojie and Wu, Ruihai and Zhang, Zilong and Geng, Yiran and Dong, Hao and He, Zhaofeng},
  journal={arXiv preprint arXiv:2403.02604},
  year={2024}
}

Motivating Projects


PartNet-Mobility Dataset (in SAPIEN Simulator)
CVPR 2020
Fanbo Xiang, Yuzhe Qin, Kaichun Mo, Yikuan Xia, Hao Zhu, Fangchen Liu, Minghua Liu, Hanxiao Jiang, Yifu Yuan, He Wang, Li Yi, Angel X. Chang, Leonidas J. Guibas, Hao Su
The PartNet-Mobility dataset is a collection of 2K articulated objects with motion annotations and rendering materials. The dataset powers research on generalizable computer vision and manipulation, and is a continuation of ShapeNet and PartNet.
VAT-Mart: Learning Visual Action Trajectory Proposals for Manipulating 3D ARTiculated Objects
ICLR 2022
Ruihai Wu*, Yan Zhao*, Kaichun Mo*, Zizheng Guo, Yian Wang, Tianhao Wu, Qingnan Fan, Xuelin Chen, Leonidas J. Guibas, Hao Dong
We design an interaction-for-perception framework, VAT-MART, that learns actionable visual representations by simultaneously training a curiosity-driven reinforcement learning policy to explore diverse interaction trajectories and a perception module to summarize and generalize the explored knowledge into pointwise predictions across diverse shapes.

Other Related Projects


RLAfford: End-to-End Affordance Learning for Robotic Manipulation
ICRA 2023
Yiran Geng*, Boshi An*, Haoran Geng, Yuanpei Chen, Yaodong Yang, Hao Dong
We take advantage of visual affordance by using the contact information generated during the RL training process to predict contact maps of interest. This contact prediction process then leads to an end-to-end affordance learning framework that can generalize over different types of manipulation tasks.
AdaAfford: Learning to Adapt Manipulation Affordance for 3D Articulated Objects via Few-shot Interactions
ECCV 2022
Yian Wang*, Ruihai Wu*, Kaichun Mo*, Jiaqi Ke, Qingnan Fan, Leonidas J. Guibas, Hao Dong
We propose a novel framework, named AdaAfford, that performs very few test-time interactions to quickly adapt affordance priors into more accurate, instance-specific posteriors.
DualAfford: Learning Collaborative Visual Affordance for Dual-gripper Manipulation
ICLR 2023
Yan Zhao*, Ruihai Wu*, Zhehuan Chen, Yourong Zhang, Qingnan Fan, Kaichun Mo, Hao Dong
We propose a novel learning framework, DualAfford, to learn collaborative affordance for dual-gripper manipulation tasks. Its core design reduces the quadratic problem over two grippers into two disentangled yet interconnected subtasks for efficient learning.