TY  - GEN
T1  - MSDA: Monocular Self-supervised Domain Adaptation for 6D Object Pose Estimation
AU  - Cai, Dingding
AU  - Heikkilä, Janne
AU  - Rahtu, Esa
PY  - 2023
Y1  - 2023
N2  - Acquiring labeled 6D poses from real images is an expensive and time-consuming task. Though massive amounts of synthetic RGB images are easy to obtain, the models trained on them suffer from noticeable performance degradation due to the synthetic-to-real domain gap. To mitigate this degradation, we propose a practical self-supervised domain adaptation approach that takes advantage of real RGB(-D) data without needing real pose labels. We first pre-train the model with synthetic RGB images and then utilize real RGB(-D) images to fine-tune the pre-trained model. The fine-tuning process is self-supervised by the RGB-based pose-aware consistency and the depth-guided object distance pseudo-label, which does not require the time-consuming online differentiable rendering. We build our domain adaptation method based on the recent pose estimator SC6D and evaluate it on the YCB-Video dataset. We experimentally demonstrate that our method achieves comparable performance against its fully-supervised counterpart while outperforming existing state-of-the-art approaches.
AB  - Acquiring labeled 6D poses from real images is an expensive and time-consuming task. Though massive amounts of synthetic RGB images are easy to obtain, the models trained on them suffer from noticeable performance degradation due to the synthetic-to-real domain gap. To mitigate this degradation, we propose a practical self-supervised domain adaptation approach that takes advantage of real RGB(-D) data without needing real pose labels. We first pre-train the model with synthetic RGB images and then utilize real RGB(-D) images to fine-tune the pre-trained model. The fine-tuning process is self-supervised by the RGB-based pose-aware consistency and the depth-guided object distance pseudo-label, which does not require the time-consuming online differentiable rendering. We build our domain adaptation method based on the recent pose estimator SC6D and evaluate it on the YCB-Video dataset. We experimentally demonstrate that our method achieves comparable performance against its fully-supervised counterpart while outperforming existing state-of-the-art approaches.
U2  - 10.1007/978-3-031-31438-4_31
DO  - 10.1007/978-3-031-31438-4_31
M3  - Conference contribution
SN  - 9783031314377
T3  - Lecture Notes in Computer Science
SP  - 467
EP  - 481
BT  - Image Analysis - 23rd Scandinavian Conference, SCIA 2023, Proceedings
A2  - Gade, Rikke
A2  - Felsberg, Michael
A2  - Kämäräinen, Joni-Kristian
PB  - Springer
T2  - Scandinavian Conference on Image Analysis
Y2  - 18 April 2023 through 21 April 2023
ER  - 