Open-Pose 3D zero-shot learning: Benchmark and challenges

Weiguang Zhao; Guanyu Yang; Rui Zhang; Chenru Jiang; Chaolong Yang; Yuyao Yan; Amir Hussain; Kaizhu Huang

doi:10.1016/j.neunet.2024.106775

Weiguang Zhao, Guanyu Yang, Rui Zhang^*, Chenru Jiang, Chaolong Yang, Yuyao Yan, Amir Hussain, Kaizhu Huang

^*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

Abstract

With the explosive 3D data growth, the urgency of utilizing zero-shot learning to facilitate data labeling becomes evident. Recently, methods transferring language or language-image pre-training models like Contrastive Language-Image Pre-training (CLIP) to 3D vision have made significant progress in the 3D zero-shot classification task. These methods primarily focus on 3D object classification with an aligned pose; such a setting is, however, rather restrictive, which overlooks the recognition of 3D objects with open poses typically encountered in real-world scenarios, such as an overturned chair or a lying teddy bear. To this end, we propose a more realistic and challenging scenario named open-pose 3D zero-shot classification, focusing on the recognition of 3D objects regardless of their orientation. First, we revisit the current research on 3D zero-shot classification and propose two benchmark datasets specifically designed for the open-pose setting. We empirically validate many of the most popular methods in the proposed open-pose benchmark. Our investigations reveal that most current 3D zero-shot classification models suffer from poor performance, indicating a substantial exploration room towards the new direction. Furthermore, we study a concise pipeline with an iterative angle refinement mechanism that automatically optimizes one ideal angle to classify these open-pose 3D objects. In particular, to make validation more compelling and not just limited to existing CLIP-based methods, we also pioneer the exploration of knowledge transfer based on Diffusion models. While the proposed solutions can serve as a new benchmark for open-pose 3D zero-shot classification, we discuss the complexities and challenges of this scenario that remain for further research development. The code is available publicly at https://github.com/weiguangzhao/Diff-OP3D

Original language	English
Article number	106775
Journal	Neural Networks
Volume	181
DOIs	https://doi.org/10.1016/j.neunet.2024.106775
Publication status	Published - Jan 2025

Keywords

3D classification
Open-pose
Text–image matching
Zero-shot

Cite this

@article{9f10a1413fa5472f8617b0f690b932e8,
  title = "Open-Pose 3D zero-shot learning: Benchmark and challenges",
  abstract = "With the explosive 3D data growth, the urgency of utilizing zero-shot learning to facilitate data labeling becomes evident. Recently, methods transferring language or language-image pre-training models like Contrastive Language-Image Pre-training (CLIP) to 3D vision have made significant progress in the 3D zero-shot classification task. These methods primarily focus on 3D object classification with an aligned pose; such a setting is, however, rather restrictive, which overlooks the recognition of 3D objects with open poses typically encountered in real-world scenarios, such as an overturned chair or a lying teddy bear. To this end, we propose a more realistic and challenging scenario named open-pose 3D zero-shot classification, focusing on the recognition of 3D objects regardless of their orientation. First, we revisit the current research on 3D zero-shot classification and propose two benchmark datasets specifically designed for the open-pose setting. We empirically validate many of the most popular methods in the proposed open-pose benchmark. Our investigations reveal that most current 3D zero-shot classification models suffer from poor performance, indicating a substantial exploration room towards the new direction. Furthermore, we study a concise pipeline with an iterative angle refinement mechanism that automatically optimizes one ideal angle to classify these open-pose 3D objects. In particular, to make validation more compelling and not just limited to existing CLIP-based methods, we also pioneer the exploration of knowledge transfer based on Diffusion models. While the proposed solutions can serve as a new benchmark for open-pose 3D zero-shot classification, we discuss the complexities and challenges of this scenario that remain for further research development. The code is available publicly at https://github.com/weiguangzhao/Diff-OP3D",
  keywords = "3D classification, Open-pose, Text–image matching, Zero-shot",
  author = "Weiguang Zhao and Guanyu Yang and Rui Zhang and Chenru Jiang and Chaolong Yang and Yuyao Yan and Amir Hussain and Kaizhu Huang",
  note = "Publisher Copyright: {\textcopyright} 2024",
  year = "2025",
  month = jan,
  doi = "10.1016/j.neunet.2024.106775",
  language = "English",
  volume = "181",
  journal = "Neural Networks",
  issn = "0893-6080",
}

TY - JOUR
T1 - Open-Pose 3D zero-shot learning
T2 - Benchmark and challenges
AU - Zhao, Weiguang
AU - Yang, Guanyu
AU - Zhang, Rui
AU - Jiang, Chenru
AU - Yang, Chaolong
AU - Yan, Yuyao
AU - Hussain, Amir
AU - Huang, Kaizhu
PY - 2025/1
Y1 - 2025/1

AB - With the explosive 3D data growth, the urgency of utilizing zero-shot learning to facilitate data labeling becomes evident. Recently, methods transferring language or language-image pre-training models like Contrastive Language-Image Pre-training (CLIP) to 3D vision have made significant progress in the 3D zero-shot classification task. These methods primarily focus on 3D object classification with an aligned pose; such a setting is, however, rather restrictive, which overlooks the recognition of 3D objects with open poses typically encountered in real-world scenarios, such as an overturned chair or a lying teddy bear. To this end, we propose a more realistic and challenging scenario named open-pose 3D zero-shot classification, focusing on the recognition of 3D objects regardless of their orientation. First, we revisit the current research on 3D zero-shot classification and propose two benchmark datasets specifically designed for the open-pose setting. We empirically validate many of the most popular methods in the proposed open-pose benchmark. Our investigations reveal that most current 3D zero-shot classification models suffer from poor performance, indicating a substantial exploration room towards the new direction. Furthermore, we study a concise pipeline with an iterative angle refinement mechanism that automatically optimizes one ideal angle to classify these open-pose 3D objects. In particular, to make validation more compelling and not just limited to existing CLIP-based methods, we also pioneer the exploration of knowledge transfer based on Diffusion models. While the proposed solutions can serve as a new benchmark for open-pose 3D zero-shot classification, we discuss the complexities and challenges of this scenario that remain for further research development. The code is available publicly at https://github.com/weiguangzhao/Diff-OP3D

KW - 3D classification
KW - Open-pose
KW - Text–image matching
KW - Zero-shot
UR - http://www.scopus.com/inward/record.url?scp=85206480425&partnerID=8YFLogxK
U2 - 10.1016/j.neunet.2024.106775
DO - 10.1016/j.neunet.2024.106775
M3 - Article
C2 - 39423498
AN - SCOPUS:85206480425
SN - 0893-6080
VL - 181
JO - Neural Networks
JF - Neural Networks
M1 - 106775
ER -
