3DBench: A Scalable 3D Benchmark and Instruction-Tuning Dataset

Junjie Zhang, Tianci Hu, Xiaoshui Huang^*, Yongshun Gong, Dan Zeng

^*Corresponding author for this work

Research output: Chapter in Book or Report/Conference proceeding › Conference Proceeding › peer-review


Abstract

Evaluating the performance of Multi-modal Large Language Models (MLLMs) that integrate both point clouds and language presents significant challenges. The lack of a comprehensive assessment makes it difficult to determine whether these models truly represent advancements, thereby impeding further progress in the field. Current evaluations rely heavily on classification and captioning tasks and fall short of providing a thorough assessment of MLLMs. There is a pressing need for a more sophisticated evaluation method capable of thoroughly analyzing the spatial understanding and expressive capabilities of these models. To address these issues, we introduce 3DBench, a scalable 3D benchmark accompanied by a large-scale instruction-tuning dataset, which provides an extensible platform for the comprehensive evaluation of MLLMs. Specifically, we establish a benchmark that spans a wide range of spatial and semantic scales, from object level to scene level, and addresses both perception and planning tasks. Furthermore, we present a rigorous pipeline for automatically constructing scalable 3D instruction-tuning datasets, covering 10 diverse multi-modal tasks with more than 0.23 million QA pairs generated in total. Thorough experiments evaluating trending MLLMs, comparisons against existing datasets, and variations of training protocols demonstrate the superiority of 3DBench and offer valuable insights into current limitations and potential research directions. Code is available at https://github.com/Inshsang/3DBench.

Original language	English
Title of host publication	Proceedings of the 33rd International Joint Conference on Artificial Intelligence, IJCAI 2024
Editors	Kate Larson
Publisher	International Joint Conferences on Artificial Intelligence
Pages	1706-1714
Number of pages	9
ISBN (Electronic)	9781956792041
Publication status	Published - 2024
Externally published	Yes
Event	33rd International Joint Conference on Artificial Intelligence, IJCAI 2024 - Jeju, Korea, Republic of. Duration: 3 Aug 2024 → 9 Aug 2024

Publication series

Name	IJCAI International Joint Conference on Artificial Intelligence
ISSN (Print)	1045-0823

Conference

Conference	33rd International Joint Conference on Artificial Intelligence, IJCAI 2024
Country/Territory	Korea, Republic of
City	Jeju
Period	3/08/24 → 9/08/24

Cite this

Zhang, J., Hu, T., Huang, X., Gong, Y., & Zeng, D. (2024). 3DBench: A Scalable 3D Benchmark and Instruction-Tuning Dataset. In K. Larson (Ed.), Proceedings of the 33rd International Joint Conference on Artificial Intelligence, IJCAI 2024 (pp. 1706-1714). (IJCAI International Joint Conference on Artificial Intelligence). International Joint Conferences on Artificial Intelligence.

Zhang, Junjie ; Hu, Tianci ; Huang, Xiaoshui et al. / 3DBench: A Scalable 3D Benchmark and Instruction-Tuning Dataset. Proceedings of the 33rd International Joint Conference on Artificial Intelligence, IJCAI 2024. editor / Kate Larson. International Joint Conferences on Artificial Intelligence, 2024. pp. 1706-1714 (IJCAI International Joint Conference on Artificial Intelligence).

@inproceedings{339adb9e5ad747dfa77300462a6e9ec0,

title = "{3DBench}: A Scalable {3D} Benchmark and Instruction-Tuning Dataset",

abstract = "Evaluating the performance of Multi-modal Large Language Models (MLLMs) that integrate both point clouds and language presents significant challenges. The lack of a comprehensive assessment makes it difficult to determine whether these models truly represent advancements, thereby impeding further progress in the field. Current evaluations rely heavily on classification and captioning tasks and fall short of providing a thorough assessment of MLLMs. There is a pressing need for a more sophisticated evaluation method capable of thoroughly analyzing the spatial understanding and expressive capabilities of these models. To address these issues, we introduce 3DBench, a scalable 3D benchmark accompanied by a large-scale instruction-tuning dataset, which provides an extensible platform for the comprehensive evaluation of MLLMs. Specifically, we establish a benchmark that spans a wide range of spatial and semantic scales, from object level to scene level, and addresses both perception and planning tasks. Furthermore, we present a rigorous pipeline for automatically constructing scalable 3D instruction-tuning datasets, covering 10 diverse multi-modal tasks with more than 0.23 million QA pairs generated in total. Thorough experiments evaluating trending MLLMs, comparisons against existing datasets, and variations of training protocols demonstrate the superiority of 3DBench and offer valuable insights into current limitations and potential research directions. Code is available at https://github.com/Inshsang/3DBench.",

author = "Junjie Zhang and Tianci Hu and Xiaoshui Huang and Yongshun Gong and Dan Zeng",

note = "Publisher Copyright: {\textcopyright} 2024 International Joint Conferences on Artificial Intelligence. All rights reserved.; 33rd International Joint Conference on Artificial Intelligence, IJCAI 2024 ; Conference date: 03-08-2024 Through 09-08-2024",

year = "2024",

language = "English",

series = "IJCAI International Joint Conference on Artificial Intelligence",

publisher = "International Joint Conferences on Artificial Intelligence",

pages = "1706--1714",

editor = "Kate Larson",

booktitle = "Proceedings of the 33rd International Joint Conference on Artificial Intelligence, IJCAI 2024",

}

Zhang, J, Hu, T, Huang, X, Gong, Y & Zeng, D 2024, 3DBench: A Scalable 3D Benchmark and Instruction-Tuning Dataset. in K Larson (ed.), Proceedings of the 33rd International Joint Conference on Artificial Intelligence, IJCAI 2024. IJCAI International Joint Conference on Artificial Intelligence, International Joint Conferences on Artificial Intelligence, pp. 1706-1714, 33rd International Joint Conference on Artificial Intelligence, IJCAI 2024, Jeju, Korea, Republic of, 3/08/24.

3DBench: A Scalable 3D Benchmark and Instruction-Tuning Dataset. / Zhang, Junjie; Hu, Tianci; Huang, Xiaoshui et al.
Proceedings of the 33rd International Joint Conference on Artificial Intelligence, IJCAI 2024. ed. / Kate Larson. International Joint Conferences on Artificial Intelligence, 2024. p. 1706-1714 (IJCAI International Joint Conference on Artificial Intelligence).


TY - GEN

T1 - 3DBench

T2 - 33rd International Joint Conference on Artificial Intelligence, IJCAI 2024

AU - Zhang, Junjie

AU - Hu, Tianci

AU - Huang, Xiaoshui

AU - Gong, Yongshun

AU - Zeng, Dan

PY - 2024

Y1 - 2024

N2 - Evaluating the performance of Multi-modal Large Language Models (MLLMs) that integrate both point clouds and language presents significant challenges. The lack of a comprehensive assessment makes it difficult to determine whether these models truly represent advancements, thereby impeding further progress in the field. Current evaluations rely heavily on classification and captioning tasks and fall short of providing a thorough assessment of MLLMs. There is a pressing need for a more sophisticated evaluation method capable of thoroughly analyzing the spatial understanding and expressive capabilities of these models. To address these issues, we introduce 3DBench, a scalable 3D benchmark accompanied by a large-scale instruction-tuning dataset, which provides an extensible platform for the comprehensive evaluation of MLLMs. Specifically, we establish a benchmark that spans a wide range of spatial and semantic scales, from object level to scene level, and addresses both perception and planning tasks. Furthermore, we present a rigorous pipeline for automatically constructing scalable 3D instruction-tuning datasets, covering 10 diverse multi-modal tasks with more than 0.23 million QA pairs generated in total. Thorough experiments evaluating trending MLLMs, comparisons against existing datasets, and variations of training protocols demonstrate the superiority of 3DBench and offer valuable insights into current limitations and potential research directions. Code is available at https://github.com/Inshsang/3DBench.

AB - Evaluating the performance of Multi-modal Large Language Models (MLLMs) that integrate both point clouds and language presents significant challenges. The lack of a comprehensive assessment makes it difficult to determine whether these models truly represent advancements, thereby impeding further progress in the field. Current evaluations rely heavily on classification and captioning tasks and fall short of providing a thorough assessment of MLLMs. There is a pressing need for a more sophisticated evaluation method capable of thoroughly analyzing the spatial understanding and expressive capabilities of these models. To address these issues, we introduce 3DBench, a scalable 3D benchmark accompanied by a large-scale instruction-tuning dataset, which provides an extensible platform for the comprehensive evaluation of MLLMs. Specifically, we establish a benchmark that spans a wide range of spatial and semantic scales, from object level to scene level, and addresses both perception and planning tasks. Furthermore, we present a rigorous pipeline for automatically constructing scalable 3D instruction-tuning datasets, covering 10 diverse multi-modal tasks with more than 0.23 million QA pairs generated in total. Thorough experiments evaluating trending MLLMs, comparisons against existing datasets, and variations of training protocols demonstrate the superiority of 3DBench and offer valuable insights into current limitations and potential research directions. Code is available at https://github.com/Inshsang/3DBench.

UR - http://www.scopus.com/inward/record.url?scp=85200617763&partnerID=8YFLogxK

M3 - Conference Proceeding

AN - SCOPUS:85200617763

T3 - IJCAI International Joint Conference on Artificial Intelligence

SP - 1706

EP - 1714

BT - Proceedings of the 33rd International Joint Conference on Artificial Intelligence, IJCAI 2024

A2 - Larson, Kate

PB - International Joint Conferences on Artificial Intelligence

Y2 - 3 August 2024 through 9 August 2024

ER -
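The RIS record above is a sequence of tagged lines. Each line has a two-character tag, a hyphen, and a value. Some tags repeat, such as one AU line per author, and the record closes with ER. The following is a minimal sketch of reading such a record, assuming a single record per string. `parse_ris` is a hypothetical helper, not part of 3DBench or of any citation-manager library. The pattern accepts one or more spaces before the hyphen, because this export shows a single space while the RIS specification uses two.

```python
import re
from collections import defaultdict

# Hypothetical helper: a RIS field line is a two-character tag, whitespace,
# a hyphen, then the value (possibly empty, as in "ER -").
RIS_LINE = re.compile(r"^([A-Z][A-Z0-9])\s+-\s?(.*)$")


def parse_ris(text: str) -> dict[str, list[str]]:
    """Map each RIS tag to the list of its values, in the order they appear."""
    fields: dict[str, list[str]] = defaultdict(list)
    for raw in text.splitlines():
        match = RIS_LINE.match(raw.strip())
        if match is None:
            continue  # skip blank separator lines and non-field text
        tag, value = match.group(1), match.group(2).strip()
        if tag == "ER":
            break  # ER marks the end of the record
        fields[tag].append(value)
    return dict(fields)


# A subset of the record above, with the blank separator lines removed.
record = """TY - GEN
T1 - 3DBench
AU - Zhang, Junjie
AU - Hu, Tianci
AU - Huang, Xiaoshui
AU - Gong, Yongshun
AU - Zeng, Dan
PY - 2024
SP - 1706
EP - 1714
ER -"""

fields = parse_ris(record)
print(fields["AU"])  # the five AU lines, in export order
print(int(fields["EP"][0]) - int(fields["SP"][0]) + 1)  # 9
```

Because repeated tags are collected into lists, the author order is preserved. The start and end pages (SP, EP) reproduce the 9-page count listed in the record metadata.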
