MVImgNet: A Large-scale Dataset of Multi-view Images

Cited by: 31

Authors:

Yu, Xianggang [1,2]

Xu, Mutian [1,2]

Zhang, Yidan [1,2]

Liu, Haolin [1,2]

Ye, Chongjie [1,2]

Wu, Yushuang [1,2]

Yan, Zizheng [1,2]

Zhu, Chenming [1,2]

Xiong, Zhangyang [1,2]

Liang, Tianyou [1,2]

Chen, Guanying [1,2]

Cui, Shuguang [1,2]

Han, Xiaoguang [1,2]

Affiliations:

[1] CUHKSZ, FNii, Shenzhen, Peoples R China

[2] CUHKSZ, SSE, Shenzhen, Peoples R China

Source:

2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR) | 2023

Funding:

National Key Research and Development Program of China;

Keywords:

DOI:

10.1109/CVPR52729.2023.00883

Chinese Library Classification (CLC) number:

TP18 [Artificial Intelligence Theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Being data-driven is one of the defining properties of deep learning algorithms. The birth of ImageNet [24] drove a remarkable trend of 'learning from large-scale data' in computer vision. Pretraining on ImageNet to obtain rich universal representations has been shown to benefit various 2D visual tasks and has become a standard in 2D vision. However, because collecting real-world 3D data is laborious, there is not yet a generic dataset serving as a counterpart of ImageNet in 3D vision, so how such a dataset could impact the 3D community remains unexplored. To fill this gap, we introduce MVImgNet, a large-scale dataset of multi-view images, which is highly convenient to obtain by shooting videos of real-world objects in daily life. It contains 6.5 million frames from 219,188 videos covering objects from 238 classes, with rich annotations of object masks, camera parameters, and point clouds. The multi-view attribute endows our dataset with 3D-aware signals, making it a soft bridge between 2D and 3D vision. We conduct pilot studies to probe the potential of MVImgNet on a variety of 3D and 2D visual tasks, including radiance field reconstruction, multi-view stereo, and view-consistent image understanding, where MVImgNet demonstrates promising performance, leaving many possibilities for future exploration. In addition, via dense reconstruction on MVImgNet, we derive a 3D object point cloud dataset called MVPNet, covering 87,200 samples from 150 categories, with a class label on each point cloud. Experiments show that MVPNet can benefit real-world 3D object classification while posing new challenges to point cloud understanding. MVImgNet and MVPNet will be made public, in the hope of inspiring the broader vision community.


Pages: 9150-9161

Page count: 12

