Spatial Sensing and Computing Laboratory, Xiamen University

Academic Resources

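Commonly used 3D datasets:

6 3D datasets for point cloud classification and segmentation (covering object models, indoor scenes and outdoor scenes);

4 3D datasets for autonomous driving and point cloud registration (covering city streets, country roads and forests);

4 image-based datasets for object recognition and classification tasks.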

[1] Stanford 3D dataset (S3DIS)

Description: A dataset for point cloud classification and segmentation. It contains colored 3D point clouds captured with a Matterport scanner, 695,878,620 points in total. The dataset consists of five large indoor areas from three different buildings, covering approximately 1900, 450, 1700, 870 and 1100 square meters respectively. The areas are mainly office, educational and exhibition spaces, and include conference rooms, personal offices, lounges, open spaces, lobbies, stairs and corridors.

Download link: http://buildingparser.stanford.edu/dataset.html#Download

Paper link: https://arxiv.org/abs/1702.01105

[2] ScanNet

Description: A dataset for point cloud classification and segmentation. ScanNet is an RGB-D video dataset containing 2.5 million views from 1513 scans of 707 distinct spaces, annotated with 3D camera poses, surface reconstructions, and instance-level semantic segmentations. To collect the data, the authors designed an easy-to-use and scalable RGB-D capture system that includes automated surface reconstruction and crowdsourced semantic annotation. Using this data helps achieve state-of-the-art performance on several 3D scene understanding tasks, including 3D object classification, semantic voxel labeling, and CAD model retrieval.

Download link: http://www.scan-net.org/

Paper link: https://arxiv.org/abs/1702.04405

[3] ShapeNet

ShapeNetCore description: ShapeNetCore is a subset of the full ShapeNet dataset with single clean 3D models and manually verified category and alignment annotations. It covers 55 common object categories with about 51,300 unique 3D models. The 12 object categories of PASCAL 3D+, a popular computer vision 3D benchmark dataset, are all covered by ShapeNetCore.

ShapeNetSem description: ShapeNetSem is a smaller, more densely annotated subset consisting of 12,000 models spread over a broader set of 270 categories. In addition to manually verified category labels and consistent alignments, these models are annotated with real-world dimensions, estimates of their material composition at the category level, and estimates of their total volume and weight.

Download link: https://www.shapenet.org/

Paper link: https://arxiv.org/abs/1512.03012?context=cs.CG

[4] ModelNet40

Description: A dataset for point cloud classification and segmentation. It contains 12,311 CAD models, represented as triangular meshes, in 40 object categories (such as airplanes, tables and plants). The data is split into 9,843 training models and 2,468 test models.

Download link: http://modelnet.cs.princeton.edu/

Paper link: https://arxiv.org/abs/1406.5670
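
As a rough illustration of working with triangular-mesh CAD models, the sketch below parses a mesh in the plain-text OFF format. It assumes the models are distributed as OFF files, which this listing does not state, and that some files may fuse the "OFF" keyword with the vertex and face counts on the first line; `parse_off` and `SAMPLE` are hypothetical names used only for this example.

```python
def parse_off(text):
    """Parse the text of an OFF mesh into (vertices, faces)."""
    tokens = text.split()
    if not tokens or not tokens[0].startswith("OFF"):
        raise ValueError("not an OFF file")
    # Assumption: the header may be fused with the counts, e.g. "OFF4 4 6".
    if tokens[0] == "OFF":
        rest = tokens[1:]
    else:
        rest = [tokens[0][3:]] + tokens[1:]
    n_verts, n_faces = int(rest[0]), int(rest[1])  # rest[2] is the edge count
    pos = 3
    vertices = []
    for _ in range(n_verts):
        x, y, z = (float(v) for v in rest[pos:pos + 3])
        vertices.append((x, y, z))
        pos += 3
    faces = []
    for _ in range(n_faces):
        k = int(rest[pos])  # number of vertex indices in this face
        faces.append(tuple(int(i) for i in rest[pos + 1:pos + 1 + k]))
        pos += 1 + k
    return vertices, faces


# A hand-written tetrahedron standing in for a real model file.
SAMPLE = """OFF
4 4 6
0 0 0
1 0 0
0 1 0
0 0 1
3 0 1 2
3 0 1 3
3 0 2 3
3 1 2 3
"""

vertices, faces = parse_off(SAMPLE)
print(len(vertices), "vertices,", len(faces), "faces")
```

The parser does not handle comment lines, which the OFF format allows; point clouds for classification are usually obtained by sampling points on the parsed faces in a separate step.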

[5] Semantic3D

Description: A dataset for point cloud classification and segmentation. semantic-8 is a classification benchmark with 8 class labels, namely {1: man-made terrain, 2: natural terrain, 3: high vegetation, 4: low vegetation, 5: buildings, 6: hard scape, 7: scanning artefacts, 8: cars}. In total, the outdoor point clouds contain over a billion points.

Download link: http://www.semantic3d.net/view_dbase.php?chl=1

Paper link: https://arxiv.org/abs/1704.03847
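
The eight class labels listed above can be written down as a lookup table and used to summarize a labeled scan. This is a minimal sketch: it assumes labels arrive as one integer per line and that 0 marks unlabeled points, neither of which is stated in this listing; `read_labels` and `class_histogram` are hypothetical helper names.

```python
from collections import Counter

# Class ids and names as given in the semantic-8 description.
SEMANTIC8_CLASSES = {
    1: "man-made terrain",
    2: "natural terrain",
    3: "high vegetation",
    4: "low vegetation",
    5: "buildings",
    6: "hard scape",
    7: "scanning artefacts",
    8: "cars",
}


def read_labels(lines):
    """Convert text lines (one integer label per line) to a list of ints."""
    return [int(line) for line in lines if line.strip()]


def class_histogram(labels):
    """Count points per named class; ids outside 1..8 (e.g. 0) are ignored."""
    counts = Counter(labels)
    return {name: counts.get(cid, 0) for cid, name in SEMANTIC8_CLASSES.items()}


# Synthetic label lines standing in for a real label file.
labels = read_labels(["0", "1", "1", "5", "8", ""])
histogram = class_histogram(labels)
print(histogram)
```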

[6] SUN3D

Description: A complete 3D spatial dataset captured by scanning with RGB-D sensors. The database provides RGB-D frames with semantic object segmentations and camera poses. From these one can obtain, for example, a point cloud of an entire space, 3D object models, all viewpoints of an object together with the camera pose relative to that object, and a map of a room that shows all objects and their semantic labels from a bird's-eye view.

Download link: http://sun3d.cs.princeton.edu/

Paper link: https://ieeexplore.ieee.org/document/6751312

[7] KITTI

Description: Currently the largest international benchmark dataset for evaluating computer vision algorithms in autonomous driving scenarios. It is used to evaluate the performance of computer vision techniques such as stereo, optical flow, visual odometry, 3D object detection and 3D tracking in an in-vehicle setting. KITTI contains real-world image data collected in urban, rural and highway scenes, with up to 15 cars and 30 pedestrians per image and varying degrees of occlusion and truncation. The whole dataset consists of 389 stereo and optical flow image pairs, 39.2 km of visual odometry sequences, and images with more than 200k annotated 3D objects, sampled and synchronized at 10 Hz.

Download link: http://www.cvlibs.net/datasets/kitti/index.php

Paper link: https://ieeexplore.ieee.org/document/6248074
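
For the point cloud side of KITTI, the sketch below decodes a raw laser scan buffer. It assumes each point is stored as four consecutive little-endian float32 values (x, y, z, reflectance) with no header, which is not stated in this listing; `read_velodyne` is a hypothetical helper name, and the bytes used here are synthetic.

```python
import struct

# Assumption: one point = four little-endian float32 values (x, y, z, reflectance).
POINT = struct.Struct("<4f")


def read_velodyne(raw):
    """Decode a raw scan buffer into a list of (x, y, z, reflectance) tuples."""
    if len(raw) % POINT.size != 0:
        raise ValueError("buffer length is not a multiple of 16 bytes")
    return list(POINT.iter_unpack(raw))


# Synthetic two-point scan standing in for the bytes of a real file.
raw = struct.pack("<8f", 1.0, 2.0, 3.0, 0.5, -1.5, 0.25, 4.0, 0.0)
points = read_velodyne(raw)
print(len(points), "points; first:", points[0])
```

For a real scan, the buffer would come from reading the file in binary mode, for example `open(path, "rb").read()`.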

[8] Oxford RobotCar

Description: A challenging dataset for autonomous driving: the Oxford RobotCar Dataset. Between May 2014 and December 2015, the authors traversed a route through central Oxford twice a week on average using the Oxford RobotCar platform, an autonomous Nissan LEAF. This resulted in over 1000 km of recorded driving, with almost 20 million images collected from 6 cameras mounted on the vehicle, along with LIDAR, GPS and INS ground truth. Data was collected in all weather conditions, including heavy rain, night, direct sunlight and snow. Road and building works over the period of a year significantly changed sections of the route between the beginning and the end of data collection.

Download link: https://robotcar-dataset.robots.ox.ac.uk/

Paper link: https://robotcar-dataset.robots.ox.ac.uk/images/robotcar_ijrr.pdf

[9] ApolloScape

Description: The paper presents the ApolloScape dataset and its applications for autonomous driving. ApolloScape contains much larger and richer labelling, including a holistic semantic dense point cloud for each site, stereo, per-pixel semantic labelling, lane mark labelling, instance segmentation, 3D car instances, and a highly accurate location for every frame in various driving videos from multiple sites, cities and times of day. For each task, it contains at least 15 times more images than state-of-the-art datasets.

Download link: http://apolloscape.auto/self_localization.html

Paper link: https://arxiv.org/abs/1803.06184

[10] ETH

Description: ETH provides eight point cloud sequences acquired in locations covering the diversity of environments that modern robots are likely to encounter, ranging from inside an apartment to a woodland area. The core of the datasets consists of 3D laser point clouds, with supporting data (gravity, magnetic north and GPS) given for each pose. A special effort has been made to ensure global positioning of the scanner within mm-range precision, independent of environmental conditions. This allows improved registration algorithms to be developed for mapping challenging environments, such as those found in real-world situations.

Download link: https://projects.asl.ethz.ch/datasets/doku.php?id=laserregistration:laserregistration

Paper link: https://journals.sagepub.com/action/doSearch?AllField=Challenging+data+sets+for+point+cloud+registration+algorithms&SeriesKey=ijra

[11] RueMonge2014

Description: The dataset provides a standard benchmark for 2D and 3D facade segmentation and inverse procedural modeling. It includes 428 high-resolution, multi-view images of a street in Paris. A point cloud of about 1M points is reconstructed from the multi-view images. Ground-truth labels with seven semantic classes (door, shop, balcony, window, wall, sky and roof) are provided for both the 2D images and the point cloud.

Download link: https://varcity.ethz.ch/3dchallenge/

Paper link: https://link.springer.com/chapter/10.1007%2F978-3-319-10602-1_34

[12] MNIST

Description: The MNIST database of handwritten digits has a training set of 60,000 examples and a test set of 10,000 examples. The digits have been size-normalized and centered in a fixed-size image.

Download link: http://yann.lecun.com/exdb/mnist/

Paper link: https://ieeexplore.ieee.org/document/726791?reload=true&arnumber=726791
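
The sketch below shows one way to decode the image and label files. It assumes the IDX layout: two zero bytes, a data-type byte (0x08 for unsigned bytes), a byte giving the number of dimensions, then one big-endian 32-bit size per dimension followed by the raw data. That layout is not described in this listing, and `parse_idx` is a hypothetical helper name; the byte strings are synthetic.

```python
import struct


def parse_idx(data):
    """Parse an unsigned-byte IDX buffer into (dims, payload)."""
    zero, dtype_code, ndim = struct.unpack(">HBB", data[:4])
    if zero != 0 or dtype_code != 0x08:
        raise ValueError("expected an unsigned-byte IDX buffer")
    # One big-endian uint32 per dimension, e.g. (count, rows, cols) for images.
    dims = struct.unpack(">" + "I" * ndim, data[4:4 + 4 * ndim])
    payload = data[4 + 4 * ndim:]
    expected = 1
    for d in dims:
        expected *= d
    if len(payload) != expected:
        raise ValueError("payload size does not match the header")
    return dims, payload


# Synthetic buffers: two 2x2 "images" and three labels.
image_bytes = struct.pack(">HBBIII", 0, 0x08, 3, 2, 2, 2) + bytes(range(8))
label_bytes = struct.pack(">HBBI", 0, 0x08, 1, 3) + bytes([5, 0, 4])

image_dims, pixels = parse_idx(image_bytes)
label_dims, labels = parse_idx(label_bytes)
print(image_dims, list(labels))
```

If the downloaded files are gzip-compressed, the buffer can be obtained with `gzip.open(path, "rb").read()` before calling `parse_idx`.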

[13] ImageNet

Description: A large visual database for research on visual object recognition software. It contains more than 20,000 categories and over 14 million images that have been manually annotated to indicate the objects they show. Bounding boxes of the target objects are also provided for more than one million images.

Download link: http://www.image-net.org/

Paper link: http://www.image-net.org/papers/ImageNet_2010.pdf

[14] SYNTHIA Dataset

Description: Contains a collection of image frames rendered from a virtual city, with precise pixel-level semantic annotations. Its purpose is to help with semantic segmentation and related scene understanding problems in the context of driving scenarios. The dataset contains a large amount of data and ground-truth information: 200,000 high-definition images from video streams and 20,000 high-definition images from independent snapshots, with the following characteristics. Scene diversity: European-style towns, modern cities, highways and green areas. Dynamic objects: cars, pedestrians and cyclists. Seasonal changes: dedicated themes for winter, autumn, spring and summer. Lighting conditions and weather: dynamic lights and shadows, several daytime modes, a rain mode and a night mode. Sensor configuration: 8 RGB cameras forming a binocular 360° camera, and 8 depth sensors.

Download link: http://synthia-dataset.net/

Paper link: https://ieeexplore.ieee.org/document/7780721

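Top conference paper links.docx

List of commonly used datasets.pdf

On literature reading and choosing research topics.pdf

University of Cambridge: Language and writing determine the potential of one's life development.pdf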