SphereCraft: A Dataset for Spherical Keypoint Detection,
Matching and Camera Pose Estimation
Christiano Gava 1
Yunmin Cho2
Federico Raue 1
Sebastian Palacio 1
Alain Pagani 1
Andreas Dengel 1,3
1 DFKI 2 Aimmo Germany GmbH 3 University of Kaiserslautern-Landau

Blender projects + 3D meshes Spherical images Depth maps Ground truth keypoint correspondences

Accepted at WACV-2024

Abstract

This paper introduces SphereCraft, a dataset specifically designed for spherical keypoint detection, matching, and camera pose estimation. The dataset addresses the limitations of existing datasets by providing extracted keypoints from various detectors, along with their ground-truth correspondences. Synthetic scenes with photo-realistic rendering and accurate 3D meshes are included, as well as real-world scenes acquired from different spherical cameras. SphereCraft enables the development and evaluation of algorithms targeting multiple camera viewpoints, advancing the state-of-the-art in computer vision tasks involving spherical images.

Overview

Synthetic Scenes
Our dataset comprises 21 synthetic scenes of different types, sizes, and complexity and yields over 2M image pairs for training and testing spherical keypoint matching models. We generate indoor and outdoor synthetic scenes with high-resolution RGB spherical images along with their depth maps and ground truth camera poses. A selection of popular handcrafted and learned keypoints is then extracted from each image and accurate ground truth keypoint correspondences are established. A highly accurate 3D mesh from each synthetic scene is also included. The resulting data (RGB images, depth maps, camera poses, 3D meshes, keypoints and their correspondences) allows future approaches to be trained and evaluated on exactly the same data. Additionally, we release all Blender projects so other researchers can optionally render the same scenes at different resolutions or create their own version of the data according to their needs.

For each synthetic scene, we manually place a set of cameras to homogeneously cover it. We refer to this initial set as anchor cameras. For each anchor camera, we randomly generate a set of satellite cameras in its vicinity. Akin to data augmentation, the idea is to automatically produce several novel views of the scene from many different positions and orientations. For instance, the figure below shows an anchor image (top left) along with its 9 satellite images.

The resolution of the rendered spherical images and depth maps is 2048x1024 pixels. Below is a sample from each synthetic scene. The number in parenthesis indicates the number of images in that scene.
Bank (930) Barbershop (80) Berlin (280)
Classroom (370) Garage (1090) Harmony (380)
Italian Flat (270) Kartu (640) Lone Monk (670)
Medieval Port (2160) Middle East (4300) Passion (600)
Rainbow (930) Seoul (330) Shapespark (860)
Showroom (1340) Simple (310) Tokyo (90)
Urban Canyon (4090) Vitoria (550) Warehouse (900)


Real-World Scenes
Along with synthetic scenes, we provide another 9 real scenes, captured with Civetta and Ricoh Theta-S cameras. They convey 4 indoor and 5 outdoor scenes of different sizes and complexity, with resolutions considerably higher than the synthetic images. Images captured with Civetta are 7070x3535 pixels, whereas those acquired with Theta-S are 5376x2688 pixels. Unlike synthetic scenes, here ground truth depth maps, camera poses and keypoint correspondences are not available, but we provide keypoints extracted at the resolutions aforementioned. Once again, the number in parenthesis indicates the number of images in that scene.
Berlin Street (186) Church (54) Corridors (116)
Meeting Room 1 (18) Meeting Room 2 (21) Stadium (74)
Town Square (35) Train Station (112) Uni (71)

Download SphereCraft

Our dataset is available at Zenodo. It comprises a total of 9 records. Although it is possible to download SphereCraft using Zenodo's web interface, we recommend the download script we release in SphereCraft's Github page.

Code and more details

Please visit SphereCraft's Github page.

The template for this website is borrowed from Richard Zhang.