Differentiable 3D Gaussian splatting (GS) is emerging as a prominent technique in computer vision and graphics for reconstructing 3D scenes. GS represents a scene as a set of 3D Gaussians with varying opacities and employs a computationally efficient splatting operation along with analytical derivatives to estimate the 3D Gaussian parameters from scene images captured at various viewpoints. Unfortunately, capturing surround-view (360° viewpoint) images is impossible or impractical in many real-world imaging scenarios, including underwater imaging, rooms inside a building, and autonomous navigation. In these restricted-baseline imaging scenarios, the GS algorithm suffers from the well-known "missing cone" problem, which results in poor reconstruction along the depth axis. In this paper, we demonstrate that transient data (from sonars) allows us to address the missing-cone problem by sampling high-frequency data along the depth axis. We extend the Gaussian splatting algorithm to two commonly used sonars and propose fusion algorithms that simultaneously utilize RGB camera data and sonar data. Through simulations, emulations, and hardware experiments across various imaging scenarios, we show that the proposed fusion algorithms lead to significantly better novel view synthesis (5 dB improvement in PSNR) and 3D geometry reconstruction (60% lower Chamfer distance).
In this paper, we extend Gaussian splatting to sonar and build fusion techniques that reconstruct geometry using the complementary information from cameras and sonars. Our extension develops splatting operations along the z-axis tailored to these sensor types. The specific contributions of this paper include:
Ray View Transformation and Z-Axis Splatting. (a) This illustration shows the camera view, which transforms the Gaussians from the world view to the camera view. (b) The Gaussians are transformed into the ray view through a local affine approximation of the projection transform using the Jacobian (J). (c) The transformed 3D Gaussian is then projected (splatted) onto the xy-plane for rendering the camera image and onto the z-axis for rendering the echosounder measurement (for a collocated camera and echosounder). The gray Gaussian is occluded by the Gaussian in front of it, so its transmittance (T) is smaller than that of the others, independent of whether we are rendering the camera or the sonar. Each ray undergoes splatting independently, ensuring that if a Gaussian is rasterized by multiple rays, it is splatted multiple times.
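To make the projection in (c) concrete, the sketch below transforms a 3D Gaussian into ray space and splats it two ways: onto the xy-plane (camera) and onto the z-axis (echosounder), by marginalizing the ray-space covariance. This is a minimal NumPy sketch under our reading of the caption; the function names, the 3x4 world-to-camera matrix W, and the simplified scalar transmittance update are illustrative assumptions, not the paper's implementation.

import numpy as np

def splat_ray_space_gaussian(mu_world, Sigma_world, W, J):
    """Transform a 3D Gaussian to ray space and splat it two ways.

    mu_world, Sigma_world : mean (3,) and covariance (3, 3) in world coordinates
    W : 3x4 world-to-camera transform (rotation | translation)
    J : 3x3 Jacobian of the local affine approximation to the projective transform

    Returns the 2D (xy-plane) splat used for camera rendering and the 1D
    (z-axis) splat used for echosounder rendering.
    """
    # World -> camera -> ray space (local affine approximation; for a sketch we
    # apply the same affine map to the mean as to the covariance)
    mu_ray = J @ (W[:, :3] @ mu_world + W[:, 3])
    M = J @ W[:, :3]
    Sigma_ray = M @ Sigma_world @ M.T

    # Camera (xy) splat: marginalize z by keeping the upper-left 2x2 block
    mu_xy, Sigma_xy = mu_ray[:2], Sigma_ray[:2, :2]

    # Echosounder (z) splat: marginalize x, y by keeping the scalar zz entry
    mu_z, var_z = mu_ray[2], Sigma_ray[2, 2]

    return (mu_xy, Sigma_xy), (mu_z, var_z)

def render_z_profile(z_splats, opacities, depths, t_bins):
    """Accumulate 1D z-splats into a depth-intensity (transient) profile.

    z_splats  : list of (mu_z, var_z) pairs from splat_ray_space_gaussian
    opacities : per-Gaussian opacity alpha_i
    depths    : per-Gaussian depth used for front-to-back sorting
    t_bins    : depth sample locations of the echosounder
    """
    order = np.argsort(depths)                 # front-to-back compositing order
    T = 1.0                                    # accumulated transmittance
    profile = np.zeros_like(t_bins, dtype=float)
    for i in order:
        mu_z, var_z = z_splats[i]
        g = np.exp(-0.5 * (t_bins - mu_z) ** 2 / var_z)   # 1D Gaussian footprint
        profile += T * opacities[i] * g
        T *= (1.0 - opacities[i])              # occluded Gaussians receive smaller T
    return profile

Marginalizing a Gaussian onto a plane or an axis only keeps the corresponding entries of its ray-space mean and covariance, which is why the same transformed Gaussian can serve both the camera and the sonar renderer, with the transmittance shared between them.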
In small-baseline imaging scenarios, camera images fail to constrain the variance and covariances along the depth axis, resulting in the missing-cone problem in Fourier space. This limits 3D reconstruction fidelity due to the missing frequency information. Our approach leverages time-resolved measurements from sonar to capture the z-axis projection of the volume, addressing the missing-cone issue. By combining sonar data with camera images, we enhance 3D reconstruction, particularly in scenarios with limited camera baselines.
Sonar measurements provide complementary information. (a) Volumetric scene captured with three pairs of cameras and sonars (echosounders). We assume the sensors are in the far field (i.e., the local affine approximation to the projective transform used in Gaussian splatting research is valid). For the center camera-sonar pair, camera measurements are obtained by projecting the volumetric data along the vertical axis, and sonar measurements are obtained by projecting the volumetric data along the horizontal axis. (b) If only camera measurements are considered, then by the Fourier-slice theorem, we capture only a few slices of the Fourier transform of the volume and miss information on a large cone. (c) Sonar (time-resolved) data captures orthogonal slices in Fourier space; hence, 3D reconstruction of the scene is better conditioned with camera-sonar fusion than with camera data alone.
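The Fourier-slice relationship behind (b) and (c) can be checked numerically: the Fourier transform of a projection of the volume equals the slice of the volume's 3D Fourier transform perpendicular to the projection direction. The snippet below is a toy NumPy verification under an orthographic projection model, which is our simplification of the caption's far-field assumption.

import numpy as np

rng = np.random.default_rng(0)
vol = rng.random((32, 32, 32))            # toy volume, axes ordered (x, y, z)

# Camera-like measurement: integrate along z (orthographic projection onto xy)
proj_xy = vol.sum(axis=2)
# Sonar-like measurement: integrate over x and y (transient profile along z)
proj_z = vol.sum(axis=(0, 1))

F = np.fft.fftn(vol)

# Fourier-slice theorem: the FT of a projection is a slice of the volume's FT
# taken perpendicular to the projection direction.
assert np.allclose(np.fft.fft2(proj_xy), F[:, :, 0])   # kz = 0 plane (camera)
assert np.allclose(np.fft.fft(proj_z), F[0, 0, :])     # kz axis line (sonar)

A small camera baseline only samples slices near the kz = 0 plane, leaving a cone of frequencies around the kz (depth) axis unobserved; the sonar's projection along the depth axis samples exactly along that axis, which is why the fused reconstruction is better conditioned.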
Simulation and emulation pipeline for both echosounder and FLS fusion techniques. (a) Raw depth image captured with a Time-of-Flight (ToF) camera. (b) An RGB image captured with a camera. (c) Simulated echosounder intensity, generated from the depth histogram and used as ground truth during training. (d) A 3D Gaussian scene. We use xy-splatting to render RGB images and z-splatting to render the echosounder depth-intensity distribution. (e) Simulated FLS intensity, generated by histogramming depth per row. (f) A 3D Gaussian scene; we splat along the xy-direction to render the RGB image and along the yz-direction to render the FLS image. We minimize the sum of the RGB loss and the corresponding depth loss to train the camera-sonar fusion algorithms.
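The simulated ground-truth signals in (c) and (e), and the training objective stated at the end of the caption, can be sketched as follows. This is a minimal sketch under our assumptions: the bin count, normalization, L1 losses, and weighting term lam are illustrative choices; the caption only specifies that the echosounder intensity is the depth histogram, the FLS intensity is a per-row depth histogram, and the RGB and depth losses are summed.

import numpy as np

def simulate_echosounder(depth, n_bins, d_max):
    """Echosounder ground truth: histogram of all depths in the ToF image."""
    hist, _ = np.histogram(depth.ravel(), bins=n_bins, range=(0.0, d_max))
    return hist / max(hist.sum(), 1)          # normalize to an intensity profile

def simulate_fls(depth, n_bins, d_max):
    """FLS ground truth: one depth histogram per image row."""
    rows = [np.histogram(r, bins=n_bins, range=(0.0, d_max))[0] for r in depth]
    fls = np.stack(rows).astype(float)
    return fls / max(fls.sum(), 1)

def fusion_loss(rgb_render, rgb_gt, sonar_render, sonar_gt, lam=1.0):
    """Sum of the RGB loss and the corresponding depth-intensity loss."""
    rgb_loss = np.abs(rgb_render - rgb_gt).mean()         # L1 photometric loss
    sonar_loss = np.abs(sonar_render - sonar_gt).mean()   # L1 on depth intensity
    return rgb_loss + lam * sonar_loss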
@article{qu2024z,
  title={Z-Splat: Z-Axis Gaussian Splatting for Camera-Sonar Fusion},
  author={Qu, Ziyuan and Vengurlekar, Omkar and Qadri, Mohamad and Zhang, Kevin and Kaess, Michael and Metzler, Christopher and Jayasuriya, Suren and Pediredla, Adithya},
  journal={arXiv preprint arXiv:2404.04687},
  year={2024}
}