MixRT: Mixed Neural Representations
For Real-Time NeRF Rendering

¹Georgia Institute of Technology, ²Meta
*Work done while interning at Meta. Corresponding Author.

Abstract


Neural Radiance Field (NeRF) has emerged as a leading technique for novel view synthesis, owing to its impressive photorealistic reconstruction and rendering capability. Nevertheless, achieving real-time NeRF rendering in large-scale scenes has remained challenging, often leading to the adoption of either intricate baked mesh representations with a substantial number of triangles or resource-intensive ray marching in baked representations. We challenge these conventions, observing that high-quality geometry, represented by meshes with a substantial number of triangles, is not necessary for achieving photorealistic rendering quality. Consequently, we propose MixRT, a novel NeRF representation that includes a low-quality mesh, a view-dependent displacement map, and a compressed NeRF model. This design effectively harnesses the capabilities of existing graphics hardware, thus enabling real-time NeRF rendering on edge devices. Leveraging a highly-optimized WebGL-based rendering framework, our proposed MixRT attains real-time rendering speeds on edge devices (over 30 FPS at a resolution of 1280 x 720 on a MacBook M1 Pro laptop), better rendering quality (0.2 PSNR higher on indoor scenes of the Unbounded-360 datasets), and smaller storage (80% of the size) compared to state-of-the-art methods.
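To make the mixed representation concrete, below is a minimal TypeScript sketch of how its three components might be laid out in a WebGL-oriented renderer. All type and field names here are illustrative assumptions, not the released implementation.

```typescript
// Illustrative sketch of a MixRT scene's three components.
// All names below are hypothetical -- the paper does not prescribe this layout.

interface MixRTScene {
  // Low-quality proxy mesh: coarse geometry used only for rasterization.
  mesh: {
    positions: Float32Array; // xyz per vertex
    uvs: Float32Array;       // texture coordinates per vertex
    indices: Uint32Array;    // triangle list
  };
  // View-dependent displacement map: per-texel spherical harmonics (SH)
  // coefficients plus a scale, used to shift the ray-mesh hit point.
  displacementMap: {
    width: number;
    height: number;
    shCoeffs: Float32Array;  // SH coefficients per texel
    scale: Float32Array;     // displacement scale per texel
  };
  // Compressed NeRF: a hash table of feature embeddings decoded by a
  // small MLP (Instant-NGP-style).
  hashGrid: Float32Array;    // flattened hash-table entries
  mlpWeights: Float32Array[]; // weight matrices of the small decoder MLP
}
```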

Proposed MixRT Representation


[Figure: MixRT rendering pipeline]

An overview of our proposed MixRT rendering pipeline: MixRT integrates three core components: a low-quality mesh, a view-dependent displacement map, and a NeRF model compressed into a hash table. This combination is designed to maximize the utilization of diverse hardware resources. To render an image pixel: (1) We use the hardware rasterizer to rasterize the mesh, determining the ray-mesh intersection point $p$. (2) Leveraging the texture mapping units, we use texture coordinates to access maps containing the spherical harmonics (SH) coefficients and scale, computing the calibrated point $p_{cali}$. (3) Lastly, $p_{cali}$ is processed by SIMD units, which retrieve embeddings for its eight closest vertices from the 3D grid stored as a hash table. A small MLP network then converts the interpolated embeddings into the final rendered color.
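As a rough illustration of steps (2) and (3), here is a hedged CPU-side sketch in TypeScript. The actual renderer runs these stages in WebGL shaders; the exact calibration formula, SH order, and hashing constants below are assumptions for illustration, not the paper's implementation.

```typescript
// Hypothetical sketch of pipeline steps (2) and (3). The calibration
// formula, SH order, and hash constants are assumptions, not the paper's.

type Vec3 = [number, number, number];

// Step (2): evaluate low-order SH in the view direction and displace the
// ray-mesh hit point p to obtain the calibrated point p_cali.
// Assumed form: displacement = scale * SH(viewDir), applied along viewDir.
function calibratePoint(p: Vec3, viewDir: Vec3, sh: number[], scale: number): Vec3 {
  const [x, y, z] = viewDir;
  // Degree-1 real SH basis (constant + linear terms).
  const basis = [0.282095, 0.488603 * y, 0.488603 * z, 0.488603 * x];
  let disp = 0;
  for (let i = 0; i < basis.length; i++) disp += sh[i] * basis[i];
  return [p[0] + scale * disp * x, p[1] + scale * disp * y, p[2] + scale * disp * z];
}

// Step (3): trilinearly interpolate the embeddings of the eight grid vertices
// surrounding p_cali from an Instant-NGP-style spatial hash table.
// Assumes p_cali has been normalized into the unit cube [0, 1]^3.
function gridEmbedding(
  pCali: Vec3, table: Float32Array,
  tableSize: number, featDim: number, res: number,
): number[] {
  const primes = [1, 2654435761, 805459861]; // common spatial-hash primes
  const base = pCali.map((c) => Math.floor(c * res));
  const frac = pCali.map((c, i) => c * res - base[i]);
  const out = new Array(featDim).fill(0);
  for (let corner = 0; corner < 8; corner++) {
    let h = 0;
    let w = 1;
    for (let d = 0; d < 3; d++) {
      const bit = (corner >> d) & 1;
      h = (h ^ Math.imul(base[d] + bit, primes[d])) >>> 0; // spatial hash
      w *= bit ? frac[d] : 1 - frac[d];                    // trilinear weight
    }
    const idx = (h % tableSize) * featDim;
    for (let f = 0; f < featDim; f++) out[f] += w * table[idx + f];
  }
  return out; // fed to the small MLP that decodes the final color
}
```

The division of labor follows the caption above: the per-texel SH evaluation keeps the displacement view-dependent while staying cheap on texture units, and the hash table keeps the NeRF component compact enough for real-time lookups on SIMD units.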

Real-Time Interactive Viewer Demos


Collision Animation

Scenes: stump, kitchenlego, officebonsai, kitchencounter, bicycle, gardenvase, fulllivingroom

Acknowledgements


The website template was borrowed from Instant Neural Graphics Primitives.