RT-NeRF: Real-Time On-Device Neural Radiance Fields Towards Immersive AR/VR Rendering

EIC Lab @ Georgia Institute of Technology

News


[Jul 16th 2022] Paper accepted to the IEEE/ACM International Conference on Computer-Aided Design (ICCAD 2022).

Abstract


Neural Radiance Field (NeRF) based rendering has attracted growing attention thanks to its state-of-the-art (SOTA) rendering quality and wide applications in Augmented and Virtual Reality (AR/VR). However, immersive real-time (>30 FPS) interactions enabled by NeRF-based rendering are still limited due to the low achievable throughput on AR/VR devices. To this end, we first profile SOTA efficient NeRF algorithms on commercial devices and identify two primary causes of the aforementioned inefficiency: (1) the uniform point sampling and (2) the dense accesses and computations of the required embeddings in NeRF. Furthermore, we propose RT-NeRF, which to the best of our knowledge is the first algorithm-hardware co-design acceleration framework for NeRF. Specifically, on the algorithm level, RT-NeRF integrates an efficient rendering pipeline that largely alleviates the inefficiency of the commonly adopted uniform point sampling method in NeRF by directly computing the geometry of pre-existing points. Additionally, RT-NeRF leverages a coarse-grained, view-dependent computing ordering scheme to eliminate the (unnecessary) processing of invisible points. On the hardware level, our proposed RT-NeRF accelerator (1) adopts a hybrid encoding scheme that adaptively switches between a bitmap- and a coordinate-based sparsity encoding format for NeRF's sparse embeddings, aiming to maximize the storage savings and thus reduce the required DRAM accesses while supporting efficient NeRF decoding; and (2) integrates both a dual-purpose bi-directional adder & search tree and a high-density sparse search unit to coordinate the two aforementioned encoding formats. Extensive experiments on eight datasets consistently validate the effectiveness of RT-NeRF, which achieves a large throughput improvement (e.g., 9.7x - 3,201x) while maintaining rendering quality on par with SOTA efficient NeRF solutions.
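To give intuition for the hybrid encoding scheme mentioned above, the sketch below compares the storage cost of a bitmap-based format (one occupancy bit per slot, plus payloads) against a coordinate-based format (one explicit index per nonzero) and picks the cheaper one per block. The bit widths, block size, and selection rule here are illustrative assumptions for exposition, not RT-NeRF's actual hardware parameters.

```python
import numpy as np

def encode_sparse_block(values, coord_bits=16, value_bits=8):
    """Pick the cheaper of a bitmap- or coordinate-based encoding for a
    sparse block: denser blocks favor the bitmap (1 bit per slot), very
    sparse ones favor explicit coordinates. Bit widths are assumptions."""
    n = values.size
    nnz_idx = np.flatnonzero(values)
    nnz = nnz_idx.size
    bitmap_cost = n + nnz * value_bits            # 1-bit occupancy mask + payloads
    coord_cost = nnz * (coord_bits + value_bits)  # one index + payload per nonzero
    if bitmap_cost <= coord_cost:
        return ("bitmap", values != 0, values[nnz_idx]), bitmap_cost
    return ("coord", nnz_idx, values[nnz_idx]), coord_cost

block = np.zeros(1024)
block[[3, 97, 500]] = 1.0                  # only 3 of 1024 entries are nonzero
(fmt, *_), cost = encode_sparse_block(block)
# coordinate format wins: 3 * (16 + 8) = 72 bits vs. 1024 + 3 * 8 = 1048 bits
```

The crossover point depends only on the block's density, which is why an adaptive per-block choice can beat either format used alone.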

Background and motivation


Preliminaries of NeRF



(a) Novel view synthesis


(b) NeRF's pipeline


(c) TensoRF's pipeline


(a) An illustration of novel view synthesis, the rendering task that NeRF targets. (b) NeRF-based rendering includes Step 1, mapping pixels to rays by marching camera rays through the scene; Step 2, querying the features (i.e., the RGB color and the density) of points along the rays by feeding their locations and viewing directions to an MLP model; and Step 3, rendering the pixels' colors. (c) TensoRF, which achieves SOTA NeRF efficiency, replaces Step 2 in NeRF (i.e., querying the features of points along the rays with an MLP) with Step 2-1, which locates pre-existing points using an occupancy grid, and Step 2-2, which computes the pre-existing points' features from a decomposed embedding grid expressed in terms of matrix-vector pairs.
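Step 3 above follows the standard NeRF volume-rendering rule: each sample's opacity is alpha_i = 1 - exp(-sigma_i * delta_i), it is weighted by the transmittance accumulated in front of it, and the weighted colors are summed into the pixel. A minimal NumPy sketch for a single ray (the sample spacings and the one-sample example are illustrative):

```python
import numpy as np

def composite_ray(sigmas, colors, deltas):
    """Alpha-composite per-sample densities and RGB colors along one ray
    into a pixel color (standard NeRF volume rendering, Step 3)."""
    alphas = 1.0 - np.exp(-sigmas * deltas)        # per-sample opacity
    # transmittance: fraction of light surviving past all earlier samples
    trans = np.cumprod(np.concatenate([[1.0], 1.0 - alphas[:-1]]))
    weights = trans * alphas
    return (weights[:, None] * colors).sum(axis=0)  # (3,) RGB

# One effectively opaque red sample fully determines the pixel color.
sigmas = np.array([1e9])
colors = np.array([[1.0, 0.0, 0.0]])
deltas = np.array([0.1])
pixel = composite_ray(sigmas, colors, deltas)       # ~ [1, 0, 0]
```

Because later samples are attenuated by the transmittance of earlier ones, points hidden behind dense geometry receive near-zero weights, which is exactly what makes skipping invisible points attractive.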

Profiling on SOTA efficient NeRF solutions




Among Step 1 (i.e., map pixels to rays), Step 2-1 (i.e., locate the pre-existing points), Step 2-2, and Step 3 (i.e., render pixels' colors), SOTA efficient NeRF solutions (e.g., TensoRF) are bottlenecked by Step 2-1 and Step 2-2.

Algorithm-hardware co-design


RT-NERF: Proposed algorithm



(a) Efficient Rendering Pipeline


(b) View-Dependent Rendering Ordering

(a) The proposed rendering pipeline, which directly computes the geometry of pre-existing points, enabling occupancy grid accesses that are both fewer and more regular than those of the SOTA rendering pipeline. (b) The tiled sub-space (marked in yellow) that is closest to the origin of the target view is processed first during our rendering process, based on the current target view.
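The two algorithm-level ideas can be sketched together in a few lines: read the coordinates of pre-existing (occupied) points directly from the occupancy grid instead of uniformly sampling and testing every candidate, then visit tiled sub-spaces in order of their distance to the target view's origin. This is an illustrative NumPy sketch of the idea, not the paper's exact procedure; the tile size and grid are assumptions.

```python
import numpy as np

def ordered_occupied_points(occupancy, cam_origin, tile=4):
    """Return the coordinates of occupied voxels, grouped into tile-sized
    sub-spaces and ordered so that the tile nearest the camera origin
    comes first (coarse-grained view-dependent ordering)."""
    pts = np.argwhere(occupancy)                 # pre-existing points, directly
    tiles = pts // tile                          # sub-space index of each point
    uniq, inv = np.unique(tiles, axis=0, return_inverse=True)
    centers = (uniq + 0.5) * tile                # tile centers in voxel units
    order = np.argsort(np.linalg.norm(centers - cam_origin, axis=1))
    rank = np.empty(len(uniq), dtype=int)
    rank[order] = np.arange(len(uniq))           # tile -> visit position
    return pts[np.argsort(rank[inv], kind="stable")]

grid = np.zeros((8, 8, 8), dtype=bool)
grid[0, 0, 0] = grid[7, 7, 7] = True
pts = ordered_occupied_points(grid, cam_origin=np.zeros(3))  # near corner first
```

Enumerating occupied voxels once and streaming them tile by tile keeps the occupancy-grid accesses regular, in contrast to the scattered accesses produced by testing uniformly sampled points along each ray.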

RT-NERF: Proposed accelerator




Overall architecture of the proposed accelerator, illustrating the block diagrams of (a) the overall micro-architecture and (b) the Parallel Processing Unit.

Evaluation


Summary of the evaluated devices' specifications



For a fair comparison with both the edge and cloud baseline devices, we configure two RT-NeRF hardware settings accordingly: one for the edge device (denoted as RT-NeRF-Edge) and one for the cloud device (denoted as RT-NeRF-Cloud). For RT-NeRF-Edge, we set the hardware configuration such that it meets the >30 FPS throughput requirement on all 8 datasets of Synthetic-NeRF, resulting in an average of 45 FPS across the 8 datasets. For RT-NeRF-Cloud, we configure the hardware resources to match the power budget of an RTX 2080 Ti.

Speedup and energy efficiency



The normalized speedup and energy efficiency achieved by our proposed RT-NeRF and 5 baseline devices on the 8 datasets of Synthetic-NeRF. All the legends follow the “device (algorithm)” format.

Citation


@inproceedings{li2022rt-nerf,
    author    = {Chaojian Li and Sixu Li and Yang Zhao and Wenbo Zhu and Yingyan Lin},
    title     = {RT-NeRF: Real-Time On-Device Neural Radiance Fields Towards Immersive AR/VR Rendering},
    booktitle = {Proceedings of the IEEE/ACM International Conference on Computer-Aided Design (ICCAD)},
    year      = {2022},
    publisher = {IEEE/ACM},
    address   = {San Diego, CA, USA}
}

Acknowledgements


The work is supported by the NSF CCRI program (Award number: 2016727) and the NSF RTML program (Award number: 1937592).

The website template was borrowed from Instant Neural Graphics Primitives.