Neural Radiance Field (NeRF) based rendering has attracted growing attention thanks to its state-of-the-art (SOTA) rendering quality and wide applications in Augmented and Virtual Reality (AR/VR). However, immersive real-time (> 30 FPS) NeRF based rendering enabled interactions are still limited due to the low achievable throughput on AR/VR devices. To this end, we first profile SOTA efficient NeRF algorithms on commercial devices and identify two primary causes of the aforementioned inefficiency: (1) the uniform point sampling and (2) the dense accesses and computations of the required embeddings in NeRF. Furthermore, we propose RT-NeRF, which to the best of our knowledge is the first algorithm-hardware co-design acceleration of NeRF. Specifically, on the algorithm level, RT-NeRF integrates an efficient rendering pipeline for largely alleviating the inefficiency due to the commonly adopted uniform point sampling method in NeRF by directly computing the geometry of pre-existing points. Additionally, RT-NeRF leverages a coarse-grained view-dependent computing ordering scheme for eliminating the (unnecessary) processing of invisible points. On the hardware level, our proposed RT-NeRF accelerator (1) adopts a hybrid encoding scheme to adaptively switch between a bitmap- or coordinate-based sparsity encoding format for NeRF's sparse embeddings, aiming to maximize the storage savings and thus reduce the required DRAM accesses while supporting efficient NeRF decoding; and (2) integrates both a dual-purpose bi-direction adder & search tree and a high-density sparse search unit to coordinate the two aforementioned encoding formats. Extensive experiments on eight datasets consistently validate the effectiveness of RT-NeRF, achieving a large throughput improvement (e.g., 9.7x - 3,201x) while maintaining the rendering quality as compared with SOTA efficient NeRF solutions.
Background and motivation
Preliminaries of NeRF
(a) An illustration of novel view synthesis, which is the rendering task that NeRF targets to resolve.
(b) NeRF-based rendering includes Step 1 Map pixels to rays by marching camera rays through the scene, Step 2 Query the features (i.e., the RGB color and the density) of points along the rays by inputting their locations and distance to an MLP model, and Step 3 Render pixels' colors.
(c) TensoRF achieving SOTA NeRF efficiency replaces Step 2 (i.e., query the features of points along the rays using a MLP) in NeRF with both Step 2-1 which locates pre-existing points using an occupancy grid and Step 2-2 which computes pre-existing points' features based on a decomposed embedding grid in terms of matrix-vector pairs.
Profiling on SOTA efficient NeRF solutions
Among Step 1 (i.e., map pixels to rays), Step 2-1 (i.e., locate the pre-existing points), Step 2-2, and Step 3 (i.e., render pixels' colors), SOTA efficient NeRF solutions (e.g., TensoRF) are bottlenecked by Step 2-1 and Step 2-2.
Algorithm-hardware co-design
RT-NERF: Proposed algorithm
(a) The proposed rendering pipeline which directly computes the geometry of pre-existing points, enabling occupancy grid accesses that are both fewer and more regular than the SOTA rendering pipeline.
(b) The tiled sub-space (marked as yellow) that is closet to the origin of the target view will be processed first during our rendering process based on the current target view.
RT-NERF: Proposed accelerator
Overall architecture of the proposed accelerator, illustrating the block diagram of both the (a) overall micro-architecture and (b) Parallel Processing Unite.
Evaluation
Summary of the evaluated devices' specifications
For a fair comparison with both edge and cloud baseline devices, we configure two RT-NeRF hardware settings accordingly: one for the edge device (denoted as RT-NeRF-Edge) and one for the cloud device (denoted as RT-NeRF-Cloud).
For RT-NeRF-edge, we set the corresponding hardware configuration such that RT-NeRF-edge can achieve > 30FPS throughput requirement for all 8 datasets of Synthetic-NeRF, which results in an average of 45FPS for 8 datasets.
For RT-NeRF-Cloud, we configure the hardware resources to match the power of RTX 2080Ti.
Speedup and energy efficiency
The normalized speedup and energy efficiency achieved by our proposed RT-NeRF and 5 baseline devices on the 8 datasets of Synthetic-NeRF.
All the legends follow the “device (algorithm)” format.
Citation
@article{li2022rt-nerf,
author = {Chaojian Li and Sixu Li and Yang Zhao and Wenbo Zhu and Yingyan Lin},
title = {RT-NeRF: Real-Time On-Device Neural Radiance Fields Towards Immersive AR/VR Rendering},
journal = {IEEE/ACM International Conference on Computer-Aided Design (ICCAD 2022)},
year = {2022},
publisher = {IEEE/ACM},
address = {San Diego, CA, USA}
}
Acknowledgements
The work is supported by the NSF CCRI program (Award number: 2016727) and the NSF RTML program (Award number: 1937592).