Publication Details
Daniele Bonatto



Interactive view synthesis is a fundamental task in computer vision that aims to recreate a natural scene from any viewpoint using a sparse set of images. The primary focus of this Ph.D. thesis is to explore acquisition processes and the ability to render high-quality novel views dynamically to a user. Real-time rendering is the second objective of this research. In this thesis, we explore two different ways a scene can be reconstructed. The first option is to estimate the three-dimensional (3D) structure of the scene by means of a set of points in space called a point cloud (PC). Such a PC can be captured by a variety of devices and algorithms, such as Time of Flight (ToF) cameras, stereo matching, or structure-from-motion (SfM). Alternatively, the scene can be represented by a set of input views with their associated depth maps, which can be used in depth image-based rendering (DIBR) to synthesize new images. We explore DIBR algorithms, which project the color values of the input views to the novel view position using the depth information. However, the quality of the depth maps strongly impacts the accuracy of the synthesized views. Therefore, we started by improving the Depth Estimation Reference Software (DERS) of the Moving Picture Experts Group (MPEG), a worldwide standardization committee for video compression. Unlike DERS, our Reference Depth Estimation (RDE) software can take any number of input views, leading to more robust results. It is currently used to generate novel depth maps for standardized datasets. However, the depth estimation does not run in real time: it takes minutes to hours to create a depth map, depending on the input views. We therefore explored active depth sensing devices, such as Microsoft Kinect, to acquire color data and depth maps simultaneously.
With the availability of these depth maps, we address the DIBR problem by providing a novel algorithm that seamlessly blends several views together. We focus on obtaining a real-time rendering method; in particular, we exploit the Open Graphics Library (OpenGL) pipeline to rasterize novel views and customize dynamic video loading algorithms to provide frames from video data to the software pipeline. The developed Reference View Synthesizer (RVS) software achieves 2x90 frames per second in a head-mounted display while rendering natural scenes. RVS was initially the default rendering tool in the MPEG-Immersive (MPEG-I) community. Over time, it has evolved to function as the encoding tool and continues to play a crucial role as the reference verification tool during experiments. We tested our methods on conventional, head-mounted, and holographic displays. Finally, we explored advanced acquisition devices and display media, such as (1) plenoptic cameras, for which we propose a novel calibration method and an improved conversion to sub-aperture views, and (2) a three-layer holographic tensor display, able to render multiple views without requiring the viewer to wear glasses. Each piece of this work contributed to the development of photo-realistic methods. We also captured and open-sourced several public datasets of high quality and precision for the research community. They are also used by MPEG to develop novel algorithms for the future of immersive television.
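
As an illustration of the projection step that DIBR relies on, the following is a minimal sketch of forward-warping a single view with its depth map to a new camera position. It is not the RVS implementation, which rasterizes with OpenGL and blends several views; it only shows the underlying geometry under simplifying assumptions: pinhole cameras, depth measured along the optical axis, nearest-pixel splatting, and a z-buffer to resolve occlusions. The function name `warp_view` and all parameter names are hypothetical.

```python
import numpy as np

def warp_view(color, depth, K_src, K_dst, R, t):
    """Forward-warp one source view to a target camera (illustrative sketch).

    color: (H, W, 3) image; depth: (H, W) positive depths along the optical axis.
    K_src, K_dst: 3x3 pinhole intrinsics (assumed model).
    R, t: pose such that X_dst = R @ X_src + t.
    Returns the warped image and a boolean mask of filled pixels.
    """
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3).T.astype(float)

    # Unproject every pixel to a 3D point in the source camera frame.
    pts_src = (np.linalg.inv(K_src) @ pix) * depth.reshape(1, -1)
    # Move the points into the target camera frame and project them.
    pts_dst = R @ pts_src + t.reshape(3, 1)
    proj = K_dst @ pts_dst
    z = pts_dst[2]

    # Keep points in front of the target camera that land inside the image.
    idx = np.flatnonzero(z > 0)
    x = np.rint(proj[0, idx] / z[idx]).astype(int)
    y = np.rint(proj[1, idx] / z[idx]).astype(int)
    inside = (x >= 0) & (x < w) & (y >= 0) & (y < h)
    idx, x, y = idx[inside], x[inside], y[inside]
    target = y * w + x

    # Z-buffer: for each target pixel, keep the closest projected point.
    zbuf = np.full(h * w, np.inf)
    np.minimum.at(zbuf, target, z[idx])
    nearest = z[idx] == zbuf[target]

    out = np.zeros((h * w, 3), dtype=color.dtype)
    out[target[nearest]] = color.reshape(-1, 3)[idx[nearest]]
    filled = np.isfinite(zbuf).reshape(h, w)
    return out.reshape(h, w, 3), filled
```

Pixels left unfilled in the returned mask are disocclusions, that is, regions not visible from the source view. Blending several warped input views, as described above, is one way to fill them.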