Abstract
Three-dimensional (3D) imaging refers to a set of techniques for sensing 3D visual scenes, processing them and recreating them on various displays. 3D imaging has always been of research interest, given the quest for ultra-realistic and interactive visualization of the world around us. 3D imaging technologies are at the core of a number of important applications such as 3D cinema, free-viewpoint video, advanced driver assistance systems, robot vision, simultaneous localization and mapping, and human face identification in mobile devices, to name just a few. Still, many challenges remain open. Unlike other topics in signal and image processing, 3D imaging needs to address the full processing chain: it starts with proper sensing, in terms of sensor hardware and topologies involving aligned multi-modal sensors; continues with suitable data representation and compression; proceeds through 3D reconstruction, in terms of multi-view multi-depth video or geometrical models; and ends with rendering and visualization on displays that aim at a higher quality of user immersion. Only such a holistic approach ensures that 3D visual scenes are analyzed and visualized correctly: their geometry is faithfully represented, their textures are photo-realistically rendered, and the user experience is thereby greatly enhanced. Failing to take into account any of the main links in the 3D data processing chain inevitably degrades quality and might eventually discourage the use of 3D imaging altogether, leaving people to watch the recreated visual world on flat screens in plain old 2D.
Starting with 3D scene sensing, one has to select the sensing modes, which generally range from passive (stereo or multi-) camera systems to active depth sensors, or optimized combinations of the two. When dealing with multiple sensors, an important problem is knowing their exact relative positions in order to interpret the projected corresponding points correctly, a problem known as stereo calibration. While the problem has arguably been solved in its supervised form, more advanced solutions are needed for the unsupervised case, i.e. when the cameras have to be calibrated seamlessly for the user, based only on the features of the target images. This is especially important when mechanical or other misalignments degrade the sensing quality. A major problem with active sensors is measurement imperfections, i.e. noise caused by weak illuminating signals, low lighting, low material reflectivity, and other sensor- or scene-related factors. Thus, denoising and enhancing such data becomes of primary importance. Working with multi-modal sensors naturally requires fusing the multiple modalities into effective 3D representations, which also have to be suitable for compression, storage and subsequent rendering.
This thesis presents novel solutions for all the links in the 3D imaging chain. Our approach could be described as 'pushing to the limits': we consider cases where sensing is complicated by a number of factors which we summarize as low-sensing conditions, namely low device power, miniaturization requirements for the sensor, low light, low reflectivity and low transmission bandwidth. Finding solutions for such difficult cases helps ensure that 3D imaging techniques work under any conditions. Our main object of interest is the depth modality, which provides information about a scene's geometry and, when aligned with the color modality, can serve for depth image-based rendering (DIBR) of desired virtual views. Depth can be estimated either by passive stereo camera set-ups or by active sensors utilizing the time-of-flight (ToF) principle.
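For orientation, DIBR boils down to back-projecting each reference pixel to 3D using its depth value and re-projecting it into the virtual camera. The following minimal sketch illustrates that warping step under a pinhole camera assumption; the function and parameter names are ours, and it is not the specific renderer developed in the thesis.

```python
import numpy as np

def dibr_warp(depth, K_ref, K_virt, R, t):
    """Core DIBR step: forward-warp reference-view pixels into a
    virtual view using per-pixel depth (pinhole camera model).

    depth  : (H, W) metric depth map of the reference view
    K_ref  : (3, 3) intrinsics of the reference camera
    K_virt : (3, 3) intrinsics of the virtual camera
    R, t   : rotation (3, 3) and translation (3,) taking reference
             camera coordinates to virtual camera coordinates
    """
    H, W = depth.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3).T

    # Back-project every pixel to a 3D point in the reference frame.
    pts = np.linalg.inv(K_ref) @ pix * depth.reshape(1, -1)

    # Rigidly transform to the virtual camera and re-project.
    proj = K_virt @ (R @ pts + t.reshape(3, 1))
    uv = proj[:2] / proj[2]
    return uv.reshape(2, H, W)  # virtual-view coordinates per source pixel
```

Pixels that map to the same target location must be resolved by depth ordering, and the holes opened by disocclusions are what the inpainting discussed later in the thesis has to fill.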
For passive stereo, we have analyzed the effect of the image processing pipeline (IPP) on the quality of estimated depth maps. We have built a model of a mobile IPP and quantified the influence of each of its processing blocks on the quality of the subsequent depth estimation, implemented with a set of state-of-the-art techniques. We have placed particular emphasis on the influence of even small mechanical misalignments, which have to be tackled by on-the-fly recalibration, and have developed a novel recalibration technique tailored for mobile stereo, where the sensors are supposed to be rigidly fixed but might not always be so. Our approach starts from roughly calibrated cameras and constrains the number of their degrees of freedom, which yields a robust solution and speeds up the algorithm.
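To illustrate what constraining the degrees of freedom means in practice: for a nominally rectified stereo pair, small mechanical drift mostly shows up as a residual rotation of one camera, so recalibration can search over three angles instead of the full epipolar geometry. The sketch below is a hypothetical illustration of that idea, not the algorithm of the thesis; the names and the vertical-disparity criterion are our choices.

```python
import numpy as np
from scipy.optimize import least_squares

def rotation(rx, ry, rz):
    """Rotation matrix from pitch/yaw/roll angles (radians)."""
    Rx = np.array([[1, 0, 0],
                   [0, np.cos(rx), -np.sin(rx)],
                   [0, np.sin(rx),  np.cos(rx)]])
    Ry = np.array([[ np.cos(ry), 0, np.sin(ry)],
                   [0, 1, 0],
                   [-np.sin(ry), 0, np.cos(ry)]])
    Rz = np.array([[np.cos(rz), -np.sin(rz), 0],
                   [np.sin(rz),  np.cos(rz), 0],
                   [0, 0, 1]])
    return Rz @ Ry @ Rx

def recalibrate(pts_left, pts_right, K):
    """Estimate a 3-DOF rotation correction for the right camera of a
    roughly rectified pair by minimizing the vertical disparity of
    matched feature points (pts_*: (N, 2) pixel coordinates)."""
    K_inv = np.linalg.inv(K)
    rays = K_inv @ np.column_stack([pts_right, np.ones(len(pts_right))]).T

    def residual(angles):
        # Re-project right-image rays through the candidate rotation;
        # in a rectified pair the remaining vertical disparity is ~0.
        proj = K @ (rotation(*angles) @ rays)
        return proj[1] / proj[2] - pts_left[:, 1]

    fit = least_squares(residual, x0=np.zeros(3))
    return rotation(*fit.x)  # rectifying correction for the right view
```

Reducing the search space this way is what makes such an optimization robust to outlier matches and fast enough to run on-the-fly.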
For active depth sensing, we have concentrated on miniaturized ToF sensors, whose illumination sources and power consumption are reduced so that they can be integrated in mobile devices. In the considered ToF devices, range is measured through the time a light signal takes to illuminate the scene and travel back to the sensing elements. The range accuracy of a typical ToF device is strongly correlated with the intensity of the received reflected light signal: a weaker signal implies a less accurate measurement. In the low-sensing operating mode, the captured data therefore has to be post-processed in order to reach the measurement accuracy achieved in the normal operating mode. We have thoroughly modelled two noise sources that are always present in the low-sensing case. First, we have modelled the spatially correlated noise, cast as fixed-pattern noise (FPN). Such noise is particularly pronounced in low-sensing conditions and has to be removed as a first step before any further processing; we have developed a method which effectively suppresses FPN by means of adaptive notch filtering. Second, we have modelled the remaining noise in terms of probability distributions and validated the derived models with empirical measurements. Based on the new models, we have devised an effective denoising method which favors a complex-valued representation of the sensed signal and exploits its naturally stabilized noise variance.
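For concreteness, many ToF devices measure the elapsed time indirectly via continuous-wave modulation: the sensor correlates the received signal with phase-shifted copies of the emitted one, and the measurement is naturally a complex number whose angle encodes distance and whose magnitude is the received amplitude. The sketch below shows the standard four-phase demodulation under that assumption (sign conventions and names are ours; the thesis's noise models and denoiser are far more elaborate).

```python
import numpy as np

C = 299_792_458.0  # speed of light, m/s

def tof_depth(c0, c90, c180, c270, f_mod):
    """Recover range from four correlation samples taken at
    0/90/180/270 degree phase offsets (standard 4-phase scheme).
    Works element-wise on full sensor frames (numpy arrays)."""
    # Complex-valued measurement: the angle is the phase delay of the
    # reflected signal, the magnitude is the received amplitude, which
    # governs measurement accuracy (small |z| = low-sensing regime).
    z = (c0 - c180) + 1j * (c90 - c270)
    phase = np.angle(z) % (2 * np.pi)         # phase delay in [0, 2*pi)
    depth = C * phase / (4 * np.pi * f_mod)   # half the round-trip path
    amplitude = np.abs(z) / 2
    return depth, amplitude
```

This also shows why a complex-valued representation of the sensed signal is natural: depth lives in the angle of z and reliability in its magnitude, and the noise behaves more regularly in the complex domain than in the derived depth values.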
Current ToF devices have certain technological limitations, such as low spatial resolution and a limited ability to capture color information. A solution is to combine two or more devices that capture color (view, V) and depth (Z) data and fuse their outputs into a 3D representation referred to as view-plus-depth (V+Z). We have investigated this multi-sensor data fusion case and developed appropriate methods, also incorporating modules for virtual view rendering and disocclusion inpainting. Finally, we have analyzed the V+Z 3D data representation and developed a new method for its efficient asymmetric representation, with competitive performance in compression and fusion tasks.
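A common building block in such fusion is guided upsampling, where the low-resolution ToF depth map is upsampled under the guidance of the high-resolution color image so that depth edges align with color edges. The sketch below is a generic joint bilateral upsampling baseline, not the specific method developed in the thesis; it assumes a grayscale guidance image and illustrative parameter values.

```python
import numpy as np

def joint_bilateral_upsample(depth_lr, gray_hr, scale,
                             sigma_s=2.0, sigma_r=10.0, radius=2):
    """Upsample a low-res depth map guided by a high-res grayscale
    image: each output pixel averages nearby low-res depth samples,
    weighted by spatial proximity and guidance-image similarity."""
    H, W = gray_hr.shape
    h, w = depth_lr.shape
    out = np.zeros((H, W))
    for y in range(H):
        for x in range(W):
            yl, xl = y // scale, x // scale  # matching low-res position
            wsum = vsum = 0.0
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    ny, nx = yl + dy, xl + dx
                    if 0 <= ny < h and 0 <= nx < w:
                        # Spatial weight on the low-res grid.
                        ws = np.exp(-(dy * dy + dx * dx) / (2 * sigma_s ** 2))
                        # Range weight from the guidance image.
                        gy, gx = min(ny * scale, H - 1), min(nx * scale, W - 1)
                        dc = float(gray_hr[y, x]) - float(gray_hr[gy, gx])
                        wr = np.exp(-dc * dc / (2 * sigma_r ** 2))
                        wsum += ws * wr
                        vsum += ws * wr * depth_lr[ny, nx]
            out[y, x] = vsum / max(wsum, 1e-12)
    return out
```

The plain Python loops are for clarity only; a practical implementation would vectorize or run on the GPU, and depth discontinuities would additionally require the disocclusion handling mentioned above.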
The thesis includes a list of the software modules developed during the course of the related research. These modules allow the developed methods and models to be used in a wide range of applications, including mobile 3D imaging, car and robot navigation, and realistic 3D visualization.
| Original language | English |
| --- | --- |
| Publisher | Tampere University of Technology |
| Number of pages | 73 |
| ISBN (electronic) | 978-952-15-4290-9 |
| ISBN (print) | 978-952-15-4282-4 |
| Publication status | Published - 29 Nov 2018 |
| MoE publication type | G5 Doctoral dissertation (articles) |

Publication series

| Name | Tampere University of Technology. Publication |
| --- | --- |
| Volume | 1608 |
| ISSN (print) | 1459-2045 |