Abstract
Information about the surrounding environment, as perceived by the human eye, is one of the most important cues provided by sight. Over the years, the scientific community has put great effort into developing computer vision methods for scene acquisition and scene understanding.
The goal of this thesis is to study geometry in computer vision and its applications. In computer vision, geometry describes the topological structure of the environment. Specifically, it concerns measures such as shape, volume, depth, pose, disparity, motion, and optical flow, all of which are essential cues in scene acquisition and understanding.
This thesis focuses on two primary objectives. The first is to assess the feasibility of creating semantic models of urban areas and public spaces using geometric features derived from LiDAR sensor data. The second is to develop a practical Virtual Reality (VR) video representation that supports six-degrees-of-freedom (6-DoF) head-motion parallax, using geometric computer vision and machine learning.
The first contribution of the thesis is a method for semantic segmentation of 3D LiDAR point clouds, together with its applications. The ever-growing demand for reliable mapping data, especially in urban environments, has motivated the development of mobile mapping systems. These systems acquire high-precision data, in particular 3D LiDAR point clouds and optical images. The large volume and diversity of these data make processing them a complex task. A complete processing pipeline for urban map data has been developed that annotates 3D LiDAR points with semantic labels. The method is made efficient by combining fast rule-based processing for segmenting buildings and street surfaces with super-voxel-based feature extraction and classification for the remaining map elements (cars, pedestrians, trees, and traffic signs). The experiments show that the rule-based processing stage substantially improves not only computation time but also classification accuracy. Furthermore, two back ends are developed for the semantically labeled data, exemplifying two important applications: (1) a 3D high-definition urban map, which reconstructs a realistic 3D model from the labeled input point cloud, and (2) semantic segmentation of 2D street-view images.
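The idea of a cheap rule-based stage in front of a heavier classifier can be pictured with a minimal sketch. It assumes a simple 2D grid over the ground plane and hypothetical thresholds (`CELL_SIZE`, `GROUND_TOLERANCE`, `BUILDING_MIN_HEIGHT`) that are not taken from the thesis: points close to the lowest return in their cell are labeled street surface, points in cells with a large vertical extent are labeled building, and everything else is left for the super-voxel feature-based classifier. It illustrates the general principle only and is not the thesis's actual rule set.

```python
from collections import defaultdict

# Hypothetical thresholds in metres (illustrative assumptions, not thesis values).
CELL_SIZE = 1.0            # side length of a square 2D grid cell
GROUND_TOLERANCE = 0.2     # max height above the cell minimum for street surface
BUILDING_MIN_HEIGHT = 4.0  # min vertical extent of a cell to be treated as building


def rule_based_labels(points):
    """Label (x, y, z) points as 'ground', 'building' or 'unlabeled'.

    'unlabeled' points are the ones that would be passed on to the
    (more expensive) feature-based classifier.
    """
    # Bucket point indices by the 2D grid cell they fall into.
    cells = defaultdict(list)
    for i, (x, y, _z) in enumerate(points):
        cells[(int(x // CELL_SIZE), int(y // CELL_SIZE))].append(i)

    labels = ["unlabeled"] * len(points)
    for idx in cells.values():
        zs = [points[i][2] for i in idx]
        z_min, z_max = min(zs), max(zs)
        for i in idx:
            z = points[i][2]
            if z - z_min <= GROUND_TOLERANCE:
                labels[i] = "ground"      # near the lowest return: street surface
            elif z_max - z_min >= BUILDING_MIN_HEIGHT:
                labels[i] = "building"    # tall cell: crude building rule
    return labels


if __name__ == "__main__":
    pts = [(0.1, 0.1, 0.0), (0.5, 0.5, 0.1), (0.2, 0.3, 6.0),
           (1.5, 0.5, 0.0), (1.6, 0.4, 1.0)]
    print(rule_based_labels(pts))
    # ['ground', 'ground', 'building', 'ground', 'unlabeled']
```

Each point is visited a constant number of times, so rules of this kind run in linear time, which is one reason such a stage can cut computation before classification. A practical version would need further checks (for example facade planarity or footprint size) to keep tall trees and poles out of the building class.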
The second contribution of the thesis is a practical, fast, and robust method for creating high-resolution Depth-Augmented Stereo Panoramas (DASP) from a 360-degree VR camera. A novel and complete optical-flow-based pipeline is developed that provides stereo 360-degree views of a real-world scene in the DASP format, which consists of a texture panorama and a depth panorama for each eye. A bidirectional flow estimation network is specifically designed for stitching and stereo depth estimation, and it yields state-of-the-art results within a limited run-time budget. The proposed architecture explicitly leverages geometry by obtaining optical flow ground truth in both directions. Designing architectures that exploit this geometric knowledge simplifies the learning problem. Moreover, a 6-DoF testbed for assessing the quality of immersive content is proposed.
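Two geometric ideas related to bidirectional flow and stereo depth can be sketched in a few lines. The first is a forward-backward consistency check, a common way (assumed here, not confirmed as the thesis's procedure) to flag pixels where forward and backward flow disagree, for example because of occlusion. The second converts horizontal disparity to depth with the rectified pinhole relation `Z = f * B / d`, a simplification of the panoramic geometry used for DASP. The function names, the tolerance `tol`, and the nearest-neighbour lookup are illustrative assumptions; this is not the network or pipeline from the thesis.

```python
import math


def fb_consistent(forward, backward, tol=1.0):
    """Mask of pixels whose forward flow is (approximately) undone by the backward flow.

    forward[y][x] and backward[y][x] are (u, v) displacements in pixels.
    Nearest-neighbour lookup is used instead of bilinear interpolation;
    pixels whose forward flow leaves the image are marked inconsistent.
    """
    h, w = len(forward), len(forward[0])
    mask = [[False] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            u, v = forward[y][x]
            xt, yt = round(x + u), round(y + v)
            if 0 <= xt < w and 0 <= yt < h:
                bu, bv = backward[yt][xt]
                # Forward then backward should return to the start pixel.
                mask[y][x] = math.hypot(u + bu, v + bv) <= tol
    return mask


def disparity_to_depth(disparity_px, focal_px, baseline_m):
    """Depth Z = f * B / d for a rectified pinhole stereo pair (simplification)."""
    if disparity_px <= 0:
        return math.inf  # zero disparity corresponds to a point at infinity
    return focal_px * baseline_m / disparity_px


if __name__ == "__main__":
    forward = [[(1, 0), (1, 0), (1, 0)]]
    backward = [[(0, 0), (1, 0), (-1, 0)]]
    print(fb_consistent(forward, backward))  # [[False, True, False]]
    print(disparity_to_depth(20.0, focal_px=1000.0, baseline_m=0.064))
```

Estimating flow in both directions is what makes a check like this possible: pixels that fail it can be excluded or down-weighted when the flow is turned into depth.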
Modern machine learning techniques have been used to design the proposed architectures, which address many core computer vision problems by exploiting the rich information provided by 3D scene structure. The architectures proposed in this thesis are practical systems relevant to current technologies, including autonomous vehicles, virtual reality, augmented reality, robotics, and smart-city infrastructure.
Original language | English |
---|---|
Place of publication | Tampere |
Publisher | Tampere University |
ISBN (electronic) | 978-952-03-1979-3 |
ISBN (print) | 978-952-03-1978-6 |
Status | Published - 2021 |
OKM publication type | G5 Doctoral dissertation (article-based) |
Publication series
Name | Tampere University Dissertations - Tampereen yliopiston väitöskirjat |
---|---|
Volume | 425 |
ISSN (print) | 2489-9860 |
ISSN (electronic) | 2490-0028 |