Abstract
Immersive virtual reality (VR) technology is becoming mainstream. It relies on omnidirectional content to create immersion in the virtual environment. Omnidirectional content is captured so that it covers the entire 360° field-of-view (FOV) around the capturing device, enabling a three-degrees-of-freedom (3-DoF) experience in VR. Creating an immersive experience requires stereoscopic omnidirectional video at high resolution, quality, and frame rate. These requirements introduce significant challenges in the encoding and streaming stages of this technology.
The most common way of compressing omnidirectional video is by means of existing 2D image/video codecs such as the High Efficiency Video Coding (HEVC/H.265) and Versatile Video Coding (VVC/H.266) standards. To this end, the spherical content is projected onto 2D image planes so that it can be processed by the 2D coding chain. However, the projection process introduces sampling characteristics that differ from those of the spherical content, manifesting as oversampling in different parts of the projected image. This oversampling results in content stretching, deformation, and non-linear motion behavior. Existing codecs are not optimized for such behavior; consequently, the compression performance for the projected video is sub-optimal.
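For concreteness, the sketch below illustrates the equirectangular projection (ERP), the most common layout for 360° video, and why it oversamples content away from the equator. The helper names (`erp_project`, `horizontal_oversampling`) are illustrative and not part of the thesis.

```python
import math

def erp_project(lon_deg, lat_deg, width, height):
    """Map a spherical direction (longitude/latitude in degrees) onto an
    equirectangular (ERP) picture of size width x height: longitude maps
    linearly to x, latitude maps linearly to y."""
    x = (lon_deg + 180.0) / 360.0 * width
    y = (90.0 - lat_deg) / 180.0 * height
    return x, y

def horizontal_oversampling(lat_deg):
    """Ratio of ERP horizontal sampling density to the true spherical density.
    Every picture row holds the same number of samples, but the latitude
    circle it represents shrinks by cos(latitude), so content is stretched
    by roughly 1/cos(latitude) towards the poles."""
    return 1.0 / max(math.cos(math.radians(lat_deg)), 1e-6)

print(horizontal_oversampling(0))    # equator: 1.0, no stretching
print(horizontal_oversampling(60))   # 60 deg latitude: ~2x oversampled
```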
This thesis investigates and proposes new approaches for improving motion estimation and compensation performance for the non-linear motion of projected omnidirectional video in the HEVC and VVC standards. The first contribution is a motion vector scaling method, which aims to provide uniform motion vector predictors for the coding block. The scaling factor is derived from the geometric characteristics of the projection plane and the position of the blocks in that plane. In the second and third contributions, a novel method is proposed for adaptively and efficiently predicting the motion information of a block by learning from the neighboring motion information at full-block and sub-block levels. The proposed methods were assessed on diverse video datasets commonly used in standardization activities, following the standard simulation protocols, and were shown to provide substantial compression improvements while keeping the codec's complexity within a reasonable range.
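As a rough illustration of the first contribution's idea, and not the derivation used in the thesis, the following sketch rescales the horizontal component of a neighboring motion vector predictor by a cosine-of-latitude ratio, assuming ERP content; `scale_mv_predictor` and the exact scaling factor are hypothetical.

```python
import math

def lat_of_row(y, height):
    """Latitude (radians) of an ERP picture row: top ~ +pi/2, bottom ~ -pi/2."""
    return (0.5 - (y + 0.5) / height) * math.pi

def scale_mv_predictor(mv, y_src, y_dst, height):
    """Illustrative position-dependent MV-predictor scaling for ERP content:
    a horizontal displacement observed at latitude phi_src corresponds to a
    different number of ERP pixels at latitude phi_dst, so the x component
    is rescaled by cos(phi_src) / cos(phi_dst). (Hypothetical factor; the
    thesis derives its own from the projection geometry.)"""
    mvx, mvy = mv
    phi_src = lat_of_row(y_src, height)
    phi_dst = lat_of_row(y_dst, height)
    scale = math.cos(phi_src) / max(math.cos(phi_dst), 1e-6)
    return (mvx * scale, mvy)

# A neighbor MV taken near the equator, reused for a block closer to the pole
# of a 3840x1920 ERP picture: the horizontal component grows accordingly.
print(scale_mv_predictor((16, 4), y_src=960, y_dst=200, height=1920))
```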
In recent years, tile-based viewport-adaptive streaming (VAS) methods have been considered for delivering omnidirectional content: a portion of the content, i.e., the viewport, is transmitted at the highest resolution, and the remaining parts, i.e., the non-viewport, are sent at lower resolutions. The reason is that VR content is mainly consumed via Head-Mounted Display (HMD) devices with a limited FOV, for example 110°×90°. Since a user can see only a portion of the 360° video at each time instant, transmitting the whole VR video at the highest resolution requires a large bandwidth. Although tile-based VAS methods provide significantly better streaming performance than traditional streaming, they rely on frequent Intra Random Access Points (IRAPs) for viewport switching. IRAPs are intra-coded pictures and therefore require higher bitrates than inter-coded pictures. The frequent IRAPs in the bitstream make VAS sub-optimal for VR video streaming.
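A back-of-the-envelope calculation with illustrative numbers shows why viewport-adaptive delivery pays off: a 110°×90° viewport covers only about 15% of the ERP picture, so streaming the whole picture at viewport quality wastes most of the bitrate.

```python
# Illustrative numbers only: fraction of ERP pixels covered by the viewport.
viewport_deg = (110, 90)   # example HMD field of view
full_deg = (360, 180)      # full omnidirectional picture

fraction = (viewport_deg[0] / full_deg[0]) * (viewport_deg[1] / full_deg[1])
print(f"viewport covers ~{fraction:.0%} of the ERP pixels")  # ~15%
```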
To resolve the sub-optimal performance of VAS, this thesis develops novel solutions that enable viewport switching without frequent IRAP pictures in the bitstream. In the first contribution, a multi-layer SHVC-ROI scheme is proposed. The SHVC-ROI method utilizes the inter-layer prediction (ILP) functionality of the codec to code the high-quality switching points as inter-coded pictures. The use of ILP requires streaming the whole 360° low-quality video, so no switching occurs for this content and longer-than-conventional IRAP intervals can be used for it. This streaming configuration removes the need for frequent IRAPs in both the high- and low-quality content. In the second contribution, a single-layer Simulcast HEVC method is proposed that uses infrequent IRAPs in the low-quality content. This method follows the same logic as the low-quality coding scheme of SHVC-ROI: longer IRAP periods are used and the whole 360° low-quality content is sent to the user. In addition to these advantages, both contributions avoid tiling in the low-quality content and thereby avoid the compression overhead that tiling schemes incur in encoding and streaming. Finally, the Shared Coded Picture (SCP) technique is proposed to enable viewport switching without frequent IRAPs in both quality versions of the content while using the standard single-layer coding scheme. To this end, certain pictures (the SCPs) are coded so that they are identical in both quality versions of the content. These identically coded pictures are then used for switching from one version of the bitstream to the other. Furthermore, the SCPs are inter-predicted from the previous SCP in the bitstream, so they require significantly lower bitrates than intra-coded switching-point pictures. The proposed methods achieve significant streaming bitrate reductions compared to existing state-of-the-art methods.
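To make the switching idea concrete, here is a minimal sketch, assuming that switching between quality versions is only permitted at SCP positions; `select_segments` and its parameters are hypothetical and do not reproduce the thesis implementation.

```python
def select_segments(num_pictures, scp_interval, switch_requests):
    """Pick which quality version to stream for each picture when switching
    is only allowed at Shared Coded Pictures (SCPs), which are coded
    identically in both versions and inter-predicted from the previous SCP
    rather than being intra-coded."""
    version = "low"      # currently streamed quality version
    pending = None       # requested version, applied at the next SCP
    schedule = []
    for poc in range(num_pictures):
        if poc in switch_requests:
            pending = switch_requests[poc]
        if poc % scp_interval == 0 and pending is not None:
            version = pending          # SCP position: switching allowed
            pending = None
        schedule.append((poc, version))
    return schedule

# The user turns their head at picture 13; the switch takes effect at SCP 16.
print(select_segments(32, scp_interval=8, switch_requests={13: "high"})[10:20])
```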
| Original language | English |
| --- | --- |
| Place of publication | Tampere |
| Publisher | Tampere University |
| ISBN (electronic) | 978-952-03-1852-9 |
| ISBN (print) | 978-952-03-1851-2 |
| Status | Published - 2021 |
| OKM publication type | G5 Doctoral dissertation (articles) |

Publication series

| Name | Tampere University Dissertations - Tampereen yliopiston väitöskirjat |
| --- | --- |
| Volume | 375 |
| ISSN (print) | 2489-9860 |
| ISSN (electronic) | 2490-0028 |