Abstract
Current practical approaches to depth-aware pose estimation convert a human pose from a monocular 2D image into 3D space with a single computationally intensive convolutional neural network (CNN). This paper introduces the first open-source algorithm for binocular 3D pose estimation. It uses two separate lightweight CNNs to estimate disparity/depth information from a stereoscopic camera input. This multi-CNN fusion scheme makes it possible to perform full-depth sensing in real time on a consumer-grade laptop, even if parts of the human body are invisible or occluded. Our real-time system is validated with a proof-of-concept demonstrator composed of two Logitech C930e webcams and a laptop equipped with an Nvidia GTX 1650 Max-Q GPU and an Intel i7-9750H CPU. The demonstrator processes the input camera feeds at 30 fps, and the output can be visually analyzed with a dedicated 3D pose visualizer.
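As a rough illustration of the disparity-to-depth idea described in the abstract, the sketch below triangulates matched 2D joint detections from a rectified stereo pair into 3D camera coordinates. This is not the paper's actual pipeline: the focal length, baseline, principal point, and the `triangulate_joints` helper are illustrative assumptions only.

```python
# A minimal sketch (not the authors' implementation) of per-joint depth recovery
# from a rectified stereo pair: each camera's lightweight 2D pose CNN yields
# pixel coordinates for the same joint, and the horizontal disparity between the
# two detections gives depth via Z = f * B / d.
# The focal length, baseline, and keypoint values below are illustrative only.

import numpy as np

FOCAL_LENGTH_PX = 1400.0   # assumed focal length of the rectified cameras, in pixels
BASELINE_M = 0.12          # assumed distance between the two webcams, in metres
CX, CY = 960.0, 540.0      # assumed principal point of the rectified images

def triangulate_joints(left_joints: np.ndarray, right_joints: np.ndarray) -> np.ndarray:
    """Lift matched 2D joints (N x 2 pixel coords per view) into 3D camera space.

    Joints that are occluded in one view can be marked with NaN; their depth is
    left undefined here (a full system would fill them in from a pose prior).
    """
    disparity = left_joints[:, 0] - right_joints[:, 0]           # horizontal pixel shift
    disparity = np.where(disparity > 1e-6, disparity, np.nan)    # guard against divide-by-zero
    z = FOCAL_LENGTH_PX * BASELINE_M / disparity                 # depth from disparity
    x = (left_joints[:, 0] - CX) * z / FOCAL_LENGTH_PX           # back-project to metres
    y = (left_joints[:, 1] - CY) * z / FOCAL_LENGTH_PX
    return np.stack([x, y, z], axis=1)

if __name__ == "__main__":
    # Two toy joint detections (e.g. head and wrist) seen by both cameras.
    left = np.array([[980.0, 400.0], [1050.0, 700.0]])
    right = np.array([[910.0, 400.0], [970.0, 700.0]])
    print(triangulate_joints(left, right))   # one (x, y, z) row per joint, in metres
```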
Original language | English |
---|---|
Title of host publication | Proceedings of the 28th ACM International Conference on Multimedia, MM '20 |
Publisher | ACM |
ISBN (Print) | 978-1-4503-7988-5 |
DOIs | |
Publication status | Published - Oct 2020 |
Publication type | A4 Article in conference proceedings |
Event | ACM MULTIMEDIA |
Conference
Conference | ACM MULTIMEDIA |
---|---|
Publication forum classification
- Publication forum level 1