Binocular Multi-CNN System for Real-Time 3D Pose Estimation

Teo Niemirepo, Marko Viitanen, Jarno Vanne

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review


Current practical approaches to depth-aware pose estimation convert a human pose from a monocular 2D image into 3D space with a single computationally intensive convolutional neural network (CNN). This paper introduces the first open-source algorithm for binocular 3D pose estimation. It uses two separate lightweight CNNs to estimate disparity/depth information from a stereoscopic camera input. This multi-CNN fusion scheme makes it possible to perform full-depth sensing in real time on a consumer-grade laptop, even if parts of the human body are invisible or occluded. Our real-time system is validated with a proof-of-concept demonstrator composed of two Logitech C930e webcams and a laptop equipped with an Nvidia GTX 1650 Max-Q GPU and an Intel i7-9750H CPU. The demonstrator processes the input camera feeds at 30 fps, and the output can be visually analyzed with a dedicated 3D pose visualizer.
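As a rough illustration of how disparity from a stereoscopic input relates to depth, the sketch below applies the standard rectified-stereo relation Z = f · B / d to matched 2D joint positions from the left and right views. It is not the paper's algorithm: the function name `triangulate_keypoints`, its parameters, and the assumption of a rectified, calibrated camera pair with already-matched keypoints are all hypothetical and chosen only for illustration.

```python
from typing import List, Optional, Sequence, Tuple

Point2D = Tuple[float, float]
Point3D = Tuple[float, float, float]


def triangulate_keypoints(
    left: Sequence[Point2D],
    right: Sequence[Point2D],
    focal_px: float,
    baseline_m: float,
    cx: float,
    cy: float,
    min_disparity: float = 1e-6,
) -> List[Optional[Point3D]]:
    """Lift matched 2D joints from a rectified stereo pair into 3D.

    Hypothetical helper: assumes both cameras share the focal length
    `focal_px` (pixels) and principal point (cx, cy), and are separated
    horizontally by `baseline_m` (metres).
    """
    joints: List[Optional[Point3D]] = []
    for (xl, yl), (xr, _yr) in zip(left, right):
        disparity = xl - xr  # horizontal shift between the two views
        if disparity <= min_disparity:
            # No usable disparity: the joint cannot be placed in depth.
            joints.append(None)
            continue
        z = focal_px * baseline_m / disparity  # depth along the optical axis
        x = (xl - cx) * z / focal_px           # back-project to camera space
        y = (yl - cy) * z / focal_px
        joints.append((x, y, z))
    return joints
```

With a hypothetical focal length of 700 px and a 0.1 m baseline, a joint seen at x = 390 in the left image and x = 355 in the right image has a disparity of 35 px, which places it about 2 m from the cameras. Joints with no positive disparity are returned as `None` rather than guessed.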
Original language: English
Title of host publication: Proceedings of the 28th ACM International Conference on Multimedia, MM '20
ISBN (Print): 978-1-4503-7988-5
Publication status: Published - Oct 2020
Publication type: A4 Article in conference proceedings
Duration: 1 Jan 1900 → …


Period: 1/01/00 → …

Publication forum classification

  • Publication forum level 1

