Image and Video Captioning with Augmented Neural Architectures

Rakshith Shetty, Hamed R. Tavakoli, Jorma Laaksonen

    Research output: Contribution to journal › Article › Scientific › peer-review

    11 Citations (Scopus)


    Neural-network-based image and video captioning can be substantially improved by utilizing architectures that make use of special features from the scene context, objects, and locations. A novel discriminatively trained evaluator network for choosing the best caption among those generated by an ensemble of caption generator networks further improves accuracy.
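The abstract's final step, in which an evaluator picks one caption from the candidates produced by an ensemble of generators, reduces to an argmax over evaluator scores. The sketch below illustrates only that selection logic and is not the authors' implementation. The trained evaluator network is replaced by a hypothetical word-overlap scorer against a set of detected object labels. `select_best_caption`, `toy_evaluator`, and the example captions and labels are all illustrative assumptions.

```python
from typing import Callable, Sequence, Set

# Hypothetical evaluator signature: (scene context, caption) -> score.
# In the paper this role is played by a discriminatively trained network;
# here it is any callable, so a toy scorer can stand in for it.
Evaluator = Callable[[Set[str], str], float]


def select_best_caption(context: Set[str], candidates: Sequence[str],
                        evaluator: Evaluator) -> str:
    """Return the candidate caption that the evaluator scores highest."""
    if not candidates:
        raise ValueError("need at least one candidate caption")
    # max() keeps the first candidate on ties.
    return max(candidates, key=lambda caption: evaluator(context, caption))


def toy_evaluator(context: Set[str], caption: str) -> float:
    """Hypothetical stand-in scorer: fraction of caption words that match
    detected object labels. NOT the paper's evaluator network."""
    words = caption.lower().split()
    if not words:
        return 0.0
    hits = sum(word in context for word in words)
    return hits / len(words)


# Candidates as if produced by three different (hypothetical) generators.
labels = {"man", "horse", "field", "person"}
candidates = [
    "a man riding a horse",            # 2 of 5 words match -> 0.4
    "a dog on a beach",                # 0 of 5 words match -> 0.0
    "a person on a horse in a field",  # 3 of 8 words match -> 0.375
]
print(select_best_caption(labels, candidates, toy_evaluator))
```

Because the evaluator is passed in as a callable, a learned scoring model can replace the toy scorer without changing the selection code.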
    Original language: English
    Pages (from-to): 34-46
    Number of pages: 13
    Journal: IEEE Multimedia
    Issue number: 2
    Publication status: Published - 1 Apr 2018
    Publication type: A1 Journal article-refereed


    • Feature extraction
    • Neural networks
    • Computational modeling
    • Multimedia communication
    • Object recognition
    • Detectors
    • image captioning
    • multimodal learning
    • recurrent networks
    • deep learning
    • pervasive computing
    • ubiquitous computing
    • video captioning
    • neural networks

    Publication forum classification

    • Publication forum level 1

