Robust Direction Estimation with Convolutional Neural Networks-based Steered Response Power

Pasi Pertilä, Emre Cakir

    Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, Scientific, peer-reviewed

    59 Citations (Scopus)
    32 Downloads (Pure)

    Abstract

    Steered response power (SRP) methods can be used to build a map of sound direction likelihood. In the presence of interference and reverberation, the map exhibits multiple peaks whose heights depend on the spectral content of the corresponding sounds. In realistic use cases, the target of interest (such as speech) often produces a lower peak than an interfering source. This corrupts any direction-dependent method, such as beamforming.
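As a rough illustration of how such a direction map can be computed, the following Python sketch evaluates SRP-PHAT for a single frame over a grid of candidate azimuths. The far-field plane-wave model, the square four-microphone array, the 343 m/s sound speed, and the function name `srp_phat_map` are assumptions made for this example and are not taken from the paper.

```python
import numpy as np

C = 343.0  # assumed speed of sound in m/s


def srp_phat_map(X, mic_pos, freqs, angles):
    """Single-frame SRP-PHAT over candidate azimuths (illustrative sketch).

    X:       (n_mics, n_freqs) complex spectrum of one frame
    mic_pos: (n_mics, 2) microphone positions in metres
    freqs:   (n_freqs,) bin centre frequencies in Hz
    angles:  (n_angles,) candidate azimuths in radians
    """
    # Unit vectors pointing toward each candidate direction.
    u = np.stack([np.cos(angles), np.sin(angles)], axis=1)
    # Far-field arrival time at each microphone relative to the origin.
    tau = -(mic_pos @ u.T) / C  # (n_mics, n_angles)
    srp = np.zeros(len(angles))
    n_mics = X.shape[0]
    for i in range(n_mics):
        for j in range(i + 1, n_mics):
            cross = X[i] * np.conj(X[j])
            cross = cross / (np.abs(cross) + 1e-12)  # PHAT: keep phase only
            tdoa = tau[i] - tau[j]  # (n_angles,)
            # Steering term that cancels the cross-spectrum phase at the true TDOA.
            steer = np.exp(2j * np.pi * tdoa[:, None] * freqs[None, :])
            srp += np.real(steer @ cross)
    return srp


# Toy check: one broadband far-field source at 60 degrees, no noise.
rng = np.random.default_rng(0)
mic_pos = np.array([[0.05, 0.05], [-0.05, 0.05], [-0.05, -0.05], [0.05, -0.05]])
freqs = np.linspace(100.0, 8000.0, 256)
angles_deg = np.arange(0, 360, 5)
angles = np.deg2rad(angles_deg)

S = rng.standard_normal(len(freqs)) + 1j * rng.standard_normal(len(freqs))
u_true = np.array([np.cos(np.deg2rad(60)), np.sin(np.deg2rad(60))])
tau_true = -(mic_pos @ u_true) / C
X = S[None, :] * np.exp(-2j * np.pi * tau_true[:, None] * freqs[None, :])

srp = srp_phat_map(X, mic_pos, freqs, angles)
est_deg = int(angles_deg[np.argmax(srp)])
```

With several sources, each one adds its own peak to `srp`, and the highest peak need not belong to the target.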

    Regression has been used to predict time-frequency (TF) regions corrupted by reverberation, and static broadband noise can be efficiently estimated for TF points. TF regions dominated by noise or reverberation can then be de-emphasized to obtain more reliable source direction estimates. In this work, we propose the use of convolutional neural networks (CNNs) to predict a TF mask that emphasizes the direct-path speech signal in time-varying interference. SRP with phase transform (SRP-PHAT) combined with CNN-based masking is shown to reduce the impact of time-varying interference on speaker direction estimation, using real speech sources in reverberation.
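The idea of weighting SRP-PHAT with a TF mask can be sketched as follows. This is a minimal toy under stated assumptions, not the authors' implementation: no CNN is trained here, the mask is an oracle that marks TF points belonging to the target, each TF point is assumed to contain exactly one source, and the names `masked_srp_phat` and `plane_wave`, the array geometry, and the source angles are all hypothetical.

```python
import numpy as np

C = 343.0  # assumed speed of sound in m/s


def masked_srp_phat(X, mask, mic_pos, freqs, angles):
    """Mask-weighted SRP-PHAT pooled over time-frequency points (sketch).

    X:    (n_mics, n_frames, n_freqs) complex STFT
    mask: (n_frames, n_freqs) weights in [0, 1]; stands in for the output
          of a trained mask predictor (an oracle in the toy below)
    """
    u = np.stack([np.cos(angles), np.sin(angles)], axis=1)
    tau = -(mic_pos @ u.T) / C  # (n_mics, n_angles)
    srp = np.zeros(len(angles))
    n_mics = X.shape[0]
    for i in range(n_mics):
        for j in range(i + 1, n_mics):
            cross = X[i] * np.conj(X[j])             # (n_frames, n_freqs)
            cross = cross / (np.abs(cross) + 1e-12)  # PHAT: keep phase only
            pooled = (mask * cross).sum(axis=0)      # weight, then sum frames
            tdoa = tau[i] - tau[j]
            steer = np.exp(2j * np.pi * tdoa[:, None] * freqs[None, :])
            srp += np.real(steer @ pooled)
    return srp


def plane_wave(S, angle, mic_pos, freqs):
    """Far-field image of source STFT S (n_frames, n_freqs) at each microphone."""
    u = np.array([np.cos(angle), np.sin(angle)])
    tau = -(mic_pos @ u) / C  # (n_mics,)
    return S[None] * np.exp(-2j * np.pi * tau[:, None, None] * freqs[None, None, :])


rng = np.random.default_rng(0)
mic_pos = np.array([[0.05, 0.05], [-0.05, 0.05], [-0.05, -0.05], [0.05, -0.05]])
freqs = np.linspace(100.0, 8000.0, 128)
angles_deg = np.arange(0, 360, 5)
angles = np.deg2rad(angles_deg)
shape = (20, len(freqs))  # (n_frames, n_freqs)

S_target = rng.standard_normal(shape) + 1j * rng.standard_normal(shape)
S_interf = rng.standard_normal(shape) + 1j * rng.standard_normal(shape)

# Toy assumption: the target (60 degrees) owns about 30% of the TF points,
# the interferer (200 degrees) owns the rest.
target_active = rng.random(shape) < 0.3
X = np.where(
    target_active,
    plane_wave(S_target, np.deg2rad(60), mic_pos, freqs),
    plane_wave(S_interf, np.deg2rad(200), mic_pos, freqs),
)

oracle_mask = target_active.astype(float)
srp_plain = masked_srp_phat(X, np.ones(shape), mic_pos, freqs, angles)
srp_masked = masked_srp_phat(X, oracle_mask, mic_pos, freqs, angles)
est_plain = int(angles_deg[np.argmax(srp_plain)])
est_masked = int(angles_deg[np.argmax(srp_masked)])
```

Because the PHAT weighting gives every TF point equal influence, the unmasked map `srp_plain` tends to be dominated by whichever source occupies more TF points. The masked map `srp_masked` keeps only the points attributed to the target. A learned mask would be imperfect, so the improvement in practice depends on the quality of the mask.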
    Original language: English
    Title of host publication: 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
    Publisher: IEEE
    Pages: 6125-6129
    ISBN (Electronic): 978-1-5090-4117-6
    DOIs
    Publication status: Published - 2017
    Publication type: A4 Article in conference proceedings
    Event: IEEE International Conference on Acoustics, Speech and Signal Processing
    Duration: 1 Jan 1900 to 1 Jan 2000

    Publication series

    Name
    ISSN (Electronic): 2379-190X

    Conference

    Conference: IEEE International Conference on Acoustics, Speech and Signal Processing
    Period: 1/01/00 to 1/01/00

    Publication forum classification

    • Publication forum level 1
