Unsupervised Audio-Caption Aligning Learns Correspondences between Individual Sound Events and Textual Phrases

Huang Xie, Okko Räsänen, Konstantinos Drossos, Tuomas Virtanen

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, Scientific, peer-review

