TY - GEN
T1 - Permutation Invariant Recurrent Neural Networks for Sound Source Tracking Applications
AU - Diaz-Guerra Aparicio, David
AU - Politis, Archontis
AU - Miguel, Antonio
AU - Beltran, Jose Ramon
AU - Virtanen, Tuomas
PY - 2024
Y1 - 2024
N2 - Many multi-source localization and tracking models based on neural networks use one or several recurrent layers at their final stages to track the movement of the sources. Conventional recurrent neural networks (RNNs), such as the long short-term memories (LSTMs) or the gated recurrent units (GRUs), take a vector as their input and use another vector to store their state. However, this approach results in the information from all the sources being contained in a single ordered vector, which is not optimal for permutation-invariant problems such as multi-source tracking. In this paper, we present a new recurrent architecture that uses unordered sets to represent both its input and its state and that is invariant to the permutations of the input set and equivariant to the permutations of the state set. Hence, the information of every sound source is represented in an individual embedding and the new estimates are assigned to the tracked trajectories regardless of their order.
U2 - 10.48550/arXiv.2306.08510
DO - 10.48550/arXiv.2306.08510
M3 - Conference contribution
SP - 2137
BT - Proceedings of the 10th Convention of the European Acoustics Association Forum Acusticum 2023
PB - European Acoustics Association
T2 - Convention of the European Acoustics Association Forum Acusticum
Y2 - 11 September 2023 through 15 September 2023
ER -