TY - GEN
T1 - ReMove: Leveraging Motion Estimation for Computation Reuse in CNN-Based Video Processing
AU - Khodarahmi, Masoumeh
AU - Modarressi, Mehdi
AU - Elahi, Ardavan
AU - Pakdaman, Farhad
N1 - Publisher Copyright:
© 2024 IEEE.
PY - 2024
Y1 - 2024
AB - In this paper, we propose a method to reduce the computational load of Convolutional Neural Networks (CNNs) when processing video frames by exploiting computation reuse based on input similarity. Specifically, our approach exploits the temporal redundancy present in video sequences. In existing computation reuse methods, if certain pixels of two consecutive frames are similar, the computations associated with those pixels are skipped and the results are reused from the previous frame. Because the pixel-wise comparison between consecutive frames introduces overhead that partially offsets the computation savings, we avoid it by exploiting the motion estimation information already present in coded video frames. Motion estimation indicates whether a block of the current frame has already appeared in previous frames, allowing computations to be reused directly without any additional comparison overhead. Furthermore, we fuse consecutive CNN layers until the block size becomes smaller than the filter size, so that the computations of not only the first layer but also several subsequent layers are skipped. Experimental results demonstrate an average 35.4% reduction in the computations of the VGG-16 model with no significant loss in accuracy.
KW - CNN
KW - computation reduction
KW - computation reuse
KW - input similarity
KW - motion vector
U2 - 10.1109/CPSAT64082.2024.10745388
DO - 10.1109/CPSAT64082.2024.10745388
M3 - Conference contribution
AN - SCOPUS:85211891528
T3 - International Symposium on Cyber-Physical Systems (Applications and Theory)
BT - 2024 5th CPSSI International Symposium on Cyber-Physical Systems (Applications and Theory), CPSAT 2024
PB - IEEE
T2 - CPSSI International Symposium on Cyber-Physical Systems (Applications and Theory)
Y2 - 16 October 2024 through 17 October 2024
ER -