Estimating downbeat positions in a musical audio signal


We show the inputs that led to the maximum activation of each network. The corresponding audio is also provided, with a click superimposed at the estimated downbeat position. In the displayed input, the downbeat should fall on the 21st or the 41st time frame, depending on the input size. We display 5 inputs per network for each of two different styles: the first is pop/rock music, and the second is classical music.
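As a rough illustration of the frame positions and the click mentioned above, here is a minimal sketch. It assumes the downbeat sits at the center of an odd-length input, which would make the 21st and 41st frames the centers of 41-frame and 81-frame inputs; those lengths are inferred from the frame numbers, not stated on this page. The hop duration, sample rate, and click shape are hypothetical values chosen for the example, and the helper names are not taken from the original system.

```python
import math


def center_frame(n_frames):
    """Return the 1-based index of the middle frame of an odd-length input."""
    if n_frames % 2 == 0:
        raise ValueError("expected an odd number of frames")
    return n_frames // 2 + 1


def frame_to_seconds(frame_index, hop_seconds):
    """Start time of a 1-based frame, given the hop between frames (assumed)."""
    return (frame_index - 1) * hop_seconds


def add_click(samples, sample_rate, t_click, freq=1000.0, dur=0.03, gain=0.5):
    """Return a copy of `samples` with a short decaying sine burst at t_click.

    The click shape (1 kHz, 30 ms, exponential decay) is an arbitrary choice
    for illustration, not the click used in the provided audio files.
    """
    out = list(samples)
    start = int(round(t_click * sample_rate))
    n_click = int(round(dur * sample_rate))
    for i in range(n_click):
        j = start + i
        if 0 <= j < len(out):  # ignore the part of the click outside the signal
            t = i / sample_rate
            envelope = math.exp(-t / (dur / 4))
            out[j] += gain * math.sin(2 * math.pi * freq * t) * envelope
    return out


# Hypothetical settings: 41-frame input, 10 ms hop, 8 kHz audio.
downbeat_frame = center_frame(41)
downbeat_time = frame_to_seconds(downbeat_frame, hop_seconds=0.01)
one_second_of_silence = [0.0] * 8000
with_click = add_click(one_second_of_silence, 8000, downbeat_time)
```

Under these assumptions, the click starts at the beginning of the center frame, so a listener can check by ear whether the marked instant coincides with the downbeat visible in the displayed input.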


1. Harmonic Network

Audio signal: DNN input
SH1: hcnn_rock_1
SH2: hcnn_rock_2
SH3: hcnn_rock_3
SH4: hcnn_rock_4
SH5: hcnn_rock_5
SH6: hcnn_classical_1
SH7: hcnn_classical_2
SH8: hcnn_classical_3
SH9: hcnn_classical_4
SH10: hcnn_classical_5

2. Rhythmic Network

Audio signal: DNN input
SR1: rcnn_rock_1
SR2: rcnn_rock_2
SR3: rcnn_rock_3
SR4: rcnn_rock_4
SR5: rcnn_rock_5
SR6: rcnn_classical_1
SR7: rcnn_classical_2
SR8: rcnn_classical_3
SR9: rcnn_classical_4
SR10: rcnn_classical_5

3. Melodic Network

Audio signal: DNN input
SM1: mcnn_rock_1
SM2: mcnn_rock_2
SM3: mcnn_rock_3
SM4: mcnn_rock_4
SM5: mcnn_rock_5
SM6: mcnn_classical_1
SM7: mcnn_classical_2
SM8: mcnn_classical_3
SM9: mcnn_classical_4
SM10: mcnn_classical_5

4. Bass Network

Audio signal: DNN input
SB1: bcnn_rock_3
SB2: bcnn_rock_2
SB3: bcnn_rock_1
SB4: bcnn_rock_4
SB5: bcnn_rock_5
SB6: bcnn_classical_1
SB7: bcnn_classical_2
SB8: bcnn_classical_3
SB9: bcnn_classical_4
SB10: bcnn_classical_5