Downbeat position estimation in musical audio signals

We present several audio examples in which either 1) our system fails and the state of the art is successful, or 2) our system is successful and the state of the art fails. A click is superimposed at the ground-truth downbeat positions.
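
For readers who want to build similar listening examples, superimposing clicks at annotated downbeat times can be sketched as follows. This is a minimal illustration only, not the procedure actually used to produce the examples on this page: the function name `add_clicks`, the 1 kHz click frequency, the 30 ms duration, and the gain are all assumptions chosen for the sketch, and the signal is assumed to be a mono sequence of floating-point samples.

```python
import math


def add_clicks(signal, sample_rate, downbeat_times,
               click_freq=1000.0, click_dur=0.03, gain=0.5):
    """Return a copy of `signal` with a short click added at each downbeat.

    `signal` is a sequence of float samples (mono), `downbeat_times` is a
    list of times in seconds. All click parameters are illustrative
    assumptions, not values taken from the original work.
    """
    out = list(signal)  # copy so the input is left untouched
    click_len = int(click_dur * sample_rate)

    # One click: a sine burst shaped by an exponentially decaying envelope.
    click = [
        gain
        * math.sin(2.0 * math.pi * click_freq * n / sample_rate)
        * math.exp(-5.0 * n / click_len)
        for n in range(click_len)
    ]

    for t in downbeat_times:
        start = int(round(t * sample_rate))
        if start < 0:
            continue  # ignore annotations before the start of the signal
        for n, sample in enumerate(click):
            idx = start + n
            if idx >= len(out):
                break  # truncate clicks that run past the end
            out[idx] += sample
    return out
```

For example, `add_clicks(samples, 44100, [0.52, 2.48, 4.44])` would mix a click into `samples` at three hypothetical downbeat times. Adding the click to the existing samples, rather than overwriting them, keeps the music audible underneath.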

1. Our system fails and the state of the art is successful

Folk song - 3 tatums per beat
Pop song - kick pattern and bass network
Rap song - the rapping highlights the third beat

2. Our system is successful and the state of the art fails

Progressive rock song - 5 beats per bar
Jazz song - bass and melody work well together
Blues song - melodic network helps
Funk song - robust to open hi-hat
Reggae song - good generalization outside of the training set
Classical song - works without strong rhythm or bass