Do you think this approach would generalize to audio data like music? Are there any potential blockers you can think of?