It works when I put it in worldspawn :-o.
Sometimes stuff works even though it is outside the specifications, and maybe that is the case here.
Or the music is stereo, which it would make sense for a developer to allow in worldspawn, since that is the map music. But a target_speaker would of course not work properly with a stereo file.
So, how many channels, how many bits per sample, and what sample rate does that file have?
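
If you are not sure how to check, here is a rough sketch in Python that reads those three values from the file header. It assumes the sound is an uncompressed PCM .wav file (I do not know that yet), and `describe_wav` is just a name I made up for this example.

```python
import sys
import wave


def describe_wav(path):
    """Return channel count, bits per sample, and sample rate of a PCM WAV file.

    Assumption: the file is uncompressed PCM; the standard-library wave
    module raises wave.Error for other encodings.
    """
    with wave.open(path, "rb") as wav:
        return {
            "channels": wav.getnchannels(),              # 1 = mono, 2 = stereo
            "bits_per_sample": wav.getsampwidth() * 8,   # sample width is in bytes
            "sample_rate_hz": wav.getframerate(),
        }


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python describe_wav.py path/to/sound.wav
    print(describe_wav(sys.argv[1]))
```

Run it against the sound file and post the three numbers here. Any audio editor will also show them in the file properties.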