It's non-standard, and it's difficult to play audio: the speaker needs to be fed samples at a constant rate, roughly one every 1/44,100 of a second for CD-quality audio (44.1 kHz). And the files are often big.
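To put rough numbers on that, here is a small sketch of the arithmetic. It assumes uncompressed CD-style audio: 44,100 samples per second, 16-bit samples, two channels. Those figures and the function names are illustrative assumptions, not something the text above specifies.

```python
# Assumed CD-style parameters (illustrative, not taken from the text above).
SAMPLE_RATE_HZ = 44_100   # samples per second, per channel
BYTES_PER_SAMPLE = 2      # 16-bit samples
CHANNELS = 2              # stereo


def sample_interval_us(sample_rate_hz: int = SAMPLE_RATE_HZ) -> float:
    """Time between consecutive samples, in microseconds."""
    return 1_000_000 / sample_rate_hz


def pcm_size_bytes(
    seconds: float,
    sample_rate_hz: int = SAMPLE_RATE_HZ,
    bytes_per_sample: int = BYTES_PER_SAMPLE,
    channels: int = CHANNELS,
) -> int:
    """Size of raw (uncompressed) audio of the given duration, in bytes."""
    return int(seconds * sample_rate_hz * bytes_per_sample * channels)


if __name__ == "__main__":
    print(f"one sample every {sample_interval_us():.2f} microseconds")
    print(f"one minute of raw audio: {pcm_size_bytes(60) / 1_000_000:.1f} MB")
```

Under these assumptions the speaker needs a new sample about every 23 microseconds, and a single minute of raw audio comes to a bit over 10 MB, which is why both the timing and the file size are a problem.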
So you usually have several...